  1. Feb 22, 2019
  2. Feb 20, 2019
      RDMA/core: Add RDMA_NLDEV_CMD_NEWLINK/DELLINK support · 3856ec4b
      Steve Wise authored
      
      Add support for new LINK messages to allow adding and deleting rdma
      interfaces.  This will be used initially for soft rdma drivers which
      instantiate device instances dynamically by the admin specifying a netdev
      device to use.  The rdma_rxe module will be the first user of these
      messages.
      
      The design is modeled after RTNL_NEWLINK/DELLINK: rdma drivers register
      with the rdma core if they provide link add/delete functions.  Each driver
      registers with a unique "type" string that is used to dispatch messages
      coming from user space.  A new RDMA_NLDEV_ATTR is defined for the "type"
      string.  User mode will pass three attributes in a NEWLINK message:
      RDMA_NLDEV_ATTR_DEV_NAME for the desired rdma device name to be created,
      RDMA_NLDEV_ATTR_LINK_TYPE for the "type" of link being added, and
      RDMA_NLDEV_ATTR_NDEV_NAME for the net_device interface to use for this
      link.  The DELLINK message will contain the RDMA_NLDEV_ATTR_DEV_INDEX of
      the device to delete.
      
      Signed-off-by: Steve Wise <swise@opengridcomputing.com>
      Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
      Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
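The type-string dispatch described in this commit can be illustrated with a small user-space sketch. It is not the kernel implementation: `struct link_ops`, `link_register()`, `link_new()`, the `demo` driver and the error codes returned are all hypothetical names and choices made for illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the link operations a driver registers. */
struct link_ops {
    const char *type;   /* unique "type" string used for dispatch */
    int (*newlink)(const char *ibdev_name, const char *ndev_name);
    struct link_ops *next;
};

static struct link_ops *link_ops_list;

/* Register a driver's ops; a duplicate type string is rejected. */
int link_register(struct link_ops *ops)
{
    for (struct link_ops *p = link_ops_list; p; p = p->next)
        if (strcmp(p->type, ops->type) == 0)
            return -EEXIST;
    ops->next = link_ops_list;
    link_ops_list = ops;
    return 0;
}

/*
 * Handle a NEWLINK-style request carrying three attributes: the desired
 * device name, the link type, and the netdev name. The type string picks
 * the driver. The error code for an unknown type is an assumption.
 */
int link_new(const char *ibdev_name, const char *type, const char *ndev_name)
{
    assert(ibdev_name && type && ndev_name);
    for (struct link_ops *p = link_ops_list; p; p = p->next)
        if (strcmp(p->type, type) == 0)
            return p->newlink(ibdev_name, ndev_name);
    return -ENODEV;
}

/* Demo driver: only counts how many links it was asked to create. */
int demo_links_created;

static int demo_newlink(const char *ibdev_name, const char *ndev_name)
{
    (void)ibdev_name;
    (void)ndev_name;
    demo_links_created++;
    return 0;
}

struct link_ops demo_ops = { .type = "demo", .newlink = demo_newlink };
```

A caller would register `demo_ops` once and then pass requests such as `link_new("rxe0", "demo", "eth0")`; a request with an unregistered type string is refused without reaching any driver.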
      RDMA/rxe: Close a race after ib_register_device · ca22354b
      Jason Gunthorpe authored
      
      Since rxe allows unregistration from other threads, the rxe pointer can
      become invalid at any moment after ib_register_device returns. This could
      cause a user-triggered use-after-free.
      
      Add another driver callback to be called right after the device becomes
      registered to complete any device setup required post-registration.  This
      callback has enough core locking to prevent the device from becoming
      unregistered.
      
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      RDMA/device: Provide APIs from the core code to help unregistration · d0899892
      Jason Gunthorpe authored
      
      These APIs are intended to support drivers that exist outside the usual
      driver core probe()/remove() callbacks. Normally the driver core will
      prevent remove() from running concurrently with probe(); once this safety
      is lost, drivers need more support to get the locking and lifetimes right.
      
      ib_unregister_driver() is intended to be used during module_exit of a
      driver using these APIs. It unregisters all the associated ib_devices.
      
      ib_unregister_device_and_put() is to be used by a driver-specific removal
      function (i.e. removal by name, removal from a netdev notifier, removal from
      netlink).
      
      ib_unregister_queued() is to be used from netdev notifier chains where
      RTNL is held.
      
      The locking is tricky here since once things become async it is possible
      to race unregister with registration. This is largely solved by relying on
      the registration refcount: unregistration will only ever work on something
      that has a positive registration refcount, and an unregistration
      mutex then serializes all competing unregistrations of the same device.
      
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
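The refcount-plus-mutex idea in this commit can be sketched in user space with C11 atomics and a pthread mutex. This is a simplified model under stated assumptions, not the kernel code: `struct device` and every function name below are hypothetical, a count of zero stands for "not registered", and the busy-wait stands in for the sleeping wait a kernel would use.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical device model: refcount == 0 means "not registered". */
struct device {
    atomic_int refcount;
    pthread_mutex_t unreg_mutex;   /* serializes competing unregistrations */
    int teardown_count;            /* how many times teardown really ran */
};

void device_init(struct device *d)
{
    atomic_init(&d->refcount, 0);
    pthread_mutex_init(&d->unreg_mutex, NULL);
    d->teardown_count = 0;
}

/* Registration publishes the device by taking the registration reference. */
void device_register(struct device *d)
{
    atomic_store(&d->refcount, 1);
}

/* Take a reference only if the device is still registered. */
bool device_try_get(struct device *d)
{
    int old = atomic_load(&d->refcount);

    while (old != 0) {
        /* On failure 'old' is reloaded with the current value. */
        if (atomic_compare_exchange_weak(&d->refcount, &old, old + 1))
            return true;
    }
    return false;
}

void device_put(struct device *d)
{
    int old = atomic_fetch_sub(&d->refcount, 1);

    assert(old > 0);
}

/* Returns true only for the caller that actually performed the teardown. */
bool device_unregister(struct device *d)
{
    bool did_teardown = false;

    pthread_mutex_lock(&d->unreg_mutex);
    if (atomic_load(&d->refcount) != 0) {
        device_put(d);                 /* drop the registration reference */
        while (atomic_load(&d->refcount) != 0)
            sched_yield();             /* wait for remaining users */
        d->teardown_count++;
        did_teardown = true;
    }
    pthread_mutex_unlock(&d->unreg_mutex);
    return did_teardown;
}
```

Because the refcount is checked under the mutex, a second unregistration of the same device finds a zero count and does nothing, and an unregistration that races ahead of registration is likewise a no-op.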
      RDMA/device: Add ib_device_get_by_netdev() · 324e227e
      Jason Gunthorpe authored
      
      Several drivers need to find the ib_device from a given netdev. rxe needs
      this at speed in an unsleepable context, so choose to implement the
      translation using an RCU-safe hash table.
      
      The hash table can have a many-to-one mapping. This is intended to support
      some future case where multiple IB drivers (e.g. iWarp and RoCE) connect to
      the same netdevs. driver_ids will need to be different to support this.
      
      In the process this makes the struct ib_device and ib_port_data RCU safe
      by deferring their kfrees.
      
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
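The lookup described in this commit can be modeled as a chained hash table keyed by the netdev pointer, with the driver id used to tell apart several devices attached to the same netdev. The sketch below deliberately leaves out RCU and all locking, and every type and function name in it is a hypothetical stand-in rather than the kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NDEV_HASH_BUCKETS 32

/* Hypothetical stand-ins for struct net_device and struct ib_device. */
struct net_device_stub { int ifindex; };
struct ib_device_stub { int driver_id; };

/* One entry per port: several entries may point at the same netdev. */
struct port_entry {
    const struct net_device_stub *ndev;
    struct ib_device_stub *ibdev;
    struct port_entry *next;
};

static struct port_entry *ndev_hash[NDEV_HASH_BUCKETS];

/* Hash the netdev pointer itself; low bits are dropped as likely zero. */
static size_t hash_ptr(const void *p)
{
    return (size_t)(((uintptr_t)p >> 4) % NDEV_HASH_BUCKETS);
}

void ndev_hash_add(struct port_entry *e)
{
    size_t b;

    assert(e && e->ndev && e->ibdev);
    b = hash_ptr(e->ndev);
    e->next = ndev_hash[b];
    ndev_hash[b] = e;
}

/* Find the device bound to 'ndev' that belongs to the given driver. */
struct ib_device_stub *get_by_netdev(const struct net_device_stub *ndev,
                                     int driver_id)
{
    for (struct port_entry *e = ndev_hash[hash_ptr(ndev)]; e; e = e->next)
        if (e->ndev == ndev && e->ibdev->driver_id == driver_id)
            return e->ibdev;
    return NULL;
}
```

With two entries for the same netdev and different driver ids, each lookup returns only the device of the requested driver, which is why the ids have to differ.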
      RDMA/device: Add ib_device_set_netdev() as an alternative to get_netdev · c2261dd7
      Jason Gunthorpe authored
      
      The associated netdev should not actually be very dynamic, so for most
      drivers there is no reason for a callback like this. Provide an API to
      inform the core code about the netdev affiliation and use a core
      maintained data structure instead.
      
      This allows the core code to be more aware of the ndev relationship, which
      will allow some new APIs based around this.
      
      This also uses locking that makes some kind of sense; many drivers had a
      confusing RCU lock, or missing locking, which isn't right.
      
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
  3. Feb 19, 2019
  4. Feb 15, 2019
  5. Feb 08, 2019
      RDMA/devices: Re-organize device.c locking · 921eab11
      Jason Gunthorpe authored
      
      The locking here started out with a single lock that covered everything
      and then has lately veered into crazy town.
      
      The fundamental problem is that several places need to iterate over a
      linked list, but also need to drop their locks to avoid deadlock during
      client callbacks.
      
      xarray's restartable iteration offers a simple solution to the
      problem. Once all the lists are xarrays we can drop locks in the places
      that need that and rely on xarray to provide consistency and locking for
      the data structure.
      
      The resulting simplification is that each of the three lists has a
      dedicated rwsem that must be held when working with the list it
      covers. One data structure is no longer covered by multiple locks.
      
      The sleeping semaphore is selected because the read side generally needs
      to be held over something sleeping, and using RCU reader locking in those
      cases is overkill.
      
      In the process this simplifies the entire registration/unregistration flow
      to be the expected list of setups and the reversed list of matching
      teardowns, and the registration lock 'refcount' can now be revised to be
      released after the ULPs are removed, providing a very sane semantic for
      this feature.
      
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
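The restartable iteration this commit relies on can be sketched with an index-addressed table: the iterator remembers only an index, so it can drop the lock around a callback and then resume from the next index even if the table changed in between. This is a user-space model, not the xarray API; `struct table` and all functions are hypothetical, a plain mutex stands in for the rwsem, and a real implementation would also pin the entry before unlocking.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_IDS 16

/* Hypothetical index-addressed table standing in for an xarray. */
struct table {
    pthread_mutex_t lock;
    void *slot[MAX_IDS];
};

typedef void (*visit_fn)(struct table *t, size_t id, void *entry);

void table_init(struct table *t)
{
    pthread_mutex_init(&t->lock, NULL);
    for (size_t i = 0; i < MAX_IDS; i++)
        t->slot[i] = NULL;
}

void table_erase(struct table *t, size_t id)
{
    assert(id < MAX_IDS);
    pthread_mutex_lock(&t->lock);
    t->slot[id] = NULL;
    pthread_mutex_unlock(&t->lock);
}

/* First occupied index >= start, or MAX_IDS. Caller holds the lock. */
static size_t table_find_from(const struct table *t, size_t start)
{
    for (size_t i = start; i < MAX_IDS; i++)
        if (t->slot[i])
            return i;
    return MAX_IDS;
}

/* Visit every entry, releasing the lock around each callback. */
size_t table_for_each_unlocked(struct table *t, visit_fn fn)
{
    size_t visited = 0;
    size_t id = 0;

    pthread_mutex_lock(&t->lock);
    while ((id = table_find_from(t, id)) < MAX_IDS) {
        void *entry = t->slot[id];

        pthread_mutex_unlock(&t->lock);
        fn(t, id, entry);   /* may take the lock and modify the table */
        visited++;
        id++;               /* restart the search after this index */
        pthread_mutex_lock(&t->lock);
    }
    pthread_mutex_unlock(&t->lock);
    return visited;
}

/* Demo callback: erases the entry right after the one being visited. */
void erase_following(struct table *t, size_t id, void *entry)
{
    (void)entry;
    if (id + 1 < MAX_IDS)
        table_erase(t, id + 1);
}
```

With a linked list, a callback that removes the neighbouring node would leave the iterator holding a stale pointer; here the iterator simply finds the next occupied index after re-taking the lock.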
      RDMA/devices: Use xarray to store the client_data · 0df91bb6
      Jason Gunthorpe authored
      
      Now that we have a small ID for each client we can use xarray instead of
      linearly searching linked lists for client data. This will give much
      faster and more scalable client data lookup, and will let us revise the
      locking scheme.
      
      Since xarray can store 'going_down' using a mark, just entirely eliminate
      the struct ib_client_data and directly store the client_data value in the
      xarray. However, this does require a special iterator as we must still
      iterate over any NULL client_data values.
      
      Also eliminate the client_data_lock in favour of internal xarray locking.
      
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      RDMA/devices: Use xarray to store the clients · e59178d8
      Jason Gunthorpe authored
      
      This gives each client a unique ID and will let us move client_data to use
      xarray, and revise the locking scheme.
      
      Clients have to be added/removed in strict FIFO/LIFO order as they
      interdepend. To support this the client_ids are assigned to increase in
      FIFO order. The existing linked list is kept to support reverse iteration
      until xarray can get a reverse iteration API.
      
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      Reviewed-by: Parav Pandit <parav@mellanox.com>
      RDMA/device: Get rid of reg_state · 652432f3
      Jason Gunthorpe authored
      
      This really has no purpose anymore; refcount can be used to tell if the
      device is still registered. Keeping it around just invites misuse.
      
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      Reviewed-by: Parav Pandit <parav@mellanox.com>
      RDMA: Handle PD allocations by IB/core · 21a428a0
      Leon Romanovsky authored
      
      Handling PD allocations in IB/core allows us to simplify drivers and their
      error flows in their .alloc_pd() paths. The changes in .alloc_pd() go hand
      in hand with the relevant update in .dealloc_pd().
      
      We use this opportunity to convert .dealloc_pd() so that it cannot fail, as
      was suggested a long time ago; failures are not happening, as we have
      never seen a WARN_ON print.
      
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      RDMA/core: Share driver structure size with core · 30471d4b
      Leon Romanovsky authored
      
      Add new macros to be used by drivers when registering their ops structure
      and by IB/core when calling allocation routines, so drivers won't need to
      perform kzalloc/kfree in their paths.
      
      The change in the allocation stage allows us to initialize common fields
      prior to calling into drivers (e.g. restrack).
      
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
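A rough user-space sketch of the pattern in the two commits above: the driver publishes the size of its private structure in its ops table, and the core allocates the object, initializes the common fields and only then calls into the driver. The macro, structures and function names here are hypothetical illustrations, not the macros added by the commit, and `calloc`/`free` stand in for the kernel allocators.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Common part owned by the core (hypothetical stand-in for a core PD). */
struct core_pd {
    int usecnt;
};

struct dev_ops {
    size_t size_pd;                        /* size of the driver's PD struct */
    int (*alloc_pd)(struct core_pd *pd);   /* fills driver-private fields */
};

/*
 * Hypothetical size macro: evaluates to sizeof(struct drv_struct) and fails
 * to compile (negative array size) unless 'member' is the first member, so
 * the core pointer and the driver pointer are interchangeable.
 */
#define INIT_OBJ_SIZE(drv_struct, member)                                  \
    (sizeof(struct drv_struct) +                                           \
     0 * sizeof(char[1 - 2 * (offsetof(struct drv_struct, member) != 0)]))

/* Driver side: embeds the core object first, no allocation of its own. */
struct drv_pd {
    struct core_pd ibpd;
    unsigned int pdn;
};

static int drv_alloc_pd(struct core_pd *pd)
{
    struct drv_pd *dpd = (struct drv_pd *)pd;

    dpd->pdn = 42;   /* placeholder for a hardware-assigned PD number */
    return 0;
}

const struct dev_ops drv_ops = {
    .size_pd = INIT_OBJ_SIZE(drv_pd, ibpd),
    .alloc_pd = drv_alloc_pd,
};

/* Core side: allocate, set up common fields, then call the driver. */
struct core_pd *core_alloc_pd(const struct dev_ops *ops)
{
    struct core_pd *pd;

    assert(ops->size_pd >= sizeof(*pd));
    pd = calloc(1, ops->size_pd);
    if (!pd)
        return NULL;
    pd->usecnt = 1;              /* common field ready before the driver runs */
    if (ops->alloc_pd(pd)) {
        free(pd);                /* the error unwind lives in the core */
        return NULL;
    }
    return pd;
}
```

The driver callback no longer has an allocation to unwind, which is the simplification of the error flows that the PD commit describes.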
      IB/core: Don't register each MAD agent for LSM notifier · c66f6741
      Daniel Jurgens authored
      
      When creating many MAD agents in a short period of time, receive packet
      processing can be delayed long enough to cause timeouts while new agents
      are being added to the atomic notifier chain with IRQs disabled.  Notifier
      chain registration and unregistration are O(n) operations. With large
      numbers of MAD agents being created and destroyed simultaneously the CPUs
      spend too much time with interrupts disabled.
      
      Instead of each MAD agent registering for its own LSM notification,
      maintain a list of agents internally and register once; this registration
      already existed for handling the PKeys. This list is write-mostly, so a
      normal spin lock is used instead of a read/write lock. All MAD agents must
      be checked, so a single list is used instead of breaking them down per
      device.
      
      Notifier calls are done under rcu_read_lock, so there isn't a risk of
      similar packet timeouts while checking the MAD agents' security settings
      when notified.
      
      Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
      Reviewed-by: Parav Pandit <parav@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Acked-by: Paul Moore <paul@paul-moore.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      IB/core: Eliminate a hole in MAD agent struct · 805b754d
      Daniel Jurgens authored
      
      Move the security related fields above the u8s to eliminate a hole in the
      struct.
      
      pahole before:
      struct ib_mad_agent {
              ...
              u32                        hi_tid;               /*    48     4 */
              u32                        flags;                /*    52     4 */
              u8                         port_num;             /*    56     1 */
              u8                         rmpp_version;         /*    57     1 */
      
              /* XXX 6 bytes hole, try to pack */
      
              /* --- cacheline 1 boundary (64 bytes) --- */
              void *                     security;             /*    64     8 */
              bool                       smp_allowed;          /*    72     1 */
              bool                       lsm_nb_reg;           /*    73     1 */
      
              /* XXX 6 bytes hole, try to pack */
      
              struct notifier_block      lsm_nb;               /*    80    24 */
      
              /* XXX last struct has 4 bytes of padding */
      
              /* size: 104, cachelines: 2, members: 14 */
              ...
      };
      
      pahole after:
      struct ib_mad_agent {
              ...
              u32                        hi_tid;               /*    48     4 */
              u32                        flags;                /*    52     4 */
              void *                     security;             /*    56     8 */
              /* --- cacheline 1 boundary (64 bytes) --- */
              struct notifier_block      lsm_nb;               /*    64    24 */
      
              /* XXX last struct has 4 bytes of padding */
      
              u8                         port_num;             /*    88     1 */
              u8                         rmpp_version;         /*    89     1 */
              bool                       smp_allowed;          /*    90     1 */
              bool                       lsm_nb_reg;           /*    91     1 */
      
              /* size: 96, cachelines: 2, members: 14 */
              ...
      };
      
      Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
      Reviewed-by: Parav Pandit <parav@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
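The effect shown by pahole can be reproduced with a reduced, hypothetical version of the structure. Only the tail members from the listing are modeled, `struct nb_stub` is a stand-in for struct notifier_block, and the exact byte counts depend on the ABI, so the sketch measures them with `sizeof` and `offsetof` instead of hard-coding them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct notifier_block: two pointers and an int. */
struct nb_stub {
    void *call;
    void *next;
    int priority;
};

/* Member order before the change: small members precede pointers. */
struct agent_before {
    uint32_t hi_tid;
    uint32_t flags;
    uint8_t port_num;
    uint8_t rmpp_version;
    void *security;          /* pointer alignment forces a hole before this */
    bool smp_allowed;
    bool lsm_nb_reg;
    struct nb_stub lsm_nb;   /* and another hole before this */
};

/* Member order after the change: the one-byte members are grouped last. */
struct agent_after {
    uint32_t hi_tid;
    uint32_t flags;
    void *security;
    struct nb_stub lsm_nb;
    uint8_t port_num;
    uint8_t rmpp_version;
    bool smp_allowed;
    bool lsm_nb_reg;
};

static_assert(sizeof(struct agent_after) <= sizeof(struct agent_before),
              "grouping the small members must not grow the struct");

/* Bytes saved by the reordering on the current ABI. */
size_t bytes_saved(void)
{
    return sizeof(struct agent_before) - sizeof(struct agent_after);
}

/* Size of the hole in front of 'security' in the old layout. */
size_t hole_before_security(void)
{
    return offsetof(struct agent_before, security) -
           (offsetof(struct agent_before, rmpp_version) + sizeof(uint8_t));
}
```

On a typical 64-bit ABI `hole_before_security()` should match the 6-byte hole pahole reports, and `bytes_saved()` reflects the two holes collapsing into a single run of tail padding.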
      RDMA/iwcm: add tos_set bool to iw_cm struct · 926ba19b
      Steve Wise authored
      
      This allows drivers to know the tos was actively set by the application.
      
      Signed-off-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      IB/cma: Define option to set ack timeout and pack tos_set · 2c1619ed
      Danit Goldberg authored
      
      Define a new option in 'rdma_set_option' to override the calculated QP
      timeout when requested to provide QP attributes to modify a QP.
      
      At the same time, pack tos_set into a bitfield.
      
      Signed-off-by: Danit Goldberg <danitg@mellanox.com>
      Reviewed-by: Moni Shoua <monis@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Reviewed-by: Parav Pandit <parav@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
  6. Feb 05, 2019
  7. Feb 04, 2019
  8. Jan 31, 2019