https://github.com/torvalds/linux

47fb3c1 i40iw: Prevent multiple netdev event notifier registrations Netdev event notifier registration/de-registration is not synchronized with a lock and there is a possibility of a duplicate registration of notifier before the unregister completes. Register netdev event notifiers during module init and de-register them at module exit. This avoids the need to tie the registration to first netdev client interface open and de-registration to last client interface close and the synchronization to achieve it. This also fixes a crash due to duplicate registration. BUG: unable to handle kernel paging request at ffffffffa0d60388 IP: [<ffffffff8160f75d>] notifier_call_chain+0x3d/0x70 PGD 190d067 PUD 190e063 PMD 76c840067 PTE 0 Oops: 0000 [#1] SMP Modules linked in: i40e(OF-) fuse btrfs zlib_deflate raid6_pq xor vfat msdos [..] e1000e vxlan ip_tunnel ptp pps_core i2c_core video [last unloaded: i40iw] CPU: 1 PID: 27101 Comm: modprobe Tainted: GF W O-------------- 3.10.0-229.el7.x86_64 #1 Hardware name: Gigabyte Technology Co., Ltd. 
To be filled by O.E.M./Q87M-D2H, BIOS F7 01/17/2014 task: ffff88076e8a96c0 ti: ffff8806959c8000 task.ti: ffff8806959c8000 RIP: 0010:[<ffffffff8160f75d>] [<ffffffff8160f75d>] notifier_call_chain+0x3d/0x70 RSP: 0018:ffff8806959cbb38 EFLAGS: 00010282 RAX: ffffffffa0d60380 RBX: 00000000fffffffd RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffff88081227a000 RDI: 0000000000000002 RBP: ffff8806959cbb60 R08: 0000000000000246 R09: 000000000000700c R10: ffff88080e16ea40 R11: 00000000000ae8df R12: ffffffffa0d60380 R13: 0000000000000002 R14: ffff88076e738800 R15: 0000000000000000 FS: 00007f604ef4a740(0000) GS:ffff88083e240000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffffa0d60388 CR3: 0000000753cd2000 CR4: 00000000001407e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Stack: ffffffff819e73a0 0000000000000000 0000000000000002 ffff88076e738800 00000000ffffffff ffff8806959cbba0 ffffffff8109d61d 0000000000000000 0000000000000000 ffff88076e738800 0000000000000000 ffff88076e738800 Call Trace: [<ffffffff8109d61d>] __blocking_notifier_call_chain+0x4d/0x70 [<ffffffff8109d656>] blocking_notifier_call_chain+0x16/0x20 [<ffffffff8156b9e4>] __inet_del_ifa+0x154/0x2b0 [<ffffffff8156d102>] inetdev_event+0x182/0x530 [<ffffffff8160f76c>] notifier_call_chain+0x4c/0x70 [<ffffffff8109d446>] raw_notifier_call_chain+0x16/0x20 [<ffffffff814f71fd>] call_netdevice_notifiers+0x2d/0x60 [<ffffffff814f8845>] rollback_registered_many+0x105/0x220 [<ffffffff814f89a0>] rollback_registered+0x40/0x70 [<ffffffff814f9c88>] unregister_netdevice_queue+0x48/0x80 [<ffffffff814f9cdc>] unregister_netdev+0x1c/0x30 [<ffffffffa0067139>] i40e_vsi_release+0x2a9/0x2b0 [i40e] [<ffffffffa00674e8>] i40e_remove+0x128/0x2b0 [i40e] [<ffffffff813092db>] pci_device_remove+0x3b/0xb0 [<ffffffff813d26ef>] __device_release_driver+0x7f/0xf0 [<ffffffff813d3068>] driver_detach+0xb8/0xc0
[<ffffffff813d22db>] bus_remove_driver+0x9b/0x120 [<ffffffff813d36dc>] driver_unregister+0x2c/0x50 [<ffffffff81307d4c>] pci_unregister_driver+0x2c/0x90 [<ffffffffa008f9d0>] i40e_exit_module+0x10/0x23 [i40e] [<ffffffff810dad0b>] SyS_delete_module+0x16b/0x2d0 [<ffffffff81013b0c>] ? do_notify_resume+0x9c/0xb0 [<ffffffff81613da9>] system_call_fastpath+0x16/0x1b Code: e5 41 57 4d 89 c7 41 56 49 89 d6 41 55 49 89 f5 41 54 53 89 cb 75 14 eb 3d 0f 1f 44 00 00 83 eb 01 74 25 4d 85 e4 74 20 4c 89 e0 <4c> 8b 60 08 4c 89 f2 4c 89 ee 48 89 c7 ff 10 4d 85 ff 74 04 41 RIP [<ffffffff8160f75d>] notifier_call_chain+0x3d/0x70 Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com> 22 September 2017, 17:43:36 UTC
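The fix above ties a global registration to module lifetime instead of to a first-open/last-close client count. A minimal userspace model of that pattern follows; `struct notifier`, `chain_register()`, `drv_module_init()` and every other name are hypothetical stand-ins, not the i40iw code or the kernel notifier API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a notifier chain. */
struct notifier {
	int (*call)(unsigned long event);
	struct notifier *next;
};

static struct notifier *chain_head;

static void chain_register(struct notifier *nb)
{
	nb->next = chain_head;	/* push onto the head of the chain */
	chain_head = nb;
}

static void chain_unregister(struct notifier *nb)
{
	struct notifier **pp;

	for (pp = &chain_head; *pp; pp = &(*pp)->next) {
		if (*pp == nb) {
			*pp = nb->next;	/* unlink the entry */
			return;
		}
	}
}

static int chain_length(void)
{
	int n = 0;

	for (struct notifier *p = chain_head; p; p = p->next)
		n++;
	return n;
}

/* Hypothetical driver state. */
static int drv_event(unsigned long event) { (void)event; return 0; }
static struct notifier drv_nb = { .call = drv_event };
static int open_clients;

/* Register exactly once when the module loads... */
static void drv_module_init(void) { chain_register(&drv_nb); }

/* ...and unregister exactly once when it unloads. */
static void drv_module_exit(void) { chain_unregister(&drv_nb); }

/* Client open/close no longer touch the chain, so there is no
 * register/unregister race left to synchronize. */
static void drv_client_open(void) { open_clients++; }

static void drv_client_close(void)
{
	assert(open_clients > 0);
	open_clients--;
}
```

However many times clients open and close, the chain holds the driver's entry exactly once between init and exit.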
cd9100c i40iw: Fail open if there are no available MSI-X vectors Check the number of available MSI-X vectors for i40iw. If there are no available vectors, fail the open. Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com> 22 September 2017, 17:43:36 UTC
01df7f5 RDMA/vmw_pvrdma: Fix reporting correct opcodes for completion Since the IB_WC_BIND_MW opcode has been dropped, set the correct IB WC opcode explicitly. Fixes: 29c8d9eba550 ("IB: Add vmw_pvrdma driver") Reviewed-by: Aditya Sarwade <asarwade@vmware.com> Reviewed-by: Jorgen Hansen <jhansen@vmware.com> Signed-off-by: Adit Ranadive <aditr@vmware.com> Signed-off-by: Bryan Tan <bryantan@vmware.com> Signed-off-by: Doug Ledford <dledford@redhat.com> 22 September 2017, 17:32:22 UTC
e13547b IB/bnxt_re: Fix frame stack compilation warning Reduce stack size by dynamically allocating memory instead of declaring large struct on the stack: drivers/infiniband/hw/bnxt_re/ib_verbs.c: In function ‘bnxt_re_query_qp’: drivers/infiniband/hw/bnxt_re/ib_verbs.c:1600:1: warning: the frame size of 1216 bytes is larger than 1024 bytes [-Wframe-larger-than=] } ^ Cc: Selvin Xavier <selvin.xavier@broadcom.com> Fixes: 1ac5a4047975 ("RDMA/bnxt_re: Add bnxt_re RoCE driver") Signed-off-by: Leon Romanovsky <leon@kernel.org> Acked-by: Selvin Xavier <selvin.xavier@broadcom.com> Reviewed-by: Jonathan Toppins <jtoppins@redhat.com> Signed-off-by: Doug Ledford <dledford@redhat.com> 22 September 2017, 17:19:13 UTC
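The stack-size fix replaces a large on-stack struct with a heap allocation that is freed on every exit path. The sketch below shows the shape of that change in userspace C; `struct big_qp_info`, `fw_query_qp()` and `query_qp_state()` are hypothetical, and `calloc()`/`free()` stand in for the kernel's `kzalloc(..., GFP_KERNEL)`/`kfree()`.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a large query result; the real
 * bnxt_re structure and its size are not reproduced here. */
struct big_qp_info {
	unsigned char raw[1024];
	int state;
};

/* Hypothetical firmware call that fills *info. */
static int fw_query_qp(struct big_qp_info *info)
{
	memset(info, 0, sizeof(*info));
	info->state = 3;
	return 0;
}

static int query_qp_state(int *state_out)
{
	struct big_qp_info *info;	/* only a pointer lives on the stack */
	int rc;

	assert(state_out != NULL);
	info = calloc(1, sizeof(*info));
	if (!info)
		return -ENOMEM;

	rc = fw_query_qp(info);
	if (!rc)
		*state_out = info->state;

	free(info);			/* released on success and failure alike */
	return rc;
}
```

The function's frame now holds a pointer instead of roughly 1 KiB of data, at the cost of one allocation and a new `-ENOMEM` failure mode.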
cbafad8 IB/mlx5: fix debugfs cleanup If delay_drop_debugfs_init() fails in any of the operations to create debugfs, it is calling delay_drop_debugfs_cleanup() as part of its cleanup. But delay_drop_debugfs_cleanup() checks for 'dbg' and since we have not yet pointed 'dbg' to the debugfs we need to cleanup, the cleanup fails and we are left with stray debugfs elements and also a memory leak. Fixes: 4a5fd5d2965c ("IB/mlx5: Add necessary delay drop assignment") Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com> Acked-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com> 22 September 2017, 17:17:32 UTC
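The bug is an ordering problem: the cleanup routine finds its state through a pointer that the failing init had not yet published. A minimal model of the corrected ordering, assuming counters stand in for debugfs entries; `struct dev_model`, `dbg_init()` and `dbg_cleanup()` are hypothetical names, not the mlx5 functions.

```c
#include <assert.h>
#include <stdlib.h>

static int live_entries;	/* "debugfs" entries still allocated */

struct dbg_entry { int dummy; };
struct delay_drop_dbg { struct dbg_entry *dir, *file; };
struct dev_model { struct delay_drop_dbg *dbg; };

static struct dbg_entry *create_entry(int fail)
{
	struct dbg_entry *e;

	if (fail)
		return NULL;
	e = calloc(1, sizeof(*e));
	if (e)
		live_entries++;
	return e;
}

static void remove_entry(struct dbg_entry *e)
{
	if (!e)
		return;
	assert(live_entries > 0);
	free(e);
	live_entries--;
}

static void dbg_cleanup(struct dev_model *dev)
{
	if (!dev->dbg)
		return;		/* the check that silently skipped cleanup before */
	remove_entry(dev->dbg->file);
	remove_entry(dev->dbg->dir);
	free(dev->dbg);
	dev->dbg = NULL;
}

static int dbg_init(struct dev_model *dev, int fail_file)
{
	struct delay_drop_dbg *dbg = calloc(1, sizeof(*dbg));

	if (!dbg)
		return -1;
	dev->dbg = dbg;		/* publish before anything can fail */

	dbg->dir = create_entry(0);
	if (!dbg->dir)
		goto out_err;
	dbg->file = create_entry(fail_file);
	if (!dbg->file)
		goto out_err;
	return 0;

out_err:
	dbg_cleanup(dev);	/* can now find and free partial state */
	return -1;
}
```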
06564f6 IB/ocrdma: fix incorrect fall-through on switch statement In the case where mbox_status is OCRDMA_MBX_STATUS_FAILED and add_status is OCRDMA_MBX_STATUS_FAILED err_num is assigned -EAGAIN however the case OCRDMA_MBX_STATUS_FAILED is missing a break and falls through to the default case which then re-assigns err_num to -EFAULT. Fix this so that err_num is assigned to -EAGAIN for the add_status OCRDMA_MBX_STATUS_FAILED case and -EFAULT otherwise. Detected by CoverityScan CID#703125 ("Missing break in switch") Fixes: fe2caefcdf58 ("RDMA/ocrdma: Add driver for Emulex OneConnect IBoE RDMA adapter") Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com> 22 September 2017, 17:16:00 UTC
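The fall-through described above can be reproduced with a nested switch. The sketch assumes hypothetical status values and a simplified `mbx_errno()`; it is not the ocrdma source, only the control-flow shape the commit message describes, with the missing `break` restored.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical status codes; the real OCRDMA_* values differ. */
enum { MBX_STATUS_SUCCESS = 0, MBX_STATUS_FAILED = 1, MBX_STATUS_OTHER = 2 };
enum { ADD_STATUS_NONE = 0, ADD_STATUS_FAILED = 1 };

static int mbx_errno(int mbox_status, int add_status)
{
	int err_num;

	switch (mbox_status) {
	case MBX_STATUS_FAILED:
		switch (add_status) {
		case ADD_STATUS_FAILED:
			err_num = -EAGAIN;
			break;
		default:
			err_num = -EFAULT;
			break;
		}
		break;	/* without this, control falls into the outer
			 * default and -EAGAIN is overwritten by -EFAULT */
	default:
		err_num = -EFAULT;
		break;
	}
	return err_num;
}
```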
af3c79b IB/ipoib: Suppress the retry related completion errors IPoIB doesn't support transport/rnr retry schemes as per RFC so those errors are expected. No need to flood the log files with them. Tested-by: Michael Nowak <michael.nowak@oracle.com> Tested-by: Rafael Alejandro Peralez <rafael.peralez@oracle.com> Tested-by: Liwen Huang <liwen.huang@oracle.com> Tested-by: Hong Liu <hong.x.liu@oracle.com> Reviewed-by: Mukesh Kacker <mukesh.kacker@oracle.com> Reported-by: Rajiv Raja <rajiv.raja@oracle.com> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com> 22 September 2017, 17:12:36 UTC
8b1bbf3 iw_cxgb4: remove the stid on listen create failure If a listen create fails, then the server tid (stid) is incorrectly left in the stid idr table, which can cause a touch-after-free if the stid is looked up and the already freed endpoint is touched. So make sure to remove it in the error path. Cc: stable@vger.kernel.org Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com> 22 September 2017, 16:59:42 UTC
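The rule behind this fix: an error path must undo a table insertion before freeing the object the table points to. A minimal model, assuming a fixed array in place of the stid idr; `stid_table`, `hw_create_listen()` and `create_listen()` are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_STID 8

struct listen_ep { int stid; };

static struct listen_ep *stid_table[MAX_STID];	/* stand-in for the idr */

static struct listen_ep *stid_lookup(int stid)
{
	return (stid >= 0 && stid < MAX_STID) ? stid_table[stid] : NULL;
}

/* Hypothetical hardware request; fails when hw_ok is 0. */
static int hw_create_listen(int hw_ok) { return hw_ok ? 0 : -1; }

static int create_listen(int stid, int hw_ok)
{
	struct listen_ep *ep;
	int err;

	assert(stid >= 0 && stid < MAX_STID);
	ep = calloc(1, sizeof(*ep));
	if (!ep)
		return -1;
	ep->stid = stid;
	stid_table[stid] = ep;		/* visible to lookups from here on */

	err = hw_create_listen(hw_ok);
	if (err) {
		stid_table[stid] = NULL;	/* remove the stale entry first... */
		free(ep);			/* ...then free the endpoint */
		return err;
	}
	return 0;
}
```

Reversing the two lines in the error branch (or omitting the removal) would leave `stid_lookup()` returning a dangling pointer.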
3c8415c iw_cxgb4: drop listen destroy replies if no ep found If the thread waiting for a CLOSE_LISTSRV_RPL times out and bails, then we need to handle a subsequent CPL if it arrives and the stid has been released. In this case silently drop it. Cc: stable@vger.kernel.org Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com> 22 September 2017, 16:59:42 UTC
3d31860 iw_cxgb4: put ep reference in pass_accept_req() The listening endpoint should always be dereferenced at the end of pass_accept_req(). Fixes: f86fac79afec ("RDMA/iw_cxgb4: atomic find and reference for listening endpoints") Cc: stable@vger.kernel.org Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com> 22 September 2017, 16:59:42 UTC
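"Always dereferenced at the end" is the usual single-exit pattern: a reference taken by the lookup is dropped at one label that every path reaches. A minimal model with hypothetical names (`ep_get()`, `ep_put()`, `pass_accept_req_model()`), not the iw_cxgb4 code.

```c
#include <assert.h>
#include <stddef.h>

struct ep { int refcnt; };

static void ep_get(struct ep *ep) { ep->refcnt++; }

static void ep_put(struct ep *ep)
{
	assert(ep->refcnt > 0);
	ep->refcnt--;
}

/* Hypothetical "find and reference" of the listening endpoint. */
static struct ep *lookup_listen_ep(struct ep *entry)
{
	if (entry)
		ep_get(entry);
	return entry;
}

static int pass_accept_req_model(struct ep *listen, int fail_midway)
{
	struct ep *parent = lookup_listen_ep(listen);
	int ret = 0;

	if (!parent)
		return -1;		/* nothing was referenced */
	if (fail_midway) {
		ret = -1;
		goto out;
	}
	/* ... set up the child connection here ... */
out:
	ep_put(parent);			/* dropped on every path after lookup */
	return ret;
}
```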
e6f9bc3 IB/core: Fix for core panic Build with the latest patches resulted in panic: [11384.486289] BUG: unable to handle kernel NULL pointer dereference at (null) [11384.486293] IP: (null) [11384.486295] PGD 0 [11384.486295] P4D 0 [11384.486296] [11384.486299] Oops: 0010 [#1] SMP ......... snip ...... [11384.486401] CPU: 0 PID: 968 Comm: kworker/0:1H Tainted: G W O 4.13.0-a-stream-20170825 #1 [11384.486402] Hardware name: Intel Corporation S2600WT2R/S2600WT2R, BIOS SE5C610.86B.01.01.0014.121820151719 12/18/2015 [11384.486418] Workqueue: ib-comp-wq ib_cq_poll_work [ib_core] [11384.486419] task: ffff880850579680 task.stack: ffffc90007fec000 [11384.486420] RIP: 0010: (null) [11384.486420] RSP: 0018:ffffc90007fef970 EFLAGS: 00010206 [11384.486421] RAX: ffff88084cfe8000 RBX: ffff88084dce4000 RCX: ffffc90007fef978 [11384.486422] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff88084cfe8000 [11384.486422] RBP: ffffc90007fefab0 R08: 0000000000000000 R09: ffff88084dce4080 [11384.486423] R10: ffffffffa02d7f60 R11: 0000000000000000 R12: ffff88105af65a00 [11384.486423] R13: ffff88084dce4000 R14: 000000000000c000 R15: 000000000000c000 [11384.486424] FS: 0000000000000000(0000) GS:ffff88085f400000(0000) knlGS:0000000000000000 [11384.486425] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [11384.486425] CR2: 0000000000000000 CR3: 0000000001c09000 CR4: 00000000001406f0 [11384.486426] Call Trace: [11384.486431] ?
is_valid_mcast_lid.isra.21+0xfb/0x110 [ib_core] [11384.486436] ib_attach_mcast+0x6f/0xa0 [ib_core] [11384.486441] ipoib_mcast_attach+0x81/0x190 [ib_ipoib] [11384.486443] ipoib_mcast_join_complete+0x354/0xb40 [ib_ipoib] [11384.486448] mcast_work_handler+0x330/0x6c0 [ib_core] [11384.486452] join_handler+0x101/0x220 [ib_core] [11384.486455] ib_sa_mcmember_rec_callback+0x54/0x80 [ib_core] [11384.486459] recv_handler+0x3a/0x60 [ib_core] [11384.486462] ib_mad_recv_done+0x423/0x9b0 [ib_core] [11384.486466] __ib_process_cq+0x5d/0xb0 [ib_core] [11384.486469] ib_cq_poll_work+0x20/0x60 [ib_core] [11384.486472] process_one_work+0x149/0x360 [11384.486474] worker_thread+0x4d/0x3c0 [11384.486487] kthread+0x109/0x140 [11384.486488] ? rescuer_thread+0x380/0x380 [11384.486489] ? kthread_park+0x60/0x60 [11384.486490] ? kthread_park+0x60/0x60 [11384.486493] ret_from_fork+0x25/0x30 [11384.486493] Code: Bad RIP value. [11384.486493] Code: Bad RIP value. [11384.486496] RIP: (null) RSP: ffffc90007fef970 [11384.486497] CR2: 0000000000000000 [11384.486531] ---[ end trace b1acec6fb4ff6e75 ]--- [11384.532133] Kernel panic - not syncing: Fatal exception [11384.536541] Kernel Offset: disabled [11384.969491] ---[ end Kernel panic - not syncing: Fatal exception [11384.976875] sched: Unexpected reschedule of offline CPU#1! [11384.983646] ------------[ cut here ]------------ Rdma device driver may not have implemented (*get_link_layer)() so it can not be called directly. Should use appropriate helper function. Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Fixes: 523633359224 ("IB/core: Fix the validations of a multicast LID in attach or detach operations") Cc: stable@kernel.org # 4.13 Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Alex Estrin <alex.estrin@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com> 22 September 2017, 15:52:09 UTC
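The panic comes from calling an optional driver callback that was left NULL. The safe pattern is a helper that checks the pointer and falls back to a default. The sketch below is hypothetical (`struct rdma_dev`, `port_link_layer()`, and the InfiniBand default are assumptions); in the kernel, the corresponding helper is presumably `rdma_port_get_link_layer()`.

```c
#include <assert.h>
#include <stddef.h>

enum link_layer { LL_UNSPECIFIED, LL_INFINIBAND, LL_ETHERNET };

struct rdma_dev {
	/* Optional: drivers may leave this NULL. */
	enum link_layer (*get_link_layer)(struct rdma_dev *dev, int port);
};

/* Never calls through a NULL pointer; falls back to an assumed
 * default when the driver does not implement the callback. */
static enum link_layer port_link_layer(struct rdma_dev *dev, int port)
{
	assert(dev != NULL);
	if (dev->get_link_layer)
		return dev->get_link_layer(dev, port);
	return LL_INFINIBAND;
}

/* Example driver implementation. */
static enum link_layer eth_ll(struct rdma_dev *dev, int port)
{
	(void)dev;
	(void)port;
	return LL_ETHERNET;
}
```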
8eb19e8 IB/core: Expose ioctl interface through experimental Kconfig Add CONFIG_INFINIBAND_EXP_USER_ACCESS that enables the ioctl interface. This interface is experimental and is subject to change. Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com> 31 August 2017, 12:35:14 UTC
5242711 IB/core: Assign root to all drivers In order to use the parsing tree, we need to assign the root to all drivers. Currently, we just assign the default parsing tree via ib_uverbs_add_one. The driver could override this by assigning a parsing tree prior to registering the device. Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com> 31 August 2017, 12:35:14 UTC
9ee79fc IB/core: Add completion queue (cq) object actions Adding CQ ioctl actions: 1. create_cq 2. destroy_cq This requires adding the following: 1. A specification describing the method a. Handler b. Attributes specification Each attribute is one of the following: a. PTR_IN - input data Note: This could be encoded inline for data < 64bit b. PTR_OUT - response data c. IDR - idr based object d. FD - fd based object Blob attributes (clauses a and b) contain their type, while object specifications (clauses c and d) contain the expected object type (for example, the given id should be UVERBS_TYPE_PD) and the required access (READ, WRITE, NEW or DESTROY). If a NEW is required, the new object's id will be assigned to this attribute. All attributes can get a UA_FLAGS attribute. Currently we support stating that an attribute is mandatory or that the specification size corresponds to a lower bound (and that this attribute could be extended). We currently add both default attributes and the two generic UHW_IN and UHW_OUT driver specific attributes. 2. Handler A handler gets a uverbs_attr_bundle. The handler developer uses uverbs_attr_get to fetch an attribute of a given id. Each of these attribute groups corresponds to the specification group defined in the action (clauses 1.b and 1.c respectively). The indices of these arrays correspond to the attribute ids declared in the specifications (clause 2). The handler is quite simple. It assumes the infrastructure fetched all objects and locked, created or destroyed them as required by the specification. Pointer (or blob) attributes were validated to match their required sizes. After the handler finishes, the infrastructure commits or rolls back the objects. Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com> 31 August 2017, 12:35:13 UTC
d70724f IB/core: Add legacy driver's user-data In this phase, we don't want to change all the drivers to use flexible driver-specific attributes. Therefore, we add two default attributes: UHW_IN and UHW_OUT. These attributes are optional in some methods and they encode the driver specific command data. We add a function that extracts this data and creates the legacy udata over it. Driver's data should start from UVERBS_UDATA_DRIVER_DATA_FLAG. This turns on the first bit of the namespace, indicating this attribute belongs to the driver's namespace. Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com> 31 August 2017, 12:35:13 UTC
64b19e1 IB/core: Export ioctl enum types to user-space Add a new ib_user_ioctl_verbs.h which exports all required ABI enums and structs to the user-space. Export the default types to user-space through this file. Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com> 31 August 2017, 12:35:12 UTC
4da70da IB/core: Explicitly destroy an object while keeping uobject When some objects are destroyed, we need to extract their status at destruction. After the object's destruction, this status (e.g. events_reported) resides in the uobject. In order to have the latest and correct status, the underlying object should be destroyed, but we should keep the uobject alive and read this information off the uobject. We introduce an rdma_explicit_destroy function. This function destroys the class type object (for example, the IDR class type which destroys the underlying object as well) and then converts the uobject to be of a null class type. This uobject will then be destroyed as any other uobject once uverbs_finalize_object[s] is called. Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com> 31 August 2017, 12:35:11 UTC
3541030 IB/core: Add macros for declaring methods and attributes This patch adds macros for declaring objects, methods and attributes. These definitions are later used by downstream patches to declare some of the default types. Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com> 31 August 2017, 12:35:11 UTC
118620d IB/core: Add uverbs merge trees functionality Different drivers support different features and even a subset of the common uverbs implementation. Currently, this is handled as a bitmask in every driver that represents which kind of methods it supports, but doesn't go down to attribute granularity. Moreover, drivers might want to add their specific types, methods and attributes to let their user-space counterparts be exposed to some more efficient abstractions. It means that the existence of different features is validated syntactically via the parsing infrastructure rather than by complex in-handler logic. In order to do that, we allow defining features and abstractions as parsing trees. These per-feature parsing trees can be merged into an efficient (perfect-hash based) parsing tree, which is later used by the parsing infrastructure. To sum it up, this makes a parse tree unique for a device and represents only the features this particular device supports. This is done by having a root specification tree per feature. Before a device registers itself as an IB device, it merges all these trees into one parsing tree. This parsing tree is used to parse all user-space commands. A future user-space application could read this parse tree. This tree represents which objects, methods and attributes are supported by this device. This is based on the idea of Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com> 31 August 2017, 12:35:10 UTC
09e3ebf IB/core: Add DEVICE object and root tree structure This adds the DEVICE object. This object supports creating the context that all objects are created from. Moreover, it supports executing methods which are related to the device itself, such as QUERY_DEVICE. This is a singleton object (per file instance). All standard objects are put in the root structure. This root will later on be used in drivers as the source for their whole parsing tree. Later on, when new features are added, these drivers could mix this root with other customized objects. Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com> 31 August 2017, 12:35:10 UTC
5009010 IB/core: Declare an object instead of declaring only type attributes Switch all uverbs_type_attrs_xxxx with DECLARE_UVERBS_OBJECT macros. This will be later used in order to embed the object specific methods in the objects as well. Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com> 31 August 2017, 12:35:09 UTC
fac9658 IB/core: Add new ioctl interface In this ioctl interface, processing the command starts from properties of the command and fetching the appropriate user objects before calling the handler. Parsing and validation are done according to a specifier declared by the driver's code. In the driver, all supported objects are declared. These objects are separated into different object namespaces. Dividing objects into namespaces is done at initialization by using the higher bits of the object ids. This initialization can mix objects declared in different places into one parsing tree used in this ioctl interface. For each object we list all supported methods. Similarly to objects, methods are separated into method namespaces too. Namespacing is done similarly to the objects case. This could be used in order to add methods to an existing object. Each method has a specific handler, which could be either a default handler or a driver specific handler. Along with the handler, a bunch of attributes are specified as well. Similarly to objects and methods, attributes are namespaced and hashed by their ids at initialization too. All supported attributes are subject to automatic fetching and validation. These attributes include the command, response and the method's related objects' ids. When these entities (objects, methods and attributes) are used, the high bits of the entities' ids are used in order to calculate the hash bucket index. Then, these high bits are masked out in order to have a zero based index. Since we use these high bits for both bucketing and namespacing, we get a compact representation and O(1) array access. This is mandatory for efficient dispatching. Each attribute has a type (PTR_IN, PTR_OUT, IDR and FD) and a length.
Attributes can be validated through properties such as: (*) Minimum size / Exact size (*) Fops for FD (*) Object type for IDR If an IDR/fd attribute is specified, the kernel also states the object type and the required access (NEW, WRITE, READ or DESTROY). All uobject/fd management is done automatically by the infrastructure, meaning that the infrastructure will fail concurrent commands when at least one of them requires exclusive access (WRITE/DESTROY), synchronize actions with device removals (dissociate context events) and take care of reference counting (increase/decrease) for concurrent action invocations. The reference counts on the actual kernel objects shall be handled by the handlers. [Diagram: object -[d]-> method -[d]-> method spec (handler) -> attr_buckets -> default chain / driver chain -> attributes (attr1, attr2, ...), each with len, type, idr_type and access. [d] = Hash ids to groups using the high order bits.] The right types table is also chosen by using the high bits from the ids. Currently we have either default or driver specific groups. Once validation and object fetching (or creation) have completed, we call the handler: int (*handler)(struct ib_device *ib_dev, struct ib_uverbs_file *ufile, struct uverbs_attr_bundle *ctx); ctx bundles attributes of different namespaces. Each element there is an array of attributes which corresponds to one namespace of attributes.
For example, in the common case: [Diagram: ctx holds a core entry and a driver entry; each points to an array of attribute slots, and each slot is marked valid and holds either a cmd_attr or an obj_attr.] Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com> 31 August 2017, 12:35:09 UTC
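The namespacing scheme described in fac9658 (high bits select a bucket, the masked low bits give a zero-based array index) can be sketched as below. The 16-bit id width and the 12-bit shift are assumptions taken from the "four upper bits of the id" wording in f43dbeb; `lookup_method()`, `struct bucket` and the method table are hypothetical, not the uverbs structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: 16-bit ids whose upper four bits select the namespace. */
#define ID_NS_SHIFT   12
#define ID_NS_MASK    0xF000u
#define ID_INDEX_MASK 0x0FFFu

static unsigned int id_ns(uint16_t id)    { return (id & ID_NS_MASK) >> ID_NS_SHIFT; }
static unsigned int id_index(uint16_t id) { return id & ID_INDEX_MASK; }

struct method_spec { const char *name; };

struct bucket {
	size_t num;
	const struct method_spec *const *items;
};

/* Hypothetical method specs for one object. */
static const struct method_spec create_cq  = { "CREATE_CQ" };
static const struct method_spec destroy_cq = { "DESTROY_CQ" };
static const struct method_spec drv_method = { "DRIVER_SPECIFIC" };

static const struct method_spec *const core_items[] = { &create_cq, &destroy_cq };
static const struct method_spec *const drv_items[]  = { &drv_method };

/* Bucket 0 holds the default (core) namespace, bucket 1 the driver one. */
static const struct bucket buckets[] = {
	{ 2, core_items },
	{ 1, drv_items },
};

static const struct method_spec *lookup_method(uint16_t id)
{
	unsigned int ns = id_ns(id);
	unsigned int idx = id_index(id);

	if (ns >= sizeof(buckets) / sizeof(buckets[0]))
		return NULL;			/* unknown namespace */
	if (idx >= buckets[ns].num)
		return NULL;			/* unknown id in this namespace */
	return buckets[ns].items[idx];	/* O(1) array access */
}
```

Under these assumptions, id `0x1000` is the first driver-namespace method and `0x0001` the second core method. Both are resolved with one shift, one mask and two bounds checks, with no search.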
14d6c3a RDMA/vmw_pvrdma: Fix a signedness Fixes: 29c8d9eba550 ("IB: Add vmw_pvrdma driver") Signed-off-by: Adit Ranadive <aditr@vmware.com> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com> 31 August 2017, 12:35:08 UTC
72f9b08 RDMA/vmw_pvrdma: Report network header type in WC We should report the network header type in the work completion so that the kernel can infer the right RoCE type headers. Reviewed-by: Bryan Tan <bryantan@vmware.com> Signed-off-by: Aditya Sarwade <asarwade@vmware.com> Signed-off-by: Adit Ranadive <aditr@vmware.com> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com> 31 August 2017, 12:35:08 UTC
7936422 IB/core: Add might_sleep() annotation to ib_init_ah_from_wc() For RoCE, ib_init_ah_from_wc() can follow the path ib_init_ah_from_wc() -> rdma_addr_find_l2_eth_by_grh() -> rdma_resolve_ip() and rdma_resolve_ip() will sleep in kzalloc() and wait_for_completion(). However, developers will not see any warnings if they use ib_init_ah_from_wc() in an atomic context and test only on IB, because the function doesn't sleep in that case. Add a might_sleep() so that lockdep will catch bugs no matter what hardware is used to test. Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: Doug Ledford <dledford@redhat.com> 31 August 2017, 12:35:07 UTC
c761611 IB/cm: Fix sleeping in atomic when RoCE is used A couple of places in the CM do spin_lock_irq(&cm_id_priv->lock); ... if (cm_alloc_response_msg(work->port, work->mad_recv_wc, &msg)) However when the underlying transport is RoCE, this leads to a sleeping function being called with the lock held - the callchain is cm_alloc_response_msg() -> ib_create_ah_from_wc() -> ib_init_ah_from_wc() -> rdma_addr_find_l2_eth_by_grh() -> rdma_resolve_ip() and rdma_resolve_ip() starts out by doing req = kzalloc(sizeof *req, GFP_KERNEL); not to mention rdma_addr_find_l2_eth_by_grh() doing wait_for_completion(&ctx.comp); to wait for the task that rdma_resolve_ip() queues up. Fix this by moving the AH creation out of the lock. Signed-off-by: Roland Dreier <roland@purestorage.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com> 31 August 2017, 12:35:07 UTC
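Together, the might_sleep() annotation and the CM fix show a standard pairing: assert "not atomic" inside anything that can sleep, and do the sleeping work before taking the spinlock. The model below is a hypothetical sketch. `atomic_depth` stands in for preempt/IRQ state, `might_sleep_model()` for `might_sleep()`, and `create_ah_model()` for the AH creation path.

```c
#include <assert.h>
#include <stdlib.h>

static int atomic_depth;	/* > 0 while a "spinlock" is held */

static void spin_lock_model(void)   { atomic_depth++; }

static void spin_unlock_model(void)
{
	assert(atomic_depth > 0);
	atomic_depth--;
}

/* Stand-in for might_sleep(): fires if called in atomic context,
 * regardless of whether this particular call would actually sleep. */
static void might_sleep_model(void) { assert(atomic_depth == 0); }

struct ah { int dummy; };

/* Hypothetical AH creation that may sleep (allocation, waiting for
 * address resolution, ...). */
static struct ah *create_ah_model(void)
{
	might_sleep_model();
	return calloc(1, sizeof(struct ah));
}

struct cm_id_model { struct ah *resp_ah; int state; };

static int send_response(struct cm_id_model *id)
{
	/* Do the work that may sleep first, outside the lock... */
	struct ah *ah = create_ah_model();

	if (!ah)
		return -1;

	spin_lock_model();
	/* ...then only touch shared state while the lock is held. */
	id->resp_ah = ah;
	id->state = 1;
	spin_unlock_model();
	return 0;
}
```

Moving `create_ah_model()` inside the locked region would trip the assertion on every call, which is the point of the annotation: the bug surfaces even on hardware where the call would not actually sleep.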
f43dbeb IB/core: Add support to finalize objects in one transaction The new ioctl based infrastructure either commits or rolls back all objects of the method as one transaction. In order to do that, we introduce a notion of dealing with a collection of objects that are related to a specific method. This also requires adding a notion of a method and attribute. A method contains a hash of attributes, where each bucket contains several attributes. The attributes are hashed according to their namespace which resides in the four upper bits of the id. For example, an object could be a CQ, which has an action of CREATE_CQ. This action has multiple attributes, for example the CQ's new handle and the comp_channel. Each layer in this hierarchy (objects, methods and attributes) is split into namespaces. The basic example for that is one namespace representing the default entities and another one representing the driver specific entities. When declaring these methods and attributes, we actually declare their specifications. When a method is executed, we allocate some space to hold auxiliary information. This auxiliary information contains meta-data about the required objects, such as pointers to their type information, pointers to the uobjects themselves (if they exist), etc. The specification, along with the auxiliary information we allocated and filled, is given to the finalize_objects function. Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com> 30 August 2017, 14:30:38 UTC
a0aa309 IB/core: Add a generic way to execute an operation on a uobject The ioctl infrastructure treats all user-objects in the same manner. It gets object ids from user space and, by using the object type and type attributes mentioned in the object specification, it executes the required method. Passing an object id from user space as an attribute is carried out in three stages. The first is carried out before the actual handler and the last is carried out afterwards. The different supported operations are read, write, destroy and create. In the first stage, the first three actions just fetch the object from the repository (by using its id) and lock it. The last action allocates a new uobject. Afterwards, the second stage is carried out when the handler itself carries out the required modification of the object. The last stage is carried out after the handler finishes and commits the result. The first two operations just unlock the object. Destroy calls the "free object" operation, taking into account the object's type, and releases the uobject as well. Creation just adds the new uobject to the repository, making the object visible to the application. In order to abstract these details from the ioctl infrastructure layer, we add uverbs_get_uobject_from_context and uverbs_finalize_object functions, which correspond to the first and last stages respectively. Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com> 30 August 2017, 14:30:38 UTC
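The three-stage flow (fetch/lock or allocate, run the handler, then unlock, free or publish) can be modelled as a small state machine. This is a hypothetical sketch. `struct uobj`, `uobj_get()` and `uobj_finalize()` are not the uverbs functions, and the shared-read versus exclusive-write/destroy locking follows the fac9658 description rather than any specific implementation.

```c
#include <assert.h>
#include <errno.h>

enum access { ACCESS_READ, ACCESS_WRITE, ACCESS_DESTROY, ACCESS_NEW };

struct uobj {
	int readers;	/* shared holders */
	int writer;	/* 1 while held exclusively */
	int live;	/* visible in the repository */
};

/* Stage 1: runs before the handler. */
static int uobj_get(struct uobj *o, enum access a)
{
	switch (a) {
	case ACCESS_READ:
		if (o->writer)
			return -EBUSY;
		o->readers++;
		return 0;
	case ACCESS_WRITE:
	case ACCESS_DESTROY:
		if (o->writer || o->readers)
			return -EBUSY;	/* exclusive access required */
		o->writer = 1;
		return 0;
	case ACCESS_NEW:
		o->live = 0;		/* allocated, not yet visible */
		return 0;
	}
	return -EINVAL;
}

/* Stage 3: runs after the handler; commit only if it succeeded. */
static void uobj_finalize(struct uobj *o, enum access a, int handler_ok)
{
	switch (a) {
	case ACCESS_READ:
		assert(o->readers > 0);
		o->readers--;
		break;
	case ACCESS_WRITE:
		o->writer = 0;
		break;
	case ACCESS_DESTROY:
		o->writer = 0;
		if (handler_ok)
			o->live = 0;	/* "free object" */
		break;
	case ACCESS_NEW:
		o->live = handler_ok;	/* publish, or roll back */
		break;
	}
}
```

Stage 2 is whatever the handler does between these two calls.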
82fb342 Documentation: Hardware tag matching Add document providing definitions of terms and core explanations for tag matching (TM) protocols, eager and rendezvous, TM application header, tag list manipulations and matching process. Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com> 29 August 2017, 12:30:21 UTC
3fd3307 IB/mlx5: Support IB_SRQT_TM When an SRQ is created with IB_SRQT_TM, pass mlx5_core the flag to enable rendezvous offload, along with list_size and the CQ. Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Reviewed-by: Yossi Itigin <yosefe@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com> 29 August 2017, 12:30:20 UTC
5b3ec3f net/mlx5: Add XRQ support Add support for the new XRQ (eXtended shared Receive Queue) hardware object. It supports SRQ semantics with the addition of extended receive buffer topologies and offloads. It currently supports the tag matching topology and rendezvous offload. Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Reviewed-by: Yossi Itigin <yosefe@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com> 29 August 2017, 12:30:20 UTC
eb76189 IB/mlx5: Fill XRQ capabilities Provide driver specific values for XRQ capabilities. Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Reviewed-by: Yossi Itigin <yosefe@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com> 29 August 2017, 12:30:19 UTC
8d50505 IB/uverbs: Expose XRQ capabilities Make XRQ capabilities available via ibv_query_device() verb. Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Reviewed-by: Yossi Itigin <yosefe@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com> 29 August 2017, 12:30:18 UTC
38eb44f IB/uverbs: Add new SRQ type IB_SRQT_TM Add a new SRQ type capable of the new tag matching feature. When an SRQ receives a message, it searches the matching list for the corresponding posted receive buffer. The process of searching the matching list is called tag matching. If the tag matching results in a match, the received message is placed at the address specified by the receive buffer. If no match is found, the message is placed in a generic buffer until the corresponding receive buffer is posted. These messages are called unexpected, and their set is called the unexpected list. Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Reviewed-by: Yossi Itigin <yosefe@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com> 29 August 2017, 12:30:18 UTC
9382d4e IB/uverbs: Add XRQ creation parameter to UAPI Add a tm_list_size parameter to struct ib_uverbs_create_xsrq. If the SRQ type is tag matching, this field defines the maximum size of the tag matching list; otherwise, it is expected to be zero. Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Reviewed-by: Yossi Itigin <yosefe@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com> 29 August 2017, 12:30:17 UTC
9c2c849 IB/core: Add new SRQ type IB_SRQT_TM This patch adds a new SRQ type, IB_SRQT_TM. The new SRQ type supports tag matching and rendezvous offloads for MPI applications. When the SRQ receives a message, it searches the matching list for the corresponding posted receive buffer. The process of searching the matching list is called tag matching. If tag matching finds a match, the received message is placed at the address specified by the receive buffer. If no match is found, the message is placed in a generic buffer until the corresponding receive buffer is posted. These messages are called unexpected, and their set is called the unexpected list. Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Reviewed-by: Yossi Itigin <yosefe@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com> 29 August 2017, 12:30:17 UTC
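The matching-list / unexpected-list flow described above can be sketched as a toy model. All names (`tm_srq`, `tm_receive()`, `tm_post_recv()`) and the fixed slot counts are hypothetical. This is not the kernel's or the HCA's implementation. Real tag matching also honours posting order, which this sketch reduces to "first matching slot".

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy model of SRQ tag matching; names and sizes are hypothetical. */
#define TM_SLOTS 8

struct tm_entry {            /* one posted receive on the matching list */
	uint64_t tag;
	char *buf;
	size_t len;
	bool valid;
};

struct tm_msg {              /* one message parked on the unexpected list */
	uint64_t tag;
	char data[64];           /* stands in for a generic buffer */
	size_t len;
};

struct tm_srq {
	struct tm_entry list[TM_SLOTS];
	struct tm_msg unexpected[TM_SLOTS];
	size_t n_unexpected;
};

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* Message arrival: 1 = matched a posted buffer, 0 = parked as unexpected,
 * -1 = no generic buffer left (a real device handles this differently). */
static int tm_receive(struct tm_srq *srq, uint64_t tag, const char *data, size_t len)
{
	for (size_t i = 0; i < TM_SLOTS; i++) {
		struct tm_entry *e = &srq->list[i];
		if (e->valid && e->tag == tag) {
			memcpy(e->buf, data, min_sz(len, e->len));
			e->valid = false;            /* entry consumed by the match */
			return 1;
		}
	}
	if (srq->n_unexpected == TM_SLOTS)
		return -1;
	struct tm_msg *m = &srq->unexpected[srq->n_unexpected++];
	m->tag = tag;
	m->len = min_sz(len, sizeof(m->data));
	memcpy(m->data, data, m->len);
	return 0;
}

/* Posting a receive: consult the unexpected list first, otherwise add the
 * buffer to the matching list. 1 = satisfied at once, 0 = posted, -1 = full. */
static int tm_post_recv(struct tm_srq *srq, uint64_t tag, char *buf, size_t len)
{
	for (size_t i = 0; i < srq->n_unexpected; i++) {
		struct tm_msg *m = &srq->unexpected[i];
		if (m->tag == tag) {
			memcpy(buf, m->data, min_sz(m->len, len));
			/* drop it from the unexpected list, keeping arrival order */
			memmove(m, m + 1, (srq->n_unexpected - i - 1) * sizeof(*m));
			srq->n_unexpected--;
			return 1;
		}
	}
	for (size_t i = 0; i < TM_SLOTS; i++) {
		if (!srq->list[i].valid) {
			srq->list[i] = (struct tm_entry){ .tag = tag, .buf = buf, .len = len, .valid = true };
			return 0;
		}
	}
	return -1;
}
```

A receive posted before its message is filled directly by the match. A message that arrives first waits on the unexpected list until a receive with the same tag is posted.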
1a56ff6 IB/core: Separate CQ handle in SRQ context Before this change, the CQ attached to an SRQ was part of the XRC-specific extension. Moving the CQ handle out makes it available to other SRQ types that extend SRQ functionality. Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Reviewed-by: Yossi Itigin <yosefe@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com> 29 August 2017, 12:30:16 UTC
6938fc1 IB/core: Add XRQ capabilities This patch adds the following TM XRQ capabilities: * max_rndv_hdr_size - Max size of rendezvous request message * max_num_tags - Max number of entries in tag matching list * max_ops - Max number of outstanding list operations * max_sge - Max number of SGEs in tag matching entry * flags - the following flags are currently defined: - IB_TM_CAP_RC - Support tag matching on RC transport Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Reviewed-by: Yossi Itigin <yosefe@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com> 29 August 2017, 12:30:16 UTC
6e44636 net/mlx5: Update HW layout definitions * add offload_type field to mlx5_ifc_qpc_bits * update mlx5_ifc_xrqc_bits layout Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Reviewed-by: Yossi Itigin <yosefe@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com> 29 August 2017, 12:30:15 UTC
5c50f1d IB/rxe: Handle NETDEV_CHANGE events Without this fix, ports configured on top of ixgbe miss link up notifications. ibv_query_port() will continue to return IBV_PORT_DOWN even though the port is up and working. Fixes: 8700e3e7c485 ("Soft RoCE driver") Signed-off-by: Andrew Boyer <andrew.boyer@dell.com> Signed-off-by: Doug Ledford <dledford@redhat.com> 28 August 2017, 23:12:36 UTC
13eb1e2 IB/rxe: Avoid ICRC errors by copying into the skb first The current process is to first calculate the CRC and then copy the client data into the packet. This leaves a window in which the packet contents and CRC can get out of sync, if the client changes the data after the CRC is calculated but before the data is copied. By copying the data into the packet and then calculating the CRC directly from the packet contents we eliminate the window. This can be seen with qperf's ud_bi_bw test. This seems like very strange/reckless client behavior, but whether the client has mangled its data or not RXE should be able to transfer it reliably. Fixes: 8700e3e7c485 ("Soft RoCE driver") Signed-off-by: Andrew Boyer <andrew.boyer@dell.com> Signed-off-by: Doug Ledford <dledford@redhat.com> 28 August 2017, 23:12:36 UTC
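The fixed ordering (copy first, then checksum the packet's own bytes) can be sketched as follows. A plain bitwise CRC-32 (IEEE, reflected polynomial 0xEDB88320) stands in for the RoCE ICRC, which is computed differently. `build_pkt()` is a hypothetical helper, not rxe code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bitwise CRC-32 (IEEE); a stand-in for the ICRC, not the ICRC itself. */
static uint32_t crc32_buf(const uint8_t *p, size_t n)
{
	uint32_t crc = 0xFFFFFFFFu;
	while (n--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0xEDB88320u & -(crc & 1u));
	}
	return ~crc;
}

/* Snapshot the client data into the packet first, then compute the CRC
 * over the packet itself, so later writes to the client buffer cannot
 * make payload and CRC disagree. */
static uint32_t build_pkt(uint8_t *pkt, const uint8_t *client, size_t len)
{
	memcpy(pkt, client, len);
	return crc32_buf(pkt, len);
}
```

The reverse order, `crc32_buf(client, len)` followed by `memcpy()`, leaves exactly the window the commit describes.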
1223a1a IB/rxe: Another fix for broken receive queue draining This fixes another path in rxe_requester() that might overlook stale SKBs, preventing cleanup. Fixes: 1217197142d1 ("rxe: fix broken receive queue draining") Signed-off-by: Andrew Boyer <andrew.boyer@dell.com> Signed-off-by: Doug Ledford <dledford@redhat.com> 28 August 2017, 23:12:35 UTC
2418ada IB/rxe: Remove unneeded initialization in prepare6() Fixes: 4ed6ad1eb30e ("IB/rxe: Cache dst in QP instead of getting it...") Signed-off-by: Andrew Boyer <andrew.boyer@dell.com> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com> 28 August 2017, 23:12:35 UTC
825a51a IB/rxe: Fix up rxe_qp_cleanup() Replace sk_dst_get()/dst_release() in rxe_qp_cleanup() with sk_dst_reset(). sk_dst_get() takes a new reference on dst, so the dst_release() doesn't actually release the original reference, which was the design intent. Fixes: 4ed6ad1eb30e ("IB/rxe: Cache dst in QP instead of getting it...") Signed-off-by: Andrew Boyer <andrew.boyer@dell.com> Signed-off-by: Doug Ledford <dledford@redhat.com> 28 August 2017, 23:12:34 UTC
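A toy refcount model shows why a get/put pair does not release the cached reference. `struct obj`, `obj_get()` and `obj_put()` are hypothetical stand-ins for a `dst_entry` and its reference helpers. `cleanup_fixed()` only mimics the effect attributed to `sk_dst_reset()`: detach the cache and drop its reference.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical refcounted object standing in for a dst_entry. */
struct obj { int refcnt; };

static struct obj *obj_get(struct obj *o) { o->refcnt++; return o; }
static void obj_put(struct obj *o) { assert(o->refcnt > 0); o->refcnt--; }

struct holder { struct obj *cached; };   /* e.g. the socket's cached dst */

/* Buggy pattern: the get takes a new reference, so the put only balances
 * it and the cache's original reference survives. */
static void cleanup_buggy(struct holder *h) { obj_put(obj_get(h->cached)); }

/* Fixed pattern: detach the cached pointer and drop the reference the
 * cache itself owned. */
static void cleanup_fixed(struct holder *h)
{
	struct obj *o = h->cached;
	h->cached = NULL;
	if (o)
		obj_put(o);
}
```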
48c22be IB/rxe: Add dst_clone() in prepare_ipv6_hdr() Otherwise the reference count goes negative as IPv6 packets complete. Fixes: 4ed6ad1eb30e ("IB/rxe: Cache dst in QP instead of getting it...") Signed-off-by: Andrew Boyer <andrew.boyer@dell.com> Signed-off-by: Doug Ledford <dledford@redhat.com> 28 August 2017, 23:12:34 UTC
b9109b7 IB/rxe: Fix destination cache for IPv6 To successfully match an IPv6 path, the path cookie must match. Store it in the QP so that the IPv6 path can be reused. Replace open-coded version of dst_check() with the actual call, fixing the logic. The open-coded version skips the check call if dst->obsolete is 0 (DST_OBSOLETE_NONE), proceeding to replace the route. DST_OBSOLETE_NONE means that the route may continue to be used, though. Fixes: 4ed6ad1eb30e ("IB/rxe: Cache dst in QP instead of getting it...") Signed-off-by: Andrew Boyer <andrew.boyer@dell.com> Signed-off-by: Doug Ledford <dledford@redhat.com> 28 August 2017, 23:12:33 UTC
d45d295 IB/rxe: Fix up the responder's find_resources() function The resource array is sized by max_dest_rd_atomic, not max_rd_atomic. Iterating over max_rd_atomic entries of qp->resp.resources[] will cause incorrect behavior when the two attributes are different (or even crash if max_rd_atomic is larger). Fixes: 8700e3e7c485 ("Soft RoCE driver") Signed-off-by: Andrew Boyer <andrew.boyer@dell.com> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com> 28 August 2017, 23:12:33 UTC
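The bound rule can be sketched like this: iterate over as many entries as the array was allocated with. `struct qp_model` and `find_resource()` are hypothetical. The real rxe lookup matches on PSN ranges rather than a single PSN.

```c
#include <stddef.h>

/* Hypothetical responder state: resources[] holds max_dest_rd_atomic entries. */
struct resp_res { int psn; int valid; };

struct qp_model {
	unsigned int max_rd_atomic;      /* initiator-side limit */
	unsigned int max_dest_rd_atomic; /* responder-side limit: the array size */
	struct resp_res *resources;
};

/* Bound the search by the allocation size, never by max_rd_atomic. */
static struct resp_res *find_resource(struct qp_model *qp, int psn)
{
	for (unsigned int i = 0; i < qp->max_dest_rd_atomic; i++) {
		struct resp_res *r = &qp->resources[i];
		if (r->valid && r->psn == psn)
			return r;
	}
	return NULL;
}
```

With `max_rd_atomic = 16` and a 2-entry array, a loop bounded by `max_rd_atomic` would read 14 entries past the end.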
cffec53 IB/rxe: Remove dangling prototype Fixes: 8700e3e7c485 ("Soft RoCE driver") Signed-off-by: Andrew Boyer <andrew.boyer@dell.com> Acked-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com> 28 August 2017, 23:12:32 UTC
bfc3ae0 IB/rxe: Disable completion upcalls when a CQ is destroyed This prevents the stack from accessing userspace objects while they are being torn down. One possible sequence of events: - Userspace program exits - ib_uverbs_cleanup_ucontext() runs, calling ib_destroy_qp(), ib_destroy_cq(), etc. and releasing/freeing the UCQ - The QP still has tasklets running, so it isn't destroyed yet - The CQ is referenced by the QP, so the CQ isn't destroyed yet - The UCQ is kfree()'d anyway - A send work request completes - rxe_send_complete() calls cq->ibcq.comp_handler() - ib_uverbs_comp_handler() runs and crashes; the event queue is checked for is_closed, but it has no way to check the ib_ucq_object before accessing it The reference counting on the CQ doesn't protect against this since the CQ hasn't been destroyed yet. There's no available interface to deregister the UCQ from the CQ, and it didn't appear that attempting to add reference counting to the UCQ was going to be a good way to go since this solution is much simpler. Fixes: 8700e3e7c485 ("Soft RoCE driver") Signed-off-by: Andrew Boyer <andrew.boyer@dell.com> Signed-off-by: Doug Ledford <dledford@redhat.com> 28 August 2017, 23:12:32 UTC
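The idea of disabling the upcall can be sketched with a flag that the destroy path sets and the completion path checks. The struct, the `is_dying` name and the functions are hypothetical. Real code also needs locking so that setting the flag and firing a completion cannot race.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical CQ model: the CQ may outlive the userspace wrapper, so the
 * destroy path disables upcalls instead of relying on refcounts. */
struct cq_model {
	bool is_dying;
	void (*comp_handler)(struct cq_model *cq, void *ctx);
	void *ctx;
};

static void cq_destroy(struct cq_model *cq)
{
	cq->is_dying = true;   /* must happen before the wrapper is freed */
}

/* Called when a work request completes (cf. rxe_send_complete()). */
static void cq_complete(struct cq_model *cq)
{
	if (cq->is_dying)
		return;            /* skip the upcall into torn-down objects */
	cq->comp_handler(cq, cq->ctx);
}
```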
9eb7f8e IB/rxe: Move refcounting earlier in rxe_send() The network stack will call nskb's destructor, rxe_skb_tx_dtor(), if the packet gets dropped by ip_local_out()/ip6_local_out(). Thus we need to add the QP ref before output to avoid extra dereferences during network congestion. This could lead to unwanted destruction of the QP. Fix up the skb_out accounting, too. Fixes: fda85ce91240 ("IB/rxe: Fix kernel panic from skb destructor") Signed-off-by: Andrew Boyer <andrew.boyer@dell.com> Acked-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com> 28 August 2017, 23:12:31 UTC
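The ordering problem can be sketched with a toy model. If the lower layer drops the packet, it runs the destructor before the send call returns, so the reference the destructor drops must already exist. `fake_local_out()` is a hypothetical stand-in for `ip_local_out()`, and the structs are not rxe's.

```c
#include <stdbool.h>

/* Hypothetical QP with a refcount; reaching zero would destroy it. */
struct qp_ref { int refcnt; bool freed; };

static void qp_put(struct qp_ref *qp)
{
	if (--qp->refcnt == 0)
		qp->freed = true;
}

struct pkt { struct qp_ref *qp; };

/* Packet destructor: drops the QP reference held on behalf of the packet. */
static void pkt_destructor(struct pkt *p) { qp_put(p->qp); }

/* Stand-in for the output path: under congestion the packet is dropped
 * and its destructor runs synchronously. */
static int fake_local_out(struct pkt *p, bool congested)
{
	if (congested) {
		pkt_destructor(p);
		return -1;
	}
	return 0;   /* destructor would run later, on TX completion */
}

/* Fixed ordering: take the reference before handing the packet down. */
static int send_pkt(struct qp_ref *qp, struct pkt *p, bool congested)
{
	p->qp = qp;
	qp->refcnt++;
	return fake_local_out(p, congested);
}
```

Moving `qp->refcnt++` after `fake_local_out()` makes the dropped-packet path free the QP while its owner still holds it.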
0208da9 IB/rdmavt: Handle dereg of inuse MRs properly A destroy of an MR prior to destroying the QP can cause the following diagnostic if the QP is referencing the MR being de-registered: hfi1 0000:05:00.0: hfi1_0: rvt_dereg_mr timeout mr ffff880856210800 pd ffff880859b20b00 The solution is, when a non-zero refcount is encountered as the MR is destroyed, to iterate over the QPs looking for QPs in the same PD as the MR. If rvt_qp_mr_clean() detects that any such QP references the rkey/lkey, the QP is put into an error state via a call to rvt_qp_error(), which triggers the cleanup of any stuck references. This solution is as specified in IBTA 1.3 Volume 1 11.2.10.5. [This is reproduced with the 0.4.9 version of qperf and the rc_bw test] Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com> 28 August 2017, 23:12:31 UTC
557fafe IB/qib: Convert qp_stats debugfs interface to use new iterator API Continue porting copy/paste code into rdmavt from qib. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com> 28 August 2017, 23:12:30 UTC
e5c197a IB/hfi1: Convert qp_stats debugfs interface to use new iterator API Continue moving copy/paste code into rdmavt. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com> 28 August 2017, 23:12:30 UTC
dff2fe7 IB/hfi1: Convert hfi1_error_port_qps() to use new QP iterator Change hfi1_error_port_qps() to use the new rvt_qp_iter() in its QP scanning. Reviewed-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com> 28 August 2017, 23:12:29 UTC
4734b4f IB/rdmavt: Add QP iterator API for QPs There are currently 3 spots in the qib and hfi1 driver that have knowledge of the internal QP hash list that should only be in scope to rdmavt QP code. Add an iterator API for processing all QPs to hide the nature of the RCU hashlist. The API consists of: - rvt_qp_iter_init() * For iterating QPs one at a time for seq_file semantics - rvt_qp_iter_next() * For iterating QPs one at a time for seq_file semantics - rvt_qp_iter() * For iterating all QPs The first two are used for things like seq_file prints. The last is for code that just needs to iterate all QPs in the system. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com> 28 August 2017, 23:12:28 UTC
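The init/next style of iteration, which hides the container layout, can be sketched as follows. All names are hypothetical and only echo `rvt_qp_iter_init()`/`rvt_qp_iter_next()`/`rvt_qp_iter()`. A plain chained hash table replaces the RCU-protected one, and no locking is shown.

```c
#include <stddef.h>

/* Hypothetical chained hash table of QPs (no RCU, no locking). */
struct qp_node { int qpn; struct qp_node *next; };
#define NBUCKETS 4
struct qp_table { struct qp_node *bucket[NBUCKETS]; };

struct qp_iter {
	struct qp_table *t;
	size_t b;              /* current bucket */
	struct qp_node *cur;   /* current QP; NULL before start and after end */
};

static void qp_iter_init(struct qp_iter *it, struct qp_table *t)
{
	it->t = t;
	it->b = 0;
	it->cur = NULL;
}

/* Advance to the next QP: 0 on success, -1 when exhausted (seq_file style). */
static int qp_iter_next(struct qp_iter *it)
{
	if (it->cur && it->cur->next) {
		it->cur = it->cur->next;
		return 0;
	}
	for (size_t b = it->cur ? it->b + 1 : it->b; b < NBUCKETS; b++) {
		if (it->t->bucket[b]) {
			it->b = b;
			it->cur = it->t->bucket[b];
			return 0;
		}
	}
	it->b = NBUCKETS;
	it->cur = NULL;
	return -1;
}

/* "Visit all QPs" form built on top of init/next. */
static void qp_iter_all(struct qp_table *t, void *ctx,
			void (*cb)(struct qp_node *qp, void *ctx))
{
	struct qp_iter it;

	qp_iter_init(&it, t);
	while (qp_iter_next(&it) == 0)
		cb(it.cur, ctx);
}
```

Callers never touch `bucket[]` directly, so the table's internals can change without touching driver code.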
4b9796b IB/hfi1: Use accessor to determine ring size The qp_stats print will soon be moving to rdmavt, so use the proper accessor to get the ring size rather than a driver supplied constant. Fixes: Commit ff8d836efe06 ("IB/hfi1: Add receiving queue info to qp_stats") Reviewed-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com> 28 August 2017, 23:12:28 UTC
6167a5b IB/qib: Stricter bounds checking for copy to buffer Replace 'strcpy' with 'strncpy' to restrict the number of bytes copied to the buffer. Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Kamenee Arumugam <kamenee.arumugam@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com> 28 August 2017, 23:12:27 UTC
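One caveat when replacing `strcpy()` with `strncpy()`: `strncpy()` does not NUL-terminate when the source is at least as long as the limit. `bounded_copy()` below is a hypothetical userspace helper, not the qib change. In-kernel code may instead use `strlcpy()` or `strscpy()`, which always terminate.

```c
#include <stddef.h>
#include <string.h>

/* Copy at most dst_size - 1 bytes and always NUL-terminate. */
static void bounded_copy(char *dst, size_t dst_size, const char *src)
{
	if (dst_size == 0)
		return;
	strncpy(dst, src, dst_size - 1);
	dst[dst_size - 1] = '\0';   /* strncpy() may leave dst unterminated */
}
```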
34ab4de IB/hif1: Remove static tracing from SDMA hot path The hfi1_cdbg() macro can be instantiated in the hot path even when it is not in use. This shows up on perf profiles. Rework the macros (for SDMA and MMU), to use the trace interface directly to eliminate this performance hit. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com> 28 August 2017, 23:12:27 UTC
ba81a42 IB/hfi1: Acquire QSFP cable information on loopback Currently, QSFP information is not queried when loopback is set up and a QSFP module is present. Acquire QSFP information in the loopback case as well. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com> 28 August 2017, 23:12:26 UTC
cfeca08 i40iw: make some structures const Make some structures const as they are only used during a copy operation. Signed-off-by: Bhumika Goyal <bhumirks@gmail.com> Signed-off-by: Doug Ledford <dledford@redhat.com> 28 August 2017, 23:12:26 UTC
733da3b IB/hfi1: constify vm_operations_struct vm_operations_struct instances are not supposed to change at runtime, and vm_area_struct works with a const vm_operations_struct. So mark the non-const vm_operations_struct structs as const. Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com> Signed-off-by: Doug Ledford <dledford@redhat.com> 28 August 2017, 23:12:25 UTC
2d72d6c RDMA/bnxt_re: remove unnecessary call to memset A call to memset() that zeroes memory immediately after it was allocated with kzalloc() is unnecessary, as kzalloc() already returns zero-filled memory. Signed-off-by: Himanshu Jha <himanshujha199640@gmail.com> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com> 28 August 2017, 23:12:25 UTC
d518a44 IB/usnic: check for allocation failure usnic_uiom_get_dev_list() can return ERR_PTR(-ENOMEM) so we should check for that. Fixes: e3cf00d0a87f ("IB/usnic: Add Cisco VIC low-level hardware driver") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com> 28 August 2017, 23:12:24 UTC
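The missing check concerns the kernel's `ERR_PTR()` convention: a function returns either a valid pointer or a small negative errno encoded in the pointer, so a NULL test misses the error. The macros below are a userspace re-creation of the idea in `include/linux/err.h`. `get_list()` is a hypothetical stand-in for `usnic_uiom_get_dev_list()`.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace re-creation of ERR_PTR()/IS_ERR()/PTR_ERR(). */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical allocator: returns a valid pointer or an encoded error. */
static int *get_list(int fail)
{
	int *p;

	if (fail)
		return ERR_PTR(-ENOMEM);
	p = malloc(sizeof(*p));
	return p ? p : ERR_PTR(-ENOMEM);
}

/* Caller tests with IS_ERR(), not against NULL. */
static int use_list(int fail)
{
	int *list = get_list(fail);

	if (IS_ERR(list))
		return (int)PTR_ERR(list);
	free(list);
	return 0;
}
```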
280ad49 IB/hfi1: Add opcode states to qp_stats These fields allow for debugging send engine processing. Reviewed-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com> 28 August 2017, 23:12:24 UTC
642aaab IB/hfi1: Add received request info to qp_stats The rvt_ack_entry pointed to by s_tail_ack_queue provides important info about the request that has just been processed or is being processed on the responder side of a RC connection. This patch adds this info to the qp_stats to assist debugging. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com> 28 August 2017, 23:12:23 UTC
d68e68e IB/hfi1: Fix whitespace alignment issue for MAD Fix a tab alignment issue present in pr_err_ratelimited error message. Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Kamenee Arumugam <kamenee.arumugam@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com> 28 August 2017, 23:12:23 UTC
32500f2 IB/hfi1: Move structure and MACRO definitions in user_sdma.c to user_sdma.h Clean up user_sdma.c by moving the structure and MACRO definitions into the header file user_sdma.h Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Harish Chegondi <harish.chegondi@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com> 28 August 2017, 23:12:22 UTC
637f460 IB/hfi1: Move structure definitions from user_exp_rcv.c to user_exp_rcv.h Clean up user_exp_rcv.c by moving structure definitions into the header file user_exp_rcv.h. Since these structure definitions depend on the structure definitions in mmu_rb.h, move #include "mmu_rb.h" above #include "user_exp_rcv.h", or above the include of any header file that itself includes user_exp_rcv.h. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Harish Chegondi <harish.chegondi@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com> 28 August 2017, 23:12:22 UTC
ddd3aff IB/hfi1: Remove duplicate definitions of num_user_pages() function num_user_pages() function has been defined in both user_exp_rcv.c file and user_sdma.c file. Move the function definition to a header file so there is only one definition in the source repo. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Harish Chegondi <harish.chegondi@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com> 28 August 2017, 23:12:21 UTC
04a646d IB/hfi1: Fix the bail out code in pin_vector_pages() function In pin_vector_pages(), if there is any error while pinning the pages or while adding a pinned buffer to the cache, the bail out code needs to unpin any pinned pages that are not in the cache and adjust the n_locked counter that counts the total pages pinned. The current bail out code does not appear to handle this correctly in two cases: 1. Before pinning the required pages for a buffer, the SDMA pinned buffer cache is searched to see if the virtual address range that needs to be pinned is already pinned. If there isn't a hit in the cache, a new node is created for the buffer and is added to the cache after the buffer is pinned. If adding the new node to the cache fails, the n_locked count is decremented properly but the pinned pages are not freed. 2. If there is a hit in the SDMA cache, but the cached buffer doesn't have enough pages to cover the entire address range that needs to be pinned, the node for the cached buffer is extracted from the cache, the remaining pages needed are pinned and added to the node, and the node is finally added back into the cache. If there is an error pinning the extra pages, the bail out code frees all the pages in the node, but the n_locked count is not decremented by the number of pages in the node that are freed. This commit fixes both issues by creating a new function that frees the pages in a node and decrements the n_locked count by the number of pages freed. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Harish Chegondi <harish.chegondi@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com> 28 August 2017, 23:12:21 UTC
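The shape of the shared bail-out helper can be sketched as one function that releases a node's pages and adjusts the pinned-page counter by the same amount. Everything here is hypothetical: "unpinning" is modelled as `free()`, and `n_locked` is a plain global rather than the driver's counter.

```c
#include <stdlib.h>

/* Hypothetical cache node holding pinned pages. */
struct pin_node { void **pages; size_t npages; };

static size_t n_locked;   /* total pages currently pinned */

/* Used by every bail-out path: release all pages held by the node and
 * decrement n_locked by exactly the number released. */
static void free_node_pages(struct pin_node *node)
{
	for (size_t i = 0; i < node->npages; i++)
		free(node->pages[i]);
	n_locked -= node->npages;
	node->npages = 0;
}
```

Routing both error paths through one helper keeps the page release and the counter update from drifting apart.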
4c6c9aa IB/hfi1: Clean up pin_vector_pages() function Clean up pin_vector_pages() function by moving page pinning related code to a separate function since it really stands on its own. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Harish Chegondi <harish.chegondi@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com> 28 August 2017, 23:12:20 UTC
624b9ac IB/hfi1: Clean up user_sdma_send_pkts() function user_sdma_send_pkts() function is unnecessarily long. Clean it up by moving some of its code into separate functions. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Harish Chegondi <harish.chegondi@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com> 28 August 2017, 23:12:19 UTC
9dc1170 IB/hfi1: Clean up hfi1_user_exp_rcv_setup function Clean up hfi1_user_exp_rcv_setup function by moving page pinning and unpinning related code to separate functions. In order to reduce the number of parameters passed between functions, a new data structure struct tid_user_buf is defined and used. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Harish Chegondi <harish.chegondi@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com> 28 August 2017, 23:12:19 UTC
7956371 IB/hfi1: Improve local kmem_cache_alloc performance Performance analysis shows that the cache constructor callback sdma_kmem_cache_ctor accounts for half of the kmem_cache_alloc() time. Since all of the fields in the allocated data structure are initialized in the code path, remove the _ctor function. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com> 28 August 2017, 23:12:18 UTC
de42de8 IB/hfi1: Ratelimit prints from sdma_interrupt Ratelimit error prints from the sdma_interrupt function, which could otherwise flood dmesg. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Grzegorz Morys <grzegorz.morys@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com> 28 August 2017, 23:12:18 UTC
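Rate limiting usually means an interval/burst window: at most `burst` messages per `interval`, with the rest suppressed. This is the general idea behind the kernel's `pr_err_ratelimited()`. The minimal sketch below is not the kernel's `__ratelimit()` implementation. Time is passed in explicitly to keep it deterministic.

```c
#include <stdbool.h>

/* Hypothetical interval/burst limiter state. */
struct ratelimit { long interval, burst, begin, printed; };

/* Returns true if a message may be printed at time `now`. */
static bool ratelimit_ok(struct ratelimit *rs, long now)
{
	if (now - rs->begin >= rs->interval) {   /* start a new window */
		rs->begin = now;
		rs->printed = 0;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}
	return false;                            /* suppressed */
}
```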
2714727 IB/qib: Stricter bounds checking for copy and array access Add a check on the index value of the array 'guids' in qib_ruc.c. Pass in the correct size of the array for the memset operation in qib_mad.c. Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Kamenee Arumugam <kamenee.arumugam@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com> 28 August 2017, 23:12:17 UTC
3b71693 IB/qib: Remove unnecessary memory allocation for boardname Remove all the memory allocation implemented for boardname and directly assign the defined string literal. Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Kamenee Arumugam <kamenee.arumugam@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com> 28 August 2017, 23:12:17 UTC
5b0ef65 IB/{qib, hfi1}: Avoid flow control testing for RDMA write operation Section 9.7.7.2.5 of the 1.3 IBTA spec clearly says that receive credits should never apply to RDMA write, yet qib and hfi1 were applying them. The following situation will result in a QP hang: - A prior SEND or RDMA_WRITE with immediate consumed the last credit for a QP using RC receive buffer credits - The prior op is acked so there are no more acks - The peer ULP fails to post receive for some reason - An RDMA write sees that the credits are exhausted and waits - The peer ULP posts receive buffers - The ULP posts a send or RDMA write that will be hung The fix is to avoid the credit test for the RDMA write operation. Cc: <stable@vger.kernel.org> Reviewed-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com> 28 August 2017, 23:12:16 UTC
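The fixed credit gate can be sketched as follows: only operations that consume a receive at the peer (SEND and RDMA WRITE with immediate, per the scenario above) wait for credits, and a plain RDMA WRITE never does. The enum and functions are hypothetical, not the qib/hfi1 code.

```c
#include <stdbool.h>

/* Hypothetical opcode subset. */
enum wr_op { OP_SEND, OP_RDMA_WRITE, OP_RDMA_WRITE_IMM };

/* Operations that consume a posted receive at the responder. */
static bool consumes_rwqe(enum wr_op op)
{
	return op == OP_SEND || op == OP_RDMA_WRITE_IMM;
}

/* Returns true if the request may be sent now given `credits`. */
static bool may_send(enum wr_op op, int credits)
{
	if (!consumes_rwqe(op))
		return true;        /* plain RDMA WRITE skips the credit test */
	return credits > 0;
}
```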
3aaee8a IB/rdmavt: Use rvt_put_swqe() in rvt_clear_mr_ref() hfi1 and qib were converted in previous patches, do the same for rdmavt. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com> 28 August 2017, 23:12:16 UTC
a113969 Merge branch 'mellanox' into k.o/for-next Signed-off-by: Doug Ledford <dledford@redhat.com> 25 August 2017, 00:25:15 UTC
050da90 IB/mlx5: Report mlx5 enhanced multi packet WQE capability Expose enhanced multi packet WQE capability to user space through query_device by uhw. Signed-off-by: Bodong Wang <bodong@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com> 24 August 2017, 21:47:35 UTC
795b609 IB/mlx5: Allow posting multi packet send WQEs if hardware supports Set the field that allows posting multi-packet send WQEs if the hardware supports this feature. This does not make a send WQE multi-packet unless it was prepared according to the multi-packet send WQE format. User space shall use the flag MLX5_IB_ALLOW_MPW to check whether the hardware supports MPW and MPW is allowed in the SQ context. Signed-off-by: Bodong Wang <bodong@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com> 24 August 2017, 21:47:35 UTC
a550ddf IB/mlx5: Add support for multi underlay QP Set the underlay QPN as part of the flow rule when applicable. There is one root flow table in the NIC RX namespace, and all the underlay QPs steer traffic to this flow table. To prevent a QP from receiving traffic that is not targeted at its underlay QP, the underlay QP number must be part of the steering match. Note: When multicast traffic is sent, the QPN filtering is done by the firmware at an earlier step. Adding the QPN match on the flow table entry would be wrong, as by that time the target QPN holds the multicast address (e.g. FF(s)) and it won't match. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com> 24 August 2017, 21:47:34 UTC
7b4cdaa IB/mlx5: Fix integer overflow when page_shift == 31 Fix a bug where MR registration fails when mlx5_ib_cont_pages indicates that the MR can be mapped using 2GB pages (page_shift == 31). Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters") Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com> 24 August 2017, 21:47:34 UTC
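The arithmetic behind a page_shift of 31 is the key point. With a 32-bit `int`, `1 << 31` overflows, which is undefined behaviour in C, while widening before the shift yields 2 GiB. `page_size_bytes()` is an illustrative helper, not the mlx5 code.

```c
#include <stdint.h>

/* Widen before shifting: (uint64_t)1 << 31 is 2 GiB, whereas 1 << 31
 * overflows a 32-bit int. */
static uint64_t page_size_bytes(unsigned int page_shift)
{
	return (uint64_t)1 << page_shift;   /* not: 1 << page_shift */
}
```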
5942d8a IB/mlx5: Fix memory leak in clean_mr error path In clean_mr error path the 'mr' should be freed. Fixes: e126ba97dba9 ('mlx5: Add driver for Mellanox Connect-IB adapters') Signed-off-by: Kamal Heib <kamalh@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com> 24 August 2017, 21:47:34 UTC
ff740ae IB/mlx5: Decouple MR allocation and population flows mlx5 compatible devices have two ways of populating the MTT table of an MKEY: using a FW command and using a UMR WQE. A UMR is much faster, so it should be used whenever possible. Unfortunately the code today uses UMR only if the MKEY was allocated from the MR cache. Fix the code to use UMR even for MKEYs that were allocated using a FW command. Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com> 24 August 2017, 21:47:34 UTC
8b7ff7f IB/mlx5: Enable UMR for MRs created with reg_create This patch is the first step in decoupling UMR usage and allocation from the MR cache. The only functional change in this patch is to enable UMR for MRs created with reg_create. This change fixes a bug where ODP memory regions that were not allocated from the MR cache did not have UMR enabled. Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com> 24 August 2017, 21:47:34 UTC
96dc3fc IB/mlx5: Expose software parsing for Raw Ethernet QP Software parsing (SWP) is a feature that can be used to instruct the device to stop using its internal parser and instead parse packets on the transmit path according to offsets set for each packet. Through this feature, the device allows checksum and LSO to be handled by the hardware according to the location of the IP and TCP/UDP headers. Enable SW parsing on the Raw Ethernet send queue by default if firmware supports it, and report these capabilities to user space. Signed-off-by: Noa Osherovich <noaos@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com> 24 August 2017, 21:47:34 UTC
accbef5 RDMA/i40iw: Remove unused argument None of the callers of i40iw_netdev_vlan_ipv6 use the mac argument, so remove it from the function's argument list. Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com> 24 August 2017, 21:34:57 UTC
847cb1a RDMA/qedr: fix spelling mistake: "invlaid" -> "invalid" Trivial fix to spelling mistake in DP_ERR error message Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Ram Amrani <Ram.Amrani@cavium.com> Signed-off-by: Doug Ledford <dledford@redhat.com> 24 August 2017, 21:34:57 UTC
61e0962 IB: Avoid ib_modify_port() failure for RoCE devices IB CM calls ib_modify_port() irrespective of link layer. If a failure is returned, the MAD agent gets unregistered for those devices. Recently, the modify_port() hook was removed from some of the low level drivers as it was always returning success. This breaks RDMA connection establishment over those devices. For Ethernet devices, Qkey violation and port capabilities are not applicable, so return success for RoCE when the modify_port hook is not implemented. Cc: Leon Romanovsky <leon@kernel.org> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com> 24 August 2017, 21:34:57 UTC
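The dispatch rule can be sketched with a hypothetical ops table. If the driver provides no hook, succeed on Ethernet (RoCE) ports and return an error otherwise. The error is assumed to be `-ENOSYS` here. The types and `dispatch_modify_port()` are illustrative, not the ib_core code.

```c
#include <errno.h>
#include <stddef.h>

enum link_layer { LL_INFINIBAND, LL_ETHERNET };

/* Hypothetical device ops; modify_port may be left NULL by a driver. */
struct dev_model {
	enum link_layer ll;
	int (*modify_port)(struct dev_model *dev, int port);
};

static int dispatch_modify_port(struct dev_model *dev, int port)
{
	if (dev->modify_port)
		return dev->modify_port(dev, port);
	/* no hook: nothing to modify on RoCE, unsupported elsewhere */
	return dev->ll == LL_ETHERNET ? 0 : -ENOSYS;
}
```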
a31a2a3 RDMA/vmw_pvrdma: Update device query parameters and port caps Added support for two device caps - max_sge_rd, max_fast_reg_page_list_len and the IP_BASED_GIDS port cap flag. Reviewed-by: Jorgen Hansen <jhansen@vmware.com> Reviewed-by: Bryan Tan <bryantan@vmware.com> Reviewed-by: Aditya Sarwade <asarwade@vmware.com> Signed-off-by: Adit Ranadive <aditr@vmware.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com> 24 August 2017, 21:34:57 UTC
05297b6 RDMA/vmw_pvrdma: Add RoCEv2 support The driver version is bumped for compatibility purposes. Also, send correct GID type during register to device. Added compatibility check macros for the device. Reviewed-by: Jorgen Hansen <jhansen@vmware.com> Reviewed-by: Aditya Sarwade <asarwade@vmware.com> Signed-off-by: Bryan Tan <bryantan@vmware.com> Signed-off-by: Adit Ranadive <aditr@vmware.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com> 24 August 2017, 21:34:57 UTC
31a8236 IB/ipoib: Enable ioctl for to IPoIB rdma netdevs Adds support for ioctl callback in the RDMA netdevs to allow supporting functions not handled by the generic interface code. Signed-off-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Eitan Rabin <rabin@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com> 24 August 2017, 21:34:57 UTC
4b7ee67 RDMA/nes: Remove zeroed parameter from port query callback There is no need to explicitly zero parameters, because the structure requested to be filled is already initialized to zeros. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com> 24 August 2017, 20:44:48 UTC
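The division of labor implied by 4b7ee67 (the caller zeroes the attribute structure once, and the driver callback fills only what it supports) can be sketched as below. The struct, its fields, and both function names are hypothetical; the sketch only assumes, as the commit message states, that the structure arrives already zeroed.

```c
#include <string.h>

/* Hypothetical port attribute struct; field names are illustrative only. */
struct sketch_port_attr {
    int state;
    int max_mtu;
    unsigned int port_cap_flags;
};

/* Driver callback: fills only the fields it supports and relies on the
 * caller having zeroed the structure, so it does no memset of its own. */
static int sketch_driver_query_port(struct sketch_port_attr *attr)
{
    attr->state = 4;      /* "active" in this sketch */
    attr->max_mtu = 4096;
    return 0;
}

/* Core wrapper: zeroes the structure exactly once before the callback. */
int sketch_query_port(struct sketch_port_attr *attr)
{
    memset(attr, 0, sizeof(*attr));
    return sketch_driver_query_port(attr);
}
```

Fields the driver leaves untouched (here `port_cap_flags`) still read as zero, which is why a second memset in the driver is redundant.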
17bf1ad RDMA/mlx4: Properly annotate link layer variable The rdma_port_get_link_layer() returns enum rdma_link_layer as a return value, hence it is better to store the return value in specially annotated variable and not in int. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com> 24 August 2017, 20:44:48 UTC
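A minimal illustration of why 17bf1ad stores the result in an enum-typed variable rather than an `int`: with the enum type, a `switch` over the value is checked by the compiler (GCC's `-Wswitch`, enabled by `-Wall`, warns about unhandled enumerators). The enum and functions below are hypothetical stand-ins modeled on the link-layer query, not the mlx4 code.

```c
/* Hypothetical link-layer enum, modeled on the idea of enum rdma_link_layer. */
enum sketch_link_layer {
    SKETCH_LL_UNSPECIFIED,
    SKETCH_LL_INFINIBAND,
    SKETCH_LL_ETHERNET,
};

/* Hypothetical query: port 1 is InfiniBand, any other port is Ethernet. */
static enum sketch_link_layer sketch_port_get_link_layer(int port)
{
    return port == 1 ? SKETCH_LL_INFINIBAND : SKETCH_LL_ETHERNET;
}

const char *sketch_describe_port(int port)
{
    /* Typed variable instead of int: the compiler knows the valid values
     * and can warn if a new enumerator is added but not handled below. */
    enum sketch_link_layer ll = sketch_port_get_link_layer(port);

    switch (ll) {
    case SKETCH_LL_INFINIBAND:
        return "InfiniBand";
    case SKETCH_LL_ETHERNET:
        return "Ethernet";
    case SKETCH_LL_UNSPECIFIED:
        break;
    }
    return "unspecified";
}
```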
84305d7 RDMA/mlx5: Limit scope of get vector affinity local function The mlx5_ib_get_vector_affinity() call is local to main.c file and there is no need to be declared globally visible. Fixes: 40b24403f33e ("mlx5: support ->get_vector_affinity") Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com> 24 August 2017, 20:44:48 UTC
fab773c IB/rxe: Make rxe_counter_name static rxe_counter_name is used in rxe_hw_counters.c only. Make it static. Fixes: 0b1e5b99a48b ('IB/rxe: Add port protocol stats') Signed-off-by: Kamal Heib <kamalh@mellanox.com> Reviewed-by: Yonatan Cohen <yonatanc@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com> 24 August 2017, 20:44:48 UTC
69956d8 IB/ipoib: Sync between remove_one to sysfs calls that use rtnl_lock In order to avoid deadlock between sysfs functions (like create/delete child) and remove_one (both of them are using the sysfs lock and rtnl_lock) the driver will use a state mutex for sync. That will fix traces as the following: schedule+0x3e/0x90 kernfs_drain+0x75/0xf0 ? wait_woken+0x90/0x90 __kernfs_remove+0x12e/0x1c0 kernfs_remove+0x25/0x40 sysfs_remove_dir+0x57/0x90 kobject_del+0x22/0x60 device_del+0x195/0x230 pm_runtime_set_memalloc_noio+0xac/0xf0 netdev_unregister_kobject+0x71/0x80 rollback_registered_many+0x205/0x2f0 rollback_registered+0x31/0x40 unregister_netdevice_queue+0x58/0xb0 unregister_netdev+0x20/0x30 ipoib_remove_one+0xb7/0x240 [ib_ipoib] ib_unregister_device+0xbc/0x1b0 [ib_core] ib_unregister_mad_agent+0x29/0x30 [ib_core] mlx4_ib_remove+0x67/0x280 [mlx4_ib] INFO: task echo:24082 blocked for more than 120 seconds. Tainted: G OE 4.1.12-37.5.1.el6uek.x86_64 #2 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Call Trace: schedule+0x3e/0x90 schedule_preempt_disabled+0xe/0x10 __mutex_lock_slowpath+0x95/0x110 ? _rcu_barrier+0x177/0x220 mutex_lock+0x23/0x40 rtnl_lock+0x15/0x20 netdev_run_todo+0x81/0x1f0 rtnl_unlock+0xe/0x10 ipoib_vlan_delete+0x12f/0x1c0 [ib_ipoib] delete_child+0x69/0x80 [ib_ipoib] dev_attr_store+0x20/0x30 sysfs_kf_write+0x41/0x50 Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Reviewed-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com> 24 August 2017, 20:31:08 UTC
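The commit message for 69956d8 does not spell out the locking scheme, so the following is only a user-space sketch of the general idea of a per-device state mutex shared by the sysfs child-management path and the removal path. `struct sketch_dev`, the `removing` flag, and all function names are hypothetical; rtnl_lock and kernfs draining are not modeled. Removal marks the device under the mutex, and later create/delete requests bail out instead of starting work that would contend with teardown.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device state; not the ipoib private data layout. */
struct sketch_dev {
    pthread_mutex_t state_lock; /* serializes sysfs-style ops against removal */
    bool removing;              /* set once removal has started */
    int children;               /* number of child interfaces */
};

void sketch_dev_init(struct sketch_dev *d)
{
    pthread_mutex_init(&d->state_lock, NULL);
    d->removing = false;
    d->children = 0;
}

void sketch_dev_destroy(struct sketch_dev *d)
{
    pthread_mutex_destroy(&d->state_lock);
}

/* sysfs-style "create child" path */
int sketch_add_child(struct sketch_dev *d)
{
    int ret = 0;

    pthread_mutex_lock(&d->state_lock);
    if (d->removing)
        ret = -ENODEV; /* device is going away: start no new work */
    else
        d->children++;
    pthread_mutex_unlock(&d->state_lock);
    return ret;
}

/* sysfs-style "delete child" path */
int sketch_delete_child(struct sketch_dev *d)
{
    int ret = 0;

    pthread_mutex_lock(&d->state_lock);
    if (d->removing)
        ret = -ENODEV;
    else if (d->children == 0)
        ret = -ENOENT;
    else
        d->children--;
    pthread_mutex_unlock(&d->state_lock);
    return ret;
}

/* remove_one-style path: flag the device under the same mutex so that
 * subsequent sysfs requests fail fast instead of racing with teardown. */
void sketch_remove_one(struct sketch_dev *d)
{
    pthread_mutex_lock(&d->state_lock);
    d->removing = true;
    d->children = 0;
    pthread_mutex_unlock(&d->state_lock);
}
```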
f9bfea9 IB/mlx4: Check that reserved fields in mlx4_ib_create_qp_rss are zero According to mlx4 convention, need to fail the command due to a non-zero value in the user data which is expected to be zero. Fixes: 3078f5f1bd8b ("IB/mlx4: Add support for RSS QP") Signed-off-by: Guy Levi <guyle@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com> 24 August 2017, 20:27:11 UTC
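The convention cited in f9bfea9 (reject a user command whose reserved fields are non-zero) can be sketched as below. The struct layout, field names, and the choice of `-EINVAL` are assumptions for illustration, not the real mlx4 ABI. The usual rationale is that reserved bits can later be given a meaning without old kernels silently ignoring them.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user command layout; not the real mlx4 ABI struct. */
struct sketch_create_qp_rss {
    uint64_t rx_hash_fields_mask;
    uint8_t  rx_hash_function;
    uint8_t  reserved[7];
    uint32_t comp_mask;
    uint32_t reserved1;
};

/* Returns 1 if all n bytes at p are zero. */
static int sketch_all_zero(const uint8_t *p, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (p[i])
            return 0;
    return 1;
}

/* Fail the command if any reserved field is non-zero (error code assumed). */
int sketch_check_rss_cmd(const struct sketch_create_qp_rss *ucmd)
{
    if (!sketch_all_zero(ucmd->reserved, sizeof(ucmd->reserved)) ||
        ucmd->reserved1 != 0)
        return -EINVAL;
    return 0;
}
```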