linux

mirror of https://github.com/FEX-Emu/linux.git synced 2025-01-11 11:56:48 +00:00

Author	SHA1	Message	Date
Doug Ledford	7a226f9c32	staging/rdma: Remove the entire rdma subdirectory of staging This is no longer in use. Remove it. Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-26 12:59:34 -04:00
Christoph Lameter	b40f4757da	IB/core: Make device counter infrastructure dynamic In practice, each RDMA device has a unique set of counters that the hardware implements. Having a central set of counters that they must all adhere to is limiting and causes many useful counters to not be available. Therefore we create a dynamic counter registration infrastructure. The driver must implement a stats structure allocation routine, in which the driver must place the directory name it wants, a list of names for all of the counters, an array of u64 counters themselves, plus a few generic configuration options. We then implement a core routine to create a sysfs file for each of the named stats elements, and a core routine to retrieve the stats when any of the sysfs attribute files are read. To avoid excessive beating on the stats generation routine in the drivers, the core code also caches the stats for a short period of time so that someone attempting to read all of the stats in a given device's directory will not result in a stats generation call per file read. Future work will attempt to standardize just the shared stats elements, and possibly add a method to get the stats via netlink in addition to sysfs. Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com> [ Add caching, make structure names more informative, add i40iw support, other significant rewrites from the original patch ]	2016-05-26 12:52:51 -04:00
Doug Ledford	8779e7658d	Merge branch 'hfi1-2' into k.o/for-4.7	2016-05-26 12:50:05 -04:00
Jubin John	f158486527	IB/hfi1: Fix pio map initialization The pio map initialization function is off by 1 causing the last kernel send context that is allocated to not get mapped into the pio map which leads to the last kernel send context not being used by any of the qps. The send context reserved for VL15 is taken care of by setting the scontext variable that is used as the index into the kernel send context array to 1 and does not need to be accounted for in the kernel send context counting loop as it is currently done. Fix the kernel send context counting loop to account for all the allocated send contexts and map all of them to the different VLs. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Jianxin Xiong <jianxin.xiong@intel.com> Signed-off-by: Jubin John <jubin.john@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-26 12:21:10 -04:00
Dean Luick	f6de3d39cf	IB/hfi1: Correct 8051 link parameter settings Two 8051 link settings, external device config and tuning method, were written in the wrong location and the previous settings were not cleared. For both, clear the old value and write the new value. Fixes: 8ebd4cf1852a ("staging/rdma/hfi1: Add active and optical cable support") Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-26 12:21:10 -04:00
Sebastian Sanchez	ce8b2fd095	IB/hfi1: Update pkey table properly after link down or FM start When FM is disabled, and the HFI port on the switch is changed from MgmtAllowed=YES to MgmtAllowed=NO and the link is bounced, FULL_MGMT_P_KEY doesn't get cleared from the pkey table. This also occurs when the QSFP cable is moved from a switch port with MgmtAllowed=YES to a MgmtAllowed=NO port. Clear pkey entry properly. Also, when the driver is loaded and the switch port is set to MgmtAllowed=NO, FULL_MGMT_P_KEY shouldn't be added to pkey table after FM is started. Only set FULL_MGMT_P_KEY in the pkey table if switch port is configured to MgmtAllowed=YES. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-26 12:21:10 -04:00
Mike Marciniszyn	8b103e9cde	IB/rdamvt: Fix rdmavt s_ack_queue sizing rdmavt allows the driver to specify the size of the ack queue, but only uses it for the modify QP limit testing for setting the atomic limit value. The driver dependent size is now used to size the s_ack_queue ring dynamicially. Since the driver knows its size, the driver will use its define for any ring size dependent code. Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-26 12:21:10 -04:00
Mike Marciniszyn	4c0b653335	IB/rdmavt: Max atomic value should be a u8 This matches the ib_qp_attr size and avoids a extremely large value when the lower level driver registers. As part of the patch, the u8 ordinals are moved to the end of the struct to reduce pahole noted excesses. Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-26 12:21:10 -04:00
Mike Marciniszyn	7049de65c9	IB/hfi1: Fix hard lockup due to not using save/restore spin lock Commit b9b06cb6feda ("IB/hfi1: Fix missing lock/unlock in verbs drain callback") added a spin lock. Unfortunately, the new lock code can be called from a base level interrupt state, and an interrupt that can get stacked will attempt to get the same lock. Fix by using the flag save/restore spin lock variation. Cc: stable@vger.kernel.org # 4.6+ Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-26 12:21:10 -04:00
Jianxin Xiong	bdd8a98ce4	IB/hfi1: Add tracing support for send with invalidate opcode Enable trace generation for packets with the "Send Last with Invalidate" and "Send Only with Invalidate" opcodes. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-26 12:21:10 -04:00
Jianxin Xiong	23f7d0d29e	IB/hfi1, qib: Add ieth to the packet header definitions A new union member "ieth" (Invalidate Extended Transport Header) is added to the packet header definition in preparation of supporting the send with invalidate opcode. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-26 12:21:10 -04:00
Doug Ledford	e6f61130ed	Merge branches 'misc-4.7-2', 'ipoib' and 'ib-router' into k.o/for-4.7	2016-05-26 11:55:19 -04:00
Dennis Dalessandro	f48ad614c1	IB/hfi1: Move driver out of staging The TODO list for the hfi1 driver was completed during 4.6. In addition other objections raised (which are far beyond what was in the TODO list) have been addressed as well. It is now time to remove the driver from staging and into the drivers/infiniband sub-tree. Reviewed-by: Jubin John <jubin.john@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-26 11:35:14 -04:00
Dennis Dalessandro	e11ffbd575	IB/hfi1: Do not free hfi1 cdev parent structure early The deletion of a cdev is not a fence for holding off references to the structure. The driver attempts to delete the cdev and then proceeds to free the parent structure, the hfi1_devdata, or dd. This can potentially lead to a kernel panic in situations where a user has an FD for the cdev open, and the pci device gets removed. If the user then closes the FD there will be a NULL dereference when trying to do put on the cdev's kobject. Fix this by pointing the cdev's kobject.parent at a new kobject embedded in its parent structure. Also take a reference when the device is opened and put it back when it is closed. Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-26 11:35:13 -04:00
Dennis Dalessandro	8a1882ebd4	IB/hfi1: Add trace message in user IOCTL handling Add a trace message to HFI1s user IOCTL handling. This allows debugging of which IOCTLs are being handled by the driver. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-26 11:35:13 -04:00
Dennis Dalessandro	380fb94288	IB/hfi1: Remove write(), use ioctl() for user cmds Remove the write() handler for user space commands now that ioctl handling is available. User apps will need to change to use ioctl from this point forward. Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-26 11:35:13 -04:00
Dennis Dalessandro	8d970cf991	IB/hfi1: Add ioctl() interface for user commands IOCTL is more suited to what user space commands need to do than the write() interface. Add IOCTL definitions for all existing write commands and the handling for those. The write() interface will be removed in a follow on patch. Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-26 11:35:06 -04:00
Dennis Dalessandro	ac56f162d4	IB/hfi1: Remove unused user command The HFI1_CMD_SDMA_STATUS_UPD command was never implemented it has no reason to live in the driver. Remove it. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-26 11:23:18 -04:00
Dennis Dalessandro	0f7b1f917c	IB/hfi1: Remove snoop/diag interface The snoop/diag interface is better served by an implementation which is more general and usable by other drivers perhaps. Go ahead and remove the code now and get rid of the char dev. We can put the feature back when we have a more agreeable solution. Reviewed-by: Dean Luick <dean.luick@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-26 11:23:18 -04:00
Dennis Dalessandro	d079031742	IB/hfi1: Remove EPROM functionality from data device Remove EPROM handling from the cdev which is used for user application data traffic. Reviewed-by: Dean Luick <dean.luick@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-26 11:23:18 -04:00
Dennis Dalessandro	7312f29d8e	IB/hfi1: Remove UI char device Remove UI char device which exposes direct access to registers for user space. This was put in to aid in debugging the hardware. We are looking into alternatives means of providing the same functionality. This removes another char device from HFI1's footprint. Reviewed-by: Dean Luick <dean.luick@intel.com> Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-26 11:23:18 -04:00
Dennis Dalessandro	0eb626590d	IB/hfi1: Remove multiple device cdev hfi1 current exports a cdev that can be used to target all of the hfi's in the system. However there is a problem with this approach in that the devices could be on different subnets. This is a problem that user space can figure out and explicitly tell the driver on which device to create a context. Remove the multi-purpose cdev leaving a dedicated cdev for each port. Also remove the striping capability that is dependent upon the user choosing the multi-purpose cdev. It is now up to user space to determine how to stripe contexts. Reviewed-by: Dean Luick <dean.luick@intel.com> Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-26 11:23:17 -04:00
Dennis Dalessandro	f3225c3f11	IB/hfi1: Remove anti-pattern in cdev init Remove the usage of an anti-pattern goto in hfi1_cdev_init to improve code readability. Suggested-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-26 11:23:11 -04:00
Jianxin Xiong	b583faf4dc	IB/hfi1: Fix bug that blocks process on exit after port bounce During the processing of a user SDMA request, if there was an error before the request counter was increased, the state of the packet queue could be updated incorrectly, causing the counter to underflow. As the result, the process could get stuck later since the counter could never get back to 0. This patch adds a condition to guard the packet queue update so that the counter is only decreased if it has been increased before the error happens. Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-26 11:23:11 -04:00
Jubin John	f70f5f6af3	IB/qib: Remove unused qib_7322_intr_msgs[] Building the qib driver with gcc version 6.1.0 raises the following build warning: drivers/infiniband/hw/qib/qib_iba7322.c:1311:39: warning: 'qib_7322_intr_msgs' defined but not used [-Wunused-const-variable=] static const struct qib_hwerror_msgs qib_7322_intr_msgs[] = { ^~~~~~~~~~~~~~~~~~ Remove the unused qib_7322_intr_msgs[] Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Jubin John <jubin.john@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-26 11:23:11 -04:00
Ira Weiny	46aa5baf96	IB/hfi1: Remove unnecessary comment This comment was old, the MTU enums have been defined. Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-26 11:23:10 -04:00
Jubin John	eac7193632	IB/hfi1: Fix sdma_event_names[] build warning sdma_event_names[] is only used within CONFIG_SDMA_VERBOSITY ifdefs, so when CONFIG_SDMA_VERBOSITY is disabled, it results in the following 0-day build warning: >> drivers/infiniband/hw/hfi1/sdma.c:137:27: warning: 'sdma_event_names' >> defined but not used [-Wunused-const-variable=] static const char * const sdma_event_names[] = { ^~~~~~~~~~~~~~~~ This occurs on the following compiler: compiler: gcc-6 (Debian 6.1.1-1) 6.1.1 20160430 For more information check: https://lists.01.org/pipermail/kbuild-all/2016-May/020060.html Fix this warning by defining sdma_event_name[] only within the CONFIG_SDMA_VERBOSITY ifdefs. Reported-by: kbuild test robot <fengguang.wu@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jubin John <jubin.john@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-26 11:23:10 -04:00
Jubin John	49961f8fe8	IB/rdmavt: Use kzalloc_node Use kzalloc_node instead of kzalloc for rdmavt memory region segment allocation to optimize for performance on NUMA platforms. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jubin John <jubin.john@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-26 11:23:10 -04:00
Mike Marciniszyn	654b643670	IB/rdmavt: Insure QP vmalloc variants zero memory The usage of the various vmalloc APIs do not consistently zero memory when allocating the swqe. Insure zeroing variants are used. Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-26 11:23:10 -04:00
Mitko Haralanov	9565c6a37a	IB/hfi1: Fix an interval RB node reference count leak Commit e88c9271d9f8 ("IB/hfi1: Fix buffer cache corner case which may cause corruption") introduced a bug which may cause a reference count of a interval RB node to be leaked in the case where an SDMA transfer from that node completes at the same time as the node is being extended. If a node is being extended, it is first removed from the RB tree in order to be processed without the risk of an invalidation event removing the node at the same time. If a SDMA completion happens during that time, the completion handler will fail to find the node in the RB tree and, therefore, fail to correctly decrement its refcount. This leaves the node in the tree and its pages pinned for the duration of the user process. To prevent this from happening the io vector adds a reference to the RB node, which is used during the SDMA completion instead of looking up the node in the RB tree. This change adds a performance improvement as a side effect by avoiding the RB tree lookup. Fixes: e88c9271d9f8 ("IB/hfi1: Fix buffer cache corner case which may cause corruption") Reviewed-by: Dean Luick <dean.luick@intel.com> Reviewed-by: Harish Chegondi <harish.chegondi@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-26 11:23:10 -04:00
Mark Bloch	492a7e67ff	IB/IPoIB: Allow setting the device address In IB networks, and specifically in IPoIB/rdmacm traffic, the device address of an IPoIB interface is used as a means to exchange information between nodes needed for communication. Currently an IPoIB interface will always be created with a device address based on its node GUID without a way to change that. This change adds the ability to set the device address of an IPoIB interface by value. We use the set mac address ndo to do that. The flow should be broken down to two: 1) The GID value is already in the GID table, in this case the interface will be able to set carrier up. 2) The GID value is not yet in the GID table, in this case the interface won't try to join the multicast group and will wait (listen on GID_CHANGE event) until the GID is inserted. In order to track those changes, we add a new flag: * IPOIB_FLAG_DEV_ADDR_SET. When set, it means the dev_addr is a based on a value in the gid table. this bit will be cleared upon a dev_addr change triggered by the user and set after validation. Per IB spec the port GUID can't change if the module is loaded. port GUID is the basis for GID at index 0 which is the basis for the default device address of a ipoib interface. The issue is that there are devices that don't follow the spec, they change the port GUID while HCA is powered on, so in order not to break userspace applications. We need to check if the user wanted to control the device address and we assume that if he sets the device address back to be based on GID index 0, he no longer wishs to control it. In order to track this, we add an additional flag: * IPOIB_FLAG_DEV_ADDR_CTRL When setting the device address, there is no validation of the upper twelve bytes of the device address (flags, qpn, subnet prefix) as those bytes are not under the control of the user. Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-25 15:39:03 -04:00
Erez Shitrit	3b56113016	IB/ipoib: Support SendOnlyFullMember MCG for SendOnly join Check (via an SA query) if the SM supports the new option for SendOnly multicast joins. If the SM supports that option it will use the new join state to create such multicast group. If SendOnlyFullMember is supported, we wouldn't use faked FullMember state join for SendOnly MCG, use the correct state if supported. This check is performed at every invocation of mcast_restart task, to be sure that the driver stays in sync with the current state of the SM. Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-25 15:39:03 -04:00
Erez Shitrit	cd6e9b7ef9	IB/core: Support new type of join-state for multicast There are four types for MCG, FullMember, NonMember, SendOnlyNonMember, and the new added type: SendOnlyFullMember. Add support for the new SendOnlyFullMember join state. The new type allows host to send join request as sendonly, it will cause the group to be created but without getting packets from this multicast back to the host. Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Christoph Lameter <cl@linux.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-25 15:39:03 -04:00
Erez Shitrit	628e6f7515	IB/SA Agent: Add support for SA agent get ClassPortInfo New SA query function to return the ClassPortInfo struct from the SA. If the SM supports FullMemberSendOnly mode for MCG's, it sets a capability bit in the capability_mask2 field of the response. Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-25 15:39:02 -04:00
Erez Shitrit	507f6afa3a	IB/core: Introduce capabilitymask2 field in ClassPortInfo mad Change struct ib_class_port_info to conform to IB Spec 1.3 That in order to get specific capability mask from ClassPortInfo mad. >From the IB Spec, ClassPortInfo section: "CapabilityMask2 Bits 0-26: Additional class-specific capabilities... RespTimeValue the rest 5 bits" The new struct now has one field for capabilitymask2 (previously was the reserved field) and the resp_time field. And it fixes up qib and srpt, use of the field repurposed to be used as capabilitymask2: IB/qib: Change pma_get_classportinfo IB/srpt: Adjust the use of ib_class_port_info Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Hal Rosenstock <hal@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-25 15:39:02 -04:00
Mark Bloch	ae43f82867	IB/core: Add IP to GID netlink offload There is an assumption that rdmacm is used only between nodes in the same IB subnet, this why ARP resolution can be used to turn IP to GID in rdmacm. When dealing with IB communication between subnets this assumption is no longer valid. ARP resolution will get us the next hop device address and not the peer node's device address. To solve this issue, we will check user space if it can provide the GID of the peer node, and fail if not. We add a sequence number to identify each request and fill in the GID upon answer from userspace. Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-24 14:44:04 -04:00
Mark Bloch	735c631ae9	IB/core: Register SA ibnl client during ib_core initialization Move SA ibnl client registration to ib_core module init. This will allow us to register a single client to handle all RDMA_NL_LS operations and make it SA independent. Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-24 14:43:43 -04:00
Mark Bloch	c34d376187	IB/netlink: Add a new local service operation This commits adds a new RDMA local service operation: - IP to GID resolution. The client request would include the ifindex of the outgoing interface and would place in an attribute (LS_NLA_TYPE_IPV4 or LS_NLA_TYPE_IPV6) the destnation IP. The local service would answer with a message that has the attribute: - LS_NLA_TYPE_DGID - The destination GID. Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-24 14:42:48 -04:00
Mark Bloch	c2e49c9232	IB/SA: Integrate ib_sa module into ib_core module Consolidate ib_sa into ib_core, this commit eliminates ib_sa.ko and makes it part of ib_core.ko Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-24 14:42:36 -04:00
Mark Bloch	4c2cb42204	IB/MAD: Integrate ib_mad module into ib_core module Consolidate ib_mad into ib_core, this commit eliminates ib_mad.ko and makes it part of ib_core.ko Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-24 14:40:13 -04:00
Leon Romanovsky	e3f20f0286	IB/core: Integrate IB address resolution module into core IB address resolution is declared as a module (ib_addr.ko) which loads itself before IB core module (ib_core.ko). It causes to the scenario where IB netlink which is initialized by IB core can't be used by ib_addr.ko. In order to solve it, we are converting ib_addr.ko to be part of IB core module. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-24 14:40:13 -04:00
Honggang Li	0de4cbb3dd	RDMA/cxgb3: device driver frees DMA memory with different size [ 598.852037] ------------[ cut here ]------------ [ 598.856698] WARNING: at lib/dma-debug.c:887 check_unmap+0xf8/0x920() [ 598.863079] cxgb3 0000:01:00.0: DMA-API: device driver frees DMA memory with different size [device address=0x0000000003310000] [map size=17 bytes] [unmap size=16 bytes] [ 598.878265] Modules linked in: xprtrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_sa ib_mad kvm_amd kvm ipmi_devintf ipmi_ssif dcdbas pcspkr ipmi_si sg ipmi_msghandler acpi_power_meter amd64_edac_mod shpchp edac_core sp5100_tco k10temp edac_mce_amd i2c_piix4 acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic crct10dif_common ata_generic iw_cxgb3 pata_acpi ib_core ib_addr mgag200 syscopyarea sysfillrect sysimgblt i2c_algo_bit drm_kms_helper ttm pata_atiixp drm ahci libahci serio_raw i2c_core cxgb3 libata bnx2 mdio dm_mirror dm_region_hash dm_log dm_mod [ 598.946822] CPU: 3 PID: 11820 Comm: cmtime Not tainted 3.10.0-327.el7.x86_64.debug #1 [ 598.954681] Hardware name: Dell Inc. PowerEdge R415/0GXH08, BIOS 2.0.2 10/22/2012 [ 598.962193] ffff8808077479a8 000000000381a432 ffff880807747960 ffffffff81700918 [ 598.969663] ffff880807747998 ffffffff8108b6c0 ffff880807747a80 ffff8808063f55c0 [ 598.977132] ffffffff833ca850 0000000000000282 ffff88080b1bb800 ffff880807747a00 [ 598.984602] Call Trace: [ 598.987062] [<ffffffff81700918>] dump_stack+0x19/0x1b [ 598.992224] [<ffffffff8108b6c0>] warn_slowpath_common+0x70/0xb0 [ 598.998254] [<ffffffff8108b75c>] warn_slowpath_fmt+0x5c/0x80 [ 599.004033] [<ffffffff813903b8>] check_unmap+0xf8/0x920 [ 599.009369] [<ffffffff81025959>] ? sched_clock+0x9/0x10 [ 599.014702] [<ffffffff81390cee>] debug_dma_free_coherent+0x7e/0xa0 [ 599.021008] [<ffffffffa01ece2c>] cxio_destroy_cq+0xcc/0x160 [iw_cxgb3] [ 599.027654] [<ffffffffa01e8da0>] iwch_destroy_cq+0xf0/0x140 [iw_cxgb3] [ 599.034307] [<ffffffffa01c4bfe>] ib_destroy_cq+0x1e/0x30 [ib_core] [ 599.040601] [<ffffffffa04ff2d2>] ib_uverbs_close+0x302/0x4d0 [ib_uverbs] [ 599.047417] [<ffffffff812335a2>] __fput+0x102/0x310 [ 599.052401] [<ffffffff8123388e>] ____fput+0xe/0x10 [ 599.057297] [<ffffffff810bbde4>] task_work_run+0xb4/0xe0 [ 599.062719] [<ffffffff81092a84>] do_exit+0x304/0xc60 [ 599.067789] [<ffffffff81025905>] ? native_sched_clock+0x35/0x80 [ 599.073820] [<ffffffff81025959>] ? sched_clock+0x9/0x10 [ 599.079153] [<ffffffff8170a49c>] ? _raw_spin_unlock_irq+0x2c/0x50 [ 599.085358] [<ffffffff8109346c>] do_group_exit+0x4c/0xc0 [ 599.090779] [<ffffffff810a8661>] get_signal_to_deliver+0x2e1/0x960 [ 599.097071] [<ffffffff8101c497>] do_signal+0x57/0x6e0 [ 599.102229] [<ffffffff81714bd1>] ? sysret_signal+0x5/0x4e [ 599.107738] [<ffffffff8101cb7f>] do_notify_resume+0x5f/0xb0 [ 599.113418] [<ffffffff81714e7d>] int_signal+0x12/0x17 [ 599.118576] ---[ end trace 1e4653102e7e7019 ]--- [ 599.123211] Mapped at: [ 599.125577] [<ffffffff8138ed8b>] debug_dma_alloc_coherent+0x2b/0x80 [ 599.131968] [<ffffffffa01ec862>] cxio_create_cq+0xf2/0x1f0 [iw_cxgb3] [ 599.139920] [<ffffffffa01e9c05>] iwch_create_cq+0x105/0x4e0 [iw_cxgb3] [ 599.147895] [<ffffffffa0500584>] create_cq.constprop.14+0x184/0x2e0 [ib_uverbs] [ 599.156649] [<ffffffffa05027fb>] ib_uverbs_create_cq+0x10b/0x140 [ib_uverbs] Fixes: b955150ea784 ('RDMA/cxgb3: When a user QP is marked in error, also mark the CQs in error') Signed-off-by: Honggang Li <honli@redhat.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-24 11:19:46 -04:00
Matan Barak	c16d2750a0	IB/mlx5: Fire the CQ completion handler from tasklet Previously, mlx5_ib_cq_comp was executed from interrupt context. Under heavy load, this could cause the CPU core to be in an interrupt context too long. Instead of executing the handler from the interrupt context we execute it from a much friendly tasklet context. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-18 10:45:49 -04:00
Matan Barak	94c6825e0f	net/mlx5_core: Use tasklet for user-space CQ completion events Previously, we've fired all our completion callbacks straight from our ISR. Some of those callbacks were lightweight (for example, mlx5 Ethernet napi callbacks), but some of them did more work (for example, the user-space RDMA stack uverbs' completion handler). Besides that, doing more than the minimal work in ISR is generally considered wrong, it could even lead to a hard lockup of the system. Since when a lot of completion events are generated by the hardware, the loop over those events could be so long, that we'll get into a hard lockup by the system watchdog. In order to avoid that, add a new way of invoking completion events callbacks. In the interrupt itself, we add the CQs which receive completion event to a per-EQ list and schedule a tasklet. In the tasklet context we loop over all the CQs in the list and invoke the user callback. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-18 10:45:49 -04:00
Christoph Lameter	e3b6d8cf8d	IB/core: Do not require CAP_NET_ADMIN for packet sniffing In the Ethernet/TCP world, CAP_NET_RAW is sufficient to allow a program to listen to all incoming packets on a specific interface, and the higher CAP_NET_ADMIN is required to set the interface into promiscuous mode. We want to emulate that same basic division of privilege in the RDMA stack, so when dealing with Raw Ethernet QPs, allow apps with CAP_NET_RAW to listen to all incoming flows (and direct them as they see fit in their own listen stream). Do not require CAP_NET_ADMIN just to listen to traffic already incoming. Reserve CAP_NET_ADMIN if we attempt to set promiscuous mode. Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-18 10:31:58 -04:00
shamir rabinovitch	04ef0f1a01	IB/mlx4: Fix unaligned access in send_reply_to_slave The problem is that the function 'send_reply_to_slave' gets the 'req_sa_mad' as a pointer whose address is only aliged to 4 bytes but is 8 bytes in size. This can result in unaligned access faults on certain architectures. Sowmini Varadhan pointed to this reply from Dave Miller that say that memcpy should not be used to solve alignment issues: https://lkml.org/lkml/2015/10/21/352 Optimization of memcpy to 'ldx' instruction can only happen if the compiler knows that the size of the data we are copying is 8 bytes and it assumes it is aligned to 8 bytes. If the compiler know the type is not aligned to 8 it must not optimize the 8 byte copy. Defining the data type as aligned to 4 forces the compiler to treat all accesses as though they aren't aligned and avoids the 'ldx' optimization. Full credit for the idea goes to Jason Gunthorpe <jgunthorpe@obsidianresearch.com>. Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-18 10:25:56 -04:00
Doug Ledford	0651ec932a	Merge branches 'cxgb4-2', 'i40iw-2', 'ipoib', 'misc-4.7' and 'mlx5-fcs' into k.o/for-4.7	2016-05-13 19:40:38 -04:00
Majd Dibbiny	cff5a0f3a3	IB/mlx5: Report Scatter FCS device capability when supported Report Scatter FCS support when the Firmware supports as well. Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 19:40:28 -04:00
Majd Dibbiny	358e42ea66	IB/mlx5: Add Scatter FCS support for Raw Packet QP Enable Scatter FCS in the RQ context when the user passes Scatter FCS create flag. Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 19:40:28 -04:00
Majd Dibbiny	b531b90948	IB/core: Add Scatter FCS create flag Raw Packet QPs that were created with Scatter FCS flag, will scatter the FCS into the receive buffers. Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 19:40:28 -04:00

1 2 3 4 5 ...

589800 Commits