linux

mirror of https://github.com/FEX-Emu/linux.git synced 2024-12-22 17:33:01 +00:00

Author	SHA1	Message	Date
Jubin John	d35cf74492	IB/hfi1: Serialize hrtimer function calls hrtimer functions do not guarantee serialization, so we extend the cca_timer_lock to cover the hrtimer_forward_now() in the hrtimer callback handler and the hrtimer_start() in process_becn(). This prevents races between these 2 functions to update the hrtimer state leading to problems such as: kernel BUG at kernel/hrtimer.c:1282! encountered during validation of the CCA feature. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Jubin John <jubin.john@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-04-28 16:32:29 -04:00
Dean Luick	1cbaa67035	IB/hfi1: Fix MAD port poll for active cables A MAD directive to start polling must go through the normal link tuning and start steps in order to correctly handle active cables. Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-04-28 16:32:29 -04:00
Dean Luick	015e91fbc9	IB/hfi1: Correctly report neighbor link down reason The code to save the link down reason for reporting to the SMA was in a location before the actual reason was read. Move the SMA link down reason assignment to a better location. Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-04-28 16:32:29 -04:00
Dean Luick	feb831ddf2	IB/hfi1: Use the neighbor link down reason only when valid The 8051 uses a link down reason to inform the driver why the link went down. The neighbor planned link down reason code is only valid when a link down idle message is received by the 8051. Enhance the explanation on why the link went down. Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-04-28 16:32:29 -04:00
Dean Luick	f9b5635cbe	IB/hfi1: Ignore link downgrade with 0 lanes Versions of the 8051 firmware < 0.38 may report a link failure as a link downgrade with a width of 0 followed by a link down notification. Ignore the zero width downgrade notification - the driver should follow the link down path. Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-04-28 16:32:29 -04:00
Dean Luick	8f000f7f6e	IB/hfi1: Add RSM rule for user FECN handling Add a receive side mapping rule to extract expected user packets with the FECN bit set and place them in an eager buffer. This will allow user libraries to recognize that a FECN was sent when using header suppression and respond appropriately. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-04-28 16:32:29 -04:00
Dean Luick	b12349ae13	IB/hfi1: Create a routine to set a receive side mapping rule Move the rule setting code into its own routine for improved searchability and reuse. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-04-28 16:32:28 -04:00
Dean Luick	4a818bedf7	IB/hfi1: Move QOS decision logic into its own function The decision to use QOS affects other resource allocation. Move the QOS decision logic into its own function so it can be called by other interested parties. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-04-28 16:32:28 -04:00
Dean Luick	372cc85a13	IB/hfi1: Extract RSM map table init from QOS Refactor the allocation, tracking, and writing of the RSM map table into its own set of routines. This will allow the map table to be passed to multiple users to fill in as needed. Start with the original user, QOS. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-04-28 16:32:28 -04:00
Jianxin Xiong	44306f15f0	IB/hfi1: Reduce kernel context pio buffer allocation The pio buffers were pooled evenly among all kernel contexts and user contexts. However, the demand from kernel contexts is much lower than user contexts. This patch reduces the allocation for kernel contexts and thus makes more credits available for PSM, helping performance. This is especially useful on high core-count systems where large numbers of contexts are used. A new context type SC_VL15 is added to distinguish the context used for VL15 from other kernel contexts. The reason is that VL15 needs to support 2KB sized packets while other kernel contexts need only support packets up to the size determined by "piothreshold", which has a default value of 256. The new allocation method allows triple buffering of largest pio packets configured for these contexts. This is sufficient to maintain verbs performance. The largest pio packet size is 2048B for VL15 and "piothreshold" for other kernel contexts. A cap is applied to "piothreshold" to avoid excessive buffer allocation. The special case in which SDMA is disabled is handled differently. In that case, the original pooling allocation is used to better support the much higher pio traffic. Notice that if adaptive pio is disabled (piothreshold==0), the pio buffer size doesn't matter for non-VL15 kernel send contexts when SDMA is enabled because pio is not used at all on these contexts and thus the new allocation is still valid. If SDMA is disabled then pooling allocation is used as mentioned in the previous paragraph. Adjustment is also made to the calculation of the credit return threshold for the kernel contexts. Instead of being based purely on the MTU size, a percentage based threshold is also considered and the smaller one of the two is chosen. This is necessary to ensure that with the reduced buffer allocation credits are returned in time to avoid unnecessary stalls in the send path. 
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dean Luick <dean.luick@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Mark Debbage <mark.debbage@intel.com> Reviewed-by: Jubin John <jubin.john@intel.com> Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-04-28 16:32:28 -04:00
Jubin John	0852d241f4	IB/hfi1: Change default number of user contexts Change the default number of user contexts to the number of real (non-HT) cpu cores in order to reduce the division of hfi1 hardware contexts in the case of high core counts with hyper-threading enabled. Reviewed-by: Dean Luick <dean.luick@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Jubin John <jubin.john@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-04-28 16:32:28 -04:00
Mike Marciniszyn	b218f786ad	IB/hfi1: Use global defines for upper bits in opcode The awkward coding for setting the allowed_ops field was tripping an smatch warning. This patch uses the more appropriate defines from include/rdma to avoid the issue. As part of the patch remove a mask that was duplicated in rdmavt include files and use that mask as appropriate. Fixes: 8bea6b1cfe6f ("IB/rdmavt: Add create queue pair functionality") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-04-28 16:32:28 -04:00
Mike Marciniszyn	87717f0a75	IB/hfi1: Remove unreachable code Remove unreachable code from RC ack handling to fix an smatch error. Fixes: `633d273995` ("staging/rdma/hfi1: use mod_timer when appropriate") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-04-28 16:32:28 -04:00
Dean Luick	e4e0e39c8d	IB/hfi1: Fix double QSFP resource acquire on cache refresh The function refresh_qsfp_cache() acquires the i2c chain resource, but one caller already holds the resource. Change the acquire so all calls to refresh_qsfp_cache() are covered by the acquire and remove the acquire within refresh_qsfp_cache(). Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-04-28 16:32:28 -04:00
Dean Luick	90315ad86a	IB/hfi1: Guard against concurrent I2C access across all chains The discrete ASIC board design makes the two I2C chains not independent of each other. That is, only one chain can safely be accessed at a time. For discrete ASIC devices, adjust the resource locking so that access to one I2C chain will lock both of the chains. Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-04-28 16:32:28 -04:00
Easwar Hariharan	623bba2d92	IB/hfi1: Remove module presence check outside pre-LNI checks The pre-LNI SerDes and channel tuning algorithm already checks for module presence assertion for the relevant port types. The extraneous check removed in this patch blocks link up for port types for which the module presence assertion is not relevant. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Easwar Hariharan <easwar.hariharan@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-04-28 16:32:28 -04:00
Easwar Hariharan	145dd2b399	IB/hfi1: Always turn on CDRs for low power QSFP modules Clock and data recovery mechanisms (CDRs) in active QSFP modules can be turned on or off to improve the bit error rate observed on the channel. Signal integrity and bit error rate requirements require us to always turn on any CDRs present in low power cables (power dissipation 2.5W or lower). However, we adhere to the platform designer's settings (provided in the platform configuration) for higher power cables (dissipation 3.5W or higher) if the platform designer has determined that the platform requires the CDRs to be turned on (or off) and is capable of supplying and cooling the higher power modules. This patch also introduces the get_qsfp_power_class function to centralize the bit twiddling required to determine the QSFP power class across the code. Reusing this function improves the readability of code that depends on knowing the power class of the cable, such as the active and optical channel tuning algorithm. Reviewed-by: Dean Luick <dean.luick@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Easwar Hariharan <easwar.hariharan@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-04-28 16:32:27 -04:00
Sebastian Sanchez	e38d1e4f50	IB/hfi1: Check P_KEY for all sent packets from user mode Add the P_KEY check for user-context mechanism for both PIO and SDMA. For PIO, the SendCtxtCheckEnable.DisallowKDETHPackets is set by default. When the P_KEY is set, SendCtxtCheckEnable.DisallowKDETHPackets is cleared. For SDMA, a software check was included. This change requires user processes to set the P_KEY before sending any packets, otherwise, the sent packet will fail. The original submission didn't have this check but it's required. Reviewed-by: Dean Luick <dean.luick@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-04-28 16:32:27 -04:00
Sebastian Sanchez	ef699e849c	IB/hfi1: Adjust default MTU to be 10KB Increasing the default MTU size to 10KB improves performance for PSM. Change the default MTU to 10KB but constrain Verbs MTU to 8KB. Also update default MTU module parameter description to be HFI1_DEFAULT_MAX_MTU. Reviewed-by: Dean Luick <dean.luick@intel.com> Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Jubin John <jubin.john@intel.com> Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-04-28 16:32:27 -04:00
Dean Luick	60d585ad6e	IB/hfi1: Simplify init_qpmap_table() Make init_qpmap_table() easier to understand by simplifying the loop indexing and writing each register when it is "full", removing the need for a follow-on register write. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-04-28 16:32:27 -04:00
Dean Luick	de882ff5b8	IB/hfi1: Correctly obtain the full service class The function hdr2sc was using an unshifted mask to obtain the 5th bit of the service class. Correct the issue by using the shifted mask. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-04-28 16:32:27 -04:00
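The shifted-versus-unshifted mask mistake described in de882ff5b8 can be sketched in plain C. This is a minimal illustration, not the driver's code: the field position (bit 63), the macro names, and both function names are assumptions chosen for the example. Only the idea, that testing a flag with the unshifted mask inspects the wrong bit, comes from the commit message.

```c
#include <stdint.h>

/* Hypothetical layout for illustration: a one-bit flag stored at bit 63
 * of a 64-bit receive header word. Names are not taken from the driver. */
#define DC_INFO_SHIFT 63
#define DC_INFO_MASK  0x1ull
#define DC_INFO_SMASK (DC_INFO_MASK << DC_INFO_SHIFT)

/* Buggy variant: the unshifted mask tests bit 0, not bit 63, so the
 * fifth service-class bit reflects an unrelated bit of the word. */
static unsigned int sc_unshifted(uint16_t lrh0, uint64_t rhf)
{
	return ((lrh0 >> 12) & 0xf) | ((unsigned int)!!(rhf & DC_INFO_MASK) << 4);
}

/* Corrected variant: the shifted mask tests the bit where the flag lives. */
static unsigned int sc_shifted(uint16_t lrh0, uint64_t rhf)
{
	return ((lrh0 >> 12) & 0xf) | ((unsigned int)!!(rhf & DC_INFO_SMASK) << 4);
}
```

With the flag set at bit 63, only the shifted variant produces a service class with its fifth bit set.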
Dean Luick	33a9eb5271	IB/hfi1: Fix QOS rule mappings The QOS RSM rule mappings are off by one, referencing a kernel receive context that does not exist. Correctly start the QOS RSM map entries at FIRST_KERNEL_CONTEXT rather than MIN_KERNEL_KCTXTS. Remove the cruft that hid this. Change the QP map table so all traffic not caught by QOS RSM goes to the control context rather than the first QOS context. Correct comments to match the actual code operation and intent. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-04-28 16:32:27 -04:00
Dean Luick	35969d9b94	IB/hfi1: Remove invalid QOS check Remove an invalid compare of the number of QOS RSM map table entries against the number of physical receive contexts. The RSM map table has its own size and has no relation to the number of physical receive contexts. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-04-28 16:32:27 -04:00
Dean Luick	153d58cd8e	IB/hfi1: Fix QOS num_vl bit width The bit width for num_vls, n, needs to be calculated based on the pow2 rounded up of the number of vls. Otherwise num_vls of 3, 5, 6, and 7 will have misplaced QOS RSM map entries. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-04-28 16:32:27 -04:00
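The bit-width rule in 153d58cd8e (use the power-of-two round-up of the VL count) can be shown with a small stand-alone sketch. The helper names below are hypothetical and the code is user-space C rather than the driver's implementation; it only contrasts a truncating log2 with the rounded-up one.

```c
#include <assert.h>

/* Hypothetical helper: bits needed to index num_vls entries, i.e. log2 of
 * num_vls rounded up to the next power of two. */
static unsigned int vl_index_bits(unsigned int num_vls)
{
	unsigned int bits = 0;

	assert(num_vls > 0);
	/* Find the smallest bits such that (1 << bits) >= num_vls. */
	while ((1u << bits) < num_vls)
		bits++;
	return bits;
}

/* Truncating log2, shown only for contrast: it is too small whenever
 * num_vls is not a power of two (3, 5, 6, 7). */
static unsigned int floor_log2(unsigned int v)
{
	unsigned int bits = 0;

	assert(v > 0);
	while (v >>= 1)
		bits++;
	return bits;
}
```

For num_vls values of 3, 5, 6, and 7 the two helpers disagree, which matches the set of values the commit message calls out.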
Dean Luick	f9c82a0b75	IB/hfi1: Fix i2c resource reservation checks The i2c and qsfp read/write routines should check for the resource reservation of the incoming argument target rather than the implicit target of the hardware HFI. Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Jubin John <jubin.john@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-04-28 16:32:27 -04:00
Dean Luick	4ee1585972	IB/hfi1: Fix sysfs file offset usage Two sysfs files do not pay attention to the file offset when reading data. Fix that. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Jubin John <jubin.john@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-04-28 16:32:27 -04:00
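What "paying attention to the file offset" means for a read handler in 4ee1585972 can be sketched as follows. This is an assumed user-space stand-in with a hypothetical function name, not the sysfs callbacks themselves: it copies from a fixed source buffer starting at the caller's offset and clamps the length to what remains.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy at most count bytes of src[0..avail) into dst,
 * starting at offset. Returns the number of bytes copied, 0 at or past
 * the end of the data. */
static size_t read_at_offset(char *dst, size_t count, size_t offset,
			     const char *src, size_t avail)
{
	if (offset >= avail)
		return 0;
	if (count > avail - offset)
		count = avail - offset;
	memcpy(dst, src + offset, count);
	return count;
}
```

A handler that ignores the offset would return the start of the data on every call, so a reader that loops until it sees 0 bytes would never terminate or would see repeated content.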
Jubin John	ea0e4ce3bc	IB/rdmavt,hfi1,qib: Fix memory leak rdi->ports has memory allocated in rvt_alloc_device(), but does not get freed because the hfi1 and qib drivers call ib_dealloc_device() directly instead of going through rdmavt. Add a rvt_dealloc_device() that frees rdi->ports and then calls ib_dealloc_device(). Switch hfi1 and qib drivers to calling rvt_dealloc_device() instead of ib_dealloc_device() directly. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Brian Welty <brian.welty@intel.com> Signed-off-by: Jubin John <jubin.john@intel.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-04-28 16:32:27 -04:00
Mitko Haralanov	e88c9271d9	IB/hfi1: Fix buffer cache races which may cause corruption There are two possible causes for node/memory corruption both of which are related to the cache eviction algorithm. One way to cause corruption is due to the asynchronous nature of the MMU invalidation and the locking used when invalidating a node. The MMU invalidation routine would temporarily release the RB tree lock to avoid a deadlock. However, this would allow the eviction function to take the lock resulting in the removal of cache nodes. If the node being removed by the eviction code is the same as the node being invalidated, the result is use after free. The same is true in the other direction due to the temporary release of the eviction list lock in the eviction loop. Another corner case exists when dealing with the SDMA buffer cache that could cause memory corruption of kernel memory. The most common way in which this corruption exhibits itself is a linked list node corruption. In that case, the kernel will complain that a node with poisoned pointers is being removed. The fact that the pointers are already poisoned means that the node has already been removed from the list. The root cause of this corruption was a mishandling of the eviction list maintained by the driver. In order for this to happen four conditions need to be satisfied: 1. A node describing a user buffer already exists in the interval RB tree, 2. The beginning of the current user buffer matches that node but is bigger. This will cause the node to be extended. 3. The amount of cached buffers is close to or at the limit of the buffer cache size. 4. The node has dropped close to the end of the eviction list. This will cause the node to be considered for eviction. If all of the above conditions have been satisfied, it is possible for the eviction algorithm to evict the current node, which will free the node without the driver knowing. 
To solve both issues described above: - the locking around the MMU invalidation loop and cache eviction loop has been improved so locks are not released in the loop body, - a new RB function is introduced which will "atomically" find and remove the matching node from the RB tree, preventing the MMU invalidation loop from touching it, and - the node being extended by the pin_vector_pages() function is removed from the eviction list prior to calling the eviction function. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-04-28 16:32:26 -04:00
Mitko Haralanov	f53af85e47	IB/hfi1: Extract and reinsert MMU RB node on lookup The page pinning function, which also maintains the pin cache, behaves one of two ways when an exact buffer match is not found: 1. If no node is found (a buffer with the same starting address is not found in the cache), a new node is created, the buffer pages are pinned, and the node is inserted into the RB tree, or 2. If a node is found but the buffer in that node is a subset of the new user buffer, the node is extended with the new buffer pages. Both modes of operation require (re-)insertion into the interval RB tree. When the node being inserted is a new node, the operations are pretty simple. However, when the node is already existing and is being extended, special care must be taken. First, we want to guard against an asynchronous attempt to delete the node by the MMU invalidation notifier. The simplest way to do this is to remove the node from the RB tree, preventing the search algorithm from finding it. Second, the node needs to be re-inserted so it lands in the proper place in the tree and the tree is correctly re-balanced. This also requires the node to be removed from the RB tree. This commit adds the hfi1_mmu_rb_extract() function, which will search for a node in the interval RB tree matching an address and length and remove it from the RB tree if found. This allows for both of the above special cases to be handled in a single step. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-04-28 16:32:26 -04:00
Mitko Haralanov	de79093b28	IB/hfi1: Correctly compute node interval The computation of the interval of an interval RB node was incorrect leading to data corruption due to the RB search algorithm not properly finding all the RB nodes in an MMU invalidation interval. The problem stemmed from the fact that the beginning address of the node's range was being aligned to a page boundary. For certain buffer sizes, this would lead to an end address calculation that was off by 1 page. An important aspect of keeping the RB tree sane is also updating the node's range in the case it's being extended. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-04-28 16:32:26 -04:00
Mitko Haralanov	782f6697d2	IB/hfi1: Protect the interval RB tree when cleaning up The current implementation of the clean up function for the interval RB trees has two flaws which may cause problems in cases of concurrent execution of the function and the MMU notifier. The flaws were due to the fact that deregistration of the MMU callbacks was done after the tree was emptied and, furthermore, the tree was not being locked. This commit fixes both of these flaws by, first, switching the order of operations, and, second, locking the tree while traversing it to prevent any other operations. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-04-28 16:32:26 -04:00
Mitko Haralanov	0ad2d3d05b	IB/hfi1: Fix memory leak in user ExpRcv and SDMA The driver had two memory leaks - one in the user expected receive code and one in SDMA buffer cache. The leak in the expected receive code only showed up when the user/admin had set ulimit sufficiently low and the driver did not have enough room in the cache before hitting the limit of allowed cachable memory. When this condition occurred, the driver returned early signaling userland that it needed to free some buffers to free up room in the cache. The bug was that the driver was not cleaning up allocated memory prior to returning early. The leak in the SDMA buffer cache could occur (even though it never did), when the insertion of a buffer node in the interval RB tree failed. In this case, the driver failed to unpin the pages of the node instead erroneously returning success. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-04-28 16:32:26 -04:00
Mitko Haralanov	4787bc5e17	IB/hfi1: Don't remove list entries if they are not in a list The SDMA cache logic maintains an eviction list which is ordered by most recently used user buffers. Upon errors or buffer freeing, the list nodes were unconditionally being deleted. This would lead to list corruption warnings if the nodes were never inserted in the eviction list to begin with. This commit prevents this by checking that the nodes are already part of the eviction list. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-04-28 16:32:26 -04:00
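The guard described in 4787bc5e17 (only delete a node that is actually on the eviction list) can be sketched with a minimal circular doubly linked list. The commit message does not say how membership is tested, so the self-linked convention used here, along with every type and function name, is an assumption made for illustration.

```c
#include <stdbool.h>

/* Minimal circular doubly linked list; a node that points at itself is
 * treated as "not on any list" (an assumed convention for this sketch). */
struct list_node {
	struct list_node *prev, *next;
};

static void node_init(struct list_node *n)
{
	n->prev = n;
	n->next = n;
}

static bool node_linked(const struct list_node *n)
{
	return n->next != n;
}

static void list_add_head(struct list_node *head, struct list_node *n)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

/* Unlink only if the node is on a list; returns whether it was removed.
 * The node is re-initialised so a second call is a harmless no-op. */
static bool node_unlink_if_linked(struct list_node *n)
{
	if (!node_linked(n))
		return false;
	n->prev->next = n->next;
	n->next->prev = n->prev;
	node_init(n);
	return true;
}
```

An error path can then call node_unlink_if_linked() unconditionally, whether or not the node ever reached the eviction list.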
Mike Marciniszyn	747f4d7a9d	IB/qib, IB/hfi1: Fix up UD loopback use of irq flags The dual lock patch moved locking around and missed an issue with handling irq flags when processing UD loopback packets. This issue was revealed by smatch. Fix for both qib and hfi1 to pass the saved flags to the UD request builder and handle the changes correctly. Fixes: `46a80d62e6` ("IB/qib, staging/rdma/hfi1: add s_hlock for use in post send") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-04-28 16:32:26 -04:00
Doug Ledford	e29bff46b9	Merge branch 'k.o/for-4.6-rc' into testing/4.6	2016-04-28 15:16:32 -04:00
Jason Gunthorpe	e6bd18f57a	IB/security: Restrict use of the write() interface The drivers/infiniband stack uses write() as a replacement for bi-directional ioctl(). This is not safe. There are ways to trigger write calls that result in the return structure that is normally written to user space being shunted off to user specified kernel memory instead. For the immediate repair, detect and deny suspicious accesses to the write API. For long term, update the user space libraries and the kernel API to something that doesn't present the same security vulnerabilities (likely a structured ioctl() interface). The impacted uAPI interfaces are generally only available if hardware from drivers/infiniband is installed in the system. Reported-by: Jann Horn <jann@thejh.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> [ Expanded check to all known write() entry points ] Cc: stable@vger.kernel.org Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-04-28 12:03:16 -04:00
Dean Luick	7723d8c244	IB/hfi1: Use kernel default llseek for ui device The ui device llseek had a mistake with SEEK_END and did not fully follow seek semantics. Correct all this by using a kernel supplied function for fixed size devices. Cc: Al Viro <viro@ZenIV.linux.org.uk> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-04-28 12:00:39 -04:00
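The seek semantics for a fixed-size device mentioned in 7723d8c244 can be approximated in user space as below. This is a sketch of the expected behaviour only, with a hypothetical function name and -1 standing in for an error return; it is not the kernel helper the commit switched to.

```c
#include <stdio.h>	/* SEEK_SET, SEEK_CUR, SEEK_END */

/* Hypothetical helper: compute the new position for a device of fixed
 * size. SEEK_END is relative to size, and any result outside [0, size]
 * is rejected with -1. */
static long long seek_fixed(long long cur, long long offset, int whence,
			    long long size)
{
	long long pos;

	switch (whence) {
	case SEEK_SET:
		pos = offset;
		break;
	case SEEK_CUR:
		pos = cur + offset;
		break;
	case SEEK_END:
		pos = size + offset;
		break;
	default:
		return -1;
	}
	if (pos < 0 || pos > size)
		return -1;
	return pos;
}
```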
Mitko Haralanov	94158442eb	IB/hfi1: Don't attempt to free resources if initialization failed Attempting to free resources which have not been allocated and initialized properly led to the following kernel backtrace: BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffffa09658fe>] unlock_exp_tids.isra.8+0x2e/0x120 [hfi1] PGD 852a43067 PUD 85d4a6067 PMD 0 Oops: 0000 [#1] SMP CPU: 0 PID: 2831 Comm: osu_bw Tainted: G IO 3.12.18-wfr+ #1 task: ffff88085b15b540 ti: ffff8808588fe000 task.ti: ffff8808588fe000 RIP: 0010:[<ffffffffa09658fe>] [<ffffffffa09658fe>] unlock_exp_tids.isra.8+0x2e/0x120 [hfi1] RSP: 0018:ffff8808588ffde0 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffff880858a31800 RCX: 0000000000000000 RDX: ffff88085d971bc0 RSI: ffff880858a318f8 RDI: ffff880858a318c0 RBP: ffff8808588ffe20 R08: 0000000000000000 R09: 0000000000000000 R10: ffff88087ffd6f40 R11: 0000000001100348 R12: ffff880852900000 R13: ffff880858a318c0 R14: 0000000000000000 R15: ffff88085d971be8 FS: 00007f4674e83740(0000) GS:ffff88087f400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000085c377000 CR4: 00000000001407f0 Stack: ffffffffa0941a71 ffff880858a318f8 ffff88085d971bc0 ffff880858a31800 ffff880852900000 ffff880858a31800 00000000003ffff7 ffff88085d971bc0 ffff8808588ffe60 ffffffffa09663fc ffff8808588ffe60 ffff880858a31800 Call Trace: [<ffffffffa0941a71>] ? find_mmu_handler+0x51/0x70 [hfi1] [<ffffffffa09663fc>] hfi1_user_exp_rcv_free+0x6c/0x120 [hfi1] [<ffffffffa0932809>] hfi1_file_close+0x1a9/0x340 [hfi1] [<ffffffff8116c189>] __fput+0xe9/0x270 [<ffffffff8116c35e>] ____fput+0xe/0x10 [<ffffffff81065707>] task_work_run+0xa7/0xe0 [<ffffffff81002969>] do_notify_resume+0x59/0x80 [<ffffffff814ffc1a>] int_signal+0x12/0x17 This commit re-arranges the context initialization code in a way that would allow for context event flags to be used to determine whether the context has been successfully initialized. 
In turn, this can be used to skip the resource de-allocation if they were never allocated in the first place. Fixes: `3abb33ac65` ("staging/hfi1: Add TID cache receive init and free funcs") Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-04-28 12:00:39 -04:00
Mike Marciniszyn	b9b06cb6fe	IB/hfi1: Fix missing lock/unlock in verbs drain callback The iowait_sdma_drained() callback lacked locking to protect the qp s_flags field. This causes the s_flags to be out of sync on multiple CPUs, potentially corrupting the s_flags. Fixes: `a545f5308b` ("staging/rdma/hfi: fix CQ completion order issue") Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-04-28 12:00:39 -04:00
Mitko Haralanov	849e3e9398	IB/hfi1: Prevent unpinning of wrong pages The routine used by the SDMA cache to handle already cached nodes can extend an already existing node. In its error handling code, the routine will unpin pages when not all pages of the buffer extension were pinned. There was a bug in that part of the routine, which would mistakenly unpin pages from the original set rather than the newly pinned pages. This commit fixes that bug by offsetting the page array to the proper place pointing at the beginning of the newly pinned pages. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-04-28 12:00:38 -04:00
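The fix in 849e3e9398 amounts to offsetting the page array before unpinning on the error path. The sketch below uses a stand-in page type and hypothetical helper names rather than the kernel's page pinning API; it only shows that the rollback must start at the first newly pinned page, not at index 0.

```c
#include <stddef.h>

/* Stand-in for a pinned page; the flag replaces real reference counts. */
struct fake_page {
	int pinned;
};

/* Release n pages starting at pages[0]. */
static void unpin_range(struct fake_page **pages, size_t n)
{
	for (size_t i = 0; i < n; i++)
		pages[i]->pinned = 0;
}

/* Roll back a failed extension: the original pages [0, old_n) stay pinned,
 * only the newly pinned pages [old_n, old_n + newly_pinned) are released.
 * Passing 'pages' instead of 'pages + old_n' would release the original
 * set, which is the mistake the commit message describes. */
static void rollback_extension(struct fake_page **pages, size_t old_n,
			       size_t newly_pinned)
{
	unpin_range(pages + old_n, newly_pinned);
}
```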
Mitko Haralanov	de82bdff62	IB/hfi1: Fix deadlock caused by locking with wrong scope The locking around the interval RB tree is designed to prevent access to the tree while it's being modified. The locking in its current form is too overzealous, which is causing a deadlock in certain cases with the following backtrace: Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0 CPU: 0 PID: 5836 Comm: IMB-MPI1 Tainted: G O 3.12.18-wfr+ #1 0000000000000000 ffff88087f206c50 ffffffff814f1caa ffffffff817b53f0 ffff88087f206cc8 ffffffff814ecd56 0000000000000010 ffff88087f206cd8 ffff88087f206c78 0000000000000000 0000000000000000 0000000000001662 Call Trace: <NMI> [<ffffffff814f1caa>] dump_stack+0x45/0x56 [<ffffffff814ecd56>] panic+0xc2/0x1cb [<ffffffff810d4370>] ? restart_watchdog_hrtimer+0x50/0x50 [<ffffffff810d4432>] watchdog_overflow_callback+0xc2/0xd0 [<ffffffff81109b4e>] __perf_event_overflow+0x8e/0x2b0 [<ffffffff8110a714>] perf_event_overflow+0x14/0x20 [<ffffffff8101c906>] intel_pmu_handle_irq+0x1b6/0x390 [<ffffffff814f927b>] perf_event_nmi_handler+0x2b/0x50 [<ffffffff814f8ad8>] nmi_handle.isra.3+0x88/0x180 [<ffffffff814f8d39>] do_nmi+0x169/0x310 [<ffffffff814f8177>] end_repeat_nmi+0x1e/0x2e [<ffffffff81272600>] ? unmap_single+0x30/0x30 [<ffffffff814f780d>] ? _raw_spin_lock_irqsave+0x2d/0x40 [<ffffffff814f780d>] ? _raw_spin_lock_irqsave+0x2d/0x40 [<ffffffff814f780d>] ? _raw_spin_lock_irqsave+0x2d/0x40 <<EOE>> <IRQ> [<ffffffffa056c4a8>] hfi1_mmu_rb_search+0x38/0x70 [hfi1] [<ffffffffa05919cb>] user_sdma_free_request+0xcb/0x120 [hfi1] [<ffffffffa0593393>] user_sdma_txreq_cb+0x263/0x350 [hfi1] [<ffffffffa057fad7>] ? sdma_txclean+0x27/0x1c0 [hfi1] [<ffffffffa0593130>] ? user_sdma_send_pkts+0x1710/0x1710 [hfi1] [<ffffffffa057fdd6>] sdma_make_progress+0x166/0x480 [hfi1] [<ffffffff810762c9>] ? 
ttwu_do_wakeup+0x19/0xd0 [<ffffffffa0581c7e>] sdma_engine_interrupt+0x8e/0x100 [hfi1] [<ffffffffa0546bdd>] sdma_interrupt+0x5d/0xa0 [hfi1] [<ffffffff81097e57>] handle_irq_event_percpu+0x47/0x1d0 [<ffffffff81098017>] handle_irq_event+0x37/0x60 [<ffffffff8109aa5f>] handle_edge_irq+0x6f/0x120 [<ffffffff810044af>] handle_irq+0xbf/0x150 [<ffffffff8104c9b7>] ? irq_enter+0x17/0x80 [<ffffffff8150168d>] do_IRQ+0x4d/0xc0 [<ffffffff814f7c6a>] common_interrupt+0x6a/0x6a <EOI> [<ffffffff81073524>] ? finish_task_switch+0x54/0xe0 [<ffffffff814f56c6>] __schedule+0x3b6/0x7e0 [<ffffffff810763a6>] __cond_resched+0x26/0x30 [<ffffffff814f5eda>] _cond_resched+0x3a/0x50 [<ffffffff814f4f82>] down_write+0x12/0x30 [<ffffffffa0591619>] hfi1_release_user_pages+0x69/0x90 [hfi1] [<ffffffffa059173a>] sdma_rb_remove+0x9a/0xc0 [hfi1] [<ffffffffa056c00d>] __mmu_rb_remove.isra.5+0x5d/0x70 [hfi1] [<ffffffffa056c536>] hfi1_mmu_rb_remove+0x56/0x70 [hfi1] [<ffffffffa059427b>] hfi1_user_sdma_process_request+0x74b/0x1160 [hfi1] [<ffffffffa055c763>] hfi1_aio_write+0xc3/0x100 [hfi1] [<ffffffff8116a14c>] do_sync_readv_writev+0x4c/0x80 [<ffffffff8116b58b>] do_readv_writev+0xbb/0x230 [<ffffffff811a9da1>] ? fsnotify+0x241/0x320 [<ffffffff81073524>] ? finish_task_switch+0x54/0xe0 [<ffffffff8116b795>] vfs_writev+0x35/0x60 [<ffffffff8116b8c9>] SyS_writev+0x49/0xc0 [<ffffffff810cd876>] ? __audit_syscall_exit+0x1f6/0x2a0 [<ffffffff814ff992>] system_call_fastpath+0x16/0x1b As evident from the backtrace above, the process was being put to sleep while holding the lock. Limiting the scope of the lock only to the RB tree operation fixes the above error allowing for proper locking and the process being put to sleep when needed. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-04-28 12:00:38 -04:00
Mitko Haralanov	f19bd643db	IB/hfi1: Prevent NULL pointer deferences in caching code There is a potential kernel crash when the MMU notifier calls the invalidation routines in the hfi1 pinned page caching code for sdma. The invalidation routine could call the remove callback for the node, which in turn ends up dereferencing the current task_struct to get a pointer to the mm_struct. However, the mm_struct pointer could be NULL resulting in the following backtrace: BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8 IP: [<ffffffffa041f75a>] sdma_rb_remove+0xaa/0x100 [hfi1] 15 task: ffff88085e66e080 ti: ffff88085c244000 task.ti: ffff88085c244000 RIP: 0010:[<ffffffffa041f75a>] [<ffffffffa041f75a>] sdma_rb_remove+0xaa/0x100 [hfi1] RSP: 0000:ffff88085c245878 EFLAGS: 00010002 RAX: 0000000000000000 RBX: ffff88105b9bbd40 RCX: ffffea003931a830 RDX: 0000000000000004 RSI: ffff88105754a9c0 RDI: ffff88105754a9c0 RBP: ffff88085c245890 R08: ffff88105b9bbd70 R09: 00000000fffffffb R10: ffff88105b9bbd58 R11: 0000000000000013 R12: ffff88105754a9c0 R13: 0000000000000001 R14: 0000000000000001 R15: ffff88105b9bbd40 FS: 0000000000000000(0000) GS:ffff88107ef40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000000000a8 CR3: 0000000001a0b000 CR4: 00000000001407e0 Stack: ffff88105b9bbd40 ffff88080ec481a8 ffff88080ec481b8 ffff88085c2458c0 ffffffffa03fa00e ffff88080ec48190 ffff88080ed9cd00 0000000001024000 0000000000000000 ffff88085c245920 ffffffffa03fa0e7 0000000000000282 Call Trace: [<ffffffffa03fa00e>] __mmu_rb_remove.isra.5+0x5e/0x70 [hfi1] [<ffffffffa03fa0e7>] mmu_notifier_mem_invalidate+0xc7/0xf0 [hfi1] [<ffffffffa03fa143>] mmu_notifier_page+0x13/0x20 [hfi1] [<ffffffff81156dd0>] __mmu_notifier_invalidate_page+0x50/0x70 [<ffffffff81140bbb>] try_to_unmap_one+0x20b/0x470 [<ffffffff81141ee7>] try_to_unmap_anon+0xa7/0x120 [<ffffffff81141fad>] try_to_unmap+0x4d/0x60 [<ffffffff8111fd7b>] shrink_page_list+0x2eb/0x9d0 
[<ffffffff81120ab3>] shrink_inactive_list+0x243/0x490 [<ffffffff81121491>] shrink_lruvec+0x4c1/0x640 [<ffffffff81121641>] shrink_zone+0x31/0x100 [<ffffffff81121b0f>] kswapd_shrink_zone.constprop.62+0xef/0x1c0 [<ffffffff811229e3>] kswapd+0x403/0x7e0 [<ffffffff811225e0>] ? shrink_all_memory+0xf0/0xf0 [<ffffffff81068ac0>] kthread+0xc0/0xd0 [<ffffffff81068a00>] ? insert_kthread_work+0x40/0x40 [<ffffffff814ff8ec>] ret_from_fork+0x7c/0xb0 [<ffffffff81068a00>] ? insert_kthread_work+0x40/0x40 To correct this, the mm_struct passed to us by the MMU notifier is used (which is what should have been done to begin with). This avoids the broken dereferences and ensures that the correct mm_struct is used. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-04-28 12:00:38 -04:00
Markus Böhme	6d79b6c761	staging/rdma/hfi1: select CRC32 The function parse_platform_config in firmware.c calls crc32_le. Building without CRC32 selected causes a link error: drivers/built-in.o: In function `parse_platform_config': (.text+0x92ffa): undefined reference to `crc32_le' Signed-off-by: Markus Böhme <markus.boehme@mailbox.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-04-04 13:47:15 -07:00
Linus Torvalds	b8ba452683	Round two of 4.6 merge window patches - A few minor core fixups needed for the next patch series - The IB SRIOV series. This has bounced around for several versions. Of note is the fact that the first patch in this series affects the net core. It was directed to netdev and DaveM for each iteration of the series (three versions total). Dave did not object, but did not respond either. I've taken this as permission to move forward with the series. - The new Intel X722 iWARP driver - A huge set of updates to the Intel hfi1 driver. Of particular interest here is that we have left the driver in staging since it still has an API that people object to. Intel is working on a fix, but getting these patches in now helps keep me sane as the upstream and Intel's trees were over 300 patches apart. Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull more rdma updates from Doug Ledford: "Round two of 4.6 merge window patches. This is a monster pull request. 
I held off on the hfi1 driver updates (the hfi1 driver is intimately tied to the qib driver and the new rdmavt software library that was created to help both of them) in my first pull request. The hfi1/qib/rdmavt update is probably 90% of this pull request. The hfi1 driver is being left in staging so that it can be fixed up with regard to the API that Al and you didn't like. Intel has agreed to do the work, but in the meantime, this clears out 300+ patches in the backlog queue and brings my tree and their tree closer to sync. This also includes about 10 patches to the core and a few to mlx5 to create an infrastructure for configuring SRIOV ports on IB devices. That series includes one patch to the net core that we sent to netdev@ and Dave Miller with each of the three revisions to the series. We didn't get any response to the patch, so we took that as implicit approval. Finally, this series includes Intel's new iWARP driver for their x722 cards. It's not nearly the beast that the hfi1 driver is. It also has a linux-next merge issue, but that has been resolved and it now passes just fine. Summary: - A few minor core fixups needed for the next patch series - The IB SRIOV series. This has bounced around for several versions. Of note is the fact that the first patch in this series affects the net core. It was directed to netdev and DaveM for each iteration of the series (three versions total). Dave did not object, but did not respond either. I've taken this as permission to move forward with the series. - The new Intel X722 iWARP driver - A huge set of updates to the Intel hfi1 driver. Of particular interest here is that we have left the driver in staging since it still has an API that people object to. 
Intel is working on a fix, but getting these patches in now helps keep me sane as the upstream and Intel's trees were over 300 patches apart" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (362 commits) IB/ipoib: Allow mcast packets from other VFs IB/mlx5: Implement callbacks for manipulating VFs net/mlx5_core: Implement modify HCA vport command net/mlx5_core: Add VF param when querying vport counter IB/ipoib: Add ndo operations for configuring VFs IB/core: Add interfaces to control VF attributes IB/core: Support accessing SA in virtualized environment IB/core: Add subnet prefix to port info IB/mlx5: Fix decision on using MAD_IFC net/core: Add support for configuring VF GUIDs IB/{core, ulp} Support above 32 possible device capability flags IB/core: Replace setting the zero values in ib_uverbs_ex_query_device net/mlx5_core: Introduce offload arithmetic hardware capabilities net/mlx5_core: Refactor device capability function net/mlx5_core: Fix caching ATOMIC endian mode capability ib_srpt: fix a WARN_ON() message i40iw: Replace the obsolete crypto hash interface with shash IB/hfi1: Add SDMA cache eviction algorithm IB/hfi1: Switch to using the pin query function IB/hfi1: Specify mm when releasing pages ...	2016-03-22 15:48:44 -07:00
Mitko Haralanov	5511d78107	IB/hfi1: Add SDMA cache eviction algorithm This commit adds a cache eviction algorithm for the SDMA user buffer cache. Besides the interval RB tree used for node lookup, the cache nodes are also arranged in a doubly-linked list. When a node is used, it is put at the beginning of the list. Less frequently used nodes naturally move to the tail of the list. When the cache limit is reached, the eviction code starts traversing the linked list in reverse, freeing buffers until enough space has been freed to fit the new user buffer. This guarantees that only the least used cache nodes will be removed from the cache. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Jubin John <jubin.john@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-03-21 15:55:25 -04:00
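The eviction scheme in 5511d78107 (move a node to the head of a doubly-linked list on use, evict from the tail when the limit is reached) amounts to a least-recently-used policy. The following is a minimal user-space sketch of that idea only. The struct and function names are hypothetical, sizes are counted in pages as an assumption, and the interval RB tree used for lookup is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cache node: only the fields needed for the list walk. */
struct cache_node {
    size_t npages;
    struct cache_node *prev, *next;
};

/* Hypothetical cache: head is most recently used, tail is least. */
struct cache {
    struct cache_node *head, *tail;
    size_t used, limit;
};

static void unlink_node(struct cache *c, struct cache_node *n)
{
    if (n->prev) n->prev->next = n->next; else c->head = n->next;
    if (n->next) n->next->prev = n->prev; else c->tail = n->prev;
    n->prev = n->next = NULL;
}

static void push_head(struct cache *c, struct cache_node *n)
{
    n->prev = NULL;
    n->next = c->head;
    if (c->head) c->head->prev = n; else c->tail = n;
    c->head = n;
}

/* Insert a new node at the head and account for its pages. */
static void cache_add(struct cache *c, struct cache_node *n)
{
    push_head(c, n);
    c->used += n->npages;
}

/* A node already in the list was used again: move it to the head. */
static void cache_touch(struct cache *c, struct cache_node *n)
{
    unlink_node(c, n);
    push_head(c, n);
}

/*
 * Walk the list from the tail, dropping nodes until `needed` more pages
 * fit under the limit. Returns the number of pages freed.
 */
static size_t cache_evict(struct cache *c, size_t needed)
{
    size_t freed = 0;

    assert(c != NULL);
    while (c->tail && c->used + needed > c->limit) {
        struct cache_node *victim = c->tail;

        unlink_node(c, victim);
        c->used -= victim->npages;
        freed += victim->npages;
    }
    return freed;
}
```

Because every use moves a node to the head, the tail always holds the node that has gone longest without being touched, so the reverse walk removes the coldest buffers first.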
Mitko Haralanov	a7922f7ddf	IB/hfi1: Switch to using the pin query function Use the new function to query whether the expected receive user buffer can be pinned successfully. This requires that a new variable be added to the hfi1_filedata structure used to hold the number of pages pinned by the expected receive code. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Jubin John <jubin.john@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-03-21 15:55:25 -04:00
Mitko Haralanov	bd3a8947de	IB/hfi1: Specify mm when releasing pages This change adds a pointer to the process mm_struct when calling hfi1_release_user_pages(). Previously, the function used the mm_struct of the current process to adjust the number of pinned pages. However, in some cases, namely when unpinning pages due to an MMU notifier call, we do not want to drop into that code block as it will cause a deadlock (the MMU notifiers take the process' mmap_sem prior to calling the callbacks). By allowing the caller to specify the pointer to the mm_struct, the caller has finer control over that part of hfi1_release_user_pages(). Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Jubin John <jubin.john@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-03-21 15:55:25 -04:00
Mitko Haralanov	2c97ce4f3c	IB/hfi1: Add pin query function System administrators can use the locked memory ulimit setting to set the maximum amount of memory a user can lock/pin. However, this setting alone is not enough to guarantee good operation of the hfi1 driver due to the fact that the setting does not have fine enough granularity to account for the limit being used by multiple user processes and caches. Therefore, a better limiting algorithm is needed. This is where the new hfi1_can_pin_pages() function and the cache_size module parameter come in. The function works by looking at the ulimit and cache_size value to compute a cache size. The algorithm examines the ulimit value and, if it is not "unlimited", computes a per-cache limit based on the number of configured user contexts. After that, the lower of the two - cache_size and computed per-cache limit - is used. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Jubin John <jubin.john@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-03-21 15:55:24 -04:00
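The limit computation described in 2c97ce4f3c can be sketched as below. This is an illustration under stated assumptions, not the body of hfi1_can_pin_pages(): the function names, the `ULIMIT_UNLIMITED` sentinel, and the even split of the ulimit across user contexts are all hypothetical, since the commit message does not give the exact formula.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sentinel standing in for an "unlimited" locked-memory ulimit. */
#define ULIMIT_UNLIMITED UINT64_MAX

/*
 * Pick the effective per-cache limit: the lower of the cache_size
 * setting and a per-cache share of the ulimit. The even split across
 * contexts is an assumption made for this sketch.
 */
static uint64_t effective_cache_limit(uint64_t ulimit_bytes,
                                      uint64_t cache_size_bytes,
                                      unsigned int num_user_contexts)
{
    uint64_t per_cache;

    assert(num_user_contexts > 0);
    if (ulimit_bytes == ULIMIT_UNLIMITED)
        return cache_size_bytes;

    per_cache = ulimit_bytes / num_user_contexts;
    return per_cache < cache_size_bytes ? per_cache : cache_size_bytes;
}

/* True if `request` more bytes fit under `limit` given what is pinned. */
static bool can_pin(uint64_t already_pinned, uint64_t request, uint64_t limit)
{
    return request <= limit && already_pinned <= limit - request;
}
```

The comparison in `can_pin()` is written as a subtraction on the limit side so that a large request cannot overflow the sum.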
Mitko Haralanov	5cd3a88d7f	IB/hfi1: Implement SDMA-side buffer caching Add support for caching of user buffers used for SDMA transfers. This change improves performance by avoiding repeatedly pinning the pages of buffers, which are being re-used by the application. While the cost of the pinning operation has been made heavier by adding the extra code to search the cache tree, re-allocate pages arrays, and future cache evictions, that cost will be amortized against the savings when the same buffer is re-used. It is also worth noting that in most cases, the cost of pinning should be much lower due to the buffer already being in the cache. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Jubin John <jubin.john@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-03-21 15:55:24 -04:00
Mitko Haralanov	a489876010	IB/hfi1: Adjust last address values for intervals Last address values for intervals in the interval RB tree nodes should be non-inclusive in order to avoid confusing ranges. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Jubin John <jubin.john@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-03-21 15:55:23 -04:00
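A short sketch of what a non-inclusive last address means for range checks, as in a489876010. The `struct interval` type and both helpers are hypothetical and only illustrate the half-open convention [start, last). They are not taken from the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical half-open interval: covers start <= x < last. */
struct interval {
    uint64_t start;
    uint64_t last; /* exclusive: first address past the buffer */
};

static struct interval make_interval(uint64_t addr, uint64_t len)
{
    struct interval iv;

    assert(addr + len >= addr); /* reject wrap-around */
    iv.start = addr;
    iv.last = addr + len;
    return iv;
}

/* With exclusive ends, two back-to-back buffers do not overlap. */
static bool intervals_overlap(struct interval a, struct interval b)
{
    return a.start < b.last && b.start < a.last;
}
```

With this convention a buffer at 0x1000 of length 0x1000 ends at 0x2000, and a second buffer starting at 0x2000 is adjacent rather than overlapping, which is the kind of ambiguity the commit says it wants to avoid.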

464 Commits