iSER currently has a couple places that set max_sectors in either the host
template or SCSI host, and all of them get it wrong.
This patch instead uses a single assignment that (hopefully) gets it right:
the max_sectors value must be derived from the number of segments in the
FR or FMR structure, but actually be one lower than the page size multiplied
by the number of sectors, as it has to handle the case of non-aligned I/O.
Without this I get trivial to reproduce hangs when running xfstests
(on XFS) over iSER to Linux targets.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Alternatively one could free the skb, OTOH I don't think this test is
useful so just remove it.
Cc: <linux-rdma@vger.kernel.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Fix for removing a quad hash entry when the
corresponding quad hash entry hasn't been added,
which is the case in loopback connections
Signed-off-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com>
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Fix for checking if the QP associated with a completion
has been destroyed while processing CQ elements.
If that is the case, move the CQ head to the next element
and continue completion processing.
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
A check is added to validate the requested sge number.
iWARP doesn't support multiple sg elements for
RDMA READ work requests.
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Fix to calculate the SQ size based on the max
frag_count, requested by the application instead
of overwriting it with the max supported frag_count
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
STag index mask is calculated incorrectly, missing
the 14 bits minimum requirement. Add max macro to use
either # of MRs or 14 bits in the mask size calculation.
Signed-off-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com>
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Invalidation after every WQE write is changed to invalidate
only if required. NOPs are padded so that WQE writes are
aligned to 64B boundary.
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Adding sq and rq drain functions, which block until all
previously posted wr-s in the specified queue have completed.
A completion object is signaled to unblock the thread,
when the last cqe for the corresponding queue is processed.
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Correct SD calculation by using base address returned from commit FPM.
This alleviates any assumptions on resource ordering and alignment
requirement. Also consolidate SD estimation code into i40iw_est_sd().
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Fix endian warnings and errors due to u32 stored to u16.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Implement fast register mr, Local invalidate, send with
invalidate and RDMA read with invalidate.
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Initialize max enabled vfs to max rdma vfs instead of 0.
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Move return code check to immediately after i40iw_hmc_sd_one call
where it is set instead of outside the then statement.
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Queue users of virtual channel on a waitqueue until the channel is
clear instead of failing the call when the channel is occupied.
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Add a check for cq_poll_info.error before setting vendor_err
instead of always setting it.
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
QP may be freed during Async Event processing.
Add a lock around QP table to prevent it.
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
iwqp->allocated_buffer is a self-referencing pointer to iwqp.
Do not set iwqp->allocated_buffer to NULL after freeing it.
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Make sure cm_node is setup before sending SYN packet and
ORD/IRD negotiation.
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Include inline data size as part of SQ size calculation.
RQ size calculation uses only number of SGEs and does not
support 96 byte WQE size.
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Change region_length to u64 as a region can be > 4GB.
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
hrtimer functions do not guarantee serialization, so we extend the
cca_timer_lock to cover the hrtimer_forward_now() in the hrtimer
callback handler and the hrtimer_start() in process_becn(). This
prevents races between these 2 functions to update the hrtimer state
leading to problems such as:
kernel BUG at kernel/hrtimer.c:1282!
encountered during validation of the CCA feature.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
A MAD directive to start polling must go through the normal
link tuning and start steps in order to correctly handle
active cables.
Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The code to save the link down reason for reporting to the SMA
was in a location before the actual reason was read. Move the
SMA link down reason assignment to a better location.
Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The 8051 uses a link down reason to inform the driver why the
link went down. The neighbor planned link down reason code is
only valid when a link down idle message is received by the 8051.
Enhance the explanation on why the link went down.
Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Versions of the 8051 firmware < 0.38 may report a link failure
as a link downgrade with a width of 0 followed by a link down
notification. Ignore the zero width downgrade notification -
the driver should follow the link down path.
Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Add a receive side mapping rule to extract expected user packets with
the FECN bit set and place them in an eager buffer. This will allow
user libraries to recognize that a FECN was sent when using header
suppression and respond appropriately.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Move the rule setting code into its own routine for improved
searchability and reuse.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The decision to use QOS affects other resource allocation.
Move the QOS decision logic into its own function so it can
be called by other interested parties.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Refactor the allocation, tracking, and writing of the RSM map table
into its own set of routines. This will allow the map table to be
passed to multiple users to fill in as needed. Start with the original
user, QOS.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The pio buffers were pooled evenly among all kernel contexts and
user contexts. However, the demand from kernel contexts is much
lower than user contexts. This patch reduces the allocation for
kernel contexts and thus makes more credits available for PSM,
helping performance. This is especially useful on high core-count
systems where large numbers of contexts are used.
A new context type SC_VL15 is added to distinguish the context used
for VL15 from other kernel contexts. The reason is that VL15 needs
to support 2KB sized packet while other kernel contexts need only
support packets up to the size determined by "piothreshold", which
has a default value of 256.
The new allocation method allows triple buffering of largest pio
packets configured for these contexts. This is sufficient to maintain
verbs performance. The largest pio packet size is 2048B for VL15
and "piothreshold" for other kernel contexts. A cap is applied to
"piothreshold" to avoid excessive buffer allocation.
The special case that SDMA is disable is handled differently. In
that case, the original pooling allocation is used to better
support the much higher pio traffic.
Notice that if adaptive pio is disabled (piothreshold==0), the pio
buffer size doesn't matter for non-VL15 kernel send contexts when
SDMA is enabled because pio is not used at all on these contexts
and thus the new allocation is still valid. If SDMA is disabled then
pooling allocation is used as mentioned in previous paragraph.
Adjustment is also made to the calculation of the credit return
threshold for the kernel contexts. Instead of purely based on
the MTU size, a percentage based threshold is also considered and
the smaller one of the two is chosen. This is necessary to ensure
that with the reduced buffer allocation credits are returned in
time to avoid unnecessary stall in the send path.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Mark Debbage <mark.debbage@intel.com>
Reviewed-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Change the default number of user contexts to the number of real
(non-HT) cpu cores in order to reduce the division of hfi1 hardware
contexts in the case of high core counts with hyper-threading enabled.
Reviewed-by: Dean Luick <dean.luick@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The awkward coding for setting the allowed_ops field
was tripping an smatch warning.
This patch uses the more appropriate defines from include/rdma
to avoid the issue.
As part of the patch remove a mask that was duplicated
in rdmavt include files and use that mask as appropriate.
Fixes: 8bea6b1cfe6f ("IB/rdmavt: Add create queue pair functionality")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Remove unreachable code from RC ack handling to fix an
smatch error.
Fixes: 633d27399514 ("staging/rdma/hfi1: use mod_timer when appropriate")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The function refresh_qsfp_cache() acquires the i2c chain resource,
but one caller already holds the resource. Change the acquire so
all calls to refresh_qsfp_cache() are covered by the acquire and
remove the acquire within refresh_qsfp_cache().
Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The discrete ASIC board design makes the two I2C chains not
independent of each other. That is, only one chain can safely
be accessed at a time. For discrete ASIC devices, adjust the
resource locking so that access to one I2C chain will lock both
of the chains.
Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The pre-LNI SerDes and channel tuning algorithm already checks for
module presence assertion for the relevant port types. The extraneous
check removed in this patch blocks link up for port types for which
the module presence assertion is not relevant.
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Clock and data recovery mechanisms (CDRs) in active QSFP modules
can be turned on or off to improve the bit error rate observed on
the channel. Signal integrity and bit error rate requirements require
us to always turn on any CDRs present in low power cables (power
dissipation 2.5W or lower). However, we adhere to the platform
designer's settings (provided in the platform configuration) for
higher power cables (dissipation 3.5W or higher) if the platform
designer has determined that the platform requires the CDRs to be
turned on (or off) and is capable of supplying and cooling the higher
power modules.
This patch also introduces the get_qsfp_power_class function to
centralize the bit twiddling required to determine the QSFP power class
across the code. Reusing this function improves the readability of code
that depends on knowing the power class of the cable, such as the
active and optical channel tuning algorithm.
Reviewed-by: Dean Luick <dean.luick@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Add the P_KEY check for user-context mechanism for
both PIO and SDMA. For PIO, the
SendCtxtCheckEnable.DisallowKDETHPackets is set by
default. When the P_KEY is set,
SendCtxtCheckEnable.DisallowKDETHPackets is cleared.
For SDMA, a software check was included. This change
requires user processes to set the P_KEY before sending
any packets, otherwise, the sent packet will fail. The
original submission didn't have this check but it's
required.
Reviewed-by: Dean Luick <dean.luick@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Mikto Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Increasing the default MTU size to 10KB improves performance
for PSM. Change the default MTU to 10KB but constrain
Verbs MTU to 8KB. Also update default MTU module parameter
description to be HFI1_DEFAULT_MAX_MTU.
Reviewed-by: Dean Luick <dean.luick@intel.com>
Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Make init_qpmap_table() easier to understand by simplifying
the loop indexing and writing each register when it is "full",
removing the need for a follow-on register write.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The function hdr2sc was using an unshifted mask to obtain
the 5th bit of the service class. Correct the issue by using
the shifted mask.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The QOS RSM rule mappings are off by one, referencing a kernel receive
context that does not exist.
Correctly start the QOS RSM map entries at FIRST_KERNEL_CONTEXT rather
than MIN_KERNEL_KCTXTS. Remove the cruft that hid this.
Change the QP map table so all traffic not caught by QOS RSM goes to
the control context rather than the first QOS context.
Correct comments to match the actual code operation and intent.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Remove an invalid compare of the number of QOS RSM map table entries
against the number of physical receive contexts. The RSM map table
has its own size and has no relation to the number of physical receive
contexts.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The bit width for num_vls, n, needs to be calculated based on
the pow2 rounded up of the number of vls. Otherwise num_vls of 3,
5, 6, and 7 will have misplaced QOS RSM map entries.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The i2c and qsfp read/write routines should check for the resource
reservation of the incoming argument target rather than the implicit
target of the hardware HFI.
Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>