2714 Commits

Author SHA1 Message Date
Eli Cohen
c3aa9b186b IPoIB: Set dev_id field of net_device
Use the net device's dev_id field to encode the port number of the pci
device.  This can be used to to associate a net device with the pci
device's port. The encoding is: dev_id = port - 1.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-10-23 13:35:48 -07:00
Linus Torvalds
5f05647dd8 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1699 commits)
  bnx2/bnx2x: Unsupported Ethtool operations should return -EINVAL.
  vlan: Calling vlan_hwaccel_do_receive() is always valid.
  tproxy: use the interface primary IP address as a default value for --on-ip
  tproxy: added IPv6 support to the socket match
  cxgb3: function namespace cleanup
  tproxy: added IPv6 support to the TPROXY target
  tproxy: added IPv6 socket lookup function to nf_tproxy_core
  be2net: Changes to use only priority codes allowed by f/w
  tproxy: allow non-local binds of IPv6 sockets if IP_TRANSPARENT is enabled
  tproxy: added tproxy sockopt interface in the IPV6 layer
  tproxy: added udp6_lib_lookup function
  tproxy: added const specifiers to udp lookup functions
  tproxy: split off ipv6 defragmentation to a separate module
  l2tp: small cleanup
  nf_nat: restrict ICMP translation for embedded header
  can: mcp251x: fix generation of error frames
  can: mcp251x: fix endless loop in interrupt handler if CANINTF_MERRF is set
  can-raw: add msg_flags to distinguish local traffic
  9p: client code cleanup
  rds: make local functions/variables static
  ...

Fix up conflicts in net/core/dev.c, drivers/net/pcmcia/smc91c92_cs.c and
drivers/net/wireless/ath/ath9k/debug.c as per David
2010-10-23 11:47:02 -07:00
David Dillow
bb12588a38 IB/srp: Implement SRP_CRED_REQ and SRP_AER_REQ
This patch adds support for SRP_CRED_REQ to avoid a lockup by targets
that use that mechanism to return credits to the initiator. This
prevents a lockup observed in the field where we would never add the
credits from the SRP_CRED_REQ to our current count, and would therefore
never send another command to the target.

Minimal support for SRP_AER_REQ is also added, as these messages can
also be used to convey additional credits to the initiator.

Based upon extensive debugging and code by Bart Van Assche and a bug
report by Chris Worley.

Signed-off-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-10-22 22:19:10 -07:00
Bart Van Assche
dd5e6e38b2 IB/srp: Preparation for transmit ring response allocation
The transmit ring in ib_srp (srp_target.tx_ring) is currently only used
for allocating requests sent by the initiator to the target. This patch
prepares using that ring for allocation of both requests and responses.
Also, this patch differentiates the uses of SRP_SQ_SIZE, increases the
size of the IB send completion queue by one element and reserves one
transmit ring slot for SRP_TSK_MGMT requests.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-10-22 22:19:10 -07:00
Jason Gunthorpe
5715f5d44b IB/qib: Process RDMA WRITE ONLY with IMMEDIATE properly
See table 35 in IBA - the header order for RDMA_WRITE_ONLY_WITH_IMMEDIATE
and SEND_LAST_WITH_IMMEDIATE is different: the RDMA_WRITE_ONLY has
a RETH header before the immediate data, so we need a different code path
to extract the immediate data.

I tested this with a userspace app that does RDMA_WRITE with immediate
on a QLE7140.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-10-22 22:12:15 -07:00
Steve Wise
b955150ea7 RDMA/cxgb3: When a user QP is marked in error, also mark the CQs in error
The flushing of work requests for user QPs is implemented entirely in
the user mode library.  The only kernel interaction is to mark the
user QP object indicating it is in error when the QP exits RTS.  When
the user QP operations are called by the application (eg: post_send,
post_recv), the QP in error bit is checked and if set, the library
flushes the QP.  If, however, the application is not doing IO, but
rather just polling the CQ, it will never get flushed work requests.
This breaks some classes of applications.

This patch adds logic to mark user CQs in error when a QP that is bound
to the CQ is marked in error.  The library poll code can then notice
the CQ is in error and flush all the in error QPs bound to that CQ.

Design:

 - add 1 extra CQE entry to the CQ memory that will be used to indicate
   in error status.
 - return the desired CQ memory size that should be mapped by the library
 - bump the ABI since the create_cq uverbs response changes.
 - detect older libraries and reduce the mmap size accordingly.
   (The ABI bump doesn't break old libraries, since they didn't check
   the ABI field anyway)

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-10-22 22:00:53 -07:00
Steve Wise
da411ba1da RDMA/cxgb4: Use cxgb4 service for packet gl to skb
Remove the local service t4_pktgl_to_skb() and use cxgb4_pktgl_to_skb()
exported by cxgb4.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-10-22 21:58:50 -07:00
Steve Wise
de5dd81b49 RDMA/cxgb4: Export T4 TCP MIB
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-10-22 21:57:26 -07:00
Linus Torvalds
092e0e7e52 Merge branch 'llseek' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl
* 'llseek' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl:
  vfs: make no_llseek the default
  vfs: don't use BKL in default_llseek
  llseek: automatically add .llseek fop
  libfs: use generic_file_llseek for simple_attr
  mac80211: disallow seeks in minstrel debug code
  lirc: make chardev nonseekable
  viotape: use noop_llseek
  raw: use explicit llseek file operations
  ibmasmfs: use generic_file_llseek
  spufs: use llseek in all file operations
  arm/omap: use generic_file_llseek in iommu_debug
  lkdtm: use generic_file_llseek in debugfs
  net/wireless: use generic_file_llseek in debugfs
  drm: use noop_llseek
2010-10-22 10:52:56 -07:00
Justin P. Mattock
631dd1a885 Update broken web addresses in the kernel.
The patch below updates broken web addresses in the kernel

Signed-off-by: Justin P. Mattock <justinmattock@gmail.com>
Cc: Maciej W. Rozycki <macro@linux-mips.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Finn Thain <fthain@telegraphics.com.au>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Dimitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Mike Frysinger <vapier.adi@gmail.com>
Acked-by: Ben Pfaff <blp@cs.stanford.edu>
Acked-by: Hans J. Koch <hjk@linutronix.de>
Reviewed-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-10-18 11:03:14 +02:00
Randy Dunlap
e00ce92e0b infiniband: fix mlx4 kconfig dependency warning
Fix kconfig dependency warning to satisfy dependencies:

warning: (MLX4_EN && NETDEVICES && NETDEV_10000 && PCI && INET || MLX4_INFINIBAND && INFINIBAND) selects MLX4_CORE which has unmet direct dependencies (NETDEVICES && NETDEV_10000 && PCI)

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-16 11:13:25 -07:00
Arnd Bergmann
6038f373a3 llseek: automatically add .llseek fop
All file_operations should get a .llseek operation so we can make
nonseekable_open the default for future file operations without a
.llseek pointer.

The three cases that we can automatically detect are no_llseek, seq_lseek
and default_llseek. For cases where we can we can automatically prove that
the file offset is always ignored, we use noop_llseek, which maintains
the current behavior of not returning an error from a seek.

New drivers should normally not use noop_llseek but instead use no_llseek
and call nonseekable_open at open time.  Existing drivers can be converted
to do the same when the maintainer knows for certain that no user code
relies on calling seek on the device file.

The generated code is often incorrectly indented and right now contains
comments that clarify for each added line why a specific variant was
chosen. In the version that gets submitted upstream, the comments will
be gone and I will manually fix the indentation, because there does not
seem to be a way to do that using coccinelle.

Some amount of new code is currently sitting in linux-next that should get
the same modifications, which I will do at the end of the merge window.

Many thanks to Julia Lawall for helping me learn to write a semantic
patch that does all this.

===== begin semantic patch =====
// This adds an llseek= method to all file operations,
// as a preparation for making no_llseek the default.
//
// The rules are
// - use no_llseek explicitly if we do nonseekable_open
// - use seq_lseek for sequential files
// - use default_llseek if we know we access f_pos
// - use noop_llseek if we know we don't access f_pos,
//   but we still want to allow users to call lseek
//
@ open1 exists @
identifier nested_open;
@@
nested_open(...)
{
<+...
nonseekable_open(...)
...+>
}

@ open exists@
identifier open_f;
identifier i, f;
identifier open1.nested_open;
@@
int open_f(struct inode *i, struct file *f)
{
<+...
(
nonseekable_open(...)
|
nested_open(...)
)
...+>
}

@ read disable optional_qualifier exists @
identifier read_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
expression E;
identifier func;
@@
ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
{
<+...
(
   *off = E
|
   *off += E
|
   func(..., off, ...)
|
   E = *off
)
...+>
}

@ read_no_fpos disable optional_qualifier exists @
identifier read_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
@@
ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
{
... when != off
}

@ write @
identifier write_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
expression E;
identifier func;
@@
ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
{
<+...
(
  *off = E
|
  *off += E
|
  func(..., off, ...)
|
  E = *off
)
...+>
}

@ write_no_fpos @
identifier write_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
@@
ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
{
... when != off
}

@ fops0 @
identifier fops;
@@
struct file_operations fops = {
 ...
};

@ has_llseek depends on fops0 @
identifier fops0.fops;
identifier llseek_f;
@@
struct file_operations fops = {
...
 .llseek = llseek_f,
...
};

@ has_read depends on fops0 @
identifier fops0.fops;
identifier read_f;
@@
struct file_operations fops = {
...
 .read = read_f,
...
};

@ has_write depends on fops0 @
identifier fops0.fops;
identifier write_f;
@@
struct file_operations fops = {
...
 .write = write_f,
...
};

@ has_open depends on fops0 @
identifier fops0.fops;
identifier open_f;
@@
struct file_operations fops = {
...
 .open = open_f,
...
};

// use no_llseek if we call nonseekable_open
////////////////////////////////////////////
@ nonseekable1 depends on !has_llseek && has_open @
identifier fops0.fops;
identifier nso ~= "nonseekable_open";
@@
struct file_operations fops = {
...  .open = nso, ...
+.llseek = no_llseek, /* nonseekable */
};

@ nonseekable2 depends on !has_llseek @
identifier fops0.fops;
identifier open.open_f;
@@
struct file_operations fops = {
...  .open = open_f, ...
+.llseek = no_llseek, /* open uses nonseekable */
};

// use seq_lseek for sequential files
/////////////////////////////////////
@ seq depends on !has_llseek @
identifier fops0.fops;
identifier sr ~= "seq_read";
@@
struct file_operations fops = {
...  .read = sr, ...
+.llseek = seq_lseek, /* we have seq_read */
};

// use default_llseek if there is a readdir
///////////////////////////////////////////
@ fops1 depends on !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier readdir_e;
@@
// any other fop is used that changes pos
struct file_operations fops = {
... .readdir = readdir_e, ...
+.llseek = default_llseek, /* readdir is present */
};

// use default_llseek if at least one of read/write touches f_pos
/////////////////////////////////////////////////////////////////
@ fops2 depends on !fops1 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier read.read_f;
@@
// read fops use offset
struct file_operations fops = {
... .read = read_f, ...
+.llseek = default_llseek, /* read accesses f_pos */
};

@ fops3 depends on !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier write.write_f;
@@
// write fops use offset
struct file_operations fops = {
... .write = write_f, ...
+	.llseek = default_llseek, /* write accesses f_pos */
};

// Use noop_llseek if neither read nor write accesses f_pos
///////////////////////////////////////////////////////////

@ fops4 depends on !fops1 && !fops2 && !fops3 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier read_no_fpos.read_f;
identifier write_no_fpos.write_f;
@@
// write fops use offset
struct file_operations fops = {
...
 .write = write_f,
 .read = read_f,
...
+.llseek = noop_llseek, /* read and write both use no f_pos */
};

@ depends on has_write && !has_read && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier write_no_fpos.write_f;
@@
struct file_operations fops = {
... .write = write_f, ...
+.llseek = noop_llseek, /* write uses no f_pos */
};

@ depends on has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier read_no_fpos.read_f;
@@
struct file_operations fops = {
... .read = read_f, ...
+.llseek = noop_llseek, /* read uses no f_pos */
};

@ depends on !has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
@@
struct file_operations fops = {
...
+.llseek = noop_llseek, /* no read or write fn */
};
===== End semantic patch =====

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Julia Lawall <julia@diku.dk>
Cc: Christoph Hellwig <hch@infradead.org>
2010-10-15 15:53:27 +02:00
Eli Cohen
ff7f5aab35 IB/pack: IBoE UD packet packing support
Add support for packing IBoE packet headers.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>

[ Clean up and fix ib_ud_header_init() a bit.  - Roland ]

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-10-14 12:41:29 -07:00
Eli Cohen
3c86aa70bf RDMA/cm: Add RDMA CM support for IBoE devices
Add support for IBoE device binding and IP --> GID resolution.  Path
resolving and multicast joining are implemented within cma.c by
filling in the responses and running callbacks in the CMA work queue.

IP --> GID resolution always yields IPv6 link local addresses; remote
GIDs are derived from the destination MAC address of the remote port.
Multicast GIDs are always mapped to multicast MACs as is done in IPv6.
(IPv4 multicast is enabled by translating IPv4 multicast addresses to
IPv6 multicast as described in
<http://www.mail-archive.com/ipng@sunroof.eng.sun.com/msg02134.html>.)

Some helper functions are added to ib_addr.h.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-10-13 15:46:43 -07:00
Eli Cohen
fac70d5191 IB/mad: IBoE supports only QP1 (no QP0)
Since IBoE is using Ethernet as its link layer, there is no central
management entity so there is need for QP0.  QP1 is still needed since
it handles communications between CM agents.  This patch will skip QP0
and create only QP1 for IBoE ports.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-10-13 09:38:11 -07:00
Eli Cohen
7b4c876961 IPoIB: Skip IBoE ports
IPoIB is IP-over-Infiniband link layer. In the case of IBoE, the link
layer is Ethernet and IP can work directly over Ethernet, so disable
IPoIB for non-IB_LINK_LAYER_INFINIBAND ports.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-10-13 09:38:11 -07:00
Eric Dumazet
29b4433d99 net: percpu net_device refcount
We tried very hard to remove all possible dev_hold()/dev_put() pairs in
network stack, using RCU conversions.

There is still an unavoidable device refcount change for every dst we
create/destroy, and this can slow down some workloads (routers or some
app servers, mmap af_packet)

We can switch to a percpu refcount implementation, now dynamic per_cpu
infrastructure is mature. On a 64 cpus machine, this consumes 256 bytes
per device.

On x86, dev_hold(dev) code :

before
        lock    incl 0x280(%ebx)
after:
        movl    0x260(%ebx),%eax
        incl    fs:(%eax)

Stress bench :

(Sending 160.000.000 UDP frames,
IP route cache disabled, dual E5540 @2.53GHz,
32bit kernel, FIB_TRIE)

Before:

real    1m1.662s
user    0m14.373s
sys     12m55.960s

After:

real    0m51.179s
user    0m15.329s
sys     10m15.942s

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-12 12:35:25 -07:00
Animesh K Trivedi
26012f0750 RDMA/iwcm: Fix hang in uninterruptible wait on cm_id destroy
A process can get stuck in an uninterruptible wait in the
kernel while destroying a cm_id when iw_cm_connect() fails:

For example, When creation of a PD fails but the user continues with
an attempt to connect to the server without checking the return value,
in iw_cm_connect() a NULL qp is found so the call fails.  However the
IWCM_F_CONNECT_WAIT bit is not cleared.  destroy_cm_id() then waits
forever for IWCM_F_CONNECT_WAIT to be cleared.

The same problem exists on the passive side with the accept call.

Fix this by clearing the bit and waking up any waiters in the
appropriate spots.

Signed-off-by: Animesh Trivedi <atr@zurich.ibm.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-10-11 20:24:04 -07:00
Steve Wise
3160977a6e RDMA/cxgb4: Use simple_read_from_buffer() for debugfs handlers
We can replace our equivalent open-coded version.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-10-11 20:15:14 -07:00
Steve Wise
8bbac892fb RDMA/cxgb4: Add default_llseek to debugfs files
Incorporate BKL removal changes.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-10-11 20:14:00 -07:00
Eli Cohen
5a0fd09428 IB/mlx4: Limit size of fast registration WRs
Fix the limit on the size of max fast registration WRs that can be
posted to match hardware capabilities.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-10-11 14:33:17 -07:00
Joe Perches
0f2f930a67 IB/qib: Remove unnecessary casts of private_data
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-10-06 14:43:51 -07:00
Maciej Sosnowski
52106bd24c RDMA/nes: Turn carrier off on ifdown
This lets the bonding driver to detect when an interface goes down.

Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Chien Tung <chien.tin.tung@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-10-06 14:42:32 -07:00
Chien Tung
2932772156 RDMA/nes: Report correct port state if interface is down
With commit cd6860eb ("RDMA/nes: Fix hangs on ifdown") we no longer
remove nes interfaces on ifdown.  On nes_query_port(), add an
additional check of the netdev queue and report IB_PORT_DOWN if the
queue is not running.

Signed-off-by: Chien Tung <chien.tin.tung@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-10-06 12:59:56 -07:00
Sonny Rao
625fbd3a36 IB/ehca: Fix driver on relocatable kernel
the eHCA driver registers a MR for all of kernel memory, but makes the
assumption that valid memory exists at KERNELBASE.  This assumption
may not be true in the case of a relocatable kernel, so use KERNELBASE
+ PHYSICAL_START to get the true beginning of usable kernel memory.

cc: Joachim Fenkes <fenkes@de.ibm.com>
cc: Christoph Raisch <raisch@de.ibm.com>
cc: Hoan-Ham Hguyen <hnguyen@de.ibm.com>
Signed-off-by: Sonny Rao <sonnyrao@us.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-10-06 12:57:07 -07:00
Thomas Gleixner
557d0540b9 IB/umad: Make user_mad semaphore a real one
Get rid of init_MUTEX[_LOCKED]() and use sema_init() instead.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-09-28 20:52:21 -07:00
Joe Perches
fc4ec9bd82 RDMA/amso1100: Remove KERN_<level> from pr_<level> use
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-09-28 20:51:20 -07:00
Dan Carpenter
4d8d6389b2 RDMA/nes: Remove unneeded variable
Just a small cleanup.  The "passive_state" variable isn't used any
more after commit dae58728dc ("RDMA/nes: Fix double CLOSE event
indication crash")

Signed-off-by: Dan Carpenter <error27@gmail.com>
Acked-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-09-28 20:50:07 -07:00
Christoph Lameter
fed1db33fe IPoIB: Set pkt_type correctly for multicast packets (fix IGMP breakage)
IGMP processing is broken because the IPOIB does not set the
skb->pkt_type the right way for multicast traffic.  All incoming
packets are set to PACKET_HOST which means that igmp_recv() will
ignore the IGMP broadcasts/multicasts.

This in turn means that the IGMP timers are firing and are sending
information about multicast subscriptions unnecessarily.  In a large
private network this can cause traffic spikes.

Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-09-28 11:09:23 -07:00
Steve Wise
40dbf6ee38 RDMA/cxgb4: Fastreg NSMR fixes
- Remove dsgl support - doesn't work in T4.
- Wrap the immediate PBL as needed when building it in the wr.
- Adjust max pbl depth allowed based on ulptx alignment requirements.
- Bump the slots per SQ to 5 to allow up to 128MB fast registers.
- Advertise fastreg support by default.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-09-28 10:53:50 -07:00
Steve Wise
410ade4c26 RDMA/cxgb4: Don't set completion flag for read requests
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-09-28 10:53:49 -07:00
Steve Wise
98ae68b7ee RDMA/cxgb4: Set the default TCP send window to 128KB
This helps with large IO throughput.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-09-28 10:53:48 -07:00
Steve Wise
2f5b48c3ad RDMA/cxgb4: Use a mutex for QP and EP state transitions
Move the connection setup/teardown paths to the workq thread removing
spin lock/irq disable requirements for these paths.  This allows calls
down to the LLD for EP and QP state transition actions to be atomic
with respect to processing CPL messages coming up from the HW.
Namely, calls to rdma_init() and rdma_fini() can now be called with
the mutex held avoiding many race conditions with the abort path.

The QP spinlock is still used but only to manipulate the qp state.  This
allows the fastpaths, poll, post_send, and pos_recv, to run in the
irq context.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-09-28 10:53:48 -07:00
Steve Wise
c6d7b26791 RDMA/cxgb4: Support on-chip SQs
T4 support on-chip SQs to reduce latency.  This patch adds support for
this in iw_cxgb4:

 - Manage ocqp memory like other adapter mem resources.
 - Allocate user mode SQs from ocqp mem if available.
 - Map ocqp mem to user process using write combining.
 - Map PCIE_MA_SYNC reg to user process.

Bump uverbs ABI.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-09-28 10:46:35 -07:00
Steve Wise
aadc4df308 RDMA/cxgb4: Centralize the wait logic
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-09-28 10:46:34 -07:00
Steve Wise
9e8d1fa342 RDMA/cxgb4: debugfs files for dumping active stags
Add "stags" debugfs file.  This is useful for examining the TPTE and
PBL entries in adapter memory.  It allows scripts to dump just the
active entries.

Also clean up the "qps" file handlers and shared common code.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-09-28 10:46:33 -07:00
Steve Wise
05fb962947 RDMA/cxgb4: Log HW lack-of-resource errors
This helps debug cases where HW resources are depleted.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-09-28 10:46:33 -07:00
Steve Wise
0e42c1f430 RDMA/cxgb4: Handle CPL_RDMA_TERMINATE messages
T4 FW sends up CPL_RDMA_TERMINATE to indicate a peer TERM.  This
triggers the QP moving to TERMINATE state.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-09-28 10:46:32 -07:00
Steve Wise
6ff0e343b3 RDMA/cxgb4: Ignore TERMINATE CQEs
T4 incorrectly inserts TERM CQEs into the CQ.  Silently ignore them.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-09-28 10:46:31 -07:00
Steve Wise
7459486133 RDMA/cxgb4: Ignore positive return values from cxgb4_*_send() functions
The cxgb4_*_send() functions return NET_XMIT_ values, which are
positive integers or negative errno values.  So don't treat positive
return values as an error.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-09-28 10:46:31 -07:00
Steve Wise
13fecb83b4 RDMA/cxgb4: Zero out ISGL padding
The HW design requires zeroing any pad in SGLs.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-09-28 10:46:30 -07:00
Steve Wise
af93fb5dcc RDMA/cxgb4: Don't use null ep ptr
In c4iw_modify_qp() error path, only use qhp->ep if ep is not already set.
Otherwise qhp->ep can be NULL and we crash.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-09-28 10:46:29 -07:00
Roland Dreier
183ae74bda RDMA/nes: Fix cast-to-pointer warnings on 32-bit
Fix:

  drivers/infiniband/hw/nes/nes_verbs.c: In function 'nes_alloc_fast_reg_page_list':
  drivers/infiniband/hw/nes/nes_verbs.c:477: warning: cast to pointer from integer of different size
  drivers/infiniband/hw/nes/nes_verbs.c: In function 'nes_post_send':
  drivers/infiniband/hw/nes/nes_verbs.c:3486: warning: cast to pointer from integer of different size
  drivers/infiniband/hw/nes/nes_verbs.c:3486: warning: cast to pointer from integer of different size

by printing u64 quantities by casting to unsigned long and long and
using %llx, rather than casting to void* and using %p.

Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-09-27 17:51:33 -07:00
Eli Cohen
a3f5adaf49 IB/core: Add link layer property to ports
This patch allows ports to have different link layers:
IB_LINK_LAYER_INFINIBAND or IB_LINK_LAYER_ETHERNET.  This is required
for adding IBoE (InfiniBand-over-Ethernet, aka RoCE) support.  For
devices that do not provide an implementation for querying the link
layer property of a port, we return a default value based on the
transport: RMA_TRANSPORT_IB nodes will return IB_LINK_LAYER_INFINIBAND
and RDMA_TRANSPORT_IWARP nodes will return IB_LINK_LAYER_ETHERNET.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-09-27 17:51:10 -07:00
Roland Dreier
c8e081a1bf RDMA/cxgb4: Fix warnings about casts to/from pointers of different sizes
Fix:

  drivers/infiniband/hw/cxgb4/qp.c: In function ‘create_qp’:
  drivers/infiniband/hw/cxgb4/qp.c:147: warning: cast from pointer to integer of different size
  drivers/infiniband/hw/cxgb4/qp.c: In function ‘rdma_fini’:
  drivers/infiniband/hw/cxgb4/qp.c:988: warning: cast from pointer to integer of different size
  drivers/infiniband/hw/cxgb4/qp.c: In function ‘rdma_init’:
  drivers/infiniband/hw/cxgb4/qp.c:1063: warning: cast from pointer to integer of different size
  drivers/infiniband/hw/cxgb4/mem.c: In function ‘write_adapter_mem’:
  drivers/infiniband/hw/cxgb4/mem.c:74: warning: cast from pointer to integer of different size
  drivers/infiniband/hw/cxgb4/cq.c: In function ‘destroy_cq’:
  drivers/infiniband/hw/cxgb4/cq.c:58: warning: cast from pointer to integer of different size
  drivers/infiniband/hw/cxgb4/cq.c: In function ‘create_cq’:
  drivers/infiniband/hw/cxgb4/cq.c:135: warning: cast from pointer to integer of different size
  drivers/infiniband/hw/cxgb4/cm.c: In function ‘fw6_msg’:
  drivers/infiniband/hw/cxgb4/cm.c:2326: warning: cast to pointer from integer of different size

by casting pointers to unsigned long instead of u64.

Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-09-27 17:51:04 -07:00
Steve Wise
bec658ff31 RDMA/cxgb3: Turn off RX coalescing for iWARP connections
The HW by default has RX coalescing on.  For iWARP connections, this
causes a 100ms delay in connection establishement due to the ingress
MPA Start message being stalled in HW.  So explicitly turn RX
coalescing off when setting up iWARP connections.

This was causing very bad performance for NP64 gather operations using
Open MPI, due to the way it sets up connections on larger jobs.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Cc: <stable@kernel.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-09-27 09:28:55 -07:00
Joe Perches
ea3f0e6bc5 drivers/infiniband: Remove unnecessary casts of private_data
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-09-23 13:38:28 +02:00
Roland Dreier
17859d07c8 Merge branches 'cxgb3' and 'nes' into for-linus 2010-09-08 14:43:28 -07:00
Faisal Latif
29da03b9d1 RDMA/nes: Fix hang with modified FIN handling on A0 cards
Changing state to CLOSING when FIN is received causes A0 cards to
hang.  Fix this by checking for A0 cards in FIN handling.

Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-09-08 14:38:23 -07:00
Faisal Latif
67d7072115 RDMA/nes: Change state to closing after FIN
When the driver receives an AE for FIN received, it closes the
connection without changing the state of the connection in the
hardware to closing.  By changing the state to closing, hardware will
do a normal close sequence.

Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-09-08 14:38:04 -07:00