11424 Commits

Author SHA1 Message Date
Herbert Xu
e1c096e251 vlan: Add GRO interfaces
This patch adds GRO interfaces for hardware-assisted VLAN reception.
With this in place we're now at parity with LRO as far as the
interface is concerned.  That is, you can now take any LRO driver
and convert it over to GRO.

As the CB memory clashes with GRO's use of CB, I've removed it
entirely by storing dev in skb->dev.  This is OK because VLAN
gets called first thing in netif_receive_skb and skb->dev is
not used in between us calling netif_rx and netif_receive_skb
getting called.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-06 10:50:09 -08:00
Herbert Xu
96e93eab20 gro: Add internal interfaces for VLAN
Previously GRO's only entry point from the outside is through
napi_gro_receive and napi_gro_frags.  These interfaces are for
device drivers.

This patch rearranges things to provide a new set of interfaces
for VLANs.  These interfaces are for internal use only.  The
VLAN code itself can then provide a set of entry points for
device drivers.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-06 10:49:34 -08:00
Stephen Hemminger
61294e2e27 sch_teql: convert to net_device_ops
Convert this driver to net_device_ops.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-06 10:45:57 -08:00
Mark McLoughlin
035da16fb5 s390: remove s390_root_dev_*()
Replace s390_root_dev_register() with root_device_register() etc.

[Includes fix from Cornelia Huck]

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-06 10:44:34 -08:00
Stephen Hemminger
148bc4303f wireless: convert wireless ioctl to net_device_ops
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-06 10:42:24 -08:00
Dan Williams
aa1e6f1a38 dmaengine: kill struct dma_client and supporting infrastructure
All users have been converted to either the general-purpose allocator,
dma_find_channel, or dma_request_channel.

Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-01-06 11:38:17 -07:00
Dan Williams
209b84a88f dmaengine: replace dma_async_client_register with dmaengine_get
Now that clients no longer need to be notified of channel arrival
dma_async_client_register can simply increment the dmaengine_ref_count.

Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-01-06 11:38:17 -07:00
Dan Williams
f67b459992 net_dma: convert to dma_find_channel
Use the general-purpose channel allocation provided by dmaengine.

Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-01-06 11:38:15 -07:00
Dan Williams
2ba05622b8 dmaengine: provide a common 'issue_pending_all' implementation
async_tx and net_dma each have open-coded versions of issue_pending_all,
so provide a common routine in dmaengine.

The implementation needs to walk the global device list, so implement
rcu to allow dma_issue_pending_all to run lockless.  Clients protect
themselves from channel removal events by holding a dmaengine reference.

Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-01-06 11:38:14 -07:00
Dan Williams
6f49a57aa5 dmaengine: up-level reference counting to the module level
Simply, if a client wants any dmaengine channel then prevent all dmaengine
modules from being removed.  Once the clients are done re-enable module
removal.

Why?, beyond reducing complication:
1/ Tracking reference counts per-transaction in an efficient manner, as
   is currently done, requires a complicated scheme to avoid cache-line
   bouncing effects.
2/ Per-transaction ref-counting gives the false impression that a
   dma-driver can be gracefully removed ahead of its user (net, md, or
   dma-slave)
3/ None of the in-tree dma-drivers talk to hot pluggable hardware, but
   if such an engine were built one day we still would not need to notify
   clients of remove events.  The driver can simply return NULL to a
   ->prep() request, something that is much easier for a client to handle.

Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-01-06 11:38:14 -07:00
Trond Myklebust
69b6ba3712 SUNRPC: Ensure the server closes sockets in a timely fashion
We want to ensure that connected sockets close down the connection when we
set XPT_CLOSE, so that we don't keep it hanging while cleaning up all the
stuff that is keeping a reference to the socket.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-01-06 11:53:57 -05:00
Jeff Layton
c9233eb7b0 sunrpc: add sv_maxconn field to svc_serv (try #3)
svc_check_conn_limits() attempts to prevent denial of service attacks
by having the service close old connections once it reaches a
threshold. This threshold is based on the number of threads in the
service:

	(serv->sv_nrthreads + 3) * 20

Once we reach this, we drop the oldest connections and a printk pops
to warn the admin that they should increase the number of threads.

Increasing the number of threads isn't an option however for services
like lockd. We don't want to eliminate this check entirely for such
services but we need some way to increase this limit.

This patch adds a sv_maxconn field to the svc_serv struct. When it's
set to 0, we use the current method to calculate the max number of
connections. RPC services can then set this on an as-needed basis.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-01-06 11:53:47 -05:00
Frederik Schwarzer
025dfdafe7 trivial: fix then -> than typos in comments and documentation
- (better, more, bigger ...) then -> (...) than

Signed-off-by: Frederik Schwarzer <schwarzerf@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-01-06 11:28:06 +01:00
Linus Torvalds
15b0669072 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (44 commits)
  qlge: Fix sparse warnings for tx ring indexes.
  qlge: Fix sparse warning regarding rx buffer queues.
  qlge: Fix sparse endian warning in ql_hw_csum_setup().
  qlge: Fix sparse endian warning for inbound packet control block flags.
  qlge: Fix sparse warnings for byte swapping in qlge_ethool.c
  myri10ge: print MAC and serial number on probe failure
  pkt_sched: cls_u32: Fix locking in u32_change()
  iucv: fix cpu hotplug
  af_iucv: Free iucv path/socket in path_pending callback
  af_iucv: avoid left over IUCV connections from failing connects
  af_iucv: New error return codes for connect()
  net/ehea: bitops work on unsigned longs
  Revert "net: Fix for initial link state in 2.6.28"
  tcp: Kill extraneous SPLICE_F_NONBLOCK checks.
  tcp: don't mask EOF and socket errors on nonblocking splice receive
  dccp: Integrate the TFRC library with DCCP
  dccp: Clean up ccid.c after integration of CCID plugins
  dccp: Lockless integration of CCID congestion-control plugins
  qeth: get rid of extra argument after printk to dev_* conversion
  qeth: No large send using EDDP for HiperSockets.
  ...
2009-01-05 18:44:59 -08:00
Linus Torvalds
520c853466 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
  inotify: fix type errors in interfaces
  fix breakage in reiserfs_new_inode()
  fix the treatment of jfs special inodes
  vfs: remove duplicate code in get_fs_type()
  add a vfs_fsync helper
  sys_execve and sys_uselib do not call into fsnotify
  zero i_uid/i_gid on inode allocation
  inode->i_op is never NULL
  ntfs: don't NULL i_op
  isofs check for NULL ->i_op in root directory is dead code
  affs: do not zero ->i_op
  kill suid bit only for regular files
  vfs: lseek(fd, 0, SEEK_CUR) race condition
2009-01-05 18:32:06 -08:00
Jarek Poplawski
6f57321422 pkt_sched: cls_u32: Fix locking in u32_change()
New nodes are inserted in u32_change() under rtnl_lock() with wmb(),
so without tcf_tree_lock() like in other classifiers (e.g. cls_fw).
This isn't enough without rmb() on the read side, but on the other
hand adding such barriers doesn't give any savings, so the lock is
added instead.

Reported-by: m0sia <m0sia@plotinka.ru>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-05 18:14:19 -08:00
Heiko Carstens
f1d3e4dca3 iucv: fix cpu hotplug
If the iucv module is compiled in/loaded but no user is registered cpu
hot remove doesn't work. Reason for that is that the iucv cpu hotplug
notifier on CPU_DOWN_PREPARE checks if the iucv_buffer_cpumask would
be empty after the corresponding bit would be cleared. However the bit
was never set since iucv wasn't enable. That causes all cpu hot unplug
operations to fail in this scenario.
To fix this use iucv_path_table as an indicator wether iucv is enabled
or not.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-05 18:09:02 -08:00
Hendrik Brueckner
65dbd7c277 af_iucv: Free iucv path/socket in path_pending callback
Free iucv path after iucv_path_sever() calls in iucv_callback_connreq()
(path_pending() iucv callback).
If iucv_path_accept() fails, free path and free/kill newly created socket.

Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-05 18:08:23 -08:00
Ursula Braun
18becbc547 af_iucv: avoid left over IUCV connections from failing connects
For certain types of AFIUCV socket connect failures IUCV connections
are left over. Add some cleanup-statements to avoid cluttered IUCV
connections.

Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-05 18:07:46 -08:00
Hendrik Brueckner
55cdea9ed9 af_iucv: New error return codes for connect()
If the iucv_path_connect() call fails then return an error code that
corresponds to the iucv_path_connect() failure condition; instead of
returning -ECONNREFUSED for any failure.

This helps to improve error handling for user space applications
(e.g.  inform the user that the z/VM guest is not authorized to
connect to other guest virtual machines).

The error return codes are based on those described in connect(2).

Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-05 18:07:07 -08:00
David S. Miller
c276e098d3 Revert "net: Fix for initial link state in 2.6.28"
This reverts commit 22604c866889c4b2e12b73cbf1683bda1b72a313.

We can't fix this issue in this way, because we now can try
to take the dev_base_lock rwlock as a writer in software interrupt
context and that is not allowed without major surgery elsewhere.

This initial link state problem needs to be solved in some other
way.

Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-05 16:01:51 -08:00
Al Viro
56ff5efad9 zero i_uid/i_gid on inode allocation
... and don't bother in callers.  Don't bother with zeroing i_blocks,
while we are at it - it's already been zeroed.

i_mode is not worth the effort; it has no common default value.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-01-05 11:54:28 -05:00
David S. Miller
7945cc6464 tcp: Kill extraneous SPLICE_F_NONBLOCK checks.
In splice TCP receive, the SPLICE_F_NONBLOCK flag is used
to compute the "timeo" value.  So checking it again inside
of the main receive loop to trigger -EAGAIN processing is
entirely unnecessary.

Noticed by Jarek P. and Lennert Buytenhek.

Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-05 00:59:00 -08:00
Lennert Buytenhek
4f7d54f59b tcp: don't mask EOF and socket errors on nonblocking splice receive
Currently, setting SPLICE_F_NONBLOCK on splice from a TCP socket
results in masking of EOF (RDHUP) and error conditions on the socket
by an -EAGAIN return.  Move the NONBLOCK check in tcp_splice_read()
to be after the EOF and error checks to fix this.

Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-05 00:00:12 -08:00
Gerrit Renker
129fa44785 dccp: Integrate the TFRC library with DCCP
This patch integrates the TFRC library, which is a dependency of CCID-3 (and
CCID-4), with the new use of CCIDs in the DCCP module.		

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-04 21:45:33 -08:00
Gerrit Renker
e5fd56ca4e dccp: Clean up ccid.c after integration of CCID plugins
This patch cleans up after integrating the CCID modules and, in addition,

 * moves the if/else cases from ccid_delete() into ccid_hc_{tx,rx}_delete();
 * removes the 'gfp' argument to ccid_new() - since it is always gfp_any().

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-04 21:43:23 -08:00
Gerrit Renker
ddebc973c5 dccp: Lockless integration of CCID congestion-control plugins
Based on Arnaldo's earlier patch, this patch integrates the standardised
CCID congestion control plugins (CCID-2 and CCID-3) of DCCP with dccp.ko:

 * enables a faster connection path by eliminating the need to always go 
   through the CCID registration lock;

 * updates the implementation to use only a single array whose size equals
   the number of configured CCIDs instead of the maximum (256);

 * since the CCIDs are now fixed array elements, synchronization is no
   longer needed, simplifying use and implementation.

CCID-2 is suggested as minimum for a basic DCCP implementation (RFC 4340, 10);
CCID-3 is a standards-track CCID supported by RFC 4342 and RFC 5348.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-04 21:42:53 -08:00
Oliver Hartkopp
6e5c172cf7 can: update can-bcm for hrtimer hardirq callbacks
Since commit ca109491f612aab5c8152207631c0444f63da97f ("hrtimer:
removing all ur callback modes") the hrtimer callbacks are processed
only in hardirq context.

This patch moves some functionality into tasklets to run in softirq
context.

Additionally some duplicated code was removed in bcm_rx_thr_flush()
and an avoidable memcpy was removed from bcm_rx_handler().

Signed-off-by: Oliver Hartkopp <oliver@hartkopp.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-04 17:31:18 -08:00
Roel Kluin
858eb711ba DCB: fix kfree(skb)
Use kfree_skb instead of kfree for struct sk_buff pointers.

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-04 17:29:21 -08:00
Ilpo Järvinen
914d11647b ipv6: IPV6_PKTINFO relied userspace providing correct length
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Reported-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-04 17:27:31 -08:00
Michael Marineau
22604c8668 net: Fix for initial link state in 2.6.28
From: Michael Marineau <mike@marineau.org>

Commit b47300168e770b60ab96c8924854c3b0eb4260eb "Do not fire linkwatch
events until the device is registered." was made as a workaround for
drivers that call netif_carrier_off before registering the device.
Unfortunately this causes these drivers to incorrectly report their
link status as IF_OPER_UNKNOWN which can falsely set the IFF_RUNNING
flag when the interface is first brought up. This issues was
previously pointed out[1] but was dismissed saying that IFF_RUNNING is
not related to the link status. From my digging IFF_RUNNING, as
reported to userspace, is based on the link state. It is set based on
__LINK_STATE_START and IF_OPER_UP or IF_OPER_UNKNOWN. See [2], [3],
and [4]. (Whether or not the kernel has IFF_RUNNING set in flags is
not reported to user space so it may well be independent of the link,
I don't know if and when it may get set.)

The end result depends slightly depending on the driver. The the two I
tested were e1000e and b44. With e1000e if the system is booted
without a network cable attached the interface will falsely report
RUNNING when it is brought up causing NetworkManager to attempt to
start it and eventually time out. With b44 when the system is booted
with a network cable attached and brought up with dhcpcd it will time
out the first time.

The attached patch that will still set the operstate variable
correctly to IF_OPER_UP/DOWN/etc when linkwatch_fire_event is called
but then return rather than skipping the linkwatch_fire_event call
entirely as the previous fix did. (sorry it isn't inline, I don't have
a patch friendly email client at the moment)

Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-04 17:18:51 -08:00
Simon Holm Thøgersen
f32f8b72e0 net/rfkill/rfkill.c: fix unused rfkill_led_trigger() warning
commit 4dec9b807be757780ca3611a959ac22c28d292a7 ("rfkill: strip pointless
notifier chain") removed the only user of rfkill_led_trigger() that was not
guarded by #ifdef CONFIG_RFKILL_LEDS. Therefore, move rfkill_led_trigger()
completely inside #ifdef CONFIG_RFKILL_LEDS and avoid the compile time
warning:

net/rfkill/rfkill.c:59: warning: 'rfkill_led_trigger' defined but not used

Signed-off-by: Simon Holm Thøgersen <odie@cs.aau.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-04 17:11:24 -08:00
Herbert Xu
5d38a079ce gro: Add page frag support
This patch allows GRO to merge page frags (skb_shinfo(skb)->frags)
in one skb, rather than using the less efficient frag_list.

It also adds a new interface, napi_gro_frags to allow drivers
to inject page frags directly into the stack without allocating
an skb.  This is intended to be the GRO equivalent for LRO's
lro_receive_frags interface.

The existing GSO interface can already handle page frags with
or without an appended frag_list so nothing needs to be changed
there.

The merging itself is rather simple.  We store any new frag entries
after the last existing entry, without checking whether the first
new entry can be merged with the last existing entry.  Making this
check would actually be easy but since no existing driver can
produce contiguous frags anyway it would just be mental masturbation.

If the total number of entries would exceed the capacity of a
single skb, we simply resort to using frag_list as we do now.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-04 16:13:40 -08:00
Herbert Xu
b530256d2e gro: Use gso_size to store MSS
In order to allow GRO packets without frag_list at all, we need to
store the MSS in the packet itself.  The obvious place is gso_size.
The only thing to watch out for is if the packet ends up not being
GRO then we need to clear gso_size before pushing the packet into
the stack.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-04 16:13:19 -08:00
David S. Miller
14deae4156 ipv6: Fix sporadic sendmsg -EINVAL when sending to multicast groups.
Thanks to excellent diagnosis by Eduard Guzovsky.

The core problem is that on a network with lots of active
multicast traffic, the neighbour cache can fill up.  If
we try to allocate a new route and thus neighbour cache
entry, the bog-standard GC attempt the neighbour layer does
in ineffective because route entries hold a reference
to the existing neighbour entries and GC can only liberate
entries with no references.

IPV4 already has a way to handle this, by doing a route cache
GC in such situations (when neigh attach returns -ENOBUFS).

So simply mimick this on the ipv6 side.

Tested-by: Eduard Guzovsky <eguzovsky@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-04 16:04:39 -08:00
Al Viro
157cf649a7 sanitize audit_fd_pair()
* no allocations
* return void

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-01-04 15:14:41 -05:00
Al Viro
f3298dc4f2 sanitize audit_socketcall
* don't bother with allocations
* now that it can't fail, make it return void

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-01-04 15:14:39 -05:00
Alan Cox
ad36b88e2d tty: Fix an ircomm warning and note another bug
Roel Kluin noted that line is unsigned so one test is unneccessary. Also
add a warning for another flaw I noticed while making this change.

Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-02 10:19:43 -08:00
Kentaro Takeda
be6d3e56a6 introduce new LSM hooks where vfsmount is available.
Add new LSM hooks for path-based checks.  Call them on directory-modifying
operations at the points where we still know the vfsmount involved.

Signed-off-by: Kentaro Takeda <takedakn@nttdata.co.jp>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Toshiharu Harada <haradats@nttdata.co.jp>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-12-31 18:07:37 -05:00
Paul Moore
6c2e8ac095 netlabel: Update kernel configuration API
Update the NetLabel kernel API to expose the new features added in kernel
releases 2.6.25 and 2.6.28: the static/fallback label functionality and network
address based selectors.

Signed-off-by: Paul Moore <paul.moore@hp.com>
2008-12-31 12:54:11 -05:00
Linus Torvalds
f57fa1d6a6 Merge git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (70 commits)
  fs/nfs/nfs4proc.c: make nfs4_map_errors() static
  rpc: add service field to new upcall
  rpc: add target field to new upcall
  nfsd: support callbacks with gss flavors
  rpc: allow gss callbacks to client
  rpc: pass target name down to rpc level on callbacks
  nfsd: pass client principal name in rsc downcall
  rpc: implement new upcall
  rpc: store pointer to pipe inode in gss upcall message
  rpc: use count of pipe openers to wait for first open
  rpc: track number of users of the gss upcall pipe
  rpc: call release_pipe only on last close
  rpc: add an rpc_pipe_open method
  rpc: minor gss_alloc_msg cleanup
  rpc: factor out warning code from gss_pipe_destroy_msg
  rpc: remove unnecessary assignment
  NFS: remove unused status from encode routines
  NFS: increment number of operations in each encode routine
  NFS: fix comment placement in nfs4xdr.c
  NFS: fix tabs in nfs4xdr.c
  ...
2008-12-30 17:45:45 -08:00
Trond Myklebust
08cc36cbd1 Merge branch 'devel' into next 2008-12-30 16:51:43 -05:00
Herbert Xu
eb4dea5853 net: Fix percpu counters deadlock
When we converted the protocol atomic counters such as the orphan
count and the total socket count deadlocks were introduced due to
the mismatch in BH status of the spots that used the percpu counter
operations.

Based on the diagnosis and patch by Peter Zijlstra, this patch
fixes these issues by disabling BH where we may be in process
context.

Reported-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-29 23:04:08 -08:00
Rusty Russell
0f23174aa8 cpumask: prepare for iterators to only go to nr_cpu_ids/nr_cpumask_bits: net
In future all cpumask ops will only be valid (in general) for bit
numbers < nr_cpu_ids.  So use that instead of NR_CPUS in iterators
and other comparisons.

This is always safe: no cpu number can be >= nr_cpu_ids, and
nr_cpu_ids is initialized to NR_CPUS at boot.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-29 22:44:47 -08:00
Li Zefan
68ce9c0e34 cls_cgroup: clean up Kconfig
cls_cgroup can't be compiled as a module, since it's not supported by
cgroup.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-29 19:40:46 -08:00
Li Zefan
8e8ba85417 cls_cgroup: clean up for cgroup part
- It's better to use container_of() instead of casting cgroup_subsys_state *
  to cgroup_cls_state *.
- Add helper function task_cls_state().
- Rename net_cls_state() to cgrp_cls_state().

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-29 19:40:45 -08:00
Li Zefan
2f068bf871 cls_cgroup: fix an oops when removing a cgroup
When removing a cgroup, an oops was triggered immediately. The cause
is wrong kfree() in cgrp_destroy().

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-29 19:40:44 -08:00
Simon Horman
68888d1053 IPVS: Make "no destination available" message more consistent between schedulers
Acked-by: Graeme Fowler <graeme@graemef.net>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-29 18:37:36 -08:00
Eric W. Biederman
8eb7986396 netns: foreach_netdev_safe is insufficient in default_device_exit
During network namespace teardown we either move or delete
all of the network devices associated with a network namespace.
In the case of veth devices deleting one will also delete it's
pair device.  If both devices are in the same network namespace
then for_each_netdev_safe is insufficient as next may point
to the second veth device we have deleted.

To avoid problems I do what we do in __rtnl_kill_links and
restart the scan of the device list, after we have deleted
a device.

Currently dev_change_netnamespace does not appear to suffer from
this problem, but wireless devices are also paired and likely
should be moved between network namespaces together.  So I have
errored on the side of caution and restart the scan of the network
devices in that case as well.

Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-29 18:21:48 -08:00
Rusty Russell
91b208c7c1 net: make xfrm_statistics_seq_show use generic snmp_fold_field
No reason to roll our own here.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-29 18:20:06 -08:00