Commit Graph

234476 Commits

Author SHA1 Message Date
Eric Dumazet
d276055c4e net_sched: reduce fifo qdisc size
Because of various alignments [SLUB / qdisc], we use 512 bytes of
memory for one {p|b}fifo qdisc, instead of 256 bytes on 64bit arches and
192 bytes on 32bit ones.

Move the "u32 limit" inside "struct Qdisc" (no impact on other qdiscs)

Change qdisc_alloc(), first trying a regular allocation before an
oversized one.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-03 11:10:02 -08:00
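The "regular allocation first, oversized one as fallback" idea in d276055c4e can be sketched in userspace C. This is only an illustration under stated assumptions, not the kernel's qdisc_alloc(): DEMO_ALIGNTO, struct demo_obj, demo_alloc() and demo_free() are hypothetical names, and calloc() stands in for the kernel allocator.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_ALIGNTO 32u  /* hypothetical alignment requirement (power of two) */

struct demo_obj {
    unsigned int padded;  /* bytes skipped at the start of the raw allocation */
    unsigned int limit;   /* small per-object field kept inside the struct */
    char payload[48];
};

/* Round an address up to the next DEMO_ALIGNTO boundary. */
static uintptr_t demo_align(uintptr_t addr)
{
    return (addr + DEMO_ALIGNTO - 1) & ~(uintptr_t)(DEMO_ALIGNTO - 1);
}

struct demo_obj *demo_alloc(void)
{
    struct demo_obj *obj;
    void *p;

    assert((DEMO_ALIGNTO & (DEMO_ALIGNTO - 1)) == 0);

    /* First try: exact size. If the allocator already returns aligned
     * memory, no padding is wasted (padded stays 0 thanks to calloc). */
    p = calloc(1, sizeof(struct demo_obj));
    if (!p)
        return NULL;
    if (demo_align((uintptr_t)p) == (uintptr_t)p)
        return p;

    /* Fallback: ask for extra room and align by hand. */
    free(p);
    p = calloc(1, sizeof(struct demo_obj) + DEMO_ALIGNTO - 1);
    if (!p)
        return NULL;
    obj = (struct demo_obj *)demo_align((uintptr_t)p);
    obj->padded = (unsigned int)((uintptr_t)obj - (uintptr_t)p);
    return obj;
}

void demo_free(struct demo_obj *obj)
{
    /* Step back over the padding to recover the pointer calloc returned. */
    if (obj)
        free((char *)obj - obj->padded);
}
```

The point of the two-step approach is that the oversized request is only paid for when the first attempt happens to come back misaligned.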
Patrick McHardy
c53fa1ed92 netlink: kill loginuid/sessionid/sid members from struct netlink_skb_parms
Netlink message processing in the kernel is synchronous these days, so the
session information can be collected when needed.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-03 10:55:40 -08:00
David S. Miller
06dc94b1ed ipv4: Fix crash in dst_release when udp_sendmsg route lookup fails.
As reported by Eric:

[11483.697233] IP: [<c12b0638>] dst_release+0x18/0x60
 ...
[11483.697741] Call Trace:
[11483.697764]  [<c12fc9d2>] udp_sendmsg+0x282/0x6e0
[11483.697790]  [<c12a1c01>] ? memcpy_toiovec+0x51/0x70
[11483.697818]  [<c12dbd90>] ? ip_generic_getfrag+0x0/0xb0

The pointer passed to dst_release() is -EINVAL because we accidentally
leave an error pointer in the local variable "rt".

NULL it out to fix the bug.

Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-03 10:38:01 -08:00
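The bug class fixed by 06dc94b1ed (an error-encoded pointer reaching a release function on a shared exit path) can be reproduced in a small userspace sketch. Everything prefixed demo_ is hypothetical; the helpers only mimic the spirit of the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() and are not the kernel definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_ERRNO 4095

/* Encode a negative errno in a pointer value, and decode it again. */
static inline void *demo_err_ptr(long err) { return (void *)(intptr_t)err; }
static inline long demo_ptr_err(const void *p) { return (long)(intptr_t)p; }
static inline int demo_is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-DEMO_MAX_ERRNO;
}

struct demo_route { int refcnt; };

/* Hypothetical lookup: returns an encoded -EINVAL when told to fail. */
static struct demo_route *demo_lookup(struct demo_route *table, int fail)
{
    if (fail)
        return demo_err_ptr(-EINVAL);
    table->refcnt++;
    return table;
}

/* Release accepts NULL, but an error pointer here would be dereferenced. */
static void demo_release(struct demo_route *rt)
{
    assert(!demo_is_err(rt));  /* an error pointer reaching here is the bug */
    if (rt)
        rt->refcnt--;
}

int demo_send(struct demo_route *table, int fail)
{
    struct demo_route *rt = demo_lookup(table, fail);
    int err = 0;

    if (demo_is_err(rt)) {
        err = (int)demo_ptr_err(rt);
        rt = NULL;  /* the fix: do not leave the error pointer in rt */
        goto out;
    }
    /* ... transmit using rt ... */
out:
    demo_release(rt);  /* shared exit path, safe for NULL */
    return err;
}
```

Removing the `rt = NULL;` line makes the assertion in demo_release() fire on the failure path, which is the userspace analogue of the crash in the trace above.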
Dimitris Michailidis
1558310d49 cxgb{3,4}*: improve Kconfig dependencies
- Remove the dependency of cxgb4 and cxgb4vf on INET.  cxgb3 really
  depends on INET, keep it but add it directly to the driver's Kconfig
  entry.
- Make the iSCSI drivers cxgb3i and cxgb4i available in the SCSI menu
  without requiring any options in the net driver menu to be enabled
  first.  Add needed selects so the iSCSI drivers can build their
  corresponding net drivers.
- Remove CHELSIO_T*_DEPENDS.

Signed-off-by: Dimitris Michailidis <dm@chelsio.com>
Acked-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-02 22:22:51 -08:00
Shmulik Ravid
dc6ed1df5a dcbnl: add support for retrieving peer configuration - cee
This patch adds the support for retrieving the remote or peer DCBX
configuration via dcbnl for embedded DCBX stacks supporting the CEE DCBX
standard.

Signed-off-by: Shmulik Ravid <shmulikr@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-02 21:58:55 -08:00
Shmulik Ravid
eed84713bc dcbnl: add support for retrieving peer configuration - ieee
These 2 patches add the support for retrieving the remote or peer DCBX
configuration via dcbnl for embedded DCBX stacks. The peer configuration
is part of the DCBX MIB and is useful for debugging and diagnostics of
the overall DCB configuration. The first patch adds this support for the IEEE
802.1Qaz standard; the second patch adds the same support for the older
CEE standard. Diff for v2 - the peer-app-info is CEE specific.

Signed-off-by: Shmulik Ravid <shmulikr@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-02 21:58:54 -08:00
Vlad Dogaru
23b41168fc netdevice: make initial group visible to userspace
INIT_NETDEV_GROUP is needed by userspace, move it outside __KERNEL__
guards.

Signed-off-by: Vlad Dogaru <ddvlad@rosedu.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-02 21:55:52 -08:00
David S. Miller
5bfa787fb2 ipv4: ip_route_output_key() is better as an inline.
This avoids a stack frame at zero cost.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-02 14:56:30 -08:00
David S. Miller
b23dd4fe42 ipv4: Make output route lookup return rtable directly.
Instead of on the stack.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-02 14:31:35 -08:00
David S. Miller
452edd598f xfrm: Return dst directly from xfrm_lookup()
Instead of on the stack.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-02 13:27:41 -08:00
David S. Miller
3872b28408 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6
2011-03-02 11:30:24 -08:00
Herbert Xu
07df5294a7 inet: Replace left-over references to inet->cork
The patch to replace inet->cork with cork left out two spots in
__ip_append_data that can result in bogus packet construction.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-01 23:00:58 -08:00
Stephen Hemminger
7f6daa635c pfkey: fix warning
If CONFIG_NET_KEY_MIGRATE is not defined, the arguments of the
pfkey_migrate stub do not match, causing a warning.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-01 22:51:52 -08:00
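A common way to keep a config-dependent stub from drifting out of sync with the real function, as happened with the pfkey_migrate stub in 7f6daa635c, is to declare the prototype once outside the #ifdef. The sketch below uses hypothetical names throughout (DEMO_CONFIG_MIGRATE, demo_migrate, demo_handler_t); it is not the af_key code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct demo_sock;  /* opaque, hypothetical */
struct demo_msg;   /* opaque, hypothetical */

/* Handler type used by the dispatch table. */
typedef int (*demo_handler_t)(struct demo_sock *sk,
                              const struct demo_msg *msg,
                              void *const *ext_hdrs);

/* One prototype shared by both variants: if either definition below
 * drifts, the compiler reports conflicting types instead of letting
 * the mismatch surface only in one configuration. */
static int demo_migrate(struct demo_sock *sk, const struct demo_msg *msg,
                        void *const *ext_hdrs);

#ifdef DEMO_CONFIG_MIGRATE
static int demo_migrate(struct demo_sock *sk, const struct demo_msg *msg,
                        void *const *ext_hdrs)
{
    (void)sk; (void)msg; (void)ext_hdrs;
    return 0;  /* the real work would go here */
}
#else
static int demo_migrate(struct demo_sock *sk, const struct demo_msg *msg,
                        void *const *ext_hdrs)
{
    (void)sk; (void)msg; (void)ext_hdrs;
    return -ENOPROTOOPT;  /* feature compiled out */
}
#endif

static const demo_handler_t demo_handlers[] = { demo_migrate };

int demo_dispatch(size_t type)
{
    if (type >= sizeof(demo_handlers) / sizeof(demo_handlers[0]))
        return -EINVAL;
    assert(demo_handlers[type] != NULL);
    return demo_handlers[type](NULL, NULL, NULL);
}
```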
David S. Miller
b42835dbe8 ipv6: Make icmp route lookup code a bit clearer.
The route lookup code in icmpv6_send() is slightly tricky as a result of
having to handle all of the requirements of RFC 4301 host relookups.

Pull the route resolution into a separate function, so that the error
handling and route reference counting is hopefully easier to see and
contained wholly within this new routine.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-01 22:07:37 -08:00
David S. Miller
f6d460cf0e ipv4: Make icmp route lookup code a bit clearer.
The route lookup code in icmp_send() is slightly tricky as a result of
having to handle all of the requirements of RFC 4301 host relookups.

Pull the route resolution into a separate function, so that the error
handling and route reference counting is hopefully easier to see and
contained wholly within this new routine.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-01 15:49:55 -08:00
David S. Miller
2774c131b1 xfrm: Handle blackhole route creation via afinfo.
That way we don't have to potentially do this in every xfrm_lookup()
caller.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-01 14:59:04 -08:00
David S. Miller
69ead7afdf ipv6: Normalize arguments to ip6_dst_blackhole().
Return a dst pointer which is potentially error encoded.

Don't pass original dst pointer by reference, pass a struct net
instead of a socket, and elide the flow argument since it is
unnecessary.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-01 14:45:33 -08:00
David S. Miller
80c0bc9e37 xfrm: Kill XFRM_LOOKUP_WAIT flag.
This can be determined from the flow flags instead.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-01 14:36:37 -08:00
David S. Miller
a1414715f0 ipv6: Change final dst lookup arg name to "can_sleep"
Since it indicates whether we are invoked from a sleepable
context or not.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-01 14:32:04 -08:00
David S. Miller
273447b352 ipv4: Kill can_sleep arg to ip_route_output_flow()
This boolean state is now available in the flow flags.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-01 14:27:04 -08:00
David S. Miller
5df65e5567 net: Add FLOWI_FLAG_CAN_SLEEP.
And set it in contexts where the route resolution can sleep.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-01 14:22:19 -08:00
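The can_sleep series around 5df65e5567 moves a per-call boolean into the flow key's flags so that every layer sees it without an extra argument. A minimal userspace sketch of that shape follows. The struct layout, the flag value, and the rule that destination 0 "needs to wait" are all assumptions made for illustration, not the kernel's struct flowi or its lookup logic.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_FLOW_FLAG_CAN_SLEEP 0x01u  /* hypothetical bit value */

struct demo_flow {
    uint32_t daddr;
    uint8_t  flags;  /* per-flow flags travel with the key */
};

/* Pretend that resolving destination 0 requires waiting for a key manager. */
static int demo_needs_wait(const struct demo_flow *fl)
{
    return fl->daddr == 0;
}

/* No separate can_sleep argument: the callee consults the flow flags. */
int demo_route_lookup(const struct demo_flow *fl)
{
    assert(fl != NULL);

    if (!demo_needs_wait(fl))
        return 0;
    if (fl->flags & DEMO_FLOW_FLAG_CAN_SLEEP)
        return 0;    /* a real implementation would block here */
    return -EAGAIN;  /* atomic context: cannot wait, report it */
}
```

A caller in a sleepable context sets the flag once when it fills in the flow, and every function the flow is handed to can then make the same decision.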
David S. Miller
420d44daa7 ipv4: Make final arg to ip_route_output_flow to be boolean "can_sleep"
Since that is what the current vague "flags" argument means.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-01 14:19:23 -08:00
David S. Miller
abdf7e7239 ipv4: Change final ip_route_connect() arg to boolean "can_sleep".
Since that's what the current vague "flags" thing means.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-01 14:15:24 -08:00
David S. Miller
68d0c6d34d ipv6: Consolidate route lookup sequences.
Route lookups follow a general pattern in the ipv6 code wherein
we first find the non-IPSEC route, potentially override the
flow destination address due to ipv6 options settings, and then
finally make an IPSEC search using either xfrm_lookup() or
__xfrm_lookup().

__xfrm_lookup() is used when we want to generate a blackhole route
if the key manager needs to resolve the IPSEC rules (in this case
-EREMOTE is returned and the original 'dst' is left unchanged).

Otherwise plain xfrm_lookup() is used and when asynchronous IPSEC
resolution is necessary, we simply fail the lookup completely.

All of these cases are encapsulated into two routines,
ip6_dst_lookup_flow and ip6_sk_dst_lookup_flow.  The latter of which
handles unconnected UDP datagram sockets.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-01 13:19:07 -08:00
Herbert Xu
903ab86d19 udp: Add lockless transmit path
The UDP transmit path has been running under the socket lock
for a long time because of the corking feature.  This means that
transmitting to the same socket in multiple threads does not
scale at all.

However, as most users don't actually use corking, the locking
can be removed in the common case.

This patch creates a lockless fast path where corking is not used.

Please note that this does create a slight inaccuracy in the
enforcement of socket send buffer limits.  In particular, we
may exceed the socket limit by up to (number of CPUs) * (packet
size) because of the way the limit is computed.

As the primary purpose of socket buffers is to indicate congestion,
this should not be a great problem for now.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-01 12:35:42 -08:00
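The send buffer inaccuracy described in 903ab86d19 follows from a check-then-charge race: several CPUs can all pass the limit check against the same stale counter before any of them charges its packet. The deterministic sketch below models that worst case in two phases. The function names and the exact form of the check are assumptions; the kernel's accounting is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Model ncpus senders that all check the limit before any of them
 * charges its packet. Returns the resulting queued byte count. */
size_t demo_racy_charge(size_t queued, size_t limit,
                        size_t ncpus, size_t pkt_size)
{
    size_t admitted = 0;
    size_t cpu;

    assert(pkt_size > 0);

    /* Phase 1: every CPU compares the same stale value with the limit. */
    for (cpu = 0; cpu < ncpus; cpu++)
        if (queued < limit)
            admitted++;

    /* Phase 2: every admitted CPU charges one packet. */
    return queued + admitted * pkt_size;
}

/* Upper bound on the overshoot quoted in the commit message. */
size_t demo_overshoot_bound(size_t ncpus, size_t pkt_size)
{
    return ncpus * pkt_size;
}
```

For example, with a limit of 212992 bytes, 212000 bytes already queued, 4 CPUs and 1500-byte packets, the model ends at 218000 bytes queued: 5008 bytes over the limit, within the 4 * 1500 = 6000 byte bound.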
Herbert Xu
f6b9664f8b udp: Switch to ip_finish_skb
This patch converts UDP to use the new ip_finish_skb API.  This
then allows us to more easily use ip_make_skb, which allows
UDP to run without a socket lock.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-01 12:35:03 -08:00
Herbert Xu
1c32c5ad6f inet: Add ip_make_skb and ip_finish_skb
This patch adds the helper ip_make_skb which is like ip_append_data
and ip_push_pending_frames all rolled into one, except that it does
not send the skb produced.  The sending part is carried out by
ip_send_skb, which the transport protocol can call after it has
tweaked the skb.

It is meant to be called in cases where corking is not used, and should
have a one-to-one correspondence to sendmsg.

This patch also adds the helper ip_finish_skb which is meant to
replace ip_push_pending_frames when corking is required.
Previously the protocol stack would peek at the socket write
queue and add its header to the first packet.  With ip_finish_skb,
the protocol stack can directly operate on the final skb instead,
just like the non-corking case with ip_make_skb.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-01 12:35:03 -08:00
Herbert Xu
1470ddf7f8 inet: Remove explicit write references to sk/inet in ip_append_data
In order to allow simultaneous calls to ip_append_data on the same
socket, it must not modify any shared state in sk or inet (other
than those that are designed to allow that such as atomic counters).

This patch abstracts out write references to sk and inet_sk in
ip_append_data and its friends so that we may use the underlying
code in parallel.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-01 12:35:02 -08:00
Herbert Xu
5a2ef92023 inet: Remove unused sk_sndmsg_* from UFO
UFO doesn't really use the sk_sndmsg_* parameters so touching
them is pointless.  It can't use them anyway since the whole
point of UFO is to use the original pages without copying.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-01 12:35:02 -08:00
David S. Miller
9836f4080f Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/bwh/sfc-next-2.6
2011-03-01 12:24:04 -08:00
Ben Hutchings
6d84b986b2 sfc: Bump version to 3.1
All features originally planned for version 3.1 (and some that
weren't) have been implemented.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
2011-02-28 23:57:24 +00:00
Ben Hutchings
5fb6b06d4e sfc: Remove configurable FIFO thresholds for pause frame generation
In Falcon we can configure the fill levels of the RX data FIFO which
trigger the generation of pause frames (if enabled), and we have
module parameters for this.

Siena does not allow the levels to be configured (or, if it does, this
is done by the MC firmware and is not configurable by drivers).

So far as I can tell, the module parameters are not used by our
internal scripts and have not been documented (with the exception of
the short parameter descriptions).  Therefore, remove them and always
initialise Falcon with the default values.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
2011-02-28 23:57:24 +00:00
Ben Hutchings
119226c563 sfc: Expose TX push and TSO counters through ethtool statistics
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
2011-02-28 23:57:24 +00:00
Ben Hutchings
0a6f40c66b sfc: Update copyright dates
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
2011-02-28 23:57:24 +00:00
Ben Hutchings
a461103ba2 sfc: Do not read STAT1.FAULT in efx_mdio_check_mmd()
This field does not exist in all MMDs we want to check, and all
callers allow it to be set (fault_fatal = 0).

Remove the loopback condition, as STAT2.DEVPRST should be valid
regardless of any fault.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
2011-02-28 23:57:23 +00:00
Ben Hutchings
e5f0fd2780 sfc: Read MC firmware version when requested through ethtool
We currently make no use of siena_nic_data::fw_{version,build} except
to format the firmware version for ethtool_get_drvinfo().  Since we
only read the version at start of day, this information is incorrect
after an MC firmware update.  Remove the cached version information
and read it via MCDI whenever it is requested.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
2011-02-28 23:57:23 +00:00
Steve Hodgson
a526f140b2 sfc: Reduce size of efx_rx_buffer further by removing data member
Instead calculate the KVA of receive data. It's not like it's a hard sum.

[bwh: Fixed to work with GRO.]
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
2011-02-28 23:57:23 +00:00
Steve Hodgson
8ba5366ada sfc: Reduce size of efx_rx_buffer by unionising skb and page
[bwh: Forward-ported to net-next-2.6.]
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
2011-02-28 23:57:23 +00:00
Amerigo Wang
e364a3416d bonding: use the correct size for _simple_hash()
Clearly it should be the size of ->ip_dst here.
Although this is harmless, it still reads oddly.

Signed-off-by: WANG Cong <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-28 13:21:28 -08:00
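The cleanup in e364a3416d is about passing sizeof of the same member whose address is being hashed. A byte-wise XOR hash in the style the commit title suggests could look like the sketch below. The struct and function names are hypothetical, and the reading that the old code was harmless because both members had the same size is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct demo_arp {
    uint32_t ip_src;
    uint32_t ip_dst;
    uint8_t  mac_dst[6];
};

/* XOR all bytes in [start, start + size) into one byte. */
static uint8_t demo_simple_hash(const uint8_t *start, size_t size)
{
    uint8_t hash = 0;
    size_t i;

    assert(start != NULL);
    for (i = 0; i < size; i++)
        hash ^= start[i];
    return hash;
}

uint8_t demo_hash_dst(const struct demo_arp *arp)
{
    /* Take the size from the member being hashed, so the two cannot
     * silently diverge if one field's type changes later. */
    return demo_simple_hash((const uint8_t *)&arp->ip_dst,
                            sizeof(arp->ip_dst));
}
```

Because XOR is order-independent, the result for a given ip_dst does not depend on host byte order.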
Roopa Prabhu
8da83f8e73 enic: Flush driver cache of registered addr lists during port profile disassociate
During a port profile disassociate all address registrations for the interface
are blown away from the adapter. This patch resets the driver cache of
registered address lists to zero after a port profile disassociate.

Signed-off-by: Roopa Prabhu <roprabhu@cisco.com>
Signed-off-by: David Wang <dwang2@cisco.com>
Signed-off-by: Christian Benvenuti <benve@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-28 12:42:18 -08:00
Ben Dooks
85e6b8c5d8 DM9000: Allow randomised ethernet address
Allow randomised ethernet address if the device does not have a valid
EEPROM or pre-set MAC address.

Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-28 12:42:17 -08:00
stephen hemminger
6f2e154b68 qla3xxx: add missing __iomem annotation
Add necessary annotations about pointer to io memory space
that is checked by sparse.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Ron Mercer <ron.mercer@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-28 12:42:16 -08:00
stephen hemminger
4ec952b8ab bonding: fix sparse warning
Fix use of zero where NULL expected. And wrap long line.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-28 12:39:58 -08:00
Anders Berggren
a693e69897 net: TX timestamps for IPv6 UDP packets
Enabling TX timestamps (SO_TIMESTAMPING) for IPv6 UDP packets, in
the same fashion as for IPv4. Necessary in order for NICs such as
Intel 82580 to timestamp IPv6 packets.

Signed-off-by: Anders Berggren <anders@halon.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-28 12:32:11 -08:00
Sergei Shtylyov
eaaa3a7c4d sis900: use pci_dev->revision
This driver uses PCI_CLASS_REVISION instead of PCI_REVISION_ID, so it wasn't
converted by commit 44c10138fd (PCI: Change all
drivers to use pci_device->revision).

Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-28 12:29:34 -08:00
Changli Gao
696ea472e1 llc: avoid skb_clone() if there is only one handler
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-28 12:28:50 -08:00
Shmulik Ravid
9850767201 bnx2x: use dcb_setapp to manage negotiated application tlvs
With this patch the bnx2x uses the generic dcbnl application tlv list
instead of implementing its own get-app handler. When the driver is
alerted to a change in the DCB negotiated parameters, it calls
dcb_setapp to update the dcbnl application tlvs list making it available
to user mode applications and registered notifiers.

Signed-off-by: Shmulik Ravid <shmulikr@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-28 12:19:55 -08:00
Sergei Shtylyov
ff938e43d3 net: use pci_dev->revision, again
Several more network drivers that read the device's revision ID
from the PCI configuration register were merged after the commit
44c10138fd (PCI: Change all drivers
to use pci_device->revision), so it's time to do another pass of
conversion to using the 'revision' field of 'struct pci_dev'...

Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Acked-by: "John W. Linville" <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-28 11:57:33 -08:00
David S. Miller
63d8ea7f93 net: Forgot to commit net/core/dev.c part of Jiri's ->rx_handler patch.
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-28 10:48:59 -08:00
Pablo Neira Ayuso
8a80c79a77 netfilter: nf_ct_tcp: fix out of sync scenario while in SYN_RECV
This patch fixes the out of sync scenarios while in SYN_RECV state.

Quoting Jozsef, what happens if we are out of sync is the
following:

> > b. conntrack entry is outdated, new SYN received
> >    - (b1) we ignore it but save the initialization data from it
> >    - (b2) when the reply SYN/ACK receives and it matches the saved data,
> >      we pick up the new connection
This is what should happen if we are in SYN_RECV state. Initially,
the SYN packet hits b1, thus we save data from it. But the SYN/ACK
packet is considered a retransmission given that we're in SYN_RECV
state. Therefore, we never hit b2 and we don't get in sync. To fix
this, we ignore SYN/ACK if we are in SYN_RECV. If the previous packet
was a SYN, then we enter the ignore case that gets us in sync.

This patch helps conntrackd a lot in stress scenarios (assuming a
client that generates lots of small TCP connections). During the failover,
consider that the new primary has injected one outdated flow in SYN_RECV
state (this is likely to happen if the conntrack event rate is high
because the backup will be a bit delayed from the primary). With the
current code, if the client starts a new fresh connection that matches
the tuple, the SYN packet will be ignored without updating the state
tracking, and the SYN+ACK in reply will be blocked as it will not pass
checks III or IV (since all state tracking in the original direction
is not initialized because the SYN packet was ignored and the ignore
case that gets us in sync is not applied).

I posted a couple of patches before this one. Changli Gao spotted
a simpler way to fix this problem. This patch implements his idea.

Cc: Changli Gao <xiaosuo@gmail.com>
Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-02-28 18:02:33 +01:00