Commit Graph

20022 Commits

Author SHA1 Message Date
stephen hemminger
0c03150e7e bridge: send proper message_age in config BPDU
A bridge topology with three systems:

      +------+  +------+
      | A(2) |--| B(1) |
      +------+  +------+
           \    /
          +------+
          | C(3) |
          +------+

What is supposed to happen:
 * bridge with the lowest ID is elected root (for example: B)
 * C detects that A->C is higher cost path and puts in blocking state

What happens. Bridge with lowest id (B) is elected correctly as
root and things start out fine initially. But then config BPDU
doesn't get transmitted from A -> C. Because of that
the link from A-C is transistioned to the forwarding state.

The root cause of this is that the configuration messages
is generated with bogus message age, and dropped before
sending.

In the standardmessage_age is supposed to be:
  the time since the generation of the Configuration BPDU by
  the Root that instigated the generation of this Configuration BPDU.

Reimplement this by recording the timestamp (age + jiffies) when
recording config information. The old code incorrectly used the time
elapsed on the ageing timer which was incorrect.

See also:
  https://bugzilla.vyatta.com/show_bug.cgi?id=7164

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-22 17:01:12 -07:00
David S. Miller
415b3334a2 icmp: Fix regression in nexthop resolution during replies.
icmp_route_lookup() uses the wrong flow parameters if the reverse
session route lookup isn't used.

So do not commit to the re-decoded flow until we actually make a
final decision to use a real route saved in 'rt2'.

Reported-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-22 06:22:10 -07:00
Bill Sommerfeld
d9be4f7a6f ipv4: Constrain UFO fragment sizes to multiples of 8 bytes
Because the ip fragment offset field counts 8-byte chunks, ip
fragments other than the last must contain a multiple of 8 bytes of
payload.  ip_ufo_append_data wasn't respecting this constraint and,
depending on the MTU and ip option sizes, could create malformed
non-final fragments.

Google-Bug-Id: 5009328
Signed-off-by: Bill Sommerfeld <wsommerfeld@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-21 21:31:41 -07:00
Eric Dumazet
87c48fa3b4 ipv6: make fragment identifications less predictable
IPv6 fragment identification generation is way beyond what we use for
IPv4 : It uses a single generator. Its not scalable and allows DOS
attacks.

Now inetpeer is IPv6 aware, we can use it to provide a more secure and
scalable frag ident generator (per destination, instead of system wide)

This patch :
1) defines a new secure_ipv6_id() helper
2) extends inet_getid() to provide 32bit results
3) extends ipv6_select_ident() with a new dest parameter

Reported-by: Fernando Gont <fernando@gont.com.ar>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-21 21:25:58 -07:00
Eric Dumazet
21efcfa0ff ipv6: unshare inetpeers
We currently cow metrics a bit too soon in IPv6 case : All routes are
tied to a single inetpeer entry.

Change ip6_rt_copy() to get destination address as second argument, so
that we fill rt6i_dst before the dst_copy_metrics() call.

icmp6_dst_alloc() must set rt6i_dst before calling dst_metric_set(), or
else the cow is done while rt6i_dst is still NULL.

If orig route points to readonly metrics, we can share the pointer
instead of performing the memory allocation and copy.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-21 21:24:25 -07:00
Ben Hutchings
67ae7cf1ee ethtool: Allow zero-length register dumps again
Some drivers (ab)use the ethtool_ops::get_regs operation to expose
only a hardware revision ID.  Commit
a77f5db361 ('ethtool: Allocate register
dump buffer with vmalloc()') had the side-effect of breaking these, as
vmalloc() returns a null pointer for size=0 whereas kmalloc() did not.

For backward-compatibility, allow zero-length dumps again.

Reported-by: Kalle Valo <kvalo@qca.qualcomm.com>
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Cc: stable@kernel.org [2.6.37+]
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-21 15:25:30 -07:00
Dan Carpenter
1511022c9a skbuff: fix error handling in pskb_copy()
There are two problems:
1) "n" was allocated with alloc_skb() so we should free it with
   kfree_skb() instead of regular kfree().
2) We return the freed pointer instead of NULL.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-21 14:47:54 -07:00
Jiri Pirko
536d1d4a07 vlan: move vlan_group_[gs]et_device to public header
there are no users outside vlan code

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-21 13:47:58 -07:00
Jiri Pirko
7890a5b9cb vlan: kill ndo_vlan_rx_register
has no users so remove it

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-21 13:47:58 -07:00
Jiri Pirko
ffcf9b7672 vlan: kill vlan_gro_frags and vlan_gro_receive
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-21 13:47:57 -07:00
Jiri Pirko
a4aeb26628 vlan: kill __vlan_hwaccel_rx and vlan_hwaccel_rx
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-21 13:47:56 -07:00
Jiri Pirko
9fea03302a lro: do vlan cleanup
- remove useless vlan parameters and pointers

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-21 13:47:54 -07:00
Jiri Pirko
0f7257281d lro: kill lro_vlan_hwaccel_receive_frags
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-21 13:47:54 -07:00
Jiri Pirko
7756a96e19 lro: kill lro_vlan_hwaccel_receive_skb
no longer used

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-21 13:47:54 -07:00
Jiri Pirko
cec9c13363 vlan: introduce __vlan_find_dev_deep()
Since vlan_group_get_device and vlan_group is not going to be accessible
from device drivers, introduce function which substitutes it.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-21 13:47:53 -07:00
David S. Miller
033b1142f4 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	net/bluetooth/l2cap_core.c
2011-07-21 13:38:42 -07:00
David S. Miller
f5caadbb3d Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6 2011-07-21 12:39:35 -07:00
Jozsef Kadlecsik
89dc79b787 netfilter: ipset: hash:net,iface fixed to handle overlapping nets behind different interfaces
If overlapping networks with different interfaces was added to
the set, the type did not handle it properly. Example

    ipset create test hash:net,iface
    ipset add test 192.168.0.0/16,eth0
    ipset add test 192.168.0.0/24,eth1

Now, if a packet was sent from 192.168.0.0/24,eth0, the type returned
a match.

In the patch the algorithm is fixed in order to correctly handle
overlapping networks.

Limitation: the same network cannot be stored with more than 64 different
interfaces in a single set.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-07-21 12:06:18 +02:00
Linus Torvalds
e6625fa48e Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  ceph: fix file mode calculation
2011-07-19 22:10:28 -07:00
Sage Weil
38be7a79f7 ceph: fix file mode calculation
open(2) must always include one of O_RDONLY, O_WRONLY, or O_RDWR.  No need
for any O_APPEND special case.

Passing O_WRONLY|O_RDWR is undefined according to the man page, but the
Linux VFS interprets this as O_RDWR, so we'll do the same.

This fixes open(2) with flags O_RDWR|O_APPEND, which was incorrectly being
translated to readonly.

Reported-by: Fyodor Ustinov <ufm@ufm.su>
Signed-off-by: Sage Weil <sage@newdream.net>
2011-07-19 11:25:04 -07:00
Florian Westphal
97d32cf944 netfilter: nfnetlink_queue: batch verdict support
Introduces a new nfnetlink type that applies a given
verdict to all queued packets with an id <= the id in the verdict
message.

If a mark is provided it is applied to all matched packets.

This reduces the number of verdicts that have to be sent.
Applications that make use of this feature need to maintain
a timeout to send a batchverdict periodically to avoid starvation.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-07-19 11:46:33 +02:00
Eric Dumazet
5863702a34 netfilter: nfnetlink_queue: assert monotonic packet ids
Packet identifier is currently setup in nfqnl_build_packet_message(),
using one atomic_inc_return().

Problem is that since several cpus might concurrently call
nfqnl_enqueue_packet() for the same queue, we can deliver packets to
consumer in non monotonic way (packet N+1 being delivered after packet
N)

This patch moves the packet id setup from nfqnl_build_packet_message()
to nfqnl_enqueue_packet() to guarantee correct delivery order.

This also removes one atomic operation.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Florian Westphal <fw@strlen.de>
CC: Pablo Neira Ayuso <pablo@netfilter.org>
CC: Eric Leblond <eric@regit.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-07-19 11:44:17 +02:00
Eric Dumazet
5c74501f76 ipv4: save cpu cycles from check_leaf()
Compiler is not smart enough to avoid double BSWAP instructions in
ntohl(inet_make_mask(plen)).

Lets cache this value in struct leaf_info, (fill a hole on 64bit arches)

With route cache disabled, this saves ~2% of cpu in udpflood bench on
x86_64 machine.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-18 10:41:18 -07:00
Eric Dumazet
84a797dd0b netfilter: nfnetlink_queue: provide rcu enabled callbacks
nenetlink_queue operations on SMP are not efficent if several queues are
used, because of nfnl_mutex contention when applications give packet
verdict.

Use new call_rcu field in struct nfnl_callback to advertize a callback
that is called under rcu_read_lock instead of nfnl_mutex.

On my 2x4x2 machine, I was able to reach 2.000.000 pps going through
user land returning NF_ACCEPT verdicts without losses, instead of less
than 500.000 pps before patch.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Florian Westphal <fw@strlen.de>
CC: Eric Leblond <eric@regit.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-07-18 16:08:27 +02:00
Eric Dumazet
6b75e3e8d6 netfilter: nfnetlink: add RCU in nfnetlink_rcv_msg()
Goal of this patch is to permit nfnetlink providers not mandate
nfnl_mutex being held while nfnetlink_rcv_msg() calls them.

If struct nfnl_callback contains a non NULL call_rcu(), then
nfnetlink_rcv_msg() will use it instead of call() field, holding
rcu_read_lock instead of nfnl_mutex

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Florian Westphal <fw@strlen.de>
CC: Eric Leblond <eric@regit.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-07-18 16:08:07 +02:00
David S. Miller
d3aaeb38c4 net: Add ->neigh_lookup() operation to dst_ops
In the future dst entries will be neigh-less.  In that environment we
need to have an easy transition point for current users of
dst->neighbour outside of the packet output fast path.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-18 00:40:17 -07:00
David S. Miller
69cce1d140 net: Abstract dst->neighbour accesses behind helpers.
dst_{get,set}_neighbour()

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-17 23:11:35 -07:00
David S. Miller
9cbb7ecbcf ipv6: Get rid of rt6i_nexthop macro.
It just makes it harder to see 1) what the code is doing
and 2) grep for all users of dst{->,.}neighbour

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-17 23:11:35 -07:00
David S. Miller
8f40b161de neigh: Pass neighbour entry to output ops.
This will get us closer to being able to do "neigh stuff"
completely independent of the underlying dst_entry for
protocols (ipv4/ipv6) that wish to do so.

We will also be able to make dst entries neigh-less.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-17 23:11:17 -07:00
David Lamparter
69ecca86da net: vlan, qlcnic: make vlan_find_dev private
there is only one user of vlan_find_dev outside of the actual vlan code:
qlcnic uses it to iterate over some VLANs it knows.

let's just make vlan_find_dev private to the VLAN code and have the
iteration in qlcnic be a bit more direct. (a few rcu dereferences less
too)

Signed-off-by: David Lamparter <equinox@diac24.net>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Amit Kumar Salecha <amit.salecha@qlogic.com>
Cc: Anirban Chakraborty <anirban.chakraborty@qlogic.com>
Cc: linux-driver@qlogic.com
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-17 12:33:22 -07:00
David S. Miller
542d4d685f neigh: Kill ndisc_ops->queue_xmit
It is always dev_queue_xmit().

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-16 18:30:59 -07:00
David S. Miller
b23b5455b6 neigh: Kill hh_cache->hh_output
It's just taking on one of two possible values, either
neigh_ops->output or dev_queue_xmit().  And this is purely depending
upon whether nud_state has NUD_CONNECTED set or not.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-16 17:45:02 -07:00
David S. Miller
47ec132a40 neigh: Kill neigh_ops->hh_output
It's always dev_queue_xmit().

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-16 17:39:57 -07:00
David S. Miller
0895b08ade neigh: Simply destroy handling wrt. hh_cache.
Now that hh_cache entries are embedded inside of neighbour
entries, their lifetimes and accesses are now synchronous
to that of the encompassing neighbour object.

Therefore we don't need to hook up the blackhole op to
hh_output on destroy.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-16 17:36:53 -07:00
David S. Miller
05e3aa0949 net: Create and use new helper, neigh_output().
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-16 17:26:00 -07:00
David S. Miller
a29282972c ipv6: Use calculated 'neigh' instead of re-evaluating dst->neighbour
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-16 14:30:47 -07:00
David S. Miller
fec8292d9c ipv4: Use calculated 'neigh' instead of re-evaluating dst->neighbour
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-16 14:25:54 -07:00
Ilia Kolomisnky
05e9a2f678 Bluetooth: Fix crash with incoming L2CAP connections
Another regression fix considering incomming l2cap connections with
defer_setup enabled. In situations when incomming connection is
extracted with l2cap_sock_accept, it's bt_sock info will have
'parent' member zerroed, but 'parent' may be used unconditionally
in l2cap_conn_start() and l2cap_security_cfm() when defer_setup
is enabled.

Backtrace:
[<bf02d5ac>] (l2cap_security_cfm+0x0/0x2ac [bluetooth]) from [<bf01f01c>] (hci_event_pac
ket+0xc2c/0x4aa4 [bluetooth])
[<bf01e3f0>] (hci_event_packet+0x0/0x4aa4 [bluetooth]) from [<bf01a844>] (hci_rx_task+0x
cc/0x27c [bluetooth])
[<bf01a778>] (hci_rx_task+0x0/0x27c [bluetooth]) from [<c008eee4>] (tasklet_action+0xa0/
0x15c)
[<c008ee44>] (tasklet_action+0x0/0x15c) from [<c008f38c>] (__do_softirq+0x98/0x130)
 r7:00000101 r6:00000018 r5:00000001 r4:efc46000
[<c008f2f4>] (__do_softirq+0x0/0x130) from [<c008f524>] (do_softirq+0x4c/0x58)
[<c008f4d8>] (do_softirq+0x0/0x58) from [<c008f5e0>] (run_ksoftirqd+0xb0/0x1b4)
 r4:efc46000 r3:00000001
[<c008f530>] (run_ksoftirqd+0x0/0x1b4) from [<c009f2a8>] (kthread+0x84/0x8c)
 r7:00000000 r6:c008f530 r5:efc47fc4 r4:efc41f08
[<c009f224>] (kthread+0x0/0x8c) from [<c008cc84>] (do_exit+0x0/0x5f0)

Signed-off-by: Ilia Kolomisnky <iliak@ti.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-16 10:14:44 -07:00
Gustavo F. Padovan
9191e6ad89 Bluetooth: Fix regression in L2CAP connection procedure
Caused by the following commit, partially revert it.

commit 9fa7e4f76f
Author: Gustavo F. Padovan <padovan@profusion.mobi>
Date:   Thu Jun 30 16:11:30 2011 -0300

    Bluetooth: Fix regression with incoming L2CAP connections

    PTS test A2DP/SRC/SRC_SET/TC_SRC_SET_BV_02_I revealed that
    ( probably after the df3c3931e commit ) the l2cap connection
    could not be established in case when the "Auth Complete" HCI
    event does not arive before the initiator send "Configuration
    request", in which case l2cap replies with "Command rejected"
    since the channel is still in BT_CONNECT2 state.

Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-16 10:14:44 -07:00
David S. Miller
25009a1ae1 Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2011-07-15 08:51:13 -07:00
Krishna Kumar
f0c50c7c9a Remove redundant variable/code in __qdisc_run
Remove redundant variable "work".

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-15 08:08:26 -07:00
John W. Linville
95a943c162 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 into for-davem
Conflicts:
	net/bluetooth/l2cap_core.c
2011-07-15 10:05:24 -04:00
Michał Mirosław
62f2a3a48b net: remove NETIF_F_ALL_TX_OFFLOADS
There is no software fallback implemented for SCTP or FCoE checksumming,
and so it should not be passed on by software devices like bridge or bonding.

For VLAN devices, this is different. First, the driver for underlying device
should be prepared to get offloaded packets even when the feature is disabled
(especially if it advertises it in vlan_features). Second, devices under
VLANs do not get replaced without tearing down the VLAN first.

This fixes a mess I accidentally introduced while converting bonding to
ndo_fix_features.

NETIF_F_SOFT_FEATURES are removed from BOND_VLAN_FEATURES because they
are unused as of commit 712ae51afd.

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-14 15:18:49 -07:00
Michał Mirosław
b73c43f884 net: sctp: fix checksum marking for outgoing packets
Packets to devices without NETIF_F_SCTP_CSUM (including NETIF_F_NO_CSUM)
should be properly checksummed because the packets can be diverted or
rerouted after construction. This still leaves packets diverted from
NETIF_F_SCTP_CSUM-enabled devices with broken checksums. Fixing this
needs implementing software offload fallback in networking core.

For users of sctp_checksum_disable, skb->ip_summed should be left as
CHECKSUM_NONE and not CHECKSUM_UNNECESSARY as per include/linux/skbuff.h.

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-14 15:16:31 -07:00
Michał Mirosław
e20e694073 net: remove SK_ROUTE_CAPS from meta ematch
Remove it, as it indirectly exposes netdev features. It's not used in
iproute2 (2.6.38) - is anything else using its interface?

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-14 14:45:59 -07:00
Michał Mirosław
974151e611 net: remove /sys/class/net/*/features
The same information and more can be obtained by using ethtool
with ETHTOOL_GFEATURES.

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-14 14:45:15 -07:00
Michał Mirosław
fec30c3381 net: unexport netdev_fix_features()
It is not used anywhere except net/core/dev.c now.

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-14 14:44:32 -07:00
Michał Mirosław
1180e7d659 net: cleanup vlan_features setting in register_netdev
vlan_features contains features inherited from underlying device.
NETIF_SOFT_FEATURES are not inherited but belong to the vlan device
itself (ensured in vlan_dev_fix_features()).

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-14 14:41:11 -07:00
Michał Mirosław
6c9c1b5456 net: vlan: remove reduntant check in ndo_fix_features callback
Use the fact that ORing with zero is a no-op.

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-14 14:39:29 -07:00
Chetan Loke
cc9f01b246 af-packet: fix - avoid reading stale data
Currently we flush tp_status and then flush the remainder of the header+payload.
tp_status should be flushed in the end to avoid stale data being read by user-space.

Incorrectly re-ordered barriers in v1.

Signed-off-by: Chetan Loke <loke.chetan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-14 08:36:33 -07:00