240572 Commits

Author SHA1 Message Date
David S. Miller
436c3b66ec ipv4: Invalidate nexthop cache nh_saddr more correctly.
Any operation that:

1) Brings up an interface
2) Adds an IP address to an interface
3) Deletes an IP address from an interface

can potentially invalidate the nh_saddr value, requiring
it to be recomputed.

Perform the recomputation lazily using a generation ID.

Reported-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-24 17:42:21 -07:00
Randy Dunlap
f7594d4294 net: fix pch_gbe section mismatch warning
Fix section mismatch warning by renaming the pci_driver variable to a
recognized (whitelisted) name.

WARNING: drivers/net/pch_gbe/pch_gbe.o(.data+0x1f8): Section mismatch in reference from the variable pch_gbe_pcidev to the variable .devinit.rodata:pch_gbe_pcidev_id
The variable pch_gbe_pcidev references
the variable __devinitconst pch_gbe_pcidev_id
If the reference is valid then annotate the
variable with __init* or __refdata (see linux/init.h) or name the variable:
*driver, *_template, *_timer, *_sht, *_ops, *_probe, *_probe_one, *_console

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-24 16:16:02 -07:00
Eric Dumazet
fcd13f42c9 ipv4: fix fib metrics
Alessandro Suardi reported that we could not change route metrics :

ip ro change default .... advmss 1400

This regression came with commit 9c150e82ac50 (Allocate fib metrics
dynamically). fib_metrics is no longer an array, but a pointer to an
array.

Reported-by: Alessandro Suardi <alessandro.suardi@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Tested-by: Alessandro Suardi <alessandro.suardi@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-24 11:49:54 -07:00
Yevgeny Petrilin
6f71d7927c mlx4_en: Removing HW info from ethtool -i report.
Avoiding abuse of ethtool_drvinfo.driver field.
HW specific info can be retrieved using lspci.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-24 11:47:51 -07:00
David S. Miller
54a4fe5499 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 2011-03-24 11:43:09 -07:00
Eric Dumazet
ef352e7cdf net_sched: fix THROTTLED/RUNNING race
commit fd245a4adb52 (net_sched: move TCQ_F_THROTTLED flag)
added a race.

qdisc_watchdog() is run from softirq, so special care should be taken or
we can lose one state transition (THROTTLED/RUNNING)

Prior to fd245a4adb52, we were manipulating q->flags (qdisc->flags &=
~TCQ_F_THROTTLED;) and this manipulation could only race with
qdisc_warn_nonwc().

Since we want to avoid atomic ops in qdisc fast path - it was the
meaning of commit 371121057607e (QDISC_STATE_RUNNING dont need atomic
bit ops) - fix is to move THROTTLE bit into 'state' field, this one
being manipulated with SMP and IRQ safe operations.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-24 00:13:14 -07:00
David S. Miller
8c64b4cc02 Merge branch 'sfc-2.6.39' of git://git.kernel.org/pub/scm/linux/kernel/git/bwh/sfc-2.6 2011-03-23 15:56:02 -07:00
Julia Lawall
e6937ee626 drivers/net/a2065.c: Convert release_resource to release_region/release_mem_region
Request_mem_region should be used with release_mem_region, not
release_resource.

The semantic match that finds this problem is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
expression x,E;
@@
*x = request_mem_region(...)
... when != release_mem_region(x)
    when != x = E
* release_resource(x);
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-23 14:10:37 -07:00
Julia Lawall
88e87be6ba drivers/net/ariadne.c: Convert release_resource to release_region/release_mem_region
Request_mem_region should be used with release_mem_region, not
release_resource.

The semantic match that finds this problem is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
expression x,E;
@@
*x = request_mem_region(...)
... when != release_mem_region(x)
    when != x = E
* release_resource(x);
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-23 14:10:36 -07:00
Jiri Pirko
35d48903e9 bonding: fix rx_handler locking
This prevents possible race between bond_enslave and bond_handle_frame
as reported by Nicolas by moving rx_handler register/unregister.
slave->bond is added to hold pointer to master bonding sructure. That
way dev->master is no longer used in bond_handler_frame.
Also, this removes "BUG: scheduling while atomic" message

Reported-by: Nicolas de Pesloüan <nicolas.2p.debian@gmail.com>
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Tested-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-23 12:45:10 -07:00
Stanislaw Gruszka
cda6587c21 myri10ge: fix rmmod crash
Rmmod myri10ge crash at free_netdev() -> netif_napi_del(), because napi
structures are already deallocated. To fix call netif_napi_del() before
kfree() at myri10ge_free_slices().

Cc: stable@kernel.org
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-23 12:31:39 -07:00
Yevgeny Petrilin
61b85bf606 mlx4_en: updated driver version to 1.5.4.1
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-23 12:24:26 -07:00
Yevgeny Petrilin
87a5c3896f mlx4_en: Using blue flame support
Doorbell is used according to usage of BlueFlame.
For Blue Flame to work in Ethernet mode QP number should have 0
at bits 6,7.
Allocating range of QPs accordingly.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-23 12:24:25 -07:00
Eli Cohen
9ace5e0176 mlx4_core: reserve UARs for userspace consumers
Do not allow a kernel consumer to allocate a UAR to serve for blue flame if the
number of available UARs gets below MLX4_NUM_RESERVED_UARS (currently 8). This
will allow userspace apps to open a device file and run things like
ibv_devinfo.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-23 12:24:24 -07:00
Eli Cohen
42d1e017e2 mlx4_core: maintain available field in bitmap allocator
Add mlx4_bitmap_avail() to give the number of available resources. We want to
use this as a hint to whether to allocate a resources or not. This patch is
introduced to be used with allocation blue flame registers.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-23 12:24:24 -07:00
Eli Cohen
c1b43dca13 mlx4: Add blue flame support for kernel consumers
Using blue flame can improve latency by allowing the HW to more efficiently
access the WQE. This patch presents two functions that are used to allocate or
release HW resources for using blue flame; the caller need to supply a struct
mlx4_bf object when allocating resources. Consumers that make use of this API
should post doorbells to the UAR object pointed by the initialized struct
mlx4_bf;

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-23 12:24:23 -07:00
Yevgeny Petrilin
1679200f91 mlx4_en: Enabling new steering
The mlx4_en module now uses the new steering mechanism.
The RX packets are now steered through the MCG table instead
of Mac table for unicast, and default entry for multicast.
The feature is enabled through INIT_HCA

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-23 12:24:22 -07:00
Yevgeny Petrilin
b12d93d63c mlx4: Add support for promiscuous mode in the new steering model.
For Ethernet mode only,
When we want to register QP as promiscuous, it must be added to all the
existing steering entries and also to the default one.
The promiscuous QP might also be on of "real" QPs,
which means we need to monitor every entry to avoid duplicates and ensure
we close an entry when all it has is promiscuous QPs.
Same mechanism both for unicast and multicast.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-23 12:24:22 -07:00
Yevgeny Petrilin
0345584e0b mlx4: generalization of multicast steering.
The same packet steering mechanism would be used both for IB and Ethernet,
Both multicasts and unicasts.
This commit prepares the general infrastructure for this.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-23 12:24:21 -07:00
Yevgeny Petrilin
725c89997e mlx4_en: Reporting HW revision in ethtool -i
HW revision is derived from device ID and rev id.

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.co.il>
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-23 12:24:20 -07:00
Yevgeny Petrilin
14c07b1358 mlx4: Wake on LAN support
The driver queries the FW for WOL support.
Ethtool get/set_wol is implemented accordingly.
Only magic packets are supported at the time.

Signed-off-by: Igor Yarovinsky <igory@mellanox.co.il>
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-23 12:24:19 -07:00
Yevgeny Petrilin
1fb9876e9b mlx4_en: using new mlx4 interrupt scheme
Each RX ring will have its own interrupt vector, and TX rings will share one
(we mostly use polling for TX completions).
The vectors are assigned first time device is opened, and its name includes
the interface name and ring number.

Signed-off-by: Markuze Alex <markuze@mellanox.co.il>
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-23 12:24:18 -07:00
Yevgeny Petrilin
0b7ca5a928 mlx4: Changing interrupt scheme
Adding a pool of MSI-X vectors and EQs that can be used explicitly by mlx4_core
customers (mlx4_ib, mlx4_en). The consumers will assign their own names to the
interrupt vectors. Those vectors are not opened at mlx4 device initialization,
opened by demand.
Changed the max number of possible EQs according to the new scheme, no longer relies on
on number of cores.
The new functionality is exposed through mlx4_assign_eq() and mlx4_release_eq().
Customers that do not use the new API will get completion vectors as before.

Signed-off-by: Markuze Alex <markuze@mellanox.co.il>
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-23 12:24:18 -07:00
Yevgeny Petrilin
908222655b mlx4_en: bringing link up when registering netdevice
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-23 12:24:17 -07:00
Yevgeny Petrilin
46afd0fb01 mlx4_en: optimize adaptive moderation algorithm for better latency
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-23 12:24:16 -07:00
Yevgeny Petrilin
39f17b44aa mlx4_en: moderation parameters are not reseted.
Instead of reseting the module parameters each ifup or mtu change,
they are being set once at device initialization
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-23 12:24:16 -07:00
Yevgeny Petrilin
b6055006b2 mlx4_en: going out of range of TX rings when reporting stats
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-23 12:24:15 -07:00
Senthil Balasubramanian
d78f4b3e2c ath9k: Fix TX queue stuck issue.
commit 86271e460a66003dc1f4cbfd845adafb790b7587 introduced a
regression that caused mac80211 queues in stopped state.

ath_drain_all_txq is called in driver flush which would reset
the stopped flag and the mac80211 queues were never started
after that. iperf traffic is completely stalled due to this issue.

Restart the mac80211 queues in driver flush only if the txqs were
drained.

Signed-off-by: Senthil Balasubramanian <senthilkumar@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-03-23 15:22:05 -04:00
Senthil Balasubramanian
19b9675069 ath9k: Fix kernel panic caused by invalid rate index access.
With the recent tx status optimization in mac80211, we bail out as
and and when invalid rate index is found. So the behavior of resetting
rate idx to -1 and count to 0 has changed for the rate indexes that
were not part of the driver's retry series.

This has resulted in ath9k using incorrect rate table index which
caused the system to panic. Ideally ath9k need to loop only for the
indexes that were part of the retry series and so simply use hw->max_rates
as the loop counter.

Pasted the stack trace of the panic issue for reference.

[  754.093192] BUG: unable to handle kernel paging request at ffff88046a9025b0
[  754.093256] IP: [<ffffffffa02eac49>] ath_tx_status+0x209/0x2f0 [ath9k]
[  754.094888] Call Trace:
[  754.094903]  <IRQ>
[  754.094928]  [<ffffffffa051f883>] ieee80211_tx_status+0x203/0x9e0 [mac80211]
[  754.094975]  [<ffffffffa053e305>] ? __ieee80211_wake_queue+0x125/0x140 [mac80211]
[  754.095017]  [<ffffffffa02e66c9>] ath_tx_complete_buf+0x1b9/0x370 [ath9k]
[  754.095054]  [<ffffffffa02e6fcf>] ath_tx_complete_aggr+0x51f/0xb50 [ath9k]
[  754.095098]  [<ffffffffa05382a3>] ? ieee80211_prepare_and_rx_handle+0x173/0xab0 [mac80211]
[  754.095148]  [<ffffffff81350e62>] ? _raw_spin_unlock_irqrestore+0x32/0x40
[  754.095186]  [<ffffffffa02e9735>] ath_tx_tasklet+0x365/0x4b0 [ath9k]
[  754.095224]  [<ffffffff8107a2a2>] ? clockevents_program_event+0x62/0xa0
[  754.095261]  [<ffffffffa02e2628>] ath9k_tasklet+0x168/0x1c0 [ath9k]
[  754.095298]  [<ffffffff8105599b>] tasklet_action+0x6b/0xe0
[  754.095331]  [<ffffffff81056278>] __do_softirq+0x98/0x120
[  754.095361]  [<ffffffff8100cd5c>] call_softirq+0x1c/0x30
[  754.095393]  [<ffffffff8100efb5>] do_softirq+0x65/0xa0
[  754.095423]  [<ffffffff810563fd>] irq_exit+0x8d/0x90
[  754.095453]  [<ffffffff8100ebc1>] do_IRQ+0x61/0xe0
[  754.095482]  [<ffffffff81351413>] ret_from_intr+0x0/0x15
[  754.095513]  <EOI>
[  754.095531]  [<ffffffff81014375>] ? native_sched_clock+0x15/0x70
[  754.096475]  [<ffffffffa02bcfa6>] ? acpi_idle_enter_bm+0x24d/0x285 [processor]
[  754.096475]  [<ffffffffa02bcf9f>] ? acpi_idle_enter_bm+0x246/0x285 [processor]
[  754.096475]  [<ffffffff8127fab2>] cpuidle_idle_call+0x82/0x100
[  754.096475]  [<ffffffff8100a236>] cpu_idle+0xa6/0xf0
[  754.096475]  [<ffffffff81339bc1>] rest_init+0x91/0xa0
[  754.096475]  [<ffffffff814efccd>] start_kernel+0x3fd/0x408
[  754.096475]  [<ffffffff814ef347>] x86_64_start_reservations+0x132/0x136
[  754.096475]  [<ffffffff814ef451>] x86_64_start_kernel+0x106/0x115
[  754.096475] RIP  [<ffffffffa02eac49>] ath_tx_status+0x209/0x2f0 [ath9k]

Signed-off-by: Senthil Balasubramanian <senthilkumar@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-03-23 15:22:04 -04:00
armadefuego@gmail.com
a3ad38e87e orinoco: Clear dangling pointer on hardware busy
On hardware busy the scan request pointer should be cleared, as higher
levels will release. This avoids a crash when that pointer is
erroneously used later.

Signed-off-by: Joseph J. Gunn <armadefuego@yahoo.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-03-23 15:22:04 -04:00
Johannes Berg
be36cacddd iwlagn: fix error in command waiting
Clearly a mistake, since pointers won't suddenly
change their value...

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-03-23 15:22:04 -04:00
Eric Dumazet
eb49a97363 ipv4: fix ip_rt_update_pmtu()
commit 2c8cec5c10bc (Cache learned PMTU information in inetpeer) added
an extra inet_putpeer() call in ip_rt_update_pmtu().

This results in various problems, since we can free one inetpeer, while
it is still in use.

Ref: http://www.spinics.net/lists/netdev/msg159121.html

Reported-by: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-23 12:18:15 -07:00
David S. Miller
406b6f974d ipv4: Fallback to FIB local table in __ip_dev_find().
In commit 9435eb1cf0b76b323019cebf8d16762a50a12a19
("ipv4: Implement __ip_dev_find using new interface address hash.")
we reimplemented __ip_dev_find() so that it doesn't have to
do a full FIB table lookup.

Instead, it consults a hash table of addresses configured to
interfaces.

This works identically to the old code in all except one case,
and that is for loopback subnets.

The old code would match the loopback device for any IP address
that falls within a subnet configured to the loopback device.

Handle this corner case by doing the FIB lookup.

We could implement this via inet_addr_onlink() but:

1) Someone could configure many addresses to loopback and
   inet_addr_onlink() is a simple list traversal.

2) We know the old code works.

Reported-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-23 12:16:15 -07:00
David S. Miller
f6152737a9 tcp: Make undo_ssthresh arg to tcp_undo_cwr() a bool.
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-22 19:37:11 -07:00
Yuchung Cheng
67d4120a17 tcp: avoid cwnd moderation in undo
In the current undo logic, cwnd is moderated after it was restored
to the value prior entering fast-recovery. It was moderated first
in tcp_try_undo_recovery then again in tcp_complete_cwr.

Since the undo indicates recovery was false, these moderations
are not necessary. If the undo is triggered when most of the
outstanding data have been acknowledged, the (restored) cwnd is
falsely pulled down to a small value.

This patch removes these cwnd moderations if cwnd is undone
  a) during fast-recovery
	b) by receiving DSACKs past fast-recovery

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-22 19:36:08 -07:00
Linus Lüssing
a7bff75b08 bridge: Fix possibly wrong MLD queries' ethernet source address
The ipv6_dev_get_saddr() is currently called with an uninitialized
destination address. Although in tests it usually seemed to nevertheless
always fetch the right source address, there seems to be a possible race
condition.

Therefore this commit changes this, first setting the destination
address and only after that fetching the source address.

Reported-by: Jan Beulich <JBeulich@novell.com>
Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-22 19:26:42 -07:00
Sriram
6a1fef6d00 net: davinci_emac:Fix translation logic for buffer descriptor
With recent changes to the driver(switch to new cpdma layer),
the support for buffer descriptor address translation logic
is broken. This affects platforms where the physical address of
the descriptors as seen by the DMA engine is different from the
physical address.

Original Patch adding translation logic support:
Commit: ad021ae8862209864dc8ebd3b7d3a55ce84b9ea2

Signed-off-by: Sriramakrishnan A G <srk@ti.com>
Tested-By: Sekhar Nori <nsekhar@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-22 19:25:05 -07:00
Florian Westphal
9c7a4f9ce6 ipv6: ip6_route_output does not modify sk parameter, so make it const
This avoids explicit cast to avoid 'discards qualifiers'
compiler warning in a netfilter patch that i've been working on.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-22 19:17:36 -07:00
Steve Hodgson
d88d6b05fe sfc: Siena: Disable write-combining when SR-IOV is enabled
If SR-IOV is enabled by firmware, even if it is not enabled in the PCI
capability, TX pushes using write-combining may be corrupted.

We want to know whether it is enabled before mapping the NIC
registers, and even if PCI extended capabilities are not accessible.
Therefore, we look for the MSI capability, which is removed if SR-IOV
is enabled.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
2011-03-23 01:35:15 +00:00
David S. Miller
db138908cc Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 2011-03-22 14:36:18 -07:00
Julian Anastasov
04024b937a ipv4: optimize route adding on secondary promotion
Optimize the calling of fib_add_ifaddr for all
secondary addresses after the promoted one to start from
their place, not from the new place of the promoted
secondary. It will save some CPU cycles because we
are sure the promoted secondary was first for the subnet
and all next secondaries do not change their place.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-22 01:06:33 -07:00
Julian Anastasov
2d230e2b2c ipv4: remove the routes on secondary promotion
The secondary address promotion relies on fib_sync_down_addr
to remove all routes created for the secondary addresses when
the old primary address is deleted. It does not happen for cases
when the primary address is also in another subnet. Fix that
by deleting local and broadcast routes for all secondaries while
they are on device list and by faking that all addresses from
this subnet are to be deleted. It relies on fib_del_ifaddr being
able to ignore the IPs from the concerned subnet while checking
for duplication.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-22 01:06:33 -07:00
Julian Anastasov
e6abbaa272 ipv4: fix route deletion for IPs on many subnets
Alex Sidorenko reported for problems with local
routes left after IP addresses are deleted. It happens
when same IPs are used in more than one subnet for the
device.

	Fix fib_del_ifaddr to restrict the checks for duplicate
local and broadcast addresses only to the IFAs that use
our primary IFA or another primary IFA with same address.
And we expect the prefsrc to be matched when the routes
are deleted because it is possible they to differ only by
prefsrc. This patch prevents local and broadcast routes
to be leaked until their primary IP is deleted finally
from the box.

	As the secondary address promotion needs to delete
the routes for all secondaries that used the old primary IFA,
add option to ignore these secondaries from the checks and
to assume they are already deleted, so that we can safely
delete the route while these IFAs are still on the device list.

Reported-by: Alex Sidorenko <alexandre.sidorenko@hp.com>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-22 01:06:32 -07:00
Julian Anastasov
74cb3c108b ipv4: match prefsrc when deleting routes
fib_table_delete forgets to match the routes by prefsrc.
Callers can specify known IP in fc_prefsrc and we should remove
the exact route. This is needed for cases when same local or
broadcast addresses are used in different subnets and the
routes differ only in prefsrc. All callers that do not provide
fc_prefsrc will ignore the route prefsrc as before and will
delete the first occurence. That is how the ip route del default
magic works.

	Current callers are:

- ip_rt_ioctl where rtentry_to_fib_config provides fc_prefsrc only
when the provided device name matches IP label with colon.

- inet_rtm_delroute where RTA_PREFSRC is optional too

- fib_magic which deals with routes when deleting addresses
and where the fc_prefsrc is always set with the primary IP
for the concerned IFA.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-22 01:06:31 -07:00
Marc Zyngier
3c0f3c605b NET: smsc95xx: don't use stack for async writes to the device
The set_multicast operation performs asynchronous writes to the
device, with some addresses pointing to the stack. Bad things may
happen, and this is trapped CONFIG_DMA_API_DEBUG:

[    5.237762] WARNING: at /build/buildd/linux-linaro-omap-2.6.38/lib/dma-debug.c:867 check_for_stack+0xd4/0x100()
[    5.237792] ehci-omap ehci-omap.0: DMA-API: device driver maps memory fromstack [addr=d9c77dec]
[    5.237792] Modules linked in: smsc95xx(+) usbnet twl6030_usb twl4030_pwrbutton leds_gpio omap_wdt omap2_mcspi
[    5.237854] [<c006d618>] (unwind_backtrace+0x0/0xf8) from [<c00a6a14>] (warn_slowpath_common+0x54/0x64)
[    5.237884] [<c00a6a14>] (warn_slowpath_common+0x54/0x64) from [<c00a6ab8>] (warn_slowpath_fmt+0x30/0x40)
[    5.237915] [<c00a6ab8>] (warn_slowpath_fmt+0x30/0x40) from [<c034e9d8>] (check_for_stack+0xd4/0x100)
[    5.237915] [<c034e9d8>] (check_for_stack+0xd4/0x100) from [<c034fea8>] (debug_dma_map_page+0xb4/0xdc)
[    5.237976] [<c034fea8>] (debug_dma_map_page+0xb4/0xdc) from [<c04242f0>] (map_urb_for_dma+0x26c/0x304)
[    5.237976] [<c04242f0>] (map_urb_for_dma+0x26c/0x304) from [<c0424594>] (usb_hcd_submit_urb+0x78/0x19c)
[    5.238037] [<c0424594>] (usb_hcd_submit_urb+0x78/0x19c) from [<bf049c5c>] (smsc95xx_write_reg_async+0xb4/0x130 [smsc95xx])
[    5.238067] [<bf049c5c>] (smsc95xx_write_reg_async+0xb4/0x130 [smsc95xx]) from [<bf049dd4>] (smsc95xx_set_multicast+0xfc/0x148 [smsc95xx])
[    5.238098] [<bf049dd4>] (smsc95xx_set_multicast+0xfc/0x148 [smsc95xx]) from [<bf04a118>] (smsc95xx_reset+0x2f8/0x68c [smsc95xx])
[    5.238128] [<bf04a118>] (smsc95xx_reset+0x2f8/0x68c [smsc95xx]) from [<bf04a8cc>] (smsc95xx_bind+0xcc/0x188 [smsc95xx])
[    5.238159] [<bf04a8cc>] (smsc95xx_bind+0xcc/0x188 [smsc95xx]) from [<bf03ef1c>] (usbnet_probe+0x204/0x4c4 [usbnet])
[    5.238220] [<bf03ef1c>] (usbnet_probe+0x204/0x4c4 [usbnet]) from [<c0429078>] (usb_probe_interface+0xe4/0x1c4)
[    5.238250] [<c0429078>] (usb_probe_interface+0xe4/0x1c4) from [<c03a8770>] (really_probe+0x64/0x160)
[    5.238250] [<c03a8770>] (really_probe+0x64/0x160) from [<c03a8a30>] (driver_probe_device+0x48/0x60)
[    5.238281] [<c03a8a30>] (driver_probe_device+0x48/0x60) from [<c03a8ad4>] (__driver_attach+0x8c/0x90)
[    5.238311] [<c03a8ad4>] (__driver_attach+0x8c/0x90) from [<c03a7b24>] (bus_for_each_dev+0x50/0x7c)
[    5.238311] [<c03a7b24>] (bus_for_each_dev+0x50/0x7c) from [<c03a82ec>] (bus_add_driver+0x190/0x250)
[    5.238311] [<c03a82ec>] (bus_add_driver+0x190/0x250) from [<c03a8cf8>] (driver_register+0x78/0x13c)
[    5.238433] [<c03a8cf8>] (driver_register+0x78/0x13c) from [<c0428040>] (usb_register_driver+0x78/0x13c)
[    5.238464] [<c0428040>] (usb_register_driver+0x78/0x13c) from [<c005b680>] (do_one_initcall+0x34/0x188)
[    5.238494] [<c005b680>] (do_one_initcall+0x34/0x188) from [<c00e11f0>] (sys_init_module+0xb0/0x1c0)
[    5.238525] [<c00e11f0>] (sys_init_module+0xb0/0x1c0) from [<c0065c40>] (ret_fast_syscall+0x0/0x30)

Move the two offenders to the private structure which is kmalloc-ed,
and thus safe.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: Steve Glendinning <steve.glendinning@smsc.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-22 01:02:18 -07:00
Michał Mirosław
27660515a2 net: implement dev_disable_lro() hw_features compatibility
Implement compatibility with new hw_features for dev_disable_lro().
This is a transition path - dev_disable_lro() should be later
integrated into netdev_fix_features() after all drivers are converted.

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-22 01:00:26 -07:00
Simon Horman
736561a01f IPVS: Use global mutex in ip_vs_app.c
As part of the work to make IPVS network namespace aware
__ip_vs_app_mutex was replaced by a per-namespace lock,
ipvs->app_mutex. ipvs->app_key is also supplied for debugging purposes.

Unfortunately this implementation results in ipvs->app_key residing
in non-static storage which at the very least causes a lockdep warning.

This patch takes the rather heavy-handed approach of reinstating
__ip_vs_app_mutex which will cover access to the ipvs->list_head
of all network namespaces.

[   12.610000] IPVS: Creating netns size=2456 id=0
[   12.630000] IPVS: Registered protocols (TCP, UDP, SCTP, AH, ESP)
[   12.640000] BUG: key ffff880003bbf1a0 not in .data!
[   12.640000] ------------[ cut here ]------------
[   12.640000] WARNING: at kernel/lockdep.c:2701 lockdep_init_map+0x37b/0x570()
[   12.640000] Hardware name: Bochs
[   12.640000] Pid: 1, comm: swapper Tainted: G        W 2.6.38-kexec-06330-g69b7efe-dirty #122
[   12.650000] Call Trace:
[   12.650000]  [<ffffffff8102e685>] warn_slowpath_common+0x75/0xb0
[   12.650000]  [<ffffffff8102e6d5>] warn_slowpath_null+0x15/0x20
[   12.650000]  [<ffffffff8105967b>] lockdep_init_map+0x37b/0x570
[   12.650000]  [<ffffffff8105829d>] ? trace_hardirqs_on+0xd/0x10
[   12.650000]  [<ffffffff81055ad8>] debug_mutex_init+0x38/0x50
[   12.650000]  [<ffffffff8104bc4c>] __mutex_init+0x5c/0x70
[   12.650000]  [<ffffffff81685ee7>] __ip_vs_app_init+0x64/0x86
[   12.660000]  [<ffffffff81685a3b>] ? ip_vs_init+0x0/0xff
[   12.660000]  [<ffffffff811b1c33>] T.620+0x43/0x170
[   12.660000]  [<ffffffff811b1e9a>] ? register_pernet_subsys+0x1a/0x40
[   12.660000]  [<ffffffff81685a3b>] ? ip_vs_init+0x0/0xff
[   12.660000]  [<ffffffff81685a3b>] ? ip_vs_init+0x0/0xff
[   12.660000]  [<ffffffff811b1db7>] register_pernet_operations+0x57/0xb0
[   12.660000]  [<ffffffff81685a3b>] ? ip_vs_init+0x0/0xff
[   12.670000]  [<ffffffff811b1ea9>] register_pernet_subsys+0x29/0x40
[   12.670000]  [<ffffffff81685f19>] ip_vs_app_init+0x10/0x12
[   12.670000]  [<ffffffff81685a87>] ip_vs_init+0x4c/0xff
[   12.670000]  [<ffffffff8166562c>] do_one_initcall+0x7a/0x12e
[   12.670000]  [<ffffffff8166583e>] kernel_init+0x13e/0x1c2
[   12.670000]  [<ffffffff8128c134>] kernel_thread_helper+0x4/0x10
[   12.670000]  [<ffffffff8128ad40>] ? restore_args+0x0/0x30
[   12.680000]  [<ffffffff81665700>] ? kernel_init+0x0/0x1c2
[   12.680000]  [<ffffffff8128c130>] ? kernel_thread_helper+0x0/0x1global0

Signed-off-by: Simon Horman <horms@verge.net.au>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Julian Anastasov <ja@ssi.bg>
Cc: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-21 20:39:24 -07:00
Eric Dumazet
f40f94fc6c ipvs: fix a typo in __ip_vs_control_init()
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Simon Horman <horms@verge.net.au>
Cc: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-21 20:39:24 -07:00
Eric W. Biederman
675071a2ef veth: Fix the byte counters
Commit 44540960 "veth: move loopback logic to common location" introduced
a bug in the packet counters.  I don't understand why that happened as it
is not explained in the comments and the mut check in dev_forward_skb
retains the assumption that skb->len is the total length of the packet.

I just measured this emperically by setting up a veth pair between two
noop network namespaces setting and attempting a telnet connection between
the two.  I saw three packets in each direction and the byte counters were
exactly 14*3 = 42 bytes high in each direction.  I got the actual
packet lengths with tcpdump.

So remove the extra ETH_HLEN from the veth byte count totals.

Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-21 18:24:53 -07:00
Eric W. Biederman
9d2a8fa96a net ipv6: Fix duplicate /proc/sys/net/ipv6/neigh directory entries.
When I was fixing issues with unregisgtering tables under /proc/sys/net/ipv6/neigh
by adding a mount point it appears I missed a critical ordering issue, in the
ipv6 initialization.  I had not realized that ipv6_sysctl_register is called
at the very end of the ipv6 initialization and in particular after we call
neigh_sysctl_register from ndisc_init.

"neigh" needs to be initialized in ipv6_static_sysctl_register which is
the first ipv6 table to initialized, and definitely before ndisc_init.
This removes the weirdness of duplicate tables while still providing a
"neigh" mount point which prevents races in sysctl unregistering.

This was initially reported at https://bugzilla.kernel.org/show_bug.cgi?id=31232
Reported-by: sunkan@zappa.cx
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-21 18:23:34 -07:00