linux

mirror of https://github.com/FEX-Emu/linux.git synced 2024-12-28 12:25:31 +00:00

Author	SHA1	Message	Date
David S. Miller	f1f28aa351	netdev: Add addr_list_lock to struct net_device. This will be used to protect the per-device unicast and multicast address lists, as well as the callbacks into the drivers which configure such state such as ->set_rx_mode() and ->set_multicast_list(). Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-15 00:08:33 -07:00
Patrick McHardy	bc1d0411b8	vlan: deliver packets received with VLAN acceleration to network taps When VLAN header stripping is used, packets currently bypass packet sockets (and other network taps) completely. For locally existing VLANs, they appear directly on the VLAN device, for unknown VLANs they are silently dropped. Add a new function netif_nit_deliver() to deliver incoming packets to all network interface taps and use it in __vlan_hwaccel_rx() to make VLAN packets visible on the underlying device. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-14 22:49:30 -07:00
Linus Torvalds	666484f025	Merge branch 'core/softirq' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core/softirq' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: softirq: remove irqs_disabled warning from local_bh_enable softirq: remove initialization of static per-cpu variable Remove argument from open_softirq which is always NULL	2008-07-14 15:28:42 -07:00
David S. Miller	c773e847ea	netdev: Move _xmit_lock and xmit_lock_owner into netdev_queue. Accesses are mostly structured such that when there are multiple TX queues the code transformations will be a little bit simpler. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-08 23:13:53 -07:00
David S. Miller	eb6aafe3f8	pkt_sched: Make qdisc_run take a netdev_queue. This allows us to use this calling convention all the way down into qdisc_restart(). Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-08 23:12:38 -07:00
David S. Miller	86d804e10a	netdev: Make netif_schedule() routines work with netdev_queue objects. Only plain netif_schedule() remains taking a net_device, mostly as a compatability item while we transition the rest of these interfaces. Everything else calls netif_schedule_queue() or __netif_schedule(), both of which take a netdev_queue pointer. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-08 23:11:25 -07:00
David S. Miller	ee609cb362	netdev: Move next_sched into struct netdev_queue. We schedule queues, not the device, for output queue processing in BH. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-08 22:58:37 -07:00
David S. Miller	816f3258e7	netdev: Kill qdisc_ingress, use netdev->rx_queue.qdisc instead. Now that our qdisc management is bi-directional, per-queue, and fully orthogonal, there is no reason to have a special ingress qdisc pointer in struct net_device. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-08 22:49:00 -07:00
David S. Miller	b0e1e6462d	netdev: Move rest of qdisc state into struct netdev_queue Now qdisc, qdisc_sleeping, and qdisc_list also live there. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-08 17:42:10 -07:00
David S. Miller	555353cfa1	netdev: The ingress_lock member is no longer needed. Every qdisc is assosciated with a queue, and in the case of ingress qdiscs that will now be netdev->rx_queue so using that queue's lock is the thing to do. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-08 17:33:13 -07:00
David S. Miller	dc2b48475a	netdev: Move queue_lock into struct netdev_queue. The lock is now an attribute of the device queue. One thing to notice is that "suspicious" places emerge which will need specific training about multiple queue handling. They are so marked with explicit "netdev->rx_queue" and "netdev->tx_queue" references. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-08 17:18:23 -07:00
David S. Miller	bb949fbd18	netdev: Create netdev_queue abstraction. A netdev_queue is an entity managed by a qdisc. Currently there is one RX and one TX queue, and a netdev_queue merely contains a backpointer to the net_device. The Qdisc struct is augmented with a netdev_queue pointer as well. Eventually the 'dev' Qdisc member will go away and we will have the resulting hierarchy: net_device --> netdev_queue --> Qdisc Also, qdisc_alloc() and qdisc_create_dflt() now take a netdev_queue pointer argument. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-08 16:55:56 -07:00
Patrick McHardy	4b5a698ef4	net: fix dev_set_promiscuity() breakage Commit `dad9b335` (netdevice: Fix promiscuity and allmulti overflow) broke dev_set_promiscuity() by returning on success without reprogramming the device. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-06 15:49:08 -07:00
Ingo Molnar	68083e05d7	Merge commit 'v2.6.26-rc9' into cpus4096	2008-07-06 14:23:39 +02:00
David S. Miller	ea2aca084b	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: Documentation/feature-removal-schedule.txt drivers/net/wan/hdlc_fr.c drivers/net/wireless/iwlwifi/iwl-4965.c drivers/net/wireless/iwlwifi/iwl3945-base.c	2008-07-05 23:08:07 -07:00
Wang Chen	93b3cff991	netdevice: Fix wrong string handle in kernel command line parsing v1->v2: Use strlcpy() to ensure s[i].name be null-termination. 1. In netdev_boot_setup_add(), a long name will leak. ex. : dev=21,0x1234,0x1234,0x2345,eth123456789verylongname......... 2. In netdev_boot_setup_check(), mismatch will happen if s[i].name is a substring of dev->name. ex. : dev=...eth1 dev=...eth11 [ With feedback from Ben Hutchings. ] Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-01 19:57:19 -07:00
David S. Miller	1b63ba8a86	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/wireless/iwlwifi/iwl4965-base.c	2008-06-28 01:19:40 -07:00
Wang Chen	5dbaec5dc6	netdevice: Fix typo of dev_unicast_add() comment Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-06-27 19:35:16 -07:00
Ingo Molnar	a60b33cf59	Merge branch 'linus' into core/softirq	2008-06-23 10:52:59 +02:00
Eric W. Biederman	b9f75f45a6	netns: Don't receive new packets in a dead network namespace. Alexey Dobriyan <adobriyan@gmail.com> writes: > Subject: ICMP sockets destruction vs ICMP packets oops > After icmp_sk_exit() nuked ICMP sockets, we get an interrupt. > icmp_reply() wants ICMP socket. > > Steps to reproduce: > > launch shell in new netns > move real NIC to netns > setup routing > ping -i 0 > exit from shell > > BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 > IP: [<ffffffff803fce17>] icmp_sk+0x17/0x30 > PGD 17f3cd067 PUD 17f3ce067 PMD 0 > Oops: 0000 [1] PREEMPT SMP DEBUG_PAGEALLOC > CPU 0 > Modules linked in: usblp usbcore > Pid: 0, comm: swapper Not tainted 2.6.26-rc6-netns-ct #4 > RIP: 0010:[<ffffffff803fce17>] [<ffffffff803fce17>] icmp_sk+0x17/0x30 > RSP: 0018:ffffffff8057fc30 EFLAGS: 00010286 > RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff81017c7db900 > RDX: 0000000000000034 RSI: ffff81017c7db900 RDI: ffff81017dc41800 > RBP: ffffffff8057fc40 R08: 0000000000000001 R09: 000000000000a815 > R10: 0000000000000000 R11: 0000000000000001 R12: ffffffff8057fd28 > R13: ffffffff8057fd00 R14: ffff81017c7db938 R15: ffff81017dc41800 > FS: 0000000000000000(0000) GS:ffffffff80525000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b > CR2: 0000000000000000 CR3: 000000017fcda000 CR4: 00000000000006e0 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > Process swapper (pid: 0, threadinfo ffffffff8053a000, task ffffffff804fa4a0) > Stack: 0000000000000000 ffff81017c7db900 ffffffff8057fcf0 ffffffff803fcfe4 > ffffffff804faa38 0000000000000246 0000000000005a40 0000000000000246 > 000000000001ffff ffff81017dd68dc0 0000000000005a40 0000000055342436 > Call Trace: > <IRQ> [<ffffffff803fcfe4>] icmp_reply+0x44/0x1e0 > [<ffffffff803d3a0a>] ? ip_route_input+0x23a/0x1360 > [<ffffffff803fd645>] icmp_echo+0x65/0x70 > [<ffffffff803fd300>] icmp_rcv+0x180/0x1b0 > [<ffffffff803d6d84>] ip_local_deliver+0xf4/0x1f0 > [<ffffffff803d71bb>] ip_rcv+0x33b/0x650 > [<ffffffff803bb16a>] netif_receive_skb+0x27a/0x340 > [<ffffffff803be57d>] process_backlog+0x9d/0x100 > [<ffffffff803bdd4d>] net_rx_action+0x18d/0x250 > [<ffffffff80237be5>] __do_softirq+0x75/0x100 > [<ffffffff8020c97c>] call_softirq+0x1c/0x30 > [<ffffffff8020f085>] do_softirq+0x65/0xa0 > [<ffffffff80237af7>] irq_exit+0x97/0xa0 > [<ffffffff8020f198>] do_IRQ+0xa8/0x130 > [<ffffffff80212ee0>] ? mwait_idle+0x0/0x60 > [<ffffffff8020bc46>] ret_from_intr+0x0/0xf > <EOI> [<ffffffff80212f2c>] ? mwait_idle+0x4c/0x60 > [<ffffffff80212f23>] ? mwait_idle+0x43/0x60 > [<ffffffff8020a217>] ? cpu_idle+0x57/0xa0 > [<ffffffff8040f380>] ? rest_init+0x70/0x80 > Code: 10 5b 41 5c 41 5d 41 5e c9 c3 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 53 > 48 83 ec 08 48 8b 9f 78 01 00 00 e8 2b c7 f1 ff 89 c0 <48> 8b 04 c3 48 83 c4 08 > 5b c9 c3 66 66 66 66 66 2e 0f 1f 84 00 > RIP [<ffffffff803fce17>] icmp_sk+0x17/0x30 > RSP <ffffffff8057fc30> > CR2: 0000000000000000 > ---[ end trace ea161157b76b33e8 ]--- > Kernel panic - not syncing: Aiee, killing interrupt handler! Receiving packets while we are cleaning up a network namespace is a racy proposition. It is possible when the packet arrives that we have removed some but not all of the state we need to fully process it. We have the choice of either playing wack-a-mole with the cleanup routines or simply dropping packets when we don't have a network namespace to handle them. Since the check looks inexpensive in netif_receive_skb let's just drop the incoming packets. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-06-20 22:16:51 -07:00
Ben Hutchings	0187bdfb05	net: Disable LRO on devices that are forwarding Large Receive Offload (LRO) is only appropriate for packets that are destined for the host, and should be disabled if received packets may be forwarded. It can also confuse the GSO on output. Add dev_disable_lro() function which uses the appropriate ethtool ops to disable LRO if enabled. Add calls to dev_disable_lro() in br_add_if() and functions that enable IPv4 and IPv6 forwarding. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-06-19 16:15:47 -07:00
Wang Chen	dad9b335c6	netdevice: Fix promiscuity and allmulti overflow Max of promiscuity and allmulti plus positive @inc can cause overflow. Fox example: when allmulti=0xFFFFFFFF, any caller give dev_set_allmulti() a positive @inc will cause allmulti be off. This is not what we want, though it's rare case. The fix is that only negative @inc will cause allmulti or promiscuity be off and when any caller makes the counters touch the roof, we return error. Change of v2: Change void function dev_set_promiscuity/allmulti to return int. So callers can get the overflow error. Caller's fix will be done later. Change of v3: 1. Since we return error to caller, we don't need to print KERN_ERROR, KERN_WARNING is enough. 2. In dev_set_promiscuity(), if __dev_set_promiscuity() failed, we return at once. Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-06-18 01:48:28 -07:00
Or Gerlitz	c1da4ac752	net/core: add NETDEV_BONDING_FAILOVER event Add NETDEV_BONDING_FAILOVER event to be used in a successive patch by bonding to announce fail-over for the active-backup mode through the netdev events notifier chain mechanism. Such an event can be of use for the RDMA CM (communication manager) to let native RDMA ULPs (eg NFS-RDMA, iSER) always be aligned with the IP stack, in the sense that they use the same ports/links as the stack does. More usages can be done to allow monitoring tools based on netlink events being aware to bonding fail-over. Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2008-06-17 23:59:41 -04:00
Ben Hutchings	6de329e26c	net: Fix test for VLAN TX checksum offload capability Selected device feature bits can be propagated to VLAN devices, so we can make use of TX checksum offload and TSO on VLAN-tagged packets. However, if the physical device does not do VLAN tag insertion or generic checksum offload then the test for TX checksum offload in dev_queue_xmit() will see a protocol of htons(ETH_P_8021Q) and yield false. This splits the checksum offload test into two functions: - can_checksum_protocol() tests a given protocol against a feature bitmask - dev_can_checksum() first tests the skb protocol against the device features; if that fails and the protocol is htons(ETH_P_8021Q) then it tests the encapsulated protocol against the effective device features for VLANs Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-06-16 17:02:28 -07:00
Carlos R. Mafra	962cf36c5b	Remove argument from open_softirq which is always NULL As git-grep shows, open_softirq() is always called with the last argument being NULL block/blk-core.c: open_softirq(BLOCK_SOFTIRQ, blk_done_softirq, NULL); kernel/hrtimer.c: open_softirq(HRTIMER_SOFTIRQ, run_hrtimer_softirq, NULL); kernel/rcuclassic.c: open_softirq(RCU_SOFTIRQ, rcu_process_callbacks, NULL); kernel/rcupreempt.c: open_softirq(RCU_SOFTIRQ, rcu_process_callbacks, NULL); kernel/sched.c: open_softirq(SCHED_SOFTIRQ, run_rebalance_domains, NULL); kernel/softirq.c: open_softirq(TASKLET_SOFTIRQ, tasklet_action, NULL); kernel/softirq.c: open_softirq(HI_SOFTIRQ, tasklet_hi_action, NULL); kernel/timer.c: open_softirq(TIMER_SOFTIRQ, run_timer_softirq, NULL); net/core/dev.c: open_softirq(NET_TX_SOFTIRQ, net_tx_action, NULL); net/core/dev.c: open_softirq(NET_RX_SOFTIRQ, net_rx_action, NULL); This observation has already been made by Matthew Wilcox in June 2002 (http://www.cs.helsinki.fi/linux/linux-kernel/2002-25/0687.html) "I notice that none of the current softirq routines use the data element passed to them." and the situation hasn't changed since them. So it appears we can safely remove that extra argument to save 128 (54) bytes of kernel data (text). Signed-off-by: Carlos R. Mafra <crmafra@ift.unesp.br> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-05-25 07:43:15 +02:00
Mike Travis	0e12f848b3	net: use performance variant for_each_cpu_mask_nr Change references from for_each_cpu_mask to for_each_cpu_mask_nr where appropriate Reviewed-by: Paul Jackson <pj@sgi.com> Reviewed-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-05-23 18:35:12 +02:00
David Woodhouse	0e91796eb4	net: Fix call to ->change_rx_flags(dev, IFF_MULTICAST) in dev_change_flags() Am I just being particularly dim today, or can the call to dev->change_rx_flags(dev, IFF_MULTICAST) in dev_change_flags() never happen? We've just set dev->flags = flags & IFF_MULTICAST, effectively. So the condition '(dev->flags ^ flags) & IFF_MULTICAST' is _never_ going to be true. Signed-off-by: David Woodhouse <dwmw2@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-05-20 14:36:14 -07:00
Stephen Hemminger	dcc997738e	net: handle errors from device_rename device_rename can fail with -EEXIST or -ENOMEM, so handle any problems. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-05-14 22:33:38 -07:00
Ben Hutchings	e46b66bc42	net: Added ASSERT_RTNL() to dev_open() and dev_close(). dev_open() and dev_close() must be called holding the RTNL, since they call device functions and netdevice notifiers that are promised the RTNL. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-05-08 02:53:17 -07:00
Pavel Emelyanov	aca51397d0	netns: Fix arbitrary net_device-s corruptions on net_ns stop. When a net namespace is destroyed, some devices (those, not killed on ns stop explicitly) are moved back to init_net. The problem, is that this net_ns change has one point of failure - the __dev_alloc_name() may be called if a name collision occurs (and this is easy to trigger). This allocator performs a likely-to-fail GFP_ATOMIC allocation to find a suitable number. Other possible conditions that may cause error (for device being ns local or not registered) are always false in this case. So, when this call fails, the device is unregistered. But this is not the right thing to do, since after this the device may be released (and kfree-ed) improperly. E. g. bridges require more actions (sysfs update, timer disarming, etc.), some other devices want to remove their private areas from lists, etc. I. e. arbitrary use-after-free cases may occur. The proposed fix is the following: since the only reason for the dev_change_net_namespace to fail is the name generation, we may give it a unique fall-back name w/o %d-s in it - the dev<ifindex> one, since ifindexes are still unique. So make this change, raise the failure-case printk loglevel to EMERG and replace the unregister_netdevice call with BUG(). [ Use snprintf() -DaveM ] Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-05-08 01:24:25 -07:00
Daniel Lezcano	aaf8cdc34d	netns: Fix device renaming for sysfs When a netdev is moved across namespaces with the 'dev_change_net_namespace' function, the 'device_rename' function is used to fixup kobject and refresh the sysfs tree. The device_rename function will call kobject_rename and this one will check if there is an object with the same name and this is the case because we are renaming the object with the same name. The use of 'device_rename' seems for me wrong because we usually don't rename it but just move it across namespaces. As we just want to do a mini "netdev_[un]register", IMO the functions 'netdev_[un]register_kobject' should be used instead, like an usual network device [un]registering. This patch replace device_rename by netdev_unregister_kobject, followed by netdev_register_kobject. The netdev_register_kobject will call device_initialize and will raise a warning indicating the device was already initialized. In order to fix that, I split the device initialization into a separate function and use it together with 'netdev_register_kobject' into register_netdevice. So we can safely call 'netdev_register_kobject' in 'dev_change_net_namespace'. This fix will allow to properly use the sysfs per namespace which is coming from -mm tree. Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Acked-by: Benjamin Thery <benjamin.thery@bull.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-05-02 17:00:58 -07:00
Mike Travis	0c0b0aca66	net: remove NR_CPUS arrays in net/core/dev.c Remove the fixed size channels[NR_CPUS] array in net/core/dev.c and dynamically allocate array based on nr_cpu_ids. Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-05-02 16:43:08 -07:00
Hirofumi Nakagawa	801678c5a3	Remove duplicated unlikely() in IS_ERR() Some drivers have duplicated unlikely() macros. IS_ERR() already has unlikely() in itself. This patch cleans up such pointless code. Signed-off-by: Hirofumi Nakagawa <hnakagawa@miraclelinux.com> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Jeff Garzik <jeff@garzik.org> Cc: Paul Clements <paul.clements@steeleye.com> Cc: Richard Purdie <rpurdie@rpsys.net> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: David Brownell <david-b@pacbell.net> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Cc: Anton Altaparmakov <aia21@cantab.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Carsten Otte <cotte@de.ibm.com> Cc: Patrick McHardy <kaber@trash.net> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Jaroslav Kysela <perex@perex.cz> Cc: Takashi Iwai <tiwai@suse.de> Acked-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:06:25 -07:00
Alexey Dobriyan	d1643d24c6	[NET]: Fix and allocate less memory for ->priv'less netdevices This patch effectively reverts commit `d0498d9ae1` aka "[NET]: Do not allocate unneeded memory for dev->priv alignment." It was found to be buggy because of final unconditional += NETDEV_ALIGN_CONST removal. For example, for sizeof(struct net_device) being 2048 bytes, "alloc_size" was also 2048 bytes, but allocator with debugging options turned on started giving out !32-byte aligned memory resulting in redzones overwrites. Patch does small optimization in ->priv'less case: bumping size to next 32-byte boundary was always done to ensure ->priv will also be aligned. But, no ->priv, no need to do that. Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-18 15:43:32 -07:00
Pavel Emelyanov	d0498d9ae1	[NET]: Do not allocate unneeded memory for dev->priv alignment. The alloc_netdev_mq() tries to produce 32-bytes alignment for both the net_device itself and its private data. The second alignment is achieved by adding the NETDEV_ALIGN_CONST to the whole size of the memory to be allocated. However, for those devices that do not need the private area, this addition just makes the net_device weight 1024 + 32 = 1068 bytes, i.e. consume twice as much memory. Since loopback device is such (sizeof_priv == 0 for it), and each net namespace creates one, this can save a noticeable amount of memory for kernel with net namespaces turned on. After this set the lo device is actually allocated from a size-1024 kmem cache on i386 box even with NETPOLL and WIRELESS_EXT turned on. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-16 02:17:42 -07:00
Denis V. Lunev	f3005d7f4a	[NETNS]: Add netns refcnt debug for network devices. dev_set_net is called for - just allocated devices - devices moving from one namespace to another release_net has proper check inside to distinguish these cases. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-16 02:02:18 -07:00
David S. Miller	8e8e43843b	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/usb/rndis_host.c drivers/net/wireless/b43/dma.c net/ipv6/ndisc.c	2008-03-27 18:48:56 -07:00
Patrick McHardy	61ee6bd487	[NET]: Fix multicast device ioctl checks SIOCADDMULTI/SIOCDELMULTI check whether the driver has a set_multicast_list method to determine whether it supports multicast. Drivers implementing secondary unicast support use set_rx_mode however. Check for both dev->set_multicast_mode and dev->set_rx_mode to determine multicast capabilities. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-26 02:12:11 -07:00
YOSHIFUJI Hideaki	878628fbf2	[NET] NETNS: Omit namespace comparision without CONFIG_NET_NS. Introduce an inline net_eq() to compare two namespaces. Without CONFIG_NET_NS, since no namespace other than &init_net exists, it is always 1. We do not need to convert 1) inline vs inline and 2) inline vs &init_net comparisons. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2008-03-26 04:40:00 +09:00
YOSHIFUJI Hideaki	c346dca108	[NET] NETNS: Omit net_device->nd_net without CONFIG_NET_NS. Introduce per-net_device inlines: dev_net(), dev_net_set(). Without CONFIG_NET_NS, no namespace other than &init_net exists. Let's explicitly define them to help compiler optimizations. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2008-03-26 04:39:53 +09:00
Pavel Emelyanov	2feb27dbe0	[NETNS]: Minor information leak via /proc/net/ptype file. This file displays the registered packet types, but some of them (packet sockets creates such) can be bound to a net device and showing them in a wrong namespace is not correct. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-24 14:57:45 -07:00
Peter P Waskiewicz Jr	82cc1a7a56	[NET]: Add per-connection option to set max TSO frame size Update: My mailer ate one of Jarek's feedback mails... Fixed the parameter in netif_set_gso_max_size() to be u32, not u16. Fixed the whitespace issue due to a patch import botch. Changed the types from u32 to unsigned int to be more consistent with other variables in the area. Also brought the patch up to the latest net-2.6.26 tree. Update: Made gso_max_size container 32 bits, not 16. Moved the location of gso_max_size within netdev to be less hotpath. Made more consistent names between the sock and netdev layers, and added a define for the max GSO size. Update: Respun for net-2.6.26 tree. Update: changed max_gso_frame_size and sk_gso_max_size from signed to unsigned - thanks Stephen! This patch adds the ability for device drivers to control the size of the TSO frames being sent to them, per TCP connection. By setting the netdevice's gso_max_size value, the socket layer will set the GSO frame size based on that value. This will propogate into the TCP layer, and send TSO's of that size to the hardware. This can be desirable to help tune the bursty nature of TSO on a per-adapter basis, where one may have 1 GbE and 10 GbE devices coexisting in a system, one running multiqueue and the other not, etc. This can also be desirable for devices that cannot support full 64 KB TSO's, but still want to benefit from some level of segmentation offloading. Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-21 03:43:19 -07:00
Linus Torvalds	bdc0894289	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (37 commits) [NETFILTER]: fix ebtable targets return [IP_TUNNEL]: Don't limit the number of tunnels with generic name explicitly. [NET]: Restore sanity wrt. print_mac(). [NEIGH]: Fix race between neighbor lookup and table's hash_rnd update. [RTNL]: Validate hardware and broadcast address attribute for RTM_NEWLINK tg3: ethtool phys_id default [BNX2]: Update version to 1.7.4. [BNX2]: Disable parallel detect on an HP blade. [BNX2]: More 5706S link down workaround. ssb: Fix support for PCI devices behind a SSB->PCI bridge zd1211rw: fix sparse warnings rtl818x: fix sparse warnings ssb: Fix pcicore cardbus mode ssb: Make the GPIO API reentrancy safe ssb: Fix the GPIO API ssb: Fix watchdog access for devices without a chipcommon ssb: Fix serial console on new bcm47xx devices ath5k: Fix build warnings on some 64-bit platforms. WDEV, ath5k, don't return int from bool function WDEV: ath5k, fix lock imbalance ...	2008-02-23 21:07:10 -08:00
Jorge Boncompte [DTI2]	12aa343add	[NET]: Messed multicast lists after dev_mc_sync/unsync Commit `a0a400d79e` ("[NET]: dev_mcast: add multicast list synchronization helpers") from you introduced a new field "da_synced" to struct dev_addr_list that is not properly initialized to 0. So when any of the current users (8021q, macvlan, mac80211) calls dev_mc_sync/unsync they mess the address list for both devices. The attached patch fixed it for me and avoid future problems. Signed-off-by: Jorge Boncompte [DTI2] <jorge@dti2.net> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-19 14:17:04 -08:00
Linus Torvalds	f6866fecd6	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (82 commits) [NET]: Make sure sockets implement splice_read netconsole: avoid null pointer dereference at show_local_mac() [IPV6]: Fix reversed local_df test in ip6_fragment [XFRM]: Avoid bogus BUG() when throwing new policy away. [AF_KEY]: Fix bug in spdadd [NETFILTER] nf_conntrack_proto_tcp.c: Mistyped state corrected. net: xfrm statistics depend on INET [NETFILTER]: make secmark_tg_destroy() static [INET]: Unexport inet_listen_wlock [INET]: Unexport __inet_hash_connect [NET]: Improve cache line coherency of ingress qdisc [NET]: Fix race in dev_close(). (Bug 9750) [IPSEC]: Fix bogus usage of u64 on input sequence number [RTNETLINK]: Send a single notification on device state changes. [NETLABLE]: Hide netlbl_unlabel_audit_addr6 under ifdef CONFIG_IPV6. [NETLABEL]: Don't produce unused variables when IPv6 is off. [NETLABEL]: Compilation for CONFIG_AUDIT=n case. [GENETLINK]: Relax dances with genl_lock. [NETLABEL]: Fix lookup logic of netlbl_domhsh_search_def. [IPV6]: remove unused method declaration (net/ndisc.h). ...	2008-02-15 07:33:07 -08:00
Randy Dunlap	bc2cda1ebd	docbook: make a networking book and fix a few errors Move networking (core and drivers) docbook to its own networking book. Fix a few kernel-doc errors in header and source files. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Neil Brown <neilb@suse.de> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-13 16:21:19 -08:00
Harvey Harrison	b5606c2d44	remove final fastcall users fastcall always expands to empty, remove it. Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-13 16:21:18 -08:00
Matti Linnanvuori	d8b2a4d21e	[NET]: Fix race in dev_close(). (Bug 9750) There is a race in Linux kernel file net/core/dev.c, function dev_close. The function calls function dev_deactivate, which calls function dev_watchdog_down that deletes the watchdog timer. However, after that, a driver can call netif_carrier_ok, which calls function __netdev_watchdog_up that can add the watchdog timer again. Function unregister_netdevice calls function dev_shutdown that traps the bug !timer_pending(&dev->watchdog_timer). Moving dev_deactivate after netif_running() has been cleared prevents function netif_carrier_on from calling __netdev_watchdog_up and adding the watchdog timer again. Signed-off-by: Matti Linnanvuori <mattilinnanvuori@yahoo.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-12 23:11:16 -08:00
Klaus Heinrich Kiwi	7759db8277	[AUDIT] Add uid, gid fields to ANOM_PROMISCUOUS message Changes the ANOM_PROMISCUOUS message to include uid and gid fields, making it consistent with other AUDIT_ANOM_ messages and in the format the userspace is expecting. Signed-off-by: Klaus Heinrich Kiwi <klausk@br.ibm.com> Acked-by: Eric Paris <eparis@redhat.com>	2008-02-01 14:25:10 -05:00
Eric Paris	4746ec5b01	[AUDIT] add session id to audit messages In order to correlate audit records to an individual login add a session id. This is incremented every time a user logs in and is included in almost all messages which currently output the auid. The field is labeled ses= or oses= Signed-off-by: Eric Paris <eparis@redhat.com>	2008-02-01 14:06:51 -05:00
Al Viro	0c11b9428f	[PATCH] switch audit_get_loginuid() to task_struct * all callers pass something->audit_context Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-02-01 14:04:59 -05:00
Chris Leech	e83a2ea850	[VLAN]: set_rx_mode support for unicast address list Reuse the existing logic for multicast list synchronization for the unicast address list. The core of dev_mc_sync/unsync are split out as __dev_addr_sync/unsync and moved from dev_mcast.c to dev.c. These are then used to implement dev_unicast_sync/unsync as well. I'm working on cleaning up Intel's FCoE stack, which generates new MAC addresses from the fibre channel device id assigned by the fabric as per the current draft specification in T11. When using such a protocol in a VLAN environment it would be nice to not always be forced into promiscuous mode, assuming the underlying Ethernet driver supports multiple unicast addresses as well. Signed-off-by: Chris Leech <christopher.leech@intel.com> Signed-off-by: Patrick McHardy <kaber@trash.net>	2008-01-31 19:28:24 -08:00
Stephen Hemminger	72348a424f	[PKT_SCHED] net: add sparse annotation to ptype_seq_start/stop Get rid of some more sparse warnings. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:08:42 -08:00
Eric Dumazet	9a429c4983	[NET]: Add some acquires/releases sparse annotations. Add __acquires() and __releases() annotations to suppress some sparse warnings. example of warnings : net/ipv4/udp.c:1555:14: warning: context imbalance in 'udp_seq_start' - wrong count at exit net/ipv4/udp.c:1571:13: warning: context imbalance in 'udp_seq_stop' - unexpected unlock Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:00:31 -08:00
Herbert Xu	a66207121f	[NET]: Check RTNL status in unregister_netdevice The caller must hold the RTNL so let's check it in unregister_netdevice. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:57:43 -08:00
Denis V. Lunev	81103a52f2	[NETNS]: network namespace was passed into dev_getbyhwaddr but not used Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:57:24 -08:00
Denis Cheng	3b5b34fd2b	[NET] net/core/dev.c: use LIST_HEAD instead of LIST_HEAD_INIT single list_head variable initialized with LIST_HEAD_INIT could almost always can be replaced with LIST_HEAD declaration, this shrinks the code and looks better. Signed-off-by: Denis Cheng <crquan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:56:51 -08:00
Pavel Emelyanov	82d8a867ff	[NET]: Make macro to specify the ptype_base size Currently this size is 16, but as the comment says this is so only because all the chains (except one) has the length 1. I think, that some day this may change, so growing this hash will be much easier. Besides, symbolic names are read better than magic constants. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:04 -08:00
Denis V. Lunev	e372c41401	[NET]: Consolidate net namespace related proc files creation. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:28 -08:00
David S. Miller	fed17f3094	[NET]: Stop polling when napi_disable() is pending. This finally adds the code in net_rx_action() to break out of the ->poll()'ing loop when a napi_disable() is found to be pending. Now, even if a device is being flooded with packets it can be cleanly brought down. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-08 23:30:13 -08:00
Joe Perches	53ccaae1ef	[NET] net/core/: Spelling fixes Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-12-20 14:02:06 -08:00
Wang Chen	d59b54b150	[NET]: Fix wrong comments for unregister_net* There are some return value comments for void functions. Fixed it. Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-12-11 02:45:32 -08:00
Pavel Emelyanov	c67625a1ec	[NET]: Remove notifier block from chain when register_netdevice_notifier fails Commit `fcc5a03ac4`: [NET]: Allow netdev REGISTER/CHANGENAME events to fail makes the register_netdevice_notifier() handle the error from the NETDEV_REGISTER event, sent to the registering block. The bad news is that in this case the notifier block is not removed from the list, but the error is returned to the caller. In case the caller is in module init function and handles this error this can abort the module loading. The notifier block will be then removed from the kernel, but will be left in the list. Oops :( I think that the notifier block should be removed from the chain in case of error, regardless whether this error is handled by the caller or not. In the worst case (the error is _not_ handled) module will not receive the events any longer. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-14 15:53:16 -08:00
Denis V. Lunev	022cbae611	[NET]: Move unneeded data to initdata section. This patch reverts Eric's commit `2b008b0a8e` It diets .text & .data section of the kernel if CONFIG_NET_NS is not set. This is safe after list operations cleanup. Signed-of-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-13 03:23:50 -08:00
Alexey Dobriyan	33d36bb83c	[NETNS]: init dev_base_lock only once * it already statically initialized * reinitializing live global spinlock every time netns is setup is also wrong Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-10 22:09:25 -08:00
Stephen Hemminger	3b582cc14c	[NET]: docbook fixes for netif_ functions Documentation updates for network interfaces. 1. Add doc for netif_napi_add 2. Remove doc for unused returns from netif_rx 3. Add doc for netif_receive_skb [ Incorporated minor mods from Randy Dunlap -DaveM ] Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-01 02:21:47 -07:00
Daniel Lezcano	93ee31f14f	[NET]: Fix free_netdev on register_netdev failure. Point 1: The unregistering of a network device schedule a netdev_run_todo. This function calls dev->destructor when it is set and the destructor calls free_netdev. Point 2: In the case of an initialization of a network device the usual code is: * alloc_netdev * register_netdev -> if this one fails, call free_netdev and exit with error. Point 3: In the register_netdevice function at the later state, when the device is at the registered state, a call to the netdevice_notifiers is made. If one of the notification falls into an error, a rollback to the registered state is done using unregister_netdevice. Conclusion: When a network device fails to register during initialization because one network subsystem returned an error during a notification call chain, the network device is freed twice because of fact 1 and fact 2. The second free_netdev will be done with an invalid pointer. Proposed solution: The following patch move all the code of unregister_netdevice except the call to net_set_todo, to a new function "rollback_registered". The following functions are changed in this way: * register_netdevice: calls rollback_registered when a notification fails * unregister_netdevice: calls rollback_register + net_set_todo, the call order to net_set_todo is changed because it is the latest now. Since it justs add an element to a list that should not break anything. Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-30 21:16:18 -07:00
David S. Miller	0a7606c121	[NET]: Fix race between poll_napi() and net_rx_action() netpoll_poll_lock() synchronizes the ->poll() invocation code paths, but once we have the lock we have to make sure that NAPI_STATE_SCHED is still set. Otherwise we get: cpu 0 cpu 1 net_rx_action() poll_napi() netpoll_poll_lock() ... spin on ->poll_lock ->poll() netif_rx_complete netpoll_poll_unlock() acquire ->poll_lock() ->poll() netif_rx_complete() CRASH Based upon a bug report from Tina Yang. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-29 22:37:28 -07:00
Eric W. Biederman	2b008b0a8e	[NET]: Marking struct pernet_operations __net_initdata was inappropriate It is not safe to to place struct pernet_operations in a special section. We need struct pernet_operations to last until we call unregister_pernet_subsys. Which doesn't happen until module unload. So marking struct pernet_operations is a disaster for modules in two ways. - We discard it before we call the exit method it points to. - Because I keep struct pernet_operations on a linked list discarding it for compiled in code removes elements in the middle of a linked list and does horrible things for linked insert. So this looks safe assuming __exit_refok is not discarded for modules. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-26 22:54:53 -07:00
Stephen Hemminger	c8d90dca32	[NET] dev_change_name: ignore changes to same name Prevent error/backtrace from dev_rename() when changing name of network device to the same name. This is a common situation with udev and other scripts that bind addr to device. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-26 03:53:42 -07:00
Pavel Emelyanov	342709efc7	[NET]: Remove in-code externs for some functions from net/core/dev.c Inconsistent prototype and real type for functions may have worse consequences, than those for variables, so move them into a header. Since they are used privately in net/core, make this file reside in the same place. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-23 21:27:56 -07:00
Jeff Garzik	bada339ba2	[NET]: Validate device addr prior to interface-up Signed-off-by: Jeff Garzik <jgarzik@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-23 21:27:50 -07:00
Pavel Emelyanov	668f895a85	[NET]: Hide the queue_mapping field inside netif_subqueue_stopped Many places get the queue_mapping field from skb to pass it to the netif_subqueue_stopped() which will be 0 in any case. Make the helper that works with sk_buff Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-22 02:59:56 -07:00
Pavel Emelyanov	dfa4091129	[NET]: Use the skb_set_queue_mapping where appropriate There's already such a helper to initialize this field. Use it. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-22 02:59:55 -07:00
Herbert Xu	a030847e9f	[NET]: Avoid copying TCP packets unnecessarily TCP packets all have writable heads, that is, even though it's cloned, it is writable up to the end of the TCP header. This patch makes skb_checksum_help aware of this fact by using skb_clone_writable and avoiding a copy for TCP. I've also modified the BUG_ON tests to be unsigned. The only case where this makes a difference is if csum_start points to a location before skb->data. Since skb->data should always include the header where the checksum field is (and all currently callers adhere to that), this change is safe and may uncover bugs later. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-15 12:26:34 -07:00
Herbert Xu	f697c3e8b3	[NET]: Avoid unnecessary cloning for ingress filtering As it is we always invoke pt_prev before ing_filter, even if there are no ingress filters attached. This can cause unnecessary cloning in pt_prev. This patch changes it so that we only invoke pt_prev if there are ingress filters attached. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-15 12:26:26 -07:00
Randy Dunlap	c4ea43c552	net core: fix kernel-doc for new function parameters Fix networking code kernel-doc for newly added parameters. Warning(linux-2.6.23-git2//net/core/sock.c:879): No description found for parameter 'net' Warning(linux-2.6.23-git2//net/core/dev.c:570): No description found for parameter 'net' Warning(linux-2.6.23-git2//net/core/dev.c:594): No description found for parameter 'net' Warning(linux-2.6.23-git2//net/core/dev.c:617): No description found for parameter 'net' Warning(linux-2.6.23-git2//net/core/dev.c:641): No description found for parameter 'net' Warning(linux-2.6.23-git2//net/core/dev.c:667): No description found for parameter 'net' Warning(linux-2.6.23-git2//net/core/dev.c:722): No description found for parameter 'net' Warning(linux-2.6.23-git2//net/core/dev.c:959): No description found for parameter 'net' Warning(linux-2.6.23-git2//net/core/dev.c:1195): No description found for parameter 'dev' Warning(linux-2.6.23-git2//net/core/dev.c:2105): No description found for parameter 'n' Warning(linux-2.6.23-git2//net/core/dev.c:3272): No description found for parameter 'net' Warning(linux-2.6.23-git2//net/core/dev.c:3445): No description found for parameter 'net' Warning(linux-2.6.23-git2//include/linux/netdevice.h:1301): No description found for parameter 'cpu' Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-13 09:52:26 -07:00
Pavel Emelyanov	9b77265235	[NET]: Remove double dev->flags checking when calling dev_close() The unregister_netdevice() and dev_change_net_namespace() both check for dev->flags to be IFF_UP before calling the dev_close(), but the dev_close() checks for IFF_UP itself, so remove those unneeded checks. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:55:52 -07:00
Pavel Emelyanov	4665079cbb	[NETNS]: Move some code into __init section when CONFIG_NET_NS=n With the net namespaces many code leaved the __init section, thus making the kernel occupy more memory than it did before. Since we have a config option that prohibits the namespace creation, the functions that initialize/finalize some netns stuff are simply not needed and can be freed after the boot. Currently, this is almost not noticeable, since few calls are no longer in __init, but when the namespaces will be merged it will be possible to free more code. I propose to use the __net_init, __net_exit and __net_initdata "attributes" for functions/variables that are not used if the CONFIG_NET_NS is not set to save more space in memory. The exiting functions cannot just reside in the __exit section, as noticed by David, since the init section will have references on it and the compilation will fail due to modpost checks. These references can exist, since the init namespace never dies and the exit callbacks are never called. So I introduce the __exit_refok attribute just like it is already done with the __init_refok. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:54:58 -07:00
Jeff Garzik	14e3e07979	[NET]: split dev_ifsioc() according to locking This always bugged me: dev_ioctl() called dev_ifsioc() either inside read_lock(dev_base_lock) or rtnl_lock(), depending on the ioctl being executed. This change moves the ioctls executed inside dev_base_lock to a new function, dev_ifsioc_locked(). Now the locking context is completely clear to the reader. Signed-off-by: Jeff Garzik <jgarzik@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:54:49 -07:00
Stephen Hemminger	cfcabdcc2d	[NET]: sparse warning fixes Fix a bunch of sparse warnings. Mostly about 0 used as NULL pointer, and shadowed variable declarations. One notable case was that hash size should have been unsigned. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:54:48 -07:00
Stephen Hemminger	3b04ddde02	[NET]: Move hardware header operations out of netdevice. Since hardware header operations are part of the protocol class not the device instance, make them into a separate object and save memory. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:52:52 -07:00
Eric W. Biederman	8b41d1887d	[NET]: Fix running without sysfs When sysfs support is compiled out the kernel still keeps and maintains the kobject tree. So it is not safe to skip our kobject reference counting or to avoid becoming members of the kobject tree. It is safe to not add the networking specific sysfs attributes. This patch removes the sysfs special cases from net/core/dev.c renames functions from netdev_sysfs_xxxx to netdev_kobject_xxxx and always compiles in net-sysfs.c net-sysfs.c is modified with a CONFIG_SYSFS guard around the parts that are actually sysfs specific. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:52:46 -07:00
Pavel Emelyanov	056925ab31	[NET]: Cleanup calling netdev notifiers. The call_netdev_notifiers routine can successfully be used in the net/core_dev.c itself. This will save 6 lines of code and 62 ;) bytes of .text section. 62 is rather small, but I have one more patch saving ~30 bytes from netns code (sent to Eric), so altogether they can save some more noticeable amount. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:51:21 -07:00
Pavel Emelyanov	30d97d3585	[NETNS]: Consolidate hashes creation in netdev_init() The dev_name_hash and the dev_index_hash are now booth kmalloc-ed (and each element is properly initialized as usually) so I think it's worth consolidating this code making it look nicer (and saving 28 bytes of .text section ;) ) Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:51:21 -07:00
Eric W. Biederman	ad7379d494	[NET]: Fix the prototype of call_netdevice_notifiers. This replaces the void * parameter with a struct net_device * which is what is actually required. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:51:20 -07:00
Jamal Hadi Salim	22dd749501	[NET]: migrate HARD_TX_LOCK to header file HARD_TX_LOCK micro is a nice aggregation that could be used in other spots. move it to netdevice.h Also makes sure the previously superflous cpu arguement is used. Thanks to DaveM for the suggestions. Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:51:20 -07:00
Eric W. Biederman	077130c0cf	[NET]: Fix race when opening a proc file while a network namespace is exiting. The problem: proc_net files remember which network namespace the are against but do not remember hold a reference count (as that would pin the network namespace). So we currently have a small window where the reference count on a network namespace may be incremented when opening a /proc file when it has already gone to zero. To fix this introduce maybe_get_net and get_proc_net. maybe_get_net increments the network namespace reference count only if it is greater then zero, ensuring we don't increment a reference count after it has gone to zero. get_proc_net handles all of the magic to go from a proc inode to the network namespace instance and call maybe_get_net on it. PROC_NET the old accessor is removed so that we don't get confused and use the wrong helper function. Then I fix up the callers to use get_proc_net and handle the case case where get_proc_net returns NULL. In that case I return -ENXIO because effectively the network namespace has already gone away so the files we are trying to access don't exist anymore. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Paul E. McKenney <paulmck@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:49:22 -07:00
David S. Miller	9d5010db7e	[NET]: Add a might_sleep() to dev_close(). Requested by Johannes Berg. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:49:15 -07:00
Eric W. Biederman	ce286d3273	[NET]: Implement network device movement between namespaces This patch introduces NETIF_F_NETNS_LOCAL a flag to indicate a network device is local to a single network namespace and should never be moved. Useful for pseudo devices that we need an instance in each network namespace (like the loopback device) and for any device we find that cannot handle multiple network namespaces so we may trap them in the initial network namespace. This patch introduces the function dev_change_net_namespace a function used to move a network device from one network namespace to another. To the network device nothing special appears to happen, to the components of the network stack it appears as if the network device was unregistered in the network namespace it is in, and a new device was registered in the network namespace the device was moved to. This patch sets up a namespace device destructor that upon the exit of a network namespace moves all of the movable network devices to the initial network namespace so they are not lost. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:49:12 -07:00
Eric W. Biederman	b267b17964	[NET]: Factor out __dev_alloc_name from dev_alloc_name When forcibly changing the network namespace of a device I need something that can generate a name for the device in the new namespace without overwriting the old name. __dev_alloc_name provides me that functionality. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:49:11 -07:00
Eric W. Biederman	881d966b48	[NET]: Make the device list and device lookups per namespace. This patch makes most of the generic device layer network namespace safe. This patch makes dev_base_head a network namespace variable, and then it picks up a few associated variables. The functions: dev_getbyhwaddr dev_getfirsthwbytype dev_get_by_flags dev_get_by_name __dev_get_by_name dev_get_by_index __dev_get_by_index dev_ioctl dev_ethtool dev_load wireless_process_ioctl were modified to take a network namespace argument, and deal with it. vlan_ioctl_set and brioctl_set were modified so their hooks will receive a network namespace argument. So basically anthing in the core of the network stack that was affected to by the change of dev_base was modified to handle multiple network namespaces. The rest of the network stack was simply modified to explicitly use &init_net the initial network namespace. This can be fixed when those components of the network stack are modified to handle multiple network namespaces. For now the ifindex generator is left global. Fundametally ifindex numbers are per namespace, or else we will have corner case problems with migration when we get that far. At the same time there are assumptions in the network stack that the ifindex of a network device won't change. Making the ifindex number global seems a good compromise until the network stack can cope with ifindex changes when you change namespaces, and the like. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:49:10 -07:00
Eric W. Biederman	6d34b1c27a	[NET]: Initialize the network namespace of network devices. Except for carefully selected pseudo devices all network interfaces should start out in the initial network namespace. Ultimately it will be register_netdev that examines what dev->nd_net is set to and places a device in a network namespace. This patch modifies alloc_netdev to initialize the network namespace a device is in with the initial network namespace. This gets it right for the vast majority of devices so their drivers need not be modified and for those few pseudo devices that need something different they can change this parameter before calling register_netdevice. The network namespace parameter on a network device is not reference counted as the devices are inside of a network namespace and cannot remain in that namespace past the lifetime of the network namespace. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:49:07 -07:00
Eric W. Biederman	457c4cbc5a	[NET]: Make /proc/net per network namespace This patch makes /proc/net per network namespace. It modifies the global variables proc_net and proc_net_stat to be per network namespace. The proc_net file helpers are modified to take a network namespace argument, and all of their callers are fixed to pass &init_net for that argument. This ensures that all of the /proc/net files are only visible and usable in the initial network namespace until the code behind them has been updated to be handle multiple network namespaces. Making /proc/net per namespace is necessary as at least some files in /proc/net depend upon the set of network devices which is per network namespace, and even more files in /proc/net have contents that are relevant to a single network namespace. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:49:06 -07:00
Stephen Hemminger	bea3348eef	[NET]: Make NAPI polling independent of struct net_device objects. Several devices have multiple independant RX queues per net device, and some have a single interrupt doorbell for several queues. In either case, it's easier to support layouts like that if the structure representing the poll is independant from the net device itself. The signature of the ->poll() call back goes from: int foo_poll(struct net_device dev, int budget) to int foo_poll(struct napi_struct napi, int budget) The caller is returned the number of RX packets processed (or the number of "NAPI credits" consumed if you want to get abstract). The callee no longer messes around bumping dev->quota, budget, etc. because that is all handled in the caller upon return. The napi_struct is to be embedded in the device driver private data structures. Furthermore, it is the driver's responsibility to disable all NAPI instances in it's ->stop() device close handler. Since the napi_struct is privatized into the driver's private data structures, only the driver knows how to get at all of the napi_struct instances it may have per-device. With lots of help and suggestions from Rusty Russell, Roland Dreier, Michael Chan, Jeff Garzik, and Jamal Hadi Salim. Bug fixes from Thomas Graf, Roland Dreier, Peter Zijlstra, Joseph Fannin, Scott Wood, Hans J. Koch, and Michael Chan. [ Ported to current tree and all drivers converted. Integrated Stephen's follow-on kerneldoc additions, and restored poll_list handling to the old style to fix mutual exclusion issues. -DaveM ] Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:47:45 -07:00
Herbert Xu	7f353bf29e	[NET]: Share correct feature code between bridging and bonding http://bugzilla.kernel.org/show_bug.cgi?id=8797 shows that the bonding driver may produce bogus combinations of the checksum flags and SG/TSO. For example, if you bond devices with NETIF_F_HW_CSUM and NETIF_F_IP_CSUM you'll end up with a bonding device that has neither flag set. If both have TSO then this produces an illegal combination. The bridge device on the other hand has the correct code to deal with this. In fact, the same code can be used for both. So this patch moves that logic into net/core/dev.c and uses it for both bonding and bridging. In the process I've made small adjustments such as only setting GSO_ROBUST if at least one constituent device supports it. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-08-13 22:52:14 -07:00
Herbert Xu	fcc5a03ac4	[NET]: Allow netdev REGISTER/CHANGENAME events to fail This patch adds code to allow errors to be passed up from event handlers of NETDEV_REGISTER and NETDEV_CHANGENAME. It also adds the notifier_from_errno/notifier_to_errnor helpers to pass the errno value up to the notifier caller. If an error is detected when a device is registered, it causes that operation to fail. A NETDEV_UNREGISTER will be sent to all event handlers. Similarly if NETDEV_CHANGENAME fails the original name is restored and a new NETDEV_CHANGENAME event is sent. As such all event handlers must be idempotent with respect to these events. When an event handler is registered NETDEV_REGISTER events are sent for all devices currently registered. Should any of them fail, we will send NETDEV_GOING_DOWN/NETDEV_DOWN/NETDEV_UNREGISTER events to that handler for the devices which have already been registered with it. The handler registration itself will fail. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-31 02:28:15 -07:00
Herbert Xu	7f988eab57	[NET]: Take dev_base_lock when moving device name hash list entry When we added name-based hashing the dev_base_lock was designated as the lock to take when changing the name hash list. Unfortunately, because it was a preexisting lock that just happened to be taken in the right spots we neglected to take it in dev_change_name. The race can affect calles of __dev_get_by_name that do so without taking the RTNL. They may end up walking down the wrong hash chain and end up missing the device that they're looking for. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-31 02:28:13 -07:00
Herbert Xu	7ce1b0edcb	[NET]: Call uninit if necessary in register_netdevice This patch makes register_netdevice call dev->uninit if the regsitration fails after dev->init has completed successfully. Very few drivers use the init/uninit calls but at least one (drivers/net/wan/sealevel.c) may leak without this change. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-31 02:28:12 -07:00
Randy Dunlap	0ed72ec4af	[NET]: kernel-doc fixes Fix kernel-doc omissions in net/: Warning(linux-2.6.23-rc1//net/core/dev.c:2728): No description found for parameter 'addr' Warning(linux-2.6.23-rc1//net/core/dev.c:2752): No description found for parameter 'addr' Warning(linux-2.6.23-rc1//net/core/dev.c:3839): No description found for parameter 'net_dma' Warning(linux-2.6.23-rc1//net/core/dev.c:3877): No description found for parameter 'state' Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-31 02:28:00 -07:00
Patrick McHardy	31ce72a6b1	[NET]: Fix loopback crashes when multiqueue is enabled. From: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-20 19:45:45 -07:00
YOSHIFUJI Hideaki	40b77c9434	[NET] CORE: Fix whitespace errors. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2007-07-19 10:43:23 +09:00
Denis Cheng	12972621c8	[NET]: move __dev_addr_discard adjacent to dev_addr_discard for readability Signed-off-by: Denis Cheng <crquan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-18 02:12:56 -07:00
Denis Cheng	26cc2522cb	[NET]: merge dev_unicast_discard and dev_mc_discard into one this two functions could share the dev->_xmit_lock acquired context. Signed-off-by: Denis Cheng <crquan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-18 02:12:03 -07:00
Denis Cheng	456ad75c89	[NET]: move dev_mc_discard from dev_mcast.c to dev.c Because this function is only called by unregister_netdevice, this moving could make this non-global function static, and also remove its declaration in netdevice.h; Any further, function __dev_addr_discard is also just called by dev_mc_discard and dev_unicast_discard, keeping this two functions both in one c file could make __dev_addr_discard also static and remove its declaration in netdevice.h; Futhermore, the sequential call to dev_unicast_discard and then dev_mc_discard in unregister_netdevice have a similar mechanism that: (netif_tx_lock_bh / __dev_addr_discard / netif_tx_unlock_bh), they should merged into one to eliminate duplicates in acquiring and releasing the dev->_xmit_lock, this would be done in my following patch. Signed-off-by: Denis Cheng <crquan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-18 02:10:54 -07:00
Patrick McHardy	b863ceb7dd	[NET]: Add macvlan driver Add macvlan driver, which allows to create virtual ethernet devices based on MAC address. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-14 18:55:06 -07:00
Patrick McHardy	24023451c8	[NET]: Add net_device change_rx_mode callback Currently the set_multicast_list (and set_rx_mode) callbacks are responsible for configuring the device according to the IFF_PROMISC, IFF_MULTICAST and IFF_ALLMULTI flags and the mc_list (and uc_list in case of set_rx_mode). These callbacks can be invoked from BH context without the rtnl_mutex by dev_mc_add/dev_mc_delete, which makes reading the device flags and promiscous/allmulti count racy. For real hardware drivers that just commit all changes to the hardware this is not a real problem since the stack guarantees to call them for every change, so at least the final call will not race and commit the correct configuration to the hardware. For software devices that want to synchronize promiscous and multicast state to an underlying device however this can cause corruption of the underlying device's flags or promisc/allmulti counts. When the software device is concurrently put in promiscous or allmulti mode while set_multicast_list is invoked from bottem half context, the device might synchronize the change to the underlying device without holding the rtnl_mutex, which races with concurrent changes to the underlying device. Add a dev->change_rx_flags hook that is invoked when any of the flags that affect rx filtering change (under the rtnl_mutex), which allows drivers to perform synchronization immediately and only synchronize the address lists in set_multicast_list/set_rx_mode. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-14 18:51:31 -07:00
Linus Torvalds	e030dbf91a	Merge branch 'ioat-md-accel-for-linus' of git://lost.foo-projects.org/~dwillia2/git/iop * 'ioat-md-accel-for-linus' of git://lost.foo-projects.org/~dwillia2/git/iop: (28 commits) ioatdma: add the unisys "i/oat" pci vendor/device id ARM: Add drivers/dma to arch/arm/Kconfig iop3xx: surface the iop3xx DMA and AAU units to the iop-adma driver iop13xx: surface the iop13xx adma units to the iop-adma driver dmaengine: driver for the iop32x, iop33x, and iop13xx raid engines md: remove raid5 compute_block and compute_parity5 md: handle_stripe5 - request io processing in raid5_run_ops md: handle_stripe5 - add request/completion logic for async expand ops md: handle_stripe5 - add request/completion logic for async read ops md: handle_stripe5 - add request/completion logic for async check ops md: handle_stripe5 - add request/completion logic for async compute ops md: handle_stripe5 - add request/completion logic for async write ops md: common infrastructure for running operations with raid5_run_ops md: raid5_run_ops - run stripe operations outside sh->lock raid5: replace custom debug PRINTKs with standard pr_debug raid5: refactor handle_stripe5 and handle_stripe6 (v3) async_tx: add the async_tx api xor: make 'xor_blocks' a library routine for use with async_tx dmaengine: make clients responsible for managing channels dmaengine: refactor dmaengine around dma_async_tx_descriptor ...	2007-07-13 10:52:27 -07:00
Dan Williams	d379b01e90	dmaengine: make clients responsible for managing channels The current implementation assumes that a channel will only be used by one client at a time. In order to enable channel sharing the dmaengine core is changed to a model where clients subscribe to channel-available-events. Instead of tracking how many channels a client wants and how many it has received the core just broadcasts the available channels and lets the clients optionally take a reference. The core learns about the clients' needs at dma_event_callback time. In support of multiple operation types, clients can specify a capability mask to only be notified of channels that satisfy a certain set of capabilities. Changelog: * removed DMA_TX_ARRAY_INIT, no longer needed * dma_client_chan_free -> dma_chan_release: switch to global reference counting only at device unregistration time, before it was also happening at client unregistration time * clients now return dma_state_client to dmaengine (ack, dup, nak) * checkpatch.pl fixes * fixup merge with git-ioat Cc: Chris Leech <christopher.leech@intel.com> Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Acked-by: David S. Miller <davem@davemloft.net>	2007-07-13 08:06:13 -07:00
Patrick McHardy	61cbc2fca6	[NET]: Fix secondary unicast/multicast address count maintenance When a reference to an existing address is increased or decreased without hitting zero, the address count is incorrectly adjusted. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-10 22:16:23 -07:00
Peter P Waskiewicz Jr	f25f4e4480	[CORE] Stack changes to add multiqueue hardware support API Add the multiqueue hardware device support API to the core network stack. Allow drivers to allocate multiple queues and manage them at the netdev level if they choose to do so. Added a new field to sk_buff, namely queue_mapping, for drivers to know which tx_ring to select based on OS classification of the flow. Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-10 22:16:21 -07:00
Herbert Xu	a298830cd0	[NET]: Fix TX checksum feature check This patch fixes a boolean error in the new TX checksum check that causes bogus TSO packets to be generated. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-10 22:16:19 -07:00
Patrick McHardy	4417da668c	[NET]: dev: secondary unicast address support Add support for configuring secondary unicast addresses on network devices. To support this devices capable of filtering multiple unicast addresses need to change their set_multicast_list function to configure unicast filters as well and assign it to dev->set_rx_mode instead of dev->set_multicast_list. Other devices are put into promiscous mode when secondary unicast addresses are present. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-10 22:15:56 -07:00
Patrick McHardy	bf742482d7	[NET]: dev: introduce generic net_device address lists Introduce struct dev_addr_list and list maintenance functions based on dev_mc_list and the related functions. This will be used by follow-up patches for both multicast and secondary unicast addresses. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-10 22:15:54 -07:00
Stephen Hemminger	d212f87b06	[NET]: IPV6 checksum offloading in network devices The existing model for checksum offload does not correctly handle devices that can offload IPV4 and IPV6 only. The NETIF_F_HW_CSUM flag implies device can do any arbitrary protocol. This patch: * adds NETIF_F_IPV6_CSUM for those devices * fixes bnx2 and tg3 devices that need it * add NETIF_F_IPV6_CSUM to ipv6 output (incl GSO) * fixes assumptions about NETIF_F_ALL_CSUM in nat * adjusts bridge union of checksumming computation Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-10 22:15:52 -07:00
Shannon Nelson	515e06c455	[NET]: Re-enable irqs before pushing pending DMA requests This moves the local_irq_enable() call in net_rx_action() to before calling the CONFIG_NET_DMA's dma_async_memcpy_issue_pending() rather than after. This shortens the irq disabled window and allows for DMA drivers that need to do their own irq hold. Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-06-23 23:09:23 -07:00
Thomas Graf	7c355f532d	[NET]: Avoid duplicate netlink notification when changing link state When changing the link state from userspace not affecting any other flags. Two duplicate notification are being sent, once as action in the NETDEV_UP/NETDEV_DOWN notification chain and a second time when comparing old and new device flags after the change has been completed. Although harmless, the duplicates should be avoided. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-06-07 13:40:56 -07:00
Stephen Hemminger	9093bbb2d9	[NET]: Fix race condition about network device name allocation. Kenji Kaneshige found this race between device removal and registration. On unregister it is possible for the old device to exist, because sysfs file is still open. A new device with 'eth%d' will select the same name, but sysfs kobject register will fial. The following changes the shutdown order slightly. It hold a removes the sysfs entries earlier (on unregister_netdevice), but holds a kobject reference. Then when todo runs the actual last put free happens. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-19 15:39:25 -07:00
Jarek Poplawski	723e98b79c	[NET]: lockdep classes in register_netdevice After initializing dev->_xmit_lock register_netdevice() sets lockdep class according to dev->type. Idea of this patch - by David Miller. Reported & tested by: "Yuriy N. Shkandybin" <jura@netams.com> Signed-off-by: Jarek Poplawski <jarkao2@o2.pl> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-17 14:20:28 -07:00
Rafael J. Wysocki	8bb7844286	Add suspend-related notifications for CPU hotplug Since nonboot CPUs are now disabled after tasks and devices have been frozen and the CPU hotplug infrastructure is used for this purpose, we need special CPU hotplug notifications that will help the CPU-hotplug-aware subsystems distinguish normal CPU hotplug events from CPU hotplug events related to a system-wide suspend or resume operation in progress. This patch introduces such notifications and causes them to be used during suspend and resume transitions. It also changes all of the CPU-hotplug-aware subsystems to take these notifications into consideration (for now they are handled in the same way as the corresponding "normal" ones). [oleg@tv-sign.ru: cleanups] Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: Gautham R Shenoy <ego@in.ibm.com> Cc: Pavel Machek <pavel@ucw.cz> Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-09 12:30:56 -07:00
Josef 'Jeff' Sipek	2396a22e09	[NET] net/core: Fix error handling Upon failure to register "ptype" procfs entry, "softnet_stat" was not removed, and an incorrect attempt was made to remove the "ptype" entry. Signed-off-by: Josef 'Jeff' Sipek <jsipek@cs.sunysb.edu> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-07 00:33:18 -07:00
Pavel Emelianov	7562f876cd	[NET]: Rework dev_base via list_head (v3) Cleanup of dev_base list use, with the aim to simplify making device list per-namespace. In almost every occasion, use of dev_base variable and dev->next pointer could be easily replaced by for_each_netdev loop. A few most complicated places were converted to using first_netdev()/next_netdev(). Signed-off-by: Pavel Emelianov <xemul@openvz.org> Acked-by: Kirill Korotaev <dev@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-03 15:13:45 -07:00
Patrick McHardy	4e9cac2ba4	[NET]: Add __dev_getfirstbyhwtype Add __dev_getfirstbyhwtype for callers that don't want a reference but some data from the device and thus need to take the rtnl anyway. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-03 03:28:13 -07:00
Rusty Russell	5a1b5898ee	[NET]: Remove NETIF_F_INTERNAL_STATS, default to internal stats. Herbert Xu conviced me that a new flag was overkill; every driver currently overrides get_stats, so we might as well make the internal one the default. If someone did fail to set get_stats, they would now get all 0 stats instead of "No statistics available". Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-28 21:04:03 -07:00
Johannes Berg	295f4a1fa3	[WEXT]: Clean up how wext is called. This patch cleans up the call paths from the core code into wext. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-26 20:43:56 -07:00
Johannes Berg	11433ee450	[WEXT]: Move to net/wireless This patch moves dev/core/wireless.c to net/wireless/wext.c. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-26 20:42:51 -07:00
Herbert Xu	f9d106a6d5	[NET]: Warn about GSO/checksum abuse Now that Patrick has added the code to deal with GSO in netfilter, we no longer need the crutch that computes partial checksums just before transmission. This patch turns this into a warning again. If this goes OK, we can then turn it into a BUG_ON and remove the gso_send_check cruft. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:29:47 -07:00
Andrew Morton	372cc74c8b	[NET]: Prevent much sadness in qdisc_lock_tree(). Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:29:38 -07:00
Borislav Petkov	38b4da3837	[NET]: Fix comments for register_netdev(). Correct the function name in the comments supplied with register_netdev() Signed-off-by: Borislav Petkov <bbpetkov@yahoo.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:29:33 -07:00
Stephen Hemminger	9be9a6b983	[NET]: Get rid of netdev_nit It isn't any faster to test a boolean global variable than do a simple check for empty list. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:29:22 -07:00
Patrick McHardy	fd44de7cc1	[NET_SCHED]: ingress: switch back to using ingress_lock Switch ingress queueing back to use ingress_lock. qdisc_lock_tree now locks both the ingress and egress qdiscs on the device. All changes to data that might be used on both ingress and egress needs to be protected by using qdisc_lock_tree instead of manually taking dev->queue_lock. Additionally the qdisc stats_lock needs to be initialized to ingress_lock for ingress qdiscs. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:29:08 -07:00
Stephen Hemminger	6229e362dd	bridge: eliminate call by reference Change the bridging hook to be simple function with return value rather than modifying the skb argument. This could generate better code and is cleaner. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>	2007-04-25 22:28:44 -07:00
Herbert Xu	663ead3bb8	[NET]: Use csum_start offset instead of skb_transport_header The skb transport pointer is currently used to specify the start of the checksum region for transmit checksum offload. Unfortunately, the same pointer is also used during receive side processing. This creates a problem when we want to retransmit a received packet with partial checksums since the skb transport pointer would be overwritten. This patch solves this problem by creating a new 16-bit csum_start offset value to replace the skb transport header for the purpose of checksums. This offset is calculated from skb->head so that it does not have to change when skb->data changes. No extra space is required since csum_offset itself fits within a 16-bit word so we can use the other 16 bits for csum_start. For backwards compatibility, just before we push a packet with partial checksums off into the device driver, we set the skb transport header to what it would have been under the old scheme. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:28:40 -07:00
Rusty Russell	c45d286e72	[NET]: Inline net_device_stats Network drivers which keep stats allocate their own stats structure then write a get_stats() function to return them. It would be nice if this were done by default. 1) Add a new "stats" field to "struct net_device". 2) Add a new feature field to say "this driver uses the internal one" 3) Have a default "get_stats" which returns NULL if that feature not set. 4) Change callers to check result of get_stats call for NULL, not if ->get_stats is set. This should not break backwards compatibility with older drivers, yet allow modern drivers to shed some boilerplate code. Lightly tested: works for a modified lguest network driver. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:28:26 -07:00
Arnaldo Carvalho de Melo	27a884dc3c	[SK_BUFF]: Convert skb->tail to sk_buff_data_t So that it is also an offset from skb->head, reduces its size from 8 to 4 bytes on 64bit architectures, allowing us to combine the 4 bytes hole left by the layer headers conversion, reducing struct sk_buff size to 256 bytes, i.e. 4 64byte cachelines, and since the sk_buff slab cache is SLAB_HWCACHE_ALIGN... :-) Many calculations that previously required that skb->{transport,network, mac}_header be first converted to a pointer now can be done directly, being meaningful as offsets or pointers. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:28 -07:00
Arnaldo Carvalho de Melo	b0e380b1d8	[SK_BUFF]: unions of just one member don't get anything done, kill them Renaming skb->h to skb->transport_header, skb->nh to skb->network_header and skb->mac to skb->mac_header, to match the names of the associated helpers (skb[_[re]set]_{transport,network,mac}_header). Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:20 -07:00
Arnaldo Carvalho de Melo	9c70220b73	[SK_BUFF]: Introduce skb_transport_header(skb) For the places where we need a pointer to the transport header, it is still legal to touch skb->h.raw directly if just adding to, subtracting from or setting it to another layer header. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:31 -07:00
Arnaldo Carvalho de Melo	ea2ae17d64	[SK_BUFF]: Introduce skb_transport_offset() For the quite common 'skb->h.raw - skb->data' sequence. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:16 -07:00
Arnaldo Carvalho de Melo	badff6d01a	[SK_BUFF]: Introduce skb_reset_transport_header(skb) For the common, open coded 'skb->h.raw = skb->data' operation, so that we can later turn skb->h.raw into a offset, reducing the size of struct sk_buff in 64bit land while possibly keeping it as a pointer on 32bit. This one touches just the most simple cases: skb->h.raw = skb->data; skb->h.raw = {skb_push\|[__]skb_pull}() The next ones will handle the slightly more "complex" cases. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:15 -07:00
Stephen Hemminger	0e1256ffd1	[NET]: show bound packet types Show what protocols are bound to what packet types in /proc/net/ptype Uses kallsyms to decode function pointers if possible. Example: Type Device Function ALL eth1 packet_rcv_spkt+0x0 0800 ip_rcv+0x0 0806 arp_rcv+0x0 86dd :ipv6:ipv6_rcv+0x0 Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:04 -07:00
Stephen Hemminger	f690808e17	[NET]: make seq_operations const The seq_file operations stuff can be marked constant to get it out of dirty cache. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:03 -07:00
Stephen Hemminger	6b2bedc3a6	[NET]: network dev read_mostly For Eric, mark packet type and network device watermarks as read mostly. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:02 -07:00
Arnaldo Carvalho de Melo	d56f90a7c9	[SK_BUFF]: Introduce skb_network_header() For the places where we need a pointer to the network header, it is still legal to touch skb->nh.raw directly if just adding to, subtracting from or setting it to another layer header. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:59 -07:00
Arnaldo Carvalho de Melo	c1d2bbe1cd	[SK_BUFF]: Introduce skb_reset_network_header(skb) For the common, open coded 'skb->nh.raw = skb->data' operation, so that we can later turn skb->nh.raw into a offset, reducing the size of struct sk_buff in 64bit land while possibly keeping it as a pointer on 32bit. This one touches just the most simple case, next will handle the slightly more "complex" cases. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:46 -07:00
Arnaldo Carvalho de Melo	98e399f82a	[SK_BUFF]: Introduce skb_mac_header() For the places where we need a pointer to the mac header, it is still legal to touch skb->mac.raw directly if just adding to, subtracting from or setting it to another layer header. This one also converts some more cases to skb_reset_mac_header() that my regex missed as it had no spaces before nor after '=', ugh. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:41 -07:00
Arnaldo Carvalho de Melo	459a98ed88	[SK_BUFF]: Introduce skb_reset_mac_header(skb) For the common, open coded 'skb->mac.raw = skb->data' operation, so that we can later turn skb->mac.raw into a offset, reducing the size of struct sk_buff in 64bit land while possibly keeping it as a pointer on 32bit. This one touches just the most simple case, next will handle the slightly more "complex" cases. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:32 -07:00
Stephen Hemminger	6f05f62971	[NET]: deinline some functions Several functions are marked inline or forced inline, but it would be better to let the compiler decide. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:14 -07:00
Eric Dumazet	b7aa0bf70c	[NET]: convert network timestamps to ktime_t We currently use a special structure (struct skb_timeval) and plain 'struct timeval' to store packet timestamps in sk_buffs and struct sock. This has some drawbacks : - Fixed resolution of micro second. - Waste of space on 64bit platforms where sizeof(struct timeval)=16 I suggest using ktime_t that is a nice abstraction of high resolution time services, currently capable of nanosecond resolution. As sizeof(ktime_t) is 8 bytes, using ktime_t in 'struct sock' permits a 8 byte shrink of this structure on 64bit architectures. Some other structures also benefit from this size reduction (struct ipq in ipv4/ip_fragment.c, struct frag_queue in ipv6/reassembly.c, ...) Once this ktime infrastructure adopted, we can more easily provide nanosecond resolution on top of it. (ioctl SIOCGSTAMPNS and/or SO_TIMESTAMPNS/SCM_TIMESTAMPNS) Note : this patch includes a bug correction in compat_sock_get_timestamp() where a "err = 0;" was missing (so this syscall returned -ENOENT instead of 0) Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> CC: Stephen Hemminger <shemminger@linux-foundation.org> CC: John find <linux.kernel@free.fr> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:34 -07:00
Eric W. Biederman	927498217c	[PATCH] net: Ignore sysfs network device rename bugs. The generic networking code ensures that no two networking devices have the same name, so there is no time except when sysfs has implementation bugs that device_rename when called from dev_change_name will fail. The current error handling for errors from device_rename in dev_change_name is wrong and results in an unusable and unrecoverable network device if device_rename is happens to return an error. This patch removes the buggy error handling. Which confines the mess when device_rename hits a problem to sysfs, instead of propagating it the rest of the network stack. Making linux a little more robust. Without this patch you can observe what happens when sysfs has a bug when CONFIG_SYSFS_DEPRECATED is not set and you attempt to rename a real network device to a name like (broken_parity_status, device, modalias, power, resource2, subsystem_vendor, class, driver, irq, msi_bus, resource, subsystem, uevent, config, enable, local_cpus, numa_node, resource0, subsystem_device, vendor) Greg has a patch that fixes the sysfs bugs but he doesn't trust it for a 2.6.21 timeframe. This patch which just ignores errors should be safe and it keeps the system from going completely wacky. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-04-04 08:51:52 -07:00
Patrick McHardy	c01003c205	[IFB]: Fix crash on input device removal The input_device pointer is not refcounted, which means the device may disappear while packets are queued, causing a crash when ifb passes packets with a stale skb->dev pointer to netif_rx(). Fix by storing the interface index instead and do a lookup where neccessary. Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-29 11:46:52 -07:00

1 2 3 4 5 ...

338 Commits