Commit Graph

6175 Commits

Author SHA1 Message Date
Pavel Emelyanov
4ae289444b [NEIGH]: Ensure that pneigh_lookup is protected with RTNL
The pnigh_lookup is used to lookup proxy entries and to 
create them in case lookup failed. 

However, the "creation" code does not perform the re-lookup
after GFP_KERNEL allocation. This is done because the code
is expected to be protected with the RTNL lock, so add the 
assertion (mainly to address future questions from new network 
developers like me :) ).

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-15 12:54:15 -07:00
Denis V. Lunev
f1673ca52c [INET]: kmalloc+memset -> kzalloc in frag_alloc_queue
kmalloc + memset -> kzalloc in frag_alloc_queue

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-15 12:53:13 -07:00
Herbert Xu
e5bbef20e0 [IPV6]: Replace sk_buff ** with sk_buff * in input handlers
With all the users of the double pointers removed from the IPv6 input path,
this patch converts all occurances of sk_buff ** to sk_buff * in IPv6 input
handlers.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-15 12:50:28 -07:00
Pavel Emelyanov
762cc40801 [INET]: Consolidate the xxx_put
These ones use the generic data types too, so move
them in one place.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-15 12:26:43 -07:00
Pavel Emelyanov
4b6cb5d8e3 [INET]: Small cleanup for xxx_put after evictor consolidation
After the evictor code is consolidated there is no need in
passing the extra pointer to the xxx_put() functions.

The only place when it made sense was the evictor code itself.

Maybe this change must got with the previous (or with the
next) patch, but I try to make them shorter as much as
possible to simplify the review (but they are still large
anyway), so this change goes in a separate patch.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-15 12:26:43 -07:00
Pavel Emelyanov
8e7999c44e [INET]: Consolidate the xxx_evictor
The evictors collect some statistics for ipv4 and ipv6,
so make it return the number of evicted queues and account
them all at once in the caller.

The XXX_ADD_STATS_BH() macros are just for this case,
but maybe there are places in code, that can make use of
them as well.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-15 12:26:42 -07:00
Pavel Emelyanov
1e4b82873a [INET]: Consolidate the xxx_frag_destroy
To make in possible we need to know the exact frag queue
size for inet_frags->mem management and two callbacks:

 * to destoy the skb (optional, used in conntracks only)
 * to free the queue itself (mandatory, but later I plan to
   move the allocation and the destruction of frag_queues
   into the common place, so this callback will most likely
   be optional too).

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-15 12:26:42 -07:00
Pavel Emelyanov
321a3a99e4 [INET]: Consolidate xxx_the secret_rebuild
This code works with the generic data types as well, so
move this into inet_fragment.c

This move makes it possible to hide the secret_timer
management and the secret_rebuild routine completely in
the inet_fragment.c

Introduce the ->hashfn() callback in inet_frags() to get
the hashfun for a given inet_frag_queue() object.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-15 12:26:41 -07:00
Pavel Emelyanov
277e650ddf [INET]: Consolidate the xxx_frag_kill
Since now all the xxx_frag_kill functions now work
with the generic inet_frag_queue data type, this can
be moved into a common place.

The xxx_unlink() code is moved as well.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-15 12:26:41 -07:00
Pavel Emelyanov
04128f233f [INET]: Collect common frag sysctl variables together
Some sysctl variables are used to tune the frag queues
management and it will be useful to work with them in
a common way in the future, so move them into one
structure, moreover they are the same for all the frag
management codes.

I don't place them in the existing inet_frags object,
introduced in the previous patch for two reasons:

 1. to keep them in the __read_mostly section;
 2. not to export the whole inet_frags objects outside.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-15 12:26:40 -07:00
Pavel Emelyanov
7eb95156d9 [INET]: Collect frag queues management objects together
There are some objects that are common in all the places
which are used to keep track of frag queues, they are:

 * hash table
 * LRU list
 * rw lock
 * rnd number for hash function
 * the number of queues
 * the amount of memory occupied by queues
 * secret timer

Move all this stuff into one structure (struct inet_frags)
to make it possible use them uniformly in the future. Like
with the previous patch this mostly consists of hunks like

-    write_lock(&ipfrag_lock);
+    write_lock(&ip4_frags.lock);

To address the issue with exporting the number of queues and
the amount of memory occupied by queues outside the .c file
they are declared in, I introduce a couple of helpers.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-15 12:26:39 -07:00
Pavel Emelyanov
5ab11c98d3 [INET]: Move common fields from frag_queues in one place.
Introduce the struct inet_frag_queue in include/net/inet_frag.h
file and place there all the common fields from three structs:

 * struct ipq in ipv4/ip_fragment.c
 * struct nf_ct_frag6_queue in nf_conntrack_reasm.c
 * struct frag_queue in ipv6/reassembly.c

After this, replace these fields on appropriate structures with
this structure instance and fix the users to use correct names
i.e. hunks like

-    atomic_dec(&fq->refcnt);
+    atomic_dec(&fq->q.refcnt);

(these occupy most of the patch)

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-15 12:26:38 -07:00
Ilpo Järvinen
f885c5b08e [TCP]: high_seq parameter removed (all callers use tp->high_seq)
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-15 12:26:37 -07:00
Patrick McHardy
ad643a793b [IPV6]: Uninline netfilter okfns
Uninline netfilter okfns for those cases where gcc can generate tail-calls.

Before:
   text    data     bss     dec     hex filename
8994153 1016524  524652 10535329         a0c1a1 vmlinux

After:
   text    data     bss     dec     hex filename
8992761 1016524  524652 10533937         a0bc31 vmlinux
-------------------------------------------------------
  -1392

All cases have been verified to generate tail-calls with and without netfilter.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-15 12:26:36 -07:00
Patrick McHardy
9c2842bd94 [BRIDGE]: Remove SKB share checks in br_nf_pre_routing().
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-15 12:26:35 -07:00
Patrick McHardy
861d048607 [IPV4]: Uninline netfilter okfns
Now that we don't pass double skb pointers to nf_hook_slow anymore, gcc
can generate tail calls for some of the netfilter hook okfn invocations,
so there is no need to inline the functions anymore. This caused huge
code bloat since we ended up with one inlined version and one out-of-line
version since we pass the address to nf_hook_slow.

Before:
   text    data     bss     dec     hex filename
8997385 1016524  524652 10538561         a0ce41 vmlinux

After:
   text    data     bss     dec     hex filename
8994009 1016524  524652 10535185         a0c111 vmlinux
-------------------------------------------------------
  -3376

All cases have been verified to generate tail-calls with and without
netfilter. The okfns in ipmr and xfrm4_input still remain inline because
gcc can't generate tail-calls for them.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-15 12:26:35 -07:00
Herbert Xu
a030847e9f [NET]: Avoid copying TCP packets unnecessarily
TCP packets all have writable heads, that is, even though it's cloned, it is
writable up to the end of the TCP header.  This patch makes skb_checksum_help
aware of this fact by using skb_clone_writable and avoiding a copy for TCP.

I've also modified the BUG_ON tests to be unsigned.  The only case where this
makes a difference is if csum_start points to a location before skb->data.
Since skb->data should always include the header where the checksum field
is (and all currently callers adhere to that), this change is safe and may
uncover bugs later.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-15 12:26:34 -07:00
Herbert Xu
172a863f2b [NET]: Fix csum_start update in pskb_expand_head
I got confused by the dual nature of the off variable in the
function pskb_expand_head.  The csum_start offset should use
nhead instead of off which can change depending on whether we
are using offsets or pointers.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-15 12:26:33 -07:00
Jesper Juhl
f937f1f46b [NETLINK]: Don't leak 'listeners' in netlink_kernel_create()
The Coverity checker spotted that we'll leak the storage allocated
to 'listeners' in netlink_kernel_create() when the
  if (!nl_table[unit].registered)
check is false.

This patch avoids the leak.

Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-15 12:26:32 -07:00
Adrian Bunk
1dff92e09e [IPV6] __inet6_csk_dst_store(): fix check-after-use
The Coverity checker spotted that we have already oops'ed if "dst" was
NULL.

Since "dst" being NULL doesn't seem to be possible at this point this
patch removes the NULL check.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Acked-by: Noriaki TAKAMIYA <takamiya@po.ntts.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-15 12:26:32 -07:00
Herbert Xu
65c8846660 [IPV6]: Avoid skb_copy/pskb_copy/skb_realloc_headroom on input
This patch replaces unnecessary uses of skb_copy by pskb_expand_head
on the IPv6 input path.

This allows us to remove the double pointers later.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-15 12:26:31 -07:00
Herbert Xu
f61944efdf [IPV6]: Make ipv6_frag_rcv return the same packet
This patch implements the same change taht was done to ip_defrag.  It
makes ipv6_frag_rcv return the last packet received of a train of fragments
rather than the head of that sequence.

This allows us to get rid of the sk_buff ** argument later.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-15 12:26:30 -07:00
Herbert Xu
3db05fea51 [NETFILTER]: Replace sk_buff ** with sk_buff *
With all the users of the double pointers removed, this patch mops up by
finally replacing all occurances of sk_buff ** in the netfilter API by
sk_buff *.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-15 12:26:29 -07:00
Herbert Xu
2ca7b0ac02 [NETFILTER]: Avoid skb_copy/pskb_copy/skb_realloc_headroom
This patch replaces unnecessary uses of skb_copy, pskb_copy and
skb_realloc_headroom by functions such as skb_make_writable and
pskb_expand_head.

This allows us to remove the double pointers later.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-15 12:26:28 -07:00
Herbert Xu
af1e1cf073 [IPVS]: Replace local version of skb_make_writable
This patch removes the IPVS-specific version of skb_make_writable and
replaces it with the netfilter one.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-15 12:26:28 -07:00
Herbert Xu
37d4187922 [NETFILTER]: Do not copy skb in skb_make_writable
Now that all callers of netfilter can guarantee that the skb is not shared,
we no longer have to copy the skb in skb_make_writable.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-15 12:26:27 -07:00
Herbert Xu
7b995651e3 [BRIDGE]: Unshare skb upon entry
Due to the special location of the bridging hook, it should never see a
shared packet anyway (certainly not with any in-kernel code).  So it
makes sense to unshare the skb there if necessary as that will greatly
simplify the code below it (in particular, netfilter).

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-15 12:26:27 -07:00
Herbert Xu
f697c3e8b3 [NET]: Avoid unnecessary cloning for ingress filtering
As it is we always invoke pt_prev before ing_filter, even if there are no
ingress filters attached.  This can cause unnecessary cloning in pt_prev.

This patch changes it so that we only invoke pt_prev if there are ingress
filters attached.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-15 12:26:26 -07:00
Herbert Xu
776c729e8d [IPV4]: Change ip_defrag to return an integer
Now that ip_frag always returns the packet given to it on input, we can
change it to return an integer indicating error instead.  This patch does
that and updates all its callers accordingly.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-15 12:26:25 -07:00
Herbert Xu
1706d58763 [IPV4]: Make ip_defrag return the same packet
This patch is a bit of a hack.  However it is worth it if you consider that
this is the only reason why we have to carry around the struct sk_buff **
pointers in netfilter.

It makes ip_defrag always return the packet that was given to it on input.
It does this by cloning the packet and replacing its original contents with
the head fragment if necessary.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-15 12:26:25 -07:00
Herbert Xu
e0053ec07e [SKBUFF]: Add skb_morph
This patch creates a new function skb_morph that's just like skb_clone
except that it lets user provide the spare skb that will be overwritten
by the one that's to be cloned.

This will be used by IP fragment reassembly so that we get back the same
skb that went in last (rather than the head skb that we get now which
requires us to carry around double pointers all over the place).

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-15 12:26:24 -07:00
Herbert Xu
dec18810c5 [SKBUFF]: Merge common code between copy_skb_header and skb_clone
This patch creates a new function __copy_skb_header to merge the common
code between copy_skb_header and skb_clone.  Having two functions which
are largely the same is a source of wasted labour as well as confusion.

In fact the tc_verd stuff is almost certainly a bug since it's treated
differently in skb_clone compared to the callers of copy_skb_header
(skb_copy/pskb_copy/skb_copy_expand).

I've kept that difference in tact with a comment added asking for
clarification.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-15 12:26:24 -07:00
Linus Torvalds
f4921aff5b Merge git://git.linux-nfs.org/pub/linux/nfs-2.6
* git://git.linux-nfs.org/pub/linux/nfs-2.6: (131 commits)
  NFSv4: Fix a typo in nfs_inode_reclaim_delegation
  NFS: Add a boot parameter to disable 64 bit inode numbers
  NFS: nfs_refresh_inode should clear cache_validity flags on success
  NFS: Fix a connectathon regression in NFSv3 and NFSv4
  NFS: Use nfs_refresh_inode() in ops that aren't expected to change the inode
  SUNRPC: Don't call xprt_release in call refresh
  SUNRPC: Don't call xprt_release() if call_allocate fails
  SUNRPC: Fix buggy UDP transmission
  [23/37] Clean up duplicate includes in
  [2.6 patch] net/sunrpc/rpcb_clnt.c: make struct rpcb_program static
  SUNRPC: Use correct type in buffer length calculations
  SUNRPC: Fix default hostname created in rpc_create()
  nfs: add server port to rpc_pipe info file
  NFS: Get rid of some obsolete macros
  NFS: Simplify filehandle revalidation
  NFS: Ensure that nfs_link() returns a hashed dentry
  NFS: Be strict about dentry revalidation when doing exclusive create
  NFS: Don't zap the readdir caches upon error
  NFS: Remove the redundant nfs_reval_fsid()
  NFSv3: Always use directory post-op attributes in nfs3_proc_lookup
  ...

Fix up trivial conflict due to sock_owned_by_user() cleanup manually in
net/sunrpc/xprtsock.c
2007-10-15 10:47:35 -07:00
Linus Torvalds
b5869ce7f6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched
* git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched: (140 commits)
  sched: sync wakeups preempt too
  sched: affine sync wakeups
  sched: guest CPU accounting: maintain guest state in KVM
  sched: guest CPU accounting: maintain stats in account_system_time()
  sched: guest CPU accounting: add guest-CPU /proc/<pid>/stat fields
  sched: guest CPU accounting: add guest-CPU /proc/stat field
  sched: domain sysctl fixes: add terminator comment
  sched: domain sysctl fixes: do not crash on allocation failure
  sched: domain sysctl fixes: unregister the sysctl table before domains
  sched: domain sysctl fixes: use for_each_online_cpu()
  sched: domain sysctl fixes: use kcalloc()
  Make scheduler debug file operations const
  sched: enable wake-idle on CONFIG_SCHED_MC=y
  sched: reintroduce topology.h tunings
  sched: allow the immediate migration of cache-cold tasks
  sched: debug, improve migration statistics
  sched: debug: increase width of debug line
  sched: activate task_hot() only on fair-scheduled tasks
  sched: reintroduce cache-hot affinity
  sched: speed up context-switches a bit
  ...
2007-10-15 08:22:16 -07:00
Linus Torvalds
37ca506adc Merge branch 'nfs-server-stable' of git://linux-nfs.org/~bfields/linux
* 'nfs-server-stable' of git://linux-nfs.org/~bfields/linux:
  knfsd: query filesystem for NFSv4 getattr of FATTR4_MAXNAME
  knfsd: nfsv4 delegation recall should take reference on client
  knfsd: don't shutdown callbacks until nfsv4 client is freed
  knfsd: let nfsd manage timing out its own leases
  knfsd: Add source address to sunrpc svc errors
  knfsd: 64 bit ino support for NFS server
  svcgss: move init code into separate function
  knfsd: remove code duplication in nfsd4_setclientid()
  nfsd warning fix
  knfsd: fix callback rpc cred
  knfsd: move nfsv4 slab creation/destruction to module init/exit
  knfsd: spawn kernel thread to probe callback channel
  knfsd: nfs4 name->id mapping not correctly parsing negative downcall
  knfsd: demote some printk()s to dprintk()s
  knfsd: cleanup of nfsd4 cmp_* functions
  knfsd: delete code made redundant by map_new_errors
  nfsd: fix horrible indentation in nfsd_setattr
  nfsd: remove unused cache_for_each macro
  nfsd: tone down inaccurate dprintk
2007-10-15 08:16:53 -07:00
Ingo Molnar
71e20f1873 sched: affine sync wakeups
make sync wakeups affine for cache-cold tasks: if a cache-cold task
is woken up by a sync wakeup then use the opportunity to migrate it
straight away. (the two tasks are 'related' because they communicate)

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-15 17:00:19 +02:00
Al Viro
f53f4137ba fix endianness bug in inet_lro
all uses of and almost all assignments to lro_desc->tcp_ack assume that it's
net-endian; one converts net-endian to host-endian and sticks it in
lro_desc->tcp_ack.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-14 12:41:52 -07:00
Al Viro
9df7c98a0f inet_lro: trivial endianness annotations
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-14 12:41:52 -07:00
Al Viro
411223c01a fix breakage in sctp getsockopt
copy_to_user() into on-stack array

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-14 12:41:51 -07:00
Randy Dunlap
c4ea43c552 net core: fix kernel-doc for new function parameters
Fix networking code kernel-doc for newly added parameters.

Warning(linux-2.6.23-git2//net/core/sock.c:879): No description found for parameter 'net'
Warning(linux-2.6.23-git2//net/core/dev.c:570): No description found for parameter 'net'
Warning(linux-2.6.23-git2//net/core/dev.c:594): No description found for parameter 'net'
Warning(linux-2.6.23-git2//net/core/dev.c:617): No description found for parameter 'net'
Warning(linux-2.6.23-git2//net/core/dev.c:641): No description found for parameter 'net'
Warning(linux-2.6.23-git2//net/core/dev.c:667): No description found for parameter 'net'
Warning(linux-2.6.23-git2//net/core/dev.c:722): No description found for parameter 'net'
Warning(linux-2.6.23-git2//net/core/dev.c:959): No description found for parameter 'net'
Warning(linux-2.6.23-git2//net/core/dev.c:1195): No description found for parameter 'dev'
Warning(linux-2.6.23-git2//net/core/dev.c:2105): No description found for parameter 'n'
Warning(linux-2.6.23-git2//net/core/dev.c:3272): No description found for parameter 'net'
Warning(linux-2.6.23-git2//net/core/dev.c:3445): No description found for parameter 'net'
Warning(linux-2.6.23-git2//include/linux/netdevice.h:1301): No description found for parameter 'cpu'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-13 09:52:26 -07:00
Greg Kroah-Hartman
19c38de88a kobjects: fix up improper use of the kobject name field
A number of different drivers incorrect access the kobject name field
directly.  This is not correct as the name might not be in the array.
Use the proper accessor function instead.
2007-10-12 14:51:02 -07:00
Kay Sievers
7eff2e7a8b Driver core: change add_uevent_var to use a struct
This changes the uevent buffer functions to use a struct instead of a
long list of parameters. It does no longer require the caller to do the
proper buffer termination and size accounting, which is currently wrong
in some places. It fixes a known bug where parts of the uevent
environment are overwritten because of wrong index calculations.

Many thanks to Mathieu Desnoyers for finding bugs and improving the
error handling.

Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-10-12 14:51:01 -07:00
Ilpo Järvinen
b08d6cb22c [TCP]: Limit processing lost_retrans loop to work-to-do cases
This addition of lost_retrans_low to tcp_sock might be
unnecessary, it's not clear how often lost_retrans worker is
executed when there wasn't work to do.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-11 17:36:13 -07:00
Ilpo Järvinen
f785a8e28b [TCP]: Fix lost_retrans loop vs fastpath problems
Detection implemented with lost_retrans must work also when
fastpath is taken, yet most of the queue is skipped including
(very likely) those retransmitted skb's we're interested in.
This problem appeared when the hints got added, which removed
a need to always walk over the whole write queue head.
Therefore decicion for the lost_retrans worker loop entry must
be separated from the sacktag processing more than it was
necessary before.

It turns out to be problematic to optimize the worker loop
very heavily because ack_seqs of skb may have a number of
discontinuity points. Maybe similar approach as currently is
implemented could be attempted but that's becoming more and
more complex because the trend is towards less skb walking
in sacktag marker. Trying a simple work until all rexmitted
skbs heve been processed approach.

Maybe after(highest_sack_end_seq, tp->high_seq) checking is not
sufficiently accurate and causes entry too often in no-work-to-do
cases. Since that's not known, I've separated solution to that
from this patch.

Noticed because of report against a related problem from TAKANO
Ryousei <takano@axe-inc.co.jp>. He also provided a patch to
that part of the problem. This patch includes solution to it
(though this patch has to use somewhat different placement).
TAKANO's description and patch is available here:

  http://marc.info/?l=linux-netdev&m=119149311913288&w=2

...In short, TAKANO's problem is that end_seq the loop is using
not necessarily the largest SACK block's end_seq because the
current ACK may still have higher SACK blocks which are later
by the loop.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-11 17:35:41 -07:00
Ilpo Järvinen
4cd829995b [TCP]: No need to re-count fackets_out/sacked_out at RTO
Both sacked_out and fackets_out are directly known from how
parameter. Since fackets_out is accurate, there's no need for
recounting (sacked_out was previously unnecessarily counted
in the loop anyway).

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-11 17:34:57 -07:00
Ilpo Järvinen
d193594299 [TCP]: Extract tcp_match_queue_to_sack from sacktag code
This is necessary for upcoming DSACK bugfix. Reduces sacktag
length which is not very sad thing at all... :-)

Notice that there's a need to handle out-of-mem at caller's
place.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-11 17:34:25 -07:00
Ilpo Järvinen
f6fb128d27 [TCP]: Kill almost unused variable pcount from sacktag
It's on the way for future cutting of that function.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-11 17:33:55 -07:00
Ilpo Järvinen
3eec0047d9 [TCP]: Fix mark_head_lost to ignore R-bit when trying to mark L
This condition (plain R) can arise at least in recovery that
is triggered after tcp_undo_loss. There isn't any reason why
they should not be marked as lost, not marking makes in_flight
estimator to return too large values.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-11 17:33:11 -07:00
Ilpo Järvinen
16e906812f [TCP]: Add bytes_acked (ABC) clearing to FRTO too
I was reading tcp_enter_loss while looking for Cedric's bug and
noticed bytes_acked adjustment is missing from FRTO side.

Since bytes_acked will only be used in tcp_cong_avoid, I think
it's safe to assume RTO would be spurious. During FRTO cwnd
will be not controlled by tcp_cong_avoid and if FRTO calls for
conventional recovery, cwnd is adjusted and the result of wrong
assumption is cleared from bytes_acked. If RTO was in fact
spurious, we did normal ABC already and can continue without
any additional adjustments.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-11 17:32:31 -07:00
Brian Haley
4953f0fcc0 [IPv6]: Update setsockopt(IPV6_MULTICAST_IF) to support RFC 3493, try2
From RFC 3493, Section 5.2:

       IPV6_MULTICAST_IF

          Set the interface to use for outgoing multicast packets.  The
          argument is the index of the interface to use.  If the
          interface index is specified as zero, the system selects the
          interface (for example, by looking up the address in a routing
          table and using the resulting interface).

This patch adds support for (index == 0) to reset the value to it's 
original state, allowing the system to choose the best interface.  IPv4 
already behaves this way.

Signed-off-by: Brian Haley <brian.haley@hp.com>
Acked-by: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-11 14:39:29 -07:00
Jan Engelhardt
73aaf9355b [NETFILTER]: x_tables: add missing ip6t_modulename aliases
The patch will add MODULE_ALIAS("ip6t_<modulename>") where missing,
otherwise you will get

	ip6tables: No chain/target/match by that name

when xt_<modulename> is not already loaded.

Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-11 14:36:40 -07:00
Jozsef Kadlecsik
17311393f9 [NETFILTER]: nf_conntrack_tcp: fix connection reopening
With your description I could reproduce the bug and actually you were
completely right: the code above is incorrect. Somehow I was able to
misread RFC1122 and mixed the roles :-(:

   When a connection is >>closed actively<<, it MUST linger in
   TIME-WAIT state for a time 2xMSL (Maximum Segment Lifetime).
   However, it MAY >>accept<< a new SYN from the remote TCP to
   reopen the connection directly from TIME-WAIT state, if it:
   [...]

The fix is as follows: if the receiver initiated an active close, then the
sender may reopen the connection - otherwise try to figure out if we hold
a dead connection.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Tested-by: Krzysztof Piotr Oledzki <ole@ans.pl>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-11 14:35:52 -07:00
David S. Miller
28f7b0360f [NETLINK]: fib_frontend build fixes
1) fibnl needs to be declared outside of config ifdefs,
   and also should not be explicitly initialized to NULL
2) nl_fib_input() args are wrong for netlink_kernel_create()
   input method

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 21:32:39 -07:00
Pierre Ynard
31910575a9 [IPv6]: Export userland ND options through netlink (RDNSS support)
As discussed before, this patch provides userland with a way to access
relevant options in Router Advertisements, after they are processed
and validated by the kernel. Extra options are processed in a generic
way; this patch only exports RDNSS options described in RFC5006, but
support to control which options are exported could be easily added.

A new rtnetlink message type is defined, to transport Neighbor
Discovery options, along with optional context information. At the
moment only the address of the router sending an RDNSS option is
included, but additional attributes may be later defined, if needed by
new use cases.

Signed-off-by: Pierre Ynard <linkfanel@yahoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 21:22:05 -07:00
Denis V. Lunev
cd40b7d398 [NET]: make netlink user -> kernel interface synchronious
This patch make processing netlink user -> kernel messages synchronious.
This change was inspired by the talk with Alexey Kuznetsov about current
netlink messages processing. He says that he was badly wrong when introduced 
asynchronious user -> kernel communication.

The call netlink_unicast is the only path to send message to the kernel
netlink socket. But, unfortunately, it is also used to send data to the
user.

Before this change the user message has been attached to the socket queue
and sk->sk_data_ready was called. The process has been blocked until all
pending messages were processed. The bad thing is that this processing
may occur in the arbitrary process context.

This patch changes nlk->data_ready callback to get 1 skb and force packet
processing right in the netlink_unicast.

Kernel -> user path in netlink_unicast remains untouched.

EINTR processing for in netlink_run_queue was changed. It forces rtnl_lock
drop, but the process remains in the cycle until the message will be fully
processed. So, there is no need to use this kludges now.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 21:15:29 -07:00
Denis V. Lunev
aed815601f [NET]: unify netlink kernel socket recognition
There are currently two ways to determine whether the netlink socket is a
kernel one or a user one. This patch creates a single inline call for
this purpose and unifies all the calls in the af_netlink.c

No similar calls are found outside af_netlink.c.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 21:14:32 -07:00
Denis V. Lunev
7ee015e0fa [NET]: cleanup 3rd argument in netlink_sendskb
netlink_sendskb does not use third argument. Clean it and save a couple of
bytes.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 21:14:03 -07:00
Denis V. Lunev
3b71535f35 [NET]: Make netlink processing routines semi-synchronious (inspired by rtnl) v2
The code in netfilter/nfnetlink.c and in ./net/netlink/genetlink.c looks
like outdated copy/paste from rtnetlink.c. Push them into sync with the
original.

Changes from v1:
- deleted comment in nfnetlink_rcv_msg by request of Patrick McHardy

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 21:13:32 -07:00
Denis V. Lunev
1536cc0d55 [NET]: rtnl_unlock cleanups
There is no need to process outstanding netlink user->kernel packets
during rtnl_unlock now. There is no rtnl_trylock in the rtnetlink_rcv
anymore.

Normal code path is the following:
netlink_sendmsg
   netlink_unicast
       netlink_sendskb
           skb_queue_tail
           netlink_data_ready
               rtnetlink_rcv
                   mutex_lock(&rtnl_mutex);
                   netlink_run_queue(sk, qlen, &rtnetlink_rcv_msg);
                   mutex_unlock(&rtnl_mutex);

So, it is possible, that packets can be present in the rtnl->sk_receive_queue
during rtnl_unlock, but there is no need to process them at that moment as
rtnetlink_rcv for that packet is pending.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 21:12:58 -07:00
Tony Battersby
fa8705b00a [NET]: sanitize kernel_accept() error path
If kernel_accept() returns an error, it may pass back a pointer to
freed memory (which the caller should ignore).  Make it pass back NULL
instead for better safety.

Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 21:09:04 -07:00
Stephen Hemminger
227b60f510 [INET]: local port range robustness
Expansion of original idea from Denis V. Lunev <den@openvz.org>

Add robustness and locking to the local_port_range sysctl.
1. Enforce that low < high when setting.
2. Use seqlock to ensure atomic update.

The locking might seem like overkill, but there are
cases where sysadmin might want to change value in the
middle of a DoS attack.

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 17:30:46 -07:00
Stephen Hemminger
0639300900 [SCTP]: port randomization
Add port randomization rather than a simple fixed rover
for use with SCTP.  This makes it act similar to TCP, UDP, DCCP
when allocating ports.

No longer need port_alloc_lock as well (suggestion by Brian Haley).

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 17:30:18 -07:00
Patrick McHardy
3c0cfc1358 [NET_SCHED]: Show timer resolution instead of clock resolution in /proc/net/psched
The fourth parameter of /proc/net/psched is supposed to show the timer
resultion and is used by HTB userspace to calculate the necessary
burst rate. Currently we show the clock resolution, which results in a
too low burst rate when the two differ.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:55:59 -07:00
Herbert Xu
631a6698d0 [IPSEC]: Move IP protocol setting from transforms into xfrm4_input.c
This patch makes the IPv4 x->type->input functions return the next protocol
instead of setting it directly.  This is identical to how we do things in
IPv6 and will help us merge common code on the input path.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:55:56 -07:00
Herbert Xu
ceb1eec829 [IPSEC]: Move IP length/checksum setting out of transforms
This patch moves the setting of the IP length and checksum fields out of
the transforms and into the xfrmX_output functions.  This would help future
efforts in merging the transforms themselves.

It also adds an optimisation to ipcomp due to the fact that the transport
offset is guaranteed to be zero.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:55:56 -07:00
Herbert Xu
87bdc48d30 [IPSEC]: Get rid of ipv6_{auth,esp,comp}_hdr
This patch removes the duplicate ipv6_{auth,esp,comp}_hdr structures since
they're identical to the IPv4 versions.  Duplicating them would only create
problems for ourselves later when we need to add things like extended
sequence numbers.

I've also added transport header type conversion headers for these types
which are now used by the transforms.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:55:55 -07:00
Herbert Xu
37fedd3aab [IPSEC]: Use IPv6 calling convention as the convention for x->mode->output
The IPv6 calling convention for x->mode->output is more general and could
help an eventual protocol-generic x->type->output implementation.  This
patch adopts it for IPv4 as well and modifies the IPv4 type output functions
accordingly.

It also rewrites the IPv6 mac/transport header calculation to be based off
the network header where practical.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:55:54 -07:00
Herbert Xu
7b277b1a5f [IPSEC]: Set skb->data to payload in x->mode->output
This patch changes the calling convention so that on entry from
x->mode->output and before entry into x->type->output skb->data
will point to the payload instead of the IP header.

This is essentially a redistribution of skb_push/skb_pull calls
with the aim of minimising them on the common path of tunnel +
ESP.

It'll also let us use the same calling convention between IPv4
and IPv6 with the next patch.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:55:54 -07:00
Herbert Xu
bee0b40c06 [IPSEC] beet: Fix extension header support on output
The beet output function completely kills any extension headers by replacing
them with the IPv6 header.  This is because it essentially ignores the
result of ip6_find_1stfragopt by simply acting as if there aren't any
extension headers.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:55:53 -07:00
Herbert Xu
8bd1707504 [IPSEC] esp: Remove NAT-T checksum invalidation for BEET
I pointed this out back when this patch was first proposed but it looks like
it got lost along the way.

The checksum only needs to be ignored for NAT-T in transport mode where
we lose the original inner addresses due to NAT.  With BEET the inner
addresses will be intact so the checksum remains valid.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:55:53 -07:00
Mitsuru Chinen
f24e3d658c [IPV6]: Defer IPv6 device initialization until a valid qdisc is specified
To judge the timing for DAD, netif_carrier_ok() is used. However,
there is a possibility that dev->qdisc stays noop_qdisc even if
netif_carrier_ok() returns true. In that case, DAD NS is not sent out.
We need to defer the IPv6 device initialization until a valid qdisc
is specified.

Signed-off-by: Mitsuru Chinen <mitch@linux.vnet.ibm.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:55:52 -07:00
Pavel Emelyanov
9b77265235 [NET]: Remove double dev->flags checking when calling dev_close()
The unregister_netdevice() and dev_change_net_namespace()
both check for dev->flags to be IFF_UP before calling the
dev_close(), but the dev_close() checks for IFF_UP itself,
so remove those unneeded checks.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:55:52 -07:00
Ilpo Järvinen
1c1e87edb9 [TCP]: Separate lost_retrans loop into own function
Follows own function for each task principle, this is really
somewhat separate task being done in sacktag. Also reduces
indentation.

In addition, added ack_seq local var to break some long
lines & fixed coding style things.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:55:51 -07:00
Pavel Emelyanov
ec93103519 [SUNRPC]: Make the sunrpc use the seq_open_private()
Just switch to the consolidated code.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:55:36 -07:00
Pavel Emelyanov
a662d4cb50 [IRDA]: Make the IRDA use the seq_open_private()
Just switch to the consolidated code

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:55:35 -07:00
Pavel Emelyanov
31164088d7 [DECNET]: Make decnet code use the seq_open_private()
Just switch to the consolidated code.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:55:34 -07:00
Pavel Emelyanov
e2da591338 [NETFILTER]: Make netfilter code use the seq_open_private
Just switch to the consolidated calls.

ipt_recent() has to initialize the private, so use
the __seq_open_private() helper.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:55:34 -07:00
Pavel Emelyanov
cf7732e4cc [NET]: Make core networking code use seq_open_private
This concerns the ipv4 and ipv6 code mostly, but also the netlink
and unix sockets.

The netlink code is an example of how to use the __seq_open_private()
call - it saves the net namespace on this private.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:55:33 -07:00
Mattias Nissler
e2f036da2f [PATCH] mac80211: Defer setting of RX_FLAG_DECRYPTED.
The decryption handlers will skip the frame if the RX_FLAG_DECRYPTED
flag is set, so the early flag setting introduced by Johannes breaks
decryption. To work around this, call the handlers first and then set
the flag.

Signed-off-by: Mattias Nissler <mattias.nissler@gmx.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2007-10-10 16:55:23 -07:00
John W. Linville
0654ff055c [PATCH] ieee80211_if_set_type: make check for master dev more explicit
Problem description by Daniel Drake <dsd@gentoo.org>:

"This sequence of events causes loss of connectivity:

<plug in>
<associate as normal in managed mode>
ifconfig eth7 down
iwconfig eth7 mode monitor
ifconfig eth7 up
ifconfig eth7 down
iwconfig eth7 mode managed
<associate as normal>

At this point you are associated but TX does not work. This is because
the eth7 hard_start_xmit is still ieee80211_monitor_start_xmit."

The problem is caused by ieee80211_if_set_type checking for a non-zero
hard_start_xmit pointer value in order to avoid changing that value for
master devices.  The fix is to make that check more explicitly linked to
master devices rather than simply checking if the value has been
previously set.

CC: Daniel Drake <dsd@gentoo.org>
Acked-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2007-10-10 16:55:23 -07:00
Herbert Xu
b7c6538cd8 [IPSEC]: Move state lock into x->type->output
This patch releases the lock on the state before calling x->type->output.
It also adds the lock to the spots where they're currently needed.

Most of those places (all except mip6) are expected to disappear with
async crypto.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:55:03 -07:00
Herbert Xu
050f009e16 [IPSEC]: Lock state when copying non-atomic fields to user-space
This patch adds locking so that when we're copying non-atomic fields such as
life-time or coaddr to user-space we don't get a partial result.

For af_key I've changed every instance of pfkey_xfrm_state2msg apart from
expiration notification to include the keys and life-times.  This is in-line
with XFRM behaviour.

The actual cases affected are:

* pfkey_getspi: No change as we don't have any keys to copy.
* key_notify_sa:
	+ ADD/UPD: This wouldn't work otherwise.
	+ DEL: It can't hurt.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:55:02 -07:00
Herbert Xu
68325d3b12 [XFRM] user: Move attribute copying code into copy_to_user_state_extra
Here's a good example of code duplication leading to code rot.  The
notification patch did its own netlink message creation for xfrm states.
It duplicated code that was already in dump_one_state.  Guess what, the
next time (and the time after) when someone updated dump_one_state the
notification path got zilch.

This patch moves that code from dump_one_state to copy_to_user_state_extra
and uses it in xfrm_notify_sa too.  Unfortunately whoever updates this
still needs to update xfrm_sa_len since the notification path wants to
know the exact size for allocation.

At least I've added a comment saying so and if someone still forgest, we'll
have a WARN_ON telling us so.

I also changed the security size calculation to use xfrm_user_sec_ctx since
that's what we actually put into the skb.  However it makes no practical
difference since it has the same size as xfrm_sec_ctx.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:55:02 -07:00
Herbert Xu
658b219e93 [IPSEC]: Move common code into xfrm_alloc_spi
This patch moves some common code that conceptually belongs to the xfrm core
from af_key/xfrm_user into xfrm_alloc_spi.

In particular, the spin lock on the state is now taken inside xfrm_alloc_spi.
Previously it also protected the construction of the response PF_KEY/XFRM
messages to user-space.  This is inconsistent as other identical constructions
are not protected by the state lock.  This is bad because they in fact should
be protected but only in certain spots (so as not to hold the lock for too
long which may cause packet drops).

The SPI byte order conversion has also been moved.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:55:01 -07:00
Herbert Xu
75ba28c633 [IPSEC]: Remove gratuitous km wake-up events on ACQUIRE
There is no point in waking people up when creating/updating larval states
because they'll just go back to sleep again as larval states by definition
cannot be found by xfrm_state_find.

We should only wake them up when the larvals mature or die.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:55:01 -07:00
Herbert Xu
007f0211a8 [IPSEC]: Store IPv6 nh pointer in mac_header on output
Current the x->mode->output functions store the IPv6 nh pointer in the
skb network header.  This is inconvenient because the network header then
has to be fixed up before the packet can leave the IPsec stack.  The mac
header field is unused on output so we can use that to store this instead.

This patch does that and removes the network header fix-up in xfrm_output.

It also uses ipv6_hdr where appropriate in the x->type->output functions.

There is also a minor clean-up in esp4 to make it use the same code as
esp6 to help any subsequent effort to merge the two.

Lastly it kills two redundant skb_set_* statements in BEET that were
simply copied over from transport mode.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:55:00 -07:00
Herbert Xu
1ecafede83 [IPSEC]: Remove bogus ref count in xfrm_secpath_reject
Constructs of the form

	xfrm_state_hold(x);
	foo(x);
	xfrm_state_put(x);

tend to be broken because foo is either synchronous where this is totally
unnecessary or if foo is asynchronous then the reference count is in the
wrong spot.

In the case of xfrm_secpath_reject, the function is synchronous and therefore
we should just kill the reference count.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:54:59 -07:00
Pavel Emelyanov
32f0c4cbe4 [NETNS]: Don't memset() netns to zero manually
The newly created net namespace is set to 0 with memset()
in setup_net(). The setup_net() is also called for the
init_net_ns(), which is zeroed naturally as a global var.

So remove this memset and allocate new nets with the
kmem_cache_zalloc().

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:54:59 -07:00
Benjamin Thery
0a8891a0a4 [IPv6]: use container_of() macro in fib6_clean_node()
In ip6_fib.c, fib6_clean_node() casts a fib6_walker_t pointer to
a fib6_cleaner_t pointer assuming a struct fib6_walker_t (field 'w')
is the first field in struct fib6_walker_t.

To prevent any future problems that may occur if one day a field
is inadvertently inserted before the 'w' field in struct fib6_cleaner_t,
(and to improve readability), this patch uses the container_of() macro.

Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:54:58 -07:00
Pavel Emelyanov
4665079cbb [NETNS]: Move some code into __init section when CONFIG_NET_NS=n
With the net namespaces many code leaved the __init section,
thus making the kernel occupy more memory than it did before.
Since we have a config option that prohibits the namespace
creation, the functions that initialize/finalize some netns
stuff are simply not needed and can be freed after the boot.

Currently, this is almost not noticeable, since few calls
are no longer in __init, but when the namespaces will be
merged it will be possible to free more code. I propose to
use the __net_init, __net_exit and __net_initdata "attributes"
for functions/variables that are not used if the CONFIG_NET_NS
is not set to save more space in memory.

The exiting functions cannot just reside in the __exit section,
as noticed by David, since the init section will have
references on it and the compilation will fail due to modpost
checks. These references can exist, since the init namespace
never dies and the exit callbacks are never called. So I
introduce the __exit_refok attribute just like it is already
done with the __init_refok.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:54:58 -07:00
Ursula Braun
3607c44676 [8021Q]: transfer dev_id from real device
A net_device struct provides field dev_id. It is used for
unique ipv6 generation in case of shared network cards
(as for the OSA network cards of IBM System z).
If VLAN devices are built on top of such shared network cards,
this dev_id information needs to be transferred to the VLAN device.

Signed-off-by: Ursula Braun <braunu@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:54:56 -07:00
Herbert Xu
45b17f48ea [IPSEC]: Move RO-specific output code into xfrm6_mode_ro.c
The lastused update check in xfrm_output can be done just as well in
the mode output function which is specific to RO.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:54:56 -07:00
Herbert Xu
cdf7e668d4 [IPSEC]: Unexport xfrm_replay_notify
Now that the only callers of xfrm_replay_notify are in xfrm, we can remove
the export.

This patch also removes xfrm_aevent_doreplay since it's now called in just
one spot.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:54:55 -07:00
Herbert Xu
436a0a4022 [IPSEC]: Move output replay code into xfrm_output
The replay counter is one of only two remaining things in the output code
that requires a lock on the xfrm state (the other being the crypto).  This
patch moves it into the generic xfrm_output so we can remove the lock from
the transforms themselves.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:54:54 -07:00
Herbert Xu
83815dea47 [IPSEC]: Move xfrm_state_check into xfrm_output.c
The functions xfrm_state_check and xfrm_state_check_space are only used by
the output code in xfrm_output.c so we can move them over.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:54:54 -07:00
Herbert Xu
406ef77c89 [IPSEC]: Move common output code to xfrm_output
Most of the code in xfrm4_output_one and xfrm6_output_one are identical so
this patch moves them into a common xfrm_output function which will live
in net/xfrm.

In fact this would seem to fix a bug as on IPv4 we never reset the network
header after a transform which may upset netfilter later on.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:54:53 -07:00
Herbert Xu
bc31d3b2c7 [IPSEC] ah: Remove keys from ah_data structure
The keys are only used during initialisation so we don't need to carry them
in esp_data.  Since we don't have to allocate them again, there is no need
to place a limit on the authentication key length anymore.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:54:53 -07:00
Herbert Xu
4b7137ff8f [IPSEC] esp: Remove keys from esp_data structure
The keys are only used during initialisation so we don't need to carry them
in esp_data.  Since we don't have to allocate them again, there is no need
to place a limit on the authentication key length anymore.

This patch also kills the unused auth.icv member.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:54:52 -07:00
Ursula Braun
f0703c80e5 [AF_IUCV]: postpone receival of iucv-packets
AF_IUCV socket programs may waste Linux storage, because af_iucv
allocates an skb whenever posted by the receive callback routine and
receives the message immediately.
Message receival is now postponed if data from previous callbacks has
not yet been transferred to the receiving socket program. Instead a
message handle is saved in a message queue as a reminder. Once
messages could be given to the receiving socket program, there is
an additional checking for entries in the message queue, followed
by skb allocation and message receival if applicable.

Signed-off-by: Ursula Braun <braunu@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:54:51 -07:00
Heiko Carstens
57f2044803 [AF_IUCV]: remove static declarations from header file.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Ursula Braun <braunu@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:54:51 -07:00
Jeff Garzik
14e3e07979 [NET]: split dev_ifsioc() according to locking
This always bugged me: dev_ioctl() called dev_ifsioc() either inside
read_lock(dev_base_lock) or rtnl_lock(), depending on the ioctl being
executed.

This change moves the ioctls executed inside dev_base_lock to a new
function, dev_ifsioc_locked().  Now the locking context is completely
clear to the reader.

Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:54:49 -07:00
Stephen Hemminger
cfcabdcc2d [NET]: sparse warning fixes
Fix a bunch of sparse warnings. Mostly about 0 used as
NULL pointer, and shadowed variable declarations.
One notable case was that hash size should have been unsigned.

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:54:48 -07:00
Ilpo Järvinen
de83c058af [TCP]: "Annotate" another fackets_out state reset
This should no longer be necessary because fackets_out is
accurate. It indicates bugs elsewhere, thus report it.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:54:48 -07:00
Ilpo Järvinen
29d0a309d1 [TCP]: Fix two off-by-one errors in fackets_out adjusting logic
1) Passing wrong skb to tcp_adjust_fackets_out could corrupt
fastpath_cnt_hint as tcp_skb_pcount(next_skb) is not included
to it if hint points exactly to the next_skb (it's lagging
behind, see sacktag).

2) When fastpath_skb_hint is put backwards to avoid dangling
skb reference, the skb's pcount must also be removed from count
(not included like above).

Reported by Cedric Le Goater <legoater@free.fr>

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:54:47 -07:00
Gerrit Renker
4a5409a5a8 [DCCP]: Twice the wrong reset code in receiving connection-Requests
This fixes two bugs in processing of connection-Requests in
v{4,6}_conn_request:

 1. Due to using the variable `reset_code', the Reset code generated
    internally by dccp_parse_options() is overwritten with the
    initialised value ("Too Busy") of reset_code, which is not what is
    intended.

 2. When receiving a connection-Request on a multicast or broadcast
    address, no Reset should be generated, to avoid storms of such
    packets. Instead of jumping to the `drop' label, the
    v{4,6}_conn_request functions now return 0. Below is why in my
    understanding this is correct:

    When the conn_request function returns < 0, then the caller,
    dccp_rcv_state_process(), returns 1. In all instances where
    dccp_rcv_state_process is called (dccp_v4_do_rcv, dccp_v6_do_rcv,
    and dccp_child_process), a return value of != 0 from
    dccp_rcv_state_process() means that a Reset is generated.

    If on the other hand the conn_request function returns 0, the
    packet is discarded and no Reset is generated.

Note: There may be a related problem when sending the Response, due to
the following.

	if (dccp_v6_send_response(sk, req, NULL))
		goto drop_and_free;
	/* ... */
	drop_and_free:
		return -1;

In this case, if send_response fails due to transmission errors, the
next thing that is generated is a Reset with a code "Too Busy". I
haven't been able to conjure up such a condition, but it might be good
to change the behaviour here also (not done by this patch).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:54:38 -07:00
Gerrit Renker
dcad856fe8 [DCCP]: Wrong format in printk
The elapsed time uses u32, but printk was using %d, not %u.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2007-10-10 16:54:36 -07:00
Gerrit Renker
451bc0473f [DCCP]: Tidy-up -- minisock initialisation
This

 * removes a declaration of a non-existent function
   __dccp_minisock_init;

 * shifts the initialisation function dccp_minisock_init() from
   options.c to minisocks.c, where it is more naturally expected to
   be.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:54:36 -07:00
Gerrit Renker
5e28599a6e [CCID2]: Sequence number wraparound issues
This replaces several uses of standard arithmetic with the DCCP
sequence number arithmetic functions. The problem here is that the
sequence number wrap-around was not taken into consideration.

 * Condition "seqp->ccid2s_seq <= prev->ccid2s_seq" has been replaced
   by

   	dccp_delta_seqno(seqp->ccid2s_seq, prev->ccid2s_seq) >= 0

   since if seqp is `before' prev, then the delta_seqno() is positive.

 * The test whether sequence numbers `a' and `b' are consecutive has
   the form

   	dccp_delta_seqno(a, b) == 1

 * Increment of ccid2hctx_rpseq could be done using dccp_inc_seqno(),
   but since here the incremented ccid2hctx_rpseq == seqno, used
   assignment instead.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:54:35 -07:00
Gerrit Renker
6c58324808 [CCID2]: Remove redundant case block
skb's passed to ccid2_hc_tx_send_packet() are headerless, the packet
type is decided later, in dccp_write_xmit(). Therefore the first test
of the switch/case block is always true, the others are never reached.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:54:35 -07:00
Gerrit Renker
ee196c2186 [CCID2]: Remove redundant BUG_ON
This removes a test for `val < 1' which would only have been triggered
when val < 0, due to a preceding test for 0.  Fixed by using an
unsigned type for cwnd (as in TCP) instead.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:54:34 -07:00
Gerrit Renker
7d9e8931f9 [CCID2]: Remove ugly BUG_ON
This removes an ugly BUG_ON which has been pointed out by Arnaldo.

Instead of freezing up the machine, a `critical' message is now issued
to the system log.

There is potential of doing this more gracefully (eg. there are a few
internal variables which could be updated despite the lack of memory),
but that requires more complicated changes to the algorithm; thus a
`FIXME' has been added.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:54:34 -07:00
Gerrit Renker
cd1f7d347c [CCID2]: Simplify interface
This patch simplifies the interface of ccid2_hc_tx_alloc_seq():

   * ccid2_hc_tx_alloc_seq() is always called with an argument of
     CCID2_SEQBUF_LEN;

   * other code - ccid2_hc_tx_check_sanity() - even depends on the
     assumption that ccid2_hc_tx_alloc_seq() has been called with this
     particular size;

   * passing the `gfp_t' argument to ccid2_hc_tx_alloc_seq() is
     redundant with gfp_any().

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:54:33 -07:00
Gerrit Renker
042d18f9f3 [DCCP]: Make all `debug' parameters bool
This just sets the parameter to bool, since debugging messages are
either on or off.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:54:32 -07:00
Gerrit Renker
7c559a9e44 [DCCP]: Add socket option to query the current MPS
This enables applications to query the current value of the Maximum
Packet Size via a socket option, suggested as a SHOULD in (RFC 4340,
p. 102).

This socket option is useful to avoid the annoying bail-out via
`-EMSGSIZE'.  In particular, as fragmentation is not currently
supported (and its use is partly discouraged in RFC 4340).

With this option, it is possible to size buffers accordingly, e.g.

	int buflen = dccp_get_cur_mps(sockfd);

	/* or */
	if (msgsize > dccp_get_cur_mps(sockfd))
		die("message is too large for this path");

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:54:31 -07:00
Gerrit Renker
bc8498721d [DCCP]: Wait for CCID
This performs a minor optimisation: when ccid_hc_tx_send_packet
returns a value greater zero, then the same call previously was done
again at the begin of the while loop in dccp_wait_for_ccid.

This patch exploits the available information and schedule-timeouts
directly instead.

Documentation also added.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:54:31 -07:00
Tomas Winkler
478f8d2ba5 [MAC80211]: add sta_notify callback
This patch adds sta_notify callback and removes sta_table_notification
which was not used by any driver.
sta_notify() is essential for drivers that keeps notion of station
internally and need to be notified about removal or addition of a station
to the (I)BSS or assocation to an AP.

This version adds interface id to the parameter list
as suggested by Johannes Berg

Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:54:21 -07:00
Johannes Berg
42613db760 [MAC80211]: implement cfg80211's change_interface hook
This implements the cfg80211 change_interface hook that changes the
type of an interface and cleans up the code a bit.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:54:20 -07:00
Michael Buesch
47f0c50220 [MAC80211]: Add association LED trigger
Many devices have LEDs to indicate the link status.
Export this functionality to drivers.

Signed-off-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:54:20 -07:00
Johannes Berg
ddd3d2be85 [MAC80211]: make userspace-mlme a per-interface setting
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:54:19 -07:00
Johannes Berg
58d4185e36 [MAC80211]: improve radiotap injection
This improves radiotap injection by removing the shortcut over TX handlers
that led to BUGS when injecting frames without setting a rate and also
resulted in various other quirks. Now, TX handlers are run but some
information that was present in the radiotap header is used instead of
automatic settings.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Cc: Andy Green <andy@warmcat.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2007-10-10 16:54:18 -07:00
Johannes Berg
628a140ba0 [MAC80211]: remove ALG_NONE
This "algorithm" is used only internally and is not useful.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Cc: Michael Buesch <mb@bu3sch.de>
Acked-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:54:18 -07:00
Johannes Berg
640845a563 [MAC80211]: use RX_FLAG_DECRYPTED for sw decrypted as well
This makes mac80211 set the RX_FLAG_DECRYPTED flag for frames
decrypted in software allowing us to handle some things more
uniformly.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:54:17 -07:00
Johannes Berg
1990af8d14 [MAC80211]: consolidate decryption more
Currently, we have three RX handlers doing the decryption.
This patch changes it to have only one handler doing
everything, thereby getting rid of many duplicate checks.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
--
 net/mac80211/rx.c |   46 ++++++++++++----------------------------------
 1 files changed, 12 insertions(+), 34 deletions(-)
2007-10-10 16:54:16 -07:00
Johannes Berg
70f0876579 [MAC80211]: move sta_process rx handler later
This moves the sta_process RX handler to after decryption
so that frames that cannot be decrypted don't influence
statistics, it is likely that they were injected or something
else is totally wrong.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:54:16 -07:00
Johannes Berg
f9d540ee5f [MAC80211]: remove management interface
Removes the management interface since it is only required
for hostapd/userspace MLME, will not be in the final tree
at least in this form and hostapd/userspace MLME currently
do not work against this tree anyway.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:54:15 -07:00
Johannes Berg
a289755250 [MAC80211]: add "invalid" interface type
Since I cannot convince the lazy driver authors (hello Michael)
to stop (ab)using the MGMT interface type internally in their
drivers, this patch introduces a new _INVALID type especially
for their use and changes all affected drivers to use it.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:54:15 -07:00
Michael Buesch
f7c4daed99 [MAC80211]: Check open_count before calling config callback.
Also remove the check for ops->config!=NULL, as it can never be NULL.

Signed-off-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:54:14 -07:00
Michael Buesch
20405c0841 [RFKILL]: Add support for hardware-only rfkill buttons
Buttons that work directly on hardware cannot support
the "user_claim" functionality. Add a flag to signal
this and return -EOPNOTSUPP in this case.
b43 is such a device.

Signed-off-by: Michael Buesch <mb@bu3sch.de>
Acked-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:54:11 -07:00
Michael Buesch
135900c182 [RFKILL]: Add support for an rfkill LED.
This adds a LED trigger.

Signed-off-by: Michael Buesch <mb@bu3sch.de>
Acked-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:54:10 -07:00
Ilpo Järvinen
3de96471bd [TCP]: Wrap-safed reordering detection FRTO check
In case somebody has a suggestion about a better place for this
check, which must guarantee execution "early enough" (i.e,
before the wrap can occur), I'm very open to them.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:54:00 -07:00
Ilpo Järvinen
0e835331e3 [TCP]: Update comment of SACK block validator
Just came across what RFC2018 states about generation of valid
SACK blocks in case of reneging. Alter comment a bit to point
out clearly.

IMHO, there isn't any reason to change code because the
validation is there for a purpose (counters will inform user
about decision TCP made if this case ever surfaces).

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:53:59 -07:00
Ilpo Järvinen
95eacd27e2 [TCP]: fix comments that got messed up during code move
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:53:59 -07:00
Ilpo Järvinen
dc86967b54 [TCP]: No fackets_out/highest_sack tuning when SACK isn't enabled
This was found due to bug report from Cedric Le Goater though
it turned this turned out to be unrelated bug.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:53:58 -07:00
Joseph Fannin
5871174149 [NETFILTER]: bridge: remove broken netfilter binary sysctls
The netfilter sysctls in the bridging code don't set strategy routines:

 sysctl table check failed: /net/bridge/bridge-nf-call-arptables .3.10.1 Missing strategy
 sysctl table check failed: /net/bridge/bridge-nf-call-iptables .3.10.2 Missing strategy
 sysctl table check failed: /net/bridge/bridge-nf-call-ip6tables .3.10.3 Missing strategy
 sysctl table check failed: /net/bridge/bridge-nf-filter-vlan-tagged .3.10.4 Missing strategy
 sysctl table check failed: /net/bridge/bridge-nf-filter-pppoe-tagged .3.10.5 Missing strategy

    These binary sysctls can't work. The binary sysctl numbers of
other netfilter sysctls with this problem are being removed.  These
need to go as well.

Signed-off-by: Joseph Fannin <jfannin@gmail.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:53:41 -07:00
Jan Engelhardt
ee4411a1b1 [NETFILTER]: x_tables: add xt_time match
This is ipt_time from POM-ng enhanced by the following:

 * xtables/ipv6 support
 * second granularity for daytime
 * day-of-month support (for example "match on the 15th of each month")
 * match against UTC or local timezone

Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:53:40 -07:00
Michal Miroslaw
6b6ec99a03 [NETFILTER]: nfnetlink_log: fix some constants
Fix timeout (one second is 1 * HZ) and convert max packet copy length
to #defined constant.

Signed-off-by: Michal Miroslaw <mirq-linux@rere.qmqm.pl>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:53:40 -07:00
Michal Miroslaw
aace57e054 [NETFILTER]: nfnetlink_log: fix instance_create() failure path
Fix memory leak on instance_create() while module is being unloaded.

Signed-off-by: Michal Miroslaw <mirq-linux@rere.qmqm.pl>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:53:39 -07:00
Michal Miroslaw
c6a8f64836 [NETFILTER]: nfnetlink_log: fix style
Fix function definition style to match other functions in nfnetlink_log.c.

Signed-off-by: Michal Miroslaw <mirq-linux@rere.qmqm.pl>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:53:39 -07:00
Michal Miroslaw
d63b043d95 [NETFILTER]: nfnetlink_log: flush queue early
If queue is filled to its threshold, then flush it right away instead
of waiting for timer or next packet.

Signed-off-by: Michal Miroslaw <mirq-linux@rere.qmqm.pl>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:53:38 -07:00
Michal Miroslaw
e35670614d [NETFILTER]: nfnetlink_log: kill duplicate code
Kill some cut'n'paste effect.
Just after __nfulnl_send() returning, inst->skb is always NULL.

Signed-off-by: Michal Miroslaw <mirq-linux@rere.qmqm.pl>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:53:38 -07:00
Pablo Neira Ayuso
5faa1f4cb5 [NETFILTER]: nf_conntrack_netlink: add support to related connections
This patch adds support to relate a connection to an existing master
connection. This patch is used by conntrackd to correctly replicate
related connections.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:53:37 -07:00
Patrick McHardy
3583240249 [NETFILTER]: nf_conntrack_expect: kill unique ID
Similar to the conntrack ID, the per-expectation ID is not needed
anymore, kill it.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:53:36 -07:00
Patrick McHardy
7f85f91472 [NETFILTER]: nf_conntrack: kill unique ID
Remove the per-conntrack ID, its not necessary anymore for dumping.
For compatiblity reasons we send the address of the conntrack to
userspace as ID.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:53:36 -07:00
Patrick McHardy
f73e924cdd [NETFILTER]: ctnetlink: use netlink policy
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:53:35 -07:00
Patrick McHardy
5bf7585393 [NETFILTER]: nfnetlink_queue: use netlink policy
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:53:34 -07:00
Patrick McHardy
fd8281adac [NETFILTER]: nfnetlink_log: use netlink policy
Also remove unused nfula_min array.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:53:34 -07:00
Patrick McHardy
e373057828 [NETFILTER]: nfnetlink: support attribute policies
Add support for automatic checking of per-callback attribute policies.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:53:33 -07:00
Patrick McHardy
dd82185f2c [NETFILTER]: nfnetlink: use nlmsg_notify()
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:53:32 -07:00
Patrick McHardy
fdf708322d [NETFILTER]: nfnetlink: rename functions containing 'nfattr'
There is no struct nfattr anymore, rename functions to 'nlattr'.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:53:32 -07:00
Patrick McHardy
df6fb868d6 [NETFILTER]: nfnetlink: convert to generic netlink attribute functions
Get rid of the duplicated rtnetlink macros and use the generic netlink
attribute functions. The old duplicated stuff is moved to a new header
file that exists just for userspace.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:53:31 -07:00
Patrick McHardy
7c8d4cb419 [NETFILTER]: nfnetlink: make subsystem and callbacks const
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:53:30 -07:00
Ivo van Doorn
fe242cfd33 [RFKILL]: Move rfkill_switch_all out of global header
rfkill_switch_all shouldn't be called by drivers directly,
instead they should send a signal over the input device.

To prevent confusion for driver developers, move the
function into a rfkill private header.

Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:53:29 -07:00
Michael Buesch
30ccb08847 [PATCH] mac80211: bss_tim_clear must use ~ instead of !
We need to use bitwise NOT.
This also cleans up the code a little bit to make it more readable.

Signed-off-by: Michael Buesch <mb@bu3sch.de>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2007-10-10 16:53:18 -07:00
Johannes Berg
b4010e0890 [PATCH] mac80211: remove generic IE for AP interfaces
This is not useful since we do not support probe response
offload to hardware at this time and beacons are set in
another way.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2007-10-10 16:53:17 -07:00
Johannes Berg
51617f0b76 [PATCH] mac80211: remove all prism2 ioctls
This patch removes all prism2 ioctls.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2007-10-10 16:53:17 -07:00
Johannes Berg
53918994b7 [PATCH] mac80211: fix iff_promiscs, iff_allmultis race
When we update the counters iff_promiscs and iff_allmultis
in struct ieee80211_local we have no common lock held to
protect them. The problem is that the update to each counter
may not be atomic, so we could end up with iff_promiscs == -1
in unfortunate conditions. To fix it, use atomic_t values.
It doesn't matter whether the two counters are updated
together atomically or not, if there are two invocations
of set_multicast_list we will end up with multiple
configure_filter() invocations of which the latter will always
be correct.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2007-10-10 16:53:16 -07:00
Johannes Berg
50741ae05a [PATCH] mac80211: fix TKIP IV update
The TKIP IV should be updated only after MMIC verification,
this patch changes it to be at that spot.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2007-10-10 16:53:16 -07:00
Johannes Berg
fb1c1cd6c5 [PATCH] mac80211: fix vlan bug
VLAN interfaces have yet another bug: they aren't accounted
for properly in the receive path in prepare_for_handlers().
I noticed this by code inspection, but it would be easy for
the compiler to catch such things if we'd just use the proper
enum where appropriate.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2007-10-10 16:53:15 -07:00
Johannes Berg
af1a90da39 [PATCH] mac80211: remove ieee80211_wep_get_keyidx
This function is not used any more.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2007-10-10 16:53:14 -07:00
Johannes Berg
6a22a59d48 [PATCH] mac80211: consolidate encryption
Currently we run through all crypto handlers for each transmitted
frame although we already know which one will be used. This
changes the code to invoke only the needed handler. It also moves
the wep code into wep.c.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2007-10-10 16:53:14 -07:00
Johannes Berg
4f0d18e26f [PATCH] mac80211: consolidate decryption
Currently, we run through all three crypto algorithms for each
received frame even though we have previously determined which
key we have and as such already know which algorithm will be
used. Change it to invoke only the needed function. Also move
the WEP decrypt handler to wep.c so that fewer functions need
to be non-static.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2007-10-10 16:53:13 -07:00
Johannes Berg
b2e7771e55 [PATCH] mac80211: pass frames to monitor interfaces early
This makes mac80211 pass all frames to monitor interfaces early
before all receive processing with the benefit that only a single
copy needs to be made, all monitors can receive clones of the skb
and if the frame will be discarded we don't even need to make a
single copy.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2007-10-10 16:53:12 -07:00
Johannes Berg
5b2812e925 [PATCH] mac80211: fix interface initialisation and deinitialisation
When an interface is registered it is still uninitialised so
ieee80211_if_reinit() can't be called on it (it will oops.)
Hence, we need to move the uninit method assignment.

Also, this patch fixes the bug that the master device is never
initialised nor deinitialised at all. Oddly, the deinit code
had an if statement to not run some code when running for the
master interface (which never happened), but that if statement
is also wrong. Fix that too.

Now that the uninit code is run for the master device, another
bug surfaced: it tries to remove all dependent interfaces and
that oopses or BUGs at some point, either because it unregisters
already unregistered interfaces (missing list_del bug) or due
to trying to iterate a list that has had other things removed.
Fix this too by handling the master interface specially.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2007-10-10 16:53:11 -07:00
Herbert Xu
b421995235 [PKT_SCHED]: Add stateless NAT
Stateless NAT is useful in controlled environments where restrictions are
placed on through traffic such that we don't need connection tracking to
correctly NAT protocol-specific data.

In particular, this is of interest when the number of flows or the number
of addresses being NATed is large, or if connection tracking information
has to be replicated and where it is not practical to do so.

Previously we had stateless NAT functionality which was integrated into
the IPv4 routing subsystem.  This was a great solution as long as the NAT
worked on a subnet to subnet basis such that the number of NAT rules was
relatively small.  The reason is that for SNAT the routing based system
had to perform a linear scan through the rules.

If the number of rules is large then major renovations would have take
place in the routing subsystem to make this practical.

For the time being, the least intrusive way of achieving this is to use
the u32 classifier written by Alexey Kuznetsov along with the actions
infrastructure implemented by Jamal Hadi Salim.

The following patch is an attempt at this problem by creating a new nat
action that can be invoked from u32 hash tables which would allow large
number of stateless NAT rules that can be used/updated in constant time.

The actual NAT code is mostly based on the previous stateless NAT code
written by Alexey.  In future we might be able to utilise the protocol
NAT code from netfilter to improve support for other protocols.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:53:11 -07:00
Johannes Berg
79010420cc [PATCH] mac80211: fix virtual interface locking
Florian Lohoff noticed a bug in mac80211: when bringing the
master interface down while other virtual interfaces are up
we call dev_close() under a spinlock which is not allowed.
This patch removes the sub_if_lock used by mac80211 in favour
of using an RCU list. All list manipulations are already done
under rtnl so are well protected against each other, and the
read-side locks we took in the RX and TX code are already in
RCU read-side critical sections.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Cc: Florian Lohoff <flo@rfc822.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Michal Piotrowski <michal.k.k.piotrowski@gmail.com>
Cc: Satyam Sharma <satyam@infradead.org>
Signed-off-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2007-10-10 16:53:00 -07:00
Johannes Berg
ea49c359f3 [PATCH] mac80211: remove crypto algorithm typedef
The typedef is not required, we can just use "enum ieee80211_key_alg"
instead of "ieee80211_key_alg"

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2007-10-10 16:53:00 -07:00
Johannes Berg
0ec3ca4459 [PATCH] mac80211: validate VLAN interfaces better
This patch changes mac80211 to verify that VLAN interfaces
are valid and not bother drivers about them any more.
VLAN interfaces are now only valid when an AP interface
is up with the same MAC address, and are automatically
turned off when the AP interface is set down.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Cc: Jouni Malinen <j@w1.fi>
Signed-off-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2007-10-10 16:52:57 -07:00
Johannes Berg
4150c57212 [PATCH] mac80211: revamp interface and filter configuration
Drivers are currently supposed to keep track of monitor
interfaces if they allow so-called "hard" monitor, and
they are also supposed to keep track of multicast etc.

This patch changes that, replaces the set_multicast_list()
callback with a new configure_filter() callback that takes
filter flags (FIF_*) instead of interface flags (IFF_*).
For a driver, this means it should open the filter as much
as necessary to get all frames requested by the filter flags.
Accordingly, the filter flags are named "positively", e.g.
FIF_ALLMULTI.

Multicast filtering is a bit special in that drivers that
have no multicast address filters need to allow multicast
frames through when either the FIF_ALLMULTI flag is set or
when the mc_count value is positive.

At the same time, drivers are no longer notified about
monitor interfaces at all, this means they now need to
implement the start() and stop() callbacks and the new
change_filter_flags() callback. Also, the start()/stop()
ordering changed, start() is now called *before* any
add_interface() as it really should be, and stop() after
any remove_interface().

The patch also changes the behaviour of setting the bssid
to multicast for scanning when IEEE80211_HW_NO_PROBE_FILTERING
is set; the IEEE80211_HW_NO_PROBE_FILTERING flag is removed
and the filter flag FIF_BCN_PRBRESP_PROMISC introduced.
This is a lot more efficient for hardware like b43 that
supports it and other hardware can still set the BSSID
to all-ones.

Driver modifications by Johannes Berg (b43 & iwlwifi), Michael Wu
(rtl8187, adm8211, and p54), Larry Finger (b43legacy), and
Ivo van Doorn (rt2x00).

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2007-10-10 16:52:57 -07:00
Eric W. Biederman
f4618d39a3 [NETNS]: Simplify the network namespace list locking rules.
Denis V. Lunev <den@sw.ru> noticed that the locking rules
for the network namespace list are over complicated and broken.

In particular the current register_netdev_notifier currently
does not take any lock making the for_each_net iteration racy
with network namespace creation and destruction. Oops.

The fact that we need to use for_each_net in rtnl_unlock() when
the rtnetlink support becomes per network namespace makes designing
the proper locking tricky.  In addition we need to be able to call
rtnl_lock() and rtnl_unlock() when we have the net_mutex held.

After thinking about it and looking at the alternatives carefully
it looks like the simplest and most maintainable solution is
to remove net_list_mutex altogether, and to use the rtnl_mutex instead.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:55 -07:00
Andrew Morton
58c14a8fe6 [ATM] net/atm/lec.c: printk warning fix
net/atm/lec.c: In function 'lec_start_xmit':
net/atm/lec.c:371: warning: format '%x' expects type 'unsigned int', but argument 4 has type 'long unsigned int'

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:54 -07:00
Andrew Morton
0aa4f3331b [WIRELESS]: Fix Kconfig.
Seems that a bare "depends" is no longer allowed in Sam's kbuild tree.

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:52 -07:00
Stephen Hemminger
3b04ddde02 [NET]: Move hardware header operations out of netdevice.
Since hardware header operations are part of the protocol class
not the device instance, make them into a separate object and
save memory.

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:52 -07:00
Stephen Hemminger
b95cce3576 [NET]: Wrap hard_header_parse
Wrap the hard_header_parse function to simplify next step of
header_ops conversion.

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:51 -07:00
Stephen Hemminger
0c4e85813d [NET]: Wrap netdevice hardware header creation.
Add inline for common usage of hardware header creation, and
fix bug in IPV6 mcast where the assumption about negative return is
an errno. Negative return from hard_header means not enough space
was available,(ie -N bytes).

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:50 -07:00
Eric W. Biederman
2774c7aba6 [NET]: Make the loopback device per network namespace.
This patch makes loopback_dev per network namespace.  Adding
code to create a different loopback device for each network
namespace and adding the code to free a loopback device
when a network namespace exits.

This patch modifies all users the loopback_dev so they
access it as init_net.loopback_dev, keeping all of the
code compiling and working.  A later pass will be needed to
update the users to use something other than the initial network
namespace.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:49 -07:00
Eric W. Biederman
0cc217e16c [IPV4]: When possible test for IFF_LOOPBACK and not dev == loopback_dev
Now that multiple loopback devices are becoming possible it makes
the code a little cleaner and more maintainable to test if a deivice
is th a loopback device by testing dev->flags & IFF_LOOPBACK instead
of dev == loopback_dev.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:48 -07:00
Eric W. Biederman
5967789dbc [IPV4]: Remove unnecessary test for the loopback device from inetdev_destroy
Currently we never call unregister_netdev for the loopback device so
it is impossible for us to reach inetdev_destroy with the loopback
device.  So the test in inetdev_destroy is unnecessary.

Further when testing with my network namespace patches removing
unregistering the loopback device and calling inetdev_destroy works
fine so there appears to be no reason for avoiding unregistering the
loopback device.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:48 -07:00
Eric W. Biederman
9dd776b6d7 [NET]: Add network namespace clone & unshare support.
This patch allows you to create a new network namespace
using sys_clone, or sys_unshare.

As the network namespace is still experimental and under development
clone and unshare support is only made available when CONFIG_NET_NS is
selected at compile time.

As this patch introduces network namespace support into code paths
that exist when the CONFIG_NET is not selected there are a few
additions made to net_namespace.h to allow a few more functions
to be used when the networking stack is not compiled in.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:46 -07:00
Eric W. Biederman
8b41d1887d [NET]: Fix running without sysfs
When sysfs support is compiled out the kernel still keeps and maintains
the kobject tree.  So it is not safe to skip our kobject reference counting or
to avoid becoming members of the kobject tree.  It is safe to not add
the networking specific sysfs attributes.

This patch removes the sysfs special cases from net/core/dev.c
renames functions from netdev_sysfs_xxxx to netdev_kobject_xxxx
and always compiles in net-sysfs.c

net-sysfs.c is modified with a CONFIG_SYSFS guard around the parts
that are actually sysfs specific.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:46 -07:00
Arnaldo Carvalho de Melo
bb293e6a24 [CCID3]: Remove ifdef surrounding BUG_ON
As suggested by DaveM.

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
2007-10-10 16:52:45 -07:00
Gerrit Renker
cecd8d0ec4 [DCCP]: Reduce the number of writable states
Since DCCP requires to close both ends of a connection simultaneously,
permission to write in state DCCP_CLOSING is removed in dccp_sendmsg():
 * if the sending end closed, it would encounter a write error anyhow;
 * if the other end has closed the connection, it accepts no more data.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
2007-10-10 16:52:45 -07:00
Gerrit Renker
e356d37a09 [DCCP]: Factor out common code for generating Resets
This factors code common to dccp_v{4,6}_ctl_send_reset into a separate function,
and adds support for filling in the Data 1 ... Data 3 fields from RFC 4340, 5.6.

It is useful to have this separate, since the following Reset codes will always
be generated from the control socket rather than via dccp_send_reset:
 * Code 3, "No Connection", cf. 8.3.1;
 * Code 4, "Packet Error" (identification for Data 1 added);
 * Code 5, "Option Error" (identification for Data 1..3 added, will be used later);
 * Code 6, "Mandatory Error" (same as Option Error);
 * Code 7, "Connection Refused" (what on Earth is the difference to "No Connection"?);
 * Code 8, "Bad Service Code";
 * Code 9, "Too Busy";
 * Code 10, "Bad Init Cookie" (not used).

Code 0 is not recommended by the RFC, the following codes would be used in
dccp_send_reset() instead, since they all relate to an established DCCP connection:
 * Code 1, "Closed";
 * Code 2, "Aborted";
 * Code 11, "Aggression Penalty" (12.3).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
2007-10-10 16:52:44 -07:00
Gerrit Renker
9bf55cda9b [DCCP]: Sequence number wrap-around when sending reset
This replaces normal addition with mod-48 addition so that sequence number
wraparound is respected.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
2007-10-10 16:52:43 -07:00
Gerrit Renker
a94f0f9705 [DCCP]: Rate-limit DCCP-Syncs
This implements a SHOULD from RFC 4340, 7.5.4:
 "To protect against denial-of-service attacks, DCCP implementations SHOULD
  impose a rate limit on DCCP-Syncs sent in response to sequence-invalid packets,
  such as not more than eight DCCP-Syncs per second."

The rate-limit is maintained on a per-socket basis. This is a more stringent
policy than enforcing the rate-limit on a per-source-address basis and
protects against attacks with forged source addresses.

Moreover, the mechanism is deliberately kept simple. In contrast to
xrlim_allow(), bursts of Sync packets in reply to sequence-invalid packets
are not supported.  This foils such attacks where the receipt of a Sync
triggers further sequence-invalid packets. (I have tested this mechanism against
xrlim_allow algorithm for Syncs, permitting bursts just increases the problems.)

In order to keep flexibility, the timeout parameter can be set via sysctl; and
the whole mechanism can even be disabled (which is however not recommended).

The algorithm in this patch has been improved with regard to wrapping issues
thanks to a suggestion by Arnaldo.

Commiter note: Rate limited the step 6 DCCP_WARN too, as it says we're
               sending a sync.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
2007-10-10 16:52:43 -07:00
Gerrit Renker
ee1a15922d [DCCP]: Remove duplicate code for Reset from connected socket
In this patch, duplicated code is removed for the case when a Reset packet is
sent from a connected socket. This code duplication is between dccp_make_reset
and dccp_transmit_skb, which already contained an (up to now entirely unused)
switch statement to fill in the reset code from the DCCP_SKB_CB.

The only thing that has been removed is the call to dst_clone(dst), since
the queue_xmit functions use sk_dst_cache anyway.

I wasn't sure which purpose inet_sk_rebuild_header served, so I left it in.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
2007-10-10 16:52:42 -07:00
Gerrit Renker
0430ee3451 [DCCP]: Add Support for Data 1 .. 3 fields of Reset packets
This adds fields to support the informational Data 1..3 fields of the
DCCP-Reset packets (RFC 4340, 5.6), and makes minor cosmetic changes
to documentation.
Code which fills in these fields follows in subsequent patches, it is
primarily used for reporting option-processing and feature-negotiation
errors.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
2007-10-10 16:52:42 -07:00
Gerrit Renker
727ecc5faa [DCCP]: Add FIXME for send_delayed_ack
This adds a FIXME to signal that the function dccp_send_delayed_ack is nowhere
used in the entire DCCP/CCID code.

Using a delayed Ack timer is suggested in 11.3 of RFC 4340, but it has also
rather subtle implications for the Ack-Ratio-accounting.

CCID2 does not use this (maybe it should).

I think leaving the function in is good, in case someone wants to implement
this.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
2007-10-10 16:52:41 -07:00
Gerrit Renker
2e86908f7d [CCID3]: Move NULL-protection into function
This moves several instances of testing against NULL into the function which is
used to de-reference the CCID-private data.

Committer note: Made the BUG_ON depend on having CONFIG_IP_DCCP_CCID3_DEBUG, as it
                is too much to have this on production code. Also made sure that
                the macro is used only after checking if sk_state is not LISTEN,
                to make it equivalent to what we had before.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
2007-10-10 16:52:41 -07:00
Gerrit Renker
08831700cc [DCCP]: Send Reset upon Sync in state REQUEST
This fixes the code to correspond to RFC 4340, 7.5.4, which states the
exception that a Sync received in state REQUEST generates a Reset (not
a SyncAck).

To achieve this, only a small change is required. Since
dccp_rcv_request_sent_state_process() already uses the correct Reset Code
number 4 ("Packet Error"), we only need to shift the if-statement a few
lines further down.

(To test this case: replace DCCP_PKT_RESPONSE with DCCP_PKT_SYNC
                    in dccp_make_response.)

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
2007-10-10 16:52:40 -07:00
WANG Cong
53465eb4ab [BLUETOOTH]: Make hidp_setup_input() return int
This patch:
- makes hidp_setup_input() return int to indicate errors;
- checks its return value to handle errors.

And this time it is against -rc7-mm1 tree.

Thanks to roel and Marcel Holtmann for comments.

Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:39 -07:00
Ilpo Järvinen
912d8f0b1f [TCP] MIB: Count FRTO's successfully detected spurious RTOs
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:39 -07:00
Ilpo Järvinen
93e6802029 [TCP]: Reordered ACK's (old) SACKs not included to discarded MIB
In case of ACK reordering, the SACK block might be valid in it's
time but is already obsoleted since we've received another kind
of confirmation about arrival of the segments through snd_una
advancement of an earlier packet.

I didn't bother to build distinguishing of valid and invalid
SACK blocks but simply made reordered SACK blocks that are too
old always not counted regardless of their "real" validity which
could be determined by using the ack field of the reordered
packet (won't be significant IMHO).

DSACKs can very well be considered useful even in this situation,
so won't do any of this for them.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:38 -07:00
Ilpo Järvinen
a6963a6b3d [TCP]: Re-place highest_sack check to a more robust position
I previously added checking to position that is rather poor as
state has already been adjusted quite a bit. Re-placing it above
all state changes should be more robust though the return should
never ever get executed regardless of its place :-).

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:38 -07:00
Gerrit Renker
b0d045ca45 [DCCP]: Parameter renaming
The parameter `seq' of dccp_send_sync() is in fact an acknowledgement number
and not a sequence number - thus renamed by this patch into `ackno'.

Secondly, a `critical' warning is added when a Sync/SyncAck could not be sent.

Sanity: I have checked all other functions that are called in dccp_transmit_skb,
        there are no clashes with the use of dccpd_ack_seq; no other function is
        using this slot at the same time.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:37 -07:00
Gerrit Renker
e155d76922 [DCCP]: Fix Reset/Sync-Flood Bug
This updates sequence number checking with regard to RFC 4340, 7.5.4.
Missing in the code was an exception for sequence-invalid Reset packets,
which get a Sync acknowledging GSR, instead of (as usual) P.seqno.

This can lead to an oscillating ping-pong flood of Reset packets.

In fact, it has been observed on the wire as follows:

 1. client establishes connection to server;
 2. before server can write to client, client crashes without notifying
    the server (NB: now no longer possible due to ABORT function);
 3. server sends DCCP-Data packet (has no ackno);
 4. client generates Reset "No Connection", seqno=0, increments seqno;
 5. server replies with Sync, using ackno = P.seqno;
 6. client generates Reset "No Connection" with seqno = ackno + 1;
 7. goto (5).

The difference is that now in (5) the server uses GSR.  This causes the
Reset sent by the client in (6) to become sequence-valid, so that in (7)
the vicious circle is broken; the Reset is then enqueued and causes the
socket to enter TIMEWAIT state.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:37 -07:00
Gerrit Renker
cbe1f5f88a [DCCP]: Shorten variable names in dccp_check_seqno
This patch is in part required by the next patch; it

 * replaces 6 instances of `DCCP_SKB_CB(skb)->dccpd_seq' with `seqno';
 * replaces 7 instances of `DCCP_SKB_CB(skb)->dccpd_ack_seq' with `ackno';
 * replaces 1 use of dccp_inc_seqno() by unfolding `ADD48' macro in place.

No changes in algorithm, all changes are text replacement/substitution.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:36 -07:00
Gerrit Renker
3393da8241 [DCCP]: Simplify interface of dccp_sample_rtt
The third parameter of dccp_sample_rtt now becomes useless and is removed.

Also combined the subtraction of the timestamp echo and the elapsed time.
This is safe, since (a) presence of timestamp echo is tested first and (b)
elapsed time is either present and non-zero or it is not set and equals 0
due to the memset in dccp_parse_options.

To avoid measuring option-processing time, the timestamp for measuring the
initial Request/Response RTT sample is taken directly when the function is
called (the Linux implementation always adds a timestamp on the Request,
so there is no loss in doing this).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:35 -07:00
Gerrit Renker
4c70f383e0 [DCCP]: Provide 10s of microsecond timesource
This provides a timesource, conveniently used for DCCP timestamps, which
returns the elapsed time in 10s of microseconds since initialisation.
This makes for a wrap-around time of about 11.9 hours, which should be
sufficient for most applications.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:35 -07:00
Gerrit Renker
aa97efd97a [DCCP]: Reuse ktime_get_real() calls again
This patch reduces the number of timestamps taken in the receive path
for each packet.

The ccid3_hc_tx_update_x() routine is called in
 * the receive path for each CCID3-controlled packet
 * for the nofeedback timer (if no feedback arrives during 4 RTT)

Currently, when there is no loss, each packet gets timestamped twice.
The patch resolves this by recycling the first timestamp taken on packet
reception for RTT sampling.

When the no_feedback_timer() is called, then the timestamp argument is
simply set to NULL - so that ccid3_hc_tx_update_x() takes care of the logic.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:34 -07:00
Michael Wu
e0eb685962 [MAC80211]: rename ieee80211_cfg.h to cfg.h
Might as well rename ieee80211_cfg.h to cfg.h to keep things consistent.

Signed-off-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:34 -07:00