linux

mirror of https://github.com/FEX-Emu/linux.git synced 2024-12-15 05:11:32 +00:00

Author	SHA1	Message	Date
Patrick McHardy	4179477f63	[NET_SCHED]: Add hrtimer based qdisc watchdog Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:05 -07:00
Patrick McHardy	641b9e0e8b	[NET_SCHED]: Use ktime as clocksource Get rid of the manual clock source selection mess and use ktime. Also use a scalar representation, which allows to clean up pkt_sched.h a bit more and results in less ktime_to_ns() calls in most cases. The PSCHED_US2JIFFIE/PSCHED_JIFFIE2US macros are implemented quite inefficient by this patch, following patches will convert all qdiscs to hrtimers and get rid of them entirely. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:04 -07:00
Patrick McHardy	010c7d6f86	[NETFILTER]: nf_conntrack: uninline notifier registration functions Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:46 -07:00
Patrick McHardy	a3c5029cf7	[NETFILTER]: nfnetlink: use mutex instead of semaphore Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:43 -07:00
Patrick McHardy	ac5357ebac	[NETFILTER]: nf_conntrack: remove ugly hack in l4proto registration Remove ugly special-casing of nf_conntrack_l4proto_generic, all it wants is its sysctl tables registered, so do that explicitly in an init function and move the remaining protocol initialization and cleanup code to nf_conntrack_proto.c as well. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:40 -07:00
Patrick McHardy	587aa64163	[NETFILTER]: Remove IPv4 only connection tracking/NAT Remove the obsolete IPv4 only connection tracking/NAT as scheduled in feature-removal-schedule. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:34 -07:00
Arnaldo Carvalho de Melo	9c70220b73	[SK_BUFF]: Introduce skb_transport_header(skb) For the places where we need a pointer to the transport header, it is still legal to touch skb->h.raw directly if just adding to, subtracting from or setting it to another layer header. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:31 -07:00
Arnaldo Carvalho de Melo	aa8223c7bb	[SK_BUFF]: Introduce tcp_hdr(), remove skb->h.th Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:26 -07:00
Arnaldo Carvalho de Melo	4bedb45203	[SK_BUFF]: Introduce udp_hdr(), remove skb->h.uh Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:22 -07:00
Arnaldo Carvalho de Melo	ea2ae17d64	[SK_BUFF]: Introduce skb_transport_offset() For the quite common 'skb->h.raw - skb->data' sequence. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:16 -07:00
Arnaldo Carvalho de Melo	0660e03f6b	[SK_BUFF]: Introduce ipv6_hdr(), remove skb->nh.ipv6h Now the skb->nh union has just one member, .raw, i.e. it is just like the skb->mac union, strange, no? I'm just leaving it like that till the transport layer is done with, when we'll rename skb->mac.raw to skb->mac_header (or ->mac_header_offset?), ditto for ->{h,nh}. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:14 -07:00
Arnaldo Carvalho de Melo	eddc9ec53b	[SK_BUFF]: Introduce ip_hdr(), remove skb->nh.iph Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:10 -07:00
Arnaldo Carvalho de Melo	c9bdd4b525	[IP]: Introduce ip_hdrlen() For the common sequence "skb->nh.iph->ihl * 4", removing a good number of open coded skb->nh.iph uses, now to go after the rest... Just out of curiosity, here are the idioms found to get the same result: skb->nh.iph->ihl << 2 skb->nh.iph->ihl<<2 skb->nh.iph->ihl * 4 skb->nh.iph->ihl4 (skb->nh.iph)->ihl sizeof(u32) Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:07 -07:00
Arnaldo Carvalho de Melo	d56f90a7c9	[SK_BUFF]: Introduce skb_network_header() For the places where we need a pointer to the network header, it is still legal to touch skb->nh.raw directly if just adding to, subtracting from or setting it to another layer header. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:59 -07:00
Arnaldo Carvalho de Melo	c1d2bbe1cd	[SK_BUFF]: Introduce skb_reset_network_header(skb) For the common, open coded 'skb->nh.raw = skb->data' operation, so that we can later turn skb->nh.raw into a offset, reducing the size of struct sk_buff in 64bit land while possibly keeping it as a pointer on 32bit. This one touches just the most simple case, next will handle the slightly more "complex" cases. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:46 -07:00
Arnaldo Carvalho de Melo	37e6636669	[LLC]: Kill llc_set_pdu_hdr We'll have skb_reset_network_header soon. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:42 -07:00
Arnaldo Carvalho de Melo	459a98ed88	[SK_BUFF]: Introduce skb_reset_mac_header(skb) For the common, open coded 'skb->mac.raw = skb->data' operation, so that we can later turn skb->mac.raw into a offset, reducing the size of struct sk_buff in 64bit land while possibly keeping it as a pointer on 32bit. This one touches just the most simple case, next will handle the slightly more "complex" cases. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:32 -07:00
Eric Dumazet	92f37fd2ee	[NET]: Adding SO_TIMESTAMPNS / SCM_TIMESTAMPNS support Now that network timestamps use ktime_t infrastructure, we can add a new SOL_SOCKET sockopt SO_TIMESTAMPNS. This command is similar to SO_TIMESTAMP, but permits transmission of a 'timespec struct' instead of a 'timeval struct' control message. (nanosecond resolution instead of microsecond) Control message is labelled SCM_TIMESTAMPNS instead of SCM_TIMESTAMP A socket cannot mix SO_TIMESTAMP and SO_TIMESTAMPNS : the two modes are mutually exclusive. sock_recv_timestamp() became too big to be fully inlined so I added a __sock_recv_timestamp() helper function. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> CC: linux-arch@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:21 -07:00
Stephen Hemminger	a2a316fd06	[NET]: Replace CONFIG_NET_DEBUG with sysctl. Covert network warning messages from a compile time to runtime choice. Removes kernel config option and replaces it with new /proc/sys/net/core/warnings. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:05 -07:00
Eric Dumazet	ae40eb1ef3	[NET]: Introduce SIOCGSTAMPNS ioctl to get timestamps with nanosec resolution Now network timestamps use ktime_t infrastructure, we can add a new ioctl() SIOCGSTAMPNS command to get timestamps in 'struct timespec'. User programs can thus access to nanosecond resolution. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> CC: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:04 -07:00
David S. Miller	fe067e8ab5	[TCP]: Abstract out all write queue operations. This allows the write queue implementation to be changed, for example, to one which allows fast interval searching. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:02 -07:00
Herbert Xu	759e5d0064	[UDP]: Clean up UDP-Lite receive checksum This patch eliminates some duplicate code for the verification of receive checksums between UDP-Lite and UDP. It does this by introducing __skb_checksum_complete_head which is identical to __skb_checksum_complete_head apart from the fact that it takes a length parameter rather than computing the first skb->len bytes. As a result UDP-Lite will be able to use hardware checksum offload for packets which do not use partial coverage checksums. It also means that UDP-Lite loopback no longer does unnecessary checksum verification. If any NICs start support UDP-Lite this would also start working automatically. This patch removes the assumption that msg_flags has MSG_TRUNC clear upon entry in recvmsg. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:51 -07:00
Neil Horman	95c385b4d5	[IPV6] ADDRCONF: Optimistic Duplicate Address Detection (RFC 4429) Support. Nominally an autoconfigured IPv6 address is added to an interface in the Tentative state (as per RFC 2462). Addresses in this state remain in this state while the Duplicate Address Detection process operates on them to determine their uniqueness on the network. During this period, these tentative addresses may not be used for communication, increasing the time before a node may be able to communicate on a network. Using Optimistic Duplicate Address Detection, autoconfigured addresses may be used immediately for communication on the network, as long as certain rules are followed to avoid conflicts with other nodes during the Duplicate Address Detection process. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:43 -07:00
Eric Dumazet	b7aa0bf70c	[NET]: convert network timestamps to ktime_t We currently use a special structure (struct skb_timeval) and plain 'struct timeval' to store packet timestamps in sk_buffs and struct sock. This has some drawbacks : - Fixed resolution of micro second. - Waste of space on 64bit platforms where sizeof(struct timeval)=16 I suggest using ktime_t that is a nice abstraction of high resolution time services, currently capable of nanosecond resolution. As sizeof(ktime_t) is 8 bytes, using ktime_t in 'struct sock' permits a 8 byte shrink of this structure on 64bit architectures. Some other structures also benefit from this size reduction (struct ipq in ipv4/ip_fragment.c, struct frag_queue in ipv6/reassembly.c, ...) Once this ktime infrastructure adopted, we can more easily provide nanosecond resolution on top of it. (ioctl SIOCGSTAMPNS and/or SO_TIMESTAMPNS/SCM_TIMESTAMPNS) Note : this patch includes a bug correction in compat_sock_get_timestamp() where a "err = 0;" was missing (so this syscall returned -ENOENT instead of 0) Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> CC: Stephen Hemminger <shemminger@linux-foundation.org> CC: John find <linux.kernel@free.fr> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:34 -07:00
James Morris	9d729f72dc	[NET]: Convert xtime.tv_sec to get_seconds() Where appropriate, convert references to xtime.tv_sec to the get_seconds() helper function. Signed-off-by: James Morris <jmorris@namei.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:32 -07:00
Eric Dumazet	fa438ccfdf	[NET]: Keep sk_backlog near sk_lock sk_backlog is a critical field of struct sock. (known famous words) It is (ab)used in hot paths, in particular in release_sock(), tcp_recvmsg(), tcp_v4_rcv(), sk_receive_skb(). It really makes sense to place it next to sk_lock, because sk_backlog is only used after sk_lock locked (and thus memory cache line in L1 cache). This should reduce cache misses and sk_lock acquisition time. (In theory, we could only move the head pointer near sk_lock, and leaving tail far away, because 'tail' is normally not so hot, but keep it simple :) ) Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:27 -07:00
Ilpo Järvinen	3cfe3baaf0	[TCP]: Add two new spurious RTO responses to FRTO New sysctl tcp_frto_response is added to select amongst these responses: - Rate halving based; reuses CA_CWR state (default) - Very conservative; used to be the only one available (=1) - Undo cwr; undoes ssthresh and cwnd reductions (=2) The response with rate halving requires a new parameter to tcp_enter_cwr because FRTO has already reduced ssthresh and doing a second reduction there has to be prevented. In addition, to keep things nice on 80 cols screen, a local variable was added. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:23 -07:00
John Heffner	886236c124	[TCP]: Add RFC3742 Limited Slow-Start, controlled by variable sysctl_tcp_max_ssthresh. Signed-off-by: John Heffner <jheffner@psc.edu> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:19 -07:00
Ilpo Järvinen	46d0de4ed9	[TCP] FRTO: Entry is allowed only during (New)Reno like recovery This interpretation comes from RFC4138: "If the sender implements some loss recovery algorithm other than Reno or NewReno [FHG04], the F-RTO algorithm SHOULD NOT be entered when earlier fast recovery is underway." I think the RFC means to say (especially in the light of Appendix B) that ...recovery is underway (not just fast recovery) or was underway when it was interrupted by an earlier (F-)RTO that hasn't yet been resolved (snd_una has not advanced enough). Thus, my interpretation is that whenever TCP has ever retransmitted other than head, basic version cannot be used because then the order assumptions which are used as FRTO basis do not hold. NewReno has only the head segment retransmitted at a time. Therefore, walk up to the segment that has not been SACKed, if that segment is not retransmitted nor anything before it, we know for sure, that nothing after the non-SACKed segment should be either. This assumption is valid because TCPCB_EVER_RETRANS does not leave holes but each non-SACKed segment is rexmitted in-order. Check for retrans_out > 1 avoids more expensive walk through the skb list, as we can know the result beforehand: F-RTO will not be allowed. SACKed skb can turn into non-SACked only in the extremely rare case of SACK reneging, in this case we might fail to detect retransmissions if there were them for any other than head. To get rid of that feature, whole rexmit queue would have to be walked (always) or FRTO should be prevented when SACK reneging happens. Of course RTO should still trigger after reneging which makes this issue even less likely to show up. And as long as the response is as conservative as it's now, nothing bad happens even then. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:12 -07:00
Ilpo Järvinen	bdaae17da8	[TCP] FRTO: Moved tcp_use_frto from tcp.h to tcp_input.c In addition, removed inline. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:02 -07:00
Patrick McHardy	c01003c205	[IFB]: Fix crash on input device removal The input_device pointer is not refcounted, which means the device may disappear while packets are queued, causing a crash when ifb passes packets with a stale skb->dev pointer to netif_rx(). Fix by storing the interface index instead and do a lookup where neccessary. Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-29 11:46:52 -07:00
Jeff Garzik	a9c87a10db	Merge branch 'upstream-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 into upstream-fixes	2007-03-28 02:21:18 -04:00
Jean Tourrilhes	c2805fbb86	[PATCH] WE-22 : prevent information leak on 64 bit Johannes Berg discovered that kernel space was leaking to userspace on 64 bit platform. He made a first patch to fix that. This is an improved version of his patch. Signed-off-by: Jean Tourrilhes <jt@hpl.hp.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2007-03-27 14:10:26 -04:00
David S. Miller	f11e6659ce	[IPV6]: Fix routing round-robin locking. As per RFC2461, section 6.3.6, item #2, when no routers on the matching list are known to be reachable or probably reachable we do round robin on those available routes so that we make sure to probe as many of them as possible to detect when one becomes reachable faster. Each routing table has a rwlock protecting the tree and the linked list of routes at each leaf. The round robin code executes during lookup and thus with the rwlock taken as a reader. A small local spinlock tries to provide protection but this does not work at all for two reasons: 1) The round-robin list manipulation, as coded, goes like this (with read lock held): walk routes finding head and tail spin_lock(); rotate list using head and tail spin_unlock(); While one thread is rotating the list, another thread can end up with stale values of head and tail and then proceed to corrupt the list when it gets the lock. This ends up causing the OOPS in fib6_add() later onthat many people have been hitting. 2) All the other code paths that run with the rwlock held as a reader do not expect the list to change on them, they expect it to remain completely fixed while they hold the lock in that way. So, simply stated, it is impossible to implement this correctly using a manipulation of the list without violating the rwlock locking semantics. Reimplement using a per-fib6_node round-robin pointer. This way we don't need to manipulate the list at all, and since the round-robin pointer can only ever point to real existing entries we don't need to perform any locking on the changing of the round-robin pointer itself. We only need to reset the round-robin pointer to NULL when the entry it is pointing to is removed. The idea is from Thomas Graf and it is very similar to how this was implemented before the advanced router selection code when in. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-25 18:48:05 -07:00
Alexey Kuznetsov	ecbb416939	[NET]: Fix neighbour destructor handling. ->neigh_destructor() is killed (not used), replaced with ->neigh_cleanup(), which is called when neighbor entry goes to dead state. At this point everything is still valid: neigh->dev, neigh->parms etc. The device should guarantee that dead neighbor entries (neigh->dead != 0) do not get private part initialized, otherwise nobody will cleanup it. I think this is enough for ipoib which is the only user of this thing. Initialization private part of neighbor entries happens in ipib start_xmit routine, which is not reached when device is down. But it would be better to add explicit test for neigh->dead in any case. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-25 18:48:01 -07:00
Thomas Graf	e1701c68c1	[NET]: Fix fib_rules compatibility breakage Based upon a patch from Patrick McHardy. The fib_rules netlink attribute policy introduced in 2.6.19 broke userspace compatibilty. When specifying a rule with "from all" or "to all", iproute adds a zero byte long netlink attribute, but the policy requires all addresses to have a size equal to sizeof(struct in_addr)/sizeof(struct in6_addr), resulting in a validation error. Check attribute length of FRA_SRC/FRA_DST in the generic framework by letting the family specific rules implementation provide the length of an address. Report an error if address length is non zero but no address attribute is provided. Fix actual bug by checking address length for non-zero instead of relying on availability of attribute. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-25 18:48:00 -07:00
Vlad Yasevich	749bf9215e	[SCTP]: Reset some transport and association variables on restart If the association has been restarted, we need to reset the transport congestion variables as well as accumulated error counts and CACC variables. If we do not, the association will use the wrong values and may terminate prematurely. This was found with a scenario where the peer restarted the association when lksctp was in the last HB timeout for its association. The restart happened, but the error counts have not been reset and when the timeout occurred, a newly restarted association was terminated due to excessive retransmits. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-20 00:09:45 -07:00
Vlad Yasevich	0b58a81146	[SCTP]: Clean up stale data during association restart During association restart we may have stale data sitting on the ULP queue waiting for ordering or reassembly. This data may cause severe problems if not cleaned up. In particular stale data pending ordering may cause problems with receive window exhaustion if our peer has decided to restart the association. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-20 00:09:43 -07:00
Eric Paris	ef41aaa0b7	[IPSEC]: xfrm_policy delete security check misplaced The security hooks to check permissions to remove an xfrm_policy were actually done after the policy was removed. Since the unlinking and deletion are done in xfrm_policy_by* functions this moves the hooks inside those 2 functions. There we have all the information needed to do the security check and it can be done before the deletion. Since auditing requires the result of that security check err has to be passed back and forth from the xfrm_policy_by* functions. This patch also fixes a bug where a deletion that failed the security check could cause improper accounting on the xfrm_policy (xfrm_get_policy didn't have a put on the exit path for the hold taken by xfrm_policy_by) It also fixes the return code when no policy is found in xfrm_add_pol_expire. In old code (at least back in the 2.6.18 days) err wasn't used before the return when no policy is found and so the initialization would cause err to be ENOENT. But since err has since been used above when we don't get a policy back from the xfrm_policy_by function we would always return 0 instead of the intended ENOENT. Also fixed some white space damage in the same area. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Venkat Yekkirala <vyekkirala@trustedcs.com> Acked-by: James Morris <jmorris@namei.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-07 16:08:09 -08:00
David S. Miller	64a146513f	[NET]: Revert incorrect accept queue backlog changes. This reverts two changes: `8488df894d` `248f06726e` A backlog value of N really does mean allow "N + 1" connections to queue to a listening socket. This allows one to specify "0" as the backlog and still get 1 connection. Noticed by Gerrit Renker and Rick Jones. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-06 11:21:05 -08:00
Eric Dumazet	187f5f84ef	[INET]: twcal_jiffie should be unsigned long, not int Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-05 13:32:48 -08:00
Patrick McHardy	ec68e97ded	[NETFILTER]: conntrack: fix {nf,ip}_ct_iterate_cleanup endless loops Fix {nf,ip}_ct_iterate_cleanup unconfirmed list handling: - unconfirmed entries can not be killed manually, they are removed on confirmation or final destruction of the conntrack entry, which means we might iterate forever without making forward progress. This can happen in combination with the conntrack event cache, which holds a reference to the conntrack entry, which is only released when the packet makes it all the way through the stack or a different packet is handled. - taking references to an unconfirmed entry and using it outside the locked section doesn't work, the list entries are not refcounted and another CPU might already be waiting to destroy the entry What the code really wants to do is make sure the references of the hash table to the selected conntrack entries are released, so they will be destroyed once all references from skbs and the event cache are dropped. Since unconfirmed entries haven't even entered the hash yet, simply mark them as dying and skip confirmation based on that. Reported and tested by Chuck Ebbert <cebbert@redhat.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-05 13:25:18 -08:00
Wei Dong	8488df894d	[NET]: Fix bugs in "Whether sock accept queue is full" checking when I use linux TCP socket, and find there is a bug in function sk_acceptq_is_full(). When a new SYN comes, TCP module first checks its validation. If valid, send SYN,ACK to the client and add the sock to the syn hash table. Next time if received the valid ACK for SYN,ACK from the client. server will accept this connection and increase the sk->sk_ack_backlog -- which is done in function tcp_check_req().We check wether acceptq is full in function tcp_v4_syn_recv_sock(). Consider an example: After listen(sockfd, 1) system call, sk->sk_max_ack_backlog is set to 1. As we know, sk->sk_ack_backlog is initialized to 0. Assuming accept() system call is not invoked now. 1. 1st connection comes. invoke sk_acceptq_is_full(). sk- >sk_ack_backlog=0 sk->sk_max_ack_backlog=1, function return 0 accept this connection. Increase the sk->sk_ack_backlog 2. 2nd connection comes. invoke sk_acceptq_is_full(). sk- >sk_ack_backlog=1 sk->sk_max_ack_backlog=1, function return 0 accept this connection. Increase the sk->sk_ack_backlog 3. 3rd connection comes. invoke sk_acceptq_is_full(). sk- >sk_ack_backlog=2 sk->sk_max_ack_backlog=1, function return 1. Refuse this connection. I think it has bugs. after listen system call. sk->sk_max_ack_backlog=1 but now it can accept 2 connections. Signed-off-by: Wei Dong <weid@np.css.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-02 20:37:33 -08:00
Patrick McHardy	4498121ca3	[NET]: Handle disabled preemption in gfp_any() ctnetlink uses netlink_unicast from an atomic_notifier_chain (which is called within a RCU read side critical section) without holding further locks. netlink_unicast calls netlink_trim with the result of gfp_any() for the gfp flags, which are passed down to pskb_expand_header. gfp_any() only checks for softirq context and returns GFP_KERNEL, resulting in this warning: BUG: sleeping function called from invalid context at mm/slab.c:3032 in_atomic():1, irqs_disabled():0 no locks held by rmmod/7010. Call Trace: [<ffffffff8109467f>] debug_show_held_locks+0x9/0xb [<ffffffff8100b0b4>] __might_sleep+0xd9/0xdb [<ffffffff810b5082>] __kmalloc+0x68/0x110 [<ffffffff811ba8f2>] pskb_expand_head+0x4d/0x13b [<ffffffff81053147>] netlink_broadcast+0xa5/0x2e0 [<ffffffff881cd1d7>] :nfnetlink:nfnetlink_send+0x83/0x8a [<ffffffff8834f6a6>] :nf_conntrack_netlink:ctnetlink_conntrack_event+0x94c/0x96a [<ffffffff810624d6>] notifier_call_chain+0x29/0x3e [<ffffffff8106251d>] atomic_notifier_call_chain+0x32/0x60 [<ffffffff881d266d>] :nf_conntrack:destroy_conntrack+0xa5/0x1d3 [<ffffffff881d194e>] :nf_conntrack:nf_ct_cleanup+0x8c/0x12c [<ffffffff881d4614>] :nf_conntrack:kill_l3proto+0x0/0x13 [<ffffffff881d482a>] :nf_conntrack:nf_conntrack_l3proto_unregister+0x90/0x94 [<ffffffff883551b3>] :nf_conntrack_ipv4:nf_conntrack_l3proto_ipv4_fini+0x2b/0x5d [<ffffffff8109d44f>] sys_delete_module+0x1b5/0x1e6 [<ffffffff8105f245>] trace_hardirqs_on_thunk+0x35/0x37 [<ffffffff8105911e>] system_call+0x7e/0x83 Since netlink_unicast is supposed to be callable from within RCU read side critical sections, make gfp_any() check for in_atomic() instead of in_softirq(). Additionally nfnetlink_send needs to use gfp_any() as well for the call to netlink_broadcast). Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-28 09:42:13 -08:00
Adrian Bunk	a39a21982c	[IRDA] net/irda/: proper prototypes This patch adds proper prototypes for some functions in include/net/irda/irda.h Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-26 11:42:43 -08:00
Kazunori MIYAZAWA	73d605d1ab	[IPSEC]: changing API of xfrm6_tunnel_register This patch changes xfrm6_tunnel register and deregister interface to prepare for solving the conflict of device tunnels with inter address family IPsec tunnel. There is no device which conflicts with IPv4 over IPv6 IPsec tunnel. Signed-off-by: Kazunori MIYAZAWA <miyazawa@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-13 12:55:55 -08:00
Kazunori MIYAZAWA	c0d56408e3	[IPSEC]: Changing API of xfrm4_tunnel_register. This patch changes xfrm4_tunnel register and deregister interface to prepare for solving the conflict of device tunnels with inter address family IPsec tunnel. Signed-off-by: Kazunori MIYAZAWA <miyazawa@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-13 12:54:47 -08:00
Patrick McHardy	fe3eb20c1a	[NETFILTER]: nf_conntrack: change nf_conntrack_l[34]proto_unregister to void No caller checks the return value, and since its usually called within the module unload path there's nothing a module could do about errors anyway, so BUG on invalid conditions and return void. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-12 11:14:28 -08:00
Patrick McHardy	c0e912d7ed	[NETFILTER]: nf_conntrack: fix invalid conntrack statistics RCU assumption NF_CT_STAT_INC assumes rcu_read_lock in nf_hook_slow disables preemption as well, making it legal to use __get_cpu_var without disabling preemption manually. The assumption is not correct anymore with preemptable RCU, additionally we need to protect against softirqs when not holding nf_conntrack_lock. Add NF_CT_STAT_INC_ATOMIC macro, which disables local softirqs, and use where necessary. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-12 11:13:43 -08:00
Patrick McHardy	923f4902fe	[NETFILTER]: nf_conntrack: properly use RCU API for nf_ct_protos/nf_ct_l3protos arrays Replace preempt_{enable,disable} based RCU by proper use of the RCU API and add missing rcu_read_lock/rcu_read_unlock calls in all paths not obviously only used within packet process context (nfnetlink_conntrack). Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-12 11:12:57 -08:00

1 2 3 4 5 ...

909 Commits