linux/net/ipv4
Eric Dumazet 1ddbcb005c net: fix rtable leak in net/ipv4/route.c
Alexander V. Lukyanov found a regression in 2.6.29 and made a complete
analysis found in http://bugzilla.kernel.org/show_bug.cgi?id=13339
Quoted here because its a perfect one :

begin_of_quotation
 2.6.29 patch has introduced flexible route cache rebuilding. Unfortunately the
 patch has at least one critical flaw, and another problem.

 rt_intern_hash calculates rthi pointer, which is later used for new entry
 insertion. The same loop calculates cand pointer which is used to clean the
 list. If the pointers are the same, rtable leak occurs, as first the cand is
 removed then the new entry is appended to it.

 This leak leads to unregister_netdevice problem (usage count > 0).

 Another problem of the patch is that it tries to insert the entries in certain
 order, to facilitate counting of entries distinct by all but QoS parameters.
 Unfortunately, referencing an existing rtable entry moves it to list beginning,
 to speed up further lookups, so the carefully built order is destroyed.

 For the first problem the simplest patch it to set rthi=0 when rthi==cand, but
 it will also destroy the ordering.
end_of_quotation

Problematic commit is 1080d709fb
(net: implement emergency route cache rebulds when gc_elasticity is exceeded)

Trying to keep dst_entries ordered is too complex and breaks the fact that
order should depend on the frequency of use for garbage collection.

A possible fix is to make rt_intern_hash() simpler, and only makes
rt_check_expire() a litle bit smarter, being able to cope with an arbitrary
entries order. The added loop is running on cache hot data, while cpu
is prefetching next object, so should be unnoticied.

Reported-and-analyzed-by: Alexander V. Lukyanov <lav@yar.ru>
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-20 17:18:02 -07:00
..
netfilter netfilter: revised locking for x_tables 2009-04-28 22:36:33 -07:00
af_inet.c Merge branch 'core/percpu' into percpu-cpumask-x86-for-linus-2 2009-03-27 17:28:43 +01:00
ah4.c netns xfrm: AH/ESP in netns! 2008-11-25 17:59:27 -08:00
arp.c ipv4: arp announce, arp_proxy and windows ip conflict verification 2009-03-13 16:02:07 -07:00
cipso_ipv4.c netlabel: Label incoming TCP connections correctly in SELinux 2009-03-28 15:01:36 +11:00
datagram.c mib: add net to IP_INC_STATS_BH 2008-07-16 20:20:11 -07:00
devinet.c netlink: change nlmsg_notify() return value logic 2009-02-24 23:18:28 -08:00
esp4.c netns xfrm: AH/ESP in netns! 2008-11-25 17:59:27 -08:00
fib_frontend.c ip: add loose reverse path filtering 2009-02-22 19:54:45 -08:00
fib_hash.c net: clean up net/ipv4/fib_frontend.c fib_hash.c ip_gre.c 2008-11-03 00:25:16 -08:00
fib_lookup.h [IPV4] FIB_HASH: Reduce memory needs and speedup lookups 2008-01-28 15:02:46 -08:00
fib_rules.c net: add fib_rules_ops to flush_cache method 2008-07-05 19:01:28 -07:00
fib_semantics.c netlink: change nlmsg_notify() return value logic 2009-02-24 23:18:28 -08:00
fib_trie.c net: replace NIPQUAD() in net/ipv4/ net/ipv6/ 2008-10-31 00:53:57 -07:00
icmp.c netns: Fix icmp shutdown. 2009-03-03 01:14:15 -08:00
igmp.c netns: igmp: make /proc/net/{igmp,mcfilter} per netns 2008-12-25 16:42:51 -08:00
inet_connection_sock.c net: move bsockets outside of read only beginning of struct inet_hashinfo 2009-02-01 12:31:33 -08:00
inet_diag.c net: Convert TCP/DCCP listening hash tables to use RCU 2008-11-23 17:22:55 -08:00
inet_fragment.c inet fragments: fix sparse warning: context imbalance 2009-02-26 23:13:35 -08:00
inet_hashtables.c net: move bsockets outside of read only beginning of struct inet_hashinfo 2009-02-01 12:31:33 -08:00
inet_lro.c include/net net/ - csum_partial - remove unnecessary casts 2008-11-19 15:44:53 -08:00
inet_timewait_sock.c net: convert TCP/DCCP ehash rwlocks to spinlocks 2008-11-20 20:39:09 -08:00
inetpeer.c net: clean up net/ipv4/ah4.c esp4.c fib_semantics.c inet_connection_sock.c inetpeer.c ip_output.c 2008-11-03 00:23:42 -08:00
ip_forward.c net: reduce structures when XFRM=n 2008-10-28 13:24:06 -07:00
ip_fragment.c netns: oops in ip[6]_frag_reasm incrementing stats 2009-03-18 23:26:11 -07:00
ip_gre.c gre: used time_before for comparing jiffies 2009-02-24 23:34:48 -08:00
ip_input.c Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2008-11-18 23:38:23 -08:00
ip_options.c cipso: Add support for native local labeling and fixup mapping names 2008-10-10 10:16:34 -04:00
ip_output.c ip: support for TX timestamps on UDP and RAW sockets 2009-02-15 22:43:38 -08:00
ip_sockglue.c net: ip_sockglue.c add static, annotate ports' endianness 2008-11-20 01:54:27 -08:00
ipcomp.c netns xfrm: state lookup in netns 2008-11-25 17:30:50 -08:00
ipconfig.c ipconfig: handle case of delayed DHCP server 2009-05-17 20:39:33 -07:00
ipip.c ipip: used time_before for comparing jiffies 2009-02-24 23:36:47 -08:00
ipmr.c ipmr: use goto to common label instead of opencoding 2009-02-06 23:46:51 -08:00
Kconfig ipv4: make default for INET_LRO consistent with help text 2009-05-18 21:48:38 -07:00
Makefile IPVS: Move IPVS to net/netfilter/ipvs 2008-10-07 08:38:24 +11:00
netfilter.c Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6 2008-11-28 02:19:15 -08:00
proc.c net: replace commatas with semicolons 2009-02-16 00:08:56 -08:00
protocol.c net: remove CVS keywords 2008-06-11 21:00:38 -07:00
raw.c ip: support for TX timestamps on UDP and RAW sockets 2009-02-15 22:43:38 -08:00
route.c net: fix rtable leak in net/ipv4/route.c 2009-05-20 17:18:02 -07:00
syncookies.c lsm: Relocate the IPv4 security_inet_conn_request() hooks 2009-03-28 15:01:36 +11:00
sysctl_net_ipv4.c net: '&' redux 2008-11-03 18:21:05 -08:00
tcp_bic.c tcp: add helper for AI algorithm 2009-03-02 03:00:15 -08:00
tcp_cong.c tcp: add helper for AI algorithm 2009-03-02 03:00:15 -08:00
tcp_cubic.c tcp: add helper for AI algorithm 2009-03-02 03:00:15 -08:00
tcp_diag.c net: inet_diag_handler structs can be const 2008-11-19 15:43:27 -08:00
tcp_highspeed.c [TCP]: Cong.ctrl modules: remove unused good_ack from cong_avoid 2008-01-28 14:55:41 -08:00
tcp_htcp.c htcp: merge icsk_ca_state compare 2009-03-02 03:00:14 -08:00
tcp_hybla.c tcp: Fix tcp_hybla zero congestion window growth with small rho and large cwnd. 2008-10-07 15:58:17 -07:00
tcp_illinois.c [TCP]: Cong.ctrl modules: remove unused good_ack from cong_avoid 2008-01-28 14:55:41 -08:00
tcp_input.c tcp: Fix tcp_prequeue() to get correct rto_min value 2009-05-04 11:11:01 -07:00
tcp_ipv4.c lsm: Relocate the IPv4 security_inet_conn_request() hooks 2009-03-28 15:01:36 +11:00
tcp_lp.c [TCP]: Cong.ctrl modules: remove unused good_ack from cong_avoid 2008-01-28 14:55:41 -08:00
tcp_minisocks.c tcp: consolidate paws check 2009-03-15 20:09:52 -07:00
tcp_output.c tcp: fix mid-wq adjustment helper 2009-04-20 02:15:00 -07:00
tcp_probe.c tcp: '< 0' test on unsigned 2009-03-13 16:05:14 -07:00
tcp_scalable.c tcp: add helper for AI algorithm 2009-03-02 03:00:15 -08:00
tcp_timer.c tcp: cleanup ca_state mess in tcp_timer 2009-03-02 03:00:13 -08:00
tcp_vegas.c tcp: tcp_vegas cong avoid fix 2008-12-09 00:13:04 -08:00
tcp_vegas.h [TCP]: congestion control API pass RTT in microseconds 2007-07-31 02:27:57 -07:00
tcp_veno.c tcp: add helper for AI algorithm 2009-03-02 03:00:15 -08:00
tcp_westwood.c [TCP]: congestion control API pass RTT in microseconds 2007-07-31 02:27:57 -07:00
tcp_yeah.c tcp: add helper for AI algorithm 2009-03-02 03:00:15 -08:00
tcp.c tcp: fix MSG_PEEK race check 2009-05-18 15:05:40 -07:00
tunnel4.c [IPV4] TUNNEL4: Fix incoming packet length check for inter-protocol tunnel. 2008-06-05 04:02:33 +09:00
udp_impl.h udp: introduce struct udp_table and multiple spinlocks 2008-10-29 01:41:45 -07:00
udp.c ipv6: Fix NULL pointer dereference with time-wait sockets 2009-04-11 01:53:06 -07:00
udplite.c udp: RCU handling for Unicast packets. 2008-10-29 02:11:14 -07:00
xfrm4_input.c ipsec: Remove useless ret variable 2008-12-26 01:31:18 -08:00
xfrm4_mode_beet.c ipsec: Interfamily IPSec BEET 2008-08-06 02:39:30 -07:00
xfrm4_mode_transport.c [IPSEC]: Use IPv6 calling convention as the convention for x->mode->output 2007-10-10 16:55:54 -07:00
xfrm4_mode_tunnel.c xfrm: fix fragmentation for ipv4 xfrm tunnel 2008-06-17 16:38:23 -07:00
xfrm4_output.c [IPSEC]: Fix inter address family IPsec tunnel handling. 2008-03-24 14:51:51 -07:00
xfrm4_policy.c net: replace uses of __constant_{endian} 2009-02-01 00:45:17 -08:00
xfrm4_state.c xfrm: remove useless forward declarations 2008-11-25 01:05:54 -08:00
xfrm4_tunnel.c [IPCOMP]: Fix reception of incompressible packets 2008-01-31 19:27:24 -08:00