linux/net
Eric Dumazet 81c3d5470e [INET]: speedup inet (tcp/dccp) lookups
Arnaldo and I agreed it could be applied now, because I have other
pending patches depending on this one (Thank you Arnaldo)

(The other important patch moves skc_refcnt into a separate cache line,
so that SMP/NUMA performance doesn't suffer from cache line ping-pong.)

1) First some performance data :
--------------------------------

tcp_v4_rcv() wastes a *lot* of time in __inet_lookup_established()

The most time critical code is :

sk_for_each(sk, node, &head->chain) {
     if (INET_MATCH(sk, acookie, saddr, daddr, ports, dif))
         goto hit; /* You sunk my battleship! */
}

The sk_for_each() does use prefetch() hints, but only the beginning of
"struct sock" is prefetched.

As the first comparison in INET_MATCH uses inet_sk(__sk)->daddr, which is
far away from the beginning of "struct sock", it has to bring a cold
cache line into the CPU cache. Each iteration has to use at least 2 cache
lines.

This can be problematic if some chains are very long.
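
To make the two-cache-line cost concrete, here is a minimal user-space
sketch. The struct is a hypothetical stand-in (not the kernel layout), and
the 64-byte cache line size is an assumption; real hardware varies.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed cache line size; commonly 64 bytes, but hardware dependent. */
#define CACHE_LINE_SIZE 64

/* Hypothetical stand-in for a large socket structure (not the kernel layout). */
struct mock_sock {
	void *next;                /* chain link, at the start of the object */
	unsigned char filler[200]; /* unrelated fields in between */
	unsigned int daddr;        /* field compared first by the old match */
};

/* Index of the cache line, relative to the object start, holding an offset. */
static size_t cache_line_of(size_t offset)
{
	return offset / CACHE_LINE_SIZE;
}

/*
 * Distinct cache lines read when following the chain link and then
 * comparing daddr, assuming the object starts on a cache line boundary.
 */
static size_t lines_touched_per_iteration(void)
{
	size_t link_line = cache_line_of(offsetof(struct mock_sock, next));
	size_t key_line = cache_line_of(offsetof(struct mock_sock, daddr));

	return (link_line == key_line) ? 1 : 2;
}
```

With this layout the chain link sits in line 0 and daddr several lines
further, so every element visited costs two cache lines.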

2) The goal
-----------

The idea I had is to change things so that INET_MATCH() may return
FALSE in 99% of cases only using the data already in the CPU cache,
using one cache line per iteration.

3) Description of the patch
---------------------------

Adds a new 'unsigned int skc_hash' field in 'struct sock_common',
filling a 32-bit hole on 64-bit platforms.

struct sock_common {
	unsigned short		skc_family;
	volatile unsigned char	skc_state;
	unsigned char		skc_reuse;
	int			skc_bound_dev_if;
	struct hlist_node	skc_node;
	struct hlist_node	skc_bind_node;
	atomic_t		skc_refcnt;
+	unsigned int		skc_hash;
	struct proto		*skc_prot;
};
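
The "hole" can be checked with offsetof() in user space. The sketch below
mirrors the field order shown above with hypothetical stand-in types
(hnode for struct hlist_node, int for atomic_t, void * for struct proto *).
It assumes a typical ABI where pointers are naturally aligned; on 32-bit
platforms there is no hole and the struct grows by 4 bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for struct hlist_node: two pointers. */
struct hnode {
	struct hnode *next, **pprev;
};

/* Layout before the patch (stand-in types, same field order). */
struct common_before {
	unsigned short family;
	volatile unsigned char state;
	unsigned char reuse;
	int bound_dev_if;
	struct hnode node;
	struct hnode bind_node;
	int refcnt;        /* stand-in for atomic_t */
	void *prot;        /* stand-in for struct proto * */
};

/* Layout after the patch: hash placed right after refcnt. */
struct common_after {
	unsigned short family;
	volatile unsigned char state;
	unsigned char reuse;
	int bound_dev_if;
	struct hnode node;
	struct hnode bind_node;
	int refcnt;
	unsigned int hash; /* the new field */
	void *prot;
};

/* Padding bytes between refcnt and prot in the old layout. */
static size_t hole_after_refcnt(void)
{
	return offsetof(struct common_before, prot)
	     - (offsetof(struct common_before, refcnt) + sizeof(int));
}
```

On an LP64 platform hole_after_refcnt() is expected to be 4, and both
structs have the same size, so the new field costs no memory there.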

Store in this 32-bit field the full hash, not masked by (ehash_size -
1). Using this full hash as the first comparison done in INET_MATCH
permits us to immediately skip the element without touching a second
cache line in case of a miss.
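
The idea can be sketched outside the kernel as follows. The node type,
the padding size and the far_reads counter are hypothetical and exist only
for illustration; the counter records how often the lookup had to read
the fields that live far from the chain link.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node: link and full hash are adjacent, the key is far away. */
struct node {
	struct node *next;
	uint32_t hash;          /* full, unmasked hash */
	unsigned char pad[128]; /* pushes the key into another cache line */
	uint32_t saddr, daddr;
	uint32_t ports;
};

/*
 * Walk one chain. The hash is compared first, so a mismatching element
 * is skipped without reading saddr/daddr/ports.
 */
static struct node *chain_lookup(struct node *head, uint32_t hash,
				 uint32_t saddr, uint32_t daddr,
				 uint32_t ports, unsigned int *far_reads)
{
	struct node *n;

	for (n = head; n != NULL; n = n->next) {
		if (n->hash != hash)
			continue; /* rejected using the hot fields only */
		if (far_reads != NULL)
			(*far_reads)++;
		if (n->saddr == saddr && n->daddr == daddr &&
		    n->ports == ports)
			return n;
	}
	return NULL;
}
```

Two different connections can still share a full hash, so the address and
port comparisons remain necessary after a hash match.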

Suppress the sk_hashent/tw_hashent fields, since skc_hash (aliased to
sk_hash and tw_hash) already contains the slot number if we mask it with
(ehash_size - 1).
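
As a small illustration of why the separate slot field is redundant, the
slot can be recomputed from the stored full hash. The helper name slot_of
is hypothetical, and the sketch assumes the table size is a power of two,
which the (ehash_size - 1) mask implies.

```c
#include <assert.h>
#include <stdint.h>

/* Slot index of a full hash in a table whose size is a power of two. */
static uint32_t slot_of(uint32_t full_hash, uint32_t table_size)
{
	/* The mask trick is only valid for power-of-two sizes. */
	assert(table_size != 0 && (table_size & (table_size - 1)) == 0);
	return full_hash & (table_size - 1);
}
```

For example, with a 256-entry table the slot is just the low 8 bits of
the stored hash.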

File include/net/inet_hashtables.h

64 bits platforms :
#define INET_MATCH(__sk, __hash, __cookie, __saddr, __daddr, __ports, __dif)\
     (((__sk)->sk_hash == (__hash))                         &&  \
     ((*((__u64 *)&(inet_sk(__sk)->daddr)))== (__cookie))   &&  \
     ((*((__u32 *)&(inet_sk(__sk)->dport))) == (__ports))   &&  \
     (!((__sk)->sk_bound_dev_if) || ((__sk)->sk_bound_dev_if == (__dif))))

32 bits platforms :
#define INET_MATCH(__sk, __hash, __cookie, __saddr, __daddr, __ports, __dif)\
     (((__sk)->sk_hash == (__hash))                 &&  \
     (inet_sk(__sk)->daddr          == (__saddr))   &&  \
     (inet_sk(__sk)->rcv_saddr      == (__daddr))   &&  \
     ((*((__u32 *)&(inet_sk(__sk)->dport))) == (__ports))   &&  \
     (!((__sk)->sk_bound_dev_if) || ((__sk)->sk_bound_dev_if == (__dif))))


- Adds a prefetch(head->chain.first) in
__inet_lookup_established()/__tcp_v4_check_established() and
__inet6_lookup_established()/__tcp_v6_check_established() and
__dccp_v4_check_established() to bring the first element of the list
into cache before the {read|write}_lock(&head->lock);
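
The prefetch-before-lock pattern might look like this in user space. This
is only a sketch: it uses the GCC/Clang __builtin_prefetch() builtin in
place of the kernel's prefetch(), a toy C11 atomic_flag spinlock in place
of the kernel rwlock, and hypothetical item/bucket types.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct item {
	struct item *next;
	unsigned int hash;
};

struct bucket {
	atomic_flag lock;   /* toy spinlock standing in for the rwlock */
	struct item *first;
};

static struct item *bucket_find(struct bucket *b, unsigned int hash)
{
	struct item *it;

	/*
	 * Hint only: start fetching the first element before taking the
	 * lock. The pointer is read again under the lock before it is
	 * dereferenced, so a stale value here is harmless in this
	 * single-threaded demo.
	 */
	__builtin_prefetch(b->first);

	while (atomic_flag_test_and_set_explicit(&b->lock,
						 memory_order_acquire))
		; /* spin until the lock is free */

	for (it = b->first; it != NULL; it = it->next)
		if (it->hash == hash)
			break;

	atomic_flag_clear_explicit(&b->lock, memory_order_release);
	return it;
}
```

The point of issuing the prefetch first is that the memory access can
overlap with the time spent acquiring the lock.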

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Acked-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-03 14:13:38 -07:00
..
802 [TR]: Set correct frame type for SNAP packets 2005-09-22 04:51:56 -03:00
8021q [8021Q]: Add endian annotations. 2005-09-19 15:41:28 -07:00
appletalk [APPLETALK]: Fix broadcast bug. 2005-09-27 16:11:29 -07:00
atm [INET]: speedup inet (tcp/dccp) lookups 2005-10-03 14:13:38 -07:00
ax25 [AX.25]: Reformat ax25_proto_ops initialization 2005-09-12 14:25:25 -07:00
bluetooth [Bluetooth] Prevent RFCOMM connections through the RAW socket 2005-09-13 01:32:31 +02:00
bridge [BRIDGE]: TSO fix in br_dev_queue_push_xmit 2005-09-22 23:35:34 -07:00
core [NET]: Fix packet timestamping. 2005-10-03 13:57:23 -07:00
dccp [INET]: speedup inet (tcp/dccp) lookups 2005-10-03 14:13:38 -07:00
decnet [PATCH] timer initialization cleanup: DEFINE_TIMER 2005-09-09 14:03:48 -07:00
econet [NET]: Store skb->timestamp as offset to a base timestamp 2005-08-29 15:58:24 -07:00
ethernet [NET]: Fix reversed logic in eth_type_trans(). 2005-09-28 22:37:53 -07:00
ieee80211 [PATCH] proc_mkdir() should be used to create procfs directories 2005-09-29 08:46:26 -07:00
ipv4 [INET]: speedup inet (tcp/dccp) lookups 2005-10-03 14:13:38 -07:00
ipv6 [INET]: speedup inet (tcp/dccp) lookups 2005-10-03 14:13:38 -07:00
ipx [NET]: Fix sparse warnings 2005-08-29 16:01:32 -07:00
irda [IRDA]: *irttp cleanup 2005-09-24 16:55:17 -07:00
key
lapb
llc [LLC]: fix llc_ui_recvmsg, making it behave like tcp_recvmsg 2005-09-22 08:29:08 -03:00
netfilter [NET]: Fix packet timestamping. 2005-10-03 13:57:23 -07:00
netlink [NETLINK]: Don't prevent creating sockets when no kernel socket is registered 2005-09-06 15:43:59 -07:00
netrom [NETROM]: Introduct stuct nr_private 2005-09-12 14:28:03 -07:00
packet [NET]: Fix packet timestamping. 2005-10-03 13:57:23 -07:00
rose [ROSE]: fix typo (regeistration) 2005-09-27 15:45:15 -07:00
rxrpc [RXRPC]: Fix build failure introduced by skb->stamp changes. 2005-08-29 16:01:24 -07:00
sched [INET]: speedup inet (tcp/dccp) lookups 2005-10-03 14:13:38 -07:00
sctp [SCTP]: Fix SCTP_SHUTDOWN notifications. 2005-09-22 23:48:38 -07:00
sunrpc [PATCH] Code cleanups in calbacks in svcsock 2005-09-13 08:22:32 -07:00
unix [NET]: Fix sparse warnings 2005-08-29 16:01:32 -07:00
wanrouter
x25
xfrm [XFRM]: Always release dst_entry on error in xfrm_lookup 2005-09-08 15:11:55 -07:00
compat.c [PATCH] Fix 32bit sendmsg() flaw 2005-09-08 08:14:11 -07:00
Kconfig [NETFILTER] move nfnetlink options to right location in kconfig menu 2005-09-17 00:41:21 -07:00
Makefile /spare/repo/netdev-2.6 branch 'master' 2005-09-01 18:02:01 -04:00
nonet.c
socket.c [NET]: Fix module reference counts for loadable protocol modules 2005-09-27 15:23:38 -07:00
sysctl_net.c [NET]: Fix sparse warnings 2005-08-29 16:01:32 -07:00
TUNABLE