linux/net
Willem de Bruijn 77f65ebdca packet: packet fanout rollover during socket overload
Changes:
  v3->v2: rebase (no other changes)
          passes selftest
  v2->v1: read f->num_members only once
          fix bug: test rollover mode + flag

Minimize packet drop in a fanout group. If one socket is full,
roll over packets to another from the group. Maintain flow
affinity during normal load using an rxhash fanout policy, while
dispersing unexpected traffic storms that hit a single cpu, such
as spoofed-source DoS flows. Rollover breaks affinity for flows
arriving at saturated sockets during those conditions.

The patch adds a fanout policy ROLLOVER that rotates between sockets,
filling each socket before moving to the next. It also adds a fanout
flag ROLLOVER. If passed along with any other fanout policy, the
primary policy is applied until the chosen socket is full. Then,
rollover selects another socket, to delay packet drop until the
entire system is saturated.

Probing sockets is not free. Selecting the last used socket, as
rollover does, is a greedy approach that maximizes chance of
success, at the cost of extreme load imbalance. In practice, with
sufficiently long queues to absorb bursts, sockets are drained in
parallel and load balance looks uniform in `top`.

To avoid contention, scales counters with number of sockets and
accesses them lockfree. Values are bounds checked to ensure
correctness.

Tested using an application with 9 threads pinned to CPUs, one socket
per thread and sufficient busywork per packet operation to limits each
thread to handling 32 Kpps. When sent 500 Kpps single UDP stream
packets, a FANOUT_CPU setup processes 32 Kpps in total without this
patch, 270 Kpps with the patch. Tested with read() and with a packet
ring (V1).

Also, passes psock_fanout.c unit test added to selftests.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-19 17:15:04 -04:00
..
9p Revert parts of "hlist: drop the node parameter from iterators" 2013-03-08 15:05:34 -08:00
802 mrp: make mrp_rcv static 2013-02-11 14:16:26 -05:00
8021q net: proc: change proc_net_remove to remove_proc_entry 2013-02-18 14:53:08 -05:00
appletalk hlist: drop the node parameter from iterators 2013-02-27 19:10:24 -08:00
atm hlist: drop the node parameter from iterators 2013-02-27 19:10:24 -08:00
ax25 hlist: drop the node parameter from iterators 2013-02-27 19:10:24 -08:00
batman-adv batman-adv: network coding - receive coded packets and decode them 2013-03-13 22:53:51 +01:00
bluetooth hlist: drop the node parameter from iterators 2013-02-27 19:10:24 -08:00
bridge bridge: using for_each_set_bit to simplify the code 2013-03-12 08:04:09 -04:00
caif CAIF: fix indentation for function arguments 2013-03-07 16:24:45 -05:00
can can: dump stack on protocol bugs 2013-03-19 09:44:34 -04:00
ceph Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client 2013-02-28 17:43:09 -08:00
core net: Fix a comment typo 2013-03-18 13:06:09 -04:00
dcb Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2013-03-12 05:52:22 -04:00
dccp tcp: Remove TCPCT 2013-03-17 14:35:13 -04:00
decnet hlist: drop the node parameter from iterators 2013-02-27 19:10:24 -08:00
dns_resolver Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security 2012-12-16 15:40:50 -08:00
dsa dsa: make dsa_switch_setup check for valid port names 2013-01-21 15:40:12 -05:00
ethernet net: split eth_mac_addr for better error handling 2013-01-21 14:07:44 -05:00
ieee802154 6lowpan: Fix endianness issue in is_addr_link_local(). 2013-03-10 16:49:35 -04:00
ipv4 tcp: Remove TCPCT 2013-03-17 14:35:13 -04:00
ipv6 tcp: Remove TCPCT 2013-03-17 14:35:13 -04:00
ipx hlist: drop the node parameter from iterators 2013-02-27 19:10:24 -08:00
irda net/irda: Raise dtr in non-blocking open 2013-03-06 02:47:05 -05:00
iucv hlist: drop the node parameter from iterators 2013-02-27 19:10:24 -08:00
key afkey: fix a typo 2013-03-07 16:26:45 -05:00
l2tp Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2013-03-05 18:42:29 -08:00
lapb net/lapb: remove depends on CONFIG_EXPERIMENTAL 2013-01-11 11:40:01 -08:00
llc hlist: drop the node parameter from iterators 2013-02-27 19:10:24 -08:00
mac80211 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem 2013-03-06 10:21:17 -05:00
mac802154 Driver core patches for 3.9-rc1 2013-02-21 12:05:51 -08:00
netfilter Merge branch 'master' of git://1984.lsi.us.es/nf 2013-03-07 15:20:02 -05:00
netlabel netlabel: fix build problems when CONFIG_IPV6=n 2013-03-08 11:33:51 -05:00
netlink hlist: drop the node parameter from iterators 2013-02-27 19:10:24 -08:00
netrom hlist: drop the node parameter from iterators 2013-02-27 19:10:24 -08:00
nfc hlist: drop the node parameter from iterators 2013-02-27 19:10:24 -08:00
openvswitch Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch 2013-03-17 12:58:47 -04:00
packet packet: packet fanout rollover during socket overload 2013-03-19 17:15:04 -04:00
phonet hlist: drop the node parameter from iterators 2013-02-27 19:10:24 -08:00
rds net/rds: zero last byte for strncpy 2013-03-08 00:35:44 -05:00
rfkill rfkill: don't use [delayed_]work_pending() 2012-12-28 13:40:16 -08:00
rose hlist: drop the node parameter from iterators 2013-02-27 19:10:24 -08:00
rxrpc Driver core patches for 3.9-rc1 2013-02-21 12:05:51 -08:00
sched Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2013-03-12 05:52:22 -04:00
sctp Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2013-03-05 18:42:29 -08:00
sunrpc fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
tipc hlist: drop the node parameter from iterators 2013-02-27 19:10:24 -08:00
unix hlist: drop the node parameter from iterators 2013-02-27 19:10:24 -08:00
vmw_vsock VSOCK: Support VM sockets connected to the hypervisor. 2013-03-15 08:26:26 -04:00
wimax
wireless Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem 2013-03-06 10:21:17 -05:00
x25 hlist: drop the node parameter from iterators 2013-02-27 19:10:24 -08:00
xfrm xfrm: use xfrm direction when lookup policy 2013-03-19 10:35:11 -04:00
compat.c
Kconfig Driver core patches for 3.9-rc1 2013-02-21 12:05:51 -08:00
Makefile VSOCK: Introduce VM Sockets 2013-02-10 19:41:08 -05:00
nonet.c
socket.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-02-26 20:16:07 -08:00
sysctl_net.c user_ns: get rid of duplicate code in net_ctl_permissions 2012-11-18 20:32:45 -05:00