linux/net
Pablo Neira Ayuso dd7669a92c netfilter: conntrack: optional reliable conntrack event delivery
This patch improves ctnetlink event reliability if one broadcast
listener has set the NETLINK_BROADCAST_ERROR socket option.

The logic is the following: if an event delivery fails, we keep
the undelivered events in the missed event cache. Once the next
packet arrives, we add the new events (if any) to the missed
events in the cache and we try a new delivery, and so on. Thus,
if ctnetlink fails to deliver an event, we try to deliver them
once we see a new packet. Therefore, we may lose state
transitions but the userspace process gets in sync at some point.

At worst case, if no events were delivered to userspace, we make
sure that destroy events are successfully delivered. Basically,
if ctnetlink fails to deliver the destroy event, we remove the
conntrack entry from the hashes and we insert them in the dying
list, which contains inactive entries. Then, the conntrack timer
is added with an extra grace timeout of random32() % 15 seconds
to trigger the event again (this grace timeout is tunable via
/proc). The use of a limited random timeout value allows
distributing the "destroy" resends, thus, avoiding accumulating
lots "destroy" events at the same time. Event delivery may
re-order but we can identify them by means of the tuple plus
the conntrack ID.

The maximum number of conntrack entries (active or inactive) is
still handled by nf_conntrack_max. Thus, we may start dropping
packets at some point if we accumulate a lot of inactive conntrack
entries that did not successfully report the destroy event to
userspace.

During my stress tests consisting of setting a very small buffer
of 2048 bytes for conntrackd and the NETLINK_BROADCAST_ERROR socket
flag, and generating lots of very small connections, I noticed
very few destroy entries on the fly waiting to be resend.

A simple way to test this patch consist of creating a lot of
entries, set a very small Netlink buffer in conntrackd (+ a patch
which is not in the git tree to set the BROADCAST_ERROR flag)
and invoke `conntrack -F'.

For expectations, no changes are introduced in this patch.
Currently, event delivery is only done for new expectations (no
events from expectation expiration, removal and confirmation).
In that case, they need a per-expectation event cache to implement
the same idea that is exposed in this patch.

This patch can be useful to provide reliable flow-accouting. We
still have to add a new conntrack extension to store the creation
and destroy time.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-06-13 12:30:52 +02:00
..
9p net/9p: handle correctly interrupted 9P requests 2009-04-05 16:54:53 -05:00
802 net: remove COMPAT_NET_DEV_OPS 2009-05-25 01:53:53 -07:00
8021q 8021q: Vlan driver should use rcu_barrier() on unload instead of syncronize_net() 2009-06-10 01:11:22 -07:00
appletalk appletalk: Use frag list abstraction interfaces. 2009-06-09 00:17:44 -07:00
atm net: skb->dst accessors 2009-06-03 02:51:04 -07:00
ax25 ax25: proc uid file misses header 2009-04-20 02:14:59 -07:00
bluetooth Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/holtmann/bluetooth-next-2.6 2009-06-11 05:47:43 -07:00
bridge Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6 2009-06-11 16:00:49 +02:00
can can: af_can.c use rcu_barrier() on module unload. 2009-06-10 01:11:24 -07:00
core neigh: fix state transition INCOMPLETE->FAILED via Netlink request 2009-06-11 04:16:28 -07:00
dcb DCB: fix kfree(skb) 2009-01-04 17:29:21 -08:00
dccp net: skb->dst accessors 2009-06-03 02:51:04 -07:00
decnet net: skb->dst accessors 2009-06-03 02:51:04 -07:00
dsa net: convert unicast addr list 2009-05-29 22:12:32 -07:00
econet econet: Use SKB queue and list helpers instead of doing it by-hand. 2009-05-28 16:46:29 -07:00
ethernet net: remove COMPAT_NET_DEV_OPS 2009-05-25 01:53:53 -07:00
ieee802154 ieee802154: Use '%Zu' printf format for size_t. 2009-06-11 02:10:19 -07:00
ipv4 netfilter: ip_tables: fix build error 2009-06-12 01:53:09 +02:00
ipv6 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6 2009-06-11 16:00:49 +02:00
ipx ipx: use constant for strings and desciptor 2009-03-21 19:06:51 -07:00
irda irda: Use SKB queue and list helpers instead of doing it by-hand. 2009-05-28 23:26:33 -07:00
iucv af_iucv: Fix merge. 2009-04-23 06:37:16 -07:00
key af_key: remove some pointless conditionals before kfree_skb() 2009-02-26 23:07:32 -08:00
lapb
llc llc: Kill outdated and incorrect comment. 2009-05-28 23:31:56 -07:00
mac80211 mac80211: disable PS while probing AP 2009-06-10 13:28:41 -04:00
netfilter netfilter: conntrack: optional reliable conntrack event delivery 2009-06-13 12:30:52 +02:00
netlabel netlabel: Use genl_register_family_with_ops() 2009-05-21 16:50:24 -07:00
netlink genetlink: Introduce genl_register_family_with_ops() 2009-05-21 16:50:22 -07:00
netrom net/netrom: Fix socket locking 2009-04-22 00:49:51 -07:00
packet net: skb->dst accessors 2009-06-03 02:51:04 -07:00
phonet phonet: Use frag list abstraction interfaces. 2009-06-09 00:24:06 -07:00
rds Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2009-05-18 21:08:20 -07:00
rfkill rfkill: don't impose global states on resume (just restore the previous states) 2009-06-10 13:28:37 -04:00
rose Revert "rose: zero length frame filtering in af_rose.c" 2009-04-14 20:28:00 -07:00
rxrpc RxRPC: Error handling for rxrpc_alloc_connection() 2009-05-21 15:22:02 -07:00
sched pkt_sched: Use PSCHED_SHIFT in PSCHED time conversion 2009-06-09 05:25:29 -07:00
sctp sctp: protocol.c call rcu_barrier() on unload. 2009-06-10 01:11:25 -07:00
sunrpc sunrpc/auth_gss: Call rcu_barrier() on module unload. 2009-06-10 01:11:27 -07:00
tipc tipc: Use genl_register_family_with_ops() 2009-05-21 16:50:23 -07:00
unix New helper - current_umask() 2009-03-31 23:00:26 -04:00
wanrouter wanrouter: fix sparse warnings: context imbalance 2009-02-26 23:13:36 -08:00
wimax wimax: fix warning caused by not checking retval of rfkill_set_hw_state() 2009-06-11 11:12:48 -07:00
wireless cfg80211: fix rfkill locking problem 2009-06-10 13:28:41 -04:00
x25 af_rose/x25: Sanity check the maximum user frame size 2009-03-27 00:28:21 -07:00
xfrm xfrm: Use frag list abstraction interfaces. 2009-06-09 00:24:07 -07:00
compat.c net: socket infrastructure for SO_TIMESTAMPING 2009-02-15 22:43:35 -08:00
Kconfig net: add IEEE 802.15.4 socket family implementation 2009-06-09 05:25:32 -07:00
Makefile net: add IEEE 802.15.4 socket family implementation 2009-06-09 05:25:32 -07:00
nonet.c
socket.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 2009-04-06 18:05:43 -07:00
sysctl_net.c net: sysctl_net - use net_eq to compare nets 2009-03-16 16:23:30 +01:00
TUNABLE