linux/net
Andrey Vagin dbde497966 tcp: don't update snd_nxt, when a socket is switched from repair mode
snd_nxt must be updated synchronously with sk_send_head. Otherwise
tp->packets_out may be updated incorrectly, which may cause a kernel panic.

Here is a kernel panic from my host.
[  103.043194] BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
[  103.044025] IP: [<ffffffff815aaaaf>] tcp_rearm_rto+0xcf/0x150
...
[  146.301158] Call Trace:
[  146.301158]  [<ffffffff815ab7f0>] tcp_ack+0xcc0/0x12c0

Before this panic a TCP socket was restored. This socket had sent and
unsent data in the write queue. Sent data was restored in repair mode,
then the socket was switched out of repair mode and unsent data was
restored. After that the socket was switched back into repair mode.

At that moment the socket's write queue looked like this:
snd_una    snd_nxt   write_seq
   |_________|________|
             |
	  sk_send_head

After the second switch out of repair mode, the state of the socket
changed to:

snd_una          snd_nxt, write_seq
   |_________ ________|
             |
	  sk_send_head

This state is inconsistent, because snd_nxt and sk_send_head are not
synchronized.
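
The relationship between the two diagrams can be written as a small check.
This is a userspace sketch, not kernel code: struct toy_queue_state and
toy_state_consistent() are hypothetical names, and the rule "snd_nxt equals
the start of the skb at sk_send_head, or write_seq when nothing is unsent"
is an assumption drawn from the description above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of the fields shown in the diagrams. */
struct toy_queue_state {
	uint32_t snd_una;
	uint32_t snd_nxt;
	uint32_t write_seq;
	bool has_send_head;	/* false when every queued byte was sent */
	uint32_t send_head_seq;	/* start seq of the skb at sk_send_head */
};

/* True when snd_nxt points at the first unsent byte. */
static bool toy_state_consistent(const struct toy_queue_state *s)
{
	uint32_t first_unsent = s->has_send_head ? s->send_head_seq
						 : s->write_seq;

	return s->snd_nxt == first_unsent;
}
```

With illustrative numbers, the first diagram (snd_nxt at sk_send_head)
passes this check and the second (snd_nxt moved to write_seq) fails it.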

Below is a call trace showing how packets_out can be incremented
twice for one skb if snd_nxt and sk_send_head are not synchronized.
In this case packets_out stays positive even when
sk_write_queue is empty.

tcp_write_wakeup
	skb = tcp_send_head(sk);
	tcp_fragment
		if (!before(tp->snd_nxt, TCP_SKB_CB(buff)->end_seq))
			tcp_adjust_pcount(sk, skb, diff);
	tcp_event_new_data_sent
		tp->packets_out += tcp_skb_pcount(skb);
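
The double count in the trace above can be reproduced with a toy userspace
model. This is a sketch under stated assumptions, not the kernel code:
struct toy_sock and the toy_* functions are hypothetical stand-ins that keep
only the accounting steps quoted in the trace, and the sequence numbers are
made up for illustration.

```c
#include <stdint.h>

/* Wrap-safe sequence comparison, same idea as the kernel's before(). */
static int before(uint32_t seq1, uint32_t seq2)
{
	return (int32_t)(seq1 - seq2) < 0;
}

/* Hypothetical, stripped-down stand-in for struct tcp_sock. */
struct toy_sock {
	uint32_t snd_nxt;
	int packets_out;
};

/*
 * Models only the accounting step quoted from tcp_fragment: when snd_nxt
 * already covers the skb being split, the skb is assumed to be in flight
 * and packets_out is corrected by the change in segment count.
 */
static void toy_fragment(struct toy_sock *tp, uint32_t buff_end_seq,
			 int old_pcount, int skb_pcount, int buff_pcount)
{
	if (!before(tp->snd_nxt, buff_end_seq)) {
		int diff = old_pcount - skb_pcount - buff_pcount;

		tp->packets_out -= diff;	/* diff is -1 for a 1 -> 2 split */
	}
}

/* Models tcp_event_new_data_sent: the skb is now counted as in flight. */
static void toy_new_data_sent(struct toy_sock *tp, uint32_t skb_end_seq,
			      int skb_pcount)
{
	tp->snd_nxt = skb_end_seq;
	tp->packets_out += skb_pcount;
}

/*
 * One unsent skb covering [1000, 3000) sits at the send head. A window
 * probe splits it at 2000 and sends the first half. Returns packets_out.
 */
static int toy_probe(uint32_t snd_nxt)
{
	struct toy_sock tp = { .snd_nxt = snd_nxt, .packets_out = 0 };

	toy_fragment(&tp, 3000, 1, 1, 1);
	toy_new_data_sent(&tp, 2000, 1);
	return tp.packets_out;
}
```

In this model toy_probe(1000), where snd_nxt matches the send head, returns
1, while toy_probe(3000), where snd_nxt was moved to write_seq, returns 2
for a single transmitted skb.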

I think updating snd_nxt isn't required when a socket is switched out of
repair mode, because it is initialized in tcp_connect_init. Then, when the
write queue is restored, snd_nxt is advanced in tcp_event_new_data_sent,
so it is always in a consistent state.

I have checked that the bug does not reproduce with this patch and that
all tests for restoring TCP connections pass.

Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: James Morris <jmorris@namei.org>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-19 16:14:20 -05:00
..
9p file->f_op is never NULL... 2013-10-24 23:34:54 -04:00
802
8021q vlan: Implement vlan_dev_get_egress_qos_mask as an inline. 2013-11-11 00:42:07 -05:00
appletalk
atm
ax25
batman-adv batman-adv: generalize batman-adv icmp packet handling 2013-10-23 17:03:47 +02:00
bluetooth Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem 2013-11-08 09:03:10 -05:00
bridge bridge: Fix memory leak when deleting bridge with vlan filtering enabled 2013-11-14 16:16:34 -05:00
caif caif: use pskb_put() instead of reimplementing its functionality 2013-11-07 19:28:59 -05:00
can
ceph
core macvlan: disable LRO on lower device instead of macvlan 2013-11-15 17:55:48 -05:00
dcb
dccp ipv4: introduce new IP_MTU_DISCOVER mode IP_PMTUDISC_INTERFACE 2013-11-05 21:52:27 -05:00
decnet
dns_resolver
dsa
ethernet
hsr net/hsr: Fix possible leak in 'hsr_get_node_status()' 2013-11-14 17:26:21 -05:00
ieee802154 inet: prevent leakage of uninitialized memory to user in recv syscalls 2013-11-18 15:12:03 -05:00
ipv4 tcp: don't update snd_nxt, when a socket is switched from repair mode 2013-11-19 16:14:20 -05:00
ipv6 ipv6: Fix inet6_init() cleanup order 2013-11-18 15:38:46 -05:00
ipx
irda genetlink: make all genl_ops users const 2013-11-14 17:10:41 -05:00
iucv
key
l2tp inet: prevent leakage of uninitialized memory to user in recv syscalls 2013-11-18 15:12:03 -05:00
lapb
llc
mac80211 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem 2013-11-08 09:03:10 -05:00
mac802154 6lowpan: set and use mac_len for mac header length 2013-10-30 17:18:46 -04:00
mpls
netfilter genetlink: make all genl_ops users const 2013-11-14 17:10:41 -05:00
netlabel genetlink: make all genl_ops users const 2013-11-14 17:10:41 -05:00
netlink netlink: fix documentation typo in netlink_set_err() 2013-11-19 15:07:01 -05:00
netrom
nfc genetlink: make all genl_ops users const 2013-11-14 17:10:41 -05:00
openvswitch genetlink: make all genl_ops users const 2013-11-14 17:10:41 -05:00
packet
phonet inet: prevent leakage of uninitialized memory to user in recv syscalls 2013-11-18 15:12:03 -05:00
rds
rfkill net: rfkill: gpio: add ACPI support 2013-10-28 15:05:25 +01:00
rose
rxrpc
sched pkt_sched: fq: fix pacing for small frames 2013-11-15 21:01:52 -05:00
sctp net: sctp: bug-fixing: retran_path not set properly after transports recovering (v3) 2013-11-14 16:35:09 -05:00
sunrpc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2013-11-13 17:40:34 +09:00
tipc tipc: fix dereference before check warning 2013-11-15 03:11:06 -05:00
unix
vmw_vsock
wimax genetlink: make all genl_ops users const 2013-11-14 17:10:41 -05:00
wireless genetlink: make all genl_ops users const 2013-11-14 17:10:41 -05:00
x25 net: x25: Fix dead URLs in Kconfig 2013-10-29 17:35:17 -04:00
xfrm net: move pskb_put() to core code 2013-11-07 19:28:58 -05:00
compat.c
Kconfig net/hsr: Add support for the High-availability Seamless Redundancy protocol (HSRv0) 2013-11-03 23:20:14 -05:00
Makefile net/hsr: Add support for the High-availability Seamless Redundancy protocol (HSRv0) 2013-11-03 23:20:14 -05:00
nonet.c
socket.c
sysctl_net.c