linux/net/rds
Venkat Venkatsubra 18fc25c94e rds: prevent BUG_ON triggered on congestion update to loopback
After congestion update on a local connection, when rds_ib_xmit returns
less bytes than that are there in the message, rds_send_xmit calls
back rds_ib_xmit with an offset that causes BUG_ON(off & RDS_FRAG_SIZE)
to trigger.

For a 4Kb PAGE_SIZE rds_ib_xmit returns min(8240,4096)=4096 when actually
the message contains 8240 bytes. rds_send_xmit thinks there is more to send
and calls rds_ib_xmit again with a data offset "off" of 4096-48(rds header)
=4048 bytes thus hitting the BUG_ON(off & RDS_FRAG_SIZE) [RDS_FRAG_SIZE=4k].

The commit 6094628bfd
"rds: prevent BUG_ON triggering on congestion map updates" introduced
this regression. That change was addressing the triggering of a different
BUG_ON in rds_send_xmit() on PowerPC architecture with 64Kbytes PAGE_SIZE:
 	BUG_ON(ret != 0 &&
    		 conn->c_xmit_sg == rm->data.op_nents);
This was the sequence it was going through:
(rds_ib_xmit)
/* Do not send cong updates to IB loopback */
if (conn->c_loopback
   && rm->m_inc.i_hdr.h_flags & RDS_FLAG_CONG_BITMAP) {
  	rds_cong_map_updated(conn->c_fcong, ~(u64) 0);
    	return sizeof(struct rds_header) + RDS_CONG_MAP_BYTES;
}
rds_ib_xmit returns 8240
rds_send_xmit:
  c_xmit_data_off = 0 + 8240 - 48 (rds header accounted only the first time)
   		 = 8192
  c_xmit_data_off < 65536 (sg->length), so calls rds_ib_xmit again
rds_ib_xmit returns 8240
rds_send_xmit:
  c_xmit_data_off = 8192 + 8240 = 16432, calls rds_ib_xmit again
  and so on (c_xmit_data_off 24672,32912,41152,49392,57632)
rds_ib_xmit returns 8240
On this iteration this sequence causes the BUG_ON in rds_send_xmit:
    while (ret) {
    	tmp = min_t(int, ret, sg->length - conn->c_xmit_data_off);
    	[tmp = 65536 - 57632 = 7904]
    	conn->c_xmit_data_off += tmp;
    	[c_xmit_data_off = 57632 + 7904 = 65536]
    	ret -= tmp;
    	[ret = 8240 - 7904 = 336]
    	if (conn->c_xmit_data_off == sg->length) {
    		conn->c_xmit_data_off = 0;
    		sg++;
    		conn->c_xmit_sg++;
    		BUG_ON(ret != 0 &&
    			conn->c_xmit_sg == rm->data.op_nents);
    		[c_xmit_sg = 1, rm->data.op_nents = 1]

What the current fix does:
Since the congestion update over loopback is not actually transmitted
as a message, all that rds_ib_xmit needs to do is let the caller think
the full message has been transmitted and not return partial bytes.
It will return 8240 (RDS_CONG_MAP_BYTES+48) when PAGE_SIZE is 4Kb.
And 64Kb+48 when page size is 64Kb.

Reported-by: Josh Hunt <joshhunt00@gmail.com>
Tested-by: Honggang Li <honli@redhat.com>
Acked-by: Bang Nguyen <bang.nguyen@oracle.com>
Signed-off-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-03 11:54:18 -05:00
..
af_rds.c rds: Make rds_sock_lock BH rather than IRQ safe. 2012-01-24 17:03:44 -05:00
bind.c hlist: drop the node parameter from iterators 2013-02-27 19:10:24 -08:00
cong.c
connection.c inet: convert inet_ehash_secret and ipv6_hash_secret to net_get_random_once 2013-10-19 19:45:35 -04:00
ib_cm.c IB/rds: suppress incompatible protocol when version is known 2012-12-26 15:17:37 -08:00
ib_rdma.c
ib_recv.c IB/rds: Correct ib_api use with gs_dma_address/sg_dma_len 2012-12-26 15:17:37 -08:00
ib_ring.c
ib_send.c rds: prevent BUG_ON triggered on congestion update to loopback 2013-12-03 11:54:18 -05:00
ib_stats.c
ib_sysctl.c net: Convert uses of typedef ctl_table to struct ctl_table 2013-06-13 02:36:09 -07:00
ib.c
ib.h net: rds: use this_cpu_* per-cpu helper 2012-11-19 18:59:44 -05:00
info.c rds: remove the second argument of k[un]map_atomic() 2012-03-20 21:48:28 +08:00
info.h
iw_cm.c RDS: use gfp flags from caller in conn_alloc() 2012-03-22 19:29:58 -04:00
iw_rdma.c RDS: Remove some unused iWARP code 2012-01-12 20:05:28 -08:00
iw_recv.c Merge branch 'kmap_atomic' of git://github.com/congwang/linux 2012-03-21 09:40:26 -07:00
iw_ring.c
iw_send.c
iw_stats.c
iw_sysctl.c net: Convert uses of typedef ctl_table to struct ctl_table 2013-06-13 02:36:09 -07:00
iw.c
iw.h
Kconfig net/rds: remove depends on CONFIG_EXPERIMENTAL 2013-01-11 11:40:02 -08:00
loop.c RDS: use gfp flags from caller in conn_alloc() 2012-03-22 19:29:58 -04:00
loop.h
Makefile
message.c rds: simplify a warning message 2013-03-04 14:12:07 -05:00
page.c net: Fix (nearly-)kernel-doc comments for various functions 2012-07-10 23:13:45 -07:00
rdma_transport.c
rdma_transport.h
rdma.c
rds.h net: misc: Remove extern from function prototypes 2013-10-19 19:12:11 -04:00
recv.c net: rework recvmsg handler msg_name and msg_namelen logic 2013-11-20 21:52:30 -05:00
send.c RDS: fix rds-ping spinlock recursion 2012-10-09 13:57:23 -04:00
stats.c net/rds: zero last byte for strncpy 2013-03-08 00:35:44 -05:00
sysctl.c net: Convert uses of typedef ctl_table to struct ctl_table 2013-06-13 02:36:09 -07:00
tcp_connect.c rds: Don't disable BH on BH context 2012-08-22 22:52:04 -07:00
tcp_listen.c rds: Don't disable BH on BH context 2012-08-22 22:52:04 -07:00
tcp_recv.c rds: Don't disable BH on BH context 2012-08-22 22:52:04 -07:00
tcp_send.c rds: Don't disable BH on BH context 2012-08-22 22:52:04 -07:00
tcp_stats.c
tcp.c
tcp.h
threads.c
transport.c