linux/net/rds
Venkat Venkatsubra 18fc25c94e rds: prevent BUG_ON triggered on congestion update to loopback
After congestion update on a local connection, when rds_ib_xmit returns
less bytes than that are there in the message, rds_send_xmit calls
back rds_ib_xmit with an offset that causes BUG_ON(off & RDS_FRAG_SIZE)
to trigger.

For a 4Kb PAGE_SIZE rds_ib_xmit returns min(8240,4096)=4096 when actually
the message contains 8240 bytes. rds_send_xmit thinks there is more to send
and calls rds_ib_xmit again with a data offset "off" of 4096-48(rds header)
=4048 bytes thus hitting the BUG_ON(off & RDS_FRAG_SIZE) [RDS_FRAG_SIZE=4k].

The commit 6094628bfd
"rds: prevent BUG_ON triggering on congestion map updates" introduced
this regression. That change was addressing the triggering of a different
BUG_ON in rds_send_xmit() on PowerPC architecture with 64Kbytes PAGE_SIZE:
 	BUG_ON(ret != 0 &&
    		 conn->c_xmit_sg == rm->data.op_nents);
This was the sequence it was going through:
(rds_ib_xmit)
/* Do not send cong updates to IB loopback */
if (conn->c_loopback
   && rm->m_inc.i_hdr.h_flags & RDS_FLAG_CONG_BITMAP) {
  	rds_cong_map_updated(conn->c_fcong, ~(u64) 0);
    	return sizeof(struct rds_header) + RDS_CONG_MAP_BYTES;
}
rds_ib_xmit returns 8240
rds_send_xmit:
  c_xmit_data_off = 0 + 8240 - 48 (rds header accounted only the first time)
   		 = 8192
  c_xmit_data_off < 65536 (sg->length), so calls rds_ib_xmit again
rds_ib_xmit returns 8240
rds_send_xmit:
  c_xmit_data_off = 8192 + 8240 = 16432, calls rds_ib_xmit again
  and so on (c_xmit_data_off 24672,32912,41152,49392,57632)
rds_ib_xmit returns 8240
On this iteration this sequence causes the BUG_ON in rds_send_xmit:
    while (ret) {
    	tmp = min_t(int, ret, sg->length - conn->c_xmit_data_off);
    	[tmp = 65536 - 57632 = 7904]
    	conn->c_xmit_data_off += tmp;
    	[c_xmit_data_off = 57632 + 7904 = 65536]
    	ret -= tmp;
    	[ret = 8240 - 7904 = 336]
    	if (conn->c_xmit_data_off == sg->length) {
    		conn->c_xmit_data_off = 0;
    		sg++;
    		conn->c_xmit_sg++;
    		BUG_ON(ret != 0 &&
    			conn->c_xmit_sg == rm->data.op_nents);
    		[c_xmit_sg = 1, rm->data.op_nents = 1]

What the current fix does:
Since the congestion update over loopback is not actually transmitted
as a message, all that rds_ib_xmit needs to do is let the caller think
the full message has been transmitted and not return partial bytes.
It will return 8240 (RDS_CONG_MAP_BYTES+48) when PAGE_SIZE is 4Kb.
And 64Kb+48 when page size is 64Kb.

Reported-by: Josh Hunt <joshhunt00@gmail.com>
Tested-by: Honggang Li <honli@redhat.com>
Acked-by: Bang Nguyen <bang.nguyen@oracle.com>
Signed-off-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-03 11:54:18 -05:00
..
af_rds.c
bind.c
cong.c
connection.c inet: convert inet_ehash_secret and ipv6_hash_secret to net_get_random_once 2013-10-19 19:45:35 -04:00
ib_cm.c
ib_rdma.c
ib_recv.c
ib_ring.c
ib_send.c rds: prevent BUG_ON triggered on congestion update to loopback 2013-12-03 11:54:18 -05:00
ib_stats.c
ib_sysctl.c net: Convert uses of typedef ctl_table to struct ctl_table 2013-06-13 02:36:09 -07:00
ib.c
ib.h
info.c
info.h
iw_cm.c
iw_rdma.c
iw_recv.c
iw_ring.c
iw_send.c
iw_stats.c
iw_sysctl.c net: Convert uses of typedef ctl_table to struct ctl_table 2013-06-13 02:36:09 -07:00
iw.c
iw.h
Kconfig
loop.c
loop.h
Makefile
message.c
page.c
rdma_transport.c
rdma_transport.h
rdma.c
rds.h net: misc: Remove extern from function prototypes 2013-10-19 19:12:11 -04:00
recv.c net: rework recvmsg handler msg_name and msg_namelen logic 2013-11-20 21:52:30 -05:00
send.c
stats.c net/rds: zero last byte for strncpy 2013-03-08 00:35:44 -05:00
sysctl.c net: Convert uses of typedef ctl_table to struct ctl_table 2013-06-13 02:36:09 -07:00
tcp_connect.c
tcp_listen.c
tcp_recv.c
tcp_send.c
tcp_stats.c
tcp.c
tcp.h
threads.c
transport.c