linux/net/sctp
Daniel Borkmann aa4a83ee8b net: sctp: fix suboptimal edge-case on non-active active/retrans path selection
In SCTP, selection of active (T.ACT) and retransmission (T.RET)
transports is being done whenever transport control operations
(UP, DOWN, PF, ...) are engaged through sctp_assoc_control_transport().

Commits 4c47af4d5e ("net: sctp: rework multihoming retransmission
path selection to rfc4960") and a7288c4dd5 ("net: sctp: improve
sctp_select_active_and_retran_path selection") have both improved
it towards a more fine-grained and optimal path selection.

Currently, the selection algorithm for T.ACT and T.RET is as follows:

1) Elect the two most recently used ACTIVE transports T1, T2 for
   T.ACT, T.RET, where T.ACT<-T1 and T1 is most recently used
2) In case primary path T.PRI not in {T1, T2} but ACTIVE, set
   T.ACT<-T.PRI and T.RET<-T1
3) If only T1 is ACTIVE from the set, set T.ACT<-T1 and T.RET<-T1
4) If none is ACTIVE, set T.ACT<-best(T.PRI, T.RET, T3) where
   T3 is the most recently used (if avail) in PF, set T.RET<-T.PRI

Prior to above commits, 4) was simply a camp on T.ACT<-T.PRI and
T.RET<-T.PRI, ignoring possible paths in PF. Camping on T.PRI is
still slightly suboptimal as it can lead to the following scenario:

Setup:
        <A>                                <B>
    T1: p1p1 (10.0.10.10) <==>  .'`)  <==> p1p1 (10.0.10.12)  <= T.PRI
    T2: p1p2 (10.0.10.20) <==> (_ . ) <==> p1p2 (10.0.10.22)

    net.sctp.rto_min = 1000
    net.sctp.path_max_retrans = 2
    net.sctp.pf_retrans = 0
    net.sctp.hb_interval = 1000

T.PRI is permanently down, T2 is put briefly into PF state (e.g. due to
link flapping). Here, the first time transmission is sent over PF path
T2 as it's the only non-INACTIVE path, but the retransmitted data-chunks
are sent over the INACTIVE path T1 (T.PRI), which is not good.

After the patch, it's choosing better transports in both cases by
modifying step 4):

4) If none is ACTIVE, set T.ACT_new<-best(T.ACT_old, T3) where T3 is
   the most recently used (if avail) in PF, set T.RET<-T.ACT_new

This will still select a best possible path in PF if available (which
can also include T.PRI/T.RET), and set both T.ACT/T.RET to it.

In case sctp_assoc_control_transport() *just* put T.ACT_old into INACTIVE
as it transitioned from ACTIVE->PF->INACTIVE and stays in INACTIVE just
for a very short while before going back ACTIVE, it will guarantee that
this path will be reselected for T.ACT/T.RET since T3 (PF) is not
available.

Previously, this was not possible, as we would only select between T.PRI
and T.RET, and a possible T3 would be NULL due to the fact that we have
just transitioned T3 in sctp_assoc_control_transport() from PF->INACTIVE
and would select a suboptimal path when T.PRI/T.RET have worse properties.

In the case that T.ACT_old permanently went to INACTIVE during this
transition and there's no PF path available, plus T.PRI and T.RET are
INACTIVE as well, we would now camp on T.ACT_old, but if everything is
being INACTIVE there's really not much we can do except hoping for a
successful HB to bring one of the transports back up again and, thus
cause a new selection through sctp_assoc_control_transport().

Now both tests work fine:

Case 1:

 1. T1 S(ACTIVE) T.ACT
    T2 S(ACTIVE) T.RET

 2. T1 S(ACTIVE) T.ACT, T.RET
    T2 S(PF)

 3. T1 S(ACTIVE) T.ACT, T.RET
    T2 S(INACTIVE)

 5. T1 S(PF) T.ACT, T.RET
    T2 S(INACTIVE)

[ 5.1 T1 S(INACTIVE) T.ACT, T.RET
      T2 S(INACTIVE) ]

 6. T1 S(ACTIVE) T.ACT, T.RET
    T2 S(INACTIVE)

 7. T1 S(ACTIVE) T.ACT
    T2 S(ACTIVE) T.RET

Case 2:

 1. T1 S(ACTIVE) T.ACT
    T2 S(ACTIVE) T.RET

 2. T1 S(PF)
    T2 S(ACTIVE) T.ACT, T.RET

 3. T1 S(INACTIVE)
    T2 S(ACTIVE) T.ACT, T.RET

 5. T1 S(INACTIVE)
    T2 S(PF) T.ACT, T.RET

[ 5.1 T1 S(INACTIVE)
      T2 S(INACTIVE) T.ACT, T.RET ]

 6. T1 S(INACTIVE)
    T2 S(ACTIVE) T.ACT, T.RET

 7. T1 S(ACTIVE) T.ACT
    T2 S(ACTIVE) T.RET

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-22 11:31:30 -07:00
..
associola.c net: sctp: fix suboptimal edge-case on non-active active/retrans path selection 2014-08-22 11:31:30 -07:00
auth.c net: sctp: cache auth_enable per endpoint 2014-04-18 18:32:00 -04:00
bind_addr.c
chunk.c sctp: fix checkpatch errors with space required or prohibited 2013-12-26 13:47:47 -05:00
debug.c
endpointola.c net: sctp: migrate most recently used transport to ktime 2014-06-11 12:23:17 -07:00
input.c net: fix the counter ICMP_MIB_INERRORS/ICMP6_MIB_INERRORS 2014-07-31 22:04:18 -07:00
inqueue.c
ipv6.c sctp: Fixup v4mapped behaviour to comply with Sock API 2014-07-31 21:49:06 -07:00
Kconfig
Makefile net: sctp: Inline the functions from command.c 2014-07-08 14:38:48 -07:00
objcnt.c sctp: fix checkpatch errors with (foo*)|foo * bar|foo* bar 2013-12-26 13:47:47 -05:00
output.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-08-05 18:46:26 -07:00
outqueue.c net: sctp: Rename SCTP_XMIT_NAGLE_DELAY to SCTP_XMIT_DELAY 2014-07-22 13:32:11 -07:00
primitive.c
probe.c
proc.c snmp: fix some left over of snmp stats 2014-05-14 15:33:47 -04:00
protocol.c sctp: Fixup v4mapped behaviour to comply with Sock API 2014-07-31 21:49:06 -07:00
sm_make_chunk.c ktime: add ktime_after and ktime_before helper 2014-06-11 12:23:17 -07:00
sm_sideeffect.c net: sctp: Don't transition to PF state when transport has exhausted 'Path.Max.Retrans'. 2014-04-27 23:41:14 -04:00
sm_statefuns.c net: sctp: remove unnecessary break after return/goto 2014-07-15 16:27:01 -07:00
sm_statetable.c sctp: fix checkpatch errors with indent 2013-12-26 13:47:48 -05:00
socket.c sctp: Fixup v4mapped behaviour to comply with Sock API 2014-07-31 21:49:06 -07:00
ssnmap.c
sysctl.c net: sctp: only warn in proc_sctp_do_alpha_beta if write 2014-07-02 18:44:07 -07:00
transport.c sctp: Fixup v4mapped behaviour to comply with Sock API 2014-07-31 21:49:06 -07:00
tsnmap.c
ulpevent.c sctp: Fixup v4mapped behaviour to comply with Sock API 2014-07-31 21:49:06 -07:00
ulpqueue.c sctp: add support for busy polling to sctp protocol 2014-04-20 18:18:55 -04:00