Merge branch 'tor-gitlab/mr/132'

David Goulet 2023-05-24 14:48:07 -04:00
commit 51e2149dd3


@@ -38,26 +38,27 @@ Status: Needs-Revision
Tor relay queues, and not with any other bottlenecks (such as
intermediate Internet routers), we can avoid this complexity merely by
specifying that any paths that are constructed SHOULD NOT share any
relays (except for the exit). This assumption is valid, because non-relay
bottlenecks are managed by TCP on the client-to-relay and relay-to-relay OR
connections, and not by Tor's circuit-level congestion control. In this way,
we can proceed to use the exact same congestion control as specified in
[PROP324], for each path.
For this reason, this proposal will focus on protocol specification, and
the traffic scheduling algorithms, rather than coupling. Note that the
scheduling algorithms are currently in flux, and will be subject to
change as we tune them in Shadow, on the live network, and for future
UDP implementation (see [PROP339]). This proposal will be kept up to
date with the current implementation.
1.2. Divergence from the initial Conflux design
The initial [CONFLUX] paper doesn't provide any indications on how to
handle the size of out-of-order cell queue, which we consider a
potential dangerous memory DoS vector (see [MEMORY_DOS]). It also used
RTT as the sole heuristic for selecting which circuit to send on (which
may vary depending on the geographical locations of the participant
relays), without considering their actual available circuit capacity
(which will be available to us via Proposal 324). Additionally, since
the publication of [CONFLUX], more modern packet scheduling algorithms
have been developed, which aim to reduce out-of-order queue size.
@@ -70,8 +71,7 @@ Status: Needs-Revision
1.3. Design Overview
The following section describes the Conflux design.
The circuit construction is as follows:
@@ -110,10 +110,10 @@ Status: Needs-Revision
different scheduling algorithms to determine this.
Initial RTT is measured during circuit linking, as described in
[CONFLUX_HANDSHAKE]. After the initial link, RTT is continually measured
using SENDME timing, as in Proposal 324. This means that during use,
the primary circuit and secondary circuit may switch roles, depending on
unrelated network congestion caused by other Tor clients.
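As a non-normative illustration, per-leg RTT tracking and primary selection
could look roughly like the following Python sketch (the class and field names
are invented for this example; the actual cell handling is specified in
[CONFLUX_HANDSHAKE] and Proposal 324):

    import time

    class Leg:
        def __init__(self, circ_id):
            self.circ_id = circ_id
            self.link_sent_at = None
            self.rtt = None  # most recent RTT estimate, in seconds

        def on_link_sent(self):
            # Start the initial RTT timer when our side of the linking
            # handshake is sent on this leg.
            self.link_sent_at = time.monotonic()

        def on_link_reply(self):
            # Initial RTT is the round trip of the linking handshake.
            self.rtt = time.monotonic() - self.link_sent_at

        def on_sendme_rtt(self, sample):
            # After linking, RTT keeps being refreshed from SENDME timing,
            # as in Proposal 324; here we simply take the newest sample.
            self.rtt = sample

    def primary_leg(legs):
        # The primary circuit is whichever leg currently has the lowest
        # RTT, so the primary/secondary roles can swap over time.
        measured = [leg for leg in legs if leg.rtt is not None]
        return min(measured, key=lambda leg: leg.rtt) if measured else None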
We also support linking onion service circuits together. In this case,
only two rendezvous circuits are linked. Each of these RP circuits will
@@ -124,11 +124,11 @@ Status: Needs-Revision
we have consulted (see [ACKNOWLEDGMENTS]), believe Tor's congestion
control from Proposal 324 to be sufficient in this rare case.
In the algorithms we recommend here, only two circuits will be linked
together at a time. However, implementations SHOULD support more than
two paths, as this has been shown to assist in traffic analysis
resistance[WTF_SPLIT], and will also be useful for maintaining a desired
target RTT, for UDP VoIP applications.
If the number of circuits exceeds the current number of guard relays,
guard relays MAY be re-used, but implementations SHOULD use the same
@@ -160,11 +160,12 @@ Status: Needs-Revision
The "max-num-circ" value indicate the maximum number of rendezvous
circuits that are allowed to be linked together.
We let the service specify the conflux algorithm to use when sending data
to the service. Some services may prefer latency, whereas some may prefer
throughput. However, clients also have the ability to request their own UX
for data that the service sends, in the LINK handshake below, in part
because the high-throughput algorithms will require more out-of-order queue
memory, which may be infeasible on mobile.
The next section describes how the circuits are linked together.
@@ -263,19 +264,34 @@ Status: Needs-Revision
These three relay commands are sent on *each* leg, to allow each endpoint to
measure the initial RTT of each leg.
The client SHOULD abandon and close the circuit if the LINKED message takes
too long to arrive. This timeout MUST be no larger than the normal SOCKS/stream
timeout in use for RELAY_BEGIN, but MAY be the Circuit Build Timeout value,
instead. (The C-Tor implementation currently uses Circuit Build Timeout).
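A minimal sketch of that timeout choice (the parameter names are illustrative,
not taken from any implementation):

    def linked_timeout(socks_stream_timeout, circuit_build_timeout):
        # The wait for LINKED MUST NOT exceed the SOCKS/stream timeout
        # used for RELAY_BEGIN; using the Circuit Build Timeout (as C-Tor
        # does) is allowed, so cap it at the stream timeout to be safe.
        return min(circuit_build_timeout, socks_stream_timeout)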
See [SIDE_CHANNELS] for rules for when to reject unexpected handshake cells.
2.2. Linking Circuits from OP to Exit [LINKING_EXIT]
To link exit circuits, two circuits to the same exit are built, with
additional restrictions such that they do not share Guard or Middle
relays. This restriction applies via specific relay identity keys,
and not IP addresses, families, or networks. (This is because the purpose
of the restriction is to avoid sharing a bottleneck *inside* relay circuit
queues; bottlenecks caused by a shared network are handled by TCP's
congestion control on the OR conns).
Bridges are also subject to the same constraint as Guard relays;
the C-Tor codebase emits a warning if only one bridge is configured, unless
that bridge has transport "snowflake". Snowflake is exempt from this
Guard restriction because it is actually backed by many bridges. In the
bridge case, we only warn, and do not refuse to build conflux circuits,
because it is not catastrophic that Bridges are shared; it is just
sub-optimal for performance and congestion.
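The relay-identity restriction could be checked roughly as follows (a sketch
only; the path and transport representations here are invented for this
example):

    def check_exit_leg_paths(path_a, path_b, first_hop_is_bridge=False,
                             bridge_transport=None):
        # Each path is a list of relay identity key fingerprints, ordered
        # first hop, middle(s), exit.  Legs must share the exit and nothing
        # else; the comparison is by identity, not IP, family, or network,
        # since shared-network bottlenecks are left to TCP on the OR conns.
        if path_a[-1] != path_b[-1]:
            return "reject: legs must use the same exit"
        shared = set(path_a[:-1]) & set(path_b[:-1])
        if not shared:
            return "ok"
        if first_hop_is_bridge:
            # A single configured bridge forces a shared first hop: warn
            # only, and not even that for snowflake, which is really many
            # bridges behind one apparent relay.
            return "ok" if bridge_transport == "snowflake" else "ok, but warn"
        return "reject: legs share a Guard or Middle relay"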
When each circuit is opened, we ensure that congestion control
has been negotiated. If congestion control negotiation has failed, the
circuit MUST be closed. After this, the linking handshake begins.
The RTT times between RELAY_CONFLUX_LINK and RELAY_CONFLUX_LINKED are
measured by the client, to determine primary vs secondary circuit use,
@@ -285,9 +301,11 @@ Status: Needs-Revision
Because of the race between initial data and the RELAY_CONFLUX_LINKED_ACK
cell, conditions can arise where an Exit needs to send data before the
slowest circuit delivers this ACK. In these cases, it should prefer the
circuit that has delivered the ACK (which will arrive immediately prior
to any data).
slowest circuit delivers this ACK. In these cases, it should prefer sending
data on the circuit that has delivered the ACK (which will arrive immediately
prior to any data from the client). This circuit will be the lower RTT
circuit anyway, but the code needs to handle the fact that in this case,
there won't yet be an RTT for the second circuit.
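A sketch of that choice at the exit (field names invented for illustration):

    def leg_for_early_exit_data(legs):
        # legs: objects with .linked_ack_received (bool) and .rtt
        # (float or None).  Until both LINKED_ACKs arrive, prefer a leg
        # whose LINKED_ACK has been delivered; in practice that is the
        # lower-RTT leg, and the other leg may not have an RTT yet.
        acked = [leg for leg in legs if leg.linked_ack_received]
        if not acked:
            return None  # no leg is ready for data yet
        return min(acked, key=lambda leg:
                   leg.rtt if leg.rtt is not None else float("inf"))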
2.3. Linking circuits to an onion service [LINKING_SERVICE]
@@ -299,7 +317,20 @@ Status: Needs-Revision
meet the client at two separate rendezvous points. These introduce
requests MUST be sent to the same intropoint (due to potential use of
onionbalance), and SHOULD be sent back-to-back on the same intro
circuit. They MAY be combined with Proposal 340. (Note that if we do not
use Prop340, we will have to raise the limit on the number of intros per
client circuit to 2 at intropoints).
When rendezvous circuits are built, they should use the same Guard,
Bridge, and Middle restrictions as specified in 2.2, for Exits. These
restrictions SHOULD also take into consideration all Middles in the path,
including the rendezvous point. All relay identities should be unique
(again, except for when the Snowflake transport is in use). The one
special case here is if the rendezvous points chosen by a client
are the same as the service's guards. In this case, the service SHOULD
NOT use different guards, but instead stick with the guards it has.
The reason for this is that we do not want to create the ability
for a client to force a service to use different guards.
The first rendezvous circuit to get joined SHOULD use Proposal 340 to
append the RELAY_BEGIN command, and the service MUST answer on this
@@ -324,9 +355,9 @@ Status: Needs-Revision
In C-Tor, conflux is only used via circuit prebuilding. Pre-built conflux
sets are preferred over other pre-built circuits, but if the pre-built pool
ends up empty, normal pre-built circuits are used. If those run out, regular
non-conflux circuits are built. In other words, in C-Tor, conflux sets are
never built on-demand, but this is strictly an implementation decision, to
simplify dealing with the C-Tor codebase.
The consensus parameter 'cfx_max_prebuilt_set' specifies the number of
sets to pre-build.
@@ -340,13 +371,13 @@ Status: Needs-Revision
When a set is launched, legs begin the handshake in the unlinked state.
As handshakes complete, finalization is attempted, to create a linked set.
On the client, this finalization happens upon receipt of the LINKED cell.
On the exit/service, this finalization happens upon *sending* the LINKED_ACK.
The initiator of this handshake considers the set fully linked once the
RELAY_CONFLUX_LINKED_ACK is sent (roughly upon receipt of the LINKED cell).
Because of the potential race between LINKED_ACK, and initial data sent by
the client, the receiver of the handshake must consider a leg linked at
the time of *sending* a LINKED_ACK cell.
This means that exit legs may not have an RTT measurement, if data on the
faster leg beats the LINKED_ACK on the slower leg. The implementation MUST
@@ -379,15 +410,15 @@ Status: Needs-Revision
legs' maximum last_seq_recv, and a lower last_seq_recv than all
current legs' last_seq_sent.
This check is performed on finalization, not on receipt of the first
handshake cell. This gives the data additional time to arrive.
2.5. Congestion Control Application [CONGESTION_CONTROL]
The SENDMEs for congestion control are performed per-leg. As soon as
data arrives, regardless of its ordering, it is counted towards SENDME
delivery. In this way, 'cwnd - inflight' of each leg always reflects
the available data to send on each leg. This is important for
[SCHEDULING].
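In other words, each leg keeps its own congestion-control state, and the
quantity the schedulers consult per leg is simply (sketch; field names are
illustrative):

    def available_to_send(leg):
        # SENDMEs are accounted per leg as data arrives (in any order), so
        # cwnd - inflight on a leg is how many more cells that leg can
        # currently carry; the schedulers in [SCHEDULING] consume this.
        return max(0, leg.cwnd - leg.inflight)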
The Congestion control Stream XON/XOFF can be sent on either leg, and
@@ -397,7 +428,19 @@ Status: Needs-Revision
of their circuit was blocked. Because conflux can send on the other
circuit, which uses a different OR conn, this form of stream blocking
has been decoupled from the OR conn status, and only happens when
congestion control has decided that all circuits are blocked (congestion
control becomes blocked when either 'cwnd - inflight <= 0', *or* when
the local OR conn is blocked, so if all local OR conns of a set are
blocked, then the stream will block that way).
Note also that because congestion control only covers RELAY_COMMAND_DATA
cells, for all algorithms, a special case must be made such that if no
circuit is available to send on due to congestion control blocking,
commands other than RELAY_COMMAND_DATA MUST be sent on the current
circuit, even if the cell scheduler believes that no circuit is available.
Depending on the code structure of Arti, this special case may or may
not be necessary. It arises in C-Tor because nothing can block the
sending of arbitrary non-DATA relay command cells.
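One way to express the blocking rule and the non-DATA exception together (a
sketch under assumed field names; the real relay command values are defined in
tor-spec):

    RELAY_COMMAND_DATA = 2  # value assumed here for illustration

    def leg_is_blocked(leg):
        # A leg cannot take more DATA when its congestion window is full
        # or its local OR connection is blocked by TCP.
        return (leg.cwnd - leg.inflight) <= 0 or leg.orconn_blocked

    def leg_for_cell(legs, relay_command, current_leg, scheduler):
        if relay_command != RELAY_COMMAND_DATA:
            # Congestion control only covers DATA cells, so other relay
            # commands go out on the current leg even if every leg looks
            # blocked to the scheduler.
            return current_leg
        unblocked = [leg for leg in legs if not leg_is_blocked(leg)]
        if not unblocked:
            return None  # stop reading from edge connections instead
        return scheduler(unblocked)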
2.6. Sequencing [SEQUENCING]
@@ -519,11 +562,12 @@ Status: Needs-Revision
RELAY_COMMAND_XON
Currently, this set is the same as the set of cells that have stream ID,
but the property that leads to this requirement is not stream usage by
itself, it is that these cells must be ordered with respect to all data
on the circuit. It is not impossible that future relay commands could be
invented that don't have stream IDs, but yet must still arrive in order
with respect to circuit data cells. Prop#253 is one possible example of
such a thing (though we won't be implementing that proposal).
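As a sketch of the receiving side, ordered cells from both legs can be
buffered by sequence number and released in order, with a hard cap to address
the [MEMORY_DOS] concern (the queue structure and limit name here are
illustrative, not the specified wire format):

    import heapq, itertools

    class OutOfOrderQueue:
        def __init__(self, max_queued_cells):
            self.next_seq = 0                  # next sequence to deliver
            self.pending = []                  # min-heap of early arrivals
            self.tiebreak = itertools.count()  # keeps heap entries comparable
            self.max_queued_cells = max_queued_cells

        def receive(self, seq, cell, deliver):
            if seq < self.next_seq:
                return  # duplicate or stale cell; drop it
            if len(self.pending) >= self.max_queued_cells:
                raise MemoryError("out-of-order queue full; tear down the set")
            heapq.heappush(self.pending, (seq, next(self.tiebreak), cell))
            # Release everything now in order, whichever leg it came from.
            while self.pending and self.pending[0][0] == self.next_seq:
                _, _, ready = heapq.heappop(self.pending)
                deliver(ready)
                self.next_seq += 1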
3. Traffic Scheduling [SCHEDULING]
@@ -532,60 +576,63 @@ Status: Needs-Revision
original conflux paper used only RTT. However, with Proposal 324, we
will have accurate information on the instantaneous available bandwidth
of each circuit leg, as 'cwnd - inflight' (see Section 3 of
Proposal 324). We also have the TCP block state of the local OR
connection.
We specify two traffic schedulers from the multipath literature and
adapt them to Tor: [MINRTT_TOR], and [LOWRTT_TOR]. Additionally,
we create low-memory variants of these that aim to minimize the
out-of-order queue size at the receiving endpoint.
Additionally, see the [TRAFFIC_ANALYSIS] sections of this proposal for
important details on how this selection can be changed, to reduce
website traffic fingerprinting.
3.1. MinRTT scheduling [MINRTT_TOR]
This scheduling algorithm is used for the MIN_LATENCY user experience.
It works by always and only sending on the circuit with the current minimum
RTT. With this algorithm, conflux should effectively stay on the circuit with
the lowest initial RTT, unless that circuit's RTT rises above the RTT of the
other circuit (due to relay load or congestion). When the circuit's congestion
window is full (ie: cwnd - inflight <= 0), or if the local OR conn blocks,
the conflux set stops transmitting and stops reading on edge connections,
rather than switch.
This should result in low out-of-order queues in most situations, unless
the initial RTTs of the two circuits are very close (basically within the
Vegas RTT bounds of queue variance, 'alpha' and 'beta').
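A compact sketch of this scheduler, reusing the illustrative leg fields from
the earlier sketches (rtt, cwnd, inflight, orconn_blocked):

    def minrtt_pick_leg(legs):
        # MIN_LATENCY UX: only ever send on the current minimum-RTT leg.
        # If that leg is out of window or its OR conn is blocked, do not
        # fail over; the set stops sending and stops reading from edges.
        best = min(legs, key=lambda leg: leg.rtt)
        if (best.cwnd - best.inflight) > 0 and not best.orconn_blocked:
            return best
        return None  # caller pauses edge reads rather than switching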
3.2. LowRTT Scheduling [LOWRTT_TOR]
This scheduling algorithm is based on [MPTCP]'s LowRTT scheduler. This
algorithm is used for the UX choice of HIGH_THROUGHPUT.
In this algorithm, endpoints send cells on the circuit with lowest RTT that
has an unblocked local OR connection, and room in its congestion window (ie:
cwnd - inflight > 0). We stop reading on edge connections only when both
congestion windows become full, or when both local OR connections are blocked.
In this way, unlike original conflux, we switch to the secondary circuit
without causing congestion either locally, or on either circuit. This
improves both load times, and overall throughput. Given a large enough
transmission, both circuits are used to their full capacity,
simultaneously.
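Sketched with the same illustrative fields, the difference from the MinRTT
sketch is only in which legs are eligible:

    def lowrtt_pick_leg(legs):
        # HIGH_THROUGHPUT UX: among legs with an unblocked local OR conn
        # and room in their congestion window, pick the lowest-RTT one.
        # Edge reads stop only when no leg qualifies, so a long transfer
        # keeps both legs busy at full capacity.
        usable = [leg for leg in legs
                  if (leg.cwnd - leg.inflight) > 0 and not leg.orconn_blocked]
        return min(usable, key=lambda leg: leg.rtt) if usable else None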
3.3. MinRTT Low-Memory Scheduling [MINRTT_LOWMEM_TOR]
The low memory version of the MinRTT scheduler ensures that we do not
perform a switch more often than once per congestion window worth of data.
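One way to read that rate limit, sketched on top of the MinRTT picker above
(the counting mechanism is an assumption of this example, and the window and
OR-conn blocking checks from [MINRTT_TOR] are omitted for brevity):

    class LowMemMinRTT:
        def __init__(self):
            self.current = None
            self.cells_on_current = 0

        def pick_leg(self, legs):
            best = min(legs, key=lambda leg: leg.rtt)
            if self.current is None:
                self.current, self.cells_on_current = best, 0
            elif (best is not self.current
                    and self.cells_on_current >= self.current.cwnd):
                # Only follow an RTT change after a full congestion window
                # of cells has gone out on the current leg.
                self.current, self.cells_on_current = best, 0
            self.cells_on_current += 1
            return self.current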
XXX: Other rate limiting, such as not switching unless the RTT changes by
more than X%, may be useful here.
3.4. BLEST Scheduling [BLEST_TOR]
XXX: Something like this might be useful for minimizing OOQ for the UX
choice of LOW_MEM_THROUGHPUT, but we might just be able to reduce switching
frequency instead.
XXX: We want an algorithm that only uses cwnd instead. This algorithm
has issues if the primary cwnd grows while the secondary does not.
@@ -701,7 +748,7 @@ Status: Needs-Revision
important to be mindful of ways that an adversary can inject new cell
commands, as well as ways that the adversary can spawn new circuits
arbitrarily.
It is also important, though slightly less so, to be mindful of the uniqueness
of new handshakes, as handshakes can be used to classify usage (such as via
Onion Service Circuit Fingerprinting). Handshake side channels are only