mirror of https://github.com/torproject/torspec.git

Merge branch 'tor-gitlab/mr/132'
commit 51e2149dd3

@@ -38,26 +38,27 @@ Status: Needs-Revision

Tor relay queues, and not with any other bottlenecks (such as
intermediate Internet routers), we can avoid this complexity merely by
specifying that any paths that are constructed SHOULD NOT share any
relays (except for the exit). This assumption is valid, because non-relay
bottlenecks are managed by TCP of client-to-relay and relay-to-relay OR
connections, and not Tor's circuit-level congestion control. In this way,
we can proceed to use the exact same congestion control as specified in
[PROP324], for each path.

For this reason, this proposal will focus on protocol specification, and
the traffic scheduling algorithms, rather than coupling. Note that the
scheduling algorithms are currently in flux, and will be subject to
change as we tune them in Shadow, on the live network, and for future
UDP implementation (see [PROP339]). This proposal will be kept up to
date with the current implementation.

1.2. Divergence from the initial Conflux design

The initial [CONFLUX] paper doesn't provide any indication of how to
handle the size of the out-of-order cell queue, which we consider a
potentially dangerous memory DoS vector (see [MEMORY_DOS]). It also used
RTT as the sole heuristic for selecting which circuit to send on (which
may vary depending on the geographical locations of the participant
relays), without considering their actual available circuit capacity
(which will be available to us via Proposal 324). Additionally, since
the publication of [CONFLUX], more modern packet scheduling algorithms
have been developed, which aim to reduce out-of-order queue size.

@@ -70,8 +71,7 @@ Status: Needs-Revision

1.3. Design Overview

The following section describes the Conflux design.

The circuit construction is as follows:

@@ -110,10 +110,10 @@ Status: Needs-Revision

different scheduling algorithms to determine this.

Initial RTT is measured during circuit linking, as described in
[CONFLUX_HANDSHAKE]. After the initial link, RTT is continually measured
using SENDME timing, as in Proposal 324. This means that during use,
the primary circuit and secondary circuit may switch roles, depending on
unrelated network congestion caused by other Tor clients.
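
To illustrate, a minimal Python sketch of this per-leg RTT bookkeeping
(the Leg class, timer placement, and helper names are our own
illustration, not C-Tor's actual data structures):

  import time

  class Leg:
      def __init__(self, name):
          self.name = name
          self.rtt = None         # latest SENDME-based RTT estimate (secs)
          self._sent_at = None    # send time of the SENDME-triggering cell

      def on_sendme_trigger_sent(self):
          # Called when sending the cell that will elicit a SENDME.
          self._sent_at = time.monotonic()

      def on_sendme_received(self):
          # RTT is the elapsed time until that SENDME comes back.
          if self._sent_at is not None:
              self.rtt = time.monotonic() - self._sent_at
              self._sent_at = None

  def primary_leg(legs):
      # The lowest-RTT leg is primary; since RTT keeps being re-measured,
      # primary and secondary can switch roles during use.
      measured = [leg for leg in legs if leg.rtt is not None]
      return min(measured, key=lambda leg: leg.rtt) if measured else legs[0]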

We also support linking onion service circuits together. In this case,
only two rendezvous circuits are linked. Each of these RP circuits will

@@ -124,11 +124,11 @@ Status: Needs-Revision

we have consulted (see [ACKNOWLEDGMENTS]), believe Tor's congestion
control from Proposal 324 to be sufficient in this rare case.

In the algorithms we recommend here, only two circuits will be linked
together at a time. However, implementations SHOULD support more than
two paths, as this has been shown to assist in traffic analysis
resistance [WTF_SPLIT], and will also be useful for maintaining a desired
target RTT, for UDP VoIP applications.

If the number of circuits exceeds the current number of guard relays,
guard relays MAY be re-used, but implementations SHOULD use the same

@@ -160,11 +160,12 @@ Status: Needs-Revision

The "max-num-circ" value indicates the maximum number of rendezvous
circuits that are allowed to be linked together.

We let the service specify the conflux algorithm to use, when sending data
to the service. Some services may prefer latency, whereas some may prefer
throughput. However, clients also have the ability to request their own UX
for data that the service sends, in the LINK handshake below, in part
because the high-throughput algorithms will require more out-of-order queue
memory, which may be infeasible on mobile.

The next section describes how the circuits are linked together.

@@ -263,19 +264,34 @@ Status: Needs-Revision

These three relay commands are sent on *each* leg, to allow each endpoint to
measure the initial RTT of each leg.

The client SHOULD abandon and close the circuit if the LINKED message takes
too long to arrive. This timeout MUST be no larger than the normal
SOCKS/stream timeout in use for RELAY_BEGIN, but MAY be the Circuit Build
Timeout value, instead. (The C-Tor implementation currently uses Circuit
Build Timeout.)

See [SIDE_CHANNELS] for rules for when to reject unexpected handshake cells.

2.2. Linking Circuits from OP to Exit [LINKING_EXIT]

To link exit circuits, two circuits to the same exit are built, with
additional restrictions such that they do not share Guard or Middle
relays. This restriction applies via specific relay identity keys,
and not IP addresses, families, or networks. (This is because the purpose
of it is to avoid sharing a bottleneck *inside* relay circuit queues;
bottlenecks caused by a shared network are handled by TCP's congestion
control on the OR conns.)

Bridges are also subject to the same constraint as Guard relays;
the C-Tor codebase emits a warning if only one bridge is configured, unless
that bridge has transport "snowflake". Snowflake is exempt from this
Guard restriction because it is actually backed by many bridges. In the
bridge case, we only warn, and do not refuse to build conflux circuits,
because it is not catastrophic that Bridges are shared, it is just
sub-optimal for performance and congestion.

When each circuit is opened, we ensure that congestion control
has been negotiated. If congestion control negotiation has failed, the
circuit MUST be closed. After this, the linking handshake begins.
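
As a sketch, the restriction above amounts to a disjointness check over
relay identity keys (hypothetical helper; paths are given as per-hop
identity keys, in client-to-exit order):

  def legs_may_be_linked(path_a, path_b):
      # Only the last hop (the exit) may be shared. Everything else must
      # differ, compared by relay identity key -- not by IP address,
      # family, or network -- since the goal is to avoid sharing a
      # bottleneck inside relay circuit queues.
      if path_a[-1] != path_b[-1]:
          return False    # legs must terminate at the same exit
      return not set(path_a[:-1]) & set(path_b[:-1])

  # legs_may_be_linked(["G1", "M1", "E"], ["G2", "M2", "E"])  -> True
  # legs_may_be_linked(["G1", "M1", "E"], ["G1", "M2", "E"])  -> False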

The RTT times between RELAY_CONFLUX_LINK and RELAY_CONFLUX_LINKED are
measured by the client, to determine primary vs secondary circuit use,

@@ -285,9 +301,11 @@ Status: Needs-Revision

Because of the race between initial data and the RELAY_CONFLUX_LINKED_ACK
cell, conditions can arise where an Exit needs to send data before the
slowest circuit delivers this ACK. In these cases, it should prefer sending
data on the circuit that has delivered the ACK (which will arrive immediately
prior to any data from the client). This circuit will be the lower RTT
circuit anyway, but the code needs to handle the fact that in this case,
there won't yet be an RTT for the second circuit.
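
A possible way to express this preference (a sketch; the leg fields are
hypothetical, with rtt left unset until it has been measured):

  def exit_send_leg(legs):
      # Prefer legs whose LINKED_ACK has already arrived; among those,
      # the lowest-RTT one. Legs with no RTT yet sort last, which covers
      # the case where the second leg's RTT simply doesn't exist yet.
      acked = [leg for leg in legs if leg.linked_ack_received]
      candidates = acked or legs
      return min(candidates,
                 key=lambda leg: leg.rtt if leg.rtt is not None
                                 else float("inf"))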

2.3. Linking circuits to an onion service [LINKING_SERVICE]

@@ -299,7 +317,20 @@ Status: Needs-Revision

meet the client at two separate rendezvous points. These introduce
requests MUST be sent to the same intropoint (due to potential use of
onionbalance), and SHOULD be sent back-to-back on the same intro
circuit. They MAY be combined with Proposal 340. (Note that if we do not
use Prop340, we will have to raise the limit on number of intros per
client circuit to 2, here, at intropoints.)

When rendezvous circuits are built, they should use the same Guard,
Bridge, and Middle restrictions as specified in 2.2, for Exits. These
restrictions SHOULD also take into consideration all Middles in the path,
including the rendezvous point. All relay identities should be unique
(again, except for when the Snowflake transport is in use). The one
special case here is if the rendezvous points chosen by a client
are the same as the service's guards. In this case, the service SHOULD
NOT use different guards, but instead stick with the guards it has.
The reason for this is that we do not want to create the ability
for a client to force a service to use different guards.

The first rendezvous circuit to get joined SHOULD use Proposal 340 to
append the RELAY_BEGIN command, and the service MUST answer on this

@@ -324,9 +355,9 @@ Status: Needs-Revision

In C-Tor, conflux is only used via circuit prebuilding. Pre-built conflux
sets are preferred over other pre-built circuits, but if the pre-built pool
ends up empty, normal pre-built circuits are used. If those run out, regular
non-conflux circuits are built. In other words, in C-Tor, conflux sets are
never built on-demand, but this is strictly an implementation decision, to
simplify dealing with the C-Tor codebase.

The consensus parameter 'cfx_max_prebuilt_set' specifies the number of
sets to pre-build.
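
In outline, the C-Tor preference order described above looks like the
following sketch (pool names are hypothetical, not C-Tor's actual
functions):

  def pick_circuit(conflux_sets, prebuilt_circuits, build_on_demand):
      # Pre-built conflux sets first, then normal pre-built circuits,
      # then regular circuits built on demand. Conflux sets themselves
      # are never built on demand.
      if conflux_sets:
          return conflux_sets.pop()
      if prebuilt_circuits:
          return prebuilt_circuits.pop()
      return build_on_demand()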

@@ -340,13 +371,13 @@ Status: Needs-Revision

When a set is launched, legs begin the handshake in the unlinked state.
As handshakes complete, finalization is attempted, to create a linked set.
On the client, this finalization happens upon receipt of the LINKED cell.
On the exit/service, this finalization happens upon *sending* the LINKED_ACK.

The initiator of this handshake considers the set fully linked once the
RELAY_CONFLUX_LINKED_ACK is sent (roughly upon receipt of the LINKED cell).
Because of the potential race between LINKED_ACK, and initial data sent by
the client, the receiver of the handshake must consider a leg linked at
the time of *sending* a LINKED_ACK cell.

This means that exit legs may not have an RTT measurement, if data on the
faster leg beats the LINKED_ACK on the slower leg. The implementation MUST

@@ -379,15 +410,15 @@ Status: Needs-Revision

legs' maximum last_seq_recv, and a lower last_seq_recv than all
current legs' last_seq_sent.

This check is performed on finalization, not on receipt of the first
handshake cell. This gives the data additional time to arrive.

2.5. Congestion Control Application [CONGESTION_CONTROL]

The SENDMEs for congestion control are performed per-leg. As soon as
data arrives, regardless of its ordering, it is counted towards SENDME
delivery. In this way, 'cwnd - inflight' of each leg always reflects
the available data to send on each leg. This is important for
[SCHEDULING].

The congestion control stream XON/XOFF can be sent on either leg, and

@@ -397,7 +428,19 @@ Status: Needs-Revision

of their circuit was blocked. Because conflux can send on the other
circuit, which uses a different OR conn, this form of stream blocking
has been decoupled from the OR conn status, and only happens when
congestion control has decided that all circuits are blocked (congestion
control becomes blocked when either 'cwnd - inflight <= 0', *or* when
the local OR conn is blocked, so if all local OR conns of a set are
blocked, then the stream will block that way).

Note also that because congestion control only covers RELAY_COMMAND_DATA
cells, for all algorithms, a special case must be made such that if no
circuit is available to send on due to congestion control blocking,
commands other than RELAY_COMMAND_DATA MUST be sent on the current
circuit, even if the cell scheduler believes that no circuit is available.
Depending on the code structure of Arti, this special case may or may
not be necessary. It arises in C-Tor because nothing can block the
sending of arbitrary non-DATA relay command cells.
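
A sketch of this send-side special case (the surrounding structure is
hypothetical; RELAY_COMMAND_DATA's value is from tor-spec.txt):

  RELAY_COMMAND_DATA = 2   # relay command value, per tor-spec.txt

  def leg_blocked(leg):
      # A leg counts as blocked when its congestion window is full or
      # its local OR conn is blocked.
      return leg.cwnd - leg.inflight <= 0 or leg.orconn_blocked

  def send_cell(conflux_set, cell):
      sendable = [leg for leg in conflux_set.legs if not leg_blocked(leg)]
      if sendable:
          conflux_set.scheduler.choose(sendable).send(cell)
      elif cell.command != RELAY_COMMAND_DATA:
          # Congestion control only covers DATA cells, so everything
          # else MUST still go out on the current circuit.
          conflux_set.current_leg.send(cell)
      # else: DATA waits, and edge connections stop being read.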

2.6. Sequencing [SEQUENCING]

@@ -519,11 +562,12 @@ Status: Needs-Revision

  RELAY_COMMAND_XON

Currently, this set is the same as the set of cells that have a stream ID,
but the property that leads to this requirement is not stream usage by
itself, it is that these cells must be ordered with respect to all data
on the circuit. It is not impossible that future relay commands could be
invented that don't have stream IDs, but yet must still arrive in order
with respect to circuit data cells. Prop#253 is one possible example of
such a thing (though we won't be implementing that proposal).
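
For intuition, receiver-side reordering of these multiplexed cells can be
sketched as below. This assumes an absolute sequence number has already
been recovered for each such cell (the real wire encoding uses relative
sequencing); unbounded growth of this queue is exactly the [MEMORY_DOS]
concern:

  class OutOfOrderQueue:
      def __init__(self):
          self.next_seq = 1
          self.pending = {}        # absolute seq -> cell

      def receive(self, seq, cell):
          # Buffer until the next expected cell arrives, then drain as
          # long a run of in-order cells as we have.
          self.pending[seq] = cell
          delivered = []
          while self.next_seq in self.pending:
              delivered.append(self.pending.pop(self.next_seq))
              self.next_seq += 1
          return delivered         # in-order cells, ready for streams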

3. Traffic Scheduling [SCHEDULING]

@@ -532,60 +576,63 @@ Status: Needs-Revision

original conflux paper used only RTT. However, with Proposal 324, we
will have accurate information on the instantaneous available bandwidth
of each circuit leg, as 'cwnd - inflight' (see Section 3 of
Proposal 324). We also have the TCP block state of the local OR
connection.

We specify two traffic schedulers from the multipath literature and
adapt them to Tor: [MINRTT_TOR], and [LOWRTT_TOR]. Additionally,
we create low-memory variants of these that aim to minimize the
out-of-order queue size at the receiving endpoint.

Additionally, see the [TRAFFIC_ANALYSIS] sections of this proposal for
important details on how this selection can be changed, to reduce
website traffic fingerprinting.

3.1. MinRTT Scheduling [MINRTT_TOR]

This scheduling algorithm is used for the MIN_LATENCY user experience.

It works by always and only sending on the circuit with the current minimum
RTT. With this algorithm, conflux should effectively stay on the circuit with
the lowest initial RTT, unless that circuit's RTT rises above the RTT of the
other circuit (due to relay load or congestion). When the circuit's congestion
window is full (ie: cwnd - inflight <= 0), or if the local OR conn blocks,
the conflux set stops transmitting and stops reading on edge connections,
rather than switching.

This should result in low out-of-order queues in most situations, unless
the initial RTTs of the two circuits are very close (basically within the
Vegas RTT bounds of queue variance, 'alpha' and 'beta').
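
Schematically (a sketch; leg fields as in the earlier sketches, with RTT
known from the link handshake):

  def minrtt_choose(legs):
      # Always and only the current minimum-RTT leg. No failover: if it
      # is blocked, the whole set stops transmitting and stops reading
      # on edge connections rather than switching.
      best = min(legs, key=lambda leg: leg.rtt)
      if best.cwnd - best.inflight > 0 and not best.orconn_blocked:
          return best
      return None    # blocked: caller stops sending and reading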

3.2. LowRTT Scheduling [LOWRTT_TOR]

This scheduling algorithm is based on [MPTCP]'s LowRTT scheduler. This
algorithm is used for the UX choice of HIGH_THROUGHPUT.

In this algorithm, endpoints send cells on the circuit with lowest RTT that
has an unblocked local OR connection, and room in its congestion window (ie:
cwnd - inflight > 0). We stop reading on edge connections only when both
congestion windows become full, or when both local OR connections are blocked.

In this way, unlike original conflux, we switch to the secondary circuit
without causing congestion either locally, or on either circuit. This
improves both load times, and overall throughput. Given a large enough
transmission, both circuits are used to their full capacity,
simultaneously.
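
Schematically (same hypothetical leg fields):

  def lowrtt_choose(legs):
      # Lowest-RTT leg among those that are actually usable. Unlike
      # MinRTT, a blocked best leg means we fail over to the other leg,
      # which is what fills both circuits on large transmissions.
      usable = [leg for leg in legs
                if not leg.orconn_blocked and leg.cwnd - leg.inflight > 0]
      if usable:
          return min(usable, key=lambda leg: leg.rtt)
      return None    # both blocked: stop reading edge connections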

3.3. MinRTT Low-Memory Scheduling [MINRTT_LOWMEM_TOR]

The low memory version of the MinRTT scheduler ensures that we do not
perform a switch more often than once per congestion window worth of data.

XXX: Other rate limiting, such as not switching unless the RTT changes by
more than X%, may be useful here.
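
One way to realize that rate limit (a sketch; the sent_since_switch
counter and the state object are our own bookkeeping, not specified
fields):

  def minrtt_lowmem_choose(state, legs):
      # As MinRTT, but only allow a switch after at least one full
      # congestion window of data has been sent on the current leg.
      # state.current is assumed to be initialized at link time.
      best = min(legs, key=lambda leg: leg.rtt)
      if (best is not state.current
              and state.sent_since_switch >= state.current.cwnd):
          state.current = best
          state.sent_since_switch = 0
      return state.current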

3.4. BLEST Scheduling [BLEST_TOR]

XXX: Something like this might be useful for minimizing OOQ for the UX
choice of LOW_MEM_THROUGHPUT, but we might just be able to reduce switching
frequency instead.

XXX: We want an algorithm that only uses cwnd instead. This algorithm
has issues if the primary cwnd grows while the secondary does not.

@@ -701,7 +748,7 @@ Status: Needs-Revision

important to be mindful of ways that an adversary can inject new cell
commands, as well as ways that the adversary can spawn new circuits
arbitrarily.

It is also important, though slightly less so, to be mindful of the uniqueness
of new handshakes, as handshakes can be used to classify usage (such as via
Onion Service Circuit Fingerprinting). Handshake side channels are only