Merge branch 'netflow_clarification'

Nick Mathewson 2022-05-27 14:26:02 -04:00
commit ffceda4ac2

@@ -143,6 +143,12 @@ Table of Contents
user traffic in that time period is multiplexed over a single connection
(as it is with Tor).
Though flow measurement in principle can be bidirectional (counting cells
sent in both directions between a pair of IPs) or unidirectional (counting
only cells sent from one IP to another), we assume for safety that all
measurement is unidirectional, and so traffic must be sent by both parties
in order to prevent record splitting.
2.2. Implementation
Tor clients currently maintain one TLS connection to their Guard node to
@@ -154,35 +160,41 @@ Table of Contents
connections, and pad them, but otherwise not pad between normal relays.
Both clients and Guards will maintain a timer for all application (ie:
non-directory) TLS connections. Every time a non-padding packet is sent or
received by either end, that endpoint will sample a timeout value from
between 1.5 seconds and 9.5 seconds using the max(X,X) distribution
described in Section 2.3. The time range is subject to consensus
non-directory) TLS connections. Every time a padding packet is sent by an
endpoint, that endpoint will sample a timeout value from
the max(X,X) distribution described in Section 2.3. The default
range is from 1.5 seconds to 9.5 seconds, subject to consensus
parameters as specified in Section 2.6.
If the connection becomes active for any reason before this timer
expires, the timer is reset to a new random value between 1.5 and 9.5
seconds. If the connection remains inactive until the timer expires, a
single CELL_PADDING cell will be sent on that connection.
(The timing is randomized to avoid making it obvious which cells are
padding.)
In this way, the connection will only be padded in the event that it is
idle, and will always transmit a packet before the minimum 10 second inactive
timeout.
If another cell is sent for any reason before this timer expires, the timer
is reset to a new random value.
If the connection remains inactive until the timer expires, a
single CELL_PADDING cell will be sent on that connection (which will
also start a new timer).
In this way, the connection will only be padded in a given direction in
the event that it is idle in that direction, and will always transmit a
packet before the minimum 10 second inactive timeout.
(In practice, an implementation may not be able to determine when,
exactly, a cell is sent on a given channel. For example, even though the
cell has been given to the kernel via a call to `send(2)`, the kernel may
still be buffering that cell. In cases such as these, implementations
should use a reasonable proxy for the time at which a cell is sent: for
example, when the cell is queued. If this strategy is used,
implementations should try to observe the innermost (closest to the wire)
queue that they practically can, and if this queue is already nonempty,
padding should not be scheduled until after the queue does become empty.)
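The per-direction timer behavior described above can be sketched as follows.
This is a minimal illustration, not the Tor implementation: the class name
`PaddingTimer`, its methods, and the millisecond clock passed in by the
caller are all hypothetical, and the 1.5 to 9.5 second range is hard-coded
as a default rather than read from consensus parameters.

```python
import random


class PaddingTimer:
    """Hypothetical one-direction padding timer (illustrative only)."""

    def __init__(self, low_ms=1500, high_ms=9500, rng=None):
        self.low_ms = low_ms
        self.high_ms = high_ms
        self.rng = rng or random.Random()
        self.deadline_ms = None  # no timer armed until a cell is sent

    def _sample_timeout_ms(self):
        # max(X,X): the larger of two independent uniform draws on 0..R-1,
        # shifted into the [low, high) range.
        r = self.high_ms - self.low_ms
        return self.low_ms + max(self.rng.randrange(r), self.rng.randrange(r))

    def on_cell_sent(self, now_ms):
        # Any outgoing cell, padding or not, restarts the timer.
        self.deadline_ms = now_ms + self._sample_timeout_ms()

    def poll(self, now_ms):
        # Return True if a single CELL_PADDING cell should be sent now.
        if self.deadline_ms is None or now_ms < self.deadline_ms:
            return False
        # Sending the padding cell also starts a new timer.
        self.on_cell_sent(now_ms)
        return True
```

A caller would invoke `on_cell_sent()` whenever its chosen proxy for "cell
sent" fires (for example, when the innermost observable queue accepts a
cell) and `poll()` from its event loop.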
2.3. Padding Cell Timeout Distribution Statistics
It turns out that because the padding is bidirectional, and because both
endpoints are maintaining timers, this creates the situation where the time
before sending a padding packet in either direction is actually
min(client_timeout, server_timeout).
If client_timeout and server_timeout are uniformly sampled, then the
distribution of min(client_timeout,server_timeout) is no longer uniform, and
the resulting average timeout (Exp[min(X,X)]) is much lower than the
midpoint of the timeout range.
To compensate for this, instead of sampling each endpoint timeout uniformly,
we instead sample it from max(X,X), where X is uniformly distributed.
To limit the amount of padding sent, we sample each endpoint timeout
from max(X,X), where X is uniformly distributed, rather than sampling
it uniformly.
If X is a random variable uniform from 0..R-1 (where R=high-low), then the
random variable Y = max(X,X) has Prob(Y == i) = (2.0*i + 1)/(R*R).
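As a sanity check on the formula above, the following sketch computes the
probability mass function and the mean of Y = max(X,X), and draws a timeout
from that distribution. The function names are hypothetical and chosen only
for illustration; times are assumed to be in milliseconds.

```python
import random
from fractions import Fraction


def pmf_max_of_two(i, r):
    """Prob(Y == i) for Y = max of two independent uniform draws on 0..r-1."""
    return Fraction(2 * i + 1, r * r)


def expected_max_of_two(r):
    """Mean of Y, computed directly from the probability mass function."""
    return sum(i * (2 * i + 1) for i in range(r)) / (r * r)


def sample_timeout_ms(low_ms, high_ms, rng=random):
    """Draw low + max(X,X), where X is uniform on 0..R-1 and R = high - low."""
    r = high_ms - low_ms
    return low_ms + max(rng.randrange(r), rng.randrange(r))
```

The probabilities sum to 1 because the sum of (2*i + 1) over i = 0..R-1 is
R*R, and the mean works out to roughly 2R/3, so sampled timeouts skew toward
the high end of the range.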
@@ -206,9 +218,6 @@ Table of Contents
15000 7499.5 7995 4999.5 9999.5
20000 9900.5 10661 6666.2 13332.8
In this way, we maintain the property that the midpoint of the timeout range
is the expected mean time before a padding packet is sent in either
direction.
2.4. Maximum overhead bounds
@@ -253,6 +262,13 @@ Table of Contents
CELL_PADDING_NEGOTIATE to instruct the relay not to pad, and then does not
send any further padding itself.
Currently, clients negotiate padding only when a channel is created,
immediately after sending their NETINFO cell. Recipients SHOULD, however,
accept padding negotiation messages at any time.
Clients and bridges MUST reject padding negotiation messages from relays,
and close the channel if they receive one.
2.6. Consensus Parameters Governing Behavior
Connection-level padding is controlled by the following consensus parameters:
@@ -277,11 +293,22 @@ Table of Contents
- Default: 14000
* nf_conntimeout_clients
- The number of seconds to keep circuits opened and available for
clients to use. Note that the actual client timeout is randomized
uniformly from this value to twice this value. This governs client
OR conn lifespan. Reduced padding clients use half the consensus
- The number of seconds to keep never-used circuits opened and
available for clients to use. Note that the actual client timeout is
randomized uniformly from this value to twice this value.
- The number of seconds to keep idle (not currently used) canonical
channels open and available. (We do this to ensure a sufficient
time duration of padding, which is the ultimate goal.)
- This value is also used to determine how long, after a port has been
used, we should attempt to keep building predicted circuits for that
port. (See path-spec.txt section 2.1.1.) This behavior was
originally added to work around implementation limitations, but it
serves as a reasonable default regardless of implementation.
- For all use cases, reduced padding clients use half the consensus
value.
- Implementations MAY mark circuits held open past the reduced padding
quantity (half the consensus value) as "not to be used for streams",
to prevent their use from becoming a distinguisher.
- Default: 1800
* nf_pad_before_usage