Mirror of https://github.com/torproject/torspec.git, synced 2024-12-14 05:58:33 +00:00
Update proposal to match implementation.
parent 02b36f5bee
commit 84e82f5cc0
@@ -20,7 +20,7 @@ Motivation
 
 Implementation
 
-Storing Build Times
+Gathering Build Times
 
 Circuit build times are stored in the circular array
 'circuit_build_times' consisting of uint32_t elements as milliseconds.
@@ -30,8 +30,16 @@ Implementation
 too large, because it will make it difficult for clients to adapt to
 moving between different links.
 
-From our observations, this value appears to be on the order of 1000,
-but is configurable in a #define NCIRCUITS_TO_OBSERVE.
+From our observations, the minimum value for a reasonable fit appears
+to be on the order of 500 (MIN_CIRCUITS_TO_OBSERVE). However, to keep
+a good fit over the long term, we store 5000 most recent circuits in
+the array (NCIRCUITS_TO_OBSERVE).
+
+The Tor client will build test circuits at a rate of one per
+minute (BUILD_TIMES_TEST_FREQUENCY) up to the point of
+MIN_CIRCUITS_TO_OBSERVE. This allows a fresh Tor to have
+a CircuitBuildTimeout estimated within 8 hours after install,
+upgrade, or network change (see below).
 
 Long Term Storage
 
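Note on the hunk above: the following is a minimal Python sketch of the circular array it describes, assuming the array simply overwrites its oldest entry once NCIRCUITS_TO_OBSERVE values are stored. The class, method, and helper names are hypothetical; only the three constants come from the proposal text, and the real implementation is in C and may differ.

```python
NCIRCUITS_TO_OBSERVE = 5000      # array capacity, from the proposal text
MIN_CIRCUITS_TO_OBSERVE = 500    # minimum sample count for a reasonable fit
BUILD_TIMES_TEST_FREQUENCY = 60  # seconds between test circuits (one per minute)


class BuildTimeRing:
    """Hypothetical circular array of circuit build times in milliseconds."""

    def __init__(self, capacity=NCIRCUITS_TO_OBSERVE):
        self.times = [0] * capacity
        self.next_index = 0  # slot the next build time is written to
        self.total = 0       # number of valid entries, capped at capacity

    def add(self, build_time_ms):
        # Overwrite the oldest slot once the array has wrapped around.
        self.times[self.next_index] = build_time_ms
        self.next_index = (self.next_index + 1) % len(self.times)
        self.total = min(self.total + 1, len(self.times))

    def enough_to_estimate(self):
        return self.total >= MIN_CIRCUITS_TO_OBSERVE

    def values(self):
        # Before the first wrap only slots [0, total) hold real data.
        return self.times[:self.total]


def hours_to_first_estimate():
    """Hours of test circuits alone needed to reach the minimum sample."""
    return MIN_CIRCUITS_TO_OBSERVE * BUILD_TIMES_TEST_FREQUENCY / 3600.0
```

With these constants, test circuits alone need 500 minutes (about 8.3 hours) to reach the minimum, slightly more than the 8 hours quoted in the text; circuits built for normal use would shorten this, assuming they are recorded as well.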
@@ -43,9 +51,9 @@ Implementation
 Example:
 
 TotalBuildTimes 100
-CircuitBuildTimeBin 0 50
-CircuitBuildTimeBin 50 25
-CircuitBuildTimeBin 100 13
+CircuitBuildTimeBin 25 50
+CircuitBuildTimeBin 75 25
+CircuitBuildTimeBin 125 13
 ...
 
 Reading the histogram in will entail inserting <count> values
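Note on the hunk above: a minimal Python sketch of reading such a histogram back in, assuming each CircuitBuildTimeBin line has the form "<build time in ms> <count>" and that reading inserts <count> copies of that build time. The function name is hypothetical. The new values 25, 75, 125 look like midpoints of 50 ms wide bins, but that is an inference from the example, not something the hunk states.

```python
def read_build_time_histogram(lines):
    """Expand state-file histogram lines into a list of build times (ms).

    Returns (total, times), where total is the TotalBuildTimes value
    (or None if absent) and times holds one entry per counted circuit.
    """
    total = None
    times = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "TotalBuildTimes" and len(fields) == 2:
            total = int(fields[1])
        elif fields[0] == "CircuitBuildTimeBin" and len(fields) == 3:
            bin_ms, count = int(fields[1]), int(fields[2])
            # Insert <count> copies of the bin's representative time.
            times.extend([bin_ms] * count)
        # Any other line (such as the "..." in the example) is skipped.
    return total, times
```

A caller could compare len(times) against the TotalBuildTimes value to detect a truncated or inconsistent state file; whether Tor does so is not stated in the hunk.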
@@ -57,7 +65,12 @@ Implementation
 Learning the CircuitBuildTimeout
 
 Based on studies of build times, we found that the distribution of
-circuit buildtimes appears to be a Pareto distribution.
+circuit buildtimes appears to be a Frechet distribution. However,
+estimators and quantile functions of the Frechet distribution are
+difficult to work with and slow to converge. So instead, since we
+are only interested in the accuracy of the tail, we approximate
+the tail of the distribution with a Pareto curve starting at
+the mode of the circuit build time sample set.
 
 We will calculate the parameters for a Pareto distribution
 fitting the data using the estimators at
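Note on the hunk above: the estimators the proposal links to fall outside the lines shown, so the following Python sketch uses the textbook maximum-likelihood estimator for a Pareto distribution with a known scale, which may differ from what Tor uses. Taking the scale Xm as the midpoint of the most populated 50 ms bin, and fitting only the samples at or above it, are assumptions of this sketch; all function names are hypothetical.

```python
import math
from collections import Counter

BIN_WIDTH_MS = 50  # assumed bin width used to locate the mode


def estimate_mode(times_ms, bin_width=BIN_WIDTH_MS):
    """Return the midpoint of the most populated bin as a mode estimate."""
    bins = Counter(int(t // bin_width) for t in times_ms)
    most_common_bin, _count = bins.most_common(1)[0]
    return most_common_bin * bin_width + bin_width / 2.0


def fit_pareto_tail(times_ms):
    """Fit (xm, alpha) to the samples at or above the mode.

    Uses the maximum-likelihood estimate alpha = n / sum(ln(x_i / xm)).
    """
    xm = estimate_mode(times_ms)
    tail = [t for t in times_ms if t >= xm]
    log_sum = sum(math.log(t / xm) for t in tail)
    if log_sum <= 0.0:
        raise ValueError("not enough spread above the mode to fit alpha")
    return xm, len(tail) / log_sum


def pareto_quantile(xm, alpha, q):
    """Inverse CDF of the Pareto distribution, for 0 <= q < 1."""
    return xm / (1.0 - q) ** (1.0 / alpha)
```

A timeout could then be read off as pareto_quantile(xm, alpha, q) for a cutoff quantile q chosen by the caller.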
@@ -73,11 +86,8 @@ Implementation
 
 Detecting Changing Network Conditions
 
-We attempt to detect both network connectivty loss and drastic
-changes in the timeout characteristics. Network connectivity loss
-is detected by recording a timestamp every time Tor either completes
-a TLS connection or receives a cell. If this timestamp is more than
-90 seconds in the past, circuit timeouts are no longer counted.
+We attempt to detect both network connectivity loss and drastic
+changes in the timeout characteristics.
 
 If more than MAX_RECENT_TIMEOUT_RATE (80%) of the past
 RECENT_CIRCUITS (20) time out, we assume the network connection
@@ -86,6 +96,11 @@ Implementation
 position on the Pareto Quartile function for the ratio of
 timeouts.
 
+Network connectivity loss is detected by recording a timestamp every
+time Tor either completes a TLS connection or receives a cell. If
+this timestamp is more than CircuitBuildTimeout*RECENT_CIRCUITS/3
+seconds in the past, circuit timeouts are no longer counted.
+
 Testing
 
 After circuit build times, storage, and learning are implemented,
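Note on the two hunks above: a minimal Python sketch of the two checks they describe, using the constants named in the text. The class and method names are hypothetical, as is the choice to wait for a full window of RECENT_CIRCUITS outcomes before judging the timeout rate. What Tor does once the rate is exceeded lies outside the lines shown, so the sketch only reports the condition.

```python
import time
from collections import deque

RECENT_CIRCUITS = 20            # window size, from the proposal text
MAX_RECENT_TIMEOUT_RATE = 0.80  # 80%, from the proposal text


class NetworkLivenessTracker:
    """Hypothetical tracker for connectivity loss and timeout bursts."""

    def __init__(self, circuit_build_timeout_s, clock=time.monotonic):
        self.timeout_s = circuit_build_timeout_s
        self.clock = clock
        self.last_activity = clock()
        self.recent = deque(maxlen=RECENT_CIRCUITS)  # True means timed out

    def note_activity(self):
        """Call when a TLS connection completes or a cell is received."""
        self.last_activity = self.clock()

    def network_is_live(self):
        limit_s = self.timeout_s * RECENT_CIRCUITS / 3.0
        return self.clock() - self.last_activity <= limit_s

    def note_circuit(self, timed_out):
        """Record a circuit outcome; return False if a timeout was ignored."""
        if timed_out and not self.network_is_live():
            return False  # connectivity looks lost, so do not count it
        self.recent.append(timed_out)
        return True

    def too_many_recent_timeouts(self):
        if len(self.recent) < RECENT_CIRCUITS:
            return False  # assumption: judge only a full window
        return sum(self.recent) / RECENT_CIRCUITS > MAX_RECENT_TIMEOUT_RATE
```

With a CircuitBuildTimeout of 60 seconds the liveness limit is 60 * 20 / 3 = 400 seconds, compared with the fixed 90 seconds in the text this commit removes.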
@@ -96,7 +111,18 @@ Implementation
 the python produces matches that which is output to the state file in Tor,
 and verify that the Pareto parameters and cutoff points also match.
 
-Soft timeout vs Hard Timeout
+We will also verify that there are no unexpected large deviations from
+node selection, such as nodes from distant geographical locations being
+completely excluded.
+
+Dealing with Timeouts
+
+Timeouts should be counted as the expectation of the region of
+of the Pareto distribution beyond the cutoff. This is done by
+generating a random sample for each timeout at points on the
+curve beyond the current timeout cutoff.
+
+Future Work
 
 At some point, it may be desirable to change the cutoff from a
 single hard cutoff that destroys the circuit to a soft cutoff and
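Note on the "Dealing with Timeouts" paragraph in the hunk above: a minimal Python sketch of drawing one synthetic build time for a timed-out circuit, assuming inverse-transform sampling from the fitted Pareto curve restricted to values beyond the cutoff. The function name and parameters are hypothetical. The sketch relies on a standard property of the Pareto distribution: its tail beyond a cutoff c is again Pareto with the same alpha and scale c.

```python
import random


def sample_timed_out_build_time(xm, alpha, cutoff_ms, rng=random):
    """Draw a build time (ms) from the Pareto tail beyond the cutoff.

    xm and alpha are the fitted Pareto scale and shape; rng is any
    object with a random() method returning a float in [0, 1).
    """
    start = max(cutoff_ms, xm)  # the curve is only defined from xm upward
    v = 1.0 - rng.random()      # uniform in (0, 1], so never zero
    return start / v ** (1.0 / alpha)
```

Every sample is at least the cutoff. For alpha > 1 the mean of this tail is alpha * cutoff / (alpha - 1), which is the expectation that the samples approximate on average.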
@@ -104,36 +130,9 @@ Implementation
 of a new circuit, and the hard cutoff triggers destruction of the
 circuit.
 
-Good values for hard and soft cutoffs seem to be 80% and 60%
-respectively, but we should eventually justify this with observation.
-
-When to Begin Calculation
-
-The number of circuits to observe (NCIRCUITS_TO_CUTOFF) before
-changing the CircuitBuildTimeout will be tunable via a #define. From
-our measurements, a good value for NCIRCUITS_TO_CUTOFF appears to be
-on the order of 100.
-
-Dealing with Timeouts
-
-Timeouts should be counted as the expectation of the region of
-of the Pareto distribution beyond the cutoff. The proposal will
-be updated with this value soon.
-
-Also, in the event of network failure, the observation mechanism
-should stop collecting timeout data.
-
-Client Hints
-
-Some research still needs to be done to provide initial values
-for CircuitBuildTimeout based on values learned from modem
-users, DSL users, Cable Modem users, and dedicated links. A
-radiobutton in Vidalia should eventually be provided that
-sets CircuitBuildTimeout to one of these values and also
-provide the option of purging all learned data, should any exist.
-
-These values can either be published in the directory, or
-shipped hardcoded for a particular Tor version.
+It may also be beneficial to learn separate timeouts for each
+guard node, as they will have slightly different distributions.
+This will take longer to generate initial values though.
 
 Issues
 