Update proposal to match implementation.

Mike Perry 2009-09-16 17:03:54 -07:00
parent 02b36f5bee
commit 84e82f5cc0

@ -20,7 +20,7 @@ Motivation
 Implementation
-Storing Build Times
+Gathering Build Times
 Circuit build times are stored in the circular array
 'circuit_build_times' consisting of uint32_t elements as milliseconds.
@ -30,8 +30,16 @@ Implementation
 too large, because it will make it difficult for clients to adapt to
 moving between different links.
-From our observations, this value appears to be on the order of 1000,
-but is configurable in a #define NCIRCUITS_TO_OBSERVE.
+From our observations, the minimum value for a reasonable fit appears
+to be on the order of 500 (MIN_CIRCUITS_TO_OBSERVE). However, to keep
+a good fit over the long term, we store 5000 most recent circuits in
+the array (NCIRCUITS_TO_OBSERVE).
+The Tor client will build test circuits at a rate of one per
+minute (BUILD_TIMES_TEST_FREQUENCY) up to the point of
+MIN_CIRCUITS_TO_OBSERVE. This allows a fresh Tor to have
+a CircuitBuildTimeout estimated within 8 hours after install,
+upgrade, or network change (see below).
 Long Term Storage
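Note on the hunk above: the two sizes and the test-circuit rate it introduces can be sketched as follows. This is a minimal illustration, not Tor's C implementation. The constant names come from the proposal text, the helper functions are hypothetical, and the 60-second value assumes BUILD_TIMES_TEST_FREQUENCY is expressed in seconds. At one circuit per minute, 500 circuits take about 8.3 hours, so "within 8 hours" is approximate.

```python
from collections import deque

MIN_CIRCUITS_TO_OBSERVE = 500     # minimum sample count, from the proposal text
NCIRCUITS_TO_OBSERVE = 5000       # array capacity, from the proposal text
BUILD_TIMES_TEST_FREQUENCY = 60   # assumed: seconds between test circuits

# A deque with maxlen discards its oldest entry once full, which mimics
# the circular array 'circuit_build_times' described in the proposal.
circuit_build_times = deque(maxlen=NCIRCUITS_TO_OBSERVE)


def record_build_time(ms):
    """Store one circuit build time in milliseconds (hypothetical helper)."""
    circuit_build_times.append(int(ms))


def enough_data():
    """True once the minimum number of circuits has been observed."""
    return len(circuit_build_times) >= MIN_CIRCUITS_TO_OBSERVE


def bootstrap_hours():
    """Hours of test circuits needed before a timeout can be estimated."""
    return MIN_CIRCUITS_TO_OBSERVE * BUILD_TIMES_TEST_FREQUENCY / 3600.0
```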
@ -43,9 +51,9 @@ Implementation
 Example:
 TotalBuildTimes 100
-CircuitBuildTimeBin 0 50
-CircuitBuildTimeBin 50 25
-CircuitBuildTimeBin 100 13
+CircuitBuildTimeBin 25 50
+CircuitBuildTimeBin 75 25
+CircuitBuildTimeBin 125 13
 ...
 Reading the histogram in will entail inserting <count> values
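Note on the hunk above: the new example labels each bin by what looks like its midpoint (25, 75, 125) rather than its lower edge (0, 50, 100). The sketch below assumes a 50 ms bin width, inferred from the example values only, and hypothetical function names. It shows how a build time maps to a midpoint label and how the state lines could be expanded back into samples by inserting <count> copies of each bin value.

```python
BIN_WIDTH_MS = 50  # assumed from the spacing of the example bins


def bin_midpoint(build_time_ms, width=BIN_WIDTH_MS):
    """Return the midpoint label of the bin containing build_time_ms."""
    return (build_time_ms // width) * width + width // 2


def parse_build_time_histogram(lines):
    """Expand state-file lines into (declared_total, list_of_times_ms)."""
    total = None
    times = []
    for line in lines:
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "TotalBuildTimes":
            total = int(parts[1])
        elif parts[0] == "CircuitBuildTimeBin":
            midpoint_ms, count = int(parts[1]), int(parts[2])
            # Insert <count> values at the bin's recorded position.
            times.extend([midpoint_ms] * count)
    return total, times
```

The example in the hunk is elided with "...", so the three listed bins account for only 88 of the 100 declared build times.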
@ -57,7 +65,12 @@ Implementation
 Learning the CircuitBuildTimeout
 Based on studies of build times, we found that the distribution of
-circuit buildtimes appears to be a Pareto distribution.
+circuit buildtimes appears to be a Frechet distribution. However,
+estimators and quantile functions of the Frechet distribution are
+difficult to work with and slow to converge. So instead, since we
+are only interested in the accuracy of the tail, we approximate
+the tail of the distribution with a Pareto curve starting at
+the mode of the circuit build time sample set.
 We will calculate the parameters for a Pareto distribution
 fitting the data using the estimators at
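Note on the hunk above: the reference for the estimators is cut off by the hunk boundary, so the sketch below falls back on the textbook Pareto maximum-likelihood estimator, which may differ from what Tor actually uses. Xm is taken as the mode of the binned samples, as the new text describes. The 50 ms bin width, the function names, and the choice to ignore samples below Xm are all assumptions made for illustration.

```python
import math
from collections import Counter


def estimate_xm(samples_ms, bin_width=50):
    """Mode of the sample set, taken as the midpoint of the fullest bin."""
    bins = Counter((int(x) // bin_width) * bin_width + bin_width // 2
                   for x in samples_ms)
    return bins.most_common(1)[0][0]


def fit_pareto_alpha(samples_ms, xm):
    """Textbook MLE for the Pareto shape: n / sum(ln(x_i / xm))."""
    tail = [x for x in samples_ms if x >= xm]  # assumption: drop x < xm
    log_sum = sum(math.log(x / xm) for x in tail)
    if log_sum == 0:
        raise ValueError("need at least one sample strictly above xm")
    return len(tail) / log_sum


def pareto_quantile(xm, alpha, q):
    """Inverse CDF: the build time below which a fraction q of circuits fall."""
    return xm / (1.0 - q) ** (1.0 / alpha)
```

Under these assumptions a timeout cutoff at, say, the 80th percentile would be pareto_quantile(xm, alpha, 0.8).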
@ -73,11 +86,8 @@ Implementation
 Detecting Changing Network Conditions
-We attempt to detect both network connectivty loss and drastic
-changes in the timeout characteristics. Network connectivity loss
-is detected by recording a timestamp every time Tor either completes
-a TLS connection or receives a cell. If this timestamp is more than
-90 seconds in the past, circuit timeouts are no longer counted.
+We attempt to detect both network connectivity loss and drastic
+changes in the timeout characteristics.
 If more than MAX_RECENT_TIMEOUT_RATE (80%) of the past
 RECENT_CIRCUITS (20) time out, we assume the network connection
@ -86,6 +96,11 @@ Implementation
 position on the Pareto Quartile function for the ratio of
 timeouts.
+Network connectivity loss is detected by recording a timestamp every
+time Tor either completes a TLS connection or receives a cell. If
+this timestamp is more than CircuitBuildTimeout*RECENT_CIRCUITS/3
+seconds in the past, circuit timeouts are no longer counted.
 Testing
 After circuit build times, storage, and learning are implemented,
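Note on the hunk above: the fixed 90-second idle limit is replaced by one that scales with the learned timeout. A minimal sketch of the new check, assuming CircuitBuildTimeout is in seconds and using hypothetical function and parameter names:

```python
RECENT_CIRCUITS = 20  # from the proposal text


def network_is_live(now, last_activity, circuit_build_timeout):
    """True if a TLS connection or cell was seen recently enough.

    now and last_activity are timestamps in seconds. While this returns
    False, circuit timeouts should not be counted.
    """
    idle_limit = circuit_build_timeout * RECENT_CIRCUITS / 3.0
    return (now - last_activity) <= idle_limit
```

With a CircuitBuildTimeout of 60 seconds, the idle limit works out to 400 seconds, compared with the 90 seconds in the removed text.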
@ -96,7 +111,18 @@ Implementation
 the python produces matches that which is output to the state file in Tor,
 and verify that the Pareto parameters and cutoff points also match.
-Soft timeout vs Hard Timeout
+We will also verify that there are no unexpected large deviations from
+node selection, such as nodes from distant geographical locations being
+completely excluded.
+Dealing with Timeouts
+Timeouts should be counted as the expectation of the region of
+of the Pareto distribution beyond the cutoff. This is done by
+generating a random sample for each timeout at points on the
+curve beyond the current timeout cutoff.
+Future Work
 At some point, it may be desirable to change the cutoff from a
 single hard cutoff that destroys the circuit to a soft cutoff and
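Note on the hunk above: the new "Dealing with Timeouts" paragraph says each timeout is recorded as a random point on the Pareto curve beyond the current cutoff. One way to sketch this, not necessarily how Tor does it, relies on the fact that a Pareto variable conditioned on exceeding a cutoff c >= Xm is again Pareto with scale c and the same shape. The function name and parameters are hypothetical.

```python
import random


def synthetic_timeout_ms(xm, alpha, cutoff_ms, rng=random):
    """Draw a build time from the Pareto tail beyond cutoff_ms.

    Uses inverse-transform sampling on the conditional distribution:
    X | X > c is Pareto(scale=c, shape=alpha) when c >= xm.
    """
    scale = max(cutoff_ms, xm)
    r = rng.random()  # uniform in [0, 1), so (1 - r) is never zero
    return scale / (1.0 - r) ** (1.0 / alpha)
```

For alpha > 1 the mean of these samples is alpha * c / (alpha - 1), which is the expectation of the region beyond the cutoff that the paragraph refers to.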
@ -104,36 +130,9 @@ Implementation
 of a new circuit, and the hard cutoff triggers destruction of the
 circuit.
-Good values for hard and soft cutoffs seem to be 80% and 60%
-respectively, but we should eventually justify this with observation.
-When to Begin Calculation
-The number of circuits to observe (NCIRCUITS_TO_CUTOFF) before
-changing the CircuitBuildTimeout will be tunable via a #define. From
-our measurements, a good value for NCIRCUITS_TO_CUTOFF appears to be
-on the order of 100.
-Dealing with Timeouts
-Timeouts should be counted as the expectation of the region of
-of the Pareto distribution beyond the cutoff. The proposal will
-be updated with this value soon.
-Also, in the event of network failure, the observation mechanism
-should stop collecting timeout data.
-Client Hints
-Some research still needs to be done to provide initial values
-for CircuitBuildTimeout based on values learned from modem
-users, DSL users, Cable Modem users, and dedicated links. A
-radiobutton in Vidalia should eventually be provided that
-sets CircuitBuildTimeout to one of these values and also
-provide the option of purging all learned data, should any exist.
-These values can either be published in the directory, or
-shipped hardcoded for a particular Tor version.
+It may also be beneficial to learn separate timeouts for each
+guard node, as they will have slightly different distributions.
+This will take longer to generate initial values though.
 Issues