mirror of https://github.com/torproject/torspec.git (synced 2024-12-13 21:48:45 +00:00)
Update proposal to bring it more in-line with implementation.
parent 92c40e9727
commit 0b439985f4
@@ -2,7 +2,7 @@ Filename: 151-path-selection-improvements.txt
 Title: Improving Tor Path Selection
 Author: Fallon Chen, Mike Perry
 Created: 5-Jul-2008
-Status: Draft
+Status: Implemented

 Overview

@@ -22,51 +22,37 @@ Implementation

 Storing Build Times

-Circuit build times will be stored in the circular array
-'circuit_build_times' consisting of uint16_t elements as milliseconds.
-The total size of this array will be based on the number of circuits
+Circuit build times are stored in the circular array
+'circuit_build_times' consisting of uint32_t elements as milliseconds.
+The total size of this array is based on the number of circuits
 it takes to converge on a good fit of the long term distribution of
 the circuit builds for a fixed link. We do not want this value to be
 too large, because it will make it difficult for clients to adapt to
 moving between different links.

-From our initial observations, this value appears to be on the order
-of 1000, but will be configurable in a #define NCIRCUITS_TO_OBSERVE.
-The exact value for this #define will be determined by performing
-goodness of fit tests using measurements obtained from the shufflebt.py
-script from TorFlow.
+From our observations, this value appears to be on the order of 1000,
+but is configurable in a #define NCIRCUITS_TO_OBSERVE.

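A minimal C sketch of the circular array described under Storing Build
Times, using the uint32_t element type from the updated text. The struct
and function names are illustrative and not taken from the Tor source,
and 1000 for NCIRCUITS_TO_OBSERVE is only the order of magnitude the
proposal mentions.

```c
#include <stdint.h>

/* Assumed capacity; the proposal only says "on the order of 1000". */
#define NCIRCUITS_TO_OBSERVE 1000

/* Hypothetical container for the circular array of build times. */
typedef struct {
  uint32_t times_ms[NCIRCUITS_TO_OBSERVE]; /* build times, milliseconds */
  int next_idx;  /* slot the next sample will overwrite */
  int total;     /* number of valid samples, capped at the capacity */
} build_times_t;

/* Start with an empty buffer. */
static void build_times_init(build_times_t *bt)
{
  bt->next_idx = 0;
  bt->total = 0;
}

/* Record one build time, overwriting the oldest sample once full. */
static void build_times_add(build_times_t *bt, uint32_t ms)
{
  bt->times_ms[bt->next_idx] = ms;
  bt->next_idx = (bt->next_idx + 1) % NCIRCUITS_TO_OBSERVE;
  if (bt->total < NCIRCUITS_TO_OBSERVE)
    bt->total++;
}
```

Because old samples are overwritten in arrival order, a smaller capacity
forgets a previous link sooner, which is the trade-off the text
describes.
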
 Long Term Storage

-The long-term storage representation will be implemented by storing a
+The long-term storage representation is implemented by storing a
 histogram with BUILDTIME_BIN_WIDTH millisecond buckets (default 50) when
-writing out the statistics to disk. The format of this histogram on disk
-is yet to be finalized, but it will likely be of the format
-'CircuitBuildTime <bin> <count>', with the total specified as
-'TotalBuildTimes <total>'
+writing out the statistics to disk. The format this takes in the
+state file is 'CircuitBuildTimeBin <bin-ms> <count>', with the total
+specified as 'TotalBuildTimes <total>'
 Example:

 TotalBuildTimes 100
-CircuitBuildTimeBin 1 50
-CircuitBuildTimeBin 2 25
-CircuitBuildTimeBin 3 13
+CircuitBuildTimeBin 0 50
+CircuitBuildTimeBin 50 25
+CircuitBuildTimeBin 100 13
 ...

-Reading the histogram in will entail multiplying each bin by the
-BUILDTIME_BIN_WIDTH and then inserting <count> values into the
-circuit_build_times array each with the value of
-<bin>*BUILDTIME_BIN_WIDTH. In order to evenly distribute the
-values in the circular array, a form of index skipping must
-be employed. Values from bin #N with bin count C and total T
-will occupy indexes specified by N+((T/C)*k)-1, where k is the
-set of integers ranging from 0 to C-1.
-
-For example, this would mean that the values from bin 1 would
-occupy indexes 1+(100/50)*k-1, or 0, 2, 4, 6, 8, 10 and so on.
-The values for bin 2 would occupy positions 1, 5, 9, 13. Collisions
-will be inserted at the first empty position in the array greater
-than the selected index (which may require looping around the
-array back to index 0).
+Reading the histogram in will entail inserting <count> values
+into the circuit_build_times array each with the value of
+<bin-ms> milliseconds. In order to evenly distribute the values
+in the circular array, the Fisher-Yates shuffle will be performed
+after reading values from the bins.

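The histogram round trip described under Long Term Storage could be
sketched in C as follows. This is an illustration under stated
assumptions, not the Tor implementation: the type and function names are
hypothetical, bins are assumed to be labelled by their lower edge (as
the 0/50/100 example suggests), and rand() stands in for whatever random
source a real client would use.

```c
#include <stdint.h>
#include <stdlib.h>

#define BUILDTIME_BIN_WIDTH 50  /* milliseconds, the stated default */

/* One parsed 'CircuitBuildTimeBin <bin-ms> <count>' line. */
typedef struct {
  uint32_t bin_ms;
  uint32_t count;
} build_time_bin_t;

/* Label of the bin containing ms, assuming lower-edge labels. */
static uint32_t bin_label_ms(uint32_t ms)
{
  return (ms / BUILDTIME_BIN_WIDTH) * BUILDTIME_BIN_WIDTH;
}

/* Expand histogram bins into out[], writing at most cap values.
 * Returns the number of values written. */
static size_t expand_bins(const build_time_bin_t *bins, size_t nbins,
                          uint32_t *out, size_t cap)
{
  size_t n = 0;
  for (size_t i = 0; i < nbins; i++)
    for (uint32_t c = 0; c < bins[i].count && n < cap; c++)
      out[n++] = bins[i].bin_ms;
  return n;
}

/* Fisher-Yates shuffle, so that values from one bin do not sit in a
 * contiguous run of the circular array. The modulo introduces a
 * slight bias that is ignored in this sketch. */
static void shuffle_times(uint32_t *a, size_t n)
{
  for (size_t i = n; i > 1; i--) {
    size_t j = (size_t)rand() % i;  /* 0 <= j < i */
    uint32_t tmp = a[i - 1];
    a[i - 1] = a[j];
    a[j] = tmp;
  }
}
```

Shuffling matters because the array is circular: without it, the next
batch of new measurements would overwrite only the values from the
lowest bins.
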
 Learning the CircuitBuildTimeout

@@ -77,14 +63,28 @@ Implementation
 fitting the data using the estimators at
 http://en.wikipedia.org/wiki/Pareto_distribution#Parameter_estimation.

-The timeout itself will be calculated by solving the CDF for
-a percentile cutoff BUILDTIME_PERCENT_CUTOFF. This value
-represents the percentage of paths the Tor client will accept out of
-the total number of paths. We have not yet determined a good
-cutoff for this mathematically, but 85% seems a good choice for now.
+The timeout itself is calculated by using the quantile function (the
+inverted CDF) to give us the value on the CDF such that
+BUILDTIME_PERCENT_CUTOFF (80%) of the mass of the distribution is
+below the timeout value.

-From http://en.wikipedia.org/wiki/Pareto_distribution#Definition,
-the calculation we need is pow(BUILDTIME_PERCENT_CUTOFF/100.0, k)/Xm.
+Thus, we expect that the Tor client will accept the fastest 80% of
+the total number of paths on the network.
+
+Detecting Changing Network Conditions
+
+We attempt to detect both network connectivity loss and drastic
+changes in the timeout characteristics. Network connectivity loss
+is detected by recording a timestamp every time Tor either completes
+a TLS connection or receives a cell. If this timestamp is more than
+90 seconds in the past, circuit timeouts are no longer counted.
+
+If more than MAX_RECENT_TIMEOUT_RATE (80%) of the past
+RECENT_CIRCUITS (20) time out, we assume the network connection
+has changed, and we discard all buildtimes history and compute
+a new timeout by estimating a new Pareto curve using the
+position on the Pareto quantile function for the ratio of
+timeouts.

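The two checks described under Detecting Changing Network Conditions
could be sketched in C like this. The struct, the function names, and
NETWORK_LIVENESS_SECS are hypothetical. Waiting for a full window of
RECENT_CIRCUITS outcomes before deciding is also an assumption of this
sketch, and the refit that follows a detected change is left to the
caller.

```c
#include <stdbool.h>
#include <time.h>

#define RECENT_CIRCUITS 20
#define MAX_RECENT_TIMEOUT_RATE 80  /* percent */
#define NETWORK_LIVENESS_SECS 90    /* hypothetical name for the 90 s limit */

typedef struct {
  time_t last_live;                 /* last TLS completion or cell received */
  bool timed_out[RECENT_CIRCUITS];  /* recent outcomes, circular */
  int next_idx;
  int n_seen;                       /* outcomes recorded, capped at the window */
} net_state_t;

/* Call whenever a TLS connection completes or a cell arrives. */
static void note_network_live(net_state_t *s, time_t now)
{
  s->last_live = now;
}

/* True if the network showed activity within the last 90 seconds. */
static bool network_is_live(const net_state_t *s, time_t now)
{
  return now - s->last_live <= NETWORK_LIVENESS_SECS;
}

/* Record one circuit outcome. Returns true when more than
 * MAX_RECENT_TIMEOUT_RATE percent of a full window timed out, in which
 * case the caller should discard its history and compute a new timeout. */
static bool note_circuit_result(net_state_t *s, bool timeout, time_t now)
{
  if (timeout && !network_is_live(s, now))
    return false;                   /* timeouts are not counted while offline */
  s->timed_out[s->next_idx] = timeout;
  s->next_idx = (s->next_idx + 1) % RECENT_CIRCUITS;
  if (s->n_seen < RECENT_CIRCUITS)
    s->n_seen++;
  if (s->n_seen < RECENT_CIRCUITS)
    return false;                   /* window not full yet */
  int n_timeouts = 0;
  for (int i = 0; i < RECENT_CIRCUITS; i++)
    n_timeouts += s->timed_out[i] ? 1 : 0;
  return n_timeouts * 100 > MAX_RECENT_TIMEOUT_RATE * RECENT_CIRCUITS;
}
```

With these values, 16 timeouts out of 20 do not trigger a reset; 17 or
more do.
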
 Testing

@@ -104,7 +104,7 @@ Implementation
 of a new circuit, and the hard cutoff triggers destruction of the
 circuit.

-Good values for hard and soft cutoffs seem to be 85% and 65%
+Good values for hard and soft cutoffs seem to be 80% and 60%
 respectively, but we should eventually justify this with observation.

 When to Begin Calculation