Updated to remove dropping of failing guards and just focus

on the specifics of recording, storing, and learning
circuitbuildtimeout parameters.



svn:r16511
This commit is contained in:
Mike Perry 2008-08-12 18:23:38 +00:00
parent dfbeee69a6
commit 95969867fc


@@ -10,8 +10,8 @@ Overview
The performance of paths selected can be improved by adjusting the
CircuitBuildTimeout and avoiding failing guard nodes. This proposal
describes a method of tracking buildtime statistics at the client, and
using those statistics to adjust the CircuitBuildTimeout.
Motivation
@@ -22,71 +22,91 @@ Motivation
Implementation
Storing Build Times
Circuit build times will be stored in the circular array
'circuit_build_times' consisting of uint16_t elements as milliseconds.
The total size of this array will be based on the number of circuits
it takes to converge on a good fit of the long term distribution of
the circuit builds for a fixed link. We do not want this value to be
too large, because it will make it difficult for clients to adapt to
moving between different links.
From our initial observations, this value appears to be on the order
of 1000, but it will be configurable via a #define NCIRCUITS_TO_OBSERVE.
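The circular array described above can be sketched as follows; this is a minimal illustration, not Tor's actual code, and the struct and function names are hypothetical:

```c
#include <stdint.h>

#define NCIRCUITS_TO_OBSERVE 1000

typedef struct {
  uint16_t times[NCIRCUITS_TO_OBSERVE]; /* build times, in milliseconds */
  int next;   /* index of the next slot to overwrite */
  int total;  /* number of samples recorded so far, capped at array size */
} circuit_build_times_t;

/* Record one completed circuit build, overwriting the oldest
 * observation once the array is full. */
static void
circuit_build_times_add(circuit_build_times_t *cbt, uint16_t msec)
{
  cbt->times[cbt->next] = msec;
  cbt->next = (cbt->next + 1) % NCIRCUITS_TO_OBSERVE;
  if (cbt->total < NCIRCUITS_TO_OBSERVE)
    cbt->total++;
}
```

Because old samples are overwritten in place, a client that moves to a different link gradually replaces its stale observations rather than carrying them forever.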
Long Term Storage
The long-term storage representation will be implemented by storing a
histogram with BUILDTIME_BIN_WIDTH millisecond buckets (default 50) when
writing out the statistics to disk. The format of this histogram on disk
is yet to be finalized, but it will likely be of the form
'CircuitBuildTimeBin <bin> <count>'.
Example:
CircuitBuildTimeBin 1 100
CircuitBuildTimeBin 2 50
...
Reading the histogram in will entail multiplying each bin by the
BUILDTIME_BIN_WIDTH and then inserting <count> values into the
circuit_build_times array each with the value of
<bin>*BUILDTIME_BIN_WIDTH.
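The write/read round trip described above can be sketched as follows; the function names are illustrative, not Tor's actual API:

```c
#include <stdint.h>
#include <stdio.h>

#define BUILDTIME_BIN_WIDTH 50
#define NCIRCUITS_TO_OBSERVE 1000

/* Write each non-empty bin as "CircuitBuildTimeBin <bin> <count>". */
static void
histogram_write(FILE *out, const unsigned *bins, int nbins)
{
  int b;
  for (b = 0; b < nbins; b++) {
    if (bins[b])
      fprintf(out, "CircuitBuildTimeBin %d %u\n", b, bins[b]);
  }
}

/* Expand one parsed histogram line back into the in-memory array:
 * insert <count> samples, each valued at <bin>*BUILDTIME_BIN_WIDTH msec.
 * Returns the next free index. */
static int
histogram_expand(uint16_t *times, int pos, int bin, int count)
{
  int i;
  for (i = 0; i < count && pos < NCIRCUITS_TO_OBSERVE; i++)
    times[pos++] = (uint16_t)(bin * BUILDTIME_BIN_WIDTH);
  return pos;
}
```

Binning to 50 ms loses sub-bucket precision, which is the price of the compact on-disk form; the in-memory array keeps full millisecond resolution between writes.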
Learning the CircuitBuildTimeout
Based on studies of build times, we found that the distribution of
circuit buildtimes appears to be a Pareto distribution.
Calculating the preferred CircuitBuildTimeout
We will calculate the parameters for a Pareto distribution
fitting the data using the estimators at
http://en.wikipedia.org/wiki/Pareto_distribution#Parameter_estimation.
Circuits whose buildtimes exceed the x-th percentile of the estimated
Pareto CDF will be excluded. The value of x will be tunable
as well.
The timeout itself will be calculated by solving the CDF for
a percentile cutoff BUILDTIME_PERCENT_CUTOFF. This value
represents the percentage of paths the Tor client will accept out of
the total number of paths. We have not yet determined a good
cutoff for this mathematically, but 85% seems a good choice for now.
Circuit timeouts
From http://en.wikipedia.org/wiki/Pareto_distribution#Definition,
the CDF is F(x) = 1 - pow(Xm/x, k), so the calculation we need is
Xm/pow(1.0 - BUILDTIME_PERCENT_CUTOFF/100.0, 1.0/k).
In the event of a timeout, backoff values should account for the
remaining (100-x)% of the expected CDF. Also, in the event of network failure,
the observation mechanism should stop collecting timeout data.
When to Begin Calculation
The number of circuits to observe (NCIRCUITS_TO_CUTOFF) before
changing the CircuitBuildTimeout will be tunable via a #define. From
our measurements, a good value for NCIRCUITS_TO_CUTOFF appears to be
on the order of 100.
Dropping Failed Guards
In addition, we have noticed that some entry guards are much more
failure prone than others. In particular, the circuit failure rates for
the fastest entry guards were approximately 20-25%, whereas slower
guards exhibited failure rates as high as 45-50%. In [1], it was
demonstrated that failing guard nodes can deliberately bias path
selection to improve their success at capturing traffic. For both these
reasons, failing guards should be avoided.
Dealing with Timeouts
Timeouts should be counted as the expectation of the region
of the Pareto distribution beyond the cutoff. The proposal will
be updated with this value soon.
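For reference, the Pareto tail above a cutoff t is itself Pareto with scale t, so the expectation in question is k*t/(k-1), defined for k > 1. A sketch of that value (the function name is hypothetical):

```c
/* Expected true buildtime of a circuit that exceeded the timeout t,
 * under a fitted Pareto with shape k: E[X | X > t] = k*t/(k-1).
 * The expectation diverges for k <= 1. */
static double
expected_time_given_timeout(double timeout, double k)
{
  if (k <= 1.0)
    return -1.0; /* undefined: infinite mean in the tail */
  return k * timeout / (k - 1.0);
}
```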
Circuits that timeout will be destroyed, as this indicates one
or more of their respective nodes are currently overloaded.
Client Hints
Some research still needs to be done to provide initial values
for CircuitBuildTimeout based on values learned from modem
users, DSL users, Cable Modem users, and dedicated links. A
radiobutton in Vidalia should eventually be provided that
sets CircuitBuildTimeout to one of these values and also
provides the option of purging all learned data, should any exist.
These values can either be published in the directory, or
shipped hardcoded for a particular Tor version.
We propose increasing the number of entry guards to five, and gathering
circuit failure statistics on each entry guard. Any guards whose failure
rate exceeds the average failure rate of all guards by 10% after we have
gathered NCIRCUITS_TO_OBSERVE circuits will be replaced.
Issues
Impact on anonymity
Since this follows a Pareto distribution, large reductions on the
timeout can be achieved without cutting off a great number of the
total paths. However, hard statistics on which cutoff percentage
gives optimal performance have not yet been gathered.
Guard Turnover
We contend that the risk from failing guards biasing path selection
outweighs the risk of exposure to larger portions of the network
for the first hop. Furthermore, from our observations, it appears
that circuit failure is strongly correlated to node load. Allowing
clients to migrate away from failing guards should naturally
rebalance the network, and eventually clients should converge on
a stable set of reliable guards. It is also likely that once clients
begin to migrate away from failing guards, their load should go
down, causing their failure rates to drop as well.
[1] http://www.crhc.uiuc.edu/~nikita/papers/relmix-ccs07.pdf
This will eliminate a great deal of the performance
variation in Tor usage.