Update proposal after feedback from Nick.

svn:r16556
This commit is contained in:
Mike Perry 2008-08-15 04:13:11 +00:00
parent fd05b385e3
commit 95c3f461a6

View File

@ -34,6 +34,9 @@ Implementation
From our initial observations, this value appears to be on the order From our initial observations, this value appears to be on the order
of 1000, but will be configurable in a #define NCIRCUITS_TO_OBSERVE. of 1000, but will be configurable in a #define NCIRCUITS_TO_OBSERVE.
The exact value for this #define will be determined by performing
goodness of fit tests using measurments obtained from the shufflebt.py
script from TorFlow.
Long Term Storage Long Term Storage
@ -41,17 +44,31 @@ Implementation
histogram with BUILDTIME_BIN_WIDTH millisecond buckets (default 50) when histogram with BUILDTIME_BIN_WIDTH millisecond buckets (default 50) when
writing out the statistics to disk. The format of this histogram on disk writing out the statistics to disk. The format of this histogram on disk
is yet to be finalized, but it will likely be of the format is yet to be finalized, but it will likely be of the format
'CircuitBuildTime <bin> <count>'. 'CircuitBuildTime <bin> <count>', with the total specified as
'TotalBuildTimes <total>'
Example: Example:
CircuitBuildTimeBin 1 100 TotalBuildTimes 100
CircuitBuildTimeBin 2 50 CircuitBuildTimeBin 1 50
CircuitBuildTimeBin 2 25
CircuitBuildTimeBin 3 13
... ...
Reading the histogram in will entail multiplying each bin by the Reading the histogram in will entail multiplying each bin by the
BUILDTIME_BIN_WIDTH and then inserting <count> values into the BUILDTIME_BIN_WIDTH and then inserting <count> values into the
circuit_build_times array each with the value of circuit_build_times array each with the value of
<bin>*BUILDTIME_BIN_WIDTH. <bin>*BUILDTIME_BIN_WIDTH. In order to evenly distribute the
values in the circular array, a form of index skipping must
be employed. Values from bin #N with bin count C and total T
will occupy indexes specified by N+((T/C)*k)-1, where k is the
set of integers ranging from 0 to C-1.
For example, this would mean that the values from bin 1 would
occupy indexes 1+(100/50)*k-1, or 0, 2, 4, 6, 8, 10 and so on.
The values for bin 2 would occupy positions 1, 5, 9, 13. Collisions
will be inserted at the first empty position in the array greater
than the selected index (which may requiring looping around the
array back to index 0).
Learning the CircuitBuildTimeout Learning the CircuitBuildTimeout
@ -71,6 +88,27 @@ Implementation
From http://en.wikipedia.org/wiki/Pareto_distribution#Definition, From http://en.wikipedia.org/wiki/Pareto_distribution#Definition,
the calculation we need is pow(BUILDTIME_PERCENT_CUTOFF/100.0, k)/Xm. the calculation we need is pow(BUILDTIME_PERCENT_CUTOFF/100.0, k)/Xm.
Testing
After circuit build times, storage, and learning are implemented,
the resulting histogram should be checked for consistency by
verifying it persists across successive Tor invocations where
no circuits are built. In addition, we can also use the existing
buildtime scripts to record build times, and verify that the histogram
the python produces matches that which is output to the state file in Tor,
and verify that the Pareto parameters and cutoff points also match.
Soft timeout vs Hard Timeout
At some point, it may be desirable to change the cutoff from a
single hard cutoff that destroys the circuit to a soft cutoff and
a hard cutoff, where the soft cutoff merely triggers the building
of a new circuit, and the hard cutoff triggers destruction of the
circuit.
Good values for hard and soft cutoffs seem to be 85% and 65%
respectively, but we should eventually justify this with observation.
When to Begin Calculation When to Begin Calculation
The number of circuits to observe (NCIRCUITS_TO_CUTOFF) before The number of circuits to observe (NCIRCUITS_TO_CUTOFF) before
@ -87,9 +125,6 @@ Implementation
Also, in the event of network failure, the observation mechanism Also, in the event of network failure, the observation mechanism
should stop collecting timeout data. should stop collecting timeout data.
Circuits that timeout will be destroyed, as this indicates one
or more of their respective nodes are currently overloaded.
Client Hints Client Hints
Some research still needs to be done to provide initial values Some research still needs to be done to provide initial values