Update proposal to match implementation.

2024-12-14 05:58:33 +00:00 · 2009-09-16 17:03:54 -07:00 · 2009-09-16 17:03:54 -07:00 · 84e82f5cc0
commit 84e82f5cc0
parent 02b36f5bee
1 changed file with 42 additions and 43 deletions
--- a/proposals/151-path-selection-improvements.txt
+++ b/proposals/151-path-selection-improvements.txt
@@ -20,7 +20,7 @@ Motivation
 Implementation
-  Storing Build Times
+  Gathering Build Times
    Circuit build times are stored in the circular array
    'circuit_build_times' consisting of uint32_t elements as milliseconds.
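As an illustration of the circular array described in this hunk, here is a minimal Python sketch. The class and method names are hypothetical and are not taken from the Tor source; the default capacity of 5000 mirrors the NCIRCUITS_TO_OBSERVE value quoted elsewhere in this diff, and the clamp stands in for the uint32_t element type.

```python
class BuildTimeRing:
    """Fixed-size circular buffer of circuit build times in milliseconds.

    Hypothetical sketch: names are illustrative, not Tor's own.
    """

    def __init__(self, capacity=5000):  # 5000 = NCIRCUITS_TO_OBSERVE per the proposal text
        self.capacity = capacity
        self.times_ms = [0] * capacity
        self.next_index = 0  # slot that the next sample overwrites
        self.count = 0       # number of valid samples, never above capacity

    def add(self, build_time_ms):
        # Clamp to the uint32_t range used by the array elements.
        self.times_ms[self.next_index] = min(int(build_time_ms), 0xFFFFFFFF)
        self.next_index = (self.next_index + 1) % self.capacity
        self.count = min(self.count + 1, self.capacity)

    def samples(self):
        # Valid samples in storage order (not chronological once wrapped).
        return self.times_ms[:self.count]
```

Once the buffer is full, each new build time overwrites the oldest one, so the array always holds the most recent observations.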
@@ -30,8 +30,16 @@ Implementation
    too large, because it will make it difficult for clients to adapt to
    moving between different links.
-    From our observations, this value appears to be on the order of 1000,
+    From our observations, the minimum value for a reasonable fit appears
-    but is configurable in a #define NCIRCUITS_TO_OBSERVE.
+    to be on the order of 500 (MIN_CIRCUITS_TO_OBSERVE). However, to keep
    a good fit over the long term, we store 5000 most recent circuits in
    the array (NCIRCUITS_TO_OBSERVE).
    The Tor client will build test circuits at a rate of one per
    minute (BUILD_TIMES_TEST_FREQUENCY) up to the point of
    MIN_CIRCUITS_TO_OBSERVE. This allows a fresh Tor to have
    a CircuitBuildTimeout estimated within 8 hours after install,
    upgrade, or network change (see below).
  Long Term Storage
@@ -43,9 +51,9 @@ Implementation
    Example:
    TotalBuildTimes 100
-    CircuitBuildTimeBin 0 50
+    CircuitBuildTimeBin 25 50
-    CircuitBuildTimeBin 50 25
+    CircuitBuildTimeBin 75 25
-    CircuitBuildTimeBin 100 13
+    CircuitBuildTimeBin 125 13
    ...
    Reading the histogram in will entail inserting <count> values
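The change from bin labels 0, 50, 100 to 25, 75, 125 suggests that each bin is now labelled by its midpoint. The Python sketch below assumes a 50 ms bin width (inferred from the example, not stated in this hunk) and assumes that reading a bin back inserts its midpoint <count> times; the function names are hypothetical.

```python
from collections import Counter

BIN_WIDTH_MS = 50  # assumption inferred from the example midpoints 25, 75, 125


def to_histogram(times_ms):
    """Count integer build times per bin, keyed by the bin midpoint."""
    counts = Counter(
        (t // BIN_WIDTH_MS) * BIN_WIDTH_MS + BIN_WIDTH_MS // 2 for t in times_ms
    )
    return sorted(counts.items())


def format_state_lines(times_ms):
    """Render the histogram in the state-file layout shown in the example."""
    lines = ["TotalBuildTimes %d" % len(times_ms)]
    lines += ["CircuitBuildTimeBin %d %d" % (mid, n) for mid, n in to_histogram(times_ms)]
    return lines


def from_histogram(bins):
    """Expand (midpoint, count) pairs back into a list of build times."""
    times = []
    for midpoint, count in bins:
        times.extend([midpoint] * count)
    return times
```

Round-tripping through the histogram loses resolution within each bin, which is the trade-off for a compact state file.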
@@ -57,7 +65,12 @@ Implementation
  Learning the CircuitBuildTimeout
    Based on studies of build times, we found that the distribution of
-    circuit buildtimes appears to be a Pareto distribution.
+    circuit buildtimes appears to be a Frechet distribution. However,
    estimators and quantile functions of the Frechet distribution are
    difficult to work with and slow to converge. So instead, since we
    are only interested in the accuracy of the tail, we approximate
    the tail of the distribution with a Pareto curve starting at
    the mode of the circuit build time sample set.
    We will calculate the parameters for a Pareto distribution
    fitting the data using the estimators at
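The estimators the proposal refers to are not visible in this hunk. As a rough illustration only, the sketch below uses the textbook maximum-likelihood estimator for the Pareto shape parameter with the scale fixed at the sample mode, together with the standard Pareto quantile function. Tor's actual estimator may differ, and all function names here are hypothetical.

```python
import math
from collections import Counter


def sample_mode(samples):
    """Most frequent value in the sample set (e.g. of binned build times)."""
    return Counter(samples).most_common(1)[0][0]


def pareto_alpha_mle(samples, xm):
    """Textbook MLE of the Pareto shape, using only samples at or above xm."""
    tail = [x for x in samples if x >= xm]
    log_sum = sum(math.log(x / xm) for x in tail)
    if log_sum <= 0:
        raise ValueError("need at least one sample strictly above xm")
    return len(tail) / log_sum


def pareto_quantile(xm, alpha, q):
    """Inverse CDF: the value below which a fraction q of the mass lies."""
    return xm / (1.0 - q) ** (1.0 / alpha)
```

Under these assumptions, a timeout cutoff at a chosen quantile q would be pareto_quantile(xm, alpha, q), with xm taken from sample_mode and alpha from pareto_alpha_mle.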
@@ -73,11 +86,8 @@ Implementation
  Detecting Changing Network Conditions
-    We attempt to detect both network connectivty loss and drastic
+    We attempt to detect both network connectivity loss and drastic
-    changes in the timeout characteristics. Network connectivity loss
+    changes in the timeout characteristics.
    is detected by recording a timestamp every time Tor either completes
    a TLS connection or receives a cell. If this timestamp is more than
    90 seconds in the past, circuit timeouts are no longer counted.
    If more than MAX_RECENT_TIMEOUT_RATE (80%) of the past
    RECENT_CIRCUITS (20) time out, we assume the network connection
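The recent-timeout check can be sketched as a sliding window over the last RECENT_CIRCUITS outcomes. The constants come from the text above; the class name, the method names, and the choice to wait for a full window before deciding are assumptions made for this example. What Tor does after the condition fires is outside this hunk and is not modelled.

```python
from collections import deque

RECENT_CIRCUITS = 20            # window size, from the proposal text
MAX_RECENT_TIMEOUT_RATE = 0.80  # threshold, from the proposal text


class RecentTimeouts:
    """Hypothetical tracker for the outcomes of the most recent circuits."""

    def __init__(self):
        # A bounded deque drops the oldest outcome automatically.
        self.outcomes = deque(maxlen=RECENT_CIRCUITS)

    def record(self, timed_out):
        self.outcomes.append(bool(timed_out))

    def network_changed(self):
        # Assumption: make no decision until the window is full.
        if len(self.outcomes) < RECENT_CIRCUITS:
            return False
        return sum(self.outcomes) / RECENT_CIRCUITS > MAX_RECENT_TIMEOUT_RATE
```

With a window of 20 and a strict "more than 80%" test, 16 timeouts do not trigger the condition but 17 do.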
@@ -86,6 +96,11 @@ Implementation
    position on the Pareto Quartile function for the ratio of
    timeouts.
    Network connectivity loss is detected by recording a timestamp every
    time Tor either completes a TLS connection or receives a cell. If
    this timestamp is more than CircuitBuildTimeout*RECENT_CIRCUITS/3
    seconds in the past, circuit timeouts are no longer counted.
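A small sketch of the connectivity-loss rule in the paragraph above. The function name and argument names are hypothetical, timestamps are plain seconds, and the 60-second timeout in the usage note is an illustrative value rather than a Tor default.

```python
RECENT_CIRCUITS = 20  # from the proposal text


def timeouts_are_counted(now, last_activity, circuit_build_timeout):
    """Return True while circuit timeouts should still be recorded.

    last_activity is the time of the most recent completed TLS connection
    or received cell; all arguments are in seconds.
    """
    silence_limit = circuit_build_timeout * RECENT_CIRCUITS / 3.0
    return (now - last_activity) <= silence_limit
```

For a CircuitBuildTimeout of 60 seconds the limit works out to 60 * 20 / 3 = 400 seconds, so the window scales with the learned timeout instead of the fixed 90 seconds used before this change.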
  Testing
    After circuit build times, storage, and learning are implemented,
@@ -96,7 +111,18 @@ Implementation
    the python produces matches that which is output to the state file in Tor,
    and verify that the Pareto parameters and cutoff points also match.
-  Soft timeout vs Hard Timeout
+    We will also verify that there are no unexpected large deviations from
    node selection, such as nodes from distant geographical locations being
    completely excluded.
  Dealing with Timeouts
    Timeouts should be counted as the expectation of the region of
    of the Pareto distribution beyond the cutoff. This is done by
    generating a random sample for each timeout at points on the
    curve beyond the current timeout cutoff.
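One way to generate such samples is inverse-transform sampling. A Pareto distribution conditioned on exceeding a cutoff at or above its scale is again Pareto, with the same shape and the cutoff as its new scale, so a sample beyond the cutoff can be drawn directly. The sketch below assumes the shape parameter alpha has already been fitted; the function name is hypothetical and this is not necessarily how Tor draws its samples.

```python
import random


def sample_beyond_cutoff(cutoff_ms, alpha, rng):
    """Draw one synthetic build time from the Pareto tail above cutoff_ms."""
    v = rng.random()  # uniform in [0, 1), so 1 - v lies in (0, 1]
    return cutoff_ms / (1.0 - v) ** (1.0 / alpha)


# Usage: replace each observed timeout with a value drawn from the tail.
rng = random.Random(0)  # fixed seed only to make the example repeatable
synthetic_timeouts = [sample_beyond_cutoff(2000.0, 1.5, rng) for _ in range(5)]
```

For alpha greater than 1, the mean of these samples tends toward alpha * cutoff / (alpha - 1), which is the expectation of the region beyond the cutoff.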
  Future Work
    At some point, it may be desirable to change the cutoff from a
    single hard cutoff that destroys the circuit to a soft cutoff and
@@ -104,36 +130,9 @@ Implementation
    of a new circuit, and the hard cutoff triggers destruction of the
    circuit.
-    Good values for hard and soft cutoffs seem to be 80% and 60%
+    It may also be beneficial to learn separate timeouts for each
-    respectively, but we should eventually justify this with observation.
+    guard node, as they will have slightly different distributions.
-
+    This will take longer to generate initial values though.
  When to Begin Calculation
    The number of circuits to observe (NCIRCUITS_TO_CUTOFF) before
    changing the CircuitBuildTimeout will be tunable via a #define. From
    our measurements, a good value for NCIRCUITS_TO_CUTOFF appears to be
    on the order of 100.
  Dealing with Timeouts
    Timeouts should be counted as the expectation of the region of
    of the Pareto distribution beyond the cutoff. The proposal will
    be updated with this value soon.
    Also, in the event of network failure, the observation mechanism
    should stop collecting timeout data.
  Client Hints
    Some research still needs to be done to provide initial values
    for CircuitBuildTimeout based on values learned from modem
    users, DSL users, Cable Modem users, and dedicated links. A
    radiobutton in Vidalia should eventually be provided that
    sets CircuitBuildTimeout to one of these values and also
    provide the option of purging all learned data, should any exist.
    These values can either be published in the directory, or
    shipped hardcoded for a particular Tor version.
 Issues