mirror of
https://github.com/torproject/torspec.git
synced 2024-11-30 13:20:37 +00:00
861 lines
37 KiB
Plaintext
861 lines
37 KiB
Plaintext
|
|
Tor Path Specification
|
|
|
|
Roger Dingledine
|
|
Nick Mathewson
|
|
|
|
Note: This is an attempt to specify Tor as currently implemented. Future
|
|
versions of Tor will implement improved algorithms.
|
|
|
|
This document tries to cover how Tor chooses to build circuits and assign
|
|
streams to circuits. Other implementations MAY take other approaches, but
|
|
implementors should be aware of the anonymity and load-balancing implications
|
|
of their choices.
|
|
|
|
THIS SPEC ISN'T DONE YET.
|
|
|
|
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
|
|
NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
|
|
"OPTIONAL" in this document are to be interpreted as described in
|
|
RFC 2119.
|
|
|
|
1. General operation
|
|
|
|
Tor begins building circuits as soon as it has enough directory
|
|
information to do so (see section 5 of dir-spec.txt). Some circuits are
|
|
built preemptively because we expect to need them later (for user
|
|
traffic), and some are built because of immediate need (for user traffic
|
|
that no current circuit can handle, for testing the network or our
|
|
reachability, and so on).
|
|
|
|
[Newer versions of Tor (0.2.6.2-alpha and later):
|
|
If the consensus contains Exits (the typical case), Tor will build both
|
|
exit and internal circuits. When bootstrap completes, Tor will be ready
|
|
to handle an application requesting an exit circuit to services like the
|
|
World Wide Web.
|
|
|
|
If the consensus does not contain Exits, Tor will only build internal
|
|
circuits. In this case, earlier statuses will have included "internal"
|
|
as indicated above. When bootstrap completes, Tor will be ready to handle
|
|
an application requesting an internal circuit to hidden services at
|
|
".onion" addresses.
|
|
|
|
If a future consensus contains Exits, exit circuits may become available.]
|
|
|
|
When a client application creates a new stream (by opening a SOCKS
|
|
connection or launching a resolve request), we attach it to an appropriate
|
|
open circuit if one exists, or wait if an appropriate circuit is
|
|
in-progress. We launch a new circuit only
|
|
if no current circuit can handle the request. We rotate circuits over
|
|
time to avoid some profiling attacks.
|
|
|
|
To build a circuit, we choose all the nodes we want to use, and then
|
|
construct the circuit. Sometimes, when we want a circuit that ends at a
|
|
given hop, and we have an appropriate unused circuit, we "cannibalize" the
|
|
existing circuit and extend it to the new terminus.
|
|
|
|
These processes are described in more detail below.
|
|
|
|
This document describes Tor's automatic path selection logic only; path
|
|
selection can be overridden by a controller (with the EXTENDCIRCUIT and
|
|
ATTACHSTREAM commands). Paths constructed through these means may
|
|
violate some constraints given below.
|
|
|
|
1.1. Terminology
|
|
|
|
A "path" is an ordered sequence of nodes, not yet built as a circuit.
|
|
|
|
A "clean" circuit is one that has not yet been used for any traffic.
|
|
|
|
A "fast" or "stable" or "valid" node is one that has the 'Fast' or
|
|
'Stable' or 'Valid' flag
|
|
set respectively, based on our current directory information. A "fast"
|
|
or "stable" circuit is one consisting only of "fast" or "stable" nodes.
|
|
|
|
In an "exit" circuit, the final node is chosen based on waiting stream
|
|
requests if any, and in any case it avoids nodes with exit policy of
|
|
"reject *:*". An "internal" circuit, on the other hand, is one where
|
|
the final node is chosen just like a middle node (ignoring its exit
|
|
policy).
|
|
|
|
A "request" is a client-side stream or DNS resolve that needs to be
|
|
served by a circuit.
|
|
|
|
A "pending" circuit is one that we have started to build, but which has
|
|
not yet completed.
|
|
|
|
A circuit or path "supports" a request if it is okay to use the
|
|
circuit/path to fulfill the request, according to the rules given below.
|
|
A circuit or path "might support" a request if some aspect of the request
|
|
is unknown (usually its target IP), but we believe the path probably
|
|
supports the request according to the rules given below.
|
|
|
|
1.1. A relay's bandwidth
|
|
|
|
Old versions of Tor did not report bandwidths in network status
|
|
documents, so clients had to learn them from the routers' advertised
|
|
relay descriptors.
|
|
|
|
For versions of Tor prior to 0.2.1.17-rc, everywhere below where we
|
|
refer to a relay's "bandwidth", we mean its clipped advertised
|
|
bandwidth, computed by taking the smaller of the 'rate' and
|
|
'observed' arguments to the "bandwidth" element in the relay's
|
|
descriptor. If a router's advertised bandwidth is greater than
|
|
MAX_BELIEVABLE_BANDWIDTH (currently 10 MB/s), we clipped to that
|
|
value.
|
|
|
|
For more recent versions of Tor, we take the bandwidth value declared
|
|
in the consensus, and fall back to the clipped advertised bandwidth
|
|
only if the consensus does not have bandwidths listed.
|
|
|
|
2. Building circuits
|
|
|
|
2.1. When we build
|
|
|
|
2.1.1. Clients build circuits preemptively
|
|
|
|
When running as a client, Tor tries to maintain at least a certain
|
|
number of clean circuits, so that new streams can be handled
|
|
quickly. To increase the likelihood of success, Tor tries to
|
|
predict what circuits will be useful by choosing from among nodes
|
|
that support the ports we have used in the recent past (by default
|
|
one hour). Specifically, on startup Tor tries to maintain one clean
|
|
fast exit circuit that allows connections to port 80, and at least
|
|
two fast clean stable internal circuits in case we get a resolve
|
|
request or hidden service request (at least three if we _run_ a
|
|
hidden service).
|
|
|
|
After that, Tor will adapt the circuits that it preemptively builds
|
|
based on the requests it sees from the user: it tries to have two fast
|
|
clean exit circuits available for every port seen within the past hour
|
|
(each circuit can be adequate for many predicted ports -- it doesn't
|
|
need two separate circuits for each port), and it tries to have the
|
|
above internal circuits available if we've seen resolves or hidden
|
|
service activity within the past hour. If there are 12 or more clean
|
|
circuits open, it doesn't open more even if it has more predictions.
|
|
|
|
Only stable circuits can "cover" a port that is listed in the
|
|
LongLivedPorts config option. Similarly, hidden service requests
|
|
to ports listed in LongLivedPorts make us create stable internal
|
|
circuits.
|
|
|
|
Note that if there are no requests from the user for an hour, Tor
|
|
will predict no use and build no preemptive circuits.
|
|
|
|
The Tor client SHOULD NOT store its list of predicted requests to a
|
|
persistent medium.
|
|
|
|
2.1.2. Clients build circuits on demand
|
|
|
|
Additionally, when a client request exists that no circuit (built or
|
|
pending) might support, we create a new circuit to support the request.
|
|
For exit connections, we pick an exit node that will handle the
|
|
most pending requests (choosing arbitrarily among ties), launch a
|
|
circuit to end there, and repeat until every unattached request
|
|
might be supported by a pending or built circuit. For internal
|
|
circuits, we pick an arbitrary acceptable path, repeating as needed.
|
|
|
|
In some cases we can reuse an already established circuit if it's
|
|
clean; see Section 2.3 (cannibalizing circuits) for details.
|
|
|
|
2.1.3. Relays build circuits for testing reachability and bandwidth
|
|
|
|
Tor relays test reachability of their ORPort once they have
|
|
successfully built a circuit (on startup and whenever their IP address
|
|
changes). They build an ordinary fast internal circuit with themselves
|
|
as the last hop. As soon as any testing circuit succeeds, the Tor
|
|
relay decides it's reachable and is willing to publish a descriptor.
|
|
|
|
We launch multiple testing circuits (one at a time), until we
|
|
have NUM_PARALLEL_TESTING_CIRC (4) such circuits open. Then we
|
|
do a "bandwidth test" by sending a certain number of relay drop
|
|
cells down each circuit: BandwidthRate * 10 / CELL_NETWORK_SIZE
|
|
total cells divided across the four circuits, but never more than
|
|
CIRCWINDOW_START (1000) cells total. This exercises both outgoing and
|
|
incoming bandwidth, and helps to jumpstart the observed bandwidth
|
|
(see dir-spec.txt).
|
|
|
|
Tor relays also test reachability of their DirPort once they have
|
|
established a circuit, but they use an ordinary exit circuit for
|
|
this purpose.
|
|
|
|
2.1.4. Hidden-service circuits
|
|
|
|
See section 4 below.
|
|
|
|
2.1.5. Rate limiting of failed circuits
|
|
|
|
If we fail to build a circuit N times in a X second period (see Section
|
|
2.3 for how this works), we stop building circuits until the X seconds
|
|
have elapsed.
|
|
XXXX
|
|
|
|
2.1.6. When to tear down circuits
|
|
|
|
XXXX
|
|
|
|
|
|
2.2. Path selection and constraints
|
|
|
|
We choose the path for each new circuit before we build it. We choose the
|
|
exit node first, followed by the other nodes in the circuit. All paths
|
|
we generate obey the following constraints:
|
|
- We do not choose the same router twice for the same path.
|
|
- We do not choose any router in the same family as another in the same
|
|
path. (Two routers are in the same family if each one lists the other
|
|
in the "family" entries of its descriptor.)
|
|
- We do not choose more than one router in a given /16 subnet
|
|
(unless EnforceDistinctSubnets is 0).
|
|
- We don't choose any non-running or non-valid router unless we have
|
|
been configured to do so. By default, we are configured to allow
|
|
non-valid routers in "middle" and "rendezvous" positions.
|
|
- If we're using Guard nodes, the first node must be a Guard (see 5
|
|
below)
|
|
- XXXX Choosing the length
|
|
|
|
For "fast" circuits, we only choose nodes with the Fast flag. For
|
|
non-"fast" circuits, all nodes are eligible.
|
|
|
|
For all circuits, we weight node selection according to router bandwidth.
|
|
|
|
We also weight the bandwidth of Exit and Guard flagged nodes depending on
|
|
the fraction of total bandwidth that they make up and depending upon the
|
|
position they are being selected for.
|
|
|
|
These weights are published in the consensus, and are computed as described
|
|
in Section "Computing Bandwidth Weights" of dir-spec.txt. They are:
|
|
|
|
Wgg - Weight for Guard-flagged nodes in the guard position
|
|
Wgm - Weight for non-flagged nodes in the guard Position
|
|
Wgd - Weight for Guard+Exit-flagged nodes in the guard Position
|
|
|
|
Wmg - Weight for Guard-flagged nodes in the middle Position
|
|
Wmm - Weight for non-flagged nodes in the middle Position
|
|
Wme - Weight for Exit-flagged nodes in the middle Position
|
|
Wmd - Weight for Guard+Exit flagged nodes in the middle Position
|
|
|
|
Weg - Weight for Guard flagged nodes in the exit Position
|
|
Wem - Weight for non-flagged nodes in the exit Position
|
|
Wee - Weight for Exit-flagged nodes in the exit Position
|
|
Wed - Weight for Guard+Exit-flagged nodes in the exit Position
|
|
|
|
Wgb - Weight for BEGIN_DIR-supporting Guard-flagged nodes
|
|
Wmb - Weight for BEGIN_DIR-supporting non-flagged nodes
|
|
Web - Weight for BEGIN_DIR-supporting Exit-flagged nodes
|
|
Wdb - Weight for BEGIN_DIR-supporting Guard+Exit-flagged nodes
|
|
|
|
Wbg - Weight for Guard+Exit-flagged nodes for BEGIN_DIR requests
|
|
Wbm - Weight for Guard+Exit-flagged nodes for BEGIN_DIR requests
|
|
Wbe - Weight for Guard+Exit-flagged nodes for BEGIN_DIR requests
|
|
Wbd - Weight for Guard+Exit-flagged nodes for BEGIN_DIR requests
|
|
|
|
If any of those weights is malformed or not present in a consensus,
|
|
clients proceed with the regular path selection algorithm setting
|
|
the weights to the default value of 10000.
|
|
|
|
Additionally, we may be building circuits with one or more requests in
|
|
mind. Each kind of request puts certain constraints on paths:
|
|
|
|
- All service-side introduction circuits and all rendezvous paths
|
|
should be Stable.
|
|
- All connection requests for connections that we think will need to
|
|
stay open a long time require Stable circuits. Currently, Tor decides
|
|
this by examining the request's target port, and comparing it to a
|
|
list of "long-lived" ports. (Default: 21, 22, 706, 1863, 5050,
|
|
5190, 5222, 5223, 6667, 6697, 8300.)
|
|
- DNS resolves require an exit node whose exit policy is not equivalent
|
|
to "reject *:*".
|
|
- Reverse DNS resolves require a version of Tor with advertised eventdns
|
|
support (available in Tor 0.1.2.1-alpha-dev and later).
|
|
- All connection requests require an exit node whose exit policy
|
|
supports their target address and port (if known), or which "might
|
|
support it" (if the address isn't known). See 2.2.1.
|
|
- Rules for Fast? XXXXX
|
|
|
|
2.2.1. Choosing an exit
|
|
|
|
If we know what IP address we want to connect to or resolve, we can
|
|
trivially tell whether a given router will support it by simulating
|
|
its declared exit policy.
|
|
|
|
Because we often connect to addresses of the form hostname:port, we do not
|
|
always know the target IP address when we select an exit node. In these
|
|
cases, we need to pick an exit node that "might support" connections to a
|
|
given address port with an unknown address. An exit node "might support"
|
|
such a connection if any clause that accepts any connections to that port
|
|
precedes all clauses (if any) that reject all connections to that port.
|
|
|
|
Unless requested to do so by the user, we never choose an exit node
|
|
flagged as "BadExit" by more than half of the authorities who advertise
|
|
themselves as listing bad exits.
|
|
|
|
2.2.2. User configuration
|
|
|
|
Users can alter the default behavior for path selection with configuration
|
|
options.
|
|
|
|
- If "ExitNodes" is provided, then every request requires an exit node on
|
|
the ExitNodes list. (If a request is supported by no nodes on that list,
|
|
and StrictExitNodes is false, then Tor treats that request as if
|
|
ExitNodes were not provided.)
|
|
|
|
- "EntryNodes" and "StrictEntryNodes" behave analogously.
|
|
|
|
- If a user tries to connect to or resolve a hostname of the form
|
|
<target>.<servername>.exit, the request is rewritten to a request for
|
|
<target>, and the request is only supported by the exit whose nickname
|
|
or fingerprint is <servername>.
|
|
|
|
2.3. Cannibalizing circuits
|
|
|
|
If we need a circuit and have a clean one already established, in
|
|
some cases we can adapt the clean circuit for our new
|
|
purpose. Specifically,
|
|
|
|
For hidden service interactions, we can "cannibalize" a clean internal
|
|
circuit if one is available, so we don't need to build those circuits
|
|
from scratch on demand.
|
|
|
|
We can also cannibalize clean circuits when the client asks to exit
|
|
at a given node -- either via the ".exit" notation or because the
|
|
destination is running at the same location as an exit node.
|
|
|
|
2.4. Learning when to give up ("timeout") on circuit construction
|
|
|
|
Since version 0.2.2.8-alpha, Tor attempts to learn when to give up on
|
|
circuits based on network conditions.
|
|
|
|
2.4.1 Distribution choice and parameter estimation
|
|
|
|
Based on studies of build times, we found that the distribution of
|
|
circuit build times appears to be a Frechet distribution. However,
|
|
estimators and quantile functions of the Frechet distribution are
|
|
difficult to work with and slow to converge. So instead, since we
|
|
are only interested in the accuracy of the tail, we approximate
|
|
the tail of the distribution with a Pareto curve.
|
|
|
|
We calculate the parameters for a Pareto distribution fitting the data
|
|
using the estimators in equation 4 from:
|
|
http://portal.acm.org/citation.cfm?id=1647962.1648139
|
|
|
|
This is:
|
|
|
|
alpha_m = s/(ln(U(X)/Xm^n))
|
|
|
|
where s is the total number of completed circuits we have seen, and
|
|
|
|
U(X) = x_max^u * Prod_s{x_i}
|
|
|
|
with x_i as our i-th completed circuit time, x_max as the longest
|
|
completed circuit build time we have yet observed, u as the
|
|
number of unobserved timeouts that have no exact value recorded,
|
|
and n as u+s, the total number of circuits that either timeout or
|
|
complete.
|
|
|
|
Using log laws, we compute this as the sum of logs to avoid
|
|
overflow and ln(1.0+epsilon) precision issues:
|
|
|
|
alpha_m = s/(u*ln(x_max) + Sum_s{ln(x_i)} - n*ln(Xm))
|
|
|
|
This estimator is closely related to the parameters present in:
|
|
http://en.wikipedia.org/wiki/Pareto_distribution#Parameter_estimation
|
|
except they are adjusted to handle the fact that our samples are
|
|
right-censored at the timeout cutoff.
|
|
|
|
Additionally, because this is not a true Pareto distribution, we alter
|
|
how Xm is computed. The Xm parameter is computed as the midpoint of the most
|
|
frequently occurring 50ms histogram bin, until the point where 1000
|
|
circuits are recorded. After this point, the weighted average of the top
|
|
'cbtnummodes' (default: 3) midpoint modes is used as Xm. All times below
|
|
this value are counted as having the midpoint value of this weighted average
|
|
bin.
|
|
|
|
The timeout itself is calculated by using the Pareto Quantile function (the
|
|
inverted CDF) to give us the value on the CDF such that 80% of the mass
|
|
of the distribution is below the timeout value.
|
|
|
|
Thus, we expect that the Tor client will accept the fastest 80% of
|
|
the total number of paths on the network.
|
|
|
|
2.4.2. How much data to record
|
|
|
|
From our observations, the minimum number of circuit build times for a
|
|
reasonable fit appears to be on the order of 100. However, to keep a
|
|
good fit over the long term, we store 1000 most recent circuit build times
|
|
in a circular array.
|
|
|
|
The Tor client should build test circuits at a rate of one per
|
|
minute up until 100 circuits are built. This allows a fresh Tor to have
|
|
a CircuitBuildTimeout estimated within 1.5 hours after install,
|
|
upgrade, or network change (see below).
|
|
|
|
Timeouts are stored on disk in a histogram of 50ms bin width, the same
|
|
width used to calculate the Xm value above. This histogram must be shuffled
|
|
after being read from disk, to preserve a proper expiration of old values
|
|
after restart.
|
|
|
|
2.4.3. How to record timeouts
|
|
|
|
Circuits that pass the timeout threshold should be allowed to continue
|
|
building until a time corresponding to the point 'cbtclosequantile'
|
|
(default 95) on the Pareto curve, or 60 seconds, whichever is greater.
|
|
|
|
The actual completion times for these circuits should be recorded.
|
|
Implementations should completely abandon a circuit and record a value
|
|
as an 'unknown' timeout if the total build time exceeds this threshold.
|
|
|
|
The reason for this is that right-censored pareto estimators begin to lose
|
|
their accuracy if more than approximately 5% of the values are censored.
|
|
Since we wish to set the cutoff at 20%, we must allow circuits to continue
|
|
building past this cutoff point up to the 95th percentile.
|
|
|
|
2.4.4. Detecting Changing Network Conditions
|
|
|
|
We attempt to detect both network connectivity loss and drastic
|
|
changes in the timeout characteristics.
|
|
|
|
We assume that we've had network connectivity loss if a circuit
|
|
times out and we've received no cells or TLS handshakes since that
|
|
circuit began. We then temporarily stop counting timeouts until
|
|
network activity resumes.
|
|
|
|
To detect changing network conditions, we keep a history of
|
|
the timeout or non-timeout status of the past 20 circuits that
|
|
successfully completed at least one hop. If more than 90% of
|
|
these circuits timeout, we discard all buildtimes history, reset
|
|
the timeout to 60, and then begin recomputing the timeout.
|
|
|
|
If the timeout was already 60 or higher, we double the timeout.
|
|
|
|
2.4.5. Consensus parameters governing behavior
|
|
|
|
Clients that implement circuit build timeout learning should obey the
|
|
following consensus parameters that govern behavior, in order to allow
|
|
us to handle bugs or other emergent behaviors due to client circuit
|
|
construction. If these parameters are not present in the consensus,
|
|
the listed default values should be used instead.
|
|
|
|
cbtdisabled
|
|
Default: 0
|
|
Min: 0
|
|
Max: 1
|
|
Effect: If 1, all CircuitBuildTime learning code should be
|
|
disabled and history should be discarded. For use in
|
|
emergency situations only.
|
|
|
|
cbtnummodes
|
|
Default: 3
|
|
Min: 1
|
|
Max: 20
|
|
Effect: This value governs how many modes to use in the weighted
|
|
average calculation of Pareto parameter Xm. A value of 3 introduces
|
|
some bias (2-5% of CDF) under ideal conditions, but allows for better
|
|
performance in the event that a client chooses guard nodes of radically
|
|
different performance characteristics.
|
|
|
|
cbtrecentcount
|
|
Default: 20
|
|
Min: 3
|
|
Max: 1000
|
|
Effect: This is the number of circuit build times to keep track of
|
|
for the following option.
|
|
|
|
cbtmaxtimeouts
|
|
Default: 18
|
|
Min: 3
|
|
Max: 10000
|
|
Effect: When this many timeouts happen in the last 'cbtrecentcount'
|
|
circuit attempts, the client should discard all of its
|
|
history and begin learning a fresh timeout value.
|
|
|
|
cbtmincircs
|
|
Default: 100
|
|
Min: 1
|
|
Max: 10000
|
|
Effect: This is the minimum number of circuits to build before
|
|
computing a timeout.
|
|
|
|
cbtquantile
|
|
Default: 80
|
|
Min: 10
|
|
Max: 99
|
|
Effect: This is the position on the quantile curve to use to set the
|
|
timeout value. It is a percent (10-99).
|
|
|
|
cbtclosequantile
|
|
Default: 95
|
|
Min: Value of cbtquantile parameter
|
|
Max: 99
|
|
Effect: This is the position on the quantile curve to use to set the
|
|
timeout value to use to actually close circuits. It is a
|
|
percent (0-99).
|
|
|
|
cbttestfreq
|
|
Default: 60
|
|
Min: 1
|
|
Max: 2147483647 (INT32_MAX)
|
|
Effect: Describes how often in seconds to build a test circuit to
|
|
gather timeout values. Only applies if less than 'cbtmincircs'
|
|
have been recorded.
|
|
|
|
cbtmintimeout
|
|
Default: 2000
|
|
Min: 500
|
|
Max: 2147483647 (INT32_MAX)
|
|
Effect: This is the minimum allowed timeout value in milliseconds.
|
|
The minimum is to prevent rounding to 0 (we only check once
|
|
per second).
|
|
|
|
cbtinitialtimeout
|
|
Default: 60000
|
|
Min: Value of cbtmintimeout
|
|
Max: 2147483647 (INT32_MAX)
|
|
Effect: This is the timeout value to use before computing a timeout,
|
|
in milliseconds.
|
|
|
|
|
|
2.5. Handling failure
|
|
|
|
If an attempt to extend a circuit fails (either because the first create
|
|
failed or a subsequent extend failed) then the circuit is torn down and is
|
|
no longer pending. (XXXX really?) Requests that might have been
|
|
supported by the pending circuit thus become unsupported, and a new
|
|
circuit needs to be constructed.
|
|
|
|
If a stream "begin" attempt fails with an EXITPOLICY error, we
|
|
decide that the exit node's exit policy is not correctly advertised,
|
|
so we treat the exit node as if it were a non-exit until we retrieve
|
|
a fresh descriptor for it.
|
|
|
|
Excessive amounts of either type of failure can indicate an
|
|
attack on anonymity. See section 7 for how excessive failure is handled.
|
|
|
|
3. Attaching streams to circuits
|
|
|
|
When a circuit that might support a request is built, Tor tries to attach
|
|
the request's stream to the circuit and sends a BEGIN, BEGIN_DIR,
|
|
or RESOLVE relay
|
|
cell as appropriate. If the request completes unsuccessfully, Tor
|
|
considers the reason given in the CLOSE relay cell. [XXX yes, and?]
|
|
|
|
|
|
After a request has remained unattached for SocksTimeout (2 minutes
|
|
by default), Tor abandons the attempt and signals an error to the
|
|
client as appropriate (e.g., by closing the SOCKS connection).
|
|
|
|
XXX Timeouts and when Tor auto-retries.
|
|
* What stream-end-reasons are appropriate for retrying.
|
|
|
|
If no reply to BEGIN/RESOLVE, then the stream will timeout and fail.
|
|
|
|
4. Hidden-service related circuits
|
|
|
|
XXX Tracking expected hidden service use (client-side and hidserv-side)
|
|
|
|
5. Guard nodes
|
|
|
|
We use Guard nodes (also called "helper nodes" in the literature) to
|
|
prevent certain profiling attacks. Here's the risk: if we choose entry and
|
|
exit nodes at random, and an attacker controls C out of N relays
|
|
(ignoring bandwidth), then the
|
|
attacker will control the entry and exit node of any given circuit with
|
|
probability (C/N)^2. But as we make many different circuits over time,
|
|
then the probability that the attacker will see a sample of about (C/N)^2
|
|
of our traffic goes to 1. Since statistical sampling works, the attacker
|
|
can be sure of learning a profile of our behavior.
|
|
|
|
If, on the other hand, we picked an entry node and held it fixed, we would
|
|
have probability C/N of choosing a bad entry and being profiled, and
|
|
probability (N-C)/N of choosing a good entry and not being profiled.
|
|
|
|
When guard nodes are enabled, Tor maintains an ordered list of entry nodes
|
|
as our chosen guards, and stores this list persistently to disk. If a Guard
|
|
node becomes unusable, rather than replacing it, Tor adds new guards to the
|
|
end of the list. When choosing the first hop of a circuit, Tor
|
|
chooses at
|
|
random from among the first NumEntryGuards (default 3) usable guards on the
|
|
list. If there are not at least 2 usable guards on the list, Tor adds
|
|
routers until there are, or until there are no more usable routers to add.
|
|
|
|
A guard is unusable if any of the following hold:
|
|
- it is not marked as a Guard by the networkstatuses,
|
|
- it is not marked Valid (and the user hasn't set AllowInvalid entry)
|
|
- it is not marked Running
|
|
- Tor couldn't reach it the last time it tried to connect
|
|
|
|
A guard is unusable for a particular circuit if any of the rules for path
|
|
selection in 2.2 are not met. In particular, if the circuit is "fast"
|
|
and the guard is not Fast, or if the circuit is "stable" and the guard is
|
|
not Stable, or if the guard has already been chosen as the exit node in
|
|
that circuit, Tor can't use it as a guard node for that circuit.
|
|
|
|
If the guard is excluded because of its status in the networkstatuses for
|
|
over 30 days, Tor removes it from the list entirely, preserving order.
|
|
|
|
If Tor fails to connect to an otherwise usable guard, it retries
|
|
periodically: every hour for six hours, every 4 hours for 3 days, every
|
|
18 hours for a week, and every 36 hours thereafter. Additionally, Tor
|
|
retries unreachable guards the first time it adds a new guard to the list,
|
|
since it is possible that the old guards were only marked as unreachable
|
|
because the network was unreachable or down.
|
|
|
|
Tor does not add a guard persistently to the list until the first time we
|
|
have connected to it successfully.
|
|
|
|
6. Router descriptor purposes
|
|
|
|
There are currently three "purposes" supported for router descriptors:
|
|
general, controller, and bridge. Most descriptors are of type general
|
|
-- these are the ones listed in the consensus, and the ones fetched
|
|
and used in normal cases.
|
|
|
|
Controller-purpose descriptors are those delivered by the controller
|
|
and labelled as such: they will be kept around (and expire like
|
|
normal descriptors), and they can be used by the controller in its
|
|
CIRCUITEXTEND commands. Otherwise they are ignored by Tor when it
|
|
chooses paths.
|
|
|
|
Bridge-purpose descriptors are for routers that are used as bridges. See
|
|
doc/design-paper/blocking.pdf for more design explanation, or proposal
|
|
125 for specific details. Currently bridge descriptors are used in place
|
|
of normal entry guards, for Tor clients that have UseBridges enabled.
|
|
|
|
7. Detecting route manipulation by Guard nodes (Path Bias)
|
|
|
|
The Path Bias defense is designed to defend against a type of route
|
|
capture where malicious Guard nodes deliberately fail or choke circuits
|
|
that extend to non-colluding Exit nodes to maximize their network
|
|
utilization in favor of carrying only compromised traffic.
|
|
|
|
In the extreme, the attack allows an adversary that carries c/n
|
|
of the network capacity to deanonymize c/n of the network
|
|
connections, breaking the O((c/n)^2) property of Tor's original
|
|
threat model. It also allows targeted attacks aimed at monitoring
|
|
the activity of specific users, bridges, or Guard nodes.
|
|
|
|
There are two points where path selection can be manipulated:
|
|
during construction, and during usage. Circuit construction
|
|
can be manipulated by inducing circuit failures during circuit
|
|
extend steps, which causes the Tor client to transparently retry
|
|
the circuit construction with a new path. Circuit usage can be
|
|
manipulated by abusing the stream retry features of Tor (for
|
|
example by withholding stream attempt responses from the client
|
|
until the stream timeout has expired), at which point the tor client
|
|
will also transparently retry the stream on a new path.
|
|
|
|
The defense as deployed therefore makes two independent sets of
|
|
measurements of successful path use: one during circuit construction,
|
|
and one during circuit usage.
|
|
|
|
The intended behavior is for clients to ultimately disable the use
|
|
of Guards responsible for excessive circuit failure of either type
|
|
(see section 7.4); however known issues with the Tor network currently
|
|
restrict the defense to being informational only at this stage (see
|
|
section 7.5).
|
|
|
|
7.1. Measuring path construction success rates
|
|
|
|
Clients maintain two counts for each of their guards: a count of the
|
|
number of times a circuit was extended to at least two hops through that
|
|
guard, and a count of the number of circuits that successfully complete
|
|
through that guard. The ratio of these two numbers is used to determine
|
|
a circuit success rate for that Guard.
|
|
|
|
Circuit build timeouts are counted as construction failures if the
|
|
circuit fails to complete before the 95% "right-censored" timeout
|
|
interval, not the 80% timeout condition (see section 2.4).
|
|
|
|
If a circuit closes prematurely after construction but before being
|
|
requested to close by the client, this is counted as a failure.
|
|
|
|
7.2. Measuring path usage success rates
|
|
|
|
Clients maintain two usage counts for each of their guards: a count
|
|
of the number of usage attempts, and a count of the number of
|
|
successful usages.
|
|
|
|
A usage attempt means any attempt to attach a stream to a circuit.
|
|
|
|
Usage success status is temporarily recorded by state flags on circuits.
|
|
Guard usage success counts are not incremented until circuit close. A
|
|
circuit is marked as successfully used if we receive a properly
|
|
recognized RELAY cell on that circuit that was expected for the current
|
|
circuit purpose.
|
|
|
|
If subsequent stream attachments fail or time out, the successfully used
|
|
state of the circuit is cleared, causing it once again to be regarded
|
|
as a usage attempt only.
|
|
|
|
Upon close by the client, all circuits that are still marked as usage
|
|
attempts are probed using a RELAY_BEGIN cell constructed with a
|
|
destination of the form 0.a.b.c:25, where a.b.c is a 24 bit random
|
|
nonce. If we get a RELAY_COMMAND_END in response matching our nonce,
|
|
the circuit is counted as successfully used.
|
|
|
|
If any unrecognized RELAY cells arrive after the probe has been sent,
|
|
the circuit is counted as a usage failure.
|
|
|
|
If the stream failure reason codes DESTROY, TORPROTOCOL, or INTERNAL
|
|
are received in response to any stream attempt, such circuits are not
|
|
probed and are declared usage failures.
|
|
|
|
Prematurely closed circuits are not probed, and are counted as usage
|
|
failures.
|
|
|
|
7.3. Scaling success counts
|
|
|
|
To provide a moving average of recent Guard activity while
|
|
still preserving the ability to verify correctness, we periodically
|
|
"scale" the success counts by multiplying them by a scale factor
|
|
between 0 and 1.0.
|
|
|
|
Scaling is performed when either usage or construction attempt counts
|
|
exceed a parametrized value.
|
|
|
|
To avoid error due to scaling during circuit construction and use,
|
|
currently open circuits are subtracted from the usage counts before
|
|
scaling, and added back after scaling.
|
|
|
|
7.4. Parametrization
|
|
|
|
The following consensus parameters tune various aspects of the
|
|
defense.
|
|
|
|
pb_mincircs
|
|
Default: 150
|
|
Min: 5
|
|
Effect: This is the minimum number of circuits that must complete
|
|
at least 2 hops before we begin evaluating construction rates.
|
|
|
|
|
|
pb_noticepct
|
|
Default: 70
|
|
Min: 0
|
|
Max: 100
|
|
Effect: If the circuit success rate falls below this percentage,
|
|
we emit a notice log message.
|
|
|
|
pb_warnpct
|
|
Default: 50
|
|
Min: 0
|
|
Max: 100
|
|
Effect: If the circuit success rate falls below this percentage,
|
|
we emit a warn log message.
|
|
|
|
pb_extremepct
|
|
Default: 30
|
|
Min: 0
|
|
Max: 100
|
|
Effect: If the circuit success rate falls below this percentage,
|
|
we emit a more alarmist warning log message. If
|
|
pb_dropguard is set to 1, we also disable the use of the
|
|
guard.
|
|
|
|
pb_dropguards
|
|
Default: 0
|
|
Min: 0
|
|
Max: 1
|
|
Effect: If the circuit success rate falls below pb_extremepct,
|
|
when pb_dropguard is set to 1, we disable use of that
|
|
guard.
|
|
|
|
pb_scalecircs
|
|
Default: 300
|
|
Min: 10
|
|
Effect: After this many circuits have completed at least two hops,
|
|
Tor performs the scaling described in Section 7.3.
|
|
|
|
pb_multfactor and pb_scalefactor
|
|
Default: 1/2
|
|
Min: 0.0
|
|
Max: 1.0
|
|
Effect: The double-precision result obtained from
|
|
pb_multfactor/pb_scalefactor is multiplied by our current
|
|
counts to scale them.
|
|
|
|
pb_minuse
|
|
Default: 20
|
|
Min: 3
|
|
Effect: This is the minimum number of circuits that we must attempt to
|
|
use before we begin evaluating construction rates.
|
|
|
|
pb_noticeusepct
|
|
Default: 80
|
|
Min: 3
|
|
Effect: If the circuit usage success rate falls below this percentage,
|
|
we emit a notice log message.
|
|
|
|
pb_extremeusepct
|
|
Default: 60
|
|
Min: 3
|
|
Effect: If the circuit usage success rate falls below this percentage,
|
|
we emit a warning log message. We also disable the use of the
|
|
guard if pb_dropguards is set.
|
|
|
|
pb_scaleuse
|
|
Default: 100
|
|
Min: 10
|
|
Effect: After we have attempted to use this many circuits,
|
|
Tor performs the scaling described in Section 7.3.
|
|
|
|
7.5. Known barriers to enforcement
|
|
|
|
Due to intermittent CPU overload at relays, the normal rate of
|
|
successful circuit completion is highly variable. The Guard-dropping
|
|
version of the defense is unlikely to be deployed until the ntor
|
|
circuit handshake is enabled, or the nature of CPU overload induced
|
|
failure is better understood.
|
|
|
|
|
|
|
|
X. Old notes
|
|
|
|
X.1. Do we actually do this?
|
|
|
|
How to deal with network down.
|
|
- While all helpers are down/unreachable and there are no established
|
|
or on-the-way testing circuits, launch a testing circuit. (Do this
|
|
periodically in the same way we try to establish normal circuits
|
|
when things are working normally.)
|
|
(Testing circuits are a special type of circuit, that streams won't
|
|
attach to by accident.)
|
|
- When a testing circuit succeeds, mark all helpers up and hold
|
|
the testing circuit open.
|
|
- If a connection to a helper succeeds, close all testing circuits.
|
|
Else mark that helper down and try another.
|
|
- If the last helper is marked down and we already have a testing
|
|
circuit established, then add the first hop of that testing circuit
|
|
to the end of our helper node list, close that testing circuit,
|
|
and go back to square one. (Actually, rather than closing the
|
|
testing circuit, can we get away with converting it to a normal
|
|
circuit and beginning to use it immediately?)
|
|
|
|
[Do we actually do any of the above? If so, let's spec it. If not, let's
|
|
remove it. -NM]
|
|
|
|
X.2. A thing we could do to deal with reachability.
|
|
|
|
And as a bonus, it leads to an answer to Nick's attack ("If I pick
|
|
my helper nodes all on 18.0.0.0:*, then I move, you'll know where I
|
|
bootstrapped") -- the answer is to pick your original three helper nodes
|
|
without regard for reachability. Then the above algorithm will add some
|
|
more that are reachable for you, and if you move somewhere, it's more
|
|
likely (though not certain) that some of the originals will become useful.
|
|
Is that smart or just complex?
|
|
|
|
X.3. Some stuff that worries me about entry guards. 2006 Jun, Nickm.
|
|
|
|
It is unlikely for two users to have the same set of entry guards.
|
|
Observing a user is sufficient to learn its entry guards. So, as we move
|
|
around, entry guards make us linkable. If we want to change guards when
|
|
our location (IP? subnet?) changes, we have two bad options. We could
|
|
- Drop the old guards. But if we go back to our old location,
|
|
we'll not use our old guards. For a laptop that sometimes gets used
|
|
from work and sometimes from home, this is pretty fatal.
|
|
- Remember the old guards as associated with the old location, and use
|
|
them again if we ever go back to the old location. This would be
|
|
nasty, since it would force us to record where we've been.
|
|
|
|
[Do we do any of this now? If not, this should move into 099-misc or
|
|
098-todo. -NM]
|