mirror of
https://github.com/torproject/torspec.git
synced 2024-12-12 04:35:37 +00:00
Filename: 268-guard-selection.txt
Title: New Guard Selection Behaviour
Author: Isis Lovecruft, George Kadianakis, [Ola Bini]
Created: 2015-10-28
Status: Obsolete

(Editorial note: this was originally written as a revision of
proposal 259, but it diverges so substantially that it seemed
better to assign it a new number for reference, so that we
aren't always talking about "The old 259" and "the new 259". -NM)

This proposal has been obsoleted by proposal #271.

§1. Overview

Tor uses entry guards to prevent an attacker who controls some
fraction of the network from observing a fraction of every user's
traffic. If users chose their entries and exits uniformly at
random from the list of servers every time they built a circuit,
then an adversary who had (k/N) of the network would deanonymize
F=(k/N)^2 of all circuits... and after a given user had built C
circuits, the attacker would see them at least once with
probability 1-(1-F)^C. With large C, the attacker would get a
sample of every user's traffic with probability approaching 1.
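
The arithmetic above can be checked with a short sketch. The function
names are illustrative only, and the values chosen for k, N and C are
assumptions for the example, not measurements of the real network.

```python
def fraction_compromised(k: int, n: int) -> float:
    """F = (k/N)^2: fraction of circuits whose entry and exit are both hostile."""
    return (k / n) ** 2


def prob_observed(k: int, n: int, circuits: int) -> float:
    """1 - (1 - F)^C: chance that at least one of C circuits is compromised."""
    f = fraction_compromised(k, n)
    return 1.0 - (1.0 - f) ** circuits


if __name__ == "__main__":
    # Assumed example: the adversary runs 100 of 1000 relays (k/N = 0.1).
    for c in (1, 10, 100, 1000):
        print(f"C={c:5d}  P(observed)={prob_observed(100, 1000, c):.4f}")
```

Even a small per-circuit fraction F compounds quickly as C grows, which
is the motivation for pinning a small set of guards.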

To prevent this from happening, Tor clients choose a small number of
guard nodes (currently 3). These guard nodes are the only nodes
that the client will connect to directly. If they are not
compromised, the user's paths are not compromised.

But attacks remain. Consider an attacker who can run a firewall
between a target user and the Tor network, and make
many of the guards they don't control appear to be unreachable.
Or consider an attacker who can identify a user's guards, and mount
denial-of-service attacks on them until the user picks a guard
that the attacker controls.

In the presence of these attacks, we can't continue to connect to
the Tor network unconditionally. Doing so would eventually result
in the user choosing a hostile node as their guard, and losing
anonymity.

This proposal outlines a new entry guard selection algorithm, which
addresses the following concerns:

- Heuristics and algorithms for determining how and which guard(s)
  is(/are) chosen should be kept as simple and easy to understand
  as possible.

- Clients who connect to the Tor network from censored regions or
  from behind a fascist firewall should not experience any
  significant disadvantage in terms of reachability or usability.

- Tor should make a best attempt at discovering the most
  appropriate behaviour, with as little user input and
  configuration as possible.


§2. Design

Alice, an OP (onion proxy) attempting to connect to the Tor network,
should undertake the following steps to determine information about the
local network and to select (some) appropriate entry guards. In the
following scenario, it is assumed that Alice has already obtained a
recent, valid, and verifiable consensus document.

The algorithm is divided into four components such that the full
algorithm is implemented by first invoking START, then repeatedly
calling NEXT for as long as SHOULD_CONTINUE advises continuing, and
finally calling END. For an example usage see §A. Appendix.

Several components of NEXT can be invoked asynchronously. SHOULD_CONTINUE
lets the algorithm tell the caller whether we consider the work done
or not. This can be used, for example, to retry primary guards when
we are finally able to connect to a guard after a long network outage.

This algorithm keeps track of the unreachability status for guards
in state global to the system, so that repeated runs will not have
to rediscover unreachability over and over again. However, this
state does not need to be persisted permanently - it is purely an
optimization.

The algorithm expects several arguments to guide its behavior. These
will be defined in §2.1.

The goal of this algorithm is to strongly prefer connecting to the
same guards we have connected to before, while also trying to detect
conditions such as a network outage. It does this by keeping
track of how many guards we have exposed ourselves to; if we have
connected to too many, we fall back to only retrying the ones we have
already tried. The algorithm also decides on a sample set that should
be persisted, in order to minimize the risk of an attacker forcing
enumeration of the whole network by triggering rebuilding of
circuits.


§2.1. Definitions

Bad guard: a guard is considered bad if it conforms with the function IS_BAD
(see §G. Appendix for details).

Dead guard: a guard is considered dead if it conforms with the function
IS_DEAD (see §H. Appendix for details).

Obsolete guard: a guard is considered obsolete if it conforms with the
function IS_OBSOLETE (see §I. Appendix for details).

Live entry guard: a guard is considered live if it conforms with the function
IS_LIVE (see §D. Appendix for details).

§2.1. The START algorithm

In order to start choosing an entry guard, use the START
algorithm. This takes four arguments that can be used to fine tune
the workings:

USED_GUARDS
    This is a list that contains all the guards that have been used
    before by this client. We will prioritize using guards from this
    list in order to minimize our exposure. The list is expected to
    be sorted based on priority, where the first entry will have the
    highest priority.

SAMPLED_GUARDS
    This is a set that contains all guards that should be considered
    for connection. This set should be persisted between runs. It
    should be filled by using NEXT_BY_BANDWIDTH with GUARDS as an
    argument if it's empty, or if it contains fewer than
    SAMPLE_SET_THRESHOLD guards after winnowing out older guards.

N_PRIMARY_GUARDS
    The number of guards we should consider our primary
    guards. These guards will be retried more frequently and will
    take precedence in most situations. By default the primary
    guards will be the first N_PRIMARY_GUARDS guards from USED_GUARDS.
    When the algorithm is used in constrained mode (bridges or entry
    nodes are set in the configuration file), this value should be 1;
    otherwise the proposed value is 3.

DIR
    If this argument is set, we should only consider guards that can
    be directory guards. If not set, we will consider all guards.

The primary work of START is to initialize the state machine depicted
in §2.2. The initial state of the machine is defined by:

GUARDS
    This is a set of all guards from the consensus. It will primarily be
    used to fill in SAMPLED_GUARDS.

FILTERED_SAMPLED
    This is a set that contains all guards that we are willing to connect
    to. It will be obtained from calling FILTER_SET with SAMPLED_GUARDS as
    the argument.

REMAINING_GUARDS
    This is a running set of the guards we have not yet tried to connect
    to. It should be initialized to be FILTERED_SAMPLED without
    USED_GUARDS.

STATE
    A variable that keeps track of which state in the state
    machine we are currently in. It should be initialized to
    STATE_PRIMARY_GUARDS.

PRIMARY_GUARDS
    This list keeps track of our primary guards. These are guards
    that we will prioritize when trying to connect, and will also
    retry more often in case of failure with other guards.
    It should be initialized by calling algorithm
    NEXT_PRIMARY_GUARD repeatedly until PRIMARY_GUARDS contains
    N_PRIMARY_GUARDS elements.
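
The initialization described above can be sketched as follows. This is
a minimal illustration, not the Tor implementation: the Context class,
the `bandwidth` mapping (standing in for GUARDS and its weights) and the
`is_live` callback are hypothetical names. DIR handling and the
expansion step of FILTER_SET are left out, and falling back to a
bandwidth-weighted pick whenever no used guard qualifies is an assumed
reading of NEXT_PRIMARY_GUARD.

```python
import random
from dataclasses import dataclass

STATE_PRIMARY_GUARDS = "STATE_PRIMARY_GUARDS"


@dataclass
class Context:
    used_guards: list
    sampled_guards: set
    filtered_sampled: set
    remaining_guards: set
    primary_guards: list
    state: str = STATE_PRIMARY_GUARDS


def next_primary_guard(used_guards, primary, remaining, bandwidth, rng):
    """First used guard still in the consensus, else a bandwidth-weighted pick."""
    for guard in used_guards:
        if guard not in primary and guard in bandwidth:
            return guard
    # Assumption: skip guards already chosen and guards without a weight.
    candidates = sorted(g for g in remaining
                        if g not in primary and bandwidth.get(g, 0) > 0)
    if not candidates:
        return None
    weights = [bandwidth[g] for g in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


def start(used_guards, sampled_guards, n_primary_guards, bandwidth, is_live,
          rng=None):
    """Build the initial state; `bandwidth` maps consensus guards to weights."""
    rng = rng or random.Random()
    filtered = {g for g in sampled_guards if is_live(g)}
    remaining = filtered - set(used_guards)
    primary = []
    while len(primary) < n_primary_guards:
        guard = next_primary_guard(used_guards, primary, remaining,
                                   bandwidth, rng)
        if guard is None:  # ran out of candidates
            break
        primary.append(guard)
    return Context(list(used_guards), set(sampled_guards), filtered,
                   remaining, primary)
```

With two previously used guards and N_PRIMARY_GUARDS = 3, the first two
primary guards come from USED_GUARDS in priority order and the third is
drawn from REMAINING_GUARDS.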


§2.2. The NEXT algorithm

The NEXT algorithm is composed of several different possible flows. The
first one is a simple state machine that can transfer between two
different states. Every time NEXT is invoked, it will resume at the
state where it left off previously. In the course of selecting an
entry guard, a new consensus can arrive. When that happens we need
to update the data structures used, but nothing else should change.

Before jumping in to the state machine, we should first check whether
at least PRIMARY_GUARDS_RETRY_INTERVAL minutes have passed since we
last tried any of the PRIMARY_GUARDS. If this is the case, and we are
not in STATE_PRIMARY_GUARDS, we should save the previous state and set
the state to STATE_PRIMARY_GUARDS.


§2.2.1. The STATE_PRIMARY_GUARDS state

Return each entry in PRIMARY_GUARDS in turn. For each entry, if the
guard should be retried and is considered suitable, use it. A guard is
considered eligible for retry if it is marked for retry, or if it is
live and not bad. Also, a guard is considered suitable if it is
live and, if it is a directory, it is not a cache.

If all entries have been tried, transition to STATE_TRY_REMAINING.

§2.2.2. The STATE_TRY_REMAINING state

Return each entry in USED_GUARDS that is not in PRIMARY_GUARDS in
turn. For each entry, if a guard is found, return it.

Return each entry from REMAINING_GUARDS in turn.
For each entry, if the guard should be retried and is considered
suitable, use it and mark it as unreachable. A guard is
considered eligible for retry if it is marked for retry, or if it is
live and not bad. Also, a guard is considered suitable if it is
live and, if it is a directory, it is not a cache.

If no entries remain in REMAINING_GUARDS, transition to
STATE_PRIMARY_GUARDS.
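
The two states and the retry check can be sketched as a small state
machine. This is one possible reading of the text rather than a
definitive implementation: the Context fields, the `usable` callback
(standing in for "eligible for retry and suitable") and the use of plain
minute counters for time are all assumptions, and "mark it as
unreachable" is modelled simply by removing the guard from
REMAINING_GUARDS.

```python
from dataclasses import dataclass
from typing import Optional

STATE_PRIMARY_GUARDS = "STATE_PRIMARY_GUARDS"
STATE_TRY_REMAINING = "STATE_TRY_REMAINING"
PRIMARY_GUARDS_RETRY_INTERVAL = 3  # minutes, proposed value from §3


@dataclass
class Context:
    primary_guards: list
    used_guards: list
    remaining_guards: list
    state: str = STATE_PRIMARY_GUARDS
    previous_state: Optional[str] = None
    last_primary_try: float = 0.0  # minutes


def next_guard(ctx, usable, now):
    """Return the next candidate guard, or None when a full pass found none."""
    # Periodically go back to the primary guards, remembering where we were.
    if (ctx.state != STATE_PRIMARY_GUARDS
            and now - ctx.last_primary_try >= PRIMARY_GUARDS_RETRY_INTERVAL):
        ctx.previous_state = ctx.state
        ctx.state = STATE_PRIMARY_GUARDS

    if ctx.state == STATE_PRIMARY_GUARDS:
        ctx.last_primary_try = now
        for guard in ctx.primary_guards:
            if usable(guard):
                return guard
        ctx.state = STATE_TRY_REMAINING

    # STATE_TRY_REMAINING: used guards first, then untried sampled guards.
    for guard in ctx.used_guards:
        if guard not in ctx.primary_guards and usable(guard):
            return guard
    while ctx.remaining_guards:
        guard = ctx.remaining_guards.pop(0)
        if usable(guard):
            return guard
    ctx.state = STATE_PRIMARY_GUARDS
    return None
```

The caller is expected to update whatever `usable` consults (for
example, a set of guards known to be unreachable) between calls.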


§2.2.3. ON_NEW_CONSENSUS

First, ensure that all guard profiles are updated with information
about whether they were in the newest consensus or not.

Update the bad status for all guards in USED_GUARDS and SAMPLED_GUARDS.
Remove all dead guards from USED_GUARDS and SAMPLED_GUARDS.
Remove all obsolete guards from USED_GUARDS and SAMPLED_GUARDS.
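
A sketch of these steps, under stated assumptions: guards are identified
by name, `records` is a hypothetical mapping from guard name to a dict
holding a `bad_since` timestamp, and `is_dead` / `is_obsolete` are
callbacks standing in for the helper functions of §H and §I.

```python
def on_new_consensus(used_guards, sampled_guards, records, consensus, now,
                     is_dead, is_obsolete):
    """Refresh bad status, then drop dead and obsolete guards in place."""
    known = set(used_guards) | set(sampled_guards)

    # Update the bad status: a guard is bad while it is absent from the
    # newest consensus; remember when that absence started.
    for name in known:
        record = records[name]
        if name in consensus:
            record["bad_since"] = None
        elif record["bad_since"] is None:
            record["bad_since"] = now

    # Remove dead and obsolete guards from both collections.
    drop = {name for name in known
            if is_dead(records[name], now) or is_obsolete(records[name], now)}
    used_guards[:] = [name for name in used_guards if name not in drop]
    sampled_guards.difference_update(drop)
```

Both collections are modified in place so that a context holding
references to them sees the update.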

§2.3. The SHOULD_CONTINUE algorithm

This algorithm takes as an argument a boolean indicating whether the
circuit was successfully built or not.

After the caller has tried to build a circuit with a returned
guard, they should invoke SHOULD_CONTINUE to learn whether the
algorithm is finished or not. SHOULD_CONTINUE will always return
true if the circuit failed. If the circuit succeeded,
SHOULD_CONTINUE will always return false, unless the guard that
succeeded was the first guard to succeed after
INTERNET_LIKELY_DOWN_INTERVAL minutes - in that case it will set the
state to STATE_PRIMARY_GUARDS and return true.
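
The decision can be sketched as below. The Context fields and the
minute-based clock are hypothetical. Resetting the timer after the
outage case is an assumption made so that the next success ends the
loop; the text does not spell this out.

```python
from dataclasses import dataclass

STATE_PRIMARY_GUARDS = "STATE_PRIMARY_GUARDS"
INTERNET_LIKELY_DOWN_INTERVAL = 5  # minutes, proposed value from §3


@dataclass
class Context:
    state: str
    started_trying_at: float  # minutes


def should_continue(ctx, circuit_built, now):
    """True if the caller should ask NEXT for another guard."""
    if not circuit_built:
        return True
    if now - ctx.started_trying_at >= INTERNET_LIKELY_DOWN_INTERVAL:
        # First success after a likely outage: give the primary guards
        # another chance before settling on this guard.
        ctx.state = STATE_PRIMARY_GUARDS
        ctx.started_trying_at = now  # assumption: restart the timer
        return True
    return False
```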


§2.4. The END algorithm

The goal of this algorithm is simply to make sure that we keep track
of successful connections made. This algorithm should be invoked
with the guard that was used to correctly set up a circuit.

Once invoked, this algorithm will mark the guard as used, and make
sure it is in USED_GUARDS, by adding it at the end if it was not there.
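
As a minimal sketch, assuming USED_GUARDS is a plain list ordered by
priority and guards are identified by name:

```python
def end(used_guards: list, guard: str) -> None:
    """Record a successful guard; new guards get the lowest priority."""
    if guard not in used_guards:
        used_guards.append(guard)
```

Appending rather than inserting at the front keeps previously used
guards ahead of newly discovered ones.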


§2.5. Helper algorithms

These algorithms are used in the above algorithms, but have been
separated out here in order to make the flow clearer.

NEXT_PRIMARY_GUARD
    - Return the first entry from USED_GUARDS that is not in
      PRIMARY_GUARDS and that is in the most recent consensus.
    - If USED_GUARDS is empty, use NEXT_BY_BANDWIDTH with
      REMAINING_GUARDS as the argument.

NEXT_BY_BANDWIDTH
    - Takes G as an argument, which should be a set of guards to
      choose from.
    - Return a randomly selected element from G, weighted by bandwidth.

FILTER_SET
    - Takes G as an argument, which should be a set of guards to filter.
    - Filter out guards in G that don't comply with IS_LIVE (see
      §D. Appendix for details).
    - If the filtered set is smaller than MINIMUM_FILTERED_SAMPLE_SIZE and G
      is smaller than MAXIMUM_SAMPLE_SIZE_THRESHOLD, expand G and try to
      filter out again. G is expanded by adding one new guard at a time using
      NEXT_BY_BANDWIDTH with GUARDS as an argument.
    - If G is not smaller than MAXIMUM_SAMPLE_SIZE_THRESHOLD, G should not be
      expanded. Abort execution of this function by returning null and report
      an error to the user.
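
NEXT_BY_BANDWIDTH and FILTER_SET can be sketched as follows. The
representation is an assumption: guards are names, `all_guards` maps
each consensus guard to a positive bandwidth weight, and `is_live` is a
callback. Drawing new guards only from those not already sampled, and
converting the threshold fraction to a count with int(), are choices
made for the sketch.

```python
import random


def next_by_bandwidth(guards: dict, rng: random.Random) -> str:
    """Pick one guard name at random, weighted by its bandwidth."""
    names = sorted(guards)
    weights = [guards[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


def filter_set(sampled: set, all_guards: dict, is_live, min_filtered: int,
               max_sample_fraction: float, rng: random.Random):
    """Return the live subset of `sampled`, expanding `sampled` if needed.

    Returns None when the sample cannot grow any further; the caller is
    expected to report the error to the user.
    """
    max_sample = int(len(all_guards) * max_sample_fraction)
    while True:
        filtered = {g for g in sampled if is_live(g)}
        if len(filtered) >= min_filtered:
            return filtered
        if len(sampled) >= max_sample:
            return None
        candidates = {name: bw for name, bw in all_guards.items()
                      if name not in sampled}
        if not candidates:
            return None
        # Expand the sample by one guard and filter again.
        sampled.add(next_by_bandwidth(candidates, rng))
```

`sampled` is mutated in place, matching the idea that SAMPLED_GUARDS is
the persisted set while the filtered result is recomputed on demand.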


§3. Consensus Parameters, & Configurable Variables

This proposal introduces several new parameters that ideally should
be set in the consensus but that should also be possible to
set or override in the client configuration file. Some of these have
proposed values, but for others more simulation and trial needs to
happen.

PRIMARY_GUARDS_RETRY_INTERVAL
    In order to make it more likely we connect to a primary guard,
    we would like to retry the primary guards more often than other
    types of guards. This parameter controls how many minutes should
    pass before we consider retrying primary guards again. The
    proposed value is 3.

SAMPLE_SET_THRESHOLD
    In order to allow us to recognize a completely unreachable network,
    we would like to avoid connecting to too many guards before switching
    modes. We also want to avoid exposing ourselves to too many nodes in a
    potentially hostile situation. This parameter, expressed as a
    fraction of all guards, determines the size of the sampled set,
    which holds the only guards we will consider connecting to.
    If we assume there are 1900 guards, a setting of 0.02
    means we will have a sample set of 38 guards.
    This limits our total exposure. Proposed value is 0.02.

MINIMUM_FILTERED_SAMPLE_SIZE
    The minimum size of the sampled set after filtering out nodes based on
    client configuration (FILTERED_SAMPLED). Proposed value is ???.

MAXIMUM_SAMPLE_SIZE_THRESHOLD
    In order to guarantee a minimum size of guards after filtering,
    we expand SAMPLED_GUARDS up to a limit. This fraction of GUARDS will be
    used as an upper bound when expanding SAMPLED_GUARDS.
    Proposed value is 0.03.

INTERNET_LIKELY_DOWN_INTERVAL
    The number of minutes since we started trying to find an entry
    guard before we should consider the network down and consider
    retrying primary guards before using a functioning guard
    found. Proposed value is 5.
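
The proposed values can be collected in one place, as in the sketch
below. The class and method names are hypothetical, the parameter with
no proposed value is left unset, and rounding the fractions to the
nearest whole guard is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GuardParams:
    primary_guards_retry_interval: int = 3              # minutes
    sample_set_threshold: float = 0.02                  # fraction of GUARDS
    minimum_filtered_sample_size: Optional[int] = None  # no proposed value
    maximum_sample_size_threshold: float = 0.03         # fraction of GUARDS
    internet_likely_down_interval: int = 5              # minutes

    def sample_size(self, n_guards: int) -> int:
        """Target size of SAMPLED_GUARDS (assumed: round to nearest)."""
        return round(n_guards * self.sample_set_threshold)

    def max_sample_size(self, n_guards: int) -> int:
        """Upper bound on SAMPLED_GUARDS when expanding it."""
        return round(n_guards * self.maximum_sample_size_threshold)
```

With 1900 guards, `GuardParams().sample_size(1900)` gives the 38 guards
mentioned above.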

§4. Security properties and behavior under various conditions

Under normal conditions, this algorithm will allow us to quickly
connect to and use guards we have used before, with a high likelihood
of working. Assuming the first primary guard is reachable and in the
consensus, this algorithm will always return that guard.

Under dystopic conditions (when a firewall is in place that blocks
all ports except for potentially port 80 and 443), this algorithm
will try to connect to 2% of all guards before switching modes to try
dystopic guards. Currently, that means trying to connect to circa 40
guards before getting a successful connection. If we assume a
connection try will take at most 10 seconds, that means it will take
between 6 and 7 minutes to get a working connection.

When the network is completely down, we will try to connect to 2% of
all guards plus 2% of all dystopic guards before realizing we are
down. This means circa 50 guards tried assuming there are 1900 guards
in the network.

In terms of exposure, we will connect to a maximum of 2% of all
guards plus 2% of all dystopic guards, or 3% of all guards,
whichever is lower. If N is the number of guards, and k is the
number of guards an attacker controls, that means an attacker would
have a probability of 1-(1-(k/N)^2)^(N * 0.03) of having one of their
guards selected before we fall back. In real terms, this means an
attacker would need to control over 10% of all guards in order to
have a larger than 50% chance of controlling a guard for any given client.
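
The exposure bound can be evaluated numerically. The sketch below simply
encodes the formula as written in the text; the function names are
illustrative, and N = 1900 is the guard count assumed elsewhere in this
proposal.

```python
def prob_hostile_guard(k: int, n: int, exposure: float = 0.03) -> float:
    """1 - (1 - (k/N)^2)^(N * exposure), as given in the text."""
    return 1.0 - (1.0 - (k / n) ** 2) ** (n * exposure)


def smallest_k_over_half(n: int, exposure: float = 0.03) -> int:
    """Smallest number of hostile guards giving a chance above 50%."""
    for k in range(n + 1):
        if prob_hostile_guard(k, n, exposure) > 0.5:
            return k
    raise ValueError("no k reaches 50%")


if __name__ == "__main__":
    n = 1900
    k = smallest_k_over_half(n)
    print(f"k = {k} of {n} guards ({100 * k / n:.1f}%)")
```

For N = 1900 the threshold lands a little above 10% of all guards,
consistent with the estimate above.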

In addition, since the sampled set changes slowly (the suggestion
here is that guards in it expire every month) it is not possible for
an attacker to force a connection to an entry guard that isn't
already in the user's sampled set.


§A. Appendix: An example usage

In order to clarify how this algorithm is supposed to be used, this
pseudo code illustrates the building of a circuit:

  ESTABLISH_CIRCUIT:

    if chosen_entry_node = NULL
      if context = NULL
        context = ALGO_CHOOSE_ENTRY_GUARD_START(used_guards,
                                                sampled_guards=[],
                                                options,
                                                n_primary_guards=3,
                                                dir=false,
                                                guards_in_consensus)

      chosen_entry_node = ALGO_CHOOSE_ENTRY_GUARD_NEXT(context)
      if not IS_SUITABLE(chosen_entry_node)
        try another entry guard

    circuit = composeCircuit(chosen_entry_node)
    return circuit

  ON_FIRST_HOP_CALLBACK(channel):

    if !SHOULD_CONTINUE:
      ALGO_CHOOSE_ENTRY_GUARD_END(entryGuard)
    else
      chosen_entry_node = NULL


§B. Appendix: Entry Points in Tor

In order to clarify how this algorithm is supposed to be integrated with
Tor, here are some entry points to trigger actions mentioned in this spec:

  When establish_circuit:

    If *chosen_entry_node* doesn't exist
      If a *context* exists, populate the first one as *context*
      Otherwise, use ALGO_CHOOSE_ENTRY_GUARD_START to initialize a new
      *context*.

      After this, when we want to choose_good_entry_server, we will use
      ALGO_CHOOSE_ENTRY_GUARD_NEXT to get a candidate.

    Use chosen_entry_node to build_circuit and handle_first_hop,
    return this circuit

  When entry_guard_register_connect_status(should_continue):

    if !should_continue:
      Call ALGO_CHOOSE_ENTRY_GUARD_END(chosen_entry_node)
    else:
      Set chosen_entry_node to NULL

  When new directory_info_has_arrived:

    Do ON_NEW_CONSENSUS


§C. Appendix: IS_SUITABLE helper function

A guard is suitable if it satisfies all of the following conditions:
  - It's considered to be live, according to IS_LIVE.
  - It's a directory cache if a directory guard is requested.
  - It's not the chosen exit node.
  - It's not in the family of the chosen exit node.

This conforms to the existing conditions in "populate_live_entry_guards()".


§D. Appendix: IS_LIVE helper function

A guard is considered live if it satisfies all of the following conditions:
  - It's not disabled because of path bias issues (path_bias_disabled).
  - It was not observed to become unusable according to the directory or
    the user configuration (bad_since).
  - It's marked for retry (can_retry) or it's been unreachable for some
    time (unreachable_since) but enough time has passed since we last tried
    to connect to it (entry_is_time_to_retry).
  - It's in our node list, meaning it's present in the latest consensus.
  - It has a usable descriptor (either a routerdescriptor or a
    microdescriptor) unless a directory guard is requested.
  - It's a general-purpose router unless UseBridges is configured.
  - It's reachable by the configuration (fascist_firewall_allows_node).

This conforms to the existing conditions in "entry_is_live()".
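
The conditions above can be written as a single predicate. This is a
sketch only: the GuardStatus fields are hypothetical flags named after
the identifiers in the list, not the real Tor structures, and the retry
condition is read as applying only to guards that have been marked
unreachable (so a guard that was never unreachable passes it).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GuardStatus:
    path_bias_disabled: bool = False
    bad_since: Optional[float] = None
    unreachable_since: Optional[float] = None
    can_retry: bool = False
    time_to_retry: bool = False
    in_consensus: bool = True
    has_descriptor: bool = True
    is_general_purpose: bool = True
    firewall_allows: bool = True


def is_live(guard: GuardStatus, dir_requested: bool = False,
            use_bridges: bool = False) -> bool:
    """True only if every IS_LIVE condition holds for this guard."""
    if guard.path_bias_disabled:
        return False
    if guard.bad_since is not None:
        return False
    # Assumed reading: only previously unreachable guards need a retry grant.
    if guard.unreachable_since is not None and not (
            guard.can_retry or guard.time_to_retry):
        return False
    if not guard.in_consensus:
        return False
    if not dir_requested and not guard.has_descriptor:
        return False
    if not use_bridges and not guard.is_general_purpose:
        return False
    return guard.firewall_allows
```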

A guard is observed to become unusable according to the directory or the
user configuration if it satisfies any of the following conditions:
  - It's not in our node list, meaning it's not present in the latest
    consensus.
  - It's not currently running (is_running).
  - It's not a bridge and not a configured bridge
    (node_is_a_configured_bridge) and UseBridges is True.
  - It's not a possible guard and is not in EntryNodes and UseBridges is
    False.
  - It's in ExcludeNodes. Nevertheless this is ignored when
    loading from config.
  - It's not reachable by the configuration (fascist_firewall_allows_node).
  - It's disabled because of path bias issues (path_bias_disabled).

This conforms to the existing conditions in "entry_guards_compute_status()".

§E. Appendix: UseBridges and Bridges configurations

This is mutually exclusive with EntryNodes.

If options->UseBridges OR options->EntryNodes:
  - guards = populate_live_entry_guards() - this is the "bridge flavour" of
    IS_SUITABLE as mentioned before.
  - return node_sl_choose_by_bandwidth(guards, WEIGHT_FOR_GUARD)
This is "choose a guard from S by bandwidth weight".

UseBridges and Bridges must be set together. Bridges go to bridge_list (via
bridge_add_from_config()), but how is it used?
learned_bridge_descriptor() adds the bridge to the global entry_guards if
UseBridges = True.

We either keep the existing global entry_guards OR incorporate bridges in the
proposal (remove non bridges from USED_GUARDS, and REMAINING_GUARDS = bridges?)

If UseBridges is set as true, we need to fill the SAMPLED_GUARDS
with bridges specified and learned from consensus.

§F. Appendix: EntryNodes configuration

This is mutually exclusive with Bridges.

The global entry_guards will be updated with entries in EntryNodes
(see entry_guards_set_from_config()).

If EntryNodes is set, we need to fill the SAMPLED_GUARDS with
EntryNodes specified in options.

§G. Appendix: IS_BAD helper function

A guard is considered bad if it is not included in the newest
consensus.

§H. Appendix: IS_DEAD helper function

A guard is considered dead if it has been marked as bad for the
ENTRY_GUARD_REMOVE_AFTER period (30 days), unless it has been disabled
because of path bias issues (path_bias_disabled).

§I. Appendix: IS_OBSOLETE helper function

A guard is considered obsolete if it was chosen by a Tor
version we can't recognize or it was chosen more than GUARD_LIFETIME ago.
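
The three helper functions of §G, §H and §I can be sketched together.
The signatures are assumptions: timestamps are seconds, the consensus is
a set of guard names, and GUARD_LIFETIME is passed in by the caller
because this proposal does not fix its value.

```python
from typing import Optional

ENTRY_GUARD_REMOVE_AFTER = 30 * 24 * 60 * 60  # 30 days, in seconds


def is_bad(name: str, consensus: set) -> bool:
    """Bad: not included in the newest consensus."""
    return name not in consensus


def is_dead(bad_since: Optional[float], now: float,
            path_bias_disabled: bool = False) -> bool:
    """Dead: bad for the whole removal period, unless path-bias disabled."""
    if path_bias_disabled:
        return False
    return bad_since is not None and now - bad_since >= ENTRY_GUARD_REMOVE_AFTER


def is_obsolete(chosen_at: float, now: float, guard_lifetime: float,
                version_recognized: bool = True) -> bool:
    """Obsolete: chosen by an unrecognized version, or chosen too long ago."""
    return (not version_recognized) or (now - chosen_at > guard_lifetime)
```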

-*- coding: utf-8 -*-