mirror of
https://github.com/torproject/torspec.git
synced 2024-11-23 09:49:45 +00:00
289 lines
12 KiB
Plaintext
289 lines
12 KiB
Plaintext
Filename: 209-path-bias-tuning.txt
|
|
Title: Tuning the Parameters for the Path Bias Defense
|
|
Author: Mike Perry
|
|
Created: 01-10-2012
|
|
Status: Obsolete
|
|
Target: 0.2.4.x+
|
|
|
|
|
|
Overview
|
|
|
|
This proposal describes how we can use the results of simulations in
|
|
combination with network scans to set reasonable limits for the Path
|
|
Bias defense, which causes clients to be informed about and ideally
|
|
rotate away from Guards that provide extremely low circuit success
|
|
rates.
|
|
|
|
Motivation
|
|
|
|
The Path Bias defense is designed to defend against a type of route
|
|
capture where malicious Guard nodes deliberately fail circuits that
|
|
extend to non-colluding Exit nodes to maximize their network
|
|
utilization in favor of carrying only compromised traffic.
|
|
|
|
This attack was explored in the academic literature in [1], and a
|
|
variant involving cryptographic tagging was posted to tor-dev[2] in
|
|
March.
|
|
|
|
In the extreme, the attack allows an adversary that carries c/n
|
|
of the network capacity to deanonymize c/n of the network
|
|
connections, breaking the O((c/n)^2) property of Tor's original
|
|
threat model.
|
|
|
|
In this case, however, the adversary is only carrying circuits for
|
|
which either the entry and exit are compromised, or all three nodes are
|
|
compromised. This means that the adversary's Guards will fail all but
|
|
(c/n) + (c/n)^2 of their circuits for clients that select it. For 10%
|
|
c/n compromise, such an adversary succeeds only 11% of their circuits
|
|
that start at their compromised Guards. For 20% c/n compromise, such
|
|
an adversary would only succeed 24% of their circuit attempts.
|
|
|
|
It is this property which leads me to believe that a simple local
|
|
accounting defense is indeed possible and worthwhile.
|
|
|
|
Design Description
|
|
|
|
The Path Bias defense is a client-side accounting mechanism in Tor that
|
|
tracks the circuit failure rate for each of the client's guards.
|
|
|
|
Clients maintain two integers for each of their guards: a count of the
|
|
number of times a circuit was extended at least one hop through that
|
|
guard, and a count of the number of circuits that successfully complete
|
|
through that guard. The ratio of these two numbers is used to determine
|
|
a circuit success rate for that Guard.
|
|
|
|
The system should issue a notice log message when Guard success rate
|
|
falls below 70%, a warn when Guard success rate falls below 50%, and
|
|
should drop the Guard when the success rate falls below 30%.
|
|
|
|
Circuit build timeouts are only counted as path failures if the
|
|
circuit fails to complete before the 95% "right-censored" (aka
|
|
"MEASUREMENT_EXPIRED") timeout interval, not the 80% timeout
|
|
condition[5]. This was done based on the assumption that destructive
|
|
cryptographic tagging is the primary vector for the path bias attack,
|
|
until such time as Tor's circuit crypto can be upgraded. Therefore,
|
|
being more lenient with timeout makes us more resilient to network
|
|
conditions.
|
|
|
|
To ensure correctness, checks are performed to ensure that
|
|
we do not count successes without also counting the first hop (see
|
|
usage of path_state_t as well as pathbias_* in the source).
|
|
|
|
Similarly, to provide a moving average of recent Guard activity while
|
|
still preserving the ability to ensure correctness, we periodically
|
|
"scale" the success counts by first multiplying by a numerator
|
|
(currently 1) and then dividing by an integer divisor (currently 2).
|
|
|
|
Scaling is performed when when the counts exceed the moving average
|
|
window (300) and when the division does not produce integer truncation.
|
|
|
|
No log messages should be displayed, nor should any Guard be
|
|
dropped until it has completed at least 150 first hops (inclusive).
|
|
|
|
Analysis: Simulation
|
|
|
|
To test the defense in the face of various types of malicious and
|
|
non-malicious Guard behavior, I wrote a simulation program in
|
|
Python[3].
|
|
|
|
The simulation confirmed that without any defense, an adversary
|
|
that provides c/n of the network capacity is able to observe c/n
|
|
of the network flows using circuit failure attacks.
|
|
|
|
It also showed that with the defense, an adversary that wishes to
|
|
evade detection has compromise rates bounded by:
|
|
|
|
P(compromise) <= (c/n)^2 * (100/CUTOFF_PERCENT)
|
|
circs_per_client <= circuit_attempts*(c/n)
|
|
|
|
In this way, the defense restores the O((c/n)^2) compromise property,
|
|
but unfortunately only over long periods of time (see Security
|
|
Considerations below).
|
|
|
|
The spread between the cutoff values and the normal rate of circuit
|
|
success has a substantial effect on false positives. From the
|
|
simulation's results, the sweet spot for the size of this spread
|
|
appears to be 10%. In other words, we want to set the cutoffs such that
|
|
they are 10% below the success rate we expect to see in normal usage.
|
|
|
|
The simulation also demonstrates that larger "scaling window" sizes
|
|
reduce false positives for instances where non-malicious guards
|
|
experience some ambient rate of circuit failure.
|
|
|
|
Analysis: Live Scan
|
|
|
|
Preliminary Guard node scanning using the txtorcon circuit scanner[4]
|
|
shows normal circuit completion rates between 80-90% for most Guard
|
|
nodes.
|
|
|
|
However, it also showed that CPU overload conditions can easily push
|
|
success rates as low as 45%. Even more concerning is that for a brief
|
|
period during the live scan, success rates dropped to 50-60%
|
|
network-wide (regardless of Guard node choice).
|
|
|
|
Based on these results, the notice condition should be 70%, the warn
|
|
condition should be 50%, and the drop condition should be 30%.
|
|
|
|
However, see the Security Considerations sections for reasons
|
|
to choose more lenient bounds.
|
|
|
|
Future Analysis: Deployed Clients
|
|
|
|
It's my belief that further analysis should be done by deploying
|
|
loglines for all three thresholds in clients in the live network
|
|
to utilize user reports on how often high rates of circuit failure
|
|
are seen before we deploy changes to rotate away from failing Guards.
|
|
|
|
I believe these log lines should be deployed in 0.2.3.x clients,
|
|
to maximize the exposure of the code to varying network conditions,
|
|
so that we have enough data to consider deploying the Guard-dropping
|
|
cutoff in 0.2.4.x.
|
|
|
|
Security Considerations: DoS Conditions
|
|
|
|
While the scaling window does provide freshness and can help mitigate
|
|
"bait-and-switch" attacks, it also creates the possibility of conditions
|
|
where clients can be forced off their Guards due to temporary
|
|
network-wide CPU DoS. This provides another reason beyond false positive
|
|
concerns to set the scaling window as large as is reasonable.
|
|
|
|
A DoS directed at specific Guard nodes is unlikely to allow an
|
|
adversary to cause clients to rotate away from that Guard, because it
|
|
is unlikely that the DoS can be precise enough to allow first hops to
|
|
that Guard to succeed, but also cause extends to fail. This leaves
|
|
network-wide DoS as the primary vector for influencing clients.
|
|
|
|
Simulation results show that in order to cause clients to rotate away
|
|
from a Guard node that previously succeeded 80% of its circuits, an
|
|
adversary would need to induce a 25% success rate for around 350 circuit
|
|
attempts before the client would reject it or a 5% success rate
|
|
for around 215 attempts, both using a scaling window of 300 circuits.
|
|
|
|
Assuming one circuit per Guard per 10 minutes of active client
|
|
activity, this is a sustained network-wide DoS attack of 60 hours
|
|
for the 25% case, or 38 hours for the 5% case.
|
|
|
|
Presumably this is enough time for the directory authorities to respond by
|
|
altering the pb_disablepct consensus parameter before clients rotate,
|
|
especially given that most clients are not active for even 38 hours on end,
|
|
and will tend to stop building circuits while idle.
|
|
|
|
If we raised the scaling window to 500 circuits, it would require 1050
|
|
circuits if the DoS brought circuit success down to 25% (175 hours), and
|
|
415 circuits if the DoS brought the circuit success down to 5% (69 hours).
|
|
|
|
The tradeoff, though, is that larger scaling window values allow Guard nodes
|
|
to compromise clients for duty cycles of around the size of this window (up to
|
|
the (c/n)^2 * 100/CUTOFF_PERCENT limit in aggregate), so we do have to find
|
|
balance between these concerns.
|
|
|
|
Security Considerations: Targeted Failure Attacks
|
|
|
|
If an adversary controls a significant portion of the network, they
|
|
may be able to target a Guard node by failing their circuits. In the
|
|
context of cryptographic tagging, both the Middle node and the Exit
|
|
node are able to recognize their colluding peers. The Middle node sees
|
|
the Guard directly, and the Exit node simply reverses a non-existent
|
|
tag, causing a failure.
|
|
|
|
P(EvilMiddle) || P(EvilExit) = 1.0 - P(HonestMiddle) && P(HonestExit)
|
|
= 1.0 - (1.0-(c/n))*(1.0-(c/n))
|
|
|
|
For 10% compromise, this works out to the ability to fail an
|
|
additional 19% of honest Guard circuits, and for 20% compromise,
|
|
it works out to 36%.
|
|
|
|
When added to the ambient circuit failure rates (10-20%), this is
|
|
within range of the notice and warn conditions, but not the guard
|
|
failure condition.
|
|
|
|
However, this attack does become feasible if a network-wide DoS
|
|
(or simply CPU load) is able to elevate the ambient failure
|
|
rate to 51% for the 10% compromise case, or 34% for the 20%
|
|
compromise case.
|
|
|
|
Since both conditions would elicit notices and/or warns from *all*
|
|
clients, this attack should be detectable. It can also be detected
|
|
through the bandwidth authorities (who could possibly even
|
|
set pathbias parameters directly based on measured ambient circuit
|
|
failure rates), should we deploy #7023.
|
|
|
|
Implementation Notes: Log Messages
|
|
|
|
Log messages need to be chosen with care to avoid alarming users.
|
|
I suggest:
|
|
|
|
Notice: "Your Guard %s is failing more circuits than usual. Most likely
|
|
this means the Tor network is overloaded. Success counts are %d/%d."
|
|
|
|
Warn: "Your Guard %s is failing a very large amount of circuits. Most likely
|
|
this means the Tor network is overloaded, but it could also mean an attack
|
|
against you or potentially the Guard itself. Success counts are %d/%d."
|
|
|
|
Drop: "Your Guard %s is failing an extremely large amount of circuits. [Tor
|
|
has disabled use of this Guard.] Success counts are %d/%d."
|
|
|
|
The second piece of the Drop message would not be present in 0.2.3.x,
|
|
since the Guard won't actually be dropped.
|
|
|
|
Implementation Notes: Consensus Parameters
|
|
|
|
The following consensus parameters reflect the constants listed
|
|
in the proposal. These parameters should also be available
|
|
for override in torrc.
|
|
|
|
pb_mincircs=150
|
|
The minimum number of first hops before we log or drop Guards.
|
|
|
|
pb_noticepct=70
|
|
The threshold of circuit success below which we display a notice.
|
|
|
|
pb_warnpct=50
|
|
The threshold of circuit success below which we display a warn.
|
|
|
|
pb_disablepct=30
|
|
The threshold of circuit success below which we disable the guard.
|
|
|
|
pb_scalecircs=300
|
|
The number of first hops at which we scale the counts down.
|
|
|
|
pb_multfactor=1
|
|
The integer numerator by which we scale.
|
|
|
|
pb_scalefactor=2
|
|
The integer divisor by which we scale.
|
|
|
|
pb_dropguards=0
|
|
If non-zero, we should actually drop guards as opposed to warning.
|
|
|
|
Implementation Notes: Differences between proposal and current source
|
|
|
|
This proposal adds a few changes over the implementation currently
|
|
deployed in origin/master.
|
|
|
|
The log messages suggested above are different than those in the
|
|
source.
|
|
|
|
The following consensus parameters had changes to their default
|
|
values, based on results from simulation and scanning:
|
|
pb_mincircs=150
|
|
pb_noticepct=70
|
|
pb_disablepct=30
|
|
pb_scalecircs=300
|
|
|
|
Also, the following consensus parameters are additions:
|
|
pb_multfactor=1
|
|
pb_warnpct=50
|
|
pb_dropguards=0
|
|
|
|
Finally, 0.2.3.x needs to be synced with origin/master, but should
|
|
also ignore the pb_dropguards parameter (but ideally still provide
|
|
the equivalent pb_dropguards torrc option).
|
|
|
|
|
|
1. http://freehaven.net/anonbib/cache/ccs07-doa.pdf
|
|
2. https://lists.torproject.org/pipermail/tor-dev/2012-March/003347.html
|
|
3. https://gitweb.torproject.org/torflow.git/tree/HEAD:/CircuitAnalysis/PathBias
|
|
4. https://github.com/meejah/txtorcon/blob/exit_scanner/apps/exit_scanner/failure-rate-scanner.py
|
|
5. See 2.4.1 of path-spec.txt for further details on circuit timeout calculations.
|