Mirror of https://github.com/torproject/torspec.git, synced 2024-11-23 09:49:45 +00:00
Commit ea41a66447 (parent 52a5e71527): Add proposals 336 and 337.

proposals/336-randomize-guard-retries.md (Normal file, 87 lines added)
@@ -0,0 +1,87 @@

```
Filename: 336-randomize-guard-retries.md
Title: Randomized schedule for guard retries
Author: Nick Mathewson
Created: 2021-10-22
Status: Open
```

# Introduction

When we notice that a guard isn't working, we don't mark it as retriable
until a certain interval has passed. Currently, these intervals are
fixed, as described in the documentation for `GUARDS_RETRY_SCHED` in
`guard-spec` appendix A.1. Here we propose using a randomized retry
interval instead, based on the same decorrelated-jitter algorithm we use
for directory retries.

The upside of this approach is that it makes our behavior in
the presence of an unreliable network a bit harder for an attacker to
predict. It also means that if a guard goes down for a while, its
clients will notice that it is up at staggered times, rather than
probing it in lock-step.

The downside of this approach is that we can, if we get unlucky
enough, completely fail to notice that a preferred guard is online when
we would otherwise have noticed sooner.

Note that when a guard is marked retriable, it isn't necessarily retried
immediately. Instead, its status is changed from "Unreachable" to
"Unknown", which will cause it to get retried.

For reference, our previous schedule was:

```
{param:PRIMARY_GUARDS_RETRY_SCHED}
-- every 10 minutes for the first six hours,
-- every 90 minutes for the next 90 hours,
-- every 4 hours for the next 3 days,
-- every 9 hours thereafter.

{param:GUARDS_RETRY_SCHED} --
-- every hour for the first six hours,
-- every 4 hours for the next 90 hours,
-- every 18 hours for the next 3 days,
-- every 36 hours thereafter.
```

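As a rough illustration of how the fixed schedules above behave, here is a minimal Python sketch. The table layout, the function name `fixed_retry_interval`, and the choice to measure elapsed time from when the guard first became unreachable (with half-open phase boundaries) are assumptions made for this example, not text from `guard-spec`.

```python
from datetime import timedelta

# Each entry is (length of the phase, retry interval during that phase).
# A length of None marks the final, open-ended phase ("thereafter").
PRIMARY_GUARDS_RETRY_SCHED = [
    (timedelta(hours=6), timedelta(minutes=10)),
    (timedelta(hours=90), timedelta(minutes=90)),
    (timedelta(days=3), timedelta(hours=4)),
    (None, timedelta(hours=9)),
]
GUARDS_RETRY_SCHED = [
    (timedelta(hours=6), timedelta(hours=1)),
    (timedelta(hours=90), timedelta(hours=4)),
    (timedelta(days=3), timedelta(hours=18)),
    (None, timedelta(hours=36)),
]


def fixed_retry_interval(schedule, time_unreachable):
    """Return the fixed retry interval for a guard that has been
    unreachable for `time_unreachable` (assumed to be measured from
    the first failure)."""
    phase_end = timedelta(0)
    for length, interval in schedule:
        if length is None:
            return interval
        phase_end += length
        if time_unreachable < phase_end:
            return interval
    raise ValueError("schedule must end with an open-ended phase")
```

For example, under these assumptions a primary guard that has been unreachable for one hour would be retried every 10 minutes, and one that has been unreachable for a full week would be retried every 9 hours.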
# The new algorithm

We re-use the decorrelated-jitter algorithm from `dir-spec` section 5.5.
The specific formula used to compute the 'i+1'th delay is:

```
Delay_{i+1} = MIN(cap, random_between(lower_bound, upper_bound))
where upper_bound = MAX(lower_bound+1, Delay_i * 3)
      lower_bound = MAX(1, base_delay).
```

For primary guards, we set base_delay to 30 seconds and cap to 6 hours.

For non-primary guards, we set base_delay to 10 minutes and cap to 36
hours.

(These parameters were selected by simulating the results of using them
until they looked "a bit more aggressive" than the current algorithm, but
not too much.)

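The formula above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the names `next_delay` and `retry_delays` are hypothetical, delays are assumed to be whole seconds, `random_between` is assumed to include both endpoints, and the schedule is assumed to start from a previous delay of 0.

```python
import random

# Parameters from the proposal, expressed in seconds.
PRIMARY_BASE_DELAY = 30
PRIMARY_CAP = 6 * 60 * 60
NONPRIMARY_BASE_DELAY = 10 * 60
NONPRIMARY_CAP = 36 * 60 * 60


def next_delay(prev_delay, base_delay, cap, rng=random):
    """Compute Delay_{i+1} from Delay_i using decorrelated jitter."""
    lower_bound = max(1, base_delay)
    upper_bound = max(lower_bound + 1, prev_delay * 3)
    # Assumption: random_between() is inclusive at both ends.
    return min(cap, rng.randint(lower_bound, upper_bound))


def retry_delays(base_delay, cap, count, rng=random):
    """Return the first `count` delays of one randomized schedule."""
    delays = []
    delay = 0  # assumption: no previous delay before the first retry
    for _ in range(count):
        delay = next_delay(delay, base_delay, cap, rng)
        delays.append(delay)
    return delays
```

Calling `retry_delays(PRIMARY_BASE_DELAY, PRIMARY_CAP, 10)` yields one possible sequence of the first ten retry delays for a primary guard. Because each upper bound depends on the previous random draw, two clients that lose the same guard at the same moment quickly drift apart in their retry times.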

The average behavior for the new primary schedule is:

```
First 1.0 hours: 10.14283 attempts. (Avg delay 4m 47.41s)
First 6.0 hours: 19.02377 attempts. (Avg delay 15m 36.95s)
First 96.0 hours: 56.11173 attempts. (Avg delay 1h 40m 3.13s)
First 168.0 hours: 83.67091 attempts. (Avg delay 1h 58m 43.16s)
Steady state: 2h 36m 44.63s between attempts.
```

The average behavior for the new non-primary schedule is:

```
First 1.0 hours: 3.08069 attempts. (Avg delay 14m 26.08s)
First 6.0 hours: 8.1473 attempts. (Avg delay 35m 25.27s)
First 96.0 hours: 22.57442 attempts. (Avg delay 3h 49m 32.16s)
First 168.0 hours: 29.02873 attempts. (Avg delay 5h 27m 2.36s)
Steady state: 11h 15m 28.47s between attempts.
```

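Averages of this kind can be estimated with a small Monte Carlo simulation. The sketch below is not the script used to produce the figures above and is not expected to reproduce them exactly: the function names, the integer-second delays, the inclusive random range, the starting delay of 0, and the way attempts are counted are all assumptions made for illustration.

```python
import random


def attempts_within(horizon, base_delay, cap, rng):
    """Count the retries that happen within `horizon` seconds of the
    first failure, for one randomly generated schedule."""
    elapsed, delay, attempts = 0, 0, 0
    while True:
        lower = max(1, base_delay)
        upper = max(lower + 1, delay * 3)
        delay = min(cap, rng.randint(lower, upper))
        elapsed += delay
        if elapsed > horizon:
            return attempts
        attempts += 1


def mean_attempts(horizon, base_delay, cap, trials=10_000, seed=0):
    """Average `attempts_within` over many independent schedules."""
    rng = random.Random(seed)
    total = sum(attempts_within(horizon, base_delay, cap, rng)
                for _ in range(trials))
    return total / trials
```

For instance, `mean_attempts(3600, 30, 6 * 3600)` estimates the average number of retries for a primary guard during the first hour under these assumptions.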
proposals/337-simpler-guard-usability.md (Normal file, 138 lines added)
@@ -0,0 +1,138 @@

```
Filename: 337-simpler-guard-usability.md
Title: A simpler way to decide, "Is this guard usable?"
Author: Nick Mathewson
Created: 2021-10-22
Status: Open
```

# Introduction

The current `guard-spec` describes a mechanism for how to behave when
our primary guards are unreachable, and we don't know which other guards
are reachable. This proposal describes a simpler method, currently
implemented in [Arti](https://gitlab.torproject.org/tpo/core/arti/).

(Note that this method might not actually give different results: its
only advantage is that it is much simpler to implement.)

## The task at hand

For illustration, we'll assume that our primary guards are P1, P2, and
P3, and our subsequent guards (in preference order) are G1, G2, G3, and
so on. The status of each guard is Reachable (we think we can connect
to it), Unreachable (we think it's down), or Unknown (we haven't tried
it recently).

The question becomes, "What should we do when P1, P2, and P3 are
Unreachable, and G1, G2, ... are all Unknown?"

In this circumstance, we _could_ say that we only build circuits to G1,
wait for them to succeed or fail, and only try G2 if we see that the
circuits to G1 have failed completely. But that causes delays when
G1 is down.

Instead, the first time we get a circuit request, we try to build one
circuit to G1. On the next circuit request, if the circuit to G1 isn't
done yet, we launch a circuit to G2 instead. The next request (if the
G1 and G2 circuits are still pending) goes to G3, and so on. But
(here's the critical part!) we don't actually _use_ the circuit to G2
unless the circuit to G1 fails, and we don't actually _use_ the circuit
to G3 unless the circuits to G1 and G2 both fail.

This approach causes Tor clients to check the status of multiple
possible guards in parallel, while not actually _using_ any guard until
we're sure that all the guards we'd rather use are down.

## The current algorithm and its drawbacks

For the current algorithm, see `guard-spec` section 4.9: circuits are
exploratory if they are not using a primary guard. If such an
exploratory circuit is `waiting_for_better_guard`, then we advance it
(or not) depending on the status of all other _circuits_ using guards that
we'd rather be using.

In other words, the current algorithm is described in terms of actions
to take with given circuits.

For Arti (and for other modular Tor implementations), however, this
algorithm is a bit of a pain: it introduces dependencies between the
guard code and the circuit handling code, requiring each one to mess
with the other.

# Proposal

I suggest that we describe an alternative algorithm for handing circuits
to non-primary guards, to be used in preference to the current
algorithm. Unlike the existing approach, it isolates the guard logic a
bit better from the circuit logic.

## Handling exploratory circuits

When all primary guards are Unreachable, we need to try non-primary
guards. We select the first such guard (in preference order) that is
neither Unreachable nor Pending. Whenever we give out such a guard, if
the guard's status is Unknown, then we call that guard "Pending" until
the attempt to use it succeeds or fails. We remember when the guard
became Pending.

> Aside: None of the above is a change from our existing specification.

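The selection rule described above can be sketched as follows. This is an illustrative Python model only: the `Guard` class, the `pick_nonprimary_guard` and `report_result` functions, and the use of a timestamp field to represent "Pending" are hypothetical names and choices, not Arti's actual data structures.

```python
import time
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    REACHABLE = "Reachable"
    UNREACHABLE = "Unreachable"
    UNKNOWN = "Unknown"


@dataclass
class Guard:
    name: str
    status: Status = Status.UNKNOWN
    # Time at which the guard became Pending; None if it is not Pending.
    pending_since: Optional[float] = None


def pick_nonprimary_guard(guards, now=None):
    """Return the first guard (in preference order) that is neither
    Unreachable nor Pending, marking it Pending if its status is Unknown.
    Returns None if no such guard exists."""
    now = time.monotonic() if now is None else now
    for guard in guards:
        if guard.status is Status.UNREACHABLE or guard.pending_since is not None:
            continue
        if guard.status is Status.UNKNOWN:
            guard.pending_since = now  # remember when it became Pending
        return guard
    return None


def report_result(guard, succeeded):
    """Record the outcome of an attempt and clear the Pending mark."""
    guard.status = Status.REACHABLE if succeeded else Status.UNREACHABLE
    guard.pending_since = None
```

With guards G1, G2, and G3 all Unknown, three successive calls to `pick_nonprimary_guard` return G1, G2, and G3 in turn, which matches the parallel probing described in "The task at hand".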

After completing a circuit, the implementation must check whether
its guard is usable. A guard is usable according to these rules:

Primary guards are always usable.

Non-primary guards are usable for a given circuit if every guard earlier
in the preference list is either unsuitable for that circuit
(e.g. because of family restrictions), or marked as Unreachable, or has
been pending for at least `{NONPRIMARY_GUARD_CONNECT_TIMEOUT}`.

Non-primary guards are unusable for a given circuit if some guard earlier
in the preference list is suitable for the circuit _and_ Reachable.

Non-primary guards are unusable if they have not become usable after
`{NONPRIMARY_GUARD_IDLE_TIMEOUT}` seconds.

If a circuit's guard is neither usable nor unusable immediately, the
circuit is not discarded; instead, it is kept (but not used) until it
becomes usable or unusable.

> I am not 100% sure whether this description produces the same behavior
> as the current guard-spec, but it is simpler to describe, and has
> proven to be simpler to implement.

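The usability rules in this section can be expressed as a single three-way decision. The sketch below is a hedged illustration rather than the Arti implementation: `guard_usability`, the `Usability` enum, and the `suitable` callback are hypothetical, the two timeout constants are placeholder values chosen for the example, and measuring the idle timeout from the moment the circuit completed is an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

# Placeholder values in seconds, assumed for illustration only.
NONPRIMARY_GUARD_CONNECT_TIMEOUT = 15.0
NONPRIMARY_GUARD_IDLE_TIMEOUT = 600.0


class Status(Enum):
    REACHABLE = "Reachable"
    UNREACHABLE = "Unreachable"
    UNKNOWN = "Unknown"


class Usability(Enum):
    USABLE = "usable"
    UNUSABLE = "unusable"
    UNDECIDED = "undecided"  # keep the circuit, but do not use it yet


@dataclass
class Guard:
    name: str
    status: Status = Status.UNKNOWN
    is_primary: bool = False
    pending_since: Optional[float] = None


def guard_usability(guards, index, now, circuit_done_at,
                    suitable=lambda guard: True):
    """Decide whether guards[index] is usable for a completed circuit.
    `guards` is in preference order; `suitable` says whether a guard
    could serve this particular circuit."""
    if guards[index].is_primary:
        return Usability.USABLE
    all_earlier_ruled_out = True
    for earlier in guards[:index]:
        if not suitable(earlier) or earlier.status is Status.UNREACHABLE:
            continue
        if earlier.status is Status.REACHABLE:
            return Usability.UNUSABLE
        timed_out = (earlier.pending_since is not None and
                     now - earlier.pending_since >= NONPRIMARY_GUARD_CONNECT_TIMEOUT)
        if not timed_out:
            all_earlier_ruled_out = False
    if all_earlier_ruled_out:
        return Usability.USABLE
    if now - circuit_done_at >= NONPRIMARY_GUARD_IDLE_TIMEOUT:
        return Usability.UNUSABLE
    return Usability.UNDECIDED
```

A caller would re-evaluate an `UNDECIDED` circuit whenever an earlier guard changes status or a timeout expires, which corresponds to keeping the circuit without using it.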
## Implications for program design

(This entire section is implementation detail to explain why this is a
simplification from the previous algorithm. It is for explanatory
purposes only and is not part of the spec.)

With this algorithm, we cut down the interaction between the guard code
and the circuit code considerably, but we do not remove it entirely.
Instead, there remains (in Arti terms) a pair of communication channels
between the circuit manager and the guard manager:

* Whenever a guard is given to the circuit manager, the circuit manager
  receives the write end of a single-use channel to
  report whether the guard has succeeded or failed.

* Whenever a non-primary guard is given to the circuit manager, the
  circuit manager receives the read end of a single-use channel that
  will tell it whether the guard is usable or unusable. This channel
  doesn't report anything until the guard has one status or the other.

With this design, the circuit manager never needs to look at the list of
guards, and the guard manager never needs to look at the list of
circuits.

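The pair of channels described above can be modeled in a few lines. This sketch is not Arti's API: it uses Python's standard `queue.Queue` with a capacity of one as a stand-in for a single-use channel, and `GuardHandout` and `build_circuit` are hypothetical names invented for the example.

```python
import queue
from dataclasses import dataclass, field


def _one_shot():
    # A queue holding at most one message stands in for a single-use channel.
    return queue.Queue(maxsize=1)


@dataclass
class GuardHandout:
    """What the guard manager gives the circuit manager along with a guard."""
    guard_name: str
    # Write end for the circuit manager: True = guard succeeded, False = failed.
    report: queue.Queue = field(default_factory=_one_shot)
    # Read end for the circuit manager: True = usable, False = unusable.
    # It stays empty until the guard manager has reached a verdict.
    usability: queue.Queue = field(default_factory=_one_shot)


def build_circuit(handout, connect):
    """Circuit-manager side: try the guard, report the outcome, then wait
    for the usability verdict before using the circuit.
    `connect` is a caller-supplied function returning True on success."""
    succeeded = connect(handout.guard_name)
    handout.report.put(succeeded)
    if not succeeded:
        return False
    return handout.usability.get()  # blocks until a verdict arrives
```

In this model the circuit manager only ever touches the `GuardHandout` it was given, and the guard manager only reads `report` and writes `usability`, so neither side needs access to the other's internal lists.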
## Subtleties concerning "guard success"

Note that the above definitions of a Reachable guard depend on reporting
when the _guard_ is successful or failed. This is not necessarily the
same as reporting whether the _circuit_ is successful or failed. For
example, a circuit that fails after the first hop does not necessarily
indicate that there's anything wrong with the guard. Similarly, we can
reasonably conclude that the guard is working (at least somewhat) as
long as we have an open channel to it.