Filename: 255-hs-load-balancing.txt
Title: Controller features to allow for load-balancing hidden services
Author: Tom van der Woerdt
Created: 2015-10-12
Status: Reserve

1. Overview and motivation

To address scaling concerns with the onion web, we want to be able to
spread the load of hidden services across multiple machines.
OnionBalance is a great stab at this, and it can currently give us 60x
the capacity by publishing 6 separate descriptors, each with 10
introduction points, but more is better. This proposal aims to address
hidden service scaling up to a point where we can handle millions of
concurrent connections.

The basic idea involves splitting the 'introduce' from the
'rendezvous' in the tor implementation, and adding new events and
commands to the control specification to allow intercepting
introductions and transmitting them to different nodes, which will then
take care of the actual rendezvous. External controller code could
relay the data to another node or a pool of nodes, all of which are run
by the hidden service operator, effectively distributing the load of
hidden services over multiple processes.

By cleverly utilizing the current descriptor methods through
OnionBalance, we could publish up to sixty unique introduction points,
which could translate to many thousands of parallel tor workers after
implementing this proposal. This should allow hidden services to go
multi-threaded with a few small changes, and continue scaling for a
long time.


2. Specification

We propose two additions to the control specification: a new event and
a new command. We also introduce two new configuration options.


2.1. HiddenServiceAutomaticRendezvous configuration option

The syntax is:
  "HiddenServiceAutomaticRendezvous" SP [1|0] CRLF

This configuration option is a boolean toggle which, if zero, stops
the tor implementation from automatically doing a rendezvous when an
INTRODUCE2 cell is received. Instead, an event will be sent to the
controllers. If no controllers are present, the introduction cell
should be dropped, as acting on it instead of dropping it could open a
window for a DoS.

This configuration option can be specified on a per-hidden service
level, and can be set through the controller for ephemeral hidden
services as well.

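The three outcomes described for an incoming INTRODUCE2 cell can be
sketched as a small decision function. This is an illustration only,
not tor's code: the names Action and on_introduce2, and both
parameters, are hypothetical.

```python
from enum import Enum


class Action(Enum):
    RENDEZVOUS = "rendezvous"  # tor performs the rendezvous itself
    EMIT_EVENT = "emit_event"  # hand the introduction to the controllers
    DROP = "drop"              # nobody can act on it, so discard the cell


def on_introduce2(automatic_rendezvous: bool, controllers: int) -> Action:
    """Decide what to do with an incoming INTRODUCE2 cell.

    automatic_rendezvous stands for the value of
    HiddenServiceAutomaticRendezvous; controllers is the number of
    controllers currently attached. Both names are assumptions.
    """
    if automatic_rendezvous:
        return Action.RENDEZVOUS
    if controllers > 0:
        return Action.EMIT_EVENT
    # Acting on the cell with no controller present could open a DoS window.
    return Action.DROP
```

For example, on_introduce2(False, 0) yields Action.DROP, which is the
case the text singles out as a DoS precaution.
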
2.2. HiddenServiceTag configuration option

The syntax is:
  "HiddenServiceTag" SP [a-zA-Z0-9] CRLF

To identify groups of hidden services more easily across nodes, a
name/tag can be given to a hidden service. The tag defaults to the
storage path of the hidden service (HiddenServiceDir).


2.3. The "INTRODUCE" event

The syntax is:
  "650" SP "INTRODUCE" SP HSTag SP RendezvousData CRLF

  HSTag = the tag of the hidden service
  RendezvousData = implementation-specific, but must not contain
                   whitespace, must only contain human-readable
                   characters, and should be no longer than 2048 bytes

The INTRODUCE event should contain sufficient data to allow continuing
the rendezvous from another Tor instance. The exact format is
unspecified and left up to the implementation. It follows that only
matching versions can be used safely to coordinate the rendezvous of
hidden service connections.

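A controller receiving this event might split it as follows. This is a
minimal sketch under the syntax given above; IntroduceEvent and
parse_introduce_event are hypothetical names, and treating the
2048-byte "should" as a hard limit is a choice made here, not something
the proposal requires.

```python
from typing import NamedTuple

MAX_BLOB_BYTES = 2048  # the proposal's "should be no longer than" bound


class IntroduceEvent(NamedTuple):
    hs_tag: str
    rendezvous_data: str


def parse_introduce_event(line: str) -> IntroduceEvent:
    """Parse one '650 INTRODUCE <HSTag> <RendezvousData>' event line."""
    if line.endswith("\r\n"):
        line = line[:-2]
    fields = line.split(" ")
    if len(fields) != 4 or fields[0] != "650" or fields[1] != "INTRODUCE":
        raise ValueError("not an INTRODUCE event: %r" % line)
    hs_tag, data = fields[2], fields[3]
    # Reject empty fields and any whitespace that survived the split (tabs).
    if not hs_tag or not data or any(c.isspace() for c in hs_tag + data):
        raise ValueError("empty field or stray whitespace in event")
    if len(data.encode("utf-8")) > MAX_BLOB_BYTES:
        raise ValueError("RendezvousData is longer than 2048 bytes")
    return IntroduceEvent(hs_tag, data)
```

The blob is returned untouched, since the controller is only expected
to forward it.
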
2.4. The "PERFORM-RENDEZVOUS" command

The syntax is:
  "PERFORM-RENDEZVOUS" SP HSTag SP RendezvousData CRLF

This command allows a controller to perform a rendezvous using data
received through an INTRODUCE event. The format of RendezvousData is
not specified other than that it must not contain whitespace, and
should be no longer than 2048 bytes.

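Building the command line from a tag and a blob could look like the
sketch below. The function name build_perform_rendezvous is
hypothetical, and the sketch only enforces the whitespace rule stated
above.

```python
def build_perform_rendezvous(hs_tag: str, rendezvous_data: str) -> bytes:
    """Format a PERFORM-RENDEZVOUS command line, terminated by CRLF."""
    for name, value in (("HSTag", hs_tag), ("RendezvousData", rendezvous_data)):
        # A space inside either field would shift the field boundaries.
        if not value or any(c.isspace() for c in value):
            raise ValueError(f"{name} must be non-empty and free of whitespace")
    return f"PERFORM-RENDEZVOUS {hs_tag} {rendezvous_data}\r\n".encode("ascii")
```

The returned bytes are what a controller would write to the control
connection of the node that should perform the rendezvous.
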
2.5. The RendezvousData blob

The "RendezvousData" blob is opaque to the controller; the tor
implementation, however, must know how to deal with it. Its contents
are the minimal amount of data required to process the INTRODUCE2 cell
on another machine.

Before proposal 224 is implemented, this could consist of the
INTRODUCE2 cell payload, the key to decrypt the cell if the cell
is not already decrypted (which may be preferable, for performance
reasons), and data necessary for other machines to recognize what to do
with the cell.

After proposal 224 is implemented, the blob would contain any
additional keys needed to perform the rendezvous handshake.

Implementations do not need to handle blobs generated by other versions
of the software. Because of this, it is recommended to include a
version number which can be used to verify that the blob is from a
compatible implementation.

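One way an implementation could satisfy the constraints on the blob (no
whitespace, human-readable characters, a version number, at most 2048
bytes) is a version prefix followed by URL-safe base64. The layout
"<version>.<base64>" and the names below are assumptions made for
illustration; the proposal deliberately leaves the format unspecified.

```python
import base64

BLOB_VERSION = 1       # hypothetical version number
MAX_BLOB_BYTES = 2048  # bound taken from the event and command syntax


def encode_blob(payload: bytes, version: int = BLOB_VERSION) -> str:
    """Wrap raw rendezvous material in a printable, versioned blob."""
    body = base64.urlsafe_b64encode(payload).decode("ascii")
    blob = f"{version}.{body}"
    if len(blob) > MAX_BLOB_BYTES:
        raise ValueError("blob exceeds 2048 bytes")
    return blob


def decode_blob(blob: str, expected_version: int = BLOB_VERSION) -> bytes:
    """Recover the payload, refusing blobs from another version."""
    version, sep, body = blob.partition(".")
    if not sep or not version.isdigit():
        raise ValueError("malformed blob")
    if int(version) != expected_version:
        raise ValueError(f"blob version {version} is not supported")
    return base64.urlsafe_b64decode(body.encode("ascii"))
```

The URL-safe base64 alphabet contains no whitespace and no ".", so the
first "." always separates the version from the payload.
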
3. Compatibility and security

The implementation of these methods should, ideally, not change
anything in the network, and all control changes are opt-in, so this
proposal is fully backwards compatible.

Controllers handling this data must be careful not to leak rendezvous
data to untrusted parties, as it could be used to intercept and
manipulate hidden service traffic.


4. Example

Let's take an example where a client (Alice) tries to contact Bob's
hidden service. Bob follows the normal hidden service specification,
except that he sets up ten servers instead of one. One of these
publishes the descriptor; the others have publishing disabled. When the
INTRODUCE2 cell arrives at the node which published the descriptor,
that node does not immediately try to perform the rendezvous, but
instead emits an INTRODUCE event to its controller. Through an
out-of-band process this message is relayed to the controller of
another of Bob's nodes, which sends the "PERFORM-RENDEZVOUS" command
to that node. That node performs the rendezvous and continues to serve
data to Alice, whose client no longer has to talk to the introduction
point.

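The message flow in this example can be traced with two in-memory
stand-ins for Bob's nodes. Everything here is a toy model under the
syntax from section 2: FrontNode, WorkerNode and make_relay are
hypothetical names, and the out-of-band transport is reduced to a
direct function call.

```python
from typing import Callable, List, Tuple


class WorkerNode:
    """Stand-in for a tor instance that only performs rendezvous."""

    def __init__(self) -> None:
        self.rendezvous_done: List[Tuple[str, str]] = []

    def control_command(self, line: str) -> None:
        command, hs_tag, blob = line.rstrip("\r\n").split(" ")
        if command != "PERFORM-RENDEZVOUS":
            raise ValueError("unsupported command")
        self.rendezvous_done.append((hs_tag, blob))


class FrontNode:
    """Stand-in for the instance that published the descriptor."""

    def __init__(self, hs_tag: str, on_event: Callable[[str], None]) -> None:
        self.hs_tag = hs_tag
        self.on_event = on_event

    def receive_introduce2(self, blob: str) -> None:
        # Automatic rendezvous is off, so emit an event instead.
        self.on_event(f"650 INTRODUCE {self.hs_tag} {blob}\r\n")


def make_relay(worker: WorkerNode) -> Callable[[str], None]:
    """Controller glue: turn an INTRODUCE event into a command."""
    def relay(event_line: str) -> None:
        _, _, hs_tag, blob = event_line.rstrip("\r\n").split(" ")
        worker.control_command(f"PERFORM-RENDEZVOUS {hs_tag} {blob}\r\n")
    return relay


worker = WorkerNode()
front = FrontNode("bob", make_relay(worker))
front.receive_introduce2("opaque-blob")
```

After the last line, worker.rendezvous_done holds the single entry
("bob", "opaque-blob"): the front node never performed a rendezvous
itself.
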
5. Other considerations

We have left the actual format of the rendezvous data in the control
protocol unspecified, so that controllers do not need to worry about
the various types of hidden service connections, most notably those
introduced by proposal 224.

The decision not to implement the actual cell relaying in the tor
implementation itself was taken to allow more advanced configurations,
and to leave the actual load-balancing algorithm to the implementor of
the controller. The developer of the tor implementation should not
have to choose between a round-robin algorithm and something that could
pull CPU load averages from a centralized monitoring system.
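
Because worker selection lives in the controller, the two strategies
mentioned above can be swapped behind one interface. The sketch below
is illustrative only: round_robin and least_loaded are hypothetical
names, and load_of stands in for whatever a monitoring system would
report.

```python
from itertools import cycle
from typing import Callable, Sequence

Picker = Callable[[], str]  # returns the name of the next worker to use


def round_robin(workers: Sequence[str]) -> Picker:
    """Hand out workers in a fixed rotation."""
    if not workers:
        raise ValueError("need at least one worker")
    order = cycle(workers)
    return lambda: next(order)


def least_loaded(workers: Sequence[str],
                 load_of: Callable[[str], float]) -> Picker:
    """Pick whichever worker currently reports the lowest load."""
    if not workers:
        raise ValueError("need at least one worker")
    # load_of is queried on every pick, so fresh measurements are used.
    return lambda: min(workers, key=load_of)
```

A controller would call the chosen picker once per INTRODUCE event and
send the resulting PERFORM-RENDEZVOUS command to that worker.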