Filename: 203-https-frontend.txt
Title: Avoiding censorship by impersonating an HTTPS server
Author: Nick Mathewson
Created: 24 Jun 2012
Status: Obsolete

Note: Obsoleted-by pluggable transports.

Overview:

One frequently proposed approach for censorship resistance is that
Tor bridges ought to act like another TLS-based service, and deliver
traffic to Tor only if the client can demonstrate some shared
knowledge with the bridge.

In this document, I discuss some design considerations for building
such systems, and propose a few possible architectures and designs.

Background:

Most of our previous work on censorship resistance has focused on
preventing passive attackers from identifying Tor bridges, or from
doing so cheaply. But active attackers exist, and exist in the wild:
right now, the most sophisticated censors use their anti-Tor passive
attacks only as a first round of filtering before launching a
secondary active attack to confirm suspected Tor nodes.

One idea we've been talking about for a while is that of having a
service that looks like an HTTPS service unless a client does some
particular secret thing to prove it is allowed to use it as a Tor
bridge. Such a system would still succumb to passive traffic
analysis attacks (since the packet timings and sizes for HTTPS don't
look that much like Tor), but it would be enough to beat many current
censors.

Goals and requirements:

We should make it impossible for a passive attacker who examines only
a few packets at a time to distinguish Tor->Bridge traffic from an
HTTPS client talking to an HTTPS server.

We should make it impossible for an active attacker talking to the
server to tell a Tor bridge server from a regular HTTPS server.

We should make it impossible for an active attacker who can MITM the
server to learn from the client whether it thought it was connecting
to an HTTPS server or a Tor bridge. (This implies that an MITM
attacker shouldn't be able to learn anything that would help it
convince the server to act like a bridge.)

It would be nice to minimize the required code changes to Tor, and
the required code changes to any other software.

It would be good to avoid any requirement of close integration with
any particular HTTP or HTTPS implementation.

If we're replacing our own profile with that of an HTTPS service, we
should do so in a way that lets us use the profile of a popular
HTTPS implementation.

Efficiency would be good: layering TLS inside TLS is best avoided if
we can.

Discussion:

We need an actual web server; HTTP and HTTPS are so complicated that
there's no practical way to behave in a bug-compatible way with any
popular webserver short of running that webserver.

More obviously, we need a TLS implementation (or we can't implement
HTTPS), and we need a Tor bridge (since that's the whole point of
this exercise).

So from a top-level point of view, the question becomes: how shall we
wire these together?

There are three obvious ways; I'll discuss them in turn below.

Design #1: TLS in Tor

Under this design, Tor accepts HTTPS connections, decides which ones
don't look like the Tor protocol, and relays them to a webserver.

               +--------------------------------------+
+------+  TLS  |  +------------+  http +-----------+  |
| User |<------>  | Tor Bridge |<----->| Webserver |  |
+------+       |  +------------+       +-----------+  |
               |     trusted host/network             |
               +--------------------------------------+

This approach would let us use a completely unmodified webserver
implementation, but would require the most extensive changes in Tor:
we'd need to add yet another flavor to Tor's TLS ice cream parlor,
and try to emulate a popular webserver's TLS behavior even more
thoroughly.

To authenticate, we would need to take a hybrid approach, and begin
forwarding traffic to the webserver as soon as a webserver
might respond to the traffic. This could be pretty complicated,
since it requires us to have a model of how the webserver would
respond to any given set of bytes. As a workaround, we might try
relaying _all_ input to the webserver, and only replying as Tor in
the cases where the webserver hasn't replied. (This would likely
create recognizable timing patterns, though.)

The authentication itself could use a system akin to Tor proposals
189/190, where an early AUTHORIZE cell shows knowledge of a shared
secret if the client is a Tor client.

Design #2: TLS in the web server

               +----------------------------------+
+------+  TLS  |  +------------+  tor0   +-----+  |
| User |<------>  | Webserver  |<------->| Tor |  |
+------+       |  +------------+         +-----+  |
               |     trusted host/network         |
               +----------------------------------+

In this design, we write an Apache module or something that can
recognize an authenticator of some kind in an HTTPS header, or
recognize a valid AUTHORIZE cell, and respond by forwarding the
traffic to a Tor instance.

To avoid the efficiency issue of doing an extra local
encrypt/decrypt, we need to have the webserver talk to Tor over a
local unencrypted connection. (I've denoted this as "tor0" in the
diagram above.) For implementation convenience, we might want to
implement that as a NULL TLS connection, so that the Tor server code
wouldn't have to change except to allow local NULL TLS connections in
this configuration.

For the Tor handshake to work properly here, we'll need a way for the
Tor instance to know which public key the webserver is configured to
use.

We wouldn't need to support the parts of the Tor link protocol used
to authenticate clients to servers: relays shouldn't be using this
subsystem at all.

The Tor client would need to connect and prove its status as a Tor
client. If the client uses some means other than AUTHORIZE cells, or
if we want to do the authentication in a pluggable transport, we
might decide to offload the responsibility for TLS itself to the
pluggable transport. That would scare me: supporting pluggable
transports that have the responsibility for TLS would make it fairly
easy to mess up the crypto, and I'd rather not have it be so easy to
write a pluggable transport that accidentally makes Tor less secure.

Design #3: Reverse proxy

               +----------------------------------+
               |  +-------+  http  +-----------+  |
               |  |       |<------>| Webserver |  |
+------+  TLS  |  |       |        +-----------+  |
| User |<------>  | Proxy |                       |
+------+       |  |       |  tor0  +-----------+  |
               |  |       |<------>|    Tor    |  |
               |  +-------+        +-----------+  |
               |     trusted host/network         |
               +----------------------------------+

In this design, we write a server-side proxy to sit in front of Tor
and a webserver, or repurpose some existing HTTPS proxy. Its role
will be to do TLS, and then forward connections to Tor or the
webserver as appropriate. (In the web world, this kind of thing is
called a "reverse proxy", so that's the term I'm using here.)

To avoid fingerprinting, we should choose a proxy that's already in
common use as a TLS front-end for webservers -- nginx, perhaps.
Unfortunately, the more popular tools here seem to be pretty complex,
and the simpler tools less widely deployed. More investigation would
be needed.

The authorization considerations would be as in Design #2 above; for
the reasons discussed there, it's probably a good idea to build the
necessary authorization into Tor itself.

I generally like this design best: it lets us isolate the "Check for
a valid authenticator and/or a valid or invalid HTTP header, and
react accordingly" question to a single program.

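As a rough illustration of that single decision point, the sketch
below shows how a front-end proxy might classify the first decrypted
bytes of a connection. Everything named here is an assumption made
for illustration only: the fixed-length hex authenticator, the HMAC
construction, the AUTH_LEN constant, the "bridge-auth" label, and the
route_connection helper are hypothetical, and none of them comes from
this proposal or from proposals 189-191. In particular, a static
token like this one does nothing about the MITM problem discussed
later.

```python
import hashlib
import hmac

AUTH_LEN = 64  # hypothetical: hex-encoded SHA-256 HMAC, so no CR or LF bytes


def expected_authenticator(shared_secret: bytes) -> bytes:
    """Derive the token a bridge client would send first (assumed scheme)."""
    digest = hmac.new(shared_secret, b"bridge-auth", hashlib.sha256).hexdigest()
    return digest.encode("ascii")


def route_connection(first_bytes: bytes, shared_secret: bytes) -> str:
    """Return "tor" or "webserver" for the first decrypted bytes seen.

    Anything that does not start with a valid authenticator is handed to
    the webserver unchanged, so a prober only sees webserver behavior.
    """
    candidate = first_bytes[:AUTH_LEN]
    expected = expected_authenticator(shared_secret)
    # Constant-time comparison avoids leaking how many bytes matched.
    if len(candidate) == AUTH_LEN and hmac.compare_digest(candidate, expected):
        return "tor"
    return "webserver"
```

A real front end would also have to decide how long to wait for those
first bytes; the sketch only covers the classification step once the
bytes are in hand.
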
How to authenticate: The easiest way

Designing a good MITM-resistant AUTHORIZE cell, or an equivalent
HTTP header, is an open problem that we should solve in proposals
190 and 191 and their successors. I'm calling it out-of-scope here;
please see those proposals, their attendant discussion, and their
eventual successors.

How to authenticate: a slightly harder way

Some proposals in this vein have in the past suggested a special
HTTP header to distinguish Tor connections from non-Tor connections.
This could work too, though it would require substantially larger
changes on the Tor client's part, would still require the client
to take measures to avoid MITM attacks, and would also require the
client to implement a particular browser's HTTP profile.

Some considerations on distinguishability

Against a passive eavesdropper, the easiest way to avoid
distinguishability in server responses will be to use an actual web
server or reverse web proxy's TLS implementation.
(Distinguishability based on client TLS use is another topic
entirely.)

Against an active non-MITM attacker, the best probing attacks will be
ones designed to provoke the system into acting in ways different from
those in which a webserver would act: responding earlier than a web
server would respond, or later, or differently. We need to make sure
that, whatever the front-end program is, it answers anything that
would qualify as a well-formed or ill-formed HTTP request whenever
the web server would. This means, for example, that whatever the
correct form of client authorization turns out to be, no prefix of
that authorization is ever something that the webserver would respond
to. With some web servers (I believe), that's as easy as making sure
that any valid authenticator isn't too long, and doesn't contain a CR
or LF character. With others, the authenticator would need to be a
valid HTTP request, with all the attendant difficulty that would
raise.

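For the simple case described above (a bounded-length authenticator
with no CR or LF), the two checks could look roughly like the
following. The MAX_AUTH_LEN value and both function names are
hypothetical, and whether these two properties are actually
sufficient depends on the particular webserver, as noted above.

```python
MAX_AUTH_LEN = 64  # hypothetical bound; the real limit depends on the webserver


def is_plausible_authenticator(auth: bytes, max_len: int = MAX_AUTH_LEN) -> bool:
    """Check the two properties from the text: bounded length, no CR or LF."""
    if not auth or len(auth) > max_len:
        return False
    return b"\r" not in auth and b"\n" not in auth


def must_forward_to_webserver(received: bytes, max_len: int = MAX_AUTH_LEN) -> bool:
    """True once the bytes seen so far cannot be a prefix of an authenticator.

    From that point on the front end should hand the bytes to the
    webserver, so that the webserver answers exactly when it normally would.
    """
    return len(received) > max_len or b"\r" in received or b"\n" in received
```

The second helper captures the "no prefix" requirement from the
front end's side: as long as it returns False, the webserver would
not yet have replied to what has been received, so holding the bytes
back does not change the observable timing.
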
Against an attacker who can MITM the bridge, the best attacks will be
to wait for clients to connect and see how they behave. In this
case, the client probably needs to be able to authenticate the bridge
certificate as presented in the initial TLS handshake -- or some
other aspect of the TLS handshake if we're feeling insane. If the
certificate or handshake isn't as expected, the client should behave
as a web browser that's just received a bad TLS certificate. (The
alternative there would be to try to impersonate an HTTPS client that
has just accepted a self-signed certificate. But that would probably
require the Tor client to impersonate a full web browser, which isn't
realistic.)

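One way to picture the client-side check is as a comparison of the
presented certificate against a digest the client learned along with
the bridge address. This is only a sketch under that assumption: the
pinned SHA-256 digest, the function names, and the two action labels
are hypothetical. In Python the DER bytes could be obtained from
ssl.SSLSocket.getpeercert(binary_form=True); here they are simply
passed in.

```python
import hashlib
import hmac


def certificate_matches(der_cert: bytes, expected_sha256_hex: str) -> bool:
    """Compare the SHA-256 digest of the presented certificate with a pin."""
    actual = hashlib.sha256(der_cert).hexdigest()
    return hmac.compare_digest(actual, expected_sha256_hex.lower())


def client_next_step(der_cert: bytes, expected_sha256_hex: str) -> str:
    """Decide what the client does right after the TLS handshake.

    On a mismatch the client must not reveal that it wanted a bridge; it
    should behave like a browser that just received a bad certificate.
    """
    if certificate_matches(der_cert, expected_sha256_hex):
        return "send_authenticator"
    return "abort_like_browser"
```

The important property is that the authenticator is never sent
unless the certificate check has already succeeded.
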
Side note: What to put on the webserver?

To credibly pretend not to be ourselves, we must pretend to be
something else in particular -- and something not easily identifiable
or inherently worthless. We should not, for example, have all
deployments of this kind use a fixed website, even if that website is
the default "Welcome to Apache" configuration: A censor would
probably feel that they weren't breaking anything important by
blocking all unconfigured websites with nothing on them.

Therefore, we should probably conceive of a system like this as
"Something to add to your HTTPS website" rather than as a standalone
installation.

Related work:

meek [1] is a pluggable transport that uses HTTP for carrying bytes
and TLS for obfuscation. Traffic is relayed through a third-party
server (Google App Engine). It uses a trick to talk to the third
party so that it looks like it is talking to an unblocked server.

meek itself is not really about HTTP at all. It uses HTTP only
because it's convenient and the big Internet services we use as cover
also use HTTP. meek uses HTTP as a transport, and TLS for
obfuscation, but the key idea is really "domain fronting," where it
appears to the censor you are talking to one domain (www.google.com),
but behind the scenes you are talking to another
(meek-reflect.appspot.com). The meek-server program is an ordinary
HTTP (not necessarily even HTTPS!) server, whose communication is
easily fingerprintable; but that doesn't matter because the censor
never sees that part of the communication, only the communication
between the client and CDN.

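The split between what the censor sees and what the front service
sees can be illustrated with a small data structure. This is not
meek's code: the FrontedRequest type, the build_fronted_request
helper, and the placeholder POST with an empty body are assumptions
for illustration. Only the two domain names come from the text
above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrontedRequest:
    sni: str             # visible to the censor in the TLS ClientHello
    http_request: bytes  # sent inside TLS; only the front service sees it


def build_fronted_request(front_domain: str, hidden_domain: str,
                          path: str = "/") -> FrontedRequest:
    """Pair an innocuous TLS server name with a different HTTP Host header."""
    request = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {hidden_domain}\r\n"
        "Content-Length: 0\r\n"
        "\r\n"
    ).encode("ascii")
    return FrontedRequest(sni=front_domain, http_request=request)
```

Calling build_fronted_request("www.google.com",
"meek-reflect.appspot.com") yields a request whose outer name is the
front domain, while the hidden domain appears only in the encrypted
Host header.
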
One way to think about the difference: if a censor (somehow) learns
the IP address of a bridge as described in this proposal, it's easy
and low-cost for the censor to block that bridge by IP address. meek
aims to make it much more expensive: even if you know a domain is
being used (in part) for circumvention, in order to block it you have
to block something important like the Google frontend or CloudFlare
(high collateral damage).

[1] https://trac.torproject.org/projects/tor/wiki/doc/meek