torspec/proposals/261-aez-crypto.txt

283 lines
9.7 KiB
Plaintext

Filename: 261-aez-crypto.txt
Title: AEZ for relay cryptography
Author: Nick Mathewson
Created: 28 Oct 2015
Status: Obsolete
0. History
I wrote the first draft of this around October. This draft takes a
more concrete approach to the open questions from last time around.
1. Summary and preliminaries
This proposal describes an improved algorithm for circuit
encryption, based on the wide-block SPRP AEZ. I also describe the
attendant bookkeeping, including CREATE cells, and several
variants of the proposal.
For more information about AEZ, see
http://web.cs.ucdavis.edu/~rogaway/aez/
For motivations, see proposal 202.
2. Specifications
2.1. New CREATE cell types.
We add a new CREATE cell type that behaves as an ntor cell but which
specifies that the circuit will be created to use this mode of
encryption.
[TODO: Can/should we make this unobservable?]
The ntor handshake is performed as usual, but a different PROTOID is
used:
"ntor-curve25519-sha256-aez-1"
To derive keys under this handshake, we use SHAKE256 to derive the
following output:
struct shake_output {
u8 aez_key[48];
u8 chain_key[32];
u8 chain_val_forward[16];
u8 chain_val_backward[16];
};
The first two two fields are constant for the lifetime of the
circuit.
2.2. New relay cell payload
We specify the following relay cell payload format, to be used when
the exit node circuit hop was created with the CREATE format in 2.1
above:
struct relay_cell_payload {
u32 zero_1;
u16 zero_2;
u16 stream_id;
u16 length IN [0..498];
u8 command;
u8 data[498]; // payload_len - 11
};
Note that the payload length is unchanged. The fields are now
rearranged to be aligned. The 'recognized' and 'length' fields are
replaced with zero_1, zero_2, and the high 7 bits of length, for a
minimum of 55 bits of unambigious verification. (Additional
verification can be done by checking the other fields for
correctness; AEZ users can exploit plaintext redundancy for
additional cryptographic checking.)
When encrypting a cell for a hop that was created using one of these
circuits, clients and relays encrypt them using the AEZ algorithm
with the following parameters:
Let Chain denote chain_val_forward if this is a forward cell
or chain_val_backward otherwise.
tau = 0
# We set tau=0 because want no per-hop ciphertext expansion. Instead
# we use redundancy in the plaintext to authenticate the data.
Nonce =
struct {
u64 cell_number;
u8 is_forward;
u8 is_early;
}
# The cell number is the number of relay cells that have
# traveled in this direction on this circuit before this cell.
# ie, it's zero for the first cell, two for the second, etc.
#
# is_forward is 1 for outbound cells, 0 for inbound cells.
# is_early is 1 for cells packaged as RELAY_EARLY, 0 for
# cells packaged as RELAY.
#
# Technically these two values would be more at home in AD
# than in Nonce; but AEZ doesn't actually distinguish N and AD
# internally.
Define CELL_CHAIN_BYTES = 32
AD = [ XOR(prev_plaintext[:CELL_CHAIN_BYTES],
prev_ciphertext[:CELL_CHAIN_BYTES]),
Chain ]
# Using the previous cell's plaintext/ciphertext as additional data
# guarantees that any corrupt ciphertext received will corrupt the
# plaintext, which will corrupt all future plaintexts.
Set Chain = AES(chain_key, Chain) xor Chain.
# This 'chain' construction is meant to provide forward
# secrecy. Each chain value is replaced after each cell with a
# (hopefully!) hard-to-reverse construction.
This instantiates a wide-block cipher, tweaked based on the cell
index and direction. It authenticates part of the previous cell's
plaintext, thereby ensuring that if the previous cell was corrupted,
this cell will be unrecoverable.
3. Design considerations
3.1. Wide-block pros and cons?
See proposal 202, section 4.
3.2. Given wide-block, why AEZ?
It's a reasonably fast probably secure wide-block cipher. In
particular, it's performance-competitive with AES_CTR, and far better
than what we're doing now. See performance appendix.
It seems secure-ish too. Several cryptographers I know seem to
think it's likely secure enough, and almost surely at least as
good as AES.
[There are many other competing wide-block SPRP constructions if
you like. Many require blocks be an integer number of blocks, or
aren't tweakable. Some are slow. Do you know a good one?]
3.3. Why _not_ AEZ?
There are also some reasons to consider avoiding AEZ, even if we do
decide to use a wide-block cipher.
FIRST it is complicated to implement. As the specification says,
"The easiness claim for AEZ is with respect to ease and versatility
of use, not implementation."
SECOND, it's still more complicated to implement well (fast,
side-channel-free) on systems without AES acceleration. We'll need
to pull the round functions out of fast assembly AES, which is
everybody's favorite hobby.
THIRD, it's really horrible to try to do it in hardware.
FOURTH, it is comparatively new. Although several cryptographers
like it, and it is closely related to a system with a security proof,
you never know.
FIFTH, something better may come along.
4. Alternative designs
4.1. Two keys.
We could have a separate AEZ key for forward and backward encryption.
This would use more space, however.
4.2. Authenticating things differently
In computing the AD, we could replace xor with concat.
In computing the AD, we could replace CELL_CHAIN_BYTES with 16, or
509.
(Another thing we might dislike about the current proposal is
that it appears to requires us to remember 32 bytes of plaintext
until we get another cell. But that part is fixable: note that
in the structure of AEZ, the AD is processed in the AEZ-hash()
function, and then no longer used. We can compute the AEZ-hash()
to be used for the next cell after each cell is en/de crypted.)
4.3. Other hashes.
We could update the ntor definition used in this to use a better hash
than SHA256 inside.
4.4. Less predictable plaintext.
A positively silly option would be to reserve the last X bytes of
each relay cell's plaintext for random bytes, if they are not used
for payload. This might help a little, in a really doofy way.
A.1. Performance notes: memory requirements
Let's ignore Tor overhead here, but not openssl overhead.
IN THE CURRENT PROTOCOL, the total memory required at each relay is: 2
sha1 states, 2 aes states.
Each sha1 state uses 96 bytes. Each aes state uses 244 bytes. (Plus
32 bytes counter-mode overhead.) This works out to 704 bytes at each
hop.
IN THE PROTOCOL ABOVE, using an optimized AEZ implementation, we'll
need 128 bytes for the expanded AEZ key schedule. We'll need another
244 bytes for the AES key schedule for the chain key. And there's 32
bytes of chaining values. This gives us 404 bytes at each hop, for a
savings of 42%.
If we used separate AES and AEZ keys in each direction, we would be
looking at 776 bytes, for a space increase of 10%.
A.2. Performance notes: CPU requirements on AESNI hosts
The cell_ops benchmark in bench.c purports to tell us how long it
takes to encrypt a tor cell. But it wasn't really telling the truth,
since it only did one SHA1 operation every 2^16 cells, when
entries and exits really do one SHA1 operation every end-to-end cell.
I expanded it to consider the slow (SHA1) case as well. I ran this on
my friendly OSX laptop (2.9 GHz Intel Core i5) with AESNI support:
Inbound cells: 169.72 ns per cell.
Outbound cells: 175.74 ns per cell.
Inbound cells, slow case: 934.42 ns per cell.
Outbound cells, slow case: 928.23 ns per cell.
Note that So for an n-hop circuit, each cell does the slow case and
(n-1) fast cases at the entry; the slow case at the exit, and the fast
case at each middle relay.
So for 3 hop circuits, the total current load on the network is
roughly 425 ns per hop, concentrated at the exit.
Then I started messing around with AEZ benchmarks, using the
aesni-optimized version of AEZ on the AEZ website. (Further
optimizations are probably possible.) For the AES256, I
used the usual aesni-based aes construction.
Assuming xor is free in comparison to other operations, and
CELL_CHAIN_BYTES=32, I get roughly 270 ns per cell for the entire
operation.
If we were to pick CELL_CHAIN_BYTES=509, we'd be looking at around 303
ns per cell.
If we were to pick CELL_CHAIN_BYTES=509 and replace XOR with CONCAT,
it would be around 355 ns per cell.
If we were to keep CELL_CHAIN_BYTES=32, and remove the
AES256-chaining, I see values around 260 ns per cell.
(This is all very spotty measurements, with some xors left off, and
not much effort done at optimization beyond what the default
optimized AEZ does today.)
A.3. Performance notes: what if we don't have AESNI?
Here I'll test on a host with sse2 and ssse3, but no aesni instructions.
From Tor's benchmarks I see:
Inbound cells: 1218.96 ns per cell.
Outbound cells: 1230.12 ns per cell.
Inbound cells, slow case: 2099.97 ns per cell.
Outbound cells, slow case: 2107.45 ns per cell.
For a 3-hop circuit, that gives on average 1520 ns per cell.
[XXXX Do a real benchmark with a fast AEZ backend. First, write one.
Preliminary results are a bit disappointing, though, so I'll
need to invetigate alternatives as well.]