Filename: 270-newhope-hybrid-handshake.txt Title: RebelAlliance: A Post-Quantum Secure Hybrid Handshake Based on NewHope Author: Isis Lovecruft, Peter Schwabe Created: 16 Apr 2016 Updated: 22 Jul 2016 Status: Obsolete Depends: prop#220 prop#249 prop#264 prop#270 §0. Introduction RebelAlliance is a post-quantum secure hybrid handshake, comprised of an alliance between the X25519 and NewHope key exchanges. NewHope is a post-quantum-secure lattice-based key-exchange protocol based on the ring-learning-with-errors (Ring-LWE) problem. We propose a hybrid handshake for Tor, based on a combination of Tor's current NTor handshake and a shared key derived through a NewHope ephemeral key exchange. For further details on the NewHope key exchange, the reader is referred to "Post-quantum key exchange - a new hope" by Alkim, Ducas, Pöppelmann, and Schwabe [0][1]. For the purposes of brevity, we consider that NTor is currently the only handshake protocol in Tor; the older TAP protocol is ignored completely, due to the fact that it is currently deprecated and nearly entirely unused. §1. Motivation An attacker currently monitoring and storing circuit-layer NTor handshakes who later has the ability to run Shor's algorithm on a quantum computer will be able to break Tor's current handshake protocol and decrypt previous communications. It is unclear if and when such attackers equipped with large quantum computers will exist, but various estimates by researchers in quantum physics and quantum engineering give estimates of only 1 to 2 decades. Clearly, the security requirements of many Tor users include secrecy of their messages beyond this time span, which means that Tor needs to update the key exchange to protect against such attackers as soon as possible. §2. Design An initiator and responder, in parallel, conduct two handshakes: - An X25519 key exchange, as described in the description of the NTor handshake in Tor proposal #216. - A NewHope key exchange. The shared keys derived from these two handshakes are then concatenated and used as input to the SHAKE-256 extendable output function (XOF), as described in FIPS-PUB-202 [2], in order to produce a shared key of the desired length. The testvectors in §C assume that this key has a length of 32 bytes, but the use of a XOF allows arbitrary lengths to easily support future updates of the symmetric primitives using the key. See also §3.3.1. §3. Specification §3.1. Notation Let `a || b` be the concatenation of a with b. Let `a^b` denote the exponentiation of a to the bth power. Let `a == b` denote the equality of a with b, and vice versa. Let `a := b` be the assignment of the value of b to the variable a. Let `H(x)` be 32-bytes of output of the SHAKE-256 XOF (as described in FIPS-PUB-202) applied to message x. Let X25519 refer to the curve25519-based key agreement protocol described in RFC7748 §6.1. [3] Let `EXP(a, b) == X25519(., b, a)` with `g == 9`. Let X25519_KEYGEN() do the appropriate manipulations when generating the secret key (clearing the low bits, twidding the high bits). Additionally, EXP() MUST include the check for all-zero output due to the input point being of small order (cf. RFC7748 §6). Let `X25519_KEYID(B) == B` where B is a valid X25519 public key. When representing an element of the Curve25519 subgroup as a byte string, use the standard (32-byte, little-endian, x-coordinate-only) representation for Curve25519 points. Let `ID` be a router's identity key taken from the router microdescriptor. In the case for relays possessing Ed25519 identity keys (cf. Tor proposal #220), this is a 32-byte string representing the public Ed25519 identity key. For backwards and forwards compatibility with routers which do not possess Ed25519 identity keys, this is a 32-byte string created via the output of H(ID). We refer to the router as the handshake "responder", and the client (which may be an OR or an OP) as the "initiator". ID_LENGTH [32 bytes] H_LENGTH [32 bytes] G_LENGTH [32 bytes] PROTOID := "pqtor-x25519-newhope-shake256-1" T_MAC := PROTOID || ":mac" T_KEY := PROTOID || ":key_extract" T_VERIFY := PROTOID || ":verify" (X25519_SK, X25519_PK) := X25519_KEYGEN() §3.2. Protocol ======================================================================================== | | | Fig. 1: The NewHope-X25519 Hybrid Handshake. | | | | Before the handshake the Initiator is assumed to know Z, a public X25519 key for | | the Responder, as well as the Responder's ID. | ---------------------------------------------------------------------------------------- | | | Initiator Responder | | | | SEED := H(randombytes(32)) | | x, X := X25519_KEYGEN() | | a, A := NEWHOPE_KEYGEN(SEED) | | CLIENT_HDATA := ID || Z || X || A | | | | --- CLIENT_HDATA ---> | | | | y, Y := X25519_KEYGEN() | | NTOR_KEY, AUTH := NTOR_SHAREDB(X,y,Y,z,Z,ID,B) | | M, NEWHOPE_KEY := NEWHOPE_SHAREDB(A) | | SERVER_HDATA := Y || AUTH || M | | sk := SHAKE-256(NTOR_KEY || NEWHOPE_KEY) | | | | <-- SERVER_HDATA ---- | | | | NTOR_KEY := NTOR_SHAREDA(x, X, Y, Z, ID, AUTH) | | NEWHOPE_KEY := NEWHOPE_SHAREDA(M, a) | | sk := SHAKE-256(NTOR_KEY || NEWHOPE_KEY) | | | ======================================================================================== §3.2.1. The NTor Handshake §3.2.1.1. Prologue Take a router with identity ID. As setup, the router generates a secret key z, and a public onion key Z with: z, Z := X25519_KEYGEN() The router publishes Z in its server descriptor in the "ntor-onion-key" entry. Henceforward, we refer to this router as the "responder". §3.2.1.2. Initiator To send a create cell, the initiator generates a keypair: x, X := X25519_KEYGEN() and creates the NTor portion of a CREATE2V cell's HDATA section: CLIENT_NTOR := ID || Z || X [96 bytes] The initiator includes the responder's ID and Z in the CLIENT_NTOR so that, in the event the responder OR has recently rotated keys, the responder can determine which keypair to use. The initiator then concatenates CLIENT_NTOR with CLIENT_NEWHOPE (see §3.2.2), to create CLIENT_HDATA, and creates and sends a CREATE2V cell (see §A.1) to the responder. CLIENT_NEWHOPE [1824 bytes] (see §3.2.2) CLIENT_HDATA := CLIENT_NTOR || CLIENT_NEWHOPE [1920 bytes] If the responder does not respond with a CREATED2V cell, the initiator SHOULD NOT attempt to extend the circuit through the responder by sending fragmented EXTEND2 cells, since the responder's lack of support for CREATE2V cells is assumed to imply the responder also lacks support for fragmented EXTEND2 cells. Alternatively, for initiators with a sufficiently late consensus method, the initiator MUST check that "proto" line in the responder's descriptor (cf. Tor proposal #264) advertises support for the "Relay" subprotocol version 3 (see §5). §3.2.1.3. Responder The responder generates a keypair of y, Y = X25519_KEYGEN(), and does NTOR_SHAREDB() as follows: (NTOR_KEY, AUTH) ← NTOR_SHAREDB(X, y, Y, z, Z, ID, B): secret_input := EXP(X, y) || EXP(X, z) || ID || B || Z || Y || PROTOID NTOR_KEY := H(secret_input, T_KEY) verify := H(secret_input, T_VERIFY) auth_input := verify || ID || Z || Y || X || PROTOID || "Server" AUTH := H(auth_input, T_MAC) The responder sends a CREATED2V cell containing: SERVER_NTOR := Y || AUTH [64 bytes] SERVER_NEWHOPE [2048 bytes] (see §3.2.2) SERVER_HDATA := SERVER_NTOR || SERVER_NEWHOPE [2112 bytes] and sends this to the initiator. §3.2.1.4. Finalisation The initiator then checks Y is in G^* [see NOTE below], and does NTOR_SHAREDA() as follows: (NTOR_KEY) ← NTOR_SHAREDA(x, X, Y, Z, ID, AUTH) secret_input := EXP(Y, x) || EXP(Z, x) || ID || Z || X || Y || PROTOID NTOR_KEY := H(secret_input, T_KEY) verify := H(secret_input, T_VERIFY) auth_input := verify || ID || Z || Y || X || PROTOID || "Server" if AUTH == H(auth_input, T_MAC) return NTOR_KEY Both parties now have a shared value for NTOR_KEY. They expand this into the keys needed for the Tor relay protocol. [XXX We think we want to omit the final hashing in the production of NTOR_KEY here, and instead put all the inputs through SHAKE-256. --isis, peter] [XXX We probably want to remove ID and B from the input to the shared key material, since they serve for authentication but, as pre-established "prologue" material to the handshake, they should not be used in attempts to strengthen the cryptographic suitability of the shared key. Also, their inclusion is implicit in the DH exponentiations. I should probably ask Ian about the reasoning for the original design choice. --isis] §3.2.2. The NewHope Handshake §3.2.2.1. Parameters & Mathematical Structures Let ℤ be the ring of rational integers. Let ℤq, for q ≥ 1, denote the quotient ring ℤ/qℤ. We define R = ℤ[X]/((X^n)+1) as the ring of integer polynomials modulo ((X^n)+1), and Rq = ℤq[X]/((X^n)+1) as the ring of integer polynomials modulo ((X^n)+1) where each coefficient is reduced modulo q. When we refer to a polynomial, we mean an element of Rq. n := 1024 q := 12289 SEED [32 Bytes] NEWHOPE_POLY [1792 Bytes] NEWHOPE_REC [256 Bytes] NEWHOPE_KEY [32 Bytes] NEWHOPE_MSGA := (NEWHOPE_POLY || SEED) NEWHOPE_MSGB := (NEWHOPE_POLY || NEWHOPE_REC) §3.2.2.2. High-level Description of Newhope API Functions For a description of internal functions, see §B. (NEWHOPE_POLY, NEWHOPE_MSGA) ← NEWHOPE_KEYGEN(SEED): â := gen_a(seed) s := poly_getnoise() e := poly_getnoise() ŝ := poly_ntt(s) ê := poly_ntt(e) b̂ := pointwise(â, ŝ) + ê sp := poly_tobytes(ŝ) bp := poly_tobytes(b̂) return (sp, (bp || seed)) (NEWHOPE_MSGB, NEWHOPE_KEY) ← NEWHOPE_SHAREDB(NEWHOPE_MSGA): s' := poly_getnoise() e' := poly_getnoise() e" := poly_getnoise() b̂ := poly_frombytes(bp) â := gen_a(seed) ŝ' := poly_ntt(s') ê' := poly_ntt(e') û := poly_pointwise(â, ŝ') + ê' v := poly_invntt(poly_pointwise(b̂,ŝ')) + e" r := helprec(v) up := poly_tobytes(û) k := rec(v, r) return ((up || r), k) NEWHOPE_KEY ← NEWHOPE_SHAREDA(NEWHOPE_MSGB, NEWHOPE_POLY): û := poly_frombytes(up) ŝ := poly_frombytes(sp) v' := poly_invntt(poly_pointwise(û, ŝ)) k := rec(v', r) return k When a client uses a SEED within a CREATE2V cell, the client SHOULD NOT use that SEED in any other CREATE2V or EXTEND2 cells. See §4 for further discussion. §3.3. Key Expansion The client and server derive a shared key, SHARED, by: HKDFID := "THESE ARENT THE DROIDS YOURE LOOKING FOR" SHARED := SHAKE_256(HKDFID || NTorKey || NewHopeKey) §3.3.1. Note on the Design Choice The reader may wonder why one would use SHAKE-256 to produce a 256-bit output, since the security strength in bits for SHAKE-256 is min(d/2,256) for collision resistance and min(d,256) for first- and second-order preimages, where d is the output length. The reasoning is that we should be aiming for 256-bit security for all of our symmetric cryptography. One could then argue that we should just use SHA3-256 for the KDF. We choose SHAKE-256 instead in order to provide an easy way to derive longer shared secrets in the future without requiring a new handshake. The construction is odd, but the future is bright. As we are already using SHAKE-256 for the 32-byte output hash, we are also using it for all other 32-byte hashes involved in the protocol. Note that the only difference between SHA3-256 and SHAKE-256 with 32-byte output is one domain-separation byte. [XXX why would you want 256-bit security for the symmetric side? Are you talking pre- or post-quantum security? --peter] §4. Security & Anonymity Implications This handshake protocol is one-way authenticated. That is, the server is authenticated, while the client remains anonymous. The client MUST NOT cache and reuse SEED. Doing so gives non-trivial adversarial advantages w.r.t. all-for-the-price-of-one attacks during the caching period. More importantly, if the SEED used to generate NEWHOPE_MSGA is reused for handshakes along the same circuit or multiple different circuits, an adversary conducting a sybil attack somewhere along the path(s) will be able to correlate the identity of the client across circuits or hops. §5. Compatibility Because our proposal requires both the client and server to send more than the 505 bytes possible within a CREATE2 cell's HDATA section, it depends upon the implementation of a mechanism for allowing larger CREATE cells (cf. Tor proposal #249). We reserve the following handshake type for use in CREATE2V/CREATED2V and EXTEND2V/EXTENDED2V cells: 0x0003 [NEWHOPE + X25519 HYBRID HANDSHAKE] We introduce a new sub-protocol number, "Relay=3", (cf. Tor proposal #264 §5.3) to signify support this handshake, and hence for the CREATE2V and fragmented EXTEND2 cells which it requires. There are no additional entries or changes required within either router descriptors or microdescriptors to support this handshake method, due to the NewHope keys being ephemeral and derived on-the-fly, and due to the NTor X25519 public keys already being included within the "ntor-onion-key" entry. Add a "UseNewHopeKEX" configuration option and a corresponding consensus parameter to control whether clients prefer using this NewHope hybrid handshake or some previous handshake protocol. If the configuration option is "auto", clients SHOULD obey the consensus parameter. The default configuration SHOULD be "auto" and the consensus value SHOULD initially be "0". §6. Implementation The paper by Alkim, Ducas, Pöppelmann and Schwabe describes two software implementations of NewHope, one C reference implementation and an optimized implementation using AVX2 vector instructions. Those implementations are available at [1]. Additionally, there are implementations in Go by Yawning Angel, available from [4] and in Rust by Isis Lovecruft, available from [5]. The software used to generate the test vectors in §C is based on the C reference implementation and available from: https://code.ciph.re/isis/newhope-tor-testvectors https://github.com/isislovecruft/newhope-tor-testvectors §7. Performance & Scalability The computationally expensive part in the current NTor handshake is the X25519 key-pair generation and the X25519 shared-key computation. The current implementation in Tor is a wrapper to support various highly optimized implementations on different architectures. On Intel Haswell processors, the fastest implementation of X25519, as reported by the eBACS benchmarking project [6], takes 169920 cycles for key-pair generation and 161648 cycles for shared-key computation; these add up to a total of 331568 cycles on each side (initiator and responder). The C reference implementation of NewHope, also benchmarked on Intel Haswell, takes 358234 cycles for the initiator and 402058 cycles for the Responder. The core computation of the proposed combination of NewHope and X25519 will thus mean a slowdown of about a factor of 2.1 for the Initiator and a slowdown by a factor of 2.2 for the Responder compared to the current NTor handshake. These numbers assume a fully optimized implementation of the NTor handshake and a C reference implementation of NewHope. With optimized implementations of NewHope, such as the one for Intel Haswell described in [0], the computational slowdown will be considerably smaller than a factor of 2. §8. References [0]: https://cryptojedi.org/papers/newhope-20160328.pdf [1]: https://cryptojedi.org/crypto/#newhope [2]: http://www.nist.gov/customcf/get_pdf.cfm?pub_id=919061 [3]: https://tools.ietf.org/html/rfc7748#section-6.1 [4]: https://github.com/Yawning/newhope [5]: https://code.ciph.re/isis/newhopers [6]: http://bench.cr.yp.to §A. Cell Formats §A.1. CREATE2V Cells The client portion of the handshake should send CLIENT_HDATA, formatted into a CREATE2V cell as follows: CREATE2V { [2114 bytes] HTYPE := 0x0003 [2 bytes] HLEN := 0x0780 [2 bytes] HDATA := CLIENT_HDATA [1920 bytes] IGNORED := 0x00 [194 bytes] } [XXX do we really want to pad with IGNORED to make CLIENT_HDATA the same number of bytes as SERVER_HDATA? --isis] §A.2. CREATED2V Cells The server responds to the client's CREATE2V cell with SERVER_HDATA, formatted into a CREATED2V cell as follows: CREATED2V { [2114 bytes] HLEN := 0x0800 [2 bytes] HDATA := SERVER_HDATA [2112 bytes] IGNORED := 0x00 [0 bytes] } §A.3. Fragmented EXTEND2 Cells When the client wishes to extend a circuit, the client should fragment CLIENT_HDATA into four EXTEND2 cells: EXTEND2 { NSPEC := 0x02 { [1 byte] LINK_ID_SERVER [22 bytes] XXX LINK_ADDRESS_SERVER [8 bytes] XXX } HTYPE := 0x0003 [2 bytes] HLEN := 0x0780 [2 bytes] HDATA := CLIENT_HDATA[0,461] [462 bytes] } EXTEND2 { NSPEC := 0x00 [1 byte] HTYPE := 0xFFFF [2 bytes] HLEN := 0x0000 [2 bytes] HDATA := CLIENT_HDATA[462,954] [492 bytes] } EXTEND2 { NSPEC := 0x00 [1 byte] HTYPE := 0xFFFF [2 bytes] HLEN := 0x0000 [2 bytes] HDATA := CLIENT_HDATA[955,1447] [492 bytes] } EXTEND2 { NSPEC := 0x00 [1 byte] HTYPE := 0xFFFF [2 bytes] HLEN := 0x0000 [2 bytes] HDATA := CLIENT_HDATA[1448,1919] || 0x00[20] [492 bytes] } EXTEND2 { NSPEC := 0x00 [1 byte] HTYPE := 0xFFFF [2 bytes] HLEN := 0x0000 [2 bytes] HDATA := 0x00[172] [172 bytes] } The client sends this to the server to extend the circuit from, and that server should format the fragmented EXTEND2 cells into a CREATE2V cell, as described in §A.1. §A.4. Fragmented EXTENDED2 Cells EXTENDED2 { NSPEC := 0x02 { [1 byte] LINK_ID_SERVER [22 bytes] XXX LINK_ADDRESS_SERVER [8 bytes] XXX } HTYPE := 0x0003 [2 bytes] HLEN := 0x0800 [2 bytes] HDATA := SERVER_HDATA[0,461] [462 bytes] } EXTENDED2 { NSPEC := 0x00 [1 byte] HTYPE := 0xFFFF [2 bytes] HLEN := 0x0000 [2 bytes] HDATA := SERVER_HDATA[462,954] [492 bytes] } EXTEND2 { NSPEC := 0x00 [1 byte] HTYPE := 0xFFFF [2 bytes] HLEN := 0x0000 [2 bytes] HDATA := SERVER_HDATA[955,1447] [492 bytes] } EXTEND2 { NSPEC := 0x00 [1 byte] HTYPE := 0xFFFF [2 bytes] HLEN := 0x0000 [2 bytes] HDATA := SERVER_HDATA[1448,1939] [492 bytes] } EXTEND2 { NSPEC := 0x00 [1 byte] HTYPE := 0xFFFF [2 bytes] HLEN := 0x0000 [2 bytes] HDATA := SERVER_HDATA[1940,2112] [172 bytes] } §B. NewHope Internal Functions gen_a(SEED): returns a uniformly random poly poly_getnoise(): returns a poly sampled from a centered binomial poly_ntt(poly): number-theoretic transform; returns a poly poly_invntt(poly): inverse number-theoretic transform; returns a poly poly_pointwise(poly, poly): pointwise multiplication; returns a poly poly_tobytes(poly): packs a poly to a NEWHOPE_POLY byte array poly_frombytes(NEWHOPE_POLY): unpacks a NEWHOPE_POLY byte array to a poly helprec(poly): returns a NEWHOPE_REC byte array rec(poly, NEWHOPE_REC): returns a NEWHOPE_KEY --- Description of the Newhope internal functions --- gen_a(SEED seed) receives as input a 32-byte (public) seed. It expands this seed through SHAKE-128 from the FIPS202 standard. The output of SHAKE-128 is considered a sequence of 16-bit little-endian integers. This sequence is used to initialize the coefficients of the returned polynomial from the least significant (coefficient of X^0) to the most significant (coefficient of X^1023) coefficient. For each of the 16-bit integers first eliminate the highest two bits (to make it a 14-bit integer) and then use it as the next coefficient if it is smaller than q=12289. Note that the amount of output required from SHAKE to initialize all 1024 coefficients of the polynomial varies depending on the input seed. Note further that this function does not process any secret data and thus does not need any timing-attack protection. poly_getnoise() first generates 4096 bytes of uniformly random data. This can be done by reading these bytes from the system's RNG; efficient implementations will typically only read a 32-byte seed from the system's RNG and expand it through some fast PRG (for example, ChaCha20 or AES-256 in CTR mode). The output of the PRG is considered an array of 2048 16-bit integers r[0],...,r[2047]. The coefficients of the output polynomial are computed as HW(r[0])-HW(r[1]), HW(r[2])-HW(r[3]),...,HW(r[2046])-HW(r[2047]), where HW stands for Hamming weight. Note that the choice of RNG is a local decision; different implementations are free to use different RNGs. Note further that the output of this function is secret; the PRG (and the computation of HW) need to be protected against timing attacks. poly_ntt(poly f): For a mathematical description of poly_ntt see the [0]; a pseudocode description of a very naive in-place transformation of an input polynomial f = f[0] + f[1]*X + f[2]*X^2 + ... + f[1023]*X^1023 is the following code (all arithmetic on coefficients performed modulo q): psi = 7 omega = 49 for i in range(0,n): t[i] = f[i] * psi^i for i in range(0,n): f[i] = 0 for j in range(0,n): f[i] += t[j] * omega^((i*j)%n) Note that this is not how poly_ntt should be implemented if performance is an issue; in particular, efficient algorithms for the number-theoretic transform take time O(n*log(n)) and not O(n^2) Note further that all arithmetic in poly_ntt has to be protected against timing attacks. poly_invntt(poly f): For a mathematical description of poly_invntt see the [0]; a pseudocode description of a very naive in-place transformation of an input polynomial f = f[0] + f[1]*X + f[2]*X^2 + ... + f[1023]*X^1023 is the following code (all arithmetic on coefficients performed modulo q): invpsi = 8778; invomega = 1254; invn = 12277; for i in range(0,n): t[i] = f[i]; for i in range(0,n): f[i]=0; for j in range(0,n): f[i] += t[j] * invomega^((i*j)%n) f[i] *= invpsi^i f[i] *= invn Note that this is not how poly_invntt should be implemented if performance is an issue; in particular, efficient algorithms for the inverse number-theoretic transform take time O(n*log(n)) and not O(n^2) Note further that all arithmetic in poly_invntt has to be protected against timing attacks. poly_pointwise(poly f, poly g) performs pointwise multiplication of the two polynomials. This means that for f = (f0 + f1*X + f2*X^2 + ... + f1023*X^1023) and g = (g0 + g1*X + g2*X^2 + ... + g1023*X^1023) it computes and returns h = (h0 + h1*X + h2*X^2 + ... + h1023*X^1023) with h0 = f0*g0, h1 = f1*g1,..., h1023 = f1023*g1023. poly_tobytes(poly f) first reduces all coefficents of f modulo q, i.e., brings them to the interval [0,q-1]. Denote these reduced coefficients as f0,..., f1023; note that they all fit into 14 bits. The function then packs those coefficients into an array of 1792 bytes r[0],..., r[1792] in "packed little-endian representation", i.e., r[0] = f[0] & 0xff; r[1] = (f[0] >> 8) & ((f[1] & 0x03) << 6) r[2] = (f[1] >> 2) & 0xff; r[3] = (f[1] >> 10) & ((f[2] & 0x0f) << 4) . . . r[1790] = (f[1022]) >> 12) & ((f[1023] & 0x3f) << 2) r[1791] = f[1023] >> 6 Note that this function needs to be protected against timing attacks. In particular, avoid non-constant-time conditional subtractions (or other non-constant-time expressions) in the reduction modulo q of the coefficients. poly_frombytes(NEWHOPE_POLY b) is the inverse of poly_tobytes; it receives as input an array of 1792 bytes and coverts it into the internal representation of a poly. Note that poly_frombytes does not need to check whether the coefficients are reduced modulo q or reduce coefficients modulo q. Note further that the function must not leak any information about its inputs through timing information, as it is also applied to the secret key of the initiator. helprec(poly f) computes 256 bytes of reconciliation information from the input poly f. Internally, one byte of reconciliation information is computed from four coefficients of f by a function helprec4. Let the input polynomial f = (f0 + f1*X + f2*X^2 + ... + f1023*X^1023); let the output byte array be r[0],...r[256]. This output byte array is computed as r[0] = helprec4(f0,f256,f512,f768) r[1] = helprec4(f1,f257,f513,f769) r[2] = helprec4(f2,f258,f514,f770) . . . r[255] = helprec4(f255,f511,f767,f1023), where helprec4 does the following: helprec4(x0,x1,x2,x3): b = randombit() r0,r1,r2,r3 = CVPD4(8*x0+4*b,8*x1+4*b,8*x2+4*b,8*x3+4*b) r = (r0 & 0x03) | ((r1 & 0x03) << 2) | ((r2 & 0x03) << 4) | ((r3 & 0x03) << 6) return r The function CVPD4 does the following: CVPD4(y0,y1,y2,y3): v00 = round(y0/2q) v01 = round(y1/2q) v02 = round(y2/2q) v03 = round(y3/2q) v10 = round((y0-1)/2q) v11 = round((y1-1)/2q) v12 = round((y2-1)/2q) v13 = round((y3-1)/2q) t = abs(y0 - 2q*v00) t += abs(y1 - 2q*v01) t += abs(y2 - 2q*v02) t += abs(y3 - 2q*v03) if(t < 2q): v0 = v00 v1 = v01 v2 = v02 v3 = v03 k = 0 else v0 = v10 v1 = v11 v2 = v12 v3 = v13 r = 1 return (v0-v3,v1-v3,v2-v3,k+2*v3) In this description, round(x) is defined as ⌊x + 0.5⌋, where ⌊x⌋ rounds to the largest integer that does not exceed x; abs() returns the absolute value. Note that all computations involved in helprec operate on secret data and must be protected against timing attacks. rec(poly f, NEWHOPE_REC r) computes the pre-hash (see paper) Newhope key from f and r. Specifically, it computes one bit of key from 4 coefficients of f and one byte of r. Let f = f0 + f1*X + f2*X^2 + ... + f1023*X^1023 and let r = r[0],r[1],...,r[255]. Let the bytes of the output by k[0],...,k[31] and let the bits of the output by k0,...,k255, where k0 = k[0] & 0x01 k1 = (k[0] >> 1) & 0x01 k2 = (k[0] >> 2) & 0x01 . . . k8 = k[1] & 0x01 k9 = (k[1] >> 1) & 0x01 . . . k255 = (k[32] >> 7) The function rec computes k0,...,k255 as k0 = rec4(f0,f256,f512,f768,r[0]) k1 = rec4(f1,f257,f513,f769,r[1]) . . . k255 = rec4(f255,f511,f767,f1023,r[255]) The function rec4 does the following: rec4(y0,y1,y2,y3,r): r0 = r & 0x03 r1 = (r >> 2) & 0x03 r2 = (r >> 4) & 0x03 r3 = (r >> 6) & 0x03 Decode(8*y0-2q*r0, 8*y1-2q*r1, 8*y2-2q*r2, 8*y3-q*r3) The function Decode does the following: Decode(v0,v1,v2,v3): t0 = round(v0/8q) t1 = round(v1/8q) t2 = round(v2/8q) t3 = round(v3/8q) t = abs(v0 - 8q*t0) t += abs(v0 - 8q*t0) t += abs(v0 - 8q*t0) t += abs(v0 - 8q*t0) if(t > 1) return 1 else return 0 §C. Test Vectors