
Tor Bandwidth List Format
juga
teor
1. Scope and preliminaries
This document describes the format of Tor's Bandwidth List,
versions 1.0.0, 1.1.0 and later.
It is a new specification for the existing format 1.0.0.
It also describes a new format, 1.1.0, which is backwards compatible
with 1.0.0 parsers.
Since Tor version 0.2.4.12-alpha the directory authorities use
the Bandwidth List file called "V3BandwidthsFile" generated by
Torflow [1]. The format is described in Torflow's README.spec.txt and
is considered to be version 1.0.0.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in
RFC 2119.
1.2. Acknowledgements
The original bandwidth generator (Torflow) and format were
created by mike. Teor suggested writing this specification while
contributing to pastly's new bandwidth generator implementation.
This specification was revised after feedback from:
Nick Mathewson (nickm)
Iain Learmonth (irl)
1.3. Outline
Sections 3.4.1 and 3.4.2 of the Tor directory protocol
(dir-spec.txt [3]) use the term "bandwidth measurements" to refer
to what is here called the Bandwidth List.
A Bandwidth List file contains information on relays' bandwidth
capacities and is produced by bandwidth generators, previously known
as bandwidth scanners.
1.4. Format Versions
1.0.0 - The legacy fallback Bandwidth List format
1.1.0 - Adds KeyValue Lines to the Header List section, adds
KeyValue pairs to RelayLines, and adds format versions.
All Tor versions can consume format version 1.0.0.
All Tor versions can consume format version 1.1.0,
but they warn on additional header Lines.
[TODO: this might be fixed, and if it is fixed should be said which
version of Tor]
2. Format details
The Bandwidth List MUST contain the following sections:
- Header List (exactly once)
- Relays' Bandwidth List (zero or more times)
If it does not contain these sections, parsers SHOULD ignore the file.
2.1. Definitions
The following nonterminals are defined in sections 1.2., 2.1.1.
and 2.1.3. of the Tor directory protocol:
Int
SP (space)
NL (newline)
Keyword
ArgumentChar
nickname
hexdigest (a '$', followed by 40 hexadecimal characters
([A-Fa-f0-9]))
Nonterminal defined in section 2 of version-spec.txt [4]:
version_number
We define the following nonterminals:
Line ::= ArgumentChar* NL
RelayLine ::= KeyValue (SP KeyValue)* NL
KeyValue ::= Keyword "=" Value
Value ::= ArgumentCharValue+
ArgumentCharValue ::= any printing ASCII character except NL and SP.
Terminator ::= "====="
Timestamp ::= Int
Bandwidth ::= Int
MasterKey ::= a base64-encoded Ed25519 public key, with
padding characters omitted.
DateTime ::= "YYYY-MM-DDTHH:MM:SS", as in ISO 8601
Note that key_value and value are defined in the Tor directory
protocol with formats different from KeyValue and Value here.
All Lines in the file MUST be 510 characters or less, to allow for the
trailing newline and NULL characters.
The previous limit was 254 characters in Tor 0.2.6.2-alpha and
earlier.
The parser MAY ignore longer Lines.
[TODO: Change this restriction in 1.1.0 or later]
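A minimal Python sketch of the tokenizing rules above, not taken from any existing implementation: it enforces the 510-character Line limit, splits a Line into SP-separated KeyValue pairs on the first "=", and parses a DateTime as UTC. The function names are hypothetical, and the key pattern is an assumption: it accepts letters, digits, "-" and "_", because keys used in this document, such as node_id, contain underscores.

```python
import re
from datetime import datetime, timezone

# Limit from section 2.1: at most 510 characters, not counting the newline.
MAX_LINE_LENGTH = 510

# Assumption: keys may use letters, digits, "-" and "_" (e.g. node_id).
KEY_RE = re.compile(r"[A-Za-z0-9_-]+")
# ArgumentCharValue: any printing ASCII character except SP (and NL).
VALUE_RE = re.compile(r"[\x21-\x7e]+")


def parse_keyvalue(token):
    """Split one KeyValue token on its first "=", or return None."""
    key, sep, value = token.partition("=")
    if not sep or not KEY_RE.fullmatch(key) or not VALUE_RE.fullmatch(value):
        return None
    return key, value


def parse_keyvalue_line(line):
    """Parse a Line of SP-separated KeyValues into a list of (key, value).

    Returns None if the Line is too long or any token is malformed.
    """
    line = line.rstrip("\n")
    if len(line) > MAX_LINE_LENGTH:
        return None  # parsers MAY ignore longer Lines
    pairs = []
    for token in line.split(" "):
        kv = parse_keyvalue(token)
        if kv is None:
            return None
        pairs.append(kv)
    return pairs


def parse_datetime(value):
    """Parse a "YYYY-MM-DDTHH:MM:SS" DateTime, interpreted as UTC."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S")
    return parsed.replace(tzinfo=timezone.utc)
```

Splitting on the first "=" only is what lets a Value contain further "=" characters; the spec still asks for unpadded base64 keys so that no such ambiguity arises in practice.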
2.2. Header List format
Some header Lines MUST appear in specific positions, as documented
below.
All other Lines can appear in any order.
If a parser does not recognize any extra material in a header Line,
the Line MUST be ignored.
If a header Line does not conform to this format, the Line SHOULD be
ignored by parsers.
It consists of:
Timestamp NL
[At start, exactly once.]
The Unix Epoch time in seconds when the file was created.
It does not follow the KeyValue format for backwards
compatibility with version 1.0.0.
"version=" version_number NL
[In second position, zero or one time.]
The specification document format version.
It uses semantic versioning [5].
This Line has been added in version 1.1.0 of this specification.
Version 1.0.0 documents do not contain this Line, and the
version_number is considered to be "1.0.0".
"software=" Value NL
[Zero or one time.]
The name of the software that created the document.
This Line has been added in version 1.1.0 of this specification.
Version 1.0.0 documents do not contain this Line, and the software
is considered to be "torflow".
"software_version=" Value NL
[Zero or one time.]
The version of the software that created the document.
The version may be a version_number, a git commit, or some other
version scheme.
This Line has been added in version 1.1.0 of this specification.
"generator_started=" DateTime NL
[Zero or one time.]
The date and time, in ISO 8601 format and the UTC time zone, when
the generator started.
This Line has been added in version 1.1.0 of this specification.
"earliest_bandwidth=" DateTime NL
[Zero or one time.]
The date and time, in ISO 8601 format and the UTC time zone, when
the first relay bandwidth was obtained.
This Line has been added in version 1.1.0 of this specification.
KeyValue NL
[Zero or more times.]
There MUST NOT be multiple KeyValue header Lines with the same key.
If there are, the parser SHOULD choose an arbitrary Line.
If a parser does not recognize a Keyword in a KeyValue Line, it
MUST be ignored.
Future format versions may include additional KeyValue header Lines.
Additional header Lines will be accompanied by a minor version
increment.
Implementations MAY add additional header Lines as needed. This
specification SHOULD be updated to avoid conflicting meanings for
the same header keys.
Parsers MUST NOT rely on the order of these additional Lines.
Additional header Lines MUST NOT use any keywords specified in the
relay measurements format.
If there are, the parser MAY ignore conflicting keywords.
Terminator NL
[Zero or one time.]
The Header List section ends with this Terminator.
In version 1.0.0, the Header List ends at the first Line that
conforms to the Relays' Bandwidth List format of the next section.
Implementations of version 1.1.0 SHOULD include this Line.
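The following Python sketch shows one way a parser might find the end of the Header List in both format versions: a Terminator Line ends it in 1.1.0, and the first RelayLine ends it in 1.0.0. The helper names are hypothetical; recognizing a RelayLine by its "bw" key plus an identity key is an assumption of this sketch, and it does not check that "version" is in second position.

```python
TERMINATOR = "====="
RELAY_ID_KEYS = ("node_id", "master_key_ed25519")


def _keys(line):
    """Return the set of keys in a Line of SP-separated key=value tokens."""
    return {tok.partition("=")[0] for tok in line.split(" ") if "=" in tok}


def is_relay_line(line):
    """Assumed heuristic: a RelayLine has "bw" and at least one identity key."""
    keys = _keys(line)
    return "bw" in keys and any(k in keys for k in RELAY_ID_KEYS)


def parse_header(lines):
    """Parse the Header List at the start of `lines` (newlines stripped).

    Returns (timestamp, header, index), where `header` maps header keys to
    values and `index` is the position of the first RelayLine.
    Raises ValueError if there is no leading Timestamp Line.
    """
    if not lines or not lines[0].isdigit():
        raise ValueError("missing Timestamp line")
    timestamp = int(lines[0])
    header = {}
    index = len(lines)
    for i in range(1, len(lines)):
        line = lines[i]
        if line == TERMINATOR:
            index = i + 1  # 1.1.0: the Terminator ends the Header List
            break
        if is_relay_line(line):
            index = i  # 1.0.0: the first RelayLine ends the Header List
            break
        key, sep, value = line.partition("=")
        if sep and key and value and " " not in line:
            # Duplicate keys are not allowed; this sketch keeps the first.
            header.setdefault(key, value)
        # Lines that do not conform to the format are ignored.
    # Documents without a "version" Line are version 1.0.0 from Torflow.
    header.setdefault("version", "1.0.0")
    if header["version"] == "1.0.0":
        header.setdefault("software", "torflow")
    return timestamp, header, index
```

Because additional header Lines MUST NOT reuse relay keywords, treating a Line with "bw" and an identity key as the start of the relay section should not misclassify a conforming header Line.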
2.3. Relays' Bandwidth List format
It consists of zero or more RelayLines with the relays' bandwidth
in arbitrary order.
There MUST NOT be multiple KeyValue pairs with the same key in the same
RelayLine.
If there are, the parser SHOULD choose an arbitrary Value.
There MUST NOT be multiple RelayLines per relay identity (node_id or
master_key_ed25519).
If there are, parsers SHOULD issue a warning and MAY choose an arbitrary
value or ignore both values.
If a parser does not recognize any extra material in a RelayLine,
the extra material MUST be ignored.
Each RelayLine MUST include the following KeyValue pairs:
In version 1.0.0, node_id MUST NOT be at the end of the Line.
In version 1.1.0, the KeyValue pairs can be in any order.
[TODO: list of Tor version that support it, when it's done]
"node_id=" hexdigest
[Exactly once.]
The fingerprint for the relay's RSA identity key.
"master_key_ed25519=" MasterKey
[Zero or one time.]
The relay's master Ed25519 key, base64 encoded,
without trailing "="s, to avoid ambiguity with the KeyValue "="
character.
Implementations of version 1.1.0 SHOULD include both node_id and
master_key_ed25519.
Parsers SHOULD accept Lines that contain at least one of them.
"bw=" Bandwidth
[Exactly once.]
The measured bandwidth of this relay.
Tor accepts zero bandwidths, but they trigger bugs in older Tor
implementations. Therefore, implementations SHOULD NOT produce zero
bandwidths. Instead, they SHOULD use one as their minimum bandwidth.
If there are zero bandwidths, the parser MAY ignore them.
Multiple measurements can be aggregated using an averaging scheme,
such as a mean, median, or decaying average.
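This specification does not mandate an averaging scheme. As an illustration only, the hypothetical helper below aggregates one relay's non-empty list of samples (assumed to be already in kilobytes per second) with a mean, a median, or an exponentially decaying average, where the `weight` given to each newer sample is an arbitrary assumed parameter. It also applies the rule above that implementations SHOULD use one, not zero, as the minimum bandwidth.

```python
import statistics


def decaying_average(samples, weight=0.5):
    """Exponentially decaying average of samples ordered oldest to newest.

    `weight` (a hypothetical parameter) is the weight of each newer sample.
    """
    avg = samples[0]
    for sample in samples[1:]:
        avg = weight * sample + (1 - weight) * avg
    return avg


def aggregate_bw(samples, method="median"):
    """Aggregate one relay's samples into an integer Bandwidth value."""
    if method == "mean":
        value = statistics.mean(samples)
    elif method == "median":
        value = statistics.median(samples)
    elif method == "decaying":
        value = decaying_average(samples)
    else:
        raise ValueError(f"unknown method: {method}")
    # Implementations SHOULD NOT produce zero: use one as the minimum.
    return max(1, round(value))
```

A median resists single outlier measurements, while a decaying average favours recent measurements; either is consistent with this section.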
Torflow scales bandwidths to kilobytes per second. Other
implementations SHOULD use kilobytes per second for their initial
bandwidth scaling.
If different implementations or configurations are used in votes for
the same network, their measurements MAY need further scaling. See
Appendix B for information about scaling, and one possible scaling
method.
KeyValue
[Zero or more times.]
Future format versions may include additional KeyValue pairs on a
RelayLine.
Additional KeyValue pairs will be accompanied by a minor version
increment.
Implementations MAY add additional relay KeyValue pairs as needed.
This specification SHOULD be updated to avoid conflicting meanings
for the same Keywords.
Parsers MUST NOT rely on the order of these additional KeyValue
pairs.
Additional KeyValue pairs MUST NOT use any keywords specified in the
header format.
If there are, the parser MAY ignore conflicting keywords.
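A hedged Python sketch of a RelayLine parser that follows the rules in this section: it keeps the first Value for a duplicated key, requires bw and at least one of node_id and master_key_ed25519, and warns about and drops relays that appear on more than one RelayLine. The function names and the choice to ignore both duplicated relays (rather than pick one) are assumptions; zero bandwidths are kept, although parsers MAY ignore them.

```python
import re
import warnings

# hexdigest: a "$" followed by 40 hexadecimal characters.
HEXDIGEST_RE = re.compile(r"\$[A-Fa-f0-9]{40}")


def parse_relay_line(line):
    """Parse one RelayLine into a dict, or return None if it is unusable."""
    kvs = {}
    for token in line.rstrip("\n").split(" "):
        key, sep, value = token.partition("=")
        if not sep or not key or not value:
            return None
        # Duplicate keys are not allowed; this sketch keeps the first Value.
        kvs.setdefault(key, value)
    node_id = kvs.get("node_id")
    if node_id is not None:
        if not HEXDIGEST_RE.fullmatch(node_id):
            return None
        kvs["node_id"] = node_id.upper()  # compare fingerprints case-insensitively
    elif "master_key_ed25519" not in kvs:
        return None  # at least one relay identity is required
    if not kvs.get("bw", "").isdigit():
        return None
    kvs["bw"] = int(kvs["bw"])
    # Unrecognized keys stay in the dict; callers simply ignore them.
    return kvs


def parse_relay_lines(lines):
    """Map relay identity -> RelayLine dict, dropping relays listed twice."""
    relays = {}
    duplicated = set()
    for line in lines:
        relay = parse_relay_line(line)
        if relay is None:
            continue  # skip malformed RelayLines
        ident = relay.get("node_id") or relay["master_key_ed25519"]
        if ident in relays:
            warnings.warn(f"multiple RelayLines for relay {ident}")
            duplicated.add(ident)
        else:
            relays[ident] = relay
    for ident in duplicated:
        del relays[ident]  # this sketch ignores both values
    return relays
```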
2.4. Implementation notes
This section lists the KeyValue pairs in RelayLines that current
implementations generate.
2.4.1. Simple Bandwidth Scanner
Every RelayLine in sbws version 0.1.0 consists of:
"node_id=" hexdigest SP
As above.
"bw=" Bandwidth SP
As above.
"nick=" nickname SP
[Exactly once.]
The relay nickname.
"rtt=" Int SP
[Exactly once.]
The Round Trip Time in milliseconds to obtain 1 byte of data.
"time=" DateTime NL
[Exactly once.]
The date and time, in ISO 8601 format and the UTC time zone, when
the last bandwidth was obtained.
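For illustration, a hypothetical Python helper that formats a RelayLine with the sbws 0.1.0 KeyValue pairs in the order listed above. It is not sbws code; the function and parameter names are assumptions.

```python
from datetime import datetime, timezone


def format_sbws_relay_line(node_id, bw, nick, rtt_ms, last_measured):
    """Format a RelayLine with the KeyValue pairs listed for sbws 0.1.0.

    `last_measured` must be a timezone-aware datetime; it is converted to
    UTC and written as a DateTime ("YYYY-MM-DDTHH:MM:SS").
    """
    when = last_measured.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S")
    # Use one as the minimum bandwidth, never zero.
    bw = max(1, int(bw))
    line = f"node_id={node_id} bw={bw} nick={nick} rtt={rtt_ms} time={when}"
    if len(line) > 510:
        raise ValueError("RelayLine longer than 510 characters")
    return line + "\n"
```

For example, `format_sbws_relay_line("$68A483E05A2ABDCA6DA5A3EF8DB5177638A27F80", 760, "Test", 380, datetime(2018, 5, 8, 16, 13, 26, tzinfo=timezone.utc))` produces a Line with the same node_id, bw, nick, rtt and time values as the first sample in A.2, without the master_key_ed25519 pair.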
2.4.2. Torflow
Torflow RelayLines include node_id and bw, and other KeyValue pairs [2].
References:
1. https://gitweb.torproject.org/torflow.git
2. https://gitweb.torproject.org/torflow.git/tree/NetworkScanners/BwAuthority/README.spec.txt#n332
3. https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt
4. https://gitweb.torproject.org/torspec.git/tree/version-spec.txt
5. https://semver.org/
A. Sample data
The following has not been obtained from any real measurement.
A.1. Generated by Torflow
This is an example version 1.0.0 document:
1523911758
node_id=$68A483E05A2ABDCA6DA5A3EF8DB5177638A27F80 bw=760 nick=Test measured_at=1523911725 updated_at=1523911725 pid_error=4.11374090719 pid_error_sum=4.11374090719 pid_bw=57136645 pid_delta=2.12168374577 circ_fail=0.2 scanner=/filepath
node_id=$96C15995F30895689291F455587BD94CA427B6FC bw=189 nick=Test2 measured_at=1523911623 updated_at=1523911623 pid_error=3.96703337994 pid_error_sum=3.96703337994 pid_bw=47422125 pid_delta=2.65469736988 circ_fail=0.0 scanner=/filepath
A.2. Generated by sbws version 0.1.X
[TODO: this needs to be implemented when this spec is finished]
1523911758
version=1.1.0
software=sbws
software_version=0.1.0
generator_started=2018-05-08T16:13:25
earliest_bandwidth=2018-05-08T16:13:26
=====
node_id=$68A483E05A2ABDCA6DA5A3EF8DB5177638A27F80 master_key_ed25519=YaqV4vbvPYKucElk297eVdNArDz9HtIwUoIeo0+cVIpQ bw=760 nick=Test rtt=380 time=2018-05-08T16:13:26
node_id=$96C15995F30895689291F455587BD94CA427B6FC master_key_ed25519=a6a+dZadrQBtfSbmQkP7j2ardCmLnm5NJ4ZzkvDxbo0I bw=189 nick=Test2 rtt=378 time=2018-05-08T16:13:36
B. Scaling bandwidths
B.1. Scaling requirements
Tor accepts zero bandwidths, but they trigger bugs in older Tor
implementations. Therefore, scaling methods SHOULD perform the
following checks:
* If the total bandwidth is zero, all relays should be given equal
bandwidths.
* If the scaled bandwidth is zero, it should be rounded up to one.
Initial experiments indicate that scaling may not be needed for
torflow and sbws, because their measured bandwidths are similar
enough already.
B.2. A linear scaling method
If scaling is required, here is a simple linear bandwidth scaling
method, which ensures that all bandwidth votes contain approximately
the same total bandwidth:
1. Calculate the relay quota by dividing the total measured bandwidth
in all votes by the number of relays with measured bandwidth
votes. In the public tor network, this is approximately 7500 as of
April 2018. The quota should be a consensus parameter, so it can be
adjusted for all generators on the network.
2. Calculate a vote quota by multiplying the relay quota by the number
of relays this bandwidth authority has measured
bandwidths for.
3. Calculate a scaling factor by dividing the vote quota by the
total unscaled measured bandwidth in this bandwidth
authority's upcoming vote.
4. Multiply each unscaled measured bandwidth by the scaling
factor.
Now, the total scaled bandwidth in the upcoming vote is
approximately equal to the vote quota.
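The steps above, together with the checks in B.1, can be sketched in Python as follows. This is not taken from any generator: the function name is hypothetical, the relay quota is passed in (for example, from a consensus parameter) rather than computed in step 1, and giving every relay the relay quota when the total is zero is one assumed reading of "equal bandwidths".

```python
def linear_scale(unscaled, relay_quota):
    """Scale one bandwidth authority's measurements (B.2), with B.1 checks.

    unscaled: dict mapping relay identity -> unscaled measured bandwidth.
    relay_quota: the per-relay quota from step 1 (assumed to be given).
    """
    if not unscaled:
        return {}
    # Step 2: vote quota = relay quota * number of relays measured here.
    vote_quota = relay_quota * len(unscaled)
    total = sum(unscaled.values())
    if total == 0:
        # B.1: if the total is zero, give all relays equal bandwidths.
        # Assumption: that equal value is the relay quota.
        return {relay: max(1, round(relay_quota)) for relay in unscaled}
    # Step 3: scaling factor = vote quota / total unscaled bandwidth.
    factor = vote_quota / total
    # Step 4: multiply each bandwidth by the factor; B.1: round zero up to one.
    return {relay: max(1, round(bw * factor)) for relay, bw in unscaled.items()}
```

For example, with a relay quota of 7500 and unscaled bandwidths of 100 and 300, the vote quota is 15000, the scaling factor is 37.5, and the scaled bandwidths are 3750 and 11250. Rounding and the minimum of one are why the scaled total is only approximately equal to the vote quota.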
B.3. Quota changes
If all generators are using scaling, the quota can be gradually
reduced or increased as needed. Smaller quotas decrease the size
of uncompressed consensuses, and may decrease the size of
consensus diffs and compressed consensuses. But if the relay
quota is too small, some relays may be over- or under-weighted.