mirror of
https://github.com/torproject/torspec.git
synced 2025-01-08 16:40:14 +00:00
a3fd193023
The clear standard is trailing "." after each numeric section. This fixes the small handful of outliers. This makes it easy to convert these headers to common markup formats, for example: http://hyperpolyglot.org/lightweight-markup
Tor Bandwidth File Format
juga
teor

1. Scope and preliminaries

This document describes the format of Tor's Bandwidth File, version
1.0.0 and later.

It is a new specification for the existing bandwidth file format,
which we call version 1.0.0. It also specifies new format versions
1.1.0 and later, which are backwards compatible with 1.0.0 parsers.

Since Tor version 0.2.4.12-alpha, the directory authorities use
the Bandwidth File called "V3BandwidthsFile" generated by
Torflow [1]. The details of this format are described in Torflow's
README.spec.txt. We also summarise the format in this specification.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in
RFC 2119.

1.2. Acknowledgements

The original bandwidth generator (Torflow) and format were
created by mike. Teor suggested writing this specification while
contributing to pastly's new bandwidth generator implementation.

This specification was revised after feedback from:

Nick Mathewson (nickm)
Iain Learmonth (irl)

1.3. Outline

The Tor directory protocol (dir-spec.txt [3]), sections 3.4.1
and 3.4.2, uses the term "bandwidth measurements" to refer to what
is called a Bandwidth File here.

A Bandwidth File contains information on relays' bandwidth
capacities and is produced by bandwidth generators, previously known
as bandwidth scanners.

1.4. Format Versions

1.0.0 - The legacy Bandwidth File format

1.1.0 - Add a header containing information about the bandwidth
        file. Document the sbws and Torflow relay line keys.

1.2.0 - If there are not enough eligible relays, the bandwidth file
        SHOULD contain a header, but no relays. (To match Torflow's
        existing behaviour.)

        Add new KeyValue Lines to the Header List section with
        statistics about the number of relays included in the file.
        Add new KeyValues to Relay Bandwidth Lines, with different
        bandwidth values (averages and descriptor bandwidths).

1.3.0 - Add scanner and destination countries to the header.

1.4.0 - Add monitoring KeyValues to the header and relay lines.

        RelayLines for excluded relays MAY be present in the bandwidth
        file for diagnostic reasons. Similarly, if there are not enough
        eligible relays, the bandwidth file MAY contain all known relays.

        Diagnostic relay lines SHOULD be marked with vote=0, and
        Tor SHOULD NOT use their bandwidths in its votes.

All Tor versions can consume format version 1.0.0.

All Tor versions can consume format version 1.1.0 and later,
but Tor versions earlier than 0.3.5.1-alpha warn if the header
contains any KeyValue lines after the Timestamp.

Tor versions 0.4.0.3-alpha, 0.3.5.8, 0.3.4.11, and earlier do not
understand "vote=0". Instead, they will vote for the actual bandwidths
that sbws puts in diagnostic relay lines:
* 1 for relays with "unmeasured=1", and
* the relay's measured and scaled bandwidth when "under_min_report=1".

2. Format details

The Bandwidth File MUST contain the following sections:
- Header List (exactly once), which is a partially ordered list of
  - Header Lines (one or more times), then
- Relay Lines (zero or more times), in an arbitrary order.
If it does not contain these sections, parsers SHOULD ignore the file.

2.1. Definitions

The following nonterminals are defined in Tor directory protocol
sections 1.2., 2.1.1., 2.1.3.:

bool
Int
SP (space)
NL (newline)
KeywordChar
ArgumentChar
nickname
hexdigest (a '$', followed by 40 hexadecimal characters
           ([A-Fa-f0-9]))

The following nonterminal is defined in section 2 of version-spec.txt [4]:

version_number

We define the following nonterminals:

Line ::= ArgumentChar* NL
RelayLine ::= KeyValue (SP KeyValue)* NL
HeaderLine ::= KeyValue NL
KeyValue ::= Key "=" Value
Key ::= (KeywordChar | "_")+
Value ::= ArgumentCharValue+
ArgumentCharValue ::= any printing ASCII character except NL and SP.
Terminator ::= "=====" or "===="
               Generators SHOULD use a 5-character terminator.
Timestamp ::= Int
Bandwidth ::= Int
MasterKey ::= a base64-encoded Ed25519 public key, with
              padding characters omitted.
DateTime ::= "YYYY-MM-DDTHH:MM:SS", as in ISO 8601
CountryCode ::= Two capital ASCII letters ([A-Z]{2}), as defined in
                ISO 3166-1 alpha-2 plus "ZZ" to denote unknown country
                (e.g. the destination is in a Content Delivery Network).
CountryCodeList ::= One or more CountryCode(s) separated by a comma
                    ([A-Z]{2}(,[A-Z]{2})*).

Note that key_value and value are defined in Tor directory protocol
with different formats to KeyValue and Value here.

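The KeyValue grammar above can be illustrated with a short Python
sketch. This is not part of the specification: the function name
parse_key_values and the regular expressions are assumptions made for
this example, KeywordChar is taken to be [A-Za-z0-9-] as in dir-spec,
and keeping the first of two duplicate keys is one arbitrary choice.

```python
import re

# Key ::= (KeywordChar | "_")+ ; KeywordChar is assumed to be [A-Za-z0-9-].
KEY_RE = re.compile(r"[A-Za-z0-9_-]+")
# Value ::= ArgumentCharValue+ ; printing ASCII except NL and SP.
VALUE_RE = re.compile(r"[\x21-\x7e]+")


def parse_key_values(line):
    """Return a dict with the well-formed KeyValues of one RelayLine.

    Tokens that do not match Key "=" Value are skipped, and the first
    occurrence of a duplicated key is kept (an arbitrary choice).
    """
    pairs = {}
    for token in line.rstrip("\n").split(" "):
        # Split on the first "=" only: a Value may itself contain "=".
        key, sep, value = token.partition("=")
        if not sep:
            continue
        if KEY_RE.fullmatch(key) and VALUE_RE.fullmatch(value):
            pairs.setdefault(key, value)
    return pairs
```

A HeaderLine is the special case of a single KeyValue, so the same
function can be applied to it.
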
Tor versions earlier than 0.3.5.1-alpha require all lines in the file
to be 510 characters or less. The previous limit was 254 characters in
Tor 0.2.6.2-alpha and earlier. Parsers MAY ignore longer Lines.

Note that directory authorities are only supported on the two most
recent stable Tor versions, so we expect that line limits will be
removed after Tor 0.4.0 is released in 2019.

2.2. Header List format

The Header List consists of a Timestamp line and zero or more
HeaderLines.

All the header lines MUST conform to the HeaderLine format, except
the first Timestamp line.

The Timestamp line is not a HeaderLine to keep compatibility with
the legacy Bandwidth File format.

Some header Lines MUST appear in specific positions, as documented
below. All other Lines can appear in any order.

If a parser does not recognize any extra material in a header Line,
the Line MUST be ignored.

If a header Line does not conform to this format, the Line SHOULD be
ignored by parsers.

It consists of:

Timestamp NL

[At start, exactly once.]

The Unix Epoch time in seconds of the most recent generator bandwidth
result.

If the generator implementation has multiple threads or
subprocesses which can fail independently, it SHOULD take the most
recent timestamp from each thread and use the oldest value. This
ensures that the Timestamp only advances while all the threads
continue running.

If there are threads that do not run continuously, they SHOULD be
excluded from the timestamp calculation.

If there are no recent results, the generator MUST NOT generate a new
file.

The Timestamp line does not follow the KeyValue format for backwards
compatibility with version 1.0.0.

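The Timestamp rule for generators with several threads can be
sketched as follows. The function name header_timestamp and the input
layout (a dict from thread name to that thread's result timestamps)
are assumptions for this example. Treating a thread without any
result as "no recent results" is also an assumption; the
specification does not define it.

```python
def header_timestamp(results_by_thread):
    """Return the Timestamp for the header, or None to skip the file.

    results_by_thread maps each continuously running thread to the
    Unix timestamps of its bandwidth results. Threads that do not run
    continuously should not be passed in.
    """
    if not results_by_thread:
        return None
    latest_per_thread = []
    for timestamps in results_by_thread.values():
        if not timestamps:
            # Assumption: a thread with no results means no file.
            return None
        latest_per_thread.append(max(timestamps))
    # Most recent timestamp of each thread, then the oldest of those.
    return min(latest_per_thread)
```

With two threads whose latest results are at 30 and 20, the header
Timestamp is 20, so it stops advancing as soon as one thread stalls.
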
"version=" version_number NL

[In second position, zero or one time.]

The specification document format version.
It uses semantic versioning [5].

This Line was added in version 1.1.0 of this specification.

Version 1.0.0 documents do not contain this Line, and the
version_number is considered to be "1.0.0".

"software=" Value NL

[Zero or one time.]

The name of the software that created the document.

This Line was added in version 1.1.0 of this specification.

Version 1.0.0 documents do not contain this Line, and the software
is considered to be "torflow".

"software_version=" Value NL

[Zero or one time.]

The version of the software that created the document.
The version may be a version_number, a git commit, or some other
version scheme.

This Line was added in version 1.1.0 of this specification.

"file_created=" DateTime NL

[Zero or one time.]

The date and time, in ISO 8601 format and UTC time zone,
when the file was created.

This Line was added in version 1.1.0 of this specification.

"generator_started=" DateTime NL

[Zero or one time.]

The date and time, in ISO 8601 format and UTC time zone,
when the generator started.

This Line was added in version 1.1.0 of this specification.

"earliest_bandwidth=" DateTime NL

[Zero or one time.]

The date and time, in ISO 8601 format and UTC time zone,
when the first relay bandwidth was obtained.

This Line was added in version 1.1.0 of this specification.

"latest_bandwidth=" DateTime NL

[Zero or one time.]

The date and time, in ISO 8601 format and UTC time zone,
of the most recent generator bandwidth result.

This time MUST be identical to the initial Timestamp line.

This duplicate value is included to make the format easier for people
to read.

This Line was added in version 1.1.0 of this specification.

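The relationship between the Timestamp line and "latest_bandwidth"
can be checked with a few lines of Python. The function names are
assumptions for this example; the DateTime layout is the
"YYYY-MM-DDTHH:MM:SS" form defined in section 2.1.

```python
from datetime import datetime, timezone


def timestamp_to_datetime(timestamp):
    """Format a Unix timestamp as a DateTime in the UTC time zone."""
    moment = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%S")


def latest_bandwidth_matches(timestamp, latest_bandwidth):
    """Return True if latest_bandwidth is identical to the Timestamp."""
    return timestamp_to_datetime(timestamp) == latest_bandwidth
```

For the sample Timestamp 1523911758 this yields 2018-04-16T20:49:18.
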
"number_eligible_relays=" Int NL

[Zero or one time.]

The number of relays that have enough measurements to be
included in the bandwidth file.

This Line was added in version 1.2.0 of this specification.

"minimum_percent_eligible_relays=" Int NL

[Zero or one time.]

The percentage of relays in the consensus that SHOULD be
included in every generated bandwidth file.

If this threshold is not reached, format versions 1.3.0 and earlier
SHOULD NOT contain any relays. (Bandwidth files always include a
header.)

Format versions 1.4.0 and later SHOULD include all the relays for
diagnostic purposes, even if this threshold is not reached. But these
relays SHOULD be marked so that Tor does not vote on them.
See section 1.4 for details.

The minimum percentage is 60% in Torflow, so sbws uses
60% as the default.

This Line was added in version 1.2.0 of this specification.

"number_consensus_relays=" Int NL

[Zero or one time.]

The number of relays in the consensus.

This Line was added in version 1.2.0 of this specification.

"percent_eligible_relays=" Int NL

[Zero or one time.]

The number of eligible relays, as a percentage of the number
of relays in the consensus.

This line SHOULD be equal to:
(number_eligible_relays * 100.0) / number_consensus_relays

This Line was added in version 1.2.0 of this specification.

"minimum_number_eligible_relays=" Int NL

[Zero or one time.]

The minimum number of relays that SHOULD be included in the bandwidth
file. See minimum_percent_eligible_relays for details.

This line SHOULD be equal to:
number_consensus_relays * (minimum_percent_eligible_relays / 100.0)

This Line was added in version 1.2.0 of this specification.

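The two formulas above produce fractional results, while the header
values are Ints. The following Python sketch shows one way to compute
them. Rounding the percentage to the nearest integer and rounding the
minimum number up are assumptions made for this example; the
specification does not state a rounding rule.

```python
def percent_eligible_relays(number_eligible_relays, number_consensus_relays):
    """Eligible relays as a percentage of the consensus (nearest Int)."""
    return round(number_eligible_relays * 100.0 / number_consensus_relays)


def minimum_number_eligible_relays(number_consensus_relays,
                                   minimum_percent_eligible_relays=60):
    """Smallest relay count that reaches the minimum percentage.

    Integer arithmetic is used so that the result is rounded up
    without floating point surprises.
    """
    return (number_consensus_relays * minimum_percent_eligible_relays + 99) // 100
```

With 6436 relays in the consensus and the default of 60%, the minimum
number is 3862.
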
"scanner_country=" CountryCode NL

[Zero or one time.]

The country, as in political geolocation, where the generator is run.

This Line was added in version 1.3.0 of this specification.

"destinations_countries=" CountryCodeList NL

[Zero or one time.]

The country, as in political geolocation, or countries where the
destination Web server(s) are located.
The destination Web Servers serve the data that the generator retrieves
to measure the bandwidth.

This Line was added in version 1.3.0 of this specification.

"recent_consensus_count=" Int NL

[Zero or one time.]

The number of different consensuses seen in the last data_period
days. (data_period is 5 by default.)

Assuming that Tor clients fetch a consensus every 1-2 hours,
and that the data_period is 5 days, the Value of this Key SHOULD be
between:
data_period * 24 / 2 = 60
data_period * 24 = 120

This Line was added in version 1.4.0 of this specification.

"recent_priority_list_count=" Int NL

[Zero or one time.]

The number of times that a list with a subset of relays prioritized
to be measured has been created in the last data_period days.
(data_period is 5 by default.)

In 2019, with 7000 relays in the network, the Value of this Key SHOULD be
approximately:
data_period * 24 / 1.5 = 80
where 1.5 is the approximate number of hours it takes to measure a
priority list of 7000 * 0.05 (350) relays, when the fraction of relays
in a priority list is 5% (0.05).

This Line was added in version 1.4.0 of this specification.

"recent_priority_relay_count=" Int NL

[Zero or one time.]

The number of relays that have been in the list of relays prioritized
to be measured in the last data_period days. (data_period is 5 by
default.)

In 2019, with 7000 relays in the network, the Value of this Key SHOULD be
approximately:
80 * (7000 * 0.05) = 28000
where 0.05 (5%) is the fraction of relays in a priority list and 80
is the approximate number of priority lists (see
"recent_priority_list_count").

This Line was added in version 1.4.0 of this specification.

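The expected values quoted for the three monitoring keys above follow
from simple arithmetic, reproduced here as a Python sketch. The
function and parameter names are assumptions for this example; the
default numbers (5 days, 1.5 hours per priority list, 7000 relays,
a fraction of 0.05) are the ones given in the text.

```python
def expected_consensus_count_range(data_period_days=5):
    """Bounds for recent_consensus_count: one consensus every 2 or 1 hours."""
    hours = data_period_days * 24
    return hours // 2, hours


def expected_priority_list_count(data_period_days=5, hours_per_list=1.5):
    """Approximate recent_priority_list_count."""
    return round(data_period_days * 24 / hours_per_list)


def expected_priority_relay_count(relays_in_network=7000, fraction=0.05,
                                  data_period_days=5, hours_per_list=1.5):
    """Approximate recent_priority_relay_count."""
    relays_per_list = round(relays_in_network * fraction)
    lists = expected_priority_list_count(data_period_days, hours_per_list)
    return lists * relays_per_list
```

A monitoring tool could compare the header values against these
estimates and flag large deviations.
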
"recent_measurement_attempt_count=" Int NL

[Zero or one time.]

The number of times that any relay has been queued to be measured
in the last data_period days. (data_period is 5 by default.)

In 2019, with 7000 relays in the network, the Value of this Key SHOULD be
approximately the same as "recent_priority_relay_count",
assuming that there is one attempt to measure a relay for each relay that
has been prioritized unless there are system, network or implementation
issues.

This Line was added in version 1.4.0 of this specification.

"recent_measurement_failure_count=" Int NL

[Zero or one time.]

The number of times that the scanner attempted to measure a relay in
the last data_period days (5 by default), but the relay has not been
measured because of system, network or implementation issues.

This Line was added in version 1.4.0 of this specification.

"recent_measurements_excluded_error_count=" Int NL

[Zero or one time.]

The number of relays that have no successful measurements in the last
data_period days (5 by default).

(See the note in section 1.4, version 1.4.0, about excluded relays.)

This Line was added in version 1.4.0 of this specification.

"recent_measurements_excluded_near_count=" Int NL

[Zero or one time.]

The number of relays that have some successful measurements in the last
data_period days (5 by default), but all those measurements were
performed in a period of time that was too short (by default 1 day).

(See the note in section 1.4, version 1.4.0, about excluded relays.)

This Line was added in version 1.4.0 of this specification.

"recent_measurements_excluded_old_count=" Int NL

[Zero or one time.]

The number of relays that have some successful measurements, but all
those measurements are too old (more than 5 days, by default).

Excludes relays that are already counted in
recent_measurements_excluded_near_count.

(See the note in section 1.4, version 1.4.0, about excluded relays.)

This Line was added in version 1.4.0 of this specification.

"recent_measurements_excluded_few_count=" Int NL

[Zero or one time.]

The number of relays that don't have enough recent successful
measurements. (Fewer than 2 measurements in the last 5 days, by
default).

Excludes relays that are already counted in
recent_measurements_excluded_near_count and
recent_measurements_excluded_old_count.

(See the note in section 1.4, version 1.4.0, about excluded relays.)

This Line was added in version 1.4.0 of this specification.

"time_to_report_half_network=" Int NL

[Zero or one time.]

The time in seconds that it would take to report measurements about
half of the network, given the number of eligible relays and the time
it took in the last days (5 days, by default).

(See the note in section 1.4, version 1.4.0, about excluded relays.)

This Line was added in version 1.4.0 of this specification.

KeyValue NL

[Zero or more times.]

There MUST NOT be multiple KeyValue header Lines with the same key.
If there are, the parser SHOULD choose an arbitrary Line.

If a parser does not recognize a Keyword in a KeyValue Line, it
MUST be ignored.

Future format versions may include additional KeyValue header Lines.
Additional header Lines will be accompanied by a minor version
increment.

Implementations MAY add additional header Lines as needed. This
specification SHOULD be updated to avoid conflicting meanings for
the same header keys.

Parsers MUST NOT rely on the order of these additional Lines.

Additional header Lines MUST NOT use any keywords specified in the
relay measurements format.
If they do, the parser MAY ignore the conflicting keywords.

Terminator NL

[Zero or one time.]

The Header List section ends with a Terminator.

In version 1.0.0, Header List ends when the first relay bandwidth
is found conforming to the next section.

Implementations of version 1.1.0 and later SHOULD use a 5-character
terminator.

Tor 0.4.0.1-alpha and later look for a 5-character terminator,
or the first relay bandwidth line. sbws versions 0.1.0 to 1.0.2
used a 4-character terminator; this bug was fixed in 1.0.3.

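A parser therefore has to stop reading the Header List at a
Terminator, or at the first relay bandwidth line when there is none.
The Python sketch below illustrates this. It is not Tor's
implementation: the function names are assumptions, and recognising a
relay line by the presence of a "node_id=" KeyValue is a heuristic
chosen for this example.

```python
import re

TERMINATORS = ("=====", "====")


def looks_like_relay_line(line):
    """Heuristic (assumption): relay lines carry a node_id KeyValue."""
    return any(token.startswith("node_id=") for token in line.split(" "))


def split_bandwidth_file(text):
    """Return (timestamp, headers, relay_lines) for a Bandwidth File."""
    lines = text.splitlines()
    if not lines or not re.fullmatch(r"[0-9]+", lines[0]):
        raise ValueError("the first line must be a Timestamp")
    timestamp = int(lines[0])
    headers = {}
    for index in range(1, len(lines)):
        line = lines[index]
        if line in TERMINATORS:
            # Relay lines start after the Terminator.
            return timestamp, headers, lines[index + 1:]
        if looks_like_relay_line(line):
            # Version 1.0.0: no Terminator, the header ends here.
            return timestamp, headers, lines[index:]
        key, sep, value = line.partition("=")
        if sep and key and value:
            # Keep the first of duplicated keys (an arbitrary choice).
            headers.setdefault(key, value)
        # Lines that are not KeyValues are ignored.
    return timestamp, headers, []
```

The same function accepts version 1.0.0 files, files with the
4-character terminator written by old sbws versions, and files with
the recommended 5-character terminator.
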
2.3. Relay Line format

This section consists of zero or more RelayLines containing relay ids
and bandwidths. The relays and their KeyValues are in arbitrary order.

There MUST NOT be multiple KeyValue pairs with the same key in the same
RelayLine. If there are, the parser SHOULD choose an arbitrary Value.

There MUST NOT be multiple RelayLines per relay identity (node_id or
master_key_ed25519). If there are, parsers SHOULD issue a warning.
Parsers MAY reject the file, choose an arbitrary RelayLine, or ignore
both RelayLines.

If a parser does not recognize any extra material in a RelayLine,
the extra material MUST be ignored.

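The duplicate handling rules above leave several options to the
parser. The Python sketch below picks one of them: it warns and keeps
the first RelayLine seen for each relay identity. The function name
index_relay_lines and the choice of the first line are assumptions
for this example.

```python
import logging

log = logging.getLogger(__name__)


def index_relay_lines(relay_lines):
    """Map each relay identity to the KeyValues of its RelayLine."""
    relays = {}
    for line in relay_lines:
        pairs = {}
        for token in line.split(" "):
            key, sep, value = token.partition("=")
            if sep and key and value:
                # Duplicate key in one line: keep an arbitrary (first) Value.
                pairs.setdefault(key, value)
            # Anything else is extra material and is ignored.
        identity = pairs.get("node_id") or pairs.get("master_key_ed25519")
        if identity is None:
            continue
        if identity in relays:
            log.warning("duplicate RelayLine for %s, keeping the first", identity)
            continue
        relays[identity] = pairs
    return relays
```

Rejecting the whole file or dropping both lines would be equally
valid under this section.
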
Each RelayLine includes the following KeyValue pairs:

"node_id=" hexdigest

[Exactly once.]

The fingerprint for the relay's RSA identity key.

Note: In bandwidth files read by Tor versions earlier than
0.3.4.1-alpha, node_id MUST NOT be at the end of the Line.
These authority versions are no longer supported.

Current Tor versions ignore master_key_ed25519, so node_id MUST be
present in each relay Line.

Implementations of version 1.1.0 and later SHOULD include both node_id
and master_key_ed25519. Parsers SHOULD accept Lines that contain at
least one of them.

"master_key_ed25519=" MasterKey

[Zero or one time.]

The relay's master Ed25519 key, base64 encoded,
without trailing "="s, to avoid ambiguity with the KeyValue "="
character.

This KeyValue pair SHOULD be present, see the note under node_id.

This KeyValue was added in version 1.1.0 of this specification.

"bw=" Bandwidth

[Exactly once.]

The bandwidth of this relay in kilobytes per second.

No Zero Bandwidths:
Tor accepts zero bandwidths, but they trigger bugs in older Tor
implementations. Therefore, implementations SHOULD NOT produce zero
bandwidths. Instead, they SHOULD use one as their minimum bandwidth.
If there are zero bandwidths, the parser MAY ignore them.

Bandwidth Aggregation:
Multiple measurements can be aggregated using an averaging scheme,
such as a mean, median, or decaying average.

Bandwidth Scaling:
Torflow scales bandwidths to kilobytes per second. Other
implementations SHOULD use kilobytes per second for their initial
bandwidth scaling.

If different implementations or configurations are used in votes for
the same network, their measurements MAY need further scaling. See
Appendix B for information about scaling, and one possible scaling
method.

MaxAdvertisedBandwidth:
Bandwidth generators MUST limit the relays' measured bandwidth based
on the MaxAdvertisedBandwidth.
A relay's MaxAdvertisedBandwidth limits the bandwidth-avg in its
descriptor. bandwidth-avg is the minimum of MaxAdvertisedBandwidth,
BandwidthRate, RelayBandwidthRate, BandwidthBurst, and
RelayBandwidthBurst.
Therefore, generators MUST limit a relay's measured bandwidth to its
descriptor's bandwidth-avg. This limit needs to be implemented in the
generator, because generators may scale consensus weights before
sending them to Tor.
Generators SHOULD NOT limit measured bandwidths based on descriptors'
bandwidth-observed, because that penalises new relays.

sbws limits the relay's measured bandwidth to the bandwidth-avg
advertised.

Torflow partitions relays based on their bandwidth. For unmeasured
relays, Torflow uses the minimum of all descriptor bandwidths,
including bandwidth-avg (MaxAdvertisedBandwidth) and
bandwidth-observed. Then Torflow measures the relays in each partition
against each other, which implicitly limits a relay's measured
bandwidth to the bandwidths of similar relays.

Torflow also generates consensus weights based on the ratio between the
measured bandwidth and the minimum of all descriptor bandwidths (at the
time of the measurement). So when an operator reduces the
MaxAdvertisedBandwidth for a relay, Torflow reduces that relay's
measured bandwidth.

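The "No Zero Bandwidths" and "MaxAdvertisedBandwidth" rules can be
combined in one small step when a generator produces the bw value.
The Python sketch below is an illustration only, not the sbws or
Torflow implementation. The function name is hypothetical, both
inputs are assumed to be in bytes per second, and one kilobyte is
assumed to be 1000 bytes.

```python
def relay_bw_kilobytes(measured_bytes_per_second, desc_bw_avg_bytes_per_second):
    """Return the value for "bw=" in kilobytes per second.

    The measured bandwidth is limited to the descriptor's
    bandwidth-avg, and the result is never lower than 1.
    """
    capped = min(measured_bytes_per_second, desc_bw_avg_bytes_per_second)
    # Assumption: 1 kilobyte = 1000 bytes.
    return max(1, round(capped / 1000))
```

bandwidth-observed is deliberately not part of the limit, because
using it would penalise new relays.
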
KeyValue

[Zero or more times.]

Future format versions may include additional KeyValue pairs on a
RelayLine.
Additional KeyValue pairs will be accompanied by a minor version
increment.

Implementations MAY add additional relay KeyValue pairs as needed.
This specification SHOULD be updated to avoid conflicting meanings
for the same Keywords.

Parsers MUST NOT rely on the order of these additional KeyValue
pairs.

Additional KeyValue pairs MUST NOT use any keywords specified in the
header format.
If they do, the parser MAY ignore the conflicting keywords.

2.4. Implementation details

2.4.1. Writing bandwidth files atomically

To avoid inconsistent reads, implementations SHOULD write bandwidth files
atomically. If the file is transferred from another host, it SHOULD be
written to a temporary path, then renamed to the V3BandwidthsFile path.

sbws versions 0.7.0 and later write the bandwidth file to an archival
location, create a temporary symlink to that location, then atomically
rename the symlink to the configured V3BandwidthsFile path.

Torflow does not write bandwidth files atomically.

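One common way to write a file atomically is to write it to a
temporary path in the same directory and then rename it over the
final path. The Python sketch below shows this approach. It is not
the sbws implementation, which renames a symlink as described above,
and the function name write_atomically is an assumption for this
example.

```python
import os
import tempfile


def write_atomically(path, content):
    """Write content to path so that readers never see a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must be on the same filesystem as the target.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".bwfile-")
    try:
        with os.fdopen(fd, "w", encoding="ascii") as tmp:
            tmp.write(content)
            tmp.flush()
            os.fsync(tmp.fileno())
        # os.replace renames atomically and overwrites an existing file.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

A reader that opens the V3BandwidthsFile path sees either the
previous complete file or the new complete file.
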
2.4.2. Additional KeyValue pair definitions

This section lists the KeyValue pairs in RelayLines that current
implementations generate.

2.4.2.1. Simple Bandwidth Scanner

sbws RelayLines contain these keys:

"node_id=" hexdigest

As above.

"bw=" Bandwidth

As above.

"nick=" nickname

[Exactly once.]

The relay nickname.

Torflow also has a "nick=" KeyValue.

"rtt=" Int

[Zero or one time.]

The Round Trip Time in milliseconds to obtain 1 byte of data.

This KeyValue was added in version 1.1.0 of this specification.
It became optional in version 1.3.0 or 1.4.0 of this specification.

"time=" DateTime

[Exactly once.]

The date and time, in ISO 8601 format and UTC time zone,
when the last bandwidth was obtained.

This KeyValue was added in version 1.1.0 of this specification.
The Torflow equivalent is "measured_at=".

"success=" Int

[Zero or one time.]

The number of times that the bandwidth measurements for this relay were
successful.

This KeyValue was added in version 1.1.0 of this specification.

"error_circ=" Int

[Zero or one time.]

The number of times that the bandwidth measurements for this relay
failed because of circuit failures.

This KeyValue was added in version 1.1.0 of this specification.
The Torflow equivalent is "circ_fail=".

"error_stream=" Int

[Zero or one time.]

The number of times that the bandwidth measurements for this relay
failed because of stream failures.

This KeyValue was added in version 1.1.0 of this specification.

"error_destination=" Int

[Zero or one time.]

The number of times that the bandwidth measurements for this relay
failed because the destination Web server was not available.

This KeyValue was added in version 1.4.0 of this specification.

"error_second_relay=" Int

[Zero or one time.]

The number of times that the bandwidth measurements for this relay
failed because sbws could not find a second relay for the test circuit.

This KeyValue was added in version 1.4.0 of this specification.

"error_misc=" Int

[Zero or one time.]

The number of times that the bandwidth measurements for this relay
failed because of other reasons.

This KeyValue was added in version 1.1.0 of this specification.

"bw_mean=" Int

[Zero or one time.]

The measured bandwidth mean for this relay in bytes per second.

This KeyValue was added in version 1.2.0 of this specification.

"bw_median=" Int

[Zero or one time.]

The measured bandwidth median for this relay in bytes per second.

This KeyValue was added in version 1.2.0 of this specification.

"desc_bw_average=" Int

[Zero or one time.]

The descriptor average bandwidth for this relay in bytes per second.

This KeyValue was added in version 1.2.0 of this specification.

"desc_obs_bw_last=" Int

[Zero or one time.]

The last descriptor observed bandwidth for this relay in bytes per
second.

This KeyValue was added in version 1.2.0 of this specification.

"desc_obs_bw_mean=" Int

[Zero or one time.]

The descriptor observed bandwidth mean for this relay in bytes per
second.

This KeyValue was added in version 1.2.0 of this specification.

"relay_recent_measurements_excluded_error_count=" Int

[Zero or one time.]

The number of recent relay measurement attempts that failed.
Measurements are recent if they are in the last data_period days
(5 by default).

(See the note in section 1.4, version 1.4.0, about excluded relays.)

This KeyValue was added in version 1.4.0 of this specification.

"relay_recent_measurements_excluded_near_count=" Int

[Zero or one time.]

When all of a relay's recent successful measurements were performed in
a period of time that was too short (by default 1 day), the relay is
excluded. This KeyValue contains the number of recent successful
measurements for the relay that were ignored for this reason.

(See the note in section 1.4, version 1.4.0, about excluded relays.)

This KeyValue was added in version 1.4.0 of this specification.

"relay_recent_measurements_excluded_old_count=" Int

[Zero or one time.]

The number of successful measurements for this relay that are too old
(more than data_period days, 5 by default).

Excludes measurements that are already counted in
relay_recent_measurements_excluded_near_count.

(See the note in section 1.4, version 1.4.0, about excluded relays.)

This KeyValue was added in version 1.4.0 of this specification.

"relay_recent_measurements_excluded_few_count=" Int

[Zero or one time.]

The number of successful measurements for this relay that were ignored
because the relay did not have enough successful measurements (fewer
than 2, by default).

Excludes measurements that are already counted in
relay_recent_measurements_excluded_near_count or
relay_recent_measurements_excluded_old_count.

(See the note in section 1.4, version 1.4.0, about excluded relays.)

This KeyValue was added in version 1.4.0 of this specification.

"under_min_report=" bool

[Zero or one time.]

If the value is 1, there are not enough eligible relays in the
bandwidth file, and Tor bandwidth authorities MAY NOT vote on this
relay. (Current Tor versions do not change their behaviour based on
the "under_min_report" key.)

If the value is 0 or the KeyValue is not present, there are enough
relays in the bandwidth file.

Because Tor versions released before April 2019 (see section 1.4. for
the full list of versions) ignore "vote=0", generator implementations
MUST NOT change the bandwidths for under_min_report relays. Using the
same bw value makes authorities that do not understand "vote=0"
or "under_min_report=1" produce votes that don't change relay weights
too much. It also avoids flapping when the reporting threshold is
reached.

This KeyValue was added in version 1.4.0 of this specification.

"unmeasured=" bool

[Zero or one time.]

If the value is 1, this relay was not successfully measured and
Tor bandwidth authorities MAY NOT vote on this relay.
(Current Tor versions do not change their behaviour based on
the "unmeasured" key.)

If the value is 0 or the KeyValue is not present, this relay
was successfully measured.

Because Tor versions released before April 2019 (see section 1.4. for
the full list of versions) ignore "vote=0", generator implementations
MUST set "bw=1" for unmeasured relays. Using the minimum bw value
makes authorities that do not understand "vote=0" or "unmeasured=1"
produce votes that don't change relay weights too much.

This KeyValue was added in version 1.4.0 of this specification.

"vote=" bool

[Zero or one time.]

If the value is 0, Tor directory authorities SHOULD ignore the relay's
entry in the bandwidth file. They SHOULD vote for the relay the same
way they would vote for a relay that is not present in the file.

This MAY be the case when a relay was not successfully measured but
is included in the Bandwidth File to diagnose why it was not
measured.

If the value is 1 or the KeyValue is not present, Tor directory
authorities MUST use the relay's bw value in any votes for that relay.

Implementations MUST also set "bw=1" for unmeasured relays.
But they MUST NOT change the bw for under_min_report relays.
(See the explanations under "unmeasured" and "under_min_report"
for more details.)

This KeyValue was added in version 1.4.0 of this specification.

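The interaction between "vote", "unmeasured" and "under_min_report"
can be summarised in a short Python sketch. It is an illustration
under stated assumptions, not the code of sbws or Tor: both function
names are hypothetical, and a RelayLine is represented as a dict of
strings.

```python
def mark_diagnostic(pairs, unmeasured=False, under_min_report=False):
    """Generator side: return a copy of pairs marked as diagnostic."""
    marked = dict(pairs)
    if unmeasured:
        marked["unmeasured"] = "1"
        # Old Tor versions ignore vote=0, so use the minimum bandwidth.
        marked["bw"] = "1"
    if under_min_report:
        # The bw value is left unchanged for under_min_report relays.
        marked["under_min_report"] = "1"
    if unmeasured or under_min_report:
        marked["vote"] = "0"
    return marked


def bandwidth_for_vote(pairs):
    """Parser side: return the bw to vote with, or None to ignore the relay."""
    if pairs.get("vote") == "0":
        return None
    # vote=1, or no vote KeyValue at all: the bw value is used.
    return int(pairs["bw"])
```

A relay for which bandwidth_for_vote returns None is treated like a
relay that is not present in the file.
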
2.4.2.2. Torflow

Torflow RelayLines include node_id and bw, and other KeyValue pairs [2].

References:

1. https://gitweb.torproject.org/torflow.git
2. https://gitweb.torproject.org/torflow.git/tree/NetworkScanners/BwAuthority/README.spec.txt#n332
   The Torflow specification is outdated, and does not match the current
   implementation. See section A.1. for the format produced by Torflow.
3. https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt
4. https://gitweb.torproject.org/torspec.git/tree/version-spec.txt
5. https://semver.org/

A. Sample data

The following has not been obtained from any real measurement.

A.1. Generated by Torflow

This is an example version 1.0.0 document:

1523911758
node_id=$68A483E05A2ABDCA6DA5A3EF8DB5177638A27F80 bw=760 nick=Test measured_at=1523911725 updated_at=1523911725 pid_error=4.11374090719 pid_error_sum=4.11374090719 pid_bw=57136645 pid_delta=2.12168374577 circ_fail=0.2 scanner=/filepath
node_id=$96C15995F30895689291F455587BD94CA427B6FC bw=189 nick=Test2 measured_at=1523911623 updated_at=1523911623 pid_error=3.96703337994 pid_error_sum=3.96703337994 pid_bw=47422125 pid_delta=2.65469736988 circ_fail=0.0 scanner=/filepath

A.2. Generated by sbws version 0.1.0

1523911758
version=1.1.0
software=sbws
software_version=0.1.0
latest_bandwidth=2018-04-16T20:49:18
file_created=2018-04-16T21:49:18
generator_started=2018-04-16T15:13:25
earliest_bandwidth=2018-04-16T15:13:26
====
bw=380 error_circ=0 error_misc=0 error_stream=1 master_key_ed25519=YaqV4vbvPYKucElk297eVdNArDz9HtIwUoIeo0+cVIpQ nick=Test node_id=$68A483E05A2ABDCA6DA5A3EF8DB5177638A27F80 rtt=380 success=1 time=2018-05-08T16:13:26
bw=189 error_circ=0 error_misc=0 error_stream=0 master_key_ed25519=a6a+dZadrQBtfSbmQkP7j2ardCmLnm5NJ4ZzkvDxbo0I nick=Test2 node_id=$96C15995F30895689291F455587BD94CA427B6FC rtt=378 success=1 time=2018-05-08T16:13:36

A.3. Generated by sbws version 1.0.3

1523911758
version=1.2.0
latest_bandwidth=2018-04-16T20:49:18
file_created=2018-04-16T21:49:18
generator_started=2018-04-16T15:13:25
earliest_bandwidth=2018-04-16T15:13:26
minimum_number_eligible_relays=3862
minimum_percent_eligible_relays=60
number_consensus_relays=6436
number_eligible_relays=6000
percent_eligible_relays=93
software=sbws
software_version=1.0.3
=====
bw=38000 bw_mean=1127824 bw_median=1180062 desc_avg_bw=1073741824 desc_obs_bw_last=17230879 desc_obs_bw_mean=14732306 error_circ=0 error_misc=0 error_stream=1 master_key_ed25519=YaqV4vbvPYKucElk297eVdNArDz9HtIwUoIeo0+cVIpQ nick=Test node_id=$68A483E05A2ABDCA6DA5A3EF8DB5177638A27F80 rtt=380 success=1 time=2018-05-08T16:13:26
bw=1 bw_mean=199162 bw_median=185675 desc_avg_bw=409600 desc_obs_bw_last=836165 desc_obs_bw_mean=858030 error_circ=0 error_misc=0 error_stream=0 master_key_ed25519=a6a+dZadrQBtfSbmQkP7j2ardCmLnm5NJ4ZzkvDxbo0I nick=Test2 node_id=$96C15995F30895689291F455587BD94CA427B6FC rtt=378 success=1 time=2018-05-08T16:13:36

A.3.1. When there are not enough eligible measured relays:

1540496079
version=1.2.0
earliest_bandwidth=2018-10-20T19:35:52
file_created=2018-10-25T19:35:03
generator_started=2018-10-25T11:42:56
latest_bandwidth=2018-10-25T19:34:39
minimum_number_eligible_relays=3862
minimum_percent_eligible_relays=60
number_consensus_relays=6436
number_eligible_relays=2960
percent_eligible_relays=46
software=sbws
software_version=1.0.3
=====

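The header values in the example above are consistent with the
arithmetic sketched below. The function name eligibility is
hypothetical, and the rounding rules (minimum number rounded up,
percentage rounded to the nearest integer) are assumptions inferred
from the sample values, not rules stated in this appendix.

```python
import math


def eligibility(number_consensus_relays, number_eligible_relays,
                minimum_percent_eligible_relays):
    """Return (minimum_number, percent, enough) for the header values."""
    # Assumed: the minimum number of eligible relays is rounded up.
    minimum_number = math.ceil(
        number_consensus_relays * minimum_percent_eligible_relays / 100)
    # Assumed: the percentage is rounded to the nearest integer.
    percent = round(100 * number_eligible_relays / number_consensus_relays)
    # Relays are only reported when enough of them are eligible.
    enough = number_eligible_relays >= minimum_number
    return minimum_number, percent, enough
```

With the values of A.3.1. (6436 consensus relays, 2960 eligible, 60
percent required), the third element is False, which matches a file
with headers but no RelayLines.
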
A.4. Headers generated by sbws version 1.0.4

1523911758
version=1.3.0
latest_bandwidth=2018-04-16T20:49:18
destinations_countries=TH,ZZ
file_created=2018-04-16T21:49:18
generator_started=2018-04-16T15:13:25
earliest_bandwidth=2018-04-16T15:13:26
minimum_number_eligible_relays=3862
minimum_percent_eligible_relays=60
number_consensus_relays=6436
number_eligible_relays=6000
percent_eligible_relays=93
scanner_country=SN
software=sbws
software_version=1.0.4
=====

A.5. Generated by sbws version 1.1.0

1523911758
version=1.4.0
latest_bandwidth=2018-04-16T20:49:18
destinations_countries=TH,ZZ
file_created=2018-04-16T21:49:18
generator_started=2018-04-16T15:13:25
earliest_bandwidth=2018-04-16T15:13:26
minimum_number_eligible_relays=3862
minimum_percent_eligible_relays=60
number_consensus_relays=6436
number_eligible_relays=6000
percent_eligible_relays=93
recent_measurement_attempt_count=6243
recent_measurement_failure_count=732
recent_measurements_excluded_error_count=969
recent_measurements_excluded_few_count=3946
recent_measurements_excluded_near_count=90
recent_measurements_excluded_old_count=0
recent_priority_list_count=20
recent_priority_relay_count=6243
scanner_country=SN
software=sbws
software_version=1.1.0
time_to_report_half_network=57273
=====
bw=1 error_circ=1 error_destination=0 error_misc=0 error_second_relay=0 error_stream=0 master_key_ed25519=J3HQ24kOQWac3L1xlFLp7gY91qkb5NuKxjj1BhDi+m8 nick=snap269 node_id=$DC4D609F95A52614D1E69C752168AF1FCAE0B05F relay_recent_measurement_attempt_count=3 relay_recent_measurements_excluded_error_count=1 relay_recent_measurements_excluded_near_count=3 relay_recent_consensus_count=3 relay_recent_priority_list_count=3 success=3 time=2019-03-16T18:20:57 unmeasured=1 vote=0
bw=1 error_circ=0 error_destination=0 error_misc=0 error_second_relay=0 error_stream=2 master_key_ed25519=h6ZB1E1yBFWIMloUm9IWwjgaPXEpL5cUbuoQDgdSDKg nick=relay node_id=$C4544F9E209A9A9B99591D548B3E2822236C0503 relay_recent_measurement_attempt_count=3 relay_recent_measurements_excluded_error_count=2 relay_recent_measurements_excluded_few_count=1 relay_recent_consensus_count=3 relay_recent_priority_list_count=3 success=1 time=2019-03-17T06:50:58 unmeasured=1 vote=0

B. Scaling bandwidths

B.1. Scaling requirements

Tor accepts zero bandwidths, but they trigger bugs in older Tor
implementations. Therefore, scaling methods SHOULD perform the
following checks:
* If the total bandwidth is zero, all relays should be given equal
  bandwidths.
* If the scaled bandwidth is zero, it should be rounded up to one.

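The two checks above can be sketched as follows. This is a minimal
illustration, not code from any generator: the function name
apply_zero_checks is hypothetical, the input is assumed to be a list of
non-negative integers, and giving every relay a bandwidth of 1 when the
total is zero is just one possible choice of "equal bandwidths".

```python
def apply_zero_checks(scaled_bws):
    """Apply the two zero-bandwidth checks to a list of scaled bandwidths."""
    if sum(scaled_bws) == 0:
        # Total is zero: give every relay the same non-zero bandwidth.
        # (The value 1 is an assumption; any equal value would do.)
        return [1] * len(scaled_bws)
    # Otherwise, round any individual zero up to one.
    return [max(1, bw) for bw in scaled_bws]
```

For instance, [0, 5, 10] becomes [1, 5, 10], and [0, 0, 0] becomes
[1, 1, 1].
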
Initial experiments indicate that scaling may not be needed for
Torflow and sbws, because their measured bandwidths are similar
enough already.

B.2. A linear scaling method

If scaling is required, here is a simple linear bandwidth scaling
method, which ensures that all bandwidth votes contain approximately
the same total bandwidth:

1. Calculate the relay quota by dividing the total measured bandwidth
   in all votes by the number of relays with measured bandwidth
   votes. In the public Tor network, this is approximately 7500 as of
   April 2018. The quota should be a consensus parameter, so it can be
   adjusted for all generators on the network.

2. Calculate a vote quota by multiplying the relay quota by the number
   of relays this bandwidth authority has measured
   bandwidths for.

3. Calculate a scaling factor by dividing the vote quota by the
   total unscaled measured bandwidth in this bandwidth
   authority's upcoming vote.

4. Multiply each unscaled measured bandwidth by the scaling
   factor.

Now, the total scaled bandwidth in the upcoming vote is
approximately equal to the quota.

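The four steps above can be sketched as follows. This is an
illustration under stated assumptions, not a reference implementation:
the function name linear_scale is hypothetical, the relay quota is
passed in as a parameter (defaulting to the approximate April 2018
value), results are rounded to integers, and the zero checks from
section B.1. are applied.

```python
def linear_scale(unscaled_bws, relay_quota=7500):
    """Scale measured bandwidths so their total is about the vote quota."""
    total = sum(unscaled_bws)
    if total == 0:
        # B.1.: with a zero total, give all relays equal bandwidths.
        # (Using the relay quota as that value is an assumption.)
        return [relay_quota] * len(unscaled_bws)
    # Step 2: vote quota for the relays this authority has measured.
    vote_quota = relay_quota * len(unscaled_bws)
    # Step 3: scaling factor for this authority's upcoming vote.
    factor = vote_quota / total
    # Step 4: scale each bandwidth; round zero results up to one (B.1.).
    return [max(1, round(bw * factor)) for bw in unscaled_bws]
```

For two relays measured at 100 and 300, the vote quota is 15000 and
the scaling factor is 37.5, so the scaled bandwidths are 3750 and
11250.
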
B.3. Quota changes

If all generators are using scaling, the quota can be gradually
reduced or increased as needed. Smaller quotas decrease the size
of uncompressed consensuses, and may decrease the size of
consensus diffs and compressed consensuses. But if the relay
quota is too small, some relays may be over- or under-weighted.

B.4. Torflow aggregation

Torflow implements two methods to compute the bandwidth values from the
(stream) bandwidth measurements: with and without PID control feedback.
The method described here is without PID control (see Torflow
specification, section 2.2).

In the following steps, the relays' measured bandwidths refer to the
ones that this bandwidth authority has measured for the relays that
would be included in this bandwidth authority's upcoming vote.

1. Calculate the filtered bandwidth for each relay:
   - choose the relay's measurements (`bw_j`) that are equal to or
     greater than the mean of the measurements for this relay
   - calculate the mean of those measurements

   In pseudocode:

     bw_filt_i = mean(bw_j for each bw_j >= mean(bw_j))

2. Calculate network averages:
   - calculate the filtered average by dividing the sum of all the
     relays' filtered bandwidth by the number of relays that have been
     measured (`n`), i.e., calculate the mean of the relays'
     filtered bandwidth.
   - calculate the stream average by dividing the sum of all the
     relays' measured bandwidth by the number of relays that have been
     measured (`n`), i.e., calculate the mean of the relays'
     measured bandwidth.

   In pseudocode:

     bw_avg_filt = sum(bw_filt_i) / n
     bw_avg_strm = sum(bw_i) / n

3. Calculate ratios for each relay:
   - calculate the filtered ratio by dividing each relay's filtered
     bandwidth by the filtered average
   - calculate the stream ratio by dividing each relay's measured
     bandwidth by the stream average

   In pseudocode:

     r_filt_i = bw_filt_i / bw_avg_filt
     r_strm_i = bw_i / bw_avg_strm

4. Calculate the final ratio for each relay:
   The final ratio is the larger of the filtered ratio and the
   stream ratio.

   In pseudocode:

     r_i = max(r_filt_i, r_strm_i)

5. Calculate the scaled bandwidth for each relay:
   The most recent descriptor observed bandwidth (`bw_obs_i`) is
   multiplied by the final ratio.

   In pseudocode:

     bw_new_i = r_i * bw_obs_i

<<In this way, the resulting network status consensus bandwidth
values are effectively re-weighted proportional to how much faster
the node was as compared to the rest of the network.>>

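Steps 1 to 5 above can be sketched as follows. This is an illustrative
sketch and not Torflow's code: the function name torflow_scale and the
dictionary-based inputs are hypothetical, each relay's measured
bandwidth (`bw_i`) is assumed to be the mean of its stream
measurements, and every relay is assumed to have at least one
measurement and the network averages are assumed to be non-zero.

```python
from statistics import mean


def torflow_scale(measurements, bw_obs):
    """Scale relays following steps 1 to 5 (without PID control).

    measurements: dict of relay id -> non-empty list of measurements (bw_j)
    bw_obs: dict of relay id -> most recent descriptor observed bandwidth
    """
    # Assumed: a relay's measured bandwidth is the mean of its measurements.
    bw = {r: mean(m) for r, m in measurements.items()}
    # Step 1: mean of the measurements at or above the relay's own mean.
    bw_filt = {r: mean([x for x in m if x >= bw[r]])
               for r, m in measurements.items()}
    n = len(measurements)
    # Step 2: network averages.
    bw_avg_filt = sum(bw_filt.values()) / n
    bw_avg_strm = sum(bw.values()) / n
    scaled = {}
    for r in measurements:
        # Step 3: per-relay filtered and stream ratios.
        r_filt = bw_filt[r] / bw_avg_filt
        r_strm = bw[r] / bw_avg_strm
        # Steps 4 and 5: the larger ratio times the observed bandwidth.
        scaled[r] = max(r_filt, r_strm) * bw_obs[r]
    return scaled
```

For example, with measurements {"A": [100, 300], "B": [200, 200]},
relay A has a filtered bandwidth of 300 and relay B of 200, so the
filtered average is 250 and the stream average is 200. Relay A then
gets a final ratio of 1.2 (its filtered ratio) and relay B a final
ratio of 1.0 (its stream ratio).
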