mirror of
https://github.com/torproject/torspec.git
synced 2024-12-15 06:28:46 +00:00
1157 lines
40 KiB
Plaintext
Tor Bandwidth File Format
|
|
juga
|
|
teor
|
|
|
|
1. Scope and preliminaries
|
|
|
|
This document describes the format of Tor's Bandwidth File, version
|
|
1.0.0 and later.
|
|
|
|
It is a new specification for the existing bandwidth file format,
|
|
which we call version 1.0.0. It also specifies new format versions
|
|
1.1.0 and later, which are backwards compatible with 1.0.0 parsers.
|
|
|
|
Since Tor version 0.2.4.12-alpha, the directory authorities use
|
|
the Bandwidth File called "V3BandwidthsFile" generated by
|
|
Torflow [1]. The details of this format are described in Torflow's
|
|
README.spec.txt. We also summarise the format in this specification.
|
|
|
|
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
|
|
NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
|
|
"OPTIONAL" in this document are to be interpreted as described in
|
|
RFC 2119.
|
|
|
|
1.2. Acknowledgements
|
|
|
|
The original bandwidth generator (Torflow) and format were
created by mike. Teor suggested writing this specification while
contributing to pastly's new bandwidth generator implementation.
|
|
|
|
This specification was revised after feedback from:
|
|
|
|
Nick Mathewson (nickm)
|
|
Iain Learmonth (irl)
|
|
|
|
1.3. Outline
|
|
|
|
The Tor directory protocol (dir-spec.txt [3]), sections 3.4.1
and 3.4.2, uses the term "bandwidth measurements" to refer to what
is called a Bandwidth File here.
|
|
|
|
A Bandwidth File contains information on relays' bandwidth
|
|
capacities and is produced by bandwidth generators, previously known
|
|
as bandwidth scanners.
|
|
|
|
1.4. Format Versions
|
|
|
|
1.0.0 - The legacy Bandwidth File format
|
|
|
|
1.1.0 - Add a header containing information about the bandwidth
|
|
file. Document the sbws and Torflow relay line keys.
|
|
|
|
1.2.0 - If there are not enough eligible relays, the bandwidth file
|
|
SHOULD contain a header, but no relays. (To match Torflow's
|
|
existing behaviour.)
|
|
|
|
Adds new KeyValue Lines to the Header List section with
|
|
statistics about the number of relays included in the file.
|
|
Add new KeyValues to Relay Bandwidth Lines, with different
|
|
bandwidth values (averages and descriptor bandwidths).
|
|
|
|
1.3.0 - Adds scanner and destination countries to the header.
|
|
|
|
1.4.0 - Adds monitoring KeyValues to the header and relay lines.
|
|
|
|
RelayLines for excluded relays MAY be present in the bandwidth
|
|
file for diagnostic reasons. Similarly, if there are not enough
|
|
eligible relays, the bandwidth file MAY contain all known relays.
|
|
|
|
Diagnostic relay lines SHOULD be marked with vote=0, and
|
|
Tor SHOULD NOT use their bandwidths in its votes.
|
|
|
|
All Tor versions can consume format version 1.0.0.
|
|
|
|
All Tor versions can consume format version 1.1.0 and later,
|
|
but Tor versions earlier than 0.3.5.1-alpha warn if the header
|
|
contains any KeyValue lines after the Timestamp.
|
|
|
|
Tor versions 0.4.0.3-alpha, 0.3.5.8, 0.3.4.11, and earlier do not
|
|
understand "vote=0". Instead, they will vote for the actual bandwidths
|
|
that sbws puts in diagnostic relay lines:
|
|
* 1 for relays with "unmeasured=1", and
|
|
* the relay's measured and scaled bandwidth when "under_min_report=1".
|
|
|
|
2. Format details
|
|
|
|
The Bandwidth File MUST contain the following sections:
|
|
- Header List (exactly once), which is a partially ordered list of
|
|
- Header Lines (one or more times), then
|
|
- Relay Lines (zero or more times), in an arbitrary order.
|
|
If it does not contain these sections, parsers SHOULD ignore the file.
|
|
|
|
2.1. Definitions
|
|
|
|
The following nonterminals are defined in Tor directory protocol
|
|
sections 1.2., 2.1.1., 2.1.3.:
|
|
|
|
bool
|
|
Int
|
|
SP (space)
|
|
NL (newline)
|
|
KeywordChar
|
|
ArgumentChar
|
|
nickname
|
|
hexdigest (a '$', followed by 40 hexadecimal characters
|
|
([A-Fa-f0-9]))
|
|
|
|
Nonterminal defined in section 2 of version-spec.txt [4]:
|
|
|
|
version_number
|
|
|
|
We define the following nonterminals:
|
|
|
|
Line ::= ArgumentChar* NL
|
|
RelayLine ::= KeyValue (SP KeyValue)* NL
|
|
HeaderLine ::= KeyValue NL
|
|
KeyValue ::= Key "=" Value
|
|
Key ::= (KeywordChar | "_")+
|
|
Value ::= ArgumentCharValue+
|
|
ArgumentCharValue ::= any printing ASCII character except NL and SP.
|
|
Terminator ::= "=====" or "===="
|
|
Generators SHOULD use a 5-character terminator.
|
|
Timestamp ::= Int
|
|
Bandwidth ::= Int
|
|
MasterKey ::= a base64-encoded Ed25519 public key, with
|
|
padding characters omitted.
|
|
DateTime ::= "YYYY-MM-DDTHH:MM:SS", as in ISO 8601
|
|
CountryCode ::= Two capital ASCII letters ([A-Z]{2}), as defined in
|
|
ISO 3166-1 alpha-2 plus "ZZ" to denote unknown country
|
|
(e.g. the destination is in a Content Delivery Network).
|
|
CountryCodeList ::= One or more CountryCode(s) separated by a comma
|
|
([A-Z]{2}(,[A-Z]{2})*).
|
|
|
|
Note that key_value and value are defined in Tor directory protocol
|
|
with different formats to KeyValue and Value here.
|
|
|
|
Tor versions earlier than 0.3.5.1-alpha require all lines in the file
|
|
to be 510 characters or less. The previous limit was 254 characters in
|
|
Tor 0.2.6.2-alpha and earlier. Parsers MAY ignore longer Lines.
|
|
|
|
Note that directory authorities are only supported on the two most
|
|
recent stable Tor versions, so we expect that line limits will be
|
|
removed after Tor 0.4.0 is released in 2019.
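
The KeyValue and RelayLine productions above can be illustrated with a
small parser. This is a minimal sketch, not the parser used by Tor or
sbws. The function names (parse_key_value, parse_relay_line) are
hypothetical, and keeping the first Value for a duplicated key is just
one of the arbitrary choices a parser is allowed to make.

```python
import string

# Key ::= (KeywordChar | "_")+
# KeywordChar (dir-spec) is 'A'..'Z', 'a'..'z', '0'..'9' or '-'.
KEY_CHARS = set(string.ascii_letters + string.digits + "-_")


def parse_key_value(token):
    """Split one KeyValue token into (key, value); None if malformed."""
    key, sep, value = token.partition("=")
    if not sep or not key or not value:
        return None
    if not set(key) <= KEY_CHARS:
        return None
    # Value ::= ArgumentCharValue+ (printing ASCII, no SP and no NL).
    if not all(0x21 <= ord(ch) <= 0x7E for ch in value):
        return None
    return key, value


def parse_relay_line(line):
    """Parse a RelayLine into a dict, skipping tokens that do not parse."""
    pairs = {}
    for token in line.rstrip("\n").split(" "):
        parsed = parse_key_value(token)
        if parsed is not None:
            key, value = parsed
            # Assumption: on duplicate keys, keep the first Value seen.
            pairs.setdefault(key, value)
    return pairs
```

Values are kept as strings here; converting "bw" to an Int or checking
the hexdigest format is left to the caller.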
|
|
|
|
2.2. Header List format
|
|
|
|
It consists of a Timestamp line and zero or more HeaderLines.
|
|
|
|
All the header lines MUST conform to the HeaderLine format, except
|
|
the first Timestamp line.
|
|
|
|
The Timestamp line is not a HeaderLine to keep compatibility with
|
|
the legacy Bandwidth File format.
|
|
|
|
Some header Lines MUST appear in specific positions, as documented
|
|
below. All other Lines can appear in any order.
|
|
|
|
If a parser does not recognize any extra material in a header Line,
|
|
the Line MUST be ignored.
|
|
|
|
If a header Line does not conform to this format, the Line SHOULD be
|
|
ignored by parsers.
|
|
|
|
It consists of:
|
|
|
|
Timestamp NL
|
|
|
|
[At start, exactly once.]
|
|
|
|
The Unix Epoch time in seconds of the most recent generator bandwidth
|
|
result.
|
|
|
|
If the generator implementation has multiple threads or
|
|
subprocesses which can fail independently, it SHOULD take the most
|
|
recent timestamp from each thread and use the oldest value. This
|
|
ensures all the threads continue running.
|
|
|
|
If there are threads that do not run continuously, they SHOULD be
|
|
excluded from the timestamp calculation.
|
|
|
|
If there are no recent results, the generator MUST NOT generate a new
|
|
file.
|
|
|
|
It does not follow the KeyValue format for backwards compatibility
|
|
with version 1.0.0.
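
The timestamp rule for generators with several threads can be sketched
as follows. This is an illustration only: the function name
file_timestamp and the dict-based input are hypothetical, and the
caller is assumed to have already left out threads that do not run
continuously.

```python
def file_timestamp(latest_result_by_thread):
    """Pick the Timestamp for a new bandwidth file.

    latest_result_by_thread maps a thread name to the Unix time of that
    thread's most recent bandwidth result. The oldest of these values is
    used, so a single stalled thread makes the whole file look stale.
    Returns None when there are no results, in which case no file
    should be generated.
    """
    if not latest_result_by_thread:
        return None
    return min(latest_result_by_thread.values())
```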
|
|
|
|
"version=" version_number NL
|
|
|
|
[In second position, zero or one time.]
|
|
|
|
The specification document format version.
|
|
It uses semantic versioning [5].
|
|
|
|
This Line was added in version 1.1.0 of this specification.
|
|
|
|
Version 1.0.0 documents do not contain this Line, and the
|
|
version_number is considered to be "1.0.0".
|
|
|
|
"software=" Value NL
|
|
|
|
[Zero or one time.]
|
|
|
|
The name of the software that created the document.
|
|
|
|
This Line was added in version 1.1.0 of this specification.
|
|
|
|
Version 1.0.0 documents do not contain this Line, and the software
|
|
is considered to be "torflow".
|
|
|
|
"software_version=" Value NL
|
|
|
|
[Zero or one time.]
|
|
|
|
The version of the software that created the document.
|
|
The version may be a version_number, a git commit, or some other
|
|
version scheme.
|
|
|
|
This Line was added in version 1.1.0 of this specification.
|
|
|
|
"file_created=" DateTime NL
|
|
|
|
[Zero or one time.]
|
|
|
|
The date and time timestamp in ISO 8601 format and UTC time zone
|
|
when the file was created.
|
|
|
|
This Line was added in version 1.1.0 of this specification.
|
|
|
|
"generator_started=" DateTime NL
|
|
|
|
[Zero or one time.]
|
|
|
|
The date and time timestamp in ISO 8601 format and UTC time zone
|
|
when the generator started.
|
|
|
|
This Line was added in version 1.1.0 of this specification.
|
|
|
|
"earliest_bandwidth=" DateTime NL
|
|
|
|
[Zero or one time.]
|
|
|
|
The date and time timestamp in ISO 8601 format and UTC time zone
|
|
when the first relay bandwidth was obtained.
|
|
|
|
This Line was added in version 1.1.0 of this specification.
|
|
|
|
"latest_bandwidth=" DateTime NL
|
|
|
|
[Zero or one time.]
|
|
|
|
The date and time timestamp in ISO 8601 format and UTC time zone
|
|
of the most recent generator bandwidth result.
|
|
|
|
This time MUST be identical to the initial Timestamp line.
|
|
|
|
This duplicate value is included to make the format easier for people
|
|
to read.
|
|
|
|
This Line was added in version 1.1.0 of this specification.
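
Because "latest_bandwidth" must carry the same instant as the initial
Timestamp line, a generator can derive one from the other. The sketch
below uses only the Python standard library; the helper names are
hypothetical.

```python
from datetime import datetime, timezone

DATETIME_FORMAT = "%Y-%m-%dT%H:%M:%S"


def timestamp_to_datetime(timestamp):
    """Format a Unix timestamp as a DateTime in the UTC time zone."""
    moment = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    return moment.strftime(DATETIME_FORMAT)


def datetime_to_timestamp(value):
    """Parse a DateTime, interpreted as UTC, back into a Unix timestamp."""
    parsed = datetime.strptime(value, DATETIME_FORMAT)
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())
```

For the sample data in Appendix A, the Timestamp 1523911758 corresponds
to latest_bandwidth=2018-04-16T20:49:18.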
|
|
|
|
"number_eligible_relays=" Int NL
|
|
|
|
[Zero or one time.]
|
|
|
|
The number of relays that have enough measurements to be
|
|
included in the bandwidth file.
|
|
|
|
This Line was added in version 1.2.0 of this specification.
|
|
|
|
"minimum_percent_eligible_relays=" Int NL
|
|
|
|
[Zero or one time.]
|
|
|
|
The percentage of relays in the consensus that SHOULD be
|
|
included in every generated bandwidth file.
|
|
|
|
If this threshold is not reached, format versions 1.3.0 and earlier
|
|
SHOULD NOT contain any relays. (Bandwidth files always include a
|
|
header.)
|
|
|
|
Format versions 1.4.0 and later SHOULD include all the relays for
|
|
diagnostic purposes, even if this threshold is not reached. But these
|
|
relays SHOULD be marked so that Tor does not vote on them.
|
|
See section 1.4 for details.
|
|
|
|
The minimum percentage is 60% in Torflow, so sbws uses
|
|
60% as the default.
|
|
|
|
This Line was added in version 1.2.0 of this specification.
|
|
|
|
"number_consensus_relays=" Int NL
|
|
|
|
[Zero or one time.]
|
|
|
|
The number of relays in the consensus.
|
|
|
|
This Line was added in version 1.2.0 of this specification.
|
|
|
|
"percent_eligible_relays=" Int NL
|
|
|
|
[Zero or one time.]
|
|
|
|
The number of eligible relays, as a percentage of the number
|
|
of relays in the consensus.
|
|
|
|
This Line SHOULD be equal to:
(number_eligible_relays * 100.0) / number_consensus_relays
|
|
|
|
This Line was added in version 1.2.0 of this specification.
|
|
|
|
"minimum_number_eligible_relays=" Int NL
|
|
|
|
[Zero or one time.]
|
|
|
|
The minimum number of relays that SHOULD be included in the bandwidth
|
|
file. See minimum_percent_eligible_relays for details.
|
|
|
|
This line SHOULD be equal to:
|
|
number_consensus_relays * (minimum_percent_eligible_relays / 100.0)
|
|
|
|
This Line was added in version 1.2.0 of this specification.
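
The relationship between the four relay-count header Lines can be
sketched as below. The function name eligibility_stats is hypothetical.
Rounding to the nearest integer is an assumption: this specification
only requires an Int, and nearest-integer rounding happens to reproduce
the sample values in section A.3.

```python
def eligibility_stats(number_eligible_relays, number_consensus_relays,
                      minimum_percent_eligible_relays=60):
    """Compute the derived relay-count header values.

    The default of 60 matches the Torflow and sbws default threshold.
    """
    if number_consensus_relays <= 0:
        raise ValueError("number_consensus_relays must be positive")
    percent = round(number_eligible_relays * 100.0 / number_consensus_relays)
    minimum_number = round(
        number_consensus_relays * (minimum_percent_eligible_relays / 100.0)
    )
    return {
        "percent_eligible_relays": percent,
        "minimum_number_eligible_relays": minimum_number,
        # True when the reporting threshold is reached.
        "enough_relays": number_eligible_relays >= minimum_number,
    }
```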
|
|
|
|
"scanner_country=" CountryCode NL
|
|
|
|
[Zero or one time.]
|
|
|
|
The country, as in political geolocation, where the generator is run.
|
|
|
|
This Line was added in version 1.3.0 of this specification.
|
|
|
|
"destinations_countries=" CountryCodeList NL
|
|
|
|
[Zero or one time.]
|
|
|
|
The country, as in political geolocation, or countries where the
|
|
destination Web server(s) are located.
|
|
The destination Web Servers serve the data that the generator retrieves
|
|
to measure the bandwidth.
|
|
|
|
This Line was added in version 1.3.0 of this specification.
|
|
|
|
"recent_consensus_count=" Int NL
|
|
|
|
[Zero or one time.]
|
|
|
|
The number of the different consensuses seen in the last data_period
|
|
days. (data_period is 5 by default.)
|
|
|
|
Assuming that Tor clients fetch a consensus every 1-2 hours,
|
|
and that the data_period is 5 days, the Value of this Key SHOULD be
|
|
between:
|
|
data_period * 24 / 2 = 60
|
|
data_period * 24 = 120
|
|
|
|
This Line was added in version 1.4.0 of this specification.
|
|
|
|
"recent_priority_list_count=" Int NL
|
|
|
|
[Zero or one time.]
|
|
|
|
The number of times that a list with a subset of relays prioritized
|
|
to be measured has been created in the last data_period days.
|
|
(data_period is 5 by default.)
|
|
|
|
In 2019, with 7000 relays in the network, the Value of this Key SHOULD be
|
|
approximately:
|
|
data_period * 24 / 1.5 = 80
|
|
where 1.5 is the approximate number of hours it takes to measure a
priority list of 7000 * 0.05 (350) relays, when the fraction of relays
in a priority list is 5% (0.05).
|
|
|
|
This Line was added in version 1.4.0 of this specification.
|
|
|
|
"recent_priority_relay_count=" Int NL
|
|
|
|
[Zero or one time.]
|
|
|
|
The number of relays that have been in the list of relays prioritized
|
|
to be measured in the last data_period days. (data_period is 5 by
|
|
default.)
|
|
|
|
In 2019, with 7000 relays in the network, the Value of this Key SHOULD be
|
|
approximately:
|
|
80 * (7000 * 0.05) = 28000
|
|
where 0.05 (5%) is the fraction of relays in a priority list and 80
is the approximate number of priority lists (see
"recent_priority_list_count").
|
|
|
|
This Line was added in version 1.4.0 of this specification.
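
The approximate values quoted for "recent_consensus_count",
"recent_priority_list_count" and "recent_priority_relay_count" follow
from the same few inputs. The sketch below only reproduces that
arithmetic; the function and parameter names are hypothetical, and the
defaults are the 2019 figures used in the text.

```python
def expected_monitoring_values(data_period_days=5, relays=7000,
                               priority_fraction=0.05,
                               hours_per_priority_list=1.5):
    """Return rough expected values for the monitoring header Lines."""
    hours = data_period_days * 24
    relays_per_list = round(relays * priority_fraction)
    priority_lists = round(hours / hours_per_priority_list)
    return {
        # One consensus every 2 hours, up to one consensus every hour.
        "recent_consensus_count_min": hours // 2,
        "recent_consensus_count_max": hours,
        "recent_priority_list_count": priority_lists,
        "recent_priority_relay_count": priority_lists * relays_per_list,
    }
```

A monitoring tool could compare the values found in a file against
these estimates to spot a generator that is falling behind.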
|
|
|
|
"recent_measurement_attempt_count=" Int NL
|
|
|
|
[Zero or one time.]
|
|
|
|
The number of times that any relay has been queued to be measured
|
|
in the last data_period days. (data_period is 5 by default.)
|
|
|
|
In 2019, with 7000 relays in the network, the Value of this Key SHOULD be
|
|
approximately the same as "recent_priority_relay_count",
|
|
assuming that there is one attempt to measure a relay for each relay that
|
|
has been prioritized unless there are system, network or implementation
|
|
issues.
|
|
|
|
This Line was added in version 1.4.0 of this specification.
|
|
|
|
"recent_measurement_failure_count=" Int NL
|
|
|
|
[Zero or one time.]
|
|
|
|
The number of times that the scanner attempted to measure a relay in
|
|
the last data_period days (5 by default), but the relay has not been
|
|
measured because of system, network or implementation issues.
|
|
|
|
This Line was added in version 1.4.0 of this specification.
|
|
|
|
"recent_measurements_excluded_error_count=" Int NL
|
|
|
|
[Zero or one time.]
|
|
|
|
The number of relays that have no successful measurements in the last
|
|
data_period days (5 by default).
|
|
|
|
(See the note in section 1.4, version 1.4.0, about excluded relays.)
|
|
|
|
This Line was added in version 1.4.0 of this specification.
|
|
|
|
"recent_measurements_excluded_near_count=" Int NL
|
|
|
|
[Zero or one time.]
|
|
|
|
The number of relays that have some successful measurements in the last
|
|
data_period days (5 by default), but all those measurements were
|
|
performed in a period of time that was too short (by default 1 day).
|
|
|
|
(See the note in section 1.4, version 1.4.0, about excluded relays.)
|
|
|
|
This Line was added in version 1.4.0 of this specification.
|
|
|
|
"recent_measurements_excluded_old_count=" Int NL
|
|
|
|
[Zero or one time.]
|
|
|
|
The number of relays that have some successful measurements, but all
|
|
those measurements are too old (more than 5 days, by default).
|
|
|
|
Excludes relays that are already counted in
|
|
recent_measurements_excluded_near_count.
|
|
|
|
(See the note in section 1.4, version 1.4.0, about excluded relays.)
|
|
|
|
This Line was added in version 1.4.0 of this specification.
|
|
|
|
"recent_measurements_excluded_few_count=" Int NL
|
|
|
|
[Zero or one time.]
|
|
|
|
The number of relays that don't have enough recent successful
|
|
measurements. (Fewer than 2 measurements in the last 5 days, by
|
|
default).
|
|
|
|
Excludes relays that are already counted in
|
|
recent_measurements_excluded_near_count and
|
|
recent_measurements_excluded_old_count.
|
|
|
|
(See the note in section 1.4, version 1.4.0, about excluded relays.)
|
|
|
|
This Line was added in version 1.4.0 of this specification.
|
|
|
|
"time_to_report_half_network=" Int NL
|
|
|
|
[Zero or one time.]
|
|
|
|
The time in seconds that it would take to report measurements about
half of the network, given the number of eligible relays and the time
it took to measure them in the last data_period days (5 by default).
|
|
|
|
(See the note in section 1.4, version 1.4.0, about excluded relays.)
|
|
|
|
This Line was added in version 1.4.0 of this specification.
|
|
|
|
KeyValue NL
|
|
|
|
[Zero or more times.]
|
|
|
|
There MUST NOT be multiple KeyValue header Lines with the same key.
|
|
If there are, the parser SHOULD choose an arbitrary Line.
|
|
|
|
If a parser does not recognize a Keyword in a KeyValue Line, it
|
|
MUST be ignored.
|
|
|
|
Future format versions may include additional KeyValue header Lines.
|
|
Additional header Lines will be accompanied by a minor version
|
|
increment.
|
|
|
|
Implementations MAY add additional header Lines as needed. This
|
|
specification SHOULD be updated to avoid conflicting meanings for
|
|
the same header keys.
|
|
|
|
Parsers MUST NOT rely on the order of these additional Lines.
|
|
|
|
Additional header Lines MUST NOT use any keywords specified in the
|
|
relay measurements format.
|
|
If they do, the parser MAY ignore conflicting keywords.
|
|
|
|
Terminator NL
|
|
|
|
[Zero or one time.]
|
|
|
|
The Header List section ends with a Terminator.
|
|
|
|
In version 1.0.0, Header List ends when the first relay bandwidth
|
|
is found conforming to the next section.
|
|
|
|
Implementations of version 1.1.0 and later SHOULD use a 5-character
|
|
terminator.
|
|
|
|
Tor 0.4.0.1-alpha and later look for a 5-character terminator,
|
|
or the first relay bandwidth line. sbws versions 0.1.0 to 1.0.2
|
|
used a 4-character terminator; this bug was fixed in 1.0.3.
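
A parser has to cope with three ways a Header List can end: a
5-character Terminator, the 4-character Terminator written by old sbws
versions, or (in version 1.0.0) the first relay line. The following is a
minimal sketch of that logic, not the code used by Tor. The names
split_bandwidth_file and is_relay_line are hypothetical, and treating a
Line that has both "node_id=" and "bw=" tokens as a relay line is an
assumption made for this example.

```python
TERMINATORS = ("=====", "====")


def is_relay_line(line):
    """Heuristic (an assumption): a relay line has node_id= and bw=."""
    tokens = line.split(" ")
    has_id = any(t.startswith("node_id=") for t in tokens)
    has_bw = any(t.startswith("bw=") for t in tokens)
    return has_id and has_bw


def split_bandwidth_file(text):
    """Return (timestamp, headers, relay_lines) for a bandwidth file."""
    lines = text.splitlines()
    if not lines or not (lines[0].isascii() and lines[0].isdigit()):
        raise ValueError("file does not start with a Timestamp line")
    timestamp = int(lines[0])
    headers, relay_lines = {}, []
    in_header = True
    for line in lines[1:]:
        if in_header and line in TERMINATORS:
            in_header = False
            continue
        if in_header and is_relay_line(line):
            # Version 1.0.0: the first relay line ends the header.
            in_header = False
        if in_header:
            key, sep, value = line.partition("=")
            if sep and key and value and " " not in line:
                # On duplicate keys, keep the first Line seen.
                headers.setdefault(key, value)
            # Header Lines that do not conform are ignored.
        elif line:
            relay_lines.append(line)
    # Files without a version Line are treated as version 1.0.0.
    headers.setdefault("version", "1.0.0")
    return timestamp, headers, relay_lines
```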
|
|
|
|
2.3. Relay Line format
|
|
|
|
It consists of zero or more RelayLines containing relay ids and
|
|
bandwidths. The relays and their KeyValues are in arbitrary order.
|
|
|
|
There MUST NOT be multiple KeyValue pairs with the same key in the same
|
|
RelayLine. If there are, the parser SHOULD choose an arbitrary Value.
|
|
|
|
There MUST NOT be multiple RelayLines per relay identity (node_id or
|
|
master_key_ed25519). If there are, parsers SHOULD issue a warning.
|
|
Parsers MAY reject the file, choose an arbitrary RelayLine, or ignore
|
|
both RelayLines.
|
|
|
|
If a parser does not recognize any extra material in a RelayLine,
|
|
the extra material MUST be ignored.
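
Detecting repeated relay identities, so that a parser can issue the
warning described above, could look like the sketch below. The function
name duplicate_identities is hypothetical, and each relay is assumed to
be a dict of already-parsed KeyValue pairs.

```python
from collections import Counter


def duplicate_identities(relays):
    """Return the (key, value) identities used by more than one RelayLine."""
    counts = Counter()
    for relay in relays:
        for key in ("node_id", "master_key_ed25519"):
            if key in relay:
                counts[(key, relay[key])] += 1
    return {identity for identity, seen in counts.items() if seen > 1}
```

What to do with the affected RelayLines (reject the file, pick one, or
ignore all of them) remains a policy decision for the parser.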
|
|
|
|
Each RelayLine includes the following KeyValue pairs:
|
|
|
|
"node_id=" hexdigest
|
|
|
|
[Exactly once.]
|
|
|
|
The fingerprint for the relay's RSA identity key.
|
|
|
|
Note: In bandwidth files read by Tor versions earlier than
|
|
0.3.4.1-alpha, node_id MUST NOT be at the end of the Line.
|
|
These authority versions are no longer supported.
|
|
|
|
Current Tor versions ignore master_key_ed25519, so node_id MUST be
|
|
present in each relay Line.
|
|
|
|
Implementations of version 1.1.0 and later SHOULD include both node_id
|
|
and master_key_ed25519. Parsers SHOULD accept Lines that contain at
|
|
least one of them.
|
|
|
|
"master_key_ed25519=" MasterKey
|
|
|
|
[Zero or one time.]
|
|
|
|
The relay's master Ed25519 key, base64 encoded,
|
|
without trailing "="s, to avoid ambiguity with KeyValue "="
|
|
character.
|
|
|
|
This KeyValue pair SHOULD be present, see the note under node_id.
|
|
|
|
This KeyValue was added in version 1.1.0 of this specification.
|
|
|
|
"bw=" Bandwidth
|
|
|
|
[Exactly once.]
|
|
|
|
The bandwidth of this relay in kilobytes per second.
|
|
|
|
No Zero Bandwidths:
|
|
Tor accepts zero bandwidths, but they trigger bugs in older Tor
|
|
implementations. Therefore, implementations SHOULD NOT produce zero
|
|
bandwidths. Instead, they SHOULD use one as their minimum bandwidth.
|
|
If there are zero bandwidths, the parser MAY ignore them.
|
|
|
|
Bandwidth Aggregation:
|
|
Multiple measurements can be aggregated using an averaging scheme,
|
|
such as a mean, median, or decaying average.
|
|
|
|
Bandwidth Scaling:
|
|
Torflow scales bandwidths to kilobytes per second. Other
|
|
implementations SHOULD use kilobytes per second for their initial
|
|
bandwidth scaling.
|
|
|
|
If different implementations or configurations are used in votes for
|
|
the same network, their measurements MAY need further scaling. See
|
|
Appendix B for information about scaling, and one possible scaling
|
|
method.
|
|
|
|
MaxAdvertisedBandwidth:
|
|
Bandwidth generators MUST limit the relays' measured bandwidth based
|
|
on the MaxAdvertisedBandwidth.
|
|
A relay's MaxAdvertisedBandwidth limits the bandwidth-avg in its
|
|
descriptor. bandwidth-avg is the minimum of MaxAdvertisedBandwidth,
|
|
BandwidthRate, RelayBandwidthRate, BandwidthBurst, and
|
|
RelayBandwidthBurst.
|
|
Therefore, generators MUST limit a relay's measured bandwidth to its
|
|
descriptor's bandwidth-avg. This limit needs to be implemented in the
|
|
generator, because generators may scale consensus weights before
|
|
sending them to Tor.
|
|
Generators SHOULD NOT limit measured bandwidths based on descriptors'
|
|
bandwidth-observed, because that penalises new relays.
|
|
|
|
sbws limits the relay's measured bandwidth to the bandwidth-avg
|
|
advertised.
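
The bandwidth-avg limit and the "No Zero Bandwidths" rule can be
combined as in the sketch below. It is not the sbws implementation and
it deliberately leaves out scaling (see Appendix B). The function name
is hypothetical, and taking one kilobyte to be 1000 bytes is an
assumption, since this specification does not define it.

```python
BYTES_PER_KILOBYTE = 1000  # Assumption: not defined by this specification.


def relay_bw_kilobytes(measured_bytes_per_second, desc_bw_avg_bytes_per_second):
    """Return a "bw" value in kilobytes per second for one relay.

    The measured bandwidth is capped at the descriptor's bandwidth-avg,
    converted to kilobytes per second, and never allowed to drop below 1.
    """
    capped = min(measured_bytes_per_second, desc_bw_avg_bytes_per_second)
    return max(1, round(capped / BYTES_PER_KILOBYTE))
```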
|
|
|
|
Torflow partitions relays based on their bandwidth. For unmeasured
|
|
relays, Torflow uses the minimum of all descriptor bandwidths,
|
|
including bandwidth-avg (MaxAdvertisedBandwidth) and
|
|
bandwidth-observed. Then Torflow measures the relays in each partition
|
|
against each other, which implicitly limits a relay's measured
|
|
bandwidth to the bandwidths of similar relays.
|
|
|
|
Torflow also generates consensus weights based on the ratio between the
|
|
measured bandwidth and the minimum of all descriptor bandwidths (at the
|
|
time of the measurement). So when an operator reduces the
|
|
MaxAdvertisedBandwidth for a relay, Torflow reduces that relay's
|
|
measured bandwidth.
|
|
|
|
KeyValue
|
|
|
|
[Zero or more times.]
|
|
|
|
Future format versions may include additional KeyValue pairs on a
|
|
RelayLine.
|
|
Additional KeyValue pairs will be accompanied by a minor version
|
|
increment.
|
|
|
|
Implementations MAY add additional relay KeyValue pairs as needed.
|
|
This specification SHOULD be updated to avoid conflicting meanings
|
|
for the same Keywords.
|
|
|
|
Parsers MUST NOT rely on the order of these additional KeyValue
|
|
pairs.
|
|
|
|
Additional KeyValue pairs MUST NOT use any keywords specified in the
|
|
header format.
|
|
If they do, the parser MAY ignore conflicting keywords.
|
|
|
|
2.4. Implementation details
|
|
|
|
2.4.1 Writing bandwidth files atomically
|
|
|
|
To avoid inconsistent reads, implementations SHOULD write bandwidth files
|
|
atomically. If the file is transferred from another host, it SHOULD be
|
|
written to a temporary path, then renamed to the V3BandwidthsFile path.
|
|
|
|
sbws versions 0.7.0 and later write the bandwidth file to an archival
|
|
location, create a temporary symlink to that location, then atomically rename
the symlink to the configured V3BandwidthsFile path.
|
|
|
|
Torflow does not write bandwidth files atomically.
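
One way to write a file atomically is to write it to a temporary path
in the same directory and then rename it over the final path. The
sketch below shows that approach with the Python standard library. It
is not the symlink-based method used by sbws, and the function name
write_atomically is hypothetical.

```python
import os
import tempfile


def write_atomically(path, content):
    """Write content to path so that readers never see a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must be on the same filesystem as the final
    # path, otherwise the rename is not atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".bwfile-")
    try:
        with os.fdopen(fd, "w", encoding="ascii") as tmp:
            tmp.write(content)
            tmp.flush()
            os.fsync(tmp.fileno())
        # os.replace overwrites any existing file at path.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Note that mkstemp creates the file readable only by its owner, so a
deployment where Tor runs as a different user would also need to adjust
the file permissions before the rename.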
|
|
|
|
2.4.2. Additional KeyValue pair definitions
|
|
|
|
KeyValue pairs in RelayLines that current implementations generate.
|
|
|
|
2.4.2.1. Simple Bandwidth Scanner
|
|
|
|
sbws RelayLines contain these keys:
|
|
|
|
"node_id=" hexdigest
|
|
|
|
As above.
|
|
|
|
"bw=" Bandwidth
|
|
|
|
As above.
|
|
|
|
"nick=" nickname
|
|
|
|
[Exactly once.]
|
|
|
|
The relay nickname.
|
|
|
|
Torflow also has a "nick=" KeyValue.
|
|
|
|
"rtt=" Int
|
|
|
|
[Zero or one time.]
|
|
|
|
The Round Trip Time in milliseconds to obtain 1 byte of data.
|
|
|
|
This KeyValue was added in version 1.1.0 of this specification.
|
|
It became optional in version 1.3.0 or 1.4.0 of this specification.
|
|
|
|
"time=" DateTime
|
|
|
|
[Exactly once.]
|
|
|
|
The date and time timestamp in ISO 8601 format and UTC time zone
|
|
when the last bandwidth was obtained.
|
|
|
|
This KeyValue was added in version 1.1.0 of this specification.
|
|
The Torflow equivalent is "measured_at=".
|
|
|
|
"success=" Int
|
|
|
|
[Zero or one time.]
|
|
|
|
The number of times that the bandwidth measurements for this relay were
|
|
successful.
|
|
|
|
This KeyValue was added in version 1.1.0 of this specification.
|
|
|
|
"error_circ=" Int
|
|
|
|
[Zero or one time.]
|
|
|
|
The number of times that the bandwidth measurements for this relay
|
|
failed because of circuit failures.
|
|
|
|
This KeyValue was added in version 1.1.0 of this specification.
|
|
The Torflow equivalent is "circ_fail=".
|
|
|
|
"error_stream=" Int
|
|
|
|
[Zero or one time.]
|
|
|
|
The number of times that the bandwidth measurements for this relay
|
|
failed because of stream failures.
|
|
|
|
This KeyValue was added in version 1.1.0 of this specification.
|
|
|
|
"error_destination=" Int
|
|
|
|
[Zero or one time.]
|
|
|
|
The number of times that the bandwidth measurements for this relay
|
|
failed because the destination Web server was not available.
|
|
|
|
This KeyValue was added in version 1.4.0 of this specification.
|
|
|
|
"error_second_relay=" Int
|
|
|
|
[Zero or one time.]
|
|
|
|
The number of times that the bandwidth measurements for this relay
|
|
failed because sbws could not find a second relay for the test circuit.
|
|
|
|
This KeyValue was added in version 1.4.0 of this specification.
|
|
|
|
"error_misc=" Int
|
|
|
|
[Zero or one time.]
|
|
|
|
The number of times that the bandwidth measurements for this relay
|
|
failed because of other reasons.
|
|
|
|
This KeyValue was added in version 1.1.0 of this specification.
|
|
|
|
"bw_mean=" Int
|
|
|
|
[Zero or one time.]
|
|
|
|
The measured bandwidth mean for this relay in bytes per second.
|
|
|
|
This KeyValue was added in version 1.2.0 of this specification.
|
|
|
|
"bw_median=" Int
|
|
|
|
[Zero or one time.]
|
|
|
|
The measured bandwidth median for this relay in bytes per second.
|
|
|
|
This KeyValue was added in version 1.2.0 of this specification.
|
|
|
|
"desc_bw_average=" Int
|
|
|
|
[Zero or one time.]
|
|
|
|
The descriptor average bandwidth for this relay in bytes per second.
|
|
|
|
This KeyValue was added in version 1.2.0 of this specification.
|
|
|
|
"desc_obs_bw_last=" Int
|
|
|
|
[Zero or one time.]
|
|
|
|
The last descriptor observed bandwidth for this relay in bytes per
|
|
second.
|
|
|
|
This KeyValue was added in version 1.2.0 of this specification.
|
|
|
|
"desc_obs_bw_mean=" Int
|
|
|
|
[Zero or one time.]
|
|
|
|
The descriptor observed bandwidth mean for this relay in bytes per
|
|
second.
|
|
|
|
This KeyValue was added in version 1.2.0 of this specification.
|
|
|
|
"relay_recent_measurements_excluded_error_count=" Int
|
|
|
|
[Zero or one time.]
|
|
|
|
The number of recent relay measurement attempts that failed.
|
|
Measurements are recent if they are in the last data_period days
|
|
(5 by default).
|
|
|
|
(See the note in section 1.4, version 1.4.0, about excluded relays.)
|
|
|
|
This KeyValue was added in version 1.4.0 of this specification.
|
|
|
|
"relay_recent_measurements_excluded_near_count=" Int
|
|
|
|
[Zero or one time.]
|
|
|
|
When all of a relay's recent successful measurements were performed in
|
|
a period of time that was too short (by default 1 day), the relay is
|
|
excluded. This KeyValue contains the number of recent successful
|
|
measurements for the relay that were ignored for this reason.
|
|
|
|
(See the note in section 1.4, version 1.4.0, about excluded relays.)
|
|
|
|
This KeyValue was added in version 1.4.0 of this specification.
|
|
|
|
"relay_recent_measurements_excluded_old_count=" Int
|
|
|
|
[Zero or one time.]
|
|
|
|
The number of successful measurements for this relay that are too old
|
|
(more than data_period days, 5 by default).
|
|
|
|
Excludes measurements that are already counted in
|
|
relay_recent_measurements_excluded_near_count.
|
|
|
|
(See the note in section 1.4, version 1.4.0, about excluded relays.)
|
|
|
|
This KeyValue was added in version 1.4.0 of this specification.
|
|
|
|
"relay_recent_measurements_excluded_few_count=" Int
|
|
|
|
[Zero or one time.]
|
|
|
|
The number of successful measurements for this relay that were ignored
|
|
because the relay did not have enough successful measurements (fewer
|
|
than 2, by default).
|
|
|
|
Excludes measurements that are already counted in
|
|
relay_recent_measurements_excluded_near_count or
|
|
relay_recent_measurements_excluded_old_count.
|
|
|
|
(See the note in section 1.4, version 1.4.0, about excluded relays.)
|
|
|
|
This KeyValue was added in version 1.4.0 of this specification.
|
|
|
|
"under_min_report=" bool
|
|
|
|
[Zero or one time.]
|
|
|
|
If the value is 1, there are not enough eligible relays in the
|
|
bandwidth file, and Tor bandwidth authorities MAY NOT vote on this
|
|
relay. (Current Tor versions do not change their behaviour based on
|
|
the "under_min_report" key.)
|
|
|
|
If the value is 0 or the KeyValue is not present, there are enough
|
|
relays in the bandwidth file.
|
|
|
|
Because Tor versions released before April 2019 (see section 1.4. for
|
|
the full list of versions) ignore "vote=0", generator implementations
|
|
MUST NOT change the bandwidths for under_min_report relays. Using the
|
|
same bw value makes authorities that do not understand "vote=0"
|
|
or "under_min_report=1" produce votes that don't change relay weights
|
|
too much. It also avoids flapping when the reporting threshold is
|
|
reached.
|
|
|
|
This KeyValue was added in version 1.4.0 of this specification.
|
|
|
|
"unmeasured=" bool
|
|
|
|
[Zero or one time.]
|
|
|
|
If the value is 1, this relay was not successfully measured and
|
|
Tor bandwidth authorities MAY NOT vote on this relay.
|
|
(Current Tor versions do not change their behaviour based on
|
|
the "unmeasured" key.)
|
|
|
|
If the value is 0 or the KeyValue is not present, this relay
|
|
was successfully measured.
|
|
|
|
Because Tor versions released before April 2019 (see section 1.4. for
|
|
the full list of versions) ignore "vote=0", generator implementations
|
|
MUST set "bw=1" for unmeasured relays. Using the minimum bw value
|
|
makes authorities that do not understand "vote=0" or "unmeasured=1"
|
|
produce votes that don't change relay weights too much.
|
|
|
|
This KeyValue was added in version 1.4.0 of this specification.
|
|
|
|
"vote=" bool
|
|
|
|
[Zero or one time.]
|
|
|
|
If the value is 0, Tor directory authorities SHOULD ignore the relay's
|
|
entry in the bandwidth file. They SHOULD vote for the relay the same
|
|
way they would vote for a relay that is not present in the file.
|
|
|
|
This MAY be the case when the relay was not successfully measured but
is included in the Bandwidth File to help diagnose why it was not
measured.
|
|
|
|
If the value is 1 or the KeyValue is not present, Tor directory
|
|
authorities MUST use the relay's bw value in any votes for that relay.
|
|
|
|
Implementations MUST also set "bw=1" for unmeasured relays.
|
|
But they MUST NOT change the bw for under_min_report relays.
|
|
(See the explanations under "unmeasured" and "under_min_report"
|
|
for more details.)
|
|
|
|
This KeyValue was added in version 1.4.0 of this specification.
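
The interaction between "vote", "unmeasured" and "under_min_report" can
be summarised in code. This is a sketch of the rules stated above, not
Tor's or sbws's implementation. Both function names are hypothetical,
and a relay is assumed to be a dict of parsed KeyValue strings.

```python
def bandwidth_for_vote(relay):
    """Parser side: return the bw to vote on, or None.

    None means the RelayLine is treated as if the relay were not
    present in the bandwidth file.
    """
    if relay.get("vote") == "0":
        return None
    return int(relay["bw"])


def diagnostic_bw(measured_bw, unmeasured):
    """Generator side: choose the bw to write in a diagnostic RelayLine.

    Unmeasured relays get bw=1. Relays that are only excluded because
    of under_min_report keep their measured and scaled bandwidth.
    """
    return 1 if unmeasured else measured_bw
```

Keeping these values sensible matters because authorities that do not
understand "vote=0" will still vote on the bw in diagnostic lines.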
|
|
|
|
2.4.2.2. Torflow
|
|
|
|
Torflow RelayLines include node_id and bw, and other KeyValue pairs [2].
|
|
|
|
References:
|
|
|
|
1. https://gitweb.torproject.org/torflow.git
|
|
2. https://gitweb.torproject.org/torflow.git/tree/NetworkScanners/BwAuthority/README.spec.txt#n332
|
|
The Torflow specification is outdated, and does not match the current
|
|
implementation. See section A.1. for the format produced by Torflow.
|
|
3. https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt
|
|
4. https://gitweb.torproject.org/torspec.git/tree/version-spec.txt
|
|
5. https://semver.org/
|
|
|
|
A. Sample data

The following has not been obtained from any real measurement.

A.1. Generated by Torflow

This is an example version 1.0.0 document:

1523911758
node_id=$68A483E05A2ABDCA6DA5A3EF8DB5177638A27F80 bw=760 nick=Test measured_at=1523911725 updated_at=1523911725 pid_error=4.11374090719 pid_error_sum=4.11374090719 pid_bw=57136645 pid_delta=2.12168374577 circ_fail=0.2 scanner=/filepath
node_id=$96C15995F30895689291F455587BD94CA427B6FC bw=189 nick=Test2 measured_at=1523911623 updated_at=1523911623 pid_error=3.96703337994 pid_error_sum=3.96703337994 pid_bw=47422125 pid_delta=2.65469736988 circ_fail=0.0 scanner=/filepath

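A document shaped like the sample above could be read with a few lines
of Python. This is only a minimal sketch: the function name is made up
for the example, and it assumes that the first non-empty line is the
timestamp and that every other line is a space-separated list of
key=value pairs, as in the sample.

```python
def parse_v1_0_0(text):
    """Split a version 1.0.0 document into (timestamp, relay_lines)."""
    lines = [line for line in text.splitlines() if line.strip()]
    timestamp = int(lines[0])
    relays = []
    for line in lines[1:]:
        # Split each pair on the first "=" only, so values may contain "=".
        relays.append(dict(kv.split("=", 1) for kv in line.split()))
    return timestamp, relays
```

All values are kept as strings; converting bw to an integer is left to
the caller.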
A.2. Generated by sbws version 0.1.0

1523911758
version=1.1.0
software=sbws
software_version=0.1.0
latest_bandwidth=2018-04-16T20:49:18
file_created=2018-04-16T21:49:18
generator_started=2018-04-16T15:13:25
earliest_bandwidth=2018-04-16T15:13:26
====
bw=380 error_circ=0 error_misc=0 error_stream=1 master_key_ed25519=YaqV4vbvPYKucElk297eVdNArDz9HtIwUoIeo0+cVIpQ nick=Test node_id=$68A483E05A2ABDCA6DA5A3EF8DB5177638A27F80 rtt=380 success=1 time=2018-05-08T16:13:26
bw=189 error_circ=0 error_misc=0 error_stream=0 master_key_ed25519=a6a+dZadrQBtfSbmQkP7j2ardCmLnm5NJ4ZzkvDxbo0I nick=Test2 node_id=$96C15995F30895689291F455587BD94CA427B6FC rtt=378 success=1 time=2018-05-08T16:13:36

A.3. Generated by sbws version 1.0.3

1523911758
version=1.2.0
latest_bandwidth=2018-04-16T20:49:18
file_created=2018-04-16T21:49:18
generator_started=2018-04-16T15:13:25
earliest_bandwidth=2018-04-16T15:13:26
minimum_number_eligible_relays=3862
minimum_percent_eligible_relays=60
number_consensus_relays=6436
number_eligible_relays=6000
percent_eligible_relays=93
software=sbws
software_version=1.0.3
=====
bw=38000 bw_mean=1127824 bw_median=1180062 desc_avg_bw=1073741824 desc_obs_bw_last=17230879 desc_obs_bw_mean=14732306 error_circ=0 error_misc=0 error_stream=1 master_key_ed25519=YaqV4vbvPYKucElk297eVdNArDz9HtIwUoIeo0+cVIpQ nick=Test node_id=$68A483E05A2ABDCA6DA5A3EF8DB5177638A27F80 rtt=380 success=1 time=2018-05-08T16:13:26
bw=1 bw_mean=199162 bw_median=185675 desc_avg_bw=409600 desc_obs_bw_last=836165 desc_obs_bw_mean=858030 error_circ=0 error_misc=0 error_stream=0 master_key_ed25519=a6a+dZadrQBtfSbmQkP7j2ardCmLnm5NJ4ZzkvDxbo0I nick=Test2 node_id=$96C15995F30895689291F455587BD94CA427B6FC rtt=378 success=1 time=2018-05-08T16:13:36

A.3.1. When there are not enough eligible measured relays:

1540496079
version=1.2.0
earliest_bandwidth=2018-10-20T19:35:52
file_created=2018-10-25T19:35:03
generator_started=2018-10-25T11:42:56
latest_bandwidth=2018-10-25T19:34:39
minimum_number_eligible_relays=3862
minimum_percent_eligible_relays=60
number_consensus_relays=6436
number_eligible_relays=2960
percent_eligible_relays=46
software=sbws
software_version=1.0.3
=====

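The header samples in A.2 to A.3.1 can be read with a sketch like the
one below. It assumes that the header consists of one key=value pair
per line and ends at a "=====" line (or "====", as in the sbws 0.1.0
sample). Both function names are hypothetical, and comparing
number_eligible_relays with minimum_number_eligible_relays is an
assumption about how a reader might detect the A.3.1 case, not a rule
stated in this section.

```python
TERMINATORS = ("=====", "====")  # "====" appears in the A.2 sample


def parse_header(text):
    """Return (timestamp, header_dict, remaining_lines)."""
    lines = text.splitlines()
    timestamp = int(lines[0])
    header = {}
    rest_start = len(lines)
    for index, line in enumerate(lines[1:], start=1):
        if line in TERMINATORS:
            rest_start = index + 1
            break
        key, _, value = line.partition("=")
        header[key] = value
    return timestamp, header, lines[rest_start:]


def has_enough_eligible_relays(header):
    """Compare the two eligibility counters from the header."""
    return (int(header["number_eligible_relays"])
            >= int(header["minimum_number_eligible_relays"]))
```

With the A.3.1 header, the comparison is 2960 against 3862, so the
second function returns False and the list of remaining lines is empty.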
A.4. Headers generated by sbws version 1.0.4

1523911758
version=1.3.0
latest_bandwidth=2018-04-16T20:49:18
destinations_countries=TH,ZZ
file_created=2018-04-16T21:49:18
generator_started=2018-04-16T15:13:25
earliest_bandwidth=2018-04-16T15:13:26
minimum_number_eligible_relays=3862
minimum_percent_eligible_relays=60
number_consensus_relays=6436
number_eligible_relays=6000
percent_eligible_relays=93
scanner_country=SN
software=sbws
software_version=1.0.4
=====

A.5. Generated by sbws version 1.1.0

1523911758
version=1.4.0
latest_bandwidth=2018-04-16T20:49:18
destinations_countries=TH,ZZ
file_created=2018-04-16T21:49:18
generator_started=2018-04-16T15:13:25
earliest_bandwidth=2018-04-16T15:13:26
minimum_number_eligible_relays=3862
minimum_percent_eligible_relays=60
number_consensus_relays=6436
number_eligible_relays=6000
percent_eligible_relays=93
recent_measurement_attempt_count=6243
recent_measurement_failure_count=732
recent_measurements_excluded_error_count=969
recent_measurements_excluded_few_count=3946
recent_measurements_excluded_near_count=90
recent_measurements_excluded_old_count=0
recent_priority_list_count=20
recent_priority_relay_count=6243
scanner_country=SN
software=sbws
software_version=1.1.0
time_to_report_half_network=57273
=====
bw=1 error_circ=1 error_destination=0 error_misc=0 error_second_relay=0 error_stream=0 master_key_ed25519=J3HQ24kOQWac3L1xlFLp7gY91qkb5NuKxjj1BhDi+m8 nick=snap269 node_id=$DC4D609F95A52614D1E69C752168AF1FCAE0B05F relay_recent_measurement_attempt_count=3 relay_recent_measurements_excluded_error_count=1 relay_recent_measurements_excluded_near_count=3 relay_recent_consensus_count=3 relay_recent_priority_list_count=3 success=3 time=2019-03-16T18:20:57 unmeasured=1 vote=0
bw=1 error_circ=0 error_destination=0 error_misc=0 error_second_relay=0 error_stream=2 master_key_ed25519=h6ZB1E1yBFWIMloUm9IWwjgaPXEpL5cUbuoQDgdSDKg nick=relay node_id=$C4544F9E209A9A9B99591D548B3E2822236C0503 relay_recent_measurement_attempt_count=3 relay_recent_measurements_excluded_error_count=2 relay_recent_measurements_excluded_few_count=1 relay_recent_consensus_count=3 relay_recent_priority_list_count=3 success=1 time=2019-03-17T06:50:58 unmeasured=1 vote=0

B. Scaling bandwidths

B.1. Scaling requirements

Tor accepts zero bandwidths, but they trigger bugs in older Tor
implementations. Therefore, scaling methods SHOULD perform the
following checks:
* If the total bandwidth is zero, all relays should be given equal
  bandwidths.
* If the scaled bandwidth is zero, it should be rounded up to one.

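The two checks above could be written as follows. This is a sketch
only: the function name and the mapping from node_id to an integer
scaled bandwidth are assumptions, and so is the choice of 1 as the
equal value when the total is zero, since the text only asks for equal
bandwidths.

```python
def apply_zero_checks(scaled_bws):
    """Apply the B.1 checks to a dict of node_id -> integer bandwidth."""
    if sum(scaled_bws.values()) == 0:
        # Total is zero: give every relay the same bandwidth.
        # Using 1 as that value is an assumption of this sketch.
        return {node_id: 1 for node_id in scaled_bws}
    # Otherwise round any individual zero up to one.
    return {node_id: max(1, bw) for node_id, bw in scaled_bws.items()}
```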
Initial experiments indicate that scaling may not be needed for
Torflow and sbws, because their measured bandwidths are similar
enough already.

B.2. A linear scaling method

If scaling is required, here is a simple linear bandwidth scaling
method, which ensures that all bandwidth votes contain approximately
the same total bandwidth:

1. Calculate the relay quota by dividing the total measured bandwidth
   in all votes by the number of relays with measured bandwidth
   votes. In the public Tor network, this is approximately 7500 as of
   April 2018. The quota should be a consensus parameter, so it can be
   adjusted for all generators on the network.

2. Calculate a vote quota by multiplying the relay quota by the number
   of relays this bandwidth authority has measured
   bandwidths for.

3. Calculate a scaling factor by dividing the vote quota by the
   total unscaled measured bandwidth in this bandwidth
   authority's upcoming vote.

4. Multiply each unscaled measured bandwidth by the scaling
   factor.

Now, the total scaled bandwidth in the upcoming vote is
approximately equal to the vote quota.

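Steps 2 to 4 can be sketched in Python as shown below, taking the
relay quota from step 1 as an input. The function name, the dict
representation and the default of 7500 (the April 2018 figure quoted
above) are assumptions of the example. Rounding to integers and
keeping every value at one or above follows the checks in B.1 and is a
choice of this sketch, not part of the four steps.

```python
def linear_scale(measured_bws, relay_quota=7500):
    """Scale a dict of node_id -> unscaled measured bandwidth."""
    total = sum(measured_bws.values())
    if total == 0:
        # Nothing to scale; the zero-total case is covered in B.1.
        return dict(measured_bws)
    vote_quota = relay_quota * len(measured_bws)  # step 2
    factor = vote_quota / total                   # step 3
    # Step 4, rounded to integers and never below one.
    return {node_id: max(1, round(bw * factor))
            for node_id, bw in measured_bws.items()}
```

For two relays measured at 100 and 300, the vote quota is 15000 and
the scaling factor is 37.5, giving scaled values of 3750 and 11250.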
B.3. Quota changes

If all generators are using scaling, the quota can be gradually
reduced or increased as needed. Smaller quotas decrease the size
of uncompressed consensuses, and may decrease the size of
consensus diffs and compressed consensuses. But if the relay
quota is too small, some relays may be over- or under-weighted.

B.4. Torflow aggregation

Torflow implements two methods to compute the bandwidth values from the
(stream) bandwidth measurements: with and without PID control feedback.
The method described here is without PID control (see Torflow
specification, section 2.2).

In the following steps, the relays' measured bandwidths are the
ones that this bandwidth authority has measured for the relays that
would be included in the bandwidth authority's upcoming vote.

1. Calculate the filtered bandwidth for each relay:
   - choose the relay's measurements (`bw_j`) that are equal to or
     greater than the mean of the measurements for this relay
   - calculate the mean of those measurements

   In pseudocode:

     bw_filt_i = mean(bw_j for each bw_j >= mean(bw_j))

2. Calculate network averages:
   - calculate the filtered average by dividing the sum of all the
     relays' filtered bandwidths by the number of relays that have been
     measured (`n`), i.e., calculate the mean of the relays'
     filtered bandwidths.
   - calculate the stream average by dividing the sum of all the
     relays' measured bandwidths by the number of relays that have been
     measured (`n`), i.e., calculate the mean of the relays'
     measured bandwidths.

   In pseudocode:

     bw_avg_filt = sum(bw_filt_i) / n
     bw_avg_strm = sum(bw_i) / n

3. Calculate ratios for each relay:
   - calculate the filtered ratio by dividing each relay's filtered
     bandwidth by the filtered average
   - calculate the stream ratio by dividing each relay's measured
     bandwidth by the stream average

   In pseudocode:

     r_filt_i = bw_filt_i / bw_avg_filt
     r_strm_i = bw_i / bw_avg_strm

4. Calculate the final ratio for each relay:
   The final ratio is the larger of the filtered ratio and the
   stream ratio.

   In pseudocode:

     r_i = max(r_filt_i, r_strm_i)

5. Calculate the scaled bandwidth for each relay:
   The most recent descriptor observed bandwidth (`bw_obs_i`) is
   multiplied by the ratio.

   In pseudocode:

     bw_new_i = r_i * bw_obs_i

<<In this way, the resulting network status consensus bandwidth
values are effectively re-weighted proportional to how much faster
the node was as compared to the rest of the network.>>
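The five steps can be put together in a short Python sketch. It is not
Torflow's code: the function name and the dict inputs are made up for
the example, and it assumes that a relay's measured bandwidth (`bw_i`)
is the mean of that relay's measurements, which this section does not
state explicitly.

```python
from statistics import mean


def torflow_scale(measurements, observed_bws):
    """Aggregate without PID control.

    measurements: node_id -> list of stream bandwidth measurements.
    observed_bws: node_id -> most recent descriptor observed bandwidth.
    """
    # Assumption: bw_i is the mean of the relay's own measurements.
    bw = {n: mean(ms) for n, ms in measurements.items()}
    # Step 1: mean of the measurements at or above the relay's mean.
    bw_filt = {n: mean(m for m in ms if m >= bw[n])
               for n, ms in measurements.items()}
    # Step 2: network averages.
    bw_avg_filt = mean(bw_filt.values())
    bw_avg_strm = mean(bw.values())
    scaled = {}
    for n in measurements:
        r_filt = bw_filt[n] / bw_avg_filt    # step 3
        r_strm = bw[n] / bw_avg_strm         # step 3
        r = max(r_filt, r_strm)              # step 4
        scaled[n] = r * observed_bws[n]      # step 5
    return scaled
```

When every relay has identical measurements, both ratios are 1 and
each relay keeps its observed bandwidth. The result is a float; any
rounding, and the zero checks from B.1, are left to the caller.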