Update proposal 160 with comments from the mailing list.

Also add implementation details and a timestamp to the output of 161.
Mike Perry 2009-05-19 21:24:18 -07:00
parent aac6585f22
commit f0cffa2b3d
2 changed files with 64 additions and 17 deletions


@ -37,10 +37,8 @@ Target: 0.2.2.x
the number of authorities we have.
The better fix is to allow certain authorities to specify that they are
voting on bandwidth "offsets": how much they think the weight should
be changed for the relay in question. We should put the offset vote in
the stanza for the relay in question, so a given authority can choose
which relays to express preferences for and which not.
voting on bandwidth measurements: more accurate bandwidth values that
have actually been evaluated. In this way, authorities
3. Security implications
@ -64,15 +62,41 @@ Target: 0.2.2.x
First, we need a new consensus method to support this new calculation.
Now v3 votes can have a new weight on the "w" line:
"Bandwidth_Offset=" INT.
Now v3 votes can have an additional value on the "w" line:
"w Bandwidth=X Measured=" INT.
Once we're using the new consensus method, the new way to compute the
Bandwidth weight is by taking the old vote (explained in proposal 141:
median, then choose the lower number in the case of ties), and adding
or subtracting the median offset (using the offset closer to 0 in the
case of ties, and with a sum of 0 if the sum is negative).
Bandwidth weight is by checking if there are at least 3 "Measured"
votes. If so, the median of these is taken. Otherwise, the median
of the "Bandwidth=" values is taken, as described in Proposal 141.
Then the actual consensus looks just the same as it did before,
so clients never have to know that this additional calculation is
happening.
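The selection rule above can be sketched as follows. This is only an illustration, not the Tor implementation: the function name, the tuple layout, and the MIN_MEASURED_VOTES constant are hypothetical, and median_low is used on the assumption that "choose the lower number in the case of ties" from proposal 141 applies to both medians.

```python
from statistics import median_low

# Threshold taken from the text: at least 3 "Measured" votes are required.
MIN_MEASURED_VOTES = 3


def consensus_bandwidth(votes):
    """Return the consensus bandwidth for one relay.

    votes is a list of (bandwidth, measured) tuples, one per authority,
    where measured is None if that authority did not vote a Measured value.
    (Hypothetical data layout, for illustration only.)
    """
    measured = [m for _, m in votes if m is not None]
    if len(measured) >= MIN_MEASURED_VOTES:
        # Enough measurements: use their median and ignore Bandwidth=.
        return median_low(measured)
    # Otherwise fall back to the median of the advertised values.
    return median_low([b for b, _ in votes])
```

With three authorities voting (100, 50), (200, 60) and (300, 70), the result is the measured median; if one of them drops its Measured value, the result falls back to the median of the Bandwidth values.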
5. Implementation
The Measured values will be read from a file provided by the scanners
described in proposal 161. Files with a timestamp older than 3 days
will be ignored.
The file will be read in from dirserv_generate_networkstatus_vote_obj()
in a location specified by a new config option "V3MeasuredBandwidths".
A helper function will be called to populate new 'measured' and
'has_measured' fields of the routerstatus_t 'routerstatuses' list with
values read from this file.
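A rough sketch of what such a helper has to do, written in Python rather than C for brevity. It assumes the file layout given in proposal 161 (a first line holding a UNIX timestamp, followed by "node_id=<idhex> SP bw=<Bw_new>" lines); the function name and the returned dictionary are hypothetical stand-ins for the 'measured' and 'has_measured' fields.

```python
import time

# "Files with a timestamp older than 3 days will be ignored."
MAX_AGE_SECONDS = 3 * 24 * 60 * 60


def parse_measured_file(text, now=None):
    """Parse scanner output into {node_id: measured_bw}.

    Returns an empty dict if the file is empty or too old.
    (Hypothetical helper, for illustration only.)
    """
    now = time.time() if now is None else now
    lines = text.splitlines()
    if not lines:
        return {}
    timestamp = int(lines[0].strip())
    if now - timestamp > MAX_AGE_SECONDS:
        return {}  # stale measurements are ignored entirely
    measured = {}
    for line in lines[1:]:
        if not line.strip():
            continue
        # Each line is a space-separated list of key=value fields.
        fields = dict(item.split("=", 1) for item in line.split())
        measured[fields["node_id"]] = int(fields["bw"])
    return measured
```

A relay is then treated as having a measurement exactly when its identity appears as a key in the returned dictionary.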
An additional for_vote flag will be passed to
routerstatus_format_entry() from format_networkstatus_vote(), which will
indicate that the "Measured=" string should be appended to the "w Bandwidth="
line with the measured value in the struct.
routerstatus_parse_entry_from_string() will be modified to parse the
"Measured=" values into routerstatus_t struct fields.
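The formatting and parsing of the "w" line can be illustrated as below. This is a Python sketch of the behaviour described, not the C code in question; format_w_line and parse_w_line are hypothetical names, and the returned tuple stands in for the bandwidth, 'measured' and 'has_measured' fields.

```python
def format_w_line(bandwidth, measured=None, for_vote=False):
    """Build a "w" line; Measured= is only emitted when formatting a vote."""
    line = "w Bandwidth=%d" % bandwidth
    if for_vote and measured is not None:
        line += " Measured=%d" % measured
    return line


def parse_w_line(line):
    """Return (bandwidth, measured, has_measured) parsed from a "w" line."""
    parts = line.split()
    if not parts or parts[0] != "w":
        raise ValueError("not a w line: %r" % line)
    values = {}
    for item in parts[1:]:
        key, sep, val = item.partition("=")
        if sep:
            values[key] = int(val)
    measured = values.get("Measured")
    return values.get("Bandwidth"), measured, measured is not None
```

Because Measured= is appended only when for_vote is set, a consensus entry formatted through the same path keeps its existing "w Bandwidth=" form.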
Finally, networkstatus_compute_consensus() will set rs_out.bandwidth
to the median of the measured values if there are at least 3, otherwise
it will use the bandwidth value median as normal.


@ -66,10 +66,7 @@ Status: Open
Dividing by the network-wide average has the advantage that it will
account for issues related to unbalancing between higher vs lower
capacity, such as Steven Murdoch's queuing theory weighting result.
Dividing by the slice average has the advantage that many scans can
be run in parallel from a single authority, and that results are
typically available sooner after a given scan takes place.
For this reason, we will opt for network-wide averages.
5. Ratio Filtering
@ -142,7 +139,26 @@ Status: Open
does not set us back any in that regard.
8. Integration with Proposal 160
8. Parallelization
Because each slice takes as long as 6 hours to complete, we will want
to parallelize as much as possible. This will be done by concurrently
running multiple scanners from each authority to deal with different
segments of the network. Each scanner piece will continually loop
over a portion of the network, outputting files of the form:
node_id=<idhex> SP strm_bw=<BW_measured(N)> SP
filt_bw=<BW_Norm_measured(N)> NL
The most recent file from each scanner will be periodically gathered
by another script that uses them to produce network-wide averages
and calculate ratios as per the algorithm in section 6. Because nodes
may shift in capacity, they may appear in more than one slice and/or
appear more than once in the file set. The line that yields a ratio
closest to 1.0 will be chosen in this case.
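The duplicate handling above can be sketched as follows. The function names are hypothetical, the scanner values are assumed to be integers, and the ratios are assumed to have been computed already by the algorithm in section 6, which is not reproduced here.

```python
def parse_scanner_line(line):
    """Parse one "node_id=... strm_bw=... filt_bw=..." scanner line."""
    fields = dict(item.split("=", 1) for item in line.split())
    return fields["node_id"], int(fields["strm_bw"]), int(fields["filt_bw"])


def pick_ratios(candidates):
    """Keep, for each node, the ratio closest to 1.0.

    candidates is an iterable of (node_id, ratio) pairs, possibly with
    several entries per node when it appears in more than one slice.
    """
    best = {}
    for node_id, ratio in candidates:
        if node_id not in best or abs(ratio - 1.0) < abs(best[node_id] - 1.0):
            best[node_id] = ratio
    return best
```

For a node seen twice with ratios 1.4 and 0.9, the 0.9 entry is kept because it deviates least from 1.0.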
9. Integration with Proposal 160
The final results will be produced for the voting mechanism
described in Proposal 160 by multiplying the derived ratio by
@ -158,8 +174,15 @@ Status: Open
This will produce a new bandwidth value that will be output into a
file consisting of lines of the form:
node_id=<idhex> SP bw=<Bw_new> NL
node_id=<idhex> SP bw=<Bw_new> NL
The first line of the file will contain a timestamp in UNIX time()
seconds. This will be used by the authority to decide if the
measured values are too old to use.
This file can be either copied or rsynced into a directory readable
by the directory authority.
1. Exact values for each segment are still being determined via
test scans.