We calculate throughput from the time between receiving 0.5 and 1 MiB
of a response, which obviously excludes any measurements with
responses smaller than 1 MiB. From the FILESIZE and DATAPERC* fields
we can compute the number of milliseconds that have elapsed between
receiving bytes 524,288 and 1,048,576, which is a total of 524,288
bytes or 4,194,304 bits. We divide 4,194,304 by this time difference
to obtain throughput in bits per millisecond, which happens to be the
same value as the number of kilobits per second.
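Roughly, the computation looks like this in R (the function and
parameter names are made up here, and we assume the elapsed time has
already been derived from the DATAPERC* fields and converted to
milliseconds):

    throughput_kbps <- function(millis_at_half_mib, millis_at_one_mib) {
      bits_transferred <- 4194304  # 524,288 bytes between the 0.5 and 1 MiB marks
      elapsed_millis <- millis_at_one_mib - millis_at_half_mib
      # bits per millisecond is numerically the same as kilobits per second
      bits_transferred / elapsed_millis
    }

    # Example: 2,000 milliseconds between the two marks -> 2,097.152 kbit/s
    throughput_kbps(0, 2000)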
Implements #29772.
This patch adds two new lines to the existing circuit round-trip
latencies graph: the lowest and highest measurements that are not
outliers.
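We don't spell out the outlier rule in this message; as a sketch only,
assuming the common 1.5 * IQR rule and made-up column names, the two
new lines could be computed per date like this:

    library(dplyr)

    # 'measurements' is a hypothetical data frame with one row per
    # measurement and columns 'date' and 'latency'.
    latency_lines <- measurements %>%
      group_by(date) %>%
      summarize(
        low = min(latency[latency >= quantile(latency, 0.25) - 1.5 * IQR(latency)]),
        high = max(latency[latency <= quantile(latency, 0.75) + 1.5 * IQR(latency)]))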
Implements #29773.
This is a hotfix to work around the issue described in #30351.
Hopefully, we'll come up with a better fix that doesn't go backwards
from tidyr to reshape2.
Turns out that we didn't specify the sort order of
userstats-combined.csv. As a result, different platforms produced
consistently different outputs. Let's just define the sort order to
make the output deterministic, even across platforms.
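For example (with a hypothetical data frame and column names),
something along these lines makes the ordering explicit before
writing:

    library(dplyr)

    # Impose a total sort order so the output no longer depends on the
    # platform's default collation.
    userstats_combined %>%
      arrange(date, node, country, transport, version) %>%
      write.csv("userstats-combined.csv", row.names = FALSE)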
With this patch we're no longer overwriting bandwidth history parts
with whichever history comes last; instead, we're computing the
maximum value for each 15-minute interval over all imported bandwidth
histories. This makes bandwidth.csv independent of descriptor import
order.
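Conceptually, the aggregation looks like the following dplyr sketch
(column names made up; the real aggregation happens in the database):

    library(dplyr)

    # 'histories' has one row per 15-minute interval per imported
    # bandwidth history; keep the maximum value per relay and interval.
    bandwidth <- histories %>%
      group_by(fingerprint, interval_start) %>%
      summarize(written = max(written), read = max(read))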
We're later going to include these measurements in graphs, but that
requires more changes. For now, let's ignore these measurements rather
than include them in the v2 onion results. They will still be stored
in the database, so we'll be able to include past measurements in
those graphs later.
Turns out that op-ab's domain name matches our '%.onion%' pattern,
which we're using to distinguish public from onion server requests.
Trying a bit harder to distinguish the two.
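For illustration only (the hostname below is made up, and the actual
check may live in SQL rather than R): a plain substring match on
".onion" also matches ".onionperf", whereas anchoring the match to the
end of the hostname does not.

    host <- "op-ab.onionperf.example"      # hypothetical public hostname
    grepl(".onion", host, fixed = TRUE)    # TRUE  -- false positive
    grepl("\\.onion$", host)               # FALSE -- anchored suffix match
    grepl("\\.onion$", "abcdefgh.onion")   # TRUE  -- made-up onion address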
Related to #29107.
In most cases these functions would call their prepare_* equivalents,
possibly tweak the result, and write it to a .csv file. This patch
moves all those tweaks to the prepare_* functions, possibly reverts
them in the plot_* functions, and makes the write_* functions
obsolete.
The result is not only less code. We're also going to find bugs in
written .csv files sooner, because the same code now also runs when
writing graph files, and that happens much more often.
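A minimal sketch of the resulting pattern, with hypothetical function,
file, and column names: all data wrangling lives in prepare_*, write_*
becomes a thin wrapper, and plot_* consumes the same prepared data.

    library(dplyr)
    library(ggplot2)

    prepare_example <- function(start, end) {
      # Hypothetical pre-aggregated input file with 'date' and 'value'
      # columns; dates are ISO-formatted strings, so they compare fine.
      read.csv("example.csv") %>%
        filter(date >= start, date <= end)
    }

    write_example <- function(start, end, path) {
      # The writer is now just a thin wrapper around prepare_*.
      write.csv(prepare_example(start, end), path, row.names = FALSE)
    }

    plot_example <- function(start, end, path) {
      # The plot function consumes the same prepared data.
      p <- ggplot(prepare_example(start, end), aes(as.Date(date), value)) +
        geom_line()
      ggsave(path, p)
    }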
The sheer size of this function made it hard, if not impossible, to
refactor things to use the more recent R packages dplyr and tidyr. Now
there are four plot_userstats_* functions with accompanying
prepare_userstats_* functions that make the corresponding
write_userstats_* functions really small.
Turns out we never skipped previously imported webstats files due to
two bugs:
1. While building a list of previously imported webstats files we
reassembled their file names as ${server}_${site}_* rather than
${site}_${server}_*, which is the file name format we chose in an
earlier version of the CollecTor module.
2. When checking whether a given webstats file already exists in the
database we compared the full file name to the file name reassembled
from the database, in which ${server} had been truncated to 32
characters.
This commit fixes both bugs.
Previously, we used Java to write .sql files, imported them using
psql, and afterwards made queries via psql. Now we're using Java to
interact with the database directly. This is another step towards
making the daily updater Java-only.
Last month, in commit f8fa108, where we modernized the legacy module
and renamed it to bwhist, we split up closeConnection() into one
method, commit(), to commit changes and another method,
closeConnection(), to close the connection. However, we somehow forgot
to invoke the commit() method.
This had two effects:
1. Newly added data was not made persistent in the database. This
led to a moving window of roughly one week for new data and an
increasing gap between the last committed data and this 1-week
window.
2. The result of aggregating newly added data was not made
persistent. So, even after fixing the first issue above, we
accumulated newly added data, rather than only keeping the most
recent two weeks. This made the database slower over time.
This change adds two commit() calls at the right places.
The "Advertised and consumed bandwidth by relay flags" graph now
contains everything that's contained in the "Total relay bandwidth"
and the "Consumed bandwidth by Exit/Guard flag combination" graphs.
Removing these two graphs as obsolete.
Also update documentation for the newly deployed "Advertised and
consumed bandwidth by relay flags" graph.
Part of #28353.
This graph now contains everything that's contained in the "Total
relay bandwidth" and the "Consumed bandwidth by Exit/Guard flag
combination" graphs.
Removing those graphs will be done in a separate commit.
Part of #28353.
OnionPerf results look to be comparable over time, but there are
systematic deltas between the results from different vantage points.
The "all" plots show rises and falls where they actually don't exist;
it's just that a particular vantage point was offline, so the average
of the two remaining sources moves noticeably.
In this commit we remove the source parameter from these graphs and
always include all sources separately in the graph, but no longer a
combination of all measurements together.
Implements #28603.
Over two years ago, in commit 1f90b72 from October 2016, we made our
user graphs faster by avoiding reading the large .csv file on demand.
Instead we read it once as part of the daily update, saved it to disk
as an .RData file using R's save() function, and loaded it back into
memory using R's load() function when drawing a graph.
This approach worked okay. It just had two disadvantages:
1. We had to write a small amount of R code for each graph type,
which is why we only did it for graphs with large .csv files.
2. Running these small R scripts as part of the daily update made it
harder to move away from Ant towards a Java-only execution model.
The new approach implemented in this commit uses read_csv() from the
readr package, which reads CSV files several times faster than
read.csv().
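A minimal example of the new approach (file name and columns are
illustrative only):

    library(readr)

    # read_csv() parses the file several times faster than read.csv()
    # and returns a tibble that the plotting code can use directly.
    u <- read_csv("userstats.csv",
      col_types = cols(
        date = col_date(),
        country = col_character(),
        users = col_double()))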
Requires installing the readr package from CRAN, which is available on
Debian in stretch-backports and later as r-cran-readr.
Implements #28799.