
468 Commits

Author SHA1 Message Date
Karsten Loesing
6052e22b08 Don't compute bridge descriptor digests.
These calls will fail for computing bridge descriptor digests, because
there are no `-----END SIGNATURE-----` lines in sanitized bridge
descriptors.
2017-06-06 15:08:55 +02:00
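
To illustrate why the digest calls fail, here is a minimal sketch of a hypothetical helper that hashes the bytes from a start token through the end of an end token. The name `digestSha1Hex`, the choice of SHA-1, and the token arguments are assumptions for illustration, not metrics-lib's actual code. If the end token (for example, a `-----END SIGNATURE-----` line) is absent, as it is in sanitized bridge descriptors, there is no well-defined range to hash, so the helper returns null.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class DigestSketch {

  /**
   * Hypothetical helper: SHA-1 hex digest over raw bytes from the first
   * occurrence of startToken through the end of the following endToken,
   * or null if either token is missing.
   */
  static String digestSha1Hex(byte[] raw, String startToken, String endToken) {
    // ISO-8859-1 maps each byte to exactly one char, so string indices
    // equal byte offsets into raw.
    String text = new String(raw, StandardCharsets.ISO_8859_1);
    int start = text.indexOf(startToken);
    if (start < 0) {
      return null;
    }
    int endTokenStart = text.indexOf(endToken, start);
    if (endTokenStart < 0) {
      return null; // e.g., no END SIGNATURE line in a sanitized bridge descriptor
    }
    int end = endTokenStart + endToken.length();
    try {
      MessageDigest md = MessageDigest.getInstance("SHA-1");
      md.update(raw, start, end - start); // hash only the selected range
      StringBuilder hex = new StringBuilder();
      for (byte b : md.digest()) {
        hex.append(String.format("%02x", b & 0xff));
      }
      return hex.toString();
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("SHA-1 not available", e);
    }
  }
}
```

Returning null rather than throwing lets a caller skip digest computation for descriptor types that cannot have one.
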
Karsten Loesing
a4d184bf94 Store raw descriptors as byte[], offset, and length.
Prior to this commit we read raw descriptor bytes from disk, split
them into several byte[] for each contained descriptor, and stored
those copies together with descriptors.  We further copied descriptor
parts, like signatures or status entries, and stored those copies as
well.

Overall, we temporarily required up to 3 times the size of descriptor
files just to store raw descriptor contents: 1) the entire descriptor
file read to memory, 2) copies of all contained descriptors, and 3)
copies of contained descriptor parts.  After moving on to the next
descriptor file, 1) was freed, but 2) and 3) remained in memory.  This
was rather wasteful.

With this commit we store raw descriptors as a reference to the byte[]
containing the entire descriptor file plus offset and length of the
part containing one descriptor.  Similarly we store raw descriptor
parts as a reference to the full descriptor plus offset and length of
the descriptor part.  This saves a lot of memory, and it avoids
unnecessary array copying.
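
The offset-and-length layout described above can be sketched roughly as follows. `RawDescriptor` and `RawPart` are hypothetical stand-ins for DescriptorImpl and for part classes like NetworkStatusEntryImpl; the sketch only shows that no bytes are copied until a caller explicitly asks for its own array, and that all access stays within offset and offset + length.

```java
import java.util.Arrays;

/** Hypothetical stand-in for DescriptorImpl: a view into a shared file buffer. */
final class RawDescriptor {
  private final byte[] fileBytes; // entire descriptor file, shared and never copied
  private final int offset;       // start of this descriptor within fileBytes
  private final int length;       // number of bytes belonging to this descriptor

  RawDescriptor(byte[] fileBytes, int offset, int length) {
    if (offset < 0 || length < 0 || offset + length > fileBytes.length) {
      throw new IllegalArgumentException("Range lies outside of buffer");
    }
    this.fileBytes = fileBytes;
    this.offset = offset;
    this.length = length;
  }

  int length() {
    return length;
  }

  /** Reads one byte, indexed relative to this descriptor's start. */
  byte byteAt(int index) {
    if (index < 0 || index >= length) {
      throw new IndexOutOfBoundsException(String.valueOf(index));
    }
    return fileBytes[offset + index];
  }

  /** Copies bytes only when a caller really needs a separate array. */
  byte[] copyBytes() {
    return Arrays.copyOfRange(fileBytes, offset, offset + length);
  }

  /** Creates a part (e.g., a status entry) by reference, without copying. */
  RawPart part(int partOffset, int partLength) {
    if (partOffset < 0 || partLength < 0 || partOffset + partLength > length) {
      throw new IllegalArgumentException("Part lies outside of descriptor");
    }
    return new RawPart(this, partOffset, partLength);
  }
}

/** Hypothetical descriptor part: a parent reference plus a relative range. */
final class RawPart {
  private final RawDescriptor parent;
  private final int offset; // relative to the parent descriptor's start
  private final int length;

  RawPart(RawDescriptor parent, int offset, int length) {
    this.parent = parent;
    this.offset = offset;
    this.length = length;
  }

  byte[] copyBytes() {
    byte[] out = new byte[length];
    for (int i = 0; i < length; i++) {
      out[i] = parent.byteAt(offset + i);
    }
    return out;
  }
}
```

The trade-off is that a single retained descriptor keeps the whole file buffer alive, which is acceptable here because all descriptors from a file are typically processed together.
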

This change is also a step towards not storing raw descriptor contents
in memory at all, but instead leaving contents on disk and accessing
parts as needed.  However, this commit does not take that step yet.

The original purpose of this commit was to prepare for switching from the
platform's default charset to UTF-8 for #21932.  The idea was to
reduce access to DescriptorImpl#rawDescriptorBytes and add all methods
working on those bytes, including converting them to a String, to
DescriptorImpl.  This commit achieves this purpose by preparing that
switch, yet it does not take that step, either.  Switching to UTF-8 is
mildly backward-incompatible, so it'll have to wait until 2.0.0.
However, switching will be much easier based on the changes in this
commit.

Many of the changes in this commit are interdependent, which makes it
difficult to split up this commit with reasonable effort.  Still, in
order to facilitate reviews, here is an explanation of changes made in
this commit from top to bottom:

Move all code for processing raw descriptor bytes from a) detecting
the descriptor type, b) finding descriptor starts and ends, up to c)
invoking the right DescriptorImpl subclass constructors from
DescriptorImpl and its subclasses over to DescriptorParserImpl.

Include offset and limit in the constructors of DescriptorImpl and
most of its subclasses.

Refer to directory and network status parts in RelayDirectoryImpl and
NetworkStatusImpl and its subclasses by offset and length rather than
passing copies of raw descriptors.

Provide two overloaded methods DescriptorImpl#newScanner() that
internally handle the byte[]-to-String conversion rather than leaving
this task to all DescriptorImpl subclasses.
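
Two such overloads might look roughly like this. The class name `DescriptorText`, the exact signatures, and the use of `java.util.Scanner` with a newline delimiter are assumptions for illustration. The sketch shows the idea of centralizing the byte[]-to-String conversion, so that a later switch from the platform's default charset to UTF-8 (#21932) would touch a single method.

```java
import java.nio.charset.Charset;
import java.util.Scanner;

/** Hypothetical holder of a descriptor's raw bytes that owns all decoding. */
class DescriptorText {
  private final byte[] rawDescriptorBytes; // possibly shared with other descriptors
  private final int offset;                // start of this descriptor in the array
  private final int length;                // length of this descriptor in bytes

  DescriptorText(byte[] rawDescriptorBytes, int offset, int length) {
    this.rawDescriptorBytes = rawDescriptorBytes;
    this.offset = offset;
    this.length = length;
  }

  /** The one place that picks a charset; an XXX21932-style tag would mark it. */
  Charset charset() {
    return Charset.defaultCharset(); // a UTF-8 switch would only change this line
  }

  /** Scanner over this descriptor's own bytes. */
  Scanner newScanner() {
    return newScanner(offset, length);
  }

  /** Scanner over a part, given as an absolute offset into the shared array. */
  Scanner newScanner(int partOffset, int partLength) {
    if (partOffset < offset || partLength < 0
        || partOffset + partLength > offset + length) {
      throw new IllegalArgumentException("Part lies outside of this descriptor");
    }
    String text = new String(rawDescriptorBytes, partOffset, partLength, charset());
    return new Scanner(text).useDelimiter("\n"); // one token per line
  }
}
```

Subclasses would then call `newScanner()` instead of building their own Strings, which keeps the charset decision out of every parser.
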

In DescriptorImpl, rather than storing a copy of raw descriptor bytes
per descriptor, store a reference to a potentially larger byte[],
containing all descriptors read from a given file, together with
offset and length.

Add various methods to DescriptorImpl that provide access to raw
descriptor bytes and that internally handle issues like unified
character encoding.

Include an XXX21932 tag in all places where byte[] is currently
converted to String using the platform's default charset.

Update existing methods in DescriptorImpl to only access
rawDescriptorBytes within offset and offset + length.

In classes referenced from DescriptorImpl subclasses, like
DirSourceEntryImpl and NetworkStatusEntryImpl, rather than storing a
copy of raw descriptor bytes, store a reference to the parent
DescriptorImpl instance together with offset and length.

Change raw descriptor bytes in ExitListEntryImpl into a String,
because the byte[] we stored there was never read from disk but
generated by ourselves via String#getBytes() with the platform's
default charset.  We also never used raw bytes in ExitListEntryImpl
anyway.  Admittedly, we could use offset and length there, too, but
the amount of saved memory is likely not worth the necessary code
changes.

Remove redundant zero-length checks from DescriptorImpl subclasses
including ExitListImpl, NetworkStatusImpl, and RelayDirectoryImpl.
These checks are redundant, because we already performed the same
checks in DescriptorImpl#countKeys().

Move commonly used helper methods for finding the first index of a
keyword or splitting descriptors by keyword from DescriptorImpl
subclasses, like NetworkStatusImpl and RelayDirectoryImpl, to
DescriptorImpl.
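
A rough sketch of such helpers, operating on a String for readability: `findFirstIndexOfKeyword` returns the offset of the first line that starts with a keyword, and `splitByKeyword` cuts text into chunks that each begin with such a line, as one might do for status entries starting with "r" lines. Both names and signatures are hypothetical; per this commit, the real helpers in DescriptorImpl would work on offsets within the shared byte[] rather than on Strings.

```java
import java.util.ArrayList;
import java.util.List;

final class KeywordHelpers {

  /**
   * Offset of the first line at or after fromLineStart whose first word is
   * keyword, or -1. fromLineStart must point at the start of a line.
   */
  static int findKeywordLine(String text, String keyword, int fromLineStart) {
    int lineStart = fromLineStart;
    while (lineStart < text.length()) {
      int lineEnd = text.indexOf('\n', lineStart);
      if (lineEnd < 0) {
        lineEnd = text.length();
      }
      String line = text.substring(lineStart, lineEnd);
      // Match "keyword" or "keyword ...", but not "keywordX ...".
      if (line.equals(keyword) || line.startsWith(keyword + " ")) {
        return lineStart;
      }
      lineStart = lineEnd + 1;
    }
    return -1;
  }

  static int findFirstIndexOfKeyword(String text, String keyword) {
    return findKeywordLine(text, keyword, 0);
  }

  /** Splits text into chunks starting at keyword lines; earlier text is dropped. */
  static List<String> splitByKeyword(String text, String keyword) {
    List<String> chunks = new ArrayList<>();
    int start = findFirstIndexOfKeyword(text, keyword);
    while (start >= 0) {
      int lineEnd = text.indexOf('\n', start);
      int next = lineEnd < 0 ? -1 : findKeywordLine(text, keyword, lineEnd + 1);
      int end = next < 0 ? text.length() : next;
      chunks.add(text.substring(start, end));
      start = next;
    }
    return chunks;
  }
}
```

For example, splitting `"header\nr A\ns X\nr B\ns Y\n"` by `"r"` yields `"r A\ns X\n"` and `"r B\ns Y\n"`.
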

In test classes, replace the numerous invocations of DescriptorImpl
subclass constructors with local buildSomething() methods, so that
future changes to constructor signatures won't produce a diff as long
as this one.
2017-06-06 15:08:49 +02:00
Karsten Loesing
fadcaa4b20 Fix encoding bug in RelayDirectoryImpl and NetworkStatusImpl. 2017-06-06 15:03:50 +02:00
Karsten Loesing
232ea426b5 Fix bug in digest computation. 2017-06-06 15:03:40 +02:00
Karsten Loesing
74de0a5d7a Move descriptor digest computation to DescriptorImpl.
The main intention behind this change is to reduce the number of
places in the code where byte[] is converted to String.  Another
reason is to reduce code duplication, which alone would have justified
this change.
2017-06-06 15:02:41 +02:00
Karsten Loesing
82f555ee35 Fix encoding of Microdescriptor's getDigestSha256Base64(). 2017-06-06 15:02:27 +02:00
Karsten Loesing
823fa496ce Fix bug in newly simplified method. 2017-06-04 17:37:18 +02:00
Karsten Loesing
38221ed097 Simplify and avoid repetition in parse helper methods.
Implements #22279.
2017-06-01 10:44:27 +02:00
Karsten Loesing
2362535963 Move latest change log entry to next release. 2017-05-26 16:14:17 +02:00
iwakeh
df6bdc8d9b Implements task-19607.
Use enums for keywords as well as enum sets and maps.
Use constants for repeated strings.
2017-05-26 16:08:57 +02:00
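
The enum-based approach from task-19607 might look like the following sketch. The `Key` enum, its constants, and the `KeyCounter` methods are hypothetical; they only show how an enum can carry its keyword string, how a static map turns a keyword into an enum constant, and how `EnumMap` and `EnumSet` can track which keys occurred and which required ones are missing.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical keyword enum; metrics-lib's real enum may differ. */
enum Key {
  ROUTER("router"),
  PLATFORM("platform"),
  PUBLISHED("published"),
  INVALID("invalid"); // placeholder for unrecognized keywords

  final String keyword;

  private static final Map<String, Key> BY_KEYWORD = new HashMap<>();

  static {
    for (Key key : values()) {
      BY_KEYWORD.put(key.keyword, key);
    }
  }

  Key(String keyword) {
    this.keyword = keyword;
  }

  /** Maps a keyword string to its enum constant, or INVALID if unknown. */
  static Key get(String keyword) {
    return BY_KEYWORD.getOrDefault(keyword, INVALID);
  }
}

final class KeyCounter {

  /** Counts how often each key appears as the first word of a line. */
  static Map<Key, Integer> countKeywords(String text) {
    Map<Key, Integer> counts = new EnumMap<>(Key.class);
    for (String line : text.split("\n")) {
      counts.merge(Key.get(line.split(" ", 2)[0]), 1, Integer::sum);
    }
    return counts;
  }

  /** Returns the required keys that do not occur in counts. */
  static EnumSet<Key> missing(Map<Key, Integer> counts, EnumSet<Key> required) {
    EnumSet<Key> result = EnumSet.copyOf(required);
    result.removeAll(counts.keySet());
    return result;
  }
}
```

Compared to repeated string literals, the compiler now catches typos in keyword names, and `EnumMap`/`EnumSet` are compact array- and bit-vector-backed collections.
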
Karsten Loesing
bd52bcaeb5 Bump version to 1.7.0-dev. 2017-05-17 14:00:41 +02:00
Karsten Loesing
6941084f25 Prepare for 1.7.0 release. 2017-05-16 16:53:49 +02:00
Karsten Loesing
b82c054dbf Fix a few checkstyle complaints about whitespace. 2017-05-16 16:53:42 +02:00
Karsten Loesing
3c0b08135a Test for empty keys in more places. 2017-05-16 16:26:20 +02:00
iwakeh
08ccbf83f7 Reverse equality test in if-statements.
(This should be in checkstyle.)
2017-05-16 16:26:20 +02:00
iwakeh
8456cb154a Make all tests pass. Implements task-22217. 2017-05-16 16:26:20 +02:00
iwakeh
cdab58758a Let tests also verify error message (for all padding-count related tests).
Added some tests, two of which don't pass yet.
2017-05-16 16:26:14 +02:00
Karsten Loesing
75844d046c Parse "padding-counts" lines in extra-info descriptors.
Implements #22217.
2017-05-16 16:23:43 +02:00
Karsten Loesing
f6252f909f Tweak change log. 2017-05-12 12:14:57 +02:00
Karsten Loesing
091fc90e17 Deprecate setFailUnrecognizedDescriptorLines().
Implements #22228.
2017-05-11 21:03:45 +02:00
Karsten Loesing
d242a2aa4d Add descriptor digest to vote and streamline method names.
Implements #20333.
2017-05-10 16:15:01 +02:00
Karsten Loesing
2187c711ad Add change log entry for #22190. 2017-05-10 14:26:07 +02:00
iwakeh
e927475c12 Make tests pass again and solve task-22190. 2017-05-09 15:17:09 +00:00
iwakeh
f483211795 Provided failing test for issue in task-22190. 2017-05-09 15:17:08 +00:00
iwakeh
3e3636b8d8 Extended JavaDoc explanation. Part of task-22190. 2017-05-09 15:17:07 +00:00
Karsten Loesing
2ed7d618b1 Tweak change log a bit. 2017-05-09 17:17:06 +02:00
Karsten Loesing
71885a8e69 Accept extra arguments in extra-info descriptors.
According to the specification it's valid to add extra arguments to
descriptor lines unless they are tagged with "[No extra arguments]".
This is not the case for any of the statistics-related lines in
extra-info descriptors, so we should allow extra arguments there.

Fixes #21934.
2017-05-09 17:15:32 +02:00
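
The rule from #21934 can be sketched as a line parser that reads the arguments it expects and ignores trailing extras, unless the line is flagged as accepting none. The method name and the `allowExtraArguments` flag are illustrative assumptions, not metrics-lib's parser.

```java
import java.util.Arrays;

final class ExtraArgumentsSketch {

  /**
   * Returns the first expectedArgs arguments after the keyword. Extra
   * arguments are ignored when allowExtraArguments is true and rejected
   * otherwise, mirroring the "[No extra arguments]" tag in the spec.
   */
  static String[] parseArguments(String line, int expectedArgs,
      boolean allowExtraArguments) {
    String[] parts = line.trim().split(" +");
    int actualArgs = parts.length - 1; // parts[0] is the keyword itself
    if (actualArgs < expectedArgs) {
      throw new IllegalArgumentException("Too few arguments in line '" + line + "'");
    }
    if (actualArgs > expectedArgs && !allowExtraArguments) {
      throw new IllegalArgumentException(
          "Unexpected extra arguments in line '" + line + "'");
    }
    return Arrays.copyOfRange(parts, 1, 1 + expectedArgs);
  }
}
```

For example, a statistics line like `dirreq-stats-end 2017-05-09 12:00:00 (86400 s)` has four space-separated arguments; with extra arguments allowed, an appended unknown argument is simply ignored instead of failing the whole descriptor.
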
Karsten Loesing
ee696b09f0 Add support for six new key-value pairs added by OnionPerf.
OnionPerf adds six new key-value pairs to the .tpf format that
Torperf/CollecTor did not produce: ENDPOINTLOCAL, ENDPOINTPROXY,
ENDPOINTREMOTE, HOSTNAMELOCAL, HOSTNAMEREMOTE, and SOURCEADDRESS.

We should add support for these keys to metrics-lib, so that we can
start using their values.

Implements #22122.
2017-05-08 20:49:20 +02:00
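
Assuming .tpf lines consist of space-separated KEY=VALUE pairs, supporting new keys is mostly a matter of reading them from the parsed pairs. The sketch below parses one line and extracts the six OnionPerf keys named above. The class and method names are hypothetical, and a real .tpf parser would validate values more strictly.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class TpfLineSketch {

  /** The six OnionPerf keys listed in this commit. */
  static final List<String> ONIONPERF_KEYS = Arrays.asList(
      "ENDPOINTLOCAL", "ENDPOINTPROXY", "ENDPOINTREMOTE",
      "HOSTNAMELOCAL", "HOSTNAMEREMOTE", "SOURCEADDRESS");

  /** Parses "KEY=VALUE KEY=VALUE ..." into an insertion-ordered map. */
  static Map<String, String> parseLine(String line) {
    Map<String, String> pairs = new LinkedHashMap<>();
    for (String token : line.trim().split(" +")) {
      int eq = token.indexOf('=');
      if (eq <= 0) {
        throw new IllegalArgumentException("Not a KEY=VALUE pair: '" + token + "'");
      }
      pairs.put(token.substring(0, eq), token.substring(eq + 1));
    }
    return pairs;
  }

  /** Keeps only the OnionPerf-specific keys that are present. */
  static Map<String, String> onionPerfValues(Map<String, String> pairs) {
    Map<String, String> values = new LinkedHashMap<>();
    for (String key : ONIONPERF_KEYS) {
      if (pairs.containsKey(key)) {
        values.put(key, pairs.get(key));
      }
    }
    return values;
  }
}
```

Lines written by Torperf/CollecTor simply lack these keys, so the extracted map is empty for them rather than causing an error.
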
Karsten Loesing
8f69d784be Don't skip unrecognized lines in certain cases.
When we started using Java 7's switch-on-String in 2b4d773, we broke
unrecognized line parsing in extra-info descriptors.  Namely, when we
reached the end of a crypto block we didn't reset the list for
collecting crypto lines.  So far so good, but any following
unrecognized lines would be collected as crypto lines and later
discarded, rather than being added to the unrecognized-lines list and
later reported.

This only affects relay descriptors, because sanitized bridge
descriptors don't contain crypto blocks.  And it only affects relay
descriptors with crypto blocks, like "identity-ed25519", whereas relay
extra-info descriptors published by older versions were not affected.

Fixes #21890.
2017-04-07 15:19:44 +02:00
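
The bug comes down to parser state that was not cleared at the end of a crypto block. A simplified sketch of the intended behavior: lines from a `-----BEGIN` line through its `-----END` line are collected as one crypto block, and the collector is reset once the block closes, so that later unknown lines land in the unrecognized-lines list. The class, the known-keyword set, and the list handling are assumptions for illustration, not the actual extra-info parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class CryptoBlockSketch {

  final List<String> cryptoBlocks = new ArrayList<>();
  final List<String> unrecognizedLines = new ArrayList<>();

  void parse(String text, Set<String> knownKeywords) {
    StringBuilder crypto = null; // non-null only while inside a crypto block
    for (String line : text.split("\n")) {
      if (line.startsWith("-----BEGIN")) {
        crypto = new StringBuilder(line).append('\n');
      } else if (line.startsWith("-----END")) {
        if (crypto != null) {
          crypto.append(line).append('\n');
          cryptoBlocks.add(crypto.toString());
        }
        crypto = null; // leave crypto mode; omitting this reset is the kind of bug described above
      } else if (crypto != null) {
        crypto.append(line).append('\n');
      } else if (!knownKeywords.contains(line.split(" ", 2)[0])) {
        unrecognizedLines.add(line); // reported later instead of being discarded
      }
    }
  }
}
```

Without the reset, `future-line 1` in a descriptor like the one in the test would be appended to the crypto collector and silently dropped.
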
Karsten Loesing
a57b1f7698 Update to latest metrics-base. 2017-03-14 09:29:07 +01:00
Karsten Loesing
b1ea641627 Add tutorial link and examples. 2017-03-13 20:18:21 +01:00
Karsten Loesing
5b1db5d72b Bump version to 1.6.0-dev. 2017-02-17 16:45:13 +01:00
Karsten Loesing
10042d3ce2 Prepare for 1.6.0 release. 2017-02-17 16:20:41 +01:00
Karsten Loesing
880c6036a3 Tweak Javadocs. 2017-02-17 16:20:41 +01:00
Karsten Loesing
0502046462 Deprecate three classes using HttpURLConnection.
Fixes #20323.
2017-02-17 09:17:54 +01:00
iwakeh
b3d4ff1738 Only create javadoc for api, implements part of task-21469. 2017-02-15 14:00:19 +01:00
Karsten Loesing
110cb01250 Parse "shared-rand-.*" lines in consensuses and votes. 2017-02-14 15:29:33 +01:00
Karsten Loesing
2bcd6bb0e4 Parse new protocol versions lines. 2017-02-14 15:29:27 +01:00
iwakeh
74ee2145ee Make the reader thread a daemon thread. 2017-02-01 16:15:04 +01:00
Karsten Loesing
8d09f56568 Avoid deleting extraneous local descriptor files.
DescriptorIndexCollector deletes descriptor files from a previous or
concurrent collect run if it doesn't collect those files itself.  This
is unexpected behavior and differs from what DescriptorCollectorImpl
does.

Fixes #20525.
2017-01-31 17:48:09 +01:00
Karsten Loesing
c3079ae2ec Remove 604 checkstyle complaints.
Resolves #21144.
2017-01-25 11:27:09 +01:00
Karsten Loesing
ed26984902 Update copyright. 2017-01-13 16:47:42 +01:00
Karsten Loesing
65b7327a82 Update to latest metrics-base. 2017-01-05 15:35:44 +01:00
iwakeh
b7c236e0fb Added changelog entry. 2017-01-05 15:35:44 +01:00
iwakeh
5b75c254ab Added development description. 2017-01-05 15:35:44 +01:00
iwakeh
e8df8cf6f2 Implements task-20596: use metrics-base and reduced build.xml,
added bootstrap script.

Removed obsolete metrics_checks.xml and made bootstrap-development.sh
executable.

Only add metrics-lib class files to release jar.
2017-01-05 15:35:32 +01:00
iwakeh
fa2d227527 Make tests pass again. Gson demands no-args constructors. 2017-01-05 15:33:15 +01:00
Karsten Loesing
9a61983055 Log more, throw fewer RuntimeExceptions. 2017-01-05 10:56:58 +01:00
hiromipaw
d1ab93ba98 Update license 2017-01-03 10:04:03 +01:00
iwakeh
3e37f25eb2 Added test and cure for corrupted history file.
In that case a warning is logged and parsing continues.

Warning makes sense, as it could be due to problems with the
file system, which an operator can do something about.
2016-12-20 09:02:50 +01:00
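
The warn-and-continue behavior could look like the sketch below: each history line is parsed on its own, malformed lines are logged as warnings and skipped, and parsing goes on with the next line. The line format (`<lastModifiedMillis> <path>`), the class name, and the use of `java.util.logging` are assumptions; metrics-lib's actual history file format and logging setup may differ.

```java
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.logging.Logger;

final class HistorySketch {

  private static final Logger LOG = Logger.getLogger(HistorySketch.class.getName());

  /**
   * Parses lines of the hypothetical form "<lastModifiedMillis> <path>".
   * Malformed lines are logged and skipped instead of aborting parsing.
   */
  static SortedMap<String, Long> parseHistory(Iterable<String> lines) {
    SortedMap<String, Long> history = new TreeMap<>();
    int lineNumber = 0;
    for (String line : lines) {
      lineNumber++;
      String[] parts = line.split(" ", 2);
      if (parts.length != 2 || parts[1].isEmpty()) {
        LOG.warning("Skipping malformed history line " + lineNumber + ": '" + line + "'");
        continue;
      }
      try {
        history.put(parts[1], Long.parseLong(parts[0]));
      } catch (NumberFormatException e) {
        LOG.warning("Skipping history line " + lineNumber
            + " with invalid timestamp: '" + line + "'");
      }
    }
    return history;
  }
}
```

A corrupted line then costs at most one re-read of the affected file, while the warning tells the operator that something on the file system may need attention.
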