Commit Graph

949 Commits

Author SHA1 Message Date
Karsten Loesing
2e8cdf7fe1 Move sanitizing code to one class per type.
Part of #20542.
2020-12-01 10:38:26 +01:00
Karsten Loesing
47a4c7a962 Make some minor optimizations to bridgedescs code.
Part of #20542.
2020-11-30 22:45:22 +01:00
Karsten Loesing
a2fdbf3c6f Move lower-level sanitizing code to its own class.
Part of #20542.
2020-11-30 22:00:17 +01:00
Karsten Loesing
c0ee1a6cf7 Update most of the bridgedescs module to NIO.
Replace all File references with their Path equivalents, and use Files
methods wherever feasible.

Part of #20542.
2020-11-29 00:05:47 +01:00
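For readers unfamiliar with the File-to-Path migration this commit describes, here is a minimal sketch of the kind of replacement involved. The class and method names (NioMigrationSketch, writeLines, readLines) are hypothetical and do not come from the bridgedescs module; only the java.nio.file calls are real.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper illustrating Path/Files in place of java.io.File. */
public class NioMigrationSketch {

  /** Writes lines to a file, creating parent directories as needed. */
  public static void writeLines(Path file, List<String> lines)
      throws IOException {
    Path parent = file.getParent();
    if (parent != null) {
      // Takes the place of the older file.getParentFile().mkdirs() idiom.
      Files.createDirectories(parent);
    }
    // Takes the place of manual FileWriter/BufferedWriter handling.
    Files.write(file, lines, StandardCharsets.UTF_8);
  }

  /** Reads all lines, or returns an empty list if the file is absent. */
  public static List<String> readLines(Path file) throws IOException {
    if (!Files.exists(file)) {
      return Collections.emptyList();
    }
    return Files.readAllLines(file, StandardCharsets.UTF_8);
  }
}
```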
Karsten Loesing
1068524255 Simplify the bridgedescs module.
The separation between BridgeSnapshotReader, BridgeDescriptorParser,
and SanitizedBridgesWriter doesn't make much sense anymore:

 - BridgeSnapshotReader only has a constructor of more than 200 lines
   of code.

 - BridgeDescriptorParser actually only determines the descriptor type, and

 - SanitizedBridgesWriter performs parsing and obfuscation.

There are better ways to structure this code. The first step in that
direction is to remove clutter by moving the code to read bridge
snapshots to SanitizedBridgesWriter and deleting the other two
classes.

Part of #20542.
2020-11-28 22:14:53 +01:00
Karsten Loesing
f1c7198ac4 Add change log entry for #34030 fix. 2020-11-28 20:35:22 +01:00
Karsten Loesing
42a0dd2809 Correctly index files that are moved away and back.
The indexer did not handle a (mostly theoretical) edge case of a file
being moved away and then moved back shortly after. In such a case the
file should not be marked for deletion anymore and it should be
included in the index again. That's what this commit does.

The other minor changes to unit tests are just cosmetic.

Fixes #34030.
2020-11-28 11:14:21 +01:00
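The edge case in the commit above can be sketched roughly as follows. This is not the indexer's actual code: IndexSketch, update, and indexedFiles are hypothetical names, and the real index presumably tracks more than a flag per file name. The sketch only shows the rule that a file found on disk again loses its deletion mark and reappears in the index.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical index that tracks a deletion mark per file name. */
public class IndexSketch {

  /** File name mapped to whether the file is marked for deletion. */
  private final Map<String, Boolean> markedForDeletion = new HashMap<>();

  /** Updates the index from the file names currently found on disk. */
  public void update(Set<String> filesOnDisk) {
    for (String name : filesOnDisk) {
      // A file that is present, whether new or moved back, is unmarked.
      markedForDeletion.put(name, false);
    }
    for (Map.Entry<String, Boolean> entry : markedForDeletion.entrySet()) {
      if (!filesOnDisk.contains(entry.getKey())) {
        // A known file that is missing gets marked for deletion.
        entry.setValue(true);
      }
    }
  }

  /** Returns the file names that are currently included in the index. */
  public SortedSet<String> indexedFiles() {
    SortedSet<String> result = new TreeSet<>();
    for (Map.Entry<String, Boolean> entry : markedForDeletion.entrySet()) {
      if (!entry.getValue()) {
        result.add(entry.getKey());
      }
    }
    return result;
  }
}
```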
Karsten Loesing
cd15f34463 Fix minor issue with cleaning up directories.
One of the previously made changes to cleaning up directories was that
empty directories were deleted. This was necessary, because otherwise
there would be a growing number of directories as files get deleted
after reaching an age of seven weeks.

However, this change should not have included deleting the cleaned up
directory itself. In practice, this will not happen. But in tests it's
certainly possible that a directory is empty and then gets deleted.
This leads to all sorts of problems in tests.

The fix is to limit deleting empty directories to subdirectories.
That's what this commit does.
2020-11-28 11:11:26 +01:00
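A minimal sketch of the behavior described in the commit above, assuming a file's age is judged by its last-modified time (the commit does not say how age is determined). CleanUpSketch and cleanUp are hypothetical names. The point it illustrates is the guard that keeps the cleaned-up directory itself while removing empty subdirectories.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

/** Hypothetical clean-up routine; not the project's implementation. */
public class CleanUpSketch {

  /** Deletes files older than the cut-off and empty subdirectories. */
  public static void cleanUp(final Path root, final long cutOffMillis)
      throws IOException {
    Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
      @Override
      public FileVisitResult visitFile(Path file,
          BasicFileAttributes attrs) throws IOException {
        // Assumption: age is based on the last-modified time.
        if (attrs.lastModifiedTime().toMillis() < cutOffMillis) {
          Files.delete(file);
        }
        return FileVisitResult.CONTINUE;
      }

      @Override
      public FileVisitResult postVisitDirectory(Path dir,
          IOException exc) throws IOException {
        if (exc != null) {
          throw exc;
        }
        // Only subdirectories are removed, never the root itself.
        if (!dir.equals(root) && isEmpty(dir)) {
          Files.delete(dir);
        }
        return FileVisitResult.CONTINUE;
      }
    });
  }

  /** Returns whether the given directory has no entries. */
  private static boolean isEmpty(Path dir) throws IOException {
    try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
      return !entries.iterator().hasNext();
    }
  }
}
```

For a seven-week cut-off, a caller could pass something like System.currentTimeMillis() minus 49 days in milliseconds.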
Karsten Loesing
66ddc4d7d9 Delete files in out/ that are older than 7 weeks.
Fixes #21219.
2020-11-27 11:11:21 +01:00
Karsten Loesing
91d10a0960 Bump version to 1.16.1-dev. 2020-11-25 16:09:01 +01:00
Karsten Loesing
63562f1ce1 Prepare for 1.16.1 release. 2020-08-16 22:22:54 +02:00
Karsten Loesing
6e2a8bc1fe Update to metrics-lib 2.14.0. 2020-08-16 22:21:59 +02:00
Karsten Loesing
b3cc8fa20f Prepare for 1.16.0 release. 2020-08-05 11:15:36 +02:00
Karsten Loesing
427b2c63cd Update to latest metrics-base. 2020-08-05 11:14:01 +02:00
Karsten Loesing
295e2c69de Retain ipv6- lines in bridge extra-infos.
These lines have been added by proposal 313 and are usually not
included by bridges. But apparently some bridges include them anyway,
probably bridges that have been configured as non-bridge relays
before. We should retain them just like we retain other statistics
lines.
2020-08-05 11:13:50 +02:00
Karsten Loesing
52236731f8 Bump version to 1.15.2-dev. 2020-05-17 09:48:51 +02:00
Karsten Loesing
e842a7d588 Prepare for 1.15.2 release. 2020-05-17 09:45:26 +02:00
Karsten Loesing
9b293ee35f Bump version to 1.15.1-dev. 2020-04-30 22:25:57 +02:00
Karsten Loesing
e3ae1ee868 Prepare for 1.15.1 release. 2020-04-30 22:21:27 +02:00
Karsten Loesing
86b559a6be Bump version to 1.15.0-dev. 2020-04-30 18:49:01 +02:00
Karsten Loesing
2a0c40f54b Prepare for 1.15.0 release. 2020-04-30 17:42:00 +02:00
Karsten Loesing
0f5536ed68 Archive OnionPerf analysis .json files.
Implements #34072.
2020-04-30 17:34:35 +02:00
Karsten Loesing
a87ce0d02f Extend descriptorCutOff by 6 hours.
Fixes #19828.
2020-04-28 15:03:40 +02:00
Karsten Loesing
2b90d656d1 Set default locale US and default time zone UTC.
Part of #33655.
2020-04-01 12:45:07 +02:00
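In Java, setting these defaults usually amounts to two calls early in startup. The sketch below uses a hypothetical class name (DefaultsSketch) and does not show where the project actually makes these calls.

```java
import java.util.Locale;
import java.util.TimeZone;

/** Hypothetical startup helper; only the JDK calls are real. */
public class DefaultsSketch {

  /** Makes formatting and date handling independent of the host. */
  public static void applyDefaults() {
    // Number and date formatting no longer depends on the host locale.
    Locale.setDefault(Locale.US);
    // Date arithmetic and formatting no longer depend on the host zone.
    TimeZone.setDefault(TimeZone.getTimeZone("UTC"));
  }
}
```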
Karsten Loesing
77d9429797 Simplify logging configuration.
Implements #33549.
2020-03-31 09:18:17 +02:00
Karsten Loesing
145045478f Bump version to 1.14.1-dev. 2020-01-16 12:16:30 +01:00
Karsten Loesing
b5ac823aec Prepare for 1.14.1 release. 2020-01-16 12:10:12 +01:00
Karsten Loesing
0e9aedef74 Fix smoke test for recent scheduler changes. 2020-01-16 12:10:05 +01:00
Karsten Loesing
a6359c6074 Fix non-RunOnce mode.
This was accidentally broken in #32554. The RunOnce mode worked just
fine, but the non-RunOnce mode terminated immediately after scheduling
tasks.
2020-01-16 11:40:35 +01:00
Karsten Loesing
c75f0c781e Bump version to 1.14.0-dev. 2020-01-15 23:20:01 +01:00
Karsten Loesing
3a9f05e01f Prepare for 1.14.0 release. 2020-01-15 23:07:02 +01:00
Karsten Loesing
27e41ea739 Update to metrics-lib 2.10.0. 2020-01-15 22:59:26 +01:00
Karsten Loesing
741401a0da Remember processed files between module runs.
The three recently added modules to archive Snowflake statistics,
bridge pool assignments, and BridgeDB metrics have in common that they
process any input files regardless of whether they already processed
them before.

The problem is that the input files processed by these modules are
either never removed (Snowflake statistics) or only removed manually
by the operator (bridge pool assignments and BridgeDB statistics).

The effect is that non-recent BridgeDB metrics and bridge pool
assignments are being placed in the indexed/recent/ directory in the
next execution after they are deleted for being older than 72 hours.
The same would happen with Snowflake statistics after the operator
removes them from the out/ directory.

The fix is to use a state file containing file names of previously
processed files and only process a file not found in there. This is
the same approach as taken for bridge descriptor tarballs.
2020-01-15 22:56:30 +01:00
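The state-file approach in the commit above can be sketched as follows, assuming one file name per line in the state file (the actual format is not given in the commit message). ProcessedFilesSketch and processNew are hypothetical names, and the actual processing step is left as a comment.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical sketch of remembering processed files between runs. */
public class ProcessedFilesSketch {

  /** Processes only unseen input files and returns those files. */
  public static List<Path> processNew(Path stateFile,
      List<Path> inputFiles) throws IOException {
    SortedSet<String> processed = new TreeSet<>();
    if (Files.exists(stateFile)) {
      // Assumption: the state file holds one file name per line.
      processed.addAll(
          Files.readAllLines(stateFile, StandardCharsets.UTF_8));
    }
    List<Path> newlyProcessed = new ArrayList<>();
    for (Path input : inputFiles) {
      String name = input.getFileName().toString();
      // add() returns false if the name was already in the state file.
      if (processed.add(name)) {
        // The module-specific processing would happen here.
        newlyProcessed.add(input);
      }
    }
    Files.write(stateFile, processed, StandardCharsets.UTF_8);
    return newlyProcessed;
  }
}
```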
Karsten Loesing
d2a74b676a Update copyright to 2020. 2020-01-15 21:36:34 +01:00
Karsten Loesing
d48163379c Avoid reprocessing webstats files.
Web servers typically provide us with the last 14 days of request
logs. We shouldn't process the whole 14 days over and over. Instead we
should only process new log files and any other log files containing
log lines from newly written dates.

In some cases web servers stop serving a given virtual host or stop
acting as a web server at all. However, in these cases we're left with
14 days of logs per virtual host. Ideally, these logs would get
cleaned up, but until that's the case, we should at least not
reprocess these files over and over.

In order to avoid reprocessing webstats files, we need a new state
file with log dates contained in given input files. We use that state
file to determine which of the previously processed webstats files to
re-process, so that we can write complete daily logs.
2020-01-14 17:03:24 +01:00
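One way to read the selection rule in the commit above is sketched below: process every new file, plus any previously processed file that shares a log date with a new file. This is an interpretation, not the webstats module's code. WebstatsStateSketch, filesToProcess, and both map parameters are hypothetical, and the sketch assumes the log dates of new files are already known.

```java
import java.time.LocalDate;
import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical selection of webstats files to (re-)process. */
public class WebstatsStateSketch {

  /**
   * Returns new files plus known files sharing a date with a new file.
   *
   * @param knownDatesByFile log dates per file, as read from state
   * @param datesInNewFiles log dates per file not yet in the state
   */
  public static SortedSet<String> filesToProcess(
      Map<String, Set<LocalDate>> knownDatesByFile,
      Map<String, Set<LocalDate>> datesInNewFiles) {
    // Collect all dates for which new log lines were written.
    Set<LocalDate> newlyWrittenDates = new HashSet<>();
    for (Set<LocalDate> dates : datesInNewFiles.values()) {
      newlyWrittenDates.addAll(dates);
    }
    // New files are always processed.
    SortedSet<String> result = new TreeSet<>(datesInNewFiles.keySet());
    for (Map.Entry<String, Set<LocalDate>> entry
        : knownDatesByFile.entrySet()) {
      // Re-process old files needed to write complete daily logs.
      if (!Collections.disjoint(entry.getValue(), newlyWrittenDates)) {
        result.add(entry.getKey());
      }
    }
    return result;
  }
}
```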
Karsten Loesing
3002d6bc6b Add some real tests for the webstats module. 2020-01-14 17:03:17 +01:00
Karsten Loesing
8263cc7bdb Remove dependency on metrics-lib's log package (4/4).
 - Remove package-internal abstract class.
2019-11-25 17:02:07 +01:00
Karsten Loesing
c11b61465a Remove dependency on metrics-lib's log package (3/4).
 - Remove package-internal interfaces InternalLogDescriptor and
   InternalWebServerAccessLog.
2019-11-25 17:01:09 +01:00
Karsten Loesing
ea1b1b4f6a Remove dependency on metrics-lib's log package (2/4).
 - Remove unused code.
2019-11-25 17:01:00 +01:00
Karsten Loesing
859476ecae Remove dependency on metrics-lib's log package (1/4).
 - Copy types from metrics-lib to this code base.
 - Update package and import statements.
 - Copy remaining parts of metrics-lib's FileType.
2019-11-25 17:00:44 +01:00
Karsten Loesing
cc3aa57e57 Remove dependency on DescriptorIndexCollector. 2019-11-22 18:49:46 +01:00
Karsten Loesing
5a0e6be21c Remove dependency on metrics-lib's internal package.
The only functionality contained in metrics-lib's internal package is
file (de-)compression, which in turn uses a third-party library that
we're using anyway. This is a weak reason for depending on our own
library for this functionality. Removing this dependency will make it
easier to make changes to our library in the future.

The new FileType class is based on a copy of the same enum type in
metrics-lib without @since tags and without methods that we don't use.
2019-11-22 18:01:11 +01:00
Karsten Loesing
de10fcd5b3 Stop checking and reloading configuration file.
Removes a deprecation warning and simplifies code.

Implements #32554.
2019-11-22 12:03:57 +01:00
Karsten Loesing
edf505ae06 Remove String[][] as configuration value type. 2019-11-21 09:49:49 +01:00
Karsten Loesing
ccdad654f2 Remove final modifier from private method. 2019-11-20 16:44:25 +01:00
Karsten Loesing
86f8001590 Make inner class static. 2019-11-20 15:14:15 +01:00
Karsten Loesing
5d5d2eba7f Tweak DownloaderTest a bit. 2019-11-20 12:50:57 +01:00
Karsten Loesing
e25b39757c Fix logging bug. 2019-11-20 12:50:57 +01:00
Karsten Loesing
3bbc4db433 Fix two JavaDocs issues with package-info.java. 2019-11-20 12:50:57 +01:00
Karsten Loesing
7338d79d1a Use StandardCharsets.US_ASCII instead of "US-ASCII". 2019-11-20 12:50:57 +01:00
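The practical difference behind the commit above: passing the charset by name forces callers to handle a checked UnsupportedEncodingException, while the StandardCharsets constant does not. A small sketch with hypothetical names (CharsetSketch, encode, decode):

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper; only the JDK calls are real. */
public class CharsetSketch {

  /** Encodes text as US-ASCII without any checked exception. */
  public static byte[] encode(String text) {
    // text.getBytes("US-ASCII") would declare
    // UnsupportedEncodingException; the constant avoids that.
    return text.getBytes(StandardCharsets.US_ASCII);
  }

  /** Decodes US-ASCII bytes back into a string. */
  public static String decode(byte[] bytes) {
    return new String(bytes, StandardCharsets.US_ASCII);
  }
}
```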