collector

mirror of https://github.com/torproject/collector.git synced 2024-11-23 09:29:46 +00:00

Author	SHA1	Message	Date
Karsten Loesing	2e8cdf7fe1	Move sanitizing code to one class per type. Part of #20542.	2020-12-01 10:38:26 +01:00
Karsten Loesing	47a4c7a962	Make some minor optimizations to bridgedescs code. Part of #20542.	2020-11-30 22:45:22 +01:00
Karsten Loesing	a2fdbf3c6f	Move lower-level sanitizing code to its own class. Part of #20542.	2020-11-30 22:00:17 +01:00
Karsten Loesing	c0ee1a6cf7	Update most of the bridgedescs module to NIO. Replace all File references with their Path equivalents, and use Files methods wherever feasible. Part of #20542.	2020-11-29 00:05:47 +01:00
Karsten Loesing	1068524255	Simplify the bridgedescs module. The separation between BridgeSnapshotReader, BridgeDescriptorParser, and SanitizedBridgesWriter doesn't make much sense anymore: BridgeSnapshotReader only has a constructor of more than 200 lines of code, BridgeDescriptorParser actually only determines the descriptor type, and SanitizedBridgesWriter performs parsing and obfuscation. There are better ways to structure this code. The first step in that direction is to remove clutter by moving the code to read bridge snapshots to SanitizedBridgesWriter and deleting the other two classes. Part of #20542.	2020-11-28 22:14:53 +01:00
Karsten Loesing	f1c7198ac4	Add change log entry for #34030 fix.	2020-11-28 20:35:22 +01:00
Karsten Loesing	42a0dd2809	Correctly index files that are moved away and back. The indexer did not handle a (mostly theoretical) edge case of a file being moved away and then moved back shortly after. In such a case the file should not be marked for deletion anymore and it should be included in the index again. That's what this commit does. The other minor changes to unit tests are just cosmetic. Fixes #34030.	2020-11-28 11:14:21 +01:00
Karsten Loesing	cd15f34463	Fix minor issue with cleaning up directories. One of the previously made changes to cleaning up directories was that empty directories were deleted. This was necessary, because otherwise there would be a growing number of directories as files get deleted after reaching an age of seven weeks. However, this change should not have included deleting the cleaned up directory itself. In practice, this will not happen. But in tests it's certainly possible that a directory is empty and then gets deleted. This leads to all sorts of problems in tests. The fix is to limit deleting empty directories to subdirectories. That's what this commit does.	2020-11-28 11:11:26 +01:00
Karsten Loesing	66ddc4d7d9	Delete files in out/ that are older than 7 weeks. Fixes #21219.	2020-11-27 11:11:21 +01:00
Karsten Loesing	91d10a0960	Bump version to 1.16.1-dev.	2020-11-25 16:09:01 +01:00
Karsten Loesing	63562f1ce1	Prepare for 1.16.1 release.	2020-08-16 22:22:54 +02:00
Karsten Loesing	6e2a8bc1fe	Update to metrics-lib 2.14.0.	2020-08-16 22:21:59 +02:00
Karsten Loesing	b3cc8fa20f	Prepare for 1.16.0 release.	2020-08-05 11:15:36 +02:00
Karsten Loesing	427b2c63cd	Update to latest metrics-base.	2020-08-05 11:14:01 +02:00
Karsten Loesing	295e2c69de	Retain ipv6- lines in bridge extra-infos. These lines have been added by proposal 313 and are usually not included by bridges. But apparently some bridges include them anyway, probably bridges that have been configured as non-bridge relays before. We should retain them just like we retain other statistics lines.	2020-08-05 11:13:50 +02:00
Karsten Loesing	52236731f8	Bump version to 1.15.2-dev.	2020-05-17 09:48:51 +02:00
Karsten Loesing	e842a7d588	Prepare for 1.15.2 release.	2020-05-17 09:45:26 +02:00
Karsten Loesing	9b293ee35f	Bump version to 1.15.1-dev.	2020-04-30 22:25:57 +02:00
Karsten Loesing	e3ae1ee868	Prepare for 1.15.1 release.	2020-04-30 22:21:27 +02:00
Karsten Loesing	86b559a6be	Bump version to 1.15.0-dev.	2020-04-30 18:49:01 +02:00
Karsten Loesing	2a0c40f54b	Prepare for 1.15.0 release.	2020-04-30 17:42:00 +02:00
Karsten Loesing	0f5536ed68	Archive OnionPerf analysis .json files. Implements #34072.	2020-04-30 17:34:35 +02:00
Karsten Loesing	a87ce0d02f	Extend descriptorCutOff by 6 hours. Fixes #19828.	2020-04-28 15:03:40 +02:00
Karsten Loesing	2b90d656d1	Set default locale US and default time zone UTC. Part of #33655.	2020-04-01 12:45:07 +02:00
Karsten Loesing	77d9429797	Simplify logging configuration. Implements #33549.	2020-03-31 09:18:17 +02:00
Karsten Loesing	145045478f	Bump version to 1.14.1-dev.	2020-01-16 12:16:30 +01:00
Karsten Loesing	b5ac823aec	Prepare for 1.14.1 release.	2020-01-16 12:10:12 +01:00
Karsten Loesing	0e9aedef74	Fix smoke test for recent scheduler changes.	2020-01-16 12:10:05 +01:00
Karsten Loesing	a6359c6074	Fix non-RunOnce mode. This was accidentally broken in #32554. The RunOnce mode worked just fine, but the non-RunOnce mode terminated immediately after scheduling tasks.	2020-01-16 11:40:35 +01:00
Karsten Loesing	c75f0c781e	Bump version to 1.14.0-dev.	2020-01-15 23:20:01 +01:00
Karsten Loesing	3a9f05e01f	Prepare for 1.14.0 release.	2020-01-15 23:07:02 +01:00
Karsten Loesing	27e41ea739	Update to metrics-lib 2.10.0.	2020-01-15 22:59:26 +01:00
Karsten Loesing	741401a0da	Remember processed files between module runs. The three recently added modules to archive Snowflake statistics, bridge pool assignments, and BridgeDB metrics have in common that they process any input files regardless of whether they already processed them before. The problem is that the input files processed by these modules are either never removed (Snowflake statistics) or only removed manually by the operator (bridge pool assignments and BridgeDB statistics). The effect is that non-recent BridgeDB metrics and bridge pool assignments are being placed in the indexed/recent/ directory in the next execution after they are deleted for being older than 72 hours. The same would happen with Snowflake statistics after the operator removes them from the out/ directory. The fix is to use a state file containing file names of previously processed files and only process a file not found in there. This is the same approach as taken for bridge descriptor tarballs.	2020-01-15 22:56:30 +01:00
Karsten Loesing	d2a74b676a	Update copyright to 2020.	2020-01-15 21:36:34 +01:00
Karsten Loesing	d48163379c	Avoid reprocessing webstats files. Web servers typically provide us with the last 14 days of request logs. We shouldn't process the whole 14 days over and over. Instead we should only process new log files and any other log files containing log lines from newly written dates. In some cases web servers stop serving a given virtual host or stop acting as a web server at all. However, in these cases we're left with 14 days of logs per virtual host. Ideally, these logs would get cleaned up, but until that's the case, we should at least not reprocess these files over and over. In order to avoid reprocessing webstats files, we need a new state file with log dates contained in given input files. We use that state file to determine which of the previously processed webstats files to re-process, so that we can write complete daily logs.	2020-01-14 17:03:24 +01:00
Karsten Loesing	3002d6bc6b	Add some real tests for the webstats module.	2020-01-14 17:03:17 +01:00
Karsten Loesing	8263cc7bdb	Remove dependency on metrics-lib's log package (4/4). - Remove package-internal abstract class.	2019-11-25 17:02:07 +01:00
Karsten Loesing	c11b61465a	Remove dependency on metrics-lib's log package (3/4). - Remove package-internal interfaces InternalLogDescriptor and InternalWebServerAccessLog.	2019-11-25 17:01:09 +01:00
Karsten Loesing	ea1b1b4f6a	Remove dependency on metrics-lib's log package (2/4). - Remove unused code.	2019-11-25 17:01:00 +01:00
Karsten Loesing	859476ecae	Remove dependency on metrics-lib's log package (1/4). - Copy types from metrics-lib to this code base. - Update package and import statements. - Copy remaining parts of metrics-lib's FileType.	2019-11-25 17:00:44 +01:00
Karsten Loesing	cc3aa57e57	Remove dependency on DescriptorIndexCollector.	2019-11-22 18:49:46 +01:00
Karsten Loesing	5a0e6be21c	Remove dependency on metrics-lib's internal package. The only functionality contained in metrics-lib's internal package is file (de-)compression, which in turn uses a third-party library that we're using anyway. This is a weak reason for depending on our own library for this functionality. Removing this dependency will make it easier to make changes to our library in the future. The new FileType class is based on a copy of the same enum type in metrics-lib without @since tags and without methods that we don't use.	2019-11-22 18:01:11 +01:00
Karsten Loesing	de10fcd5b3	Stop checking and reloading configuration file. Removes a deprecation warning and simplifies code. Implements #32554.	2019-11-22 12:03:57 +01:00
Karsten Loesing	edf505ae06	Remove String[][] as configuration value type.	2019-11-21 09:49:49 +01:00
Karsten Loesing	ccdad654f2	Remove final modifier from private method.	2019-11-20 16:44:25 +01:00
Karsten Loesing	86f8001590	Make inner class static.	2019-11-20 15:14:15 +01:00
Karsten Loesing	5d5d2eba7f	Tweak DownloaderTest a bit.	2019-11-20 12:50:57 +01:00
Karsten Loesing	e25b39757c	Fix logging bug.	2019-11-20 12:50:57 +01:00
Karsten Loesing	3bbc4db433	Fix two JavaDocs issues with package-info.java.	2019-11-20 12:50:57 +01:00
Karsten Loesing	7338d79d1a	Use StandardCharsets.US_ASCII instead of "US-ASCII".	2019-11-20 12:50:57 +01:00
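
The cleanup behavior described in commits 66ddc4d7d9 and cd15f34463 above (delete files older than seven weeks, then remove empty subdirectories but never the cleaned-up directory itself) can be sketched roughly as follows. This is not the code from the repository: the class name DirectoryCleaner, the method cleanUp, and the choice of the last-modified time as the age criterion are assumptions made for illustration only.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.time.Instant;

/** Hypothetical helper, not taken from the collector code base. */
final class DirectoryCleaner {

  /**
   * Deletes regular files below root that were last modified before cutOff,
   * then deletes subdirectories that are empty. The root directory itself is
   * always retained, even if it ends up empty.
   */
  static void cleanUp(Path root, Instant cutOff) throws IOException {
    if (!Files.isDirectory(root)) {
      return;
    }
    Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
      @Override
      public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
          throws IOException {
        // Assumption: file age is judged by last-modified time.
        if (attrs.isRegularFile()
            && attrs.lastModifiedTime().toInstant().isBefore(cutOff)) {
          Files.delete(file);
        }
        return FileVisitResult.CONTINUE;
      }

      @Override
      public FileVisitResult postVisitDirectory(Path dir, IOException exc)
          throws IOException {
        if (exc != null) {
          throw exc;
        }
        // Limit deletion of empty directories to subdirectories.
        if (!dir.equals(root) && isEmpty(dir)) {
          Files.delete(dir);
        }
        return FileVisitResult.CONTINUE;
      }
    });
  }

  /** Returns whether the given directory contains no entries. */
  private static boolean isEmpty(Path dir) throws IOException {
    try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
      return !entries.iterator().hasNext();
    }
  }
}
```

A caller would pass a cut-off such as Instant.now().minus(Duration.ofDays(49)) for the seven-week limit. Because directories are handled in postVisitDirectory, a chain of nested directories that becomes empty is removed bottom-up in a single pass.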


949 Commits
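
Commit 741401a0da describes remembering processed files between module runs by keeping a state file with the names of previously processed files and only processing files not found in it. A minimal sketch of that idea is shown below. The class name ProcessedFilesState, its method names, and the one-file-name-per-line state format are hypothetical and not taken from the repository.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical helper, not taken from the collector code base. */
final class ProcessedFilesState {

  /** Reads previously processed file names; a missing state file means none. */
  static SortedSet<String> read(Path stateFile) throws IOException {
    SortedSet<String> names = new TreeSet<>();
    if (Files.exists(stateFile)) {
      // Assumed format: one file name per line.
      names.addAll(Files.readAllLines(stateFile, StandardCharsets.UTF_8));
    }
    return names;
  }

  /** Returns only those input files whose names are not in the processed set. */
  static List<Path> selectUnprocessed(List<Path> inputFiles,
      Set<String> processed) {
    List<Path> unprocessed = new ArrayList<>();
    for (Path inputFile : inputFiles) {
      if (!processed.contains(inputFile.getFileName().toString())) {
        unprocessed.add(inputFile);
      }
    }
    return unprocessed;
  }

  /** Writes the processed file names back, one per line, in sorted order. */
  static void write(Path stateFile, SortedSet<String> names)
      throws IOException {
    Files.write(stateFile, names, StandardCharsets.UTF_8);
  }
}
```

In a module run, one would read the state, process the files returned by selectUnprocessed, add their names to the set, and write the state back. The sketch assumes the parent directory of the state file already exists and that file names alone are sufficient to identify an input file.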