metrics-lib

mirror of https://github.com/torproject/metrics-lib.git synced 2025-02-11 20:37:39 +00:00

Author	SHA1	Message	Date
Karsten Loesing	7b026cf44a	Allow underscore in transport names. Example of a valid line that is now allowed: bridge-ip-transports meek=32,obfs3_websocket=8,websocket=64	2014-07-22 09:21:00 +02:00
Karsten Loesing	73e5a6989d	Avoid parsing descriptor contents to Maps. Extra-info descriptors contain lots of comma-separated key=value lists that we store in SortedMap instances. But those occupy a lot of memory, and it's not certain that we'll ever want to use the contained keys or values. New approach: when parsing a descriptor, use regular expressions to check if lines are valid, and delay parsing into maps until needed.	2014-06-18 15:12:46 +02:00
Karsten Loesing	be359c873a	Store relay flags more efficiently.	2014-06-18 15:12:46 +02:00
Karsten Loesing	f3a170fb74	Clear sets used to validate at-most-once/exactly-once keywords. Related to 5caa384. Similarly, keeping these sets around just wastes heap space.	2014-06-18 15:12:46 +02:00
Karsten Loesing	c439d346b9	Avoid parsing descriptor contents to Lists or Sets. If we can easily determine the number of List or Set elements, we can as well store their contents in arrays and convert those to List or Set instances when requested. This can save us some memory and doesn't cost much performance.	2014-06-18 15:12:41 +02:00
Karsten Loesing	557c2ccfd7	Always accept [SP|TAB]+ as delimiter instead of just SP. Better fix for #12403.	2014-06-17 13:57:15 +02:00
Karsten Loesing	b1478b8fb5	Add unit tests for 2351cea.	2014-06-17 13:44:55 +02:00
Karsten Loesing	2351cead7a	Accept [SP|TAB]+ as delimiter in two places. There have been at least two relays including an additional SP after their nickname in server and extra-info descriptors. The spec stays vague about whether this is allowed or not, but the directory authorities seem to accept these descriptors just fine. We should also accept these descriptors. We should probably accept [SP|TAB]+ in more places. But right now we're losing data by discarding these descriptors. Let's do the quick fix now and the potentially cleaner fix later. Fixes #12403.	2014-06-16 07:32:55 +02:00
Karsten Loesing	a472d7f342	Store relay flags more efficiently. Turns out a TreeSet<String> requires more memory than a String[]. We can put together the TreeSet<String> when we need it.	2014-05-28 11:01:47 +02:00
Karsten Loesing	e3945981d1	Use a single DateFormat per thread and format. DateFormat is not thread-safe, and creating a new instance every time we need one only wastes CPU time. Make sure that there's a single instance per thread and format that the thread can use whenever it wants.	2014-05-28 11:01:47 +02:00
Karsten Loesing	5caa3848b0	Clear parsed keywords after verifying them. No need to keep them around. That's just a waste of heap space.	2014-05-28 11:01:22 +02:00
Karsten Loesing	a12e989e40	Store bandwidth histories more efficiently. We were storing bandwidth histories in TreeMap<Long, Long>() with keys being time in millis and values being bandwidth values. This showed up in profiles. It's far more (memory-)efficient to store bandwidth values in a long[] and put together the TreeMap when the caller requests it. And if the bandwidth history is evaluated exactly once, there should not even be a CPU overhead.	2014-05-27 21:09:48 +02:00
Karsten Loesing	8722de7044	Make queue size of descriptor reader configurable. By default, the descriptor reader puts up to 100 parsed descriptor files in a queue in order to hand them out as quickly as possible. But if descriptor files contain hundreds or even thousands of descriptors, that default may be too high. Add a new method to make it configurable.	2014-05-27 20:53:04 +02:00
Karsten Loesing	b298cbcbd1	Fix encoding problem when parsing multi-descriptor files. When we're parsing a descriptor file with potentially more than one descriptor in it, we're converting file contents to String to be able to search for descriptor beginnings using String methods. But we're not passing a character encoding, leaving it up to Java to guess. What we should do is tell it to use "US-ASCII" as encoding, which is sufficient to find keywords marking the beginning of a new descriptor. Fixes #11821.	2014-05-25 12:32:36 +02:00
Karsten Loesing	38c48ddd0c	Parse microdesc consensuses and microdescriptors. Required for implementing #2785.	2014-01-17 15:53:45 +01:00
Jens-Michael Hoffmann	3e60ccdaab	Fix build errors on Debian systems. The local lib directory is not used anymore and respective references were removed. The java dependencies are now specified in the build.xml and taken from their installed locations. In addition to git, openjdk-6-jdk and ant the following java packages have to be installed: - libcommons-codec-java - libcommons-compress-java - junit4 Minor tweaks by Karsten Loesing.	2013-07-31 16:40:06 +02:00
Karsten Loesing	008781b7e5	Add tests for published lines containing milliseconds. Milliseconds are simply ignored, because SimpleDateFormat only looks at "yyyy-MM-dd HH:mm:ss" and ignores everything after that. Related to #9286 where we discovered that some relays include milliseconds in their descriptors.	2013-07-18 14:21:36 +02:00
Karsten Loesing	60a066a0b0	Fast exits read/write more than MAX_INT KiB per day. For example, see "other" entry in: exit-kibibytes-read 80=505190490,182=25102395,443=61873906, 6881=47999666,8989=8657674,17173=7910494,21762=9138992, 45682=5154543,50500=6086469,51413=62394452,other=2282907805	2013-07-08 12:53:53 +02:00
Karsten Loesing	e7f93e1a6a	Restrict valid keyword characters to [A-Za-z0-9-]+. Fixes #8798.	2013-05-03 15:33:29 +02:00
Karsten Loesing	b58211e577	Support bridge-ip-transports lines in extra-infos.	2013-04-19 20:57:31 +02:00
Karsten Loesing	5b21044819	Parse Unmeasured=1 in w lines of consensuses. Pointed out by atagar.	2013-04-09 08:36:53 +02:00
Karsten Loesing	fdcf0b49a3	guard-tk actually stands for weighted time known.	2013-02-05 15:37:12 +01:00
Karsten Loesing	c2a0dbf8bf	Parse the new flag-thresholds line in votes.	2013-02-05 14:49:28 +01:00
Karsten Loesing	785fd43246	Add parsing support for ntor-onion-key line. Spotted by atagar; see #7867.	2013-01-07 05:24:11 +01:00
Karsten Loesing	895992549b	Parse ipv6-policy lines in server descriptors. Spotted by atagar in related ticket #7826.	2012-12-30 19:54:06 +01:00
Karsten Loesing	17e9149f07	Add support for parsing bridge-ip-versions lines.	2012-11-08 14:18:18 -05:00
Karsten Loesing	43b9390250	Add support for parsing geoip6-db-digest lines.	2012-11-07 13:55:23 -05:00
Karsten Loesing	be27fef42e	Looks like $fingerprint~nickname is also a valid family line entry. Support for $fingerprint=nickname was previously added in 6a46f46.	2012-11-07 12:27:54 -05:00
Karsten Loesing	66cec8b01f	Allow multiple "m" lines per network status entry.	2012-09-05 04:16:41 -04:00
Karsten Loesing	ba8cb725d2	Remove GetTor statistics parsing code.	2012-08-07 12:25:42 +02:00
Karsten Loesing	25f0e656c4	Accept transport lines containing more than just the transport name. Sanitized bridge descriptors contain transport lines with just the transport name. However, there are now relays including unsanitized transport lines, most likely because of a configuration problem. Don't reject the entire descriptor when encountering those lines.	2012-08-06 08:08:44 +02:00
Karsten Loesing	20f9d5574f	Make parse history in descriptor reader more accessible. So far, the only way to prevent files from being parsed repeatedly in distinct runs was to specify a history file that only metrics-lib was supposed to read and write. However, some applications may want to specify the list of files to exclude themselves, or they may want to learn which files have been excluded and which have been parsed. These applications shouldn't be forced to mess with the history file. Add three methods to the descriptor reader for these applications. They should also play nicely together with the history file approach. AFAIK, stem has methods with the same purpose but a slightly different semantic.	2012-07-21 12:11:47 +02:00
Karsten Loesing	ca201de75e	Parse transport lines in bridge extra-infos.	2012-06-29 13:54:31 +02:00
Karsten Loesing	0c19088c4b	We can parse all @type 1.x descriptor versions.	2012-06-29 13:29:05 +02:00
Karsten Loesing	2c3e59bb71	Tweak build file a bit.	2012-06-19 14:36:44 +02:00
Karsten Loesing	7348b3d208	Fix unit tests which were broken in 466725e.	2012-06-19 14:17:32 +02:00
Karsten Loesing	a3d89ee788	Support parsing GetTor statistics files.	2012-06-01 11:48:51 +02:00
Karsten Loesing	194768b33f	Parse exit lists with @type annotation and Downloaded line.	2012-05-31 16:00:09 +02:00
Karsten Loesing	71f473962f	Understand @type annotation in bridge pool assignments.	2012-05-31 12:02:48 +02:00
Karsten Loesing	1743e912c3	Parse sanitized bridge descriptor version 1.0.	2012-05-31 09:25:42 +02:00
Karsten Loesing	05a1cf7e7d	Parse new .tpf Torperf data format.	2012-05-30 10:59:09 +02:00
Karsten Loesing	466725e2ea	Parse v1 directories and contained server descriptors.	2012-05-19 19:30:28 +02:00
Karsten Loesing	49a88e7eaa	Add @type annotations for sanitized bridge descriptors. Spotted by Damian.	2012-05-19 11:48:29 +02:00
Karsten Loesing	26083ebde4	Fix unit tests. - Annotation lines starting with @ are now recognized. - Unrecognized keywords in "w" lines are now ignored.	2012-05-19 11:42:21 +02:00
Karsten Loesing	316e956bc0	Ignore unknown keywords in "w" lines. moria1 added a Capped= keyword to debug #2286 which made DocTor and metrics-db freak out. The correct behavior is to ignore unknown keywords.	2012-05-19 10:14:54 +02:00
Karsten Loesing	01878416dc	Correctly handle @type annotations when parsing descriptors.	2012-05-18 17:40:01 +02:00
Karsten Loesing	02fa685e9c	Looks like blank lines are allowed in v2 statuses. For the moment, we still disallow blank lines in all other descriptors. If this is not correct, we can easily fix that.	2012-05-16 17:35:17 +02:00
Karsten Loesing	0eb47d2650	Add support for parsing v2 network statuses.	2012-05-16 17:01:30 +02:00
Karsten Loesing	20b1ef6378	Fix unit tests.	2012-05-16 16:50:15 +02:00
Karsten Loesing	5d67942706	Use the descriptor parser interface in the downloader, too.	2012-05-09 12:42:55 +02:00
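
Commits 2351cead7a and 557c2ccfd7 above accept [SP|TAB]+ as a delimiter, and commit 7b026cf44a allows underscores in transport names. A minimal Java sketch of both checks follows. The class name, method names, and both regular expressions are assumptions made for illustration and are not taken from metrics-lib's source.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; not part of metrics-lib.
class LineSyntaxSketch {

  // Assumption: one or more spaces or tabs separate the parts of a line.
  private static final Pattern DELIMITER = Pattern.compile("[ \t]+");

  // Assumption: a transport list is a comma-separated list of name=count
  // pairs, with names made of letters, digits, and underscores.
  private static final Pattern TRANSPORT_LIST = Pattern.compile(
      "[A-Za-z0-9_]+=[0-9]+(,[A-Za-z0-9_]+=[0-9]+)*");

  // Splits a line at runs of spaces and tabs rather than at single spaces.
  static String[] splitLine(String line) {
    return DELIMITER.split(line);
  }

  // Checks the syntax of a transport list without building a map from it.
  static boolean isValidTransportList(String list) {
    return TRANSPORT_LIST.matcher(list).matches();
  }

  public static void main(String[] args) {
    // An extra space and a tab after the nickname still yield three parts.
    String[] parts = splitLine("extra-info nickname  \tFINGERPRINT");
    System.out.println(parts.length);
    System.out.println(isValidTransportList(
        "meek=32,obfs3_websocket=8,websocket=64"));
  }
}
```

Validating with a regular expression and keeping the raw string mirrors the idea in commit 73e5a6989d of delaying the conversion into maps until a caller asks for it.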

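Commit e3945981d1 above keeps a single DateFormat per thread and format, because DateFormat is not thread-safe and creating new instances wastes CPU time. One way to do this is sketched below with a ThreadLocal map keyed by pattern. The class name, the method name, and the choice of UTC and Locale.US are assumptions; the actual change may look different.

```java
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.TimeZone;

// Hypothetical helper class; not part of metrics-lib.
class DateFormatsSketch {

  // Each thread gets its own map from pattern to DateFormat instance.
  private static final ThreadLocal<Map<String, DateFormat>> FORMATS =
      new ThreadLocal<Map<String, DateFormat>>() {
        @Override
        protected Map<String, DateFormat> initialValue() {
          return new HashMap<String, DateFormat>();
        }
      };

  // Returns the calling thread's instance for the given pattern and
  // creates it on first use.
  static DateFormat get(String pattern) {
    Map<String, DateFormat> perThread = FORMATS.get();
    DateFormat format = perThread.get(pattern);
    if (format == null) {
      format = new SimpleDateFormat(pattern, Locale.US);
      format.setLenient(false);
      format.setTimeZone(TimeZone.getTimeZone("UTC")); // assumption
      perThread.put(pattern, format);
    }
    return format;
  }

  public static void main(String[] args) {
    DateFormat format = get("yyyy-MM-dd HH:mm:ss");
    System.out.println(format.format(new Date(1000L)));
    // The same thread gets the same instance back on the next call.
    System.out.println(format == get("yyyy-MM-dd HH:mm:ss"));
  }
}
```

Because instances are never shared between threads, no synchronization is needed around format or parse calls.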

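Commit a12e989e40 above stores bandwidth values in a long[] and only builds the TreeMap when a caller requests it. A rough sketch of that idea follows. The class, its fields, and the assumption that intervals are equally long and identified by their end time are illustrative and not taken from metrics-lib.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical class; not part of metrics-lib.
class BandwidthHistorySketch {

  private final long lastIntervalEndMillis; // end of the most recent interval
  private final long intervalLengthMillis;  // assumed equal for all intervals
  private final long[] values;              // oldest value first

  BandwidthHistorySketch(long lastIntervalEndMillis,
      long intervalLengthMillis, long[] values) {
    this.lastIntervalEndMillis = lastIntervalEndMillis;
    this.intervalLengthMillis = intervalLengthMillis;
    this.values = values;
  }

  // Builds the map on request, walking backwards from the newest interval.
  SortedMap<Long, Long> getBandwidthValues() {
    SortedMap<Long, Long> result = new TreeMap<Long, Long>();
    long intervalEnd = lastIntervalEndMillis;
    for (int i = values.length - 1; i >= 0; i--) {
      result.put(intervalEnd, values[i]);
      intervalEnd -= intervalLengthMillis;
    }
    return result;
  }

  public static void main(String[] args) {
    BandwidthHistorySketch history = new BandwidthHistorySketch(
        3000L, 1000L, new long[] { 10L, 20L, 30L });
    System.out.println(history.getBandwidthValues());
  }
}
```

The array of primitives avoids one boxed key, one boxed value, and one tree node per interval. If the map is requested exactly once, the total work is close to building it at parse time, which matches the commit's remark about CPU overhead.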
508 Commits