3373 Commits

Author SHA1 Message Date
Damian Johnson
46a12277ac Missing changelog entry 2019-01-14 10:40:36 -08:00
Damian Johnson
aebf3e4e57 Separate Query exception attributes
Oops, commit cc43a6c both broke our unit tests and is incorrect in that it
changes the Query's 'error' attribute to a completely different type.

The error attribute is public so we cannot change it. Simply storing exception
metadata as a separate private attribute for the time being. In stem 2.x we'll
be able to drop this whole thing because, without python 2.x compatibility to
worry about, we can re-raise exceptions while retaining their stacktrace.
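For reference, a minimal sketch of the python 3 syntax alluded to here: 'raise ... from ...' chains a new exception to the original, retaining its stacktrace, but is a syntax error under python 2 (function names below are hypothetical)...

```python
def fetch():
  raise IOError('download failed')

def query():
  try:
    fetch()
  except IOError as exc:
    # python 3 only: chain the new exception to the original so
    # fetch()'s stacktrace survives the rethrow
    raise ValueError('query failed') from exc

try:
  query()
except ValueError as exc:
  caught = exc.__cause__  # the original exception, traceback intact

print(type(caught).__name__)
```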
2019-01-02 13:58:02 -08:00
Damian Johnson
e2d8575ce4 Remove hardcoded buffer size from ORPort sockets
When reading ORPort data that exceeded a hardcoded (and arbitrary) buffer size
we cropped the content. This was caught by starlight when attempting to use
one of our demo scripts...

  https://trac.torproject.org/projects/tor/ticket/28961
  https://stem.torproject.org/tutorials/examples/download_descriptor.html

  Original traceback:
    File "/home/atagar/Desktop/stem/stem/descriptor/remote.py", line 589, in _download_descriptors
      self.content, self.reply_headers = _download_from_orport(endpoint, self.compression, self.resource)
    File "/home/atagar/Desktop/stem/stem/descriptor/remote.py", line 998, in _download_from_orport
      response = b''.join([cell.data for cell in circ.send(RelayCommand.DATA, request, stream_id = 1)])
    File "/home/atagar/Desktop/stem/stem/client/__init__.py", line 268, in send
      decrypted_cell, backward_key, backward_digest = stem.client.cell.RelayCell.decrypt(self.relay.link_protocol, encrypted_cell, self.backward_key, self.backward_digest)
    File "/home/atagar/Desktop/stem/stem/client/cell.py", line 412, in decrypt
      raise stem.ProtocolError('RELAY cells should be %i bytes, but received %i' % (link_protocol.fixed_cell_length, len(content)))
  ProtocolError: RELAY cells should be 512 bytes, but received 464

I'm unhappy with this approach, but after three days of chewing on this it's
the least bad approach I've come up with and seems to work. Patches welcome if
there's a smarter way of going about this.
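The underlying pattern is the usual one for fixed-length cells: loop on the read until the requested number of bytes arrives rather than doing a single fixed-size read. A generic sketch (not stem's actual code, names hypothetical)...

```python
import io

def recv_exactly(conn, size):
  # loop until 'size' bytes arrive instead of cropping the content
  # at an arbitrary buffer size
  data = b''

  while len(data) < size:
    chunk = conn.read(size - len(data))

    if not chunk:
      raise IOError('connection closed with %i bytes remaining' % (size - len(data)))

    data += chunk

  return data

FIXED_CELL_LENGTH = 512  # RELAY cell size cited in the traceback above

cell = recv_exactly(io.BytesIO(b'x' * 2000), FIXED_CELL_LENGTH)
print(len(cell))  # 512
```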
2019-01-02 13:46:05 -08:00
Damian Johnson
0724fa5273 Update copyright dates for 2019
Happy new year! Bumping the dates for 2019...

  % find . -type f -iname '*.py' -exec sed -i 's/-2018/-2019/g' "{}" +;
  % grep -R "# Copyright 2018," ./*
2018-12-31 16:22:20 -08:00
Damian Johnson
d60ac492e2 Better type checking for RELAY cell replies
The message we give when RELAY cells receive an unexpected response is pretty
bad...

  ProtocolError: Circuit response should be a series of RELAY cells, but
  received an unexpected size for a response: 4048

Instead checking the cell types, providing a more descriptive error if they
mismatch. This doesn't fix the issue I'm trying to solve, but it gets me a bit
closer to the true problem of ticket #28961...

  ProtocolError: RELAY cells should be 512 bytes, but received 464
2018-12-30 14:58:55 -08:00
Damian Johnson
cc43a6ca90 Include the originating stacktrace in stem.descriptor.remote exceptions
Our remote module needs to retain, then later rethrow, exceptions, which makes
stacktraces less than helpful.

Both python 2.x and 3.x have mechanisms for preserving stacktraces but they
both rely on language syntax rather than libraries, so we cannot use either
without breaking compatibility with the other version.

As such opting for the least bad option I can think of which is to encode
the original stacktrace within our message.

As mentioned in the code's comment we'll opt for something better when we
drop python 2.x support.
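Something along these lines (names hypothetical): capture traceback.format_exc() at the point of failure and fold it into the message of the exception we store for later...

```python
import traceback

def download():
  raise IOError('connection reset')

try:
  download()
except IOError as exc:
  # encode the original stacktrace into the message of the exception
  # we'll retain and rethrow later
  error = IOError('%s\n\nOriginal traceback:\n%s' % (exc, traceback.format_exc()))

print(str(error).splitlines()[0])  # connection reset
```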
2018-12-30 12:19:10 -08:00
Damian Johnson
bf267fe1d1 Merge version 1.7.1 hotfix 2018-12-26 15:07:44 -08:00
Taylor Yu
991291cd72 Stem release 1.7.1
Hotfix release requested by Nick incorporating commit 0eb8fda...

https://trac.torproject.org/projects/tor/ticket/28731#comment:18
1.7.1
2018-12-26 14:57:40 -08:00
Damian Johnson
5488849baa Blank inputs cause server descriptor parsing to fail
Honestly I'm not digging in too much, but DocTor has started providing
me with notifications of...

  Unable to retrieve the present server descriptors...

  source: http://204.13.164.118:80/tor/server/all
  time: 12/20/2018 11:44
  error: Content conform to being a server descriptor:

We strip annotation whitespace if we have descriptor content to parse, but
didn't if we don't. No reason I can think of not to do so in both cases.
2018-12-20 12:03:10 -08:00
Damian Johnson
098c571928 Add StaleDesc flag
Recognize tor's new StaleDesc flag. This hasn't yet made its way into the
spec...

  https://trac.torproject.org/projects/tor/ticket/28887

... but tor's changelog has a nice description of it. This removes a couple
more 'missing capability' notices from our test runs...

  [Flag (consensus)] StaleDesc
  [Flag (microdescriptor consensus)] StaleDesc
2018-12-17 10:43:41 -08:00
Damian Johnson
b91a5c31ab Add ACTIVE and DORMANT signals
Oops! I noted these in the docs when they were added to the spec but I forgot
to add them to the enumeration. With this the following 'new capability'
notices are no longer noted after running our tests...

  [Signal] ACTIVE
  [Signal] DORMANT
2018-12-17 10:14:00 -08:00
Damian Johnson
1764f4396b Deduplicate new capability messages
Clearly we're being way too verbose...

Your version of Tor has capabilities stem currently isn't taking advantage of.
If you're running the latest version of stem then please file a ticket on:

  https://trac.torproject.org/projects/tor/wiki/doc/stem/bugs

New capabilities are:

  [Flag (microdescriptor)] StaleDesc
  [Flag] StaleDesc
  [Flag (microdescriptor)] StaleDesc
  [Flag (microdescriptor)] StaleDesc
  [Flag (microdescriptor)] StaleDesc
  [Flag (microdescriptor)] StaleDesc
  [Flag (microdescriptor)] StaleDesc
  [Flag (microdescriptor)] StaleDesc
  [Flag (microdescriptor)] StaleDesc
  [Flag (microdescriptor)] StaleDesc
  [Flag (microdescriptor)] StaleDesc
  [Flag (microdescriptor)] StaleDesc
  [Flag (microdescriptor)] StaleDesc
  [Flag (microdescriptor)] StaleDesc
  [Flag (microdescriptor)] StaleDesc
  [Flag (microdescriptor)] StaleDesc
  [Flag (microdescriptor)] StaleDesc
  [Flag (microdescriptor)] StaleDesc
  [Flag (microdescriptor)] StaleDesc
  [Flag (microdescriptor)] StaleDesc
  [Flag (microdescriptor)] StaleDesc
  [Signal] ACTIVE
  [Signal] DORMANT
2018-12-17 10:12:30 -08:00
Taylor Yu
0eb8fda3b7 Relax bootstrap percentage regex
After integrating #28731 in tor, the bootstrap percentage regex no
longer matches.  Relax it accordingly.
2018-12-13 18:21:58 -06:00
Damian Johnson
b0caad31c4 Exemplify get_detached_signatures() usage
Adding an example from https://blog.atagar.com/november2018/ for using
get_detached_signatures(). Particularly important for this method since
it varies by the time of day.
2018-12-11 10:27:19 -08:00
Damian Johnson
3e3d9bd5c3 Descriptor digest example
Waste not, want not. I wrote this demo script for my recent status report
(https://blog.atagar.com/november2018/), but on reflection it makes a good
example for how to use our new digest methods.
2018-12-11 10:18:58 -08:00
Damian Johnson
e5787e8dc5 Fill in missing changelog entries
Oops, forgot to cite some recent changes.
2018-11-30 08:39:39 -08:00
Damian Johnson
04c334d0fc Calculate network status document digests
Adding a digest() method to our base NetworkStatusDocument class. The spec is
vague on how this is calculated (#28664), but through experimentation figured
out that the range is up through the first 'directory-signature ' including the
following space.

To determine this I fetched the detached signatures and upcoming consensus
during the voting period (minutes 55-60 of the hour), and compared our
calculated result with that...

  ============================================================
  demo.py script
  ============================================================

  import stem.descriptor

  desc = next(stem.descriptor.parse_file(
    '/home/atagar/Desktop/next_consensus',
    descriptor_type = 'network-status-consensus-3 1.0',
    document_handler = stem.descriptor.DocumentHandler.DOCUMENT),
  )

  print('digest: %s' % desc.digest())

  ============================================================

  % curl http://128.31.0.39:9131/tor/status-vote/next/consensus-signatures > next_sigs
  % curl http://128.31.0.39:9131/tor/status-vote/next/consensus > next_consensus

  % grep consensus-digest next_sigs
  consensus-digest 296BA01987256A1C8EFB20E17667152DCFA50755

  % python demo.py
  digest: 296BA01987256A1C8EFB20E17667152DCFA50755
2018-11-29 11:22:20 -08:00
Damian Johnson
afbfb424c1 DORMANT and ACTIVE signals
Stem support for a couple new signals. Sounds like a neat capability!

  https://gitweb.torproject.org/torspec.git/commit/?id=4421149
2018-11-28 17:08:07 -08:00
Damian Johnson
8d43e5810c Deprecate the DescriptorReader class
Does anyone use this? Written in response to a request from Karsten when I
first began stem, I've never heard of anyone actually using it. To simplify,
let's drop it.
2018-11-27 18:53:14 -08:00
Damian Johnson
1c2f851dd2 Move comparison and hashing to base Descriptor class
Huh. Not sure why I added these to subclasses rather than their common parent.
Maybe there's a reason that will make me regret this, but certainly seems to
work.
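In other words (a generic sketch, not stem's actual code), subclasses now inherit equality and hashing from the common parent rather than carrying their own copies...

```python
class Descriptor(object):
  def __init__(self, contents):
    self._contents = contents

  # comparison and hashing live on the common parent, keyed off the
  # raw descriptor content

  def __hash__(self):
    return hash(self._contents)

  def __eq__(self, other):
    return isinstance(other, Descriptor) and self._contents == other._contents

  def __ne__(self, other):
    return not self == other

class ServerDescriptor(Descriptor):
  pass  # nothing to re-implement

print(ServerDescriptor('router caerSidi') == ServerDescriptor('router caerSidi'))  # True
```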
2018-11-27 18:51:56 -08:00
Damian Johnson
ddb1a360da Use newer cached microdescriptors if available
Our data directory has up to two microdescriptor files: cached-microdescs and
cached-microdescs.new.

If the former is unavailable but the latter is present we should use it...

  https://trac.torproject.org/projects/tor/ticket/28508

Maybe more important, when looking into this I realized that our attempt to get
tor's data directory stacktraces if it isn't explicitly set in the torrc...

  >>> list(controller.get_microdescriptors())
  Traceback (most recent call last):
    File "<console>", line 1, in <module>
    File "/home/atagar/Desktop/stem/stem/control.py", line 490, in wrapped
      for val in func(self, *args, **kwargs):
    File "/home/atagar/Desktop/stem/stem/control.py", line 1791, in get_microdescriptors
      if not os.path.exists(data_directory):
    File "/usr/lib/python2.7/genericpath.py", line 26, in exists
      os.stat(path)
  TypeError: coercing to Unicode: need string or buffer, NoneType found

Changed it so we'll instead provide a generic exception saying we were unable
to determine the data directory. Fortunately this whole thing is a fallback, so
eventually we'll be able to remove it.
2018-11-26 13:04:23 -08:00
Damian Johnson
71593fc734 Detached signature parsing support
Adding a parser for detached signatures, per irl's request...

  https://trac.torproject.org/projects/tor/ticket/28495

You can use stem.descriptor.remote to download these, or parse with
DetachedSignature.from_str(). However, you cannot use parse_file()
until we have a @type annotation for these...

  https://trac.torproject.org/projects/tor/ticket/28615

When downloaded these are only available for five minutes each hour making them
highly clunky to use, but irl suggested he might change that (hope so!). At
present during the window when they're available they can be fetched as
follows...

  ============================================================
  Example script
  ============================================================

  import stem.descriptor.remote

  detached_sigs = stem.descriptor.remote.get_detached_signatures().run()[0]

  for i, sig in enumerate(detached_sigs.signatures):
    print('Signature %i is from %s' % (i + 1, sig.identity))

  ============================================================
  When available (minutes 55-60 of the hour)
  ============================================================

  % python demo.py
  Signature 1 is from 0232AF901C31A04EE9848595AF9BB7620D4C5B2E
  Signature 2 is from 14C131DFC5C6F93646BE72FA1401C02A8DF2E8B4
  Signature 3 is from 23D15D965BC35114467363C165C4F724B64B4F66
  Signature 4 is from 27102BC123E7AF1D4741AE047E160C91ADC76B21
  Signature 5 is from 49015F787433103580E3B66A1707A00E60F2D15B
  Signature 6 is from D586D18309DED4CD6D57C18FDB97EFA96D330566
  Signature 7 is from E8A9C45EDE6D711294FADF8E7951F4DE6CA56B58
  Signature 8 is from ED03BB616EB2F60BEC80151114BB25CEF515B226
  Signature 9 is from EFCBE720AB3A82B99F9E953CD5BF50F7EEFC7B97

  ============================================================
  When unavailable (minutes 0-55 of the hour)
  ============================================================

  % python demo.py
  Traceback (most recent call last):
    File "demo.py", line 3, in <module>
      detached_sigs = stem.descriptor.remote.get_detached_signatures().run()[0]
    File "/home/atagar/Desktop/stem/stem/descriptor/remote.py", line 476, in run
      return list(self._run(suppress))
    File "/home/atagar/Desktop/stem/stem/descriptor/remote.py", line 487, in _run
      raise self.error
  urllib2.HTTPError: HTTP Error 404: Not found
2018-11-25 12:11:40 -08:00
Damian Johnson
f4536b70b8 Avoid test_connections_by_ss flakiness from IOErrors
Nick encountered another error from this test...

  ======================================================================
  ERROR: test_connections_by_ss
  ----------------------------------------------------------------------
  Traceback (most recent call last):
    File "/home/nickm/src/stem/test/integ/util/connection.py", line 50, in test_connections_by_ss
      self.check_resolver(Resolver.SS)
    File "/home/nickm/src/stem/test/require.py", line 58, in wrapped
      return func(self, *args, **kwargs)
    File "/home/nickm/src/stem/test/integ/util/connection.py", line 28, in check_resolver
      connections = get_connections(resolver, process_pid = runner.get_pid())
    File "/home/nickm/src/stem/stem/util/connection.py", line 300, in get_connections
      raise IOError('No results found using: %s' % resolver_command)
  IOError: No results found using: ss -nptu

  ----------------------------------------------------------------------

I'm unsure why this test is so flaky for him. Earlier I attempted to mitigate
this by catching OSErrors but on reflection what he was really getting were
IOErrors. Python *said* it was an OSError but that's because python3 has made
IOError an alias...

  https://stackoverflow.com/questions/29347790/difference-between-ioerror-and-oserror

In Stem 2.x I should probably replace IOError throughout our codebase with
OSError.
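For reference, the alias in question: under python 3 the two names are the same class, so catching either catches both...

```python
# python 3 made IOError a mere alias of OSError
print(IOError is OSError)  # True under python 3

try:
  raise IOError('No results found using: ss -nptu')
except OSError as exc:
  caught = exc  # the IOError is caught here too

print(type(caught).__name__)
```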
2018-11-24 09:58:18 -08:00
Damian Johnson
f6ea37a47d Doctests failed with python3
Oops, my bad. Caught by teor on...

  https://trac.torproject.org/projects/tor/ticket/28571

Our tests passed with python 2.7 (which is what I usually run), but python
3.x's filter() was changed to provide an iterable rather than a list so we
need to normalize.
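The normalization amounts to the following (generic example, not the doctest itself)...

```python
values = ['a', 'bb', 'c', 'ddd']
result = filter(lambda entry: len(entry) == 1, values)

# python 2's filter() returned a list whereas python 3's returns an
# iterator, so callers expecting list behavior must normalize
result = list(result)
print(result)  # ['a', 'c']
```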
2018-11-22 10:58:04 -08:00
Damian Johnson
32a3d26267 Descriptor decompression inappropriately stripped trailing newline
Oops, our decompression helper stripped trailing whitespace. This wasn't
noticeable to our parser, but it does throw off digesting...

  import stem.descriptor.remote

  digest = 'BsaDvyZyHjBDGWCYpMx0Du3N1Mn2uMfNF7PjgizQC1s'

  desc = stem.descriptor.remote.get_microdescriptors([digest]).run()[0]
  print('digest: %s (expected %s)' % (desc.digest(), digest))

  ============================================================

  % python scrap.py
  digest: j8kC3P07m9dL45ll1O0PSpvfOfxLtzAWqJYjzvwLEcM (expected BsaDvyZyHjBDGWCYpMx0Du3N1Mn2uMfNF7PjgizQC1s)
2018-11-21 15:28:36 -08:00
Damian Johnson
5495c5725e Digest method for microdescriptors
Revamp digesting of Microdescriptors in the same way as server and extrainfo
descriptors...

  https://trac.torproject.org/projects/tor/ticket/28398

This is definitely better *but* is backward incompatible with the class' old
'digest' attribute. Unfortunately I can't avoid this. The method needs to be
called digest() for consistency, and python cannot have name conflicts between
methods and attributes.

The old digest value was hex rather than base64 encoded which made it
relatively useless (it couldn't be used to fetch or validate microdescriptors,
the sole point of the damn thing) so fingers crossed that no one was using it.

I try very hard to provide backward compatibility in minor version bumps of
Stem but in this case I don't think we should be a slave to that here.
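For illustration (stdlib only, not stem's API), the same sha256 hash in both encodings. The base64 form is what's usable for fetching and validating microdescriptors, with tor dropping the '=' padding...

```python
import base64
import hashlib

content = b'onion-key\n...'  # stand-in for microdescriptor content

digest = hashlib.sha256(content).digest()

hex_digest = hashlib.sha256(content).hexdigest().upper()
b64_digest = base64.b64encode(digest).rstrip(b'=').decode('ascii')  # padding dropped

print('hex: %s' % hex_digest)
print('base64: %s' % b64_digest)
```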
2018-11-21 12:21:32 -08:00
Damian Johnson
c606f44289 Undeprecate stem.descriptor.remote's get_microdescriptors()
We deprecated get_microdescriptors() because tor hadn't implemented it on
DirPorts but now it has. Not only that but our existing method just works
(neat!). Undeprecating get_microdescriptors(), expanding its pydocs, adding
a test, and adding an alias to the base module to match other descriptor
types.
2018-11-21 11:52:50 -08:00
Damian Johnson
310b5ca40b Implement RouterStatusEntry.from_str()
Router status entries don't have their own @type annotation, so our from_str()
method could not provide them. Implementing their own from_str() method so
things like RouterStatusEntryV3.from_str() will work.
2018-11-21 11:46:33 -08:00
Damian Johnson
e30b130482 Replace microdescriptor router status entry digest attribute
Bah. Back when I added our 'digest' attribute to RouterStatusEntryMicroV3 I
tried to be consistent by making all our hashes hex. However, this was a
mistake. Uses of the microdescriptor digest expect base64 so deprecating the
'digest' attribute with another 'microdescriptor_digest' that's base64.
2018-11-21 11:46:29 -08:00
Damian Johnson
2ea93474b8 Python3 unit test regressions with new Descriptor.from_str() tests
Oops, couple unicode-vs-bytes mistakes...

  https://trac.torproject.org/projects/tor/ticket/28550

  ======================================================================
  ERROR: test_from_str_multiple
  ----------------------------------------------------------------------
  Traceback (most recent call last):
    File "/home/atagar/Desktop/stem/test/unit/descriptor/descriptor.py", line 45, in test_from_str_multiple
      RelayDescriptor.content({'router': 'relay2 71.35.133.197 9001 0 0'}),
  TypeError: sequence item 1: expected str instance, bytes found

  ======================================================================
  ERROR: test_from_str_type_handling
  ----------------------------------------------------------------------
  Traceback (most recent call last):
    File "/home/atagar/Desktop/stem/test/unit/descriptor/descriptor.py", line 33, in test_from_str_type_handling
      desc = Descriptor.from_str('@type server-descriptor 1.0\n' + desc_text)
  TypeError: Can't convert 'bytes' object to str implicitly

  ----------------------------------------------------------------------

I'm also making from_str() normalize unicode to bytes so the method isn't a
misnomer for python3.
2018-11-21 09:29:16 -08:00
Damian Johnson
670f403bb3 Descriptor.from_str() method
As requested by irl [1] adding a from_str() method to our Descriptor class,
similar to ControlMessage's from_str() [2].

This method makes it much simpler to parse descriptors from byte strings.
Unlike parse_file() this defaults to providing a single descriptor, with a
'multiple = True' flag that should be provided if your content has more than
one...

  my_server_descriptor = RelayServerDescriptor.from_str(content)

  consensus_relays = NetworkStatusDocumentV3.from_str(consensus_content, multiple = True)

[1] https://trac.torproject.org/projects/tor/ticket/28450
[2] https://stem.torproject.org/api/response.html#stem.response.ControlMessage.from_str
2018-11-20 13:43:20 -08:00
Damian Johnson
2192228e43 Replace parse_bytes() with a from_str() method
Shifting to the same pattern we used with the stem.response.ControlMessage
method...

  https://stem.torproject.org/api/response.html#stem.response.ControlMessage.from_str

Also making this provide a single descriptor by default (the more common use
case) with a 'multiple = True' option, and tests.
2018-11-20 13:34:43 -08:00
Damian Johnson
dafbace188 Pass along parse_bytes()'s keyword arguments
They were dropped, causing callers to have the descriptor type and other args
ignored.
2018-11-20 10:47:07 -08:00
Iain R. Learmonth
74d7eeef9f [PATCH] Adds a parse_bytes() function to load descriptors
A warning message suggests wrapping bytes in BytesIO and then calling
parse_file(), but this is a simple step that could be included as a
convenience function.

In my particular use case, I'm loading from a file but I'd like to
perform the read asynchronously using asyncio.

  https://trac.torproject.org/projects/tor/ticket/28450
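The convenience being requested amounts to something like this (simplified sketch; the real parse_bytes() delegates to stem.descriptor.parse_file())...

```python
import io

def parse_file(descriptor_file, **kwargs):
  # stand-in for stem.descriptor.parse_file(), which accepts a
  # file-like object
  return descriptor_file.read()

def parse_bytes(content, **kwargs):
  # wrap the bytes in BytesIO so callers needn't do it themselves
  return parse_file(io.BytesIO(content), **kwargs)

print(parse_bytes(b'router caerSidi 71.35.133.197 9001 0 0'))
```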
2018-11-20 10:44:31 -08:00
Damian Johnson
e810cc093a Parse 'bandwidth-file-digest' lines from votes
Parsing of a newly added field...

  https://gitweb.torproject.org/torspec.git/commit/?id=1b686ef
2018-11-19 10:43:00 -08:00
Damian Johnson
73e5bd3e4f Deprecate server descriptor annotation methods
When I first began Stem reading from cached descriptors in tor's data directory
was our primary mechanism to get descriptor data. This is no longer the case.
Mostly we use stem.descriptor.remote or the control port.

I included these annotation methods for completeness (ie. 'hey, there's data
here so let's expose it'). That said, I've never heard of someone actually
finding this to be useful, and now that it's extremely rare for people to even
*get* descriptors that have these annotations, let's just drop it.

Brought to mind thanks to a discussion about annotations with irl...

  https://trac.torproject.org/projects/tor/ticket/28503
2018-11-18 13:50:06 -08:00
Damian Johnson
0d09943e04 Update ArchLinux instructions
Stem has been removed from the community maintained AUR repository, and moved
to the official ArchLinux repos. As such, updating our links and installation
instructions.

[1] https://aur.archlinux.org/packages/stem/
[2] https://www.archlinux.org/packages/community/any/python-stem/
2018-11-18 11:04:55 -08:00
Damian Johnson
49c6a17a0a Add is_valid() and is_fresh() methods to the consensus
Nice idea from Iain...

  https://trac.torproject.org/projects/tor/ticket/28448
2018-11-17 16:42:43 -08:00
Damian Johnson
616026e9fb Better exception if provided with an invalid digest encoding
Our digest methods rightfully raise a NotImplementedException if I bugger up
and add to the DigestEncoding enumeration without actually implementing it. But
if users provide other bad data for an encoding argument we should provide a
ValueError instead.
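The distinction, sketched generically (not stem's actual digest code): bad caller input is a ValueError, whereas an enum value we declared but never implemented is rightly a NotImplementedError...

```python
import hashlib

ENCODINGS = ('HEX', 'BASE64', 'RAW')

def digest(content, encoding = 'HEX'):
  if encoding not in ENCODINGS:
    # the caller handed us bad input, so the bug is theirs
    raise ValueError("'%s' is not a recognized encoding" % encoding)
  elif encoding == 'HEX':
    return hashlib.sha1(content).hexdigest().upper()
  else:
    # declared in the enumeration but not yet implemented, so the
    # bug is ours
    raise NotImplementedError('%s is not yet implemented' % encoding)

print(digest(b'demo content'))
```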
2018-11-17 15:08:36 -08:00
Damian Johnson
3768e1e922 Missing run() from pydoc
Ok, that was dumb. The whole point of my minor doc adjustment was to call
run()! Stupid me.

  https://trac.torproject.org/projects/tor/ticket/28400

Ran this demo both successfully and with an exception hardcoded in run() to
ensure I didn't bugger it up again. :P
2018-11-16 09:31:37 -08:00
Damian Johnson
5d3565002b Add hash_type and encoding arguments to descriptor digest() methods
Iain made a great point that it's tougher to calculate descriptor digests than
it should be...

  https://trac.torproject.org/projects/tor/ticket/28398

Digest type and encoding varies by their use. Mostly our spec sticks with
sha1/hex or sha256/base64, but sometimes it differs. For instance, the consensus
cites sha1/base64 server descriptor digests, whereas according to Karsten Tor
Metrics uses sha1/hex digests for filenames.

Presently server and extrainfo descriptors are the only classes with digest()
methods. Microdescriptors should (consensus votes cite microdescriptor digests)
but nobody has asked for those so we'll cross that bridge when we come to it.

This branch expands our digest() methods in the following ways...

  * Add a hash_type argument so callers can specify sha1 or sha256
    hashing.

  * Add an encoding argument so callers can specify hex, base64, or
    no encoding.

  * In our digest documentation cite what references that descriptor
    type's digest (ie. 'what the heck is this useful for?').
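A stand-in showing the hash_type/encoding combinations this describes (stem's real signature uses the DigestHash and DigestEncoding enums rather than strings)...

```python
import base64
import hashlib

def content_digest(content, hash_type = 'SHA1', encoding = 'HEX'):
  # pick the hash, then the encoding, independently
  hashed = hashlib.sha1(content) if hash_type == 'SHA1' else hashlib.sha256(content)

  if encoding == 'HEX':
    return hashed.hexdigest().upper()
  elif encoding == 'BASE64':
    return base64.b64encode(hashed.digest()).rstrip(b'=').decode('ascii')
  else:
    return hashed.digest()  # RAW

print(content_digest(b'demo'))                      # sha1/hex
print(content_digest(b'demo', 'SHA256', 'BASE64'))  # sha256/base64
```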
2018-11-15 12:18:03 -08:00
Damian Johnson
6beaaf49df Replace remaining _digest_for_content() usage
Swapping out our remaining _digest_for_content() usage so we can drop that
helper.
2018-11-15 12:13:43 -08:00
Damian Johnson
de42798f43 Support hash types and encodings for server descriptor digests
Replicating what I just did for extrainfo descriptors with server descriptors.
2018-11-15 11:56:20 -08:00
Damian Johnson
f816134a06 Internal _content_range() helper
Our new digest type and encoding arguments make our _digest_for_content()
helper a poor fit. The only useful thing this helper does is narrow our
content to a specific range. As such adding a helper that does only that.

This doesn't yet change any of our _digest_for_content() callers. That's
next.
2018-11-15 10:18:16 -08:00
Damian Johnson
874f419775 Add a digest DigestEncoding argument
Digests are defined by a hash type and encoding tuple. I was using the first to
imply the second, but this doesn't always work. For example, the consensus
cites base64 encoded sha1 server descriptor digests but stem provides hex
encoded sha1s due to the following discussion with Karsten (subject: "Stem
Sphinx Documentation", 6/7/12).

  >> - Why does digest() return the base64-encoded digest, not the
  >> hex-formatted one?  Network statuses are the only documents in Tor using
  >> base64 (or rather, a variant of it without trailing ='s), so it's easier
  >> to convert those to hex than to convert everything else to base64.  Now,
  >> if you switch to hex, you'll only have to decide between lower-case and
  >> upper-case.  I think Tor and metrics-lib use upper-case hex in most places.
  >
  > I went with base64 because I thought that this was only useful for
  > comparing with the network status. What uses the hex encoded digest?

  The hex-encoded server descriptor digest is used as file name in metrics
  tarballs.

  The (decoded) descriptor digest is used to verify the descriptor signature.

  Other reasons for hex-encoding the digest() result is that the digest()
  in extra-info descriptors should return the hex-encoded digest, too, or
  you wouldn't be able to compare it to the extra-info-digest line in
  server descriptors.  Having both methods return a different encoding
  would be confusing.

  Oh, and router-digest lines in sanitized bridge descriptors also contain
  the hex-encoded digest.  You wouldn't want to convert that to base64
  before writing it to the digest variable, nor would you want digest()
  and digest to return differently encoded digests.

As such I'm going to leave both the hashing and encoding up to our callers
*and* cite all digest uses I know of in our digest method's pydoc.
2018-11-15 09:53:24 -08:00
Damian Johnson
909fbee874 Rename DigestHashType to DigestHash
Shorter is better in enum names, and the 'Type' suffix didn't convey anything.
2018-11-15 08:53:58 -08:00
Damian Johnson
fc229c7ad1 Document DigestHashType
Simple enum so not much to be said.
2018-11-13 17:37:39 -08:00
Damian Johnson
f214309e04 Compute extrainfo sha256 digests from the whole descriptor
Accounting for a tor bug that's prompting an upcoming spec change...

  https://trac.torproject.org/projects/tor/ticket/28415
2018-11-13 17:32:27 -08:00
Damian Johnson
70d6e35047 Sha256 extrainfo descriptor digests
When referencing digests tor now includes both sha1 and sha256 digests. As
such, beginning to expand our digest() methods to do the same...

  https://trac.torproject.org/projects/tor/ticket/28398

I'm starting with extrainfo descriptors because their hashes are referenced by
our server descriptors, providing easy test data to see if we're doing this
right or not...

  % curl http://128.31.0.39:9131/tor/server/fp/3BB34C63072D9D10E836EE42968713F7B9325F66 > /tmp/my_server_desc
  % curl http://128.31.0.39:9131/tor/server/extra/3BB34C63072D9D10E836EE42968713F7B9325F66 > /tmp/my_extrainfo_desc

  % grep extra /tmp/my_server_desc
  extra-info-digest 5BEBC13FDA976050D3A0632EE6508FD1BF1D1750 FNzZZtYPlMjBeb78Wv0zS5DUIPGB3TrpJ3k79MZURMU

  % python
  >>> import stem.descriptor
  >>> desc = next(stem.descriptor.parse_file('/tmp/my_extrainfo_desc', 'extra-info 1.0'))

  # Good! The below shows that our sha1 digest matches what our server
  # descriptor says it should be.

  >>> desc.digest(stem.descriptor.DigestHashType.SHA1)
  '5BEBC13FDA976050D3A0632EE6508FD1BF1D1750'

  # Bad! This should *not* mismatch. >:(

  >>> desc.digest(stem.descriptor.DigestHashType.SHA256)
  'ciuNPeDfpiBQfowP7N1g7jPsuHR9fwceTTFyknNdyvY'

Unfortunately while I'm clearly doing something wrong, I'm puzzled about why we
mismatch. Our dir-spec's extra-info-digest description is pretty clear...

  "sha256-digest" is a base64-encoded SHA256 digest of the extra-info
  document, computed over the same data.

We're definitely hashing the same data (otherwise the sha1 wouldn't match).
This code also certainly seems to be doing exactly what the spec says (base64
encoding the sha256 digest). So... huh.

Gonna punt this over to irl who requested this to see if he can spot what I'm
doing wrong.
2018-11-12 10:38:55 -08:00
Damian Johnson
0a8b47e85f Use run() for Query example
Oops. As mentioned by irl the surrounding text talks about using run() so our
demonstration should too. This example was correct (due to its 'block = True'
argument), but better to show what we say we show.

  https://trac.torproject.org/projects/tor/ticket/28400
2018-11-12 09:34:30 -08:00