Oops, commit cc43a6c both broke our unit tests and was incorrect in that it
changed the Query's 'error' attribute to a completely different type.
The error attribute is public so we cannot change it. Simply storing exception
metadata as a separate private attribute for the time being. In stem 2.x we'll
be able to drop this whole thing because without requiring python 2.x
compatibility we can re-raise exceptions while retaining their stacktrace.
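A rough sketch of the interim shape (attribute names here are illustrative
rather than stem's actual internals)...

============================================================
Sketch
============================================================
import traceback

class Query(object):
  def __init__(self):
    self.error = None         # public attribute, keeps its original type
    self._error_trace = None  # hypothetical private stash for the trace

  def _download(self):
    try:
      raise IOError('download failed')  # stand-in for the real download
    except Exception as exc:
      self.error = exc
      self._error_trace = traceback.format_exc()
============================================================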
When reading ORPort data that exceeded a hardcoded (and arbitrary) buffer size
we cropped the content. This was caught by starlight when attempting to use
one of our demo scripts...
https://trac.torproject.org/projects/tor/ticket/28961
https://stem.torproject.org/tutorials/examples/download_descriptor.html
Original traceback:
File "/home/atagar/Desktop/stem/stem/descriptor/remote.py", line 589, in _download_descriptors
self.content, self.reply_headers = _download_from_orport(endpoint, self.compression, self.resource)
File "/home/atagar/Desktop/stem/stem/descriptor/remote.py", line 998, in _download_from_orport
response = b''.join([cell.data for cell in circ.send(RelayCommand.DATA, request, stream_id = 1)])
File "/home/atagar/Desktop/stem/stem/client/__init__.py", line 268, in send
decrypted_cell, backward_key, backward_digest = stem.client.cell.RelayCell.decrypt(self.relay.link_protocol, encrypted_cell, self.backward_key, self.backward_digest)
File "/home/atagar/Desktop/stem/stem/client/cell.py", line 412, in decrypt
raise stem.ProtocolError('RELAY cells should be %i bytes, but received %i' % (link_protocol.fixed_cell_length, len(content)))
ProtocolError: RELAY cells should be 512 bytes, but received 464
I'm unhappy with this approach, but after three days of chewing on this it's
the least bad approach I've come up with and seems to work. Patches welcome if
there's a smarter way of going about this.
The message we give when RELAY cells receive an unexpected response is pretty
bad...
ProtocolError: Circuit response should be a series of RELAY cells, but
received an unexpected size for a response: 4048
Instead checking the cell types, providing a more descriptive error if they
mismatch. This doesn't fix the issue I'm trying to solve, but it gets me a bit
closer to the true problem of ticket #28961...
ProtocolError: RELAY cells should be 512 bytes, but received 464
Our remote module needs to retain and later rethrow exceptions, which makes
stacktraces less than helpful.
Both python 2.x and 3.x have mechanisms for preserving stacktraces but they
both rely on language syntax rather than libraries, so we cannot use either
without breaking compatibility with the other version.
As such, opting for the least bad option I can think of, which is to encode
the original stacktrace within our message.
As mentioned in the code's comment we'll opt for something better when we
drop python 2.x support.
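For instance, a minimal sketch of the pattern (names here are illustrative)...

============================================================
Sketch
============================================================
import traceback

def download():
  raise IOError('unable to reach the directory authority')  # stand-in

error, error_trace = None, None

try:
  download()
except Exception as exc:
  error, error_trace = exc, traceback.format_exc()

# later, rethrow with the original trace embedded in the message

if error:
  raise IOError('%s\n\nOriginal traceback:\n%s' % (error, error_trace))
============================================================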
Honestly I'm not digging in too much, but DocTor has started providing
me with notifications of...
Unable to retrieve the present server descriptors...
source: http://204.13.164.118:80/tor/server/all
time: 12/20/2018 11:44
error: Content conform to being a server descriptor:
We stripped annotation whitespace when we had descriptor content to parse, but
not when we didn't. No reason I can think of not to do so in both cases.
Recognize tor's new StaleDesc flag. This hasn't yet made its way into the
spec...
https://trac.torproject.org/projects/tor/ticket/28887
... but tor's changelog has a nice description of it. This removes a couple
more 'missing capability' notices from our test runs...
[Flag (consensus)] StaleDesc
[Flag (microdescriptor consensus)] StaleDesc
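Assuming the new member follows the enumeration's existing naming convention
it can be referenced as...

>>> import stem
>>> stem.Flag.STALEDESC
'StaleDesc'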
Oops! I noted these in the docs when they were added to the spec but I forgot
to add them to the enumeration. With this the following 'new capability'
notices no longer appear after running our tests...
[Signal] ACTIVE
[Signal] DORMANT
Adding an example from https://blog.atagar.com/november2018/ for using
get_detached_signatures(). Particularly important for this method since its
results vary by the time of day.
Waste not, want not. I wrote this demo script for my recent status report
(https://blog.atagar.com/november2018/), but on reflection it makes a good
example for how to use our new digest methods.
Adding a digest() method to our base NetworkStatusDocument class. The spec is
vague on how this is calculated (#28664), but through experimentation I figured
out that the range runs up through the first 'directory-signature ' keyword,
including the space that follows it.
To determine this I fetched the detached signatures and upcoming consensus
during the voting period (minutes 55-60 of the hour), and compared our
calculated result with that...
============================================================
demo.py script
============================================================
import stem.descriptor
desc = next(stem.descriptor.parse_file(
  '/home/atagar/Desktop/next_consensus',
  descriptor_type = 'network-status-consensus-3 1.0',
  document_handler = stem.descriptor.DocumentHandler.DOCUMENT,
))
print('digest: %s' % desc.digest())
============================================================
% curl http://128.31.0.39:9131/tor/status-vote/next/consensus-signatures > next_sigs
% curl http://128.31.0.39:9131/tor/status-vote/next/consensus > next_consensus
% grep consensus-digest next_sigs
consensus-digest 296BA01987256A1C8EFB20E17667152DCFA50755
% python demo.py
digest: 296BA01987256A1C8EFB20E17667152DCFA50755
Does anyone use this? Written in response to a request from Karsten when I
first began stem, I've never heard of anyone actually using it. To simplify,
let's drop it.
Huh. Not sure why I added these to subclasses rather than their common parent.
Maybe there's a reason that will make me regret this, but certainly seems to
work.
Our data directory has up to two microdescriptor files: cached-microdescs and
cached-microdescs.new.
If the former is unavailable but the latter is present we should use it...
https://trac.torproject.org/projects/tor/ticket/28508
Maybe more important, when looking into this I realized that our attempt to get
tor's data directory stacktraces if it isn't explicitly set in the torrc...
>>> list(controller.get_microdescriptors())
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "/home/atagar/Desktop/stem/stem/control.py", line 490, in wrapped
for val in func(self, *args, **kwargs):
File "/home/atagar/Desktop/stem/stem/control.py", line 1791, in get_microdescriptors
if not os.path.exists(data_directory):
File "/usr/lib/python2.7/genericpath.py", line 26, in exists
os.stat(path)
TypeError: coercing to Unicode: need string or buffer, NoneType found
Changed it so we'll instead provide a generic exception saying we were unable
to determine the data directory. Fortunately this whole thing is a fallback, so
eventually we'll be able to remove it.
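A sketch of both fixes (the helper and message here are illustrative rather
than stem's actual code)...

============================================================
Sketch
============================================================
import os

def microdescriptor_file(data_directory):
  if data_directory is None:
    raise IOError("Unable to determine tor's data directory")

  path = os.path.join(data_directory, 'cached-microdescs')

  # fall back to cached-microdescs.new if the primary file is absent

  if not os.path.exists(path):
    path = os.path.join(data_directory, 'cached-microdescs.new')

  return path
============================================================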
Adding a parser for detached signatures, per irl's request...
https://trac.torproject.org/projects/tor/ticket/28495
You can use stem.descriptor.remote to download these, or parse with
DetachedSignature.from_str(). However, you cannot use parse_file()
until we have a @type annotation for these...
https://trac.torproject.org/projects/tor/ticket/28615
These are only available for download during five minutes of each hour, making
them highly clunky to use, but irl suggested he might change that (hope so!). At
present during the window when they're available they can be fetched as
follows...
============================================================
Example script
============================================================
import stem.descriptor.remote
detached_sigs = stem.descriptor.remote.get_detached_signatures().run()[0]
for i, sig in enumerate(detached_sigs.signatures):
  print('Signature %i is from %s' % (i + 1, sig.identity))
============================================================
When available (minutes 55-60 of the hour)
============================================================
% python demo.py
Signature 1 is from 0232AF901C31A04EE9848595AF9BB7620D4C5B2E
Signature 2 is from 14C131DFC5C6F93646BE72FA1401C02A8DF2E8B4
Signature 3 is from 23D15D965BC35114467363C165C4F724B64B4F66
Signature 4 is from 27102BC123E7AF1D4741AE047E160C91ADC76B21
Signature 5 is from 49015F787433103580E3B66A1707A00E60F2D15B
Signature 6 is from D586D18309DED4CD6D57C18FDB97EFA96D330566
Signature 7 is from E8A9C45EDE6D711294FADF8E7951F4DE6CA56B58
Signature 8 is from ED03BB616EB2F60BEC80151114BB25CEF515B226
Signature 9 is from EFCBE720AB3A82B99F9E953CD5BF50F7EEFC7B97
============================================================
When unavailable (minutes 0-55 of the hour)
============================================================
% python demo.py
Traceback (most recent call last):
File "demo.py", line 3, in <module>
detached_sigs = stem.descriptor.remote.get_detached_signatures().run()[0]
File "/home/atagar/Desktop/stem/stem/descriptor/remote.py", line 476, in run
return list(self._run(suppress))
File "/home/atagar/Desktop/stem/stem/descriptor/remote.py", line 487, in _run
raise self.error
urllib2.HTTPError: HTTP Error 404: Not found
Nick encountered another error from this test...
======================================================================
ERROR: test_connections_by_ss
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/nickm/src/stem/test/integ/util/connection.py", line 50, in test_connections_by_ss
self.check_resolver(Resolver.SS)
File "/home/nickm/src/stem/test/require.py", line 58, in wrapped
return func(self, *args, **kwargs)
File "/home/nickm/src/stem/test/integ/util/connection.py", line 28, in check_resolver
connections = get_connections(resolver, process_pid = runner.get_pid())
File "/home/nickm/src/stem/stem/util/connection.py", line 300, in get_connections
raise IOError('No results found using: %s' % resolver_command)
IOError: No results found using: ss -nptu
----------------------------------------------------------------------
I'm unsure why this test is so flaky for him. Earlier I attempted to mitigate
this by catching OSErrors but on reflection what he was really getting were
IOErrors. Python *said* it was an OSError but that's because python3 has made
IOError an alias...
https://stackoverflow.com/questions/29347790/difference-between-ioerror-and-oserror
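For instance...

% python3 -c 'print(IOError is OSError)'
True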
In Stem 2.x I should probably replace IOError throughout our codebase with
OSError.
Oops, my bad. Caught by teor on...
https://trac.torproject.org/projects/tor/ticket/28571
Our tests passed with python 2.7 (which is what I usually run), but python
3.x's filter() was changed to provide an iterator rather than a list, so we
need to normalize.
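A minimal illustration of the difference (the list here is arbitrary)...

% python2 -c "print(filter(None, ['a', '', 'b']))"
['a', 'b']
% python3 -c "print(filter(None, ['a', '', 'b']))"
<filter object at 0x...>
% python3 -c "print(list(filter(None, ['a', '', 'b'])))"
['a', 'b']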
Revamp digesting of Microdescriptors in the same way as server and extrainfo
descriptors...
https://trac.torproject.org/projects/tor/ticket/28398
This is definitely better *but* is backward incompatible with the class' old
'digest' attribute. Unfortunately I can't avoid this. The method needs to be
called digest() for consistency, and python cannot have name conflicts between
methods and attributes.
The old digest value was hex rather than base64 encoded which made it
relatively useless (it couldn't be used to fetch or validate microdescriptors,
the sole point of the damn thing) so fingers crossed that no one was using it.
I try very hard to provide backward compatibility in minor version bumps of
Stem but in this case I don't think we should be a slave to that here.
We deprecated get_microdescriptors() because tor hadn't implemented it on
DirPorts but now it has. Not only that but our existing method just works
(neat!). Undeprecating get_microdescriptors(), expanding its pydocs, adding
a test, and adding an alias to the base module to match other descriptor
types.
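For example (the digest below is a placeholder; microdescriptors are requested
by the base64 digests found on a consensus's 'm' lines)...

import stem.descriptor.remote

digest = 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA'  # placeholder

for microdescriptor in stem.descriptor.remote.get_microdescriptors([digest]).run():
  print(microdescriptor.onion_key)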
Router status entries don't have their own @type annotation, so our from_str()
method could not provide them. Implementing their own from_str() method so
things like RouterStatusEntryV3.from_str() will work.
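For example, assuming 'entry_content' holds the bytes of a single router
status entry...

from stem.descriptor.router_status_entry import RouterStatusEntryV3

entry = RouterStatusEntryV3.from_str(entry_content)
print(entry.fingerprint)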
Bah. Back when I added our 'digest' attribute to RouterStatusEntryMicroV3 I
tried to be consistent by making all our hashes hex. However, this was a
mistake. Uses of the microdescriptor digest expect base64, so deprecating the
'digest' attribute in favor of a new 'microdescriptor_digest' attribute that's
base64.
As requested by irl [1] adding a from_str() method to our Descriptor class,
similar to ControlMessage's from_str() [2].
This method makes it much simpler to parse descriptors from byte strings.
Unlike parse_file() this defaults to providing a single descriptor, with a
'multiple = True' flag that should be provided if your content has more than
one...
my_server_descriptor = RelayServerDescriptor.from_str(content)
consensus_relays = NetworkStatusDocumentV3.from_str(consensus_content, multiple = True)
[1] https://trac.torproject.org/projects/tor/ticket/28450
[2] https://stem.torproject.org/api/response.html#stem.response.ControlMessage.from_str
A warning message suggests wrapping bytes in BytesIO and then calling
parse_file(), but this is a simple step that could be included as a
convenience function.
In my particular use case, I'm loading from a file but I'd like to
perform the read asynchronously using asyncio.
https://trac.torproject.org/projects/tor/ticket/28450
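A minimal sketch of that convenience (our from_str() also handles the
'multiple' flag; this shows just the core idea)...

import io
import stem.descriptor

def descriptors_from_str(content, **kwargs):
  # hypothetical helper: wrap raw bytes so parse_file() can treat them
  # like a file

  return list(stem.descriptor.parse_file(io.BytesIO(content), **kwargs))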
When I first began Stem reading from cached descriptors in tor's data directory
was our primary mechanism to get descriptor data. This is no longer the case.
Mostly we use stem.descriptor.remote or the control port.
I included these annotation methods for completeness (ie. 'hey, there's data
here so let's expose it'). That said, I've never heard of anyone actually
finding this to be useful, and given that it's extremely rare for people to
even *get* descriptors that have these annotations let's just drop them.
Brought to mind thanks to a discussion about annotations with irl...
https://trac.torproject.org/projects/tor/ticket/28503
Our digest methods rightfully raise a NotImplementedError if I bugger up
and add to the DigestEncoding enumeration without actually implementing it. But
if users provide other bad data for an encoding argument we should provide a
ValueError instead.
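A sketch of the distinction (stand-ins below rather than stem's actual code)...

============================================================
Sketch
============================================================
import base64
import hashlib

DigestEncoding = ('RAW', 'HEX', 'BASE64')  # stand-in for our enumeration

def encode_digest(digest, encoding):
  if encoding == 'HEX':
    return digest.hexdigest().upper()
  elif encoding == 'BASE64':
    return base64.b64encode(digest.digest()).rstrip(b'=').decode('utf-8')
  elif encoding in DigestEncoding:
    raise NotImplementedError('%s encoding is not implemented yet' % encoding)  # our bug
  else:
    raise ValueError('%s is not a recognized digest encoding' % encoding)  # caller's bug

encode_digest(hashlib.sha1(b'content'), 'HEX')
# encode_digest(hashlib.sha1(b'content'), 'RAW')      # raises NotImplementedError
# encode_digest(hashlib.sha1(b'content'), 'bad_arg')  # raises ValueError
============================================================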
Ok, that was dumb. The whole point of my minor doc adjustment was to call
run()! Stupid me.
https://trac.torproject.org/projects/tor/ticket/28400
Ran this demo both successfully and with an exception hardcoded in run() to
ensure I didn't bugger it up again. :P
Iain made a great point that it's tougher to calculate descriptor digests than
it should be...
https://trac.torproject.org/projects/tor/ticket/28398
Digest type and encoding vary by their use. Mostly our spec sticks with
sha1/hex or sha256/base64, but sometimes it differs. For instance, the consensus
cites sha1/base64 server descriptor digests, whereas according to Karsten, Tor
Metrics uses sha1/hex digests for filenames.
Presently server and extrainfo descriptors are the only classes with digest()
methods. Microdescriptors should (consensus votes cite microdescriptor digests)
but nobody has asked for those so we'll cross that bridge when we come to it.
This branch expands our digest() methods in the following ways...
* Add a hash_type argument so callers can specify sha1 or sha256
hashing.
* Add an encoding argument so callers can specify hex, base64, or
no encoding.
* In our digest documentation cite what references that descriptor
type's digest (ie. 'what the heck is this useful for?').
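For instance, a usage sketch with the enumerations mentioned elsewhere in
these notes (the path and member names are assumptions)...

>>> import stem.descriptor
>>> desc = next(stem.descriptor.parse_file('/tmp/my_server_desc', 'server-descriptor 1.0'))
>>> desc.digest(stem.descriptor.DigestHashType.SHA1, stem.descriptor.DigestEncoding.HEX)
>>> desc.digest(stem.descriptor.DigestHashType.SHA256, stem.descriptor.DigestEncoding.BASE64)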
Our new digest type and encoding arguments make our _digest_for_content()
helper a poor fit. The only useful thing this helper does is narrow our
content to a specific range. As such adding a helper that does only that.
This doesn't yet change any of our _digest_for_content() callers. That's
next.
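A sketch of what such a helper might look like (its name and signature are
assumptions on my part)...

============================================================
Sketch
============================================================
def _content_range(content, start = None, end = None):
  # narrow content to the span from the first occurrence of 'start'
  # through just before the first subsequent occurrence of 'end'

  start_index = content.find(start) if start else 0
  end_index = content.find(end, start_index) if end else len(content)

  if (start and start_index == -1) or (end and end_index == -1):
    raise ValueError('content is missing the expected boundary')

  return content[start_index:end_index]
============================================================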
Digests are defined by a hash type and encoding tuple. I was using the first to
imply the second, but this doesn't always work. For example, the consensus
cites base64 encoded sha1 server descriptor digests but stem provides hex
encoded sha1s due to the following discussion with Karsten (subject: "Stem
Sphinx Documentation", 6/7/12).
>> - Why does digest() return the base64-encoded digest, not the
>> hex-formatted one? Network statuses are the only documents in Tor using
>> base64 (or rather, a variant of it without trailing ='s), so it's easier
>> to convert those to hex than to convert everything else to base64. Now,
>> if you switch to hex, you'll only have to decide between lower-case and
>> upper-case. I think Tor and metrics-lib use upper-case hex in most places.
>
> I went with base64 because I thought that this was only useful for
> comparing with the network status. What uses the hex encoded digest?
The hex-encoded server descriptor digest is used as file name in metrics
tarballs.
The (decoded) descriptor digest is used to verify the descriptor signature.
Other reasons for hex-encoding the digest() result is that the digest()
in extra-info descriptors should return the hex-encoded digest, too, or
you wouldn't be able to compare it to the extra-info-digest line in
server descriptors. Having both methods return a different encoding
would be confusing.
Oh, and router-digest lines in sanitized bridge descriptors also contain
the hex-encoded digest. You wouldn't want to convert that to base64
before writing it to the digest variable, nor would you want digest()
and digest to return differently encoded digests.
As such I'm going to leave both the hashing and encoding up to our callers
*and* cite all digest uses I know of in our digest method's pydoc.
When referencing digests tor now includes both sha1 and sha256 digests. As
such, beginning to expand our digest() methods to do the same...
https://trac.torproject.org/projects/tor/ticket/28398
I'm starting with extrainfo descriptors because their hashes are referenced by
our server descriptors, providing easy test data to see if we're doing this
right or not...
% curl http://128.31.0.39:9131/tor/server/fp/3BB34C63072D9D10E836EE42968713F7B9325F66 > /tmp/my_server_desc
% curl http://128.31.0.39:9131/tor/server/extra/3BB34C63072D9D10E836EE42968713F7B9325F66 > /tmp/my_extrainfo_desc
% grep extra /tmp/my_server_desc
extra-info-digest 5BEBC13FDA976050D3A0632EE6508FD1BF1D1750 FNzZZtYPlMjBeb78Wv0zS5DUIPGB3TrpJ3k79MZURMU
% python
>>> import stem.descriptor
>>> desc = next(stem.descriptor.parse_file('/tmp/my_extrainfo_desc', 'extra-info 1.0'))
# Good! The below shows that our sha1 digest matches what our server
# descriptor says it should be.
>>> desc.digest(stem.descriptor.DigestHashType.SHA1)
'5BEBC13FDA976050D3A0632EE6508FD1BF1D1750'
# Bad! This should *not* mismatch. >:(
>>> desc.digest(stem.descriptor.DigestHashType.SHA256)
'ciuNPeDfpiBQfowP7N1g7jPsuHR9fwceTTFyknNdyvY'
Unfortunately while I'm clearly doing something wrong, I'm puzzled about why we
mismatch. Our dir-spec's extra-info-digest description is pretty clear...
"sha256-digest" is a base64-encoded SHA256 digest of the extra-info
document, computed over the same data.
We're definitely hashing the same data (otherwise the sha1 wouldn't match).
This code also certainly seems to be doing exactly what the spec says (base64
encoding the sha256 digest). So... huh.
Gonna punt this over to irl who requested this to see if he can spot what I'm
doing wrong.
Oops. As mentioned by irl the surrounding text talks about using run() so our
demonstration should too. This example was correct (due to its 'block = True'
argument), but better to show what we say we show.
https://trac.torproject.org/projects/tor/ticket/28400