Fix typos

This commit is contained in:
Jakub Wilk 2024-01-19 23:20:13 +01:00
parent dcc9d1a571
commit 7a2348e4cc
10 changed files with 38 additions and 38 deletions

View File

@ -7,7 +7,7 @@ In the interest of shortening what is written here, I am going to start where we
For the earlier history up to 2006 and the code up until Python 2.4, which I find interesting, look at that link.
Sometime around 2014 was the dawn of ["uncompyle" and PyPI](https://pypi.python.org/pypi/uncompyle/1.1) — the era of
public version control. Dan Pascu's code although not public used [darcs](http://darcs.net/) for version control. I converted the darcs to to git and put this at [decompyle-2.4](https://github.com/rocky/decompile-2.4).
public version control. Dan Pascu's code although not public used [darcs](http://darcs.net/) for version control. I converted the darcs repository to git and put this at [decompyle-2.4](https://github.com/rocky/decompile-2.4).
# uncompyle, unpyc
@ -17,7 +17,7 @@ The project exists not only on [github](https://github.com/gstarnberger/uncompyl
[bitbucket](https://bitbucket.org/gstarnberger/uncompyle) and later the defunct [google
code](https://code.google.com/archive/p/unpyc/) under the name _unpyc_. The git/svn history goes back to 2009. Somewhere in there the name was changed from "decompyle" to "unpyc" by Keknehv, and then to "uncompyle" by Guenther Starnberger.
The name Thomas Grainger isn't found in (m)any of the commits in the several years of active development. First Keknehv worked on this up to Python 2.5 or so while acceping Python bytecode back to 2.0 or so. Then "hamled" made a few commits earler on, while Eike Siewertsen made a few commits later on. But mostly "wibiti", and Guenther Starnberger got the code to where uncompyle2 was around 2012.
The name Thomas Grainger isn't found in (m)any of the commits in the several years of active development. First Keknehv worked on this up to Python 2.5 or so while accepting Python bytecode back to 2.0 or so. Then "hamled" made a few commits earlier on, while Eike Siewertsen made a few commits later on. But mostly "wibiti", and Guenther Starnberger got the code to where uncompyle2 was around 2012.
While John Aycock and Hartmut Goebel were well versed in compiler technology, those that have come afterwards don't seem to have been as facile in it. Furthermore, documentation or guidance on how the decompiler code worked, comparison to a conventional compiler pipeline, how to add new constructs, or debug grammars was weak. Some of the grammar tracing and error reporting was a bit weak as well.
@ -38,7 +38,7 @@ I started working on this late 2015, mostly to add fragment support. In that, I
* this project - grammar and semantic actions for decompiling
([uncompyle6](https://pypi.python.org/pypi/uncompyle6)).
`uncompyle6`, abandons the idea found in some 2.7 version of `uncompyle` that support Python 2.6 and 2.5 by trying to rewite opcodes at the bytecode level.
`uncompyle6`, abandons the idea found in some 2.7 version of `uncompyle` that support Python 2.6 and 2.5 by trying to rewrite opcodes at the bytecode level.
Having a grammar per Python version is simpler to maintain, cleaner and it scales indefinitely.
@ -68,13 +68,13 @@ project is largely by Michael Hansen and Darryl Pogue. If they supported getting
# So you want to write a decompiler for Python?
If you think, as I am sure will happen in the future, "hey, I can just write a decompiler from scratch and not have to deal with all all of the complexity in uncompyle6", think again. What is likely to happen is that you'll get at best a 90% solution working for a single Python release that will be obsolete in about a year, and more obsolete each subsequent year.
If you think, as I am sure will happen in the future, "hey, I can just write a decompiler from scratch and not have to deal with all of the complexity in uncompyle6", think again. What is likely to happen is that you'll get at best a 90% solution working for a single Python release that will be obsolete in about a year, and more obsolete each subsequent year.
Writing a decompiler for Python gets harder as it Python progresses. Writing decompiler for Python 3.7 isn't as easy as it was for Python 2.2. For one thing, now that Python has a well-established AST, that opens another interface by which code can be improved.
In Python 3.10 I am seeing (for the first time?) bytecode getting moved around so that it is no longer the case that line numbers have to be strictly increasing as bytecode offsets increase. And I am seeing dead code appear as well.
That said, if you still feel you want to write a single version decompiler, look at the test cases in this project and talk to me. I may have some ideas that I haven't made public yet. See also what I've wrtten about the on how this code works and on [decompilation in dynamic runtime languages](http://rocky.github.io/Deparsing-Paper.pdf) in general.
That said, if you still feel you want to write a single version decompiler, look at the test cases in this project and talk to me. I may have some ideas that I haven't made public yet. See also what I've written about the on how this code works and on [decompilation in dynamic runtime languages](http://rocky.github.io/Deparsing-Paper.pdf) in general.
@ -82,8 +82,8 @@ That said, if you still feel you want to write a single version decompiler, look
This project deparses using an Earley-algorithm parse. But in order to do this accurately, the process of tokenization is a bit more involved in the scanner. We don't just disassemble bytecode and use the opcode name. That aspect hasn't changed from the very first decompilers. However understanding _what_ information needs to be made explicit and what pseudo instructions to add that accomplish this has taken some time to understand.
Earley-algorithm parsers have gotten negative press, most notably by the dragon book. Having used this a bit, I am convinced having a system that handles ambiguous grammars is the right thing to do and matches the problem well. Iin practice the speed of the parser isn't a problem when one understand what's up. And this has taken a little while to understand.
Earley-algorim parsers for context free languages or languages that are to a large extent context free and tend to be linear and the grammar stears towards left recursive rules. There is a technique for improving LL right recursion, but our parser doesn't have that yet.
Earley-algorithm parsers have gotten negative press, most notably by the dragon book. Having used this a bit, I am convinced having a system that handles ambiguous grammars is the right thing to do and matches the problem well. In practice the speed of the parser isn't a problem when one understand what's up. And this has taken a little while to understand.
Earley-algorithm parsers for context free languages or languages that are to a large extent context free and tend to be linear and the grammar steers towards left recursive rules. There is a technique for improving LL right recursion, but our parser doesn't have that yet.
The [decompiling paper](http://rocky.github.io/Deparsing-Paper.pdf) discusses these aspects in a more detail.

38
NEWS.md
View File

@ -8,7 +8,7 @@
* Correct 2.5-7 relative import formatting
* Miscellaneous bug fixing
* remove \n in lambda
* Python 2.6 gramar cleanup
* Python 2.6 grammar cleanup
* Correct some Python 2.6 chain compare decompilation
* Ensure no parenthesis subscript slices
* Correct 2.x formatting "slice2" nonterminal
@ -35,7 +35,7 @@
================
* Fragment parsing was borked. This means deparsing in trepan2/trepan3k was broken
* 3.7+: narrow precedence for call tatement
* 3.7+: narrow precedence for call statement
* del_stmt -> delete to better match Python AST
* 3.8+ Add another `forelsestmt` (found only in a loop)
* 3.8+ Add precedence on walrus operator
@ -66,7 +66,7 @@ Mostly small miscellaneous bug fixes
3.7.1: 2020-6-12 Fleetwood66
====================================================
Released to pick up new xdis version which has fixes to read bytestings better on 3.x
Released to pick up new xdis version which has fixes to read bytestrings better on 3.x
* Handle 3.7+ "else" branch removal adAs seen in `_cmp()` of `python3.8/distutils/version.py` with optimization `-O2`
* 3.6+ "with" and "with .. as" grammar improvements
@ -89,10 +89,10 @@ More upheaval in xdis which we need to track here.
3.6.6: 2020-4-20 Love in the time of Cholera
============================================
The main reason for this release is an incompatablity bump in xdis which handles
The main reason for this release is an incompatibility bump in xdis which handles
3.7 SipHash better.
* Go over "yield" as an expression precidence
* Go over "yield" as an expression precedence
* Some small alignment with code in decompyle3 for "or" and "and" was done
@ -118,7 +118,7 @@ The main focus in this release was fix some of the more glaring problems creapt
`uncompyle6` code is at a plateau where what is most needed is a code refactoring. In doing this, until everything refactored and replaced, decomplation may get worse.
Therefore, this release largely serves as a checkpoint before more major upheaval.
The upheaval, in started last release, I believe the pinnicle was around c90ff51 which wasn't a release. I suppose I should tag that.
The upheaval, in started last release, I believe the pinnacle was around c90ff51 which wasn't a release. I suppose I should tag that.
After c90ff5, I started down the road of redoing control flow in a more comprehensible, debuggable, and scalable way. See [The Control Flow Mess](https://github.com/rocky/python-uncompyle6/wiki/The-Control-Flow-Mess)
@ -132,7 +132,7 @@ In the decompyle3 code, I've gone down the road making the grammar goal symbol b
I cringe in thinking about how the code has lived for so long without noticing such a simple stupidity, and lapse of sufficient thought.
Some stats from testing. The below give numbers of decompiled tests from Python's test suite which succesfully ran
Some stats from testing. The below give numbers of decompiled tests from Python's test suite which successfully ran
```
Version test-suites passing
@ -175,14 +175,14 @@ On the most recent Python versions I regularly decompile thousands of Python pro
Does this mean the decompiler works perfectly? No. There are still a dozen or so failing programs, although the actual number of bugs is probably smaller though.
However, in perparation of a more major refactoring of the parser grammar, this release was born.
However, in preparation of a more major refactoring of the parser grammar, this release was born.
In many cases, decompilation is better. But there are some cases where decompilation has gotten worse. For lack of time (and interest) 3.0 bytecode suffered a hit. Possibly some code in the 3.x range did too. In time and with cleaner refactored code, this will come back.
Commit c90ff51 was a local maxiumum before, I started reworking the grammar to separate productions that were specific to loops versus those that are not in loops.
In the middle of that I added another grammar simplication to remove singleton productions of the form `sstmts-> stmts`. These were always was a bit ugly, and complicated output.
Commit c90ff51 was a local maximum before, I started reworking the grammar to separate productions that were specific to loops versus those that are not in loops.
In the middle of that I added another grammar simplification to remove singleton productions of the form `sstmts-> stmts`. These were always was a bit ugly, and complicated output.
At any rate if decompilation fails, you can try c90ff51. Or another decompiler. `unpyc37` is pretty good for 3.7. wibiti `uncompyle2` is great for 2.7. `pycdc` is mediocre for Python before 3.5 or so, and not that good for the most recent Python. Geerally these programs will give some sort of answer even if it isn't correct.
At any rate if decompilation fails, you can try c90ff51. Or another decompiler. `unpyc37` is pretty good for 3.7. wibiti `uncompyle2` is great for 2.7. `pycdc` is mediocre for Python before 3.5 or so, and not that good for the most recent Python. Generally these programs will give some sort of answer even if it isn't correct.
decompyle3 isn't that good for 3.7 and worse for 3.8, but right now it does things no other Python decompiler like `unpyc37` or `pycdc` does. For example, `decompyle3` handles variable annotations. As always, the issue trackers for the various programs will give you a sense for what needs to be done. For now, I've given up on reporting issues in the other decompilers because there are already enough issues reported, and they are just not getting fixed anyway.
@ -213,7 +213,7 @@ indicate when an import contains a dotted import. Similarly, code for
3.7 `import .. as ` is basically the same as `from .. import`, the
only difference is the target of the name changes to an "alias" in the
former. As a result, the disambiguation is now done on the semantic
action side, rathero than in parsing grammar rules.
action side, rather than in parsing grammar rules.
Some small specific fixes:
@ -246,13 +246,13 @@ versions better. This however comes with a big decompilation speed
penalty. When we redo control flow this should go back to normal, but
for now, accuracy is more important than speed.
Another `assert` transform rule was added. Parser rules to distingish
Another `assert` transform rule was added. Parser rules to distinguish
`try/finally` in 3.8 were added and we are more stringent about what
can be turned into an `assert`. There was some grammar cleanup here
too.
A number of small bugs were fixed, and some administrative changes to
make `make check-short` really be short, but check more throughly what
make `make check-short` really be short, but check more thoroughly what
it checks. minimum xdis version needed was bumped to include in the
newer 3.6-3.9 releases. See the `ChangeLog` for details.
@ -261,7 +261,7 @@ newer 3.6-3.9 releases. See the `ChangeLog` for details.
=============================
The main focus in this release was more accurate decompilation especially
for 3.7 and 3.8. However there are some improvments to Python 2.x as well,
for 3.7 and 3.8. However there are some improvements to Python 2.x as well,
including one of the long-standing problems of detecting the difference between
`try ... ` and `try else ...`.
@ -269,11 +269,11 @@ With this release we now rebase Python 3.7 on off of a 3.7 base; This
is also as it is (now) in decompyle3. This facilitates removing some of the
cruft in control-flow detection in the 2.7 uncompyle2 base.
Alas, decompilation speed for 3.7 on is greatly increased. Hopefull
Alas, decompilation speed for 3.7 on is greatly increased. Hopefully
this is temporary (cough, cough) until we can do a static control flow
pass.
Finally, runing in 3.9-dev is tolerated. We can disassemble, but no parse tables yet.
Finally, running in 3.9-dev is tolerated. We can disassemble, but no parse tables yet.
3.5.1 2019-11-17 JNC
@ -566,7 +566,7 @@ function calls and definitions.
- Misc pydisasm fixes
- Weird comprehension bug seen via new loctraceback
- Fix Python 3.5+ CALL_FUNCTION_VAR and BUILD_LIST_UNPACK in call; with this
we can can handle 3.5+ f(a, b, *c, *d, *e) now
we can handle 3.5+ f(a, b, *c, *d, *e) now
2.15.1 2018-02-05
=====================
@ -661,7 +661,7 @@ Overall: better 3.6 decompiling and some much needed code refactoring and cleanu
- Handle `EXTENDED_ARGS` better. While relevant to all Python versions it is most noticeable in
version 3.6+ where in switching to wordcodes the size of operands has been reduced from 2^16
to 2^8. `JUMP` instruction then often need EXTENDED_ARGS.
- Refactor find_jump_targets() with via working of of instructions rather the bytecode array.
- Refactor find_jump_targets() with via working of instructions rather the bytecode array.
- use `--weak-verify` more and additional fuzzing on verify()
- fragment parser now ignores errors in nested function definitions; an parameter was
added to assist here. Ignoring errors may be okay because the fragment parser often just needs,

View File

@ -171,7 +171,7 @@ Expanding decompiler availability to multiple Python Versions
--------------------------------------------------------------
Above we mention decompiling multiple versions of bytecode from a
single Python interpreter. We we talk about having the decompiler
single Python interpreter. We talk about having the decompiler
runnable from multiple versions of Python, independent of the set of
bytecode that the decompiler supports.
@ -185,7 +185,7 @@ implemented correctly. These also make excellent programs to check
whether a program has decompiled correctly.
Aside from this, debugging can be easier as well. To assist
understanding bytcode and single stepping it see `x-python
understanding bytecode and single stepping it see `x-python
<https://pypi.org/project/x-python/>`_ and the debugger for it
`trepan-xpy <https://pypi.org/project/trepanxpy/>`_.

View File

@ -41,7 +41,7 @@ although compatible with the original intention, is yet a little bit
different. See this_ for more information.
Python fragment deparsing given an instruction offset is useful in
showing stack traces and can be encorporated into any program that
showing stack traces and can be incorporated into any program that
wants to show a location in more detail than just a line number at
runtime. This code can be also used when source-code information does
not exist and there is just bytecode. Again, my debuggers make use of
@ -161,8 +161,8 @@ Python syntax changes, you should use this option if the bytecode is
the right bytecode for the Python interpreter that will be checking
the syntax.
You can also cross compare the results with either another version of
`uncompyle6` since there are are sometimes regressions in decompiling
You can also cross compare the results with another version of
`uncompyle6` since there are sometimes regressions in decompiling
specific bytecode as the overall quality improves.
For Python 3.7 and 3.8, the code in decompyle3_ is generally
@ -199,7 +199,7 @@ On the lower end of Python versions, decompilation seems pretty good although
we don't have any automated testing in place for Python's distributed tests.
Also, we don't have a Python interpreter for versions 1.6, and 2.0.
In the Python 3 series, Python support is is strongest around 3.4 or
In the Python 3 series, Python support is strongest around 3.4 or
3.3 and drops off as you move further away from those versions. Python
3.0 is weird in that it in some ways resembles 2.6 more than it does
3.1 or 2.7. Python 3.6 changes things drastically by using word codes

View File

@ -1,5 +1,5 @@
# From 2.7 test_normalize.py
# Bug has to to with finding the end of the tryelse block. I think thrown
# Bug has to do with finding the end of the tryelse block. I think thrown
# off by the "continue". In instructions the COME_FROM for END_FINALLY
# was at the wrong offset because some sort of "rtarget" was adjust.

View File

@ -5,7 +5,7 @@ def bug(self, j, a, b):
self.parse_comment(a, b, report=3)
# From 3.6 fnmatch.py
# Bug was precidence parenthesis around decorator
# Bug was precedence parenthesis around decorator
import functools
@functools.lru_cache(maxsize=256, typed=True)

View File

@ -1,6 +1,6 @@
# 2.5.6 decimal.py
# Bug on 2.5 and 2.6 by incorrectly changing opcode to
# RETURN_VALUE to psuedo op: RETURN_END_IF
# RETURN_VALUE to pseudo op: RETURN_END_IF
def _formatparam(param, value=None, quote=True):
if value is not None and len(value) > 0:
if isinstance(value, tuple):

View File

@ -114,7 +114,7 @@ class Python35Parser(Python34Parser):
ifelsestmtl ::= testexpr c_stmts_opt jb_else else_suitel
# 3.5 Has jump optimization which can route the end of an
# "if/then" back to to a loop just before an else.
# "if/then" back to a loop just before an else.
jump_absolute_else ::= jb_else
jump_absolute_else ::= CONTINUE ELSE

View File

@ -558,7 +558,7 @@ class Python37Parser(Python37BaseParser):
ifelsestmtl ::= testexpr_cf c_stmts_opt jb_else else_suitel
# 3.5 Has jump optimization which can route the end of an
# "if/then" back to to a loop just before an else.
# "if/then" back to a loop just before an else.
jump_absolute_else ::= jb_else
jump_absolute_else ::= CONTINUE ELSE

View File

@ -2093,7 +2093,7 @@ def code_deparse(
)
# Just when you think we've forgotten about what we
# were supposed to to: Generate source from the Syntax ree!
# were supposed to do: Generate source from the Syntax tree!
deparsed.gen_source(deparsed.ast, co.co_name, customize)
deparsed.set_pos_info(deparsed.ast, 0, len(deparsed.text))