mirror of
https://github.com/upx/upx.git
synced 2024-12-12 15:05:56 +00:00
d453cc27a3
http://en.wikipedia.org/wiki/Binary_prefix for more details.
149 lines
6.2 KiB
Plaintext
149 lines
6.2 KiB
Plaintext
This document explains the concept of "filtering" in UPX. Basically
|
|
filtering is a data preprocessing method which could improve the
|
|
compression ratio of the files UPX processes.
|
|
|
|
Currently the filters UPX uses are all based on one very special
|
|
algorithm which is working well on ix86 executable files.
|
|
This is what upx calls the "naive" implementation. There is also a
|
|
"clever" method which works only with 32-bit executable file formats
|
|
and was first implemented in UPX.
|
|
|
|
Let's start with an example (from this point I assume a 32-bit file
|
|
format). Consider this code fragment:
|
|
|
|
00025970: E877410600 calln FatalError
|
|
00025975: 8B414C mov eax,[ecx+4C]
|
|
00025978: 85C0 test eax,eax
|
|
0002597A: 7419 je file:00025995
|
|
0002597C: 85F6 test esi,esi
|
|
0002597E: 7504 jne file:00025984
|
|
00025980: 89C6 mov esi,eax
|
|
00025982: EB11 jmps file:00025995
|
|
00025984: 39C6 cmp esi,eax
|
|
00025986: 740D je file:00025995
|
|
00025988: 83C4F4 add (d) esp,F4
|
|
0002598B: 68A0A91608 push 0816A9A0
|
|
00025990: E857410600 calln FatalError
|
|
00025995: FF45F4 inc [ebp-0C]
|
|
|
|
Here you can find two calls to a function called "FatalError". As you
|
|
probably know the compression ratio is better if the compressor engine
|
|
finds longer sequences of repeated strings. In this case the engine
|
|
sees the following two byte sequences:
|
|
|
|
E877 410600 8B and
|
|
E857 410600 FF.
|
|
|
|
So it can find a 3-byte-long match.
|
|
|
|
Now comes the trick. On ix86 near calls are encoded as 0xE8 then a 32
|
|
bit relative offset to the destination address. Let's see what
|
|
happens if the position of the call is added to that offset:
|
|
|
|
0x64177 + 0x25970 = 0x89AE7
|
|
0x64157 + 0x25990 = 0x89AE7
|
|
|
|
E8 E79A0800 8B
|
|
E8 E79A0800 FF
|
|
|
|
As you can see now the compressor engine finds a 5-byte-long match.
|
|
Which means, that we've just saved 2 bytes of compressed data. Not bad.
|
|
|
|
So this is the basic idea (the "naive" implementation). All we have to
|
|
do is to "filter" the uncompressed data using this method before
|
|
compression, and "unfilter" it after decompression. Simply go over the
|
|
memory, find 0xE8 bytes and process the next 4 bytes as specified
|
|
above.
|
|
|
|
Of course there are several possibilities where this scheme could be
|
|
improved. First, not only calls could be handled this way - near jumps
|
|
(0xE9 + 32-bit offset) could work similarly.
|
|
|
|
A second improvement could be if we limit this filtering only for the
|
|
area occupied by real code - there is no point in messing with general
|
|
data.
|
|
|
|
Another improvement comes if the byte order of the 32-bit offset is
|
|
reversed. Why? Here is another call which follows the above fragment:
|
|
|
|
000261FA: E8C9390600 calln ErrorF
|
|
|
|
0x639C9 + 0x261FA = 0x89BC3
|
|
|
|
E8 C39B 0800 compare this with
|
|
|
|
E8 E79A 0800
|
|
|
|
As you can see these two functions are quite close together, but the
|
|
compressor is not able to utilize this information (2-byte-long matches
|
|
are usually not useful) unless the byte order of the offsets are
|
|
reversed. In this case:
|
|
|
|
E8 0008 9AE7
|
|
|
|
E8 0008 9BC3
|
|
|
|
So, the compressor engine finds a 3-byte-long match here. This is a
|
|
nice improvement - now the engine utilizes the similarity of nearby
|
|
destinations too.
|
|
|
|
This is nice, but what happens when we find a "fake" call - ie. an 0xE8
|
|
which is part of another instruction? Like this:
|
|
|
|
0002A3B1: C745 E8 00000000 mov [ebp-18],00000000
|
|
|
|
In this case those nice 0x00 bytes are overwritten with some less
|
|
compressible data. This is the disadvantage of the "naive"
|
|
implementation.
|
|
|
|
So let's be clever and try to detect and process only "real" calls. In
|
|
UPX a simple method is used to find these calls. We simply check that
|
|
the destinations of these calls are inside the same area as the calls
|
|
themselves (so the above code is still a false positive, but it helps
|
|
generally). A better method would be to actually disassemble the code -
|
|
contributions are welcome :-)
|
|
|
|
But this is only half of the job. We can not simply process one call
|
|
then skip another one - the unfiltering process needs some information
|
|
to be able to reverse the filtering.
|
|
|
|
UPX uses the following idea, which works nicely. First we assume that
|
|
the size of the area that should be filtered is less than 16 MiB. Then
|
|
UPX scans over this area and keeps a record of the bytes that are
|
|
following the 0xE8 bytes. If we are lucky, there will be bytes that
|
|
were not found following 0xE8. These bytes are our candidates to be
|
|
used as markers.
|
|
|
|
Do you still remember that we assumed that the size of scanned area is
|
|
less than 16 MiB? Well, this means that when we process a real call, the
|
|
resulting offset will be less than 0x00FFFFFF too. So the MSB is always
|
|
0x00. Which is a nice place to store our marker. Of course we should
|
|
reverse the byte order in the resulting offset - so this marker will
|
|
appear just after the 0xE8 byte and not 4 bytes after it.
|
|
|
|
That's all. Just go over the memory area, identify the "real" calls,
|
|
and use this method to mark them. Then the job of the unfilter is very
|
|
easy - it just searches for a 0xE8 + marker sequence and does the
|
|
unfiltering if it finds one. It's clever, isn't it? :)
|
|
|
|
To tell you the truth it's not this simple in UPX. It can use an
|
|
additional parameter ("add_value") which makes things a little bit more
|
|
complicated (for example it can happen that a found marker is proven to
|
|
be unusable because of some overflow during an addition).
|
|
|
|
And the whole algorithm is optimized for simplicity on the unfiltering
|
|
side (as short and as fast assembly as possible - see stub/macros.ash),
|
|
which makes the filtering process a little more difficult (fcto_ml.ch,
|
|
fcto_ml2.ch, filteri.cpp).
|
|
|
|
As it can be seen in filteri.cpp, there are lots of variants of this
|
|
filtering implemented - native/clever, calls/jumps/calls&jumps,
|
|
reversed/unreversed offsets - a sum of 18 slightly different filters
|
|
(and another 9 variants for 16-bit programs).
|
|
|
|
You can select one of them using the command line parameter "--filter="
|
|
or try most of them with "--all-filters". Or just let upx use the one
|
|
we defined as the default for that executable format.
|
|
|
|
EOF
|