This document explains the concept of "filtering" in UPX. Basically filtering is a data preprocessing method which could improve the compression ratio of the files UPX processes. Currently the filters UPX uses are all based on one very special algorithm which is working well on ix86 executable files. This is what upx calls the "naive" implementation. There is also a "clever" method which works only with 32-bit executable file formats and was first implemented in UPX. Let's start with an example (from this point I assume a 32-bit file format). Consider this code fragment: 00025970: E877410600 calln FatalError 00025975: 8B414C mov eax,[ecx+4C] 00025978: 85C0 test eax,eax 0002597A: 7419 je file:00025995 0002597C: 85F6 test esi,esi 0002597E: 7504 jne file:00025984 00025980: 89C6 mov esi,eax 00025982: EB11 jmps file:00025995 00025984: 39C6 cmp esi,eax 00025986: 740D je file:00025995 00025988: 83C4F4 add (d) esp,F4 0002598B: 68A0A91608 push 0816A9A0 00025990: E857410600 calln FatalError 00025995: FF45F4 inc [ebp-0C] Here you can find two calls to a function called "FatalError". As you probably know the compression ratio is better if the compressor engine finds longer sequences of repeated strings. In this case the engine sees the following two byte sequences: E877 410600 8B and E857 410600 FF. So it can find a 3-byte-long match. Now comes the trick. On ix86 near calls are encoded as 0xE8 then a 32 bit relative offset to the destination address. Let's see what happens if the position of the call is added to that offset: 0x64177 + 0x25970 = 0x89AE7 0x64157 + 0x25990 = 0x89AE7 E8 E79A0800 8B E8 E79A0800 FF As you can see now the compressor engine finds a 5-byte-long match. Which means, that we've just saved 2 bytes of compressed data. Not bad. So this is the basic idea (the "naive" implementation). All we have to do is to "filter" the uncompressed data using this method before compression, and "unfilter" it after decompression. Simply go over the memory, find 0xE8 bytes and process the next 4 bytes as specified above. Of course there are several possibilities where this scheme could be improved. First, not only calls could be handled this way - near jumps (0xE9 + 32-bit offset) could work similarly. A second improvement could be if we limit this filtering only for the area occupied by real code - there is no point in messing with general data. Another improvement comes if the byte order of the 32-bit offset is reversed. Why? Here is another call which follows the above fragment: 000261FA: E8C9390600 calln ErrorF 0x639C9 + 0x261FA = 0x89BC3 E8 C39B 0800 compare this with E8 E79A 0800 As you can see these two functions are quite close together, but the compressor is not able to utilize this information (2-byte-long matches are usually not useful) unless the byte order of the offsets are reversed. In this case: E8 0008 9AE7 E8 0008 9BC3 So, the compressor engine finds a 3-byte-long match here. This is a nice improvement - now the engine utilizes the similarity of nearby destinations too. This is nice, but what happens when we find a "fake" call - ie. an 0xE8 which is part of another instruction? Like this: 0002A3B1: C745 E8 00000000 mov [ebp-18],00000000 In this case those nice 0x00 bytes are overwritten with some less compressible data. This is the disadvantage of the "naive" implementation. So let's be clever and try to detect and process only "real" calls. In UPX a simple method is used to find these calls. We simply check that the destinations of these calls are inside the same area as the calls themselves (so the above code is still a false positive, but it helps generally). A better method would be to actually disassemble the code - contributions are welcome :-) But this is only half of the job. We can not simply process one call then skip another one - the unfiltering process needs some information to be able to reverse the filtering. UPX uses the following idea, which works nicely. First we assume that the size of the area that should be filtered is less than 16 MiB. Then UPX scans over this area and keeps a record of the bytes that are following the 0xE8 bytes. If we are lucky, there will be bytes that were not found following 0xE8. These bytes are our candidates to be used as markers. Do you still remember that we assumed that the size of scanned area is less than 16 MiB? Well, this means that when we process a real call, the resulting offset will be less than 0x00FFFFFF too. So the MSB is always 0x00. Which is a nice place to store our marker. Of course we should reverse the byte order in the resulting offset - so this marker will appear just after the 0xE8 byte and not 4 bytes after it. That's all. Just go over the memory area, identify the "real" calls, and use this method to mark them. Then the job of the unfilter is very easy - it just searches for a 0xE8 + marker sequence and does the unfiltering if it finds one. It's clever, isn't it? :) To tell you the truth it's not this simple in UPX. It can use an additional parameter ("add_value") which makes things a little bit more complicated (for example it can happen that a found marker is proven to be unusable because of some overflow during an addition). And the whole algorithm is optimized for simplicity on the unfiltering side (as short and as fast assembly as possible - see stub/macros.ash), which makes the filtering process a little more difficult (fcto_ml.ch, fcto_ml2.ch, filteri.cpp). As it can be seen in filteri.cpp, there are lots of variants of this filtering implemented - native/clever, calls/jumps/calls&jumps, reversed/unreversed offsets - a sum of 18 slightly different filters (and another 9 variants for 16-bit programs). You can select one of them using the command line parameter "--filter=" or try most of them with "--all-filters". Or just let upx use the one we defined as the default for that executable format. EOF