From 3a1716db5818feb96054dcce325e8840063d10b7 Mon Sep 17 00:00:00 2001
From: Chris Lattner
LLVM
+The first four bytes of the stream identify the encoding of the file. A reader uses them to determine what the file contains.
blah +
+A bitstream literally consists of a stream of bits. This stream is made up of a number of primitive values that encode a stream of integer values. These integers are encoded in two ways: either as Fixed Width Integers or as Variable Width Integers.
+Fixed-width integer values have their low bits emitted directly to the file. For example, a 3-bit integer value encodes 1 as 001. Fixed-width integers are used when there is a well-known number of options for a field. For example, boolean values are usually encoded with a 1-bit wide integer.
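A minimal sketch of fixed-width emission, for illustration only. The names `emit_fixed` and `read_fixed` are hypothetical, the stream is modeled as a Python list of 0/1 values, and the least-significant-bit-first order is an assumption that the text above does not state.

```python
def emit_fixed(bits, value, width):
    """Append the low `width` bits of `value` to `bits` (a list of 0/1)."""
    if value < 0 or value >= (1 << width):
        raise ValueError("value does not fit in %d bits" % width)
    # Assumption: bits are emitted least-significant bit first.
    for i in range(width):
        bits.append((value >> i) & 1)


def read_fixed(bits, pos, width):
    """Read a `width`-bit value starting at `pos`; return (value, new_pos)."""
    if pos + width > len(bits):
        raise ValueError("unexpected end of stream")
    value = 0
    for i in range(width):
        value |= bits[pos + i] << i
    return value, pos + width


stream = []
emit_fixed(stream, 1, 3)   # the 3-bit value 1 (written "001" in the text)
emit_fixed(stream, 1, 1)   # a boolean "true" in a 1-bit field
```

Reading the fields back requires knowing each width in advance, which is why fixed-width fields suit values with a known, small range.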
+Variable-width integer (VBR) values encode values of arbitrary size, optimizing for the case where the values are small. Given a 4-bit VBR field, any 3-bit value (0 through 7) is encoded directly, with the high bit set to zero. In general, for an N-bit VBR field, values that need more than N-1 bits are emitted as a series of (N-1)-bit chunks, lowest bits first, where every chunk except the last has its high bit set.
+For example, the value 27 (0x1B) is encoded as 1011 0011 when emitted as a vbr4 value. The first group of four bits holds the value 3 (011) and signals a continuation with its high bit of 1. The next group holds the value 24 (011 << 3) with no continuation. The sum (3+24) yields the value 27.
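The VBR rule and the vbr4 example can be sketched as follows. This is an illustrative sketch, not the LLVM implementation: `encode_vbr` and `decode_vbr` are hypothetical names, and each chunk is represented as a small integer rather than as bits in a stream.

```python
def encode_vbr(value, width):
    """Split `value` into `width`-bit VBR chunks, lowest bits first."""
    if width < 2:
        raise ValueError("a VBR field needs at least 2 bits")
    if value < 0:
        raise ValueError("only non-negative values are handled here")
    payload_bits = width - 1
    high_bit = 1 << payload_bits
    chunks = []
    while True:
        chunk = value & (high_bit - 1)
        value >>= payload_bits
        if value:
            # More bits remain: set the high bit to signal a continuation.
            chunks.append(chunk | high_bit)
        else:
            chunks.append(chunk)
            return chunks


def decode_vbr(chunks, width):
    """Rebuild the value from a list of `width`-bit VBR chunks."""
    payload_bits = width - 1
    high_bit = 1 << payload_bits
    value = 0
    shift = 0
    for chunk in chunks:
        value |= (chunk & (high_bit - 1)) << shift
        shift += payload_bits
        if not chunk & high_bit:
            return value
    raise ValueError("last chunk still has its continuation bit set")


chunks = encode_vbr(27, 4)
print([format(c, "04b") for c in chunks])   # ['1011', '0011']
print(decode_vbr(chunks, 4))                # 27
```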
+ +6-bit characters encode common characters into a fixed 6-bit field. They
+represent the following characters with the following 6-bit values:
+
+
+
+
+
+This encoding is suitable only for characters and strings made up entirely of the characters above. It cannot encode any character outside that set.
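A sketch of a 6-bit character codec. The character table did not survive in this copy of the text, so the alphabet below ('a'-'z' as 0-25, 'A'-'Z' as 26-51, '0'-'9' as 52-61, '.' as 62, '_' as 63) is an assumption and should be checked against the original table. The function names are hypothetical.

```python
import string

# Assumed ordering of the 64 representable characters (not confirmed by
# the surrounding text): a-z, A-Z, 0-9, '.', '_'.
CHAR6_ALPHABET = (
    string.ascii_lowercase + string.ascii_uppercase + string.digits + "._"
)


def encode_char6(text):
    """Return the list of 6-bit codes for `text`, or raise ValueError."""
    codes = []
    for ch in text:
        index = CHAR6_ALPHABET.find(ch)
        if index < 0:
            raise ValueError("character %r is not representable" % ch)
        codes.append(index)
    return codes


def decode_char6(codes):
    """Inverse of encode_char6."""
    return "".join(CHAR6_ALPHABET[code] for code in codes)
```

A writer would typically test a string with something like `encode_char6` first and fall back to a wider encoding when it raises.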
+Occasionally, it is useful to emit zero bits until the bitstream is a multiple of 32 bits. This ensures that the bit position in the stream can be represented as a whole number of 32-bit words.
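The padding step can be sketched in a few lines. The stream is again modeled as a list of 0/1 values, and `align_to_32` is a hypothetical helper name.

```python
def align_to_32(bits):
    """Append zero bits until len(bits) is a multiple of 32.

    Returns the number of padding bits that were added.
    """
    padding = (-len(bits)) % 32
    bits.extend([0] * padding)
    return padding


stream = [1, 0, 1, 1, 0]        # 5 bits written so far
added = align_to_32(stream)     # pads with 27 zero bits
word_index = len(stream) // 32  # position is now a whole number of words
```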
+A bitstream is a sequential series of Blocks and Data Records. Both of these start with an abbreviation ID encoded as a fixed-bitwidth field. The width is specified by the current block, as described below. The value of the abbreviation ID specifies either a builtin ID (which has a special meaning, defined below) or one of the abbreviation IDs defined by the stream itself.
+The set of builtin abbrev IDs is:
+Abbreviation IDs 4 and above are defined by the stream itself.
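As a rough illustration, a reader might fetch the abbreviation ID with the current block's width and then decide whether it is builtin or stream-defined. The helper name `read_abbrev_id`, the list-of-bits stream model, and the least-significant-bit-first order are assumptions; only the threshold of 4 comes from the text.

```python
FIRST_STREAM_DEFINED_ID = 4  # IDs below this are builtin, per the text


def read_abbrev_id(bits, pos, abbrev_width):
    """Read one abbreviation ID; return (abbrev_id, kind, new_pos)."""
    if pos + abbrev_width > len(bits):
        raise ValueError("unexpected end of stream")
    abbrev_id = 0
    # Assumption: fixed-width fields are stored least-significant bit first.
    for i in range(abbrev_width):
        abbrev_id |= bits[pos + i] << i
    if abbrev_id < FIRST_STREAM_DEFINED_ID:
        kind = "builtin"
    else:
        kind = "stream-defined"
    return abbrev_id, kind, pos + abbrev_width
```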
+Blocks in a bitstream denote nested regions of the stream, and are identified by a content-specific id number (for example, LLVM IR uses an ID of 12 to represent function bodies). Nested blocks capture the hierarchical structure of the data encoded in the stream, and various properties are associated with blocks as the file is parsed. Block definitions allow the reader to skip a block in constant time, whether it wants only a summary of the blocks or needs to pass over data it does not understand. The LLVM IR reader uses this mechanism to skip function bodies, lazily reading them on demand.
+When reading and encoding the stream, several properties are maintained for the block. In particular, each block maintains:
+As sub-blocks are entered, these properties are saved and the new sub-block has its own set of abbreviations and its own abbrev id width. When a sub-block is popped, the saved values are restored.
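The save and restore behavior can be sketched with an explicit stack. The class `BlockScopeStack` and its attribute names are hypothetical; it tracks only the two properties named in this paragraph (the abbreviation set and the abbrev id width), and the top-level width is passed in because the text does not give one.

```python
class BlockScopeStack:
    """Tracks per-block properties while blocks are entered and exited."""

    def __init__(self, top_level_abbrev_width):
        self.abbrev_width = top_level_abbrev_width
        self.abbrevs = []   # abbreviations defined in the current block
        self._saved = []    # saved (width, abbrevs) of the enclosing blocks

    def enter_subblock(self, new_abbrev_width):
        # Save the enclosing block's properties, then start fresh.
        self._saved.append((self.abbrev_width, self.abbrevs))
        self.abbrev_width = new_abbrev_width
        self.abbrevs = []

    def exit_block(self):
        # Restore whatever was current before the sub-block was entered.
        if not self._saved:
            raise ValueError("no enclosing block to return to")
        self.abbrev_width, self.abbrevs = self._saved.pop()
```

Because the sub-block starts with an empty abbreviation list, abbreviations defined inside it do not leak into the enclosing block once it is popped.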
+[ENTER_SUBBLOCK, blockid(vbr8), newabbrevlen(vbr4), <align32bits>, blocklen(32)]
+The ENTER_SUBBLOCK abbreviation ID specifies the start of a new block record. The blockid value is encoded as an 8-bit VBR identifier, and indicates the type of block being entered (which is application specific). The newabbrevlen value is a 4-bit VBR which specifies the abbrev id width for the sub-block. The blocklen is a 32-bit aligned value that specifies the size of the subblock, in 32-bit words. This value allows the reader to skip over the entire block in one jump.
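A sketch of how a reader might parse the fields that follow an ENTER_SUBBLOCK abbreviation ID and then skip the block. `BitReader`, `read_enter_subblock`, and `skip_block` are hypothetical names, the stream is a list of 0/1 values, and the least-significant-bit-first order is an assumption not stated in the text.

```python
class BitReader:
    """Minimal reader over a list of 0/1 values (illustrative only)."""

    def __init__(self, bits):
        self.bits = bits
        self.pos = 0

    def read_fixed(self, width):
        if self.pos + width > len(self.bits):
            raise ValueError("unexpected end of stream")
        value = 0
        for i in range(width):          # assumed LSB-first bit order
            value |= self.bits[self.pos + i] << i
        self.pos += width
        return value

    def read_vbr(self, width):
        high_bit = 1 << (width - 1)
        value = 0
        shift = 0
        while True:
            chunk = self.read_fixed(width)
            value |= (chunk & (high_bit - 1)) << shift
            if not chunk & high_bit:    # no continuation bit: done
                return value
            shift += width - 1

    def align32(self):
        self.pos += (-self.pos) % 32


def read_enter_subblock(reader):
    """Parse the fields after the ENTER_SUBBLOCK abbreviation ID."""
    block_id = reader.read_vbr(8)
    new_abbrev_len = reader.read_vbr(4)
    reader.align32()
    block_len_words = reader.read_fixed(32)
    return {
        "block_id": block_id,
        "new_abbrev_len": new_abbrev_len,
        "block_len_words": block_len_words,
        "body_start": reader.pos,
        "body_end": reader.pos + block_len_words * 32,
    }


def skip_block(reader, header):
    """Jump past the block body without decoding any of it."""
    reader.pos = header["body_end"]
```

Skipping is a single assignment to the read position, which is what makes lazy loading of block contents cheap.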
+[END_BLOCK, <align32bits>]
+The END_BLOCK abbreviation ID specifies the end of the current block record. Its end is aligned to 32 bits to ensure that the size of the block is a whole multiple of 32 bits.
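On the writer side, the alignment after END_BLOCK is what lets the block length be stored as a whole number of words. The sketch below writes a complete block and backpatches the length. All names are hypothetical. The numeric IDs (END_BLOCK as 0, ENTER_SUBBLOCK as 1), the least-significant-bit-first order, and the choice to count the END_BLOCK marker and its padding in the block length are assumptions not confirmed by the text above.

```python
END_BLOCK = 0        # assumed numeric value of the builtin ID
ENTER_SUBBLOCK = 1   # assumed numeric value of the builtin ID


def emit_fixed(bits, value, width):
    for i in range(width):              # assumed LSB-first bit order
        bits.append((value >> i) & 1)


def emit_vbr(bits, value, width):
    high_bit = 1 << (width - 1)
    while value >= high_bit:
        emit_fixed(bits, (value & (high_bit - 1)) | high_bit, width)
        value >>= width - 1
    emit_fixed(bits, value, width)


def align32(bits):
    bits.extend([0] * ((-len(bits)) % 32))


def emit_block(bits, outer_abbrev_width, block_id, inner_abbrev_width, body):
    """Write one block around `body` (a list of 0/1); return its word count."""
    emit_fixed(bits, ENTER_SUBBLOCK, outer_abbrev_width)
    emit_vbr(bits, block_id, 8)
    emit_vbr(bits, inner_abbrev_width, 4)
    align32(bits)
    len_pos = len(bits)
    emit_fixed(bits, 0, 32)             # placeholder for blocklen
    body_start = len(bits)
    bits.extend(body)
    emit_fixed(bits, END_BLOCK, inner_abbrev_width)
    align32(bits)                       # makes the size a whole word count
    words = (len(bits) - body_start) // 32
    # Backpatch the real length over the placeholder.
    bits[len_pos:len_pos + 32] = [(words >> i) & 1 for i in range(32)]
    return words
```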
+ ++blah +
+ +