From 3a1716db5818feb96054dcce325e8840063d10b7 Mon Sep 17 00:00:00 2001 From: Chris Lattner Date: Sat, 12 May 2007 05:37:42 +0000 Subject: [PATCH] add a bunch of content. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@37002 91177308-0d34-0410-b5e6-96231b3b80d8 --- docs/BitCodeFormat.html | 225 ++++++++++++++++++++++++++++++++++++++-- 1 file changed, 215 insertions(+), 10 deletions(-) diff --git a/docs/BitCodeFormat.html b/docs/BitCodeFormat.html index 949de946982..b84cd0e75bc 100644 --- a/docs/BitCodeFormat.html +++ b/docs/BitCodeFormat.html @@ -1,4 +1,5 @@ - + @@ -13,6 +14,10 @@
  • Bitstream Format
    1. Magic Numbers
    2. Primitives
    3. Abbreviation IDs
    4. Blocks
    5. Data Records
  • LLVM IR Encoding
  • @@ -71,10 +76,13 @@ structure. This structure consists of the following concepts:

    @@ -91,21 +99,218 @@ understanding the encoding.

    -

    LLVM

    +

    The first four bytes of the stream identify the encoding of the file. A reader uses them to determine what kind of data the file contains.
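A reader can check the magic number before parsing anything else. The sketch below is a minimal illustration. The helper name `has_magic` is hypothetical. The value `b"BC\xc0\xde"` is an assumption taken from later versions of the LLVM documentation, because this section does not give the bytes.

```python
# Hypothetical helper. The magic value is an assumption from later LLVM
# documentation ('B', 'C', 0xC0, 0xDE); this section does not list it.
LLVM_IR_MAGIC = b"BC\xc0\xde"


def has_magic(data: bytes, magic: bytes = LLVM_IR_MAGIC) -> bool:
    """Return True if the first four bytes of `data` equal `magic`."""
    if len(magic) != 4:
        raise ValueError("a magic number is exactly four bytes")
    return data[:4] == magic


print(has_magic(b"BC\xc0\xde\x35\x14"))  # True
print(has_magic(b"\x7fELF"))              # False
```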

    - - -
    Well-Formedness
    + +
    Primitives +
    -

    blah +

    A bitstream literally consists of a stream of bits. These bits form a series of primitive values that together encode a sequence of integers. Each integer is encoded in one of two ways: either as a Fixed Width Integer or as a Variable Width Integer.

    Fixed Width Integers

    Fixed-width integer values have their low bits emitted directly to the file. For example, a 3-bit integer value encodes 1 as 001. Fixed-width integers are used when a field has a well-known, fixed number of possible values; for example, boolean values are usually encoded as 1-bit integers.
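The fixed-width rule can be sketched at the level of a single field. This is a minimal illustration with hypothetical helper names. It writes the field most significant bit first so that the output matches the `001` notation above. How fields are packed into bytes is not described in this section, so the sketch leaves packing out.

```python
def encode_fixed(value: int, width: int) -> str:
    """Render `value` as a `width`-bit field, most significant bit first."""
    if width < 1 or not 0 <= value < (1 << width):
        raise ValueError(f"{value} does not fit in a {width}-bit field")
    return format(value, f"0{width}b")


def decode_fixed(field: str) -> int:
    """Recover the integer from a field produced by encode_fixed."""
    return int(field, 2)


print(encode_fixed(1, 3))   # 001
print(encode_fixed(1, 1))   # 1  (e.g. a boolean flag)
print(decode_fixed("001"))  # 1
```

Rejecting values that do not fit, instead of silently truncating them, makes it obvious when a field is too narrow for its data.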

    Variable Width Integers

    Variable-width integer (VBR) values encode values of arbitrary size, optimizing for the case where the values are small. An N-bit VBR field (vbrN) is emitted as a series of N-bit chunks. Each chunk carries N-1 bits of the value, and the least significant bits come first. The high bit of each chunk is a continuation flag that is set on every chunk except the last. A value that fits in N-1 bits therefore takes a single chunk. For example, with vbr4 any 3-bit value (0 through 7) is encoded directly, with the high bit set to zero.

    For example, the value 27 (0x1B) is encoded as 1011 0011 when emitted as a vbr4 value. The first chunk, 1011, carries the value 3 (011), and its set high bit indicates that another chunk follows. The second chunk, 0011, carries the value 24 (011 << 3), and its clear high bit ends the value. The sum (3 + 24) yields 27.
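The chunking rule can be sketched as a pair of functions that operate on a list of N-bit chunk values. This is a minimal illustration with hypothetical names. Each chunk is kept as an integer rather than packed into a bitstream. The usage lines reproduce the vbr4 example for 27.

```python
def encode_vbr(value: int, n: int) -> list[int]:
    """Split a non-negative value into n-bit VBR chunks, low-order chunk first."""
    if value < 0 or n < 2:
        raise ValueError("value must be >= 0 and n must be >= 2")
    payload_bits = n - 1
    hi_bit = 1 << payload_bits          # continuation flag
    mask = hi_bit - 1                   # selects the low n-1 bits
    chunks = []
    while True:
        piece = value & mask
        value >>= payload_bits
        if value:
            chunks.append(piece | hi_bit)   # more chunks follow
        else:
            chunks.append(piece)            # last chunk: high bit clear
            return chunks


def decode_vbr(chunks: list[int], n: int) -> int:
    """Reassemble a value from n-bit VBR chunks, stopping at the last chunk."""
    payload_bits = n - 1
    hi_bit = 1 << payload_bits
    value, shift = 0, 0
    for chunk in chunks:
        value |= (chunk & (hi_bit - 1)) << shift
        shift += payload_bits
        if not chunk & hi_bit:
            break
    return value


print([format(c, "04b") for c in encode_vbr(27, 4)])  # ['1011', '0011']
print(decode_vbr([0b1011, 0b0011], 4))                # 27
```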

    6-bit characters

    6-bit characters encode common characters into a fixed 6-bit field. They represent the following characters with the following 6-bit values:

    This encoding is only suitable for characters and strings that consist entirely of the above characters; it cannot encode any character outside the set.
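The character table itself is not reproduced in this section. The sketch below therefore treats the 64-character alphabet as a parameter. Its default is an assumption taken from later LLVM documentation: 'a'-'z' map to 0-25, 'A'-'Z' to 26-51, '0'-'9' to 52-61, '.' to 62 and '_' to 63. The function names are hypothetical.

```python
import string

# Assumed alphabet (later LLVM docs); the index of a character is its 6-bit code.
CHAR6_ALPHABET = string.ascii_lowercase + string.ascii_uppercase + string.digits + "._"
assert len(CHAR6_ALPHABET) == 64


def encode_char6(text: str, alphabet: str = CHAR6_ALPHABET) -> list[int]:
    """Map each character to its 6-bit code, rejecting characters outside the set."""
    codes = []
    for ch in text:
        idx = alphabet.find(ch)
        if idx < 0:
            raise ValueError(f"{ch!r} cannot be encoded as a 6-bit character")
        codes.append(idx)
    return codes


def decode_char6(codes: list[int], alphabet: str = CHAR6_ALPHABET) -> str:
    """Map 6-bit codes back to characters."""
    return "".join(alphabet[c] for c in codes)


print(encode_char6("a.Z_9"))  # [0, 62, 51, 63, 61]
```

The `ValueError` reflects the limitation stated above: a string containing any character outside the set has to be stored some other way.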


    Occasionally, it is useful to emit zero bits until the bit position in the stream is a multiple of 32. This ensures that the position can be expressed as a whole number of 32-bit words.
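The amount of padding depends only on the current bit position. Here is a minimal sketch with hypothetical helper names, in which the stream is modelled as a plain list of bits.

```python
def align32_padding(bit_pos: int) -> int:
    """Number of zero bits needed to advance bit_pos to the next multiple of 32."""
    return -bit_pos % 32


def align32(bits: list[int]) -> None:
    """Append zero bits in place until len(bits) is a multiple of 32."""
    bits.extend([0] * align32_padding(len(bits)))


stream = [1, 0, 1]          # 3 bits written so far
align32(stream)
print(len(stream))          # 32
print(align32_padding(64))  # 0: already aligned, nothing to emit
```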


    A bitstream is a sequential series of Blocks and Data Records. Both of these start with an abbreviation ID encoded as a fixed-width field. The width is specified by the current block, as described below. The value of the abbreviation ID specifies either a builtin ID (each of which has a special meaning, defined below) or one of the abbreviation IDs defined by the stream itself.

    The set of builtin abbrev IDs is:

    • 0 - END_BLOCK - This abbrev ID marks the end of the current block.
    • 1 - ENTER_SUBBLOCK - This abbrev ID marks the beginning of a new block.
    • 2 - DEFINE_ABBREV - This defines a new abbreviation.
    • 3 - UNABBREV_RECORD - This ID marks a record that is encoded without an abbreviation (an unabbreviated record).

    Abbreviation IDs 4 and above are defined by the stream itself.
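A reader can classify each abbreviation ID once it has read the fixed-width field. The sketch below is a minimal illustration. The enum mirrors the builtin list above, and the names `BuiltinAbbrevID` and `describe_abbrev_id` are hypothetical.

```python
from enum import IntEnum


class BuiltinAbbrevID(IntEnum):
    END_BLOCK = 0
    ENTER_SUBBLOCK = 1
    DEFINE_ABBREV = 2
    UNABBREV_RECORD = 3


FIRST_STREAM_ABBREV = 4  # IDs from here up are defined by the stream itself


def describe_abbrev_id(abbrev_id: int, width: int) -> str:
    """Classify an abbreviation ID that was read as a `width`-bit fixed field."""
    if not 0 <= abbrev_id < (1 << width):
        raise ValueError(f"abbrev ID {abbrev_id} does not fit in {width} bits")
    if abbrev_id < FIRST_STREAM_ABBREV:
        return BuiltinAbbrevID(abbrev_id).name
    return f"stream-defined abbreviation #{abbrev_id}"


print(describe_abbrev_id(1, 2))  # ENTER_SUBBLOCK
print(describe_abbrev_id(5, 3))  # stream-defined abbreviation #5
```

A 2-bit field can only hold 0 through 3. Because the abbrev ID width starts at 2, only the builtin IDs are available until a block selects a wider width.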


    Blocks in a bitstream denote nested regions of the stream and are identified by a content-specific ID number (for example, LLVM IR uses an ID of 12 to represent function bodies). Nested blocks capture the hierarchical structure of the encoded data, and various properties are associated with blocks as the file is parsed. Block definitions allow a reader to skip a block in constant time, either because it only wants a summary of the blocks or because it does not understand the data they contain. The LLVM IR reader uses this mechanism to skip function bodies and read them lazily on demand.

    While the stream is being read or written, several properties are maintained for each block. In particular, each block maintains:

    1. A current abbrev ID width. This value starts at 2, and is set every time a block record is entered. The block entry specifies the abbrev ID width for the body of the block.
    2. A set of abbreviations. Abbreviations may be defined within a block, or they may be associated with all blocks of a particular ID.

    As sub-blocks are entered, these properties are saved, and each new sub-block gets its own set of abbreviations and its own abbrev ID width. When a sub-block is popped, the saved values are restored.
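The save-and-restore behaviour maps naturally onto a stack with one entry per open block. The sketch below is a minimal illustration with hypothetical class names. It tracks only each block's own abbreviations and ignores abbreviations that are shared by all blocks with a given ID.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class BlockScope:
    block_id: Optional[int]   # None for the top level of the stream
    abbrev_width: int
    abbrevs: list = field(default_factory=list)


class ScopeStack:
    """Hypothetical per-block state for a reader or writer."""

    def __init__(self) -> None:
        # The abbrev ID width starts at 2 before any block is entered.
        self.scopes = [BlockScope(block_id=None, abbrev_width=2)]

    @property
    def current(self) -> BlockScope:
        return self.scopes[-1]

    def enter(self, block_id: int, new_abbrev_width: int) -> None:
        # The outer scope stays on the stack unchanged, which "saves" it;
        # the new block starts with no abbreviations of its own.
        self.scopes.append(BlockScope(block_id, new_abbrev_width))

    def exit(self) -> None:
        if len(self.scopes) == 1:
            raise RuntimeError("END_BLOCK without a matching ENTER_SUBBLOCK")
        self.scopes.pop()   # restores the enclosing block's state


s = ScopeStack()
s.current.abbrevs.append("outer-abbrev")
s.enter(block_id=12, new_abbrev_width=4)
print(s.current.abbrev_width, s.current.abbrevs)  # 4 []
s.exit()
print(s.current.abbrev_width, s.current.abbrevs)  # 2 ['outer-abbrev']
```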


    [ENTER_SUBBLOCK, blockid (vbr8), newabbrevlen (vbr4), <align32bits>, blocklen (32)]

    The ENTER_SUBBLOCK abbreviation ID specifies the start of a new block record. The blockid value is encoded as an 8-bit VBR identifier and indicates the type of block being entered (which is application specific). The newabbrevlen value is a 4-bit VBR that specifies the abbrev ID width for the sub-block. The blocklen value is a 32-bit fixed-width field, emitted after the stream has been aligned to a 32-bit boundary, that specifies the size of the sub-block in 32-bit words. This value allows the reader to skip over the entire block in one jump.
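The ENTER_SUBBLOCK layout can be exercised end to end with a small bit writer and reader. This is a minimal sketch under stated assumptions. The stream is a list of bits, and each field is emitted least significant bit first (this section does not specify bit order). All class and function names are hypothetical. The skip computation also assumes that blocklen counts the words after the length field.

```python
ENTER_SUBBLOCK = 1  # builtin abbrev ID


class BitWriter:
    def __init__(self) -> None:
        self.bits: list[int] = []

    def emit_fixed(self, value: int, width: int) -> None:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{value} does not fit in {width} bits")
        # Assumption: each field is emitted least significant bit first.
        self.bits.extend((value >> i) & 1 for i in range(width))

    def emit_vbr(self, value: int, n: int) -> None:
        hi_bit = 1 << (n - 1)
        while value >= hi_bit:               # needs more than n-1 bits
            self.emit_fixed((value & (hi_bit - 1)) | hi_bit, n)
            value >>= n - 1
        self.emit_fixed(value, n)            # last chunk, high bit clear

    def align32(self) -> None:
        self.bits.extend([0] * (-len(self.bits) % 32))


class BitReader:
    def __init__(self, bits: list[int]) -> None:
        self.bits, self.pos = bits, 0

    def read_fixed(self, width: int) -> int:
        if self.pos + width > len(self.bits):
            raise EOFError("ran out of bits")
        chunk = self.bits[self.pos:self.pos + width]
        self.pos += width
        return sum(bit << i for i, bit in enumerate(chunk))

    def read_vbr(self, n: int) -> int:
        hi_bit, value, shift = 1 << (n - 1), 0, 0
        while True:
            chunk = self.read_fixed(n)
            value |= (chunk & (hi_bit - 1)) << shift
            shift += n - 1
            if not chunk & hi_bit:
                return value

    def align32(self) -> None:
        self.pos += -self.pos % 32


def write_enter_subblock(w: BitWriter, abbrev_width: int, block_id: int,
                         new_abbrev_len: int, block_len_words: int) -> None:
    w.emit_fixed(ENTER_SUBBLOCK, abbrev_width)  # abbrev ID, current width
    w.emit_vbr(block_id, 8)                     # blockid (vbr8)
    w.emit_vbr(new_abbrev_len, 4)               # newabbrevlen (vbr4)
    w.align32()                                 # <align32bits>
    w.emit_fixed(block_len_words, 32)           # blocklen (32)


def read_enter_subblock(r: BitReader) -> tuple[int, int, int]:
    """Read the fields that follow an ENTER_SUBBLOCK abbrev ID."""
    block_id = r.read_vbr(8)
    new_abbrev_len = r.read_vbr(4)
    r.align32()
    return block_id, new_abbrev_len, r.read_fixed(32)


w = BitWriter()
write_enter_subblock(w, abbrev_width=2, block_id=12, new_abbrev_len=3,
                     block_len_words=5)
print(len(w.bits))  # 64: 2 + 8 + 4 bits, padding to 32, then 32 bits

r = BitReader(w.bits)
assert r.read_fixed(2) == ENTER_SUBBLOCK
block_id, new_len, n_words = read_enter_subblock(r)
print(block_id, new_len, n_words)  # 12 3 5
skip_to = r.pos + 32 * n_words     # assumption: words after the length field
print(skip_to)                     # 224
```

In this sketch the caller supplies `block_len_words` up front. A writer that streams a block's contents would need to know the length in advance or fill the field in afterwards.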


    [END_BLOCK, <align32bits>]

    The END_BLOCK abbreviation ID specifies the end of the current block record. The stream is then aligned to a 32-bit boundary, so that the size of the block is a whole multiple of 32 bits.
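Emitting END_BLOCK amounts to writing the ID at the current abbrev width and then padding. The sketch below is minimal. It models the stream as a list of bits, assumes least-significant-bit-first field order, and uses a hypothetical block body. The body starts on a 32-bit boundary right after the blocklen field, so the padding leaves it a whole number of words.

```python
END_BLOCK = 0  # builtin abbrev ID


def write_end_block(bits: list[int], abbrev_width: int) -> None:
    """Append END_BLOCK as an abbrev_width-bit field, then pad to 32 bits."""
    # Assumption: fields are emitted least significant bit first.
    bits.extend((END_BLOCK >> i) & 1 for i in range(abbrev_width))
    bits.extend([0] * (-len(bits) % 32))


# Hypothetical block body: 40 bits written after an aligned blocklen field.
body = [1] * 40
write_end_block(body, abbrev_width=3)
print(len(body), len(body) // 32)  # 64 2
```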


    +blah +
