[PDB] Begin adding documentation for the PDB file format.
Differential Revision: https://reviews.llvm.org/D26374 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286491 91177308-0d34-0410-b5e6-96231b3b80d8
parent
e13ecb7d13
commit
ab792ca2d9
3
docs/PDB/DbiStream.rst
Normal file
@ -0,0 +1,3 @@
=====================================
The PDB DBI (Debug Info) Stream
=====================================
3
docs/PDB/GlobalStream.rst
Normal file
@ -0,0 +1,3 @@
=====================================
The PDB Global Symbol Stream
=====================================
3
docs/PDB/HashStream.rst
Normal file
@ -0,0 +1,3 @@
=====================================
The TPI & IPI Hash Streams
=====================================
3
docs/PDB/ModiStream.rst
Normal file
@ -0,0 +1,3 @@
=====================================
The Module Information Stream
=====================================
121
docs/PDB/MsfFile.rst
Normal file
@ -0,0 +1,121 @@
=====================================
The MSF File Format
=====================================

.. contents::
   :local:

.. _msf_superblock:

The Superblock
==============
At file offset 0 in an MSF file is the MSF *SuperBlock*, which is laid out as
follows:

.. code-block:: c++

  struct SuperBlock {
    char FileMagic[sizeof(Magic)];
    ulittle32_t BlockSize;
    ulittle32_t FreeBlockMapBlock;
    ulittle32_t NumBlocks;
    ulittle32_t NumDirectoryBytes;
    ulittle32_t Unknown;
    ulittle32_t BlockMapAddr;
  };

- **FileMagic** - Must be equal to ``"Microsoft C/C++ MSF 7.00\r\n"``
  followed by the bytes ``1A 44 53 00 00 00`` (32 bytes in total).
- **BlockSize** - The block size of the internal file system. Valid values are
  512, 1024, 2048, and 4096 bytes. Certain aspects of the MSF file layout vary
  depending on the block size. For the purposes of LLVM, we handle only block
  sizes of 4KiB, and all further discussion assumes a block size of 4KiB.
- **FreeBlockMapBlock** - The index of a block within the file, at which begins
  a bitfield representing the set of all blocks within the file which are "free"
  (i.e. the data within that block is not used). This bitfield is spread across
  the MSF file at ``BlockSize`` intervals.
  **Important**: ``FreeBlockMapBlock`` can only be ``1`` or ``2``! This field
  is designed to support incremental and atomic updates of the underlying MSF
  file. While writing to an MSF file, if the value of this field is ``1``, you
  can write your new modified bitfield to block 2, and vice versa. Only when
  you commit the file to disk do you need to swap the value in the SuperBlock
  to point to the new ``FreeBlockMapBlock``.
- **NumBlocks** - The total number of blocks in the file. ``NumBlocks * BlockSize``
  should equal the size of the file on disk.
- **NumDirectoryBytes** - The size of the stream directory, in bytes. The stream
  directory contains information about each stream's size and the set of blocks
  that it occupies. It will be described in more detail later.
- **BlockMapAddr** - The index of a block within the MSF file. At this block is
  an array of ``ulittle32_t``'s listing the blocks that the stream directory
  resides on. For large MSF files, the stream directory (which describes the
  block layout of each stream) may not fit entirely on a single block. As a
  result, this extra layer of indirection is introduced, whereby this block
  contains the list of blocks that the stream directory occupies, and the stream
  directory itself can be stitched together accordingly. The number of
  ``ulittle32_t``'s in this array is given by ``ceil(NumDirectoryBytes / BlockSize)``.

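The constraints listed above can be turned into a small sanity check. The
following is only a sketch: ``SuperBlockFields`` and the three helper functions
are hypothetical names that hold the fields after ``FileMagic`` already decoded
to host byte order, and the check that ``BlockMapAddr`` lies inside the file is
an assumption rather than something the format description states.

```cpp
#include <cassert>
#include <cstdint>

// Host-order copies of the SuperBlock fields that follow FileMagic.
// Hypothetical helper type; decoding from little-endian bytes is omitted.
struct SuperBlockFields {
  uint32_t BlockSize;
  uint32_t FreeBlockMapBlock;
  uint32_t NumBlocks;
  uint32_t NumDirectoryBytes;
  uint32_t Unknown;
  uint32_t BlockMapAddr;
};

// Number of entries in the array stored at block BlockMapAddr:
// ceil(NumDirectoryBytes / BlockSize), computed in 64 bits to avoid overflow.
inline uint32_t numDirectoryBlocks(const SuperBlockFields &SB) {
  assert(SB.BlockSize != 0 && "validate BlockSize first");
  return static_cast<uint32_t>(
      (uint64_t(SB.NumDirectoryBytes) + SB.BlockSize - 1) / SB.BlockSize);
}

// Checks the documented constraints against the size of the file on disk.
inline bool isPlausibleSuperBlock(const SuperBlockFields &SB,
                                  uint64_t FileSize) {
  const bool ValidBlockSize = SB.BlockSize == 512 || SB.BlockSize == 1024 ||
                              SB.BlockSize == 2048 || SB.BlockSize == 4096;
  if (!ValidBlockSize)
    return false;
  if (SB.FreeBlockMapBlock != 1 && SB.FreeBlockMapBlock != 2)
    return false;
  if (uint64_t(SB.NumBlocks) * SB.BlockSize != FileSize)
    return false;
  // Assumption: a block index must refer to a block inside the file.
  return SB.BlockMapAddr < SB.NumBlocks;
}

// The block a writer would use for its updated free block map: the one the
// SuperBlock does not currently point at.
inline uint32_t inactiveFreeBlockMapBlock(const SuperBlockFields &SB) {
  return SB.FreeBlockMapBlock == 1 ? 2 : 1;
}
```

A writer following the scheme described for ``FreeBlockMapBlock`` would fill the
block returned by ``inactiveFreeBlockMapBlock`` and store that index in the
SuperBlock only when committing.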
The Stream Directory
====================
The Stream Directory is the root of all access to the other streams in an MSF
file. Beginning at byte 0 of the stream directory is the following structure:

.. code-block:: c++

  struct StreamDirectory {
    ulittle32_t NumStreams;
    ulittle32_t StreamSizes[NumStreams];
    ulittle32_t StreamBlocks[NumStreams][];
  };

This structure occupies exactly ``SuperBlock->NumDirectoryBytes`` bytes.
Note that each of the last two arrays is of variable length, and in particular
that ``StreamBlocks`` is jagged.

**Example:** Suppose a hypothetical PDB file with a 4KiB block size, and 4
streams of lengths {1000 bytes, 8000 bytes, 16000 bytes, 9000 bytes}.

Stream 0: ceil(1000 / 4096) = 1 block

Stream 1: ceil(8000 / 4096) = 2 blocks

Stream 2: ceil(16000 / 4096) = 4 blocks

Stream 3: ceil(9000 / 4096) = 3 blocks

In total, 10 blocks are used. Let's see what the stream directory might look
like:

.. code-block:: c++

  struct StreamDirectory {
    ulittle32_t NumStreams = 4;
    ulittle32_t StreamSizes[] = {1000, 8000, 16000, 9000};
    ulittle32_t StreamBlocks[][] = {
      {4},
      {5, 6},
      {11, 9, 7, 8},
      {10, 15, 12}
    };
  };

In total, this occupies ``15 * 4 = 60`` bytes, so ``SuperBlock->NumDirectoryBytes``
would equal ``60``, and the block at ``SuperBlock->BlockMapAddr`` would hold an
array of one ``ulittle32_t``, since ``60 <= SuperBlock->BlockSize``.

Note also that the streams are discontiguous, and that part of stream 3 is in the
middle of part of stream 2. You cannot assume anything about the layout of the
blocks!

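To make the jagged layout concrete, here is a minimal sketch of how a reader
might split the stream directory into per-stream block lists. It assumes the
directory has already been stitched together and decoded into host-order 32-bit
words; ``ParsedDirectory`` and ``parseStreamDirectory`` are hypothetical names,
and the sketch ignores any special size values that real files may use.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical result type: one size and one block list per stream.
struct ParsedDirectory {
  std::vector<uint32_t> StreamSizes;
  std::vector<std::vector<uint32_t>> StreamBlocks;
};

// Words holds the whole stream directory as host-order 32-bit values.
// Returns false if the directory is truncated or has trailing data.
inline bool parseStreamDirectory(const std::vector<uint32_t> &Words,
                                 uint32_t BlockSize, ParsedDirectory &Out) {
  assert(BlockSize != 0 && "block size must be non-zero");
  if (Words.empty())
    return false;
  const size_t NumStreams = Words[0];
  if (NumStreams > Words.size() - 1)
    return false;

  // StreamSizes immediately follows NumStreams.
  size_t Cursor = 1 + NumStreams;
  Out.StreamSizes.assign(Words.begin() + 1, Words.begin() + Cursor);
  Out.StreamBlocks.clear();

  // Each stream then contributes ceil(Size / BlockSize) block indices.
  for (uint32_t Size : Out.StreamSizes) {
    const size_t NumBlocks =
        static_cast<size_t>((uint64_t(Size) + BlockSize - 1) / BlockSize);
    if (NumBlocks > Words.size() - Cursor)
      return false;
    Out.StreamBlocks.emplace_back(Words.begin() + Cursor,
                                  Words.begin() + Cursor + NumBlocks);
    Cursor += NumBlocks;
  }
  return Cursor == Words.size();
}
```

Fed the 15 words of the example above, this yields four block lists of lengths
1, 2, 4, and 3. The length of each row is never stored; it is always derived
from the corresponding stream size.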
Alignment and Block Boundaries
==============================
As may be clear by now, it is possible for a single field (whether it be a
high-level record, a long string field, or even a single ``uint16``) to begin and
end in separate blocks. For example, if the block size is 4096 bytes, and a
``uint16`` field begins at the last byte of the current block, then it would
need to end on the first byte of the next block. Since blocks are not
necessarily contiguously laid out in the file, this means that both the consumer
and the producer of an MSF file must be prepared to split data apart
accordingly. In the aforementioned example, because fields are stored
little-endian, the low byte of the ``uint16`` would be written to the last byte
of block N, and the high byte would be written to the first byte of block N+1
(the next block of the stream), which could be tens of thousands of bytes later
(or even earlier!) in the file, depending on what the stream directory says.
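One way to cope with fields that straddle blocks is to translate every stream
offset through the stream's block list before touching the file. The sketch
below assumes the whole file is in memory and that integers are little-endian,
as the ``ulittle32_t`` fields suggest; ``readStreamBytes`` and ``readLE16`` are
hypothetical helpers, and bounds are checked against the block list only, not
against the stream's recorded size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Copies Length bytes starting at stream offset Offset into Out, following
// the stream's block list. Returns false if the read leaves the block list
// or the file.
inline bool readStreamBytes(const std::vector<uint8_t> &File,
                            const std::vector<uint32_t> &Blocks,
                            uint32_t BlockSize, uint64_t Offset, size_t Length,
                            uint8_t *Out) {
  assert(BlockSize != 0 && "block size must be non-zero");
  for (size_t I = 0; I < Length; ++I, ++Offset) {
    // Which entry of the block list holds this byte, and where inside it.
    const uint64_t BlockSlot = Offset / BlockSize;
    if (BlockSlot >= Blocks.size())
      return false;
    const uint64_t FileOffset =
        uint64_t(Blocks[static_cast<size_t>(BlockSlot)]) * BlockSize +
        Offset % BlockSize;
    if (FileOffset >= File.size())
      return false;
    Out[I] = File[static_cast<size_t>(FileOffset)];
  }
  return true;
}

// Reads a little-endian 16-bit value that may span two blocks.
inline bool readLE16(const std::vector<uint8_t> &File,
                     const std::vector<uint32_t> &Blocks, uint32_t BlockSize,
                     uint64_t Offset, uint16_t &Value) {
  uint8_t Bytes[2];
  if (!readStreamBytes(File, Blocks, BlockSize, Offset, 2, Bytes))
    return false;
  Value = static_cast<uint16_t>(Bytes[0] | (Bytes[1] << 8));
  return true;
}
```

Copying byte by byte keeps the sketch short. A real reader would more likely
copy whole per-block runs, but the offset translation would be the same.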
3
docs/PDB/PdbStream.rst
Normal file
@ -0,0 +1,3 @@
========================================
The PDB Info Stream (aka the PDB Stream)
========================================
3
docs/PDB/PublicStream.rst
Normal file
@ -0,0 +1,3 @@
=====================================
The PDB Public Symbol Stream
=====================================
3
docs/PDB/TpiStream.rst
Normal file
@ -0,0 +1,3 @@
=====================================
The PDB TPI Stream
=====================================
160
docs/PDB/index.rst
Normal file
@ -0,0 +1,160 @@
=====================================
The PDB File Format
=====================================

.. contents::
   :local:

.. _pdb_intro:

Introduction
============

PDB (Program Database) is a file format invented by Microsoft which contains
debug information that can be consumed by debuggers and other tools. Since
officially supported APIs exist on Windows for querying debug information from
PDBs even without the user understanding the internals of the file format, a
large ecosystem of tools has been built for Windows to consume this format. In
order for Clang to be able to generate programs that can interoperate with these
tools, it is necessary for us to generate PDB files ourselves.

At the same time, LLVM has a long history of being able to cross-compile from
any platform to any platform, and we wish for the same to be true here. So it
is necessary for us to understand the PDB file format at the byte level so that
we can generate PDB files entirely on our own.

This manual describes what we know about the PDB file format today: the layout
of the file, the various streams contained within, the format of individual
records within, and more.

We would like to extend our heartfelt gratitude to Microsoft, without whom we
would not be where we are today. Much of the knowledge contained within this
manual was learned through reading code published by Microsoft on their `GitHub
repo <https://github.com/Microsoft/microsoft-pdb>`__.

.. _pdb_layout:

File Layout
===========

.. toctree::
   :hidden:

   MsfFile
   PdbStream
   TpiStream
   DbiStream
   ModiStream
   PublicStream
   GlobalStream
   HashStream

.. _msf:

The MSF Container
-----------------
A PDB file is really just a special case of an MSF (Multi-Stream Format) file.
An MSF file is actually a miniature "file system within a file". It contains
multiple streams (aka files) which can represent arbitrary data, and these
streams are divided into blocks which may not necessarily be contiguously
laid out within the file (aka fragmented). Additionally, the MSF contains a
stream directory (aka MFT) which describes how the streams (files) are laid
out within the MSF.

For more information about the MSF container format, stream directory, and
block layout, see :doc:`MsfFile`.

.. _streams:

Streams
-------
The PDB format contains a number of streams which describe the types, symbols,
source files, and compilands (e.g. object files) of a program. Additional
streams contain hash tables that debuggers and other tools use for fast lookup
of records and types by name, as well as other information about how the
program was compiled, such as the specific toolchain used. A summary of streams
contained in a PDB file is as follows:

+--------------------+------------------------------+-------------------------------------------+
| Name               | Stream Index                 | Contents                                  |
+====================+==============================+===========================================+
| Old Directory      | - Fixed Stream Index 0       | - Previous MSF Stream Directory           |
+--------------------+------------------------------+-------------------------------------------+
| PDB Stream         | - Fixed Stream Index 1       | - Basic File Information                  |
|                    |                              | - Fields to match EXE to this PDB         |
|                    |                              | - Map of named streams to stream indices  |
+--------------------+------------------------------+-------------------------------------------+
| TPI Stream         | - Fixed Stream Index 2       | - CodeView Type Records                   |
|                    |                              | - Index of TPI Hash Stream                |
+--------------------+------------------------------+-------------------------------------------+
| DBI Stream         | - Fixed Stream Index 3       | - Module/Compiland Information            |
|                    |                              | - Indices of individual module streams    |
|                    |                              | - Indices of public / global streams      |
|                    |                              | - Section Contribution Information        |
|                    |                              | - Source File Information                 |
|                    |                              | - FPO / PGO Data                          |
+--------------------+------------------------------+-------------------------------------------+
| IPI Stream         | - Fixed Stream Index 4       | - CodeView Type Records                   |
|                    |                              | - Index of IPI Hash Stream                |
+--------------------+------------------------------+-------------------------------------------+
| /LinkInfo          | - Contained in PDB Stream    | - Unknown                                 |
|                    |   Named Stream map           |                                           |
+--------------------+------------------------------+-------------------------------------------+
| /src/headerblock   | - Contained in PDB Stream    | - Unknown                                 |
|                    |   Named Stream map           |                                           |
+--------------------+------------------------------+-------------------------------------------+
| /names             | - Contained in PDB Stream    | - PDB-wide global string table used for   |
|                    |   Named Stream map           |   string de-duplication                   |
+--------------------+------------------------------+-------------------------------------------+
| Module Info Stream | - Contained in DBI Stream    | - CodeView Symbol Records for this module |
|                    | - One for each compiland     | - Line Number Information                 |
+--------------------+------------------------------+-------------------------------------------+
| Public Stream      | - Contained in DBI Stream    | - Public (Exported) Symbol Records        |
|                    |                              | - Index of Public Hash Stream             |
+--------------------+------------------------------+-------------------------------------------+
| Global Stream      | - Contained in DBI Stream    | - Global Symbol Records                   |
|                    |                              | - Index of Global Hash Stream             |
+--------------------+------------------------------+-------------------------------------------+
| TPI Hash Stream    | - Contained in TPI Stream    | - Hash table for looking up TPI records   |
|                    |                              |   by name                                 |
+--------------------+------------------------------+-------------------------------------------+
| IPI Hash Stream    | - Contained in IPI Stream    | - Hash table for looking up IPI records   |
|                    |                              |   by name                                 |
+--------------------+------------------------------+-------------------------------------------+

More information about the structure of each of these can be found on the
following pages:

:doc:`PdbStream`
   Information about the PDB Info Stream and how it is used to match PDBs to EXEs.

:doc:`TpiStream`
   Information about the TPI stream and the CodeView records contained within.

:doc:`DbiStream`
   Information about the DBI stream and relevant substreams, including the Module
   Substreams, source file information, and CodeView symbol records contained within.

:doc:`ModiStream`
   Information about the Module Information Stream, of which there is one for each
   compilation unit, and the format of symbols contained within.

:doc:`PublicStream`
   Information about the Public Symbol Stream.

:doc:`GlobalStream`
   Information about the Global Symbol Stream.

:doc:`HashStream`
   Information about the Hash Table stream, and how it can be used to quickly look up
   records by name.

CodeView
========
CodeView is another format which comes into the picture. While MSF defines
the structure of the overall file, and PDB defines the set of streams that
appear within the MSF file and the format of those streams, CodeView defines
the format of **symbol and type records** that appear within specific streams.
Refer to the pages on `CodeView Symbol Records` and `CodeView Type Records` for
more information about the CodeView format.
@ -274,6 +274,7 @@ For API clients and LLVM developers.
   Coroutines
   GlobalISel
   XRay
   PDB/index

:doc:`WritingAnLLVMPass`
   Information on how to write LLVM transformations and analyses.
@ -398,6 +399,9 @@ For API clients and LLVM developers.
:doc:`XRay`
   High-level documentation of how to use XRay in LLVM.

:doc:`The Microsoft PDB File Format <PDB/index>`
   A detailed description of the Microsoft PDB (Program Database) file format.

Development Process Documentation
=================================
