mirror of
https://github.com/RPCS3/llvm-mirror.git
synced 2024-11-25 12:29:58 +00:00
[PDB] Add documentation for the DBI Stream.
Differential Revision: https://reviews.llvm.org/D26552 llvm-svn: 286853
Commit b98abb915e (parent 73a50770c9)
@ -1,3 +1,445 @@

=====================================
The PDB DBI (Debug Info) Stream
=====================================

.. contents::
   :local:

.. _dbi_intro:

Introduction
============

The PDB DBI Stream (Index 3) is one of the largest and most important streams
in a PDB file. It contains information about how the program was compiled
(e.g. compilation flags), the compilands (e.g. object files) that were linked
together to form the program, and the source files that were used to build the
program. It also contains references to other streams that hold more detailed
information about each compiland, such as the CodeView symbol records contained
within each compiland and the source and line information for functions and
other symbols within each compiland.

.. _dbi_header:

Stream Header
=============
At offset 0 of the DBI Stream is a header with the following layout:

.. code-block:: c++

  struct DbiStreamHeader {
    int32_t VersionSignature;
    uint32_t VersionHeader;
    uint32_t Age;
    uint16_t GlobalStreamIndex;
    uint16_t BuildNumber;
    uint16_t PublicStreamIndex;
    uint16_t PdbDllVersion;
    uint16_t SymRecordStream;
    uint16_t PdbDllRbld;
    int32_t ModInfoSize;
    int32_t SectionContributionSize;
    int32_t SectionMapSize;
    int32_t SourceInfoSize;
    int32_t TypeServerSize;
    uint32_t MFCTypeServerIndex;
    int32_t OptionalDbgHeaderSize;
    int32_t ECSubstreamSize;
    uint16_t Flags;
    uint16_t Machine;
    uint32_t Padding;
  };

- **VersionSignature** - Unknown meaning. Appears to always be ``-1``.

- **VersionHeader** - A value from the following enum.

.. code-block:: c++

  enum class DbiStreamVersion : uint32_t {
    VC41 = 930803,
    V50 = 19960307,
    V60 = 19970606,
    V70 = 19990903,
    V110 = 20091201
  };

Similar to the :doc:`PDB Stream <PdbStream>`, this value always appears to be
``V70``, and it is not clear what the other values are for.

- **Age** - The number of times the PDB has been written. Equal to the same
  field from the :ref:`PDB Stream header <pdb_stream_header>`.

- **GlobalStreamIndex** - The index of the :doc:`Global Symbol Stream <GlobalStream>`,
  which contains CodeView symbol records for all global symbols. Actual records
  are stored in the symbol record stream, and are referenced from this stream.

- **BuildNumber** - A bitfield containing values representing the major and minor
  version number of the toolchain (e.g. 12.0 for MSVC 2013) used to build the
  program, with the following layout:

.. code-block:: c++

  uint16_t MinorVersion : 8;
  uint16_t MajorVersion : 7;
  uint16_t NewVersionFormat : 1;

For the purposes of LLVM, we assume ``NewVersionFormat`` to be always ``true``.
If it is ``false``, the layout above does not apply and the reader should consult
the `Microsoft Source Code <https://github.com/Microsoft/microsoft-pdb>`__ for
further guidance.
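
As a sketch of how a reader might decode ``BuildNumber``, the function below
extracts the three parts with explicit masks rather than C++ bitfields, whose
bit order is implementation-defined. It assumes the bitfield above is allocated
starting at the least significant bit (as MSVC does), so ``MinorVersion``
occupies bits 0-7. The names ``ToolchainVersion`` and ``decodeBuildNumber`` are
hypothetical and are not LLVM's API.

.. code-block:: c++

  #include <cassert>
  #include <cstdint>

  // Hypothetical decoded form of the DBI header's BuildNumber field.
  struct ToolchainVersion {
    unsigned Major;
    unsigned Minor;
    bool NewVersionFormat;
  };

  // Decodes BuildNumber with explicit masks, assuming LSB-first allocation:
  // MinorVersion in bits 0-7, MajorVersion in bits 8-14, NewVersionFormat in bit 15.
  ToolchainVersion decodeBuildNumber(uint16_t BuildNumber) {
    ToolchainVersion V;
    V.Minor = BuildNumber & 0x00FFu;
    V.Major = (BuildNumber >> 8) & 0x7Fu;
    V.NewVersionFormat = (BuildNumber & 0x8000u) != 0;
    return V;
  }

For example, a ``BuildNumber`` of ``0x8C00`` decodes to version 12.0 with
``NewVersionFormat`` set, matching the MSVC 2013 example above.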

- **PublicStreamIndex** - The index of the :doc:`Public Symbol Stream <PublicStream>`,
  which contains CodeView symbol records for all public symbols. Actual records
  are stored in the symbol record stream, and are referenced from this stream.

- **PdbDllVersion** - The version number of ``mspdbXXXX.dll`` used to produce this
  PDB. Note that this does not apply to LLVM, as LLVM does not use ``mspdb.dll``.

- **SymRecordStream** - The index of the stream containing all CodeView symbol
  records used by the program. This is used for deduplication, so that many
  different compilands can refer to the same symbols without having to include
  the full record content inside of each module stream.

- **PdbDllRbld** - Unknown.

- **MFCTypeServerIndex** - The index of the MFC type server in the
  :ref:`dbi_type_server_substream`.

- **Flags** - A bitfield with the following layout, containing various
  information about how the program was built:

.. code-block:: c++

  uint16_t WasIncrementallyLinked : 1;
  uint16_t ArePrivateSymbolsStripped : 1;
  uint16_t HasConflictingTypes : 1;
  uint16_t Reserved : 13;

The only one of these that is not self-explanatory is ``HasConflictingTypes``.
Although undocumented, ``link.exe`` contains a hidden flag ``/DEBUG:CTYPES``.
If it is passed to ``link.exe``, this field will be set. Otherwise it will
not be set. It is unclear what this flag does, although it seems to have
subtle implications for the algorithm used to look up type records.
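
A reader can test these bits with masks instead of relying on the compiler's
bitfield layout. The following sketch assumes the bitfield is allocated
starting at the least significant bit, so ``WasIncrementallyLinked`` is bit 0;
the constant, struct, and function names are hypothetical.

.. code-block:: c++

  #include <cassert>
  #include <cstdint>

  // Hypothetical masks for the DBI header Flags field (LSB-first allocation assumed).
  constexpr uint16_t kWasIncrementallyLinked = 1u << 0;
  constexpr uint16_t kArePrivateSymbolsStripped = 1u << 1;
  constexpr uint16_t kHasConflictingTypes = 1u << 2;

  struct DbiBuildFlags {
    bool WasIncrementallyLinked;
    bool ArePrivateSymbolsStripped;
    bool HasConflictingTypes;
  };

  // Tests each known bit; the 13 reserved bits are ignored.
  DbiBuildFlags decodeDbiFlags(uint16_t Flags) {
    return DbiBuildFlags{(Flags & kWasIncrementallyLinked) != 0,
                         (Flags & kArePrivateSymbolsStripped) != 0,
                         (Flags & kHasConflictingTypes) != 0};
  }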
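
- **Machine** - A value from the `CV_CPU_TYPE_e <https://msdn.microsoft.com/en-us/library/b2fc64ek.aspx>`__
  enumeration. Common values are ``0x8664`` (x86-64) and ``0x14C`` (x86); note
  that these two values coincide with the PE/COFF ``IMAGE_FILE_MACHINE_AMD64``
  and ``IMAGE_FILE_MACHINE_I386`` constants.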

Immediately after the fixed-size DBI Stream header are ``7`` variable-length
`substreams`. The following ``7`` fields of the DBI Stream header specify the
number of bytes in the corresponding substream. Each substream's contents are
described in detail :ref:`below <dbi_substreams>`. The length of the entire
DBI Stream should equal ``64`` (the length of the header above) plus the sum
of the following ``7`` fields.

- **ModInfoSize** - The length of the :ref:`dbi_mod_info_substream`.

- **SectionContributionSize** - The length of the :ref:`dbi_sec_contr_substream`.

- **SectionMapSize** - The length of the :ref:`dbi_section_map_substream`.

- **SourceInfoSize** - The length of the :ref:`dbi_file_info_substream`.

- **TypeServerSize** - The length of the :ref:`dbi_type_server_substream`.

- **OptionalDbgHeaderSize** - The length of the :ref:`dbi_optional_dbg_stream`.

- **ECSubstreamSize** - The length of the :ref:`dbi_ec_substream`.

The substreams themselves appear in the order in which they are described below;
in particular, the EC substream precedes the optional debug header substream,
even though the header lists their sizes in the opposite order.
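
The size fields are enough to locate every substream. The sketch below reads
the seven sizes from a raw DBI stream (little endian, at the byte offsets
implied by the ``DbiStreamHeader`` layout) and returns each substream's byte
range, rejecting streams whose total length is not ``64`` plus the sum of the
sizes. It follows the physical order given in the substream descriptions, with
the EC substream before the optional debug header. ``SubstreamRange`` and
``computeSubstreamRanges`` are hypothetical names.

.. code-block:: c++

  #include <array>
  #include <cassert>
  #include <cstddef>
  #include <cstdint>
  #include <optional>
  #include <vector>

  // Reads an N-byte little-endian unsigned integer starting at Data[Off].
  static uint32_t readLE(const std::vector<uint8_t> &Data, size_t Off, int N) {
    uint32_t V = 0;
    for (int I = 0; I < N; ++I)
      V |= uint32_t(Data[Off + I]) << (8 * I);
    return V;
  }

  constexpr size_t kDbiHeaderSize = 64;

  // Offsets of the size fields within DbiStreamHeader, listed in the order the
  // substreams appear: ModInfo, SectionContribution, SectionMap, SourceInfo,
  // TypeServer, ECSubstream, OptionalDbgHeader.
  constexpr std::array<size_t, 7> kSizeFieldOffsets = {24, 28, 32, 36, 40, 52, 48};

  struct SubstreamRange {
    size_t Begin;  // Offset from the start of the DBI stream.
    size_t Size;   // Length in bytes.
  };

  // Returns the range of each substream, or std::nullopt if the stream is
  // shorter than the header, a size is negative, or the total length does not
  // equal 64 plus the sum of the seven sizes.
  std::optional<std::array<SubstreamRange, 7>>
  computeSubstreamRanges(const std::vector<uint8_t> &Stream) {
    if (Stream.size() < kDbiHeaderSize)
      return std::nullopt;
    std::array<SubstreamRange, 7> Ranges{};
    size_t Cursor = kDbiHeaderSize;
    for (size_t I = 0; I < Ranges.size(); ++I) {
      auto Size = static_cast<int32_t>(readLE(Stream, kSizeFieldOffsets[I], 4));
      if (Size < 0)
        return std::nullopt;
      Ranges[I] = {Cursor, static_cast<size_t>(Size)};
      Cursor += static_cast<size_t>(Size);
    }
    if (Cursor != Stream.size())
      return std::nullopt;
    return Ranges;
  }

A real reader would likely also check ``VersionSignature`` and ``VersionHeader``
before trusting the sizes; the sketch omits that to stay focused on layout.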

.. _dbi_substreams:

Substreams
==========

.. _dbi_mod_info_substream:

Module Info Substream
^^^^^^^^^^^^^^^^^^^^^

Begins at offset ``0`` immediately after the :ref:`header <dbi_header>`. The
module info substream is an array of variable-length records, each one
describing a single module (e.g. object file) linked into the program. Each
record in the array is a ``ModInfo`` structure, which embeds a
``SectionContribEntry`` with the following layout:

.. code-block:: c++

  struct SectionContribEntry {
    uint16_t Section;
    char Padding1[2];
    int32_t Offset;
    int32_t Size;
    uint32_t Characteristics;
    uint16_t ModuleIndex;
    char Padding2[2];
    uint32_t DataCrc;
    uint32_t RelocCrc;
  };

While most of these fields are self-explanatory, the ``Characteristics`` field
warrants some elaboration. It corresponds to the ``Characteristics``
field of the `IMAGE_SECTION_HEADER <https://msdn.microsoft.com/en-us/library/windows/desktop/ms680341(v=vs.85).aspx>`__
structure.

The ``ModInfo`` record itself has the following layout:

.. code-block:: c++

  struct ModInfo {
    uint32_t Unused1;
    SectionContribEntry SectionContr;
    uint16_t Flags;
    uint16_t ModuleSymStream;
    uint32_t SymByteSize;
    uint32_t C11ByteSize;
    uint32_t C13ByteSize;
    uint16_t SourceFileCount;
    char Padding[2];
    uint32_t Unused2;
    uint32_t SourceFileNameIndex;
    uint32_t PdbFilePathNameIndex;
    char ModuleName[];   // Null-terminated, variable length.
    char ObjFileName[];  // Null-terminated, variable length.
  };

- **SectionContr** - Describes the properties of the section in the final binary
  which contains the code and data from this module.

- **Flags** - A bitfield with the following format:

.. code-block:: c++

  uint16_t Dirty : 1;   // true if this ModInfo has been written since reading the PDB.
  uint16_t EC : 1;      // true if EC information is present for this module. It is unknown what EC actually is.
  uint16_t Unused : 6;
  uint16_t TSM : 8;     // Type Server Index for this module. It is unknown what this is used for, but it is not used by LLVM.

- **ModuleSymStream** - The index of the stream that contains symbol information
  for this module. This includes CodeView symbol information as well as source
  and line information.

- **SymByteSize** - The number of bytes of data from the stream identified by
  ``ModuleSymStream`` that represent CodeView symbol records.

- **C11ByteSize** - The number of bytes of data from the stream identified by
  ``ModuleSymStream`` that represent C11-style CodeView line information.

- **C13ByteSize** - The number of bytes of data from the stream identified by
  ``ModuleSymStream`` that represent C13-style CodeView line information. At
  most one of ``C11ByteSize`` and ``C13ByteSize`` will be non-zero.

- **SourceFileCount** - The number of source files that contributed to this
  module during compilation.

- **SourceFileNameIndex** - The offset in the names buffer of the primary
  translation unit used to build this module. All PDB files observed to date
  have this value equal to 0.

- **PdbFilePathNameIndex** - The offset in the names buffer of the PDB file
  containing this module's symbol information. This has only been observed
  to be non-zero for the special ``* Linker *`` module.

- **ModuleName** - The module name. This is usually either a full path to an
  object file (either directly passed to ``link.exe`` or from an archive) or
  a string of the form ``Import:<dll name>``.

- **ObjFileName** - The object file name. In the case of a module that is
  passed directly to ``link.exe``, this is the same as **ModuleName**.
  In the case of a module that comes from an archive, this is usually the full
  path to the archive.
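
To make the variable-length layout concrete, here is a sketch that walks the
module info substream and extracts a few fields from each ``ModInfo`` record.
The fixed portion is ``64`` bytes (``Unused1`` through ``PdbFilePathNameIndex``),
followed by the two null-terminated names. Two assumptions are not stated in
the text above: each record is padded to a 4-byte boundary, and the ``Flags``
bitfield is allocated from the least significant bit. ``ModuleSummary`` and
``parseModInfo`` are hypothetical names.

.. code-block:: c++

  #include <cassert>
  #include <cstddef>
  #include <cstdint>
  #include <string>
  #include <vector>

  // Reads an N-byte little-endian unsigned integer starting at D[Off].
  static uint32_t readLE(const std::vector<uint8_t> &D, size_t Off, int N) {
    uint32_t V = 0;
    for (int I = 0; I < N; ++I)
      V |= uint32_t(D[Off + I]) << (8 * I);
    return V;
  }

  // Hypothetical summary of one ModInfo record.
  struct ModuleSummary {
    bool Dirty = false;
    bool HasEC = false;
    uint8_t TypeServerIndex = 0;
    uint16_t ModuleSymStream = 0;
    uint32_t SymByteSize = 0;
    uint16_t SourceFileCount = 0;
    std::string ModuleName;
    std::string ObjFileName;
  };

  constexpr size_t kModInfoFixedSize = 64;  // Unused1 .. PdbFilePathNameIndex

  // Reads a null-terminated string at Off (not past End) and advances Off past
  // the terminator. Returns false if no terminator is found.
  static bool readCString(const std::vector<uint8_t> &D, size_t &Off, size_t End,
                          std::string &Out) {
    size_t Start = Off;
    while (Off < End && D[Off] != 0)
      ++Off;
    if (Off >= End)
      return false;
    Out.assign(reinterpret_cast<const char *>(D.data() + Start), Off - Start);
    ++Off;  // Skip the terminator.
    return true;
  }

  // Walks the module info substream stored in D[Begin, Begin + Size).
  // ASSUMPTION: records are padded to 4-byte alignment; Flags bits are LSB-first.
  std::vector<ModuleSummary> parseModInfo(const std::vector<uint8_t> &D,
                                          size_t Begin, size_t Size) {
    std::vector<ModuleSummary> Mods;
    const size_t End = Begin + Size;
    size_t Off = Begin;
    while (Off < End && End - Off >= kModInfoFixedSize) {
      ModuleSummary M;
      auto Flags = static_cast<uint16_t>(readLE(D, Off + 32, 2));
      M.Dirty = (Flags & 0x1) != 0;
      M.HasEC = (Flags & 0x2) != 0;
      M.TypeServerIndex = static_cast<uint8_t>(Flags >> 8);
      M.ModuleSymStream = static_cast<uint16_t>(readLE(D, Off + 34, 2));
      M.SymByteSize = readLE(D, Off + 36, 4);
      M.SourceFileCount = static_cast<uint16_t>(readLE(D, Off + 48, 2));
      size_t Cur = Off + kModInfoFixedSize;
      if (!readCString(D, Cur, End, M.ModuleName) ||
          !readCString(D, Cur, End, M.ObjFileName))
        break;  // Truncated record; stop rather than read past the substream.
      Mods.push_back(M);
      Off = Begin + ((Cur - Begin + 3) & ~size_t(3));  // Round up to 4 bytes.
    }
    return Mods;
  }

The sketch silently stops at a truncated record; a production reader would
more likely report that as corruption.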

.. _dbi_sec_contr_substream:

Section Contribution Substream
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Begins at offset ``0`` immediately after the :ref:`dbi_mod_info_substream` ends,
and consumes ``Header->SectionContributionSize`` bytes. This substream begins
with a single ``uint32_t`` which will be one of the following values:

.. code-block:: c++

  enum class SectionContrSubstreamVersion : uint32_t {
    Ver60 = 0xeffe0000 + 19970605,
    V2 = 0xeffe0000 + 20140516
  };

``Ver60`` is the only value which has been observed in a PDB so far. Following
this ``4`` byte field is an array of fixed-length structures. If the version
is ``Ver60``, it is an array of ``SectionContribEntry`` structures (the same
layout that is embedded in each ``ModInfo`` record). If the version is ``V2``,
it is an array of ``SectionContribEntry2`` structures, defined as follows:

.. code-block:: c++

  struct SectionContribEntry2 {
    SectionContribEntry SC;
    uint32_t ISectCoff;
  };

The purpose of the second field, ``ISectCoff``, is not well understood.
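
Because both entry types have a fixed size (``28`` bytes for
``SectionContribEntry``, ``32`` bytes for ``SectionContribEntry2``), the
substream can be decoded by checking the version word and stepping through the
remainder. This is a sketch with hypothetical names; it also shows one way to
use ``Characteristics``, testing the PE/COFF ``IMAGE_SCN_CNT_CODE`` flag
(``0x00000020``) to find contributions that contain code.

.. code-block:: c++

  #include <cassert>
  #include <cstddef>
  #include <cstdint>
  #include <optional>
  #include <vector>

  // Reads an N-byte little-endian unsigned integer starting at D[Off].
  static uint32_t readLE(const std::vector<uint8_t> &D, size_t Off, int N) {
    uint32_t V = 0;
    for (int I = 0; I < N; ++I)
      V |= uint32_t(D[Off + I]) << (8 * I);
    return V;
  }

  constexpr uint32_t kSecContrVer60 = 0xeffe0000u + 19970605u;
  constexpr uint32_t kSecContrV2 = 0xeffe0000u + 20140516u;
  constexpr uint32_t kImageScnCntCode = 0x00000020u;  // IMAGE_SCN_CNT_CODE

  // Hypothetical subset of the SectionContribEntry fields.
  struct SectionContrib {
    uint16_t Section;
    int32_t Offset;
    int32_t Size;
    uint32_t Characteristics;
    uint16_t ModuleIndex;
  };

  // Decodes the substream stored in D[Begin, Begin + Size). Returns std::nullopt
  // for an unknown version or a body that is not a whole number of entries.
  std::optional<std::vector<SectionContrib>>
  parseSectionContribs(const std::vector<uint8_t> &D, size_t Begin, size_t Size) {
    if (Size < 4)
      return std::nullopt;
    const uint32_t Version = readLE(D, Begin, 4);
    size_t EntrySize;
    if (Version == kSecContrVer60)
      EntrySize = 28;  // SectionContribEntry
    else if (Version == kSecContrV2)
      EntrySize = 32;  // SectionContribEntry2 (adds ISectCoff)
    else
      return std::nullopt;
    if ((Size - 4) % EntrySize != 0)
      return std::nullopt;
    std::vector<SectionContrib> Out;
    for (size_t Off = Begin + 4; Off < Begin + Size; Off += EntrySize) {
      SectionContrib C;
      C.Section = static_cast<uint16_t>(readLE(D, Off, 2));
      C.Offset = static_cast<int32_t>(readLE(D, Off + 4, 4));
      C.Size = static_cast<int32_t>(readLE(D, Off + 8, 4));
      C.Characteristics = readLE(D, Off + 12, 4);
      C.ModuleIndex = static_cast<uint16_t>(readLE(D, Off + 16, 2));
      Out.push_back(C);
    }
    return Out;
  }

  // True if the contribution's section is marked as containing code.
  bool containsCode(const SectionContrib &C) {
    return (C.Characteristics & kImageScnCntCode) != 0;
  }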

.. _dbi_section_map_substream:

Section Map Substream
^^^^^^^^^^^^^^^^^^^^^
Begins at offset ``0`` immediately after the :ref:`dbi_sec_contr_substream` ends,
and consumes ``Header->SectionMapSize`` bytes. This substream begins with a ``4``
byte header followed by an array of fixed-length (``20`` byte) records. The
header and records have the following layout:

.. code-block:: c++

  struct SectionMapHeader {
    uint16_t Count;    // Number of segment descriptors
    uint16_t LogCount; // Number of logical segment descriptors
  };

  struct SectionMapEntry {
    uint16_t Flags;         // See the SectionMapEntryFlags enum below.
    uint16_t Ovl;           // Logical overlay number
    uint16_t Group;         // Group index into descriptor array.
    uint16_t Frame;
    uint16_t SectionName;   // Byte index of segment / group name in string table, or 0xFFFF.
    uint16_t ClassName;     // Byte index of class in string table, or 0xFFFF.
    uint32_t Offset;        // Byte offset of the logical segment within physical segment. If group is set in flags, this is the offset of the group.
    uint32_t SectionLength; // Byte count of the segment or group.
  };

  enum class SectionMapEntryFlags : uint16_t {
    Read = 1 << 0,              // Segment is readable.
    Write = 1 << 1,             // Segment is writable.
    Execute = 1 << 2,           // Segment is executable.
    AddressIs32Bit = 1 << 3,    // Descriptor describes a 32-bit linear address.
    IsSelector = 1 << 8,        // Frame represents a selector.
    IsAbsoluteAddress = 1 << 9, // Frame represents an absolute address.
    IsGroup = 1 << 10           // If set, descriptor represents a group.
  };

Many of these fields are not well understood, so they will not be discussed
further.
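
Since the header is ``4`` bytes and each ``SectionMapEntry`` is ``20`` bytes, a
reader can validate ``Header->SectionMapSize`` against the entry count. The
sketch below assumes the array holds exactly ``Count`` entries (the text above
only describes ``Count`` as the number of segment descriptors) and decodes the
``Execute`` flag as an example; all names are hypothetical.

.. code-block:: c++

  #include <cassert>
  #include <cstddef>
  #include <cstdint>
  #include <optional>
  #include <vector>

  // Reads an N-byte little-endian unsigned integer starting at D[Off].
  static uint32_t readLE(const std::vector<uint8_t> &D, size_t Off, int N) {
    uint32_t V = 0;
    for (int I = 0; I < N; ++I)
      V |= uint32_t(D[Off + I]) << (8 * I);
    return V;
  }

  // Hypothetical decoded copies of SectionMapHeader / SectionMapEntry.
  struct SecMapEntry {
    uint16_t Flags, Ovl, Group, Frame, SectionName, ClassName;
    uint32_t Offset, SectionLength;
  };

  struct SecMap {
    uint16_t Count = 0;
    uint16_t LogCount = 0;
    std::vector<SecMapEntry> Entries;
  };

  constexpr size_t kSecMapHeaderSize = 4;   // Count + LogCount
  constexpr size_t kSecMapEntrySize = 20;   // six uint16_t + two uint32_t
  constexpr uint16_t kSegExecute = 1 << 2;  // SectionMapEntryFlags::Execute

  // ASSUMPTION: the entry array contains exactly Count records.
  std::optional<SecMap> parseSectionMap(const std::vector<uint8_t> &D,
                                        size_t Begin, size_t Size) {
    if (Size < kSecMapHeaderSize)
      return std::nullopt;
    SecMap M;
    M.Count = static_cast<uint16_t>(readLE(D, Begin, 2));
    M.LogCount = static_cast<uint16_t>(readLE(D, Begin + 2, 2));
    if (Size != kSecMapHeaderSize + size_t(M.Count) * kSecMapEntrySize)
      return std::nullopt;
    for (size_t I = 0; I < M.Count; ++I) {
      const size_t Off = Begin + kSecMapHeaderSize + I * kSecMapEntrySize;
      SecMapEntry E;
      E.Flags = static_cast<uint16_t>(readLE(D, Off, 2));
      E.Ovl = static_cast<uint16_t>(readLE(D, Off + 2, 2));
      E.Group = static_cast<uint16_t>(readLE(D, Off + 4, 2));
      E.Frame = static_cast<uint16_t>(readLE(D, Off + 6, 2));
      E.SectionName = static_cast<uint16_t>(readLE(D, Off + 8, 2));
      E.ClassName = static_cast<uint16_t>(readLE(D, Off + 10, 2));
      E.Offset = readLE(D, Off + 12, 4);
      E.SectionLength = readLE(D, Off + 16, 4);
      M.Entries.push_back(E);
    }
    return M;
  }

  bool isExecutable(const SecMapEntry &E) { return (E.Flags & kSegExecute) != 0; }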

.. _dbi_file_info_substream:

File Info Substream
^^^^^^^^^^^^^^^^^^^
Begins at offset ``0`` immediately after the :ref:`dbi_section_map_substream` ends,
and consumes ``Header->SourceInfoSize`` bytes. This substream defines the mapping
from each module to the source files that contribute to that module. Since
multiple modules can use the same source file (for example, a header file), this
substream uses a string table to store each unique file name only once, and has
each module use offsets into the string table rather than embedding the string's
value directly. The format of this substream is as follows:

.. code-block:: c++

  struct FileInfoSubstream {
    uint16_t NumModules;
    uint16_t NumSourceFiles;

    uint16_t ModIndices[NumModules];
    uint16_t ModFileCounts[NumModules];
    uint32_t FileNameOffsets[NumSourceFiles];
    char NamesBuffer[];  // Null-terminated file names, referenced by FileNameOffsets.
  };

**NumModules** - The number of modules for which source file information is
contained within this substream. Should match the number of records in the
:ref:`dbi_mod_info_substream`.

**NumSourceFiles** - In theory this is supposed to contain the number of source
files for which this substream contains information. But because the field is
only ``16`` bits wide, it could not describe a program with more than 64K source
files. In early versions of the file format, this seems to have been the limit.
In order to support more files, this field is simply ignored, and the count is
computed dynamically by summing up the values of the ``ModFileCounts`` array
(discussed below). In short, this value should be ignored.

**ModIndices** - This array is present, but does not appear to be useful.

**ModFileCounts** - An array of ``NumModules`` integers, each one containing
the number of source files which contribute to the module at the specified index.
While each individual module is limited to 64K contributing source files, the
union of all modules' source files may be greater than 64K. The real number of
source files is thus computed by summing this array. Note that summing this array
does not give the number of `unique` source files, only the total number of source
file contributions to modules.

**FileNameOffsets** - An array of **NumSourceFiles** integers (where **NumSourceFiles**
here refers to the 32-bit value obtained from summing **ModFileCounts**), where
each integer is an offset into **NamesBuffer** pointing to a null terminated string.

**NamesBuffer** - An array of null terminated strings containing the actual source
file names.
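
Putting these pieces together, the sketch below recovers the list of source file
names for each module. It ignores ``NumSourceFiles`` and ``ModIndices`` as
recommended above and sums ``ModFileCounts`` in 32 bits. It also assumes (this is
not stated explicitly above) that each module's entries in ``FileNameOffsets``
are stored contiguously and in module order, so module ``i`` starts at the sum
of the counts of modules ``0`` through ``i - 1``. The function name
``parseFileInfo`` is hypothetical.

.. code-block:: c++

  #include <cassert>
  #include <cstddef>
  #include <cstdint>
  #include <optional>
  #include <string>
  #include <vector>

  // Reads an N-byte little-endian unsigned integer starting at D[Off].
  static uint32_t readLE(const std::vector<uint8_t> &D, size_t Off, int N) {
    uint32_t V = 0;
    for (int I = 0; I < N; ++I)
      V |= uint32_t(D[Off + I]) << (8 * I);
    return V;
  }

  // Returns the source file names of each module, or std::nullopt if the
  // substream in D[Begin, Begin + Size) is truncated or a name is unterminated.
  std::optional<std::vector<std::vector<std::string>>>
  parseFileInfo(const std::vector<uint8_t> &D, size_t Begin, size_t Size) {
    const size_t End = Begin + Size;
    if (Size < 4)
      return std::nullopt;
    const size_t NumModules = readLE(D, Begin, 2);
    // Bytes 2-3 (NumSourceFiles) are deliberately ignored.
    const size_t CountsOff = Begin + 4 + 2 * NumModules;  // Skips ModIndices.
    const size_t OffsetsOff = CountsOff + 2 * NumModules;
    if (OffsetsOff > End)
      return std::nullopt;

    // The real number of FileNameOffsets entries: a 32-bit sum of ModFileCounts.
    uint32_t TotalFiles = 0;
    for (size_t M = 0; M < NumModules; ++M)
      TotalFiles += readLE(D, CountsOff + 2 * M, 2);
    const size_t NamesOff = OffsetsOff + 4 * size_t(TotalFiles);
    if (NamesOff > End)
      return std::nullopt;

    std::vector<std::vector<std::string>> Result(NumModules);
    size_t FileIndex = 0;  // Running prefix sum of ModFileCounts.
    for (size_t M = 0; M < NumModules; ++M) {
      const uint32_t Count = readLE(D, CountsOff + 2 * M, 2);
      for (uint32_t J = 0; J < Count; ++J, ++FileIndex) {
        const size_t NameOff = NamesOff + readLE(D, OffsetsOff + 4 * FileIndex, 4);
        size_t P = NameOff;
        while (P < End && D[P] != 0)
          ++P;
        if (P >= End)
          return std::nullopt;  // Offset out of range or missing terminator.
        Result[M].emplace_back(reinterpret_cast<const char *>(D.data() + NameOff),
                               P - NameOff);
      }
    }
    return Result;
  }

Two modules that both list the same header can store the same offset, so the
name is decoded once per reference but stored only once in ``NamesBuffer``.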

.. _dbi_type_server_substream:

Type Server Substream
^^^^^^^^^^^^^^^^^^^^^
Begins at offset ``0`` immediately after the :ref:`dbi_file_info_substream` ends,
and consumes ``Header->TypeServerSize`` bytes. Neither the purpose nor the layout
of this substream is understood, although it is assumed to be related somehow to
the usage of ``/Zi`` and ``mspdbsrv.exe``. This substream will not be discussed
further.

.. _dbi_ec_substream:

EC Substream
^^^^^^^^^^^^
Begins at offset ``0`` immediately after the :ref:`dbi_type_server_substream` ends,
and consumes ``Header->ECSubstreamSize`` bytes. Neither the purpose nor the layout
of this substream is understood, and it will not be discussed further.

.. _dbi_optional_dbg_stream:

Optional Debug Header Stream
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Begins at offset ``0`` immediately after the :ref:`dbi_ec_substream` ends, and
consumes ``Header->OptionalDbgHeaderSize`` bytes. This substream is an array of
stream indices (i.e. ``uint16_t`` values, referred to below as
``DbgStreamArray``), each of which identifies a stream in the larger MSF file
that contains some additional debug information. Each position of this array
has a special meaning, allowing one to determine what kind of debug information
is in the referenced stream. ``11`` indices are currently understood, although
there may be more. The layout of each stream generally corresponds exactly to a
particular type of debug data directory from the PE/COFF file. The format of
these fields can be found in the `Microsoft PE/COFF Specification <https://www.microsoft.com/en-us/download/details.aspx?id=19509>`__.

**FPO Data** - ``DbgStreamArray[0]``. The data in the referenced stream is a
debug data directory of type ``IMAGE_DEBUG_TYPE_FPO``.

**Exception Data** - ``DbgStreamArray[1]``. The data in the referenced stream
is a debug data directory of type ``IMAGE_DEBUG_TYPE_EXCEPTION``.

**Fixup Data** - ``DbgStreamArray[2]``. The data in the referenced stream is a
debug data directory of type ``IMAGE_DEBUG_TYPE_FIXUP``.

**Omap To Src Data** - ``DbgStreamArray[3]``. The data in the referenced stream
is a debug data directory of type ``IMAGE_DEBUG_TYPE_OMAP_TO_SRC``. This
is used for mapping addresses between instrumented and uninstrumented code.

**Omap From Src Data** - ``DbgStreamArray[4]``. The data in the referenced stream
is a debug data directory of type ``IMAGE_DEBUG_TYPE_OMAP_FROM_SRC``. This
is used for mapping addresses between instrumented and uninstrumented code,
presumably in the opposite direction from ``DbgStreamArray[3]``.

**Section Header Data** - ``DbgStreamArray[5]``. A dump of all section headers from
the original executable.

**Token / RID Map** - ``DbgStreamArray[6]``. The layout of this stream is not
understood, but it is assumed to be a mapping from ``CLR Token`` to
``CLR Record ID``. Refer to `ECMA 335 <http://www.ecma-international.org/publications/standards/Ecma-335.htm>`__
for more information.

**Xdata** - ``DbgStreamArray[7]``. A copy of the ``.xdata`` section from the
executable.

**Pdata** - ``DbgStreamArray[8]``. This is assumed to be a copy of the ``.pdata``
section from the executable, but that would make it identical to
``DbgStreamArray[1]``. The difference between these two indices is not well
understood.

**New FPO Data** - ``DbgStreamArray[9]``. The data in the referenced stream is a
debug data directory of type ``IMAGE_DEBUG_TYPE_FPO``. It is not clear how this
differs from ``DbgStreamArray[0]``, but in practice all observed PDB files have
used the "new" format rather than the "old" format.

**Original Section Header Data** - ``DbgStreamArray[10]``. Assumed to be similar
to ``DbgStreamArray[5]``, but has not been observed in practice.
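
Looking up one of these streams amounts to indexing the ``uint16_t`` array. The
sketch below names the ``11`` known positions and returns the stream index for
one of them. The enum and function names are hypothetical, and treating
``0xFFFF`` as "no stream present" is an assumption that the text above does not
state.

.. code-block:: c++

  #include <cassert>
  #include <cstddef>
  #include <cstdint>
  #include <optional>
  #include <vector>

  // Hypothetical names for the 11 known positions of DbgStreamArray.
  enum class DbgHeaderType : size_t {
    FPO = 0,
    Exception = 1,
    Fixup = 2,
    OmapToSrc = 3,
    OmapFromSrc = 4,
    SectionHdr = 5,
    TokenRidMap = 6,
    Xdata = 7,
    Pdata = 8,
    NewFPO = 9,
    SectionHdrOrig = 10,
  };

  // ASSUMPTION: 0xFFFF marks a slot with no associated stream.
  constexpr uint16_t kNoStream = 0xFFFF;

  // Returns the MSF stream index stored at position Type of the array held in
  // D[Begin, Begin + Size), or std::nullopt if the array is too short to contain
  // that position or the slot is empty.
  std::optional<uint16_t> getDbgStreamIndex(const std::vector<uint8_t> &D,
                                            size_t Begin, size_t Size,
                                            DbgHeaderType Type) {
    const size_t Slot = static_cast<size_t>(Type);
    if (2 * (Slot + 1) > Size)
      return std::nullopt;
    const uint16_t Index = static_cast<uint16_t>(
        D[Begin + 2 * Slot] | (D[Begin + 2 * Slot + 1] << 8));  // Little endian.
    if (Index == kNoStream)
      return std::nullopt;
    return Index;
  }

Since FPO data may appear at both position ``0`` and position ``9``, a reader
might try ``NewFPO`` first, given the observation above that all PDBs seen so
far use the new format.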

@ -37,6 +37,11 @@ repo <https://github.com/Microsoft/microsoft-pdb>`__.

File Layout
===========

.. important::
   Unless otherwise specified, all numeric values are encoded in little endian.
   If you see a type such as ``uint16_t`` or ``uint64_t`` going forward, always
   assume it is little endian!

.. toctree::
   :hidden: