Imported Upstream version 5.18.0.167

Former-commit-id: 289509151e0fee68a1b591a20c9f109c3c789d3a
This commit is contained in:
Xamarin Public Jenkins (auto-signing)
2018-10-20 08:25:10 +00:00
parent e19d552987
commit b084638f15
28489 changed files with 184 additions and 3866856 deletions

View File

@ -1,4 +0,0 @@
=====================================
CodeView Symbol Records
=====================================

View File

@ -1,4 +0,0 @@
=====================================
CodeView Type Records
=====================================

View File

@ -1,445 +0,0 @@
=====================================
The PDB DBI (Debug Info) Stream
=====================================
.. contents::
:local:
.. _dbi_intro:
Introduction
============
The PDB DBI Stream (Index 3) is one of the largest and most important streams
in a PDB file. It contains information about how the program was compiled
(e.g. compilation flags), the compilands (e.g. object files) that
were used to link together the program, the source files which were used
to build the program, as well as references to other streams that contain more
detailed information about each compiland, such as the CodeView symbol records
contained within each compiland and the source and line information for
functions and other symbols within each compiland.
.. _dbi_header:
Stream Header
=============
At offset 0 of the DBI Stream is a header with the following layout:
.. code-block:: c++
struct DbiStreamHeader {
int32_t VersionSignature;
uint32_t VersionHeader;
uint32_t Age;
uint16_t GlobalStreamIndex;
uint16_t BuildNumber;
uint16_t PublicStreamIndex;
uint16_t PdbDllVersion;
uint16_t SymRecordStream;
uint16_t PdbDllRbld;
int32_t ModInfoSize;
int32_t SectionContributionSize;
int32_t SectionMapSize;
int32_t SourceInfoSize;
int32_t TypeServerSize;
uint32_t MFCTypeServerIndex;
int32_t OptionalDbgHeaderSize;
int32_t ECSubstreamSize;
uint16_t Flags;
uint16_t Machine;
uint32_t Padding;
};
- **VersionSignature** - Unknown meaning. Appears to always be ``-1``.
- **VersionHeader** - A value from the following enum.
.. code-block:: c++
enum class DbiStreamVersion : uint32_t {
VC41 = 930803,
V50 = 19960307,
V60 = 19970606,
V70 = 19990903,
V110 = 20091201
};
Similar to the :doc:`PDB Stream <PdbStream>`, this value always appears to be
``V70``, and it is not clear what the other values are for.
- **Age** - The number of times the PDB has been written. Equal to the same
field from the :ref:`PDB Stream header <pdb_stream_header>`.
- **GlobalStreamIndex** - The index of the :doc:`Global Symbol Stream <GlobalStream>`,
which contains CodeView symbol records for all global symbols. Actual records
are stored in the symbol record stream, and are referenced from this stream.
- **BuildNumber** - A bitfield containing values representing the major and minor
version number of the toolchain (e.g. 12.0 for MSVC 2013) used to build the
program, with the following layout:
.. code-block:: c++
uint16_t MinorVersion : 8;
uint16_t MajorVersion : 7;
uint16_t NewVersionFormat : 1;
For the purposes of LLVM, we assume ``NewVersionFormat`` to be always ``true``.
If it is ``false``, the layout above does not apply and the reader should consult
the `Microsoft Source Code <https://github.com/Microsoft/microsoft-pdb>`__ for
further guidance.
- **PublicStreamIndex** - The index of the :doc:`Public Symbol Stream <PublicStream>`,
which contains CodeView symbol records for all public symbols. Actual records
are stored in the symbol record stream, and are referenced from this stream.
- **PdbDllVersion** - The version number of ``mspdbXXXX.dll`` used to produce this
PDB. Note this obviously does not apply for LLVM as LLVM does not use ``mspdb.dll``.
- **SymRecordStream** - The stream containing all CodeView symbol records used
by the program. This is used for deduplication, so that many different
compilands can refer to the same symbols without having to include the full record
content inside of each module stream.
- **PdbDllRbld** - Unknown.
- **MFCTypeServerIndex** - The index of the MFC type server in the
:ref:`dbi_type_server_substream`.
- **Flags** - A bitfield with the following layout, containing various
information about how the program was built:
.. code-block:: c++
uint16_t WasIncrementallyLinked : 1;
uint16_t ArePrivateSymbolsStripped : 1;
uint16_t HasConflictingTypes : 1;
uint16_t Reserved : 13;
The only one of these that is not self-explanatory is ``HasConflictingTypes``.
Although undocumented, ``link.exe`` contains a hidden flag ``/DEBUG:CTYPES``.
If it is passed to ``link.exe``, this field will be set. Otherwise it will
not be set. It is unclear what this flag does, although it seems to have
subtle implications on the algorithm used to look up type records.
- **Machine** - A value from the `CV_CPU_TYPE_e <https://msdn.microsoft.com/en-us/library/b2fc64ek.aspx>`__
enumeration. Common values are ``0x8664`` (x86-64) and ``0x14C`` (x86).
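The two bitfields above (``BuildNumber`` and ``Flags``) can be unpacked with plain shifts and masks. The following is a minimal sketch, assuming the bitfield members are packed starting at the least significant bit (the usual MSVC convention, but not something the text above states). The struct and function names are illustrative only.

```cpp
#include <cstdint>

// Decoded view of the 16-bit BuildNumber field. The struct and function
// names here are illustrative, not taken from any real library.
struct BuildNumberFields {
  unsigned MinorVersion;  // bits 0-7
  unsigned MajorVersion;  // bits 8-14
  bool NewVersionFormat;  // bit 15
};

// Assumes bitfield members are packed starting at the least significant bit.
inline BuildNumberFields decodeBuildNumber(std::uint16_t Raw) {
  BuildNumberFields F;
  F.MinorVersion = Raw & 0xFFu;
  F.MajorVersion = (Raw >> 8) & 0x7Fu;
  F.NewVersionFormat = ((Raw >> 15) & 1u) != 0;
  return F;
}

// Decoded view of the 16-bit Flags field, under the same packing assumption.
struct DbiFlagFields {
  bool WasIncrementallyLinked;     // bit 0
  bool ArePrivateSymbolsStripped;  // bit 1
  bool HasConflictingTypes;        // bit 2
};

inline DbiFlagFields decodeDbiFlags(std::uint16_t Raw) {
  DbiFlagFields F;
  F.WasIncrementallyLinked = (Raw & 0x1u) != 0;
  F.ArePrivateSymbolsStripped = (Raw & 0x2u) != 0;
  F.HasConflictingTypes = (Raw & 0x4u) != 0;
  return F;
}
```

Under this assumption, a raw ``BuildNumber`` of ``0x8C00`` would decode to major version 12, minor version 0, with ``NewVersionFormat`` set.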
Immediately after the fixed-size DBI Stream header are ``7`` variable-length
`substreams`. The following ``7`` fields of the DBI Stream header specify the
number of bytes of the corresponding substream. Each substream's contents will
be described in detail :ref:`below <dbi_substreams>`. The length of the entire
DBI Stream should equal ``64`` (the length of the header above) plus the value
of each of the following ``7`` fields.
- **ModInfoSize** - The length of the :ref:`dbi_mod_info_substream`.
- **SectionContributionSize** - The length of the :ref:`dbi_sec_contr_substream`.
- **SectionMapSize** - The length of the :ref:`dbi_section_map_substream`.
- **SourceInfoSize** - The length of the :ref:`dbi_file_info_substream`.
- **TypeServerSize** - The length of the :ref:`dbi_type_server_substream`.
- **OptionalDbgHeaderSize** - The length of the :ref:`dbi_optional_dbg_stream`.
- **ECSubstreamSize** - The length of the :ref:`dbi_ec_substream`.
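Since each substream starts where the previous one ends, the offset of every substream follows from the seven size fields. The sketch below assumes the on-disk order given by the substream sections of this document (module info, section contribution, section map, file info, type server, EC, optional debug header), and treats a negative size as empty, which is an assumption of this example. All names are illustrative.

```cpp
#include <array>
#include <cstdint>

constexpr std::uint32_t kDbiHeaderSize = 64;

// The seven size fields of the header (names follow DbiStreamHeader).
struct DbiSubstreamSizes {
  std::int32_t ModInfoSize;
  std::int32_t SectionContributionSize;
  std::int32_t SectionMapSize;
  std::int32_t SourceInfoSize;
  std::int32_t TypeServerSize;
  std::int32_t OptionalDbgHeaderSize;
  std::int32_t ECSubstreamSize;
};

// Substreams in the order they are laid out in the stream. Note that the EC
// substream precedes the optional debug header, unlike the header field order.
enum DbiSubstream {
  ModInfo,
  SectionContribution,
  SectionMap,
  FileInfo,
  TypeServer,
  EC,
  OptionalDbgHeader,
  NumDbiSubstreams
};

struct SubstreamExtent {
  std::uint32_t Offset;  // byte offset from the start of the DBI stream
  std::uint32_t Size;    // length in bytes
};

inline std::array<SubstreamExtent, NumDbiSubstreams>
layoutDbiSubstreams(const DbiSubstreamSizes &S) {
  const std::int32_t InOrder[NumDbiSubstreams] = {
      S.ModInfoSize,    S.SectionContributionSize, S.SectionMapSize,
      S.SourceInfoSize, S.TypeServerSize,          S.ECSubstreamSize,
      S.OptionalDbgHeaderSize};
  std::array<SubstreamExtent, NumDbiSubstreams> Out{};
  std::uint32_t Offset = kDbiHeaderSize;
  for (int I = 0; I < NumDbiSubstreams; ++I) {
    // Assumption: a negative size is treated as an empty substream.
    const std::uint32_t Size =
        InOrder[I] < 0 ? 0u : static_cast<std::uint32_t>(InOrder[I]);
    Out[I] = SubstreamExtent{Offset, Size};
    Offset += Size;
  }
  return Out;
}

// Header size plus all seven substream sizes.
inline std::uint32_t expectedDbiStreamSize(const DbiSubstreamSizes &S) {
  const auto Layout = layoutDbiSubstreams(S);
  const SubstreamExtent &Last = Layout[OptionalDbgHeader];
  return Last.Offset + Last.Size;
}
```

A reader could compare ``expectedDbiStreamSize`` against the actual stream length as a cheap sanity check before parsing any substream.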
.. _dbi_substreams:
Substreams
==========
.. _dbi_mod_info_substream:
Module Info Substream
^^^^^^^^^^^^^^^^^^^^^
Begins at offset ``0`` immediately after the :ref:`header <dbi_header>`, and
consumes ``Header->ModInfoSize`` bytes. The module info substream is an array
of variable-length records, each one describing a single module (e.g. object
file) linked into the program. Each record is a ``ModInfo`` structure, shown
further below, which embeds the following ``SectionContribEntry`` structure:
.. code-block:: c++
struct SectionContribEntry {
uint16_t Section;
char Padding1[2];
int32_t Offset;
int32_t Size;
uint32_t Characteristics;
uint16_t ModuleIndex;
char Padding2[2];
uint32_t DataCrc;
uint32_t RelocCrc;
};
While most of these are self-explanatory, the ``Characteristics`` field
warrants some elaboration. It corresponds to the ``Characteristics``
field of the `IMAGE_SECTION_HEADER <https://msdn.microsoft.com/en-us/library/windows/desktop/ms680341(v=vs.85).aspx>`__
structure.
.. code-block:: c++
struct ModInfo {
uint32_t Unused1;
SectionContribEntry SectionContr;
uint16_t Flags;
uint16_t ModuleSymStream;
uint32_t SymByteSize;
uint32_t C11ByteSize;
uint32_t C13ByteSize;
uint16_t SourceFileCount;
char Padding[2];
uint32_t Unused2;
uint32_t SourceFileNameIndex;
uint32_t PdbFilePathNameIndex;
char ModuleName[];
char ObjFileName[];
};
- **SectionContr** - Describes the properties of the section in the final binary
which contains the code and data from this module.
- **Flags** - A bitfield with the following format:
.. code-block:: c++
uint16_t Dirty : 1; // ``true`` if this ModInfo has been written since reading the PDB.
uint16_t EC : 1; // ``true`` if EC information is present for this module. It is unknown what EC actually is.
uint16_t Unused : 6;
uint16_t TSM : 8; // Type Server Index for this module. It is unknown what this is used for, but it is not used by LLVM.
- **ModuleSymStream** - The index of the stream that contains symbol information
for this module. This includes CodeView symbol information as well as source
and line information.
- **SymByteSize** - The number of bytes of data from the stream identified by
``ModuleSymStream`` that represent CodeView symbol records.
- **C11ByteSize** - The number of bytes of data from the stream identified by
``ModuleSymStream`` that represent C11-style CodeView line information.
- **C13ByteSize** - The number of bytes of data from the stream identified by
``ModuleSymStream`` that represent C13-style CodeView line information. At
most one of ``C11ByteSize`` and ``C13ByteSize`` will be non-zero.
- **SourceFileCount** - The number of source files that contributed to this
module during compilation.
- **SourceFileNameIndex** - The offset in the names buffer of the primary
translation unit used to build this module. All PDB files observed to date
always have this value equal to 0.
- **PdbFilePathNameIndex** - The offset in the names buffer of the PDB file
containing this module's symbol information. This has only been observed
to be non-zero for the special ``* Linker *`` module.
- **ModuleName** - The module name. This is usually either a full path to an
object file (either directly passed to ``link.exe`` or from an archive) or
a string of the form ``Import:<dll name>``.
- **ObjFileName** - The object file name. In the case of a module that is
passed directly to ``link.exe``, this is the same as **ModuleName**.
In the case of a module that comes from an archive, this is usually the full
path to the archive.
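To make the variable-length part of ``ModInfo`` concrete, the sketch below reads one record from a byte buffer. It assumes the fixed fields occupy exactly 64 bytes with no compiler-inserted padding (which follows from the field sizes listed above), and it does not attempt to account for any padding between consecutive records, which the text above does not describe. Helper names are illustrative.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Size of the fixed part of ModInfo, i.e. everything before ModuleName.
constexpr std::size_t kModInfoFixedSize = 64;

struct ModInfoSummary {
  std::uint16_t ModuleSymStream = 0;
  std::uint32_t SymByteSize = 0;
  std::string ModuleName;
  std::string ObjFileName;
  std::size_t BytesConsumed = 0;  // fixed part plus both strings, no padding
};

inline std::uint16_t modiReadU16(const std::vector<std::uint8_t> &B,
                                 std::size_t Off) {
  return static_cast<std::uint16_t>(B[Off] | (B[Off + 1] << 8));
}

inline std::uint32_t modiReadU32(const std::vector<std::uint8_t> &B,
                                 std::size_t Off) {
  return static_cast<std::uint32_t>(B[Off]) |
         (static_cast<std::uint32_t>(B[Off + 1]) << 8) |
         (static_cast<std::uint32_t>(B[Off + 2]) << 16) |
         (static_cast<std::uint32_t>(B[Off + 3]) << 24);
}

// Reads a null-terminated string at Off, or nullopt if no terminator is found.
inline std::optional<std::string>
modiReadCString(const std::vector<std::uint8_t> &B, std::size_t Off) {
  std::string S;
  for (std::size_t I = Off; I < B.size(); ++I) {
    if (B[I] == 0)
      return S;
    S.push_back(static_cast<char>(B[I]));
  }
  return std::nullopt;
}

inline std::optional<ModInfoSummary>
parseModInfo(const std::vector<std::uint8_t> &Buf, std::size_t RecordOff) {
  if (RecordOff > Buf.size() || Buf.size() - RecordOff < kModInfoFixedSize)
    return std::nullopt;
  ModInfoSummary M;
  // Offsets: Unused1 (4) + SectionContr (28) + Flags (2) = 34.
  M.ModuleSymStream = modiReadU16(Buf, RecordOff + 34);
  M.SymByteSize = modiReadU32(Buf, RecordOff + 36);
  const std::size_t NameOff = RecordOff + kModInfoFixedSize;
  std::optional<std::string> Name = modiReadCString(Buf, NameOff);
  if (!Name)
    return std::nullopt;
  std::optional<std::string> Obj =
      modiReadCString(Buf, NameOff + Name->size() + 1);
  if (!Obj)
    return std::nullopt;
  M.BytesConsumed = kModInfoFixedSize + Name->size() + 1 + Obj->size() + 1;
  M.ModuleName = std::move(*Name);
  M.ObjFileName = std::move(*Obj);
  return M;
}
```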
.. _dbi_sec_contr_substream:
Section Contribution Substream
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Begins at offset ``0`` immediately after the :ref:`dbi_mod_info_substream` ends,
and consumes ``Header->SectionContributionSize`` bytes. This substream begins
with a single ``uint32_t`` which will be one of the following values:
.. code-block:: c++
enum class SectionContrSubstreamVersion : uint32_t {
Ver60 = 0xeffe0000 + 19970605,
V2 = 0xeffe0000 + 20140516
};
``Ver60`` is the only value which has been observed in a PDB so far. Following
this ``4`` byte field is an array of fixed-length structures. If the version
is ``Ver60``, it is an array of ``SectionContribEntry`` structures. If the
version is ``V2``, it is an array of ``SectionContribEntry2`` structures,
defined as follows:
.. code-block:: c++
struct SectionContribEntry2 {
SectionContribEntry SC;
uint32_t ISectCoff;
};
The purpose of the second field is not well understood.
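Because the entries are fixed-length, the number of entries follows from the substream size and the version. A minimal sketch, assuming ``SectionContribEntry`` is 28 bytes and ``SectionContribEntry2`` is 32 bytes (derived from the field sizes shown above); the function name is hypothetical:

```cpp
#include <cstdint>
#include <optional>

constexpr std::uint32_t kSecContrVer60 = 0xeffe0000u + 19970605u;
constexpr std::uint32_t kSecContrV2 = 0xeffe0000u + 20140516u;
static_assert(kSecContrVer60 == 0xF12EBA2Du, "Ver60 as a single constant");

constexpr std::uint32_t kSectionContribEntrySize = 28;   // SectionContribEntry
constexpr std::uint32_t kSectionContribEntry2Size = 32;  // plus ISectCoff

// Returns the number of entries following the 4-byte version field, or
// nullopt if the version is unknown or the size is not a whole number of
// entries.
inline std::optional<std::uint32_t>
countSectionContribs(std::uint32_t Version, std::uint32_t SubstreamSize) {
  if (SubstreamSize < 4)
    return std::nullopt;
  std::uint32_t EntrySize = 0;
  if (Version == kSecContrVer60)
    EntrySize = kSectionContribEntrySize;
  else if (Version == kSecContrV2)
    EntrySize = kSectionContribEntry2Size;
  else
    return std::nullopt;
  const std::uint32_t Payload = SubstreamSize - 4;
  if (Payload % EntrySize != 0)
    return std::nullopt;
  return Payload / EntrySize;
}
```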
.. _dbi_section_map_substream:
Section Map Substream
^^^^^^^^^^^^^^^^^^^^^
Begins at offset ``0`` immediately after the :ref:`dbi_sec_contr_substream` ends,
and consumes ``Header->SectionMapSize`` bytes. This substream begins with a ``4``
byte header followed by an array of fixed-length records. The header and records
have the following layout:
.. code-block:: c++
struct SectionMapHeader {
uint16_t Count; // Number of segment descriptors
uint16_t LogCount; // Number of logical segment descriptors
};
struct SectionMapEntry {
uint16_t Flags; // See the SectionMapEntryFlags enum below.
uint16_t Ovl; // Logical overlay number
uint16_t Group; // Group index into descriptor array.
uint16_t Frame;
uint16_t SectionName; // Byte index of segment / group name in string table, or 0xFFFF.
uint16_t ClassName; // Byte index of class in string table, or 0xFFFF.
uint32_t Offset; // Byte offset of the logical segment within physical segment. If group is set in flags, this is the offset of the group.
uint32_t SectionLength; // Byte count of the segment or group.
};
enum class SectionMapEntryFlags : uint16_t {
Read = 1 << 0, // Segment is readable.
Write = 1 << 1, // Segment is writable.
Execute = 1 << 2, // Segment is executable.
AddressIs32Bit = 1 << 3, // Descriptor describes a 32-bit linear address.
IsSelector = 1 << 8, // Frame represents a selector.
IsAbsoluteAddress = 1 << 9, // Frame represents an absolute address.
IsGroup = 1 << 10 // If set, descriptor represents a group.
};
Many of these fields are not well understood, so will not be discussed further.
.. _dbi_file_info_substream:
File Info Substream
^^^^^^^^^^^^^^^^^^^
Begins at offset ``0`` immediately after the :ref:`dbi_section_map_substream` ends,
and consumes ``Header->SourceInfoSize`` bytes. This substream defines the mapping
from module to the source files that contribute to that module. Since multiple
modules can use the same source file (for example, a header file), this substream
uses a string table to store each unique file name only once, and then has each
module use offsets into the string table rather than embedding the string's value
directly. The format of this substream is as follows:
.. code-block:: c++
struct FileInfoSubstream {
uint16_t NumModules;
uint16_t NumSourceFiles;
uint16_t ModIndices[NumModules];
uint16_t ModFileCounts[NumModules];
uint32_t FileNameOffsets[NumSourceFiles];
char NamesBuffer[][NumSourceFiles];
};
**NumModules** - The number of modules for which source file information is
contained within this substream. Should match the corresponding value from the
:ref:`dbi_header`.
**NumSourceFiles** - In theory this is supposed to contain the number of source
files for which this substream contains information. But the ``16``-bit width of
this field would prevent a program from having more than 64K source files. In
early versions of the file format, this seems to have been the case. In order to
support more than this, the field is simply ignored, and the real count is
computed dynamically by summing up the values of the ``ModFileCounts`` array
(discussed below). In short, this value should be ignored.
**ModIndices** - This array is present, but does not appear to be useful.
**ModFileCounts** - An array of ``NumModules`` integers, each one containing
the number of source files which contribute to the module at the specified index.
While each individual module is limited to 64K contributing source files, the
union of all modules' source files may be greater than 64K. The real number of
source files is thus computed by summing this array. Note that summing this array
does not give the number of `unique` source files, only the total number of source
file contributions to modules.
**FileNameOffsets** - An array of **NumSourceFiles** integers (where **NumSourceFiles**
here refers to the 32-bit value obtained from summing **ModFileCounts**), where
each integer is an offset into **NamesBuffer** pointing to a null terminated string.
**NamesBuffer** - An array of null terminated strings containing the actual source
file names.
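The following sketch walks the layout described above and recovers the list of source file names for each module. It ignores ``NumSourceFiles`` and ``ModIndices`` as suggested, and it assumes that the entries of ``FileNameOffsets`` are grouped by module in module order (module 0's files first, then module 1's, and so on), which the text implies but does not state outright. The names ``FilesByModule`` and ``parseFileInfo`` are illustrative.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

using FilesByModule = std::vector<std::vector<std::string>>;

inline std::optional<FilesByModule>
parseFileInfo(const std::vector<std::uint8_t> &B) {
  // Little-endian readers; callers below bounds-check before using them.
  auto U16 = [&B](std::size_t Off) {
    return static_cast<std::uint32_t>(B[Off]) |
           (static_cast<std::uint32_t>(B[Off + 1]) << 8);
  };
  auto U32 = [&U16](std::size_t Off) {
    return U16(Off) | (U16(Off + 2) << 16);
  };

  if (B.size() < 4)
    return std::nullopt;
  const std::size_t NumModules = U16(0);
  // NumSourceFiles at offset 2 is deliberately ignored.
  const std::size_t CountsOff = 4 + 2 * NumModules;  // skips ModIndices
  const std::size_t OffsetsOff = CountsOff + 2 * NumModules;
  if (B.size() < OffsetsOff)
    return std::nullopt;

  // The real number of file entries is the sum of ModFileCounts.
  std::vector<std::uint32_t> Counts(NumModules);
  std::size_t TotalFiles = 0;
  for (std::size_t I = 0; I < NumModules; ++I) {
    Counts[I] = U16(CountsOff + 2 * I);
    TotalFiles += Counts[I];
  }
  const std::size_t NamesOff = OffsetsOff + 4 * TotalFiles;
  if (B.size() < NamesOff)
    return std::nullopt;

  FilesByModule Result(NumModules);
  std::size_t Next = OffsetsOff;
  for (std::size_t M = 0; M < NumModules; ++M) {
    for (std::uint32_t F = 0; F < Counts[M]; ++F, Next += 4) {
      std::size_t Pos = NamesOff + U32(Next);
      std::string Name;
      while (Pos < B.size() && B[Pos] != 0)
        Name.push_back(static_cast<char>(B[Pos++]));
      if (Pos >= B.size())
        return std::nullopt;  // bad offset or missing terminator
      Result[M].push_back(Name);
    }
  }
  return Result;
}
```

Note that a header shared by two modules appears once in ``NamesBuffer`` but twice in the result, once under each module.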
.. _dbi_type_server_substream:
Type Server Substream
^^^^^^^^^^^^^^^^^^^^^
Begins at offset ``0`` immediately after the :ref:`dbi_file_info_substream` ends,
and consumes ``Header->TypeServerSize`` bytes. Neither the purpose nor the layout
of this substream is understood, although it is assumed to be related somehow to the
usage of ``/Zi`` and ``mspdbsrv.exe``. This substream will not be discussed further.
.. _dbi_ec_substream:
EC Substream
^^^^^^^^^^^^
Begins at offset ``0`` immediately after the :ref:`dbi_type_server_substream` ends,
and consumes ``Header->ECSubstreamSize`` bytes. Neither the purpose nor the layout
of this substream is understood, and it will not be discussed further.
.. _dbi_optional_dbg_stream:
Optional Debug Header Stream
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Begins at offset ``0`` immediately after the :ref:`dbi_ec_substream` ends, and
consumes ``Header->OptionalDbgHeaderSize`` bytes. This substream is an array of
stream indices (i.e. ``uint16_t`` values), each of which identifies a stream
in the larger MSF file which contains some additional debug information.
Each position of this array has a special meaning, allowing one to determine
what kind of debug information is at the referenced stream. ``11`` indices
are currently understood, although it's possible there may be more. The
layout of each stream generally corresponds exactly to a particular type
of debug data directory from the PE/COFF file. The format of these fields
can be found in the `Microsoft PE/COFF Specification <https://www.microsoft.com/en-us/download/details.aspx?id=19509>`__.
**FPO Data** - ``DbgStreamArray[0]``. The data in the referenced stream is a
debug data directory of type ``IMAGE_DEBUG_TYPE_FPO``.
**Exception Data** - ``DbgStreamArray[1]``. The data in the referenced stream
is a debug data directory of type ``IMAGE_DEBUG_TYPE_EXCEPTION``.
**Fixup Data** - ``DbgStreamArray[2]``. The data in the referenced stream is a
debug data directory of type ``IMAGE_DEBUG_TYPE_FIXUP``.
**Omap To Src Data** - ``DbgStreamArray[3]``. The data in the referenced stream
is a debug data directory of type ``IMAGE_DEBUG_TYPE_OMAP_TO_SRC``. This
is used for mapping addresses between instrumented and uninstrumented code.
**Omap From Src Data** - ``DbgStreamArray[4]``. The data in the referenced stream
is a debug data directory of type ``IMAGE_DEBUG_TYPE_OMAP_FROM_SRC``. This
is used for mapping addresses between instrumented and uninstrumented code.
**Section Header Data** - ``DbgStreamArray[5]``. A dump of all section headers from
the original executable.
**Token / RID Map** - ``DbgStreamArray[6]``. The layout of this stream is not
understood, but it is assumed to be a mapping from ``CLR Token`` to
``CLR Record ID``. Refer to `ECMA 335 <http://www.ecma-international.org/publications/standards/Ecma-335.htm>`__
for more information.
**Xdata** - ``DbgStreamArray[7]``. A copy of the ``.xdata`` section from the
executable.
**Pdata** - ``DbgStreamArray[8]``. This is assumed to be a copy of the ``.pdata``
section from the executable, but that would make it identical to
``DbgStreamArray[1]``. The difference between these two indices is not well
understood.
**New FPO Data** - ``DbgStreamArray[9]``. The data in the referenced stream is a
debug data directory of type ``IMAGE_DEBUG_TYPE_FPO``. It is not clear how this
differs from ``DbgStreamArray[0]``, but in practice all observed PDB files have
used the "new" format rather than the "old" format.
**Original Section Header Data** - ``DbgStreamArray[10]``. Assumed to be similar
to ``DbgStreamArray[5]``, but has not been observed in practice.
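The positional meaning of each slot can be captured with an enumeration. The sketch below looks up one slot in the raw substream bytes. It assumes that a value of ``0xFFFF`` marks a slot with no associated stream; that sentinel is an assumption of this example and is not stated in the text above. The enumerator and function names are illustrative.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// One enumerator per understood position in DbgStreamArray.
enum class DbgHeaderSlot : std::size_t {
  FpoData = 0,
  ExceptionData = 1,
  FixupData = 2,
  OmapToSrcData = 3,
  OmapFromSrcData = 4,
  SectionHeaderData = 5,
  TokenRidMap = 6,
  Xdata = 7,
  Pdata = 8,
  NewFpoData = 9,
  OriginalSectionHeaderData = 10
};

// Assumption: 0xFFFF marks a slot with no stream.
constexpr std::uint16_t kNoStream = 0xFFFF;

// Returns the MSF stream index stored in the given slot, or nullopt if the
// array is too short to contain the slot or the slot holds the sentinel.
inline std::optional<std::uint16_t>
dbgStreamIndex(const std::vector<std::uint8_t> &Substream, DbgHeaderSlot Slot) {
  const std::size_t Off = static_cast<std::size_t>(Slot) * 2;
  if (Substream.size() < Off + 2)
    return std::nullopt;
  const std::uint16_t Index = static_cast<std::uint16_t>(
      Substream[Off] | (Substream[Off + 1] << 8));
  if (Index == kNoStream)
    return std::nullopt;
  return Index;
}
```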

View File

@ -1,3 +0,0 @@
=====================================
The PDB Global Symbol Stream
=====================================

View File

@ -1,3 +0,0 @@
=====================================
The TPI & IPI Hash Streams
=====================================

View File

@ -1,80 +0,0 @@
=====================================
The Module Information Stream
=====================================
.. contents::
:local:
.. _modi_stream_intro:
Introduction
============
The Module Info Stream (henceforth referred to as the Modi stream) contains
information about a single module (object file, import library, etc.) that
contributes to the binary this PDB contains debug information about. There
is one modi stream for each module, and the mapping between modi stream index
and module is contained in the :doc:`DBI Stream <DbiStream>`. The modi stream
for a single module contains line information for the compiland, as well as
all CodeView information for the symbols defined in the compiland. Finally,
there is a "global refs" substream which is not well understood.
.. _modi_stream_layout:
Stream Layout
=============
A modi stream is laid out as follows:
.. code-block:: c++
struct ModiStream {
uint32_t Signature;
uint8_t Symbols[SymbolSize-4];
uint8_t C11LineInfo[C11Size];
uint8_t C13LineInfo[C13Size];
uint32_t GlobalRefsSize;
uint8_t GlobalRefs[GlobalRefsSize];
};
- **Signature** - Unknown. In practice only the value of ``4`` has been
observed. It is hypothesized that this value corresponds to the set of
``CV_SIGNATURE_xx`` defines in ``cvinfo.h``, with the value of ``4``
meaning that this module has C13 line information (as opposed to C11 line
information). A corollary of this is that we expect to only ever see
C13 line info, and that we do not understand the format of C11 line info.
- **Symbols** - The :ref:`CodeView Symbol Substream <modi_symbol_substream>`.
``SymbolSize`` is equal to the value of ``SymByteSize`` for the
corresponding module's entry in the :ref:`Module Info Substream <dbi_mod_info_substream>`
of the :doc:`DBI Stream <DbiStream>`.
- **C11LineInfo** - A block containing CodeView line information in C11
format. ``C11Size`` is equal to the value of ``C11ByteSize`` from the
:ref:`Module Info Substream <dbi_mod_info_substream>` of the
:doc:`DBI Stream <DbiStream>`. If this value is ``0``, then C11 line
information is not present. As mentioned previously, the format of
C11 line info is not understood and we assume all line information in modern
PDBs to be in C13 format.
- **C13LineInfo** - A block containing CodeView line information in C13
format. ``C13Size`` is equal to the value of ``C13ByteSize`` from the
:ref:`Module Info Substream <dbi_mod_info_substream>` of the
:doc:`DBI Stream <DbiStream>`. If this value is ``0``, then C13 line
information is not present.
- **GlobalRefs** - The meaning of this substream is not understood.
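The offsets of each region of a modi stream follow directly from the three sizes recorded in the module's DBI entry. A minimal sketch, assuming ``SymByteSize`` includes the 4-byte ``Signature`` (as ``SymbolSize-4`` in the layout implies); the type and function names are hypothetical:

```cpp
#include <cstdint>
#include <optional>

struct ModiExtent {
  std::uint64_t Offset;  // byte offset from the start of the modi stream
  std::uint64_t Size;    // length in bytes
};

struct ModiLayout {
  ModiExtent Symbols;
  ModiExtent C11LineInfo;
  ModiExtent C13LineInfo;
  std::uint64_t GlobalRefsSizeOffset;  // where the uint32_t GlobalRefsSize is
};

inline std::optional<ModiLayout>
layoutModiStream(std::uint32_t SymByteSize, std::uint32_t C11ByteSize,
                 std::uint32_t C13ByteSize, std::uint64_t StreamSize) {
  if (SymByteSize < 4)
    return std::nullopt;  // no room for the Signature
  ModiLayout L;
  L.Symbols = {4, SymByteSize - 4u};
  L.C11LineInfo = {SymByteSize, C11ByteSize};
  L.C13LineInfo = {L.C11LineInfo.Offset + C11ByteSize, C13ByteSize};
  L.GlobalRefsSizeOffset = L.C13LineInfo.Offset + C13ByteSize;
  // The stream must at least hold the GlobalRefsSize field itself.
  if (L.GlobalRefsSizeOffset + 4 > StreamSize)
    return std::nullopt;
  return L;
}
```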
.. _modi_symbol_substream:
The CodeView Symbol Substream
=============================
The CodeView Symbol Substream is an array of variable length
records describing the functions, variables, inlining information,
and other symbols defined in the compiland. The entire array consumes
``SymbolSize-4`` bytes. The format of a CodeView Symbol Record (and
thus of an array of CodeView Symbol Records) is described in
:doc:`CodeViewSymbols`.

View File

@ -1,121 +0,0 @@
=====================================
The MSF File Format
=====================================
.. contents::
:local:
.. _msf_superblock:
The Superblock
==============
At file offset 0 in an MSF file is the MSF *SuperBlock*, which is laid out as
follows:
.. code-block:: c++
struct SuperBlock {
char FileMagic[sizeof(Magic)];
ulittle32_t BlockSize;
ulittle32_t FreeBlockMapBlock;
ulittle32_t NumBlocks;
ulittle32_t NumDirectoryBytes;
ulittle32_t Unknown;
ulittle32_t BlockMapAddr;
};
- **FileMagic** - Must be equal to ``"Microsoft C/C++ MSF 7.00\r\n"``
followed by the bytes ``1A 44 53 00 00 00``.
- **BlockSize** - The block size of the internal file system. Valid values are
512, 1024, 2048, and 4096 bytes. Certain aspects of the MSF file layout vary
depending on the block sizes. For the purposes of LLVM, we handle only block
sizes of 4KiB, and all further discussion assumes a block size of 4KiB.
- **FreeBlockMapBlock** - The index of a block within the file, at which begins
a bitfield representing the set of all blocks within the file which are "free"
(i.e. the data within that block is not used). This bitfield is spread across
the MSF file at ``BlockSize`` intervals.
**Important**: ``FreeBlockMapBlock`` can only be ``1`` or ``2``! This field
is designed to support incremental and atomic updates of the underlying MSF
file. While writing to an MSF file, if the value of this field is ``1``, you
can write your new modified bitfield to block ``2``, and vice versa. Only when
you commit the file to disk do you need to swap the value in the SuperBlock
to point to the new ``FreeBlockMapBlock``.
- **NumBlocks** - The total number of blocks in the file. ``NumBlocks * BlockSize``
should equal the size of the file on disk.
- **NumDirectoryBytes** - The size of the stream directory, in bytes. The stream
directory contains information about each stream's size and the set of blocks
that it occupies. It will be described in more detail later.
- **BlockMapAddr** - The index of a block within the MSF file. At this block is
an array of ``ulittle32_t``'s listing the blocks that the stream directory
resides on. For large MSF files, the stream directory (which describes the
block layout of each stream) may not fit entirely on a single block. As a
result, this extra layer of indirection is introduced, whereby this block
contains the list of blocks that the stream directory occupies, and the stream
directory itself can be stitched together accordingly. The number of
``ulittle32_t``'s in this array is given by ``ceil(NumDirectoryBytes / BlockSize)``.
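The constraints listed above lend themselves to a quick consistency check once the numeric fields have been decoded. The sketch below is one possible form of such a check, not a complete validator. It leaves out the ``FileMagic`` comparison, and the requirement that ``BlockMapAddr`` be smaller than ``NumBlocks`` is an inference of this example. All names are illustrative.

```cpp
#include <cstdint>

// The numeric SuperBlock fields, already decoded from little endian.
struct SuperBlockFields {
  std::uint32_t BlockSize;
  std::uint32_t FreeBlockMapBlock;
  std::uint32_t NumBlocks;
  std::uint32_t NumDirectoryBytes;
  std::uint32_t Unknown;
  std::uint32_t BlockMapAddr;
};

inline bool isValidBlockSize(std::uint32_t B) {
  return B == 512 || B == 1024 || B == 2048 || B == 4096;
}

// ceil(NumDirectoryBytes / BlockSize) in integer arithmetic.
// BlockSize must be non-zero.
inline std::uint32_t numDirectoryBlocks(std::uint32_t NumDirectoryBytes,
                                        std::uint32_t BlockSize) {
  return static_cast<std::uint32_t>(
      (static_cast<std::uint64_t>(NumDirectoryBytes) + BlockSize - 1) /
      BlockSize);
}

inline bool superBlockLooksConsistent(const SuperBlockFields &SB,
                                      std::uint64_t FileSize) {
  if (!isValidBlockSize(SB.BlockSize))
    return false;
  if (SB.FreeBlockMapBlock != 1 && SB.FreeBlockMapBlock != 2)
    return false;
  if (static_cast<std::uint64_t>(SB.NumBlocks) * SB.BlockSize != FileSize)
    return false;
  // Assumption: the block map must itself be a block inside the file.
  return SB.BlockMapAddr < SB.NumBlocks;
}
```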
The Stream Directory
====================
The Stream Directory is the root of all access to the other streams in an MSF
file. Beginning at byte 0 of the stream directory is the following structure:
.. code-block:: c++
struct StreamDirectory {
ulittle32_t NumStreams;
ulittle32_t StreamSizes[NumStreams];
ulittle32_t StreamBlocks[NumStreams][];
};
And this structure occupies exactly ``SuperBlock->NumDirectoryBytes`` bytes.
Note that each of the last two arrays is of variable length, and in particular
that the second array is jagged.
**Example:** Suppose a hypothetical PDB file with a 4KiB block size, and 4
streams of lengths {1000 bytes, 8000 bytes, 16000 bytes, 9000 bytes}.
Stream 0: ceil(1000 / 4096) = 1 block
Stream 1: ceil(8000 / 4096) = 2 blocks
Stream 2: ceil(16000 / 4096) = 4 blocks
Stream 3: ceil(9000 / 4096) = 3 blocks
In total, 10 blocks are used. Let's see what the stream directory might look
like:
.. code-block:: c++
struct StreamDirectory {
ulittle32_t NumStreams = 4;
ulittle32_t StreamSizes[] = {1000, 8000, 16000, 9000};
ulittle32_t StreamBlocks[][] = {
{4},
{5, 6},
{11, 9, 7, 8},
{10, 15, 12}
};
};
In total, this occupies ``15 * 4 = 60`` bytes, so ``SuperBlock->NumDirectoryBytes``
would equal ``60``, and the block at ``SuperBlock->BlockMapAddr`` would hold an
array of one ``ulittle32_t``, since ``60 <= SuperBlock->BlockSize``.
Note also that the streams are discontiguous, and that part of stream 3 is in the
middle of part of stream 2. You cannot assume anything about the layout of the
blocks!
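The jagged ``StreamBlocks`` array can only be split once the sizes are known, since the length of each row is ``ceil(StreamSize / BlockSize)``. The sketch below assumes the directory bytes have already been stitched together and decoded into 32-bit words; the struct and function names are illustrative.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

struct ParsedStreamDirectory {
  std::vector<std::uint32_t> StreamSizes;
  std::vector<std::vector<std::uint32_t>> StreamBlocks;
};

// Words holds the stream directory as little-endian-decoded 32-bit values.
inline std::optional<ParsedStreamDirectory>
parseStreamDirectory(const std::vector<std::uint32_t> &Words,
                     std::uint32_t BlockSize) {
  if (Words.empty() || BlockSize == 0)
    return std::nullopt;
  const std::size_t NumStreams = Words[0];
  if (Words.size() - 1 < NumStreams)
    return std::nullopt;

  ParsedStreamDirectory Dir;
  const auto First = Words.begin() + 1;
  Dir.StreamSizes.assign(First, First + static_cast<std::ptrdiff_t>(NumStreams));

  std::size_t Next = 1 + NumStreams;  // index of the first block number
  for (std::uint32_t Size : Dir.StreamSizes) {
    // Each stream occupies ceil(Size / BlockSize) blocks.
    const std::size_t NumBlocks = static_cast<std::size_t>(
        (static_cast<std::uint64_t>(Size) + BlockSize - 1) / BlockSize);
    if (Words.size() - Next < NumBlocks)
      return std::nullopt;
    const auto Begin = Words.begin() + static_cast<std::ptrdiff_t>(Next);
    Dir.StreamBlocks.emplace_back(Begin,
                                  Begin + static_cast<std::ptrdiff_t>(NumBlocks));
    Next += NumBlocks;
  }
  return Dir;
}
```

Feeding it the 15 words of the example above (``4``, the four sizes, then the ten block numbers) with a block size of 4096 yields the same four block lists shown in the structure.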
Alignment and Block Boundaries
==============================
As may be clear by now, it is possible for a single field (whether it be a high
level record, a long string field, or even a single ``uint16``) to begin and
end in separate blocks. For example, if the block size is 4096 bytes, and a
``uint16`` field begins at the last byte of the current block, then it would
need to end on the first byte of the next block. Since blocks are not
necessarily contiguously laid out in the file, this means that both the consumer
and the producer of an MSF file must be prepared to split data apart
accordingly. In the aforementioned example, since values are stored in little
endian, the low byte of the ``uint16`` would be written to the last byte of the
stream's block N, and the high byte would be written to the first byte of the
stream's block N+1, which could be tens of thousands of bytes later
(or even earlier!) in the file, depending on what the stream directory says.
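One way a consumer might handle this is to translate every stream offset through the stream's block list before touching the file. The sketch below copies one byte at a time for clarity rather than speed, and decodes a ``uint16`` in little endian. The function names are illustrative, and the tiny block size in the usage note is chosen purely to keep the example small.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Copies Length bytes starting at stream offset Offset, following the
// stream's block list. Returns nullopt if the range leaves the stream or file.
inline std::optional<std::vector<std::uint8_t>>
readStreamBytes(const std::vector<std::uint8_t> &File,
                const std::vector<std::uint32_t> &Blocks,
                std::uint32_t BlockSize, std::uint64_t Offset,
                std::size_t Length) {
  if (BlockSize == 0)
    return std::nullopt;
  std::vector<std::uint8_t> Out;
  Out.reserve(Length);
  for (std::size_t I = 0; I < Length; ++I) {
    const std::uint64_t Pos = Offset + I;
    const std::uint64_t BlockInStream = Pos / BlockSize;
    if (BlockInStream >= Blocks.size())
      return std::nullopt;
    // Map (block within stream, offset within block) to a file offset.
    const std::uint64_t FileOff =
        static_cast<std::uint64_t>(Blocks[BlockInStream]) * BlockSize +
        Pos % BlockSize;
    if (FileOff >= File.size())
      return std::nullopt;
    Out.push_back(File[FileOff]);
  }
  return Out;
}

// Reads a little-endian uint16 that may straddle two blocks.
inline std::optional<std::uint16_t>
readStreamU16(const std::vector<std::uint8_t> &File,
              const std::vector<std::uint32_t> &Blocks,
              std::uint32_t BlockSize, std::uint64_t Offset) {
  auto Bytes = readStreamBytes(File, Blocks, BlockSize, Offset, 2);
  if (!Bytes)
    return std::nullopt;
  return static_cast<std::uint16_t>((*Bytes)[0] | ((*Bytes)[1] << 8));
}
```

With a toy block size of 4 and a stream whose blocks are ``{2, 0}``, a ``uint16`` at stream offset 3 has its first byte at file offset 11 and its second byte at file offset 0, i.e. earlier in the file.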

View File

@ -1,80 +0,0 @@
========================================
The PDB Info Stream (aka the PDB Stream)
========================================
.. contents::
:local:
.. _pdb_stream_header:
Stream Header
=============
At offset 0 of the PDB Stream is a header with the following layout:
.. code-block:: c++
struct PdbStreamHeader {
ulittle32_t Version;
ulittle32_t Signature;
ulittle32_t Age;
Guid UniqueId;
};
- **Version** - A value from the following enum:
.. code-block:: c++
enum class PdbStreamVersion : uint32_t {
VC2 = 19941610,
VC4 = 19950623,
VC41 = 19950814,
VC50 = 19960307,
VC98 = 19970604,
VC70Dep = 19990604,
VC70 = 20000404,
VC80 = 20030901,
VC110 = 20091201,
VC140 = 20140508,
};
While the meaning of this field appears to be obvious, in practice we have
never observed a value other than ``VC70``, even with modern versions of
the toolchain, and it is unclear why the other values exist. It is assumed
that certain aspects of the PDB stream's layout, and perhaps even that of
the other streams, will change if the value is something other than ``VC70``.
- **Signature** - A 32-bit time-stamp generated with a call to ``time()`` at
the time the PDB file is written. Note that due to the inherent uniqueness
problems of using a timestamp with 1-second granularity, this field does not
really serve its intended purpose, and as such is typically ignored in favor
of the ``Guid`` field, described below.
- **Age** - The number of times the PDB file has been written. This can be used
along with ``Guid`` to match the PDB to its corresponding executable.
- **UniqueId** - A 128-bit ``Guid`` intended to be unique across space and time.
In general, this can be thought of as the result of calling the Win32 API
`UuidCreate <https://msdn.microsoft.com/en-us/library/windows/desktop/aa379205(v=vs.85).aspx>`__,
although LLVM cannot rely on that, as it must work on non-Windows platforms.
Matching a PDB to its executable
================================
The linker is responsible for writing both the PDB and the final executable, and
as a result is the only entity capable of writing the information necessary to
match the PDB to the executable.
In order to accomplish this, the linker generates a guid for the PDB (or
re-uses the existing guid if it is linking incrementally) and increments the Age
field.
The executable is a PE/COFF file, and part of a PE/COFF file is a
number of "directories". For our purposes here, we are interested in the "debug
directory". The exact format of a debug directory is described by the
`IMAGE_DEBUG_DIRECTORY structure <https://msdn.microsoft.com/en-us/library/windows/desktop/ms680307(v=vs.85).aspx>`__.
For this particular case, the linker emits a debug directory of type
``IMAGE_DEBUG_TYPE_CODEVIEW``. The format of this record is defined in
``llvm/DebugInfo/CodeView/CVDebugRecord.h``, but it suffices to say here only
that it includes the same ``Guid`` and ``Age`` fields. At runtime, a
debugger or tool can scan the COFF executable image for the presence of
a debug directory of the correct type and verify that the Guid and Age match.

View File

@ -1,3 +0,0 @@
=====================================
The PDB Public Symbol Stream
=====================================

View File

@ -1,3 +0,0 @@
=====================================
The PDB TPI Stream
=====================================

View File

@ -1,167 +0,0 @@
=====================================
The PDB File Format
=====================================
.. contents::
:local:
.. _pdb_intro:
Introduction
============
PDB (Program Database) is a file format invented by Microsoft which contains
debug information that can be consumed by debuggers and other tools. Since
officially supported APIs exist on Windows for querying debug information from
PDBs even without the user understanding the internals of the file format, a
large ecosystem of tools has been built for Windows to consume this format. In
order for Clang to be able to generate programs that can interoperate with these
tools, it is necessary for us to generate PDB files ourselves.
At the same time, LLVM has a long history of being able to cross-compile from
any platform to any platform, and we wish for the same to be true here. So it
is necessary for us to understand the PDB file format at the byte-level so that
we can generate PDB files entirely on our own.
This manual describes what we know about the PDB file format today: the layout
of the file, the various streams contained within, the format of individual
records within, and more.
We would like to extend our heartfelt gratitude to Microsoft, without whom we
would not be where we are today. Much of the knowledge contained within this
manual was learned through reading code published by Microsoft on their `GitHub
repo <https://github.com/Microsoft/microsoft-pdb>`__.
.. _pdb_layout:
File Layout
===========
.. important::
Unless otherwise specified, all numeric values are encoded in little endian.
If you see a type such as ``uint16_t`` or ``uint64_t`` going forward, always
assume it is little endian!
.. toctree::
:hidden:
MsfFile
PdbStream
TpiStream
DbiStream
ModiStream
PublicStream
GlobalStream
HashStream
CodeViewSymbols
CodeViewTypes
.. _msf:
The MSF Container
-----------------
A PDB file is really just a special case of an MSF (Multi-Stream Format) file.
An MSF file is actually a miniature "file system within a file". It contains
multiple streams (aka files) which can represent arbitrary data, and these
streams are divided into blocks which may not necessarily be contiguously
laid out within the file (aka fragmented). Additionally, the MSF contains a
stream directory (aka MFT) which describes how the streams (files) are laid
out within the MSF.
For more information about the MSF container format, stream directory, and
block layout, see :doc:`MsfFile`.
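
To illustrate the idea of a fragmented stream, the following sketch
reassembles one stream from its blocks. It assumes the block size, the ordered
list of block indices, and the stream size have already been obtained from the
stream directory. The function name ``readStream`` and its parameters are
hypothetical, and the real on-disk layout is the one described in
:doc:`MsfFile`.

.. code-block:: c++

  #include <algorithm>
  #include <cstddef>
  #include <cstdint>
  #include <stdexcept>
  #include <vector>

  // Hypothetical helper: gather the (possibly non-contiguous) blocks of one
  // stream into a single contiguous buffer. All inputs are assumed to come
  // from an already-parsed stream directory.
  std::vector<uint8_t> readStream(const std::vector<uint8_t> &file,
                                  uint32_t blockSize,
                                  const std::vector<uint32_t> &blocks,
                                  uint32_t streamSize) {
    if (blockSize == 0)
      throw std::invalid_argument("block size must be non-zero");

    std::vector<uint8_t> out;
    out.reserve(streamSize);
    uint32_t remaining = streamSize;

    for (uint32_t block : blocks) {
      if (remaining == 0)
        break;
      // The last block of a stream may be only partially used.
      uint32_t n = std::min(remaining, blockSize);
      uint64_t offset = static_cast<uint64_t>(block) * blockSize;
      if (offset + n > file.size())
        throw std::out_of_range("block lies outside the file");
      auto first = file.begin() + static_cast<std::ptrdiff_t>(offset);
      out.insert(out.end(), first, first + n);
      remaining -= n;
    }

    if (remaining != 0)
      throw std::runtime_error("block list too short for stream size");
    return out;
  }

With a block size of 4 and the block list ``{2, 0}``, a 6-byte stream consists
of all of block 2 followed by the first two bytes of block 0.
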
.. _streams:
Streams
-------
The PDB format contains a number of streams that describe the types, symbols,
source files, and compilands (e.g. object files) of a program. Additional
streams contain hash tables that debuggers and other tools use to look up
records and types quickly by name, as well as information about how the program
was compiled, such as the specific toolchain used. A summary of the streams
contained in a PDB file is as follows:
+--------------------+------------------------------+-------------------------------------------+
| Name | Stream Index | Contents |
+====================+==============================+===========================================+
| Old Directory | - Fixed Stream Index 0 | - Previous MSF Stream Directory |
+--------------------+------------------------------+-------------------------------------------+
| PDB Stream | - Fixed Stream Index 1 | - Basic File Information |
| | | - Fields to match EXE to this PDB |
| | | - Map of named streams to stream indices |
+--------------------+------------------------------+-------------------------------------------+
| TPI Stream | - Fixed Stream Index 2 | - CodeView Type Records |
| | | - Index of TPI Hash Stream |
+--------------------+------------------------------+-------------------------------------------+
| DBI Stream | - Fixed Stream Index 3 | - Module/Compiland Information |
| | | - Indices of individual module streams |
| | | - Indices of public / global streams |
| | | - Section Contribution Information |
| | | - Source File Information |
| | | - FPO / PGO Data |
+--------------------+------------------------------+-------------------------------------------+
| IPI Stream | - Fixed Stream Index 4 | - CodeView Type Records |
| | | - Index of IPI Hash Stream |
+--------------------+------------------------------+-------------------------------------------+
| /LinkInfo | - Contained in PDB Stream | - Unknown |
| | Named Stream map | |
+--------------------+------------------------------+-------------------------------------------+
| /src/headerblock | - Contained in PDB Stream | - Unknown |
| | Named Stream map | |
+--------------------+------------------------------+-------------------------------------------+
| /names | - Contained in PDB Stream | - PDB-wide global string table used for |
| | Named Stream map | string de-duplication |
+--------------------+------------------------------+-------------------------------------------+
| Module Info Stream | - Contained in DBI Stream | - CodeView Symbol Records for this module |
| | - One for each compiland | - Line Number Information |
+--------------------+------------------------------+-------------------------------------------+
| Public Stream | - Contained in DBI Stream | - Public (Exported) Symbol Records |
| | | - Index of Public Hash Stream |
+--------------------+------------------------------+-------------------------------------------+
| Global Stream | - Contained in DBI Stream | - Global Symbol Records |
| | | - Index of Global Hash Stream |
+--------------------+------------------------------+-------------------------------------------+
| TPI Hash Stream | - Contained in TPI Stream | - Hash table for looking up TPI records |
| | | by name |
+--------------------+------------------------------+-------------------------------------------+
| IPI Hash Stream | - Contained in IPI Stream | - Hash table for looking up IPI records |
| | | by name |
+--------------------+------------------------------+-------------------------------------------+
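
The fixed stream indices in the table above can be written down as a small
enumeration. This is only a sketch: the enumerator names and the helper
``isFixedStream`` are illustrative assumptions, while the numeric values are
the ones listed in the table.

.. code-block:: c++

  #include <cstdint>

  // Stream indices fixed by the PDB format, per the summary table.
  // The names are hypothetical; only the values come from the table.
  enum FixedStreamIndex : uint16_t {
    OldDirectoryStream = 0, // Previous MSF stream directory
    PdbStream = 1,          // Basic file information, named stream map
    TpiStream = 2,          // CodeView type records
    DbiStream = 3,          // Module / compiland information
    IpiStream = 4           // CodeView type records
  };

  // Returns true if `index` refers to one of the fixed streams. All other
  // streams are located through the PDB, DBI, TPI, or IPI streams.
  constexpr bool isFixedStream(uint16_t index) { return index <= IpiStream; }

Streams without a fixed index, such as ``/names`` or the per-module streams,
must be found indirectly through the stream that contains their index.
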
More information about the structure of each of these streams can be found on
the following pages:
:doc:`PdbStream`
Information about the PDB Info Stream and how it is used to match PDBs to EXEs.
:doc:`TpiStream`
Information about the TPI stream and the CodeView records contained within.
:doc:`DbiStream`
Information about the DBI stream and relevant substreams including the Module Substreams,
source file information, and CodeView symbol records contained within.
:doc:`ModiStream`
Information about the Module Information Stream, of which there is one for each
compilation unit, and the format of the symbols contained within.
:doc:`PublicStream`
Information about the Public Symbol Stream.
:doc:`GlobalStream`
Information about the Global Symbol Stream.
:doc:`HashStream`
Information about the Hash Table stream, and how it can be used to quickly look up records
by name.
CodeView
========
CodeView is another format which comes into the picture. While MSF defines
the structure of the overall file, and PDB defines the set of streams that
appear within the MSF file and the format of those streams, CodeView defines
the format of **symbol and type records** that appear within specific streams.
Refer to the pages on :doc:`CodeViewSymbols` and :doc:`CodeViewTypes` for
more information about the CodeView format.