=====================================
The PDB DBI (Debug Info) Stream
=====================================

.. contents::
   :local:

.. _dbi_intro:

Introduction
============

The PDB DBI Stream (Index 3) is one of the largest and most important streams
in a PDB file.  It contains information about how the program was compiled
(e.g. compilation flags), the compilands (e.g. object files) that
were linked together to form the program, the source files which were used
to build the program, and references to other streams that contain more
detailed information about each compiland, such as the CodeView symbol records
contained within each compiland and the source and line information for
functions and other symbols within each compiland.

.. _dbi_header:

Stream Header
=============
At offset 0 of the DBI Stream is a header with the following layout:

.. code-block:: c++

  struct DbiStreamHeader {
    int32_t VersionSignature;
    uint32_t VersionHeader;
    uint32_t Age;
    uint16_t GlobalStreamIndex;
    uint16_t BuildNumber;
    uint16_t PublicStreamIndex;
    uint16_t PdbDllVersion;
    uint16_t SymRecordStream;
    uint16_t PdbDllRbld;
    int32_t ModInfoSize;
    int32_t SectionContributionSize;
    int32_t SectionMapSize;
    int32_t SourceInfoSize;
    int32_t TypeServerSize;
    uint32_t MFCTypeServerIndex;
    int32_t OptionalDbgHeaderSize;
    int32_t ECSubstreamSize;
    uint16_t Flags;
    uint16_t Machine;
    uint32_t Padding;
  };

- **VersionSignature** - Unknown meaning.  Appears to always be ``-1``.

- **VersionHeader** - A value from the following enum:

.. code-block:: c++

  enum class DbiStreamVersion : uint32_t {
    VC41 = 930803,
    V50 = 19960307,
    V60 = 19970606,
    V70 = 19990903,
    V110 = 20091201
  };

Similar to the :doc:`PDB Stream <PdbStream>`, this value always appears to be
``V70``, and it is not clear what the other values are for.

- **Age** - The number of times the PDB has been written.  Equal to the same
  field from the :ref:`PDB Stream header <pdb_stream_header>`.

- **GlobalStreamIndex** - The index of the :doc:`Global Symbol Stream <GlobalStream>`,
  which contains CodeView symbol records for all global symbols.  Actual records
  are stored in the symbol record stream, and are referenced from this stream.

- **BuildNumber** - A bitfield containing the major and minor version numbers
  of the toolchain (e.g. 12.0 for MSVC 2013) used to build the program, with
  the following layout:

.. code-block:: c++

  uint16_t MinorVersion : 8;
  uint16_t MajorVersion : 7;
  uint16_t NewVersionFormat : 1;

For the purposes of LLVM, we assume ``NewVersionFormat`` is always ``true``.
If it is ``false``, the layout above does not apply and the reader should consult
the `Microsoft Source Code <https://github.com/Microsoft/microsoft-pdb>`__ for
further guidance.
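
As a minimal sketch, ``BuildNumber`` can be decoded with explicit shifts and
masks instead of relying on compiler-specific bitfield ordering.  The sketch
assumes that the first-declared member (``MinorVersion``) occupies the least
significant bits; ``BuildVersion`` and ``decodeBuildNumber`` are hypothetical
names used only for illustration.

.. code-block:: c++

  #include <cstdint>
  #include <optional>

  // Hypothetical decoded form of DbiStreamHeader::BuildNumber.
  struct BuildVersion {
    uint8_t Major;
    uint8_t Minor;
  };

  // Assumes MinorVersion is bits 0-7, MajorVersion bits 8-14 and
  // NewVersionFormat bit 15.  Returns std::nullopt when NewVersionFormat
  // is clear, because the layout above does not apply in that case.
  std::optional<BuildVersion> decodeBuildNumber(uint16_t BuildNumber) {
    const bool NewVersionFormat = (BuildNumber & 0x8000) != 0;
    if (!NewVersionFormat)
      return std::nullopt;
    BuildVersion V;
    V.Minor = static_cast<uint8_t>(BuildNumber & 0xFF);
    V.Major = static_cast<uint8_t>((BuildNumber >> 8) & 0x7F);
    return V;
  }

Under this assumed layout, a ``BuildNumber`` of ``0x8C00`` decodes to version 12.0.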

- **PublicStreamIndex** - The index of the :doc:`Public Symbol Stream <PublicStream>`,
  which contains CodeView symbol records for all public symbols.  Actual records
  are stored in the symbol record stream, and are referenced from this stream.

- **PdbDllVersion** - The version number of ``mspdbXXXX.dll`` used to produce this
  PDB.  This does not apply to LLVM, which does not use ``mspdb.dll``.

- **SymRecordStream** - The index of the stream containing all CodeView symbol
  records used by the program.  This is used for deduplication, so that many
  different compilands can refer to the same symbols without having to include
  the full record content inside of each module stream.

- **PdbDllRbld** - Unknown.

- **MFCTypeServerIndex** - Despite sitting between the substream size fields,
  this is an index rather than a length.  It is believed to identify the MFC
  type server within the :ref:`dbi_type_server_substream`.

- **Flags** - A bitfield with the following layout, containing various
  information about how the program was built:

.. code-block:: c++

  uint16_t WasIncrementallyLinked : 1;
  uint16_t ArePrivateSymbolsStripped : 1;
  uint16_t HasConflictingTypes : 1;
  uint16_t Reserved : 13;

The only one of these that is not self-explanatory is ``HasConflictingTypes``.
Although undocumented, ``link.exe`` contains a hidden flag ``/DEBUG:CTYPES``.
This field is set if and only if that flag is passed to ``link.exe``.  It is
unclear what the flag does, although it seems to have subtle implications for
the algorithm used to look up type records.
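
The ``Flags`` field can be decoded the same way.  This sketch again assumes that
the first-declared member occupies bit 0; ``DbiFlags`` and ``decodeDbiFlags``
are hypothetical names.

.. code-block:: c++

  #include <cstdint>

  // Hypothetical decoded view of DbiStreamHeader::Flags.
  struct DbiFlags {
    bool WasIncrementallyLinked;
    bool ArePrivateSymbolsStripped;
    bool HasConflictingTypes;
  };

  DbiFlags decodeDbiFlags(uint16_t Flags) {
    DbiFlags F;
    F.WasIncrementallyLinked = (Flags & 0x1) != 0;     // Bit 0.
    F.ArePrivateSymbolsStripped = (Flags & 0x2) != 0;  // Bit 1.
    F.HasConflictingTypes = (Flags & 0x4) != 0;        // Bit 2.
    // Bits 3-15 are Reserved and ignored here.
    return F;
  }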

- **Machine** - The target machine type.  Common values are ``0x8664`` (x86-64)
  and ``0x14C`` (x86).  These are the PE/COFF ``IMAGE_FILE_MACHINE_AMD64`` and
  ``IMAGE_FILE_MACHINE_I386`` values, not values of the
  `CV_CPU_TYPE_e <https://msdn.microsoft.com/en-us/library/b2fc64ek.aspx>`__
  enumeration.

Immediately after the fixed-size DBI Stream header are ``7`` variable-length
`substreams`.  The following ``7`` fields of the DBI Stream header specify the
number of bytes of the corresponding substream.  Each substream's contents will
be described in detail :ref:`below <dbi_substreams>`.  The length of the entire
DBI Stream should equal ``64`` (the length of the header above) plus the sum
of the following ``7`` fields.  The fields are listed here in header order; in
the stream itself, the :ref:`dbi_ec_substream` precedes the
:ref:`dbi_optional_dbg_stream`.

- **ModInfoSize** - The length of the :ref:`dbi_mod_info_substream`.

- **SectionContributionSize** - The length of the :ref:`dbi_sec_contr_substream`.

- **SectionMapSize** - The length of the :ref:`dbi_section_map_substream`.

- **SourceInfoSize** - The length of the :ref:`dbi_file_info_substream`.

- **TypeServerSize** - The length of the :ref:`dbi_type_server_substream`.

- **OptionalDbgHeaderSize** - The length of the :ref:`dbi_optional_dbg_stream`.

- **ECSubstreamSize** - The length of the :ref:`dbi_ec_substream`.
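
Because every header field is naturally aligned, the header has no internal
padding and can be copied directly into a struct.  The sketch below does that
and then walks the substream sizes to find where each substream begins,
checking the total-length rule.  It assumes a little-endian host (so that
``memcpy`` yields correct field values), treats negative sizes as corruption,
and uses the in-stream order described in the sections below.
``SubstreamOffsets`` and ``computeOffsets`` are hypothetical names.

.. code-block:: c++

  #include <cstdint>
  #include <cstring>
  #include <optional>
  #include <vector>

  // Same field order as DbiStreamHeader above; no padding is inserted.
  struct DbiStreamHeader {
    int32_t VersionSignature;
    uint32_t VersionHeader;
    uint32_t Age;
    uint16_t GlobalStreamIndex;
    uint16_t BuildNumber;
    uint16_t PublicStreamIndex;
    uint16_t PdbDllVersion;
    uint16_t SymRecordStream;
    uint16_t PdbDllRbld;
    int32_t ModInfoSize;
    int32_t SectionContributionSize;
    int32_t SectionMapSize;
    int32_t SourceInfoSize;
    int32_t TypeServerSize;
    uint32_t MFCTypeServerIndex;
    int32_t OptionalDbgHeaderSize;
    int32_t ECSubstreamSize;
    uint16_t Flags;
    uint16_t Machine;
    uint32_t Padding;
  };
  static_assert(sizeof(DbiStreamHeader) == 64, "DBI header must be 64 bytes");

  // Hypothetical: offset (from the start of the DBI Stream) at which each
  // substream begins, plus the offset just past the last one.
  struct SubstreamOffsets {
    uint64_t ModInfo, SectionContr, SectionMap, FileInfo, TypeServer, EC,
        OptionalDbgHeader, End;
  };

  // memcpy is only correct here on a little-endian host.
  std::optional<SubstreamOffsets>
  computeOffsets(const std::vector<uint8_t> &Stream) {
    if (Stream.size() < sizeof(DbiStreamHeader))
      return std::nullopt;
    DbiStreamHeader H;
    std::memcpy(&H, Stream.data(), sizeof(H));
    // In-stream order: EC comes before the optional debug header, even
    // though their size fields appear in the opposite order in the header.
    const int32_t Sizes[7] = {H.ModInfoSize,    H.SectionContributionSize,
                              H.SectionMapSize, H.SourceInfoSize,
                              H.TypeServerSize, H.ECSubstreamSize,
                              H.OptionalDbgHeaderSize};
    uint64_t Starts[7];
    uint64_t Offset = sizeof(DbiStreamHeader);
    for (int I = 0; I < 7; ++I) {
      if (Sizes[I] < 0)
        return std::nullopt;  // Treat negative sizes as corruption.
      Starts[I] = Offset;
      Offset += static_cast<uint64_t>(Sizes[I]);
    }
    // The stream should be exactly 64 bytes plus the sum of the sizes.
    if (Offset != Stream.size())
      return std::nullopt;
    return SubstreamOffsets{Starts[0], Starts[1], Starts[2], Starts[3],
                            Starts[4], Starts[5], Starts[6], Offset};
  }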

.. _dbi_substreams:

Substreams
==========

.. _dbi_mod_info_substream:

Module Info Substream
^^^^^^^^^^^^^^^^^^^^^

Begins immediately after the :ref:`header <dbi_header>`, at offset ``64`` of the
DBI Stream.  The module info substream is an array of variable-length records,
each one describing a single module (e.g. object file) linked into the program.
Each record embeds a ``SectionContribEntry``, which has the following layout:

.. code-block:: c++

  struct SectionContribEntry {
    uint16_t Section;
    char Padding1[2];
    int32_t Offset;
    int32_t Size;
    uint32_t Characteristics;
    uint16_t ModuleIndex;
    char Padding2[2];
    uint32_t DataCrc;
    uint32_t RelocCrc;
  };

While most of these fields are self-explanatory, the ``Characteristics`` field
warrants some elaboration.  It corresponds to the ``Characteristics``
field of the `IMAGE_SECTION_HEADER <https://msdn.microsoft.com/en-us/library/windows/desktop/ms680341(v=vs.85).aspx>`__
structure.

Each record in the module info substream then has the following format:

.. code-block:: c++

  struct ModInfo {
    uint32_t Unused1;
    SectionContribEntry SectionContr;
    uint16_t Flags;
    uint16_t ModuleSymStream;
    uint32_t SymByteSize;
    uint32_t C11ByteSize;
    uint32_t C13ByteSize;
    uint16_t SourceFileCount;
    char Padding[2];
    uint32_t Unused2;
    uint32_t SourceFileNameIndex;
    uint32_t PdbFilePathNameIndex;
    char ModuleName[];   // Null-terminated, variable length.
    char ObjFileName[];  // Null-terminated, variable length.
  };

- **SectionContr** - Describes the properties of the section in the final binary
  which contains the code and data from this module.

- **Flags** - A bitfield with the following format:

.. code-block:: c++

  uint16_t Dirty : 1;  // true if this ModInfo has been written since reading the PDB.
  uint16_t EC : 1;     // true if EC information is present for this module. It is unknown what EC actually is.
  uint16_t Unused : 6;
  uint16_t TSM : 8;    // Type Server Index for this module.  It is unknown what this is used for, but it is not used by LLVM.

- **ModuleSymStream** - The index of the stream that contains symbol information
  for this module.  This includes CodeView symbol information as well as source
  and line information.

- **SymByteSize** - The number of bytes of data from the stream identified by
  ``ModuleSymStream`` that represent CodeView symbol records.

- **C11ByteSize** - The number of bytes of data from the stream identified by
  ``ModuleSymStream`` that represent C11-style CodeView line information.

- **C13ByteSize** - The number of bytes of data from the stream identified by
  ``ModuleSymStream`` that represent C13-style CodeView line information.  At
  most one of ``C11ByteSize`` and ``C13ByteSize`` will be non-zero.

- **SourceFileCount** - The number of source files that contributed to this
  module during compilation.

- **SourceFileNameIndex** - The offset in the names buffer of the primary
  translation unit used to build this module.  In all PDB files observed to
  date, this value is 0.

- **PdbFilePathNameIndex** - The offset in the names buffer of the PDB file
  containing this module's symbol information.  This has only been observed
  to be non-zero for the special ``* Linker *`` module.

- **ModuleName** - The module name.  This is usually either a full path to an
  object file (either directly passed to ``link.exe`` or from an archive) or
  a string of the form ``Import:<dll name>``.

- **ObjFileName** - The object file name.  In the case of a module that is
  passed directly to ``link.exe``, this is the same as **ModuleName**.
  In the case of a module that comes from an archive, this is usually the full
  path to the archive.
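
A sketch of splitting the module info substream into records follows.  It reads
a subset of the fixed-size fields at the byte offsets implied by the ``ModInfo``
layout above (``Flags`` at ``32``, ``ModuleSymStream`` at ``34``, and so on,
with ``64`` bytes preceding ``ModuleName``), then the two null-terminated names.
It also assumes, beyond what is described above, that each record is padded to a
``4``-byte boundary.  ``ModuleRecord`` and ``parseModInfo`` are hypothetical names.

.. code-block:: c++

  #include <cstddef>
  #include <cstdint>
  #include <optional>
  #include <string>
  #include <vector>

  // Hypothetical subset of the ModInfo fields a reader might keep.
  struct ModuleRecord {
    uint16_t Flags = 0;
    uint16_t ModuleSymStream = 0;
    uint32_t SymByteSize = 0;
    uint32_t C11ByteSize = 0;
    uint32_t C13ByteSize = 0;
    uint16_t SourceFileCount = 0;
    std::string ModuleName;
    std::string ObjFileName;
  };

  // Little-endian readers.
  static uint16_t readU16(const uint8_t *P) {
    return static_cast<uint16_t>(P[0] | (P[1] << 8));
  }
  static uint32_t readU32(const uint8_t *P) {
    return uint32_t(P[0]) | (uint32_t(P[1]) << 8) | (uint32_t(P[2]) << 16) |
           (uint32_t(P[3]) << 24);
  }

  // Reads a null-terminated string at Pos and advances Pos past the terminator.
  static std::optional<std::string> readCString(const std::vector<uint8_t> &B,
                                                size_t &Pos) {
    size_t End = Pos;
    while (End < B.size() && B[End] != 0)
      ++End;
    if (End >= B.size())
      return std::nullopt;  // Missing terminator.
    std::string S(reinterpret_cast<const char *>(B.data()) + Pos, End - Pos);
    Pos = End + 1;
    return S;
  }

  // ASSUMPTION: every record is padded to a 4-byte boundary.
  std::optional<std::vector<ModuleRecord>>
  parseModInfo(const std::vector<uint8_t> &B) {
    constexpr size_t FixedSize = 64;  // Bytes preceding ModuleName.
    std::vector<ModuleRecord> Mods;
    size_t Pos = 0;
    while (Pos < B.size()) {
      if (B.size() - Pos < FixedSize)
        return std::nullopt;
      const uint8_t *P = B.data() + Pos;
      ModuleRecord M;
      M.Flags = readU16(P + 32);
      M.ModuleSymStream = readU16(P + 34);
      M.SymByteSize = readU32(P + 36);
      M.C11ByteSize = readU32(P + 40);
      M.C13ByteSize = readU32(P + 44);
      M.SourceFileCount = readU16(P + 48);
      Pos += FixedSize;
      std::optional<std::string> Name = readCString(B, Pos);
      if (!Name)
        return std::nullopt;
      std::optional<std::string> Obj = readCString(B, Pos);
      if (!Obj)
        return std::nullopt;
      M.ModuleName = *Name;
      M.ObjFileName = *Obj;
      Pos = (Pos + 3) & ~size_t(3);  // Skip padding to a 4-byte boundary.
      Mods.push_back(M);
    }
    return Mods;
  }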

.. _dbi_sec_contr_substream:

Section Contribution Substream
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Begins immediately after the :ref:`dbi_mod_info_substream` ends,
and consumes ``Header->SectionContributionSize`` bytes.  This substream begins
with a single ``uint32_t`` which will be one of the following values:

.. code-block:: c++

  enum class SectionContrSubstreamVersion : uint32_t {
    Ver60 = 0xeffe0000 + 19970605,
    V2 = 0xeffe0000 + 20140516
  };

``Ver60`` is the only value which has been observed in a PDB so far.  Following
this ``4``-byte field is an array of fixed-length structures.  If the version
is ``Ver60``, it is an array of ``SectionContribEntry`` structures.  If the
version is ``V2``, it is an array of ``SectionContribEntry2`` structures,
defined as follows:

.. code-block:: c++

  struct SectionContribEntry2 {
    SectionContribEntry SC;
    uint32_t ISectCoff;
  };

The purpose of the second field is not well understood.
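
The sketch below reads the version word and then the array of entries, choosing
the entry size from the version (``28`` bytes for ``SectionContribEntry``, ``32``
for ``SectionContribEntry2``).  It also shows one use of ``Characteristics``:
testing the standard PE/COFF ``IMAGE_SCN_CNT_CODE`` and ``IMAGE_SCN_MEM_EXECUTE``
section flags.  ``SectionContrib``, ``parseSectionContribs`` and ``isCode`` are
hypothetical names.

.. code-block:: c++

  #include <cstddef>
  #include <cstdint>
  #include <optional>
  #include <vector>

  // Values of SectionContrSubstreamVersion.
  constexpr uint32_t kVer60 = 0xeffe0000u + 19970605u;
  constexpr uint32_t kV2 = 0xeffe0000u + 20140516u;

  // Standard IMAGE_SECTION_HEADER Characteristics flags.
  constexpr uint32_t kScnCntCode = 0x00000020;     // IMAGE_SCN_CNT_CODE
  constexpr uint32_t kScnMemExecute = 0x20000000;  // IMAGE_SCN_MEM_EXECUTE

  // Hypothetical subset of SectionContribEntry.
  struct SectionContrib {
    uint16_t Section;
    int32_t Offset;
    int32_t Size;
    uint32_t Characteristics;
    uint16_t ModuleIndex;
  };

  static uint16_t readU16(const uint8_t *P) {
    return static_cast<uint16_t>(P[0] | (P[1] << 8));
  }
  static uint32_t readU32(const uint8_t *P) {
    return uint32_t(P[0]) | (uint32_t(P[1]) << 8) | (uint32_t(P[2]) << 16) |
           (uint32_t(P[3]) << 24);
  }

  std::optional<std::vector<SectionContrib>>
  parseSectionContribs(const std::vector<uint8_t> &B) {
    if (B.size() < 4)
      return std::nullopt;
    const uint32_t Version = readU32(B.data());
    size_t EntrySize;
    if (Version == kVer60)
      EntrySize = 28;  // sizeof(SectionContribEntry)
    else if (Version == kV2)
      EntrySize = 32;  // SectionContribEntry plus uint32_t ISectCoff
    else
      return std::nullopt;
    if ((B.size() - 4) % EntrySize != 0)
      return std::nullopt;
    std::vector<SectionContrib> Out;
    for (size_t Pos = 4; Pos < B.size(); Pos += EntrySize) {
      const uint8_t *P = B.data() + Pos;
      SectionContrib C;
      C.Section = readU16(P);  // Padding1 at +2 is skipped.
      C.Offset = static_cast<int32_t>(readU32(P + 4));
      C.Size = static_cast<int32_t>(readU32(P + 8));
      C.Characteristics = readU32(P + 12);
      C.ModuleIndex = readU16(P + 16);  // DataCrc and RelocCrc are not read.
      Out.push_back(C);
    }
    return Out;
  }

  // True if the section is marked as containing code or as executable.
  bool isCode(const SectionContrib &C) {
    return (C.Characteristics & (kScnCntCode | kScnMemExecute)) != 0;
  }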

.. _dbi_section_map_substream:

Section Map Substream
^^^^^^^^^^^^^^^^^^^^^
Begins immediately after the :ref:`dbi_sec_contr_substream` ends,
and consumes ``Header->SectionMapSize`` bytes.  This substream begins with a ``4``-byte
header followed by an array of fixed-length records.  The header and records
have the following layout:

.. code-block:: c++

  struct SectionMapHeader {
    uint16_t Count;    // Number of segment descriptors
    uint16_t LogCount; // Number of logical segment descriptors
  };

  struct SectionMapEntry {
    uint16_t Flags;         // See the SectionMapEntryFlags enum below.
    uint16_t Ovl;           // Logical overlay number
    uint16_t Group;         // Group index into descriptor array.
    uint16_t Frame;
    uint16_t SectionName;   // Byte index of segment / group name in string table, or 0xFFFF.
    uint16_t ClassName;     // Byte index of class in string table, or 0xFFFF.
    uint32_t Offset;        // Byte offset of the logical segment within physical segment.  If group is set in flags, this is the offset of the group.
    uint32_t SectionLength; // Byte count of the segment or group.
  };

  enum class SectionMapEntryFlags : uint16_t {
    Read = 1 << 0,              // Segment is readable.
    Write = 1 << 1,             // Segment is writable.
    Execute = 1 << 2,           // Segment is executable.
    AddressIs32Bit = 1 << 3,    // Descriptor describes a 32-bit linear address.
    IsSelector = 1 << 8,        // Frame represents a selector.
    IsAbsoluteAddress = 1 << 9, // Frame represents an absolute address.
    IsGroup = 1 << 10           // If set, descriptor represents a group.
  };

Many of these fields are not well understood, and so will not be discussed further.
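
A minimal reader for this substream might look like the following.  Each
``SectionMapEntry`` occupies ``20`` bytes.  The sketch assumes that exactly
``Count`` entries follow the header and that ``LogCount`` does not affect the
layout; both are assumptions rather than documented facts.  ``parseSectionMap``
is a hypothetical name.

.. code-block:: c++

  #include <cstddef>
  #include <cstdint>
  #include <optional>
  #include <vector>

  // Mirrors SectionMapEntry above (20 bytes on disk).
  struct SectionMapEntry {
    uint16_t Flags, Ovl, Group, Frame, SectionName, ClassName;
    uint32_t Offset, SectionLength;
  };

  // A few SectionMapEntryFlags bits.
  constexpr uint16_t kRead = 1 << 0;
  constexpr uint16_t kWrite = 1 << 1;
  constexpr uint16_t kExecute = 1 << 2;

  static uint16_t readU16(const uint8_t *P) {
    return static_cast<uint16_t>(P[0] | (P[1] << 8));
  }
  static uint32_t readU32(const uint8_t *P) {
    return uint32_t(P[0]) | (uint32_t(P[1]) << 8) | (uint32_t(P[2]) << 16) |
           (uint32_t(P[3]) << 24);
  }

  // ASSUMPTION: exactly Count entries follow the 4-byte header.
  std::optional<std::vector<SectionMapEntry>>
  parseSectionMap(const std::vector<uint8_t> &B) {
    constexpr size_t HeaderSize = 4, EntrySize = 20;
    if (B.size() < HeaderSize)
      return std::nullopt;
    const size_t Count = readU16(B.data());  // LogCount at +2 is not used.
    if (B.size() < HeaderSize + Count * EntrySize)
      return std::nullopt;
    std::vector<SectionMapEntry> Out;
    for (size_t I = 0; I < Count; ++I) {
      const uint8_t *P = B.data() + HeaderSize + I * EntrySize;
      SectionMapEntry E;
      E.Flags = readU16(P);
      E.Ovl = readU16(P + 2);
      E.Group = readU16(P + 4);
      E.Frame = readU16(P + 6);
      E.SectionName = readU16(P + 8);
      E.ClassName = readU16(P + 10);
      E.Offset = readU32(P + 12);
      E.SectionLength = readU32(P + 16);
      Out.push_back(E);
    }
    return Out;
  }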

.. _dbi_file_info_substream:

File Info Substream
^^^^^^^^^^^^^^^^^^^
Begins immediately after the :ref:`dbi_section_map_substream` ends,
and consumes ``Header->SourceInfoSize`` bytes.  This substream defines the mapping
from each module to the source files that contribute to that module.  Since multiple
modules can use the same source file (for example, a header file), this substream
uses a string table to store each unique file name only once, and has each
module use offsets into the string table rather than embedding the string's value
directly.  The format of this substream is as follows:

.. code-block:: c++

  struct FileInfoSubstream {
    uint16_t NumModules;
    uint16_t NumSourceFiles;

    uint16_t ModIndices[NumModules];
    uint16_t ModFileCounts[NumModules];
    uint32_t FileNameOffsets[NumSourceFiles];
    char NamesBuffer[];  // Null-terminated strings referenced by FileNameOffsets.
  };

**NumModules** - The number of modules for which source file information is
contained within this substream.  Should match the number of records in the
:ref:`dbi_mod_info_substream`.

**NumSourceFiles** - In theory this is supposed to contain the number of source
files for which this substream contains information.  But because this field is
only ``16`` bits wide, that would prevent a program from having more than 64K
source files.  In early versions of the file format, this limit seems to have
applied.  To support more source files, this field is simply ignored, and the
count is instead computed by summing the values of the ``ModFileCounts`` array
(discussed below).  In short, this value should be ignored.

**ModIndices** - This array is present, but does not appear to be useful.

**ModFileCounts** - An array of ``NumModules`` integers, each one containing
the number of source files which contribute to the module at the specified index.
While each individual module is limited to 64K contributing source files, the
union of all modules' source files may be greater than 64K.  The real number of
source files is thus computed by summing this array.  Note that summing this array
does not give the number of `unique` source files, only the total number of source
file contributions to modules.

**FileNameOffsets** - An array of **NumSourceFiles** integers (where **NumSourceFiles**
here refers to the 32-bit value obtained from summing **ModFileCounts**), where
each integer is an offset into **NamesBuffer** pointing to a null terminated string.

**NamesBuffer** - An array of null terminated strings containing the actual source
file names.
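
Putting these rules together, the sketch below maps each module to its source
file names.  It ignores ``NumSourceFiles`` and ``ModIndices``, sums
``ModFileCounts`` to size ``FileNameOffsets``, and assumes that the entries for
module ``I`` start at the sum of the counts of modules ``0`` through ``I - 1``
(a running prefix sum).  ``parseFileInfo`` is a hypothetical name.

.. code-block:: c++

  #include <cstddef>
  #include <cstdint>
  #include <optional>
  #include <string>
  #include <vector>

  static uint16_t readU16(const uint8_t *P) {
    return static_cast<uint16_t>(P[0] | (P[1] << 8));
  }
  static uint32_t readU32(const uint8_t *P) {
    return uint32_t(P[0]) | (uint32_t(P[1]) << 8) | (uint32_t(P[2]) << 16) |
           (uint32_t(P[3]) << 24);
  }

  // Returns, for each module, the names of the source files contributing to it.
  std::optional<std::vector<std::vector<std::string>>>
  parseFileInfo(const std::vector<uint8_t> &B) {
    if (B.size() < 4)
      return std::nullopt;
    const size_t NumModules = readU16(B.data());
    // Skip NumModules, NumSourceFiles (ignored) and ModIndices (unused).
    size_t Pos = 4 + 2 * NumModules;
    if (B.size() < Pos + 2 * NumModules)
      return std::nullopt;
    // The real file count is the sum of ModFileCounts.
    std::vector<size_t> Counts(NumModules);
    size_t TotalFiles = 0;
    for (size_t M = 0; M < NumModules; ++M) {
      Counts[M] = readU16(B.data() + Pos + 2 * M);
      TotalFiles += Counts[M];
    }
    Pos += 2 * NumModules;
    const size_t OffsetsPos = Pos;
    const size_t NamesPos = OffsetsPos + 4 * TotalFiles;
    if (B.size() < NamesPos)
      return std::nullopt;
    std::vector<std::vector<std::string>> Result(NumModules);
    size_t FileIndex = 0;  // Running prefix sum of Counts.
    for (size_t M = 0; M < NumModules; ++M) {
      for (size_t F = 0; F < Counts[M]; ++F, ++FileIndex) {
        const size_t Start =
            NamesPos + readU32(B.data() + OffsetsPos + 4 * FileIndex);
        size_t End = Start;
        while (End < B.size() && B[End] != 0)
          ++End;
        if (End >= B.size())
          return std::nullopt;  // Offset out of range or missing terminator.
        Result[M].emplace_back(reinterpret_cast<const char *>(B.data()) + Start,
                               End - Start);
      }
    }
    return Result;
  }

Because file names are stored once and referenced by offset, two modules that
include the same header will both contain an entry for it, even though its name
appears only once in **NamesBuffer**.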

.. _dbi_type_server_substream:

Type Server Substream
^^^^^^^^^^^^^^^^^^^^^
Begins immediately after the :ref:`dbi_file_info_substream` ends,
and consumes ``Header->TypeServerSize`` bytes.  Neither the purpose nor the layout
of this substream is understood, although it is assumed to be related somehow to the
usage of ``/Zi`` and ``mspdbsrv.exe``.  This substream will not be discussed further.

.. _dbi_ec_substream:

EC Substream
^^^^^^^^^^^^
Begins immediately after the :ref:`dbi_type_server_substream` ends,
and consumes ``Header->ECSubstreamSize`` bytes.  Neither the purpose nor the layout
of this substream is understood, and it will not be discussed further.

.. _dbi_optional_dbg_stream:

Optional Debug Header Stream
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Begins immediately after the :ref:`dbi_ec_substream` ends, and
consumes ``Header->OptionalDbgHeaderSize`` bytes.  This substream is an array
(referred to below as ``DbgStreamArray``) of stream indices (i.e. ``uint16_t``
values), each of which identifies a stream in the larger MSF file which contains
some additional debug information.  Each position of this array has a special
meaning, allowing one to determine what kind of debug information is at the
referenced stream.  ``11`` indices are currently understood, although it's
possible there may be more.  The layout of each stream generally corresponds
exactly to a particular type of debug data directory from the PE/COFF file.  The
format of these fields can be found in the `Microsoft PE/COFF Specification <https://www.microsoft.com/en-us/download/details.aspx?id=19509>`__.

**FPO Data** - ``DbgStreamArray[0]``.  The data in the referenced stream is a
debug data directory of type ``IMAGE_DEBUG_TYPE_FPO``.

**Exception Data** - ``DbgStreamArray[1]``.  The data in the referenced stream
is a debug data directory of type ``IMAGE_DEBUG_TYPE_EXCEPTION``.

**Fixup Data** - ``DbgStreamArray[2]``.  The data in the referenced stream is a
debug data directory of type ``IMAGE_DEBUG_TYPE_FIXUP``.

**Omap To Src Data** - ``DbgStreamArray[3]``.  The data in the referenced stream
is a debug data directory of type ``IMAGE_DEBUG_TYPE_OMAP_TO_SRC``.  This
is used for mapping addresses between instrumented and uninstrumented code.

**Omap From Src Data** - ``DbgStreamArray[4]``.  The data in the referenced stream
is a debug data directory of type ``IMAGE_DEBUG_TYPE_OMAP_FROM_SRC``.  This
is used for mapping addresses between instrumented and uninstrumented code.

**Section Header Data** - ``DbgStreamArray[5]``.  A dump of all section headers from
the original executable.

**Token / RID Map** - ``DbgStreamArray[6]``.  The layout of this stream is not
understood, but it is assumed to be a mapping from ``CLR Token`` to
``CLR Record ID``.  Refer to `ECMA 335 <http://www.ecma-international.org/publications/standards/Ecma-335.htm>`__
for more information.

**Xdata** - ``DbgStreamArray[7]``.  A copy of the ``.xdata`` section from the
executable.

**Pdata** - ``DbgStreamArray[8]``.  This is assumed to be a copy of the ``.pdata``
section from the executable, but that would make it identical to
``DbgStreamArray[1]``.  The difference between these two indices is not well
understood.

**New FPO Data** - ``DbgStreamArray[9]``.  The data in the referenced stream is a
debug data directory of type ``IMAGE_DEBUG_TYPE_FPO``.  It is not clear how this
differs from ``DbgStreamArray[0]``, but in practice all observed PDB files have
used the "new" format rather than the "old" format.

**Original Section Header Data** - ``DbgStreamArray[10]``.  Assumed to be similar
to ``DbgStreamArray[5]``, but has not been observed in practice.
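
As a sketch, this substream can be read as an array of ``uint16_t`` values and
indexed by the positions listed above.  The ``DbgHeaderType`` enumerator names,
``parseDbgStreamArray`` and ``getDbgStream`` are hypothetical, and treating
``0xFFFF`` as an unused slot is an assumption that is not described above.

.. code-block:: c++

  #include <cstddef>
  #include <cstdint>
  #include <optional>
  #include <vector>

  // Positions in the array, as listed above (names are hypothetical).
  enum class DbgHeaderType : size_t {
    FPO = 0, Exception = 1, Fixup = 2, OmapToSrc = 3, OmapFromSrc = 4,
    SectionHdr = 5, TokenRidMap = 6, Xdata = 7, Pdata = 8, NewFPO = 9,
    SectionHdrOrig = 10,
  };

  // ASSUMPTION: 0xFFFF marks a slot with no associated stream.
  constexpr uint16_t kInvalidStreamIndex = 0xFFFF;

  // Reads OptionalDbgHeaderSize bytes as little-endian uint16_t stream indices.
  std::optional<std::vector<uint16_t>>
  parseDbgStreamArray(const std::vector<uint8_t> &B) {
    if (B.size() % 2 != 0)
      return std::nullopt;
    std::vector<uint16_t> Out(B.size() / 2);
    for (size_t I = 0; I < Out.size(); ++I)
      Out[I] = static_cast<uint16_t>(B[2 * I] | (B[2 * I + 1] << 8));
    return Out;
  }

  // Returns the stream index for a slot, or nullopt if the array is too
  // short or the slot is unused.
  std::optional<uint16_t> getDbgStream(const std::vector<uint16_t> &Arr,
                                       DbgHeaderType T) {
    const size_t I = static_cast<size_t>(T);
    if (I >= Arr.size() || Arr[I] == kInvalidStreamIndex)
      return std::nullopt;
    return Arr[I];
  }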