The Portable PDB (Program Database) format describes an encoding of debugging information produced by compilers of Common Language Infrastructure (CLI) languages and consumed by debuggers and other tools. The format is based on the ECMA-335 Partition II metadata standard. It extends the schema of that standard while using the same physical table and stream layouts and encodings. The schema of the debugging metadata is complementary to the ECMA-335 metadata schema; the debugging metadata can therefore (but doesn’t need to) be stored in the same metadata section of the PE/COFF file as the type system metadata.
The physical layout of the data is described in ECMA-335-II Chapter 24, and the Portable PDB debugging metadata format introduces no changes to the fundamental structure.
The ECMA-335-II standard is amended by an addition of the following tables to the “#~” metadata stream:
- Document
- MethodDebugInformation
- LocalScope
- LocalVariable
- LocalConstant
- ImportScope
- StateMachineMethod
- CustomDebugInformation
Debugging metadata tables may be embedded into type system metadata (and part of a PE file), or they may be stored separately in a metadata blob contained in a .pdb file. In the latter case additional information is included that connects the debugging metadata to the type system metadata.
When debugging metadata is generated to a separate data blob, the "#Pdb" and "#~" streams shall be present. The standalone debugging metadata may also include #Guid, #String and #Blob heaps, which have the same physical layout but are distinct from the corresponding streams of the type system metadata.
The #Pdb stream has the following structure:
Offset | Size | Field | Description |
---|---|---|---|
0 | 20 | PDB id | A byte sequence uniquely representing the debugging metadata blob content. |
20 | 4 | EntryPoint | Entry point MethodDef token, or 0 if not applicable. The same value as stored in CLI header of the PE file. See ECMA-335-II 15.4.1.2. |
24 | 8 | ReferencedTypeSystemTables | Bit vector of referenced type system metadata tables, let n be the number of bits that are 1. |
32 | 4*n | TypeSystemTableRows | Array of n 4-byte unsigned integers indicating the number of rows for each referenced type system metadata table. |
"#~" stream shall only contain debugging information tables defined above.
References to heaps (strings, blobs, guids) are references to heaps of the debugging metadata. The sizes of references to type system tables are determined using the algorithm described in ECMA-335-II Chapter 24.2.6, except their respective row counts are found in TypeSystemTableRows field of the #Pdb stream.
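The #Pdb stream layout described above can be read with a few fixed-offset unpack calls. The following is a minimal sketch, not a reference implementation. It assumes little-endian integers (as used elsewhere in ECMA-335 metadata) and assumes that the row counts appear in ascending order of the set bits, mirroring the Rows array of the "#~" stream. The function name `parse_pdb_stream` is hypothetical.

```python
import struct


def parse_pdb_stream(data):
    """Parse the #Pdb stream into its four fields (sketch, little-endian assumed)."""
    pdb_id = data[0:20]
    # EntryPoint (4 bytes) followed by ReferencedTypeSystemTables (8 bytes).
    entry_point, referenced = struct.unpack_from("<IQ", data, 20)
    # Assumption: one row count per set bit, in ascending table-index order.
    table_indices = [i for i in range(64) if (referenced >> i) & 1]
    counts = struct.unpack_from("<%dI" % len(table_indices), data, 32)
    return {
        "pdb_id": pdb_id,
        "entry_point": entry_point,
        "row_counts": dict(zip(table_indices, counts)),
    }
```

The resulting `row_counts` mapping supplies the row counts a reader needs when sizing references to type system tables.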
The Document table has the following columns:
- Name (Blob heap index of document name blob)
- HashAlgorithm (Guid heap index)
- Hash (Blob heap index)
- Language (Guid heap index)
The table is not required to be sorted.
There shall be no duplicate rows in the Document table, based upon document name.
Name shall not be nil. It can however encode an empty name string.
Hash is the file content hashed using the specified HashAlgorithm. It is used to validate that a source file matches the one used by the compiler when compiling the source code.
The values for which the Language field has a defined meaning are listed in the following table along with the corresponding interpretation:
Language field value | language |
---|---|
3f5162f8-07c6-11d3-9053-00c04fa302a1 | Visual C# |
3a12d0b8-c26c-11d0-b442-00a0244a1dd2 | Visual Basic |
ab4f38c9-b6e6-43ba-be3b-58080b2ccce3 | Visual F# |
The values for which HashAlgorithm has defined meaning are listed in the following table along with the corresponding semantics of the Hash value.
HashAlgorithm field value | hash field semantics |
---|---|
ff1816ec-aa5e-4d10-87f7-6f4963833460 | SHA-1 hash |
8829d00f-11b8-4213-878b-770e8597ac16 | SHA-256 hash |
Otherwise, the meaning of Language, HashAlgorithm and Hash values is undefined and the reader can interpret them arbitrarily.
Document name blob is a sequence:
Blob ::= separator part+
where
- separator is a UTF8 encoded character, or byte 0 to represent an empty separator.
- part is a compressed integer into the #Blob heap, where the part is stored in UTF8 encoding (0 represents an empty string).
The document name is a concatenation of the parts separated by the separator.
Note: Document names are usually normalized full paths, e.g. "C:\Source\file.cs" or "/home/user/source/file.cs". The representation is optimized for efficient deserialization of the name into a UTF8 encoded string while minimizing the overall storage space for document names.
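The document name encoding above can be illustrated with a short decoder. This is a sketch under stated assumptions: `get_blob` is a hypothetical callback that returns the bytes of a #Blob heap entry for a given index, and the separator length is derived from the UTF8 lead byte. Compressed integers follow the unsigned encoding of ECMA-335 §II.23.2.

```python
def read_compressed_uint(data, pos):
    """Decode an ECMA-335 compressed unsigned integer; return (value, next_pos)."""
    b0 = data[pos]
    if (b0 & 0x80) == 0:
        return b0, pos + 1
    if (b0 & 0xC0) == 0x80:
        return ((b0 & 0x3F) << 8) | data[pos + 1], pos + 2
    if (b0 & 0xE0) == 0xC0:
        return int.from_bytes(data[pos:pos + 4], "big") & 0x1FFFFFFF, pos + 4
    raise ValueError("invalid compressed integer")


def decode_document_name(name_blob, get_blob):
    """Rebuild a document name; get_blob(index) -> bytes is a hypothetical heap accessor."""
    lead = name_blob[0]
    # Length of the UTF8 separator character, taken from its lead byte.
    if lead < 0x80:
        sep_len = 1
    elif (lead >> 5) == 0b110:
        sep_len = 2
    elif (lead >> 4) == 0b1110:
        sep_len = 3
    else:
        sep_len = 4
    # Byte 0 stands for an empty separator.
    separator = "" if lead == 0 else name_blob[:sep_len].decode("utf-8")
    parts, pos = [], sep_len
    while pos < len(name_blob):
        index, pos = read_compressed_uint(name_blob, pos)
        # Index 0 represents an empty part.
        parts.append("" if index == 0 else get_blob(index).decode("utf-8"))
    return separator.join(parts)
```

Because directory names repeat across documents, each distinct part is stored once in the heap and shared by index.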
The MethodDebugInformation table is either empty (missing) or has exactly as many rows as the MethodDef table, and has the following columns:
- Document (The row id of the single document containing all sequence points of the method, or 0 if the method doesn't have sequence points or spans multiple documents)
- SequencePoints (Blob heap index, 0 if the method doesn’t have sequence points, encoding: sequence points blob)
The table is a logical extension of MethodDef table (adding a column to the table) and as such can be indexed by MethodDef row id.
Sequence point is a quintuple of integers and a document reference:
- IL Offset
- Start Line
- Start Column
- End Line
- End Column
- Document
Hidden sequence point is a sequence point whose Start Line = End Line = 0xfeefee and Start Column = End Column = 0.
The values of a non-hidden sequence point must satisfy the following constraints:
- IL Offset is within range [0, 0x20000000)
- IL Offset of a sequence point is less than IL Offset of the subsequent sequence point.
- Start Line is within range [0, 0x20000000) and not equal to 0xfeefee.
- End Line is within range [0, 0x20000000) and not equal to 0xfeefee.
- Start Column is within range [0, 0x10000)
- End Column is within range [0, 0x10000)
- End Line is greater than or equal to Start Line.
- If Start Line is equal to End Line then End Column is greater than Start Column.
Sequence points blob has the following structure:
Blob ::= header SequencePointRecord (SequencePointRecord | document-record)*
SequencePointRecord ::= sequence-point-record | hidden-sequence-point-record
header
component | value stored | integer representation |
---|---|---|
LocalSignature | StandAloneSig table row id | unsigned compressed |
InitialDocument (opt) | Document row id | unsigned compressed |
LocalSignature stores the row id of the local signature of the method. This information is somewhat redundant since it can be retrieved from the IL stream. However, in some scenarios the IL stream is not available, or loading it would unnecessarily page in memory that might not otherwise be needed.
InitialDocument is only present if the Document field of the MethodDebugInformation table is nil (i.e. the method body spans multiple documents).
sequence-point-record
component | value stored | integer representation |
---|---|---|
δILOffset | ILOffset if this is the first sequence point | unsigned compressed |
δILOffset | ILOffset - Previous.ILOffset otherwise | unsigned compressed, non-zero |
ΔLines | EndLine - StartLine | unsigned compressed |
ΔColumns | EndColumn - StartColumn | ΔLines = 0: unsigned compressed, non-zero |
ΔColumns | EndColumn - StartColumn | ΔLines > 0: signed compressed |
δStartLine | StartLine if this is the first non-hidden sequence point | unsigned compressed |
δStartLine | StartLine - PreviousNonHidden.StartLine otherwise | signed compressed |
δStartColumn | StartColumn if this is the first non-hidden sequence point | unsigned compressed |
δStartColumn | StartColumn - PreviousNonHidden.StartColumn otherwise | signed compressed |
hidden-sequence-point-record
component | value stored | integer representation |
---|---|---|
δILOffset | ILOffset if this is the first sequence point | unsigned compressed |
δILOffset | ILOffset - Previous.ILOffset otherwise | unsigned compressed, non-zero |
ΔLine | 0 | unsigned compressed |
ΔColumn | 0 | unsigned compressed |
document-record
component | value stored | integer representation |
---|---|---|
δILOffset | 0 | unsigned compressed |
Document | Document row id | unsigned compressed |
Each SequencePointRecord represents a single sequence point. The sequence point inherits the value of Document property from the previous record (SequencePointRecord or document-record), from the Document field of the MethodDebugInformation table if it's the first sequence point of a method body that spans a single document, or from InitialDocument if it's the first sequence point of a method body that spans multiple documents. The value of IL Offset is calculated using the value of the previous sequence point (if any) and the value stored in the record.
The values of Start Line, Start Column, End Line and End Column of a non-hidden sequence point are calculated based upon the values of the previous non-hidden sequence point (if any) and the data stored in the record.
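The delta encoding of the sequence points blob is easier to follow in code. The sketch below decodes a blob into tuples of (IL Offset, Start Line, Start Column, End Line, End Column, Document). It assumes the compressed unsigned and signed integer encodings of ECMA-335 §II.23.2, and it assumes that a zero δILOffset after the first record marks a document-record. The `Reader` class and `decode_sequence_points` function are hypothetical names introduced for illustration.

```python
HIDDEN_LINE = 0xFEEFEE


class Reader:
    """Cursor over a blob with ECMA-335 compressed integer decoding."""

    def __init__(self, data):
        self.data, self.pos = data, 0

    def _read(self):
        b0 = self.data[self.pos]
        if (b0 & 0x80) == 0:
            size, bits, mask = 1, 7, 0x7F
        elif (b0 & 0xC0) == 0x80:
            size, bits, mask = 2, 14, 0x3FFF
        elif (b0 & 0xE0) == 0xC0:
            size, bits, mask = 4, 29, 0x1FFFFFFF
        else:
            raise ValueError("invalid compressed integer")
        value = int.from_bytes(self.data[self.pos:self.pos + size], "big") & mask
        self.pos += size
        return value, bits

    def uint(self):
        return self._read()[0]

    def sint(self):
        # The sign bit is stored in bit 0; the remaining bits are the low bits
        # of the two's complement value.
        value, bits = self._read()
        magnitude = value >> 1
        return magnitude - (1 << (bits - 1)) if value & 1 else magnitude


def decode_sequence_points(blob, document):
    """document: Document column of the MethodDebugInformation row (0 if nil)."""
    r = Reader(blob)
    local_signature = r.uint()              # header: LocalSignature
    if document == 0:
        document = r.uint()                 # header: InitialDocument
    points, il_offset, first = [], 0, True
    prev_line = prev_col = None             # previous non-hidden start position
    while r.pos < len(blob):
        delta_il = r.uint()
        if delta_il == 0 and not first:
            document = r.uint()             # document-record switches documents
            continue
        il_offset = delta_il if first else il_offset + delta_il
        first = False
        d_lines = r.uint()
        d_cols = r.uint() if d_lines == 0 else r.sint()
        if d_lines == 0 and d_cols == 0:    # hidden-sequence-point-record
            points.append((il_offset, HIDDEN_LINE, 0, HIDDEN_LINE, 0, document))
            continue
        if prev_line is None:
            line, col = r.uint(), r.uint()
        else:
            line, col = prev_line + r.sint(), prev_col + r.sint()
        prev_line, prev_col = line, col
        points.append((il_offset, line, col, line + d_lines, col + d_cols, document))
    return local_signature, points
```

Note that hidden sequence points advance the IL offset but do not update the previous non-hidden start position used for the line and column deltas.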
The LocalScope table has the following columns:
- Method (MethodDef row id)
- ImportScope (ImportScope row id)
- VariableList (LocalVariable row id)
  An index into the LocalVariable table; it marks the first of a contiguous run of LocalVariables owned by this LocalScope. The run continues to the smaller of:
  - the last row of the LocalVariable table
  - the next run of LocalVariables, found by inspecting the VariableList of the next row in this LocalScope table.
- ConstantList (LocalConstant row id)
  An index into the LocalConstant table; it marks the first of a contiguous run of LocalConstants owned by this LocalScope. The run continues to the smaller of:
  - the last row of the LocalConstant table
  - the next run of LocalConstants, found by inspecting the ConstantList of the next row in this LocalScope table.
- StartOffset (integer [0..0x80000000), encoding: uint32)
  Starting IL offset of the scope.
- Length (integer (0..0x80000000), encoding: uint32)
  The scope length in bytes.
The table is required to be sorted first by Method in ascending order, then by StartOffset in ascending order, then by Length in descending order.
StartOffset + Length shall be in range (0..0x80000000).
Each scope spans IL instructions in range [StartOffset, StartOffset + Length).
The first scope of each Method shall span all IL instructions of the Method, i.e. StartOffset shall be 0 and Length shall be equal to the size of the IL stream of the Method.
StartOffset shall point to the starting byte of an instruction of the Method.
StartOffset + Length shall point to the starting byte of an instruction of the Method or be equal to the size of the IL stream of the Method.
For each pair of scopes belonging to the same Method the intersection of their respective ranges R1 and R2 shall be either R1 or R2 or empty.
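The run semantics of the VariableList column (and, identically, ConstantList) can be sketched as follows. The function computes, for each LocalScope row, the half-open range of 1-based LocalVariable row ids it owns. The name `variable_runs` and its parameters are hypothetical; the inputs are assumed to have been read from the tables already.

```python
def variable_runs(variable_lists, local_variable_count):
    """Return a (first_row, end_row_exclusive) pair for each LocalScope row.

    variable_lists: VariableList values in LocalScope row order (1-based row ids).
    local_variable_count: number of rows in the LocalVariable table.
    """
    table_end = local_variable_count + 1
    runs = []
    for i, first in enumerate(variable_lists):
        # The run stops at the next scope's VariableList or at the table end,
        # whichever is smaller.
        if i + 1 < len(variable_lists):
            end = min(variable_lists[i + 1], table_end)
        else:
            end = table_end
        runs.append((first, end))
    return runs
```

A scope whose range is empty (first equals end) owns no variables.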
The LocalVariable table has the following columns:
- Attributes (LocalVariableAttributes value, encoding: uint16)
- Index (integer [0..0x10000), encoding: uint16)
  Slot index in the local signature of the containing MethodDef.
- Name (String heap index)
Conceptually, every row in the LocalVariable table is owned by one, and only one, row in the LocalScope table.
There shall be no duplicate rows in the LocalVariable table, based upon owner and Index.
There shall be no duplicate rows in the LocalVariable table, based upon owner and Name.
flag | value | description |
---|---|---|
DebuggerHidden | 0x0001 | Variable shouldn’t appear in the list of variables displayed by the debugger |
The LocalConstant table has the following columns:
- Name (String heap index)
- Signature (Blob heap index, LocalConstantSig blob)
Conceptually, every row in the LocalConstant table is owned by one, and only one, row in the LocalScope table.
There shall be no duplicate rows in the LocalConstant table, based upon owner and Name.
The structure of the blob is
Blob ::= CustomMod* (PrimitiveConstant | EnumConstant | GeneralConstant)
PrimitiveConstant ::= PrimitiveTypeCode PrimitiveValue
PrimitiveTypeCode ::= BOOLEAN | CHAR | I1 | U1 | I2 | U2 | I4 | U4 | I8 | U8 | R4 | R8 | STRING
EnumConstant ::= EnumTypeCode EnumValue EnumType
EnumTypeCode ::= BOOLEAN | CHAR | I1 | U1 | I2 | U2 | I4 | U4 | I8 | U8
EnumType ::= TypeDefOrRefOrSpecEncoded
GeneralConstant ::= (CLASS | VALUETYPE) TypeDefOrRefOrSpecEncoded GeneralValue? |
OBJECT
component | description |
---|---|
PrimitiveTypeCode | A 1-byte constant describing the structure of the PrimitiveValue. |
PrimitiveValue | The value of the constant. |
EnumTypeCode | A 1-byte constant describing the structure of the EnumValue. |
EnumValue | The underlying value of the enum. |
CustomMod | Custom modifier as specified in ECMA-335 §II.23.2.7 |
TypeDefOrRefOrSpecEncoded | TypeDef, TypeRef or TypeSpec encoded as specified in ECMA-335 §II.23.2.8 |
The encoding of the PrimitiveValue and EnumValue is determined based upon the value of PrimitiveTypeCode and EnumTypeCode, respectively.
Type code | Value |
---|---|
BOOLEAN | uint8: 0 represents false, 1 represents true |
CHAR | uint16 |
I1 | int8 |
U1 | uint8 |
I2 | int16 |
U2 | uint16 |
I4 | int32 |
U4 | uint32 |
I8 | int64 |
U8 | uint64 |
R4 | float32 |
R8 | float64 |
STRING | A single byte 0xff (represents a null string reference), or a UTF-16 little-endian encoded string (possibly empty). |
The numeric values of the type codes are defined by ECMA-335 §II.23.1.16.
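As an illustration of the value encodings in the table above, the following sketch decodes a LocalConstantSig blob that holds a PrimitiveConstant. It is deliberately limited: it assumes there is no CustomMod prefix, it assumes little-endian numeric values, and it does not handle EnumConstant or GeneralConstant. The type code values are the ELEMENT_TYPE constants from ECMA-335 §II.23.1.16; the function name is hypothetical.

```python
import struct

# ELEMENT_TYPE codes (ECMA-335 §II.23.1.16) mapped to struct formats.
PRIMITIVE_FORMATS = {
    0x02: "<?",  # BOOLEAN
    0x03: "<H",  # CHAR (returned as a UTF-16 code unit)
    0x04: "<b",  # I1
    0x05: "<B",  # U1
    0x06: "<h",  # I2
    0x07: "<H",  # U2
    0x08: "<i",  # I4
    0x09: "<I",  # U4
    0x0A: "<q",  # I8
    0x0B: "<Q",  # U8
    0x0C: "<f",  # R4
    0x0D: "<d",  # R8
}
ELEMENT_TYPE_STRING = 0x0E


def decode_primitive_constant(sig):
    """Decode a PrimitiveConstant (sketch: no CustomMod, no enum or general constants)."""
    type_code, payload = sig[0], sig[1:]
    if type_code == ELEMENT_TYPE_STRING:
        # A lone 0xff byte is a null string; anything else is UTF-16 LE text.
        if payload == b"\xff":
            return None
        return payload.decode("utf-16-le")
    return struct.unpack_from(PRIMITIVE_FORMATS[type_code], payload)[0]
```

A single 0xff byte cannot be confused with string content because valid UTF-16 data always has an even length.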
EnumType must be an enum type as defined in ECMA-335 §II.14.3. The value of EnumTypeCode must match the underlying type of the EnumType.
The encoding of the GeneralValue is determined based upon the type expressed by TypeDefOrRefOrSpecEncoded specified in GeneralConstant. For the special types listed in the table below, GeneralValue must be present and is encoded as specified. If the GeneralValue is not present, the value of the constant is the default value of the type: a null reference if the type is a reference type, a null pointer if the type is a pointer type, etc.
Namespace | Name | GeneralValue encoding |
---|---|---|
System | Decimal | sign (highest bit), scale (bits 0..7), low (uint32), mid (uint32), high (uint32) |
System | DateTime | int64: ticks |
The ImportScope table has the following columns:
- Parent (ImportScope row id or nil)
- Imports (Blob index, encoding: Imports blob)
Imports blob represents all imports declared by an import scope.
Imports blob has the following structure:
Blob ::= Import*
Import ::= kind alias? target-assembly? target-namespace? target-type?
terminal | value | description |
---|---|---|
kind | Compressed unsigned integer | Import kind. |
alias | Compressed unsigned Blob heap index of a UTF8 string. | A name that can be used to refer to the target within the import scope. |
target-assembly | Compressed unsigned integer. | Row id of the AssemblyRef table. |
target-namespace | Compressed unsigned Blob heap index of a UTF8 string. | Fully qualified namespace name or XML namespace name. |
target-type | Compressed unsigned integer. | TypeDef, TypeRef or TypeSpec encoded as TypeDefOrRefOrSpecEncoded (see section II.23.2.8 of the ECMA-335 Metadata specification). |
kind | description |
---|---|
1 | Imports members of target-namespace. |
2 | Imports members of target-namespace defined in assembly target-assembly. |
3 | Imports members of target-type. |
4 | Imports members of XML namespace target-namespace with prefix alias. |
5 | Imports assembly reference alias defined in an ancestor scope. |
6 | Defines an alias for assembly target-assembly. |
7 | Defines an alias for the target-namespace. |
8 | Defines an alias for the part of target-namespace defined in assembly target-assembly. |
9 | Defines an alias for the target-type. |
The exact import semantics are language specific.
The blob may be empty. An empty import scope may still be the target of a custom debug information record.
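The Imports blob grammar can be read as a sequence of records whose optional fields depend on the kind. The sketch below makes that explicit. The mapping from kind to present fields is inferred from the kind descriptions above and should be treated as an assumption; values are returned raw (heap indices, row ids, encoded type tokens) without resolving them. All names are hypothetical.

```python
# Optional fields per import kind, in grammar order (inferred from the descriptions).
FIELDS_BY_KIND = {
    1: ("target_namespace",),
    2: ("target_assembly", "target_namespace"),
    3: ("target_type",),
    4: ("alias", "target_namespace"),
    5: ("alias",),
    6: ("alias", "target_assembly"),
    7: ("alias", "target_namespace"),
    8: ("alias", "target_assembly", "target_namespace"),
    9: ("alias", "target_type"),
}


def read_compressed_uint(data, pos):
    """Decode an ECMA-335 compressed unsigned integer; return (value, next_pos)."""
    b0 = data[pos]
    if (b0 & 0x80) == 0:
        return b0, pos + 1
    if (b0 & 0xC0) == 0x80:
        return ((b0 & 0x3F) << 8) | data[pos + 1], pos + 2
    if (b0 & 0xE0) == 0xC0:
        return int.from_bytes(data[pos:pos + 4], "big") & 0x1FFFFFFF, pos + 4
    raise ValueError("invalid compressed integer")


def decode_imports(blob):
    """Return one dict per Import record with the raw field values."""
    imports, pos = [], 0
    while pos < len(blob):
        kind, pos = read_compressed_uint(blob, pos)
        record = {"kind": kind}
        for field in FIELDS_BY_KIND[kind]:
            record[field], pos = read_compressed_uint(blob, pos)
        imports.append(record)
    return imports
```

An empty blob yields an empty list, matching the rule that an import scope may declare no imports.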
The StateMachineMethod table has the following columns:
- MoveNextMethod (MethodDef row id)
- KickoffMethod (MethodDef row id)
The table associates the kickoff implementation method of an async or an iterator method (the method that initializes and starts the state machine) with the MoveNext method that implements the state transition.
The table is required to be sorted by MoveNextMethod column.
There shall be no duplicate rows in the StateMachineMethod table, based upon MoveNextMethod.
There shall be no duplicate rows in the StateMachineMethod table, based upon KickoffMethod.
The CustomDebugInformation table has the following columns:
- Parent (HasCustomDebugInformation coded index)
- Kind (Guid heap index)
- Value (Blob heap index)
The table is required to be sorted by Parent.
Kind is an id defined by the tool producing the information.
HasCustomDebugInformation | tag (5 bits) |
---|---|
MethodDef | 0 |
Field | 1 |
TypeRef | 2 |
TypeDef | 3 |
Param | 4 |
InterfaceImpl | 5 |
MemberRef | 6 |
Module | 7 |
DeclSecurity | 8 |
Property | 9 |
Event | 10 |
StandAloneSig | 11 |
ModuleRef | 12 |
TypeSpec | 13 |
Assembly | 14 |
AssemblyRef | 15 |
File | 16 |
ExportedType | 17 |
ManifestResource | 18 |
GenericParam | 19 |
GenericParamConstraint | 20 |
MethodSpec | 21 |
Document | 22 |
LocalScope | 23 |
LocalVariable | 24 |
LocalConstant | 25 |
ImportScope | 26 |
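A HasCustomDebugInformation coded index packs the table tag into the low 5 bits and the row id into the remaining bits, following the coded index scheme of ECMA-335 §II.24.2.6. The helper names in this sketch are hypothetical.

```python
# Table names indexed by tag value, in the order given in the table above.
HAS_CUSTOM_DEBUG_INFORMATION = [
    "MethodDef", "Field", "TypeRef", "TypeDef", "Param", "InterfaceImpl",
    "MemberRef", "Module", "DeclSecurity", "Property", "Event", "StandAloneSig",
    "ModuleRef", "TypeSpec", "Assembly", "AssemblyRef", "File", "ExportedType",
    "ManifestResource", "GenericParam", "GenericParamConstraint", "MethodSpec",
    "Document", "LocalScope", "LocalVariable", "LocalConstant", "ImportScope",
]
TAG_BITS = 5


def decode_parent(coded):
    """Split a coded index into (table name, row id)."""
    return HAS_CUSTOM_DEBUG_INFORMATION[coded & ((1 << TAG_BITS) - 1)], coded >> TAG_BITS


def encode_parent(table, row_id):
    """Combine a table name and row id into a coded index."""
    return (row_id << TAG_BITS) | HAS_CUSTOM_DEBUG_INFORMATION.index(table)
```

For example, row 3 of the Document table (tag 22) is encoded as `(3 << 5) | 22`.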
The following Custom Debug Information records are currently produced by the C#, VB and F# compilers. In the future the compilers and other tools may define new records. Once specified, records may not change. If a change is needed, the owner has to define a new record with a new kind (GUID).
Parent: MethodDef
Kind: {6DA9A61E-F8C7-4874-BE62-68BC5630DF71}
Scopes of local variables hoisted to state machine fields.
Structure:
Blob ::= Scope{hoisted-variable-count}
Scope::= start-offset length
terminal | encoding | description |
---|---|---|
start-offset | uint32 | Start IL offset of the scope, a value in range [0..0x80000000). |
length | uint32 | Length of the scope span, a value in range (0..0x80000000). |
Each scope spans IL instructions in range [start-offset, start-offset + length).
start-offset shall point to the starting byte of an instruction of the MoveNext method of the state machine type.
start-offset + length shall point to the starting byte of an instruction or be equal to the size of the IL stream of the MoveNext method of the state machine type.
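Since each scope is a fixed-size pair of uint32 values, the blob can be decoded in one pass. This sketch assumes little-endian encoding; the function name is hypothetical.

```python
import struct


def decode_hoisted_local_scopes(blob):
    """Return (start_offset, length) pairs, one per hoisted variable."""
    # iter_unpack raises struct.error if the blob is not a multiple of 8 bytes.
    return list(struct.iter_unpack("<II", blob))
```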
Parent: LocalVariable or LocalConstant
Kind: {83C563C4-B4F3-47D5-B824-BA5441477EA8}
Structure:
Blob ::= bit-sequence
A sequence of bits for a local variable or constant whose type contains dynamic type (e.g. dynamic, dynamic[], List<dynamic> etc.) that describes which System.Object types encoded in the metadata signature of the local type were specified as dynamic in source code.
Bits of the sequence are grouped by 8. If the sequence length is not a multiple of 8, it is padded with 0 bits to the closest multiple of 8. Each group of 8 bits is encoded as a byte whose least significant bit is the first bit of the group and whose most significant bit is the 8th bit of the group. The sequence is encoded as a sequence of bytes representing these groups. Trailing zero bytes may be omitted.
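The packing rule can be sketched as a pair of helpers. Because trailing zero bytes may be omitted, the blob alone does not determine the sequence length, so the decoder takes a hypothetical `flag_count` parameter that the caller would derive from the local's signature. Both function names are illustrative only.

```python
def encode_dynamic_flags(flags):
    """Pack booleans into bytes, first flag in the least significant bit."""
    data = bytearray((len(flags) + 7) // 8)
    for i, flag in enumerate(flags):
        if flag:
            data[i // 8] |= 1 << (i % 8)
    # Trailing zero bytes carry no information and may be dropped.
    return bytes(data).rstrip(b"\x00")


def decode_dynamic_flags(blob, flag_count):
    """Expand a packed blob to flag_count booleans; missing bytes read as zero."""
    flags = []
    for i in range(flag_count):
        byte = blob[i // 8] if i // 8 < len(blob) else 0
        flags.append(bool((byte >> (i % 8)) & 1))
    return flags
```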
TODO: Specify the meaning of the bits in the sequence.
Parent: Module
Kind: {58b2eab6-209f-4e4e-a22c-b2d0f910c782}
Structure:
Blob ::= namespace
terminal | encoding | description |
---|---|---|
namespace | UTF8 string | The default namespace for the module/project. |
Parent: MethodDef
Kind: {755F52A8-91C5-45BE-B4B8-209571E552BD}
If Parent is a kickoff method of a state machine (marked in metadata by a custom attribute derived from System.Runtime.CompilerServices.StateMachineAttribute), the record associates variables hoisted to fields of the state machine type with their syntax offsets. Otherwise, it associates slots of the Parent method local signature with their syntax offsets.
Syntax offset is an integer distance from the start of the method body (it may be negative). It is used by the compiler to map the slot to the syntax node that declares the corresponding variable.
The blob has the following structure:
Blob ::= (has-syntax-offset-baseline syntax-offset-baseline)? SlotId{slot count}
SlotId ::= has-ordinal kind syntax-offset ordinal?
terminal | encoding | description |
---|---|---|
has-syntax-offset-baseline | 8 bits or none | 0xff or not present. |
syntax-offset-baseline | compressed unsigned integer | Negated syntax offset baseline. Only present if the minimal syntax offset stored in the slot map is less than -1. Defaults to -1 if not present. |
has-ordinal | 1 bit (highest) | Set iff ordinal is present. |
kind | 7 bits (lowest) | Implementation specific slot kind in range [0, 0x7f). |
syntax-offset | compressed unsigned integer | The value of syntax-offset + syntax-offset-baseline is the distance of the syntax node that declares the corresponding variable from the start of the method body. |
ordinal | compressed unsigned integer | Defines ordering of slots with the same syntax offset. |
The exact algorithm used to calculate syntax offsets and the algorithm that maps slots to syntax nodes is language and implementation specific and may change in future versions of the compiler.
Parent: MethodDef
Kind: {A643004C-0240-496F-A783-30D64F4979DE}
Encodes information used by the compiler when mapping lambdas and closures declared in the Parent method to their implementing methods and types and to the syntax nodes that declare them.
The blob has the following structure:
Blob ::= method-ordinal syntax-offset-baseline closure-count Closure{closure-count} Lambda*
Closure ::= syntax-offset
Lambda ::= syntax-offset closure-ordinal
The number of lambda entries is determined by the size of the blob (the reader shall read lambda records until the end of the blob is reached).
terminal | encoding | description |
---|---|---|
method-ordinal | compressed unsigned integer | Implementation specific number derived from the source location of Parent method. |
syntax-offset-baseline | compressed unsigned integer | Negated minimum of syntax offsets stored in the map and -1. |
closure-count | compressed unsigned integer | The number of closure entries. |
syntax-offset | compressed unsigned integer | The value of syntax-offset + syntax-offset-baseline is the distance of the syntax node that represents the lambda/closure in the source from the start of the method body. |
closure-ordinal | compressed unsigned integer | 0 if the lambda doesn’t have a closure. Otherwise, 1-based index into the closure list. |
The exact algorithm used to calculate syntax offsets and the algorithm that maps lambdas/closures to their implementing methods, types and syntax nodes is language and implementation specific and may change in future versions of the compiler.
Parent: Document
Kind: {0E8A571B-6926-466E-B4AD-8AB04611F5FE}
Embeds the content of the corresponding document in the PDB.
The blob has the following structure:
Blob ::= format content
terminal | encoding | description |
---|---|---|
format | int32 | Indicates how the content is serialized. 0 = raw bytes, uncompressed. Positive value = compressed by deflate algorithm and value indicates uncompressed size. Negative values reserved for future formats. |
content | format-specific | The text of the document in the specified format. The length is implied by the length of the blob minus four bytes for the format. |
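A reader for this blob only needs to branch on the sign of the format field. The sketch below assumes a little-endian int32 and assumes that "deflate" means a raw deflate stream without a zlib header, which is why a negative window size is passed to `zlib.decompress`. The function name is hypothetical.

```python
import struct
import zlib


def decode_embedded_source(blob):
    """Return the document content bytes stored in an embedded source blob."""
    (fmt,) = struct.unpack_from("<i", blob, 0)
    content = blob[4:]
    if fmt == 0:
        return content                      # raw bytes, uncompressed
    if fmt > 0:
        # Assumption: raw deflate data (no zlib header), hence wbits = -15.
        text = zlib.decompress(content, -15)
        if len(text) != fmt:
            raise ValueError("uncompressed size does not match the format field")
        return text
    raise ValueError("negative format values are reserved")
```

The returned bytes still need to be decoded to text using the encoding of the source file, which this sketch does not attempt to detect.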
Parent: Module
Kind: {CC110556-A091-4D38-9FEC-25AB9A351A6A}
The blob stores a UTF8 encoded text file in JSON format that includes information on how to locate the content of documents listed in the Document table on a source server.
Parent: Module
Kind: {7E4D4708-096E-4C5C-AEDA-CB10BA6A740D}
Stores information about all metadata references used to compile the module.
The blob has the following structure:
Blob ::= MetadataReferenceInfo+
MetadataReferenceInfo ::= file-name aliases flags time-stamp file-size mvid
terminal | encoding | description |
---|---|---|
file-name | UTF8 NIL-terminated | Name of the metadata file (includes an extension). |
aliases | UTF8 NIL-terminated, comma-separated list | List of external aliases for the reference. May be empty. |
flags | byte | Flags. |
time-stamp | uint32 | PE COFF header Timestamp field. |
file-size | uint32 | PE COFF header SizeOfImage field. |
mvid | GUID (16 bytes) | Module Version Id (ModuleDef table field). |
The meaning of the flags byte:
flag | description |
---|---|
0b0000001 | The referenced file is an assembly (as opposed to a netmodule). |
0b0000010 | Embed interop types. |
The remaining bits are reserved for future use and currently have no meaning.
The data can be used to find the reference in a file indexing service such as a symbol server. For example, the Simple Symbol Query Protocol uses a combination of file-name, time-stamp and file-size as a key. Other services might use the MVID as it uniquely identifies the module.
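The record layout above can be walked with a simple cursor. This is a sketch under assumptions: integers are little-endian, and the MVID bytes use the same layout as a .NET `Guid` byte array, which corresponds to `bytes_le` in Python's `uuid` module. The function and dictionary key names are hypothetical.

```python
import struct
import uuid


def decode_metadata_references(blob):
    """Return one dict per MetadataReferenceInfo record."""
    refs, pos = [], 0
    while pos < len(blob):
        end = blob.index(0, pos)            # file-name, NIL-terminated
        file_name = blob[pos:end].decode("utf-8")
        pos = end + 1
        end = blob.index(0, pos)            # aliases, NIL-terminated
        aliases_text = blob[pos:end].decode("utf-8")
        pos = end + 1
        flags, time_stamp, file_size = struct.unpack_from("<BII", blob, pos)
        pos += 9
        mvid = uuid.UUID(bytes_le=bytes(blob[pos:pos + 16]))  # assumed GUID layout
        pos += 16
        refs.append({
            "file_name": file_name,
            "aliases": aliases_text.split(",") if aliases_text else [],
            "is_assembly": bool(flags & 0b01),
            "embed_interop_types": bool(flags & 0b10),
            "time_stamp": time_stamp,
            "file_size": file_size,
            "mvid": mvid,
        })
    return refs
```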
Parent: Module
Kind: {B5FEEC05-8CD0-4A83-96DA-466284BB4BD8}
Stores compilation options used to compile the module. Only captures information that is not present elsewhere in the PDB, in the PE headers or metadata of the module.
The blob has the following structure:
Blob ::= (name value)*
terminal | encoding | description |
---|---|---|
name | UTF8 NIL-terminated | Name of the compilation option. |
value | UTF8 NIL-terminated | Value of the compilation option. |
There shall be no two entries with the same name in the list.
It is recommended, but not required, that name is lower-case and uses a hyphen (-) for separating words.
Common options:
name | value format | description |
---|---|---|
language | CSharp or VisualBasic | Language name. |
compiler-version | SemVer2 version string | Version of the compiler used to build the module with build metadata set to commit SHA for officially released compiler. |
runtime-version | SemVer2 version string | Version of the CLR used to build the module with build metadata set to commit SHA for officially released .NET Core runtime. |
Other options listed in the blob are specific to each compiler. Future versions of the compiler may add additional options. The order of the options in the list is insignificant.
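Reading the options blob amounts to splitting on NIL bytes and pairing up the results. The sketch below also enforces the rule that names must be unique. The function name is hypothetical.

```python
def decode_compilation_options(blob):
    """Return a dict of option name to value from a compilation options blob."""
    strings = blob.split(b"\x00")
    # Every string is NIL-terminated, so the split leaves one trailing empty item.
    if strings.pop() != b"" or len(strings) % 2 != 0:
        raise ValueError("malformed compilation options blob")
    options = {}
    for name, value in zip(strings[0::2], strings[1::2]):
        key = name.decode("utf-8")
        if key in options:
            raise ValueError("duplicate option name: " + key)
        options[key] = value.decode("utf-8")
    return options
```

Returning a dictionary reflects the rule that the order of options is insignificant.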
The runtime-version is significant since the compiler may have used certain functionality from the runtime that impacts the compilation output (e.g. Unicode tables, etc.).
The purpose of this data is to allow a tool to reconstruct the compilation the module was built from. The source files for the compilation are expected to be recovered from the source server using SourceLink and/or from sources embedded in the PDB. The metadata references for the compilation are expected to be recovered from a file indexing service (e.g. symbol server) using information in the Compilation Metadata References record.