Inside PDB: Part 1 — Blocks and Streams
Looking inside the debugging format used by Windows applications.
Preface
While the microsoft-pdb repository and the LLVM documentation on the PDB format are decent places to start learning about the PDB file format, I found them rather overwhelming and confusing given their lack of visual representation. So, I hope this new series of blog posts helps newcomers get into this incredibly messy file format, or at least satisfies your curiosity about such things!
Additionally, these posts have already helped me improve my debugger project: writing down the technical details forces me to verify my implementation by testing my own code. While these posts don't walk through that project's code, you're free to check it or other implementations. After all, there really are a million ways to write software.
All information provided is from my own research, including the history chapter (I really did install various C compilers to see if they output PDBs). This first post has some information on PDB 2.0, but later parts will mainly focus on PDB 7.0.
Happy reading!
Sources / Further reading:
- https://llvm.org/docs/PDB/MsfFile.html
- llvm/include/llvm/DebugInfo/PDB/ and llvm-pdbutil(1) (llvm-pdbutil dump --summary FILE)
- https://github.com/microsoft/microsoft-pdb
- https://github.com/ziglang/zig/blob/master/lib/std/pdb.zig
- https://github.com/MolecularMatters/raw_pdb
- https://devblogs.microsoft.com/cppblog/faster-c-build-cycle-in-vs-15-with-debugfastlink/
- https://web.archive.org/web/20250318100020/https://www.informit.com/articles/article.aspx?p=22685
- http://www.godevtool.com/Other/pdb.htm
- http://www.debuginfo.com/articles/debuginfomatch.html
History
The Program Database file format, known by its file extension as PDB, is a debugging-focused container format initially introduced with Microsoft Visual C++ 1.00 (MSVC 1.0) in 1993.
The CL compiler is able to produce PDB output using the /Zi (full debugging information in a program database) and /Fd (PDB output filename) command options.
The PDB 2.0 format was introduced with Microsoft Visual C++ 2.0 (MSVC 2.0) in 1994, which added support for incremental linking. This version of the format was used through Visual Studio 97 (MSVC 5.0) and Microsoft Visual C++ 6.0 (MSVC 6.0).
The PDB 7.0 format was introduced with Visual Studio .NET 2002 (MSVC 7.0, CL version 13.0), and is still used in Visual Studio 2022 (MSVC 17.0).
Overview
The Program Database container format acts similarly to a filesystem. There are two major variants: PDB 2.0 with 16-bit block indexes, and PDB “Multi-Stream Format (MSF)” 7.0 with 32-bit block indexes and Free Page Maps.
All values are assumed to be in little-endian byte order. Ceiling operations (rounding up) are written as ⌈x⌉. All figures are only a general visual representation and do not reflect block locations accurately.
A PDB file, in its simplest construction, is made up of fixed-size blocks (also known as pages). Blocks are referenced by zero-based indexes (meaning indexes start at 0).
Blocks
+-----+
| |
+-----+
| |
+-----+
| |
+-----+
| |
+-----+
| |
+-----+
| |
+-----+
| |
+-----+
| |
+-----+
Multiple blocks can define a single stream across the file. The blocks of a stream are not necessarily contiguous, and a stream can contain any form of data. Streams are referenced by zero-based indexes (indexes start at 0).
Blocks
+-----+
| |
+-----+
| |
+-----+
| |
+-----+
| | ----+
+-----+ |
| | |
+-----+ |
| | ----+- Stream
+-----+ |
| | ----+
+-----+
| |
+-----+
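As a rough illustration of the idea, here is a minimal Python sketch that reassembles a stream from its blocks. It assumes the block size, the stream's block indexes, and the stream's size in Bytes are already known. The name `read_stream` and the in-memory `data` buffer are made up for this example and do not come from any PDB library.

```python
def read_stream(data: bytes, block_size: int, blocks: list[int], size: int) -> bytes:
    """Concatenate the listed blocks, then trim to the stream's real size."""
    out = bytearray()
    for index in blocks:
        start = index * block_size            # block index -> file offset
        out += data[start:start + block_size]
    return bytes(out[:size])                  # the last block is usually partial


# Toy "file" made of four 4-byte blocks; the stream lives in blocks 3 and 1.
data = b"AAAA" + b"BBBB" + b"CCCC" + b"DDDD"
print(read_stream(data, 4, [3, 1], 6))  # b'DDDDBB'
```

The blocks are read in the order they are listed, not in file order, which is why the result starts with the content of block 3.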
Header
The header of a PDB file, also known as the SuperBlock, contains specification information, such as the size of a block, the number of blocks (totaling the file size), the size of the directory, and, in later versions, the FPM Index.
Blocks
+-----+
| | <--- You are here (SuperBlock)
+-----+
| |
+-----+
| |
+-----+
| |
+-----+
| |
+-----+
| |
+-----+
| |
+-----+
| |
+-----+
PDB 2.0 header
Offset (in Bytes) | Size (in Bytes) | Description |
---|---|---|
0 | 44 | Signature: “Microsoft C/C++ program database 2.00\r\n\x1aJG\0\0” |
44 | 4 | BlockSize: The size of a block, in Bytes. The only observed values are 1024 (the most common), 2048, and 4096. |
48 | 2 | StartPage: The block index of the first usable block. |
50 | 2 | BlockCount: The total number of blocks. Multiplied by BlockSize, it gives the file size. |
52 | 4 | RootSize: The size of the root directory, in Bytes. |
56 | 4 | Reserved. |
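The table above can be turned into a small parser. This is only a sketch using Python's `struct` module. The function name `parse_pdb20_header` and the returned dictionary are assumptions made for illustration, and the sample header is synthetic (its values are made up, not taken from a real file).

```python
import struct

# 44-byte signature from the table above.
PDB20_SIGNATURE = b"Microsoft C/C++ program database 2.00\r\n\x1aJG\0\0"


def parse_pdb20_header(data: bytes) -> dict:
    """Decode the fixed 60-byte PDB 2.0 header."""
    if data[:44] != PDB20_SIGNATURE:
        raise ValueError("not a PDB 2.0 signature")
    # "<" selects little-endian; I = unsigned 32-bit, H = unsigned 16-bit.
    block_size, start_page, block_count, root_size, _reserved = struct.unpack_from(
        "<IHHII", data, 44
    )
    return {
        "BlockSize": block_size,
        "StartPage": start_page,
        "BlockCount": block_count,
        "RootSize": root_size,
    }


# Synthetic header for illustration only.
header = PDB20_SIGNATURE + struct.pack("<IHHII", 1024, 9, 304, 1140, 0)
print(parse_pdb20_header(header))
```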
PDB 7.0 header
Offset (in Bytes) | Size (in Bytes) | Description |
---|---|---|
0 | 32 | Signature: “Microsoft C/C++ MSF 7.00\r\n\x1aDS\0\0\0” |
32 | 4 | BlockSize: The size of a block, in Bytes. The only observed values are 1024, 2048, and 4096 (the most common). |
36 | 4 | FreeIndex: Block index of the FPM (Free Page Map) in use. Can only be 1 or 2. |
40 | 4 | BlockCount: The total number of blocks. Multiplied by BlockSize, it gives the file size. |
44 | 4 | DirectorySize: The size of the directory, in Bytes. |
48 | 4 | Unknown: Must be unset (zero). |
52 | 4 | DirectoryOffset: The block index to the directory. |
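The PDB 7.0 header is six consecutive 32-bit fields after the signature, so it can be decoded in one call. Again, this is a sketch: `parse_pdb70_header` is a made-up name, and the sample header is synthetic.

```python
import struct

# 32-byte signature from the table above.
PDB70_SIGNATURE = b"Microsoft C/C++ MSF 7.00\r\n\x1aDS\0\0\0"

FIELDS = ("BlockSize", "FreeIndex", "BlockCount",
          "DirectorySize", "Unknown", "DirectoryOffset")


def parse_pdb70_header(data: bytes) -> dict:
    """Decode the fixed 56-byte PDB 7.0 header (SuperBlock)."""
    if data[:32] != PDB70_SIGNATURE:
        raise ValueError("not a PDB 7.0 signature")
    values = struct.unpack_from("<6I", data, 32)   # six little-endian u32
    return dict(zip(FIELDS, values))


# Synthetic header for illustration only.
header = PDB70_SIGNATURE + struct.pack("<6I", 4096, 2, 47, 52, 0, 44)
print(parse_pdb70_header(header)["DirectoryOffset"])  # 44
```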
Loading Root Directory Stream
The header has information on the location and size of the root directory.
PDB 2.0
Immediately following the header structure (after the Reserved field) are the block IDs that make up the root directory stream. These IDs are 16-bit in size. Remember, the root directory blocks are not contiguous!
Blocks
+-----+
| | ---+ Blocks here...
+-----+ |
| | |
+-----+ |
| | |
+-----+ |
| | |
+-----+ |
| | |
+-----+ |
| | |
+-----+ |
| | <--+ Should lead you to the root directory
+-----+ |
| | <--+
+-----+
The number of blocks used by the root directory can be obtained using a ceiling division (rounding up) with the RootSize and BlockSize. This will be the count of block IDs to read into memory.
For example, a RootSize of 1140 with a BlockSize of 1024 means the root directory takes two blocks of 1024 Bytes (⌈1140 ÷ 1024⌉ = 2), occupying 2048 Bytes in the file. There are therefore two indexes to read after the header. If they are, say, 299 (0x12b) and 303 (0x12f), then blocks 299 and 303 contain the root directory.
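The same arithmetic can be sketched in Python. The helper names `ceil_div` and `pdb20_root_blocks` are hypothetical, and the sample data is a dummy 60-byte header followed by the two block IDs from the example.

```python
import struct

PDB20_HEADER_SIZE = 60  # signature (44) + fields up to and including Reserved


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, i.e. the ⌈a ÷ b⌉ used in the text."""
    return (a + b - 1) // b


def pdb20_root_blocks(data: bytes, root_size: int, block_size: int) -> list[int]:
    """Read the 16-bit block IDs of the root directory that follow the header."""
    count = ceil_div(root_size, block_size)
    return list(struct.unpack_from(f"<{count}H", data, PDB20_HEADER_SIZE))


# Dummy header bytes, then block IDs 299 and 303 as 16-bit little-endian values.
data = bytes(PDB20_HEADER_SIZE) + struct.pack("<2H", 299, 303)
print(ceil_div(1140, 1024))                 # 2
print(pdb20_root_blocks(data, 1140, 1024))  # [299, 303]
```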
PDB 7.0
The header's DirectoryOffset gives the block index of a block that lists where the root directory lives. To get its offset within the file, multiply DirectoryOffset by BlockSize. The size of the directory itself is given by DirectorySize, in Bytes. This list of block indexes is the only part of the PDB that is stored contiguously.
Blocks
+-----+
| | ---+ Header has directory location
+-----+ |
| | |
+-----+ |
| | |
+-----+ |
| | |
+-----+ |
| | |
Directory +-----+ |
blocks +-> | | |
| +-----+ |
+-> | | |
| +-----+ |
+-- | | <--+ Then, you should be here (typically!)
+-----+
Its content is composed of 32-bit block indexes, which are used to load the root directory stream. The number of indexes to read can be obtained with a ceiling division (⌈DirectorySize ÷ BlockSize⌉). Remember, the root directory blocks themselves are not contiguous!
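Here is a minimal sketch of that two-step lookup, assuming the three header fields have already been read. The function name `load_pdb70_directory` is made up, and the toy file uses an 8-byte block size purely to keep the example readable.

```python
import struct


def load_pdb70_directory(data: bytes, block_size: int,
                         directory_size: int, directory_offset: int) -> bytes:
    """Follow DirectoryOffset to the block list, then gather the directory."""
    count = -(-directory_size // block_size)         # ceiling division
    # Step 1: the block at DirectoryOffset holds `count` 32-bit block indexes.
    blocks = struct.unpack_from(f"<{count}I", data, directory_offset * block_size)
    # Step 2: read each listed block in order, then trim to DirectorySize.
    raw = b"".join(data[b * block_size:(b + 1) * block_size] for b in blocks)
    return raw[:directory_size]


# Toy file with 8-byte blocks: block 2 is the block list, pointing at 3 then 1.
data = (b"HHHHHHHH"                 # block 0: stand-in for the header
        + b"89ab????"               # block 1: second half of the directory
        + struct.pack("<2I", 3, 1)  # block 2: block list
        + b"01234567")              # block 3: first half of the directory
print(load_pdb70_directory(data, 8, 12, 2))  # b'0123456789ab'
```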
Free Page Map (PDB 7.0)
With PDB “MSF” 7.0 files, the Free Page Map (FPM) is a series of blocks containing an array of bits indicating which blocks are currently used. There are always two FPM blocks per cluster of BlockSize blocks, but only one is in use. The other serves as a spare while the file content is being updated.
The location of the first FPM currently in use is specified by the FreeIndex field of the header. To get a file offset, multiply BlockSize by FreeIndex. For example, with a BlockSize of 4096 and a FreeIndex of 2, the first active FPM is at file offset 8192 (4096 × 2).
Blocks
+-----+
| |
+-----+
| | <--- You are here
+-----+
| | <--- Or here, depending on the FPM Index in use
+-----+
| |
+-----+
| |
+-----+
| |
+-----+
| |
+-----+
| |
+-----+
The total size of the array, in Bytes, is ⌈BlockCount ÷ 8⌉, assuming machines use 8 bits per Byte.
Each bit corresponds to a block index. An unset bit (0) means the block is used, while a set bit (1) marks a block that is unused and thus free to be allocated. Going by LLVM's MSF writer, bits are counted from the least-significant end of each byte, so the lowest bit of the first byte describes block 0.
1 2 3
00000000 e0 36 f0 ...
For example, a first byte of 0xe0 (binary 11100000) has bits 5 to 7 set, so blocks 5 to 7 are free and blocks 0 to 4 are used. Blocks 0 to 2 are never free, as they contain the SuperBlock and the two FPM blocks in this example.
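A small sketch of reading such a bit array follows. It assumes the bit order used by LLVM's MSF writer, where the least-significant bit of each byte maps to the lowest block index, and it treats a set bit as "free". The function names are invented for this example.

```python
def fpm_file_offset(block_size: int, free_index: int) -> int:
    """File offset of the first FPM block currently in use."""
    return block_size * free_index


def fpm_size(block_count: int) -> int:
    """Bytes needed to hold one bit per block, rounded up."""
    return (block_count + 7) // 8


def free_blocks(fpm: bytes, block_count: int) -> list[int]:
    """Indexes whose bit is set (1 = free), least-significant bit first."""
    return [i for i in range(block_count) if (fpm[i // 8] >> (i % 8)) & 1]


print(fpm_file_offset(4096, 2))               # 8192
print(fpm_size(47))                           # 6
print(free_blocks(bytes([0xE0, 0x36]), 16))   # [5, 6, 7, 9, 10, 12, 13]
```

If a file you are inspecting reports the SuperBlock or FPM blocks as free, the bit order assumption is the first thing to double-check.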
For compatibility reasons, even though a single FPM block could track up to 32768 blocks (assuming a BlockSize of 4096, at 8 bits per Byte), a new pair of FPM blocks is defined for every cluster of BlockSize blocks once a PDB has more blocks than BlockSize:
Blocks (Assuming BlockSize=4096 and FreeIndex=1)
+-----+ Block 0
| | <- Superblock
+-----+
| | <- FPM1 (For blocks 0-4095)
+-----+
| | <- FPM2
+-----+
| | <- Data
...
+-----+ Block 4096
| | <- Data
+-----+
| | <- FPM1 (For blocks 4096-8191)
+-----+
| | <- FPM2
+-----+
| | <- Data
...
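Going by the figure above, the active FPM block of each cluster sits FreeIndex blocks after the start of that cluster. A minimal sketch of that layout, with the hypothetical helper name `fpm_block_indexes`:

```python
def fpm_block_indexes(block_size: int, free_index: int, block_count: int) -> list[int]:
    """Block indexes of the active FPM block in every cluster of the file."""
    return list(range(free_index, block_count, block_size))


# A 10000-block file with BlockSize=4096 and FreeIndex=1 spans three clusters.
print(fpm_block_indexes(4096, 1, 10000))  # [1, 4097, 8193]
```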
Loading a Stream
To load a stream, we’ll need to read the content of the root directory.
Blocks
+-----+
| |
+-----+
| |
+-----+
| |
+-----+
| | <-+ Stream 1 blocks...
+-----+ |
| | |
+-----+ |
| | <-+ Could be anywhere!
+-----+ |
| | --+ Root directory has information about Stream 1
+-----+
| |
+-----+
Using PDB 2.0 Root Directory
Once loaded, the root directory follows a particular structure.
Offset (in Bytes) | Size (in Bytes) | Description |
---|---|---|
0 | 2 | StreamCount: The number of streams that are present in this PDB. |
2 | 2 | Reserved. Should be zero. |
4 | 8 × StreamCount | Structure of two 32-bit (4 Byte) numbers for each stream: StreamSize (in Bytes) and an identification number. The first entry is Stream 0. |
4 + (8 × StreamCount) | 2 × BlockCount | Following the sizes and IDs, an array of 16-bit block indexes lists the blocks each stream uses, starting with Stream 0. For each stream, BlockCount is obtained using a ceiling division (rounding up): BlockCount = ⌈StreamSize ÷ BlockSize⌉ |
Here is an over-simplified example of a PDB 2.0 Root Directory (assuming BlockSize=1024):
Offset 0 1 2 3 4 5 6 7 8 9 a b c d e f
0 03 00 00 00 22 00 00 00 1c 0f 9c 00 74 04 00 00
10 14 f7 9c 00 64 00 00 00 24 f7 9c 00 22 00 24 00
20 28 00 29 00
- Bytes 0-1: 0x0003
- There are three (3) streams.
- Bytes 2-3: 0
- Reserved. Usually zero.
- Bytes 4-7: 0x00000022
- Stream 0 has a size of 34 Bytes. Occupies one block.
- Bytes 8-11: 0x009c0f1c
- Stream 0 has an ID of 0x009c0f1c.
- Bytes 12-15: 0x00000474
- Stream 1 has a size of 1140 Bytes. Two blocks.
- Bytes 16-19: 0x009cf714
- Stream 1 has an ID of 0x009cf714
- Bytes 20-23: 0x00000064
- Stream 2 has a size of 100 Bytes. One block.
- Bytes 24-27: 0x009cf724
- Stream 2 has an ID of 0x009cf724
- Stream 0 uses these blocks:
- Bytes 28-29: 0x0022 (34)
- Stream 1 uses these blocks:
- Bytes 30-31: 0x0024 (36)
- Bytes 32-33: 0x0028 (40)
- Stream 2 uses these blocks:
- Bytes 34-35: 0x0029 (41)
In this example, Stream 1 is 1140 Bytes in size and therefore takes two blocks, so loading it means reading blocks 36 and 40. To get the file offset of the first block, multiply its block index by BlockSize (36 × 1024 = 36864 Bytes into the PDB).
Block values of 0xffff (for 1K and 2K pages) and of 0x7fff and higher (for 4K pages) are considered unused. Any lower value is a valid block index, as long as it does not exceed the total number of blocks.
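To tie the layout and the example together, here is a sketch that decodes the 36 example bytes shown earlier. The name `parse_pdb20_root` is an assumption, the per-stream ID is read but ignored, and unused block values are not filtered out.

```python
import struct


def parse_pdb20_root(root: bytes, block_size: int) -> list[tuple[int, list[int]]]:
    """Return (size, block indexes) for every stream in a PDB 2.0 root directory."""
    stream_count, _reserved = struct.unpack_from("<HH", root, 0)
    sizes = []
    for i in range(stream_count):
        size, _ident = struct.unpack_from("<II", root, 4 + 8 * i)
        sizes.append(size)
    offset = 4 + 8 * stream_count            # block lists start after the pairs
    streams = []
    for size in sizes:
        count = -(-size // block_size)       # ceiling division
        blocks = list(struct.unpack_from(f"<{count}H", root, offset))
        streams.append((size, blocks))
        offset += 2 * count                  # 16-bit entries
    return streams


# The example root directory from the hex dump above.
root = bytes.fromhex(
    "03000000 22000000 1c0f9c00 74040000"
    "14f79c00 64000000 24f79c00 22002400"
    "28002900"
)
print(parse_pdb20_root(root, 1024))
# [(34, [34]), (1140, [36, 40]), (100, [41])]
```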
Using PDB 7.0 Root Directory
Once loaded, the root directory follows a particular structure.
Offset (in Bytes) | Size (in Bytes) | Description |
---|---|---|
0 | 4 | StreamCount: The number of streams that are present in this PDB. |
4 | 4 × StreamCount | For each stream, a 32-bit value for its size, in Bytes. The first entry is Stream 0. |
4 + (4 × StreamCount) | 4 × BlockCount | Following the stream sizes, an array of 32-bit block indexes lists the blocks each stream uses, starting with Stream 0. For each stream, BlockCount is obtained using a ceiling division (rounding up): BlockCount = ⌈StreamSize ÷ BlockSize⌉ |
Here is an over-simplified example of a PDB 7.0 Root Directory (assuming BlockSize=4096):
Offset 0 1 2 3 4 5 6 7 8 9 a b c d e f
0 04 00 00 00 30 00 00 00 4a 15 00 00 64 00 00 00
10 5a 34 00 00 22 00 00 00 23 00 00 00 25 00 00 00
20 26 00 00 00 28 00 00 00 29 00 00 00 2d 00 00 00
30 2e 00 00 00
- Bytes 0-3: 0x00000004
- There are four (4) streams.
- Bytes 4-7: 0x00000030
- Stream 0 is 48 Bytes in size. Occupies one block.
- Bytes 8-11: 0x0000154a
- Stream 1 is 5450 Bytes in size. Two blocks.
- Bytes 12-15: 0x00000064
- Stream 2 is 100 Bytes in size. One block.
- Bytes 16-19: 0x0000345a
- Stream 3 is 13402 Bytes in size. Four blocks.
- Stream 0 uses these blocks:
- Bytes 20-23: 0x00000022 (34)
- Stream 1 uses these blocks:
- Bytes 24-27: 0x00000023 (35)
- Bytes 28-31: 0x00000025 (37)
- Stream 2 uses these blocks:
- Bytes 32-35: 0x00000026 (38)
- Stream 3 uses these blocks:
- Bytes 36-39: 0x00000028 (40)
- Bytes 40-43: 0x00000029 (41)
- Bytes 44-47: 0x0000002d (45)
- Bytes 48-51: 0x0000002e (46)
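The walkthrough above can be reproduced with a short sketch that decodes the 52 example bytes. The name `parse_pdb70_root` is made up, and the code assumes every stream size is a plain byte count (no special handling of unused streams).

```python
import struct


def parse_pdb70_root(root: bytes, block_size: int) -> list[tuple[int, list[int]]]:
    """Return (size, block indexes) for every stream in a PDB 7.0 root directory."""
    (stream_count,) = struct.unpack_from("<I", root, 0)
    sizes = struct.unpack_from(f"<{stream_count}I", root, 4)
    offset = 4 + 4 * stream_count            # block lists start after the sizes
    streams = []
    for size in sizes:
        count = -(-size // block_size)       # ceiling division
        blocks = list(struct.unpack_from(f"<{count}I", root, offset))
        streams.append((size, blocks))
        offset += 4 * count                  # 32-bit entries
    return streams


# The example root directory from the hex dump above.
root = bytes.fromhex(
    "04000000 30000000 4a150000 64000000"
    "5a340000 22000000 23000000 25000000"
    "26000000 28000000 29000000 2d000000"
    "2e000000"
)
streams = parse_pdb70_root(root, 4096)
print(len(streams))   # 4
print(streams[3])     # (13402, [40, 41, 45, 46])
```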
Stream 0
Stream 0 does not contain particularly interesting data. It usually contains a copy of the previous stream directory. We can safely skip it.
Stream 1
Finally, after reading the blocks for Stream 1… We hopefully now have its content!
Stream 1 for both versions is the PDB Information stream.
PDB 2.0 Stream 1 Example
Offset 0 1 2 3 4 5 6 7 8 9 a b c d e f
0 f3 91 30 01 7e c3 a1 65 00 00 00 00 00 00 00 00
10 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00
20 00 00 00 00
- Bytes 0-3: 0x013091f3
- Version. 19960307 means VC50 (MSVC 5.0), matching the output of a Visual Studio 97 compiler.
- Bytes 4-7: 0x65a1c37e
- Signature?
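The first two fields can be decoded with a couple of lines of Python. This sketch only covers bytes 0-7 of the example, and the name `parse_info_prefix` is invented for illustration.

```python
import struct


def parse_info_prefix(stream: bytes) -> tuple[int, int]:
    """Return (version, signature) from the start of Stream 1."""
    return struct.unpack_from("<II", stream, 0)


# Bytes 0-7 of the PDB 2.0 example above.
version, signature = parse_info_prefix(bytes.fromhex("f3913001 7ec3a165"))
print(version)          # 19960307
print(hex(signature))   # 0x65a1c37e
```

Printed in decimal, the version reads like a date (1996-03-07), which makes the identifiers easy to spot.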
PDB 7.0 Stream 1 Example
Offset 0 1 2 3 4 5 6 7 8 9 a b c d e f
0 94 2e 31 01 3d e8 84 67 01 00 00 00 a9 e2 a6 06
10 2d 57 59 45 94 06 16 99 f4 5c a0 92 57 00 00 00
20 2f 4c 69 6e 6b 49 6e 66 6f 00 2f 54 4d 43 61 63
30 68 65 00 2f 6e 61 6d 65 73 00 2f 73 72 63 2f 68
40 65 61 64 65 72 62 6c 6f 63 6b 00 2f 55 44 54 53
50 52 43 4c 49 4e 45 55 4e 44 4f 4e 45 00 73 6f 75
60 72 63 65 6c 69 6e 6b 24 31 00 73 6f 75 72 63 65
70 6c 69 6e 6b 24 31 00 06 00 00 00 0a 00 00 00 01
80 00 00 00 6f 00 00 00 01 00 00 00 00 00 00 00 2b
90 00 00 00 62 00 00 00 1a 00 00 00 60 00 00 00 00
a0 00 00 00 05 00 00 00 0a 00 00 00 06 00 00 00 13
b0 00 00 00 07 00 00 00 4a 00 00 00 65 00 00 00 00
c0 00 00 00 dc 51 33 01
- Bytes 0-3: 0x01312e94
- Version. 20000404 means VC70 (MSVC 7.0), even though this file is the output of a Visual Studio 2022 compiler. Later version identifiers exist, but this one is still written for compatibility purposes.
- Bytes 4-7: 0x6784e83d
- Signature?
- Bytes 8-11: 0x00000001
- Age. Incremental counter. I disabled incremental compilations, so it will likely stay as 1.
- Bytes 12-27: 06A6E2A9-572D-4559-9406-1699F45CA092.
- Identification GUID used to match the PDB debugging entry in a PE32 executable.
And what follows looks like a string table, followed by a series of 32-bit offsets.
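For the fixed part at the start of the PDB 7.0 stream (bytes 0-27), a sketch could look like this. It assumes the GUID is stored in the usual Windows layout, with the first three fields little-endian, which is what `uuid.UUID(bytes_le=...)` expects. The name `parse_pdb70_info` is hypothetical, and the string table that follows is not parsed.

```python
import struct
import uuid


def parse_pdb70_info(stream: bytes) -> dict:
    """Decode Version, Signature, Age, and GUID from the start of Stream 1."""
    version, signature, age = struct.unpack_from("<III", stream, 0)
    guid = uuid.UUID(bytes_le=stream[12:28])   # 16 bytes, mixed-endian layout
    return {"Version": version, "Signature": signature, "Age": age, "Guid": guid}


# Bytes 0-27 of the PDB 7.0 example above.
head = bytes.fromhex(
    "942e3101 3de88467 01000000 a9e2a606 2d575945 94061699 f45ca092"
)
info = parse_pdb70_info(head)
print(info["Version"], info["Age"])   # 20000404 1
print(str(info["Guid"]).upper())
```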
Furthermore
PDB files have these reserved fixed stream numbers:
- Stream 0: Previous stream directory.
- Stream 1: PDB information.
- Stream 2: TPI Stream.
- Stream 3: DBI Stream.
- Stream 4: IPI Stream.
- Stream 7: Public symbols (at least for PDB 2.0).
The rest of the streams typically hold object data used during compilation.
Stay tuned, we’ll explore some of these streams eventually!