Parsing a Raw blk.dat Record
To understand how Bitcoin data is read from disk, let's look at the hexadecimal representation of a real entry at the start of a blk00000.dat file.
1. The Raw Hex
Imagine we open blk00000.dat in a hex editor. The first few bytes look like this:
F9 BE B4 D9 1D 01 00 00 01 00 00 00 ...
2. Breaking it Down
Part A: Magic Bytes
F9 BE B4 D9
- This is the Mainnet Magic Value.
- The bytes are stored in little-endian order. Read as a 32-bit integer, the value is 0xD9B4BEF9.
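To make the byte-order point concrete, here is a minimal sketch that interprets the four magic bytes both ways. The variable names are illustrative only.

```python
# First four bytes of the record, exactly as they appear on disk.
raw_magic = bytes.fromhex("F9BEB4D9")

# Interpret the same four bytes with each byte order.
as_little = int.from_bytes(raw_magic, "little")
as_big = int.from_bytes(raw_magic, "big")

print(hex(as_little))  # 0xd9b4bef9
print(hex(as_big))     # 0xf9beb4d9
```

Only the little-endian reading matches the constant 0xD9B4BEF9.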
Part B: Block Size
1D 01 00 00
- This is the size of the following block data, in little-endian.
- In Big-Endian: 00 00 01 1D
- Decimal: 285 Bytes.
- This tells the parser: "Read the next 285 bytes as the block data."
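The magic value and the block size together form an 8-byte prefix, so both can be decoded in one step. The sketch below uses Python's standard `struct` module on the bytes from the hex dump. The names `prefix`, `magic`, and `block_size` are chosen for illustration.

```python
import struct

# The 8-byte prefix from the hex dump: magic bytes followed by the block size.
prefix = bytes.fromhex("F9BEB4D9" "1D010000")

# "<II" means: little-endian, two unsigned 32-bit integers.
magic, block_size = struct.unpack("<II", prefix)

print(hex(magic))   # 0xd9b4bef9
print(block_size)   # 285
```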
Part C: Block Header (First 4 Bytes)
01 00 00 00
- This is the Version field of the block header.
- Value: Version 1.
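As a small illustration, the version can be read straight out of the record by skipping the 8-byte prefix. This sketch assumes the field is a signed 32-bit little-endian integer (Bitcoin Core declares it as `int32_t`). `HEADER_OFFSET` and `record_start` are illustrative names.

```python
import struct

# The 12 bytes shown in the hex dump: magic, block size, start of the header.
record_start = bytes.fromhex("F9BEB4D9" "1D010000" "01000000")

HEADER_OFFSET = 8  # the header begins right after the 8-byte prefix

# "<i" reads one little-endian signed 32-bit integer at the given offset.
(version,) = struct.unpack_from("<i", record_start, HEADER_OFFSET)

print(version)  # 1
```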
3. The VarInt (Transaction Count)
After the 80-byte header, the parser encounters a Variable Length Integer representing how many transactions are in the block.
- In the Genesis Block, and in many early blocks, this is 01 (one transaction, the coinbase).
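The text does not spell out the VarInt encoding, so here is a minimal sketch of a decoder. It assumes the standard Bitcoin "CompactSize" rules: a first byte below 0xFD is the value itself, while 0xFD, 0xFE, and 0xFF announce a following 2-, 4-, or 8-byte little-endian integer. The function name `read_varint` is hypothetical.

```python
import io
import struct

def read_varint(stream):
    """Read one variable-length integer (CompactSize) from a binary stream."""
    first = stream.read(1)
    if len(first) != 1:
        raise EOFError("stream ended before the VarInt")
    marker = first[0]
    if marker < 0xFD:
        return marker  # small values are stored directly in a single byte
    # Larger values: the marker byte selects the width of the integer that follows.
    fmt, width = {0xFD: ("<H", 2), 0xFE: ("<I", 4), 0xFF: ("<Q", 8)}[marker]
    payload = stream.read(width)
    if len(payload) != width:
        raise EOFError("stream ended inside the VarInt")
    return struct.unpack(fmt, payload)[0]

print(read_varint(io.BytesIO(bytes.fromhex("01"))))      # 1
print(read_varint(io.BytesIO(bytes.fromhex("FD2C01"))))  # 300
```

A count of one transaction therefore occupies a single byte, which is why the Genesis Block shows just 01 at this position.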
4. The Data Stream
Because the "Block Size" was defined at the start, the node can use a streaming parser. It doesn't need to load the whole file into RAM. It reads the 8-byte prefix, then reads the specified number of bytes into a buffer, processes it, and then looks for the next F9 BE B4 D9 marker.
| Byte Offset | Data Type | Meaning |
|---|---|---|
| 0 - 3 | uint32 | Magic Bytes (0xD9B4BEF9) |
| 4 - 7 | uint32 | Block Size (N) |
| 8 - 87 | 80 Bytes | Block Header |
| 88 - ? | VarInt | Transaction Count |
| ? - End | Raw Data | Serialized Transactions |
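The layout in the table can be walked with a small streaming reader. The following is a sketch, not a full parser: `iter_blocks` and `fake_record` are hypothetical names, the demo data is synthetic rather than real chain data, and stopping at the first prefix that does not start with the magic bytes (for example, zero padding at the end of a file) is an assumption made for simplicity.

```python
import io
import struct

MAINNET_MAGIC = bytes.fromhex("F9BEB4D9")
HEADER_SIZE = 80

def iter_blocks(stream):
    """Yield (header, rest) for each record in a blk*.dat-style binary stream."""
    while True:
        prefix = stream.read(8)
        if len(prefix) < 8:
            return  # end of file
        if prefix[:4] != MAINNET_MAGIC:
            return  # assumption: anything else (e.g. zero padding) ends the scan
        (size,) = struct.unpack("<I", prefix[4:])
        block = stream.read(size)  # only this one block is held in memory
        if len(block) < size:
            return  # truncated final record
        yield block[:HEADER_SIZE], block[HEADER_SIZE:]

def fake_record(payload):
    """Build a synthetic record (magic + size + payload) for the demo."""
    return MAINNET_MAGIC + struct.pack("<I", len(payload)) + payload

# Two synthetic records followed by zero padding.
demo = io.BytesIO(
    fake_record(bytes(80) + b"\x01")
    + fake_record(bytes(80) + b"\x02")
    + bytes(16)
)
for header, rest in iter_blocks(demo):
    print(len(header), rest.hex())
```

To try it on real data, pass a file opened in binary mode, for example `open("blk00000.dat", "rb")`, in place of the `io.BytesIO` object. The exact path depends on your node's data directory.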
In the final section, we will build a Python blk.dat Parser to extract headers from your own local blockchain files.