TeachMeBitcoin

Parsing a Raw blk.dat Record

To understand how Bitcoin data is read from disk, let's look at the hexadecimal representation of a real entry at the start of a blk00000.dat file.

1. The Raw Hex

Imagine we open blk00000.dat in a hex editor. The first few bytes look like this:

F9 BE B4 D9 1D 01 00 00 01 00 00 00 ...

2. Breaking it Down

Part A: Magic Bytes

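F9 BE B4 D9

These four bytes mark the start of a block record on the Bitcoin main network. Read as a little-endian uint32, they give 0xD9B4BEF9.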

Part B: Block Size

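1D 01 00 00

The size is a little-endian uint32: 0x0000011D, or 285 in decimal. It is the number of bytes of block data that follow the size field.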

Part C: Block Header (First 4 Bytes)

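01 00 00 00

The 80-byte header begins with the block version, a 4-byte little-endian integer. Here the version is 1.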
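As a quick check of the breakdown above, the following minimal sketch decodes the same 12 bytes with Python's standard struct module. The variable names (raw, magic, block_size, version) are illustrative only and do not come from any Bitcoin library.

```python
import struct

# The 12 bytes from the hex dump: magic bytes, block size, start of the header.
raw = bytes.fromhex("F9BEB4D9" "1D010000" "01000000")

# "<" selects little-endian byte order; each "I" is an unsigned 32-bit integer.
magic, block_size, version = struct.unpack("<III", raw)

print(hex(magic))   # 0xd9b4bef9
print(block_size)   # 285
print(version)      # 1
```

Reading the bytes as little-endian is what turns F9 BE B4 D9 on disk into the value 0xD9B4BEF9.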

3. The VarInt (Transaction Count)

After the 80-byte header, the parser encounters a Variable Length Integer representing how many transactions are in the block.
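A rough sketch of how such a variable-length integer can be decoded is shown below. It assumes Bitcoin's usual encoding, in which a first byte below 0xFD is the value itself, while 0xFD, 0xFE, and 0xFF announce a following 2-, 4-, or 8-byte little-endian integer. The function name read_varint is hypothetical.

```python
import struct

def read_varint(buf: bytes, offset: int = 0):
    """Return (value, bytes_consumed) for the VarInt starting at buf[offset]."""
    first = buf[offset]
    if first < 0xFD:
        # Small values are stored directly in the first byte.
        return first, 1
    if first == 0xFD:
        return struct.unpack_from("<H", buf, offset + 1)[0], 3
    if first == 0xFE:
        return struct.unpack_from("<I", buf, offset + 1)[0], 5
    # first == 0xFF: an 8-byte value follows.
    return struct.unpack_from("<Q", buf, offset + 1)[0], 9

print(read_varint(b"\x01"))                  # (1, 1)
print(read_varint(bytes.fromhex("fd0302")))  # (515, 3)
```

The second return value tells the caller how far to advance before the first serialized transaction begins.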

4. The Data Stream

Because each record declares its "Block Size" up front, the node can use a streaming parser and does not need to load the whole file into RAM. It reads the 8-byte prefix (magic bytes plus block size), reads the specified number of bytes into a buffer, processes them, and then expects the next F9 BE B4 D9 marker.

Byte Offset   Data Type   Meaning
0 - 3         uint32      Magic Bytes (0xD9B4BEF9)
4 - 7         uint32      Block Size (N)
8 - 87        80 Bytes    Block Header
88 - ?        VarInt      Transaction Count
? - End       Raw Data    Serialized Transactions
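The streaming approach described in this section can be sketched as a small generator. This is only an illustration under stated assumptions: the name iter_blocks is hypothetical, the records are assumed to be stored in plain (unobfuscated) form, and the loop simply stops when it no longer sees the magic bytes, for example at padding at the end of a file. The demo uses a synthetic in-memory record rather than real block data.

```python
import io
import struct

MAGIC = bytes.fromhex("F9BEB4D9")

def iter_blocks(stream):
    """Yield (header, rest) for each record in a binary stream."""
    while True:
        prefix = stream.read(8)            # 4 magic bytes + 4 size bytes
        if len(prefix) < 8 or prefix[:4] != MAGIC:
            return                         # end of data or unexpected bytes
        (size,) = struct.unpack("<I", prefix[4:])
        block = stream.read(size)          # only one block is held in memory
        if len(block) < size:
            return                         # truncated record
        yield block[:80], block[80:]       # 80-byte header, then VarInt + transactions

# Synthetic record for demonstration: an 80-byte dummy header plus one extra byte.
record = MAGIC + struct.pack("<I", 81) + bytes(80) + b"\x01"
blocks = list(iter_blocks(io.BytesIO(record * 2)))
print(len(blocks))  # 2
```

To try it on a real file, the same generator could be given a file object opened in binary mode, for example open("blk00000.dat", "rb"), where the path depends on your own data directory.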

In the final section, we will build a Python blk.dat Parser to extract headers from your own local blockchain files.
