Disk Serialization of Blocks: Raw blk*.dat File Delimiters
When a Bitcoin full node downloads and stores the blockchain, it writes raw block data to flat files on disk. These files live in the node's data directory under blocks/ and are named blk*.dat (e.g., blk00000.dat, blk00001.dat).
To locate, read, and recover blocks from these raw files, Bitcoin Core uses the network Magic Bytes as on-disk record delimiters.
1. The Structure of a blk*.dat File
A blk*.dat file contains multiple blocks appended one after another. Each raw block entry is a three-field record:
DISK BLOCK SERIALIZATION LAYOUT
  Bytes 0-3                      Bytes 4-7                  Bytes 8 to 7+S
┌──────────────────────────────┬──────────────────────────┬──────────────────────────────┐
│ Magic Bytes (Network Magic)  │ Block Size (uint32)      │ Raw Block Payload            │
│ 4 Bytes (e.g. 0xF9BEB4D9)    │ 4 Bytes (Little Endian)  │ S Bytes (network format)     │
└──────────────────────────────┴──────────────────────────┴──────────────────────────────┘
Detailed Field Breakdown:
- Magic Bytes (4 Bytes): Identifies the network type. For standard Mainnet, this is 0xF9BEB4D9. This prefix acts as the sentinel block separator.
- Block Size (4 Bytes): A 32-bit unsigned little-endian integer giving the exact size in bytes of the raw block payload that follows.
- Raw Block Payload: The serialized block header (80 bytes), followed by the transaction list (preceded by a CompactSize transaction counter).
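The three fields above can be read with a few lines of Python. The following is a minimal sketch, not Bitcoin Core's own code: the helper name read_record is hypothetical, and the buffer is synthetic (placeholder payloads rather than real blocks) so the example runs without a node's data directory.

```python
import struct

MAINNET_MAGIC = bytes.fromhex("f9beb4d9")  # mainnet magic as it appears on disk


def read_record(buf: bytes, offset: int = 0):
    """Parse one (magic, size, payload) record starting at `offset`.

    Returns (payload, next_offset). Hypothetical helper, for illustration only.
    """
    magic = buf[offset:offset + 4]
    if magic != MAINNET_MAGIC:
        raise ValueError(f"unexpected magic {magic.hex()} at offset {offset}")
    # "<I" = little-endian unsigned 32-bit integer
    (size,) = struct.unpack_from("<I", buf, offset + 4)
    start = offset + 8
    payload = buf[start:start + size]
    if len(payload) != size:
        raise ValueError("record is truncated")
    return payload, start + size


# Synthetic two-record buffer (placeholder payloads, not real blocks).
first, second = b"\x01" * 100, b"\x02" * 50
buf = b"".join(MAINNET_MAGIC + struct.pack("<I", len(p)) + p for p in (first, second))

payload, nxt = read_record(buf)
print(len(payload), nxt)        # 100 108
payload2, end = read_record(buf, nxt)
print(len(payload2), end)       # 50 166
```

Because each call returns the offset of the next record, repeated calls walk the buffer record by record without consulting any index.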
2. Step-by-Step Disk Offset Layout
If we open a standard blk00000.dat file in a hex editor, we can view the sequential block streams:
F9 BE B4 D9 1D 01 00 00 01 00 00 00 ... [Raw Genesis Block Data]
F9 BE B4 D9 D7 00 00 00 01 00 00 00 ... [Raw Block 1 Data]
Unpacking the Genesis Block Offset:
- F9 BE B4 D9 (Bytes 0-3): Mainnet Magic Bytes.
- 1D 01 00 00 (Bytes 4-7): Block size value.
  - Unpacked Little Endian representation: 0x0000011D.
  - Decimal Value: 285 bytes.
- Bytes 8 to 292: The actual serialized Genesis block data (285-byte payload).
- Byte 293: The stream transitions immediately to F9 BE B4 D9, indicating the beginning of the next record (Block 1).
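The offset arithmetic in this walkthrough can be checked mechanically. The sketch below decodes a size field by hand and derives the payload boundaries from it; the function names decode_size and record_bounds are hypothetical, and the sample input is a size field encoding 285.

```python
def decode_size(field: bytes) -> int:
    """Decode a 4-byte little-endian size field by combining weighted bytes."""
    assert len(field) == 4
    # The least-significant byte comes first on disk.
    return field[0] | field[1] << 8 | field[2] << 16 | field[3] << 24


def record_bounds(record_start: int, size_field: bytes):
    """Return (payload_first, payload_last, next_record_start) as byte offsets."""
    size = decode_size(size_field)
    payload_first = record_start + 8       # skip 4 magic bytes + 4 size bytes
    payload_last = payload_first + size - 1
    return payload_first, payload_last, payload_last + 1


field = bytes.fromhex("1d010000")          # sample size field: 0x0000011D
assert decode_size(field) == int.from_bytes(field, "little") == 285
print(record_bounds(0, field))             # (8, 292, 293)
```

The manual shift-and-combine in decode_size is equivalent to int.from_bytes(field, "little"); it is spelled out here only to make the byte order visible.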
3. "File Carving" Database Recovery
Why store magic bytes in disk files if the node already maintains an index database (blocks/index built using LevelDB)?
The Corruption Recovery Scenario
If a node experiences a sudden power loss or system crash:
- The LevelDB index database mapping block hashes to file offsets may become corrupted or desynchronized.
- Rather than forcing the node to download the entire blockchain history (500+ GB) over the internet again, Bitcoin Core can rebuild the index from the local block files, a process known as reindexing (described here as "Block File Carving").
- The node scans the blk*.dat files sequentially from byte 0.
- It scans for the magic byte sentinel 0xF9BEB4D9.
- Once located, it reads the next 4 bytes to determine block size, extracts the block payload, recalculates the block hash, and rebuilds the LevelDB index database entirely from local disk.
- This recovery is launched via the -reindex configuration flag:
    bitcoind -reindex
This design makes Bitcoin Core resilient to index corruption, as long as the blk*.dat files themselves are intact.
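The scan-and-rebuild steps above can be sketched in Python. This is an illustration of the idea under stated assumptions, not Bitcoin Core's reindex implementation: carve_blocks is a hypothetical name, the input is a synthetic buffer of fake blocks, and the sketch only maps block hashes to payload offsets without validating anything. The block hash is the double SHA-256 of the 80-byte header, shown in the usual byte-reversed hex form.

```python
import hashlib
import struct

MAGIC = bytes.fromhex("f9beb4d9")  # mainnet magic bytes as stored on disk


def carve_blocks(data: bytes):
    """Yield (payload_offset, block_hash_hex) for each record found in `data`."""
    pos = 0
    while True:
        pos = data.find(MAGIC, pos)
        if pos == -1 or pos + 8 > len(data):
            return
        (size,) = struct.unpack_from("<I", data, pos + 4)
        start = pos + 8
        if size < 80 or start + size > len(data):
            pos += 1            # false positive or truncated record: keep scanning
            continue
        header = data[start:start + 80]
        digest = hashlib.sha256(hashlib.sha256(header).digest()).digest()
        yield start, digest[::-1].hex()   # hashes are displayed byte-reversed
        pos = start + size


# Synthetic demo: two fake 81-byte "blocks", each followed by zero padding.
fake_blocks = [bytes([i]) * 80 + b"\x00" for i in (1, 2)]
data = b"".join(
    MAGIC + struct.pack("<I", len(b)) + b + b"\x00" * 16 for b in fake_blocks
)
index = {block_hash: offset for offset, block_hash in carve_blocks(data)}
print(len(index))   # 2
```

Searching for the sentinel rather than trusting stored offsets is what lets the scan skip over padding or damaged regions and resume at the next intact record.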