Disk Serialization of Block Data
Disk Serialization of Blocks: Raw blk*.dat File Delimiters
When a Bitcoin full node validates and saves the blockchain ledger, it writes raw block data directly to non-volatile disk storage. These files are saved in the node's datadir under blocks/blk*.dat (e.g., blk00000.dat, blk00001.dat).
To locate, read, and recover blocks within these flat files, Bitcoin Core prefixes every stored block with the network's Magic Bytes, which serve as disk-level delimiters.
💾 1. The Structure of a blk*.dat File
A blk*.dat file contains multiple blocks appended sequentially in the order the node received them, which is not necessarily blockchain height order. Each raw block entry is structured as a sequence of three fields:
DISK BLOCK SERIALIZATION LAYOUT
Bytes:  0-3                      4-7                        8 to 8+S-1
┌──────────────────────────────┬──────────────────────────┬────────────────────────────┐
│ Magic Bytes (Network Magic)  │ Block Size (uint32)      │ Raw Block Payload          │
│ 4 Bytes (e.g. 0xF9BEB4D9)    │ 4 Bytes (Little Endian)  │ S Bytes (serialized block) │
└──────────────────────────────┴──────────────────────────┴────────────────────────────┘
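As a rough illustration of this layout, the following Python sketch frames a serialized block the way a record appears on disk. The helper name `frame_block` and the dummy payload are assumptions made for this example; it is not Bitcoin Core's own code.

```python
import struct

MAINNET_MAGIC = bytes.fromhex("f9beb4d9")  # mainnet magic bytes, in on-disk order

def frame_block(payload: bytes, magic: bytes = MAINNET_MAGIC) -> bytes:
    """Return magic + little-endian uint32 size + payload (hypothetical helper)."""
    if len(magic) != 4:
        raise ValueError("magic must be exactly 4 bytes")
    # "<I" packs an unsigned 32-bit integer in little-endian byte order.
    return magic + struct.pack("<I", len(payload)) + payload

# A dummy 285-byte payload stands in for a real serialized block.
record = frame_block(b"\x00" * 285)
print(record[:8].hex(" "))  # f9 be b4 d9 1d 01 00 00
```

The size field counts only the payload, so the whole record is always 8 bytes longer than the block it wraps.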
Detailed Field Breakdown:
- Magic Bytes (4 Bytes): Identifies the network type. For standard Mainnet, this is 0xF9BEB4D9. This prefix acts as the sentinel block separator.
- Block Size (4 Bytes): A 32-bit unsigned little-endian integer giving the exact size in bytes of the raw block payload that follows.
- Raw Block Payload: The serialized block header (80 bytes), followed by the transaction list (preceded by a CompactSize transaction counter).
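The three fields can be read back with a few lines of Python. This is a minimal sketch, assuming a mainnet file held in memory as bytes; `DiskRecord` and `parse_record` are hypothetical names introduced for illustration, and the payload used at the end is dummy data rather than a real block.

```python
import struct
from typing import NamedTuple

MAINNET_MAGIC = bytes.fromhex("f9beb4d9")

class DiskRecord(NamedTuple):
    magic: bytes    # 4-byte network magic
    size: int       # payload length taken from the uint32 size field
    header: bytes   # first 80 bytes of the payload (block header)
    tx_data: bytes  # CompactSize transaction count followed by the transactions

def parse_record(buf: bytes, offset: int = 0) -> DiskRecord:
    """Parse one record starting at `offset` (hypothetical helper)."""
    magic = buf[offset:offset + 4]
    if magic != MAINNET_MAGIC:
        raise ValueError(f"unexpected magic {magic.hex()} at offset {offset}")
    (size,) = struct.unpack_from("<I", buf, offset + 4)
    payload = buf[offset + 8:offset + 8 + size]
    # A block needs an 80-byte header plus at least one byte of tx count.
    if len(payload) != size or size < 81:
        raise ValueError("truncated or implausibly small block payload")
    return DiskRecord(magic, size, payload[:80], payload[80:])

# Dummy payload: zeroed header, tx count of 1, ten filler bytes.
fake_payload = bytes(80) + b"\x01" + b"\xaa" * 10
rec = parse_record(MAINNET_MAGIC + struct.pack("<I", len(fake_payload)) + fake_payload)
print(rec.size, len(rec.header))  # 91 80
```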
🏗️ 2. Step-by-Step Disk Offset Layout
If we open a standard blk00000.dat file in a hex editor, we can view the sequential block streams:
F9 BE B4 D9 1D 01 00 00 01 00 00 00 ... [Raw Genesis Block Data]
F9 BE B4 D9 D7 00 00 00 01 00 00 00 ... [Raw Block 1 Data]
Unpacking the Genesis Block Offset:
- F9 BE B4 D9 (Bytes 0-3): Mainnet Magic Bytes.
- 1D 01 00 00 (Bytes 4-7): Block size value.
  - Unpacked Little Endian representation: 0x0000011D.
  - Decimal Value: 285 bytes.
- Bytes 8 to 292: The actual serialized Genesis block data (285-byte payload).
- Byte 293: The stream transitions immediately to F9 BE B4 D9, indicating the beginning of the next record (Block 1 in this example).
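The offset arithmetic in a walkthrough like this generalizes to any record: a record starting at offset o with size field S has its payload at bytes o+8 through o+8+S-1, and the next record begins at o+8+S. A minimal sketch, assuming records are stored back to back with no padding between them; the helper name `record_spans` and the payload sizes are made up for illustration.

```python
import struct
from typing import Iterator, Tuple

MAGIC = bytes.fromhex("f9beb4d9")

def record_spans(buf: bytes) -> Iterator[Tuple[int, int, int]]:
    """Yield (record_start, payload_start, payload_end_inclusive) per record."""
    offset = 0
    while offset + 8 <= len(buf) and buf[offset:offset + 4] == MAGIC:
        # The size field is a little-endian unsigned 32-bit integer.
        size = int.from_bytes(buf[offset + 4:offset + 8], "little")
        payload_start = offset + 8
        next_offset = payload_start + size
        if next_offset > len(buf):
            break  # the last record is truncated
        yield offset, payload_start, next_offset - 1
        offset = next_offset

# Two dummy records with made-up payload sizes of 10,000 and 300 bytes.
sample = b"".join(MAGIC + struct.pack("<I", n) + bytes(n) for n in (10_000, 300))
for span in record_spans(sample):
    print(span)
# (0, 8, 10007)
# (10008, 10016, 10315)
```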
🛠️ 3. "File Carving" Database Recovery
Why store magic bytes in disk files if the node already maintains an index database (blocks/index built using LevelDB)?
The Corruption Recovery Scenario
If a node experiences a sudden power loss or system crash:
1. The LevelDB index database mapping block hashes to file offsets may become corrupted or desynchronized.
2. Rather than forcing the node to download the entire blockchain history (500+ GB) over the internet again, Bitcoin Core can rebuild its index from the local block files, a process known as reindexing (sometimes informally described as block file carving).
3. The node scans the blk*.dat files sequentially from byte 0.
4. It scans for the magic byte sentinel 0xF9BEB4D9.
5. Once located, it reads the next 4 bytes to determine block size, extracts the block payload, recalculates the block hash, and automatically rebuilds the LevelDB index database entirely from local disk.
6. This recovery is launched via the -reindex configuration flag:
```bash
bitcoind -reindex
```
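Steps 3 to 5 can be sketched in Python as a toy scanner. This is only an illustration of the idea, not Bitcoin Core's reindex code: `rebuild_index` and `block_hash` are hypothetical helpers, the "index" is a plain dictionary rather than LevelDB, and no block validation is performed. It also assumes the file contents are stored unobfuscated; recent Bitcoin Core releases can XOR block files with a key kept in the blocks directory, in which case that key would have to be applied first.

```python
import hashlib
import struct
from typing import Dict

MAGIC = bytes.fromhex("f9beb4d9")

def block_hash(header: bytes) -> str:
    """Double SHA-256 of the 80-byte header, shown in the usual reversed hex."""
    digest = hashlib.sha256(hashlib.sha256(header).digest()).digest()
    return digest[::-1].hex()

def rebuild_index(data: bytes) -> Dict[str, int]:
    """Map block hash -> offset of the block payload (toy stand-in for the index)."""
    index: Dict[str, int] = {}
    pos = 0
    while True:
        pos = data.find(MAGIC, pos)  # scan forward for the sentinel
        if pos == -1 or pos + 8 > len(data):
            break
        (size,) = struct.unpack_from("<I", data, pos + 4)
        start, end = pos + 8, pos + 8 + size
        if size < 80 or end > len(data):
            pos += 1  # false positive or truncated record; keep scanning
            continue
        index[block_hash(data[start:start + 80])] = start
        pos = end  # jump past the payload to the next candidate
    return index
```

To try it on a real file, the contents could be loaded with `open(path, "rb").read()`, where `path` points at a blk*.dat file; for large files a memory map would be a more economical choice. Searching for the magic bytes, rather than trusting each size field blindly, is what lets the scan skip over zero padding or damaged regions.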
This design lets Bitcoin Core recover from a corrupted or lost index without re-downloading the chain, provided the blk*.dat files themselves are still intact.