TeachMeBitcoin

Disk Serialization of Block Data

From TeachMeBitcoin, the free encyclopedia

Disk Serialization of Blocks: Raw blk*.dat File Delimiters

When a Bitcoin Core full node downloads and validates the blockchain, it appends the raw block data to flat files on disk. These files are saved in the node's datadir under blocks/ and are named blk*.dat (e.g., blk00000.dat, blk00001.dat).

To locate, read, and recover blocks within these flat files, Bitcoin Core prefixes every block with the network's Magic Bytes, which act as on-disk record delimiters.


💾 1. The Structure of a blk*.dat File

A blk*.dat file contains many blocks appended one after another. Each block entry is a three-field record:

                          DISK BLOCK SERIALIZATION LAYOUT

  Bytes: 0-3                     4-7                        8 to 7+S
 ┌──────────────────────────────┬──────────────────────────┬────────────────────────┐
 │ Magic Bytes (Network Magic)  │ Block Size (uint32)      │ Raw Block Payload      │
 │ 4 Bytes (e.g. 0xF9BEB4D9)    │ 4 Bytes (Little Endian)  │ S Bytes (header + txs) │
 └──────────────────────────────┴──────────────────────────┴────────────────────────┘

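Detailed Field Breakdown:

  • Magic Bytes (4 bytes): Identify the network the block belongs to. On mainnet the byte sequence is F9 BE B4 D9.
  • Block Size (4 bytes): An unsigned 32-bit integer in Little Endian order giving the length S of the payload that follows. The 8-byte prefix itself is not counted.
  • Raw Block Payload (S bytes): The serialized block, i.e. the 80-byte block header followed by the transaction count and the transactions, in the same serialization used for blocks on the peer-to-peer network.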
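
The three-field layout above can be walked with a few lines of Python. This is a minimal sketch rather than Bitcoin Core's own code: the function name iter_records and the sample payloads are made up for illustration, and it assumes the bytes are stored exactly as described here (no additional obfuscation applied to the file).

```python
import struct

MAINNET_MAGIC = bytes.fromhex("f9beb4d9")  # mainnet magic bytes as stored on disk


def iter_records(data: bytes, magic: bytes = MAINNET_MAGIC):
    """Yield (record_offset, payload) for each back-to-back record in data."""
    pos = 0
    while pos + 8 <= len(data):
        if data[pos:pos + 4] != magic:
            break  # padding or foreign data: stop strict parsing
        # Field 2: payload length as an unsigned 32-bit little-endian integer
        (size,) = struct.unpack_from("<I", data, pos + 4)
        start = pos + 8
        end = start + size
        if end > len(data):
            break  # truncated record
        yield pos, data[start:end]
        pos = end  # the next record begins right after this payload


# Two synthetic records with placeholder payloads (not real blocks)
sample = (
    MAINNET_MAGIC + struct.pack("<I", 3) + b"abc"
    + MAINNET_MAGIC + struct.pack("<I", 2) + b"xy"
)
records = list(iter_records(sample))
for offset, payload in records:
    print(offset, len(payload))
```

The parser never searches for the next delimiter: it trusts the size field to jump from one record to the next, which is why a single corrupted length would stop it.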


🏗️ 2. Step-by-Step Disk Offset Layout

If we open a standard blk00000.dat file in a hex editor, we can view the sequential block streams:

F9 BE B4 D9   1D 01 00 00   01 00 00 00 ... [Raw Genesis Block Data]
F9 BE B4 D9   D7 00 00 00   01 00 00 00 ... [Raw Block 1 Data]

Unpacking the Genesis Block Offset:

  1. F9 BE B4 D9 (Bytes 0-3): Mainnet Magic Bytes.
  2. 1D 01 00 00 (Bytes 4-7): Block size value.
    • Unpacked Little Endian representation: 0x0000011D.
    • Decimal Value: 285 bytes.
  3. Bytes 8 to 292: The actual serialized Genesis block data (285 bytes payload), beginning with the block header's version field 01 00 00 00.
  4. Byte 293: The stream transitions immediately to F9 BE B4 D9, indicating the beginning of the next record (Block 1 in this example).
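
The offset arithmetic in this walkthrough can be checked mechanically. The sketch below uses a hypothetical helper, describe_record, and an arbitrary illustrative size field (00 02 00 00) that is not taken from any real block. It also shows what goes wrong if the size is mistakenly read as Big Endian.

```python
def describe_record(prefix: bytes, record_offset: int = 0) -> dict:
    """Decode an 8-byte record prefix and derive the payload's byte range."""
    if len(prefix) != 8:
        raise ValueError("prefix must be exactly 8 bytes (magic + size)")
    size = int.from_bytes(prefix[4:8], "little")  # size field is little-endian
    payload_first = record_offset + 8             # payload follows the 8-byte prefix
    return {
        "magic": prefix[:4].hex(),
        "size": size,
        "payload_first": payload_first,
        "payload_last": payload_first + size - 1,
        "next_record": payload_first + size,
    }


# Illustrative prefix: mainnet magic followed by an arbitrary size field
prefix = bytes.fromhex("f9beb4d9" "00020000")
info = describe_record(prefix)
print(info)

# Reading the same four bytes as big-endian gives a very different number
wrong_size = int.from_bytes(prefix[4:8], "big")
print(wrong_size)
```

With this prefix the payload occupies bytes 8 through 519 and the next record starts at byte 520; the Big Endian misreading would instead claim a 131,072-byte payload.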

🛠️ 3. "File Carving" Database Recovery

Why store magic bytes in disk files if the node already maintains an index database (blocks/index built using LevelDB)?

The Corruption Recovery Scenario

If a node experiences a sudden power loss or system crash, the LevelDB index database mapping block hashes to file offsets may become corrupted or desynchronized. Rather than downloading the entire blockchain history (500+ GB) over the internet again, the operator can have Bitcoin Core rebuild its index from the block files already on disk. This process is called reindexing, and it works much like the "file carving" used in data recovery:

  1. The node scans the blk*.dat files sequentially from byte 0.
  2. It scans for the magic byte sentinel 0xF9BEB4D9.
  3. Once located, it reads the next 4 bytes to determine the block size and extracts the block payload.
  4. It recalculates the block hash and records the block's position, rebuilding the LevelDB index database entirely from local disk.

This recovery is launched via the -reindex startup option:

    bitcoind -reindex

Because every record carries its own delimiter and length, the block files stay usable even when the index that points into them is lost.
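
The scan described above can be sketched in Python as follows. This is a simplified illustration, not Bitcoin Core's implementation, which is written in C++ and additionally deserializes and validates each block. The names rebuild_index and block_hash are hypothetical, the sample data is synthetic, and the file contents are assumed to be stored without any extra obfuscation. The block hash is computed as the double SHA-256 of the 80-byte header, displayed in reversed byte order.

```python
import hashlib
import struct

MAINNET_MAGIC = bytes.fromhex("f9beb4d9")
HEADER_LEN = 80  # a serialized block header is always 80 bytes


def block_hash(payload: bytes) -> str:
    """Double SHA-256 of the 80-byte header, shown in the usual reversed hex form."""
    digest = hashlib.sha256(hashlib.sha256(payload[:HEADER_LEN]).digest()).digest()
    return digest[::-1].hex()


def rebuild_index(data: bytes, magic: bytes = MAINNET_MAGIC) -> dict:
    """Map block hash -> (payload_offset, size) by scanning for magic bytes."""
    index = {}
    pos = 0
    while True:
        pos = data.find(magic, pos)  # hunt for the next sentinel
        if pos == -1 or pos + 8 > len(data):
            break
        (size,) = struct.unpack_from("<I", data, pos + 4)
        start, end = pos + 8, pos + 8 + size
        if size < HEADER_LEN or end > len(data):
            pos += 1  # false positive or truncated record: keep scanning
            continue
        index[block_hash(data[start:end])] = (start, size)
        pos = end
    return index


# Synthetic data: placeholder "blocks" surrounded by junk bytes
fake_a = bytes(80) + b"\x00"
fake_b = bytes([1]) * 80 + b"\x00"
blob = (
    b"\x00" * 5
    + MAINNET_MAGIC + struct.pack("<I", len(fake_a)) + fake_a
    + b"junk"
    + MAINNET_MAGIC + struct.pack("<I", len(fake_b)) + fake_b
)
index = rebuild_index(blob)
for h, (offset, size) in index.items():
    print(h[:16], offset, size)
```

Unlike a strict sequential parser, this scanner resynchronizes on the next magic sequence after encountering junk, which is the property that makes recovery possible when the stored offsets can no longer be trusted.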
