11. Decoding Witness Data from Raw Transactions
What Is Witness Data?
Introduced in BIP141 (Segregated Witness), the witness is a separate data structure attached to each transaction input. It was moved out of the part of the transaction that the txid commits to, which fixes transaction malleability and enables more efficient validation.
The witness is a stack of byte vectors: literally a list of data items that serve as the unlocking data for SegWit inputs. For non-SegWit inputs in a SegWit transaction, the witness is an empty list, serialized as a single 0x00 item count.
Raw Transaction Format with Witness
SegWit transactions use a new serialization format (identified by the segwit marker and flag bytes):
[version: 4 bytes]
[marker: 0x00]
[flag: 0x01]
[input count: varint]
[inputs]
[output count: varint]
[outputs]
[witness data for each input]
[locktime: 4 bytes]
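The witness section is NOT hashed for txid computation. The txid is the double SHA-256 of the legacy serialization, which omits the marker, the flag, and all witness data. The wtxid (witness txid) is the double SHA-256 of the full SegWit serialization.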
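The difference between the two identifiers can be sketched in a few lines. This is a minimal illustration rather than a full serializer: `dsha256`, `tx_ids`, and the parameter name `body` (the bytes from the input count through the last output) are hypothetical names chosen for this example, and the caller is assumed to have already split the transaction into those pieces.

```python
import hashlib


def dsha256(b: bytes) -> bytes:
    """Double SHA-256, used for both txid and wtxid."""
    return hashlib.sha256(hashlib.sha256(b).digest()).digest()


def tx_ids(version: bytes, body: bytes, witness: bytes, locktime: bytes) -> tuple:
    """Return (txid, wtxid) as hex strings in the usual byte-reversed display order.

    `body` (hypothetical name) holds input count, inputs, output count, outputs.
    `witness` holds the serialized witness stacks for all inputs.
    """
    legacy = version + body + locktime  # no marker, flag, or witness
    txid = dsha256(legacy)[::-1].hex()
    if not witness:
        # A transaction without witness data uses the legacy format,
        # so its wtxid equals its txid.
        return txid, txid
    segwit = version + b"\x00\x01" + body + witness + locktime
    return txid, dsha256(segwit)[::-1].hex()
```

Changing only the witness bytes changes the wtxid but leaves the txid untouched, which is exactly why moving signatures into the witness removes the malleability vector.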
Parsing Witness Data
def read_varint(data: bytes, offset: int) -> tuple:
    """Read a CompactSize integer at offset. Returns (value, new_offset)."""
    prefix = data[offset]
    offset += 1
    if prefix < 0xfd:
        return prefix, offset
    size = {0xfd: 2, 0xfe: 4, 0xff: 8}[prefix]  # width of the integer that follows
    return int.from_bytes(data[offset:offset+size], 'little'), offset + size

def parse_witness(data: bytes, offset: int) -> tuple:
    """Parse witness stack starting at offset. Returns (witness_items, new_offset)."""
    item_count, offset = read_varint(data, offset)
    items = []
    for _ in range(item_count):
        item_len, offset = read_varint(data, offset)
        item = data[offset:offset+item_len].hex()
        items.append(item)
        offset += item_len
    return items, offset

def decode_segwit_tx(raw_hex: str):
    data = bytes.fromhex(raw_hex)
    offset = 0
    version = int.from_bytes(data[offset:offset+4], 'little')
    offset += 4
    # In the SegWit format a 0x00 marker sits where the input count would be
    is_segwit = (data[offset] == 0x00 and data[offset+1] == 0x01)
    if is_segwit:
        offset += 2  # skip marker and flag
    # Parse inputs
    input_count, offset = read_varint(data, offset)
    inputs = []
    for _ in range(input_count):
        txid = data[offset:offset+32][::-1].hex(); offset += 32
        vout = int.from_bytes(data[offset:offset+4], 'little'); offset += 4
        scriptsig_len, offset = read_varint(data, offset)
        scriptsig = data[offset:offset+scriptsig_len].hex(); offset += scriptsig_len
        sequence = int.from_bytes(data[offset:offset+4], 'little'); offset += 4
        inputs.append({"txid": txid, "vout": vout, "scriptsig": scriptsig, "sequence": sequence})
    # Parse outputs (simplified: values and scripts are skipped, not returned)
    output_count, offset = read_varint(data, offset)
    for _ in range(output_count):
        value = int.from_bytes(data[offset:offset+8], 'little'); offset += 8
        spk_len, offset = read_varint(data, offset)
        offset += spk_len
    # Parse witness: one stack per input, in input order
    witnesses = []
    if is_segwit:
        for i in range(input_count):
            witness_items, offset = parse_witness(data, offset)
            witnesses.append(witness_items)
    return {"version": version, "inputs": inputs, "witnesses": witnesses}
P2WPKH Witness Structure
For a P2WPKH input, the witness stack has exactly two items:
witness[0]: <DER signature + sighash byte>
witness[1]: <compressed public key (33 bytes)>
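To make the two-item layout concrete, here is a minimal sketch that splits a decoded P2WPKH witness into its parts. The function name `split_p2wpkh_witness` is hypothetical, the input is assumed to be a list of hex strings like the one a witness parser would return, and the checks are shape checks only: nothing here verifies the signature.

```python
def split_p2wpkh_witness(items: list) -> dict:
    """Split a P2WPKH witness (list of hex strings) into signature, sighash, pubkey.

    Shape checks only; the signature is not verified.
    """
    if len(items) != 2:
        raise ValueError("P2WPKH witness must have exactly two items")
    sig = bytes.fromhex(items[0])
    pubkey = bytes.fromhex(items[1])
    if len(pubkey) != 33 or pubkey[0] not in (0x02, 0x03):
        raise ValueError("expected a 33-byte compressed public key")
    if len(sig) < 2 or sig[0] != 0x30:
        raise ValueError("expected a DER signature starting with 0x30")
    return {
        "der_signature": sig[:-1].hex(),  # everything except the last byte
        "sighash_type": sig[-1],          # e.g. 0x01 = SIGHASH_ALL
        "pubkey": pubkey.hex(),
    }
```

The last byte of witness[0] is not part of the DER encoding; it is the sighash type appended to the signature, so it has to be stripped before the signature is handed to a DER parser.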
P2WSH Witness Structure
For a P2WSH input spending a 2-of-3 multisig:
witness[0]: "" (empty dummy item, required by the same OP_CHECKMULTISIG off-by-one bug as bare multisig)
witness[1]: <sig1>
witness[2]: <sig2>
witness[3]: <witness script (the serialized 2-of-3 multisig script)>
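The last witness item is tied to the output being spent: under BIP141, the 32-byte witness program in a P2WSH scriptPubKey is the single SHA-256 of the witness script. The sketch below builds a 2-of-3 witness script and the matching scriptPubKey. The helper names `multisig_script` and `p2wsh_script_pubkey` are hypothetical, and the public keys used with them are assumed to be 33-byte compressed keys.

```python
import hashlib


def multisig_script(m: int, pubkeys: list) -> bytes:
    """Serialize `m <pubkey>... n OP_CHECKMULTISIG` for 33-byte compressed keys."""
    OP_CHECKMULTISIG = 0xae

    def op_n(k: int) -> bytes:
        return bytes([0x50 + k])  # OP_1..OP_16 are opcodes 0x51..0x60

    script = op_n(m)
    for pk in pubkeys:
        script += bytes([len(pk)]) + pk  # direct push: length byte, then the key
    return script + op_n(len(pubkeys)) + bytes([OP_CHECKMULTISIG])


def p2wsh_script_pubkey(witness_script: bytes) -> bytes:
    """OP_0 followed by a 32-byte push of SHA256(witness_script)."""
    return b"\x00\x20" + hashlib.sha256(witness_script).digest()
```

When decoding a P2WSH spend, hashing the final witness item and comparing it with the program in the previous output's scriptPubKey is a quick sanity check that the witness was parsed correctly.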
Taproot Witness Structures
Key-path spend:
witness[0]: <64-byte Schnorr signature, or 65 bytes when a sighash byte is appended>
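A key-path signature can be split without any script parsing. The sketch below follows the BIP340/BIP341 layout: the first 32 bytes are the x-coordinate of R, the next 32 bytes are s, and an optional 65th byte carries an explicit sighash type. The function name `parse_keypath_signature` is hypothetical, and the code checks lengths only; it does not verify the signature.

```python
def parse_keypath_signature(sig_hex: str) -> dict:
    """Split a Taproot key-path signature into r, s, and sighash type."""
    sig = bytes.fromhex(sig_hex)
    if len(sig) == 64:
        sighash = 0x00  # SIGHASH_DEFAULT, implied when no byte is appended
    elif len(sig) == 65:
        sighash = sig[64]
        if sighash == 0x00:
            # BIP341 rejects an explicit 0x00 sighash byte
            raise ValueError("65-byte signature must not end in 0x00")
    else:
        raise ValueError("Schnorr signature must be 64 or 65 bytes")
    return {
        "r": sig[:32].hex(),
        "s": sig[32:64].hex(),
        "sighash_type": sighash,
    }
```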
Script-path spend:
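witness[0..n]: <data for script execution (zero or more items)>
witness[-2]: <tapscript>
witness[-1]: <control block: 1 byte (leaf version, with the output key parity in its lowest bit) + 32-byte internal key + merkle path (32 bytes per node)>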
The control block encodes the path through the Taproot merkle tree (MAST structure) proving that the tapscript is one of the committed spending conditions.
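The control block has a fixed layout under BIP341: one byte combining the leaf version (upper seven bits) and the output key parity (lowest bit), a 32-byte internal key, and then zero or more 32-byte merkle path nodes, up to 128 of them. A minimal sketch of a decoder, with the hypothetical name `parse_control_block` and hex-string input assumed:

```python
def parse_control_block(cb_hex: str) -> dict:
    """Decode a Taproot control block into its fields (length checks only)."""
    cb = bytes.fromhex(cb_hex)
    path_len = len(cb) - 33
    if path_len < 0 or path_len % 32 != 0 or path_len > 32 * 128:
        raise ValueError("control block must be 33 + 32*m bytes, with 0 <= m <= 128")
    return {
        "leaf_version": cb[0] & 0xfe,       # 0xc0 for BIP342 tapscript
        "output_key_parity": cb[0] & 0x01,  # parity of the tweaked output key
        "internal_key": cb[1:33].hex(),     # x-only internal public key
        "merkle_path": [cb[i:i + 32].hex() for i in range(33, len(cb), 32)],
    }
```

An empty merkle path means the tapscript is the only leaf in the tree; each additional 32-byte node is one level of the tree between the leaf and the root.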
Pro Tip
When debugging scripts, always start with a high-level disassembly before diving into the stack trace. Tools like bitcoin-cli decodescript are your first line of defense in identifying standard script patterns.