Hashing and Data Integrity
Hashing and Data Integrity¶
Previous: 02. Cryptography Basics | Next: 04. TLS/SSL and Public Key Infrastructure
Hash functions are the workhorses of modern security. They appear everywhere: password storage, digital signatures, message authentication, blockchain, software distribution, and data deduplication. This lesson covers hash functions in depth, from the mathematical properties that make them useful to practical implementations of password hashing, HMACs, and Merkle trees.
Difficulty: βββ
Learning Objectives: - Understand the properties of cryptographic hash functions - Implement hashing with SHA-256, SHA-3, BLAKE2, and BLAKE3 - Store passwords securely using bcrypt, scrypt, and Argon2 - Construct and verify HMACs for message authentication - Build and understand Merkle trees - Implement content-addressable storage - Recognize and prevent timing attacks using constant-time comparison
Table of Contents¶
- What Is a Hash Function?
- Cryptographic Hash Properties
- Hash Function Survey
- Python Hashing with hashlib
- Password Hashing
- HMAC: Message Authentication
- Merkle Trees
- Content-Addressable Storage
- Timing Attacks and Constant-Time Comparison
- Hash Function Attacks and Mitigations
- Exercises
- References
1. What Is a Hash Function?¶
A hash function takes an arbitrary-length input and produces a fixed-length output (the "hash" or "digest"). A cryptographic hash function has additional security properties that make it suitable for security applications.
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Hash Function β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β Input (any size) Hash Function Output (fixed) β
β ββββββββββββββ ββββββββββββββ ββββββββββββββ β
β β
β "Hello" βββββββΆ SHA-256 βββββββΆ 2cf24dba5fb0... β
β (5 bytes) (32 bytes / 256 bits) β
β β
β "Hello World" βββββββΆ SHA-256 βββββββΆ a591a6d40bf4... β
β (11 bytes) (32 bytes / 256 bits) β
β β
β War and Peace βββββββΆ SHA-256 βββββββΆ b28f8b893c45... β
β (~3.2 MB) (32 bytes / 256 bits) β
β β
β Key insight: Regardless of input size, the output is ALWAYS β
β the same fixed size. β
β β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
1.1 Non-Cryptographic vs Cryptographic Hashes¶
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Non-Cryptographic vs Cryptographic Hash Functions β
ββββββββββββββββββββ¬βββββββββββββββββββββββ¬ββββββββββββββββββββββββββββ€
β β Non-Cryptographic β Cryptographic β
ββββββββββββββββββββΌβββββββββββββββββββββββΌββββββββββββββββββββββββββββ€
β Purpose β Hash tables, checksumsβ Security applications β
β Speed β Extremely fast β Intentionally slower β
β Collision resist.β Weak / none β Strong (required) β
β Pre-image resist.β Not guaranteed β Strong (required) β
β Examples β CRC32, MurmurHash, β SHA-256, SHA-3, BLAKE2, β
β β xxHash, FNV β BLAKE3 β
β Use cases β Hash maps, checksums,β Passwords, signatures, β
β β deduplication β certificates, HMAC β
ββββββββββββββββββββ΄βββββββββββββββββββββββ΄ββββββββββββββββββββββββββββ€
β RULE: NEVER use a non-cryptographic hash for security purposes. β
β NEVER use a cryptographic hash where speed matters and security β
β does not (e.g., hash tables). β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
2. Cryptographic Hash Properties¶
A secure cryptographic hash function must satisfy three properties:
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Three Properties of Cryptographic Hashes β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β 1. PRE-IMAGE RESISTANCE (one-way) β
β Given hash h, it is infeasible to find any m such that β
β H(m) = h. β
β β
β Hash βββ³βββΆ Original input β
β (You cannot reverse a hash to get the input) β
β β
β 2. SECOND PRE-IMAGE RESISTANCE (weak collision resistance) β
β Given mβ, it is infeasible to find mβ β mβ such that β
β H(mβ) = H(mβ). β
β β
β "Hello" β abc123 Cannot find another input β abc123 β
β (Cannot find a second input with the same hash) β
β β
β 3. COLLISION RESISTANCE (strong collision resistance) β
β It is infeasible to find ANY two distinct messages mβ β mβ β
β such that H(mβ) = H(mβ). β
β β
β Cannot find ANY pair of inputs with the same hash β
β (This is strictly stronger than second pre-image resistance) β
β β
β Security levels (for an n-bit hash): β
β β’ Pre-image: O(2^n) work (birthday paradox does not apply) β
β β’ 2nd pre-image: O(2^n) work β
β β’ Collision: O(2^(n/2)) work (birthday paradox) β
β β
β This is why SHA-256 provides 128-bit collision resistance β
β (2^128 β 3.4 Γ 10^38 operations). β
β β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
2.1 The Avalanche Effect¶
A good hash function exhibits the avalanche effect: a tiny change in input produces a drastically different output.
import hashlib
# Demonstrate the avalanche effect
msg1 = b"Hello, World!"
msg2 = b"Hello, World?" # Changed '!' to '?'
msg3 = b"Hello, World! " # Added a space
hash1 = hashlib.sha256(msg1).hexdigest()
hash2 = hashlib.sha256(msg2).hexdigest()
hash3 = hashlib.sha256(msg3).hexdigest()
print("Avalanche Effect Demonstration (SHA-256):")
print(f" '{msg1.decode()}' β {hash1}")
print(f" '{msg2.decode()}' β {hash2}")
print(f" '{msg3.decode()}'β {hash3}")
# Count differing bits
def bit_difference(hex1: str, hex2: str) -> tuple:
"""Count the number of differing bits between two hex strings."""
b1 = int(hex1, 16)
b2 = int(hex2, 16)
xor = b1 ^ b2
diff_bits = bin(xor).count('1')
total_bits = len(hex1) * 4
return diff_bits, total_bits
diff, total = bit_difference(hash1, hash2)
print(f"\n Bits changed (1 char diff): {diff}/{total} "
f"({diff/total*100:.1f}%)")
diff2, _ = bit_difference(hash1, hash3)
print(f" Bits changed (space added): {diff2}/{total} "
f"({diff2/total*100:.1f}%)")
# Ideal: ~50% of bits differ for any change
print(f" Expected for random: ~{total//2} bits ({50.0}%)")
3. Hash Function Survey¶
3.1 Hash Function Comparison¶
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Hash Function Comparison Table β
ββββββββββββββ¬βββββββββ¬ββββββββββββ¬βββββββββββ¬βββββββββββββββββββββββββ€
β Algorithm β Output β Speed β Status β Notes β
β β (bits) β (GB/s)* β β β
ββββββββββββββΌβββββββββΌββββββββββββΌβββββββββββΌβββββββββββββββββββββββββ€
β MD5 β 128 β ~5.0 β BROKEN β Collisions found 2004 β
β SHA-1 β 160 β ~3.0 β BROKEN β SHAttered attack 2017 β
β SHA-256 β 256 β ~1.5 β SECURE β Most widely used β
β SHA-512 β 512 β ~2.0β β SECURE β Faster on 64-bit CPUs β
β SHA-3-256 β 256 β ~0.5 β SECURE β Different construction β
β BLAKE2b β 256 β ~3.5 β SECURE β Faster than SHA-256 β
β BLAKE2s β 256 β ~2.0 β SECURE β Optimized for 32-bit β
β BLAKE3 β 256 β ~10.0 β SECURE β Parallelizable, newest β
ββββββββββββββ΄βββββββββ΄ββββββββββββ΄βββββββββββ΄βββββββββββββββββββββββββ€
β * Approximate throughput on modern x86-64 with hardware support β
β β SHA-512 is faster than SHA-256 on 64-bit processors β
β β
β Recommendation: SHA-256 for interoperability, BLAKE2b/BLAKE3 for β
β performance-sensitive applications β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
3.2 SHA-2 Family¶
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β SHA-2 Family β
ββββββββββββββββ¬βββββββββββ¬βββββββββββ¬βββββββββββ¬ββββββββββββββββββββββ€
β Variant β Output β Block β Word β Rounds β
β β (bits) β (bits) β (bits) β β
ββββββββββββββββΌβββββββββββΌβββββββββββΌβββββββββββΌββββββββββββββββββββββ€
β SHA-224 β 224 β 512 β 32 β 64 β
β SHA-256 β 256 β 512 β 32 β 64 β
β SHA-384 β 384 β 1024 β 64 β 80 β
β SHA-512 β 512 β 1024 β 64 β 80 β
β SHA-512/256 β 256 β 1024 β 64 β 80 β
ββββββββββββββββ΄βββββββββββ΄βββββββββββ΄βββββββββββ΄ββββββββββββββββββββββ€
β SHA-256 and SHA-512 are the most commonly used. β
β SHA-512/256 gives SHA-256 output size with SHA-512's speed on β
β 64-bit processors and is not vulnerable to length-extension. β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
3.3 SHA-3 (Keccak)¶
SHA-3 uses the sponge construction, which is fundamentally different from SHA-2's Merkle-Damgard construction. This diversity is valuable: if SHA-2 is ever broken, SHA-3 will likely remain secure.
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β SHA-3 Sponge Construction (Simplified) β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β State = [0...0] (1600 bits for Keccak) β
β β
β ABSORBING PHASE: β
β βββββββββ XOR βββββββββ XOR βββββββββ β
β β msgβ ββββββββΆ f βββββββΆβ msgβ ββββββββΆ f βββββββΆ ... β
β βββββββββ State βββββββββ State βββββββββ β
β β
β SQUEEZING PHASE: β
β ... βββΆ f βββΆ [outputβ] βββΆ f βββΆ [outputβ] βββΆ ... β
β β
β f = Keccak permutation (5 sub-rounds Γ 24 rounds) β
β β
β Key advantage: NOT vulnerable to length-extension attacks β
β (unlike SHA-256 / SHA-512) β
β β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
3.4 BLAKE2 and BLAKE3¶
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β BLAKE2 and BLAKE3 β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β BLAKE2 (RFC 7693): β
β βββ BLAKE2b: Optimized for 64-bit platforms (up to 64 bytes) β
β βββ BLAKE2s: Optimized for 32-bit platforms (up to 32 bytes) β
β βββ Built-in keying (can act as a MAC without HMAC) β
β βββ Built-in personalization and salt support β
β βββ Tree hashing mode for parallelism β
β βββ Used in: Argon2 password hash, WireGuard, libsodium β
β β
β BLAKE3 (2020): β
β βββ Based on BLAKE2 but redesigned for speed β
β βββ Merkle tree structure (inherently parallelizable) β
β βββ Single algorithm for hash, MAC, KDF, XOF β
β βββ 256-bit output (extendable) β
β βββ ~10x faster than SHA-256 on modern CPUs β
β βββ Used in: Bao (verified streaming), many new projects β
β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
4. Python Hashing with hashlib¶
4.1 Basic Hashing¶
import hashlib
message = b"The quick brown fox jumps over the lazy dog"
# SHA-256 (most common)
sha256 = hashlib.sha256(message).hexdigest()
print(f"SHA-256: {sha256}")
# SHA-512
sha512 = hashlib.sha512(message).hexdigest()
print(f"SHA-512: {sha512[:64]}...{sha512[-8:]}")
# SHA-3-256
sha3_256 = hashlib.sha3_256(message).hexdigest()
print(f"SHA-3-256: {sha3_256}")
# BLAKE2b (variable output size, up to 64 bytes)
blake2b = hashlib.blake2b(message, digest_size=32).hexdigest()
print(f"BLAKE2b: {blake2b}")
# BLAKE2s (variable output size, up to 32 bytes)
blake2s = hashlib.blake2s(message, digest_size=32).hexdigest()
print(f"BLAKE2s: {blake2s}")
# List all available algorithms
print(f"\nAvailable: {sorted(hashlib.algorithms_available)[:10]}...")
4.2 Incremental Hashing (Streaming)¶
For large files, you should hash data incrementally rather than loading everything into memory:
import hashlib
from pathlib import Path
def hash_file(filepath: str, algorithm: str = "sha256",
chunk_size: int = 8192) -> str:
"""Hash a file incrementally without loading it all into memory."""
hasher = hashlib.new(algorithm)
with open(filepath, 'rb') as f:
while True:
chunk = f.read(chunk_size)
if not chunk:
break
hasher.update(chunk)
return hasher.hexdigest()
# Create a test file
test_file = "/tmp/test_hash_file.bin"
with open(test_file, 'wb') as f:
for i in range(1000):
f.write(f"Line {i}: some data for hashing\n".encode())
# Hash it
sha256_hash = hash_file(test_file, "sha256")
sha3_hash = hash_file(test_file, "sha3_256")
blake2_hash = hash_file(test_file, "blake2b")
print(f"File size: {Path(test_file).stat().st_size:,} bytes")
print(f"SHA-256: {sha256_hash}")
print(f"SHA-3-256: {sha3_hash}")
print(f"BLAKE2b: {blake2_hash}")
# Equivalent to reading all at once (but memory-efficient)
data = Path(test_file).read_bytes()
assert hashlib.sha256(data).hexdigest() == sha256_hash
print("\nIncremental hash matches full-read hash: OK")
# Clean up
Path(test_file).unlink()
4.3 BLAKE2 with Key (Keyed Hashing)¶
BLAKE2 has built-in support for keyed hashing, making it usable as a MAC without the HMAC construction:
import hashlib
import os
# BLAKE2b with key (acts as a MAC)
key = os.urandom(32) # 256-bit key
message = b"Authenticate this message"
# Keyed hash
mac = hashlib.blake2b(message, key=key, digest_size=32).hexdigest()
print(f"BLAKE2b keyed hash: {mac}")
# Verify
verification = hashlib.blake2b(message, key=key, digest_size=32).hexdigest()
print(f"Verification match: {mac == verification}")
# With personalization (domain separation)
mac1 = hashlib.blake2b(
b"shared data",
key=key,
person=b"payment-v1", # Max 16 bytes
digest_size=32
).hexdigest()
mac2 = hashlib.blake2b(
b"shared data",
key=key,
person=b"session-v1", # Different domain
digest_size=32
).hexdigest()
print(f"\nSame data, different personalization:")
print(f" payment context: {mac1[:32]}...")
print(f" session context: {mac2[:32]}...")
print(f" Same? {mac1 == mac2}") # False - different domains
4.4 BLAKE3¶
# pip install blake3
import blake3
message = b"BLAKE3 is extremely fast"
# Basic hash
digest = blake3.blake3(message).hexdigest()
print(f"BLAKE3: {digest}")
# Keyed hash (for MAC)
key = b"0" * 32 # 32-byte key required
keyed = blake3.blake3(message, key=key).hexdigest()
print(f"BLAKE3 keyed: {keyed}")
# Key derivation
derived = blake3.blake3(
b"input key material",
derive_key_context="my-app 2026-01-15 encryption key"
).hexdigest()
print(f"BLAKE3 KDF: {derived}")
# Incremental hashing
hasher = blake3.blake3()
hasher.update(b"Hello, ")
hasher.update(b"World!")
print(f"BLAKE3 incremental: {hasher.hexdigest()}")
# Extendable output (XOF) - get any number of bytes
digest_64 = blake3.blake3(message).hexdigest(length=64)
print(f"BLAKE3 64-byte: {digest_64}")
5. Password Hashing¶
Password hashing is fundamentally different from general-purpose hashing. Passwords have low entropy (humans choose predictable passwords), so an attacker who obtains password hashes can try to brute-force them. Password hashing algorithms are designed to be deliberately slow to make brute-force attacks impractical.
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Why General-Purpose Hashes Are BAD for Passwords β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β SHA-256 speed on modern GPU: ~10 billion hashes/second β
β β
β Password "P@ssw0rd": β
β SHA-256 β d74ff0ee8da3b9806b18c877d... β
β Time to brute-force 8-char password: seconds to minutes β
β β
β bcrypt (cost=12): β
β bcrypt β $2b$12$LJ3m4ys3Tdb2vNQhk9Oy... β
β Time to brute-force: years to centuries β
β β
β KEY DIFFERENCES: β
β β’ Password hashes include a random SALT (prevents rainbow tables) β
β β’ Password hashes have a tunable WORK FACTOR (adapts to hardware) β
β β’ Password hashes may require large MEMORY (defeats GPU attacks) β
β β
β NEVER hash passwords with SHA-256, MD5, or any general hash! β
β β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
5.1 bcrypt¶
# pip install bcrypt
import bcrypt
import time
password = b"correct-horse-battery-staple"
# Hash a password
# bcrypt automatically generates a random 16-byte salt
# cost=12 means 2^12 = 4096 iterations
start = time.time()
hashed = bcrypt.hashpw(password, bcrypt.gensalt(rounds=12))
elapsed = time.time() - start
print(f"Password: {password.decode()}")
print(f"Hash: {hashed.decode()}")
print(f"Time: {elapsed:.3f}s")
print()
# Anatomy of a bcrypt hash:
# $2b$12$LJ3m4ys3Tdb2vNQhk9OyAeKK3b3eAGQjT5xKp2JFe5cF5NY5U/a2e
# ββββ€βββ€ββββββββββββββββββββββββ€ββββββββββββββββββββββββββββββββββββ€
# Algo Cost Salt (22 chars) Hash (31 chars)
#
# $2b = bcrypt version
# $12 = cost factor (2^12 iterations)
# Verify a password
start = time.time()
is_valid = bcrypt.checkpw(password, hashed)
elapsed = time.time() - start
print(f"Valid password: {is_valid} ({elapsed:.3f}s)")
# Wrong password
is_valid = bcrypt.checkpw(b"wrong-password", hashed)
print(f"Wrong password: {is_valid}")
# Each hash is unique even for the same password (different salt)
hash2 = bcrypt.hashpw(password, bcrypt.gensalt(rounds=12))
print(f"\nHash 1: {hashed.decode()}")
print(f"Hash 2: {hash2.decode()}")
print(f"Same? {hashed == hash2}") # False (different salts)
# But both verify the same password
print(f"Both verify 'correct-horse-battery-staple': "
f"{bcrypt.checkpw(password, hashed) and bcrypt.checkpw(password, hash2)}")
5.2 scrypt¶
scrypt is memory-hard: it requires a large amount of memory proportional to the cost parameter, making it resistant to GPU and ASIC attacks.
import hashlib
import os
import time
password = b"my-secure-password"
salt = os.urandom(16)
# scrypt parameters:
# n = CPU/memory cost (must be power of 2). Higher = slower + more memory.
# r = block size (8 is standard)
# p = parallelism (1 is standard)
# Memory usage β 128 * n * r bytes
start = time.time()
derived = hashlib.scrypt(
password,
salt=salt,
n=2**14, # 16384 iterations
r=8, # Block size
p=1, # Parallelism
dklen=32 # Output 256 bits
)
elapsed = time.time() - start
print(f"scrypt derived key: {derived.hex()}")
print(f"Salt: {salt.hex()}")
print(f"Time: {elapsed:.3f}s")
print(f"Memory used: ~{128 * 2**14 * 8 / 1024 / 1024:.0f} MB")
# Verify
derived2 = hashlib.scrypt(password, salt=salt, n=2**14, r=8, p=1, dklen=32)
print(f"Verification: {derived == derived2}")
5.3 Argon2 (Recommended)¶
Argon2 won the 2015 Password Hashing Competition and is the currently recommended algorithm. It comes in three variants:
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Argon2 Variants β
ββββββββββββββββ¬βββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β Variant β Description β
ββββββββββββββββΌβββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β Argon2d β Data-dependent memory access. Faster, more GPU- β
β β resistant, but vulnerable to side-channel attacks. β
β β Best for cryptocurrency mining, NOT for passwords. β
ββββββββββββββββΌβββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β Argon2i β Data-independent memory access. Resistant to side- β
β β channel attacks. Better for password hashing in β
β β shared environments. β
ββββββββββββββββΌβββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β Argon2id β Hybrid: first pass is Argon2i, subsequent passes β
β β are Argon2d. RECOMMENDED for password hashing. β
β β Best of both worlds. β
ββββββββββββββββ΄βββββββββββββββββββββββββββββββββββββββββββββββββββββββ
# pip install argon2-cffi
from argon2 import PasswordHasher, Type
import time
# Create a hasher with recommended parameters
# Adjust time_cost and memory_cost based on your server's capabilities
# Target: ~0.5-1.0 seconds per hash
ph = PasswordHasher(
time_cost=3, # Number of iterations
memory_cost=65536, # Memory in KB (64 MB)
parallelism=4, # Number of threads
hash_len=32, # Output hash length
salt_len=16, # Salt length
type=Type.ID, # Argon2id (recommended)
)
password = "correct-horse-battery-staple"
# Hash
start = time.time()
hashed = ph.hash(password)
elapsed = time.time() - start
print(f"Argon2id hash: {hashed}")
print(f"Time: {elapsed:.3f}s")
print()
# Anatomy of an Argon2 hash string:
# $argon2id$v=19$m=65536,t=3,p=4$c2FsdDEyMzQ1Njc4$aXaShZ7S2X3yBqBvP5WF4w
# βββββββββ€βββββ€ββββββββββββββββ€ββββββββββββββββββββ€ββββββββββββββββββββββββββββ€
# Algo Ver Parameters Salt (b64) Hash (b64)
# Verify
try:
is_valid = ph.verify(hashed, password)
print(f"Valid password: {is_valid}")
except Exception as e:
print(f"Verification failed: {e}")
# Wrong password
try:
ph.verify(hashed, "wrong-password")
except Exception as e:
print(f"Wrong password: {type(e).__name__}")
# Check if rehash is needed (parameters changed)
if ph.check_needs_rehash(hashed):
print("Hash needs rehash with updated parameters")
else:
print("Hash parameters are current")
5.4 Password Hashing Comparison¶
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Password Hashing Algorithm Comparison β
βββββββββββββββ¬ββββββββββ¬ββββββββββββββββ¬βββββββββββ¬βββββββββββββββββββ€
β Algorithm β Memory β GPU/ASIC β Side-Ch. β Recommendation β
β β Hard? β Resistant? β Safe? β β
βββββββββββββββΌββββββββββΌββββββββββββββββΌβββββββββββΌβββββββββββββββββββ€
β bcrypt β No β Moderate β Yes β Good (mature) β
β scrypt β Yes β Good β Partial β Good β
β Argon2id β Yes β Best β Yes β Best (preferred) β
β PBKDF2 β No β Poor β Yes β Legacy only β
βββββββββββββββ΄ββββββββββ΄ββββββββββββββββ΄βββββββββββ΄βββββββββββββββββββ€
β β
β Recommended parameters (2025+, targeting ~0.5s on server): β
β β’ Argon2id: m=64MB, t=3, p=4 β
β β’ bcrypt: cost=12 to 14 β
β β’ scrypt: N=2^15, r=8, p=1 β
β β’ PBKDF2: 600,000+ iterations with SHA-256 β
β β
β OWASP recommends: Argon2id first, bcrypt second. β
β β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
5.5 Complete Password Storage System¶
from argon2 import PasswordHasher, Type, exceptions
from dataclasses import dataclass, field
from typing import Optional, Dict
import secrets
import time
@dataclass
class UserRecord:
username: str
password_hash: str
created_at: float
last_login: Optional[float] = None
failed_attempts: int = 0
locked_until: Optional[float] = None
class PasswordStore:
"""Production-quality password storage using Argon2id."""
MAX_ATTEMPTS = 5
LOCKOUT_SECONDS = 300 # 5 minutes
MIN_PASSWORD_LENGTH = 12
def __init__(self):
self.hasher = PasswordHasher(
time_cost=3,
memory_cost=65536,
parallelism=4,
hash_len=32,
salt_len=16,
type=Type.ID,
)
self.users: Dict[str, UserRecord] = {}
def _validate_password_strength(self, password: str) -> list:
"""Check password against basic strength requirements."""
issues = []
if len(password) < self.MIN_PASSWORD_LENGTH:
issues.append(f"Password must be at least {self.MIN_PASSWORD_LENGTH} chars")
if password.lower() == password:
issues.append("Password must contain uppercase letters")
if not any(c.isdigit() for c in password):
issues.append("Password must contain digits")
# Check against common passwords (abbreviated list)
common = {"password", "123456", "qwerty", "admin", "letmein"}
if password.lower() in common:
issues.append("Password is too common")
return issues
def register(self, username: str, password: str) -> dict:
"""Register a new user."""
if username in self.users:
return {"success": False, "error": "Username already exists"}
issues = self._validate_password_strength(password)
if issues:
return {"success": False, "error": "Weak password", "issues": issues}
hashed = self.hasher.hash(password)
self.users[username] = UserRecord(
username=username,
password_hash=hashed,
created_at=time.time(),
)
return {"success": True, "message": f"User {username} registered"}
def authenticate(self, username: str, password: str) -> dict:
"""Authenticate a user."""
if username not in self.users:
# Perform a dummy hash to prevent timing-based user enumeration
self.hasher.hash("dummy-to-waste-time")
return {"success": False, "error": "Invalid credentials"}
user = self.users[username]
# Check lockout
if user.locked_until and time.time() < user.locked_until:
remaining = int(user.locked_until - time.time())
return {"success": False, "error": f"Account locked. Try in {remaining}s"}
try:
self.hasher.verify(user.password_hash, password)
# Check if rehash needed (parameters upgraded)
if self.hasher.check_needs_rehash(user.password_hash):
user.password_hash = self.hasher.hash(password)
user.failed_attempts = 0
user.locked_until = None
user.last_login = time.time()
return {"success": True, "message": f"Welcome, {username}!"}
except exceptions.VerifyMismatchError:
user.failed_attempts += 1
if user.failed_attempts >= self.MAX_ATTEMPTS:
user.locked_until = time.time() + self.LOCKOUT_SECONDS
return {
"success": False,
"error": "Invalid credentials",
"attempts_remaining": max(0, self.MAX_ATTEMPTS - user.failed_attempts)
}
# Usage
store = PasswordStore()
# Register
print(store.register("alice", "short")) # Fails - too short
print(store.register("alice", "MySecureP@ss123")) # Succeeds
print(store.register("alice", "AnotherPassword1")) # Fails - exists
# Authenticate
print(store.authenticate("alice", "MySecureP@ss123")) # Success
print(store.authenticate("alice", "wrong-password")) # Failure
print(store.authenticate("bob", "anything")) # User not found
6. HMAC: Message Authentication¶
HMAC (Hash-based Message Authentication Code) provides both integrity and authenticity. It combines a secret key with a hash function to produce a tag that can only be created and verified by someone who knows the key.
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β HMAC Construction β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β HMAC(K, m) = H((K' β opad) || H((K' β ipad) || m)) β
β β
β Where: β
β K = secret key β
β K' = key padded/hashed to block size β
β opad = 0x5c repeated to block size β
β ipad = 0x36 repeated to block size β
β H = hash function (SHA-256, etc.) β
β || = concatenation β
β β = XOR β
β β
β Step by step: β
β 1. If key > block size: K' = H(K) β
β 2. Else: K' = K padded with zeros β
β 3. Inner hash: H((K' β ipad) || message) β
β 4. Outer hash: H((K' β opad) || inner_hash) β
β β
β Why not just H(key || message)? β
β β Vulnerable to length-extension attacks with Merkle-Damgard β
β hashes (SHA-256). HMAC's nested structure prevents this. β
β β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
6.1 HMAC in Python¶
import hmac
import hashlib
import os
import time
# Generate a secret key
key = os.urandom(32) # 256-bit key
# Create HMAC
message = b"Transfer $1000 to account 12345"
mac = hmac.new(key, message, hashlib.sha256).hexdigest()
print(f"Message: {message.decode()}")
print(f"HMAC: {mac}")
# Verify HMAC (constant-time comparison)
received_mac = mac
is_valid = hmac.compare_digest(
mac,
hmac.new(key, message, hashlib.sha256).hexdigest()
)
print(f"Valid: {is_valid}")
# Tampered message
tampered = b"Transfer $9999 to account 12345"
is_valid = hmac.compare_digest(
mac,
hmac.new(key, tampered, hashlib.sha256).hexdigest()
)
print(f"Tampered valid: {is_valid}") # False
6.2 Practical: API Request Signing¶
import hmac
import hashlib
import json
import time
import os
from urllib.parse import urlencode
class APIRequestSigner:
"""
Sign API requests using HMAC-SHA256.
Similar to AWS Signature V4, Stripe webhook signatures, etc.
"""
def __init__(self, api_key: str, api_secret: bytes):
self.api_key = api_key
self.api_secret = api_secret
def sign_request(self, method: str, path: str,
body: dict = None, timestamp: float = None) -> dict:
"""Sign an API request and return headers."""
timestamp = timestamp or time.time()
ts_str = str(int(timestamp))
# Build canonical request string
body_str = json.dumps(body, sort_keys=True, separators=(',', ':')) if body else ""
canonical = f"{method}\n{path}\n{ts_str}\n{body_str}"
# Compute HMAC
signature = hmac.new(
self.api_secret,
canonical.encode(),
hashlib.sha256
).hexdigest()
return {
"X-API-Key": self.api_key,
"X-Timestamp": ts_str,
"X-Signature": signature,
}
def verify_request(self, method: str, path: str,
headers: dict, body: dict = None,
max_age_seconds: int = 300) -> dict:
"""Verify a signed API request."""
api_key = headers.get("X-API-Key", "")
timestamp = headers.get("X-Timestamp", "")
signature = headers.get("X-Signature", "")
# Check API key
if api_key != self.api_key:
return {"valid": False, "error": "Invalid API key"}
# Check timestamp (prevent replay attacks)
try:
ts = int(timestamp)
age = abs(time.time() - ts)
if age > max_age_seconds:
return {"valid": False, "error": f"Request too old ({age:.0f}s)"}
except ValueError:
return {"valid": False, "error": "Invalid timestamp"}
# Recompute and verify signature
body_str = json.dumps(body, sort_keys=True, separators=(',', ':')) if body else ""
canonical = f"{method}\n{path}\n{timestamp}\n{body_str}"
expected = hmac.new(
self.api_secret,
canonical.encode(),
hashlib.sha256
).hexdigest()
if not hmac.compare_digest(signature, expected):
return {"valid": False, "error": "Invalid signature"}
return {"valid": True, "api_key": api_key}
# Usage
api_key = "ak_live_abc123"
api_secret = os.urandom(32)
signer = APIRequestSigner(api_key, api_secret)
# Sign a request
headers = signer.sign_request(
method="POST",
path="/api/v1/transfers",
body={"amount": 1000, "currency": "USD", "to": "acct_xyz"}
)
print("Signed Request Headers:")
for k, v in headers.items():
print(f" {k}: {v}")
# Verify the request
result = signer.verify_request(
method="POST",
path="/api/v1/transfers",
headers=headers,
body={"amount": 1000, "currency": "USD", "to": "acct_xyz"}
)
print(f"\nVerification: {result}")
# Tampered body
result = signer.verify_request(
method="POST",
path="/api/v1/transfers",
headers=headers,
body={"amount": 9999, "currency": "USD", "to": "acct_xyz"}
)
print(f"Tampered body: {result}")
6.3 Webhook Signature Verification¶
import hmac
import hashlib
def verify_webhook_signature(
payload: bytes,
signature_header: str,
secret: bytes,
tolerance_seconds: int = 300,
) -> bool:
"""
Verify a webhook signature (Stripe-style).
Header format: t=<timestamp>,v1=<signature>
"""
# Parse the header
elements = {}
for part in signature_header.split(","):
key, _, value = part.partition("=")
elements[key.strip()] = value.strip()
timestamp = elements.get("t", "")
received_sig = elements.get("v1", "")
if not timestamp or not received_sig:
return False
# Check timestamp freshness
import time
try:
ts = int(timestamp)
if abs(time.time() - ts) > tolerance_seconds:
return False
except ValueError:
return False
# Compute expected signature
signed_payload = f"{timestamp}.".encode() + payload
expected_sig = hmac.new(
secret, signed_payload, hashlib.sha256
).hexdigest()
# Constant-time comparison
return hmac.compare_digest(expected_sig, received_sig)
# Simulate webhook
import time
webhook_secret = b"whsec_test_secret_key_123"
payload = b'{"event":"payment.success","amount":5000}'
ts = str(int(time.time()))
sig = hmac.new(
webhook_secret,
f"{ts}.".encode() + payload,
hashlib.sha256
).hexdigest()
header = f"t={ts},v1={sig}"
is_valid = verify_webhook_signature(payload, header, webhook_secret)
print(f"Webhook signature valid: {is_valid}")
# Tampered payload
is_valid = verify_webhook_signature(
b'{"event":"payment.success","amount":50000}', # Changed amount
header,
webhook_secret
)
print(f"Tampered webhook valid: {is_valid}")
7. Merkle Trees¶
A Merkle tree is a binary tree of hashes where each leaf node is a hash of a data block, and each internal node is a hash of its children. The root hash (Merkle root) summarizes all the data.
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Merkle Tree β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β Root Hash β
β H(H12 || H34) β
β / \ β
β / \ β
β H12 H34 β
β H(H1||H2) H(H3||H4) β
β / \ / \ β
β H1 H2 H3 H4 β
β H(D1) H(D2) H(D3) H(D4) β
β | | | | β
β Data1 Data2 Data3 Data4 β
β β
β Properties: β
β β’ Root hash changes if ANY data block changes β
β β’ Can verify a single block with O(log n) hashes (Merkle proof) β
β β’ Used in: Git, Bitcoin, Certificate Transparency, IPFS β
β β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
7.1 Merkle Tree Implementation¶
import hashlib
from typing import List, Optional
from dataclasses import dataclass
def sha256(data: bytes) -> bytes:
return hashlib.sha256(data).digest()
@dataclass
class MerkleNode:
hash: bytes
left: Optional['MerkleNode'] = None
right: Optional['MerkleNode'] = None
class MerkleTree:
"""A complete Merkle tree implementation with proof generation/verification."""
def __init__(self, data_blocks: List[bytes]):
if not data_blocks:
raise ValueError("Cannot create Merkle tree from empty data")
self.leaves = [MerkleNode(hash=sha256(block)) for block in data_blocks]
self.root = self._build_tree(self.leaves)
def _build_tree(self, nodes: List[MerkleNode]) -> MerkleNode:
"""Build the tree bottom-up."""
if len(nodes) == 1:
return nodes[0]
# If odd number of nodes, duplicate the last one
if len(nodes) % 2 == 1:
nodes.append(nodes[-1])
parent_nodes = []
for i in range(0, len(nodes), 2):
combined = nodes[i].hash + nodes[i + 1].hash
parent = MerkleNode(
hash=sha256(combined),
left=nodes[i],
right=nodes[i + 1],
)
parent_nodes.append(parent)
return self._build_tree(parent_nodes)
@property
def root_hash(self) -> str:
return self.root.hash.hex()
def get_proof(self, index: int) -> List[tuple]:
"""
Generate a Merkle proof for the leaf at the given index.
Returns list of (hash, position) tuples where position is 'left' or 'right'.
"""
if index < 0 or index >= len(self.leaves):
raise IndexError(f"Index {index} out of range")
proof = []
nodes = self.leaves[:]
# If odd, duplicate last
if len(nodes) % 2 == 1:
nodes.append(nodes[-1])
current_index = index
while len(nodes) > 1:
next_level = []
for i in range(0, len(nodes), 2):
if i == current_index or i + 1 == current_index:
# This is the pair containing our node
if current_index % 2 == 0:
# Our node is on the left, sibling is on the right
proof.append((nodes[i + 1].hash, "right"))
else:
# Our node is on the right, sibling is on the left
proof.append((nodes[i].hash, "left"))
combined = nodes[i].hash + nodes[i + 1].hash
parent = MerkleNode(hash=sha256(combined))
next_level.append(parent)
current_index = current_index // 2
nodes = next_level
if len(nodes) > 1 and len(nodes) % 2 == 1:
nodes.append(nodes[-1])
return proof
@staticmethod
def verify_proof(data: bytes, proof: List[tuple], root_hash: str) -> bool:
"""Verify a Merkle proof."""
current_hash = sha256(data)
for sibling_hash, position in proof:
if position == "left":
combined = sibling_hash + current_hash
else:
combined = current_hash + sibling_hash
current_hash = sha256(combined)
return current_hash.hex() == root_hash
# Build a Merkle tree
data_blocks = [
b"Transaction: Alice -> Bob $100",
b"Transaction: Bob -> Charlie $50",
b"Transaction: Charlie -> Dave $25",
b"Transaction: Dave -> Alice $75",
]
tree = MerkleTree(data_blocks)
print(f"Merkle root: {tree.root_hash}")
print(f"Leaves: {len(tree.leaves)}")
# Generate proof for transaction at index 1
proof = tree.get_proof(1)
print(f"\nProof for block 1 ({len(proof)} nodes):")
for h, pos in proof:
print(f" {pos}: {h.hex()[:32]}...")
# Verify the proof
is_valid = MerkleTree.verify_proof(
data_blocks[1], proof, tree.root_hash
)
print(f"\nProof valid: {is_valid}")
# Tampered data fails
is_valid = MerkleTree.verify_proof(
b"Transaction: Bob -> Charlie $5000", # Tampered!
proof, tree.root_hash
)
print(f"Tampered proof valid: {is_valid}")
# Show efficiency: verifying 1 block out of 1M requires only ~20 hashes
import math
n_blocks = 1_000_000
proof_size = math.ceil(math.log2(n_blocks))
print(f"\nFor {n_blocks:,} blocks:")
print(f" Proof size: {proof_size} hashes ({proof_size * 32} bytes)")
print(f" vs checking all: {n_blocks * 32:,} bytes")
print(f" Efficiency: {n_blocks * 32 / (proof_size * 32):,.0f}x smaller")
8. Content-Addressable Storage¶
Content-addressable storage (CAS) uses the hash of data as its address/key. This is the foundation of Git, IPFS, Docker layers, and many deduplication systems.
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Content-Addressable Storage (CAS) β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β Traditional Storage: β
β filename β data β
β /docs/report.pdf β [file contents] β
β β
β Content-Addressable Storage: β
β hash(data) β data β
β sha256:a3f2b8... β [file contents] β
β β
β Benefits: β
β β’ Automatic deduplication (same content = same hash = stored once) β
β β’ Built-in integrity verification (hash IS the address) β
β β’ Immutable by design (changing content changes the address) β
β β’ Cache-friendly (content never changes for a given address) β
β β
β Used in: β
β β’ Git (blob objects addressed by SHA-1/SHA-256) β
β β’ Docker (image layers addressed by SHA-256) β
β β’ IPFS (blocks addressed by multihash) β
β β’ Nix package manager β
β β’ Content delivery networks β
β β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
8.1 CAS Implementation¶
import hashlib
import json
import os
from pathlib import Path
from typing import Optional
from dataclasses import dataclass, field
@dataclass
class CASStats:
total_objects: int = 0
total_bytes: int = 0
deduplicated_bytes: int = 0
class ContentAddressableStore:
"""
A simple content-addressable store using SHA-256.
Similar to how Git stores objects.
"""
def __init__(self, store_dir: str):
self.store_dir = Path(store_dir)
self.store_dir.mkdir(parents=True, exist_ok=True)
def _hash_content(self, data: bytes) -> str:
"""Compute SHA-256 hash of content."""
return hashlib.sha256(data).hexdigest()
def _object_path(self, content_hash: str) -> Path:
"""
Get the filesystem path for an object.
Uses first 2 chars as directory (like Git) to avoid
too many files in a single directory.
"""
return self.store_dir / content_hash[:2] / content_hash[2:]
def put(self, data: bytes) -> str:
"""
Store data and return its content hash.
If data already exists, this is a no-op (deduplication).
"""
content_hash = self._hash_content(data)
obj_path = self._object_path(content_hash)
if not obj_path.exists():
obj_path.parent.mkdir(parents=True, exist_ok=True)
obj_path.write_bytes(data)
return content_hash
def get(self, content_hash: str) -> Optional[bytes]:
"""Retrieve data by its content hash."""
obj_path = self._object_path(content_hash)
if not obj_path.exists():
return None
data = obj_path.read_bytes()
# Verify integrity on read
actual_hash = self._hash_content(data)
if actual_hash != content_hash:
raise RuntimeError(
f"Integrity error! Expected {content_hash}, got {actual_hash}"
)
return data
def exists(self, content_hash: str) -> bool:
"""Check if an object exists."""
return self._object_path(content_hash).exists()
def delete(self, content_hash: str) -> bool:
"""Delete an object (use with care in production)."""
obj_path = self._object_path(content_hash)
if obj_path.exists():
obj_path.unlink()
# Clean up empty directory
try:
obj_path.parent.rmdir()
except OSError:
pass
return True
return False
def stats(self) -> CASStats:
"""Compute storage statistics."""
stats = CASStats()
for subdir in self.store_dir.iterdir():
if subdir.is_dir():
for obj_file in subdir.iterdir():
stats.total_objects += 1
stats.total_bytes += obj_file.stat().st_size
return stats
# Usage
store = ContentAddressableStore("/tmp/cas_demo")
# Store some data
data1 = b"Hello, content-addressable world!"
data2 = b"Another piece of data"
data3 = b"Hello, content-addressable world!" # Duplicate of data1!
hash1 = store.put(data1)
hash2 = store.put(data2)
hash3 = store.put(data3)
print(f"Data 1 hash: {hash1}")
print(f"Data 2 hash: {hash2}")
print(f"Data 3 hash: {hash3}")
print(f"Data 1 == Data 3 hash? {hash1 == hash3}") # True - deduplication!
# Retrieve by hash
retrieved = store.get(hash1)
print(f"\nRetrieved: {retrieved.decode()}")
print(f"Integrity: {retrieved == data1}")
# Stats
stats = store.stats()
print(f"\nStore stats:")
print(f" Objects: {stats.total_objects}") # 2, not 3 (dedup!)
print(f" Total bytes: {stats.total_bytes}")
# Clean up
import shutil
shutil.rmtree("/tmp/cas_demo")
8.2 Git-Style Object Store¶
import hashlib
import zlib
import os
from pathlib import Path
class GitObjectStore:
"""
Simplified Git object store.
Git stores objects as: header + content, compressed with zlib.
Header format: "<type> <size>\0"
"""
def __init__(self, git_dir: str):
self.objects_dir = Path(git_dir) / "objects"
self.objects_dir.mkdir(parents=True, exist_ok=True)
def hash_object(self, data: bytes, obj_type: str = "blob") -> str:
"""Hash an object (like 'git hash-object')."""
header = f"{obj_type} {len(data)}\0".encode()
full_content = header + data
return hashlib.sha1(full_content).hexdigest() # Git uses SHA-1 (transitioning to SHA-256)
def write_object(self, data: bytes, obj_type: str = "blob") -> str:
"""Write an object to the store (like 'git hash-object -w')."""
header = f"{obj_type} {len(data)}\0".encode()
full_content = header + data
obj_hash = hashlib.sha1(full_content).hexdigest()
# Store compressed
obj_path = self.objects_dir / obj_hash[:2] / obj_hash[2:]
if not obj_path.exists():
obj_path.parent.mkdir(parents=True, exist_ok=True)
compressed = zlib.compress(full_content)
obj_path.write_bytes(compressed)
return obj_hash
def read_object(self, obj_hash: str) -> tuple:
"""Read an object (like 'git cat-file')."""
obj_path = self.objects_dir / obj_hash[:2] / obj_hash[2:]
if not obj_path.exists():
raise FileNotFoundError(f"Object {obj_hash} not found")
compressed = obj_path.read_bytes()
full_content = zlib.decompress(compressed)
# Parse header
null_pos = full_content.index(b'\0')
header = full_content[:null_pos].decode()
obj_type, size = header.split(' ')
data = full_content[null_pos + 1:]
assert len(data) == int(size), "Size mismatch"
return obj_type, data
# Usage
store = GitObjectStore("/tmp/git_demo/.git")
# Store a blob (file content)
content = b"print('Hello, World!')\n"
blob_hash = store.write_object(content, "blob")
print(f"Blob hash: {blob_hash}")
# Read it back
obj_type, data = store.read_object(blob_hash)
print(f"Type: {obj_type}")
print(f"Content: {data.decode()}", end="")
# Store a tree (directory listing) - simplified
tree_content = f"100644 hello.py\0".encode() + bytes.fromhex(blob_hash)
tree_hash = store.write_object(tree_content, "tree")
print(f"\nTree hash: {tree_hash}")
# Clean up
import shutil
shutil.rmtree("/tmp/git_demo")
9. Timing Attacks and Constant-Time Comparison¶
9.1 The Problem¶
When comparing two strings character by character, the time taken depends on where the first difference occurs. An attacker can measure response times to determine how many characters of a secret value they have guessed correctly.
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Timing Attack on String Comparison β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β Secret MAC: "a3f2b8c9d1e4" β
β β
β Attempt 1: "x3f2b8c9d1e4" β Fails at position 0 β ~100 ns β
β Attempt 2: "a4f2b8c9d1e4" β Fails at position 1 β ~110 ns β
β Attempt 3: "a3g2b8c9d1e4" β Fails at position 2 β ~120 ns β
β β
β The attacker notices each correct character adds ~10 ns. β
β By trying all values for each position, they can recover β
β the entire MAC one character at a time: O(n Γ 16) instead of β
β O(16^n). β
β β
β For a 32-character hex MAC: β
β Brute force: 16^32 β 3.4 Γ 10^38 attempts β
β Timing attack: 32 Γ 16 = 512 attempts (!) β
β β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
9.2 Vulnerable vs Secure Comparison¶
import hmac
import hashlib
import time
import os
# VULNERABLE: Early-exit string comparison
def insecure_compare(a: str, b: str) -> bool:
"""
DO NOT USE! Vulnerable to timing attacks.
Returns False as soon as a mismatch is found.
"""
if len(a) != len(b):
return False
for x, y in zip(a, b):
if x != y:
return False # Early exit leaks information!
return True
# SECURE: Constant-time comparison
def secure_compare(a: str, b: str) -> bool:
"""
Constant-time comparison. Takes the same time regardless
of where (or if) the strings differ.
"""
return hmac.compare_digest(a, b)
# Demonstrate the timing difference
secret_mac = hashlib.sha256(b"secret-key" + b"message").hexdigest()
# Generate test MACs with increasing correct prefix length
test_macs = []
for i in range(0, len(secret_mac), 4):
correct_prefix = secret_mac[:i]
wrong_suffix = "0" * (len(secret_mac) - i)
test_macs.append(correct_prefix + wrong_suffix)
print("Timing Attack Demonstration")
print("=" * 60)
print(f"Secret MAC: {secret_mac[:32]}...")
print()
# Measure insecure comparison times
print("INSECURE comparison (early exit):")
for mac in test_macs[:8]:
correct_chars = sum(a == b for a, b in zip(mac, secret_mac))
times = []
for _ in range(10000):
start = time.perf_counter_ns()
insecure_compare(mac, secret_mac)
elapsed = time.perf_counter_ns() - start
times.append(elapsed)
avg = sum(times) / len(times)
print(f" {correct_chars:2d} correct chars β avg {avg:6.0f} ns")
print()
# Measure secure comparison times
print("SECURE comparison (constant-time):")
for mac in test_macs[:8]:
correct_chars = sum(a == b for a, b in zip(mac, secret_mac))
times = []
for _ in range(10000):
start = time.perf_counter_ns()
secure_compare(mac, secret_mac)
elapsed = time.perf_counter_ns() - start
times.append(elapsed)
avg = sum(times) / len(times)
print(f" {correct_chars:2d} correct chars β avg {avg:6.0f} ns")
print()
print("Note: Secure comparison time should be roughly constant")
print("regardless of how many characters match.")
9.3 Rules for Timing-Safe Code¶
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Rules for Constant-Time Operations β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β 1. NEVER use == to compare secrets (MACs, tokens, passwords) β
β USE: hmac.compare_digest() or secrets.compare_digest() β
β β
β 2. NEVER return early based on secret data β
β BAD: if mac[i] != expected[i]: return False β
β GOOD: result |= (mac[i] ^ expected[i]) # accumulate diffs β
β β
β 3. NEVER branch on secret data β
β BAD: if secret_key[0] == 'a': ... β
β GOOD: Use constant-time selection / masking β
β β
β 4. NEVER index arrays with secret values β
β BAD: table[secret_byte] # cache timing leak β
β GOOD: Use constant-time lookup tables β
β β
β 5. ALWAYS verify MACs before decrypting β
β This prevents padding oracle attacks (Encrypt-then-MAC) β
β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
10. Hash Function Attacks and Mitigations¶
10.1 Known Attacks¶
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Hash Function Attack Summary β
ββββββββββββββββββββββ¬βββββββββββββββββββββββββββββββββββββββββββββββββ€
β Attack β Description and Status β
ββββββββββββββββββββββΌβββββββββββββββββββββββββββββββββββββββββββββββββ€
β Birthday attack β Find ANY collision in O(2^(n/2)) β
β β SHA-256: 2^128 (safe), MD5: 2^64 (broken) β
ββββββββββββββββββββββΌβββββββββββββββββββββββββββββββββββββββββββββββββ€
β Length-extension β Given H(m), compute H(m||pad||suffix) without β
β β knowing m. Affects SHA-256, SHA-512. β
β β NOT: SHA-3, BLAKE2, HMAC, SHA-512/256 β
ββββββββββββββββββββββΌβββββββββββββββββββββββββββββββββββββββββββββββββ€
β Rainbow tables β Precomputed hash-to-password lookup tables. β
β β Defeated by salting passwords. β
ββββββββββββββββββββββΌβββββββββββββββββββββββββββββββββββββββββββββββββ€
β Dictionary attack β Try common passwords against a hash. β
β β Mitigated by slow password hashes (Argon2). β
ββββββββββββββββββββββΌβββββββββββββββββββββββββββββββββββββββββββββββββ€
β GPU brute force β GPUs can compute billions of hashes/second. β
β β Mitigated by memory-hard hashes (Argon2id). β
ββββββββββββββββββββββΌβββββββββββββββββββββββββββββββββββββββββββββββββ€
β SHAttered (2017) β Practical SHA-1 collision found by Google. β
β β Two PDFs with same SHA-1 but different content β
β β SHA-1 is now considered broken for security. β
ββββββββββββββββββββββΌβββββββββββββββββββββββββββββββββββββββββββββββββ€
β Multi-collision β Find many messages with the same hash. β
β β Easier than expected for iterative hashes. β
β β SHA-3's sponge construction is more resistant. β
ββββββββββββββββββββββ΄βββββββββββββββββββββββββββββββββββββββββββββββββ
10.2 Length-Extension Attack¶
# Length-extension attack demonstration
# This is why you should use HMAC instead of H(key || message)
import hashlib
import struct
def sha256_pad(message_len: int) -> bytes:
"""Compute SHA-256 padding for a message of given length."""
bit_len = message_len * 8
# Padding: 1 bit, then zeros, then 64-bit length
padding = b'\x80'
padding += b'\x00' * ((56 - (message_len + 1) % 64) % 64)
padding += struct.pack('>Q', bit_len)
return padding
# The vulnerability: H(secret || message) is NOT a secure MAC
#
# If an attacker knows:
# - H(secret || message) (the MAC)
# - len(secret) (or can guess it)
# - message (the original message)
#
# They can compute H(secret || message || padding || extension)
# WITHOUT knowing the secret!
#
# This is because SHA-256 processes data in blocks, and the hash
# state after processing (secret || message || padding) is exactly
# the public hash value. The attacker can resume hashing from there.
print("Length-Extension Attack Concept:")
print("=" * 60)
print()
print("VULNERABLE construction: MAC = SHA-256(secret || message)")
print("An attacker who knows MAC and len(secret) can extend the message.")
print()
print("SAFE alternatives:")
print(" 1. HMAC-SHA256(key, message) β HMAC construction")
print(" 2. SHA-3-256(key || message) β SHA-3 is not vulnerable")
print(" 3. BLAKE2b(message, key=key) β Built-in keying")
print(" 4. SHA-512/256(key || message) β Truncated hash")
11. Exercises¶
Exercise 1: Hash Explorer (Beginner)¶
Write a Python program that: 1. Takes a filename as input 2. Computes SHA-256, SHA-3-256, BLAKE2b-256, and BLAKE3 hashes 3. Displays all hashes and the time taken for each 4. Compares the speed of each algorithm on a 100 MB file 5. Verifies the file has not been modified by comparing hashes
Exercise 2: Password Cracker Defense (Intermediate)¶
Implement a password cracking simulation: 1. Create a list of 1000 "user accounts" with passwords hashed using: a. Plain SHA-256 (no salt) b. SHA-256 with unique salt c. bcrypt (cost=10) d. Argon2id (default parameters) 2. Attempt to crack all passwords using a dictionary of 10,000 common passwords 3. Measure and compare the time needed for each hashing method 4. Generate a report showing cracked vs uncracked percentages
Exercise 3: Merkle Tree File Verifier (Intermediate)¶
Build a file integrity verification system using Merkle trees: 1. Given a directory, compute a Merkle tree over all files (sorted by path) 2. Store the Merkle root and tree structure 3. Later, verify any individual file by recomputing only O(log n) hashes 4. Generate a Merkle proof that can be independently verified 5. Handle file additions and deletions efficiently
Exercise 4: HMAC-based API Authentication (Intermediate)¶
Implement a complete API authentication system: 1. Server issues API key + secret to each client 2. Client signs each request with HMAC-SHA256: - Include: method, path, timestamp, body hash, nonce 3. Server verifies: valid signature, fresh timestamp (within 5 min), unused nonce 4. Implement replay attack protection using a nonce cache 5. Handle clock skew between client and server
Exercise 5: Content-Addressable File Sync (Advanced)¶
Build a simplified file synchronization system (like rsync with content addressing): 1. Both sides maintain a CAS of their files 2. To sync, exchange only the list of content hashes 3. Transfer only the blocks that the other side is missing 4. Verify integrity of all received blocks 5. Use a Merkle tree to efficiently detect which parts of large files differ
Exercise 6: Timing Attack Lab (Advanced)¶
Build a controlled timing attack:
1. Create a server that verifies API tokens using insecure comparison (==)
2. Write a client that measures response times to guess the token one character at a time
3. Demonstrate that the attack recovers the full token
4. Fix the server to use hmac.compare_digest()
5. Show that the attack no longer works
6. Discuss other timing side channels (cache timing, power analysis)
References¶
- Aumasson, J.P. (2017). Serious Cryptography. No Starch Press.
- NIST FIPS 180-4: Secure Hash Standard (SHA-2)
- NIST FIPS 202: SHA-3 Standard
- RFC 7693: The BLAKE2 Cryptographic Hash
- BLAKE3 Specification - https://github.com/BLAKE3-team/BLAKE3-specs
- RFC 2104: HMAC: Keyed-Hashing for Message Authentication
- OWASP Password Storage Cheat Sheet - https://cheatsheetseries.owasp.org/cheatsheets/Password_Storage_Cheat_Sheet.html
- Argon2 Reference - https://github.com/P-H-C/phc-winner-argon2
- Merkle, R.C. (1979). "A Certified Digital Signature"
Previous: 02. Cryptography Basics | Next: 04. TLS/SSL and Public Key Infrastructure