Database Scaling
Database Scaling¶
Overview¶
This document covers database scaling strategies. You will learn the difference between partitioning and sharding, various sharding strategies (Range, Hash, Directory), shard key selection, hotspot prevention, and rebalancing.
Difficulty: βββ Estimated Learning Time: 2-3 hours Prerequisites: 07_Distributed_Cache_Systems.md, PostgreSQL Folder
Table of Contents¶
- The Need for Database Scaling
- Partitioning vs Sharding
- Sharding Strategies
- Shard Key Selection
- Hotspot Prevention
- Rebalancing
- Practice Problems
- Next Steps
- References
1. The Need for Database Scaling¶
1.1 Limitations of a Single Database¶
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Single Database Limitations β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β Issues with Traffic/Data Growth: β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β β β
β β 1. Storage Capacity Limits β β
β β β’ Physical limits of single disk/server β β
β β β’ Difficult to manage data exceeding tens of TB β β
β β β β
β β 2. Throughput Limits β β
β β β’ Single server CPU/memory limitations β β
β β β’ Limited concurrent connections β β
β β β β
β β 3. Increased Response Time β β
β β β’ Large tables β Slow queries β β
β β β’ Growing index sizes β β
β β β β
β β 4. Single Point of Failure (SPOF) β β
β β β’ Server failure = Entire service failure β β
β β β β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
β Solutions: β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β β β
β β Vertical Scaling (Scale Up) β β
β β β More powerful hardware (has limits) β β
β β β β
β β Horizontal Scaling (Scale Out) β β
β β β Distribute across multiple servers (partitioning/ β β
β β sharding) β β
β β β β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
1.2 Scaling Strategy Overview¶
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Database Scaling Strategies β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β β β
β β 1. Read Replica β β
β β β’ Distribute read load β β
β β β’ Covered in detail in next lesson β β
β β β β
β β 2. Partitioning β β
β β β’ Split tables within a single DB β β
β β β’ PostgreSQL native partitioning β β
β β β β
β β 3. Sharding β β
β β β’ Distribute data across multiple DB servers β β
β β β’ Core of horizontal scaling β β
β β β β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
β [Single DB] β
β β β
β ββββββββββββββββΌβββββββββββββββ β
β βΌ βΌ βΌ β
β [Replication] [Partitioning] [Sharding] β
β Read scaling Single DB split Multi-DB distributed β
β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
2. Partitioning vs Sharding¶
2.1 Partitioning¶
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Partitioning β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β "Splitting a table into multiple partitions within a single β
β DB server" β
β β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β β β
β β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β
β β β Database Server β β β
β β β β β β
β β β βββββββββββββββββββββββββββββββββββββββββββββββββββ β β β
β β β β orders (logical table) β β β β
β β β βββββββββββββββββββββββββββββββββββββββββββββββββββ β β β
β β β β β β β
β β β ββββββββββββββββΌβββββββββββββββ β β β
β β β βΌ βΌ βΌ β β β
β β β βββββββββββββ βββββββββββββ βββββββββββββ β β β
β β β βorders_2022β βorders_2023β βorders_2024β β β β
β β β β(partition1)β β(partition2)β β(partition3)β β β β
β β β βββββββββββββ βββββββββββββ βββββββββββββ β β β
β β β β β β
β β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β
β β β β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
β PostgreSQL Example: β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β -- Create partitioned table β β
β β CREATE TABLE orders ( β β
β β id SERIAL, β β
β β order_date DATE, β β
β β amount DECIMAL β β
β β ) PARTITION BY RANGE (order_date); β β
β β β β
β β -- Create partition β β
β β CREATE TABLE orders_2024 PARTITION OF orders β β
β β FOR VALUES FROM ('2024-01-01') TO ('2025-01-01'); β β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
β Advantages: β
β β’ Improved query performance (partition pruning) β
β β’ Easier data management (delete/backup per partition) β
β β’ Reduced index size β
β β
β Limitations: β
β β’ Still a single server (capacity, throughput limits) β
β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
2.2 Sharding¶
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Sharding β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β "Distributing data across multiple DB servers" β
β β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β β β
β β βββββββββββββββββββββββββββββββββββββββββββ β β
β β β Application β β β
β β ββββββββββββββββββββ¬βββββββββββββββββββββββ β β
β β β β β
β β ββββββββββ΄βββββββββ β β
β β β Shard Router β β β
β β β (Shard Selector)β β β
β β ββββββββββ¬βββββββββ β β
β β β β β
β β ββββββββββββββββββββΌβββββββββββββββββββ β β
β β β β β β β
β β βΌ βΌ βΌ β β
β β ββββββββββββββ ββββββββββββββ ββββββββββββββ β β
β β β Shard 1 β β Shard 2 β β Shard 3 β β β
β β β (Server 1) β β (Server 2) β β (Server 3) β β β
β β β β β β β β β β
β β β user_id β β user_id β β user_id β β β
β β β 1-1000000 β β 1000001- β β 2000001- β β β
β β β β β 2000000 β β 3000000 β β β
β β ββββββββββββββ ββββββββββββββ ββββββββββββββ β β
β β β β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
β Advantages: β
β β’ Theoretically unlimited scaling β
β β’ Reduced load on each shard β
β β’ Fault isolation β
β β
β Disadvantages: β
β β’ Increased complexity β
β β’ Difficult cross-shard queries β
β β’ Transaction constraints β
β β’ Difficult rebalancing β
β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
2.3 Comparison¶
| Aspect | Partitioning | Sharding |
|---|---|---|
| Location | Within single server | Across multiple servers |
| Scalability | Limited | High |
| Complexity | Low | High |
| Transactions | Full support | Within shard only |
| JOIN | Possible | Cross-shard difficult |
| Management | DB handles automatically | Application manages |
3. Sharding Strategies¶
3.1 Range-Based Sharding¶
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Range-Based Sharding β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β "Determine shard based on value ranges" β
β β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β β β
β β user_id based Range sharding: β β
β β β β
β β βββββββββββββββ βββββββββββββββ βββββββββββββββ β β
β β β Shard 1 β β Shard 2 β β Shard 3 β β β
β β β β β β β β β β
β β β user_id β β user_id β β user_id β β β
β β β 1 ~ 1M β β 1M+1 ~ 2M β β 2M+1 ~ 3M β β β
β β β β β β β β β β
β β βββββββββββββββ βββββββββββββββ βββββββββββββββ β β
β β β β
β β Date based Range sharding: β β
β β β β
β β βββββββββββββββ βββββββββββββββ βββββββββββββββ β β
β β β Shard 1 β β Shard 2 β β Shard 3 β β β
β β β β β β β β β β
β β β 2022 β β 2023 β β 2024 β β β
β β β data β β data β β data β β β
β β β β β β β β β β
β β βββββββββββββββ βββββββββββββββ βββββββββββββββ β β
β β β β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
β Routing Logic: β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β def get_shard(user_id): β β
β β if user_id <= 1_000_000: β β
β β return "shard_1" β β
β β elif user_id <= 2_000_000: β β
β β return "shard_2" β β
β β else: β β
β β return "shard_3" β β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
β Advantages: β
β β’ Efficient range queries β
β β’ Simple implementation β
β β
β Disadvantages: β
β β’ Possible hotspots (new users concentrated in last shard) β
β β’ Uneven distribution possible β
β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
3.2 Hash-Based Sharding¶
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Hash-Based Sharding β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β "Determine shard by hash value of the key" β
β β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β β β
β β shard = hash(user_id) % N β β
β β β β
β β user_id: 12345 β β
β β hash(12345) = 67890 β β
β β 67890 % 3 = 0 β Shard 0 β β
β β β β
β β user_id: 12346 β β
β β hash(12346) = 12347 β β
β β 12347 % 3 = 1 β Shard 1 β β
β β β β
β β βββββββββββββββ βββββββββββββββ βββββββββββββββ β β
β β β Shard 0 β β Shard 1 β β Shard 2 β β β
β β β β β β β β β β
β β β user: 12345 β β user: 12346 β β user: 12347 β β β
β β β user: 12348 β β user: 12349 β β user: 12350 β β β
β β β ... β β ... β β ... β β β
β β β β β β β β β β
β β βββββββββββββββ βββββββββββββββ βββββββββββββββ β β
β β β β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
β Routing Logic: β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β def get_shard(user_id, num_shards=3): β β
β β hash_value = hash(str(user_id)) β β
β β shard_id = hash_value % num_shards β β
β β return f"shard_{shard_id}" β β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
β Advantages: β
β β’ Even distribution β
β β’ Hotspot prevention β
β β
β Disadvantages: β
β β’ Difficult range queries (requires querying all shards) β
β β’ Large-scale redistribution when adding/removing shards β
β β’ β Solved with consistent hashing β
β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
3.3 Directory-Based Sharding¶
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Directory-Based Sharding β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β "Determine shard using a lookup table (directory)" β
β β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β β β
β β Application β β
β β β β β
β β β user_id: 12345 β β
β β βΌ β β
β β ββββββββββββββββββββ β β
β β β Directory Serviceβ β β
β β β (Lookup Table) β β β
β β β β β β
β β β user_id β shard β β β
β β β βββββββββΌβββββββ β β β
β β β 12345 β shard_2β β β
β β β 12346 β shard_1β β β
β β β 12347 β shard_3β β β
β β β ... β ... β β β
β β βββββββββββ¬βββββββββ β β
β β β β β
β β β shard_2 β β
β β βΌ β β
β β βββββββββββββββ βββββββββββββββ βββββββββββββββ β β
β β β Shard 1 β β Shard 2 β β Shard 3 β β β
β β βββββββββββββββ ββββββββ¬βββββββ βββββββββββββββ β β
β β β β β
β β βΌ β β
β β user: 12345 β β
β β β β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
β Advantages: β
β β’ Flexible data movement β
β β’ Easy to resolve imbalances β
β β’ Can move specific users to specific shards β
β β
β Disadvantages: β
β β’ Directory service is SPOF β
β β’ Additional lookup overhead β
β β’ Directory size growth β
β β
β Solutions: β
β β’ Cache the directory (Redis) β
β β’ Replicate the directory β
β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
3.4 Strategy Comparison¶
| Strategy | Even Distribution | Range Queries | Rebalancing | Complexity |
|---|---|---|---|---|
| Range | Low | Good | Difficult | Low |
| Hash | Good | Difficult | Difficult | Low |
| Directory | Good | Possible | Easy | High |
| Consistent Hash | Good | Difficult | Easy | Medium |
4. Shard Key Selection¶
4.1 Characteristics of a Good Shard Key¶
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Good Shard Key Characteristics β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β 1. High Cardinality β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β β β
β β Good: user_id (millions of unique values) β β
β β Bad: country (dozens) β β
β β Bad: status (few) β β
β β β β
β β β Many unique values enable even distribution β β
β β β β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
β 2. Even Distribution β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β β β
β β Good: hashed user_id β β
β β Bad: created_date (new data concentrated on one shard) β β
β β Bad: popular product_id (queries concentrated on certain β β
β β products) β β
β β β β
β β β Data and queries should be evenly distributed β β
β β β β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
β 3. Query Pattern Alignment β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β β β
β β Most queries: "WHERE user_id = ?" β β
β β β Use user_id as shard key! β β
β β β β
β β Most queries: "WHERE order_date BETWEEN ... AND ..." β β
β β β Use order_date as shard key! β β
β β β β
β β β Frequent queries should be processed within a single β β
β β shard β β
β β β β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
β 4. Immutability β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β β β
β β Good: user_id (doesn't change) β β
β β Bad: email (user can change it) β β
β β Bad: status (changes) β β
β β β β
β β β Changing shard key = Data migration (expensive) β β
β β β β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
4.2 Shard Key Examples by Domain¶
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Shard Key Examples by Domain β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β Social Media (User-centric): β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β β’ users: user_id β β
β β β’ posts: user_id (author) β β
β β β’ followers: user_id (the one being followed) β β
β β β’ messages: conversation_id β β
β β β β
β β β User's data stays in the same shard! β β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
β E-commerce (Tenant-centric): β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β β’ products: merchant_id β β
β β β’ orders: user_id or order_id β β
β β β’ reviews: product_id β β
β β β β
β β Note: If orders use user_id, β β
β β seller order queries require all shards β β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
β SaaS (Tenant-centric): β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β β’ All tables: tenant_id β β
β β β β
β β β Complete isolation per tenant β β
β β β No cross-tenant queries needed β β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
β Logs/Analytics (Time-series): β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β β’ logs: timestamp (Range sharding) β β
β β β’ metrics: (device_id, timestamp) β β
β β β β
β β β Efficient for range queries β β
β β β Easy to delete old shards β β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
5. Hotspot Prevention¶
5.1 The Hotspot Problem¶
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β The Hotspot Problem β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β "A phenomenon where traffic/data concentrates on a specific β
β shard" β
β β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β β β
β β Normal: β β
β β βββββββββ βββββββββ βββββββββ β β
β β βShard 1β βShard 2β βShard 3β β β
β β β 33% β β 33% β β 33% β β β
β β βββββββββ βββββββββ βββββββββ β β
β β β β
β β Hotspot: β β
β β βββββββββ βββββββββ βββββββββ β β
β β βShard 1β βShard 2β βShard 3β β β
β β β 80% β β 10% β β 10% β Overloaded! β β
β β βββββββββ βββββββββ βββββββββ β β
β β β β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
β Causes: β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β β β
β β 1. Range sharding + Sequential keys β β
β β β New data concentrated in last shard β β
β β β β
β β 2. Popular entities β β
β β β Celebrity accounts, popular products β β
β β β β
β β 3. Time-based patterns β β
β β β Specific shard concentration at certain times β β
β β β β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
5.2 Hotspot Prevention Strategies¶
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Hotspot Prevention Strategies β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β 1. Use Hash-Based Sharding β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β β β
β β Use Hash instead of Range for even distribution β β
β β user_id: 1, 2, 3 β distributed, not in same shard β β
β β β β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
β 2. Composite Shard Key β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β β β
β β Single key: user_id β β
β β β Popular users become hotspots β β
β β β β
β β Composite key: (user_id, post_id) β β
β β β Same user's posts are also distributed β β
β β β β
β β shard = hash(user_id + "_" + post_id) % N β β
β β β β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
β 3. Salting β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β β β
β β Original key: celebrity_123 β β
β β Salting: celebrity_123_0 β β
β β celebrity_123_1 β β
β β celebrity_123_2 β β
β β β β
β β # Write: Add random salt β β
β β salt = random(0, 10) β β
β β key = f"{user_id}_{salt}" β β
β β β β
β β # Read: Query all salts and combine β β
β β for salt in range(10): β β
β β results += query(f"{user_id}_{salt}") β β
β β β β
β β β Write distribution, increased read complexity β β
β β β β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
β 4. Hot Entity Special Handling β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β β β
β β β’ Cache popular entities (Redis) β β
β β β’ Dedicated shard/server β β
β β β’ Async processing (counters, etc.) β β
β β β β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
6. Rebalancing¶
6.1 When Rebalancing is Needed¶
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β When Rebalancing is Needed β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β 1. Adding Shards (Expansion) β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β β β
β β Before: [Shard 1] [Shard 2] [Shard 3] β β
β β After: [Shard 1] [Shard 2] [Shard 3] [Shard 4] β β
β β β β
β β Move some data to new shard β β
β β β β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
β 2. Removing Shards (Contraction) β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β β β
β β Before: [Shard 1] [Shard 2] [Shard 3] β β
β β After: [Shard 1] [Shard 2] β β
β β β β
β β Move Shard 3 data to other shards β β
β β β β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
β 3. Resolving Imbalance β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β β β
β β Before: [Shard 1: 80%] [Shard 2: 10%] [Shard 3: 10%] β β
β β After: [Shard 1: 33%] [Shard 2: 33%] [Shard 3: 33%] β β
β β β β
β β Move data from Shard 1 to other shards β β
β β β β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
6.2 Rebalancing Strategies¶
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Rebalancing Strategies β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β 1. Consistent Hashing β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β β β
β β Minimal data movement when adding/removing shards β β
β β Learned in previous lesson β β
β β β β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
β 2. Dual Write β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β β β
β β Phase 1: Add new shard, write to both β β
β β ββββββββ ββββββββ ββββββββ β β
β β βOld S1β βOld S2β βNew S3β β β
β β β R/W β β R/W β β W β β β
β β ββββββββ ββββββββ ββββββββ β β
β β β β
β β Phase 2: Migrate existing data (background) β β
β β β β
β β Phase 3: Read from new shard as well β β
β β ββββββββ ββββββββ ββββββββ β β
β β βShard1β βShard2β βShard3β β β
β β β R/W β β R/W β β R/W β β β
β β ββββββββ ββββββββ ββββββββ β β
β β β β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
β 3. Version-Based Routing β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β β β
β β shard_config_v1: β β
β β range 1-1M β shard_1 β β
β β range 1M-2M β shard_2 β β
β β β β
β β shard_config_v2: (after migration) β β
β β range 1-700K β shard_1 β β
β β range 700K-1.4M β shard_2 β β
β β range 1.4M-2M β shard_3 β β
β β β β
β β # New writes use v2, existing reads use v1 β β
β β # Switch to v2 after migration complete β β
β β β β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
7. Practice Problems¶
Problem 1: Choosing a Sharding Strategy¶
Choose the appropriate sharding strategy for the following services.
a) Chat app (messages per conversation room) b) Log analysis system c) Global user service d) SaaS multi-tenant
Problem 2: Shard Key Selection¶
Choose the shard key for the following tables in an e-commerce service.
- users (id, email, name)
- orders (id, user_id, status, created_at)
- order_items (id, order_id, product_id, quantity)
- products (id, merchant_id, name, price)
Requirements: - Frequent queries for user orders - Need to query products by seller - Orders and order items queried together
Problem 3: Hotspot Resolution¶
Comments are concentrating on popular posts, overloading that shard. Propose solutions.
Problem 4: Rebalancing Plan¶
You want to expand from 3 shards to 5 shards. Plan a zero-downtime migration.
Answers¶
Problem 1 Answer¶
a) Chat app: Hash(conversation_id)
- Messages in the same conversation room go to the same shard
- Efficient as queries are per conversation room
b) Log analysis: Range(timestamp)
- Time range queries are efficient
- Easy to delete old log shards
c) Global users: Hash(user_id)
- Even distribution
- Efficient per-user data queries
d) SaaS multi-tenant: tenant_id
- Complete isolation per tenant
- No cross-tenant queries needed
Problem 2 Answer¶
users: Hash(user_id)
- Use as primary key
- Even distribution
orders: Hash(user_id)
- user_id is the main query condition
- Efficient user order queries
order_items: Hash(order_id)
- Or Hash(user_id) (same shard as orders)
- Using order_id may put it on different shard than orders
- β Recommend user_id (same shard as orders)
products: Hash(merchant_id)
- Query products by seller
- Seller unit isolation
Cross-shard considerations:
- Order product info β Use cache
- Seller order queries β Separate index table
Problem 3 Answer¶
Solutions:
1. Hot post caching
- Cache comments in Redis
- Batch save to DB periodically
2. Apply salting
post_123_0, post_123_1, ...
- Random salt on write
- Query all salts and combine on read
3. Separate comment counters
- Manage counters in Redis only
- Async DB updates
4. Change shard key
- Use (post_id, comment_id) composite key
- Same post's comments are also distributed
5. Dedicated handling
- Detect popular posts
- Add dedicated cache layer
Problem 4 Answer¶
Zero-downtime migration plan:
Phase 1: Prepare new shards
- Prepare Shard 4, 5 servers
- Create schema
Phase 2: Start dual write
- Update router
- New writes β Old shards + New shards (based on consistent hashing)
Phase 3: Migrate existing data
- Background copy
- Determine targets by consistent hashing (about 40% moves)
Phase 4: Verification
- Compare data counts
- Verify sample data
Phase 5: Switch reads
- Activate new routing rules
- Gradual switch (10% β 50% β 100%)
Phase 6: Stop dual write
- Remove old routing
Phase 7: Cleanup
- Delete migrated data from old shards
8. Next Steps¶
After understanding database scaling, learn about database replication.
Next Lesson¶
Related Lessons¶
- 07_Distributed_Cache_Systems.md - Consistent Hashing
- PostgreSQL/18_Table_Partitioning.md
Recommended Practice¶
- PostgreSQL partitioning hands-on
- Implement a sharding router
- Implement consistent hashing
9. References¶
Books¶
- Designing Data-Intensive Applications - Ch. 6
Databases¶
- Vitess - MySQL Sharding
- Citus - PostgreSQL Sharding
- MongoDB Sharding
Case Studies¶
Document Information - Last Updated: 2024 - Difficulty: βββ - Estimated Learning Time: 2-3 hours