System Design Overview
Overview¶
This document covers the basic concepts of System Design and interview approaches. You'll learn the foundational framework for designing large-scale systems and back-of-the-envelope calculation methods.
Difficulty: ⭐
Estimated Study Time: 2 hours
Prerequisites: Programming basics, basic web service concepts
Table of Contents¶
- What is System Design?
- Interview Evaluation Criteria
- Problem Approach Framework
- Back-of-the-envelope Calculations
- Commonly Used Numbers
- Practice Problems
- Next Steps
- References
1. What is System Design?¶
1.1 Definition¶
System design is the process of defining the architecture of a complex software system. The goal is to analyze requirements, design components, and build a system that is scalable and reliable.
 Requirements    →    Architecture    →   Implementation

┌────────────┐     ┌────────────┐     ┌────────────┐
│ Functional │     │ Component  │     │ Code       │
│ Req.       │ ──▶ │ Design     │ ──▶ │ Impl.      │
│            │     │            │     │            │
│ Non-Func.  │     │ Data       │     │ Testing    │
│ Req.       │     │ Flow       │     │            │
└────────────┘     └────────────┘     └────────────┘

System Design = Requirements → Architecture Decisions
1.2 Why System Design is Important¶
1. Scalability: the system must handle growth from 100 to 1,000,000 users.
2. Reliability: the service must keep running when servers fail, with no data loss.
3. Maintainability: new features should be easy to add, and bug fixes shouldn't affect other parts.
4. Performance: fast response times and sufficient throughput.
1.3 System Design vs Coding Interview¶
| Category | Coding Interview | System Design Interview |
|---|---|---|
| Purpose | Evaluate algorithm skills | Evaluate architecture design skills |
| Answer | Clear correct answer exists | Multiple answers possible (trade-offs) |
| Format | Code writing | Whiteboard/diagrams |
| Evaluation | Correctness, efficiency | Thought process, communication |
| Level | Junior to Senior | Mainly Senior+ |
2. Interview Evaluation Criteria¶
2.1 Key Evaluation Areas¶
Interviewers typically evaluate four areas, in this order:

1. Problem Scoping
- Do you clarify requirements?
- Do you ask appropriate questions?
- Do you state assumptions?

2. High-Level Design
- Do you identify the main components?
- Is the data flow clear?
- Do you design APIs appropriately?

3. Deep Dive
- Do you thoroughly cover the core components?
- Is the data model appropriate?
- Do you identify potential bottlenecks?

4. Trade-offs
- Do you know the pros and cons of each option?
- Do you explain why you chose a specific technology?
- Do you consider constraints?
2.2 Expectations by Level¶
Junior (0-2 years)
- Understand basic components (web server, DB, cache)
- Explain the data flow of simple systems
- Basic API design

Mid-level (2-5 years)
- Design scalable architecture
- Apply caching and load balancing
- Database selection and schema design
- Basic trade-off discussion

Senior (5+ years)
- Design large-scale distributed systems
- Analyze complex trade-offs
- Consider failure recovery and security
- Cost optimization
- Microservices architecture
3. Problem Approach Framework¶
3.1 4-Step Approach¶
STEP 1: Clarify Requirements (5 min)

Example prompt: "Design a Twitter-like service"

Questions to ask:
- Core features? (tweets, timeline, follow?)
- User scale? (DAU 1M? 100M?)
- Read/write ratio? (typically 100:1)
- Media support? (images, videos)
- Real-time notifications needed?

STEP 2: Scale Estimation (5 min)
- Calculate QPS (Queries Per Second)
- Estimate storage capacity
- Calculate bandwidth
- Estimate server count

STEP 3: High-Level Design (15-20 min)
- System architecture diagram
- Main components (client, server, DB, cache, etc.)
- Data flow
- API design

STEP 4: Detailed Design (15-20 min)
- Database schema
- Core algorithms/data structures
- Scaling strategies (sharding, replication)
- Resolve bottlenecks
- Failure handling
3.2 Example Requirements Clarification Questions¶
Functional Requirements:
- What are the 3 core features of this system?
- What actions can users perform?
- Which clients must be supported: mobile, web, or API?
- Is authentication/authorization needed?
- Is search functionality required?

Non-Functional Requirements:
- Expected user count? (DAU, MAU)
- Response time requirements? (p99 < 200 ms?)
- Availability requirements? (99.9%? 99.99%?)
- Data consistency vs. availability: which is more important?
- Global or regional service?
3.3 High-Level Design Example¶
Twitter High-Level Design

┌─────────┐     ┌─────────────┐     ┌────────────────┐
│ Mobile  │     │             │     │                │
│ App     │────▶│ Load        │────▶│ Web/API        │
│         │     │ Balancer    │     │ Servers        │
└─────────┘     │             │     │                │
                └─────────────┘     └───────┬────────┘
┌─────────┐            │                    │
│ Web     │            │                    │
│ Browser │────────────┘                    │
│         │                                 ▼
└─────────┘                       ┌────────────────────┐
                                  │                    │
                  ┌───────────────┤   Service Layer    │
                  │               │                    │
                  │               └─────────┬──────────┘
                  │                         │
         ┌────────┴────────┐                │
         ▼                 ▼                ▼
   ┌────────────┐    ┌────────────┐   ┌────────────┐
   │ Cache      │    │ Database   │   │ Message    │
   │ (Redis)    │    │ (MySQL)    │   │ Queue      │
   └────────────┘    └────────────┘   └────────────┘
                           │
                           ▼
                     ┌────────────┐
                     │ Object     │
                     │ Storage    │
                     │ (S3)       │
                     └────────────┘
4. Back-of-the-envelope Calculations¶
4.1 QPS (Queries Per Second) Calculation¶
Example: Twitter tweet read QPS

Given:
- DAU (Daily Active Users): 300 million
- Average daily tweet views per user: 100

Calculation:
- Daily total views = 300,000,000 × 100 = 30,000,000,000
- Average QPS = 30,000,000,000 / 86,400 ≈ 350,000 QPS
- Peak QPS = Average QPS × 2 to 3 ≈ 700,000 to 1,000,000 QPS

Note: 86,400 = 24 hours × 60 minutes × 60 seconds
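The QPS arithmetic above can be reproduced in a few lines of Python. This is a minimal sketch: the function names `average_qps` and `peak_qps` are illustrative, and the peak factor is simply the 2 to 3 times rule of thumb used in the example, not a measured value.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_qps(daily_active_users: int, requests_per_user_per_day: float) -> float:
    """Average queries per second, assuming traffic is spread evenly over the day."""
    return daily_active_users * requests_per_user_per_day / SECONDS_PER_DAY


def peak_qps(avg_qps: float, peak_factor: float = 2.0) -> float:
    """Peak estimate: average QPS scaled by an assumed peak-to-average factor."""
    return avg_qps * peak_factor


if __name__ == "__main__":
    avg = average_qps(300_000_000, 100)  # the tweet-read example
    print(f"average: {avg:,.0f} QPS")
    print(f"peak (x2): {peak_qps(avg, 2):,.0f} QPS")
    print(f"peak (x3): {peak_qps(avg, 3):,.0f} QPS")
```

The exact average is a little under 350,000 QPS; rounding to a convenient figure is normal for this kind of estimate.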
4.2 Storage Capacity Calculation¶
Example: Twitter 5-year storage capacity

Given:
- DAU: 300 million
- 10% of users post each day, averaging 2 tweets per posting user
- Average tweet size: 250 bytes (text only)
- Image ratio: 20% of tweets, average image size: 500 KB

Calculation:
- Daily tweets = 300M × 10% × 2 = 60M

Text storage:
- Daily = 60M × 250 bytes = 15 GB
- Yearly = 15 GB × 365 ≈ 5.5 TB
- 5 years = 5.5 TB × 5 = 27.5 TB

Image storage:
- Daily = 60M × 20% × 500 KB = 6 TB
- Yearly = 6 TB × 365 ≈ 2.2 PB
- 5 years = 2.2 PB × 5 = 11 PB
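The same storage estimate can be written as a short Python sketch. The helper names (`items_per_day`, `storage_bytes`) are hypothetical, and the code assumes decimal units (1 KB = 1,000 bytes), which is the convention the rounded figures above follow.

```python
KB, GB, TB, PB = 10**3, 10**9, 10**12, 10**15  # decimal units (assumption)


def items_per_day(dau: int, active_fraction: float, items_per_active_user: float) -> float:
    """Items created per day when only a fraction of daily users create anything."""
    return dau * active_fraction * items_per_active_user


def storage_bytes(daily_items: float, bytes_per_item: float, days: int) -> float:
    """Raw storage for `days` of data, ignoring replication, indexes, and compression."""
    return daily_items * bytes_per_item * days


if __name__ == "__main__":
    tweets = items_per_day(300_000_000, 0.10, 2)             # ~60M tweets/day
    text_5y = storage_bytes(tweets, 250, 5 * 365)            # text only
    image_5y = storage_bytes(tweets * 0.20, 500 * KB, 5 * 365)
    print(f"text, 5 years:   {text_5y / TB:.1f} TB")
    print(f"images, 5 years: {image_5y / PB:.2f} PB")
```

The results differ slightly from the hand calculation only because the hand calculation rounds the yearly figures before multiplying by 5.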
4.3 Bandwidth Calculation¶
Example: Video streaming service

Given:
- Concurrent viewers: 1 million
- Average bitrate: 5 Mbps (1080p standard)

Calculation:
- Total required bandwidth = 1,000,000 × 5 Mbps = 5,000,000 Mbps = 5,000 Gbps = 5 Tbps
- Daily data transfer (assuming 100M daily viewers, each watching 2 hours on average) = 100M × 5 Mbps × 7,200 s = 3.6 × 10^18 bits = 4.5 × 10^17 bytes = 450 PB/day
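Bandwidth estimates mix bits and bytes, so unit slips are easy to make. The following Python sketch keeps the units explicit; the function names are illustrative, and it assumes every viewer streams at the same constant bitrate.

```python
def aggregate_bandwidth_bps(concurrent_viewers: int, bitrate_bps: float) -> float:
    """Total egress in bits per second if every viewer streams at `bitrate_bps`."""
    return concurrent_viewers * bitrate_bps


def transfer_bytes(bitrate_bps: float, duration_seconds: float, sessions: int = 1) -> float:
    """Bytes transferred by `sessions` streams of equal length (8 bits per byte)."""
    return bitrate_bps * duration_seconds * sessions / 8


if __name__ == "__main__":
    mbps = 1_000_000  # bits per second in one Mbps
    total = aggregate_bandwidth_bps(1_000_000, 5 * mbps)
    print(f"peak egress: {total / 1e12:.0f} Tbps")

    one_session = transfer_bytes(5 * mbps, 2 * 60 * 60)
    print(f"one 2-hour session at 5 Mbps: {one_session / 1e9:.1f} GB")
```

Multiplying the per-session figure by the number of daily sessions gives the daily transfer volume.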
5. Commonly Used Numbers¶
5.1 Powers of 2¶
| Power | Approximate | Name | Bytes |
|---|---|---|---|
| 2^10 | 1,000 | 1 Thousand | 1 KB |
| 2^20 | 1,000,000 | 1 Million | 1 MB |
| 2^30 | 1,000,000,000 | 1 Billion | 1 GB |
| 2^40 | 1,000,000,000,000 | 1 Trillion | 1 TB |
| 2^50 | 1,000,000,000,000,000 | 1 Quadrillion | 1 PB |
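The decimal values in this table are approximations, and the gap widens as the exponent grows. A small Python sketch (the helper name `approximation_error` is hypothetical) shows how far each power of two sits above its rounded decimal counterpart:

```python
def approximation_error(power: int) -> float:
    """Relative gap between 2**power and the matching power of 1,000.

    Assumes `power` is a multiple of 10 (2^10 ~ 10^3, 2^20 ~ 10^6, and so on).
    """
    exact = 2 ** power
    approx = 1000 ** (power // 10)
    return (exact - approx) / approx


if __name__ == "__main__":
    for power in (10, 20, 30, 40, 50):
        error = approximation_error(power)
        print(f"2^{power} is {error:.1%} above its decimal approximation")
```

The gap is 2.4% at 2^10 and grows to roughly 12.6% at 2^50, which is usually acceptable for back-of-the-envelope work.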
5.2 Time Unit Conversion¶
| Unit | Seconds | Approximation |
|---|---|---|
| 1 day | 86,400 | ≈ 100,000 |
| 1 week | 604,800 | ≈ 600,000 |
| 1 month (30 days) | 2,592,000 | ≈ 2.5 million |
| 1 year | 31,536,000 | ≈ 30 million |

For quick calculations:
- 1 day ≈ 10^5 seconds
- 1 year ≈ 3 × 10^7 seconds
5.3 Latency Comparison¶
| Operation | Latency |
|---|---|
| L1 cache reference | 0.5 ns |
| L2 cache reference | 7 ns |
| Main memory reference | 100 ns |
| SSD random read | 150 μs |
| HDD disk seek | 10 ms |
| Same datacenter network RTT | 0.5 ms |
| Different region network RTT | 150 ms |

Visualization (scaled so that 1 ns = 1 second):
- L1 cache: 0.5 seconds
- Main memory: 100 seconds (1 min 40 sec)
- SSD: 150,000 seconds (about 2 days)
- HDD: 10,000,000 seconds (about 4 months)
- Network (same DC): 500,000 seconds (about 6 days)
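The "1 ns = 1 second" scaling can be reproduced with a short Python sketch. The latency values are copied from the reference table above; the `humanize` helper is hypothetical and only picks a readable unit.

```python
# Latencies from the reference table, expressed in nanoseconds.
LATENCY_NS = {
    "L1 cache reference": 0.5,
    "L2 cache reference": 7,
    "Main memory reference": 100,
    "SSD random read": 150_000,
    "HDD disk seek": 10_000_000,
    "Same datacenter network RTT": 500_000,
    "Different region network RTT": 150_000_000,
}


def humanize(seconds: float) -> str:
    """Format a duration using the largest unit that fits."""
    units = (("years", 365 * 86_400), ("days", 86_400), ("hours", 3_600), ("minutes", 60))
    for name, size in units:
        if seconds >= size:
            return f"{seconds / size:.1f} {name}"
    return f"{seconds:.1f} seconds"


if __name__ == "__main__":
    for operation, latency_ns in LATENCY_NS.items():
        # With 1 ns scaled to 1 second, the nanosecond count is the second count.
        print(f"{operation:<30} {humanize(latency_ns)}")
```

On this scale a cross-region round trip takes several years, which is a useful reminder of why chatty cross-region calls are expensive.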
5.4 Availability Numbers (9's)¶
| Availability | Annual Downtime | Monthly Downtime |
|---|---|---|
| 99% (two 9s) | 3.65 days | 7.3 hours |
| 99.9% (three 9s) | 8.77 hours | 43.8 minutes |
| 99.99% (four 9s) | 52.6 minutes | 4.38 minutes |
| 99.999% (five 9s) | 5.26 minutes | 26 seconds |
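The downtime figures in this table follow directly from the availability percentage. A minimal Python sketch, assuming a 365.25-day year (which matches the 8.77-hour figure for three 9s) and a month of one twelfth of a year; the function name is illustrative:

```python
HOURS_PER_YEAR = 365.25 * 24          # 8,766 hours (assumed year length)
HOURS_PER_MONTH = HOURS_PER_YEAR / 12  # 730.5 hours


def downtime_hours(availability_percent: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Maximum downtime allowed in a period for a given availability target."""
    return (100.0 - availability_percent) / 100.0 * period_hours


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99, 99.999):
        yearly = downtime_hours(target)
        monthly = downtime_hours(target, HOURS_PER_MONTH)
        print(f"{target}%: {yearly:.3f} h/year, {monthly * 60:.2f} min/month")
```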
5.5 Typical Service Throughput¶
| Component | Typical throughput |
|---|---|
| Web Server (Nginx) | 10,000 to 100,000 req/s |
| Database (MySQL) | 10,000 to 50,000 QPS |
| Cache (Redis) | 100,000 to 500,000 ops/s |
| Message Queue (Kafka) | 1,000,000 msg/s |

Single server estimates:
- Web server: ~1,000 concurrent connections
- Database: ~10,000 QPS
- Cache: ~100,000 ops/s
6. Practice Problems¶
Problem 1: QPS Calculation¶
Calculate image upload QPS for an Instagram-like service.
Conditions:
- DAU: 500 million
- Daily image uploads: 10% of users upload an average of 2 images each
Problem 2: Storage Estimation¶
Estimate 1-year message storage for a chat app.
Conditions:
- DAU: 100 million
- Average daily messages: 50 per user
- Average message size: 100 bytes
Problem 3: Server Count Estimation¶
Estimate the number of web servers needed to handle 100,000 QPS.
Conditions:
- Single server throughput: 1,000 QPS
- 20% overhead is needed for availability
Problem 4: Requirements Clarification¶
Given "Design a URL shortening service", write 5 questions to ask the interviewer.
Problem 5: System Design Practice¶
Draw a high-level architecture for a simple file-sharing service.
Answers¶
Problem 1 Answer¶
Images uploaded/day = 500M × 10% × 2 = 100M
Average QPS = 100M / 86,400 ≈ 1,160 QPS
Peak QPS = 1,160 × 3 ≈ 3,500 QPS
Problem 2 Answer¶
Daily messages = 100M × 50 = 5 billion
Daily storage = 5 billion × 100 bytes = 500 GB
Annual storage = 500 GB × 365 ≈ 180 TB
Problem 3 Answer¶
Base servers needed = 100,000 / 1,000 = 100
With overhead = 100 × 1.2 = 120
Consider HA (redundancy) = 120 × 2 = 240
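The same server-count estimate as a Python sketch. The function name `servers_needed` and its parameters are hypothetical; the redundancy factor of 2 mirrors the HA doubling in the answer above and is an assumption rather than a general rule. Integer arithmetic with ceiling division avoids floating-point rounding surprises.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer division rounded up."""
    return -(-a // b)


def servers_needed(
    target_qps: int,
    per_server_qps: int,
    overhead_percent: int = 20,
    redundancy_factor: int = 2,
) -> dict:
    """Server counts at each stage: base, with headroom, and with redundancy."""
    base = ceil_div(target_qps, per_server_qps)
    with_overhead = ceil_div(base * (100 + overhead_percent), 100)
    return {
        "base": base,
        "with_overhead": with_overhead,
        "with_redundancy": with_overhead * redundancy_factor,
    }


if __name__ == "__main__":
    print(servers_needed(target_qps=100_000, per_server_qps=1_000))
```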
Problem 4 Answer¶
- What are the expected DAU and MAU?
- What length and format should shortened URLs have?
- Do we need a URL expiration feature?
- Do we need custom short URL support?
- Do we need analytics (click count stats)?
Problem 5 Answer¶
┌─────────┐     ┌─────────────┐     ┌──────────────┐
│ Client  │────▶│ Load        │────▶│ Web Server   │
└─────────┘     │ Balancer    │     └──────┬───────┘
                └─────────────┘            │
                                   ┌───────┴───────┐
                                   │               │
                                   ▼               ▼
                             ┌──────────┐    ┌──────────┐
                             │ Metadata │    │ Object   │
                             │ DB       │    │ Storage  │
                             └──────────┘    └──────────┘
7. Next Steps¶
Now that you understand system design basics, learn about scalability concepts.
Next Lesson¶
- 02_Scalability_Basics.md - Horizontal/vertical scaling, CAP theorem
Related Lessons¶
- 03_Network_Fundamentals_Review.md - DNS, CDN, HTTP
- 04_Load_Balancing.md - Traffic distribution
Recommended Practice¶
- Estimate the scale of services you use frequently
- Draw system architectures on a whiteboard
- Practice explaining your design process out loud
8. References¶
Books¶
- System Design Interview - Alex Xu
- Designing Data-Intensive Applications - Martin Kleppmann
Online Resources¶
Practice Sites¶
- Pramp - Mock interviews
- Interviewing.io
Document Information
- Last Updated: 2024
- Difficulty: ⭐
- Estimated Study Time: 2 hours