Microservices Patterns
Difficulty: ⭐⭐⭐⭐
Overview
Operating a microservices architecture reliably depends on a set of well-established patterns and supporting tools. This chapter covers Service Discovery, Circuit Breaker, the Bulkhead pattern, Service Mesh, and Distributed Tracing, followed by a few related patterns (API Gateway, Retry, and Health Check).
Table of Contents
- Service Discovery
- Circuit Breaker
- Bulkhead Pattern
- Service Mesh
- Distributed Tracing
- Other Important Patterns
- Practice Problems
1. Service Discovery
Why Is Service Discovery Needed?
Traditional approach (hardcoded addresses):

    Order Service
      config:
        user_service:    http://10.0.1.5:8080
        product_service: http://10.0.1.10:8080
        payment_service: http://10.0.1.15:8080

Problems:
- Configuration changes are required whenever an IP changes
- Manual updates are needed when scaling out
- Failed instances cannot be excluded automatically

Problems in dynamic environments:

    Time 0: User Service @ 10.0.1.5
    Time 1: Scale out       → 10.0.1.5, 10.0.1.6, 10.0.1.7
    Time 2: 10.0.1.5 fails  → 10.0.1.6, 10.0.1.7
    Time 3: New deployment  → 10.0.1.8, 10.0.1.9

The IPs keep changing, so how do we track them?
Client-Side Discovery

    Service Registry (Eureka, Consul, etcd)
      user-service:
        - 10.0.1.5:8080
        - 10.0.1.6:8080
        - 10.0.1.7:8080
          ^
          |  1. Query / 2. Return instances
          |
    Order Service (client + LB logic)
          |
          |  3. Direct call (load balanced)
          v
    User Service (10.0.1.5, 10.0.1.6, 10.0.1.7)

How it works:
1. The client queries the registry for service instances
2. The client performs load balancing (round robin, random, etc.)
3. The client calls the chosen instance directly

Example: Netflix Eureka + Ribbon

Pros:
- Simple infrastructure
- Registry load is distributed

Cons:
- More complex client
- Needs a per-language implementation
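The three client-side steps above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not how Eureka or Ribbon are implemented: `ServiceRegistry` and `RoundRobinClient` are hypothetical names, and a real registry would be a separate networked service.

```python
import itertools
from collections import defaultdict


class ServiceRegistry:
    """In-memory stand-in for a registry such as Eureka or Consul (hypothetical)."""

    def __init__(self):
        self._instances = defaultdict(list)

    def register(self, service, address):
        if address not in self._instances[service]:
            self._instances[service].append(address)

    def deregister(self, service, address):
        if address in self._instances[service]:
            self._instances[service].remove(address)

    def lookup(self, service):
        # Return a copy so callers cannot mutate registry state.
        return list(self._instances[service])


class RoundRobinClient:
    """Client-side load balancer: queries the registry, then picks an instance."""

    def __init__(self, registry):
        self._registry = registry
        self._counters = defaultdict(itertools.count)  # one counter per service

    def choose(self, service):
        instances = self._registry.lookup(service)  # steps 1-2: query, receive instances
        if not instances:
            raise LookupError(f"no instances registered for {service!r}")
        index = next(self._counters[service]) % len(instances)
        return instances[index]  # step 3: the caller connects to this address directly


registry = ServiceRegistry()
for ip in ("10.0.1.5", "10.0.1.6", "10.0.1.7"):
    registry.register("user-service", f"{ip}:8080")

client = RoundRobinClient(registry)
picks = [client.choose("user-service") for _ in range(4)]
```

Because the balancing logic lives in the client, every language used in the system needs its own equivalent of `RoundRobinClient`, which is the "per-language implementation" cost listed above.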
Server-Side Discovery

    Service Registry
          |
          |  instance info sync
          v
    Order Service (client) ---> Load Balancer / API Gateway ---> User Service
      simple call:                - Routing                        (10.0.1.5)
      GET /user-service/users/123 - Load balancing                 (10.0.1.6)
                                  - Health check                   (10.0.1.7)

How it works:
1. The client sends its request to the LB/gateway, addressing the service by name
2. The LB looks up the instances in the registry
3. The LB picks an instance and forwards the request

Examples: AWS ELB + Route 53, Kubernetes Service, Nginx + Consul

Pros:
- Simplified client
- Language independent
- Centralized management

Cons:
- The LB can become a single point of failure (SPOF)
- Additional network hop (latency)
- LB operational cost
Service Registration
Self-registration (the service registers itself):

    User Service                                Service Registry
      start     ---- Register(ip, port) ------------>
      periodic  ---- Heartbeat --------------------->
      stop      ---- Deregister -------------------->

    Examples: Eureka Client, Consul Agent

Third-party registration (a registrar watches the service):

    User Service        Registrar                 Service Registry
      start   <-- detect --   ---- Register -------->
      stop    <-- detect --   ---- Deregister ------>

    Examples: Netflix Prana, Kubernetes, Docker Swarm
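The self-registration lifecycle (register, heartbeat, deregister) can be sketched as below. This is an illustration under stated assumptions: `HeartbeatRegistry`, the 30-second `ttl`, and the injectable `clock` are hypothetical and not taken from any particular registry. The sketch shows why heartbeats matter: an instance that stops reporting is dropped without an explicit deregistration.

```python
import time


class HeartbeatRegistry:
    """Registry that ignores instances whose last heartbeat is older than `ttl` seconds."""

    def __init__(self, ttl=30.0, clock=time.monotonic):
        self._ttl = ttl
        self._clock = clock        # injectable so the example does not need to sleep
        self._last_seen = {}       # (service, address) -> timestamp of last heartbeat

    def register(self, service, address):
        self._last_seen[(service, address)] = self._clock()

    # A heartbeat simply refreshes the timestamp, so it shares the implementation.
    heartbeat = register

    def deregister(self, service, address):
        self._last_seen.pop((service, address), None)

    def healthy_instances(self, service):
        now = self._clock()
        return sorted(
            addr
            for (svc, addr), seen in self._last_seen.items()
            if svc == service and now - seen <= self._ttl
        )


fake_now = [0.0]  # simulated clock, in seconds
reg = HeartbeatRegistry(ttl=30.0, clock=lambda: fake_now[0])
reg.register("user-service", "10.0.1.5:8080")
reg.register("user-service", "10.0.1.6:8080")

fake_now[0] = 20.0
reg.heartbeat("user-service", "10.0.1.6:8080")  # only 10.0.1.6 keeps reporting

fake_now[0] = 40.0
alive = reg.healthy_instances("user-service")   # 10.0.1.5 has been silent for 40s
```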
Major Tools Comparison
| Tool | CAP type | Features |
|---|---|---|
| Consul | CP | Health checks, KV store, DNS interface |
| Eureka | AP | Netflix OSS, Spring Cloud integration |
| etcd | CP | Raft consensus, used as the Kubernetes backing store |
| ZooKeeper | CP | Distributed coordination, complex API |
| Kubernetes | - | Built-in Service + DNS, cloud native |
2. Circuit Breaker
Circuit Breaker Pattern
Problem: cascading failure

    Order Service → Payment Service → Bank API (failure)

- Payment Service waits for the Bank API to time out and exhausts its threads
- Order Service then waits on Payment Service and exhausts its threads as well
- One failure propagates to the entire system

Solution: a circuit breaker (it acts like an electrical circuit breaker)

    Order Service → [CB] → Payment Service → [CB] → Bank API (failure)
                                              |
                                        circuit OPEN → immediate failure

- Failing fast protects resources
- The fault stays isolated
State Transitions

    CLOSED    --(failure rate threshold exceeded)--> OPEN
    OPEN      --(after wait timeout)---------------> HALF-OPEN
    HALF-OPEN --(trial calls succeed)--------------> CLOSED
    HALF-OPEN --(trial call fails)-----------------> OPEN

CLOSED:
- Normal operation
- All requests pass
- Failure rate is monitored

OPEN:
- Requests are blocked
- Immediate failure
- Fallback executed

HALF-OPEN:
- Limited requests allowed
- Testing state

Timeline example:

    CLOSED --(failure rate > 50%)--> OPEN --(wait 10s)--> HALF-OPEN --(success)--> CLOSED
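The state machine above can be sketched as a small Python class. To stay short, this sketch makes two simplifications that are assumptions rather than part of the pattern: it trips on a count of consecutive failures instead of a failure rate, and it allows a single trial call in HALF-OPEN. The names `CircuitBreaker` and `CircuitOpenError` are hypothetical.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, wait_seconds=10.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.wait_seconds = wait_seconds
        self._clock = clock            # injectable so the example needs no real waiting
        self.state = "CLOSED"
        self._failures = 0
        self._opened_at = 0.0

    def call(self, func, *args, **kwargs):
        if self.state == "OPEN":
            if self._clock() - self._opened_at < self.wait_seconds:
                raise CircuitOpenError("circuit is open; failing fast")
            self.state = "HALF_OPEN"   # wait elapsed: let one trial call through
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_success(self):
        self._failures = 0
        self.state = "CLOSED"

    def _on_failure(self):
        self._failures += 1
        # A failed trial call reopens immediately; otherwise wait for the threshold.
        if self.state == "HALF_OPEN" or self._failures >= self.failure_threshold:
            self.state = "OPEN"
            self._opened_at = self._clock()


now = [0.0]  # simulated clock, in seconds
breaker = CircuitBreaker(failure_threshold=2, wait_seconds=10.0, clock=lambda: now[0])


def failing():
    raise ConnectionError("bank API unavailable")


for _ in range(2):
    try:
        breaker.call(failing)
    except ConnectionError:
        pass
state_after_failures = breaker.state       # the breaker has tripped

now[0] = 11.0                               # the wait duration has passed
recovered = breaker.call(lambda: "ok")      # trial call succeeds, circuit closes
```

While the circuit is open, callers receive `CircuitOpenError` immediately instead of blocking on a timeout, which is what keeps threads from piling up behind a failed dependency.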
Key Configuration Parameters
The parameter names below follow Resilience4j:

    CircuitBreaker:
      failureRateThreshold: 50                    # Failure rate threshold (%)
      slowCallRateThreshold: 100                  # Slow call rate threshold (%)
      slowCallDurationThreshold: 2s               # Calls slower than this count as slow
      minimumNumberOfCalls: 10                    # Minimum calls before rates are evaluated
      slidingWindowSize: 100                      # Sliding window size
      slidingWindowType: COUNT_BASED              # COUNT_BASED or TIME_BASED
      waitDurationInOpenState: 10s                # Time spent in OPEN before moving to HALF-OPEN
      permittedNumberOfCallsInHalfOpenState: 3    # Calls allowed in HALF-OPEN

Sliding window:
- COUNT_BASED: the failure rate is computed over the last N calls. For example, 6 failures among the last 10 calls is a failure rate of 60%.
- TIME_BASED: the failure rate is computed over the calls recorded during the last N seconds (for example, the last 10 seconds).
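A count-based sliding window is straightforward to sketch with a bounded deque. The class below is an illustration, not Resilience4j's implementation. `CountBasedWindow` is a hypothetical name, and treating "rate equal to the threshold" as tripping is an assumption of this sketch.

```python
from collections import deque


class CountBasedWindow:
    """Tracks the outcomes of the last `size` calls and reports the failure rate."""

    def __init__(self, size=100, minimum_calls=10, failure_rate_threshold=50.0):
        self._outcomes = deque(maxlen=size)   # True = failure, False = success
        self._minimum_calls = minimum_calls
        self._threshold = failure_rate_threshold

    def record(self, failed):
        # Appending to a full deque silently drops the oldest outcome.
        self._outcomes.append(bool(failed))

    def failure_rate(self):
        if not self._outcomes:
            return 0.0
        return 100.0 * sum(self._outcomes) / len(self._outcomes)

    def should_open(self):
        # With too few samples the rate is not meaningful, so never trip yet.
        if len(self._outcomes) < self._minimum_calls:
            return False
        return self.failure_rate() >= self._threshold


window = CountBasedWindow(size=10, minimum_calls=10, failure_rate_threshold=50.0)
for failed in [True, False, True, True, False, True, True, False, True, False]:
    window.record(failed)
rate = window.failure_rate()   # 6 failures out of the last 10 calls
```

The `minimum_calls` guard plays the role of `minimumNumberOfCalls`: without it, a single failed call right after startup would count as a 100% failure rate.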
Fallback Strategies
The snippets below are illustrative: user_service, price_service, cache, push_service, email_service, order_service, retry_queue, and CircuitOpenException stand for whatever clients and exception type your circuit breaker library provides.

1. Return a default value

    def get_user_profile(user_id):
        try:
            return user_service.get(user_id)
        except CircuitOpenException:
            # Circuit is open: serve a generic profile instead of failing
            return {"name": "Guest", "avatar": "default.png"}

2. Return cached data

    def get_product_price(product_id):
        try:
            price = price_service.get(product_id)
            cache.set(product_id, price)  # Refresh the cache on every success
            return price
        except CircuitOpenException:
            # Last cached value; may be stale, or missing if never cached
            return cache.get(product_id)

3. Call an alternative service

    def send_notification(user_id, message):
        try:
            return push_service.send(user_id, message)
        except CircuitOpenException:
            # Fall back to a different delivery channel
            return email_service.send(user_id, message)

4. Queue for later processing

    def process_order(order):
        try:
            return order_service.process(order)
        except CircuitOpenException:
            retry_queue.enqueue(order)  # Retry later
            return {"status": "pending"}
Implementation Tools
| Tool | Language | Features |
|---|---|---|
| Resilience4j | Java | Lightweight, functional, modular |
| Hystrix | Java | Netflix OSS (maintenance mode) |
| Polly | .NET | Rich features, policy composition |
| go-kit | Go | Middleware-based |
| opossum | Node.js | Promise support |
3. Bulkhead Pattern
Bulkhead Concept
Ship bulkheads:

    +-----+-----+-------+-----+-----+
    |  A  |  B  |   C   |  D  |  E  |
    |     |     | flood |     |     |
    +-----+-----+-------+-----+-----+

The hull is separated by bulkheads: even if compartment C floods, the others stay safe.

Software application:

Without bulkheads (shared resources):

    Thread pool (10 threads)
    [Payment][Payment][Payment][Payment][Payment]...

- All threads are occupied by slow Payment calls
- Order and User service calls are blocked as well

With bulkheads (isolated resources):

    Payment pool (5 threads)   Order pool (3 threads)   User pool (2 threads)
    [P][P][P]                  [O][O]                   [U]

Even if Payment is slow, Order and User keep working normally.
Bulkhead Types
1. Thread pool bulkhead

    Service A thread pool
      maxThreads: 10
      queueCapacity: 100
      [1][2][3][4][5][6][7][8][9][10]

Pros: complete isolation, easy timeout control
Cons: overhead (context switching)

2. Semaphore bulkhead

    Service A semaphore
      permits: 10 (currently used: 7, waiting: 0)
      Request → acquire() → Call → release()
      When no permit is available → reject immediately or wait

Pros: lightweight, executes on the caller's thread
Cons: difficult timeout control

3. Process-level isolation (containers)

    Container: Payment API calls   CPU: 2   Mem: 4GB
    Container: Order API calls     CPU: 1   Mem: 2GB
    Container: User API calls      CPU: 1   Mem: 2GB

This is the strongest isolation, enforced for example with Kubernetes Pod resource limits.
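The semaphore variant (acquire, call, release) can be sketched with Python's standard `threading.BoundedSemaphore`. `SemaphoreBulkhead`, `BulkheadFullError`, and the `max_wait_seconds` option are hypothetical names chosen for this illustration.

```python
import threading


class BulkheadFullError(Exception):
    """Raised when no permit is available for a call."""


class SemaphoreBulkhead:
    def __init__(self, permits, max_wait_seconds=0.0):
        self._semaphore = threading.BoundedSemaphore(permits)
        self._max_wait = max_wait_seconds

    def call(self, func, *args, **kwargs):
        if self._max_wait > 0:
            # Wait up to max_wait_seconds for a permit.
            acquired = self._semaphore.acquire(timeout=self._max_wait)
        else:
            # Reject immediately when all permits are in use.
            acquired = self._semaphore.acquire(blocking=False)
        if not acquired:
            raise BulkheadFullError("bulkhead is full")
        try:
            return func(*args, **kwargs)   # runs on the caller's own thread
        finally:
            self._semaphore.release()      # always return the permit


payment_bulkhead = SemaphoreBulkhead(permits=1)


def charge():
    # While this call holds the only permit, a second call is rejected.
    try:
        payment_bulkhead.call(lambda: "nested")
    except BulkheadFullError:
        return "charged (nested call rejected)"
    return "charged (nested call admitted)"


outcome = payment_bulkhead.call(charge)
```

The call runs on the caller's thread, so the bulkhead cannot interrupt a call that hangs. That is the "difficult timeout control" drawback noted above, and a thread pool bulkhead avoids it at the cost of context switching.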
4. Service Mesh
Service Mesh Concept
Problem: every service has to implement the cross-cutting concerns itself:
- Service discovery
- Load balancing
- Circuit breaker
- Retry/timeout
- TLS/authentication
- Metrics/tracing

Consequences:
- Different implementations per language/framework
- Consistency is difficult to maintain
- Increased developer burden

Solution: move these concerns into an infrastructure layer (the service mesh), where network proxies handle all of them.

                       Control Plane
      (configuration, policies, certificate management)
          ^                   ^                   ^
          |                   |                   |
      [ Proxy ] <-------> [ Proxy ] <-------> [ Proxy ]
          ^                   ^                   ^
          |                   |                   |
      [  App  ]           [  App  ]           [  App  ]
       Service             Service             Service

The application focuses only on business logic.
Sidecar Pattern

    Pod (Kubernetes)
      Application container            Sidecar proxy container (Envoy)
        App process, port 8080  <--->    Proxy process, port 15001  <--->  External traffic
      Shared: network namespace, volumes

Functions handled by the sidecar:

Inbound traffic:
- TLS termination
- Authentication/authorization
- Rate limiting
- Metrics collection

Outbound traffic:
- Service discovery
- Load balancing
- Circuit breaker
- Retry/timeout
- mTLS
Istio Architecture

    Control Plane (istiod)
      Pilot   (config delivery)
      Citadel (certificates)
      Galley  (config validation)
            |
            |  xDS API (config push)
            v
    Data Plane
      Service A                        Service B
        [Envoy proxy] <---- mTLS ----> [Envoy proxy]
              |                              |
            [App]                          [App]

Key features:
- Traffic management: routing, canary deployment, A/B testing
- Security: mTLS, RBAC
- Observability: metrics, logs, distributed tracing
Service Mesh Tools Comparison
| Tool | Proxy | Features |
|---|---|---|
| Istio | Envoy | Feature-rich, high complexity |
| Linkerd | linkerd2-proxy | Lightweight, Rust-based, simple |
| Consul Connect | Envoy/built-in | HashiCorp ecosystem integration |
| AWS App Mesh | Envoy | AWS services integration |
5. Distributed Tracing
The Need for Distributed Tracing
Problem: tracking requests across microservices

    Client → API GW → Order → Inventory → Payment → Shipping
                        ↓
                   User Service

- "The order API is slow, where is the delay?"
- "An error occurred, which service started it?"

Looking at logs does not answer this:
- Each service's logs are stored separately
- You can't tell which log lines belong to the same request

Solution: track the entire request with a trace ID

    Trace ID: abc123
    API GW     [##########################################]
    Order        [#################################]
    Inventory      [################]
    Payment                          [##########]
    User                                        [####]
               0ms     100ms    200ms    300ms    400ms    500ms

The entire flow and its bottlenecks are visible at a glance.
Trace, Span, Context
Trace: the entire journey of a request, identified by a trace ID (for example abc-123-def-456).
Span: a single unit of work within the trace. Spans nest to form a tree:

    Span A (root span)         service: api-gateway     operation: handle_request
      Span B (child of A)      service: order-service   operation: create_order
        Span C (child of B)    service: inventory       operation: reserve
        Span D (child of B)    service: user            operation: get

Span structure:

    {
      "traceId": "abc-123-def-456",
      "spanId": "span-789",
      "parentSpanId": "span-456",
      "operationName": "create_order",
      "serviceName": "order-service",
      "startTime": "2024-01-15T10:30:00.000Z",
      "duration": 150,  // ms
      "tags": { "http.status": 200, "user.id": "123" },
      "logs": [ { "event": "order_created", "orderId": "ord-999" } ]
    }
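The parent/child relationship between spans can be sketched with a small dataclass. This is a simplified model for illustration, not the data model of any tracing library. `Span`, `start_trace`, and the use of random hex IDs are assumptions of this sketch.

```python
import secrets
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    trace_id: str
    span_id: str
    parent_span_id: Optional[str]
    operation_name: str
    service_name: str
    start_time: float = field(default_factory=time.time)
    duration_ms: Optional[float] = None
    tags: dict = field(default_factory=dict)

    def child(self, operation_name, service_name):
        """Create a child span: same trace ID, this span as the parent."""
        return Span(
            trace_id=self.trace_id,
            span_id=secrets.token_hex(8),
            parent_span_id=self.span_id,
            operation_name=operation_name,
            service_name=service_name,
        )

    def finish(self):
        """Record how long the unit of work took, in milliseconds."""
        self.duration_ms = (time.time() - self.start_time) * 1000.0


def start_trace(operation_name, service_name):
    """Create a root span with a fresh trace ID and no parent."""
    return Span(
        trace_id=secrets.token_hex(16),
        span_id=secrets.token_hex(8),
        parent_span_id=None,
        operation_name=operation_name,
        service_name=service_name,
    )


root = start_trace("handle_request", "api-gateway")   # Span A
order = root.child("create_order", "order-service")   # Span B
reserve = order.child("reserve", "inventory")         # Span C
get_user = order.child("get", "user")                 # Span D
order.tags["http.status"] = 200
order.finish()
```

Every span in the tree carries the same `trace_id`, and that shared ID is what lets a tracing backend reassemble the spans reported by different services into one timeline.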
Context Propagation
Propagation via HTTP headers:

    Order Service ----------------------------> Inventory Service
      POST /inventory/reserve
      Headers:
        X-B3-TraceId: abc123
        X-B3-SpanId: span456
        X-B3-ParentSpanId: span123
        X-B3-Sampled: 1

    Inventory Service creates a new span:
      spanId: span789
      parentSpanId: span456
      traceId: abc123

Standards:
- B3 Propagation (Zipkin)
- W3C Trace Context (standard)
- Jaeger Propagation

W3C Trace Context:

    traceparent: 00-4bf92f3577b34da6a3ce929f0e0e4736-00f067aa0ba902b7-01
    tracestate: vendor=custom_value

The traceparent value has the form version-traceid-parentid-flags, where the trace ID is 32 hexadecimal characters, the parent (span) ID is 16 hexadecimal characters, and the flag 01 marks the request as sampled.
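As a sketch of what a receiving service does with the W3C header, the functions below parse a version-00 `traceparent` value and build the header for the next hop. The function names are hypothetical, and the parser is deliberately strict: it accepts only the four-field version-00 layout of 2, 32, 16, and 2 lowercase hex characters and does not handle `tracestate` or future versions.

```python
import re

# version "-" trace-id (32 hex) "-" parent-id (16 hex) "-" flags (2 hex)
_TRACEPARENT = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def parse_traceparent(header):
    """Split a traceparent header into its fields, or return None if malformed."""
    match = _TRACEPARENT.match(header.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # Version ff and all-zero IDs are invalid.
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return {
        "version": version,
        "trace_id": trace_id,
        "parent_id": parent_id,
        "sampled": bool(int(flags, 16) & 0x01),
    }


def child_traceparent(incoming, new_span_id):
    """Header for the next hop: same trace ID, the caller's new span ID as parent."""
    flags = "01" if incoming["sampled"] else "00"
    return f"00-{incoming['trace_id']}-{new_span_id}-{flags}"


incoming = parse_traceparent("00-4bf92f3577b34da6a3ce929f0e0e4736-00f067aa0ba902b7-01")
outgoing = child_traceparent(incoming, "b7ad6b7169203331")
```

The trace ID is passed through unchanged at every hop while the span ID is replaced, mirroring the B3 example above where the receiver keeps traceId abc123 and records the caller's span as its parent.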
Tracing Tools
Jaeger:
- CNCF project
- Developed by Uber
- Cassandra and Elasticsearch storage backends
- Powerful UI

Zipkin:
- Developed by Twitter
- Supports various storage backends
- Lightweight, easy installation

OpenTelemetry (OTel):
- Standardized observability framework
- Unifies traces, metrics, and logs
- Vendor neutral
- Exports to Jaeger, Zipkin, etc.

Commercial:
- Datadog APM
- New Relic
- AWS X-Ray
- Google Cloud Trace
6. Other Important Patterns
API Gateway

    Clients                API Gateway                    Services
    Web    ---------->     - Routing              ---->   User Svc
    Mobile ---------->     - Auth/authz           ---->   Order Svc
    IoT    ---------->     - Rate limiting        ---->   Prod Svc
                           - Caching
                           - Request transformation
                           - Logging/metrics
                           - SSL termination

Tools: Kong, AWS API Gateway, Nginx, Envoy, Spring Cloud Gateway
Retry Pattern
Exponential backoff with jitter:

    Attempt 1: failure
      Wait: 100ms + random(0-50ms)
    Attempt 2: failure
      Wait: 200ms + random(0-100ms)
    Attempt 3: failure
      Wait: 400ms + random(0-200ms)
    Attempt 4: success

    config:
      maxRetries: 5
      initialDelay: 100ms
      maxDelay: 10s
      multiplier: 2
      jitter: 0.5            # 50% random
      retryableExceptions:
        - ConnectionException
        - TimeoutException
      # Don't retry 4xx errors!
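The schedule above (base delay doubling each attempt, plus up to 50% random extra) can be sketched as follows. `backoff_delays` and `call_with_retry` are hypothetical helpers. Python's built-in `ConnectionError` and `TimeoutError` stand in for the ConnectionException and TimeoutException of the config, and the injectable `sleep` and `rng` parameters exist only so the example runs instantly and deterministically.

```python
import random
import time


def backoff_delays(max_retries=5, initial_delay=0.1, max_delay=10.0,
                   multiplier=2.0, jitter=0.5, rng=random.random):
    """Yield the wait in seconds before each retry: base delay plus up to jitter * base."""
    delay = initial_delay
    for _ in range(max_retries):
        yield min(delay + delay * jitter * rng(), max_delay)
        delay = min(delay * multiplier, max_delay)


def call_with_retry(func, retryable=(ConnectionError, TimeoutError),
                    sleep=time.sleep, **backoff_options):
    """Call func, retrying only on the listed exception types."""
    delays = backoff_delays(**backoff_options)
    while True:
        try:
            return func()
        except retryable:
            wait = next(delays, None)
            if wait is None:   # retries exhausted: surface the last error
                raise
            sleep(wait)


attempts = []


def flaky():
    """Fails three times with a retryable error, then succeeds."""
    attempts.append(1)
    if len(attempts) < 4:
        raise ConnectionError("temporary failure")
    return "success"


waits = []   # record the requested waits instead of actually sleeping
result = call_with_retry(flaky, sleep=waits.append, rng=lambda: 0.0)
```

Exceptions outside `retryable` propagate on the first attempt, which is how a client error (the "don't retry 4xx" rule) would be handled: retrying a request that is itself invalid only adds load.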
Health Check Pattern
Liveness probe (is it alive?):

    GET /health/live
    200 OK → process is healthy
    5xx    → process needs a restart

Readiness probe (can it handle requests?):

    GET /health/ready
    Check items:
    - DB connection
    - Cache connection
    - Dependent services
    - Initialization complete
    200 OK → route traffic to the instance
    503    → exclude from traffic (no restart)
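The difference between the two probes can be sketched as two plain functions that return an HTTP status code and a body. The function names, the response shape, and the example checks are hypothetical. A real service would expose these through its web framework at the /health/live and /health/ready paths.

```python
def liveness():
    """The process is running and able to answer, so report 200."""
    return 200, {"status": "alive"}


def readiness(checks):
    """Run each named dependency check; report 503 if any fails or raises."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            # A check that raises counts as a failed dependency.
            results[name] = False
    status = 200 if all(results.values()) else 503
    label = "ready" if status == 200 else "not ready"
    return status, {"status": label, "checks": results}


def cache_ping():
    raise ConnectionError("cache unreachable")


status, body = readiness({
    "db": lambda: True,
    "cache": cache_ping,          # simulated outage
    "initialized": lambda: True,
})
```

In this example the instance stays alive (liveness still returns 200) but is reported as not ready, so traffic is routed elsewhere without restarting the process.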
7. Practice Problems
Exercise 1: Circuit Breaker Design
Design a circuit breaker for a payment service:
- Set appropriate thresholds
- Define fallback strategies
- Write state transition scenarios
Exercise 2: Service Mesh Selection
Choose an appropriate service mesh for the following requirements and explain your reasoning:
- 10 microservices
- Kubernetes environment
- mTLS required
- Canary deployment needed
- Team has intermediate Kubernetes experience
Exercise 3: Distributed Tracing Implementation
Design distributed tracing for an order processing system:
- Define the key spans to trace
- Define important tags/metadata
- Establish a sampling strategy
Next Steps
The next chapter, 15_Distributed_Systems_Concepts.md, covers fundamental concepts of distributed systems, time, and leader election algorithms.
References
- "Release It!" - Michael Nygard
- "Building Microservices" - Sam Newman
- Istio Documentation
- Envoy Proxy Documentation
- OpenTelemetry Documentation
- Netflix Tech Blog: Hystrix
- Resilience4j Documentation