# Architecture Decision Records (ADRs)

Architecture Decision Records (ADRs) capture the significant architectural decisions made
during a project. Each ADR documents the context, decision, and consequences so that
future team members understand *why* the system is built the way it is.

**Format**: Based on Michael Nygard's ADR template (the most widely adopted convention), extended with **Date** and **Deciders** fields.

---

## ADR Template

```markdown
# ADR-NNN: [Short noun phrase: what was decided]

**Status**: [Proposed | Accepted | Deprecated | Superseded by ADR-NNN]
**Date**: YYYY-MM-DD
**Deciders**: [List of people involved in the decision]

## Context

What is the issue that motivates this decision?
Describe the forces at play: technical, political, social, project-related.
This section is value-neutral; state facts, not opinions.

## Decision

What is the change being proposed or that was decided?
Write in the active voice: "We will use X" rather than "X will be used".
Include alternatives that were considered and briefly explain why they were rejected.

## Consequences

What becomes easier or harder after this decision?
List positive consequences, negative consequences, and risks.
Be honest about trade-offs; an ADR with no negative consequences is suspect.
```
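
The template is regular enough to check mechanically. The sketch below is one possible lint, not part of any ADR tooling: the function name `lint_adr`, the three-digit `NNN` numbering, and the exact status spellings are assumptions taken from the template above.

```python
import re

# Assumption: ADR numbers are zero-padded to three digits, as in "ADR-001".
TITLE_RE = re.compile(r"^# ADR-\d{3}: \S.*$", re.MULTILINE)
STATUS_RE = re.compile(
    r"^\*\*Status\*\*:\s*(Proposed|Accepted|Deprecated|Superseded by ADR-\d{3})\s*$",
    re.MULTILINE,
)
REQUIRED_SECTIONS = ("Context", "Decision", "Consequences")


def lint_adr(text: str) -> list[str]:
    """Return the problems found in one ADR; an empty list means it passes."""
    problems = []
    if not TITLE_RE.search(text):
        problems.append("missing title line '# ADR-NNN: ...'")
    if not STATUS_RE.search(text):
        problems.append("missing or invalid **Status** line")
    # Collect every level-2 heading and compare against the required sections.
    headings = set(re.findall(r"^## (.+?)\s*$", text, re.MULTILINE))
    for section in REQUIRED_SECTIONS:
        if section not in headings:
            problems.append(f"missing section '## {section}'")
    return problems


sample = (
    "# ADR-001: Use PostgreSQL as Primary Database\n\n"
    "**Status**: Accepted\n**Date**: 2025-03-10\n\n"
    "## Context\n\n...\n\n## Decision\n\n...\n\n## Consequences\n\n...\n"
)
problems = lint_adr(sample)  # empty list for a well-formed ADR
```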

---

## Guidelines for Writing Effective ADRs

1. **Write ADRs at decision time.** Don't reconstruct history months later.
2. **Keep each ADR focused.** One significant decision per document.
3. **Immutable after acceptance.** Never rewrite the content of an accepted ADR; supersede it with a new one and update only the old ADR's status.
4. **Link related ADRs.** Cross-reference when decisions build on each other.
5. **Short title, clear noun phrase.** "Use PostgreSQL as primary database", not "Database".
6. **Include rejected alternatives.** This is often the most valuable part for future readers.
7. **Store in the repository.** Keep ADRs in `docs/adr/`, next to the code they describe.
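
Guidelines 1 and 7 are easier to follow when starting a new ADR is cheap. The sketch below shows one way to scaffold the next file for `docs/adr/`. The `NNN-slug.md` filename convention and the helper names (`next_adr_number`, `slugify`, `new_adr`) are assumptions made for illustration; the guidelines do not prescribe them.

```python
import re
from datetime import date

# Assumption: ADR files are named "NNN-some-slug.md" inside docs/adr/.
ADR_FILE_RE = re.compile(r"^(\d{3})-[a-z0-9-]+\.md$")


def next_adr_number(filenames):
    """Return one more than the highest ADR number found (1 if there are none)."""
    numbers = [int(m.group(1)) for name in filenames if (m := ADR_FILE_RE.match(name))]
    return max(numbers, default=0) + 1


def slugify(title):
    """Lowercase the title and replace runs of non-alphanumerics with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def new_adr(filenames, title, today):
    """Return (filename, body) for a new ADR in Proposed status."""
    number = next_adr_number(filenames)
    filename = f"{number:03d}-{slugify(title)}.md"
    body = (
        f"# ADR-{number:03d}: {title}\n\n"
        "**Status**: Proposed\n"
        f"**Date**: {today.isoformat()}\n"
        "**Deciders**: \n\n"
        "## Context\n\n## Decision\n\n## Consequences\n"
    )
    return filename, body


existing = ["001-use-postgresql-as-primary-database.md", "012-payment-microservice.md", "README.md"]
filename, body = new_adr(existing, "Event Schema for Payment Events", date(2025, 8, 1))
```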

---

## Example ADR 1

# ADR-001: Use PostgreSQL as Primary Database

**Status**: Accepted
**Date**: 2025-03-10
**Deciders**: Alice Kim (Tech Lead), Bob Park (Backend), Carol Lee (DBA)

## Context

The e-commerce platform needs a relational database to store users, products, orders,
and inventory. We need ACID transactions for order processing, and the team anticipates
complex queries with joins across multiple tables. Initial data volume is expected to be
~5 million rows across all tables with modest write throughput (~200 writes/second peak).

We evaluated three options:

| Option     | Pros                                       | Cons                                                      |
|------------|--------------------------------------------|-----------------------------------------------------------|
| PostgreSQL | Mature, full ACID, JSONB, full-text search | Vertical scaling limits at extreme load                   |
| MySQL 8    | Wide hosting support, faster simple reads  | JSON support weaker than JSONB, less expressive SQL       |
| MongoDB    | Flexible schema, horizontal scaling        | Multi-document ACID only through session-based transactions |

## Decision

We will use **PostgreSQL 16** as the sole primary database for all structured data.

Specific choices within this decision:
- Use `JSONB` columns for semi-structured product attributes, avoiding a separate NoSQL store.
- Use row-level security (RLS) to enforce tenant isolation at the database layer.
- Use a managed instance (AWS RDS) to offload operational concerns (backups, patching).

MySQL was rejected because its JSON support is weaker than PostgreSQL's JSONB and our team
has stronger PostgreSQL expertise. MongoDB was rejected because order processing requires
multi-document transactions, which MongoDB supports but handles less cleanly than PostgreSQL.
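
To make the RLS choice concrete, the sketch below generates the PostgreSQL statements that enable tenant isolation on one table. The table name `orders`, the `tenant_id` column, the `uuid` cast, and the `app.tenant_id` session setting are hypothetical; the ADR does not specify them. In a real project these statements would normally live in a migration rather than be generated at runtime.

```python
def tenant_rls_statements(table: str, tenant_column: str = "tenant_id") -> list[str]:
    """Build the SQL that restricts every row of `table` to the current tenant."""
    # Identifiers are interpolated into SQL, so reject anything that is not a plain name.
    for identifier in (table, tenant_column):
        if not identifier.isidentifier():
            raise ValueError(f"unsafe identifier: {identifier!r}")
    return [
        f"ALTER TABLE {table} ENABLE ROW LEVEL SECURITY;",
        # FORCE applies the policy to the table owner as well.
        f"ALTER TABLE {table} FORCE ROW LEVEL SECURITY;",
        # Assumption: the application sets app.tenant_id at the start of each transaction.
        f"CREATE POLICY {table}_tenant_isolation ON {table} "
        f"USING ({tenant_column} = current_setting('app.tenant_id')::uuid);",
    ]


statements = tenant_rls_statements("orders")
```

Under these assumptions the application would run something like `SET LOCAL app.tenant_id = '<tenant uuid>'` inside each transaction, so queries only see rows belonging to that tenant.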

## Consequences

**Positive:**
- Full ACID guarantees simplify order and payment logic significantly.
- JSONB support eliminates the need for a separate document store for product attributes.
- RLS provides a robust second line of defence for multi-tenant isolation.
- Extensive tooling ecosystem (pgAdmin, Alembic, SQLAlchemy, pg_stat_statements).

**Negative / Risks:**
- PostgreSQL does not scale writes horizontally without significant architectural changes
  (e.g., sharding with Citus); read replicas scale reads only. If write throughput grows
  beyond ~5,000 writes/second, this decision must be revisited (see ADR-007: Sharding Strategy).
- RDS costs more than a self-hosted instance; budget must account for this.
- Team must maintain migration discipline (Alembic) to prevent schema drift.

---

## Example ADR 2

# ADR-012: Adopt Microservices Architecture for Payment Service

**Status**: Accepted
**Date**: 2025-07-22
**Deciders**: Alice Kim (Tech Lead), David Oh (Platform), Eve Choi (Security)
**Supersedes**: ADR-003 (Monolithic architecture for all services)

## Context

The payment service currently lives inside the main Django monolith (see ADR-003).
Three pain points have emerged that the monolith cannot address cleanly:

1. **Deployment coupling**: A bug fix in the product catalogue requires redeploying
   the entire application, including the payment module. Any deployment carries risk
   for payment processing, which must have 99.95% uptime.
2. **Compliance isolation**: PCI-DSS compliance requires limiting the cardholder data
   environment (CDE) to as few systems as possible. Keeping payment logic in the
   monolith means the entire application is in scope, which triples audit complexity.
3. **Independent scaling**: Checkout traffic spikes (flash sales) require extra payment
   processing capacity without scaling unrelated services.

Alternatives considered:

| Option                              | Assessment                                                   |
|-------------------------------------|--------------------------------------------------------------|
| Extract payment as a library        | Solves none of the three pain points                         |
| Strangler Fig to full microservices | Too broad; creates risk across the whole platform            |
| Extract *only* payment service      | Targeted, addresses all three pain points, manageable scope  |

## Decision

We will extract the payment service into a **standalone microservice** using the
Strangler Fig pattern, while keeping the rest of the platform as a monolith.

Implementation decisions:
- **Language/Framework**: Python 3.12 + FastAPI (async, matches team skills).
- **Communication**: Synchronous REST for checkout flow; async events via RabbitMQ
  for post-payment notifications (receipt emails, inventory deduction).
- **Data**: Dedicated PostgreSQL database, not shared with the monolith (database-per-service).
- **Auth**: Service-to-service calls authenticated with short-lived JWTs signed by
  an internal CA (not user-facing OAuth tokens).
- **Deployment**: Separate Kubernetes Deployment with a HorizontalPodAutoscaler (HPA);
  an isolated network policy limits ingress to the API gateway and the monolith only.
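
The short-lived service tokens in the **Auth** decision can be illustrated with a standard-library sketch. It deliberately swaps in HMAC-SHA256 (HS256) with a shared secret so that it runs without dependencies, whereas the decision above calls for tokens signed by an internal CA, which implies asymmetric keys and a maintained JWT library. The function names, claim values, and 60-second lifetime are assumptions.

```python
import base64
import hashlib
import hmac
import json


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def mint_token(secret: bytes, issuer: str, audience: str, now: int, ttl_seconds: int = 60) -> str:
    """Create a compact HS256 JWT that expires ttl_seconds after `now`."""
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {"iss": issuer, "aud": audience, "iat": now, "exp": now + ttl_seconds}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


def verify_token(secret: bytes, token: str, audience: str, now: int) -> dict:
    """Return the claims if signature, algorithm, audience, and expiry all check out."""
    try:
        header_b64, claims_b64, signature_b64 = token.split(".")
    except ValueError:
        raise ValueError("malformed token") from None
    expected = hmac.new(secret, f"{header_b64}.{claims_b64}".encode("ascii"), hashlib.sha256).digest()
    # Check the signature before trusting anything inside the token.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    claims = json.loads(_b64url_decode(claims_b64))
    if claims.get("aud") != audience:
        raise ValueError("wrong audience")
    if now >= claims.get("exp", 0):
        raise ValueError("token expired")
    return claims


secret = b"demo-only-secret"  # hypothetical; never hard-code real keys
token = mint_token(secret, issuer="monolith", audience="payment-service", now=1_000)
claims = verify_token(secret, token, audience="payment-service", now=1_030)
```

Passing `now` explicitly keeps the sketch deterministic; a real caller would use the current Unix time and keep the lifetime short enough that a leaked token is of little use.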

## Consequences

**Positive:**
- Payment service can be deployed independently, so product catalogue releases no longer put payment uptime at risk.
- PCI-DSS CDE scope reduced from the whole monolith to one small service and its database.
- Can scale payment pods independently during flash sales without scaling catalogue services.
- Failure in the monolith (e.g., OOM) does not crash in-flight payment transactions.

**Negative / Risks:**
- **Distributed systems complexity**: Network failures between the monolith and payment
  service must be handled explicitly (retries, idempotency keys, circuit breakers).
  Today this is a local function call inside the monolith, which has no network failure modes.
- **Operational overhead**: Additional Kubernetes Deployment, Service, and NetworkPolicy
  to maintain. Observability must be extended (distributed tracing with OpenTelemetry).
- **Data consistency**: Without a shared database, refunds and order status updates require
  event-driven coordination. The team must implement the Saga pattern for the checkout flow.
- **Scope creep risk**: Extracting one service is justified. Extracting everything into
  microservices is not; this ADR explicitly does not authorise a full decomposition.
  Future extractions require separate ADRs.
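
The retry and idempotency-key handling named under distributed systems complexity can be sketched in memory. Everything here is hypothetical (`PaymentService`, `LossyTransport`, `charge_with_retry`); a real client would also add backoff between attempts and a circuit breaker, which this sketch omits. The point it shows: the caller generates one key per logical charge and reuses it on every retry, so a lost response cannot cause a double charge.

```python
import uuid


class TransientError(Exception):
    """Stands in for a timeout or dropped connection."""


class PaymentService:
    """In-memory stand-in for the payment service (hypothetical)."""

    def __init__(self):
        self._results = {}      # idempotency key -> stored result
        self.charges_made = 0   # how many times money actually moved

    def charge(self, idempotency_key, order_id, amount_cents):
        # A repeated key returns the stored result instead of charging again.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        self.charges_made += 1
        result = {"order_id": order_id, "amount_cents": amount_cents, "status": "captured"}
        self._results[idempotency_key] = result
        return result


class LossyTransport:
    """Delivers every request but loses the first response, forcing a retry."""

    def __init__(self, service):
        self.service = service
        self.calls = 0

    def send(self, idempotency_key, order_id, amount_cents):
        self.calls += 1
        result = self.service.charge(idempotency_key, order_id, amount_cents)
        if self.calls == 1:
            raise TransientError("response lost")
        return result


def charge_with_retry(send, order_id, amount_cents, max_attempts=3):
    """Retry on transient errors, reusing one idempotency key for every attempt."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    key = str(uuid.uuid4())
    last_error = None
    for _ in range(max_attempts):
        try:
            return send(key, order_id, amount_cents)
        except TransientError as exc:
            last_error = exc  # a real client would back off before retrying
    raise last_error


service = PaymentService()
transport = LossyTransport(service)
receipt = charge_with_retry(transport.send, order_id="order-42", amount_cents=1999)
# The request was sent twice, but the service charged only once.
```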

**Follow-up actions:**
- ADR-013: Event schema for payment events on RabbitMQ
- ADR-014: Saga pattern implementation for checkout/refund flow
- Runbook: Payment service on-call playbook (escalation, rollback procedure)
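
Until ADR-014 exists, the core idea of the Saga pattern mentioned above can be sketched as follows: each step pairs an action with a compensating action, and a failure triggers the compensations of the completed steps in reverse order. This is an in-process, orchestration-style illustration with hypothetical step names; the actual checkout flow would coordinate through events on RabbitMQ.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SagaStep:
    name: str
    action: Callable[[], None]       # performs the step; raises on failure
    compensate: Callable[[], None]   # undoes the step if a later one fails


def run_saga(steps: list[SagaStep]) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed = []
    for step in steps:
        try:
            step.action()
        except Exception:
            for done in reversed(completed):
                done.compensate()
            return False
        completed.append(step)
    return True


log = []


def fail_confirmation():
    # Hypothetical failure: the order cannot be confirmed after payment was captured.
    raise RuntimeError("order service unavailable")


checkout = [
    SagaStep("reserve inventory", lambda: log.append("reserve"), lambda: log.append("release")),
    SagaStep("charge payment", lambda: log.append("charge"), lambda: log.append("refund")),
    SagaStep("confirm order", fail_confirmation, lambda: log.append("cancel")),
]
succeeded = run_saga(checkout)
# log is now ["reserve", "charge", "refund", "release"] and succeeded is False
```

The failed step itself is not compensated because it never completed; only the steps before it are undone, newest first.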