# Architecture Decision Records (ADR)

Architecture Decision Records (ADRs) capture important architectural decisions made
during a project. Each ADR documents the context, decision, and consequences so that
future team members understand *why* the system is built the way it is.

**Format**: Based on Michael Nygard's ADR template (title, status, context, decision,
consequences), a widely adopted convention, extended here with **Date** and **Deciders** fields.

---

## ADR Template

```markdown
# ADR-NNN: [Short phrase stating what was decided]

**Status**: [Proposed | Accepted | Deprecated | Superseded by ADR-NNN]
**Date**: YYYY-MM-DD
**Deciders**: [List of people involved in the decision]

## Context

What is the issue that motivates this decision?
Describe the forces at play: technical, political, social, project-related.
This section is value-neutral; state facts, not opinions.

## Decision

What is the change being proposed or that was decided?
Write in the active voice: "We will use X" rather than "X will be used".
Include alternatives that were considered and briefly explain why they were rejected.

## Consequences

What becomes easier or harder after this decision?
List positive consequences, negative consequences, and risks.
Be honest about trade-offs; an ADR with no negative consequences is suspect.
```
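
A template with fixed fields can be checked mechanically. The sketch below is one possible lint, not part of Nygard's convention: it assumes each ADR is a Markdown file with the title line, the `**Status**:` line, and the three `##` headings shown in the template. The function name `check_adr` is hypothetical.

```python
import re

REQUIRED_SECTIONS = ("Context", "Decision", "Consequences")

# Accepts exactly the status values listed in the template.
STATUS_RE = re.compile(
    r"^\*\*Status\*\*:\s*(Proposed|Accepted|Deprecated|Superseded by ADR-\d+)\s*$",
    re.MULTILINE,
)
TITLE_RE = re.compile(r"^# ADR-\d+: \S.*$", re.MULTILINE)


def check_adr(text: str) -> list[str]:
    """Return a list of problems found in one ADR; an empty list means it passes."""
    problems = []
    if not TITLE_RE.search(text):
        problems.append("missing title line '# ADR-NNN: ...'")
    if not STATUS_RE.search(text):
        problems.append("missing or unrecognised **Status** line")
    headings = re.findall(r"^## (.+?)\s*$", text, re.MULTILINE)
    for section in REQUIRED_SECTIONS:
        if section not in headings:
            problems.append(f"missing section '## {section}'")
    return problems
```

Run over every file in the ADR directory as a CI step, a check like this catches ADRs that drift from the template; it does not judge the quality of what is written.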

---

## Guidelines for Writing Effective ADRs

1. **Write ADRs at decision time.** Don't reconstruct history months later.
2. **Keep each ADR focused.** One significant decision per document.
3. **Immutable after acceptance.** Never rewrite an accepted ADR; supersede it with a new one and change only the old ADR's status to "Superseded by ADR-NNN".
4. **Link related ADRs.** Cross-reference when decisions build on each other.
5. **Short, specific title.** "Use PostgreSQL as primary database", not "Database".
6. **Include rejected alternatives.** These are often the most valuable part for future readers.
7. **Store in the repository.** `docs/adr/` next to the code it describes.
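
Creating a new ADR can be scripted so that numbering and the template stay consistent. This is a minimal sketch under stated assumptions: the `NNN-slug.md` file naming scheme and the helper names (`slugify`, `next_adr_number`, `new_adr`) are hypothetical and not prescribed by this document; only the `docs/adr/` location and the template fields come from the text.

```python
import re
from datetime import date
from pathlib import Path

# Assumed naming scheme: zero-padded number, hyphen, slug, e.g. 001-use-postgresql.md
ADR_FILE_RE = re.compile(r"^(\d{3,})-.*\.md$")


def slugify(title: str) -> str:
    """Lowercase the title and collapse every non-alphanumeric run into one hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def next_adr_number(adr_dir: Path) -> int:
    """Return one more than the highest ADR number found in adr_dir (1 if none)."""
    numbers = [
        int(m.group(1))
        for p in adr_dir.glob("*.md")
        if (m := ADR_FILE_RE.match(p.name))
    ]
    return max(numbers, default=0) + 1


def new_adr(adr_dir: Path, title: str, deciders: list[str]) -> Path:
    """Write a new ADR skeleton with status Proposed and return its path."""
    adr_dir.mkdir(parents=True, exist_ok=True)
    number = next_adr_number(adr_dir)
    path = adr_dir / f"{number:03d}-{slugify(title)}.md"
    body = (
        f"# ADR-{number:03d}: {title}\n\n"
        "**Status**: Proposed\n"
        f"**Date**: {date.today().isoformat()}\n"
        f"**Deciders**: {', '.join(deciders)}\n\n"
        "## Context\n\n## Decision\n\n## Consequences\n"
    )
    # Mode "x" refuses to overwrite an existing file.
    with path.open("x", encoding="utf-8") as fh:
        fh.write(body)
    return path
```

For example, `new_adr(Path("docs/adr"), "Use PostgreSQL as primary database", ["Alice Kim"])` would create `docs/adr/001-use-postgresql-as-primary-database.md` in an empty directory.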

---

## Example ADR 1

# ADR-001: Use PostgreSQL as Primary Database

**Status**: Accepted
**Date**: 2025-03-10
**Deciders**: Alice Kim (Tech Lead), Bob Park (Backend), Carol Lee (DBA)

## Context

The e-commerce platform needs a primary database to store users, products, orders,
and inventory. We need ACID transactions for order processing, and the team anticipates
complex queries with joins across multiple tables. Initial data volume is expected to be
~5 million rows across all tables with modest write throughput (~200 writes/second peak).

We evaluated three options:

| Option     | Pros                                       | Cons                                                  |
|------------|--------------------------------------------|-------------------------------------------------------|
| PostgreSQL | Mature, full ACID, JSONB, full-text search | Vertical scaling limits at extreme load               |
| MySQL 8    | Wide hosting support, faster simple reads  | Weaker JSON support (no JSONB), less expressive SQL   |
| MongoDB    | Flexible schema, horizontal scaling        | Multi-document ACID only via explicit session transactions |

## Decision

We will use **PostgreSQL 16** as the sole primary database for all structured data.

Specific choices within this decision:
- Use `JSONB` columns for semi-structured product attributes, avoiding a separate NoSQL store.
- Use row-level security (RLS) to enforce tenant isolation at the database layer.
- Use a managed instance (AWS RDS) to offload operational concerns (backups, patching).

MySQL was rejected because its JSON support is weaker than PostgreSQL's JSONB and our
team has stronger PostgreSQL expertise. MongoDB was rejected because order processing
requires transactions spanning several collections, which MongoDB supports only through
explicit session-based transactions and handles less cleanly than PostgreSQL.

## Consequences

**Positive:**
- Full ACID guarantees simplify order and payment logic significantly.
- JSONB support eliminates the need for a separate document store for product attributes.
- RLS provides a robust second line of defence for multi-tenant isolation.
- Extensive tooling ecosystem (pgAdmin, Alembic, SQLAlchemy, pg_stat_statements).

**Negative / Risks:**
- PostgreSQL does not scale writes horizontally without significant architectural changes
  (e.g., sharding with Citus); read replicas add read capacity only. If write throughput
  grows beyond ~5,000 writes/second, this decision must be revisited
  (see ADR-007: Sharding Strategy).
- RDS costs more than a self-hosted instance; budget must account for this.
- Team must maintain migration discipline (Alembic) to prevent schema drift.
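
As an illustration outside the ADR itself, the JSONB and RLS choices above could look roughly like the following. This is a sketch only: the `products` table, its columns, the policy name, and the `app.tenant_id` setting are hypothetical names that the ADR does not specify, and the `%s` placeholder assumes a psycopg-style DB-API cursor.

```python
# Hypothetical table, column, policy, and setting names; the ADR specifies none of them.
SCHEMA_DDL = [
    """
    CREATE TABLE products (
        id         BIGSERIAL PRIMARY KEY,
        tenant_id  UUID  NOT NULL,
        name       TEXT  NOT NULL,
        attributes JSONB NOT NULL DEFAULT '{}'::jsonb
    )
    """,
    # GIN index so containment queries (attributes @> '{"color": "red"}') can use an index.
    "CREATE INDEX products_attributes_gin ON products USING GIN (attributes)",
    "ALTER TABLE products ENABLE ROW LEVEL SECURITY",
    # Without FORCE, the table owner is exempt from the policy below.
    "ALTER TABLE products FORCE ROW LEVEL SECURITY",
    """
    CREATE POLICY tenant_isolation ON products
        USING (tenant_id = current_setting('app.tenant_id')::uuid)
    """,
]


def apply_schema(cursor) -> None:
    """Run each DDL statement in order on a DB-API cursor."""
    for statement in SCHEMA_DDL:
        cursor.execute(statement)


def set_tenant(cursor, tenant_id: str) -> None:
    """Scope the current transaction to one tenant.

    The third argument of set_config (true) makes the setting transaction-local.
    """
    cursor.execute("SELECT set_config('app.tenant_id', %s, true)", (tenant_id,))
```

Superusers and roles with the `BYPASSRLS` attribute are never subject to row security, so the application would need to connect as an ordinary role for the policy to act as a second line of defence.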

---

## Example ADR 2

# ADR-012: Adopt Microservices Architecture for Payment Service

**Status**: Accepted
**Date**: 2025-07-22
**Deciders**: Alice Kim (Tech Lead), David Oh (Platform), Eve Choi (Security)
**Supersedes**: ADR-003 (Monolithic architecture for all services)

## Context

The payment service currently lives inside the main Django monolith (see ADR-003).
Three pain points have emerged that the monolith cannot address cleanly:

1. **Deployment coupling**: A bug fix in the product catalogue requires redeploying
   the entire application, including the payment module. Any deployment carries risk
   for payment processing, which must have 99.95% uptime.
2. **Compliance isolation**: PCI-DSS compliance requires limiting the cardholder data
   environment (CDE) to as few systems as possible. Keeping payment logic in the
   monolith means the entire application is in scope, which triples audit complexity.
3. **Independent scaling**: Checkout traffic spikes (flash sales) require extra payment
   processing capacity without scaling unrelated services.

Alternatives considered:

| Option                              | Assessment                                                  |
|-------------------------------------|-------------------------------------------------------------|
| Extract payment as a library        | Solves none of the three pain points                        |
| Strangler Fig to full microservices | Too broad; creates risk across the whole platform           |
| Extract *only* payment service      | Targeted, addresses all three pain points, manageable scope |

## Decision

We will extract the payment service into a **standalone microservice** using the
Strangler Fig pattern, while keeping the rest of the platform as a monolith.

Implementation decisions:
- **Language/Framework**: Python 3.12 + FastAPI (async, matches team skills).
- **Communication**: Synchronous REST for checkout flow; async events via RabbitMQ
  for post-payment notifications (receipt emails, inventory deduction).
- **Data**: Dedicated PostgreSQL database, not shared with the monolith (database-per-service).
- **Auth**: Service-to-service calls authenticated with short-lived JWTs signed with
  keys issued by an internal CA (not user-facing OAuth tokens).
- **Deployment**: Separate Kubernetes Deployment with a Horizontal Pod Autoscaler (HPA);
  an isolated network policy limits ingress to the API gateway and the monolith only.

## Consequences

**Positive:**
- Payment service can be deployed independently, so product catalogue releases no longer
  require redeploying payment code.
- PCI-DSS CDE scope reduced from the whole monolith to one small service and its database.
- Can scale payment pods independently during flash sales without scaling catalogue services.
- Failure in the monolith (e.g., OOM) does not crash in-flight payment transactions.

**Negative / Risks:**
- **Distributed systems complexity**: Network failures between the monolith and payment
  service must be handled explicitly (retries, idempotency keys, circuit breakers).
  Today this is a local function call inside the monolith, which has none of these failure modes.
- **Operational overhead**: Additional Kubernetes Deployment, Service, and NetworkPolicy
  to maintain. Observability must be extended (distributed tracing with OpenTelemetry).
- **Data consistency**: Without a shared database, refunds and order status updates require
  event-driven coordination. The team must implement the Saga pattern for the checkout flow.
- **Scope creep risk**: Extracting one service is justified; extracting everything into
  microservices is not. This ADR explicitly does not authorise a full decomposition.
  Future extractions require separate ADRs.

**Follow-up actions:**
- ADR-013: Event schema for payment events on RabbitMQ
- ADR-014: Saga pattern implementation for checkout/refund flow
- Runbook: Payment service on-call playbook (escalation, rollback procedure)
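
The "retries, idempotency keys" risk named in ADR-012 can be made concrete with a small sketch. Everything here is an assumption for illustration: the `Idempotency-Key` header name follows a common convention rather than anything the ADR specifies, `TransientError`, `charge_with_retry`, and `PaymentService` are hypothetical names, and the transport is passed in as a plain callable so the example runs without a network. Circuit breaking is left out.

```python
import time
import uuid
from typing import Callable


class TransientError(Exception):
    """Hypothetical error a transport raises for timeouts or 5xx responses."""


def charge_with_retry(
    send: Callable[[dict, dict], dict],
    payload: dict,
    max_attempts: int = 3,
    base_delay: float = 0.2,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call send(payload, headers), retrying transient failures with backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    # One key per logical operation: every retry reuses it, so the payment
    # service can recognise a duplicate and return the original result.
    headers = {"Idempotency-Key": str(uuid.uuid4())}
    for attempt in range(1, max_attempts + 1):
        try:
            return send(payload, headers)
        except TransientError:
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff
    raise AssertionError("unreachable")


class PaymentService:
    """In-memory stand-in for the server side of idempotency handling."""

    def __init__(self) -> None:
        self._results: dict[str, dict] = {}
        self.charges_made = 0

    def handle(self, payload: dict, headers: dict) -> dict:
        key = headers["Idempotency-Key"]
        if key not in self._results:
            # First time this key is seen: perform the charge exactly once.
            self.charges_made += 1
            self._results[key] = {"status": "captured", "amount": payload["amount"]}
        return self._results[key]
```

The case this protects against is a charge that succeeds on the payment service while the response is lost on the way back: the caller retries with the same key, and the service returns the stored result instead of charging twice.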