Database Design Case Study
Database Design Case Study¶
Previous: 15. NewSQL and Modern Trends
This capstone lesson brings together everything from the preceding fifteen lessons into a comprehensive, end-to-end database design exercise. We will walk through the complete design lifecycle of an e-commerce platform -- from gathering requirements to writing optimized queries -- applying the theory of ER modeling, normalization, indexing, transactions, and concurrency control to a realistic scenario. A second mini case study (social media platform) provides additional practice, and the lesson concludes with a design review checklist and a catalog of common mistakes.
Difficulty: ββββ
Learning Objectives: - Conduct requirements analysis for a database system - Produce a complete ER diagram from business requirements - Map an ER model to a relational schema and normalize to BCNF - Make informed decisions about physical design (indexing, partitioning, denormalization) - Analyze query plans and apply optimization techniques - Design transaction strategies with appropriate isolation levels - Evaluate design tradeoffs and document decisions - Recognize and avoid common database design mistakes
Table of Contents¶
- Overview of the Design Lifecycle
- Phase 1: Requirements Analysis
- Phase 2: Conceptual Design (ER Diagram)
- Phase 3: Logical Design (Relational Schema)
- Phase 4: Physical Design
- Phase 5: Query Optimization
- Phase 6: Transaction Design
- Alternative Case Study: Social Media Platform
- Design Review Checklist
- Common Design Mistakes and How to Avoid Them
- Exercises
- References
1. Overview of the Design Lifecycle¶
Database design is not a one-shot process. It follows a structured lifecycle with feedback loops:
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Database Design Lifecycle β
β β
β Phase 1 Phase 2 Phase 3 β
β Requirements βββΆ Conceptual βββΆ Logical β
β Analysis Design (ER) Design (Relational) β
β β
β β β β
β β βββββββββββββββββββββββ β
β β β β
β β βΌ β
β β Phase 4 Phase 5 β
β β Physical βββΆ Query β
β ββββββββββΆDesign Optimization β
β β
β β β β
β βΌ βΌ β
β Phase 6 β
β Transaction Design β
β β
β β β
β βΌ β
β Deploy, Monitor, Iterate β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Each phase produces artifacts that feed into the next:
| Phase | Input | Output |
|---|---|---|
| 1. Requirements | Business needs, interviews | Functional & data requirements |
| 2. Conceptual | Requirements | ER diagram |
| 3. Logical | ER diagram | Normalized relational schema (DDL) |
| 4. Physical | Relational schema + query patterns | Indexes, partitions, denormalization |
| 5. Query | Physical schema + common queries | Optimized queries with explain plans |
| 6. Transaction | Business rules + concurrency needs | Isolation levels, locking strategy |
2. Phase 1: Requirements Analysis¶
2.1 The Business Context¶
We are designing the database for ShopWave, an online e-commerce platform that sells physical products. The platform supports multiple sellers, product reviews, promotions, and a standard checkout flow.
2.2 Functional Requirements¶
After interviewing stakeholders, we identify these key business functions:
FR-01: Customer registration and authentication
FR-02: Product catalog browsing with search and filtering
FR-03: Product detail page with images, specifications, and reviews
FR-04: Shopping cart management (add, remove, update quantities)
FR-05: Checkout with multiple payment methods
FR-06: Order tracking (status updates from placement to delivery)
FR-07: Seller management (sellers list products, manage inventory)
FR-08: Product reviews and ratings (1-5 stars, text review)
FR-09: Promotions and discount codes
FR-10: Wishlist (customers save products for later)
FR-11: Address management (multiple shipping addresses per customer)
FR-12: Product categories with hierarchy (e.g., Electronics > Phones > Smartphones)
2.3 Data Requirements¶
From the functional requirements, we extract the data entities and their key attributes:
DR-01: Customers
- Name, email, password hash, phone, registration date
- Multiple shipping addresses
- One or more payment methods
DR-02: Products
- Name, description, SKU, price, weight, dimensions
- Multiple images (URLs)
- Belongs to one or more categories
- Listed by exactly one seller
- Has a stock quantity (inventory)
DR-03: Categories
- Name, description
- Hierarchical (parent-child relationship)
DR-04: Orders
- Customer, shipping address, order date, status
- One or more order items (product, quantity, unit price at time of order)
- Payment information
- Shipping method and tracking number
DR-05: Reviews
- Customer, product, rating (1-5), text, date
- Each customer can review a product at most once
DR-06: Sellers
- Company name, contact email, commission rate
- Bank account for payouts
DR-07: Promotions
- Code, discount type (percentage/fixed), amount, valid dates
- Applicable to specific products, categories, or site-wide
- Usage limit (total and per customer)
2.4 Volumetric Analysis¶
Understanding data volumes helps with physical design decisions:
| Entity | Expected Volume | Growth Rate |
|---|---|---|
| Customers | 1 million | 50K/month |
| Products | 500,000 | 10K/month |
| Orders | 10 million | 500K/month |
| Order Items | 25 million | 1.25M/month |
| Reviews | 2 million | 100K/month |
| Categories | 1,000 | Rarely changes |
| Sellers | 5,000 | 200/month |
| Promotions | 500 active at a time | Seasonal peaks |
2.5 Query Patterns¶
We identify the most frequent and critical queries:
QP-01: [High freq] Product search by keyword + category + price range
QP-02: [High freq] Product detail page (product + images + reviews + seller)
QP-03: [High freq] Shopping cart contents for a customer
QP-04: [Critical] Place order (decrement inventory, create order + items)
QP-05: [High freq] Order history for a customer (with status)
QP-06: [Medium] Seller dashboard (orders, revenue, inventory)
QP-07: [Medium] Top-rated products in a category
QP-08: [Low freq] Admin: sales report by category, date range
QP-09: [Critical] Apply promotion code (validate, calculate discount)
QP-10: [High freq] Customer wishlist display
2.6 Non-Functional Requirements¶
NFR-01: Response time: < 200ms for read queries, < 500ms for writes
NFR-02: Availability: 99.9% uptime
NFR-03: Data durability: no data loss for orders and payments
NFR-04: Concurrency: support 10,000 concurrent users
NFR-05: Scalability: handle 10x growth over 3 years
NFR-06: Security: PCI-DSS compliance for payment data
3. Phase 2: Conceptual Design (ER Diagram)¶
3.1 Entity Identification¶
From the data requirements, we identify these entities:
Strong Entities: Weak Entities: Associative Entities:
βββ Customer βββ OrderItem βββ Review
βββ Product βββ CartItem βββ ProductCategory
βββ Category βββ ProductImage βββ WishlistItem
βββ Order βββ Address βββ PromotionUsage
βββ Seller
βββ Promotion
βββ Cart
βββ PaymentMethod
3.2 ER Diagram¶
ββββββββββββββββ
βββββββ Category βββββββ
β β id β β
β β name β β parent_id
β β description β β (self-ref)
β ββββββββββββββββββββββ
β β
β βββββββ΄βββββββ
β β Product β
β β Category β
β β (M:N) β
β βββββββ¬βββββββ
β β
ββββββββββββββββ βββββββ΄βββββββββββ΄βββ ββββββββββββββββ
β Seller β 1 M β Product β 1 M β ProductImage β
β id ββββββββββ id ββββββββββ id β
β company_name β β name β β product_id β
β email β β description β β url β
β commission β β sku β β alt_text β
β bank_account β β price β β position β
ββββββββββββββββ β weight β ββββββββββββββββ
β stock_quantity β
β seller_id (FK) β
ββββββββββ¬βββββββββββ
β
ββββββββββββββββββββΌβββββββββββββββββββ
β β β
βββββββ΄βββββββ ββββββββ΄βββββββ ββββββββ΄βββββββ
β Review β β WishlistItemβ β CartItem β
β id β β id β β id β
β customer_idβ β customer_id β β cart_id β
β product_id β β product_id β β product_id β
β rating β β added_at β β quantity β
β text β βββββββββββββββ β added_at β
β created_at β ββββββββ¬βββββββ
βββββββ¬βββββββ β
β β
β βββββββ΄βββββββ
β β Cart β
β β id β
β β customer_id β
β β created_at β
β βββββββ¬βββββββ
β β
β β
βββββββ΄βββββββββββββββββββββββββββββββββββ΄βββ
β Customer β
β id β
β email β
β password_hash β
β first_name, last_name β
β phone β
β created_at β
ββββββββββββββ¬ββββββββββββββ¬βββββββββββββββββ
β β
ββββββββ΄βββββββ βββββ΄βββββββββββββ
β Address β β PaymentMethod β
β id β β id β
β customer_id β β customer_id β
β label β β type β
β street β β last_four β
β city β β token β
β state β β is_default β
β zip_code β ββββββββββββββββββ
β country β
β is_default β
ββββββββ¬βββββββ
β
ββββββββ΄βββββββββββββββββββββββββββββββ
β Order β
β id β
β customer_id β
β shipping_address_id β
β payment_method_id β
β status β
β subtotal, discount, tax, total β
β promotion_id (nullable) β
β tracking_number β
β created_at, updated_at β
ββββββββββββ¬βββββββββββββββββββββββββββ
β
ββββββββββββ΄βββββββββββ
β OrderItem β
β id β
β order_id β
β product_id β
β quantity β
β unit_price β β price at time of order
β subtotal β (not current price!)
βββββββββββββββββββββββ
ββββββββββββββββ
β Promotion β
β id β
β code β
β discount_typeβ (percentage / fixed_amount)
β amount β
β min_purchase β
β max_uses β
β uses_count β
β valid_from β
β valid_until β
β is_active β
ββββββββββββββββ
3.3 Cardinality Summary¶
| Relationship | Cardinality | Participation |
|---|---|---|
| Customer -- Address | 1:M | Total (at least one address required for ordering) |
| Customer -- PaymentMethod | 1:M | Partial (can register without payment method) |
| Customer -- Order | 1:M | Partial |
| Customer -- Review | 1:M | Partial |
| Customer -- Cart | 1:1 | Partial |
| Customer -- WishlistItem | 1:M | Partial |
| Order -- OrderItem | 1:M | Total (order must have at least one item) |
| Product -- ProductImage | 1:M | Partial (images optional) |
| Product -- Review | 1:M | Partial |
| Product -- Category | M:N | Total (product must be in at least one category) |
| Seller -- Product | 1:M | Total |
| Category -- Category (self) | 1:M | Partial (root categories have no parent) |
| Order -- Promotion | M:1 | Partial (order may have no promotion) |
| Customer + Product -- Review | Unique (one review per customer per product) |
4. Phase 3: Logical Design (Relational Schema)¶
4.1 ER-to-Relational Mapping¶
We apply the standard mapping rules from Lesson 4:
Rule 1: Strong entities become tables. Rule 2: Weak entities become tables with the owner's PK as part of their PK or as a FK. Rule 3: 1:M relationships become FKs in the "many" side. Rule 4: M:N relationships become junction tables.
4.2 Complete DDL¶
-- ============================================================
-- CUSTOMERS AND AUTHENTICATION
-- ============================================================
CREATE TABLE customers (
id BIGSERIAL PRIMARY KEY,
email VARCHAR(255) NOT NULL UNIQUE,
password_hash VARCHAR(255) NOT NULL,
first_name VARCHAR(100) NOT NULL,
last_name VARCHAR(100) NOT NULL,
phone VARCHAR(20),
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE TABLE addresses (
id BIGSERIAL PRIMARY KEY,
customer_id BIGINT NOT NULL REFERENCES customers(id) ON DELETE CASCADE,
label VARCHAR(50) NOT NULL DEFAULT 'Home', -- 'Home', 'Work', etc.
street_line1 VARCHAR(255) NOT NULL,
street_line2 VARCHAR(255),
city VARCHAR(100) NOT NULL,
state VARCHAR(100),
zip_code VARCHAR(20) NOT NULL,
country VARCHAR(2) NOT NULL, -- ISO 3166-1 alpha-2
is_default BOOLEAN NOT NULL DEFAULT FALSE,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE TABLE payment_methods (
id BIGSERIAL PRIMARY KEY,
customer_id BIGINT NOT NULL REFERENCES customers(id) ON DELETE CASCADE,
type VARCHAR(20) NOT NULL CHECK (type IN ('credit_card', 'debit_card', 'paypal')),
last_four CHAR(4) NOT NULL,
token VARCHAR(255) NOT NULL, -- tokenized by payment processor
expires_at DATE,
is_default BOOLEAN NOT NULL DEFAULT FALSE,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
-- ============================================================
-- SELLERS
-- ============================================================
CREATE TABLE sellers (
id BIGSERIAL PRIMARY KEY,
company_name VARCHAR(255) NOT NULL,
contact_email VARCHAR(255) NOT NULL UNIQUE,
commission_rate DECIMAL(5,4) NOT NULL DEFAULT 0.1500, -- 15.00%
bank_account VARCHAR(255), -- encrypted or tokenized
is_active BOOLEAN NOT NULL DEFAULT TRUE,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
-- ============================================================
-- PRODUCT CATALOG
-- ============================================================
CREATE TABLE categories (
id SERIAL PRIMARY KEY,
name VARCHAR(100) NOT NULL,
slug VARCHAR(100) NOT NULL UNIQUE,
description TEXT,
parent_id INT REFERENCES categories(id) ON DELETE SET NULL,
position INT NOT NULL DEFAULT 0,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE TABLE products (
id BIGSERIAL PRIMARY KEY,
seller_id BIGINT NOT NULL REFERENCES sellers(id),
name VARCHAR(255) NOT NULL,
slug VARCHAR(255) NOT NULL UNIQUE,
description TEXT,
sku VARCHAR(50) NOT NULL UNIQUE,
price DECIMAL(10,2) NOT NULL CHECK (price >= 0),
weight_grams INT,
stock_quantity INT NOT NULL DEFAULT 0 CHECK (stock_quantity >= 0),
is_active BOOLEAN NOT NULL DEFAULT TRUE,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE TABLE product_categories (
product_id BIGINT NOT NULL REFERENCES products(id) ON DELETE CASCADE,
category_id INT NOT NULL REFERENCES categories(id) ON DELETE CASCADE,
PRIMARY KEY (product_id, category_id)
);
CREATE TABLE product_images (
id BIGSERIAL PRIMARY KEY,
product_id BIGINT NOT NULL REFERENCES products(id) ON DELETE CASCADE,
url VARCHAR(500) NOT NULL,
alt_text VARCHAR(255),
position INT NOT NULL DEFAULT 0,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
-- ============================================================
-- SHOPPING CART
-- ============================================================
CREATE TABLE carts (
id BIGSERIAL PRIMARY KEY,
customer_id BIGINT REFERENCES customers(id) ON DELETE SET NULL,
session_id VARCHAR(255), -- for anonymous carts
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
CONSTRAINT cart_owner CHECK (customer_id IS NOT NULL OR session_id IS NOT NULL)
);
CREATE TABLE cart_items (
id BIGSERIAL PRIMARY KEY,
cart_id BIGINT NOT NULL REFERENCES carts(id) ON DELETE CASCADE,
product_id BIGINT NOT NULL REFERENCES products(id),
quantity INT NOT NULL CHECK (quantity > 0),
added_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
UNIQUE (cart_id, product_id)
);
-- ============================================================
-- PROMOTIONS
-- ============================================================
CREATE TABLE promotions (
id BIGSERIAL PRIMARY KEY,
code VARCHAR(50) NOT NULL UNIQUE,
discount_type VARCHAR(20) NOT NULL CHECK (discount_type IN ('percentage', 'fixed_amount')),
amount DECIMAL(10,2) NOT NULL CHECK (amount > 0),
min_purchase DECIMAL(10,2) DEFAULT 0,
max_uses INT, -- NULL = unlimited
uses_count INT NOT NULL DEFAULT 0,
per_customer INT NOT NULL DEFAULT 1, -- max uses per customer
valid_from TIMESTAMPTZ NOT NULL,
valid_until TIMESTAMPTZ NOT NULL,
is_active BOOLEAN NOT NULL DEFAULT TRUE,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
CHECK (valid_from < valid_until)
);
-- ============================================================
-- ORDERS
-- ============================================================
CREATE TYPE order_status AS ENUM (
'pending', 'confirmed', 'processing', 'shipped',
'delivered', 'cancelled', 'refunded'
);
CREATE TABLE orders (
id BIGSERIAL PRIMARY KEY,
customer_id BIGINT NOT NULL REFERENCES customers(id),
shipping_address_id BIGINT NOT NULL REFERENCES addresses(id),
payment_method_id BIGINT NOT NULL REFERENCES payment_methods(id),
promotion_id BIGINT REFERENCES promotions(id),
status order_status NOT NULL DEFAULT 'pending',
subtotal DECIMAL(10,2) NOT NULL,
discount_amount DECIMAL(10,2) NOT NULL DEFAULT 0,
tax_amount DECIMAL(10,2) NOT NULL DEFAULT 0,
shipping_cost DECIMAL(10,2) NOT NULL DEFAULT 0,
total DECIMAL(10,2) NOT NULL,
tracking_number VARCHAR(100),
notes TEXT,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE TABLE order_items (
id BIGSERIAL PRIMARY KEY,
order_id BIGINT NOT NULL REFERENCES orders(id) ON DELETE CASCADE,
product_id BIGINT NOT NULL REFERENCES products(id),
quantity INT NOT NULL CHECK (quantity > 0),
unit_price DECIMAL(10,2) NOT NULL, -- price at time of order
subtotal DECIMAL(10,2) NOT NULL, -- quantity * unit_price
UNIQUE (order_id, product_id)
);
-- ============================================================
-- REVIEWS
-- ============================================================
CREATE TABLE reviews (
id BIGSERIAL PRIMARY KEY,
customer_id BIGINT NOT NULL REFERENCES customers(id),
product_id BIGINT NOT NULL REFERENCES products(id),
rating SMALLINT NOT NULL CHECK (rating BETWEEN 1 AND 5),
title VARCHAR(255),
body TEXT,
is_verified BOOLEAN NOT NULL DEFAULT FALSE, -- verified purchase
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
UNIQUE (customer_id, product_id) -- one review per customer per product
);
-- ============================================================
-- WISHLISTS
-- ============================================================
CREATE TABLE wishlist_items (
id BIGSERIAL PRIMARY KEY,
customer_id BIGINT NOT NULL REFERENCES customers(id) ON DELETE CASCADE,
product_id BIGINT NOT NULL REFERENCES products(id) ON DELETE CASCADE,
added_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
UNIQUE (customer_id, product_id)
);
4.3 Normalization Verification¶
Let us verify that our schema is in BCNF by analyzing the functional dependencies of key tables.
customers table:
FDs: id β email, password_hash, first_name, last_name, phone, created_at, updated_at
email β id (candidate key)
Candidate keys: {id}, {email}
Both determinants are superkeys β BCNF β
products table:
FDs: id β seller_id, name, slug, description, sku, price, weight_grams,
stock_quantity, is_active, created_at, updated_at
sku β id (candidate key)
slug β id (candidate key)
Candidate keys: {id}, {sku}, {slug}
All determinants are superkeys β BCNF β
order_items table:
FDs: id β order_id, product_id, quantity, unit_price, subtotal
(order_id, product_id) β id, quantity, unit_price, subtotal
Candidate keys: {id}, {order_id, product_id}
All determinants are superkeys β BCNF β
reviews table:
FDs: id β customer_id, product_id, rating, title, body, is_verified, created_at
(customer_id, product_id) β id, rating, title, body, is_verified, created_at
Candidate keys: {id}, {customer_id, product_id}
All determinants are superkeys β BCNF β
categories table (self-referential hierarchy):
FDs: id β name, slug, description, parent_id, position, created_at
slug β id
Candidate keys: {id}, {slug}
All determinants are superkeys β BCNF β
The schema is in BCNF -- no non-trivial FD has a determinant that is not a superkey.
4.4 Referential Integrity Summary¶
βββββββββββββββββββ βββββββββββββββββββ
β customers β β sellers β
β (PK: id) β β (PK: id) β
βββββββββ¬ββββββββββ βββββββββ¬ββββββββββ
β β β β β
β β β β β
β β β β βββββββββββ΄βββββββββββ
β β β βββββΆβ products β
β β β β (FK: seller_id) β
β β β ββββ¬βββββββββ¬βββββββββ
β β β β β
β β β ββββββββ ββββββββ
β β β βΌ βΌ
β β β ββββββββββββββ ββββββββββββββββ
β β β βproduct_catsβ βproduct_imagesβ
β β β ββββββββββββββ ββββββββββββββββ
β β β
β β ββββββΆββββββββββββ
β β β reviews β (FK: customer_id, product_id)
β β ββββββββββββ
β β
β βββββββββββΆββββββββββββ βββββββββββββββ
β β orders ββββββΆβ order_items β
β ββββββββββββ βββββββββββββββ
β
ββββββββββββββββΆββββββββββββ
β β addressesβ
ββββββββββββββββΆββββββββββββ
β βpay_methodsβ
ββββββββββββββββΆββββββββββββ
β β carts ββββββΆ cart_items
ββββββββββββββββΆββββββββββββ
β wishlist β
ββββββββββββ
5. Phase 4: Physical Design¶
5.1 Indexing Strategy¶
We design indexes based on the query patterns identified in Phase 1.
Primary key indexes (automatically created):
-- All tables already have PK indexes from BIGSERIAL PRIMARY KEY
-- Unique constraints also create indexes automatically:
-- customers(email), products(sku), products(slug), promotions(code)
Secondary indexes for frequent queries:
-- QP-01: Product search by category + price range
CREATE INDEX idx_products_active_price ON products(price)
WHERE is_active = TRUE;
-- For category-based search (through junction table)
CREATE INDEX idx_product_categories_category ON product_categories(category_id);
-- QP-02: Product detail page (reviews for a product)
CREATE INDEX idx_reviews_product_rating ON reviews(product_id, rating);
-- QP-03: Shopping cart for a customer
CREATE INDEX idx_carts_customer ON carts(customer_id);
-- QP-04: Inventory check (stock_quantity for a product)
-- PK on products(id) is sufficient
-- QP-05: Order history for a customer
CREATE INDEX idx_orders_customer_date ON orders(customer_id, created_at DESC);
-- QP-06: Seller dashboard
CREATE INDEX idx_products_seller ON products(seller_id);
CREATE INDEX idx_order_items_product ON order_items(product_id);
-- QP-07: Top-rated products in a category
-- Requires joining product_categories + reviews
-- The idx_reviews_product_rating and idx_product_categories_category help
-- QP-09: Promotion validation
CREATE INDEX idx_promotions_code_active ON promotions(code)
WHERE is_active = TRUE;
-- QP-10: Wishlist display
CREATE INDEX idx_wishlist_customer ON wishlist_items(customer_id, added_at DESC);
-- Full-text search for product names and descriptions
CREATE INDEX idx_products_search ON products
USING GIN (to_tsvector('english', name || ' ' || COALESCE(description, '')));
5.2 Partitioning Strategy¶
For tables expected to grow into hundreds of millions of rows, we apply table partitioning.
-- Partition orders by month (range partitioning)
-- This helps with:
-- 1. Order history queries (scan only relevant months)
-- 2. Archiving old orders (drop entire partitions)
-- 3. Maintenance operations (VACUUM on smaller partitions)
CREATE TABLE orders (
id BIGSERIAL,
customer_id BIGINT NOT NULL,
-- ... other columns ...
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
PRIMARY KEY (id, created_at) -- partition key must be in PK
) PARTITION BY RANGE (created_at);
-- Create partitions
CREATE TABLE orders_2024_01 PARTITION OF orders
FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
CREATE TABLE orders_2024_02 PARTITION OF orders
FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');
-- ... (automate with pg_partman extension)
-- Similarly partition order_items by order date
-- Or use foreign key to orders which is already partitioned
-- Partition reviews (if they grow very large)
CREATE TABLE reviews (
-- ...
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
) PARTITION BY RANGE (created_at);
5.3 Denormalization Decisions¶
Pure normalization optimizes for storage and update consistency but can hurt read performance. We strategically denormalize where the read performance benefit outweighs the update complexity.
Decision 1: Store unit_price in order_items
-- ALREADY IN SCHEMA: order_items.unit_price captures price at order time
-- This is NOT denormalization for performance -- it's a business requirement!
-- Product price may change after the order is placed.
-- Without this, we cannot reconstruct historical order totals.
Decision 2: Materialized review statistics
-- Instead of computing AVG(rating) and COUNT(*) for every product page load,
-- store pre-computed values on the products table.
ALTER TABLE products ADD COLUMN avg_rating DECIMAL(3,2) DEFAULT NULL;
ALTER TABLE products ADD COLUMN review_count INT NOT NULL DEFAULT 0;
-- Update via trigger or application code when a review is added/modified:
-- UPDATE products SET
-- avg_rating = (SELECT AVG(rating) FROM reviews WHERE product_id = $1),
-- review_count = (SELECT COUNT(*) FROM reviews WHERE product_id = $1)
-- WHERE id = $1;
Decision 3: Store total in orders
-- ALREADY IN SCHEMA: orders.total is stored rather than computed
-- Computing SUM(subtotal) from order_items on every read is wasteful
-- The total is fixed once the order is placed
Decision 4: Category path materialization
-- For deep category hierarchies, recursive queries are slow
-- Materialize the full path for display:
ALTER TABLE categories ADD COLUMN path TEXT; -- e.g., "Electronics/Phones/Smartphones"
ALTER TABLE categories ADD COLUMN depth INT NOT NULL DEFAULT 0;
-- Update on category creation/move:
-- path = parent.path || '/' || name
-- depth = parent.depth + 1
Denormalization decision matrix:
| Denormalization | Benefit | Cost | Decision |
|---|---|---|---|
| avg_rating on products | Avoid aggregate on every product page | Update on review change | YES (high read:write ratio) |
| total on orders | Avoid SUM on every order display | Must match order_items | YES (immutable after creation) |
| category path | Avoid recursive queries | Update on category move | YES (categories change rarely) |
| customer name on orders | Avoid JOIN for order list | Update on name change | NO (JOIN is fast with index) |
5.4 Storage Considerations¶
-- Use appropriate data types for storage efficiency
-- GOOD: DECIMAL(10,2) for money (exact arithmetic)
-- BAD: FLOAT for money (rounding errors!)
-- GOOD: SMALLINT for rating (1-5, uses 2 bytes)
-- BAD: INT for rating (uses 4 bytes, wastes space)
-- GOOD: VARCHAR(2) for country codes
-- BAD: TEXT for country codes (no length constraint)
-- GOOD: TIMESTAMPTZ for dates (timezone-aware)
-- BAD: TIMESTAMP for dates (ambiguous timezone)
-- GOOD: ENUM for fixed status values
-- BAD: VARCHAR for status (allows typos)
-- Estimate table sizes:
-- customers: 1M rows x ~300 bytes β 300 MB
-- products: 500K rows x ~500 bytes β 250 MB
-- orders: 10M rows x ~200 bytes β 2 GB
-- order_items: 25M rows x ~100 bytes β 2.5 GB
-- reviews: 2M rows x ~500 bytes β 1 GB
-- TOTAL: ~6 GB (easily fits in RAM for caching)
6. Phase 5: Query Optimization¶
6.1 Product Search (QP-01)¶
-- Query: Search products by keyword in a category with price filter
-- This is the most frequent query on the platform
SELECT p.id, p.name, p.price, p.avg_rating, p.review_count,
(SELECT url FROM product_images pi
WHERE pi.product_id = p.id ORDER BY pi.position LIMIT 1) AS thumbnail
FROM products p
JOIN product_categories pc ON p.id = pc.product_id
WHERE pc.category_id = 42
AND p.price BETWEEN 10.00 AND 100.00
AND p.is_active = TRUE
AND to_tsvector('english', p.name || ' ' || COALESCE(p.description, ''))
@@ plainto_tsquery('english', 'wireless bluetooth headphones')
ORDER BY p.avg_rating DESC NULLS LAST
LIMIT 20 OFFSET 0;
Explain plan analysis:
EXPLAIN (ANALYZE, BUFFERS)
-- Expected plan:
-- Limit (cost=... rows=20)
-- -> Sort (cost=... key: avg_rating DESC)
-- -> Nested Loop (cost=...)
-- -> Index Scan on product_categories pc (category_id = 42)
-- -> Index Scan on products p (id = pc.product_id)
-- Filter: is_active AND price BETWEEN 10 AND 100
-- Filter: tsvector @@ tsquery
Optimization: Create a composite GIN index that covers both full-text search and category:
-- If category-based full-text search is the dominant pattern:
-- Consider a materialized view that pre-joins products and categories
CREATE MATERIALIZED VIEW product_search AS
SELECT p.id, p.name, p.description, p.price, p.avg_rating,
p.review_count, p.is_active, pc.category_id,
to_tsvector('english', p.name || ' ' || COALESCE(p.description, '')) AS search_vector
FROM products p
JOIN product_categories pc ON p.id = pc.product_id;
CREATE INDEX idx_product_search_category ON product_search(category_id)
WHERE is_active = TRUE;
CREATE INDEX idx_product_search_fts ON product_search USING GIN(search_vector);
CREATE INDEX idx_product_search_price ON product_search(category_id, price)
WHERE is_active = TRUE;
-- Refresh periodically or on product changes
REFRESH MATERIALIZED VIEW CONCURRENTLY product_search;
6.2 Product Detail Page (QP-02)¶
-- Fetch product with images, seller, and review summary in one query
SELECT
p.id, p.name, p.description, p.price, p.sku,
p.stock_quantity, p.avg_rating, p.review_count,
s.company_name AS seller_name,
json_agg(DISTINCT jsonb_build_object(
'url', pi.url, 'alt_text', pi.alt_text, 'position', pi.position
)) AS images
FROM products p
JOIN sellers s ON p.seller_id = s.id
LEFT JOIN product_images pi ON p.id = pi.product_id
WHERE p.id = $1
GROUP BY p.id, s.id;
-- Fetch reviews separately (paginated)
SELECT r.rating, r.title, r.body, r.created_at, r.is_verified,
c.first_name || ' ' || LEFT(c.last_name, 1) || '.' AS reviewer
FROM reviews r
JOIN customers c ON r.customer_id = c.id
WHERE r.product_id = $1
ORDER BY r.created_at DESC
LIMIT 10 OFFSET $2;
6.3 Order History (QP-05)¶
-- Customer's recent orders with item summary
SELECT o.id, o.status, o.total, o.created_at,
COUNT(oi.id) AS item_count,
STRING_AGG(LEFT(p.name, 30), ', ' ORDER BY oi.id LIMIT 3) AS item_preview
FROM orders o
JOIN order_items oi ON o.id = oi.order_id
JOIN products p ON oi.product_id = p.id
WHERE o.customer_id = $1
GROUP BY o.id
ORDER BY o.created_at DESC
LIMIT 20 OFFSET $2;
-- Uses index: idx_orders_customer_date (customer_id, created_at DESC)
-- Partition pruning: if querying recent orders, only scans recent partitions
6.4 Seller Dashboard (QP-06)¶
-- Seller's monthly revenue and order count
SELECT
DATE_TRUNC('month', o.created_at) AS month,
COUNT(DISTINCT o.id) AS order_count,
SUM(oi.subtotal) AS revenue,
SUM(oi.subtotal * s.commission_rate) AS commission
FROM sellers s
JOIN products p ON s.id = p.seller_id
JOIN order_items oi ON p.id = oi.product_id
JOIN orders o ON oi.order_id = o.id
WHERE s.id = $1
AND o.created_at >= $2 -- start date
AND o.created_at < $3 -- end date
AND o.status NOT IN ('cancelled', 'refunded')
GROUP BY month
ORDER BY month DESC;
6.5 Query Performance Summary¶
| Query | Expected Latency | Key Index | Notes |
|---|---|---|---|
| Product search | < 50ms | GIN (FTS) + category | Materialized view helps |
| Product detail | < 10ms | PK + seller FK | Single row + images |
| Cart contents | < 5ms | carts(customer_id) | Small result set |
| Place order | < 100ms | Inventory check + insert | Transaction (see Phase 6) |
| Order history | < 20ms | orders(customer_id, date) | Partition pruning |
| Seller dashboard | < 200ms | products(seller_id) + order_items(product_id) | Aggregate query |
| Top rated products | < 30ms | product_categories + avg_rating | Uses denormalized rating |
7. Phase 6: Transaction Design¶
7.1 Critical Transaction: Place Order¶
This is the most complex and important transaction in the system. It must: 1. Validate cart items are in stock 2. Calculate totals (with potential promotion) 3. Decrement inventory 4. Create the order and order items 5. Clear the cart 6. Process payment (external API call)
-- Isolation level: SERIALIZABLE or READ COMMITTED with explicit locking
-- We use READ COMMITTED + explicit locking for better performance
BEGIN;
-- Step 1: Lock cart items to prevent concurrent modifications
-- Also lock the product rows to prevent overselling
SELECT p.id, p.name, p.price, p.stock_quantity, ci.quantity
FROM cart_items ci
JOIN products p ON ci.product_id = p.id
WHERE ci.cart_id = $cart_id
FOR UPDATE OF p; -- locks product rows
-- Step 2: Validate stock (application checks: stock_quantity >= ci.quantity)
-- If any item is out of stock, ROLLBACK and inform the customer
-- Step 3: Apply promotion (if any)
SELECT id, discount_type, amount, min_purchase, max_uses, uses_count
FROM promotions
WHERE code = $promo_code
AND is_active = TRUE
AND valid_from <= NOW()
AND valid_until > NOW()
FOR UPDATE; -- lock to prevent concurrent usage beyond max_uses
-- Step 4: Create order
INSERT INTO orders (customer_id, shipping_address_id, payment_method_id,
promotion_id, subtotal, discount_amount, tax_amount,
shipping_cost, total, status)
VALUES ($customer_id, $address_id, $payment_id,
$promo_id, $subtotal, $discount, $tax, $shipping, $total, 'pending')
RETURNING id AS order_id;
-- Step 5: Create order items and decrement inventory
INSERT INTO order_items (order_id, product_id, quantity, unit_price, subtotal)
SELECT $order_id, ci.product_id, ci.quantity, p.price, ci.quantity * p.price
FROM cart_items ci
JOIN products p ON ci.product_id = p.id
WHERE ci.cart_id = $cart_id;
UPDATE products p
SET stock_quantity = stock_quantity - ci.quantity,
updated_at = NOW()
FROM cart_items ci
WHERE p.id = ci.product_id
AND ci.cart_id = $cart_id;
-- Step 6: Update promotion usage count
UPDATE promotions SET uses_count = uses_count + 1
WHERE id = $promo_id;
-- Step 7: Clear cart
DELETE FROM cart_items WHERE cart_id = $cart_id;
-- Step 8: Process payment (external API call)
-- NOTE: This should happen OUTSIDE the transaction or use the Saga pattern
-- If payment fails, the entire transaction rolls back
COMMIT;
Important design decision: The payment API call should NOT be inside the database transaction because: 1. External API calls can be slow (seconds), holding locks too long 2. If the DB commits but the payment fails, we have an inconsistent state
Better approach: Saga pattern
1. BEGIN; create order with status='pending_payment'; decrement stock; COMMIT;
2. Call payment API
3. If payment succeeds: UPDATE orders SET status='confirmed' WHERE id=$id;
4. If payment fails: UPDATE orders SET status='payment_failed' WHERE id=$id;
Restore inventory (compensating transaction)
7.2 Isolation Level Decisions¶
| Transaction | Isolation Level | Reason |
|---|---|---|
| Place order | READ COMMITTED + FOR UPDATE | Explicit locking prevents overselling; RC is faster than Serializable |
| Browse products | READ COMMITTED | Slight staleness is acceptable for catalog |
| View order history | READ COMMITTED | Historical data, no write conflicts |
| Submit review | READ COMMITTED + UNIQUE constraint | UNIQUE(customer_id, product_id) prevents duplicates |
| Apply promotion | READ COMMITTED + FOR UPDATE | Lock promotion row to prevent exceeding max_uses |
| Admin reports | REPEATABLE READ | Consistent snapshot for long-running aggregation queries |
7.3 Locking Strategy¶
-- Prevent overselling: use SELECT FOR UPDATE on product rows
-- This acquires row-level exclusive locks
-- Ordered locking to prevent deadlocks:
-- Always lock products in id order when updating multiple products
-- Example: if cart has products [42, 17, 88], lock in order [17, 42, 88]
SELECT id, stock_quantity
FROM products
WHERE id IN (17, 42, 88)
ORDER BY id -- consistent ordering prevents deadlocks
FOR UPDATE;
7.4 Concurrency Scenarios¶
Scenario 1: Two customers order the last item
Customer A Customer B
BEGIN BEGIN
SELECT ... FOR UPDATE (stock=1) β SELECT ... FOR UPDATE β BLOCKS
UPDATE stock = 0 (waiting for A's lock)
INSERT order_item ...
COMMIT β releases lock β unblocked, reads stock=0
β stock < quantity β ROLLBACK
β "Item out of stock" message
Scenario 2: Customer updates cart while checking out
Checkout Transaction Cart Update
BEGIN BEGIN
SELECT cart_items FOR UPDATE β UPDATE cart_items β BLOCKS
... (waiting for checkout's lock)
...processes order... ...
DELETE cart_items ...
COMMIT β releases lock β row deleted, update affects 0 rows
β COMMIT (no error, but cart is empty)
8. Alternative Case Study: Social Media Platform¶
8.1 Requirements Summary¶
SocialBuzz is a social media platform supporting posts, follows, likes, and a personalized feed.
8.2 Key Entities¶
CREATE TABLE users (
id BIGSERIAL PRIMARY KEY,
username VARCHAR(50) NOT NULL UNIQUE,
email VARCHAR(255) NOT NULL UNIQUE,
display_name VARCHAR(100) NOT NULL,
bio TEXT,
avatar_url VARCHAR(500),
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE TABLE follows (
follower_id BIGINT NOT NULL REFERENCES users(id),
followee_id BIGINT NOT NULL REFERENCES users(id),
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
PRIMARY KEY (follower_id, followee_id),
CHECK (follower_id != followee_id)
);
CREATE TABLE posts (
id BIGSERIAL PRIMARY KEY,
user_id BIGINT NOT NULL REFERENCES users(id),
content TEXT NOT NULL,
media_urls TEXT[], -- array of media URLs
like_count INT NOT NULL DEFAULT 0, -- denormalized
comment_count INT NOT NULL DEFAULT 0, -- denormalized
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE TABLE likes (
user_id BIGINT NOT NULL REFERENCES users(id),
post_id BIGINT NOT NULL REFERENCES posts(id),
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
PRIMARY KEY (user_id, post_id)
);
CREATE TABLE comments (
id BIGSERIAL PRIMARY KEY,
post_id BIGINT NOT NULL REFERENCES posts(id),
user_id BIGINT NOT NULL REFERENCES users(id),
parent_id BIGINT REFERENCES comments(id), -- threaded comments
content TEXT NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
8.3 The Feed Problem¶
The most challenging query is the personalized feed: "Show recent posts from people I follow."
Pull model (fan-out on read):
-- At read time, query posts from all followed users
SELECT p.*, u.username, u.display_name, u.avatar_url
FROM posts p
JOIN follows f ON p.user_id = f.followee_id
JOIN users u ON p.user_id = u.id
WHERE f.follower_id = $current_user_id
ORDER BY p.created_at DESC
LIMIT 20;
-- Problem: If a user follows 1000 people, this JOINs a large follows list
-- with a large posts table. Slow for active users.
Push model (fan-out on write):
-- Create a feed table (denormalized timeline)
CREATE TABLE feed_items (
user_id BIGINT NOT NULL, -- the user whose feed this is
post_id BIGINT NOT NULL,
author_id BIGINT NOT NULL,
created_at TIMESTAMPTZ NOT NULL,
PRIMARY KEY (user_id, created_at DESC, post_id)
);
-- When user X creates a post:
-- Insert a feed_item for EVERY follower of X
INSERT INTO feed_items (user_id, post_id, author_id, created_at)
SELECT f.follower_id, $post_id, $author_id, $created_at
FROM follows f
WHERE f.followee_id = $author_id;
-- Reading the feed is now trivial:
SELECT fi.post_id, fi.author_id, fi.created_at,
p.content, p.media_urls, p.like_count,
u.username, u.avatar_url
FROM feed_items fi
JOIN posts p ON fi.post_id = p.id
JOIN users u ON fi.author_id = u.id
WHERE fi.user_id = $current_user_id
ORDER BY fi.created_at DESC
LIMIT 20;
-- Problem: If a celebrity has 10 million followers, creating a post
-- requires 10 million INSERT operations. Slow for popular users.
Hybrid model (used by Twitter/X):
Regular users (< 10K followers): Push model (fan-out on write)
Celebrity users (> 10K followers): Pull model (fan-out on read)
Feed = Merge(feed_items table, live query for followed celebrities)
8.4 Design Tradeoffs¶
| Aspect | E-Commerce | Social Media |
|---|---|---|
| Critical transaction | Place order (ACID required) | Post creation (eventual consistency OK) |
| Primary read pattern | Product lookup by ID | Timeline (feed) |
| Write pattern | Low frequency, high value | High frequency, low value |
| Denormalization | Moderate (avg_rating, totals) | Heavy (feed_items, counters) |
| Consistency | Strong (inventory, payments) | Eventual (like counts, feeds) |
| Scale challenge | Inventory contention | Fan-out (popular users) |
9. Design Review Checklist¶
Use this checklist before deploying any database design to production:
9.1 Schema Design¶
[ ] Every table has a clear primary key
[ ] Foreign keys are defined for all relationships
[ ] ON DELETE / ON UPDATE actions are specified (CASCADE, SET NULL, RESTRICT)
[ ] NOT NULL constraints on required fields
[ ] CHECK constraints for domain validation (e.g., price >= 0, rating BETWEEN 1 AND 5)
[ ] UNIQUE constraints for natural keys (email, username, SKU)
[ ] Appropriate data types (DECIMAL for money, TIMESTAMPTZ for dates)
[ ] No redundant columns unless denormalization is documented and justified
[ ] Schema is in at least 3NF (preferably BCNF)
[ ] Self-referential FKs for hierarchies (categories, comments)
9.2 Indexing¶
[ ] Indexes exist for all foreign keys (PostgreSQL does NOT auto-create FK indexes!)
[ ] Indexes cover the WHERE, JOIN, and ORDER BY columns of frequent queries
[ ] Composite indexes follow the left-prefix rule
[ ] Partial indexes used where appropriate (WHERE is_active = TRUE)
[ ] GIN indexes for full-text search and array columns
[ ] No redundant indexes (index on (a, b) makes index on (a) redundant)
[ ] Index bloat monitoring plan in place
9.3 Data Integrity¶
[ ] All money values use DECIMAL, never FLOAT
[ ] Timestamps use TIMESTAMPTZ (timezone-aware)
[ ] Enum types or CHECK constraints for status fields
[ ] Immutable fields are protected (order totals, historical prices)
[ ] Soft delete vs hard delete decision documented for each table
[ ] Data retention policy defined
9.4 Performance¶
[ ] EXPLAIN ANALYZE run on all critical queries
[ ] No sequential scans on large tables for indexed queries
[ ] Pagination uses keyset pagination, not OFFSET
[ ] N+1 query patterns eliminated (use JOINs or batch fetching)
[ ] Connection pooling configured (PgBouncer)
[ ] Appropriate work_mem and shared_buffers settings
9.5 Transactions¶
[ ] Isolation level chosen for each transaction type
[ ] Locking strategy documented (which rows/tables are locked)
[ ] Lock ordering defined to prevent deadlocks
[ ] Long-running transactions avoided (no API calls inside transactions)
[ ] Retry logic for serialization failures
[ ] Deadlock detection timeout configured
9.6 Security¶
[ ] No plaintext passwords (use bcrypt/argon2 hashes)
[ ] Payment tokens stored, not card numbers (PCI-DSS)
[ ] Row-Level Security (RLS) for multi-tenant data
[ ] Application user has minimal privileges (no SUPERUSER)
[ ] SQL injection prevented (parameterized queries only)
[ ] Audit logging for sensitive operations
9.7 Operations¶
[ ] Backup strategy defined (pg_dump schedule, WAL archiving)
[ ] Point-in-time recovery (PITR) tested
[ ] Monitoring: query latency, connection count, table bloat, replication lag
[ ] Schema migration tool in use (Flyway, Alembic, golang-migrate)
[ ] Runbook for common failure scenarios (disk full, replication lag, deadlocks)
10. Common Design Mistakes and How to Avoid Them¶
Mistake 1: Using FLOAT for Money¶
-- BAD: Floating-point arithmetic causes rounding errors
CREATE TABLE orders (
total FLOAT -- 0.1 + 0.2 = 0.30000000000000004
);
-- GOOD: Use exact decimal types
CREATE TABLE orders (
total DECIMAL(10,2) -- 0.1 + 0.2 = 0.30
);
Mistake 2: Missing Foreign Key Indexes¶
-- PostgreSQL creates indexes for PRIMARY KEY and UNIQUE, but NOT for FOREIGN KEYS!
-- This means JOIN queries on FK columns do sequential scans.
CREATE TABLE order_items (
order_id BIGINT REFERENCES orders(id), -- NO automatic index!
product_id BIGINT REFERENCES products(id) -- NO automatic index!
);
-- FIX: Always create indexes on FK columns
CREATE INDEX idx_order_items_order ON order_items(order_id);
CREATE INDEX idx_order_items_product ON order_items(product_id);
Mistake 3: Storing Derived Data Without a Maintenance Strategy¶
-- Denormalization is fine, but you MUST maintain consistency
ALTER TABLE products ADD COLUMN avg_rating DECIMAL(3,2);
-- BAD: Update manually and hope developers remember
-- GOOD: Use a trigger
CREATE OR REPLACE FUNCTION update_product_rating()
RETURNS TRIGGER AS $$
BEGIN
UPDATE products SET
avg_rating = (SELECT AVG(rating) FROM reviews WHERE product_id = NEW.product_id),
review_count = (SELECT COUNT(*) FROM reviews WHERE product_id = NEW.product_id)
WHERE id = NEW.product_id;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER trg_update_product_rating
AFTER INSERT OR UPDATE OR DELETE ON reviews
FOR EACH ROW EXECUTE FUNCTION update_product_rating();
Mistake 4: OFFSET-Based Pagination¶
-- BAD: OFFSET scans and discards rows. Page 1000 reads 20,000 rows!
SELECT * FROM products ORDER BY created_at DESC LIMIT 20 OFFSET 19980;
-- β Reads 20,000 rows, discards 19,980, returns 20.
-- GOOD: Keyset pagination (cursor-based)
SELECT * FROM products
WHERE created_at < $last_seen_created_at
ORDER BY created_at DESC
LIMIT 20;
-- β Uses index, reads exactly 20 rows regardless of page number.
Mistake 5: The God Table (EAV Anti-Pattern)¶
-- BAD: Entity-Attribute-Value pattern
CREATE TABLE properties (
entity_type VARCHAR(50), -- 'product', 'user', etc.
entity_id BIGINT,
key VARCHAR(100), -- 'color', 'size', 'price'
value TEXT -- everything is a string!
);
-- Problems:
-- 1. No type safety (price stored as text)
-- 2. No constraints (can't enforce "price >= 0")
-- 3. No foreign keys
-- 4. Terrible query performance (pivot queries are slow)
-- 5. No schema documentation
-- GOOD: Use proper tables with typed columns
-- If attributes vary by category, consider:
-- 1. PostgreSQL JSONB column for flexible attributes
-- 2. Separate tables per category (product_electronics, product_clothing)
-- 3. Hybrid: common columns in products + JSONB for category-specific
Mistake 6: Not Capturing Historical Data¶
-- BAD: Only storing current state
-- If a product's price changes, all historical orders show the new price!
-- GOOD: Capture state at the time of the event
CREATE TABLE order_items (
product_id BIGINT REFERENCES products(id),
quantity INT,
unit_price DECIMAL(10,2), -- price AT TIME OF ORDER, not current price
product_name VARCHAR(255) -- optional: snapshot of product name at order time
);
Mistake 7: Premature Denormalization¶
Phase: Before you have any users
Developer: "Let's denormalize everything for performance!"
Reality check:
- A properly indexed PostgreSQL can handle most workloads up to 100M rows
- Start normalized, measure, then denormalize where measurements show problems
- Premature denormalization creates maintenance nightmares with no measurable benefit
Rule of thumb: If you have fewer than 10M rows and your queries use indexes,
you probably don't need denormalization.
Mistake 8: Ignoring NULL Semantics¶
-- Surprise: NULL comparisons don't work as expected
SELECT * FROM products WHERE category_id != 5;
-- Does NOT return rows where category_id IS NULL!
-- FIX: Handle NULLs explicitly
SELECT * FROM products WHERE category_id != 5 OR category_id IS NULL;
-- Or use: WHERE category_id IS DISTINCT FROM 5;
-- Also: COUNT(column) ignores NULLs, COUNT(*) counts all rows
SELECT COUNT(phone) FROM customers; -- only counts non-NULL phones
SELECT COUNT(*) FROM customers; -- counts all customers
Mistake 9: Storing Comma-Separated Values¶
-- BAD: Storing multiple values in a single column
CREATE TABLE products (
tags VARCHAR(500) -- "electronics,wireless,bluetooth"
);
-- Problems:
-- 1. Can't index individual tags efficiently
-- 2. Can't enforce referential integrity
-- 3. "Find products with tag 'wireless'" requires LIKE '%wireless%' (slow, incorrect)
-- 4. Can't JOIN with a tags table
-- 5. Violates 1NF
-- GOOD: Use a junction table
CREATE TABLE tags (
id SERIAL PRIMARY KEY,
name VARCHAR(50) NOT NULL UNIQUE
);
CREATE TABLE product_tags (
product_id BIGINT REFERENCES products(id),
tag_id INT REFERENCES tags(id),
PRIMARY KEY (product_id, tag_id)
);
-- Or use PostgreSQL arrays with GIN index
CREATE TABLE products (
tags TEXT[]
);
CREATE INDEX idx_products_tags ON products USING GIN(tags);
SELECT * FROM products WHERE tags @> ARRAY['wireless'];
Mistake 10: Not Planning for Schema Evolution¶
-- BAD: No migration tool, manual ALTER TABLEs in production
-- GOOD: Use a migration tool (Flyway, Alembic, golang-migrate)
-- Each migration is a versioned, reversible SQL file:
-- V001__create_customers.sql
CREATE TABLE customers (...);
-- V002__add_phone_to_customers.sql
ALTER TABLE customers ADD COLUMN phone VARCHAR(20);
-- V003__create_orders.sql
CREATE TABLE orders (...);
-- Benefits:
-- 1. Every schema change is version-controlled
-- 2. Reproducible across environments (dev, staging, prod)
-- 3. Rollback capability
-- 4. Team collaboration (no conflicting manual changes)
11. Exercises¶
Exercise 1: Complete the E-Commerce Schema¶
The ShopWave schema above is missing several features. Extend it to support:
- Product variants: Products can have variants (e.g., "Blue, Large", "Red, Small") each with its own price, SKU, and stock quantity. Design the tables.
- Order status history: Instead of a single status field, track every status change with a timestamp and the user who made the change.
- Seller payouts: Track commission calculations and payout history for sellers.
Write the CREATE TABLE statements for each extension and explain how they integrate with the existing schema.
Exercise 2: Normalization Analysis¶
Consider this denormalized order table:
CREATE TABLE flat_orders (
order_id INT,
customer_name VARCHAR(100),
customer_email VARCHAR(255),
product_name VARCHAR(255),
product_sku VARCHAR(50),
product_price DECIMAL(10,2),
quantity INT,
order_total DECIMAL(10,2),
order_date DATE,
shipping_street VARCHAR(255),
shipping_city VARCHAR(100),
shipping_zip VARCHAR(20)
);
- List all functional dependencies.
- Identify the candidate key(s).
- Determine whether the table is in 1NF, 2NF, 3NF, and BCNF.
- Decompose into BCNF, showing each decomposition step.
- Verify that your decomposition is lossless-join.
Exercise 3: Index Design Challenge¶
Given the ShopWave schema and these query patterns:
-- Q1: Products on sale (price < original_price) in a category
SELECT * FROM products
WHERE category_id = $1 AND sale_price < price AND is_active = TRUE
ORDER BY (price - sale_price) DESC
LIMIT 20;
-- Q2: Customer's order with specific status in date range
SELECT * FROM orders
WHERE customer_id = $1 AND status = $2
AND created_at BETWEEN $3 AND $4
ORDER BY created_at DESC;
-- Q3: Products reviewed by a specific customer
SELECT p.name, r.rating, r.created_at
FROM reviews r
JOIN products p ON r.product_id = p.id
WHERE r.customer_id = $1
ORDER BY r.created_at DESC;
-- Q4: Sellers with the most orders this month
SELECT s.company_name, COUNT(DISTINCT o.id) AS order_count
FROM sellers s
JOIN products p ON s.id = p.seller_id
JOIN order_items oi ON p.id = oi.product_id
JOIN orders o ON oi.order_id = o.id
WHERE o.created_at >= DATE_TRUNC('month', CURRENT_DATE)
GROUP BY s.id
ORDER BY order_count DESC
LIMIT 10;
For each query: 1. Design the optimal index(es). 2. Write the expected EXPLAIN plan. 3. Estimate the selectivity of each predicate.
Exercise 4: Transaction Design¶
Design the transaction for the following business operation:
Apply a bulk discount: An admin wants to reduce the price of all products in category "Electronics" by 15%, capping the discount at $50 per product, and logging the change in an audit table.
Requirements: - No product should be visible to customers at an intermediate state (partially discounted) - The operation should not block product reads for more than 1 second - All changes must be logged with before/after values
Write the transaction SQL and specify the isolation level.
Exercise 5: Design Your Own Database¶
Choose ONE of the following scenarios and complete a full database design (Phases 1-6):
Option A: Library Management System - Patrons, books (multiple copies), checkouts, reservations, fines - Patrons can check out up to 5 books for 14 days - Late returns incur $0.25/day fine - Books can be reserved if all copies are checked out - Librarians can add/remove books and manage patron accounts
Option B: Restaurant Reservation System - Restaurants, tables, reservations, customers, waitlists - Restaurants have tables of different sizes (2, 4, 6, 8 seats) - Reservations are for a specific date/time and party size - Waitlist when no tables are available - Customers can leave reviews for restaurants
Option C: Online Learning Platform - Courses, lessons, students, enrollments, progress tracking, quizzes - Courses have multiple lessons in a defined order - Students track progress (completed lessons, quiz scores) - Instructors create and manage courses - Certificates issued upon course completion
For your chosen scenario, produce: 1. Functional requirements (at least 8) 2. Data requirements with volumetric estimates 3. ER diagram with cardinalities 4. Normalized relational schema (CREATE TABLE statements) 5. Indexing strategy with justification 6. Two critical transactions with isolation level and locking strategy 7. Three representative queries with expected explain plans
Exercise 6: Critique This Design¶
Find and fix at least 8 problems in the following schema:
CREATE TABLE users (
id INT,
name TEXT,
email TEXT,
password TEXT,
created DATE
);
CREATE TABLE products (
id INT,
title TEXT,
price FLOAT,
category TEXT,
tags TEXT, -- comma-separated: "electronics,sale,new"
stock INT
);
CREATE TABLE orders (
id INT,
user_email TEXT,
product_ids TEXT, -- comma-separated: "1,5,12"
quantities TEXT, -- comma-separated: "2,1,3"
total FLOAT,
status TEXT,
date TEXT
);
For each problem: 1. Identify the issue. 2. Explain why it is problematic. 3. Provide the corrected SQL.
Exercise 7: Scaling the E-Commerce Design¶
The ShopWave platform has grown to: - 50 million customers - 10 million products - 500 million orders - 1.5 billion order items - 100 million reviews
The current single-node PostgreSQL can no longer handle the load. Propose a scaling strategy:
- Which tables should be partitioned? By what key?
- Which tables need read replicas?
- Which data should be moved to Redis (caching)?
- Should any data be moved to a different database type (e.g., Elasticsearch for search, Cassandra for order history)? Justify.
- How would you handle the transition from a single database to a distributed architecture without downtime?
Exercise 8: Social Media Feed Design¶
Using the SocialBuzz schema from Section 8:
- Implement the hybrid feed model: write the SQL for both the push path (regular users creating posts) and the pull path (merging celebrity posts at read time).
- A celebrity with 5 million followers changes their username. How do you update the feed_items table efficiently?
- Design a "trending posts" feature: show the 50 posts with the most likes in the last 24 hours. What indexes do you need? How do you handle the write load on like_count?
- A user blocks another user. What changes do you need to make to the feed system?
Exercise 9: Migration Planning¶
You need to add a "product variants" feature to the live ShopWave database (see Exercise 1). Write a zero-downtime migration plan:
- List the migration steps in order.
- For each step, specify whether it requires a lock and for how long.
- Describe the backward-compatible application changes needed between migration steps.
- What is your rollback plan if something goes wrong mid-migration?
Exercise 10: Design Review¶
Review the complete ShopWave schema (Phase 3 DDL) against the Design Review Checklist (Section 9). For each checklist item that is NOT yet satisfied, describe the gap and provide the SQL fix. Aim to find at least 5 gaps.
12. References¶
- Elmasri, R. & Navathe, S. (2016). Fundamentals of Database Systems, 7th Edition. Pearson. Chapters 3-9.
- Garcia-Molina, H., Ullman, J., & Widom, J. (2008). Database Systems: The Complete Book, 2nd Edition. Pearson.
- Kleppmann, M. (2017). Designing Data-Intensive Applications. O'Reilly Media. Chapters 2-3.
- Winand, M. (2012). SQL Performance Explained. https://use-the-index-luke.com/
- PostgreSQL Documentation. "Partitioning." https://www.postgresql.org/docs/current/ddl-partitioning.html
- PostgreSQL Documentation. "Concurrency Control." https://www.postgresql.org/docs/current/mvcc.html
- Karwin, B. (2010). SQL Antipatterns. Pragmatic Bookshelf.
- Kleppmann, M. (2012). "Designing Data-Intensive Applications: Partitioning." Chapter 6.
- Sadalage, P. & Fowler, M. (2012). NoSQL Distilled. Addison-Wesley. (For polyglot persistence comparisons.)
- Twitter Engineering Blog. (2013). "How Twitter Stores 250 Million Tweets a Day."
Previous: 15. NewSQL and Modern Trends