Database Theory
Overview
This topic covers the theoretical foundations of database systems, from the relational model and normalization theory to transaction processing, query optimization, and modern distributed paradigms. These lessons provide the academic underpinning that every database practitioner, backend engineer, and data architect needs to design correct, efficient, and scalable data systems.
Prerequisites
- Basic programming experience (Python or any language)
- SQL familiarity helpful but not required (covered from first principles)
- Elementary set theory and logic (sets, relations, predicates)
- Basic understanding of file systems and data storage concepts
Lesson Plan
Phase 1: Foundations (L01-L04)
| Filename |
Difficulty |
Key Topics |
Notes |
| 01_Introduction_to_Database_Systems.md |
โญ |
DBMS, Three-Schema Architecture, Data Independence, ANSI/SPARC |
Conceptual foundation |
| 02_Relational_Model.md |
โญโญ |
Codd's Rules, Relations, Keys, Integrity Constraints, NULL Semantics |
Mathematical foundation |
| 03_Relational_Algebra.md |
โญโญ |
ฯ, ฯ, โ, รท, Query Trees, Relational Calculus, SQL Equivalence |
Formal query languages |
| 04_ER_Modeling.md |
โญโญ |
Entities, Relationships, Cardinality, EER, ER-to-Relational Mapping |
Conceptual design |
Phase 2: Design Theory (L05-L08)
| Filename |
Difficulty |
Key Topics |
Notes |
| 05_Functional_Dependencies.md |
โญโญโญ |
FDs, Armstrong's Axioms, Closure, Canonical Cover, Attribute Closure |
Formal dependency theory |
| 06_Normalization.md |
โญโญโญ |
1NF-5NF, BCNF, Decomposition, Lossless Join, Dependency Preservation |
Schema refinement |
| 07_SQL_Theory.md |
โญโญ |
DDL, DML, Subqueries, Aggregation, Views, NULL Handling, Relational Completeness |
SQL as formal language |
| 08_Constraints_and_Triggers.md |
โญโญโญ |
CHECK, UNIQUE, FK Actions, Assertions, Triggers, Active Databases |
Declarative integrity |
Phase 3: Internals (L09-L12)
| Filename |
Difficulty |
Key Topics |
Notes |
| 09_Storage_and_Indexing.md |
โญโญโญ |
Disk I/O, B+ Trees, Hash Indexes, Clustered vs Non-Clustered, Buffer Management |
Physical storage layer |
| 10_Query_Processing.md |
โญโญโญโญ |
Parsing, Optimization, Cost Estimation, Join Algorithms, Execution Plans |
Query engine internals |
| 11_Transaction_Theory.md |
โญโญโญโญ |
ACID, Serializability, Conflict/View Equivalence, Schedules, Recoverability |
Concurrency theory |
| 12_Concurrency_Control.md |
โญโญโญโญ |
2PL, Deadlock, Timestamp Ordering, MVCC, Isolation Levels, Snapshot Isolation |
Lock and version protocols |
Phase 4: Advanced Topics (L13-L16)
| Filename |
Difficulty |
Key Topics |
Notes |
| 13_Recovery_Systems.md |
โญโญโญโญ |
WAL, ARIES, Checkpointing, Undo/Redo, Shadow Paging, Media Recovery |
Crash recovery |
| 14_NoSQL_and_NewSQL.md |
โญโญโญ |
Key-Value, Document, Column-Family, Graph, CAP, BASE, NewSQL |
Beyond relational |
| 15_Distributed_Databases.md |
โญโญโญโญ |
Fragmentation, Replication, 2PC, Paxos/Raft, Distributed Joins, CAP Theorem |
Distributed systems |
| 16_Database_Design_Project.md |
โญโญโญโญ |
Full Design Lifecycle: Requirements โ ER โ Relational โ Normalization โ SQL |
Capstone project |
Recommended Learning Path
Phase 1: Foundations (L01-L04) Phase 2: Design Theory (L05-L08)
โ โ
โผ โผ
DBMS Concepts Functional Dependencies
Relational Model Normalization (1NF-5NF)
Relational Algebra SQL as Formal Language
ER Modeling Constraints & Triggers
โ โ
โโโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโโโโโโโโ
โ
โผ
Phase 3: Internals (L09-L12)
Storage, Indexing, B+ Trees
Query Processing & Optimization
Transactions & Concurrency
โ
โผ
Phase 4: Advanced (L13-L16)
Recovery (WAL, ARIES)
NoSQL & NewSQL Paradigms
Distributed Databases
โ
โผ
Capstone Design Project (L16)
- PostgreSQL: Practical SQL and advanced PostgreSQL features (JSON, FTS, replication, RLS)
- System_Design: Scalability, distributed systems, and architecture patterns that build on database theory
- Data_Engineering: ETL pipelines, data warehousing, and data infrastructure
- Data_Science: Statistical analysis and data manipulation that relies on sound database design
Total
- 16 lessons (4 foundations + 4 design theory + 4 internals + 4 advanced)
- Difficulty range: โญ to โญโญโญโญ
- Languages: SQL (primary), Python (supplementary)
- Key concepts: Relational model, normalization, ACID, serializability, B+ trees, ARIES, CAP theorem