Database Theory

Database Theory

Overview

This topic covers the theoretical foundations of database systems, from the relational model and normalization theory to transaction processing, query optimization, and modern distributed paradigms. These lessons provide the academic underpinning that every database practitioner, backend engineer, and data architect needs to design correct, efficient, and scalable data systems.

Prerequisites

  • Basic programming experience (Python or any language)
  • SQL familiarity helpful but not required (covered from first principles)
  • Elementary set theory and logic (sets, relations, predicates)
  • Basic understanding of file systems and data storage concepts

Lesson Plan

Phase 1: Foundations (L01-L04)

Filename Difficulty Key Topics Notes
01_Introduction_to_Database_Systems.md โญ DBMS, Three-Schema Architecture, Data Independence, ANSI/SPARC Conceptual foundation
02_Relational_Model.md โญโญ Codd's Rules, Relations, Keys, Integrity Constraints, NULL Semantics Mathematical foundation
03_Relational_Algebra.md โญโญ ฯƒ, ฯ€, โ‹ˆ, รท, Query Trees, Relational Calculus, SQL Equivalence Formal query languages
04_ER_Modeling.md โญโญ Entities, Relationships, Cardinality, EER, ER-to-Relational Mapping Conceptual design

Phase 2: Design Theory (L05-L08)

Filename Difficulty Key Topics Notes
05_Functional_Dependencies.md โญโญโญ FDs, Armstrong's Axioms, Closure, Canonical Cover, Attribute Closure Formal dependency theory
06_Normalization.md โญโญโญ 1NF-5NF, BCNF, Decomposition, Lossless Join, Dependency Preservation Schema refinement
07_SQL_Theory.md โญโญ DDL, DML, Subqueries, Aggregation, Views, NULL Handling, Relational Completeness SQL as formal language
08_Constraints_and_Triggers.md โญโญโญ CHECK, UNIQUE, FK Actions, Assertions, Triggers, Active Databases Declarative integrity

Phase 3: Internals (L09-L12)

Filename Difficulty Key Topics Notes
09_Storage_and_Indexing.md โญโญโญ Disk I/O, B+ Trees, Hash Indexes, Clustered vs Non-Clustered, Buffer Management Physical storage layer
10_Query_Processing.md โญโญโญโญ Parsing, Optimization, Cost Estimation, Join Algorithms, Execution Plans Query engine internals
11_Transaction_Theory.md โญโญโญโญ ACID, Serializability, Conflict/View Equivalence, Schedules, Recoverability Concurrency theory
12_Concurrency_Control.md โญโญโญโญ 2PL, Deadlock, Timestamp Ordering, MVCC, Isolation Levels, Snapshot Isolation Lock and version protocols

Phase 4: Advanced Topics (L13-L16)

Filename Difficulty Key Topics Notes
13_Recovery_Systems.md โญโญโญโญ WAL, ARIES, Checkpointing, Undo/Redo, Shadow Paging, Media Recovery Crash recovery
14_NoSQL_and_NewSQL.md โญโญโญ Key-Value, Document, Column-Family, Graph, CAP, BASE, NewSQL Beyond relational
15_Distributed_Databases.md โญโญโญโญ Fragmentation, Replication, 2PC, Paxos/Raft, Distributed Joins, CAP Theorem Distributed systems
16_Database_Design_Project.md โญโญโญโญ Full Design Lifecycle: Requirements โ†’ ER โ†’ Relational โ†’ Normalization โ†’ SQL Capstone project
Phase 1: Foundations (L01-L04)         Phase 2: Design Theory (L05-L08)
       โ”‚                                       โ”‚
       โ–ผ                                       โ–ผ
  DBMS Concepts                         Functional Dependencies
  Relational Model                      Normalization (1NF-5NF)
  Relational Algebra                    SQL as Formal Language
  ER Modeling                           Constraints & Triggers
       โ”‚                                       โ”‚
       โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
                       โ”‚
                       โ–ผ
         Phase 3: Internals (L09-L12)
         Storage, Indexing, B+ Trees
         Query Processing & Optimization
         Transactions & Concurrency
                       โ”‚
                       โ–ผ
         Phase 4: Advanced (L13-L16)
         Recovery (WAL, ARIES)
         NoSQL & NewSQL Paradigms
         Distributed Databases
                       โ”‚
                       โ–ผ
         Capstone Design Project (L16)
  • PostgreSQL: Practical SQL and advanced PostgreSQL features (JSON, FTS, replication, RLS)
  • System_Design: Scalability, distributed systems, and architecture patterns that build on database theory
  • Data_Engineering: ETL pipelines, data warehousing, and data infrastructure
  • Data_Science: Statistical analysis and data manipulation that relies on sound database design

Total

  • 16 lessons (4 foundations + 4 design theory + 4 internals + 4 advanced)
  • Difficulty range: โญ to โญโญโญโญ
  • Languages: SQL (primary), Python (supplementary)
  • Key concepts: Relational model, normalization, ACID, serializability, B+ trees, ARIES, CAP theorem
to navigate between lessons