Introduction to Database Systems
Introduction to Database Systems¶
Previous: Overview | Next: Relational Model
A database system is one of the most important pieces of software infrastructure in modern computing. From banking transactions and airline reservations to social media feeds and scientific research, databases underpin virtually every application that manages persistent, shared data. This lesson introduces the fundamental concepts, architecture, and terminology that form the foundation of database theory.
Table of Contents¶
- What Is a Database?
- Why Not Just Use Files?
- Database Management Systems
- Brief History of Database Systems
- Three-Schema Architecture
- Data Independence
- ANSI/SPARC Architecture
- Data Models
- Database Users and Roles
- Database System Architecture
- Exercises
1. What Is a Database?¶
A database is an organized collection of logically related data, designed to meet the information needs of multiple users in an organization. More precisely:
Database: A shared, integrated collection of persistent data that provides a controlled, reliable, and efficient mechanism for definition, construction, manipulation, and sharing of data among various users and applications.
Key properties of a database:
- Persistent: Data survives beyond the process that created it
- Shared: Multiple users and applications can access data concurrently
- Integrated: Data is collected in a unified structure, minimizing redundancy
- Managed: Access is controlled by software that enforces rules and constraints
Database vs. Data¶
It is important to distinguish between raw data and a database:
Data: Individual facts (e.g., "Alice", "29", "Engineering")
Information: Data with context and meaning
("Alice is 29 years old and works in Engineering")
Database: Organized collection of related data with
structure, constraints, and access control
A Simple Example¶
Consider a university that needs to track students, courses, and enrollments:
STUDENT table:
+--------+-----------+------+--------+
| Stu_ID | Name | Year | GPA |
+--------+-----------+------+--------+
| S001 | Alice Kim | 3 | 3.85 |
| S002 | Bob Park | 2 | 3.42 |
| S003 | Carol Lee | 4 | 3.91 |
+--------+-----------+------+--------+
COURSE table:
+-----------+---------------------+---------+
| Course_ID | Title | Credits |
+-----------+---------------------+---------+
| CS101 | Intro to CS | 3 |
| CS301 | Database Theory | 3 |
| MA201 | Linear Algebra | 4 |
+-----------+---------------------+---------+
ENROLLMENT table:
+--------+-----------+-------+
| Stu_ID | Course_ID | Grade |
+--------+-----------+-------+
| S001 | CS101 | A |
| S001 | CS301 | A+ |
| S002 | CS101 | B+ |
| S003 | MA201 | A |
+--------+-----------+-------+
This structured representation allows us to answer queries like: - "What courses is Alice enrolled in?" - "How many students are taking CS101?" - "What is the average GPA of students in CS301?"
2. Why Not Just Use Files?¶
Before databases existed, applications stored data in flat files. While files are simple, they have fundamental limitations that motivated the development of database systems.
The File-Based Approach¶
ββββββββββββββββ
β Application β
β Program 1 ββββββββΊ student_records.dat
ββββββββββββββββ
ββββββββββββββββ
β Application β
β Program 2 ββββββββΊ course_records.dat
ββββββββββββββββ
ββββββββββββββββ
β Application β
β Program 3 ββββββββΊ enrollment.dat
ββββββββββββββββ
Each application manages its own files independently.
Problems with File-Based Systems¶
| Problem | Description | Example |
|---|---|---|
| Data Redundancy | Same data stored in multiple files | Student name in student file AND enrollment file |
| Data Inconsistency | Redundant copies become out of sync | Name changed in one file but not another |
| Program-Data Dependence | File format changes require program changes | Adding a field to a record breaks existing code |
| Limited Data Sharing | Each application has its own files | Registrar and financial aid cannot share data |
| No Concurrent Access | Multiple users cannot safely update simultaneously | Two registrars editing the same record |
| No Recovery Mechanism | Data loss from crashes is permanent | Power failure during file write corrupts data |
| No Security Control | File-level access only, no fine-grained control | Cannot restrict access to specific fields |
| No Integrity Enforcement | No centralized rules for valid data | GPA of 5.0 or negative credits can be stored |
A Concrete Example of Redundancy and Inconsistency¶
Suppose the Registrar and Financial Aid offices each maintain their own files:
Registrar's file (students.txt):
S001, Alice Kim, Computer Science, 3.85
Financial Aid's file (financial.txt):
S001, Alice Kim, Computer Science, Need-Based
Alice changes her major to Data Science...
The Registrar updates their file, but Financial Aid's file still says
"Computer Science." Now the data is INCONSISTENT.
The Database Approach¶
A DBMS solves these problems by centralizing data management:
ββββββββββββββββ
β Application βββββ
β Program 1 β β
ββββββββββββββββ β ββββββββββββ ββββββββββββββββ
ββββββββββββββββ βββββΊβ β β β
β Application βββββ€ β DBMS βββββΊβ Database β
β Program 2 β βββββΊβ β β β
ββββββββββββββββ β ββββββββββββ ββββββββββββββββ
ββββββββββββββββ β
β Application βββββ
β Program 3 β
ββββββββββββββββ
All applications go through the DBMS to access a single database.
3. Database Management Systems¶
A Database Management System (DBMS) is the software that sits between users/applications and the stored data. It provides a systematic way to create, retrieve, update, and manage data.
Core Functions of a DBMS¶
- Data Definition: Define the structure (schema) of the database
- Data Manipulation: Insert, update, delete, and retrieve data
- Data Control: Manage security, integrity, and concurrent access
- Data Administration: Backup, recovery, performance tuning
DBMS Components¶
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β DBMS Software β
β β
β ββββββββββββββββ ββββββββββββββββ ββββββββββββββββ β
β β Query β β Transaction β β Storage β β
β β Processor β β Manager β β Manager β β
β β β β β β β β
β β - Parser β β - Scheduler β β - Buffer Mgr β β
β β - Optimizer β β - Lock Mgr β β - File Mgr β β
β β - Executor β β - Recovery β β - Disk Mgr β β
β ββββββββββββββββ ββββββββββββββββ ββββββββββββββββ β
β β
β ββββββββββββββββ ββββββββββββββββ ββββββββββββββββ β
β β Catalog β β Authorizationβ β Integrity β β
β β Manager β β Manager β β Manager β β
β ββββββββββββββββ ββββββββββββββββ ββββββββββββββββ β
β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β
βΌ
ββββββββββββββββββββ
β Stored Data β
β (Files on Disk) β
ββββββββββββββββββββ
Advantages of the DBMS Approach¶
| Advantage | Description |
|---|---|
| Data Abstraction | Users see logical structure, not physical storage details |
| Minimal Redundancy | Centralized storage with controlled redundancy |
| Consistency | Integrity constraints enforced by DBMS |
| Data Sharing | Multiple users/apps access same data |
| Concurrency Control | Safe simultaneous access via locking/MVCC |
| Recovery | Automatic recovery from failures |
| Security | Fine-grained access control (table, row, column level) |
| Standards Enforcement | SQL standard, naming conventions, data formats |
When NOT to Use a DBMS¶
A DBMS is not always the right choice. Consider simpler alternatives when:
- The database and applications are simple, well-defined, and not expected to change
- There are stringent real-time requirements that a DBMS overhead cannot meet
- There is no need for multi-user access
- The data volume is very small (a few kilobytes)
- The application requires direct, low-level access to storage
Examples: embedded sensor firmware, simple configuration files, small scripts.
4. Brief History of Database Systems¶
Understanding how databases evolved helps appreciate why current systems work the way they do.
Timeline¶
1960s 1970s 1980s 1990s 2000s 2010s+
β β β β β β
βΌ βΌ βΌ βΌ βΌ βΌ
Flat Files Relational SQL Standard Object- NoSQL NewSQL
Hierarchical Model Commercial Relational Movement Distributed
Network (Codd 1970) RDBMS Boom DBMS (2009+) HTAP
(IMS, IDMS) System R Oracle, PostgreSQL MongoDB CockroachDB
INGRES DB2, SQL Informix Cassandra TiDB
Server Redis Google Spanner
Era 1: Pre-Relational (1960s)¶
Hierarchical Model (IBM IMS, 1966): - Data organized as tree structures (parent-child relationships) - Fast for predefined queries along the hierarchy - Inflexible: changing the tree structure required rewriting applications
Department
/ \
Employee Project
|
Dependent
Network Model (CODASYL, 1969): - Generalization of hierarchical model: records can have multiple parents - More flexible, but complex pointer-based navigation - Programmer must know the exact access path
Student ββββββββ Course
β \ / \ β
β \/ \ β
β /\ \ β
βΌ / \ βΌ βΌ
Advisor Enrollment
Era 2: The Relational Revolution (1970s)¶
Edgar F. Codd published "A Relational Model of Data for Large Shared Data Banks" in 1970, proposing that data be represented as mathematical relations (tables).
Key innovations: - Declarative queries: Specify what data you want, not how to get it - Data independence: Physical storage changes do not affect applications - Mathematical foundation: Relational algebra and relational calculus - Simplicity: All data represented uniformly as tables
System R (IBM, 1974-1979): First implementation of the relational model. Introduced SQL (originally SEQUEL).
INGRES (UC Berkeley, 1973-1979): Concurrent independent implementation. Used QUEL query language.
Era 3: Commercial RDBMS (1980s-1990s)¶
The relational model proved its worth and commercial systems proliferated:
| System | Year | Notable Features |
|---|---|---|
| Oracle | 1979 | First commercial RDBMS |
| IBM DB2 | 1983 | System R successor |
| SQL Server | 1989 | Microsoft's RDBMS |
| PostgreSQL | 1996 | Open-source, extensible |
| MySQL | 1995 | Open-source, web-friendly |
SQL standardization: - SQL-86: First ANSI standard - SQL-92: Major revision (subqueries, JOINs, CASE) - SQL:1999: Recursive queries, triggers, object-relational features - SQL:2003: XML, window functions, sequences - SQL:2011: Temporal data - SQL:2016: JSON support - SQL:2023: Property graph queries, multi-dimensional arrays
Era 4: Object-Relational and Beyond (1990s-2000s)¶
Object-Relational DBMS extended the relational model with: - User-defined types and functions - Inheritance - Complex objects (arrays, nested tables) - PostgreSQL is a prime example
Object-Oriented DBMS (OODBMS): - Store objects directly (no impedance mismatch) - Never achieved mainstream adoption - Examples: ObjectStore, db4o
Era 5: NoSQL Movement (2009+)¶
Driven by web-scale companies needing to handle massive data volumes:
| Type | Examples | Best For |
|---|---|---|
| Key-Value | Redis, DynamoDB | Caching, sessions |
| Document | MongoDB, CouchDB | Semi-structured data |
| Column-Family | Cassandra, HBase | Time-series, analytics |
| Graph | Neo4j, JanusGraph | Connected data, social networks |
Key concepts: - CAP Theorem: Cannot simultaneously guarantee Consistency, Availability, and Partition tolerance - BASE: Basically Available, Soft state, Eventually consistent (vs. ACID) - Schema flexibility: Schema-on-read vs schema-on-write
Era 6: NewSQL and Distributed SQL (2010s+)¶
Combining the scalability of NoSQL with the guarantees of relational systems:
| System | Approach |
|---|---|
| Google Spanner | Globally distributed, TrueTime |
| CockroachDB | Distributed PostgreSQL-compatible |
| TiDB | MySQL-compatible, HTAP |
| YugabyteDB | PostgreSQL-compatible distributed |
| Vitess | MySQL sharding middleware |
5. Three-Schema Architecture¶
The three-schema architecture (also called the three-level architecture) separates a database system into three abstraction levels. This separation is the foundation of data independence.
The Three Levels¶
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β External Level β
β (Individual User Views) β
β β
β ββββββββββββ ββββββββββββ ββββββββββββ β
β β View 1 β β View 2 β β View 3 β ... β
β β(Students)β β(Faculty) β β(Finance) β β
β ββββββββββββ ββββββββββββ ββββββββββββ β
β β β β β
β ββββββββββββββΌββββββββββββββ β
β β β
β External/Conceptual Mapping β
ββββββββββββββββββββββββΌββββββββββββββββββββββββββββββββ
β
ββββββββββββββββββββββββΌββββββββββββββββββββββββββββββββ
β βΌ β
β Conceptual Level β
β (Community User View) β
β β
β Describes the WHAT: β
β - All entities, attributes, relationships β
β - Integrity constraints β
β - Security and authorization rules β
β β
β Conceptual/Internal Mapping β
ββββββββββββββββββββββββΌββββββββββββββββββββββββββββββββ
β
ββββββββββββββββββββββββΌββββββββββββββββββββββββββββββββ
β βΌ β
β Internal Level β
β (Physical Storage View) β
β β
β Describes the HOW: β
β - File organization (heap, sorted, hashed) β
β - Index structures (B+ tree, hash index) β
β - Record layout and compression β
β - Buffer management policies β
β β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
External Level (View Level)¶
The external level describes the part of the database that is relevant to a particular user or application. Different users see different views of the same underlying data.
-- View for the Registrar (sees academic info)
CREATE VIEW registrar_view AS
SELECT s.student_id, s.name, s.major, s.gpa,
c.course_id, c.title, e.grade
FROM students s
JOIN enrollments e ON s.student_id = e.student_id
JOIN courses c ON e.course_id = c.course_id;
-- View for Financial Aid (sees financial info)
CREATE VIEW financial_aid_view AS
SELECT s.student_id, s.name, s.financial_status,
s.scholarship_amount, s.loan_balance
FROM students s;
-- View for the Student Portal (limited self-view)
CREATE VIEW student_portal_view AS
SELECT s.name, s.gpa, c.title, e.grade
FROM students s
JOIN enrollments e ON s.student_id = e.student_id
JOIN courses c ON e.course_id = c.course_id
WHERE s.student_id = CURRENT_USER_ID();
Conceptual Level (Logical Level)¶
The conceptual level describes the logical structure of the entire database for all users. It includes:
- All entity types and their attributes
- Relationships between entities
- Integrity constraints
- Security and authorization information
Conceptual Schema:
STUDENT(student_id PK, name, major, gpa, financial_status,
scholarship_amount, loan_balance)
COURSE(course_id PK, title, credits, department)
ENROLLMENT(student_id FK, course_id FK, grade, semester)
PK(student_id, course_id, semester)
INSTRUCTOR(instructor_id PK, name, department, salary)
TEACHES(instructor_id FK, course_id FK, semester)
PK(instructor_id, course_id, semester)
Constraints:
- gpa BETWEEN 0.0 AND 4.0
- credits > 0
- grade IN ('A+','A','A-','B+','B','B-','C+','C','C-','D','F')
Internal Level (Physical Level)¶
The internal level describes how data is physically stored on disk:
Internal Schema (conceptual representation):
STUDENT table:
- Storage: Heap file with overflow pages
- Primary index: B+ tree on student_id (clustered)
- Secondary index: Hash index on name
- Record format: Fixed-length (student_id: 8 bytes,
name: 50 bytes VARCHAR, major: 30 bytes, ...)
- Compression: Dictionary encoding on 'major' column
- Partition: Range partition on student_id
ENROLLMENT table:
- Storage: Sorted file on (student_id, course_id)
- Index: Composite B+ tree on (student_id, course_id, semester)
- Record format: Fixed-length, 32 bytes per record
6. Data Independence¶
Data independence is the capacity to change the schema at one level without having to change the schema at the next higher level. This is the primary benefit of the three-schema architecture.
Logical Data Independence¶
The ability to change the conceptual schema without changing the external schema or application programs.
Example: Adding a column to the STUDENT table
Before:
STUDENT(student_id, name, major, gpa)
After:
STUDENT(student_id, name, major, gpa, email, phone)
Impact:
- External views that don't use email/phone: NO CHANGE needed
- Applications querying only name and gpa: NO CHANGE needed
- Only applications that need email/phone must be updated
Changes that benefit from logical data independence: - Adding or removing attributes from a table - Splitting a table into two (with a view to maintain the original appearance) - Combining two tables into one - Adding new relationships or entity types
Physical Data Independence¶
The ability to change the internal schema without changing the conceptual schema or external views.
Example: Changing the index structure
Before:
STUDENT.name indexed with B+ tree
After:
STUDENT.name indexed with hash index
Impact:
- Conceptual schema: NO CHANGE (still STUDENT table with name column)
- External views: NO CHANGE
- Application programs: NO CHANGE
- Only query performance characteristics change
Changes that benefit from physical data independence: - Changing file organization (heap to sorted) - Adding or removing indexes - Moving data to different storage devices - Changing buffer management strategies - Compressing or partitioning data differently
Why Data Independence Matters¶
Without Data Independence: With Data Independence:
App 1 βββΊ Physical Storage App 1 βββΊ View 1 ββ
App 2 βββΊ Physical Storage App 2 βββΊ View 2 ββ€
App 3 βββΊ Physical Storage App 3 βββΊ View 3 ββ€
βΌ
Change storage format? Conceptual Schema
β Rewrite ALL applications! β
βΌ
Internal Schema
Change storage format?
β Applications unaffected!
Practical Reality¶
In practice, achieving complete data independence is difficult:
- Logical independence is harder to achieve than physical independence
- Performance considerations often leak through abstraction layers
- ORM frameworks provide partial logical independence
- Views provide external-level independence but can have performance implications
7. ANSI/SPARC Architecture¶
The ANSI/SPARC (American National Standards Institute / Standards Planning and Requirements Committee) architecture, proposed in 1975, formalized the three-schema approach. While no DBMS implements it exactly, it remains the conceptual blueprint for all modern database systems.
Full Architecture Diagram¶
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β USERS β
β β
β βββββββββββ βββββββββββ βββββββββββ βββββββββββββββββββ β
β β Casual β β App β β Parametricβ β DBA β β
β β User β β Prog β β User β β β β
β ββββββ¬ββββββ ββββββ¬βββββ βββββββ¬βββββ βββββββββ¬βββββββββ β
β β β β β β
βββββββββΌβββββββββββββββΌββββββββββββββΌββββββββββββββββΌβββββββββββ
β β β β
βΌ βΌ βΌ βΌ
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β EXTERNAL LEVEL β
β β
β External External External DDL β
β Schema 1 Schema 2 Schema 3 Compiler β
β β β β β β
β ββββββββββββββ΄βββββββββββββ β β
β β β β
β External/Conceptual β β
β Mapping β β
ββββββββββββββββββββββΌββββββββββββββββββββββββββΌβββββββββββββββββ
β β
βΌ βΌ
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β CONCEPTUAL LEVEL β
β β
β Conceptual Schema β
β (defined by DBA) β
β β β
β Conceptual/Internal β
β Mapping β
ββββββββββββββββββββββΌβββββββββββββββββββββββββββββββββββββββββββ
β
βΌ
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β INTERNAL LEVEL β
β β
β Internal Schema β
β (storage structures, indexes) β
β β β
β Internal/Physical β
β Mapping β
ββββββββββββββββββββββΌβββββββββββββββββββββββββββββββββββββββββββ
β
βΌ
ββββββββββββββββ
β STORED β
β DATABASE β
ββββββββββββββββ
Mappings Between Levels¶
The architecture defines mappings that translate requests and results between levels:
- External/Conceptual Mapping: Translates between a user's view and the conceptual schema
- Conceptual/Internal Mapping: Translates between the logical structure and physical storage
- Internal/Physical Mapping: Translates between DBMS internal structures and OS file system
# Conceptual illustration of mapping (not real DBMS code)
# External Schema: User sees a "student_summary" view
class StudentSummaryView:
student_name: str # Maps to STUDENT.first_name + ' ' + STUDENT.last_name
course_count: int # Maps to COUNT(*) from ENROLLMENT
average_grade: float # Maps to AVG(ENROLLMENT.grade_points)
# Conceptual Schema: Actual tables
class Student:
student_id: int # PK
first_name: str
last_name: str
major: str
class Enrollment:
student_id: int # FK -> Student
course_id: str # FK -> Course
grade_points: float
# Internal Schema: Physical storage
class StudentStorage:
file_type: str = "B+ tree clustered on student_id"
record_size: int = 128 # bytes
page_size: int = 8192 # bytes
records_per_page: int = 64
Key Interfaces¶
| Interface | Between | Purpose |
|---|---|---|
| DDL (Data Definition Language) | DBA and Conceptual Level | Define schema: CREATE TABLE, ALTER TABLE |
| VDL (View Definition Language) | Users and External Level | Define views: CREATE VIEW |
| SDL (Storage Definition Language) | DBA and Internal Level | Define storage: CREATE INDEX, TABLESPACE |
| DML (Data Manipulation Language) | Users and Data | Manipulate data: SELECT, INSERT, UPDATE, DELETE |
8. Data Models¶
A data model is a collection of concepts for describing the structure, operations, and constraints of a database. Different data models provide different levels of abstraction.
Categories of Data Models¶
High-level Low-level
(Conceptual) (Physical)
β β
βΌ βΌ
ββββββββββββ ββββββββββββββββ ββββββββββββββββ ββββββββββββ
β ER Model β β Relational β β Record-basedβ β Physical β
β UML β β Model β β Models β β Data β
β ORM β β β β (Network, β β Model β
β β β β β Hierarch.) β β β
ββββββββββββ ββββββββββββββββ ββββββββββββββββ ββββββββββββ
Conceptual Representational Implementation Physical
(what) (what + some how) (how) (how exactly)
Summary of Major Data Models¶
| Data Model | Structure | Query Language | Era |
|---|---|---|---|
| Hierarchical | Trees | DL/1 | 1960s |
| Network | Graphs (CODASYL) | Navigational | 1960s |
| Relational | Tables (relations) | SQL | 1970s+ |
| Entity-Relationship | Entities & relationships | N/A (design tool) | 1976 |
| Object-Oriented | Objects, classes | OQL | 1990s |
| Object-Relational | Extended tables | SQL + extensions | 1990s |
| Document | JSON/BSON documents | MongoDB Query | 2000s |
| Key-Value | Key-value pairs | GET/SET | 2000s |
| Column-Family | Column groups | CQL | 2000s |
| Graph | Nodes & edges | Cypher, SPARQL | 2000s |
The Relational Model (Preview)¶
Since the relational model is the focus of this course, here is a brief preview:
Relation (Table):
- A set of tuples (rows)
- Each tuple has the same set of attributes (columns)
- Each attribute has a domain (allowed values)
- Order of tuples does not matter
- Order of attributes does not matter
- No duplicate tuples
- Each cell contains an atomic value
βββββββββββββββββββββββββββββββββββββββββββ
β EMPLOYEE (Relation) β
ββββββββββββ¬βββββββββββ¬ββββββββ¬ββββββββββββ€
β emp_id β name β age β dept β β Attributes
ββββββββββββΌβββββββββββΌββββββββΌββββββββββββ€
β E001 β Alice β 29 β CS β β Tuple 1
β E002 β Bob β 35 β EE β β Tuple 2
β E003 β Carol β 42 β CS β β Tuple 3
ββββββββββββ΄βββββββββββ΄ββββββββ΄ββββββββββββ
Domain(emp_id) = {E001, E002, ..., E999}
Domain(age) = positive integers
Domain(dept) = {CS, EE, ME, CE, ...}
9. Database Users and Roles¶
A database system serves many types of users, each with different needs and levels of technical expertise.
User Classification¶
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Database Users β
β β
β βββββββββββββββββββββββ ββββββββββββββββββββββββββββ β
β β Actors on the β β Actors Behind the β β
β β Scene β β Scene β β
β β β β β β
β β - Database Admin β β - DBMS Designers β β
β β - Database Designerβ β - Tool Developers β β
β β - End Users β β - System Administrators β β
β β - App Programmers β β β β
β βββββββββββββββββββββββ ββββββββββββββββββββββββββββ β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Actors on the Scene¶
| Role | Responsibilities | Tools Used |
|---|---|---|
| Database Administrator (DBA) | Schema definition, storage structure, access control, backup/recovery, performance tuning | DDL, DCL, monitoring tools |
| Database Designer | Conceptual and logical design, ER modeling, normalization, view definition | ER tools, UML, CASE tools |
| Application Programmer | Write programs that access the database, embed SQL in host language | SQL, ORM, APIs |
| End Users | Query and update the database through applications or ad-hoc queries | Forms, reports, SQL |
Types of End Users¶
End Users
β
βββ Casual Users
β Access database occasionally
β Use ad-hoc queries (SQL or GUI)
β Example: Manager running monthly report
β
βββ Naive (Parametric) Users
β Use pre-written applications repeatedly
β Do not write queries
β Example: Bank teller, airline reservation agent
β
βββ Sophisticated Users
β Familiar with DBMS facilities
β Write complex queries
β Example: Data analyst, scientist
β
βββ Standalone Users
Personal database
Use off-the-shelf software
Example: Tax preparation software user
The DBA in Detail¶
The Database Administrator is the central authority for managing the database:
# Typical DBA responsibilities (conceptual)
class DBA:
"""Database Administrator responsibilities"""
def schema_management(self):
"""Define and modify database schema"""
# CREATE TABLE, ALTER TABLE, CREATE INDEX
# Define views for different user groups
# Manage schema migrations
def security_management(self):
"""Control access to the database"""
# GRANT/REVOKE privileges
# Create roles and assign users
# Audit access logs
def performance_tuning(self):
"""Optimize database performance"""
# Analyze query execution plans
# Create/drop indexes based on workload
# Configure buffer pool, cache sizes
# Partition large tables
def backup_and_recovery(self):
"""Ensure data durability"""
# Schedule regular backups (full, incremental)
# Test recovery procedures
# Manage transaction logs
# Handle disaster recovery
def capacity_planning(self):
"""Plan for growth"""
# Monitor disk usage trends
# Estimate future storage needs
# Plan hardware upgrades
10. Database System Architecture¶
Modern database systems can be deployed in various architectural configurations.
Centralized Architecture¶
ββββββββββββββββββββββββββββββββββββββββββ
β Centralized System β
β β
β ββββββββββββββββββββββββββββββββββ β
β β Application + β β
β β DBMS Software β β
β ββββββββββββββββββββββββββββββββββ β
β β β
β ββββββββββββββββββββββββββββββββββ β
β β Database β β
β ββββββββββββββββββββββββββββββββββ β
β β
β Users access via dumb terminals β
ββββββββββββββββββββββββββββββββββββββββββ
Client-Server Architecture¶
Two-Tier:
ββββββββββββ ββββββββββββ ββββββββββββ
β Client 1 β β Client 2 β β Client 3 β
β (App + β β (App + β β (App + β
β UI) β β UI) β β UI) β
ββββββ¬ββββββ ββββββ¬ββββββ ββββββ¬ββββββ
β β β
ββββββββββββββββΌβββββββββββββββ
β Network (SQL over TCP)
βΌ
ββββββββββββββββ
β Database β
β Server β
β (DBMS + β
β Database) β
ββββββββββββββββ
Three-Tier (Web Architecture):
ββββββββββββ ββββββββββββ ββββββββββββ
β Browser β β Browser β β Mobile β
β (Thin β β (Thin β β App β
β Client) β β Client) β β β
ββββββ¬ββββββ ββββββ¬ββββββ ββββββ¬ββββββ
β β β
ββββββββββββββββΌβββββββββββββββ
β HTTP/HTTPS
βΌ
ββββββββββββββββ
β Application β Tier 2: Business Logic
β Server β (Web server, API server)
β (Flask/ β
β Django/ β
β Express) β
ββββββββ¬ββββββββ
β SQL/Protocol
βΌ
ββββββββββββββββ
β Database β Tier 3: Data Management
β Server β
β (PostgreSQL/ β
β MySQL) β
ββββββββββββββββ
Distributed Architecture¶
ββββββββββββββββ
β Client β
ββββββββ¬ββββββββ
β
ββββββββΌββββββββ
β Router / β
β Coordinatorβ
ββββββββ¬ββββββββ
β
ββββββββββββββΌβββββββββββββ
βΌ βΌ βΌ
ββββββββββββ ββββββββββββ ββββββββββββ
β Node 1 β β Node 2 β β Node 3 β
β (Shard A)β β (Shard B)β β (Shard C)β
β +Replica β β +Replica β β +Replica β
ββββββββββββ ββββββββββββ ββββββββββββ
β β β
ββββββββββββββΌβββββββββββββ
β
Replication &
Consensus Protocol
Cloud Architecture¶
βββββββββββββββββββββββββββββββββββββββββββββββ
β Cloud Provider β
β β
β βββββββββββββββββββββββββββββββββββββββ β
β β Managed Database β β
β β (RDS, Cloud SQL, Aurora, etc.) β β
β β β β
β β ββββββββββββ ββββββββββββ β β
β β β Primary β β Replica β β β
β β β Instance ββββ Instance β β β
β β ββββββββββββ ββββββββββββ β β
β β β β β
β β ββββββββββββββββββββββββββββ β β
β β β Shared Storage Layer β β β
β β β (Distributed, Durable) β β β
β β ββββββββββββββββββββββββββββ β β
β βββββββββββββββββββββββββββββββββββββββ β
β β
β Features: Auto-backup, scaling, β
β monitoring, patching, HA failover β
βββββββββββββββββββββββββββββββββββββββββββββββ
11. Exercises¶
Conceptual Questions¶
Exercise 1.1: List five disadvantages of the file-based approach to data management and explain how a DBMS addresses each one.
Exercise 1.2: Explain the difference between the three-schema architecture levels. For a university database, give an example of what each level would describe.
Exercise 1.3: Define the following terms: - (a) Data independence - (b) Data abstraction - (c) Data definition language - (d) Data manipulation language - (e) Schema vs. instance
Exercise 1.4: A company currently stores all employee data in spreadsheets. The CEO wants to migrate to a database system. Write a brief memo (5-7 bullet points) explaining the benefits of this migration and potential challenges.
Analysis Questions¶
Exercise 1.5: Classify each of the following changes as requiring a modification to the (i) external, (ii) conceptual, or (iii) internal schema. Explain whether data independence is preserved.
- (a) A new index is added to the STUDENT table
- (b) A new column
emailis added to the STUDENT table - (c) The STUDENT table is split into STUDENT_PERSONAL and STUDENT_ACADEMIC
- (d) The database file is moved from HDD to SSD
- (e) A new view is created for the financial aid office
Exercise 1.6: For each pair of database models, explain two advantages and two disadvantages of the first model compared to the second: - (a) Relational vs. Hierarchical - (b) Document (NoSQL) vs. Relational - (c) Graph vs. Relational
Exercise 1.7: Classify the following users of a hospital database system by their role (DBA, database designer, application programmer, or end user type): - (a) A person who designs the ER diagram for the patient records system - (b) A nurse entering patient vital signs through a tablet application - (c) The IT staff member who performs nightly backups and monitors query performance - (d) A doctor querying the database to find all patients with a specific diagnosis - (e) A programmer building the patient portal web application
Practical Questions¶
Exercise 1.8: Research and compare three modern DBMS (one relational, one document, one graph). For each, identify: - Primary data model - Query language - ACID support (full, partial, or none) - Typical use case - Three-schema architecture support level
Exercise 1.9: Consider the following scenario:
An e-commerce company currently uses flat files to store product catalog data (product_catalog.csv), customer data (customers.csv), and order data (orders.csv). They process about 1,000 orders per day and have 50,000 products.
Design a brief plan for migrating this data to a relational database. Include: - What tables would you create? - What are the relationships between tables? - What problems from the file-based approach would this solve? - What additional features would a DBMS provide?
Exercise 1.10: Consider the ANSI/SPARC architecture. Explain what happens at each level when a user executes the following SQL query:
SELECT name, gpa FROM student_summary WHERE gpa > 3.5;
Assume student_summary is a view defined as:
CREATE VIEW student_summary AS
SELECT student_id, first_name || ' ' || last_name AS name, gpa
FROM students;
Trace the query through: external level (view resolution) -> conceptual level (logical plan) -> internal level (physical access).
Next: Relational Model