Relational Algebra
Relational Algebra¶
Previous: Relational Model | Next: ER Modeling
Relational algebra is the formal query language of the relational model. It provides a collection of operators that take one or two relations as input and produce a new relation as output. Understanding relational algebra is essential because it underlies SQL query processing: every SQL query is internally translated into a relational algebra expression that the query optimizer can then transform and improve.
Table of Contents¶
- Overview of Relational Algebra
- Unary Operations
- Set Operations
- Binary Operations: Cartesian Product and Joins
- Division
- Additional Operations
- Query Trees and Algebraic Optimization
- Relational Calculus (Brief Introduction)
- Equivalence with SQL
- Complete Worked Examples
- Exercises
1. Overview of Relational Algebra¶
What Is Relational Algebra?¶
Relational algebra is a procedural query language: it describes a sequence of operations to compute the desired result. Each operation takes one or more relations and produces a new relation.
Properties:
- Closure: The result of every operation is a relation
(enabling composition of operations)
- Set semantics: Relations are sets (no duplicate tuples)
- Foundation: Based on set theory and first-order logic
Categories of Operations¶
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Relational Algebra Operations β
β β
β Fundamental (cannot be expressed using other operations): β
β Ο Selection (filter rows) β
β Ο Projection (choose columns) β
β Ο Rename (rename relation/attributes) β
β βͺ Union (combine two relations) β
β β Set Difference (tuples in R but not in S) β
β Γ Cartesian Product (all combinations) β
β β
β Derived (can be expressed using fundamental operations): β
β β Join (various types) β
β β© Intersection β
β Γ· Division β
β β Assignment β
β Ξ΄ Duplicate elimination (for bag semantics) β
β Ξ³ Grouping/Aggregation β
β Ο Sorting β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Running Example Database¶
Throughout this lesson, we use the following sample database:
STUDENT:
ββββββββ¬ββββββββββββ¬βββββββ¬βββββββ
β sid β name β year β dept β
ββββββββΌββββββββββββΌβββββββΌβββββββ€
β S001 β Alice β 3 β CS β
β S002 β Bob β 2 β CS β
β S003 β Carol β 4 β EE β
β S004 β Dave β 3 β ME β
β S005 β Eve β 1 β CS β
ββββββββ΄ββββββββββββ΄βββββββ΄βββββββ
COURSE:
βββββββββ¬βββββββββββββββββββββββ¬ββββββββββ¬βββββββ
β cid β title β credits β dept β
βββββββββΌβββββββββββββββββββββββΌββββββββββΌβββββββ€
β CS101 β Intro to CS β 3 β CS β
β CS301 β Database Theory β 3 β CS β
β CS401 β Machine Learning β 4 β CS β
β EE201 β Circuit Analysis β 3 β EE β
β MA101 β Calculus I β 4 β MA β
βββββββββ΄βββββββββββββββββββββββ΄ββββββββββ΄βββββββ
ENROLLMENT:
ββββββββ¬ββββββββ¬ββββββββ
β sid β cid β grade β
ββββββββΌββββββββΌββββββββ€
β S001 β CS101 β A β
β S001 β CS301 β A+ β
β S001 β MA101 β B+ β
β S002 β CS101 β B β
β S002 β CS301 β A- β
β S003 β EE201 β A β
β S003 β CS101 β B+ β
β S004 β CS101 β C β
β S005 β CS101 β A β
β S005 β MA101 β A- β
ββββββββ΄ββββββββ΄ββββββββ
INSTRUCTOR:
ββββββββ¬βββββββββββ¬βββββββ¬βββββββββ
β iid β name β dept β salary β
ββββββββΌβββββββββββΌβββββββΌβββββββββ€
β I001 β Prof. Kimβ CS β 95000 β
β I002 β Prof. Leeβ CS β 88000 β
β I003 β Prof. Parkβ EE β 92000 β
β I004 β Prof. Choiβ MA β 85000 β
ββββββββ΄βββββββββββ΄βββββββ΄βββββββββ
2. Unary Operations¶
Unary operations take a single relation as input.
Selection (sigma)¶
The selection operation filters rows based on a condition (predicate).
Notation: Ο_condition(R)
Output: A relation containing only the tuples from R
that satisfy the condition.
Schema: Same as R (all attributes preserved)
Formal definition:
Ο_condition(R) = { t | t β R AND condition(t) is TRUE }
Examples:
1. Select CS students:
Ο_{dept='CS'}(STUDENT)
Result:
ββββββββ¬ββββββββ¬βββββββ¬βββββββ
β sid β name β year β dept β
ββββββββΌββββββββΌβββββββΌβββββββ€
β S001 β Alice β 3 β CS β
β S002 β Bob β 2 β CS β
β S005 β Eve β 1 β CS β
ββββββββ΄ββββββββ΄βββββββ΄βββββββ
2. Select 3rd-year CS students:
Ο_{dept='CS' AND year=3}(STUDENT)
Result:
ββββββββ¬ββββββββ¬βββββββ¬βββββββ
β sid β name β year β dept β
ββββββββΌββββββββΌβββββββΌβββββββ€
β S001 β Alice β 3 β CS β
ββββββββ΄ββββββββ΄βββββββ΄βββββββ
3. Select courses with 4 credits:
Ο_{credits=4}(COURSE)
Result:
βββββββββ¬βββββββββββββββββββ¬ββββββββββ¬βββββββ
β cid β title β credits β dept β
βββββββββΌβββββββββββββββββββΌββββββββββΌβββββββ€
β CS401 β Machine Learning β 4 β CS β
β MA101 β Calculus I β 4 β MA β
βββββββββ΄βββββββββββββββββββ΄ββββββββββ΄βββββββ
Selection conditions can use: - Comparison operators: =, β , <, >, β€, β₯ - Logical connectives: AND (β§), OR (β¨), NOT (Β¬) - Attribute names and constants
Projection (pi)¶
The projection operation selects specific columns (attributes) and removes duplicates.
Notation: Ο_{attr_list}(R)
Output: A relation containing only the specified attributes,
with duplicate tuples removed.
Schema: Only the attributes in attr_list
Formal definition:
Ο_{A1, A2, ..., Ak}(R) = { <t[A1], t[A2], ..., t[Ak]> | t β R }
Examples:
1. Project student names and departments:
Ο_{name, dept}(STUDENT)
Result:
βββββββββ¬βββββββ
β name β dept β
βββββββββΌβββββββ€
β Alice β CS β
β Bob β CS β
β Carol β EE β
β Dave β ME β
β Eve β CS β
βββββββββ΄βββββββ
2. Project distinct departments from STUDENT:
Ο_{dept}(STUDENT)
Result:
ββββββββ
β dept β
ββββββββ€
β CS β
β EE β
β ME β
ββββββββ
(Note: CS appears only ONCE because duplicates are removed)
3. Compose selection and projection:
"Find names of CS students"
Ο_{name}(Ο_{dept='CS'}(STUDENT))
Step 1: Ο_{dept='CS'}(STUDENT) β {Alice/CS, Bob/CS, Eve/CS}
Step 2: Ο_{name}(...) β {Alice, Bob, Eve}
Result:
βββββββββ
β name β
βββββββββ€
β Alice β
β Bob β
β Eve β
βββββββββ
Rename (rho)¶
The rename operation changes the name of a relation and/or its attributes.
Notation: Ο_{S(B1, B2, ..., Bn)}(R)
Renames relation R to S and attributes to B1, B2, ..., Bn
Shorthand: Ο_S(R) -- rename relation only
Ο_{(B1,...,Bn)}(R) -- rename attributes only
Examples:
1. Rename STUDENT to S:
Ο_S(STUDENT)
β Same tuples, but relation is now called S
2. Rename for self-join preparation:
Ο_{S1(sid1, name1, year1, dept1)}(STUDENT)
Ο_{S2(sid2, name2, year2, dept2)}(STUDENT)
Now we can join S1 and S2 without attribute name conflicts.
3. Practical use - find pairs of students in the same department:
Ο_{S1.dept = S2.dept AND S1.sid < S2.sid}(
Ο_{S1(sid1, name1, year1, dept1)}(STUDENT) Γ
Ο_{S2(sid2, name2, year2, dept2)}(STUDENT)
)
3. Set Operations¶
Set operations require union-compatible (or type-compatible) relations: they must have the same number of attributes, and corresponding attributes must have compatible domains.
Union compatibility:
R(A1: D1, A2: D2, ..., An: Dn)
S(B1: D1, B2: D2, ..., Bn: Dn)
Same number of attributes (n) and compatible domains.
Attribute names may differ (result uses R's names by convention).
Union¶
Notation: R βͺ S
Output: All tuples that are in R, in S, or in both.
Duplicates are eliminated.
R βͺ S = { t | t β R OR t β S }
Example:
CS_STUDENTS = Ο_{sid, name}(Ο_{dept='CS'}(STUDENT))
ββββββββ¬ββββββββ
β sid β name β
ββββββββΌββββββββ€
β S001 β Alice β
β S002 β Bob β
β S005 β Eve β
ββββββββ΄ββββββββ
YEAR3_STUDENTS = Ο_{sid, name}(Ο_{year=3}(STUDENT))
ββββββββ¬ββββββββ
β sid β name β
ββββββββΌββββββββ€
β S001 β Alice β
β S004 β Dave β
ββββββββ΄ββββββββ
CS_STUDENTS βͺ YEAR3_STUDENTS =
ββββββββ¬ββββββββ
β sid β name β
ββββββββΌββββββββ€
β S001 β Alice β β appears in both, but listed only once
β S002 β Bob β
β S005 β Eve β
β S004 β Dave β
ββββββββ΄ββββββββ
Set Difference¶
Notation: R β S (or R \ S)
Output: Tuples in R that are NOT in S.
R β S = { t | t β R AND t β S }
Example:
"CS students who are NOT in year 3"
CS_STUDENTS β YEAR3_STUDENTS =
ββββββββ¬βββββββ
β sid β name β
ββββββββΌβββββββ€
β S002 β Bob β
β S005 β Eve β
ββββββββ΄βββββββ
Note: Alice (S001) is removed because she is in YEAR3_STUDENTS.
Note: R β S β S β R (set difference is NOT commutative)
YEAR3_STUDENTS β CS_STUDENTS =
ββββββββ¬βββββββ
β sid β name β
ββββββββΌβββββββ€
β S004 β Dave β
ββββββββ΄βββββββ
Intersection¶
Notation: R β© S
Output: Tuples that are in BOTH R and S.
R β© S = { t | t β R AND t β S }
Note: R β© S = R β (R β S) (intersection is derivable)
Example:
"Students who are in CS AND in year 3"
CS_STUDENTS β© YEAR3_STUDENTS =
ββββββββ¬ββββββββ
β sid β name β
ββββββββΌββββββββ€
β S001 β Alice β
ββββββββ΄ββββββββ
Set Operations Summary¶
R S R βͺ S R β S R β© S
ββββββββ ββββββββ ββββββββ ββββββββ ββββββββ
β a β β a β β a β β b β β a β
β b β β a β β b β β c β β β
β c β β d β β c β ββββββββ ββββββββ
ββββββββ ββββββββ β d β
ββββββββ
Venn diagram:
βββββββββββββββ
β R β
β ββββββββΌβββββββ
β β Rβ©S β β
β β β S β
ββββββββΌβββββββ β
β β
βββββββββββββββ
R βͺ S = entire shaded area
R β S = left only (not overlap)
R β© S = overlap only
S β R = right only (not overlap)
4. Binary Operations: Cartesian Product and Joins¶
Cartesian Product (Cross Product)¶
Notation: R Γ S
Output: Every tuple in R combined with every tuple in S.
If R has n tuples and S has m tuples, R Γ S has n Γ m tuples.
If R has p attributes and S has q attributes, R Γ S has p + q attributes.
Formal definition:
R Γ S = { <t_r, t_s> | t_r β R AND t_s β S }
Example:
A = {(a1, b1), (a2, b2)} B = {(c1, d1), (c2, d2), (c3, d3)}
A Γ B =
ββββββ¬βββββ¬βββββ¬βββββ
β A β B β C β D β
ββββββΌβββββΌβββββΌβββββ€
β a1 β b1 β c1 β d1 β
β a1 β b1 β c2 β d2 β
β a1 β b1 β c3 β d3 β
β a2 β b2 β c1 β d1 β
β a2 β b2 β c2 β d2 β
β a2 β b2 β c3 β d3 β
ββββββ΄βββββ΄βββββ΄βββββ
|A| = 2, |B| = 3, |A Γ B| = 2 Γ 3 = 6
The Cartesian product by itself is rarely useful because it produces many meaningless combinations. Its power comes when combined with selection (which gives us joins).
Theta Join¶
A theta join combines Cartesian product with selection:
Notation: R β_ΞΈ S (where ΞΈ is a condition)
Definition: R β_ΞΈ S = Ο_ΞΈ(R Γ S)
The condition ΞΈ can use any comparison: =, β , <, >, β€, β₯
Example:
"Find students and courses in the same department"
STUDENT β_{STUDENT.dept = COURSE.dept} COURSE
Equivalent to: Ο_{STUDENT.dept = COURSE.dept}(STUDENT Γ COURSE)
Result (partial):
ββββββββ¬ββββββββ¬βββββββ¬βββββββ¬ββββββββ¬βββββββββββββββββββ¬ββββββββββ¬ββββββββ
β sid β name β year β s.deptβ cid β title β credits βc.dept β
ββββββββΌββββββββΌβββββββΌβββββββΌββββββββΌβββββββββββββββββββΌββββββββββΌββββββββ€
β S001 β Alice β 3 β CS β CS101 β Intro to CS β 3 β CS β
β S001 β Alice β 3 β CS β CS301 β Database Theory β 3 β CS β
β S001 β Alice β 3 β CS β CS401 β Machine Learning β 4 β CS β
β S002 β Bob β 2 β CS β CS101 β Intro to CS β 3 β CS β
β S002 β Bob β 2 β CS β CS301 β Database Theory β 3 β CS β
β S002 β Bob β 2 β CS β CS401 β Machine Learning β 4 β CS β
β S003 β Carol β 4 β EE β EE201 β Circuit Analysis β 3 β EE β
β S005 β Eve β 1 β CS β CS101 β Intro to CS β 3 β CS β
β S005 β Eve β 1 β CS β CS301 β Database Theory β 3 β CS β
β S005 β Eve β 1 β CS β CS401 β Machine Learning β 4 β CS β
ββββββββ΄ββββββββ΄βββββββ΄βββββββ΄ββββββββ΄βββββββββββββββββββ΄ββββββββββ΄ββββββββ
(Dave/ME has no matching course, so not in result)
Equi-Join¶
An equi-join is a theta join where the condition uses only equality (=):
R β_{R.A = S.B} S
All equi-joins are theta joins, but not all theta joins are equi-joins.
A theta join with R.A > S.B is NOT an equi-join.
Natural Join¶
A natural join is a special equi-join that: 1. Joins on ALL common attribute names 2. Removes duplicate columns from the result
Notation: R β S (no subscript)
Definition: Join on all attributes with the same name in R and S,
then project out the duplicate attribute columns.
Example:
STUDENT β ENROLLMENT
Common attribute: sid
Step 1: Equi-join on STUDENT.sid = ENROLLMENT.sid
Step 2: Remove duplicate sid column
Result:
ββββββββ¬ββββββββ¬βββββββ¬βββββββ¬ββββββββ¬ββββββββ
β sid β name β year β dept β cid β grade β
ββββββββΌββββββββΌβββββββΌβββββββΌββββββββΌββββββββ€
β S001 β Alice β 3 β CS β CS101 β A β
β S001 β Alice β 3 β CS β CS301 β A+ β
β S001 β Alice β 3 β CS β MA101 β B+ β
β S002 β Bob β 2 β CS β CS101 β B β
β S002 β Bob β 2 β CS β CS301 β A- β
β S003 β Carol β 4 β EE β EE201 β A β
β S003 β Carol β 4 β EE β CS101 β B+ β
β S004 β Dave β 3 β ME β CS101 β C β
β S005 β Eve β 1 β CS β CS101 β A β
β S005 β Eve β 1 β CS β MA101 β A- β
ββββββββ΄ββββββββ΄βββββββ΄βββββββ΄ββββββββ΄ββββββββ
WARNING: Be careful with natural join when relations share
attribute names unintentionally!
STUDENT(sid, name, year, dept) β COURSE(cid, title, credits, dept)
^^^^ ^^^^
This joins on 'dept', which may not be what you want!
It matches students with courses in THEIR department.
Outer Joins¶
Standard joins discard tuples that do not match. Outer joins preserve unmatched tuples by padding them with NULLs.
Three types:
1. LEFT OUTER JOIN (β):
Keep all tuples from the LEFT relation.
Pad with NULLs if no match on the right.
2. RIGHT OUTER JOIN (β):
Keep all tuples from the RIGHT relation.
Pad with NULLs if no match on the left.
3. FULL OUTER JOIN (β):
Keep all tuples from BOTH relations.
Pad with NULLs on either side as needed.
Example: Left Outer Join
STUDENT β_{STUDENT.sid = ENROLLMENT.sid} ENROLLMENT
"Find all students and their enrollments, INCLUDING students
with no enrollments"
(Suppose S004/Dave had no enrollments in our data.)
If Dave had NO enrollments:
Result:
ββββββββ¬ββββββββ¬βββββββ¬βββββββ¬ββββββββ¬ββββββββ
β sid β name β year β dept β cid β grade β
ββββββββΌββββββββΌβββββββΌβββββββΌββββββββΌββββββββ€
β S001 β Alice β 3 β CS β CS101 β A β
β S001 β Alice β 3 β CS β CS301 β A+ β
β S001 β Alice β 3 β CS β MA101 β B+ β
β S002 β Bob β 2 β CS β CS101 β B β
β S002 β Bob β 2 β CS β CS301 β A- β
β S003 β Carol β 4 β EE β EE201 β A β
β S003 β Carol β 4 β EE β CS101 β B+ β
β S004 β Dave β 3 β ME β NULL β NULL β β preserved with NULLs
β S005 β Eve β 1 β CS β CS101 β A β
β S005 β Eve β 1 β CS β MA101 β A- β
ββββββββ΄ββββββββ΄βββββββ΄βββββββ΄ββββββββ΄ββββββββ
Join Comparison Summary¶
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Join Types β
β β
β Cartesian Product R Γ S All combinations (n Γ m tuples) β
β Theta Join R β_ΞΈ S Cartesian + selection on ΞΈ β
β Equi-Join R β_{=} S Theta join with equality only β
β Natural Join R β S Equi-join on common attrs, β
β remove duplicates β
β Left Outer Join R β S Natural + keep unmatched R β
β Right Outer Join R β S Natural + keep unmatched S β
β Full Outer Join R β S Natural + keep all unmatched β
β Semi-Join R β S Tuples in R with a match in S β
β Anti-Join R β· S Tuples in R with NO match in S β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Semi-Join and Anti-Join¶
Semi-Join: R β S = Ο_{attrs(R)}(R β S)
"Return tuples from R that have a matching tuple in S"
(Only R's attributes appear in the result)
Anti-Join: R β· S = R β Ο_{attrs(R)}(R β S)
"Return tuples from R that have NO matching tuple in S"
Example:
"Students who are enrolled in at least one course" (Semi-Join):
STUDENT β ENROLLMENT
β All students from STUDENT who appear in ENROLLMENT
"Students who are NOT enrolled in any course" (Anti-Join):
STUDENT β· ENROLLMENT
β Students who have no matching enrollment
5. Division¶
The division operation answers "for all" queries. It is one of the most powerful and least intuitive operations in relational algebra.
Notation: R Γ· S
Given: R(A1, A2, ..., An, B1, B2, ..., Bm)
S(B1, B2, ..., Bm)
Result: Tuples t over {A1, ..., An} such that for EVERY tuple s in S,
the tuple <t, s> is in R.
Formally:
R Γ· S = { t | t β Ο_{A}(R) AND βs β S : <t,s> β R }
where A = {A1, ..., An} (attributes of R not in S)
Division Expressed Using Fundamental Operations¶
R Γ· S = Ο_A(R) β Ο_A( (Ο_A(R) Γ S) β R )
Explanation:
1. Ο_A(R) = all possible A-values in R
2. Ο_A(R) Γ S = every A-value paired with every S-tuple
3. (Ο_A(R) Γ S) β R = combinations that are MISSING from R
4. Ο_A(...) = A-values that are missing at least one S-tuple
5. Ο_A(R) β ... = A-values that are NOT missing any S-tuple
= A-values associated with ALL S-tuples
Division Example¶
"Find students enrolled in ALL CS courses"
Step 1: Define the dividend (student-course pairs)
R = Ο_{sid, cid}(ENROLLMENT)
ββββββββ¬ββββββββ
β sid β cid β
ββββββββΌββββββββ€
β S001 β CS101 β
β S001 β CS301 β
β S001 β MA101 β
β S002 β CS101 β
β S002 β CS301 β
β S003 β EE201 β
β S003 β CS101 β
β S004 β CS101 β
β S005 β CS101 β
β S005 β MA101 β
ββββββββ΄ββββββββ
Step 2: Define the divisor (all CS courses)
S = Ο_{cid}(Ο_{dept='CS'}(COURSE))
βββββββββ
β cid β
βββββββββ€
β CS101 β
β CS301 β
β CS401 β
βββββββββ
Step 3: R Γ· S = students associated with ALL of {CS101, CS301, CS401}
Check each student:
S001: has {CS101, CS301} but NOT CS401 β β
S002: has {CS101, CS301} but NOT CS401 β β
S003: has {CS101} only β β
S004: has {CS101} only β β
S005: has {CS101} only β β
Result: EMPTY SET (no student is enrolled in ALL three CS courses)
If CS401 were not in the course list (only CS101 and CS301):
S = {CS101, CS301}
S001: has {CS101, CS301} β β
S002: has {CS101, CS301} β β
Others: β
Result:
ββββββββ
β sid β
ββββββββ€
β S001 β
β S002 β
ββββββββ
6. Additional Operations¶
Aggregation and Grouping¶
The grouping/aggregation operator extends relational algebra to support aggregate functions.
Notation: _{G1, G2, ..., Gn} G _{F1(A1), F2(A2), ..., Fk(Ak)} (R)
or more commonly written as:
Ξ³_{G; F1(A1) AS name1, ...}(R)
Where:
G1, ..., Gn = grouping attributes
F1, ..., Fk = aggregate functions (COUNT, SUM, AVG, MIN, MAX)
A1, ..., Ak = attributes to aggregate
Example:
"Count students per department"
Ξ³_{dept; COUNT(sid) AS count}(STUDENT)
Result:
ββββββββ¬ββββββββ
β dept β count β
ββββββββΌββββββββ€
β CS β 3 β
β EE β 1 β
β ME β 1 β
ββββββββ΄ββββββββ
"Average salary per department (for instructors)"
Ξ³_{dept; AVG(salary) AS avg_sal, COUNT(*) AS num}(INSTRUCTOR)
Result:
ββββββββ¬ββββββββββ¬ββββββ
β dept β avg_sal β num β
ββββββββΌββββββββββΌββββββ€
β CS β 91500 β 2 β
β EE β 92000 β 1 β
β MA β 85000 β 1 β
ββββββββ΄ββββββββββ΄ββββββ
Assignment¶
The assignment operator stores intermediate results:
Notation: temp β expression
Example:
CS_STUDENTS β Ο_{dept='CS'}(STUDENT)
CS_NAMES β Ο_{name}(CS_STUDENTS)
Sorting¶
Notation: Ο_{A1 ASC, A2 DESC}(R)
Note: Sorting produces a LIST, not a SET. Strictly speaking, it
goes beyond pure relational algebra (which produces only sets).
7. Query Trees and Algebraic Optimization¶
Query Trees¶
A query tree (or operator tree) represents a relational algebra expression as a tree where: - Leaf nodes are base relations - Internal nodes are relational algebra operations - The root produces the final result
Example: "Find names of CS students enrolled in CS301"
Relational algebra:
Ο_{name}(Ο_{dept='CS' AND cid='CS301'}(STUDENT β ENROLLMENT))
Query tree:
Ο_{name}
β
Ο_{dept='CS' AND cid='CS301'}
β
β_{sid=sid}
/ \
STUDENT ENROLLMENT
Algebraic Optimization¶
The query optimizer transforms query trees using equivalence rules to find more efficient execution plans.
Key Equivalence Rules¶
Rule 1: Cascade of Selection
Ο_{c1 AND c2}(R) = Ο_{c1}(Ο_{c2}(R))
A conjunctive selection can be broken into a sequence of selections.
Rule 2: Commutativity of Selection
Ο_{c1}(Ο_{c2}(R)) = Ο_{c2}(Ο_{c1}(R))
The order of selections does not matter.
Rule 3: Cascade of Projection
Ο_{L1}(Ο_{L2}(...Ο_{Ln}(R)...)) = Ο_{L1}(R)
Only the outermost projection matters (if L1 β L2 β ... β Ln).
Rule 4: Commuting Selection with Projection
If the selection condition c only involves attributes in L:
Ο_L(Ο_c(R)) = Ο_c(Ο_L(R))
Rule 5: Commutativity of Join
R β S = S β R
R Γ S = S Γ R
Rule 6: Associativity of Join
(R β S) β T = R β (S β T)
Rule 7: Pushing Selection Through Join
If condition c involves only attributes of R:
Ο_c(R β S) = Ο_c(R) β S
This is the MOST IMPORTANT optimization rule!
It reduces the size of intermediate results.
Rule 8: Commutativity of Set Operations
R βͺ S = S βͺ R
R β© S = S β© R
(But R β S β S β R)
Optimization Example¶
Original query tree (unoptimized):
Ο_{name}
β
Ο_{dept='CS' AND cid='CS301'}
β
β_{sid=sid}
/ \
STUDENT ENROLLMENT
(5 rows) (10 rows)
Cartesian: 50 rows before filter
Optimized query tree (push selections down):
Ο_{name}
β
β_{sid=sid}
/ \
Ο_{dept='CS'} Ο_{cid='CS301'}
| |
STUDENT ENROLLMENT
(β3 rows) (β2 rows)
Join: 6 combinations, ~2 matches
The optimized tree:
1. Filters STUDENT to 3 CS students FIRST
2. Filters ENROLLMENT to 2 CS301 enrollments FIRST
3. Joins only 3 Γ 2 = 6 combinations instead of 5 Γ 10 = 50
4. Significant reduction in intermediate result size
Heuristic Optimization Rules¶
1. Push selections down as far as possible
(Reduce tuple count early)
2. Push projections down as far as possible
(Reduce attribute count early, but keep join attributes)
3. Choose the most restrictive selection first
(A selection that eliminates the most tuples)
4. Avoid Cartesian products
(Always prefer joins over Cartesian + selection)
5. Choose join order to minimize intermediate result size
(This is the hardest optimization problem β often NP-hard)
8. Relational Calculus (Brief Introduction)¶
While relational algebra is procedural (specifies how to compute the result), relational calculus is declarative (specifies what the result should be, not how to compute it).
Tuple Relational Calculus (TRC)¶
In TRC, queries are expressed using tuple variables that range over relations.
General form:
{ t | P(t) }
"The set of all tuples t such that predicate P(t) is true"
Example: "Find all CS students"
{ t | t β STUDENT β§ t.dept = 'CS' }
"The set of tuples t from STUDENT where dept is CS"
Example: "Find names and departments of students in year 3 or 4"
{ t.name, t.dept | t β STUDENT β§ (t.year = 3 β¨ t.year = 4) }
Example: "Find students enrolled in CS301"
{ t | t β STUDENT β§ βe β ENROLLMENT(e.sid = t.sid β§ e.cid = 'CS301') }
"Tuples t from STUDENT such that there EXISTS an enrollment e
with the same sid and cid = CS301"
Domain Relational Calculus (DRC)¶
In DRC, variables range over individual values (domains) rather than tuples.
General form:
{ <x1, x2, ..., xn> | P(x1, x2, ..., xn) }
Example: "Find names of CS students"
{ <n> | βs, y, d (STUDENT(s, n, y, d) β§ d = 'CS') }
"The set of name values n such that there exist values s, y, d
where (s, n, y, d) is a tuple in STUDENT and d is CS"
Equivalence of Algebra and Calculus¶
Codd's Theorem (1972): Relational algebra and (safe) relational calculus have the same expressive power. Any query expressible in one can be expressed in the other.
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β β
β Relational Algebra β‘ Safe Tuple Relational Calc. β
β β‘ Safe Domain Relational Calc. β
β β‘ SQL (core) β
β β
β A query language is "relationally complete" if it β
β can express everything that relational algebra can. β
β β
β SQL is relationally complete (and more β it has β
β aggregation, ordering, recursion, etc.) β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Safety of Expressions¶
A relational calculus expression is safe if it guarantees a finite result. Unsafe expressions can produce infinite results:
Unsafe: { t | Β¬(t β STUDENT) }
"All tuples NOT in STUDENT" β this is infinite!
Safe: { t | t β STUDENT β§ Β¬(βe β ENROLLMENT(e.sid = t.sid)) }
"Students NOT enrolled in any course" β finite result
9. Equivalence with SQL¶
Every relational algebra expression has an SQL equivalent. Understanding this correspondence helps in writing and optimizing SQL queries.
Operation-by-Operation Mapping¶
ββββββββββββββββββββββ¬βββββββββββββββββββββββββββββββββββββββββββ
β Relational Algebra β SQL Equivalent β
ββββββββββββββββββββββΌβββββββββββββββββββββββββββββββββββββββββββ€
β Ο_{c}(R) β SELECT * FROM R WHERE c β
β Ο_{A,B}(R) β SELECT DISTINCT A, B FROM R β
β Ο_{S}(R) β R AS S (in FROM clause) β
β R βͺ S β SELECT * FROM R UNION SELECT * FROM S β
β R β© S β SELECT * FROM R INTERSECT SELECT * FROM Sβ
β R β S β SELECT * FROM R EXCEPT SELECT * FROM S β
β R Γ S β SELECT * FROM R, S (or CROSS JOIN) β
β R β_{c} S β SELECT * FROM R JOIN S ON c β
β R β S β SELECT * FROM R NATURAL JOIN S β
β R β S β SELECT * FROM R LEFT OUTER JOIN S ON ... β
β R Γ· S β (requires NOT EXISTS + correlated sub) β
β Ξ³_{G;F(A)}(R) β SELECT G, F(A) FROM R GROUP BY G β
β Ο_{A}(R) β SELECT * FROM R ORDER BY A β
ββββββββββββββββββββββ΄βββββββββββββββββββββββββββββββββββββββββββ
Detailed SQL Equivalences¶
Selection:
Ο_{dept='CS' AND year>=3}(STUDENT)
SELECT *
FROM student
WHERE dept = 'CS' AND year >= 3;
Projection:
Ο_{name, dept}(STUDENT)
SELECT DISTINCT name, dept
FROM student;
Note: SQL does NOT eliminate duplicates by default.
You must use DISTINCT to match relational algebra's set semantics.
Natural Join:
STUDENT β ENROLLMENT
-- Method 1: NATURAL JOIN (matches on all common columns)
SELECT * FROM student NATURAL JOIN enrollment;
-- Method 2: Explicit JOIN (safer β you control which columns match)
SELECT s.sid, s.name, s.year, s.dept, e.cid, e.grade
FROM student s
JOIN enrollment e ON s.sid = e.sid;
Division (the hardest to express in SQL):
R Γ· S = "Find tuples in R associated with ALL tuples in S"
-- "Find students enrolled in ALL CS courses"
SELECT DISTINCT e.sid
FROM enrollment e
WHERE NOT EXISTS (
SELECT c.cid
FROM course c
WHERE c.dept = 'CS'
AND NOT EXISTS (
SELECT 1
FROM enrollment e2
WHERE e2.sid = e.sid
AND e2.cid = c.cid
)
);
-- Alternative using COUNT:
SELECT e.sid
FROM enrollment e
JOIN course c ON e.cid = c.cid
WHERE c.dept = 'CS'
GROUP BY e.sid
HAVING COUNT(DISTINCT e.cid) = (
SELECT COUNT(*) FROM course WHERE dept = 'CS'
);
Composition Example:
"Find names of students who received an A in any course"
Relational Algebra:
Ο_{name}(Ο_{grade='A'}(STUDENT β ENROLLMENT))
SQL:
SELECT DISTINCT s.name
FROM student s
JOIN enrollment e ON s.sid = e.sid
WHERE e.grade = 'A';
10. Complete Worked Examples¶
Example 1: Multi-Step Query¶
Query: "Find the names and departments of students who are enrolled in a course taught by the CS department but are not CS majors themselves."
Relational Algebra:
CS_COURSES β Ο_{dept='CS'}(COURSE)
CS_ENROLLED β ENROLLMENT β_{cid} CS_COURSES
NON_CS_STUDENTS β Ο_{deptβ 'CS'}(STUDENT)
RESULT β Ο_{name, dept}(NON_CS_STUDENTS β_{sid} CS_ENROLLED)
Step-by-step:
1. CS_COURSES = Ο_{dept='CS'}(COURSE)
β {CS101, CS301, CS401}
2. CS_ENROLLED = Ο_{sid}(ENROLLMENT β CS_COURSES)
β {S001, S002, S003, S004, S005} (all who took CS101/CS301/CS401)
3. NON_CS_STUDENTS = Ο_{deptβ 'CS'}(STUDENT)
β {S003/Carol/EE, S004/Dave/ME}
4. RESULT = Ο_{name, dept}(NON_CS_STUDENTS β CS_ENROLLED)
β {(Carol, EE), (Dave, ME)}
SQL:
SELECT DISTINCT s.name, s.dept
FROM student s
JOIN enrollment e ON s.sid = e.sid
JOIN course c ON e.cid = c.cid
WHERE c.dept = 'CS' AND s.dept <> 'CS';
Example 2: Division Query¶
Query: "Find students who have taken ALL courses that student S001 has taken."
Relational Algebra:
S001_COURSES β Ο_{cid}(Ο_{sid='S001'}(ENROLLMENT))
ALL_ENROLLMENTS β Ο_{sid, cid}(ENROLLMENT)
RESULT β ALL_ENROLLMENTS Γ· S001_COURSES
Step-by-step:
1. S001_COURSES = {CS101, CS301, MA101}
2. Check each student:
S001: {CS101, CS301, MA101} β {CS101, CS301, MA101} β β
S002: {CS101, CS301} β {CS101, CS301, MA101} β β (missing MA101)
S003: {EE201, CS101} β {CS101, CS301, MA101} β β
S004: {CS101} β {CS101, CS301, MA101} β β
S005: {CS101, MA101} β {CS101, CS301, MA101} β β (missing CS301)
3. RESULT = {S001}
SQL:
SELECT e.sid
FROM enrollment e
WHERE e.cid IN (
SELECT cid FROM enrollment WHERE sid = 'S001'
)
GROUP BY e.sid
HAVING COUNT(DISTINCT e.cid) = (
SELECT COUNT(DISTINCT cid) FROM enrollment WHERE sid = 'S001'
);
Example 3: Outer Join¶
Query: "List all students with their enrollment counts, including students with no enrollments."
Relational Algebra:
JOINED β STUDENT β ENROLLMENT (left outer join on sid)
RESULT β Ξ³_{sid, name; COUNT(cid) AS num_courses}(JOINED)
If Dave (S004) had no enrollments:
SQL:
SELECT s.sid, s.name, COUNT(e.cid) AS num_courses
FROM student s
LEFT OUTER JOIN enrollment e ON s.sid = e.sid
GROUP BY s.sid, s.name
ORDER BY num_courses DESC;
Result:
ββββββββ¬ββββββββ¬ββββββββββββββ
β sid β name β num_courses β
ββββββββΌββββββββΌββββββββββββββ€
β S001 β Alice β 3 β
β S002 β Bob β 2 β
β S003 β Carol β 2 β
β S005 β Eve β 2 β
β S004 β Dave β 0 β β preserved by LEFT OUTER JOIN
ββββββββ΄ββββββββ΄ββββββββββββββ
11. Exercises¶
Basic Operations¶
Exercise 3.1: Using the sample database, write relational algebra expressions for:
- (a) All students in year 2 or year 3
- (b) Course titles with 4 or more credits
- (c) Student IDs enrolled in EE201
- (d) Names of students NOT in the CS department
Exercise 3.2: Evaluate the following expressions step by step, showing intermediate results:
- (a)
Ο_{name}(Ο_{year > 2}(STUDENT)) - (b)
Ο_{sid}(Ο_{grade='A'}(ENROLLMENT)) β© Ο_{sid}(Ο_{dept='CS'}(STUDENT)) - (c)
STUDENT β (Ο_{cid='CS301'}(ENROLLMENT))
Join Operations¶
Exercise 3.3: For each of the following, write the relational algebra expression AND the equivalent SQL:
- (a) Find names of students enrolled in "Database Theory"
- (b) Find course titles that have at least one student with grade A
- (c) Find students who are enrolled in courses outside their department
- (d) Find pairs of students who share at least one course
Exercise 3.4: Explain the difference between the results of these three queries on our sample data:
Q1: STUDENT β ENROLLMENT
Q2: STUDENT β ENROLLMENT
Q3: STUDENT Γ ENROLLMENT
How many tuples does each produce? Which tuples appear in Q2 but not Q1?
Division¶
Exercise 3.5: Write relational algebra expressions using division for:
- (a) Students enrolled in every 3-credit course
- (b) Students who have taken ALL courses that Bob (S002) has taken
Show the step-by-step evaluation.
Exercise 3.6: Express each division query from Exercise 3.5 in SQL using: - (i) Double NOT EXISTS - (ii) GROUP BY with HAVING COUNT
Optimization¶
Exercise 3.7: Given the query:
Ο_{name}(Ο_{credits=3 AND grade='A'}(STUDENT β ENROLLMENT β COURSE))
- (a) Draw the initial (unoptimized) query tree
- (b) Apply algebraic optimization rules to produce an optimized query tree
- (c) Explain which rules you applied and why the optimized tree is better
- (d) Estimate the reduction in intermediate result sizes
Exercise 3.8: Prove or disprove the following equivalences:
- (a)
Ο_{c1}(R βͺ S) = Ο_{c1}(R) βͺ Ο_{c1}(S) - (b)
Ο_{c1}(R β S) = Ο_{c1}(R) β Ο_{c1}(S) - (c)
Ο_A(R βͺ S) = Ο_A(R) βͺ Ο_A(S) - (d)
Ο_{c1}(R Γ S) = Ο_{c1}(R) Γ S(where c1 involves only R's attributes)
Relational Calculus¶
Exercise 3.9: Express the following in Tuple Relational Calculus (TRC):
- (a) Names of students in the CS department
- (b) Students enrolled in at least two courses
- (c) Courses with no enrolled students
- (d) Students enrolled in ALL courses offered by the CS department
Exercise 3.10: Determine which of the following TRC expressions are safe. If unsafe, explain why and provide a safe equivalent:
- (a)
{ t | Β¬(t β STUDENT) } - (b)
{ t.name | t β STUDENT β§ t.gpa > 3.5 } - (c)
{ <x, y> | βt β STUDENT(t.sid = x β§ t.name = y) } - (d)
{ t | t.salary > 100000 }
Challenge Problems¶
Exercise 3.11: A relation R(A, B, C) contains the following tuples:
{(a1, b1, c1), (a1, b2, c1), (a1, b2, c2), (a2, b1, c1), (a2, b1, c2)}
A relation S(B, C) contains: {(b1, c1), (b1, c2)}
Compute R Γ· S step by step. Verify your answer by checking that each resulting tuple is associated with ALL tuples in S.
Exercise 3.12: Write a single relational algebra expression (no assignment) for:
"Find the department with the highest average instructor salary."
Hint: This requires aggregation and a comparison pattern.
Previous: Relational Model | Next: ER Modeling