Relational Algebra

Relational Algebra

Previous: Relational Model | Next: ER Modeling


Relational algebra is the formal query language of the relational model. It provides a collection of operators that take one or two relations as input and produce a new relation as output. Understanding relational algebra is essential because it underlies SQL query processing: every SQL query is internally translated into a relational algebra expression that the query optimizer can then transform and improve.

Table of Contents

  1. Overview of Relational Algebra
  2. Unary Operations
  3. Set Operations
  4. Binary Operations: Cartesian Product and Joins
  5. Division
  6. Additional Operations
  7. Query Trees and Algebraic Optimization
  8. Relational Calculus (Brief Introduction)
  9. Equivalence with SQL
  10. Complete Worked Examples
  11. Exercises

1. Overview of Relational Algebra

What Is Relational Algebra?

Relational algebra is a procedural query language: it describes a sequence of operations to compute the desired result. Each operation takes one or more relations and produces a new relation.

Properties:
  - Closure: The result of every operation is a relation
             (enabling composition of operations)
  - Set semantics: Relations are sets (no duplicate tuples)
  - Foundation: Based on set theory and first-order logic

Categories of Operations

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚              Relational Algebra Operations                    β”‚
β”‚                                                             β”‚
β”‚  Fundamental (cannot be expressed using other operations):   β”‚
β”‚    Οƒ  Selection          (filter rows)                      β”‚
β”‚    Ο€  Projection         (choose columns)                   β”‚
β”‚    ρ  Rename             (rename relation/attributes)       β”‚
β”‚    βˆͺ  Union              (combine two relations)            β”‚
β”‚    βˆ’  Set Difference     (tuples in R but not in S)         β”‚
β”‚    Γ—  Cartesian Product  (all combinations)                 β”‚
β”‚                                                             β”‚
β”‚  Derived (can be expressed using fundamental operations):    β”‚
β”‚    β‹ˆ  Join (various types)                                  β”‚
β”‚    ∩  Intersection                                          β”‚
β”‚    Γ·  Division                                              β”‚
β”‚    ←  Assignment                                            β”‚
β”‚    Ξ΄  Duplicate elimination (for bag semantics)             β”‚
β”‚    Ξ³  Grouping/Aggregation                                  β”‚
β”‚    Ο„  Sorting                                               β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Running Example Database

Throughout this lesson, we use the following sample database:

STUDENT:
β”Œβ”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”
β”‚ sid  β”‚ name      β”‚ year β”‚ dept β”‚
β”œβ”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€
β”‚ S001 β”‚ Alice     β”‚  3   β”‚ CS   β”‚
β”‚ S002 β”‚ Bob       β”‚  2   β”‚ CS   β”‚
β”‚ S003 β”‚ Carol     β”‚  4   β”‚ EE   β”‚
β”‚ S004 β”‚ Dave      β”‚  3   β”‚ ME   β”‚
β”‚ S005 β”‚ Eve       β”‚  1   β”‚ CS   β”‚
β””β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”˜

COURSE:
β”Œβ”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”
β”‚ cid   β”‚ title                β”‚ credits β”‚ dept β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€
β”‚ CS101 β”‚ Intro to CS          β”‚ 3       β”‚ CS   β”‚
β”‚ CS301 β”‚ Database Theory      β”‚ 3       β”‚ CS   β”‚
β”‚ CS401 β”‚ Machine Learning     β”‚ 4       β”‚ CS   β”‚
β”‚ EE201 β”‚ Circuit Analysis     β”‚ 3       β”‚ EE   β”‚
β”‚ MA101 β”‚ Calculus I           β”‚ 4       β”‚ MA   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”˜

ENROLLMENT:
β”Œβ”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”
β”‚ sid  β”‚ cid   β”‚ grade β”‚
β”œβ”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€
β”‚ S001 β”‚ CS101 β”‚ A     β”‚
β”‚ S001 β”‚ CS301 β”‚ A+    β”‚
β”‚ S001 β”‚ MA101 β”‚ B+    β”‚
β”‚ S002 β”‚ CS101 β”‚ B     β”‚
β”‚ S002 β”‚ CS301 β”‚ A-    β”‚
β”‚ S003 β”‚ EE201 β”‚ A     β”‚
β”‚ S003 β”‚ CS101 β”‚ B+    β”‚
β”‚ S004 β”‚ CS101 β”‚ C     β”‚
β”‚ S005 β”‚ CS101 β”‚ A     β”‚
β”‚ S005 β”‚ MA101 β”‚ A-    β”‚
β””β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”˜

INSTRUCTOR:
β”Œβ”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ iid  β”‚ name     β”‚ dept β”‚ salary β”‚
β”œβ”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ I001 β”‚ Prof. Kimβ”‚ CS   β”‚ 95000  β”‚
β”‚ I002 β”‚ Prof. Leeβ”‚ CS   β”‚ 88000  β”‚
β”‚ I003 β”‚ Prof. Parkβ”‚ EE  β”‚ 92000  β”‚
β”‚ I004 β”‚ Prof. Choiβ”‚ MA  β”‚ 85000  β”‚
β””β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”˜

2. Unary Operations

Unary operations take a single relation as input.

Selection (sigma)

The selection operation filters rows based on a condition (predicate).

Notation:   Οƒ_condition(R)

Output:     A relation containing only the tuples from R
            that satisfy the condition.

Schema:     Same as R (all attributes preserved)

Formal definition:

Οƒ_condition(R) = { t | t ∈ R  AND  condition(t) is TRUE }

Examples:

1. Select CS students:

   Οƒ_{dept='CS'}(STUDENT)

   Result:
   β”Œβ”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”
   β”‚ sid  β”‚ name  β”‚ year β”‚ dept β”‚
   β”œβ”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€
   β”‚ S001 β”‚ Alice β”‚  3   β”‚ CS   β”‚
   β”‚ S002 β”‚ Bob   β”‚  2   β”‚ CS   β”‚
   β”‚ S005 β”‚ Eve   β”‚  1   β”‚ CS   β”‚
   β””β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”˜


2. Select 3rd-year CS students:

   Οƒ_{dept='CS' AND year=3}(STUDENT)

   Result:
   β”Œβ”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”
   β”‚ sid  β”‚ name  β”‚ year β”‚ dept β”‚
   β”œβ”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€
   β”‚ S001 β”‚ Alice β”‚  3   β”‚ CS   β”‚
   β””β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”˜


3. Select courses with 4 credits:

   Οƒ_{credits=4}(COURSE)

   Result:
   β”Œβ”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”
   β”‚ cid   β”‚ title            β”‚ credits β”‚ dept β”‚
   β”œβ”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€
   β”‚ CS401 β”‚ Machine Learning β”‚ 4       β”‚ CS   β”‚
   β”‚ MA101 β”‚ Calculus I       β”‚ 4       β”‚ MA   β”‚
   β””β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”˜

Selection conditions can use: - Comparison operators: =, β‰ , <, >, ≀, β‰₯ - Logical connectives: AND (∧), OR (∨), NOT (Β¬) - Attribute names and constants

Projection (pi)

The projection operation selects specific columns (attributes) and removes duplicates.

Notation:   Ο€_{attr_list}(R)

Output:     A relation containing only the specified attributes,
            with duplicate tuples removed.

Schema:     Only the attributes in attr_list

Formal definition:

Ο€_{A1, A2, ..., Ak}(R) = { <t[A1], t[A2], ..., t[Ak]> | t ∈ R }

Examples:

1. Project student names and departments:

   Ο€_{name, dept}(STUDENT)

   Result:
   β”Œβ”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”
   β”‚ name  β”‚ dept β”‚
   β”œβ”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€
   β”‚ Alice β”‚ CS   β”‚
   β”‚ Bob   β”‚ CS   β”‚
   β”‚ Carol β”‚ EE   β”‚
   β”‚ Dave  β”‚ ME   β”‚
   β”‚ Eve   β”‚ CS   β”‚
   β””β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”˜


2. Project distinct departments from STUDENT:

   Ο€_{dept}(STUDENT)

   Result:
   β”Œβ”€β”€β”€β”€β”€β”€β”
   β”‚ dept β”‚
   β”œβ”€β”€β”€β”€β”€β”€β”€
   β”‚ CS   β”‚
   β”‚ EE   β”‚
   β”‚ ME   β”‚
   β””β”€β”€β”€β”€β”€β”€β”˜
   (Note: CS appears only ONCE because duplicates are removed)


3. Compose selection and projection:

   "Find names of CS students"
   Ο€_{name}(Οƒ_{dept='CS'}(STUDENT))

   Step 1: Οƒ_{dept='CS'}(STUDENT) β†’ {Alice/CS, Bob/CS, Eve/CS}
   Step 2: Ο€_{name}(...) β†’ {Alice, Bob, Eve}

   Result:
   β”Œβ”€β”€β”€β”€β”€β”€β”€β”
   β”‚ name  β”‚
   β”œβ”€β”€β”€β”€β”€β”€β”€β”€
   β”‚ Alice β”‚
   β”‚ Bob   β”‚
   β”‚ Eve   β”‚
   β””β”€β”€β”€β”€β”€β”€β”€β”˜

Rename (rho)

The rename operation changes the name of a relation and/or its attributes.

Notation:   ρ_{S(B1, B2, ..., Bn)}(R)

            Renames relation R to S and attributes to B1, B2, ..., Bn

Shorthand:  ρ_S(R)          -- rename relation only
            ρ_{(B1,...,Bn)}(R) -- rename attributes only

Examples:

1. Rename STUDENT to S:

   ρ_S(STUDENT)
   β†’ Same tuples, but relation is now called S

2. Rename for self-join preparation:

   ρ_{S1(sid1, name1, year1, dept1)}(STUDENT)
   ρ_{S2(sid2, name2, year2, dept2)}(STUDENT)

   Now we can join S1 and S2 without attribute name conflicts.

3. Practical use - find pairs of students in the same department:

   Οƒ_{S1.dept = S2.dept AND S1.sid < S2.sid}(
       ρ_{S1(sid1, name1, year1, dept1)}(STUDENT) Γ—
       ρ_{S2(sid2, name2, year2, dept2)}(STUDENT)
   )

3. Set Operations

Set operations require union-compatible (or type-compatible) relations: they must have the same number of attributes, and corresponding attributes must have compatible domains.

Union compatibility:
  R(A1: D1, A2: D2, ..., An: Dn)
  S(B1: D1, B2: D2, ..., Bn: Dn)

  Same number of attributes (n) and compatible domains.
  Attribute names may differ (result uses R's names by convention).

Union

Notation:   R βˆͺ S

Output:     All tuples that are in R, in S, or in both.
            Duplicates are eliminated.

R βˆͺ S = { t | t ∈ R  OR  t ∈ S }

Example:

CS_STUDENTS = Ο€_{sid, name}(Οƒ_{dept='CS'}(STUDENT))
β”Œβ”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”
β”‚ sid  β”‚ name  β”‚
β”œβ”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€
β”‚ S001 β”‚ Alice β”‚
β”‚ S002 β”‚ Bob   β”‚
β”‚ S005 β”‚ Eve   β”‚
β””β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”˜

YEAR3_STUDENTS = Ο€_{sid, name}(Οƒ_{year=3}(STUDENT))
β”Œβ”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”
β”‚ sid  β”‚ name  β”‚
β”œβ”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€
β”‚ S001 β”‚ Alice β”‚
β”‚ S004 β”‚ Dave  β”‚
β””β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”˜

CS_STUDENTS βˆͺ YEAR3_STUDENTS =
β”Œβ”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”
β”‚ sid  β”‚ name  β”‚
β”œβ”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€
β”‚ S001 β”‚ Alice β”‚  ← appears in both, but listed only once
β”‚ S002 β”‚ Bob   β”‚
β”‚ S005 β”‚ Eve   β”‚
β”‚ S004 β”‚ Dave  β”‚
β””β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”˜

Set Difference

Notation:   R βˆ’ S  (or R \ S)

Output:     Tuples in R that are NOT in S.

R βˆ’ S = { t | t ∈ R  AND  t βˆ‰ S }

Example:

"CS students who are NOT in year 3"

CS_STUDENTS βˆ’ YEAR3_STUDENTS =
β”Œβ”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”
β”‚ sid  β”‚ name β”‚
β”œβ”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€
β”‚ S002 β”‚ Bob  β”‚
β”‚ S005 β”‚ Eve  β”‚
β””β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”˜

Note: Alice (S001) is removed because she is in YEAR3_STUDENTS.
Note: R βˆ’ S β‰  S βˆ’ R  (set difference is NOT commutative)

YEAR3_STUDENTS βˆ’ CS_STUDENTS =
β”Œβ”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”
β”‚ sid  β”‚ name β”‚
β”œβ”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€
β”‚ S004 β”‚ Dave β”‚
β””β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”˜

Intersection

Notation:   R ∩ S

Output:     Tuples that are in BOTH R and S.

R ∩ S = { t | t ∈ R  AND  t ∈ S }

Note: R ∩ S = R βˆ’ (R βˆ’ S)  (intersection is derivable)

Example:

"Students who are in CS AND in year 3"

CS_STUDENTS ∩ YEAR3_STUDENTS =
β”Œβ”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”
β”‚ sid  β”‚ name  β”‚
β”œβ”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€
β”‚ S001 β”‚ Alice β”‚
β””β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”˜

Set Operations Summary

     R           S                R βˆͺ S         R βˆ’ S         R ∩ S

  β”Œβ”€β”€β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”€β”€β”         β”Œβ”€β”€β”€β”€β”€β”€β”      β”Œβ”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”
  β”‚ a    β”‚   β”‚ a    β”‚         β”‚ a    β”‚      β”‚ b    β”‚     β”‚ a    β”‚
  β”‚ b    β”‚   β”‚ a    β”‚         β”‚ b    β”‚      β”‚ c    β”‚     β”‚      β”‚
  β”‚ c    β”‚   β”‚ d    β”‚         β”‚ c    β”‚      β””β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”˜
  β””β”€β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”€β”€β”˜         β”‚ d    β”‚
                              β””β”€β”€β”€β”€β”€β”€β”˜

  Venn diagram:
       β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
       β”‚  R          β”‚
       β”‚      β”Œβ”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”
       β”‚      β”‚ R∩S  β”‚      β”‚
       β”‚      β”‚      β”‚   S  β”‚
       β””β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”˜      β”‚
              β”‚             β”‚
              β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

  R βˆͺ S = entire shaded area
  R βˆ’ S = left only (not overlap)
  R ∩ S = overlap only
  S βˆ’ R = right only (not overlap)

4. Binary Operations: Cartesian Product and Joins

Cartesian Product (Cross Product)

Notation:   R Γ— S

Output:     Every tuple in R combined with every tuple in S.
            If R has n tuples and S has m tuples, R Γ— S has n Γ— m tuples.
            If R has p attributes and S has q attributes, R Γ— S has p + q attributes.

Formal definition:

R Γ— S = { <t_r, t_s> | t_r ∈ R  AND  t_s ∈ S }

Example:

A = {(a1, b1), (a2, b2)}     B = {(c1, d1), (c2, d2), (c3, d3)}

A Γ— B =
β”Œβ”€β”€β”€β”€β”¬β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”
β”‚ A  β”‚ B  β”‚ C  β”‚ D  β”‚
β”œβ”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€
β”‚ a1 β”‚ b1 β”‚ c1 β”‚ d1 β”‚
β”‚ a1 β”‚ b1 β”‚ c2 β”‚ d2 β”‚
β”‚ a1 β”‚ b1 β”‚ c3 β”‚ d3 β”‚
β”‚ a2 β”‚ b2 β”‚ c1 β”‚ d1 β”‚
β”‚ a2 β”‚ b2 β”‚ c2 β”‚ d2 β”‚
β”‚ a2 β”‚ b2 β”‚ c3 β”‚ d3 β”‚
β””β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”˜

|A| = 2, |B| = 3, |A Γ— B| = 2 Γ— 3 = 6

The Cartesian product by itself is rarely useful because it produces many meaningless combinations. Its power comes when combined with selection (which gives us joins).

Theta Join

A theta join combines Cartesian product with selection:

Notation:   R β‹ˆ_ΞΈ S   (where ΞΈ is a condition)

Definition: R β‹ˆ_ΞΈ S = Οƒ_ΞΈ(R Γ— S)

The condition ΞΈ can use any comparison: =, β‰ , <, >, ≀, β‰₯

Example:

"Find students and courses in the same department"

STUDENT β‹ˆ_{STUDENT.dept = COURSE.dept} COURSE

Equivalent to: Οƒ_{STUDENT.dept = COURSE.dept}(STUDENT Γ— COURSE)

Result (partial):
β”Œβ”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”
β”‚ sid  β”‚ name  β”‚ year β”‚ s.deptβ”‚ cid  β”‚ title            β”‚ credits β”‚c.dept β”‚
β”œβ”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€
β”‚ S001 β”‚ Alice β”‚  3   β”‚ CS   β”‚ CS101 β”‚ Intro to CS      β”‚ 3       β”‚ CS    β”‚
β”‚ S001 β”‚ Alice β”‚  3   β”‚ CS   β”‚ CS301 β”‚ Database Theory  β”‚ 3       β”‚ CS    β”‚
β”‚ S001 β”‚ Alice β”‚  3   β”‚ CS   β”‚ CS401 β”‚ Machine Learning β”‚ 4       β”‚ CS    β”‚
β”‚ S002 β”‚ Bob   β”‚  2   β”‚ CS   β”‚ CS101 β”‚ Intro to CS      β”‚ 3       β”‚ CS    β”‚
β”‚ S002 β”‚ Bob   β”‚  2   β”‚ CS   β”‚ CS301 β”‚ Database Theory  β”‚ 3       β”‚ CS    β”‚
β”‚ S002 β”‚ Bob   β”‚  2   β”‚ CS   β”‚ CS401 β”‚ Machine Learning β”‚ 4       β”‚ CS    β”‚
β”‚ S003 β”‚ Carol β”‚  4   β”‚ EE   β”‚ EE201 β”‚ Circuit Analysis β”‚ 3       β”‚ EE    β”‚
β”‚ S005 β”‚ Eve   β”‚  1   β”‚ CS   β”‚ CS101 β”‚ Intro to CS      β”‚ 3       β”‚ CS    β”‚
β”‚ S005 β”‚ Eve   β”‚  1   β”‚ CS   β”‚ CS301 β”‚ Database Theory  β”‚ 3       β”‚ CS    β”‚
β”‚ S005 β”‚ Eve   β”‚  1   β”‚ CS   β”‚ CS401 β”‚ Machine Learning β”‚ 4       β”‚ CS    β”‚
β””β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”˜
(Dave/ME has no matching course, so not in result)

Equi-Join

An equi-join is a theta join where the condition uses only equality (=):

R β‹ˆ_{R.A = S.B} S

All equi-joins are theta joins, but not all theta joins are equi-joins.
A theta join with R.A > S.B is NOT an equi-join.

Natural Join

A natural join is a special equi-join that: 1. Joins on ALL common attribute names 2. Removes duplicate columns from the result

Notation:   R β‹ˆ S   (no subscript)

Definition: Join on all attributes with the same name in R and S,
            then project out the duplicate attribute columns.

Example:

STUDENT β‹ˆ ENROLLMENT

Common attribute: sid

Step 1: Equi-join on STUDENT.sid = ENROLLMENT.sid
Step 2: Remove duplicate sid column

Result:
β”Œβ”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”
β”‚ sid  β”‚ name  β”‚ year β”‚ dept β”‚ cid   β”‚ grade β”‚
β”œβ”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€
β”‚ S001 β”‚ Alice β”‚  3   β”‚ CS   β”‚ CS101 β”‚ A     β”‚
β”‚ S001 β”‚ Alice β”‚  3   β”‚ CS   β”‚ CS301 β”‚ A+    β”‚
β”‚ S001 β”‚ Alice β”‚  3   β”‚ CS   β”‚ MA101 β”‚ B+    β”‚
β”‚ S002 β”‚ Bob   β”‚  2   β”‚ CS   β”‚ CS101 β”‚ B     β”‚
β”‚ S002 β”‚ Bob   β”‚  2   β”‚ CS   β”‚ CS301 β”‚ A-    β”‚
β”‚ S003 β”‚ Carol β”‚  4   β”‚ EE   β”‚ EE201 β”‚ A     β”‚
β”‚ S003 β”‚ Carol β”‚  4   β”‚ EE   β”‚ CS101 β”‚ B+    β”‚
β”‚ S004 β”‚ Dave  β”‚  3   β”‚ ME   β”‚ CS101 β”‚ C     β”‚
β”‚ S005 β”‚ Eve   β”‚  1   β”‚ CS   β”‚ CS101 β”‚ A     β”‚
β”‚ S005 β”‚ Eve   β”‚  1   β”‚ CS   β”‚ MA101 β”‚ A-    β”‚
β””β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”˜

WARNING: Be careful with natural join when relations share
attribute names unintentionally!

STUDENT(sid, name, year, dept) β‹ˆ COURSE(cid, title, credits, dept)
                          ^^^^                                ^^^^
This joins on 'dept', which may not be what you want!
It matches students with courses in THEIR department.

Outer Joins

Standard joins discard tuples that do not match. Outer joins preserve unmatched tuples by padding them with NULLs.

Three types:

1. LEFT OUTER JOIN (βŸ•):
   Keep all tuples from the LEFT relation.
   Pad with NULLs if no match on the right.

2. RIGHT OUTER JOIN (βŸ–):
   Keep all tuples from the RIGHT relation.
   Pad with NULLs if no match on the left.

3. FULL OUTER JOIN (βŸ—):
   Keep all tuples from BOTH relations.
   Pad with NULLs on either side as needed.

Example: Left Outer Join

STUDENT βŸ•_{STUDENT.sid = ENROLLMENT.sid} ENROLLMENT

"Find all students and their enrollments, INCLUDING students
 with no enrollments"

(Suppose S004/Dave had no enrollments in our data.)

If Dave had NO enrollments:

Result:
β”Œβ”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”
β”‚ sid  β”‚ name  β”‚ year β”‚ dept β”‚ cid   β”‚ grade β”‚
β”œβ”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€
β”‚ S001 β”‚ Alice β”‚  3   β”‚ CS   β”‚ CS101 β”‚ A     β”‚
β”‚ S001 β”‚ Alice β”‚  3   β”‚ CS   β”‚ CS301 β”‚ A+    β”‚
β”‚ S001 β”‚ Alice β”‚  3   β”‚ CS   β”‚ MA101 β”‚ B+    β”‚
β”‚ S002 β”‚ Bob   β”‚  2   β”‚ CS   β”‚ CS101 β”‚ B     β”‚
β”‚ S002 β”‚ Bob   β”‚  2   β”‚ CS   β”‚ CS301 β”‚ A-    β”‚
β”‚ S003 β”‚ Carol β”‚  4   β”‚ EE   β”‚ EE201 β”‚ A     β”‚
β”‚ S003 β”‚ Carol β”‚  4   β”‚ EE   β”‚ CS101 β”‚ B+    β”‚
β”‚ S004 β”‚ Dave  β”‚  3   β”‚ ME   β”‚ NULL  β”‚ NULL  β”‚  ← preserved with NULLs
β”‚ S005 β”‚ Eve   β”‚  1   β”‚ CS   β”‚ CS101 β”‚ A     β”‚
β”‚ S005 β”‚ Eve   β”‚  1   β”‚ CS   β”‚ MA101 β”‚ A-    β”‚
β””β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”˜

Join Comparison Summary

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                        Join Types                                β”‚
β”‚                                                                  β”‚
β”‚  Cartesian Product  R Γ— S      All combinations (n Γ— m tuples)  β”‚
β”‚  Theta Join         R β‹ˆ_ΞΈ S    Cartesian + selection on ΞΈ       β”‚
β”‚  Equi-Join          R β‹ˆ_{=} S  Theta join with equality only    β”‚
β”‚  Natural Join       R β‹ˆ S      Equi-join on common attrs,       β”‚
β”‚                                 remove duplicates                β”‚
β”‚  Left Outer Join    R βŸ• S      Natural + keep unmatched R       β”‚
β”‚  Right Outer Join   R βŸ– S      Natural + keep unmatched S       β”‚
β”‚  Full Outer Join    R βŸ— S      Natural + keep all unmatched     β”‚
β”‚  Semi-Join          R ⋉ S      Tuples in R with a match in S   β”‚
β”‚  Anti-Join          R β–· S      Tuples in R with NO match in S  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Semi-Join and Anti-Join

Semi-Join:  R ⋉ S = Ο€_{attrs(R)}(R β‹ˆ S)

"Return tuples from R that have a matching tuple in S"
(Only R's attributes appear in the result)

Anti-Join:  R β–· S = R βˆ’ Ο€_{attrs(R)}(R β‹ˆ S)

"Return tuples from R that have NO matching tuple in S"

Example:

"Students who are enrolled in at least one course" (Semi-Join):
STUDENT ⋉ ENROLLMENT
β†’ All students from STUDENT who appear in ENROLLMENT

"Students who are NOT enrolled in any course" (Anti-Join):
STUDENT β–· ENROLLMENT
β†’ Students who have no matching enrollment

5. Division

The division operation answers "for all" queries. It is one of the most powerful and least intuitive operations in relational algebra.

Notation:   R Γ· S

Given:      R(A1, A2, ..., An, B1, B2, ..., Bm)
            S(B1, B2, ..., Bm)

Result:     Tuples t over {A1, ..., An} such that for EVERY tuple s in S,
            the tuple <t, s> is in R.

Formally:
  R Γ· S = { t | t ∈ Ο€_{A}(R) AND βˆ€s ∈ S : <t,s> ∈ R }

  where A = {A1, ..., An} (attributes of R not in S)

Division Expressed Using Fundamental Operations

R Γ· S = Ο€_A(R) βˆ’ Ο€_A( (Ο€_A(R) Γ— S) βˆ’ R )

Explanation:
  1. Ο€_A(R)              = all possible A-values in R
  2. Ο€_A(R) Γ— S          = every A-value paired with every S-tuple
  3. (Ο€_A(R) Γ— S) βˆ’ R    = combinations that are MISSING from R
  4. Ο€_A(...)             = A-values that are missing at least one S-tuple
  5. Ο€_A(R) βˆ’ ...         = A-values that are NOT missing any S-tuple
                          = A-values associated with ALL S-tuples

Division Example

"Find students enrolled in ALL CS courses"

Step 1: Define the dividend (student-course pairs)
  R = Ο€_{sid, cid}(ENROLLMENT)
  β”Œβ”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”
  β”‚ sid  β”‚ cid   β”‚
  β”œβ”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€
  β”‚ S001 β”‚ CS101 β”‚
  β”‚ S001 β”‚ CS301 β”‚
  β”‚ S001 β”‚ MA101 β”‚
  β”‚ S002 β”‚ CS101 β”‚
  β”‚ S002 β”‚ CS301 β”‚
  β”‚ S003 β”‚ EE201 β”‚
  β”‚ S003 β”‚ CS101 β”‚
  β”‚ S004 β”‚ CS101 β”‚
  β”‚ S005 β”‚ CS101 β”‚
  β”‚ S005 β”‚ MA101 β”‚
  β””β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”˜

Step 2: Define the divisor (all CS courses)
  S = Ο€_{cid}(Οƒ_{dept='CS'}(COURSE))
  β”Œβ”€β”€β”€β”€β”€β”€β”€β”
  β”‚ cid   β”‚
  β”œβ”€β”€β”€β”€β”€β”€β”€β”€
  β”‚ CS101 β”‚
  β”‚ CS301 β”‚
  β”‚ CS401 β”‚
  β””β”€β”€β”€β”€β”€β”€β”€β”˜

Step 3: R Γ· S = students associated with ALL of {CS101, CS301, CS401}

  Check each student:
    S001: has {CS101, CS301} but NOT CS401 β†’ βœ—
    S002: has {CS101, CS301} but NOT CS401 β†’ βœ—
    S003: has {CS101} only                 β†’ βœ—
    S004: has {CS101} only                 β†’ βœ—
    S005: has {CS101} only                 β†’ βœ—

  Result: EMPTY SET (no student is enrolled in ALL three CS courses)

If CS401 were not in the course list (only CS101 and CS301):
  S = {CS101, CS301}
  S001: has {CS101, CS301} β†’ βœ“
  S002: has {CS101, CS301} β†’ βœ“
  Others: βœ—

  Result:
  β”Œβ”€β”€β”€β”€β”€β”€β”
  β”‚ sid  β”‚
  β”œβ”€β”€β”€β”€β”€β”€β”€
  β”‚ S001 β”‚
  β”‚ S002 β”‚
  β””β”€β”€β”€β”€β”€β”€β”˜

6. Additional Operations

Aggregation and Grouping

The grouping/aggregation operator extends relational algebra to support aggregate functions.

Notation:   _{G1, G2, ..., Gn} G _{F1(A1), F2(A2), ..., Fk(Ak)} (R)

            or more commonly written as:

            Ξ³_{G; F1(A1) AS name1, ...}(R)

Where:
  G1, ..., Gn = grouping attributes
  F1, ..., Fk = aggregate functions (COUNT, SUM, AVG, MIN, MAX)
  A1, ..., Ak = attributes to aggregate

Example:

"Count students per department"

Ξ³_{dept; COUNT(sid) AS count}(STUDENT)

Result:
β”Œβ”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”
β”‚ dept β”‚ count β”‚
β”œβ”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€
β”‚ CS   β”‚ 3     β”‚
β”‚ EE   β”‚ 1     β”‚
β”‚ ME   β”‚ 1     β”‚
β””β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”˜


"Average salary per department (for instructors)"

Ξ³_{dept; AVG(salary) AS avg_sal, COUNT(*) AS num}(INSTRUCTOR)

Result:
β”Œβ”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”
β”‚ dept β”‚ avg_sal β”‚ num β”‚
β”œβ”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€
β”‚ CS   β”‚ 91500   β”‚ 2   β”‚
β”‚ EE   β”‚ 92000   β”‚ 1   β”‚
β”‚ MA   β”‚ 85000   β”‚ 1   β”‚
β””β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”˜

Assignment

The assignment operator stores intermediate results:

Notation:   temp ← expression

Example:
  CS_STUDENTS ← Οƒ_{dept='CS'}(STUDENT)
  CS_NAMES ← Ο€_{name}(CS_STUDENTS)

Sorting

Notation:   Ο„_{A1 ASC, A2 DESC}(R)

Note: Sorting produces a LIST, not a SET. Strictly speaking, it
      goes beyond pure relational algebra (which produces only sets).

7. Query Trees and Algebraic Optimization

Query Trees

A query tree (or operator tree) represents a relational algebra expression as a tree where: - Leaf nodes are base relations - Internal nodes are relational algebra operations - The root produces the final result

Example: "Find names of CS students enrolled in CS301"

Relational algebra:
  Ο€_{name}(Οƒ_{dept='CS' AND cid='CS301'}(STUDENT β‹ˆ ENROLLMENT))

Query tree:

              Ο€_{name}
                 β”‚
          Οƒ_{dept='CS' AND cid='CS301'}
                 β”‚
                β‹ˆ_{sid=sid}
               / \
          STUDENT  ENROLLMENT

Algebraic Optimization

The query optimizer transforms query trees using equivalence rules to find more efficient execution plans.

Key Equivalence Rules

Rule 1: Cascade of Selection

Οƒ_{c1 AND c2}(R) = Οƒ_{c1}(Οƒ_{c2}(R))

A conjunctive selection can be broken into a sequence of selections.

Rule 2: Commutativity of Selection

Οƒ_{c1}(Οƒ_{c2}(R)) = Οƒ_{c2}(Οƒ_{c1}(R))

The order of selections does not matter.

Rule 3: Cascade of Projection

Ο€_{L1}(Ο€_{L2}(...Ο€_{Ln}(R)...)) = Ο€_{L1}(R)

Only the outermost projection matters (if L1 βŠ† L2 βŠ† ... βŠ† Ln).

Rule 4: Commuting Selection with Projection

If the selection condition c only involves attributes in L:
  Ο€_L(Οƒ_c(R)) = Οƒ_c(Ο€_L(R))

Rule 5: Commutativity of Join

R β‹ˆ S = S β‹ˆ R
R Γ— S = S Γ— R

Rule 6: Associativity of Join

(R β‹ˆ S) β‹ˆ T = R β‹ˆ (S β‹ˆ T)

Rule 7: Pushing Selection Through Join

If condition c involves only attributes of R:
  Οƒ_c(R β‹ˆ S) = Οƒ_c(R) β‹ˆ S

This is the MOST IMPORTANT optimization rule!
It reduces the size of intermediate results.

Rule 8: Commutativity of Set Operations

R βˆͺ S = S βˆͺ R
R ∩ S = S ∩ R
(But R βˆ’ S β‰  S βˆ’ R)

Optimization Example

Original query tree (unoptimized):

              Ο€_{name}
                 β”‚
          Οƒ_{dept='CS' AND cid='CS301'}
                 β”‚
                β‹ˆ_{sid=sid}
               / \
          STUDENT  ENROLLMENT
          (5 rows)  (10 rows)
          Cartesian: 50 rows before filter

Optimized query tree (push selections down):

              Ο€_{name}
                 β”‚
                β‹ˆ_{sid=sid}
               / \
   Οƒ_{dept='CS'}  Οƒ_{cid='CS301'}
        |              |
     STUDENT       ENROLLMENT
     (β†’3 rows)     (β†’2 rows)
     Join: 6 combinations, ~2 matches

The optimized tree:
  1. Filters STUDENT to 3 CS students FIRST
  2. Filters ENROLLMENT to 2 CS301 enrollments FIRST
  3. Joins only 3 Γ— 2 = 6 combinations instead of 5 Γ— 10 = 50
  4. Significant reduction in intermediate result size

Heuristic Optimization Rules

1. Push selections down as far as possible
   (Reduce tuple count early)

2. Push projections down as far as possible
   (Reduce attribute count early, but keep join attributes)

3. Choose the most restrictive selection first
   (A selection that eliminates the most tuples)

4. Avoid Cartesian products
   (Always prefer joins over Cartesian + selection)

5. Choose join order to minimize intermediate result size
   (This is the hardest optimization problem β€” often NP-hard)

8. Relational Calculus (Brief Introduction)

While relational algebra is procedural (specifies how to compute the result), relational calculus is declarative (specifies what the result should be, not how to compute it).

Tuple Relational Calculus (TRC)

In TRC, queries are expressed using tuple variables that range over relations.

General form:
  { t | P(t) }

  "The set of all tuples t such that predicate P(t) is true"

Example: "Find all CS students"

  { t | t ∈ STUDENT ∧ t.dept = 'CS' }

  "The set of tuples t from STUDENT where dept is CS"

Example: "Find names and departments of students in year 3 or 4"

  { t.name, t.dept | t ∈ STUDENT ∧ (t.year = 3 ∨ t.year = 4) }

Example: "Find students enrolled in CS301"

  { t | t ∈ STUDENT ∧ βˆƒe ∈ ENROLLMENT(e.sid = t.sid ∧ e.cid = 'CS301') }

  "Tuples t from STUDENT such that there EXISTS an enrollment e
   with the same sid and cid = CS301"

Domain Relational Calculus (DRC)

In DRC, variables range over individual values (domains) rather than tuples.

General form:
  { <x1, x2, ..., xn> | P(x1, x2, ..., xn) }

Example: "Find names of CS students"

  { <n> | βˆƒs, y, d (STUDENT(s, n, y, d) ∧ d = 'CS') }

  "The set of name values n such that there exist values s, y, d
   where (s, n, y, d) is a tuple in STUDENT and d is CS"

Equivalence of Algebra and Calculus

Codd's Theorem (1972): Relational algebra and (safe) relational calculus have the same expressive power. Any query expressible in one can be expressed in the other.

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                                                      β”‚
β”‚  Relational Algebra  ≑  Safe Tuple Relational Calc.  β”‚
β”‚                      ≑  Safe Domain Relational Calc.  β”‚
β”‚                      ≑  SQL (core)                   β”‚
β”‚                                                      β”‚
β”‚  A query language is "relationally complete" if it   β”‚
β”‚  can express everything that relational algebra can. β”‚
β”‚                                                      β”‚
β”‚  SQL is relationally complete (and more β€” it has     β”‚
β”‚  aggregation, ordering, recursion, etc.)             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Safety of Expressions

A relational calculus expression is safe if it guarantees a finite result. Unsafe expressions can produce infinite results:

Unsafe:  { t | ¬(t ∈ STUDENT) }
         "All tuples NOT in STUDENT" β€” this is infinite!

Safe:    { t | t ∈ STUDENT ∧ Β¬(βˆƒe ∈ ENROLLMENT(e.sid = t.sid)) }
         "Students NOT enrolled in any course" β€” finite result

9. Equivalence with SQL

Every relational algebra expression has an SQL equivalent. Understanding this correspondence helps in writing and optimizing SQL queries.

Operation-by-Operation Mapping

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Relational Algebra β”‚ SQL Equivalent                           β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Οƒ_{c}(R)           β”‚ SELECT * FROM R WHERE c                  β”‚
β”‚ Ο€_{A,B}(R)         β”‚ SELECT DISTINCT A, B FROM R              β”‚
β”‚ ρ_{S}(R)           β”‚ R AS S  (in FROM clause)                 β”‚
β”‚ R βˆͺ S              β”‚ SELECT * FROM R UNION SELECT * FROM S    β”‚
β”‚ R ∩ S              β”‚ SELECT * FROM R INTERSECT SELECT * FROM Sβ”‚
β”‚ R βˆ’ S              β”‚ SELECT * FROM R EXCEPT SELECT * FROM S   β”‚
β”‚ R Γ— S              β”‚ SELECT * FROM R, S  (or CROSS JOIN)      β”‚
β”‚ R β‹ˆ_{c} S          β”‚ SELECT * FROM R JOIN S ON c              β”‚
β”‚ R β‹ˆ S              β”‚ SELECT * FROM R NATURAL JOIN S           β”‚
β”‚ R βŸ• S              β”‚ SELECT * FROM R LEFT OUTER JOIN S ON ... β”‚
β”‚ R Γ· S              β”‚ (requires NOT EXISTS + correlated sub)   β”‚
β”‚ Ξ³_{G;F(A)}(R)      β”‚ SELECT G, F(A) FROM R GROUP BY G        β”‚
β”‚ Ο„_{A}(R)           β”‚ SELECT * FROM R ORDER BY A               β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Detailed SQL Equivalences

Selection:

Οƒ_{dept='CS' AND year>=3}(STUDENT)

SELECT *
FROM student
WHERE dept = 'CS' AND year >= 3;

Projection:

Ο€_{name, dept}(STUDENT)

SELECT DISTINCT name, dept
FROM student;

Note: SQL does NOT eliminate duplicates by default.
      You must use DISTINCT to match relational algebra's set semantics.

Natural Join:

STUDENT β‹ˆ ENROLLMENT

-- Method 1: NATURAL JOIN (matches on all common columns)
SELECT * FROM student NATURAL JOIN enrollment;

-- Method 2: Explicit JOIN (safer β€” you control which columns match)
SELECT s.sid, s.name, s.year, s.dept, e.cid, e.grade
FROM student s
JOIN enrollment e ON s.sid = e.sid;

Division (the hardest to express in SQL):

R Γ· S  =  "Find tuples in R associated with ALL tuples in S"

-- "Find students enrolled in ALL CS courses"
SELECT DISTINCT e.sid
FROM enrollment e
WHERE NOT EXISTS (
    SELECT c.cid
    FROM course c
    WHERE c.dept = 'CS'
    AND NOT EXISTS (
        SELECT 1
        FROM enrollment e2
        WHERE e2.sid = e.sid
        AND e2.cid = c.cid
    )
);

-- Alternative using COUNT:
SELECT e.sid
FROM enrollment e
JOIN course c ON e.cid = c.cid
WHERE c.dept = 'CS'
GROUP BY e.sid
HAVING COUNT(DISTINCT e.cid) = (
    SELECT COUNT(*) FROM course WHERE dept = 'CS'
);

Composition Example:

"Find names of students who received an A in any course"

Relational Algebra:
  Ο€_{name}(Οƒ_{grade='A'}(STUDENT β‹ˆ ENROLLMENT))

SQL:
  SELECT DISTINCT s.name
  FROM student s
  JOIN enrollment e ON s.sid = e.sid
  WHERE e.grade = 'A';

10. Complete Worked Examples

Example 1: Multi-Step Query

Query: "Find the names and departments of students who are enrolled in a course taught by the CS department but are not CS majors themselves."

Relational Algebra:

  CS_COURSES ← Οƒ_{dept='CS'}(COURSE)
  CS_ENROLLED ← ENROLLMENT β‹ˆ_{cid} CS_COURSES
  NON_CS_STUDENTS ← Οƒ_{deptβ‰ 'CS'}(STUDENT)
  RESULT ← Ο€_{name, dept}(NON_CS_STUDENTS β‹ˆ_{sid} CS_ENROLLED)

Step-by-step:

1. CS_COURSES = Οƒ_{dept='CS'}(COURSE)
   β†’ {CS101, CS301, CS401}

2. CS_ENROLLED = Ο€_{sid}(ENROLLMENT β‹ˆ CS_COURSES)
   β†’ {S001, S002, S003, S004, S005} (all who took CS101/CS301/CS401)

3. NON_CS_STUDENTS = σ_{dept≠'CS'}(STUDENT)
   β†’ {S003/Carol/EE, S004/Dave/ME}

4. RESULT = Ο€_{name, dept}(NON_CS_STUDENTS β‹ˆ CS_ENROLLED)
   β†’ {(Carol, EE), (Dave, ME)}

SQL:
  SELECT DISTINCT s.name, s.dept
  FROM student s
  JOIN enrollment e ON s.sid = e.sid
  JOIN course c ON e.cid = c.cid
  WHERE c.dept = 'CS' AND s.dept <> 'CS';

Example 2: Division Query

Query: "Find students who have taken ALL courses that student S001 has taken."

Relational Algebra:

  S001_COURSES ← Ο€_{cid}(Οƒ_{sid='S001'}(ENROLLMENT))
  ALL_ENROLLMENTS ← Ο€_{sid, cid}(ENROLLMENT)
  RESULT ← ALL_ENROLLMENTS Γ· S001_COURSES

Step-by-step:

1. S001_COURSES = {CS101, CS301, MA101}

2. Check each student:
   S001: {CS101, CS301, MA101} βŠ‡ {CS101, CS301, MA101} β†’ βœ“
   S002: {CS101, CS301}        βŠ‡ {CS101, CS301, MA101} β†’ βœ— (missing MA101)
   S003: {EE201, CS101}        βŠ‡ {CS101, CS301, MA101} β†’ βœ—
   S004: {CS101}               βŠ‡ {CS101, CS301, MA101} β†’ βœ—
   S005: {CS101, MA101}        βŠ‡ {CS101, CS301, MA101} β†’ βœ— (missing CS301)

3. RESULT = {S001}

SQL:
  SELECT e.sid
  FROM enrollment e
  WHERE e.cid IN (
      SELECT cid FROM enrollment WHERE sid = 'S001'
  )
  GROUP BY e.sid
  HAVING COUNT(DISTINCT e.cid) = (
      SELECT COUNT(DISTINCT cid) FROM enrollment WHERE sid = 'S001'
  );

Example 3: Outer Join

Query: "List all students with their enrollment counts, including students with no enrollments."

Relational Algebra:

  JOINED ← STUDENT βŸ• ENROLLMENT       (left outer join on sid)
  RESULT ← Ξ³_{sid, name; COUNT(cid) AS num_courses}(JOINED)

If Dave (S004) had no enrollments:

SQL:
  SELECT s.sid, s.name, COUNT(e.cid) AS num_courses
  FROM student s
  LEFT OUTER JOIN enrollment e ON s.sid = e.sid
  GROUP BY s.sid, s.name
  ORDER BY num_courses DESC;

Result:
  β”Œβ”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
  β”‚ sid  β”‚ name  β”‚ num_courses β”‚
  β”œβ”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
  β”‚ S001 β”‚ Alice β”‚ 3           β”‚
  β”‚ S002 β”‚ Bob   β”‚ 2           β”‚
  β”‚ S003 β”‚ Carol β”‚ 2           β”‚
  β”‚ S005 β”‚ Eve   β”‚ 2           β”‚
  β”‚ S004 β”‚ Dave  β”‚ 0           β”‚  ← preserved by LEFT OUTER JOIN
  β””β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

11. Exercises

Basic Operations

Exercise 3.1: Using the sample database, write relational algebra expressions for:

  • (a) All students in year 2 or year 3
  • (b) Course titles with 4 or more credits
  • (c) Student IDs enrolled in EE201
  • (d) Names of students NOT in the CS department

Exercise 3.2: Evaluate the following expressions step by step, showing intermediate results:

  • (a) Ο€_{name}(Οƒ_{year > 2}(STUDENT))
  • (b) Ο€_{sid}(Οƒ_{grade='A'}(ENROLLMENT)) ∩ Ο€_{sid}(Οƒ_{dept='CS'}(STUDENT))
  • (c) STUDENT β‹ˆ (Οƒ_{cid='CS301'}(ENROLLMENT))

Join Operations

Exercise 3.3: For each of the following, write the relational algebra expression AND the equivalent SQL:

  • (a) Find names of students enrolled in "Database Theory"
  • (b) Find course titles that have at least one student with grade A
  • (c) Find students who are enrolled in courses outside their department
  • (d) Find pairs of students who share at least one course

Exercise 3.4: Explain the difference between the results of these three queries on our sample data:

Q1: STUDENT β‹ˆ ENROLLMENT
Q2: STUDENT βŸ• ENROLLMENT
Q3: STUDENT Γ— ENROLLMENT

How many tuples does each produce? Which tuples appear in Q2 but not Q1?

Division

Exercise 3.5: Write relational algebra expressions using division for:

  • (a) Students enrolled in every 3-credit course
  • (b) Students who have taken ALL courses that Bob (S002) has taken

Show the step-by-step evaluation.

Exercise 3.6: Express each division query from Exercise 3.5 in SQL using: - (i) Double NOT EXISTS - (ii) GROUP BY with HAVING COUNT

Optimization

Exercise 3.7: Given the query:

Ο€_{name}(Οƒ_{credits=3 AND grade='A'}(STUDENT β‹ˆ ENROLLMENT β‹ˆ COURSE))
  • (a) Draw the initial (unoptimized) query tree
  • (b) Apply algebraic optimization rules to produce an optimized query tree
  • (c) Explain which rules you applied and why the optimized tree is better
  • (d) Estimate the reduction in intermediate result sizes

Exercise 3.8: Prove or disprove the following equivalences:

  • (a) Οƒ_{c1}(R βˆͺ S) = Οƒ_{c1}(R) βˆͺ Οƒ_{c1}(S)
  • (b) Οƒ_{c1}(R βˆ’ S) = Οƒ_{c1}(R) βˆ’ Οƒ_{c1}(S)
  • (c) Ο€_A(R βˆͺ S) = Ο€_A(R) βˆͺ Ο€_A(S)
  • (d) Οƒ_{c1}(R Γ— S) = Οƒ_{c1}(R) Γ— S (where c1 involves only R's attributes)

Relational Calculus

Exercise 3.9: Express the following in Tuple Relational Calculus (TRC):

  • (a) Names of students in the CS department
  • (b) Students enrolled in at least two courses
  • (c) Courses with no enrolled students
  • (d) Students enrolled in ALL courses offered by the CS department

Exercise 3.10: Determine which of the following TRC expressions are safe. If unsafe, explain why and provide a safe equivalent:

  • (a) { t | Β¬(t ∈ STUDENT) }
  • (b) { t.name | t ∈ STUDENT ∧ t.gpa > 3.5 }
  • (c) { <x, y> | βˆƒt ∈ STUDENT(t.sid = x ∧ t.name = y) }
  • (d) { t | t.salary > 100000 }

Challenge Problems

Exercise 3.11: A relation R(A, B, C) contains the following tuples:

{(a1, b1, c1), (a1, b2, c1), (a1, b2, c2), (a2, b1, c1), (a2, b1, c2)}

A relation S(B, C) contains: {(b1, c1), (b1, c2)}

Compute R Γ· S step by step. Verify your answer by checking that each resulting tuple is associated with ALL tuples in S.

Exercise 3.12: Write a single relational algebra expression (no assignment) for:

"Find the department with the highest average instructor salary."

Hint: This requires aggregation and a comparison pattern.


Previous: Relational Model | Next: ER Modeling

to navigate between lessons