09. Indexing
09. Indexing¶
Previous: Query Processing and Optimization | Next: Transaction Theory
Learning Objectives¶
- Understand why indexing is critical for database performance
- Distinguish between ordered indices: primary, clustering, and secondary
- Master B-Tree and B+Tree structures, operations, and complexity
- Learn hash-based indexing techniques: static, extendible, and linear hashing
- Understand bitmap indices and multi-dimensional index structures
- Apply index design guidelines in practice
1. Why Indexing Matters¶
The Fundamental Problem¶
Consider a table employees with 1,000,000 rows stored on disk. Each disk block holds 10 rows, so the table occupies 100,000 blocks. To find a single employee by ID:
Without index: Scan all 100,000 blocks β O(n) disk I/Os
With index: Follow index pointers β O(log n) disk I/Os
For 100,000 blocks: - Sequential scan: up to 100,000 block reads - B+Tree index (branching factor 200): approximately log_200(1,000,000) β 3 block reads
This is a 33,000x improvement in I/O cost.
Sequential vs. Indexed Access¶
Sequential Access (Table Scan):
ββββββββ¬βββββββ¬βββββββ¬βββββββ¬βββββββ¬βββββββ¬βββββββ
β Blk1 β Blk2 β Blk3 β Blk4 β Blk5 β ... β BlkN β
ββββββββ΄βββββββ΄βββββββ΄βββββββ΄βββββββ΄βββββββ΄βββββββ
β Read every block sequentially
Indexed Access:
βββββββββββββββββββ
β Index Root β β 1 I/O
β [50β100β150] β
ββββ¬βββββββ¬βββββββ¬ββ
β β β
ββββββ ββββββ ββββββ
βLeafβ βLeafβ βLeafβ β 1 I/O (follow pointer)
ββββ¬ββ ββββββ ββββββ
β
ββββββββ
β Data β β 1 I/O (fetch actual record)
β Blockβ
ββββββββ
When Not to Use an Index¶
Indexes are not always beneficial:
- Small tables: A sequential scan of a few blocks is faster than index traversal overhead
- High selectivity queries: If a query returns most rows (e.g., >15-20% of the table), a full scan is cheaper
- Frequent writes: Every INSERT, UPDATE, or DELETE must also update the index
- Low-cardinality columns: Indexing a boolean column with a B+Tree is wasteful (bitmap index may be appropriate)
Cost Model Basics¶
The cost of disk access dominates query performance. We measure in terms of block transfers (reading/writing a disk block) and seeks (moving the disk head).
Let:
- b = number of blocks in the file
- n = number of records
- f = blocking factor (records per block), so b = βn/fβ
| Access Method | Average Cost (equality search) |
|---|---|
| Linear scan (unsorted) | b/2 block transfers |
| Linear scan (sorted) | βlogβ(b)β (binary search) |
| Primary index (sorted) | βlogβ(b_i)β + 1, where b_i = index blocks |
| B+Tree | βlog_p(b)β + 1, where p = branching factor |
| Hash index | 1 (ideal) to 2 (with overflow) |
2. Ordered Indices¶
Index Structure Fundamentals¶
An index is a data structure that maps search key values to pointers (record identifiers or block addresses). The search key need not be a primary key -- any attribute or set of attributes can serve as a search key.
Index Entry Format:
βββββββββββββββ¬ββββββββββββββ
β Search Key β Pointer(s) β
β Value β to Records β
βββββββββββββββ΄ββββββββββββββ
2.1 Primary Index¶
A primary index is built on the ordering key of a sequentially ordered file. The file is physically sorted on the indexed attribute.
Primary Index on Employee ID (file sorted by ID):
Index: Data File:
βββββββ¬βββββββ βββββββββββββββββββββ
β 100 β βββββββββββββββββββ β 100 β Alice β ... β Block 1
βββββββΌβββββββ€ β 105 β Bob β ... β
β 200 β βββββββ βββββββββββββββββββββ€
βββββββΌβββββββ€ βββββββββββββ β 200 β Carol β ... β Block 2
β 300 β βββββββ β 210 β Dave β ... β
βββββββ΄βββββββ β βββββββββββββββββββββ€
βββββββββββββ β 300 β Eve β ... β Block 3
β 350 β Frank β ... β
βββββββββββββββββββββ
Properties: - One index entry per block (not per record) -- this is a sparse index - The search key is the same attribute the file is sorted on - Very efficient for equality and range queries - Only one primary index can exist per table (since the file can only be sorted one way)
2.2 Clustering Index¶
A clustering index is built on a non-key attribute that the file is physically ordered on. Multiple records may share the same search key value.
Clustering Index on Department:
Index: Data File (sorted by dept):
ββββββββββββ¬βββββββ βββββββββββββββββββββββββββββ
β Acctg β βββββββββββββββ β Acctg β Alice β 60000 β
ββββββββββββΌβββββββ€ β Acctg β Bob β 55000 β
β Eng β βββββββ βββββββββββββββββββββββββββββ€
ββββββββββββΌβββββββ€ ββββββββ β Eng β Carol β 80000 β
β Sales β βββββββ β Eng β Dave β 75000 β
ββββββββββββ΄βββββββ β β Eng β Eve β 72000 β
β βββββββββββββββββββββββββββββ€
βββββββ β Sales β Frank β 50000 β
β Sales β Grace β 52000 β
βββββββββββββββββββββββββββββ
Properties: - One index entry per distinct value of the search key - Each pointer leads to the first block containing that value - Efficient for grouping and aggregation queries
2.3 Secondary Index¶
A secondary index provides an alternative access path on a non-ordering attribute. The file is not sorted on this attribute.
Secondary Index on Salary:
Index (sorted by salary): Data File (sorted by ID, NOT salary):
βββββββββ¬βββββββ βββββββββββββββββββββββββββββ
β 50000 β βββββββββββββββββββ β 100 β Alice β 60000 β Block 1
βββββββββΌβββββββ€ β 105 β Bob β 50000 β
β 55000 β βββββββ βββββββββββββββββββββββββββββ€
βββββββββΌβββββββ€ β β 200 β Carol β 80000 β Block 2
β 60000 β ββββββββ β 210 β Dave β 55000 β
βββββββββΌβββββββ€ ββ βββββββββββββββββββββββββββββ€
β 72000 β βββ ββ β 300 β Eve β 72000 β Block 3
β ... β β ββ β 350 β Frank β 75000 β
βββββββββ΄βββββ ββ βββββββββββββββββββββββββββββ
βββ (pointers cross block boundaries)
Properties: - Must be a dense index (one entry per record, not per block) since data is not sorted by the search key - Multiple secondary indices can exist on the same table - Less efficient for range queries (random I/O pattern due to non-sequential data) - An extra level of indirection (bucket of pointers) is sometimes used for duplicate key values
Dense vs. Sparse Indices¶
Dense Index: One index entry for every record in the data file.
Dense Index:
βββββββ¬ββββ ββββββββββββββββ
β 100 β βββΌββββ β 100 β Alice β
βββββββΌββββ€ β 105 β Bob β βββ entry for 105 exists
β 105 β βββΌββββ β β
βββββββΌββββ€ ββββββββββββββββ€
β 200 β βββΌββββ β 200 β Carol β
βββββββΌββββ€ β 210 β Dave β
β 210 β βββΌββββ β β
βββββββ΄ββββ ββββββββββββββββ
Sparse Index: One index entry for some records (typically one per block).
Sparse Index:
βββββββ¬ββββ ββββββββββββββββ
β 100 β βββΌββββ β 100 β Alice β Block 1
β β β β 105 β Bob β (no entry for 105)
βββββββΌββββ€ ββββββββββββββββ€
β 200 β βββΌββββ β 200 β Carol β Block 2
β β β β 210 β Dave β (no entry for 210)
βββββββ΄ββββ ββββββββββββββββ
Comparison:
| Aspect | Dense Index | Sparse Index |
|---|---|---|
| Space | Larger (entry per record) | Smaller (entry per block) |
| Search | Direct lookup possible | May need to scan within block |
| Requirement | Works on unsorted files | Requires sorted data file |
| Maintenance | More updates on insert/delete | Fewer updates |
| Use case | Secondary indices | Primary indices |
Multi-Level Indices¶
When the index itself is too large to fit in memory, we can build an index on the index:
Level 2 (outer index):
ββββββββ¬ββββ
β 1 β βββΌββββ ββββββββ¬ββββ
β 500 β βββΌββ β 1 β βββΌββββ Data Block 1
ββββββββ΄ββββ β β 100 β βββΌββββ Data Block 2
β β 200 β βββΌββββ Data Block 3
β β ... β β
β ββββββββ΄ββββ
β Level 1 (inner index, part 1)
β
ββ ββββββββ¬ββββ
β 500 β βββΌββββ Data Block K
β 600 β βββΌββββ Data Block K+1
β ... β β
ββββββββ΄ββββ
Level 1 (inner index, part 2)
This naturally leads us to tree-structured indices, specifically B-Trees and B+Trees.
3. B-Tree¶
3.1 Structure¶
A B-Tree of order m (also called a B-Tree of degree m) satisfies:
- Every node has at most
mchildren - Every internal node (except root) has at least
βm/2βchildren - The root has at least 2 children (if it is not a leaf)
- All leaves appear at the same level
- A non-leaf node with
kchildren containsk-1keys
B-Tree of order 4 (each node holds up to 3 keys):
βββββββββββ
β 30 β
ββββ¬ββββ¬βββ
ββββββ ββββββ
ββββββββ΄βββ ββββββ΄βββββββ
β 10 β 20 β β 40 β 50 β
ββ¬ββββ¬ββββ¬β ββ¬ββββ¬βββββ¬ββ
β β β β β β
ββββββββββββ ββββββββ βββββ
β5 ββ15ββ25β β35ββ45β β55 β
β8 ββ ββ β β ββ β β60 β
ββββββββββββ ββββββββ βββββ
Note: In a B-Tree (unlike B+Tree), data pointers
exist at EVERY node, not just leaves.
Node structure (internal node with n keys):
ββββββ¬ββββββ¬βββββ¬ββββββ¬βββββ¬ββββββ¬βββββ
β Pβ β Kβ β Pβ β Kβ β Pβ β ... β Pββββ
β β Dβ β β Dβ β β β β
ββββββ΄ββββββ΄βββββ΄ββββββ΄βββββ΄ββββββ΄βββββ
β β β β
subtree subtree subtree subtree
< Kβ [Kβ,Kβ) [Kβ,Kβ) β₯ Kβ
Where:
Pα΅’ = pointer to child node (or data block for leaf)
Kα΅’ = search key value
Dα΅’ = pointer to the data record with key Kα΅’
3.2 Search¶
Searching a B-Tree for key K:
BTREE-SEARCH(node, K):
i = 1
while i β€ node.n and K > node.key[i]:
i = i + 1
if i β€ node.n and K == node.key[i]:
return (node, i) // Found at this node
if node is a leaf:
return NOT_FOUND
// Read child from disk
DISK-READ(node.child[i])
return BTREE-SEARCH(node.child[i], K)
Complexity: O(log_m n) disk I/Os, where n is the number of keys and m is the order.
3.3 Insertion¶
Insert key K into a B-Tree of order m:
- Find the appropriate leaf node using search
- Insert the key in sorted order within the leaf
- If the node overflows (has
mkeys): - Split the node into two nodes at the median key
- Promote the median key to the parent
- Recursively handle parent overflow if needed
Example: Insert 25 into a B-Tree of order 3 (max 2 keys per node):
Before:
ββββββ
β 20 β
βββ¬βββ
ββββββ΄ββββββ
βββββββ ββββββββ
β10β15β β22β30 β β Node is full
βββββββ ββββββββ
Step 1: Key 25 belongs in the right leaf [22, 30].
Step 2: Insert β [22, 25, 30] β overflow! (3 keys > max 2)
Step 3: Split at median (25). Promote 25 to parent.
After:
ββββββββ
β20β25 β
ββ¬βββ¬βββ
ββββββ β ββββββ
βββββββ ββββ ββββ
β10β15β β22β β30β
βββββββ ββββ ββββ
3.4 Deletion¶
Delete key K from a B-Tree of order m:
- Find the node containing
K - If
Kis in a leaf: remove it directly - If
Kis in an internal node: replace with predecessor (or successor) from leaf, then delete from leaf - If the node underflows (fewer than
βm/2β - 1keys): - Redistribute (borrow) from a sibling if possible
- Otherwise, merge with a sibling and pull down a key from the parent
- Recursively handle parent underflow
Example: Delete 20 from above tree (order 3, min 1 key per non-root node):
Before:
ββββββββ
β20β25 β
ββ¬βββ¬βββ
ββββββ β ββββββ
βββββββ ββββ ββββ
β10β15β β22β β30β
βββββββ ββββ ββββ
Step 1: Key 20 is in an internal node.
Step 2: Replace with in-order predecessor (15).
Step 3: Delete 15 from the leaf.
After replacement:
ββββββββ
β15β25 β
ββ¬βββ¬βββ
ββββββ β ββββββ
ββββ ββββ ββββ
β10β β22β β30β
ββββ ββββ ββββ
3.5 Complexity Analysis¶
For a B-Tree of order m with n keys:
| Operation | Disk I/Os | CPU Time |
|---|---|---|
| Search | O(log_m n) | O(m Β· log_m n) |
| Insert | O(log_m n) | O(m Β· log_m n) |
| Delete | O(log_m n) | O(m Β· log_m n) |
Height bound: For n β₯ 1 keys and minimum degree t = βm/2β:
h β€ log_t((n+1)/2)
With a typical branching factor of 100-200, a tree of height 3-4 can index billions of records.
4. B+Tree¶
4.1 Structure and Properties¶
The B+Tree is the most widely used index structure in relational databases (PostgreSQL, MySQL InnoDB, Oracle, SQL Server all use B+Trees). It differs from the B-Tree in crucial ways:
- All data pointers are in leaf nodes -- internal nodes only store keys and child pointers
- Leaf nodes are linked in a doubly-linked list for efficient range scans
- Keys in internal nodes are duplicated in leaves (internal keys serve only as guides)
B+Tree Structure:
Internal Nodes (guide keys only):
ββββββββββββββ
β 30 β 60 β
ββββ¬ββββ¬βββ¬βββ
ββββββββββ β ββββββββββ
β β β
βββββββββββ βββββββββββ βββββββββββ
β 10 β 20 β β 40 β 50 β β 70 β 80 β
βββ¬βββ¬βββ¬ββ βββ¬βββ¬βββ¬ββ βββ¬βββ¬βββ¬ββ
β β β β β β β β β
Leaf Nodes (actual data, linked):
βββββββ βββββββ βββββββ βββββββ βββββββ βββββββ βββββββ βββββββ βββββββ
β 5,8 βββ10,15βββ20,25βββ30,35βββ40,45βββ50,55βββ60,65βββ70,75βββ80,85β
β β β β β β β β β β β β β β β β β β
β D* β β D* β β D* β β D* β β D* β β D* β β D* β β D* β β D* β
βββββββ βββββββ βββββββ βββββββ βββββββ βββββββ βββββββ βββββββ βββββββ
β β
Head Leaf chain (doubly linked) Tail
D* = pointer to actual data record
Node Format¶
Internal node (order m, at most m pointers, m-1 keys):
ββββββ¬ββββββ¬βββββ¬ββββββ¬βββββ¬ββββββ¬βββββ
β Pβ β Kβ β Pβ β Kβ β Pβ β ... β Pββ
ββββββ΄ββββββ΄βββββ΄ββββββ΄βββββ΄ββββββ΄βββββ
All keys in subtree(Pβ) < Kβ
Kβ β€ all keys in subtree(Pβ) < Kβ
...
Kβββ β€ all keys in subtree(Pβ)
Leaf node (at most m-1 key-pointer pairs, plus sibling pointer):
ββββββββ¬βββββββ¬βββββββ¬βββββββ¬ββββββββββ
βKβ,DββKβ,DββKβ,Dββ ... β Pββββ β
ββββββββ΄βββββββ΄βββββββ΄βββββββ΄ββββββββββ
Dα΅’ = pointer to record with key Kα΅’
Pββββ = pointer to next leaf node
4.2 B+Tree vs. B-Tree Comparison¶
| Feature | B-Tree | B+Tree |
|---|---|---|
| Data pointers | At every node | Only at leaf nodes |
| Key duplication | No duplicates | Internal keys duplicated in leaves |
| Leaf linking | Not linked | Doubly linked list |
| Range queries | Must traverse tree multiple times | Follow leaf chain |
| Fan-out | Lower (data pointers take space) | Higher (internal nodes are slimmer) |
| Equality search | May terminate early (at internal node) | Always goes to leaf |
| Space usage | Slightly less total | Slightly more (key duplication) |
| Practical use | Rarely used in databases | Standard in all major RDBMS |
4.3 Search in B+Tree¶
Equality search for key K:
BPLUS-SEARCH(root, K):
node = root
while node is not a leaf:
i = smallest index such that K < node.key[i]
if no such i exists:
node = node.child[last] // rightmost child
else:
node = node.child[i] // go to child i
// Now at leaf: scan for K
for each (key, pointer) in node:
if key == K:
return pointer
return NOT_FOUND
Range search for keys in [lo, hi]:
BPLUS-RANGE-SEARCH(root, lo, hi):
// Step 1: Find the leaf containing lo
leaf = find leaf for lo (using equality search logic)
// Step 2: Scan leaves via linked list
results = []
while leaf is not null:
for each (key, pointer) in leaf:
if key > hi:
return results
if key >= lo:
results.append(pointer)
leaf = leaf.next // Follow leaf chain!
return results
This is why B+Trees excel at range queries -- once you find the starting leaf, you simply follow the linked list.
4.4 Insertion in B+Tree¶
Insert key K with data pointer D:
BPLUS-INSERT(root, K, D):
leaf = find appropriate leaf node
if leaf has room:
insert (K, D) in sorted order in leaf
else:
// Leaf overflow: split
Create new leaf L'
Distribute entries evenly: first βm/2β stay, rest go to L'
// Copy up: the first key of L' goes to parent
parent-key = L'.key[1]
INSERT-IN-PARENT(leaf, parent-key, L')
Update leaf linked list pointers
INSERT-IN-PARENT(left, key, right):
if parent has room:
insert (key, right pointer) in parent
else:
// Internal node overflow: split
Create new internal node N'
Distribute: first βm/2β pointers stay, rest go to N'
// Push up: the middle key goes to parent (NOT copied)
middle-key = the key that separates the two halves
INSERT-IN-PARENT(parent, middle-key, N')
Important distinction: - Leaf split: copy up (the separator key exists in both the leaf and the parent) - Internal split: push up (the separator key moves to the parent and is removed from the splitting node)
Example: Insert 7 into a B+Tree of order 3 (max 2 keys per leaf):
Before:
ββββββ
β 5 β
ββ¬ββββ
ββββββ ββββββ
βββββββ ββββββββ
β 3,4 β βββ β 5,6 β
βββββββ ββββββββ
Step 1: Key 7 belongs in leaf [5, 6]. Leaf is full.
Step 2: Split leaf: [5] and [6, 7]. Copy up key 6 to parent.
After:
ββββββββ
β 5 β 6β
ββ¬βββ¬βββ
βββββ β βββββ
βββββββ βββββ βββββββ
β 3,4 βββ 5 βββ 6,7 β
βββββββ βββββ βββββββ
4.5 Deletion in B+Tree¶
Delete key K:
BPLUS-DELETE(root, K):
leaf = find leaf containing K
Remove K from leaf
if leaf has enough keys (β₯ β(m-1)/2β):
done (may need to update parent key if first key changed)
else:
// Underflow
if sibling has extra keys:
redistribute (borrow from sibling)
update parent key
else:
merge with sibling
delete entry from parent
recursively handle parent underflow
Leaf merge removes a key from the parent. Internal node merge pulls down a key from the parent. When the root is left with a single child, the child becomes the new root (tree height decreases).
4.6 Bulk Loading¶
Building a B+Tree one insertion at a time from an empty tree is inefficient. Bulk loading is used when creating an index on existing data:
BULK-LOAD(sorted_data, m):
// Step 1: Sort data by search key (external sort if needed)
// Step 2: Fill leaf nodes sequentially
for each key in sorted_data:
add to current leaf
if leaf full:
start new leaf
add separator to parent level
// Step 3: Build internal levels bottom-up
repeat for each level until root is created
Advantages of bulk loading: - Sequential I/O instead of random I/O - Fill factor can be controlled (e.g., 90% full for future inserts) - O(n log_m n) total, but with much better constants
PostgreSQL:
-- CREATE INDEX uses bulk loading internally when table has data
CREATE INDEX idx_emp_salary ON employees(salary);
-- REINDEX rebuilds an index (useful after heavy updates)
REINDEX INDEX idx_emp_salary;
4.7 Height and Performance Analysis¶
For a B+Tree of order m with n search key values:
Maximum height:
h β€ βlog_{βm/2β}(n)β
Practical example:
Given: - Block size: 4 KB - Key size: 8 bytes, pointer size: 8 bytes - Internal node fan-out: m = 4096 / (8 + 8) = 256 - Leaf entries: (m-1) = 255 per leaf
For n = 100,000,000 (100 million) records:
Height = βlogβββ(100,000,000)β = βlogβββ(10βΈ)β β β8/2.1β β 4
So 4 disk reads suffice to find any record among 100 million. In practice, the root and top levels are cached in memory, reducing this to 1-2 disk reads.
5. Hash-Based Indexing¶
5.1 Static Hashing¶
A hash index uses a hash function h(K) to map search key values directly to bucket addresses.
Hash Function h maps key K to bucket number:
Key: "Alice" β h("Alice") = 2
Key: "Bob" β h("Bob") = 0
Key: "Carol" β h("Carol") = 2 (collision!)
Bucket Array:
βββββββββββ
β Bucket 0 β β [Bob, ...]
βββββββββββ€
β Bucket 1 β β [empty]
βββββββββββ€
β Bucket 2 β β [Alice, ...] β [Carol, ...] (overflow chain)
βββββββββββ€
β Bucket 3 β β [Dave, ...]
βββββββββββ
Properties: - Ideal case: O(1) lookup -- one disk read - Overflow handling: Chaining (linked list of overflow buckets) - Weakness: Fixed number of buckets. Performance degrades as file grows
Hash function requirements: 1. Uniform distribution: Keys should be spread evenly across buckets 2. Deterministic: Same key always maps to the same bucket 3. Fast computation: O(1) time
5.2 Extendible Hashing¶
Extendible hashing adapts to data growth by using a directory that doubles in size as needed, without reorganizing the entire file.
Key concepts:
- Global depth (d): Number of bits of the hash value used by the directory
- Local depth (dα΅’): Number of bits used by bucket i
- Directory size = 2^d entries
Example with global depth d = 2:
Hash values (binary): Directory (d=2):
h(Kβ) = 01001... ββββββ¬ββββββββββββ
h(Kβ) = 10110... β 00 β βββ Bucket A (local depth 2)
h(Kβ) = 00101... β 01 β βββ Bucket B (local depth 2)
h(Kβ) = 11010... β 10 β βββ Bucket C (local depth 1)
h(Kβ
) = 10011... β 11 β βββ Bucket C (local depth 1)
ββββββ΄ββββββββββββ
Note: Entries 10 and 11 both point to Bucket C
because C's local depth (1) < global depth (2).
Only the first bit matters for Bucket C.
Bucket split when bucket overflows:
If Bucket B (local depth 2) overflows:
Case 1: local depth < global depth
β Split bucket, increase local depth
β Redistribute entries
β Update directory pointers
Case 2: local depth == global depth
β Double the directory (global depth increases by 1)
β Split bucket, increase local depth
β Redistribute entries
β Update directory pointers
Advantages: - No overflow chains (splits handle growth) - At most 2 disk accesses for any lookup (1 for directory, 1 for bucket) - Grows gracefully
Disadvantages: - Directory can become large if data is skewed - Directory doubling is expensive (though infrequent)
5.3 Linear Hashing¶
Linear hashing avoids the directory entirely. Buckets are split in a round-robin fashion, one at a time, controlled by a split pointer.
Key idea:
- Maintain a split pointer s that tracks the next bucket to split
- Use two hash functions: hβ(K) = K mod N and hβ(K) = K mod 2N
- When a bucket overflows, split the bucket at the split pointer (not necessarily the overflowing bucket)
State with N=4 buckets, split pointer s=1:
Buckets:
ββββββββββββ
β Bucket 0 β β [records with hβ(K)=0] (already split, uses hβ)
ββββββββββββ€
β Bucket 1 β β [records with hβ(K)=1] β split pointer s
ββββββββββββ€
β Bucket 2 β β [records with hβ(K)=2] (not yet split, uses hβ)
ββββββββββββ€
β Bucket 3 β β [records with hβ(K)=3] (not yet split, uses hβ)
ββββββββββββ€
β Bucket 4 β β [records with hβ(K)=4] (created by splitting Bucket 0)
ββββββββββββ
Lookup algorithm:
b = hβ(K) // e.g., K mod 4
if b < s: // bucket already split
b = hβ(K) // use K mod 8 instead
search bucket b
When to split:
- Split is triggered when any bucket overflows
- The bucket at the split pointer is split (not the overflowing bucket!)
- The overflowing bucket uses overflow chains temporarily
- Split pointer advances: s = s + 1
- When s = N, a round is complete: N = 2N, s = 0, advance hash functions
Advantages: - No directory needed - Smooth growth (one bucket at a time) - Guaranteed O(1) average access
Disadvantages: - Temporary overflow chains - Splitting may not relieve the actual overflowing bucket immediately
Hash vs. B+Tree Comparison¶
| Criteria | Hash Index | B+Tree Index |
|---|---|---|
| Equality query | O(1) -- excellent | O(log n) -- good |
| Range query | Not supported | Excellent (leaf chain) |
| Ordered traversal | Not supported | Natural (leaf scan) |
| Dynamic growth | Extendible/linear | Splits and merges |
| Space usage | Can waste space (empty buckets) | Good utilization (~67%) |
| Implementation | Simpler | More complex |
| Common in RDBMS | PostgreSQL (hash), Redis | All major RDBMS |
6. Bitmap Index¶
6.1 Concept¶
A bitmap index creates a bit vector for each distinct value of an indexed attribute. Each bit corresponds to a row in the table.
Table: employees (8 rows)
βββββββ¬ββββββββ¬βββββββββ¬βββββββββ
β RID β Name β Dept β Gender β
βββββββΌββββββββΌβββββββββΌβββββββββ€
β 0 β Alice β Eng β F β
β 1 β Bob β Sales β M β
β 2 β Carol β Eng β F β
β 3 β Dave β HR β M β
β 4 β Eve β Eng β F β
β 5 β Frank β Sales β M β
β 6 β Grace β HR β F β
β 7 β Hank β Eng β M β
βββββββ΄ββββββββ΄βββββββββ΄βββββββββ
Bitmap Index on Dept:
Eng: [1, 0, 1, 0, 1, 0, 0, 1] (rows 0,2,4,7)
Sales: [0, 1, 0, 0, 0, 1, 0, 0] (rows 1,5)
HR: [0, 0, 0, 1, 0, 0, 1, 0] (rows 3,6)
Bitmap Index on Gender:
F: [1, 0, 1, 0, 1, 0, 1, 0] (rows 0,2,4,6)
M: [0, 1, 0, 1, 0, 1, 0, 1] (rows 1,3,5,7)
6.2 Bitmap Operations¶
Queries can be answered by combining bitmaps using bitwise operations:
Query: Find all female engineers
Dept=Eng: [1, 0, 1, 0, 1, 0, 0, 1]
AND
Gender=F: [1, 0, 1, 0, 1, 0, 1, 0]
βββββββββββββββββββββββββββββββββββββ
Result: [1, 0, 1, 0, 1, 0, 0, 0] β rows 0, 2, 4
(Alice, Carol, Eve)
Query: Find all employees in Sales OR HR
Dept=Sales: [0, 1, 0, 0, 0, 1, 0, 0]
OR
Dept=HR: [0, 0, 0, 1, 0, 0, 1, 0]
βββββββββββββββββββββββββββββββββββββ
Result: [0, 1, 0, 1, 0, 1, 1, 0] β rows 1, 3, 5, 6
Query: Find all non-engineers
NOT Dept=Eng: NOT [1, 0, 1, 0, 1, 0, 0, 1]
βββββββββββββββββββββββββββββββββββββββββββββ
Result: [0, 1, 0, 1, 0, 1, 1, 0] β rows 1, 3, 5, 6
6.3 When to Use Bitmap Indices¶
Ideal for: - Low cardinality attributes (Gender: 2 values, Status: 3-5 values) - Data warehousing with mostly read-only access - Complex multi-attribute queries (AND/OR combinations) - Counting queries (COUNT with WHERE) -- just count set bits - Star schema fact tables with many dimension foreign keys
Not ideal for: - High cardinality attributes (e.g., unique IDs -- one bitmap per value!) - OLTP with frequent updates (updating a bitmap on every insert is expensive) - Highly concurrent environments (bitmap locks affect many rows)
6.4 Bitmap Compression¶
For large tables, bitmaps can become very large. Compression techniques help:
Run-Length Encoding (RLE):
Original: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
RLE: (0, 11), (1, 1), (0, 4) β "11 zeros, 1 one, 4 zeros"
Word-Aligned Hybrid (WAH): Used in Oracle and other systems. Compresses sequences of identical words (32 or 64 bits) while still allowing efficient bitwise operations on the compressed form.
Roaring Bitmaps: A modern compression technique that partitions the bitmap into chunks and uses the optimal representation (array, bitmap, or run) for each chunk. Widely used in Apache Lucene, Apache Spark, and other systems.
6.5 Space Analysis¶
For a table with n rows and an attribute with c distinct values:
Uncompressed bitmap size = n Γ c bits = nc/8 bytes
Compared to B+Tree:
B+Tree index size β n Γ (key_size + pointer_size) bytes
Example: 1 million rows, attribute with 5 values: - Bitmap: 1,000,000 x 5 / 8 = 625,000 bytes β 610 KB - B+Tree: 1,000,000 x (8 + 8) = 16,000,000 bytes β 15.3 MB
Bitmap is 25x smaller for low-cardinality attributes.
7. Multi-Dimensional Indexing¶
7.1 The Problem¶
Standard B+Trees and hash indices work well for one-dimensional search keys. But many queries involve multiple dimensions:
-- Spatial query
SELECT * FROM restaurants
WHERE latitude BETWEEN 37.7 AND 37.8
AND longitude BETWEEN -122.5 AND -122.4;
-- Multi-attribute range query
SELECT * FROM products
WHERE price BETWEEN 10 AND 50
AND weight BETWEEN 0.5 AND 2.0;
A B+Tree on (latitude, longitude) can efficiently filter on latitude but then must scan all matching entries for longitude. True multi-dimensional indices address both dimensions simultaneously.
7.2 R-Tree¶
The R-Tree (Rectangle Tree) is the most widely used spatial index. It organizes data using minimum bounding rectangles (MBRs).
R-Tree Structure:
Root: [MBRβ, MBRβ]
ββββββββββββββββββββββββββββββββββββββββ
β βββββββββββββ ββββββββββββββββ β
β β MBRβ β β MBRβ β β
β β ββββ ββββ β β βββββ ββββ β β
β β βP1β βP2β β β βP3 β βP4β β β
β β ββββ ββββ β β βββββ ββββ β β
β β ββββ β β ββββ β β
β β βP5β β β βP6β β β
β β ββββ β β ββββ β β
β βββββββββββββ ββββββββββββββββ β
ββββββββββββββββββββββββββββββββββββββββ
Each internal node: [MBRβ, ptrβ, MBRβ, ptrβ, ...]
Each leaf node: [MBRβ, oidβ, MBRβ, oidβ, ...]
Properties:
- Balanced tree (all leaves at the same level)
- Each node contains between βm/2β and m entries
- MBRs at a given level may overlap
- Search may need to follow multiple paths (unlike B+Tree)
Operations: - Search: Start at root, descend into all children whose MBR overlaps the query rectangle - Insert: Choose the subtree whose MBR needs the least enlargement; split if necessary - Used in: PostgreSQL (GiST indices), Oracle Spatial, PostGIS
7.3 kd-Tree¶
The kd-tree (k-dimensional tree) partitions space by alternating splitting dimensions at each level.
2D kd-tree example (splitting on x, then y, then x, ...):
Data points: (2,3), (5,4), (9,6), (4,7), (8,1), (7,2)
(7,2) Split on x=7
/ \
(5,4) (9,6) Split on y=4, y=6
/ \ \
(2,3) (4,7) (8,1) Split on x=2, x=4, x=8
Spatial partitioning:
βββββββββββββββββββββββββ
β β β
β (2,3) β (9,6) β
β Β· β Β· β
ββββββββ(5,4)ββββββ β
β Β· β β β
β (4,7) β β (8,1) β
β β β Β· β
βββββββββ΄ββββ΄ββββββββββββ
x=7 (split line)
Properties: - Binary tree (each node splits space in half) - Efficient for low-dimensional data (d β€ 20) - Search: O(n^(1-1/d) + k) where k is the number of results - Not balanced for dynamic data (use variants like k-d-B tree)
Comparison of spatial indices:
| Feature | R-Tree | kd-Tree |
|---|---|---|
| Dimensions | Works for any d | Best for low d |
| Dynamic updates | Good (designed for it) | Poor (may unbalance) |
| Disk-based | Yes (designed for it) | Originally in-memory |
| Overlap | Allows overlap | No overlap |
| Use case | GIS, spatial databases | k-NN search, in-memory |
8. Index Selection and Design Guidelines¶
8.1 Index Selection Problem¶
Choosing the right indices is critical for performance. The index selection problem asks: given a workload (set of queries), which indices should be created?
Factors to consider: 1. Query patterns: Which columns appear in WHERE, JOIN, ORDER BY, GROUP BY? 2. Update frequency: How often are INSERT, UPDATE, DELETE executed? 3. Data distribution: Cardinality and selectivity of each column 4. Storage budget: Each index costs disk space and maintenance overhead 5. Correlation: Columns that are frequently queried together
8.2 Covering Index¶
A covering index includes all columns needed by a query, so the database can answer the query from the index alone without accessing the table (index-only scan).
-- Query:
SELECT name, salary FROM employees WHERE department = 'Eng';
-- Non-covering index (requires table access):
CREATE INDEX idx_dept ON employees(department);
-- Finds matching rows via index, then fetches name, salary from table
-- Covering index (no table access needed):
CREATE INDEX idx_dept_covering ON employees(department, name, salary);
-- All needed columns are in the index
PostgreSQL INCLUDE syntax:
-- INCLUDE columns are stored in the leaf level but not in the search key
CREATE INDEX idx_dept_incl ON employees(department) INCLUDE (name, salary);
-- department is the search key; name and salary are payload only
Benefits: - Eliminates table access (huge I/O savings) - Particularly effective for frequently-run queries
Trade-offs: - Larger index size - More columns to maintain on updates
8.3 Composite (Multi-Column) Index¶
A composite index is built on multiple columns. The column order matters significantly.
CREATE INDEX idx_dept_salary ON employees(department, salary);
This index is useful for:
-- Uses full index (both columns):
SELECT * FROM employees WHERE department = 'Eng' AND salary > 80000;
-- Uses index prefix (department only):
SELECT * FROM employees WHERE department = 'Eng';
-- Does NOT use this index efficiently:
SELECT * FROM employees WHERE salary > 80000; -- salary is not a prefix
Leftmost prefix rule: A composite index on (A, B, C) can be used for queries on:
- A
- A, B
- A, B, C
But NOT efficiently for:
- B alone
- C alone
- B, C
8.4 Partial Index¶
A partial index indexes only a subset of rows, defined by a predicate.
-- Index only active employees (if 90% are inactive, index is much smaller)
CREATE INDEX idx_active_emp ON employees(name)
WHERE status = 'active';
-- Index only recent orders
CREATE INDEX idx_recent_orders ON orders(customer_id)
WHERE order_date > '2025-01-01';
-- Index only non-null values
CREATE INDEX idx_email ON users(email)
WHERE email IS NOT NULL;
Benefits: - Smaller index size (fewer entries) - Faster to maintain (fewer updates) - Better cache utilization
8.5 Expression Index¶
Some databases support indexing computed expressions:
-- PostgreSQL: index on expression
CREATE INDEX idx_lower_email ON users(lower(email));
-- Useful for case-insensitive searches:
SELECT * FROM users WHERE lower(email) = 'alice@example.com';
-- Index on year extracted from timestamp:
CREATE INDEX idx_order_year ON orders(EXTRACT(YEAR FROM order_date));
8.6 Practical Guidelines¶
Rule 1: Index columns in WHERE, JOIN, ORDER BY
-- This query benefits from indices on customer_id, order_date, status
SELECT o.* FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE o.order_date > '2025-01-01'
ORDER BY o.status;
Rule 2: Consider selectivity - High selectivity (few matching rows) = good index candidate - Low selectivity (most rows match) = poor index candidate - Rule of thumb: Index is useful if it selects < 15-20% of rows
Rule 3: Avoid over-indexing
Each index costs:
- Disk space (size of the index structure)
- Insert overhead: ~1 B+Tree insert per index
- Update overhead: delete old + insert new per index
- Vacuum/maintenance overhead
Typical guideline: 3-5 indices per table, rarely more than 10
Rule 4: Use EXPLAIN to verify index usage
EXPLAIN ANALYZE SELECT * FROM employees WHERE department = 'Eng';
-- Look for:
-- "Index Scan" or "Index Only Scan" β index is being used
-- "Seq Scan" β index is NOT being used (or doesn't exist)
-- "Bitmap Index Scan" β bitmap index used for multi-condition queries
Rule 5: Monitor and maintain indices
-- PostgreSQL: check index usage statistics
SELECT schemaname, tablename, indexname, idx_scan, idx_tup_read
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC;
-- Indices with idx_scan = 0 are unused and can be dropped
-- Check index size
SELECT pg_size_pretty(pg_relation_size('idx_emp_salary'));
-- Rebuild fragmented indices
REINDEX INDEX idx_emp_salary;
9. Index Structures in Major Databases¶
PostgreSQL¶
-- B+Tree (default)
CREATE INDEX idx_btree ON table(column);
-- Hash (equality only, WAL-logged since v10)
CREATE INDEX idx_hash ON table USING hash(column);
-- GiST (Generalized Search Tree -- R-Tree, full-text, etc.)
CREATE INDEX idx_gist ON table USING gist(geom_column);
-- GIN (Generalized Inverted Index -- arrays, JSONB, full-text)
CREATE INDEX idx_gin ON table USING gin(jsonb_column);
-- BRIN (Block Range Index -- large sorted tables)
CREATE INDEX idx_brin ON table USING brin(timestamp_column);
-- SP-GiST (Space-Partitioned GiST -- kd-tree, radix tree)
CREATE INDEX idx_spgist ON table USING spgist(point_column);
MySQL InnoDB¶
- Primary key = clustered B+Tree index (data stored in leaf nodes)
- Secondary indices store the primary key value (not a row pointer)
- This means secondary index lookup requires two B+Tree traversals:
1. Secondary index β primary key value
2. Primary index β actual row data
Summary Table¶
| Index Type | Best For | Limitation |
|---|---|---|
| B+Tree | General purpose, range, ORDER BY | Not great for multi-dimension |
| Hash | Exact equality lookup | No range support |
| Bitmap | Low cardinality, analytics | Poor for OLTP, high cardinality |
| GiST/R-Tree | Spatial, geometric data | Overlap can slow search |
| GIN | Full-text, arrays, JSONB | Large index, slow updates |
| BRIN | Very large, naturally ordered tables | Low precision |
10. Exercises¶
Conceptual Questions¶
Exercise 1: Explain why a secondary index must be dense (one entry per record) while a primary index can be sparse (one entry per block).
Exercise 2: A table has 500,000 records stored in blocks of 4 KB each. Each record is 200 bytes. A B+Tree index on the primary key uses 8-byte keys and 8-byte pointers. Each index node is one block (4 KB).
(a) How many records fit in one data block? (b) How many data blocks does the table occupy? (c) What is the maximum fan-out (order) of the B+Tree? (d) What is the maximum height of the B+Tree? (e) How many disk I/Os are needed for an equality search using the index vs. a full table scan?
Exercise 3: Compare B-Tree and B+Tree. Why do virtually all database systems use B+Trees instead of B-Trees?
Exercise 4: A bitmap index is created on a color column with 8 distinct values in a table with 10 million rows.
(a) What is the total uncompressed size of the bitmap index? (b) If only 0.1% of rows have color = 'purple', how compressible is the purple bitmap using run-length encoding? (c) Compare this to a B+Tree index on the same column.
Practical Questions¶
Exercise 5: For the following query workload, design an indexing strategy:
-- Q1 (90% of traffic): exact lookup
SELECT * FROM users WHERE email = ?;
-- Q2 (5% of traffic): range scan with sort
SELECT name, created_at FROM users
WHERE created_at > ? ORDER BY created_at DESC LIMIT 20;
-- Q3 (3% of traffic): multi-column filter
SELECT * FROM users
WHERE country = ? AND age BETWEEN ? AND ?;
-- Q4 (2% of traffic): text search
SELECT * FROM users WHERE name ILIKE '%smith%';
For each query, specify: (a) What index (or indices) would you create? (b) What type of index (B+Tree, hash, GIN, etc.)? (c) Would you use a composite, covering, or partial index?
Exercise 6: Given the following extendible hash directory with global depth 2 and bucket capacity 2:
Directory: Buckets:
00 β Bucket A [h=00110, h=00010] (local depth 2)
01 β Bucket B [h=01100] (local depth 2)
10 β Bucket C [h=10001, h=10110] (local depth 2)
11 β Bucket D [h=11000] (local depth 2)
Show the state of the directory and buckets after inserting a record with hash value h = 00001.
Exercise 7: Consider a linear hashing scheme with 4 initial buckets (N=4), split pointer s=0, and bucket capacity 2. The hash functions are hβ(K) = K mod 4 and hβ(K) = K mod 8. Starting with:
Bucket 0: [8, 16] (full)
Bucket 1: [5]
Bucket 2: [10]
Bucket 3: [7, 15] (full)
Show the state after inserting keys 12, 9, and 3 (one at a time).
Analysis Questions¶
Exercise 8: Prove that the height of a B+Tree with order m and n keys is at most βlog_{βm/2β}(n)β.
Exercise 9: A database administrator notices that a query SELECT COUNT(*) FROM orders WHERE status = 'pending' takes 2 seconds on a table with 50 million rows. The status column has 5 distinct values. Recommend an indexing strategy and estimate the improvement.
Exercise 10: You have a table sensor_data with 1 billion rows, columns (sensor_id, timestamp, value, location_x, location_y), and the following query patterns:
- Time-range queries:
WHERE timestamp BETWEEN ? AND ? - Sensor-specific queries:
WHERE sensor_id = ? AND timestamp BETWEEN ? AND ? - Spatial queries:
WHERE location_x BETWEEN ? AND ? AND location_y BETWEEN ? AND ? - Aggregations:
SELECT sensor_id, AVG(value) FROM ... GROUP BY sensor_id
Design a comprehensive indexing strategy. Consider B+Tree, BRIN, GiST, and composite indices. Justify your choices.
Previous: Query Processing and Optimization | Next: Transaction Theory