File System Implementation ββββ
File System Implementation ββββ¶
Overview¶
Learn how file systems store and manage data on disks. Covers key concepts needed for actual implementation including block allocation methods, inode structure, journaling, and RAID.
Table of Contents¶
- File System Structure
- Disk Block Allocation
- inode Structure
- Directory Implementation
- File System Examples
- Journaling
- RAID
- Practice Problems
1. File System Structure¶
1.1 Disk Layout¶
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Typical File System Layout β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β Entire Disk β β
β ββββββββ¬βββββββ¬βββββββ¬βββββββ¬βββββββ¬ββββββββββββββββββββββββββββββββ€ β
β β MBR βPartitionβPartitionβPartitionβ β β β
β β β 1 β 2 β 3 β ... β Free space β β
β ββββββββ΄βββββββ΄βββββββ΄βββββββ΄βββββββ΄ββββββββββββββββββββββββββββββββ β
β β β
β βΌ β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β Partition Structure (ext4) β β
β ββββββββ¬βββββββββββ¬βββββββββββ¬βββββββββββββββββββββββββββββββββββββ€ β
β β Boot β Super β Group β Block Groups β β
β βBlock β Block βDescriptorβ 0 β 1 β 2 β ... β β
β ββββββββ΄βββββββββββ΄βββββββββββ΄βββββββββββββββββββββββββββββββββββββ β
β β β
β βΌ β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β Block Group Structure β β
β ββββββββββββ¬βββββββββββ¬βββββββββββ¬βββββββββββββββββββββββββββββββββββ€ β
β β Block β inode β inode β Data Blocks β β
β β Bitmap β Bitmap β Table β β β
β ββββββββββββ΄βββββββββββ΄βββββββββββ΄βββββββββββββββββββββββββββββββββββ β
β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
1.2 Major Components¶
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β File System Components β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β ββββββββββββββββββ¬βββββββββββββββββββββββββββββββββββββββββββββββββ β
β β Component β Role β β
β ββββββββββββββββββΌβββββββββββββββββββββββββββββββββββββββββββββββββ€ β
β β Boot Block β Bootstrap code (OS loading) β β
β β β Block 0 of partition β β
β ββββββββββββββββββΌβββββββββββββββββββββββββββββββββββββββββββββββββ€ β
β β Superblock β File system metadata β β
β β β - Block size, total blocks/inodes β β
β β β - Free blocks/inodes count β β
β β β - Mount count, last check time β β
β ββββββββββββββββββΌβββββββββββββββββββββββββββββββββββββββββββββββββ€ β
β β Block Bitmap β Usage status of each block (0/1) β β
β β β Fast free block search β β
β ββββββββββββββββββΌβββββββββββββββββββββββββββββββββββββββββββββββββ€ β
β β inode Bitmap β Usage status of each inode (0/1) β β
β ββββββββββββββββββΌβββββββββββββββββββββββββββββββββββββββββββββββββ€ β
β β inode Table β Stores all inodes β β
β β β Core of file metadata β β
β ββββββββββββββββββΌβββββββββββββββββββββββββββββββββββββββββββββββββ€ β
β β Data Blocks β Actual file/directory content β β
β ββββββββββββββββββ΄βββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
2. Disk Block Allocation¶
2.1 Contiguous Allocation¶
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Contiguous Allocation β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β Directory Entry: β
β βββββββββββββ¬ββββββββββ¬βββββββββ β
β β Filename βStart Blockβ Lengthβ β
β βββββββββββββΌββββββββββΌβββββββββ€ β
β β file_a β 0 β 3 β β
β β file_b β 6 β 2 β β
β β file_c β 10 β 4 β β
β βββββββββββββ΄ββββββββββ΄βββββββββ β
β β
β Disk Blocks: β
β βββββ¬ββββ¬ββββ¬ββββ¬ββββ¬ββββ¬ββββ¬ββββ¬ββββ¬ββββ¬ββββ¬ββββ¬ββββ¬ββββ¬ββββ β
β β A β A β A β β β β B β B β β β C β C β C β C β β β
β βββββ΄ββββ΄ββββ΄ββββ΄ββββ΄ββββ΄ββββ΄ββββ΄ββββ΄ββββ΄ββββ΄ββββ΄ββββ΄ββββ΄ββββ β
β 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 β
β β
β Advantages: β
β - Fast sequential access (minimal disk head movement) β
β - Easy direct access: block n = start + n β
β β
β Disadvantages: β
β - External fragmentation (blocks 3-5, 8-9 wasted) β
β - Difficult to grow files β
β - Must know size in advance β
β β
β Used in: CD-ROM, DVD (read-only) β
β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
2.2 Linked Allocation¶
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Linked Allocation β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β Directory Entry: β
β βββββββββββββ¬ββββββββββ¬ββββββββββ β
β β Filename βStart BlockβEnd Blockβ β
β βββββββββββββΌββββββββββΌββββββββββ€ β
β β file_a β 0 β 12 β β
β βββββββββββββ΄ββββββββββ΄ββββββββββ β
β β
β Disk Blocks (each block includes next block pointer): β
β β
β βββββββββββββ βββββββββββββ βββββββββββββ βββββββββββββ β
β β Block 0 β β Block 5 β β Block 8 β β Block 12 β β
β β Data ββββΆβ Data ββββΆβ Data ββββΆβ Data β β
β β Next: 5 β β Next: 8 β β Next: 12 β β Next: -1 β β
β βββββββββββββ βββββββββββββ βββββββββββββ βββββββββββββ β
β 0 5 8 12 β
β β
β Advantages: β
β - No external fragmentation β
β - Files can grow dynamically β
β β
β Disadvantages: β
β - Inefficient direct access (must follow chain n times for block n) β
β - Pointer space overhead (4 bytes per block) β
β - Reliability issues (file lost if pointer corrupted) β
β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
2.3 FAT (File Allocation Table)¶
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β FAT Structure β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β FAT (cached in memory) Disk Blocks β
β βββββββββββββββββββ βββββββββββββββββββββββββββββββββββββ β
β βIndexβNext Block β β β β
β βββββββΌβββββββββββ€ β 0 β 1 β 2 β 3 β 4 β ... β β
β β 0 β 5 ββββββββββββββΌβ A ββΌββββββΌβ B ββΌββββββΌββββββΌββββββ€ β
β β 1 β FREE β β β β β β β β β
β β 2 β 4 ββββββββββββββΌββββββΌββββββΌββββββΌββββββΌβ B ββΌββββββ€ β
β β 3 β FREE β β β β
β β 4 β EOF ββββββββββββββΌββββββΌβ A ββΌββββββΌββββββΌββββββΌββββββ€ β
β β 5 β 7 β β β β
β β 6 β FREE β β β β
β β 7 β EOF β βββββββββββββββββββββββββββββββββββββ β
β βββββββ΄βββββββββββ β
β β
β file_a: 0 β 5 β 7 β EOF β
β file_b: 2 β 4 β EOF β
β β
β Advantages: β
β - Fast direct access if FAT in memory β
β - Pointers outside data blocks improve reliability β
β β
β Disadvantages: β
β - FAT can be large (for large disks) β
β - Entire file system problems if FAT corrupted β
β β
β Used in: MS-DOS, Windows (FAT32), USB drives β
β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
2.4 Indexed Allocation¶
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Indexed Allocation β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β Directory Entry: β
β βββββββββββββ¬ββββββββββββ β
β β Filename βIndex Blockβ β
β βββββββββββββΌββββββββββββ€ β
β β file_a β 8 β β
β βββββββββββββ΄ββββββββββββ β
β β β
β βΌ β
β Index Block (block 8): Data Blocks: β
β βββββββββββββββββββββ βββββββββββββββββββββββββββββββββββββ β
β β [0] β Block 3 β β β β
β β [1] β Block 10 β β 3 β 10 β 15 β 21 β ... β β
β β [2] β Block 15 β βData βData βData βData β β β
β β [3] β Block 21 β β β β β β β β
β β [4] β -1 (end) β βββββββββββββββββββββββββββββββββββββ β
β β ... β β
β βββββββββββββββββββββ β
β β
β Advantages: β
β - Direct access possible: block n = index[n] β
β - No external fragmentation β
β β
β Disadvantages: β
β - Index block overhead β
β - Inefficient for small files β
β β
β Extensions: β
β - Linked index: link index blocks β
β - Multi-level index: tree structure (Unix inode) β
β - Combined approach: direct + indirect pointers (Unix inode) β
β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
3. inode Structure¶
3.1 Unix/Linux inode¶
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β inode Structure (Unix/ext4) β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β inode β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€ β
β β mode (permissions, type) β β
β β uid (owner) β β
β β gid (group) β β
β β size (file size) β β
β β atime (access time) β β
β β mtime (modification time) β β
β β ctime (status change time) β β
β β link count (hard link count) β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€ β
β β Block pointers (data location) β β
β β β β
β β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β
β β β Direct pointers [0-11] β Data blocks (12 blocks) β β β
β β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€ β β
β β β Single indirect [12] β Index block β Data β β β
β β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€ β β
β β β Double indirect [13] β Index β Index β Data β β β
β β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€ β β
β β β Triple indirect [14] β Index β Index β Index β Data β β β
β β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
3.2 Direct/Indirect Block Pointers¶
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β inode Block Pointer Structure β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β Block size = 4KB, Pointer size = 4 bytes β
β Pointers/block = 4KB / 4B = 1024 β
β β
β inode β
β βββββββββββββββββ β
β β Direct [0] ββββββββββββββββββββββββββββββββββββββΆ [Data block] β
β β Direct [1] ββββββββββββββββββββββββββββββββββββββΆ [Data block] β
β β ... β β
β β Direct [11] ββββββββββββββββββββββββββββββββββββββΆ [Data block] β
β βββββββββββββββββ€ β
β β Single indirect[12]ββββΆββββββββββββ β
β β β βIndex blockββββΆ [Data] Γ 1024 β
β β β ββββββββββββ β
β βββββββββββββββββ€ β
β β Double indirect[13]ββββΆββββββββββββ ββββββββββββ β
β β β βIndex blockββββΆβIndex blockββββΆ [Data] Γ 1024 β
β β β β Γ 1024 β ββββββββββββ Γ 1024 β
β β β ββββββββββββ β
β βββββββββββββββββ€ β
β β Triple indirect[14]ββββΆ (3-level index) β
β βββββββββββββββββ β
β β
β Maximum file size calculation (4KB blocks): β
β Direct: 12 Γ 4KB = 48KB β
β Single indirect: 1024 Γ 4KB = 4MB β
β Double indirect: 1024 Γ 1024 Γ 4KB = 4GB β
β Triple indirect: 1024 Γ 1024 Γ 1024 Γ 4KB = 4TB β
β Total: ~4TB β
β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
3.3 inode Size Calculation Example¶
// Find nth block location of file
#define BLOCK_SIZE 4096
#define PTRS_PER_BLOCK (BLOCK_SIZE / sizeof(uint32_t)) // 1024
#define DIRECT_BLOCKS 12
#define SINGLE_INDIRECT_LIMIT (DIRECT_BLOCKS + PTRS_PER_BLOCK)
#define DOUBLE_INDIRECT_LIMIT (SINGLE_INDIRECT_LIMIT + PTRS_PER_BLOCK * PTRS_PER_BLOCK)
uint32_t get_block_number(inode_t* inode, uint32_t logical_block) {
if (logical_block < DIRECT_BLOCKS) {
// Direct block
return inode->direct[logical_block];
}
logical_block -= DIRECT_BLOCKS;
if (logical_block < PTRS_PER_BLOCK) {
// Single indirect
uint32_t* indirect = read_block(inode->single_indirect);
return indirect[logical_block];
}
logical_block -= PTRS_PER_BLOCK;
if (logical_block < PTRS_PER_BLOCK * PTRS_PER_BLOCK) {
// Double indirect
uint32_t index1 = logical_block / PTRS_PER_BLOCK;
uint32_t index2 = logical_block % PTRS_PER_BLOCK;
uint32_t* level1 = read_block(inode->double_indirect);
uint32_t* level2 = read_block(level1[index1]);
return level2[index2];
}
// Triple indirect (omitted)
return 0;
}
4. Directory Implementation¶
4.1 Directory Entry Structure¶
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Directory Implementation β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β Linear List (simple implementation): β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β Directory file content β β
β ββββββββββββββ¬βββββββββββββββββ¬βββββββββββββββββββββββββββββββββββββββ€ β
β β inode num β Entry length β Filename β β
β ββββββββββββββΌβββββββββββββββββΌβββββββββββββββββββββββββββββββββββββββ€ β
β β 2 β 12 β . β β
β β 2 β 12 β .. β β
β β 15 β 16 β file1.txt β β
β β 23 β 20 β documents β β
β β 45 β 24 β long_filename.doc β β
β ββββββββββββββ΄βββββββββββββββββ΄βββββββββββββββββββββββββββββββββββββββ β
β β
β ext4 directory entry (struct ext4_dir_entry_2): β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β inode (4 bytes) : inode number β β
β β rec_len (2 bytes) : total length of this entry β β
β β name_len (1 byte) : filename length β β
β β file_type (1 byte) : file type (file/dir/link/etc) β β
β β name (variable) : filename β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
4.2 Hash Table Directory¶
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Hash Table Directory β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β Filename β Hash function β Bucket index β
β β
β Hash Table β
β βββββββββββ β
β β 0 β β [readme.txt, inode 15] β
β βββββββββββ€ β
β β 1 β β [config.json, inode 23] β [data.csv, inode 45] β
β βββββββββββ€ β
β β 2 β β (empty) β
β βββββββββββ€ β
β β 3 β β [main.c, inode 67] β [test.c, inode 89] β
β βββββββββββ€ β
β β ... β β
β βββββββββββ β
β β
β Advantages: β
β - O(1) average search (linear list is O(n)) β
β - Effective for large directories β
β β
β ext4's htree (HTree Index): β
β - B-tree based hash directory β
β - Fast search even with thousands of files β
β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
5. File System Examples¶
5.1 FAT32¶
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β FAT32 Structure β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β Reserved β FAT β FAT β Data Region β β
β β Region β 1 β 2 β (Clusters) β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
β Features: β
β - Cluster size: 4KB ~ 32KB β
β - FAT entry: 32 bits (28 bits used) β
β - Max file size: 4GB - 1 (32-bit size field) β
β - Max volume size: 2TB β
β β
β Directory entry (32 bytes): β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β Filename (8 bytes) + Extension (3 bytes) = 8.3 format β β
β β Attributes (1 byte): read-only, hidden, system, directory etcβ β
β β Creation/modification/access times (10 bytes) β β
β β Starting cluster number (4 bytes) β β
β β File size (4 bytes) β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
β VFAT: Long filename support (uses separate entries) β
β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
5.2 ext4¶
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β ext4 Key Features β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β 1. Extents β
β Represent consecutive blocks as single entry β Efficient for large filesβ
β ββββββββββββββββββββββββββββββββββββββββββββββββ β
β β Starting block: 1000, Length: 500 β β
β β β References blocks 1000~1499 at once β β
β ββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
β 2. Journaling β
β - Metadata journaling: Default mode, fast β
β - Data journaling: Journals data too, safe β
β β
β 3. Large Volume Support β
β - Max file size: 16TB β
β - Max volume size: 1EB (Exabyte) β
β β
β 4. Other Features β
β - Delayed allocation: Write optimization β
β - Multi-block allocation: Allocate multiple blocks at once β
β - Directory htree: Fast search β
β - Online defragmentation β
β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
6. Journaling¶
6.1 Need for Journaling¶
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Journaling Concept β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β Problem: File system inconsistency β
β β
β File deletion process (without journaling): β
β 1. Remove entry from directory β
β 2. Add inode to free list β Crash here! β
β 3. Add data blocks to free list β
β β
β Problems after crash: β
β - Deleted from directory β
β - But inode and blocks still marked 'in use' β
β - Space inaccessible + unrecoverable β
β β
β Solution: Journaling β
β Record to log (journal) before change β Recover based on log after crashβ
β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
6.2 Journaling Operation¶
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Journaling Operation β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β Journal Area β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€ β
β β TxB β Data blocks β Metadata β TxE β TxB β ... β TxE β ... β β
β ββββ¬βββ΄ββββββββββββ΄βββββββββββββ΄βββ¬βββ β β
β β β β β
β β Transaction 1 β Transaction 2 β β
β β β β β
β TxB: Transaction Begin β β
β TxE: Transaction End (Commit) β β
β β
β Write process: β
β 1. Record Transaction Begin β
β 2. Record blocks to be changed in journal β
β 3. Record Transaction End (commit) β
β 4. Write data to actual location (Checkpoint) β
β 5. Remove transaction from journal β
β β
β Recovery process (at boot): β
β 1. Examine journal β
β 2. Completed transactions (has TxE): rewrite to actual location β
β 3. Incomplete transactions (no TxE): ignore (rollback) β
β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
6.3 Journal Modes¶
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β ext4 Journal Modes β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β βββββββββββββββββ¬βββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β Mode β Description β β
β βββββββββββββββββΌβββββββββββββββββββββββββββββββββββββββββββββββββββ€ β
β β journal β Journal both data + metadata β β
β β β Safest but slowest (2x writes) β β
β βββββββββββββββββΌβββββββββββββββββββββββββββββββββββββββββββββββββββ€ β
β β ordered β Journal metadata only β β
β β (default) β Write data first, then journal metadata β β
β β β Guarantees data consistency, reasonable performanceβ β
β βββββββββββββββββΌβββββββββββββββββββββββββββββββββββββββββββββββββββ€ β
β β writeback β Journal metadata only β β
β β β No guarantee on data write order β β
β β β Fastest but can lose data on crash β β
β βββββββββββββββββ΄βββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
β # Specify option at mount β
β $ mount -o data=ordered /dev/sda1 /mnt β
β $ mount -o data=journal /dev/sda1 /mnt β
β $ mount -o data=writeback /dev/sda1 /mnt β
β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
7. RAID¶
7.1 RAID Overview¶
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β RAID Overview β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β RAID = Redundant Array of Independent Disks β
β β
β Goals: β
β 1. Performance improvement (Striping - data distribution) β
β 2. Reliability improvement (Mirroring/Parity - redundant storage) β
β 3. Large capacity (combine multiple disks) β
β β
β Implementation methods: β
β - Hardware RAID: RAID controller card β
β - Software RAID: Implemented in OS (mdadm) β
β - Hybrid: Motherboard built-in RAID β
β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
7.2 RAID Levels¶
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β RAID 0 (Striping) β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β Data: A B C D E F G H β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β β
β βββββββββββββββββΌββββββββββββββββ β
β βΌ βΌ βΌ β
β βββββββββββ βββββββββββ βββββββββββ β
β β Disk 0 β β Disk 1 β β Disk 2 β β
β β A D G β β B E H β β C F β β
β βββββββββββ βββββββββββ βββββββββββ β
β β
β Features: β
β - Data distributed across all disks (striping) β
β - Read/write speed: nΓ improvement β
β - Capacity: disk count Γ single capacity β
β - No redundancy: one disk failure β total data loss β
β - Use case: Performance critical, data loss acceptable β
β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β RAID 1 (Mirroring) β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β Data: A B C D β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β β
β βββββββββββββββββ΄ββββββββββββββββ β
β βΌ βΌ β
β βββββββββββ βββββββββββ β
β β Disk 0 β β Disk 1 β β
β β A B C D β βββ Identical βββΆ β A B C D β β
β β(Original)β β (Mirror)β β
β βββββββββββ βββββββββββ β
β β
β Features: β
β - Perfect copy (mirroring) β
β - Read speed: 2Γ (parallel read from both disks) β
β - Write speed: Same (must write to both) β
β - Capacity: Single disk capacity (50% efficiency) β
β - One disk failure preserves data β
β - Use case: Critical data, system drives β
β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β RAID 5 (Striping with Parity) β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β βββββββββββββββ¬ββββββββββββββ¬ββββββββββββββ¬ββββββββββββββ β
β β Disk 0 β Disk 1 β Disk 2 β Disk 3 β β
β βββββββββββββββΌββββββββββββββΌββββββββββββββΌββββββββββββββ€ β
β β A1 β A2 β A3 β Ap (parity)β β
β βββββββββββββββΌββββββββββββββΌββββββββββββββΌββββββββββββββ€ β
β β B1 β B2 β Bp (parity)β B3 β β
β βββββββββββββββΌββββββββββββββΌββββββββββββββΌββββββββββββββ€ β
β β C1 β Cp (parity)β C2 β C3 β β
β βββββββββββββββΌββββββββββββββΌββββββββββββββΌββββββββββββββ€ β
β β Dp (parity)β D1 β D2 β D3 β β
β βββββββββββββββ΄ββββββββββββββ΄ββββββββββββββ΄ββββββββββββββ β
β β
β Parity calculation: Ap = A1 XOR A2 XOR A3 β
β Recovery: A1 = Ap XOR A2 XOR A3 (if A1 lost) β
β β
β Features: β
β - Parity distributed (RAID 4 uses dedicated disk) β
β - Capacity: (n-1) Γ single capacity β
β - Tolerates 1 disk failure β
β - Improved read speed, write slightly slower (parity calculation) β
β - Use case: Most servers β
β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β RAID 6 (Dual Parity) β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β βββββββββββββββ¬ββββββββββββββ¬ββββββββββββββ¬ββββββββββββββ β
β β Disk 0 β Disk 1 β Disk 2 β Disk 3 β β
β βββββββββββββββΌββββββββββββββΌββββββββββββββΌββββββββββββββ€ β
β β A1 β A2 β Ap (P) β Aq (Q) β β
β βββββββββββββββΌββββββββββββββΌββββββββββββββΌββββββββββββββ€ β
β β B1 β Bp (P) β Bq (Q) β B2 β β
β βββββββββββββββ΄ββββββββββββββ΄ββββββββββββββ΄ββββββββββββββ β
β β
β Features: β
β - 2 parities (P, Q) - different algorithms β
β - Tolerates 2 simultaneous disk failures β
β - Capacity: (n-2) Γ single capacity β
β - Requires minimum 4 disks β
β - Safer than RAID 5, lower write performance β
β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β RAID 10 (1+0, Mirror + Stripe) β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β Data: A B C D β
β β β
β βββββββββββββββββ΄ββββββββββββββββ β
β βΌ βΌ β
β βββββββββββββββ βββββββββββββββ β
β β Stripe 0 β β Stripe 1 β β
β β A C β β B D β β
β ββββββββ¬βββββββ ββββββββ¬βββββββ β
β β β β
β ββββββββ΄βββββββ ββββββββ΄βββββββ β
β βΌ βΌ βΌ βΌ β
β βββββββββββ βββββββββββ βββββββββββ βββββββββββ β
β β Disk 0 β β Disk 1 β β Disk 2 β β Disk 3 β β
β β A C β β A C β β B D β β B D β β
β β(Original)β β (Mirror)β β(Original)β β (Mirror)β β
β βββββββββββ βββββββββββ βββββββββββ βββββββββββ β
β β
β Features: β
β - Combines RAID 1 (mirror) + RAID 0 (stripe) β
β - Performance: Close to RAID 0 β
β - Reliability: Close to RAID 1 β
β - Capacity: 50% efficiency β
β - Tolerates 1 disk per mirror pair (up to 2 disks) β
β - Use case: High performance + high reliability needed β
β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
7.3 RAID Level Comparison¶
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β RAID Level Comparison β
ββββββββββββ¬βββββββββββ¬βββββββββββ¬βββββββββββ¬βββββββββββ¬ββββββββββββββββββββββββββ€
β RAID β Min β Capacity β Read β Write β Failure Tolerance β
β Level β Disks β Efficiencyβ Speed β Speed β β
ββββββββββββΌβββββββββββΌβββββββββββΌβββββββββββΌβββββββββββΌββββββββββββββββββββββββββ€
β RAID 0 β 2 β 100% β nΓ β nΓ β 0 disks (dangerous) β
ββββββββββββΌβββββββββββΌβββββββββββΌβββββββββββΌβββββββββββΌββββββββββββββββββββββββββ€
β RAID 1 β 2 β 50% β 2Γ β 1Γ β 1 disk β
ββββββββββββΌβββββββββββΌβββββββββββΌβββββββββββΌβββββββββββΌββββββββββββββββββββββββββ€
β RAID 5 β 3 β (n-1)/n β Fast β Medium β 1 disk β
ββββββββββββΌβββββββββββΌβββββββββββΌβββββββββββΌβββββββββββΌββββββββββββββββββββββββββ€
β RAID 6 β 4 β (n-2)/n β Fast β Slow β 2 disks β
ββββββββββββΌβββββββββββΌβββββββββββΌβββββββββββΌβββββββββββΌββββββββββββββββββββββββββ€
β RAID 10 β 4 β 50% β Fast β Fast β 1 per pair β
ββββββββββββ΄βββββββββββ΄βββββββββββ΄βββββββββββ΄βββββββββββ΄ββββββββββββββββββββββββββ
Practice Problems¶
Problem 1: inode Block Pointers¶
In a Unix file system with 4KB blocks and 4-byte pointers, how many indirect blocks are needed to store a 100MB file?
Show Answer
Block size: 4KB = 4096 bytes
Pointers/block: 4096 / 4 = 1024
File size: 100MB = 102,400KB = 25,600 blocks
Direct blocks: 12 β 12 blocks covered
Single indirect: 1024 β 1024 blocks covered
Double indirect: 1024 Γ 1024 = 1,048,576 blocks covered
Required blocks: 25,600
Direct: 12 used
Single indirect: 1024 used (1 indirect block)
Remaining blocks: 25,600 - 12 - 1024 = 24,564
Double indirect:
- Level-1 indirect blocks: ceil(24564 / 1024) = 24
- Level-2 indirect block: 1
Total indirect blocks:
- Single indirect: 1
- Double indirect level-1: 1
- Double indirect level-2: 24
Total: 26
Problem 2: FAT Chain¶
Given the following FAT table, list the cluster chain for file A (start: 3).
| Cluster | Value |
|---|---|
| 0 | FREE |
| 1 | 8 |
| 2 | FREE |
| 3 | 7 |
| 4 | EOF |
| 5 | 1 |
| 6 | FREE |
| 7 | 4 |
| 8 | EOF |
Show Answer
File A starting: cluster 3
Track chain:
3 β FAT[3]=7 β FAT[7]=4 β FAT[4]=EOF
File A cluster chain: 3 β 7 β 4 β EOF
File size: 3 clusters
Note: another file starting at cluster 5:
5 β FAT[5]=1 β FAT[1]=8 β FAT[8]=EOF
Chain: 5 β 1 β 8 β EOF
Problem 3: Journaling Recovery¶
After system crash, the journal has the following state. What is the file system state after recovery?
Journal contents:
[TxB_1] [Block 100: DataA] [Block 101: MetadataA] [TxE_1]
[TxB_2] [Block 200: DataB] [Block 201: MetadataB]
(No TxE_2)
Show Answer
Recovery process:
1. Check Transaction 1:
- Both TxB_1 and TxE_1 exist
- Completed transaction
- Rewrite blocks 100, 101 to actual location (redo)
2. Check Transaction 2:
- Only TxB_2, no TxE_2
- Incomplete transaction
- Ignore this transaction (undo)
- Do not apply blocks 200, 201 changes
Result:
- Transaction 1 changes: Applied (file A changes complete)
- Transaction 2 changes: Not applied (no changes to file B)
File system is in consistent state with Transaction 1 only
Problem 4: RAID Capacity Calculation¶
Calculate usable capacity when configuring the following RAID with 6Γ 2TB disks:
- RAID 0
- RAID 1
- RAID 5
- RAID 6
- RAID 10
Show Answer
Disks: 6 Γ 2TB = 12TB total capacity
1. RAID 0: 6 Γ 2TB = 12TB (100%)
- Use all space
- No redundancy
2. RAID 1: 2 Γ 2TB = 4TB (3 pairs in mirror)
OR simple 2-disk mirror: 2TB
- 6 disks as 3 pairs, mirroring β each pair 2TB
- Total: 3 Γ 2TB = 6TB (50%)
3. RAID 5: (6-1) Γ 2TB = 10TB (83%)
- 1 disk worth for parity
4. RAID 6: (6-2) Γ 2TB = 8TB (67%)
- 2 disks worth for parity
5. RAID 10: 6/2 Γ 2TB = 6TB (50%)
- 3 stripes, each mirrored
- OR: (disk count / 2) Γ single capacity
Problem 5: File Deletion Process¶
Explain the process when deleting /home/user/file.txt in Unix file system from inode and block perspective.
Show Answer
Delete command: rm /home/user/file.txt
1. Path resolution:
- Find /home directory's inode
- Find /home/user directory's inode
- Find file.txt's inode number (e.g., inode 12345)
2. Permission check:
- Check write permission on directory (/home/user)
- Check if file is write-locked
3. Remove directory entry:
- Delete "file.txt" entry from /home/user directory file
- Adjust rec_len to merge space
4. Decrease inode link count:
- inode 12345's link_count--
- If link_count > 0, stop here (other links exist)
5. If link_count == 0 and no open files:
- Free inode 12345 in inode bitmap (mark as 0)
- Check inode's data block pointers
- Free all data blocks in block bitmap
- Free indirect blocks too
6. Update superblock:
- Increase free inode count
- Increase free block count
Note: Actual data not erased
- Blocks just marked 'available'
- Recoverable until overwritten with new data
Next Steps¶
Continue to 18_IO_and_IPC.md to learn I/O systems and inter-process communication!
References¶
- Silberschatz, "Operating System Concepts" Chapters 14-15
- Linux kernel source:
fs/ext4/,fs/fat/ - ext4 documentation: https://www.kernel.org/doc/html/latest/filesystems/ext4/
- RAID fundamentals: https://en.wikipedia.org/wiki/RAID