1# Compiler Design Examples
2
3This directory contains 10 Python examples demonstrating key compiler design concepts, from lexical analysis through bytecode generation and execution. All examples use only Python's standard library (no external dependencies).
4
5## Files Overview
6
7### 1. `01_lexer.py` - Lexical Analysis (Scanner)
8**Concepts:**
9- Token types and Token representation (dataclass)
10- Regular expressions for token pattern matching
11- Keyword vs identifier recognition
12- String and character literal scanning
13- Line/column tracking for error reporting
14- Whitespace and comment handling
15
16**Run:** `python 01_lexer.py`
17
18---
19
20### 2. `02_thompson_nfa.py` - Thompson's Construction (Regex to NFA)
21**Concepts:**
22- Converting regular expressions to Non-deterministic Finite Automata (NFA)
23- Infix-to-postfix conversion using Shunting Yard algorithm
24- Thompson's construction rules for operators: |, *, +, ?, concatenation
25- NFA state and transition representation
26- Epsilon-closure computation
27- NFA simulation and string acceptance
28
29**Run:** `python 02_thompson_nfa.py`
30
31---
32
33### 3. `03_subset_construction.py` - NFA to DFA Conversion
34**Concepts:**
35- Subset construction (powerset construction) algorithm
36- Converting NFA to equivalent Deterministic Finite Automaton (DFA)
37- Epsilon-closure and move operations
38- DFA state minimization concepts
39- Transition table representation
40- Comparing NFA vs DFA simulation performance
41
42**Run:** `python 03_subset_construction.py`
43
44---
45
46### 4. `04_recursive_descent_parser.py` - Recursive Descent Parser
47**Concepts:**
48- Hand-written recursive descent parsing
49- Abstract Syntax Tree (AST) construction
50- Grammar representation (left-recursion removed)
51- Operator precedence and associativity
52- Control flow statements: if/else, while loops
53- Expression evaluation with operator precedence
54- Error handling and recovery
55
56**Run:** `python 04_recursive_descent_parser.py`
57
58---
59
60### 5. `05_ll1_parser.py` - LL(1) Table-Driven Parser
61**Concepts:**
62- Context-free grammar (CFG) representation
63- FIRST set computation
64- FOLLOW set computation
65- LL(1) parsing table construction
66- Table-driven parsing with explicit stack
67- Conflict detection in parsing tables
68- Parsing ambiguous vs unambiguous grammars
69
70**Run:** `python 05_ll1_parser.py`
71
72---
73
74### 6. `06_ast_visitor.py` - AST and Visitor Pattern
75**Concepts:**
76- Abstract Syntax Tree node classes (dataclass-based)
77- Visitor design pattern for tree traversal
78- Multiple operations on same AST: evaluation, type checking, pretty-printing
79- Type inference for literals and operators
80- Expression evaluation with different data types
81- Separation of structure from operations
82
83**Run:** `python 06_ast_visitor.py`
84
85---
86
87### 7. `07_type_checker.py` - Semantic Analysis and Type Checking
88**Concepts:**
89- Symbol table management with nested scopes
90- Variable declaration and scope resolution
91- Function definitions with typed parameters and return types
92- Type compatibility checking for assignments
93- Function call validation (arity and argument types)
94- Return type matching
95- Error detection and reporting
96
97**Run:** `python 07_type_checker.py`
98
99---
100
101### 8. `08_three_address_code.py` - Intermediate Code Generation
102**Concepts:**
103- Three-Address Code (TAC) representation
104- Control Flow Graph (CFG) construction
105- TAC instruction types: ASSIGN, BINOP, UNARY, LABEL, JUMP, CJUMP
106- Converting AST to intermediate representation
107- Basic block identification
108- Lowering high-level constructs to TAC
109
110**Run:** `python 08_three_address_code.py`
111
112---
113
114### 9. `09_optimizer.py` - Local Optimizations
115**Concepts:**
116- Constant folding at compile time
117- Constant propagation through basic blocks
118- Algebraic simplification (identities like x*1=x, x+0=x)
119- Dead code elimination
120- Copy propagation
121- Redundant computation elimination
122- Data flow analysis and reaching definitions
123
124**Run:** `python 09_optimizer.py`
125
126---
127
128### 10. `10_bytecode_vm.py` - Bytecode Compiler and Virtual Machine
129**Concepts:**
130- Bytecode instruction set design
131- Bytecode compilation from AST
132- Stack-based virtual machine architecture
133- Operand stack and call stack (frames)
134- Local variable storage and function calls
135- Constant pool for literals
136- Instruction execution and stack machine simulation
137- Function parameters and return values
138
139**Run:** `python 10_bytecode_vm.py`
140
141---
142
143## Requirements
144
145- Python 3.8 or higher
146- Standard library only (no external dependencies)
147
148## Usage
149
150Each file is self-contained and can be run independently:
151
152```bash
153# Run a specific example
154python 01_lexer.py
155
156# Or run all examples sequentially
157for f in *.py; do echo "=== $f ==="; python "$f"; echo; done
158```
159
160## Learning Path
161
162Recommended order for learning the compilation process:
163
1641. **Frontend (Analysis):**
165 - 01_lexer.py (lexical analysis)
166 - 02_thompson_nfa.py → 03_subset_construction.py (automata theory)
167 - 04_recursive_descent_parser.py → 05_ll1_parser.py (parsing)
168 - 06_ast_visitor.py (AST representation)
169 - 07_type_checker.py (semantic analysis)
170
1712. **Middle-end (Intermediate Code):**
172 - 08_three_address_code.py (TAC generation)
173 - 09_optimizer.py (code optimization)
174
1753. **Backend (Code Generation):**
176 - 10_bytecode_vm.py (bytecode and execution)
177
178## Key Takeaways
179
180- **Lexical Analysis** tokenizes source code and handles the lexicon of a language
181- **Parsing** builds a hierarchical tree structure (AST) from a flat token stream
182- **Two approaches:** Recursive descent (top-down) and table-driven (bottom-up via LR/LL)
183- **Automata theory** (NFA/DFA) provides theoretical foundation for lexer design
184- **Semantic analysis** ensures type safety and correct variable scoping
185- **Intermediate representation** (TAC) is language and machine-independent
186- **Optimization** improves performance by eliminating redundant/unnecessary code
187- **Code generation** translates optimized IR into executable bytecode or native code
188- **Virtual machines** provide a portable execution platform independent of host CPU
189
190## Additional Resources
191
192- *Compilers: Principles, Techniques, and Tools* (Dragon Book) by Aho, Lam, Sethi, Ullman
193- *Crafting Interpreters* by Robert Nystrom (free online: https://craftinginterpreters.com/)
194- *Engineering a Compiler* by Cooper and Torczon
195- *Essentials of Compilation* by Siek (free lectures: https://www.youtube.com/playlist?list=PLOPRGayY_TV-cxQSuqklRqYfWgAbQv25x)