README.md

Download
markdown 196 lines 6.2 KB
  1# Compiler Design Examples
  2
  3This directory contains 10 Python examples demonstrating key compiler design concepts, from lexical analysis through bytecode generation and execution. All examples use only Python's standard library (no external dependencies).
  4
  5## Files Overview
  6
  7### 1. `01_lexer.py` - Lexical Analysis (Scanner)
  8**Concepts:**
  9- Token types and Token representation (dataclass)
 10- Regular expressions for token pattern matching
 11- Keyword vs identifier recognition
 12- String and character literal scanning
 13- Line/column tracking for error reporting
 14- Whitespace and comment handling
 15
 16**Run:** `python 01_lexer.py`
 17
 18---
 19
 20### 2. `02_thompson_nfa.py` - Thompson's Construction (Regex to NFA)
 21**Concepts:**
 22- Converting regular expressions to Non-deterministic Finite Automata (NFA)
 23- Infix-to-postfix conversion using Shunting Yard algorithm
 24- Thompson's construction rules for operators: |, *, +, ?, concatenation
 25- NFA state and transition representation
 26- Epsilon-closure computation
 27- NFA simulation and string acceptance
 28
 29**Run:** `python 02_thompson_nfa.py`
 30
 31---
 32
 33### 3. `03_subset_construction.py` - NFA to DFA Conversion
 34**Concepts:**
 35- Subset construction (powerset construction) algorithm
 36- Converting NFA to equivalent Deterministic Finite Automaton (DFA)
 37- Epsilon-closure and move operations
 38- DFA state minimization concepts
 39- Transition table representation
 40- Comparing NFA vs DFA simulation performance
 41
 42**Run:** `python 03_subset_construction.py`
 43
 44---
 45
 46### 4. `04_recursive_descent_parser.py` - Recursive Descent Parser
 47**Concepts:**
 48- Hand-written recursive descent parsing
 49- Abstract Syntax Tree (AST) construction
 50- Grammar representation (left-recursion removed)
 51- Operator precedence and associativity
 52- Control flow statements: if/else, while loops
 53- Expression evaluation with operator precedence
 54- Error handling and recovery
 55
 56**Run:** `python 04_recursive_descent_parser.py`
 57
 58---
 59
 60### 5. `05_ll1_parser.py` - LL(1) Table-Driven Parser
 61**Concepts:**
 62- Context-free grammar (CFG) representation
 63- FIRST set computation
 64- FOLLOW set computation
 65- LL(1) parsing table construction
 66- Table-driven parsing with explicit stack
 67- Conflict detection in parsing tables
 68- Parsing ambiguous vs unambiguous grammars
 69
 70**Run:** `python 05_ll1_parser.py`
 71
 72---
 73
 74### 6. `06_ast_visitor.py` - AST and Visitor Pattern
 75**Concepts:**
 76- Abstract Syntax Tree node classes (dataclass-based)
 77- Visitor design pattern for tree traversal
 78- Multiple operations on same AST: evaluation, type checking, pretty-printing
 79- Type inference for literals and operators
 80- Expression evaluation with different data types
 81- Separation of structure from operations
 82
 83**Run:** `python 06_ast_visitor.py`
 84
 85---
 86
 87### 7. `07_type_checker.py` - Semantic Analysis and Type Checking
 88**Concepts:**
 89- Symbol table management with nested scopes
 90- Variable declaration and scope resolution
 91- Function definitions with typed parameters and return types
 92- Type compatibility checking for assignments
 93- Function call validation (arity and argument types)
 94- Return type matching
 95- Error detection and reporting
 96
 97**Run:** `python 07_type_checker.py`
 98
 99---
100
101### 8. `08_three_address_code.py` - Intermediate Code Generation
102**Concepts:**
103- Three-Address Code (TAC) representation
104- Control Flow Graph (CFG) construction
105- TAC instruction types: ASSIGN, BINOP, UNARY, LABEL, JUMP, CJUMP
106- Converting AST to intermediate representation
107- Basic block identification
108- Lowering high-level constructs to TAC
109
110**Run:** `python 08_three_address_code.py`
111
112---
113
114### 9. `09_optimizer.py` - Local Optimizations
115**Concepts:**
116- Constant folding at compile time
117- Constant propagation through basic blocks
118- Algebraic simplification (identities like x*1=x, x+0=x)
119- Dead code elimination
120- Copy propagation
121- Redundant computation elimination
122- Data flow analysis and reaching definitions
123
124**Run:** `python 09_optimizer.py`
125
126---
127
128### 10. `10_bytecode_vm.py` - Bytecode Compiler and Virtual Machine
129**Concepts:**
130- Bytecode instruction set design
131- Bytecode compilation from AST
132- Stack-based virtual machine architecture
133- Operand stack and call stack (frames)
134- Local variable storage and function calls
135- Constant pool for literals
136- Instruction execution and stack machine simulation
137- Function parameters and return values
138
139**Run:** `python 10_bytecode_vm.py`
140
141---
142
143## Requirements
144
145- Python 3.8 or higher
146- Standard library only (no external dependencies)
147
148## Usage
149
150Each file is self-contained and can be run independently:
151
152```bash
153# Run a specific example
154python 01_lexer.py
155
156# Or run all examples sequentially
157for f in *.py; do echo "=== $f ==="; python "$f"; echo; done
158```
159
160## Learning Path
161
162Recommended order for learning the compilation process:
163
1641. **Frontend (Analysis):**
165   - 01_lexer.py (lexical analysis)
166   - 02_thompson_nfa.py → 03_subset_construction.py (automata theory)
167   - 04_recursive_descent_parser.py → 05_ll1_parser.py (parsing)
168   - 06_ast_visitor.py (AST representation)
169   - 07_type_checker.py (semantic analysis)
170
1712. **Middle-end (Intermediate Code):**
172   - 08_three_address_code.py (TAC generation)
173   - 09_optimizer.py (code optimization)
174
1753. **Backend (Code Generation):**
176   - 10_bytecode_vm.py (bytecode and execution)
177
178## Key Takeaways
179
180- **Lexical Analysis** tokenizes source code and handles the lexicon of a language
181- **Parsing** builds a hierarchical tree structure (AST) from a flat token stream
182- **Two approaches:** Recursive descent (top-down) and table-driven (bottom-up via LR/LL)
183- **Automata theory** (NFA/DFA) provides theoretical foundation for lexer design
184- **Semantic analysis** ensures type safety and correct variable scoping
185- **Intermediate representation** (TAC) is language and machine-independent
186- **Optimization** improves performance by eliminating redundant/unnecessary code
187- **Code generation** translates optimized IR into executable bytecode or native code
188- **Virtual machines** provide a portable execution platform independent of host CPU
189
190## Additional Resources
191
192- *Compilers: Principles, Techniques, and Tools* (Dragon Book) by Aho, Lam, Sethi, Ullman
193- *Crafting Interpreters* by Robert Nystrom (free online: https://craftinginterpreters.com/)
194- *Engineering a Compiler* by Cooper and Torczon
195- *Essentials of Compilation* by Siek (free lectures: https://www.youtube.com/playlist?list=PLOPRGayY_TV-cxQSuqklRqYfWgAbQv25x)