Pipelining¶
Overview¶
Pipelining is a technique that increases CPU throughput by overlapping the execution of multiple instructions. As in the laundry analogy below, the next task starts before the current one finishes, improving overall efficiency.
Table of Contents¶
- Pipelining Concept
- 5-Stage Pipeline
- Pipeline Performance
- Pipeline Hazards
- Hazard Resolution Techniques
- Practice Problems
1. Pipelining Concept¶
Basic Idea¶
Non-pipelined (Sequential Execution):
┌─────┐     ┌─────┐     ┌─────┐
│ I1  │────▶│ I2  │────▶│ I3  │
└─────┘     └─────┘     └─────┘
 5ns         5ns         5ns      Total 15ns
Pipelined (Parallel Execution):
Time:  1ns  2ns  3ns  4ns  5ns  6ns  7ns
I1:   [IF]→[ID]→[EX]→[MEM]→[WB]
I2:        [IF]→[ID]→[EX]→[MEM]→[WB]
I3:             [IF]→[ID]→[EX]→[MEM]→[WB]
                                  Total 7ns
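The timing arithmetic in the two diagrams above can be checked with a short sketch (the function names and constants are mine, chosen for illustration):

```python
# Minimal sketch: sequential vs. pipelined execution time,
# assuming every stage takes the same time (1 ns) and no hazards.

STAGE_TIME_NS = 1
NUM_STAGES = 5

def sequential_time(num_instructions):
    # Each instruction finishes all 5 stages before the next starts.
    return num_instructions * NUM_STAGES * STAGE_TIME_NS

def pipelined_time(num_instructions):
    # The first instruction fills the pipeline (5 cycles); each
    # instruction after that completes one cycle later.
    return (NUM_STAGES + (num_instructions - 1)) * STAGE_TIME_NS

print(sequential_time(3))  # 15 ns
print(pipelined_time(3))   # 7 ns
```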
Laundry Analogy¶
Non-pipelined:
Wash1 ──▶ Dry1 ──▶ Wash2 ──▶ Dry2 ──▶ Wash3 ──▶ Dry3
Pipelined:
Time:     1      2      3      4
Wash1   [====]
Dry1           [====]
Wash2          [====]
Dry2                  [====]
Wash3                 [====]
Dry3                         [====]
Sequential: 6 time slots → Pipelined: 4 time slots
(1.5x here; with many loads the speedup approaches 2x, the number of stages)
2. 5-Stage Pipeline¶
MIPS 5-Stage Pipeline¶
┌─────────────────────────────────────────────┐
│               5-Stage Pipeline              │
├────────┬────────┬────────┬────────┬────────┤
│   IF   │   ID   │   EX   │  MEM   │   WB   │
│(Fetch) │(Decode)│ (Exec) │(Memory │(Write  │
│        │        │        │Access) │ Back)  │
└────────┴────────┴────────┴────────┴────────┘
Stage Descriptions¶
| Stage | Name | Operation |
|---|---|---|
| IF | Instruction Fetch | Fetch instruction from memory at PC |
| ID | Instruction Decode | Decode instruction, read registers |
| EX | Execute | Perform ALU operation, address calculation |
| MEM | Memory Access | Read/write memory (load/store) |
| WB | Write Back | Store result to register |
Pipeline Registers¶
┌─────┐   ┌─────────┐   ┌─────────┐   ┌──────────┐   ┌─────────┐
│ IF  │──▶│  IF/ID  │──▶│  ID/EX  │──▶│  EX/MEM  │──▶│ MEM/WB  │
└─────┘   │Register │   │Register │   │ Register │   │Register │
          └─────────┘   └─────────┘   └──────────┘   └─────────┘
Pipeline Registers: Temporarily store data between stages
- Data is passed to the next stage every clock cycle
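This per-cycle handoff can be modeled as a simple shift (a toy illustration, not real hardware; the names and the use of None for an empty register are mine):

```python
# Minimal sketch: the five pipeline stages as a list, with every
# instruction moving one stage forward per clock cycle.
# None represents an empty stage (a bubble).

STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def clock_cycle(pipeline, next_instruction=None):
    """Advance each instruction one stage; fetch a new one into IF."""
    # The instruction in WB retires; each stage takes the previous one.
    return [next_instruction] + pipeline[:-1]

pipeline = [None] * 5
for cycle, instr in enumerate(["I1", "I2", "I3", None, None, None, None], 1):
    pipeline = clock_cycle(pipeline, instr)
    print(f"cycle {cycle}: {pipeline}")
# I1 reaches WB in cycle 5; I3 reaches WB in cycle 7,
# matching the 7ns total in the timing diagram above.
```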
Pipeline Usage by Instruction Type¶
Stage usage by instruction type:
R-type (add, sub):
IF ──▶ ID ──▶ EX ──▶ --- ──▶ WB

Load (lw):
IF ──▶ ID ──▶ EX ──▶ MEM ──▶ WB

Store (sw):
IF ──▶ ID ──▶ EX ──▶ MEM ──▶ ---

Branch (beq):
IF ──▶ ID ──▶ EX ──▶ --- ──▶ ---
3. Pipeline Performance¶
Ideal Speedup¶
Speedup = Number of pipeline stages (ideal)
5-stage pipeline β Maximum 5x speedup
In practice:
- Pipeline fill/drain time
- Stalls due to hazards
- Stage imbalance
Throughput Calculation¶
Throughput = Number of instructions / Time
Non-pipelined:
- 1 instruction / 5 cycles
Pipelined (ideal):
- 1 instruction / 1 cycle (once the pipeline is full)
CPI (Cycles Per Instruction):
- Ideal: CPI = 1
- Actual: CPI = 1 + stall cycles
Example: Executing 100 Instructions¶
Non-pipelined:
Time = 100 Γ 5 = 500 cycles
5-stage pipeline:
Time = 5 + (100 - 1) = 104 cycles
       ↑first instr.  ↑remaining instructions
Speedup = 500 / 104 β 4.8x
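The cycle-count, speedup, and CPI formulas above can be sketched directly (helper names are mine):

```python
# Minimal sketch of the performance formulas from this section.

def pipelined_cycles(n, stages=5):
    # fill time (stages) + one instruction completing per cycle after
    return stages + (n - 1)

def speedup(n, stages=5):
    # non-pipelined time / pipelined time
    return (n * stages) / pipelined_cycles(n, stages)

def cpi_with_stalls(total_stall_cycles, n):
    # ideal CPI is 1; stalls add cycles on top
    return 1 + total_stall_cycles / n

print(pipelined_cycles(100))    # 104
print(round(speedup(100), 1))   # 4.8
```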
4. Pipeline Hazards¶
Hazard Types¶
┌─────────────────────────────────────────────────────────┐
│                    Pipeline Hazards                     │
├──────────────────┬─────────────────┬────────────────────┤
│   Structural     │      Data       │      Control       │
│     Hazard       │     Hazard      │       Hazard       │
├──────────────────┼─────────────────┼────────────────────┤
│ Hardware         │ Problems due    │ Problems due       │
│ resource         │ to data         │ to branch          │
│ conflicts        │ dependencies    │ instructions       │
└──────────────────┴─────────────────┴────────────────────┘
4.1 Structural Hazards¶
Problem: Trying to use the same hardware resource simultaneously
Example: Single memory usage
Cycle 4:
- Instruction 1: MEM stage (data memory access)
- Instruction 4: IF stage (instruction memory access)
Cycle:  1   2   3   4
I1:    IF  ID  EX  MEM  WB
I4:                IF   ← Conflict! (both access memory in cycle 4)
Solution: Harvard-style memory structure (separate instruction and data memories, or split L1 instruction/data caches)
4.2 Data Hazards¶
Three types:
1. RAW (Read After Write) - Most common
add $s0, $t0, $t1 # Write to $s0
sub $t2, $s0, $t3 # Read $s0 β Not written yet!
2. WAR (Write After Read)
sub $t2, $s0, $t3 # Read $s0
add $s0, $t0, $t1 # Write to $s0
3. WAW (Write After Write)
add $s0, $t0, $t1 # Write to $s0
sub $s0, $t2, $t3 # Write to $s0

Note: WAR and WAW hazards cannot occur in an in-order 5-stage pipeline (all writes happen in WB, in program order); they matter for out-of-order processors.
RAW Hazard Example¶
add $s0, $t0, $t1
sub $t2, $s0, $t3
Time:  1   2   3   4    5   6   7
add:  IF  ID  EX  MEM  [WB]        ← $s0 written in cycle 5
sub:      IF  ID  [EX]             ← $s0 needed in cycle 4

Problem: $s0 is not written until cycle 5,
but sub needs it in its EX stage in cycle 4
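RAW dependences like this one can be found mechanically. A minimal sketch (the `(dest, sources)` tuple encoding is my own simplification of MIPS instructions):

```python
# Minimal sketch: find RAW hazards by checking whether a nearby later
# instruction reads a register an earlier instruction writes.

def find_raw_hazards(instructions):
    """Each instruction is (dest_register, [source_registers])."""
    hazards = []
    for i, (dest, _) in enumerate(instructions):
        # Only the next two instructions can be affected in a 5-stage
        # pipeline (further apart, the write has already happened).
        for j in range(i + 1, min(i + 3, len(instructions))):
            _, sources = instructions[j]
            if dest in sources:
                hazards.append((i, j, dest))
    return hazards

program = [
    ("$s0", ["$t0", "$t1"]),  # add $s0, $t0, $t1
    ("$t2", ["$s0", "$t3"]),  # sub $t2, $s0, $t3  -> reads $s0: RAW
]
print(find_raw_hazards(program))  # [(0, 1, '$s0')]
```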
4.3 Control Hazards¶
beq $t0, $t1, target # Branch decision
add $t2, $t3, $t4 # Should this execute?
sub $t5, $t6, $t7 # Should this execute?
Time:  1   2   3    4   5
beq:  IF  ID  [EX]          ← Branch decided in cycle 3
add:      IF   ID           ← Wrong instruction?
sub:           IF           ← Wrong instruction?
Next instructions already entered pipeline before branch decision
5. Hazard Resolution Techniques¶
5.1 Stalling¶
Pause pipeline by inserting bubbles (NOPs)
add $s0, $t0, $t1
sub $t2, $s0, $t3
Time:  1   2   3   4   5   6   7   8
add:  IF  ID  EX  MEM  WB
sub:      IF  **  **   ID  EX  MEM  WB
               └── 2 stall cycles (bubbles)

A 2-cycle stall (assuming the register file writes in the first half of a cycle and reads in the second) → performance degradation
5.2 Forwarding/Bypassing¶
Pass ALU result directly to next instruction
add $s0, $t0, $t1
sub $t2, $s0, $t3
Time:  1   2    3    4   5   6
add:  IF  ID  [EX]  MEM  WB
               │
               └────▶ forward ALU result
sub:      IF   ID  [EX] MEM  WB
                     ↑
                     use forwarded $s0 value

Execution possible without stalls!
Forwarding Paths¶
┌───────────────────────────────────────────────────┐
│                  Forwarding Unit                  │
│                                                   │
│  EX/MEM.ALUResult ──────────────┐                 │
│                                 ▼                 │
│  MEM/WB.ALUResult ──────────▶  MUX ──▶ ALU input  │
│                                 ▲                 │
│  ID/EX.RegisterRs ──────────────┘                 │
└───────────────────────────────────────────────────┘
Forwarding conditions:
1. EX/MEM.RegisterRd == ID/EX.RegisterRs
2. MEM/WB.RegisterRd == ID/EX.RegisterRs
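The two conditions above amount to the select logic of the MUX in the diagram. A minimal sketch (signal names follow the pipeline-register names; the RegWrite flags and $zero check of the real hardware are omitted for brevity):

```python
# Minimal sketch: forwarding-unit MUX select for one ALU input (rs).

def forward_select(ex_mem_rd, mem_wb_rd, id_ex_rs):
    """Return which value the ALU-input MUX should pick for rs."""
    # The most recent producer wins, so check EX/MEM first.
    if ex_mem_rd is not None and ex_mem_rd == id_ex_rs:
        return "EX/MEM.ALUResult"   # result of the previous instruction
    if mem_wb_rd is not None and mem_wb_rd == id_ex_rs:
        return "MEM/WB.ALUResult"   # result from two instructions back
    return "RegisterFile"           # no hazard: use the registered value

print(forward_select("$s0", None, "$s0"))  # EX/MEM.ALUResult
print(forward_select(None, "$s0", "$s0"))  # MEM/WB.ALUResult
print(forward_select(None, None, "$s0"))   # RegisterFile
```

Checking EX/MEM before MEM/WB matters: when both pipeline registers hold a result for the same register, the newer one must be forwarded.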
5.3 Load-Use Hazard¶
Cases that cannot be resolved by forwarding:
lw $s0, 0($t0) # Load from memory
add $t2, $s0, $t3 # Use immediately
Time:  1   2   3    4     5    6   7
lw:   IF  ID  EX  [MEM]  WB
                    │
                    └──▶ data available only at the end of cycle 4
add:      IF  ID  stall  [EX] MEM  WB
                           ↑
                           needs the data at the start of its EX stage
1 cycle stall mandatory (Load-Use Stall)
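The hazard-detection check that triggers this stall can be sketched as follows (the argument names mirror the pipeline registers; the encoding is my simplification):

```python
# Minimal sketch: load-use hazard detection, done in the ID stage.
# If the instruction currently in EX is a load whose destination
# matches one of our source registers, we must stall one cycle.

def need_load_use_stall(id_ex_is_load, id_ex_dest, if_id_sources):
    return id_ex_is_load and id_ex_dest in if_id_sources

# lw $s0, 0($t0) is in EX; add $t2, $s0, $t3 is in ID:
print(need_load_use_stall(True, "$s0", ["$s0", "$t3"]))   # True -> stall
# A non-load producer can forward from EX/MEM instead, so no stall:
print(need_load_use_stall(False, "$s0", ["$s0", "$t3"]))  # False
```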
5.4 Branch Prediction¶
Static Prediction:
- Always Not Taken: Predict no branch
- Always Taken: Predict branch taken
- BTFN: Backward Taken, Forward Not Taken
Dynamic Prediction:
- Predict based on branch history
- Covered in detail in next lesson
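The three static schemes listed above are stateless functions of the branch itself. A minimal sketch (representing a branch by its own address `pc` and its `target` address is my own simplification):

```python
# Minimal sketch of the static branch prediction schemes.
# Returns True if the branch is predicted taken.

def predict_always_not_taken(pc, target):
    return False

def predict_always_taken(pc, target):
    return True

def predict_btfn(pc, target):
    # Backward Taken, Forward Not taken: a backward branch
    # (target < pc) is usually the bottom of a loop, so predict taken.
    return target < pc

print(predict_btfn(0x400, 0x3F0))  # True  (backward: likely a loop)
print(predict_btfn(0x400, 0x420))  # False (forward: likely an if-skip)
```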
5.5 Delayed Branch¶
Place an instruction that is always executed in the slot immediately after the branch (the branch delay slot)
beq $t0, $t1, target
add $t2, $t3, $t4 # Delay slot (always executed)
...
target:
sub $t5, $t6, $t7
Compiler places branch-independent instruction in delay slot
6. Practice Problems¶
Basic Problems¶
1. What are the names and roles of each stage in a 5-stage pipeline?
2. How many cycles are needed to execute 100 instructions in a 5-stage pipeline? (Assume no hazards.)
3. Which of the following is NOT a type of data hazard?
   - (a) RAW
   - (b) RAR
   - (c) WAR
   - (d) WAW
Hazard Analysis¶
4. Find the data hazards in the following code:

   add $s0, $t0, $t1
   sub $s1, $s0, $t2
   and $s2, $s0, $s1

5. Which hazard cannot be resolved by forwarding?

   lw $s0, 0($t0)
   add $t1, $s0, $t2
Performance Calculation¶
6. If 30% of 1000 instructions are branches, the branch misprediction rate is 20%, and the misprediction penalty is 3 cycles, what is the CPI?
Answers

1. IF (Fetch), ID (Decode), EX (Execute), MEM (Memory Access), WB (Write Back)
2. 5 + (100 - 1) = 104 cycles
3. (b) RAR: Read After Read is not a hazard
4. add → sub: RAW on $s0; add → and: RAW on $s0; sub → and: RAW on $s1
5. Load-use hazard. The data is only available after the MEM stage of lw, so a 1-cycle stall is required
6. Branch instructions: 1000 × 0.3 = 300; mispredictions: 300 × 0.2 = 60; penalty cycles: 60 × 3 = 180; CPI = 1 + 180/1000 = 1.18

Next Steps¶
- 12_Branch_Prediction.md - Dynamic Branch Prediction Techniques
References¶
- Computer Organization and Design, Chapter 4 (Patterson & Hennessy)
- Pipeline Visualization
- MIPS Pipeline Simulator