Branch Prediction
Branch Prediction¶
Overview¶
Branch prediction is a technique that improves pipeline performance by predicting the outcome of conditional branch instructions in advance. Modern processors achieve prediction accuracy of over 90%.
Table of Contents¶
- Control Hazards and Branch Problem
- Static Branch Prediction
- Dynamic Branch Prediction
- Branch Target Buffer
- Speculative Execution
- Practice Problems
1. Control Hazards and Branch Problem¶
Pipeline Bubbles from Branches¶
beq $t0, $t1, target
add $s0, $s1, $s2 # Should this execute?
sub $s3, $s4, $s5 # Should this execute?
...
target:
or $s6, $s7, $s8
Time: 1 2 3 4 5 6 7
beq: IF ID [EX] MEM WB
β
Branch decision
add: IF ID β May need flush
sub: IF β May need flush
Branch Penalty¶
Penalty based on decision timing:
ββββββββββββββββββββ¬βββββββββββββββββ¬ββββββββββββ
β Decision Stage β Penalty β Note β
ββββββββββββββββββββΌβββββββββββββββββΌββββββββββββ€
β ID stage β 1 cycle β Early MIPSβ
β EX stage β 2 cycles β Typical β
β MEM stage β 3 cycles β Complex β
ββββββββββββββββββββ΄βββββββββββββββββ΄ββββββββββββ
Modern processors: 10-20 cycle pipelines
β Branch prediction essential!
Necessity of Branch Prediction¶
Branch frequency: ~20% (1 in every 5 instructions)
Always stall without prediction:
CPI = 1 + 0.2 Γ 3 = 1.6 (assuming 3 cycle penalty)
90% accurate prediction:
CPI = 1 + 0.2 Γ 0.1 Γ 3 = 1.06
Performance improvement: 1.6/1.06 = 1.5x
2. Static Branch Prediction¶
2.1 Always Not Taken¶
Strategy: Predict all branches are not taken
Pros: Simple implementation
Cons: Poor performance on loops
for (i = 0; i < 100; i++) // 99 times taken, 1 time not taken
// Prediction accuracy: 1%
2.2 Always Taken¶
Strategy: Predict all branches are taken
Pros: Good for loops
Cons: Requires branch target address calculation
for (i = 0; i < 100; i++) // Prediction accuracy: 99%
2.3 BTFN (Backward Taken, Forward Not Taken)¶
Strategy:
- Backward branches (loops): Predict Taken
- Forward branches (if statements): Predict Not Taken
βββββββββββββββββββββββββββββββββββββββββββ
β Prediction by Branch Direction β
β β
β PC=100: beq label (label=80) β
β β β
β Backward branch β Predict Takenβ
β β
β PC=100: beq label (label=120) β
β β β
β Forward branch β Predict Not Takenβ
βββββββββββββββββββββββββββββββββββββββββββ
Loop end jumping back to start: Backward branch
if statement jumping to else: Forward branch
3. Dynamic Branch Prediction¶
3.1 1-bit Predictor¶
βββββββββββββββββββββββββββββββββββββββββββ
β 1-bit Predictor β
βββββββββββββββββββββββββββββββββββββββββββ€
β State: Taken(1) or Not Taken(0) β
β β
β Operation: β
β - Predict based on current state β
β - Flip state if wrong β
βββββββββββββββββββββββββββββββββββββββββββ
State transition:
Wrong Wrong
ββββββββββ ββββββββββ
β β β β
βΌ β βΌ β
ββββββ β ββββββ β
β NT βββββββ΄ββ T βββββββ
ββββββ Right ββββββ Right
Problem: Always wrong twice at loop start and end
Limitations of 1-bit Predictor¶
100 loop iterations:
T T T ... T T N (99 T, 1 N)
β Loop exit
Prediction: T T T ... T T T N
Actual: T T T ... T T N β
β Wrong (predicted T, actual N)
Next loop start:
Prediction: N β Wrong (predicted N, actual T)
2 mispredictions per 100 iterations β 98% accuracy
3.2 2-bit Predictor¶
βββββββββββββββββββββββββββββββββββββββββββββββββββ
β 2-bit Saturating Counter β
βββββββββββββββββββββββββββββββββββββββββββββββββββ€
β 4 states: β
β - 00: Strongly Not Taken β
β - 01: Weakly Not Taken β
β - 10: Weakly Taken β
β - 11: Strongly Taken β
βββββββββββββββββββββββββββββββββββββββββββββββββββ
State transition diagram:
N N N
ββββββββββ ββββββββββ ββββββββββ
β βΌ β βΌ β βΌ
ββββ΄βββ ββββ΄βββ ββββ΄βββ ββββ΄βββ
β SNT β β WNT β β WT β β ST β
β 00 β β 01 β β 10 β β 11 β
ββββ¬βββ ββββ¬βββ ββββ¬βββ ββββ¬βββ
β β² β β² β β²
ββββββββββ ββββββββββ ββββββββββ
T T T
Prediction: Taken if upper bit is 1, Not Taken if 0
2-bit Predictor Operation Example¶
100 loop iterations:
Actual: T T T ... T T N T T T ... (next loop)
State: ST ST ST ... ST WT ST ST ST ...
Pred: T T T ... T T T T T ...
Correct: β β β ... β β β β β ...
Only 1 misprediction! 1 failure per loop
99%+ accuracy
3.3 Branch History Table (BHT)¶
βββββββββββββββββββββββββββββββββββββββββββββββββββ
β Branch History Table (BHT) β
βββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β PC[9:2] ββββΆ βββββββββββ β
β β Index β β
β β 256 β β
β β entries β β
β ββββββ¬βββββ β
β β β
β βΌ β
β 2-bit counter β
β β
βββββββββββββββββββββββββββββββββββββββββββββββββββ
Uses part of PC as index
- Each entry stores a 2-bit counter
- Multiple branches can map to same index (aliasing)
3.4 Correlating Predictor¶
Idea: Other branch outcomes affect current branch
Example code:
if (a == 2) // B1
a = 0;
if (b == 2) // B2
b = 0;
if (a != b) // B3: Correlated with B1, B2 outcomes!
...
(m, n) predictor:
- m: Record last m branch outcomes (Global History)
- n: n-bit counter
ββββββββββββββββββββββββββββββββββββββββββββ
β (2, 2) Correlating Predictor β
β β
β Global History: 2 bits (00, 01, 10, 11) β
β Per history: 2-bit counter β
β β
β Total of 4 2-bit counters β
ββββββββββββββββββββββββββββββββββββββββββββ
3.5 Tournament Predictor¶
ββββββββββββββββββββββββββββββββββββββββββββββββββ
β Tournament Predictor β
β β
β βββββββββββββββ β
β PC ββββΆβ Selector β (2-bit counter) β
β ββββββββ¬βββββββ β
β β β
β ββββββββ΄βββββββ β
β βΌ βΌ β
β ββββββββββββββ ββββββββββββββ β
β β Local β β Global β β
β β Predictor β β Predictor β β
β ββββββββ¬ββββββ βββββββ¬βββββββ β
β β β β
β ββββββββ¬βββββββ β
β βΌ β
β MUX (select) β
β β β
β βΌ β
β Prediction β
ββββββββββββββββββββββββββββββββββββββββββββββββββ
- Local: Uses individual branch history
- Global: Uses overall branch history
- Selector: Learns which predictor is more accurate
4. Branch Target Buffer¶
BTB (Branch Target Buffer)¶
Stores whether instruction is a branch + target address
βββββββββββββββββββββββββββββββββββββββββββββββββββ
β Branch Target Buffer β
βββββββββββββ¬βββββββββββ¬ββββββββββββββββββββββββββ€
β Tag β Target β Prediction (optional) β
βββββββββββββΌβββββββββββΌββββββββββββββββββββββββββ€
β 0x1000... β 0x2000 β ST β
β 0x1100... β 0x1500 β WT β
β 0x2000... β 0x1800 β SNT β
β ... β ... β ... β
βββββββββββββ΄βββββββββββ΄ββββββββββββββββββββββββββ
BTB lookup in IF stage:
- Hit + Taken prediction: Jump to target immediately
- Miss: Normal execution, update BTB later
BTB Operation Flow¶
βββββββββββββββββββββββββββββββββββββββββββββββββββ
β IF Stage β
β β
β PC ββββΆ BTB Lookup β
β β β
β βΌ β
β ββββββββββ β
β β Hit? β β
β ββββββ¬ββββ β
β β β
β No β Yes β
β βΌ β βΌ β
β PC + 4 β Taken prediction? β
β β β β
β β Yes β No β
β β βΌ β βΌ β
β β Target PC + 4 β
β β addr β
βββββββββββββββββββββββββββββββββββββββββββββββββββ
5. Speculative Execution¶
Concept¶
Execute instructions on predicted path before branch result confirmed
βββββββββββββββββββββββββββββββββββββββββββββββββββ
β Speculative Execution β
β β
β beq prediction: Taken β
β β β
β βΌ β
β Execute target instructions speculatively β
β β β
β βΌ β
β Branch result confirmed β
β β β
β ββββ΄βββ β
β β β β
β Correct Wrong β
β prediction prediction β
β β β β
β βΌ βΌ β
β Commit Discard β
β result (flush) β
βββββββββββββββββββββββββββββββββββββββββββββββββββ
Misprediction Recovery¶
On misprediction:
1. Pipeline flush (remove speculative instructions)
2. Restore register state
3. Re-execute on correct path
Recovery cost:
- Proportional to pipeline depth
- Modern processors: 10-20 cycle penalty
Impact of Misprediction Rate¶
CPI = CPI_base + (branch ratio) Γ (misprediction rate) Γ (penalty)
Example:
- CPI_base = 1
- Branch ratio = 20%
- Penalty = 15 cycles
ββββββββββββββββββ¬βββββββββββββββββββ
β Misprediction β CPI β
β Rate β β
ββββββββββββββββββΌβββββββββββββββββββ€
β 10% β 1 + 0.2Γ0.1Γ15 β
β β = 1.30 β
ββββββββββββββββββΌβββββββββββββββββββ€
β 5% β 1 + 0.2Γ0.05Γ15 β
β β = 1.15 β
ββββββββββββββββββΌβββββββββββββββββββ€
β 2% β 1 + 0.2Γ0.02Γ15 β
β β = 1.06 β
ββββββββββββββββββΌβββββββββββββββββββ€
β 1% β 1 + 0.2Γ0.01Γ15 β
β β = 1.03 β
ββββββββββββββββββ΄βββββββββββββββββββ
6. Practice Problems¶
Basic Problems¶
-
What are the 3 static branch prediction methods?
-
What are the 4 states of a 2-bit predictor?
-
For the sequence below, what are the predictions of a 1-bit predictor (initial state T)?
T T N T T N T T
Analysis Problems¶
- How does a 2-bit predictor (initial ST) behave for this loop?
for (int i = 0; i < 4; i++) {
// loop body
}
- With 25% branch ratio, 10 cycle penalty, and 95% prediction accuracy, what is the CPI?
Advanced Problems¶
-
Explain why tournament predictors are better than single predictors.
-
Compare branch handling with and without BTB.
Answers
1. Always Not Taken, Always Taken, BTFN (Backward Taken, Forward Not Taken) 2. Strongly Not Taken (00), Weakly Not Taken (01), Weakly Taken (10), Strongly Taken (11) 3.Actual: T T N T T N T T
Pred: T T T N T T N T
State: T T N T T N T T
Correct: β β β β β β β β (4/8 = 50%)
Iterations: T T T N (4 iterations)
State: ST ST ST WT WT
Pred: T T T T T
Correct: β β β β
Next Steps¶
- 13_Superscalar_Out_of_Order.md - ILP and Out-of-Order Execution
References¶
- Computer Architecture: A Quantitative Approach, Chapter 3 (Hennessy & Patterson)
- Branch Prediction Competition
- Agner Fog's Microarchitecture Guide