Lesson 08: Verification and Validation
"Are we building the product right?" and "Are we building the right product?" These two questions, posed by Barry Boehm in 1979, capture the essence of Verification and Validation (V&V). Together they form the quality backbone of software engineering — a systematic, disciplined approach to ensuring that software is both technically correct and genuinely useful. This lesson surveys the full spectrum of V&V techniques, from unit tests to formal proofs, and equips you to design a comprehensive test strategy for real-world projects.
Difficulty: ⭐⭐⭐
Prerequisites:
- Software Quality Assurance fundamentals (Lesson 07)
- Basic programming and unit testing experience
- Familiarity with software development lifecycle (Lesson 02)
Learning Objectives:
- Distinguish verification from validation and articulate why both are necessary
- Design test cases using black-box techniques (equivalence partitioning, boundary value analysis, decision tables, state transition)
- Select appropriate coverage criteria for white-box testing
- Plan a test strategy that spans unit, integration, system, and acceptance testing
- Understand the bug lifecycle and write effective defect reports
- Evaluate the role of formal methods in high-assurance software
- Build a regression test suite and integrate it into automated pipelines
Table of Contents
- Verification vs Validation
- Testing Levels
- Testing Types
- Black-Box Testing Techniques
- White-Box Testing Techniques
- Test Planning and Documentation
- Test-Driven Development
- Regression Testing and Test Automation
- Reviews and Inspections
- Formal Verification
- The Bug Lifecycle
- Summary
- Practice Exercises
- Further Reading
1. Verification vs Validation
1.1 The Core Distinction
| | Verification | Validation |
|---|---|---|
| Question | Are we building the product right? | Are we building the right product? |
| Reference | Specification, design documents | User needs, business goals |
| Activities | Reviews, inspections, static analysis, testing against spec | User acceptance testing, beta testing, prototyping |
| Phase | Throughout development | Primarily at milestones and delivery |
| Finds | Does the system conform to its specification? | Does the specification reflect what users actually need? |
Both are essential. A system that perfectly implements its specification but fails to meet user needs is a validation failure: it was verified against a specification that was itself wrong. A system that meets user needs but is built without verification is a pile of technical debt and latent defects.
1.2 The V-Model
The V-Model maps development phases (left leg of the V) to corresponding test phases (right leg), making the verification/validation relationship explicit:
Requirements Analysis ──────────────────────── Acceptance Testing
│ │
System Design ──────────────────── System Testing │
│ │ │
Architecture Design ──── Integration Testing │
│ │ │
Detailed Design ─── Unit Testing │
│ │ │
Coding ────────────────┘ │
│ │
└──────── Development ──────── Testing ────────┘
Each test phase validates the output of the corresponding development phase. Acceptance testing validates requirements; system testing validates system design; integration testing validates architecture; unit testing validates detailed design.
1.3 V&V Independence
The IEEE standard on V&V (IEEE Std 1012) distinguishes between:
- Independent V&V (IV&V): performed by a group with no stake in the development outcome, often a separate organization. Commonly required for safety-critical systems (medical devices, aerospace, nuclear).
- Non-independent V&V: performed by the development team or organization. Suitable for most commercial software.
IV&V tends to catch defects that the development team misses, because independent reviewers do not share the blind spots that come from having written the code.
2. Testing Levels
Testing is organized into four hierarchical levels, each with a distinct scope and purpose.
2.1 Unit Testing
Scope: A single function, method, or class — the smallest testable unit.
Who: The developer who wrote the code (or pair partner in TDD).
Goal: Verify that individual units behave correctly in isolation.
Characteristics:
- Fast (milliseconds per test)
- No external dependencies (databases, network, file system are mocked)
- Highly focused; a failing test points to a specific function

# pytest example: testing a utility function in isolation
from decimal import Decimal

import pytest

from pricing import apply_discount


class TestApplyDiscount:
    def test_percentage_discount(self):
        price = Decimal("100.00")
        result = apply_discount(price, discount_pct=10)
        assert result == Decimal("90.00")

    def test_zero_discount(self):
        price = Decimal("50.00")
        result = apply_discount(price, discount_pct=0)
        assert result == Decimal("50.00")

    def test_hundred_percent_discount(self):
        result = apply_discount(Decimal("75.00"), discount_pct=100)
        assert result == Decimal("0.00")

    def test_negative_discount_raises(self):
        with pytest.raises(ValueError, match="discount must be non-negative"):
            apply_discount(Decimal("100.00"), discount_pct=-5)

    def test_discount_above_100_raises(self):
        with pytest.raises(ValueError, match="discount cannot exceed 100"):
            apply_discount(Decimal("100.00"), discount_pct=110)
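The tests above pin down the contract of apply_discount without showing its body. A minimal sketch of an implementation that would satisfy them follows; the real pricing module is not shown in this lesson, so the body, the rounding policy, and the exact error messages are assumptions made for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP


def apply_discount(price: Decimal, discount_pct: int) -> Decimal:
    """Return price reduced by discount_pct percent, rounded to cents.

    Hypothetical implementation: only the behavior exercised by the
    tests in this section is taken from the lesson.
    """
    if discount_pct < 0:
        raise ValueError("discount must be non-negative")
    if discount_pct > 100:
        raise ValueError("discount cannot exceed 100")
    factor = (Decimal(100) - Decimal(discount_pct)) / Decimal(100)
    # Round to two decimal places; ROUND_HALF_UP is an assumed policy.
    return (price * factor).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

Keeping the function free of I/O and global state is what lets the unit tests run in milliseconds with nothing mocked.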
2.2 Integration Testing
Scope: Interactions between units, modules, or subsystems.
Goal: Verify that components work correctly together — catch interface mismatches, contract violations, and integration-level bugs.
Approaches:
| Approach | Description | Pros | Cons |
|---|---|---|---|
| Big Bang | Integrate everything at once, then test | Simple to set up | Hard to locate failures |
| Top-Down | Integrate from top-level modules downward; stub lower modules | Tests high-level logic early | Stubs can hide lower-level bugs |
| Bottom-Up | Integrate from lower modules upward; use test drivers | Tests real lower-level behavior early | High-level logic tested late |
| Sandwich (Hybrid) | Top-down and bottom-up simultaneously | Balances both | More complex planning |
| Incremental | Integrate one component at a time | Failures easy to locate | More effort planning integration order |
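As an illustration of the top-down approach, the sketch below exercises a high-level module while the lower-level module it depends on is replaced by a stub built with unittest.mock. CheckoutService and the gateway's charge method are hypothetical names invented for this example.

```python
from unittest.mock import Mock


class CheckoutService:
    """High-level module under test (hypothetical)."""

    def __init__(self, gateway):
        self.gateway = gateway

    def place_order(self, amount_cents: int) -> str:
        if amount_cents <= 0:
            return "rejected"
        ok = self.gateway.charge(amount_cents)
        return "confirmed" if ok else "payment_failed"


# Stub standing in for the payment module that is not integrated yet.
gateway_stub = Mock()
gateway_stub.charge.return_value = True

service = CheckoutService(gateway_stub)
status = service.place_order(5998)

# The interface contract is checked too: one call, with the right argument.
gateway_stub.charge.assert_called_once_with(5998)
```

The stub always succeeds here, which is exactly the risk named in the table: defects in the real payment module stay hidden until it replaces the stub.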
2.3 System Testing
Scope: The complete, integrated system as a whole.
Goal: Validate that the entire system meets its requirements — functional and non-functional.
System testing includes both functional tests (does it do the right thing?) and non-functional tests (is it fast enough? secure enough? available enough?). It is typically performed by a dedicated QA team, not the developers.
2.4 Acceptance Testing
Scope: The complete system from the user's perspective.
Goal: Validate that the system meets user needs and business requirements.
| Type | Who performs it | Purpose |
|---|---|---|
| User Acceptance Testing (UAT) | End users or their representatives | Confirm the system works for real tasks |
| Alpha Testing | Internal users (company employees outside dev team) | Find bugs before external release |
| Beta Testing | Selected external users | Find bugs under real-world conditions |
| Contract Acceptance Testing | Customer, per contract terms | Verify contractual obligations |
| Regulation Acceptance Testing | Regulatory authority | Verify compliance with regulations |
3. Testing Types
Testing types cut across levels and classify tests by what property they verify.
3.1 Functional Testing
Verifies that the system does what it should do — checks features against requirements.
- Smoke testing: a quick set of tests that verify the system can start and perform basic operations. Run after every build to decide if deeper testing is warranted.
- Sanity testing: a subset of regression testing to verify a specific fix works.
- Feature testing: systematic testing of all features described in requirements.
3.2 Non-Functional Testing
| Type | What it measures | Key question |
|---|---|---|
| Performance testing | Speed, throughput, resource usage | Can it handle the expected load? |
| Load testing | Behavior under expected peak load | Does it degrade gracefully at peak? |
| Stress testing | Behavior under extreme or unexpected load | Where does it break? How does it fail? |
| Soak/endurance testing | Behavior over extended time at normal load | Are there memory leaks or slow degradation? |
| Scalability testing | How capacity grows with added resources | Does performance scale linearly with servers? |
| Security testing | Resistance to attacks | Is it vulnerable to OWASP Top 10? |
| Usability testing | Ease of use | Can users accomplish tasks without confusion? |
| Accessibility testing | Compliance with WCAG | Can users with disabilities use the system? |
| Compatibility testing | Operation across environments | Does it work on Chrome, Firefox, Safari, iOS, Android? |
| Recovery testing | Behavior after failure | Does it recover correctly from a crash? |
3.3 Performance Testing Example
# locust: load testing tool for web applications
from locust import HttpUser, task, between


class WebsiteUser(HttpUser):
    wait_time = between(1, 3)  # simulate 1-3 seconds think time

    @task(3)  # weight: this task runs 3x more often than weight-1 tasks
    def browse_products(self):
        self.client.get("/api/products?page=1&limit=20")

    @task(1)
    def view_product_detail(self):
        product_id = 42
        self.client.get(f"/api/products/{product_id}")

    @task(1)
    def add_to_cart(self):
        self.client.post("/api/cart/items", json={
            "product_id": 42,
            "quantity": 1
        })

# Run: locust -f locustfile.py --headless -u 1000 -r 100 --host http://localhost:8000
# -u 1000: 1000 concurrent users
# -r 100: ramp up at 100 users/second
4. Black-Box Testing Techniques
Black-box techniques derive test cases from the specification without knowledge of the internal implementation. They answer: "What inputs should I try?"
4.1 Equivalence Partitioning
Divide the input domain into classes where the system should behave the same way for all members. Test one value from each class.
Example: A function that accepts age (integer) to determine a pricing tier.
- Valid ages: 0–12 (child), 13–17 (teen), 18–64 (adult), 65+ (senior)
- Invalid: negative, non-integer, null
Partition Representative Value Expected Result
───────────────────────────────────────────────────────────────
Valid: child (0–12) 8 "child" tier
Valid: teen (13–17) 15 "teen" tier
Valid: adult (18–64) 30 "adult" tier
Valid: senior (65+) 70 "senior" tier
Invalid: negative -1 ValueError
Invalid: null None TypeError
Equivalence partitioning reduces the test space from infinite to manageable while maintaining coverage of distinct behaviors.
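The partition table can be turned directly into test data. The sketch below assumes a hypothetical pricing_tier function that matches the example; treating a non-integer such as 12.5 as a TypeError is an assumption, since the table only specifies the negative and null cases.

```python
def pricing_tier(age):
    """Map an age to a pricing tier (hypothetical function for this example)."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if not isinstance(age, int) or isinstance(age, bool):
        raise TypeError("age must be an integer")
    if age < 0:
        raise ValueError("age must be non-negative")
    if age <= 12:
        return "child"
    if age <= 17:
        return "teen"
    if age <= 64:
        return "adult"
    return "senior"


# One representative value per valid partition.
VALID_CASES = [(8, "child"), (15, "teen"), (30, "adult"), (70, "senior")]
# One representative per invalid partition, with the expected exception type.
INVALID_CASES = [(-1, ValueError), (None, TypeError), (12.5, TypeError)]


def run_partition_tests():
    """Run one test per partition and return how many were executed."""
    for age, expected in VALID_CASES:
        assert pricing_tier(age) == expected, (age, expected)
    for age, exc in INVALID_CASES:
        try:
            pricing_tier(age)
        except exc:
            continue
        raise AssertionError(f"{age!r} should raise {exc.__name__}")
    return len(VALID_CASES) + len(INVALID_CASES)
```

Seven tests stand in for the whole input domain: one per partition, on the assumption that every member of a partition is handled by the same code path.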
4.2 Boundary Value Analysis
Bugs cluster at the boundaries of equivalence classes. Test values at, just below, and just above each boundary.
Using the age example:
Boundary Values to test
────────────────────────────────────────
0 (lower bound of child) -1, 0, 1
12/13 (child/teen boundary) 12, 13
17/18 (teen/adult boundary) 17, 18
64/65 (adult/senior bound.) 64, 65
Max (e.g., 150) 149, 150, 151
Why boundaries matter: Off-by-one errors (< vs <=, > vs >=) are among the most common programming mistakes, and they only manifest at boundaries.
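To see why mid-partition values are not enough, the sketch below compares a correct child/teen check with a deliberately buggy one that uses < where <= was intended. Both functions and the boundary_values helper are hypothetical and exist only for this illustration.

```python
def tier_correct(age: int) -> str:
    return "child" if age <= 12 else "teen"


def tier_buggy(age: int) -> str:
    # Off-by-one defect: < used where <= was intended.
    return "child" if age < 12 else "teen"


def boundary_values(lower: int, upper: int) -> list:
    """Values at, just below, and just above each end of [lower, upper]."""
    return [lower - 1, lower, lower + 1, upper - 1, upper, upper + 1]


# Mid-partition representatives cannot tell the two versions apart.
representatives = [8, 15]
missed = all(tier_correct(a) == tier_buggy(a) for a in representatives)

# The boundary set for the child range 0..12 exposes the defect.
child_boundaries = boundary_values(0, 12)
failing = [a for a in child_boundaries if tier_correct(a) != tier_buggy(a)]
```

Only the value 12 distinguishes the two versions, which is why boundary value analysis is applied on top of equivalence partitioning rather than instead of it.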
4.3 Decision Table Testing
Decision tables systematically enumerate combinations of conditions and their corresponding actions. They prevent "combination blindness" — missing the interaction between two conditions.
Example: A loan approval system with three conditions.
| Condition | R1 | R2 | R3 | R4 | R5 | R6 | R7 | R8 |
|---|---|---|---|---|---|---|---|---|
| Credit score > 700 | T | T | T | T | F | F | F | F |
| Income > $50k | T | T | F | F | T | T | F | F |
| Debt ratio < 40% | T | F | T | F | T | F | T | F |
| Action | Approve | Approve with review | Approve with review | Reject | Approve with review | Reject | Reject | Reject |
Each column (rule) becomes a test case. Collapsed rules (same action for multiple conditions) can be merged, but each distinct action needs at least one test.
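A decision table maps naturally onto a lookup that serves as the test oracle. In the sketch below, EXPECTED transcribes the eight rules above (condition order: credit score, income, debt ratio), and loan_decision is a hypothetical implementation under test.

```python
from itertools import product

# Oracle: one entry per rule R1..R8, keyed by (credit_ok, income_ok, debt_ok).
EXPECTED = {
    (True, True, True): "approve",
    (True, True, False): "review",
    (True, False, True): "review",
    (True, False, False): "reject",
    (False, True, True): "review",
    (False, True, False): "reject",
    (False, False, True): "reject",
    (False, False, False): "reject",
}


def loan_decision(credit_ok: bool, income_ok: bool, debt_ok: bool) -> str:
    """Hypothetical implementation: decide by how many conditions are met."""
    met = sum((credit_ok, income_ok, debt_ok))
    if met == 3:
        return "approve"
    if met == 2:
        return "review"
    return "reject"


# Generating the combinations guarantees that no rule is skipped.
all_rules = list(product([True, False], repeat=3))
mismatches = [r for r in all_rules if loan_decision(*r) != EXPECTED[r]]
```

Generating the combinations with product rather than listing them by hand is the point of the technique: the table, not the tester's memory, decides which cases exist.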
4.4 State Transition Testing
For systems with distinct states, derive test cases from the state machine. Test valid transitions, invalid transitions, and boundary states.
Example: An order state machine.
┌──────────────────────────────────────────────┐
│ │
[New] ──── pay ────▶ [Paid] ──── ship ───▶ [Shipped]
│ │ │
cancel cancel deliver
│ │ │
▼ ▼ ▼
[Cancelled] [Cancelled] [Delivered]
│
return
│
▼
[Returned]
Test cases to cover:
1. New → pay → Paid (valid)
2. New → cancel → Cancelled (valid)
3. Paid → ship → Shipped (valid)
4. Paid → cancel → Cancelled (valid)
5. Shipped → deliver → Delivered (valid)
6. Delivered → return → Returned (valid)
7. New → ship (invalid transition; should be rejected)
8. Cancelled → pay (invalid transition from terminal state)
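The eight cases can be driven from a transition table. This is a minimal sketch, assuming the state machine is implemented as a dictionary lookup; the names TRANSITIONS, next_state, and InvalidTransition are illustrative and not part of any library.

```python
class InvalidTransition(Exception):
    """Raised when an event is not allowed in the current state."""


# (state, event) -> next state, transcribed from the valid cases 1-6.
TRANSITIONS = {
    ("New", "pay"): "Paid",
    ("New", "cancel"): "Cancelled",
    ("Paid", "ship"): "Shipped",
    ("Paid", "cancel"): "Cancelled",
    ("Shipped", "deliver"): "Delivered",
    ("Delivered", "return"): "Returned",
}


def next_state(state: str, event: str) -> str:
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise InvalidTransition(f"{event!r} not allowed in state {state!r}") from None


def is_rejected(state: str, event: str) -> bool:
    """True if the machine refuses the event in the given state."""
    try:
        next_state(state, event)
    except InvalidTransition:
        return True
    return False


# Valid transitions: every table entry must lead to its target state.
valid_ok = all(next_state(s, e) == target for (s, e), target in TRANSITIONS.items())
# Invalid transitions: cases 7 and 8.
invalid_ok = is_rejected("New", "ship") and is_rejected("Cancelled", "pay")
```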
5. White-Box Testing Techniques
White-box (structural) techniques derive test cases from the source code's internal structure. They measure coverage — how much of the code is exercised.
5.1 Statement Coverage
Every executable statement is executed at least once.
def classify(x):
    result = "unknown"       # Statement 1
    if x > 0:                # Statement 2
        result = "positive"  # Statement 3
    elif x < 0:              # Statement 4
        result = "negative"  # Statement 5
    else:
        result = "zero"      # Statement 6
    return result            # Statement 7
For 100% statement coverage, need at least 3 test cases: x=1, x=-1, x=0.
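Weakness: Statement coverage can be reached without ever taking the False outcome of a decision. An if with no else, for example, is fully covered by a single test that makes the condition True.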
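A small illustration of that weakness, using a hypothetical function written for this purpose: one test executes every statement, yet the untested False outcome hides a crash.

```python
def shout_sign(x):
    label = None
    if x > 0:                  # no else: the False outcome skips the assignment
        label = "positive"
    return label.upper()


# A single test executes every statement: 100% statement coverage.
only_test = shout_sign(5)

# The False branch was never taken, and it hides a defect.
try:
    shout_sign(-1)
    false_branch_ok = True
except AttributeError:
    # label is still None, so .upper() fails.
    false_branch_ok = False
```

Branch coverage would have forced a test with x <= 0 and exposed the failure.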
5.2 Branch Coverage (Decision Coverage)
Every branch from every decision point is taken at least once (both the true and false outcome of every if, while, for).
Branch coverage subsumes statement coverage: 100% branch coverage implies 100% statement coverage, but not vice versa.
def validate_age(age):
    if age is None:            # Branch: True (age is None), False (age is not None)
        return False
    if age < 0 or age > 150:   # Branch: True, False; compound condition
        return False
    return True
For 100% branch coverage:
- validate_age(None) — True branch of first if
- validate_age(25) — False branch of first if, False branch of second if
- validate_age(-1) — True branch of second if
5.3 Condition Coverage
Every boolean sub-condition (atomic predicate) evaluates to both True and False at least once across the test set.
For compound conditions like age < 0 or age > 150:
- Need age < 0 to be True and False
- Need age > 150 to be True and False
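The sketch below records which sub-condition outcomes a test set actually produces for age < 0 or age > 150. It models Python's short-circuit evaluation by hand; the helper names are invented for this example and it is not how a coverage tool works internally.

```python
def condition_outcomes(age):
    """Return the outcome of each sub-condition actually evaluated for age."""
    outcomes = {"age < 0": age < 0}
    if not outcomes["age < 0"]:
        # `or` short-circuits: the second operand is evaluated only when
        # the first one is False.
        outcomes["age > 150"] = age > 150
    return outcomes


def covered(test_ages):
    """Set of (sub-condition, outcome) pairs produced by the test ages."""
    seen = set()
    for age in test_ages:
        for cond, value in condition_outcomes(age).items():
            seen.add((cond, value))
    return seen


REQUIRED = {(c, v) for c in ("age < 0", "age > 150") for v in (True, False)}

branch_only = covered([25, -1])      # enough for branch coverage of this if
missing = REQUIRED - branch_only     # what condition coverage still needs
with_200 = covered([25, -1, 200])    # adding 200 makes age > 150 True
```

The two ages that satisfy branch coverage never make age > 150 true, so a defect in that comparison would go unnoticed until a value such as 200 is added.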
5.4 Path Coverage
Every distinct path through the code is executed. For a function with n independent decision points, there are up to 2^n paths.
Path coverage is the strongest criterion but is usually impractical for real functions (exponential explosion). It is used selectively for safety-critical modules.
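The growth is easy to check numerically. This sketch enumerates the outcome combinations for a few sequential two-way decisions and then tabulates 2^n for larger n; it assumes the decisions are independent and loop-free, which is the best case.

```python
from itertools import product


def enumerate_paths(n_decisions: int) -> list:
    """All True/False outcome combinations for n sequential two-way decisions."""
    return list(product((True, False), repeat=n_decisions))


# Three decisions are still easy to test exhaustively.
small = enumerate_paths(3)

# Larger counts are computed rather than enumerated.
growth = {n: 2 ** n for n in (5, 10, 20, 30)}
```

Twenty independent decisions already give more than a million paths, and a loop can make the count unbounded, which is why path coverage is reserved for small units.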
5.5 Coverage Summary
Coverage Criterion Strength Practical Use
──────────────────────────────────────────────────────────────
Statement Weakest Minimum acceptable (80–90%)
Branch Moderate Standard for most projects
Condition Stronger Security-critical code
MC/DC* Strong DO-178C (avionics), safety systems
Path Strongest Impractical except for small units
*MC/DC = Modified Condition/Decision Coverage, required by DO-178C for the most critical (Level A) avionics software.
5.6 Measuring Coverage in Practice
# Python: pytest + coverage
pip install pytest pytest-cov
# term prints the table below; html additionally writes a browsable report
pytest --cov=src --cov-report=term --cov-report=html --cov-fail-under=80
# Output:
# Name Stmts Miss Cover
# ────────────────────────────────────────
# src/pricing.py 45 3 93%
# src/inventory.py 72 18 75%
# src/checkout.py 98 12 88%
# ────────────────────────────────────────
# TOTAL 215 33 85%
# JavaScript: Jest
jest --coverage --coverageThreshold='{"global":{"branches":80,"lines":80}}'
6. Test Planning and Documentation
6.1 The Test Plan
A test plan (IEEE Std 829) documents the scope, approach, resources, and schedule for testing. Key sections:
| Section | Content |
|---|---|
| Test Scope | What features/components are in scope and out of scope |
| Test Approach | Testing levels, types, techniques to be used |
| Entry/Exit Criteria | When testing begins and when it is complete |
| Test Environment | Hardware, OS, browsers, network configuration |
| Resources | Who performs which tests; tools required |
| Schedule | Timeline for each testing phase |
| Risk and Contingency | Risks to the test effort and mitigation plans |
| Deliverables | Test cases, test data, defect reports, test summary report |
Entry criteria example:
- All code for the sprint is merged and passing CI
- Unit test coverage ≥ 80%
- No open critical/high defects from the previous cycle
Exit criteria example:
- All planned test cases executed
- No open critical or high-severity defects
- Defect removal efficiency ≥ 90%
- Performance test results within 10% of targets
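Defect removal efficiency (DRE) is commonly defined as the share of all known defects that were found before release. The sketch below computes it and checks three of the exit criteria above; the function names and the sample counts are hypothetical, and because post-release counts are only known later, teams often use figures from a previous release or an estimate.

```python
def defect_removal_efficiency(found_before_release: int, found_after_release: int) -> float:
    """DRE = defects found before release / all defects found."""
    total = found_before_release + found_after_release
    if total == 0:
        raise ValueError("no defects recorded; DRE is undefined")
    return found_before_release / total


def exit_criteria_met(open_critical_or_high: int, dre: float,
                      planned: int, executed: int,
                      dre_threshold: float = 0.90) -> bool:
    """Check the three countable exit criteria from the example."""
    return (
        executed >= planned
        and open_critical_or_high == 0
        and dre >= dre_threshold
    )


# Illustrative numbers only: 188 defects caught in testing, 12 escaped.
dre = defect_removal_efficiency(found_before_release=188, found_after_release=12)
ready = exit_criteria_met(open_critical_or_high=0, dre=dre, planned=120, executed=120)
```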
6.2 Writing Test Cases
A good test case is atomic, independent, and reproducible.
Test Case ID: TC-CHECKOUT-042
Title: Cart checkout fails gracefully when payment service is down
Preconditions: - User is logged in
- Cart has 2 items totaling $59.98
- Payment service mock is configured to return 503
Feature: Checkout / Payment
Test Data: User: testuser@example.com, Cart: [item_id:1, item_id:7]
Steps:
1. Navigate to /cart
2. Click "Proceed to Checkout"
3. Enter valid credit card: 4111 1111 1111 1111
4. Click "Place Order"
Expected Result: System displays "Payment service temporarily unavailable.
Your cart has been saved. Please try again in a few minutes."
No order is created in the database.
Cart contents are preserved.
Actual Result: (filled in during test execution)
Status: Pass / Fail / Blocked
Severity: High
Priority: High
Author: J. Smith
Date: 2024-03-15
6.3 Test Data Management
Good test data is:
- Representative: covers all equivalence partitions
- Reproducible: the same data produces the same result
- Isolated: tests do not share mutable state
- Anonymized: uses synthetic data, not real production data

# Using factories for reproducible test data (Factory Boy library)
import factory
import factory.fuzzy  # import the fuzzy submodule explicitly before using it
from factory.django import DjangoModelFactory

from myapp.models import User, Order


class UserFactory(DjangoModelFactory):
    class Meta:
        model = User

    username = factory.Sequence(lambda n: f"user_{n}")
    email = factory.LazyAttribute(lambda obj: f"{obj.username}@example.com")
    is_active = True


class OrderFactory(DjangoModelFactory):
    class Meta:
        model = Order

    user = factory.SubFactory(UserFactory)
    status = "new"
    total = factory.fuzzy.FuzzyDecimal(10.00, 500.00, precision=2)
7. Test-Driven Development
Test-Driven Development (TDD) inverts the traditional workflow: tests are written before the code they test.
7.1 The Red-Green-Refactor Cycle
┌─────────────────────────────────────────────┐
│ │
▼ │
RED: Write a failing test │
(the test describes desired behavior) │
│ │
▼ │
GREEN: Write the minimum code to pass the test │
(no more, no less) │
│ │
▼ │
REFACTOR: Clean up the code │
(the test suite ensures you didn't break │
anything) │
│ │
└─────────────────────────────────────────────┘
7.2 TDD Benefits and Costs
| Benefit | Explanation |
|---|---|
| Tests as specification | Tests document exactly what the code should do |
| Design pressure | Hard-to-test code usually has poor design; TDD reveals this early |
| Confidence to refactor | Green test suite proves refactoring didn't break anything |
| Regression safety net | Every bug fix gets a test that prevents recurrence |
| Cost | Mitigation |
|---|---|
| Slower initial development | Offset by reduced debugging time and higher quality |
| Learning curve | Team training; pair programming with experienced practitioners |
| UI/integration tests harder | Apply TDD at the unit level; use separate integration test strategy |
7.3 Brief Example
# Step 1 (RED): write a failing test
def test_fizzbuzz_returns_fizz_for_multiples_of_3():
    assert fizzbuzz(3) == "Fizz"
    assert fizzbuzz(6) == "Fizz"
    assert fizzbuzz(9) == "Fizz"

# Step 2 (GREEN): minimum code to pass
def fizzbuzz(n):
    if n % 3 == 0:
        return "Fizz"
    return str(n)

# Step 3: Add more tests → RED
def test_fizzbuzz_returns_buzz_for_multiples_of_5():
    assert fizzbuzz(5) == "Buzz"

# Step 4 (GREEN)
def fizzbuzz(n):
    if n % 15 == 0:
        return "FizzBuzz"
    if n % 3 == 0:
        return "Fizz"
    if n % 5 == 0:
        return "Buzz"
    return str(n)
TDD is covered in greater depth in the Programming topic (Lesson 10: Testing and TDD).
8. Regression Testing and Test Automation
8.1 Regression Testing
A regression is a defect introduced by a change into behavior that previously worked correctly. Regression testing re-runs previously passing tests after a change to detect regressions.
The regression trap: As systems grow, the manual regression test suite becomes impossibly large. A 100-feature system with 50 tests each = 5,000 manual test executions per release — infeasible.
Solution: Automate the regression suite. Every bug fixed gets an automated test. Every feature gets automated tests. The suite runs on every commit.
8.2 The Test Automation Pyramid
The test pyramid (Mike Cohn) prescribes the right balance of test types:
╱╲
╱ ╲
╱ UI ╲
╱ Tests ╲ ← Slow, brittle, expensive
╱──────────╲ Few (10–20% of suite)
╱ ╲
╱ Integration ╲
╱ Tests ╲ ← Medium speed/cost
╱────────────────────╲ Some (20–30% of suite)
╱ ╲
╱ Unit Tests ╲ ← Fast, cheap, precise
╱────────────────────────────╲ Many (50–70% of suite)
Inverting the pyramid (many UI tests, few unit tests) produces a slow, brittle test suite that gives poor feedback and is expensive to maintain.
8.3 CI/CD Integration
# GitHub Actions: automated test pipeline
name: Test Suite
on: [push, pull_request]
jobs:
  unit-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.12'
      - name: Install dependencies
        run: pip install -r requirements-dev.txt
      - name: Run unit tests with coverage
        run: |
          pytest tests/unit/ -v \
            --cov=src \
            --cov-report=xml \
            --cov-fail-under=85
      - name: Upload coverage
        uses: codecov/codecov-action@v3
  integration-tests:
    runs-on: ubuntu-latest
    needs: unit-tests  # only run if unit tests pass
    services:
      postgres:
        image: postgres:15
        env:
          POSTGRES_PASSWORD: testpass
          POSTGRES_DB: testdb
        ports:
          - 5432:5432
    steps:
      - uses: actions/checkout@v4
      # Each job starts on a fresh runner, so Python and the
      # dependencies have to be set up again here.
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.12'
      - name: Install dependencies
        run: pip install -r requirements-dev.txt
      - name: Run integration tests
        run: pytest tests/integration/ -v
        env:
          DATABASE_URL: postgresql://postgres:testpass@localhost/testdb
9. Reviews and Inspections
Static V&V (examining artifacts without execution) is often more cost-effective than testing for finding certain classes of defects.
9.1 The Fagan Inspection
Developed by Michael Fagan at IBM in 1976. Formal, role-based, and highly effective: Fagan's original study reported finding 82% of defects before the first test run.
Roles:
| Role | Responsibility |
|---|---|
| Moderator | Plans and facilitates; ensures process is followed |
| Author | Created the artifact under review; answers questions |
| Reader | Reads/paraphrases the artifact during inspection |
| Inspector(s) | Find defects; prepare independently before meeting |
| Scribe | Records all defects and decisions |
Phases:
1. Planning: Moderator checks entry criteria; distributes materials
2. Overview: Author explains context and design intent
3. Preparation: Each inspector reviews independently, logs issues
4. Inspection Meeting: Reader reads; inspectors raise issues; scribe records
5. Rework: Author fixes all logged defects
6. Follow-up: Moderator verifies all defects addressed; checks exit criteria
Optimal inspection rate: 100–200 lines of code per hour. Rushing produces negligible results.
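The recommended rate translates directly into a time budget. A small sketch, assuming the 100–200 lines per hour figure above and a hypothetical 400-line module:

```python
def inspection_hours(lines_of_code: int, min_rate: int = 100, max_rate: int = 200):
    """Return (shortest, longest) inspection time in hours for the given size.

    min_rate and max_rate are lines of code per hour; the defaults are the
    range quoted in this section.
    """
    return lines_of_code / max_rate, lines_of_code / min_rate


# A 400-line module needs between 2 and 4 hours at the recommended rates.
shortest, longest = inspection_hours(400)
```

If the result does not fit the time available, the usual response is to split the artifact into several inspections rather than to raise the rate.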
9.2 Walkthroughs
Less formal than Fagan inspection. The author presents the artifact and walks the team through it, inviting questions and comments. Good for education and knowledge transfer. Less systematic than formal inspection.
9.3 Peer Code Review (Pull Request Review)
The most common form of code review in modern teams. Covered in detail in Lesson 07 (Software Quality Assurance, Section 10).
9.4 Inspection vs Testing Effectiveness
| Technique | Best at finding | Not good at finding |
|---|---|---|
| Inspection | Logic errors, design problems, standards violations, missing requirements | Timing bugs, performance problems |
| Testing | Integration failures, performance issues, timing bugs | Missing functionality (can't test what isn't there) |
They are complementary. Neither alone is sufficient.
10. Formal Verification
Formal verification uses mathematical proof to establish that a system satisfies its specification. It provides the highest assurance level but is expensive and requires specialized expertise.
10.1 Model Checking
Model checking exhaustively explores all possible states of a finite-state model of the system. It either confirms a property holds for all states or produces a counterexample trace.
Use cases: Communication protocols (TLS handshake), concurrent systems (race condition checking), hardware design.
Tools: SPIN, TLC (the model checker for TLA+), Alloy Analyzer, NuSMV.
(* TLA+ specification: a simple mutual exclusion algorithm *)
VARIABLES pc1, pc2, flag1, flag2
Init == pc1 = "start" /\ pc2 = "start"
/\ flag1 = FALSE /\ flag2 = FALSE
(* Process 1 sets its flag and waits for process 2 to clear *)
Next1 == \/ /\ pc1 = "start" /\ pc1' = "try" /\ flag1' = TRUE /\ UNCHANGED <<pc2, flag2>>
\/ /\ pc1 = "try" /\ ~flag2 /\ pc1' = "cs" /\ UNCHANGED <<flag1, pc2, flag2>>
\/ /\ pc1 = "cs" /\ pc1' = "start" /\ flag1' = FALSE /\ UNCHANGED <<pc2, flag2>>
(* Safety: both processes cannot be in the critical section simultaneously *)
MutualExclusion == ~(pc1 = "cs" /\ pc2 = "cs")
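To make "exhaustively explores all possible states" concrete, here is a toy explicit-state checker for the protocol above, written in Python rather than run through a real tool. It assumes process 2 behaves symmetrically to Next1 (the specification above only shows process 1), and it is a teaching sketch, not a substitute for TLC or SPIN.

```python
from collections import deque


def successors(state):
    """Yield next states; each process follows the three steps of Next1."""
    pc, flag = list(state[0]), list(state[1])
    for i in (0, 1):
        other = 1 - i
        new_pc, new_flag = pc.copy(), flag.copy()
        if pc[i] == "start":
            new_pc[i], new_flag[i] = "try", True
        elif pc[i] == "try" and not flag[other]:
            new_pc[i] = "cs"
        elif pc[i] == "cs":
            new_pc[i], new_flag[i] = "start", False
        else:
            continue  # process i is blocked in this state
        yield (tuple(new_pc), tuple(new_flag))


def mutual_exclusion(state):
    """Safety property: the two processes are never both in the critical section."""
    return state[0] != ("cs", "cs")


def check(init, invariant):
    """Breadth-first search over every reachable state."""
    seen, queue, deadlocks = {init}, deque([init]), []
    while queue:
        state = queue.popleft()
        if not invariant(state):
            return False, seen, deadlocks
        nxt = list(successors(state))
        if not nxt:
            deadlocks.append(state)  # no process can move
        for s in nxt:
            if s not in seen:
                seen.add(s)
                queue.append(s)
    return True, seen, deadlocks


INIT = (("start", "start"), (False, False))
holds, reachable, deadlocks = check(INIT, mutual_exclusion)
```

Under these assumptions the search confirms mutual exclusion over all reachable states, and it also reports a state with no successors: both processes waiting in "try" with both flags set. Finding that kind of unexpected deadlock is typical of what model checking is used for.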
10.2 Theorem Proving
Theorem provers (Coq, Isabelle/HOL, Lean) require human guidance to construct proofs. They can handle infinite state spaces that model checkers cannot.
Notable applications:
- seL4 microkernel: first OS kernel with complete formal correctness proof
- CompCert C compiler: formally verified to produce correct machine code from C
- CryptoVerif: formal verification of cryptographic protocols
10.3 Where Formal Methods Apply
Assurance Cost Typical domain
Level Factor
────────────────────────────────────────────────
Testing 1x All software
Code review 1.5x All software
Fagan insp. 3x High-reliability systems
Model check. 5–10x Protocols, concurrent systems
Theorem prov. 20–50x Safety-critical, cryptographic
Formal methods are practical for:
- Small, well-defined components (a scheduler, a protocol state machine)
- Security-critical algorithms (cryptographic primitives)
- Systems where failure cost is extreme (pacemakers, aircraft flight control)
11. The Bug Lifecycle
11.1 Bug States
Discovered
│
▼
[New] ─── duplicate? ──▶ [Duplicate] → closed
│
▼
[Assigned] ─── not a bug? ──▶ [Rejected] → closed
│
developer
works on
│
▼
[Fixed]
│
tester
verifies
├──── still fails ──▶ [Reopened] ──▶ [Assigned]
│
▼
[Verified]
│
▼
[Closed]
11.2 Bug Severity vs Priority
| Severity | Definition | Example |
|---|---|---|
| Critical | System crash, data loss, security breach — no workaround | Login always crashes the app |
| High | Major feature broken, no workaround | Users cannot complete checkout |
| Medium | Feature broken but workaround exists | Export to PDF fails; users can copy-paste |
| Low | Minor issue; cosmetic | Button misaligned by 2px |
| Priority | Definition | Example |
|---|---|---|
| P1 | Fix immediately — stop the release or rollback | Critical bug in payment processing |
| P2 | Fix in current sprint | High-severity bug blocking key user workflow |
| P3 | Fix in next sprint | Medium bug that has a known workaround |
| P4 | Fix when time allows | Low-severity cosmetic bug |
Severity and priority are independent. A critical bug in a rarely-used admin feature may be P3. A low-severity bug on the landing page seen by all users may be P2.
11.3 Writing an Effective Bug Report
Title: Reusing a password reset link returns HTTP 500 (expected: "link has expired" message)
ID: BUG-2847
Severity: High
Priority: P2
Reporter: Q. Chen
Assigned to: R. Patel
Version: v2.3.1
Environment: Production (also reproduced on staging)
Steps to Reproduce:
1. Click "Forgot Password" on login page
2. Enter registered email address
3. Check email; click the reset link → redirected to reset form ✓
4. Set new password → success message ✓
5. Within 10 minutes, click the same link from the email again
Expected:
Form displays "This link has expired" (a link is valid for 10 minutes but can be used only once)
Actual:
Server returns HTTP 500 Internal Server Error
Attachments:
- screenshot_500_error.png
- server_error_log_2024-03-15_14:22:07.txt
Additional context:
Only happens when the same link is used a second time. The 500 suggests
the token deletion throws an exception if the token is already deleted.
Possible missing "token exists" check before deletion.
12. Summary
Verification and Validation form a complementary system: verification ensures the software is built correctly (according to specification); validation ensures the right product is being built (meets user needs).
Key takeaways:
- Four testing levels — unit, integration, system, acceptance — each with a distinct scope and purpose. Design your test strategy to address all four.
- Black-box techniques — equivalence partitioning, boundary value analysis, decision tables, state transition — provide systematic coverage of specification-level behavior without requiring access to source code.
- White-box techniques — statement, branch, path, condition coverage — provide quantitative measures of structural thoroughness. Aim for ≥80% branch coverage as a practical minimum.
- The test pyramid — many fast unit tests, fewer integration tests, few end-to-end tests — produces a fast, maintainable test suite. Inverting it creates a slow, brittle one.
- Reviews and inspections find different defects than testing; formal Fagan inspection is among the most cost-effective defect-finding techniques ever measured.
- TDD integrates test writing into the development rhythm and produces a regression suite as a free by-product.
- Formal verification provides mathematical certainty at high cost; appropriate for safety-critical and security-critical components.
- Bug reports are communication artifacts: complete, reproducible, specific reports get fixed faster.
13. Practice Exercises
Exercise 1 — Equivalence Partitioning and Boundary Value Analysis
A password validation function requires:
- Length: 8–64 characters
- Must contain at least one uppercase letter
- Must contain at least one digit
- Must contain at least one special character from !@#$%^&*()
- Must not contain spaces
(a) Identify all equivalence partitions for each rule.
(b) Create a boundary value test set for the length rule.
(c) Using a decision table, identify all combinations of condition violations and the expected error message for each.
Exercise 2 — Coverage Analysis
For the following function, draw the control flow graph. Then:
(a) Calculate the cyclomatic complexity.
(b) Identify a minimal test set that achieves 100% branch coverage.
(c) Identify all independent paths (for path coverage). How many test cases does path coverage require?

def shipping_cost(weight_kg, express, country):
    if weight_kg <= 0:
        raise ValueError("Weight must be positive")
    base = weight_kg * 2.50
    if express:
        base *= 1.75
    if country == "domestic":
        return base
    elif country == "canada":
        return base * 1.20
    else:
        return base * 2.00
Exercise 3 — Test Plan
You are testing a mobile banking application before its v2.0 release. The release includes:
- New biometric authentication (fingerprint / Face ID)
- Peer-to-peer payment feature (send money to contacts)
- Redesigned transaction history screen
Write a test plan outline including:
- Scope (what is in and out of scope)
- Testing approach (which levels and types of tests, and why)
- Entry and exit criteria
- At least 3 risks and their mitigation strategies
Exercise 4 — Fagan Inspection
Your team is about to do a Fagan inspection of a 150-line authentication module. The module was written by a senior developer and will be reviewed by four engineers.
(a) How long should the preparation phase and the inspection meeting each take, given recommended inspection rates?
(b) Design an inspection checklist with at least 8 items specific to authentication code.
(c) During the inspection, the author starts explaining why each design decision was made before inspectors raise issues. What is the moderator's responsibility here, and why?
Exercise 5 — Bug Report
You discover the following behavior in a web application: when you sort the product list by "Price: Low to High," items priced at $0.00 (free) appear at the bottom of the list, not the top.
Write a complete, professional bug report following the template from Section 11.3. Include:
- A descriptive title
- Appropriate severity and priority ratings with justification
- Precise steps to reproduce
- Expected vs actual behavior
- At least one hypothesis about the root cause
14. Further Reading
- Books:
  - The Art of Software Testing (3rd ed.), by Glenford Myers, Corey Sandler, Tom Badgett. The classic introduction.
  - Software Testing: A Craftsman's Approach (4th ed.), by Paul Jorgensen. Comprehensive coverage of all testing techniques.
  - Continuous Delivery, by Jez Humble and David Farley. How to automate the entire delivery pipeline including testing.
  - Introduction to the Theory of Computation, by Michael Sipser. Background for formal methods.
- Standards:
  - IEEE Std 829-2008: IEEE Standard for Software and System Test Documentation
  - IEEE Std 1012-2016: IEEE Standard for System, Software, and Hardware Verification and Validation
  - ISO/IEC 29119: Software testing standard (5-part series)
- Tools:
  - pytest: https://pytest.org/ (Python testing framework)
  - Jest: https://jestjs.io/ (JavaScript testing)
  - Locust: https://locust.io/ (load testing)
  - SPIN: http://spinroot.com/ (model checker for concurrent systems)
  - TLA+: https://lamport.azurewebsites.net/tla/tla.html (formal specification language)
- Papers and Articles:
  - Fagan, M. E. (1976). "Design and Code Inspections to Reduce Errors in Program Development." IBM Systems Journal.
  - Myers, G. J. (1978). "A Controlled Experiment in Program Testing and Code Walkthroughs/Inspections." Communications of the ACM.
  - Boehm, B. (1979). "Guidelines for Verifying and Validating Software Requirements and Design Specifications." EURO IFIP.