Object Detection Basics
Object Detection Basics¶
Overview¶
We will learn fundamental methods for detecting specific objects in images. This lesson covers the principles and implementation of traditional object detection techniques including template matching, Haar Cascade, and HOG+SVM.
Difficulty: ***
Prerequisites: Image filtering, edge detection, feature detection
Table of Contents¶
- Template Matching
- Template Matching Methods Comparison
- Multi-scale Template Matching
- Haar Cascade Classifier
- CascadeClassifier Usage
- HOG + SVM Pedestrian Detection
- Practice Problems
1. Template Matching¶
Basic Concept¶
Template Matching: A method that slides a small template image
over a larger image and computes similarity
+-------------------------------+
| Source Image |
| +---------------------+ |
| | | |
| | +----+ | |
| | | T | <- Template| |
| | +----+ position | |
| | search | |
| +---------------------+ |
| |
| Result: Similarity map at |
| each position |
+-------------------------------+
Basic matchTemplate() Usage¶
import cv2
import numpy as np
# Load image and template
img = cv2.imread('image.jpg')
img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
template = cv2.imread('template.jpg', cv2.IMREAD_GRAYSCALE)
# Template size
h, w = template.shape
# Perform template matching
result = cv2.matchTemplate(img_gray, template, cv2.TM_CCOEFF_NORMED)
# Find min/max locations
min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(result)
# For TM_CCOEFF_NORMED, maximum value is best match
top_left = max_loc
bottom_right = (top_left[0] + w, top_left[1] + h)
# Visualize result
cv2.rectangle(img, top_left, bottom_right, (0, 255, 0), 2)
cv2.imshow('Detected', img)
cv2.waitKey(0)
Understanding Template Matching Results¶
Source Image (W x H) Template (w x h) Result Image
+---------------+ +---+ +-----------+
| | | T | | |
| W | + |w*h| = | (W-w+1) |
| | +---+ | x |
| H | | (H-h+1) |
| | | |
+---------------+ +-----------+
Each pixel in result image = matching score at that location
2. Template Matching Methods Comparison¶
Matching Method Types¶
import cv2
import numpy as np
import matplotlib.pyplot as plt
# Load images
img = cv2.imread('image.jpg', cv2.IMREAD_GRAYSCALE)
template = cv2.imread('template.jpg', cv2.IMREAD_GRAYSCALE)
# 6 matching methods
methods = [
('TM_SQDIFF', cv2.TM_SQDIFF), # Squared difference
('TM_SQDIFF_NORMED', cv2.TM_SQDIFF_NORMED), # Normalized squared difference
('TM_CCORR', cv2.TM_CCORR), # Cross correlation
('TM_CCORR_NORMED', cv2.TM_CCORR_NORMED), # Normalized cross correlation
('TM_CCOEFF', cv2.TM_CCOEFF), # Correlation coefficient
('TM_CCOEFF_NORMED', cv2.TM_CCOEFF_NORMED) # Normalized correlation coefficient
]
h, w = template.shape
for name, method in methods:
result = cv2.matchTemplate(img, template, method)
# SQDIFF uses minimum as best, others use maximum
if method in [cv2.TM_SQDIFF, cv2.TM_SQDIFF_NORMED]:
min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(result)
top_left = min_loc
else:
min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(result)
top_left = max_loc
print(f"{name}: location={top_left}, score={max_val:.4f}")
Method Characteristics¶
+--------------------+-----------------------------------------+
| Method | Characteristics |
+--------------------+-----------------------------------------+
| TM_SQDIFF | Sum of squared differences. Closer to |
| | 0 is better. Sensitive to lighting |
+--------------------+-----------------------------------------+
| TM_SQDIFF_NORMED | Normalized squared difference. 0-1 range|
| | Closer to 0 is better |
+--------------------+-----------------------------------------+
| TM_CCORR | Cross correlation. Higher is better |
| | Can be biased towards bright regions |
+--------------------+-----------------------------------------+
| TM_CCORR_NORMED | Normalized cross correlation. 0-1 range |
| | Higher is better |
+--------------------+-----------------------------------------+
| TM_CCOEFF | Correlation coefficient. Subtracts mean |
| | to handle lighting. Higher is better |
+--------------------+-----------------------------------------+
| TM_CCOEFF_NORMED | Normalized correlation coefficient. |
| | -1 to 1 range. Closer to 1 is better. |
| | Most widely used |
+--------------------+-----------------------------------------+
Multiple Object Detection¶
import cv2
import numpy as np
def find_multiple_matches(img, template, threshold=0.8):
"""Detect multiple identical objects"""
img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
template_gray = cv2.cvtColor(template, cv2.COLOR_BGR2GRAY) \
if len(template.shape) == 3 else template
h, w = template_gray.shape
# Template matching
result = cv2.matchTemplate(img_gray, template_gray, cv2.TM_CCOEFF_NORMED)
# Find locations above threshold
locations = np.where(result >= threshold)
# Draw results
img_result = img.copy()
matches = []
for pt in zip(*locations[::-1]): # Convert to x, y order
# Simple Non-Maximum Suppression
is_new = True
for existing in matches:
if abs(pt[0] - existing[0]) < w//2 and abs(pt[1] - existing[1]) < h//2:
is_new = False
break
if is_new:
matches.append(pt)
cv2.rectangle(img_result, pt, (pt[0] + w, pt[1] + h), (0, 255, 0), 2)
print(f"Objects detected: {len(matches)}")
return img_result, matches
# Usage example
img = cv2.imread('coins.jpg')
template = cv2.imread('coin_template.jpg')
result, locations = find_multiple_matches(img, template, threshold=0.85)
3. Multi-scale Template Matching¶
Problem and Solution¶
Problem: Template matching is vulnerable to scale changes
Detection fails if source and template sizes differ
Solution: Perform matching at various scales
Source Image Templates at various sizes
+---------+ +--+ +---+ +----+
| ? | |T | | T | | T |
| | x +--+ +---+ +----+
| | small medium large
+---------+
Or
Source at various sizes Template
+---------+
| |
| | +---+
+---------+ | T |
+-------+ x +---+
| |
+-------+
Multi-scale Matching Implementation¶
import cv2
import numpy as np
def multi_scale_template_matching(img, template, scale_range=(0.5, 1.5),
scale_step=0.1, method=cv2.TM_CCOEFF_NORMED):
"""Multi-scale template matching"""
img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
template_gray = cv2.cvtColor(template, cv2.COLOR_BGR2GRAY) \
if len(template.shape) == 3 else template
best_match = None
best_val = -1
best_scale = 1.0
th, tw = template_gray.shape
# Match at various scales
for scale in np.arange(scale_range[0], scale_range[1] + scale_step, scale_step):
# Resize template
new_w = int(tw * scale)
new_h = int(th * scale)
# Skip if template is larger than image
if new_w > img_gray.shape[1] or new_h > img_gray.shape[0]:
continue
scaled_template = cv2.resize(template_gray, (new_w, new_h))
# Template matching
result = cv2.matchTemplate(img_gray, scaled_template, method)
# Find maximum
min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(result)
if method in [cv2.TM_SQDIFF, cv2.TM_SQDIFF_NORMED]:
if best_match is None or max_val < best_val:
best_val = max_val
best_match = min_loc
best_scale = scale
else:
if max_val > best_val:
best_val = max_val
best_match = max_loc
best_scale = scale
# Visualize result
if best_match is not None:
result_img = img.copy()
top_left = best_match
bottom_right = (int(top_left[0] + tw * best_scale),
int(top_left[1] + th * best_scale))
cv2.rectangle(result_img, top_left, bottom_right, (0, 255, 0), 2)
print(f"Optimal scale: {best_scale:.2f}")
print(f"Match score: {best_val:.4f}")
print(f"Location: {top_left}")
return result_img, best_match, best_scale, best_val
return img, None, None, None
# Usage example
img = cv2.imread('scene.jpg')
template = cv2.imread('object.jpg')
result, loc, scale, score = multi_scale_template_matching(
img, template,
scale_range=(0.3, 2.0),
scale_step=0.05
)
Pyramid-based Multi-scale Matching¶
def pyramid_template_matching(img, template, levels=5, scale_factor=0.75):
"""Multi-scale matching using image pyramid"""
img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
template_gray = cv2.cvtColor(template, cv2.COLOR_BGR2GRAY) \
if len(template.shape) == 3 else template
best_result = {
'location': None,
'value': -1,
'scale': 1.0,
'size': template_gray.shape
}
current_scale = 1.0
for level in range(levels):
# Image size at current scale
scaled_img = cv2.resize(img_gray, None,
fx=current_scale, fy=current_scale)
# Stop if template is larger than image
if (scaled_img.shape[0] < template_gray.shape[0] or
scaled_img.shape[1] < template_gray.shape[1]):
break
# Template matching
result = cv2.matchTemplate(scaled_img, template_gray,
cv2.TM_CCOEFF_NORMED)
_, max_val, _, max_loc = cv2.minMaxLoc(result)
if max_val > best_result['value']:
# Convert to original image coordinates
orig_loc = (int(max_loc[0] / current_scale),
int(max_loc[1] / current_scale))
best_result = {
'location': orig_loc,
'value': max_val,
'scale': current_scale,
'size': (int(template_gray.shape[1] / current_scale),
int(template_gray.shape[0] / current_scale))
}
current_scale *= scale_factor
return best_result
# Usage example
img = cv2.imread('scene.jpg')
template = cv2.imread('object.jpg')
result = pyramid_template_matching(img, template, levels=8)
if result['location']:
img_result = img.copy()
x, y = result['location']
w, h = result['size']
cv2.rectangle(img_result, (x, y), (x + w, y + h), (0, 255, 0), 2)
print(f"Detection location: {result['location']}")
print(f"Detection scale: {result['scale']:.3f}")
print(f"Match score: {result['value']:.4f}")
4. Haar Cascade Classifier¶
Understanding Haar Features¶
Haar-like Features: Use difference between bright and dark regions
Basic Haar Features:
+-------------------------------------------------------+
| |
| Edge features |
| +----+----+ +----+ |
| |####| | |####| |
| |####| | +----+ |
| +----+----+ | | |
| +----+ |
| |
| Line features |
| +----+----+----+ +----+ |
| |####| |####| |####| |
| +----+----+----+ +----+ |
| | | |
| +----+ |
| |####| |
| +----+ |
| |
| Center-surround features |
| +----+----+----+ |
| |####| |####| |
| +----+----+----+ |
| |####| |####| |
| +----+----+----+ |
| |
| #### = Black region (sum then subtract) |
| blank = White region (sum) |
| |
| Feature value = sum(white regions) - sum(black) |
+-------------------------------------------------------+
Integral Image¶
Integral Image: Technique for O(1) feature computation
Original Image Integral Image
+---+---+---+ +---+---+---+
| 1 | 2 | 3 | | 1 | 3 | 6 |
+---+---+---+ -> +---+---+---+
| 4 | 5 | 6 | | 5 |12 |21 |
+---+---+---+ +---+---+---+
| 7 | 8 | 9 | |12 |27 |45 |
+---+---+---+ +---+---+---+
Integral image computation:
ii(x,y) = sum i(x',y') for x'<=x, y'<=y
Region sum (only 4 array accesses needed):
A ----- B
| |
| region|
| |
C ----- D
Region sum = ii(D) - ii(B) - ii(C) + ii(A)
Cascade Structure¶
Cascade: Staged classifier
Image window
|
v
+---------+ NO (fast reject)
| Stage 1 | ------------------> Not object
| (simple)|
+----+----+
| YES
v
+---------+ NO
| Stage 2 | ------------------> Not object
| |
+----+----+
| YES
v
...
|
v
+---------+ NO
| Stage N | ------------------> Not object
| (complex)|
+----+----+
| YES
v
Object!
Advantage: Most non-objects rejected quickly at early stages
5. CascadeClassifier Usage¶
Basic Usage¶
import cv2
# Load Haar Cascade classifier
# Use pre-trained classifiers included with OpenCV
face_cascade = cv2.CascadeClassifier(
cv2.data.haarcascades + 'haarcascade_frontalface_default.xml'
)
eye_cascade = cv2.CascadeClassifier(
cv2.data.haarcascades + 'haarcascade_eye.xml'
)
# Load image
img = cv2.imread('people.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Detect faces
faces = face_cascade.detectMultiScale(
gray, # Input image (grayscale)
scaleFactor=1.1, # Image reduction ratio
minNeighbors=5, # Minimum neighbors (higher = stricter)
minSize=(30, 30), # Minimum object size
maxSize=(300, 300) # Maximum object size
)
# Draw detection results
for (x, y, w, h) in faces:
cv2.rectangle(img, (x, y), (x+w, y+h), (255, 0, 0), 2)
# Detect eyes within face region
roi_gray = gray[y:y+h, x:x+w]
roi_color = img[y:y+h, x:x+w]
eyes = eye_cascade.detectMultiScale(roi_gray, 1.1, 3)
for (ex, ey, ew, eh) in eyes:
cv2.rectangle(roi_color, (ex, ey), (ex+ew, ey+eh), (0, 255, 0), 2)
print(f"Faces detected: {len(faces)}")
cv2.imshow('Face Detection', img)
cv2.waitKey(0)
detectMultiScale Parameters¶
detectMultiScale(image, scaleFactor, minNeighbors, ...)
+-------------------------------------------------------------+
| scaleFactor: Image reduction ratio at each scale |
| |
| scaleFactor = 1.1 (default) |
| +---------+ |
| | 100x100 | -> 91x91 -> 83x83 -> 75x75 -> ... |
| +---------+ |
| Smaller = more precise but slower |
| |
+-------------------------------------------------------------+
| minNeighbors: Minimum detection count to accept as object |
| |
| minNeighbors = 3 |
| +---------------+ |
| | +-+ +-+ | -> 2 detections -> Ignore (< 3) |
| | +-+ +-+ | |
| +---------------+ |
| Higher = fewer false positives, more missed detections |
| |
+-------------------------------------------------------------+
| minSize, maxSize: Object size range to detect |
| |
| minSize=(30, 30) maxSize=(300, 300) |
| Ignore below 30x30 pixels or above 300x300 pixels |
+-------------------------------------------------------------+
Available Cascade Files¶
import cv2
import os
# List available Haar Cascade files
cascade_dir = cv2.data.haarcascades
print("Available Cascade files:")
for f in sorted(os.listdir(cascade_dir)):
if f.endswith('.xml'):
print(f" - {f}")
# Major Cascade files:
# haarcascade_frontalface_default.xml - Frontal face
# haarcascade_frontalface_alt.xml - Frontal face (alternative)
# haarcascade_frontalface_alt2.xml - Frontal face (alternative 2)
# haarcascade_profileface.xml - Profile face
# haarcascade_eye.xml - Eyes
# haarcascade_eye_tree_eyeglasses.xml - Eyes with glasses
# haarcascade_smile.xml - Smile
# haarcascade_fullbody.xml - Full body
# haarcascade_upperbody.xml - Upper body
# haarcascade_lowerbody.xml - Lower body
# haarcascade_frontalcatface.xml - Cat face
# haarcascade_russian_plate_number.xml - Russian license plate
Multiple Cascade Combination¶
import cv2
class FaceFeatureDetector:
"""Face feature detector"""
def __init__(self):
self.face_cascade = cv2.CascadeClassifier(
cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')
self.eye_cascade = cv2.CascadeClassifier(
cv2.data.haarcascades + 'haarcascade_eye.xml')
self.smile_cascade = cv2.CascadeClassifier(
cv2.data.haarcascades + 'haarcascade_smile.xml')
def detect(self, img):
"""Detect face, eyes, and smile"""
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
gray = cv2.equalizeHist(gray) # Histogram equalization
results = []
# Detect faces
faces = self.face_cascade.detectMultiScale(gray, 1.1, 5,
minSize=(60, 60))
for (x, y, w, h) in faces:
face_roi_gray = gray[y:y+h, x:x+w]
face_data = {
'bbox': (x, y, w, h),
'eyes': [],
'smiling': False
}
# Detect eyes in top half of face
eye_roi = face_roi_gray[0:h//2, :]
eyes = self.eye_cascade.detectMultiScale(eye_roi, 1.1, 3,
minSize=(20, 20))
for (ex, ey, ew, eh) in eyes:
face_data['eyes'].append((x + ex, y + ey, ew, eh))
# Detect smile in bottom half of face
smile_roi = face_roi_gray[h//2:, :]
smiles = self.smile_cascade.detectMultiScale(smile_roi, 1.7, 20,
minSize=(25, 25))
face_data['smiling'] = len(smiles) > 0
results.append(face_data)
return results
def draw_results(self, img, results):
"""Visualize results"""
output = img.copy()
for face in results:
x, y, w, h = face['bbox']
# Face rectangle
color = (0, 255, 0) if face['smiling'] else (255, 0, 0)
cv2.rectangle(output, (x, y), (x+w, y+h), color, 2)
# Eye circles
for (ex, ey, ew, eh) in face['eyes']:
center = (ex + ew//2, ey + eh//2)
radius = min(ew, eh) // 2
cv2.circle(output, center, radius, (0, 255, 255), 2)
# Smile status
label = "Smiling :)" if face['smiling'] else "Neutral"
cv2.putText(output, label, (x, y-10),
cv2.FONT_HERSHEY_SIMPLEX, 0.7, color, 2)
return output
# Usage example
detector = FaceFeatureDetector()
img = cv2.imread('group_photo.jpg')
results = detector.detect(img)
output = detector.draw_results(img, results)
cv2.imshow('Face Features', output)
6. HOG + SVM Pedestrian Detection¶
Understanding HOG (Histogram of Oriented Gradients)¶
HOG: Uses gradient direction distribution in local regions as features
1. Grayscale conversion
2. Gradient computation
+-------------------------------------------+
| Gx = Horizontal gradient (Sobel x) |
| Gy = Vertical gradient (Sobel y) |
| |
| Magnitude: G = sqrt(Gx^2 + Gy^2) |
| Direction: theta = arctan(Gy/Gx) |
+-------------------------------------------+
3. Compute gradient histogram per cell
+-----------------------------------------+
| Divide image into 8x8 pixel cells |
| Direction histogram per cell (9 bins) |
| |
| 0 20 40 60 80 100 120 140 160 |
| +---+---+---+---+---+---+---+---+---+ |
| | |###| | |#####| | | | |
| +---+---+---+---+---+---+---+---+---+ |
+-----------------------------------------+
4. Block normalization
+-----------------------------------------+
| 2x2 cells = 1 block |
| Concatenate histograms in block then |
| normalize |
| |
| +----+----+ |
| |cell|cell| -> [36-dim feature vector] |
| +----+----+ (9 x 4 = 36) |
| |cell|cell| |
| +----+----+ |
+-----------------------------------------+
5. Concatenate all block features for final HOG descriptor
Using HOG Pedestrian Detector¶
import cv2
import numpy as np
# HOG descriptor + SVM classifier
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())
# Load image
img = cv2.imread('street.jpg')
img = cv2.resize(img, None, fx=0.5, fy=0.5) # Resize for speed
# Pedestrian detection
# detectMultiScale returns: (detected regions, confidence weights)
boxes, weights = hog.detectMultiScale(
img,
winStride=(8, 8), # Window stride
padding=(4, 4), # Padding
scale=1.05, # Scale factor
hitThreshold=0, # SVM threshold
finalThreshold=2.0 # Final grouping threshold
)
# Draw results
for (x, y, w, h), weight in zip(boxes, weights):
cv2.rectangle(img, (x, y), (x+w, y+h), (0, 255, 0), 2)
cv2.putText(img, f'{weight[0]:.2f}', (x, y-5),
cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)
print(f"Pedestrians detected: {len(boxes)}")
cv2.imshow('Pedestrian Detection', img)
cv2.waitKey(0)
Non-Maximum Suppression (NMS)¶
import cv2
import numpy as np
def non_max_suppression(boxes, scores, threshold=0.5):
"""Non-Maximum Suppression implementation"""
if len(boxes) == 0:
return []
# Convert coordinates to float
boxes = boxes.astype(np.float32)
# Separate coordinates
x1 = boxes[:, 0]
y1 = boxes[:, 1]
x2 = boxes[:, 0] + boxes[:, 2]
y2 = boxes[:, 1] + boxes[:, 3]
# Compute areas
areas = (x2 - x1 + 1) * (y2 - y1 + 1)
# Sort by scores (descending)
order = scores.flatten().argsort()[::-1]
keep = []
while order.size > 0:
# Select box with highest score
i = order[0]
keep.append(i)
# Compute IoU with remaining boxes
xx1 = np.maximum(x1[i], x1[order[1:]])
yy1 = np.maximum(y1[i], y1[order[1:]])
xx2 = np.minimum(x2[i], x2[order[1:]])
yy2 = np.minimum(y2[i], y2[order[1:]])
w = np.maximum(0, xx2 - xx1 + 1)
h = np.maximum(0, yy2 - yy1 + 1)
intersection = w * h
iou = intersection / (areas[i] + areas[order[1:]] - intersection)
# Keep only boxes with IoU below threshold
inds = np.where(iou <= threshold)[0]
order = order[inds + 1]
return keep
# HOG detection with NMS
def detect_pedestrians_with_nms(img, nms_threshold=0.3):
"""Pedestrian detection with NMS"""
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())
# Detection
boxes, weights = hog.detectMultiScale(img, winStride=(8, 8),
padding=(4, 4), scale=1.05)
if len(boxes) == 0:
return img, []
# Apply NMS
boxes = np.array(boxes)
weights = np.array(weights)
keep = non_max_suppression(boxes, weights, nms_threshold)
# Draw results
result = img.copy()
final_boxes = []
for i in keep:
x, y, w, h = boxes[i]
final_boxes.append((x, y, w, h))
cv2.rectangle(result, (int(x), int(y)), (int(x+w), int(y+h)),
(0, 255, 0), 2)
return result, final_boxes
# Usage example
img = cv2.imread('crowd.jpg')
result, detections = detect_pedestrians_with_nms(img)
print(f"Detections after NMS: {len(detections)}")
HOG Feature Visualization¶
import cv2
import numpy as np
from skimage.feature import hog
from skimage import exposure
def visualize_hog(img):
"""HOG feature visualization"""
# Grayscale conversion
if len(img.shape) == 3:
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
else:
gray = img
# Resize (64x128 - HOG pedestrian detection standard size)
resized = cv2.resize(gray, (64, 128))
# Use scikit-image's hog (includes visualization)
features, hog_image = hog(
resized,
orientations=9, # Number of gradient direction bins
pixels_per_cell=(8, 8), # Cell size
cells_per_block=(2, 2), # Cells per block
visualize=True,
block_norm='L2-Hys'
)
# Rescale for visualization
hog_image_rescaled = exposure.rescale_intensity(hog_image,
out_range=(0, 255))
hog_image_rescaled = hog_image_rescaled.astype(np.uint8)
print(f"HOG feature vector size: {features.shape[0]}")
return hog_image_rescaled, features
# Usage example (requires scikit-image: pip install scikit-image)
# img = cv2.imread('person.jpg')
# hog_vis, features = visualize_hog(img)
# cv2.imshow('HOG Visualization', hog_vis)
Custom HOG + SVM Training (Concept)¶
import cv2
import numpy as np
from sklearn import svm
from sklearn.model_selection import train_test_split
def train_hog_svm_classifier(positive_samples, negative_samples):
"""HOG + SVM classifier training (conceptual example)"""
# HOG descriptor setup
win_size = (64, 128)
block_size = (16, 16)
block_stride = (8, 8)
cell_size = (8, 8)
nbins = 9
hog = cv2.HOGDescriptor(win_size, block_size, block_stride,
cell_size, nbins)
# Feature extraction
features = []
labels = []
# Positive samples (images with object)
for img in positive_samples:
img_resized = cv2.resize(img, win_size)
gray = cv2.cvtColor(img_resized, cv2.COLOR_BGR2GRAY)
h = hog.compute(gray)
features.append(h.flatten())
labels.append(1)
# Negative samples (images without object)
for img in negative_samples:
img_resized = cv2.resize(img, win_size)
gray = cv2.cvtColor(img_resized, cv2.COLOR_BGR2GRAY)
h = hog.compute(gray)
features.append(h.flatten())
labels.append(0)
X = np.array(features)
y = np.array(labels)
# Train/test split
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42
)
# SVM training
clf = svm.LinearSVC(C=0.01)
clf.fit(X_train, y_train)
# Print accuracy
accuracy = clf.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.4f}")
return hog, clf
# Method to set trained SVM to HOGDescriptor
def set_svm_detector(hog, clf):
"""Set trained SVM to HOG detector"""
# Extract LinearSVC coefficients and intercept
sv = clf.coef_.flatten()
rho = -clf.intercept_[0]
# Convert to format expected by HOG descriptor
detector = np.append(sv, rho)
hog.setSVMDetector(detector)
return hog
7. Practice Problems¶
Problem 1: Multiple Template Matching¶
Write a program that matches multiple different templates simultaneously.
Requirements: - Use 3 or more different template images - Display detection results in different colors for each template - Output match score for each template
Hint
templates = [
('template1.jpg', (255, 0, 0)), # Blue
('template2.jpg', (0, 255, 0)), # Green
('template3.jpg', (0, 0, 255)) # Red
]
for template_path, color in templates:
template = cv2.imread(template_path, cv2.IMREAD_GRAYSCALE)
result = cv2.matchTemplate(img_gray, template, cv2.TM_CCOEFF_NORMED)
# ... matching and drawing
Problem 2: Rotation-Invariant Template Matching¶
Implement a program that matches templates rotated at various angles.
Requirements: - Rotate template from 0 to 360 degrees in 10-degree intervals - Record highest match score at each rotation angle - Output optimal rotation angle and location
Hint
def rotate_image(img, angle):
h, w = img.shape[:2]
center = (w // 2, h // 2)
M = cv2.getRotationMatrix2D(center, angle, 1.0)
rotated = cv2.warpAffine(img, M, (w, h))
return rotated
for angle in range(0, 360, 10):
rotated_template = rotate_image(template, angle)
# Perform template matching
Problem 3: Real-time Face Detection Optimization¶
Detect faces in real-time from webcam while maintaining 30+ FPS.
Requirements: - Adjust frame size - Optimize detectMultiScale parameters - Display FPS
Hint
# Optimization tips:
# 1. Reduce frame to half size
# 2. Increase scaleFactor to 1.2~1.3
# 3. Lower minNeighbors to 3
# 4. Set appropriate minSize
cap = cv2.VideoCapture(0)
while True:
ret, frame = cap.read()
small_frame = cv2.resize(frame, None, fx=0.5, fy=0.5)
# Detect then scale coordinates by 2
Problem 4: HOG Visualization Tool¶
Write a program that visualizes HOG features in real-time.
Requirements: - Adjust HOG parameters with trackbars (cell_size, nbins) - Display original image and HOG visualization side by side - Show feature vector dimensions
Hint
def on_trackbar(val):
cell_size = cv2.getTrackbarPos('Cell Size', 'HOG')
if cell_size < 4:
cell_size = 4
# Recompute and visualize HOG
Problem 5: License Plate Detector¶
Implement a program that detects car license plates using Haar Cascade or template matching.
Requirements: - Detect license plate region - Crop and save detected region - Display confidence score
Hint
# haarcascade_russian_plate_number.xml or
# Use custom trained cascade
# Or detection using license plate characteristics:
# 1. Edge detection
# 2. Rectangular contour detection
# 3. Aspect ratio filtering (plates are typically 4:1 ~ 5:1)
Next Steps¶
- 16_Face_Detection.md - dlib, face_recognition, real-time face recognition
References¶
- OpenCV Template Matching
- OpenCV Cascade Classifier
- HOG Tutorial - Learn OpenCV
- Dalal, N., & Triggs, B. (2005). "Histograms of Oriented Gradients for Human Detection"
- Viola, P., & Jones, M. (2001). "Rapid Object Detection using a Boosted Cascade of Simple Features"