Geometric Transformations
Overview¶
Geometric transformations are operations that change the spatial arrangement of an image's pixels rather than their values. They include resizing, rotation, translation, flipping, and perspective transformations. This document covers OpenCV's geometric transformation functions with practical examples.
Difficulty: ⭐⭐ (Beginner-Intermediate)
Learning Objectives:
- Understand cv2.resize() and interpolation methods
- Use rotation and flipping functions
- Apply affine transformation (warpAffine)
- Apply perspective transformation (warpPerspective)
- Implement document scanning/correction examples
Table of Contents¶
- Image Resizing - resize()
- Flipping and Rotation - flip(), rotate()
- Affine Transformation - warpAffine()
- Perspective Transformation - warpPerspective()
- Document Correction Example
- Practice Problems
- Next Steps
- References
1. Image Resizing - resize()¶
Basic Usage¶
import cv2
img = cv2.imread('image.jpg')
h, w = img.shape[:2]
# Method 1: Specify size directly (width, height order!)
resized = cv2.resize(img, (640, 480))
# Method 2: Specify by ratio
resized = cv2.resize(img, None, fx=0.5, fy=0.5) # Reduce to 50%
# Method 3: Maintain aspect ratio based on one dimension
new_width = 800
ratio = new_width / w
new_height = int(h * ratio)
resized = cv2.resize(img, (new_width, new_height))
Interpolation Methods¶
| Method | Type | Features | Use Cases |
|---|---|---|---|
| INTER_NEAREST | Nearest-neighbor interpolation | Blocky artifacts | Fast, low quality; real-time processing |
| INTER_LINEAR | Bilinear interpolation (default) | Smooth results | Balanced choice; general resizing |
| INTER_AREA | Area-based interpolation | Prevents moiré | Best for shrinking; downsampling |
| INTER_CUBIC | Bicubic interpolation | Smooth and sharp | Good for enlarging; quality focus |
| INTER_LANCZOS4 | Lanczos interpolation (8x8 neighbors) | Sharpest | Best quality; slow |

Recommendations:
- Shrinking: INTER_AREA
- Enlarging: INTER_CUBIC or INTER_LANCZOS4
- Real-time: INTER_LINEAR or INTER_NEAREST
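The recommendations above can be turned into a small selection rule. The sketch below is only an illustration: the helper name `choose_interpolation` is hypothetical, and it returns the flag name as a string so that it runs without OpenCV installed.

```python
def choose_interpolation(src_size, dst_size):
    """Pick an interpolation flag name for a resize.

    Hypothetical helper. Sizes are (width, height) tuples, matching
    the argument order that cv2.resize() expects.
    """
    src_w, src_h = src_size
    dst_w, dst_h = dst_size
    src_area = src_w * src_h
    dst_area = dst_w * dst_h
    if dst_area < src_area:
        return "INTER_AREA"    # shrinking
    if dst_area > src_area:
        return "INTER_CUBIC"   # enlarging, quality over speed
    return "INTER_LINEAR"      # same size: the default is fine


shrink_flag = choose_interpolation((1920, 1080), (640, 360))
enlarge_flag = choose_interpolation((640, 360), (1920, 1080))
```

With OpenCV available, the name could be resolved with `getattr(cv2, shrink_flag)` and passed as the `interpolation` argument of `cv2.resize()`.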
Interpolation Comparison Example¶
import cv2
import matplotlib.pyplot as plt
img = cv2.imread('image.jpg')
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
# Shrink first, then enlarge to compare differences
small = cv2.resize(img, None, fx=0.1, fy=0.1) # Reduce to 10%
interpolations = [
('NEAREST', cv2.INTER_NEAREST),
('LINEAR', cv2.INTER_LINEAR),
('AREA', cv2.INTER_AREA),
('CUBIC', cv2.INTER_CUBIC),
('LANCZOS4', cv2.INTER_LANCZOS4),
]
fig, axes = plt.subplots(2, 3, figsize=(15, 10))
axes = axes.flatten()
axes[0].imshow(img)
axes[0].set_title('Original')
for ax, (name, interp) in zip(axes[1:], interpolations):
    enlarged = cv2.resize(small, img.shape[:2][::-1], interpolation=interp)
    ax.imshow(enlarged)
    ax.set_title(f'{name}')
for ax in axes:
    ax.axis('off')
plt.tight_layout()
plt.show()
Aspect Ratio-Preserving Resize Functions¶
import cv2
def resize_with_aspect_ratio(img, width=None, height=None, inter=cv2.INTER_AREA):
    """Resize while maintaining aspect ratio"""
    h, w = img.shape[:2]
    if width is None and height is None:
        return img
    if width is None:
        ratio = height / h
        new_size = (int(w * ratio), height)
    else:
        ratio = width / w
        new_size = (width, int(h * ratio))
    return cv2.resize(img, new_size, interpolation=inter)

def resize_to_fit(img, max_width, max_height, inter=cv2.INTER_AREA):
    """Fit within maximum size while maintaining aspect ratio"""
    h, w = img.shape[:2]
    ratio_w = max_width / w
    ratio_h = max_height / h
    ratio = min(ratio_w, ratio_h)
    if ratio >= 1:  # Already small enough
        return img
    new_size = (int(w * ratio), int(h * ratio))
    return cv2.resize(img, new_size, interpolation=inter)
# Usage example
img = cv2.imread('large_image.jpg')
img_fit = resize_to_fit(img, 800, 600)
img_width = resize_with_aspect_ratio(img, width=640)
2. Flipping and Rotation - flip(), rotate()¶
cv2.flip()¶
flip() Operation

| flipCode | Direction | Original | Result | Effect |
|---|---|---|---|---|
| 1 | Horizontal (left-right flip) | [1 2] / [3 4] | [2 1] / [4 3] | Mirror effect |
| 0 | Vertical (top-bottom flip) | [1 2] / [3 4] | [3 4] / [1 2] | Water reflection |
| -1 | Both flips | [1 2] / [3 4] | [4 3] / [2 1] | 180° rotation |

Each 2x2 grid is written as [top row] / [bottom row].
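The three flip codes can be checked by hand on the 2x2 grid from the diagram above. This is a pure-Python sketch using nested lists in place of an image; the function names are illustrative and are not part of OpenCV.

```python
# The 2x2 grid from the diagram, as rows of pixel labels.
grid = [[1, 2],
        [3, 4]]


def flip_horizontal(m):
    """Reverse each row (what flipCode = 1 does)."""
    return [row[::-1] for row in m]


def flip_vertical(m):
    """Reverse the order of the rows (what flipCode = 0 does)."""
    return [row[:] for row in m[::-1]]


def flip_both(m):
    """Reverse rows and columns (what flipCode = -1 does)."""
    return [row[::-1] for row in m[::-1]]


horizontal = flip_horizontal(grid)  # [[2, 1], [4, 3]]
vertical = flip_vertical(grid)      # [[3, 4], [1, 2]]
both = flip_both(grid)              # [[4, 3], [2, 1]]
```

Applying the horizontal and vertical flips one after the other gives the same grid as flipping both at once, which is why flipCode = -1 is equivalent to a 180° rotation.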
import cv2
img = cv2.imread('image.jpg')
# Horizontal flip (left-right)
flipped_h = cv2.flip(img, 1)
# Vertical flip (top-bottom)
flipped_v = cv2.flip(img, 0)
# Both directions (equivalent to 180° rotation)
flipped_both = cv2.flip(img, -1)
# Also possible with NumPy
import numpy as np
flipped_h_np = img[:, ::-1] # Horizontal
flipped_v_np = img[::-1, :] # Vertical
flipped_both_np = img[::-1, ::-1] # Both
cv2.rotate()¶
import cv2
img = cv2.imread('image.jpg')
# 90 degrees clockwise
rotated_90_cw = cv2.rotate(img, cv2.ROTATE_90_CLOCKWISE)
# 90 degrees counter-clockwise
rotated_90_ccw = cv2.rotate(img, cv2.ROTATE_90_COUNTERCLOCKWISE)
# 180 degrees
rotated_180 = cv2.rotate(img, cv2.ROTATE_180)
# Check image size changes
print(f"Original: {img.shape}") # (H, W, C)
print(f"90°: {rotated_90_cw.shape}")  # (W, H, C) - swapped
print(f"180°: {rotated_180.shape}")  # (H, W, C) - same
Arbitrary Angle Rotation¶
import cv2
def rotate_image(img, angle, center=None, scale=1.0):
    """Rotate image by arbitrary angle"""
    h, w = img.shape[:2]
    if center is None:
        center = (w // 2, h // 2)
    # Create rotation matrix
    M = cv2.getRotationMatrix2D(center, angle, scale)
    # Apply rotation
    rotated = cv2.warpAffine(img, M, (w, h))
    return rotated

def rotate_image_full(img, angle):
    """Rotate image without cropping (expand canvas)"""
    h, w = img.shape[:2]
    center = (w // 2, h // 2)
    # Rotation matrix
    M = cv2.getRotationMatrix2D(center, angle, 1.0)
    # Calculate new bounds after rotation
    cos = abs(M[0, 0])
    sin = abs(M[0, 1])
    new_w = int(h * sin + w * cos)
    new_h = int(h * cos + w * sin)
    # Adjust translation so the rotated image is centered on the new canvas
    M[0, 2] += (new_w - w) / 2
    M[1, 2] += (new_h - h) / 2
    rotated = cv2.warpAffine(img, M, (new_w, new_h))
    return rotated

# Usage examples
img = cv2.imread('image.jpg')
rotated_30 = rotate_image(img, 30)  # 30° rotation (partially cropped)
rotated_45_full = rotate_image_full(img, 45)  # 45° rotation (fully preserved)
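The canvas expansion in `rotate_image_full` relies on the bounding box of a rotated rectangle: the new width is `h*|sin| + w*|cos|` and the new height is `h*|cos| + w*|sin|`. The sketch below isolates that arithmetic with only the standard library. The function name `expanded_canvas_size` is hypothetical, and it rounds to the nearest pixel, whereas the code above truncates with `int()`.

```python
import math


def expanded_canvas_size(w, h, angle_deg):
    """Size (width, height) of the smallest axis-aligned canvas that
    holds a w x h image rotated by angle_deg about its center."""
    theta = math.radians(angle_deg)
    cos_a = abs(math.cos(theta))
    sin_a = abs(math.sin(theta))
    new_w = round(h * sin_a + w * cos_a)
    new_h = round(h * cos_a + w * sin_a)
    return new_w, new_h


size_0 = expanded_canvas_size(400, 200, 0)    # unchanged
size_90 = expanded_canvas_size(400, 200, 90)  # width and height swap
size_45 = expanded_canvas_size(400, 200, 45)  # larger in both dimensions
```

At 45° a 400x200 image needs a canvas of roughly 424x424 pixels, which is why a fixed `(w, h)` output size crops the corners.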
3. Affine Transformation - warpAffine()¶
What is Affine Transformation?¶
Affine Transformation

An affine transformation maps straight lines to straight lines and keeps parallel lines parallel.

Included transformations:
- Translation
- Rotation
- Scale
- Shear

Transformation matrix (2x3):

$$
M = \begin{bmatrix} a & b & t_x \\ c & d & t_y \end{bmatrix}
$$

For a rotation by angle $\theta$ combined with a uniform scale $s$ (standard mathematical convention, y axis pointing up):

$$
\begin{bmatrix} a & b & t_x \\ c & d & t_y \end{bmatrix}
=
\begin{bmatrix} s\cos\theta & -s\sin\theta & t_x \\ s\sin\theta & s\cos\theta & t_y \end{bmatrix}
$$

cv2.getRotationMatrix2D() uses the opposite signs on the sine terms because image coordinates have the y axis pointing down.

Applying the matrix to a point:

$$
\begin{bmatrix} x' \\ y' \end{bmatrix}
=
\begin{bmatrix} a & b & t_x \\ c & d & t_y \end{bmatrix}
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
$$
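To make the matrix form concrete, here is a minimal pure-Python sketch that applies a 2x3 affine matrix to a single point. The helper names `apply_affine` and `rotation_scale_matrix` are illustrative only, and the rotation uses the standard mathematical sign convention (y axis up), not OpenCV's image convention.

```python
import math


def apply_affine(M, point):
    """Apply a 2x3 affine matrix (nested lists) to an (x, y) point."""
    (a, b, tx), (c, d, ty) = M
    x, y = point
    return (a * x + b * y + tx, c * x + d * y + ty)


def rotation_scale_matrix(theta_deg, s=1.0, tx=0.0, ty=0.0):
    """2x3 matrix for rotation by theta_deg with uniform scale s,
    followed by a translation (tx, ty). Math convention, y axis up."""
    t = math.radians(theta_deg)
    return [
        [s * math.cos(t), -s * math.sin(t), tx],
        [s * math.sin(t), s * math.cos(t), ty],
    ]


# Pure translation: move 100 in x and 50 in y.
translation = [[1, 0, 100], [0, 1, 50]]
moved = apply_affine(translation, (10, 20))  # (110, 70)

# Rotate the unit x vector by 90 degrees and double its length.
rotated = apply_affine(rotation_scale_matrix(90, s=2.0), (1, 0))
# rotated is approximately (0, 2)
```

cv2.warpAffine() performs the same mapping for every pixel, with interpolation handling coordinates that fall between pixels.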
Translation¶
import cv2
import numpy as np
img = cv2.imread('image.jpg')
h, w = img.shape[:2]
# Translation matrix: move 100 in x, 50 in y
tx, ty = 100, 50
M = np.float32([
[1, 0, tx],
[0, 1, ty]
])
translated = cv2.warpAffine(img, M, (w, h))
cv2.imshow('Original', img)
cv2.imshow('Translated', translated)
cv2.waitKey(0)
cv2.destroyAllWindows()
Rotation + Scale¶
import cv2
img = cv2.imread('image.jpg')
h, w = img.shape[:2]
# getRotationMatrix2D(center, angle, scale)
center = (w // 2, h // 2)
angle = 45 # 45 degrees counter-clockwise
scale = 0.7 # 70% size
M = cv2.getRotationMatrix2D(center, angle, scale)
rotated = cv2.warpAffine(img, M, (w, h))
Shear Transformation¶
import cv2
import numpy as np
img = cv2.imread('image.jpg')
h, w = img.shape[:2]
# Horizontal shear
shear_x = 0.3
M_shear_x = np.float32([
[1, shear_x, 0],
[0, 1, 0]
])
sheared_x = cv2.warpAffine(img, M_shear_x, (int(w + h * shear_x), h))
# Vertical shear
shear_y = 0.3
M_shear_y = np.float32([
[1, 0, 0],
[shear_y, 1, 0]
])
sheared_y = cv2.warpAffine(img, M_shear_y, (w, int(h + w * shear_y)))
Affine Transformation Using 3 Points¶
import cv2
import numpy as np
img = cv2.imread('image.jpg')
h, w = img.shape[:2]
# 3 points from source
src_pts = np.float32([
[0, 0], # Top-left
[w-1, 0], # Top-right
[0, h-1] # Bottom-left
])
# 3 points after transformation
dst_pts = np.float32([
[50, 50], # Top-left
[w-50, 30], # Top-right
[30, h-50] # Bottom-left
])
# Calculate affine transformation matrix
M = cv2.getAffineTransform(src_pts, dst_pts)
# Apply transformation
result = cv2.warpAffine(img, M, (w, h))
# Mark points (convert coordinates to plain Python ints for the drawing call)
for x, y in src_pts.astype(int):
    cv2.circle(img, (int(x), int(y)), 5, (0, 0, 255), -1)
for x, y in dst_pts.astype(int):
    cv2.circle(result, (int(x), int(y)), 5, (0, 255, 0), -1)
4. Perspective Transformation - warpPerspective()¶
What is Perspective Transformation?¶
Perspective Transformation

A perspective transformation maps a trapezoid (more generally, any quadrilateral) to a rectangle, or vice versa. It makes an image captured at an angle in 3D space look as if it were viewed from the front.

Practical applications:
- Document scanning (tilted document → front view)
- Lane detection (bird's eye view)
- QR code recognition
- Image rectification

Transformation matrix (3x3):

$$
H = \begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{bmatrix}
$$

[Figure: source (trapezoid-shaped document) → result (rectangular document)]
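What separates the 3x3 matrix from the affine case is its third row, which produces a per-point divisor. The sketch below applies a 3x3 matrix to one point in pure Python. The helper name `apply_homography` and the example matrix values are made up for illustration.

```python
def apply_homography(H, point):
    """Apply a 3x3 perspective matrix (nested lists) to an (x, y) point.

    The point is treated as (x, y, 1), multiplied by H, and the result
    is divided by its third component (the perspective divide).
    """
    (h11, h12, h13), (h21, h22, h23), (h31, h32, h33) = H
    x, y = point
    w = h31 * x + h32 * y + h33
    if w == 0:
        raise ValueError("point maps to infinity")
    return ((h11 * x + h12 * y + h13) / w,
            (h21 * x + h22 * y + h23) / w)


identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
# Illustrative matrix: a non-zero h31 makes the divisor grow with x.
H = [[1, 0, 0], [0, 1, 0], [0.001, 0, 1]]

unchanged = apply_homography(identity, (100, 50))
warped = apply_homography(H, (100, 50))  # divisor w = 1.1
```

When the third row is `[0, 0, 1]` the divisor is always 1 and the mapping reduces to an affine transformation. This is why parallel lines survive warpAffine() but can converge under warpPerspective().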
Perspective Transformation Using 4 Points¶
import cv2
import numpy as np
img = cv2.imread('tilted_document.jpg')
h, w = img.shape[:2]
# 4 points from source (document corners)
src_pts = np.float32([
[100, 50], # Top-left
[500, 80], # Top-right
[550, 400], # Bottom-right
[50, 380] # Bottom-left
])
# 4 points after transformation (front-facing rectangle)
dst_pts = np.float32([
[0, 0],
[500, 0],
[500, 400],
[0, 400]
])
# Calculate perspective transformation matrix
M = cv2.getPerspectiveTransform(src_pts, dst_pts)
# Apply transformation
result = cv2.warpPerspective(img, M, (500, 400))
# Mark points
img_with_pts = img.copy()
for i, pt in enumerate(src_pts.astype(int)):
    center = (int(pt[0]), int(pt[1]))  # plain Python ints for the drawing calls
    cv2.circle(img_with_pts, center, 10, (0, 0, 255), -1)
    cv2.putText(img_with_pts, str(i + 1), center,
                cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2)
cv2.imshow('Original with points', img_with_pts)
cv2.imshow('Warped', result)
cv2.waitKey(0)
cv2.destroyAllWindows()
Bird's Eye View¶
import cv2
import numpy as np
def get_birds_eye_view(img, src_pts, width, height):
    """
    Create bird's eye view using perspective transformation

    Parameters:
    - img: Input image
    - src_pts: 4 points from source (top-left, top-right, bottom-right, bottom-left)
    - width, height: Output image size
    """
    dst_pts = np.float32([
        [0, 0],
        [width, 0],
        [width, height],
        [0, height]
    ])
    M = cv2.getPerspectiveTransform(src_pts, dst_pts)
    warped = cv2.warpPerspective(img, M, (width, height))
    return warped, M
# Example for lane detection
img = cv2.imread('road.jpg')
h, w = img.shape[:2]
# 4 points of road area (trapezoid)
road_pts = np.float32([
[w * 0.4, h * 0.6], # Top-left
[w * 0.6, h * 0.6], # Top-right
[w * 0.9, h * 0.95], # Bottom-right
[w * 0.1, h * 0.95] # Bottom-left
])
birds_eye, M = get_birds_eye_view(img, road_pts, 400, 600)
5. Document Correction Example¶
Automated Document Scan Pipeline¶
1. Input image
2. Preprocessing (grayscale, blur, edge detection)
3. Contour detection (findContours)
4. Rectangle detection (approximate to 4 points with approxPolyDP)
5. Order corners (top-left, top-right, bottom-right, bottom-left)
6. Perspective transformation (warpPerspective)
7. Post-processing (binarization, sharpening)
Implementation Code¶
import cv2
import numpy as np
def order_points(pts):
    """Order 4 points: top-left, top-right, bottom-right, bottom-left"""
    rect = np.zeros((4, 2), dtype=np.float32)
    # Point with smallest sum (x + y) = top-left
    # Point with largest sum (x + y) = bottom-right
    s = pts.sum(axis=1)
    rect[0] = pts[np.argmin(s)]
    rect[2] = pts[np.argmax(s)]
    # Point with smallest difference (y - x) = top-right
    # Point with largest difference (y - x) = bottom-left
    d = np.diff(pts, axis=1)
    rect[1] = pts[np.argmin(d)]
    rect[3] = pts[np.argmax(d)]
    return rect

def four_point_transform(img, pts):
    """Perspective transformation using 4 points"""
    rect = order_points(pts)
    (tl, tr, br, bl) = rect
    # Calculate width of new image
    width_top = np.linalg.norm(tr - tl)
    width_bottom = np.linalg.norm(br - bl)
    max_width = int(max(width_top, width_bottom))
    # Calculate height of new image
    height_left = np.linalg.norm(bl - tl)
    height_right = np.linalg.norm(br - tr)
    max_height = int(max(height_left, height_right))
    # Destination points
    dst = np.float32([
        [0, 0],
        [max_width - 1, 0],
        [max_width - 1, max_height - 1],
        [0, max_height - 1]
    ])
    # Perspective transformation
    M = cv2.getPerspectiveTransform(rect, dst)
    warped = cv2.warpPerspective(img, M, (max_width, max_height))
    return warped

def find_document(img):
    """Automatically detect document region in image"""
    # Preprocessing
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)
    edged = cv2.Canny(blurred, 75, 200)
    # Contour detection: keep the 5 largest contours
    contours, _ = cv2.findContours(edged, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    contours = sorted(contours, key=cv2.contourArea, reverse=True)[:5]
    doc_contour = None
    for contour in contours:
        # Approximate contour
        peri = cv2.arcLength(contour, True)
        approx = cv2.approxPolyDP(contour, 0.02 * peri, True)
        # Consider as document if 4 points
        if len(approx) == 4:
            doc_contour = approx
            break
    return doc_contour

def scan_document(img):
    """Main document scan function.

    Returns (warped, warped_binary), or None if no document is found.
    """
    # Keep the original and the scale factor between it and the working copy
    orig = img.copy()
    ratio = img.shape[0] / 500.0
    # Resize to height 500 (improve processing speed)
    img = cv2.resize(img, (int(img.shape[1] / ratio), 500))
    # Detect document
    doc_contour = find_document(img)
    if doc_contour is None:
        print("Document not found.")
        return None
    # Adjust coordinates to original size
    doc_contour = doc_contour.reshape(4, 2) * ratio
    # Perspective transformation
    warped = four_point_transform(orig, doc_contour)
    # Post-processing (optional)
    # Grayscale + adaptive binarization
    warped_gray = cv2.cvtColor(warped, cv2.COLOR_BGR2GRAY)
    warped_binary = cv2.adaptiveThreshold(
        warped_gray, 255,
        cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
        cv2.THRESH_BINARY, 11, 10
    )
    return warped, warped_binary

# Usage example
img = cv2.imread('document_photo.jpg')
result = scan_document(img)
if result is not None:
    # Unpack only after checking for None; unpacking None raises TypeError
    result_color, result_binary = result
    cv2.imshow('Original', img)
    cv2.imshow('Scanned (Color)', result_color)
    cv2.imshow('Scanned (Binary)', result_binary)
    cv2.waitKey(0)
    cv2.destroyAllWindows()
Manual 4-Point Selection (Mouse Click)¶
import cv2
import numpy as np
# Requires four_point_transform() from the implementation code above
points = []

def click_event(event, x, y, flags, param):
    """Record up to 4 clicked points and draw them on the image in param"""
    global points
    if event == cv2.EVENT_LBUTTONDOWN:
        if len(points) < 4:
            points.append([x, y])
            cv2.circle(param, (x, y), 5, (0, 0, 255), -1)
            cv2.imshow('Select 4 corners', param)
            if len(points) == 4:
                print("4 points selected! Press 's' to transform.")

def manual_perspective_transform(img):
    """Select 4 points with mouse for perspective transformation"""
    global points
    points = []
    img_display = img.copy()
    cv2.imshow('Select 4 corners', img_display)
    cv2.setMouseCallback('Select 4 corners', click_event, img_display)
    print("Click 4 corners of document clockwise (starting from top-left)")
    while True:
        key = cv2.waitKey(1) & 0xFF
        if key == ord('s') and len(points) == 4:
            break
        elif key == ord('r'):  # Reset
            points = []
            img_display = img.copy()
            cv2.imshow('Select 4 corners', img_display)
            # Re-register the callback so it draws on the fresh copy
            cv2.setMouseCallback('Select 4 corners', click_event, img_display)
        elif key == 27:  # ESC
            cv2.destroyAllWindows()
            return None
    cv2.destroyAllWindows()
    pts = np.array(points, dtype=np.float32)
    result = four_point_transform(img, pts)
    return result

# Usage example
img = cv2.imread('document.jpg')
result = manual_perspective_transform(img)
if result is not None:
    cv2.imshow('Result', result)
    cv2.waitKey(0)
    cv2.destroyAllWindows()
6. Practice Problems¶
Exercise 1: Batch Resize¶
Write a script that resizes all images in a folder to 800px width (maintaining aspect ratio) and saves them as JPEG with 90% quality.
# Hint
import os
import glob
def batch_resize(input_folder, output_folder, max_width=800):
    # Use os.listdir or glob.glob
    pass
Exercise 2: Image Rotation Animation¶
Write a program that shows an animation rotating an image from 0 to 360 degrees in 5-degree increments. Expand the canvas so the image doesn't get cropped.
Exercise 3: ID Card Scanner¶
Implement an ID card scanner with the following features:
1. Select 4 points with mouse
2. Generate front view with perspective transformation
3. Output in standard ID card size (85.6mm x 54mm) ratio
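As a hint for the third requirement, the output size in pixels can be derived from the 85.6mm x 54mm ratio once a target width is chosen. This is a sketch only: the helper name `id_card_output_size` and the 856-pixel width are arbitrary choices for illustration.

```python
# Standard ID card dimensions in millimetres (from the exercise text).
ID_CARD_MM = (85.6, 54.0)


def id_card_output_size(width_px):
    """Return (width, height) in pixels with the ID card aspect ratio."""
    w_mm, h_mm = ID_CARD_MM
    height_px = round(width_px * h_mm / w_mm)
    return width_px, height_px


out_size = id_card_output_size(856)  # 10 pixels per millimetre
```

The resulting tuple is in (width, height) order, so it can be used directly as the output size for cv2.warpPerspective() and to build the four destination corners.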
Exercise 4: Image Mosaic¶
Write a function that arranges multiple images in an N x M grid. Each image should be resized to the same size.
def create_mosaic(images, rows, cols, cell_size=(200, 200)):
    """Arrange images in rows x cols grid"""
    pass
Exercise 5: AR Card Effect¶
Implement a simple AR effect that detects a rectangular card in an image and overlays another image on top of it.
# Hint: Use reverse perspective transformation
# 1. Detect card region
# 2. Transform overlay image to fit card region
# 3. Composite with original
7. Next Steps¶
In 05_Image_Filtering.md, you'll learn image filtering techniques including blur, sharpening, and custom filters!
Next topics:
- Kernel and convolution concepts
- Blur filters (average, Gaussian, median, bilateral)
- Sharpening filters
- Custom filters (filter2D)
8. References¶
Related Learning Materials¶
| File | Related Content |
|---|---|
| 03_Color_Spaces.md | Color conversion, edge detection preprocessing |
| 09_Contours.md | Used for document region detection |