Geometric Transformations

Geometric Transformations

Overview

Geometric transformations are operations that change the spatial position of an image. This includes resizing, rotation, translation, flipping, and perspective transformations. In this document, we'll learn about OpenCV's geometric transformation functions and practical examples.

Difficulty: ⭐⭐ (Beginner-Intermediate)

Learning Objectives: - Understand cv2.resize() and interpolation methods - Use rotation and flipping functions - Apply affine transformation (warpAffine) - Apply perspective transformation (warpPerspective) - Implement document scanning/correction examples


Table of Contents

  1. Image Resizing - resize()
  2. Flipping and Rotation - flip(), rotate()
  3. Affine Transformation - warpAffine()
  4. Perspective Transformation - warpPerspective()
  5. Document Correction Example
  6. Practice Problems
  7. Next Steps
  8. References

1. Image Resizing - resize()

Basic Usage

import cv2

img = cv2.imread('image.jpg')
h, w = img.shape[:2]

# Method 1: Specify size directly (width, height order!)
resized = cv2.resize(img, (640, 480))

# Method 2: Specify by ratio
resized = cv2.resize(img, None, fx=0.5, fy=0.5)  # Reduce to 50%

# Method 3: Maintain aspect ratio based on one dimension
new_width = 800
ratio = new_width / w
new_height = int(h * ratio)
resized = cv2.resize(img, (new_width, new_height))

Interpolation Methods

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                       Interpolation Comparison                  β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                 β”‚
β”‚   Method                   Features              Use Cases      β”‚
β”‚   ───────────────────────────────────────────────────────────   β”‚
β”‚   INTER_NEAREST           Nearest neighbor      Fast, low qual. β”‚
β”‚   (Nearest interpolation) Blocky artifacts      Real-time proc. β”‚
β”‚                                                                β”‚
β”‚   INTER_LINEAR            Linear (default)      Balanced choice β”‚
β”‚   (Bilinear interpolation) Smooth results      General resizing β”‚
β”‚                                                                β”‚
β”‚   INTER_AREA              Area interpolation    Best for shrink β”‚
β”‚   (Area-based)             Prevents moirΓ©       Downsampling    β”‚
β”‚                                                                β”‚
β”‚   INTER_CUBIC             Cubic interpolation   Good for enlarg.β”‚
β”‚   (Bicubic)                Smooth and sharp     Quality focus   β”‚
β”‚                                                                β”‚
β”‚   INTER_LANCZOS4          Lanczos interpolation Best quality    β”‚
β”‚   (8x8 neighbors)          Sharpest            Slow speed      β”‚
β”‚                                                                β”‚
β”‚   Recommendations:                                              β”‚
β”‚   - Shrinking: INTER_AREA                                       β”‚
β”‚   - Enlarging: INTER_CUBIC or INTER_LANCZOS4                    β”‚
β”‚   - Real-time: INTER_LINEAR or INTER_NEAREST                    β”‚
β”‚                                                                β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Interpolation Comparison Example

import cv2
import matplotlib.pyplot as plt

img = cv2.imread('image.jpg')
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

# Shrink first, then enlarge to compare differences
small = cv2.resize(img, None, fx=0.1, fy=0.1)  # Reduce to 10%

interpolations = [
    ('NEAREST', cv2.INTER_NEAREST),
    ('LINEAR', cv2.INTER_LINEAR),
    ('AREA', cv2.INTER_AREA),
    ('CUBIC', cv2.INTER_CUBIC),
    ('LANCZOS4', cv2.INTER_LANCZOS4),
]

fig, axes = plt.subplots(2, 3, figsize=(15, 10))
axes = axes.flatten()

axes[0].imshow(img)
axes[0].set_title('Original')

for ax, (name, interp) in zip(axes[1:], interpolations):
    enlarged = cv2.resize(small, img.shape[:2][::-1], interpolation=interp)
    ax.imshow(enlarged)
    ax.set_title(f'{name}')

for ax in axes:
    ax.axis('off')

plt.tight_layout()
plt.show()

Aspect Ratio-Preserving Resize Functions

import cv2

def resize_with_aspect_ratio(img, width=None, height=None, inter=cv2.INTER_AREA):
    """Resize while maintaining aspect ratio"""
    h, w = img.shape[:2]

    if width is None and height is None:
        return img

    if width is None:
        ratio = height / h
        new_size = (int(w * ratio), height)
    else:
        ratio = width / w
        new_size = (width, int(h * ratio))

    return cv2.resize(img, new_size, interpolation=inter)


def resize_to_fit(img, max_width, max_height, inter=cv2.INTER_AREA):
    """Fit within maximum size while maintaining aspect ratio"""
    h, w = img.shape[:2]

    ratio_w = max_width / w
    ratio_h = max_height / h
    ratio = min(ratio_w, ratio_h)

    if ratio >= 1:  # Already small enough
        return img

    new_size = (int(w * ratio), int(h * ratio))
    return cv2.resize(img, new_size, interpolation=inter)


# Usage example
img = cv2.imread('large_image.jpg')
img_fit = resize_to_fit(img, 800, 600)
img_width = resize_with_aspect_ratio(img, width=640)

2. Flipping and Rotation - flip(), rotate()

cv2.flip()

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                        flip() Operation                         β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                 β”‚
β”‚   flipCode = 1 (horizontal)   flipCode = 0 (vertical)  flipCode = -1β”‚
β”‚                                                                 β”‚
β”‚   Original  Result            Original  Result         Original Resultβ”‚
β”‚   β”Œβ”€β”€β”€β”   β”Œβ”€β”€β”€β”           β”Œβ”€β”€β”€β”   β”Œβ”€β”€β”€β”         β”Œβ”€β”€β”€β”  β”Œβ”€β”€β”€β”  β”‚
β”‚   β”‚1 2β”‚   β”‚2 1β”‚           β”‚1 2β”‚   β”‚3 4β”‚         β”‚1 2β”‚  β”‚4 3β”‚  β”‚
β”‚   β”‚3 4β”‚   β”‚4 3β”‚           β”‚3 4β”‚   β”‚1 2β”‚         β”‚3 4β”‚  β”‚2 1β”‚  β”‚
β”‚   β””β”€β”€β”€β”˜   β””β”€β”€β”€β”˜           β””β”€β”€β”€β”˜   β””β”€β”€β”€β”˜         β””β”€β”€β”€β”˜  β””β”€β”€β”€β”˜  β”‚
β”‚                                                                 β”‚
β”‚   Left-right flip         Top-bottom flip         Both flips   β”‚
β”‚   (Mirror effect)         (Water reflection)      (180Β° rotation)β”‚
β”‚                                                                 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
import cv2

img = cv2.imread('image.jpg')

# Horizontal flip (left-right)
flipped_h = cv2.flip(img, 1)

# Vertical flip (top-bottom)
flipped_v = cv2.flip(img, 0)

# Both directions (equivalent to 180Β° rotation)
flipped_both = cv2.flip(img, -1)

# Also possible with NumPy
import numpy as np
flipped_h_np = img[:, ::-1]      # Horizontal
flipped_v_np = img[::-1, :]      # Vertical
flipped_both_np = img[::-1, ::-1]  # Both

cv2.rotate()

import cv2

img = cv2.imread('image.jpg')

# 90 degrees clockwise
rotated_90_cw = cv2.rotate(img, cv2.ROTATE_90_CLOCKWISE)

# 90 degrees counter-clockwise
rotated_90_ccw = cv2.rotate(img, cv2.ROTATE_90_COUNTERCLOCKWISE)

# 180 degrees
rotated_180 = cv2.rotate(img, cv2.ROTATE_180)

# Check image size changes
print(f"Original: {img.shape}")           # (H, W, C)
print(f"90Β°: {rotated_90_cw.shape}") # (W, H, C) - swapped
print(f"180Β°: {rotated_180.shape}")  # (H, W, C) - same

Arbitrary Angle Rotation

import cv2

def rotate_image(img, angle, center=None, scale=1.0):
    """Rotate image by arbitrary angle"""
    h, w = img.shape[:2]

    if center is None:
        center = (w // 2, h // 2)

    # Create rotation matrix
    M = cv2.getRotationMatrix2D(center, angle, scale)

    # Apply rotation
    rotated = cv2.warpAffine(img, M, (w, h))

    return rotated


def rotate_image_full(img, angle):
    """Rotate image without cropping (expand canvas)"""
    h, w = img.shape[:2]
    center = (w // 2, h // 2)

    # Rotation matrix
    M = cv2.getRotationMatrix2D(center, angle, 1.0)

    # Calculate new bounds after rotation
    cos = abs(M[0, 0])
    sin = abs(M[0, 1])
    new_w = int(h * sin + w * cos)
    new_h = int(h * cos + w * sin)

    # Adjust translation
    M[0, 2] += (new_w - w) / 2
    M[1, 2] += (new_h - h) / 2

    rotated = cv2.warpAffine(img, M, (new_w, new_h))

    return rotated


# Usage examples
img = cv2.imread('image.jpg')
rotated_30 = rotate_image(img, 30)       # 30Β° rotation (partially cropped)
rotated_45_full = rotate_image_full(img, 45)  # 45Β° rotation (fully preserved)

3. Affine Transformation - warpAffine()

What is Affine Transformation?

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                      Affine Transformation                      β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                 β”‚
β”‚   Affine transformation preserves lines as lines and parallel   β”‚
β”‚   lines as parallel lines                                       β”‚
β”‚                                                                 β”‚
β”‚   Included transformations:                                     β”‚
β”‚   - Translation                                                 β”‚
β”‚   - Rotation                                                    β”‚
β”‚   - Scale                                                       β”‚
β”‚   - Shear                                                       β”‚
β”‚                                                                 β”‚
β”‚   Transformation matrix (2x3):                                  β”‚
β”‚   β”Œ         ┐   β”Œ                    ┐                         β”‚
β”‚   β”‚ a  b  txβ”‚   β”‚ scale*cos  -sin  txβ”‚                         β”‚
β”‚   β”‚ c  d  tyβ”‚ = β”‚ sin   scale*cos  tyβ”‚                         β”‚
β”‚   β””         β”˜   β””                    β”˜                         β”‚
β”‚                                                                 β”‚
β”‚   [x']   [a b tx]   [x]                                         β”‚
β”‚   [y'] = [c d ty] Γ— [y]                                         β”‚
β”‚                     [1]                                         β”‚
β”‚                                                                 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Translation

import cv2
import numpy as np

img = cv2.imread('image.jpg')
h, w = img.shape[:2]

# Translation matrix: move 100 in x, 50 in y
tx, ty = 100, 50
M = np.float32([
    [1, 0, tx],
    [0, 1, ty]
])

translated = cv2.warpAffine(img, M, (w, h))

cv2.imshow('Original', img)
cv2.imshow('Translated', translated)
cv2.waitKey(0)
cv2.destroyAllWindows()

Rotation + Scale

import cv2

img = cv2.imread('image.jpg')
h, w = img.shape[:2]

# getRotationMatrix2D(center, angle, scale)
center = (w // 2, h // 2)
angle = 45  # 45 degrees counter-clockwise
scale = 0.7  # 70% size

M = cv2.getRotationMatrix2D(center, angle, scale)

rotated = cv2.warpAffine(img, M, (w, h))

Shear Transformation

import cv2
import numpy as np

img = cv2.imread('image.jpg')
h, w = img.shape[:2]

# Horizontal shear
shear_x = 0.3
M_shear_x = np.float32([
    [1, shear_x, 0],
    [0, 1, 0]
])
sheared_x = cv2.warpAffine(img, M_shear_x, (int(w + h * shear_x), h))

# Vertical shear
shear_y = 0.3
M_shear_y = np.float32([
    [1, 0, 0],
    [shear_y, 1, 0]
])
sheared_y = cv2.warpAffine(img, M_shear_y, (w, int(h + w * shear_y)))

Affine Transformation Using 3 Points

import cv2
import numpy as np

img = cv2.imread('image.jpg')
h, w = img.shape[:2]

# 3 points from source
src_pts = np.float32([
    [0, 0],      # Top-left
    [w-1, 0],    # Top-right
    [0, h-1]     # Bottom-left
])

# 3 points after transformation
dst_pts = np.float32([
    [50, 50],    # Top-left
    [w-50, 30],  # Top-right
    [30, h-50]   # Bottom-left
])

# Calculate affine transformation matrix
M = cv2.getAffineTransform(src_pts, dst_pts)

# Apply transformation
result = cv2.warpAffine(img, M, (w, h))

# Mark points
for pt in src_pts.astype(int):
    cv2.circle(img, tuple(pt), 5, (0, 0, 255), -1)

for pt in dst_pts.astype(int):
    cv2.circle(result, tuple(pt), 5, (0, 255, 0), -1)

4. Perspective Transformation - warpPerspective()

What is Perspective Transformation?

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                       Perspective Transformation                β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                 β”‚
β”‚   Perspective transformation converts trapezoid to rectangle    β”‚
β”‚   (or vice versa). Transforms images captured in 3D space as if β”‚
β”‚   viewed from the front                                         β”‚
β”‚                                                                 β”‚
β”‚   Practical applications:                                       β”‚
β”‚   - Document scanning (tilted document β†’ front view)            β”‚
β”‚   - Lane detection (Bird's eye view)                            β”‚
β”‚   - QR code recognition                                         β”‚
β”‚   - Image rectification                                         β”‚
β”‚                                                                 β”‚
β”‚   Transformation matrix (3x3):                                  β”‚
β”‚   β”Œ             ┐                                               β”‚
β”‚   β”‚ h11 h12 h13 β”‚                                               β”‚
β”‚   β”‚ h21 h22 h23 β”‚                                               β”‚
β”‚   β”‚ h31 h32 h33 β”‚                                               β”‚
β”‚   β””             β”˜                                               β”‚
β”‚                                                                 β”‚
β”‚   Source (trapezoid)           Result (rectangle)               β”‚
β”‚   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”         β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                   β”‚
β”‚   β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚         β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚                   β”‚
β”‚   β”‚ β”‚         β”‚ β”‚   ───▢  β”‚ β”‚             β”‚ β”‚                   β”‚
β”‚   β”‚ β”‚ Documentβ”‚ β”‚         β”‚ β”‚   Document  β”‚ β”‚                   β”‚
β”‚   β”‚ β”‚         β”‚ β”‚         β”‚ β”‚             β”‚ β”‚                   β”‚
β”‚   β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜β”‚         β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚                   β”‚
β”‚   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜         β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                   β”‚
β”‚                                                                 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Perspective Transformation Using 4 Points

import cv2
import numpy as np

img = cv2.imread('tilted_document.jpg')
h, w = img.shape[:2]

# 4 points from source (document corners)
src_pts = np.float32([
    [100, 50],    # Top-left
    [500, 80],    # Top-right
    [550, 400],   # Bottom-right
    [50, 380]     # Bottom-left
])

# 4 points after transformation (front-facing rectangle)
dst_pts = np.float32([
    [0, 0],
    [500, 0],
    [500, 400],
    [0, 400]
])

# Calculate perspective transformation matrix
M = cv2.getPerspectiveTransform(src_pts, dst_pts)

# Apply transformation
result = cv2.warpPerspective(img, M, (500, 400))

# Mark points
img_with_pts = img.copy()
for i, pt in enumerate(src_pts.astype(int)):
    cv2.circle(img_with_pts, tuple(pt), 10, (0, 0, 255), -1)
    cv2.putText(img_with_pts, str(i+1), tuple(pt),
                cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2)

cv2.imshow('Original with points', img_with_pts)
cv2.imshow('Warped', result)
cv2.waitKey(0)
cv2.destroyAllWindows()

Bird's Eye View

import cv2
import numpy as np

def get_birds_eye_view(img, src_pts, width, height):
    """
    Create bird's eye view using perspective transformation

    Parameters:
    - img: Input image
    - src_pts: 4 points from source (top-left, top-right, bottom-right, bottom-left)
    - width, height: Output image size
    """
    dst_pts = np.float32([
        [0, 0],
        [width, 0],
        [width, height],
        [0, height]
    ])

    M = cv2.getPerspectiveTransform(src_pts, dst_pts)
    warped = cv2.warpPerspective(img, M, (width, height))

    return warped, M


# Example for lane detection
img = cv2.imread('road.jpg')
h, w = img.shape[:2]

# 4 points of road area (trapezoid)
road_pts = np.float32([
    [w * 0.4, h * 0.6],   # Top-left
    [w * 0.6, h * 0.6],   # Top-right
    [w * 0.9, h * 0.95],  # Bottom-right
    [w * 0.1, h * 0.95]   # Bottom-left
])

birds_eye, M = get_birds_eye_view(img, road_pts, 400, 600)

5. Document Correction Example

Automated Document Scan Pipeline

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                   Document Scan Pipeline                        β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                 β”‚
β”‚   Input image                                                   β”‚
β”‚       β”‚                                                         β”‚
β”‚       β–Ό                                                         β”‚
β”‚   Preprocessing (grayscale, blur, edge)                         β”‚
β”‚       β”‚                                                         β”‚
β”‚       β–Ό                                                         β”‚
β”‚   Contour detection (findContours)                              β”‚
β”‚       β”‚                                                         β”‚
β”‚       β–Ό                                                         β”‚
β”‚   Rectangle detection (approximate to 4 points with approxPolyDP)β”‚
β”‚       β”‚                                                         β”‚
β”‚       β–Ό                                                         β”‚
β”‚   Order corners (top-left, top-right, bottom-right, bottom-left)β”‚
β”‚       β”‚                                                         β”‚
β”‚       β–Ό                                                         β”‚
β”‚   Perspective transformation (warpPerspective)                  β”‚
β”‚       β”‚                                                         β”‚
β”‚       β–Ό                                                         β”‚
β”‚   Post-processing (binarization, sharpening)                    β”‚
β”‚                                                                 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Implementation Code

import cv2
import numpy as np

def order_points(pts):
    """Order 4 points: top-left, top-right, bottom-right, bottom-left"""
    rect = np.zeros((4, 2), dtype=np.float32)

    # Point with smallest sum = top-left
    # Point with largest sum = bottom-right
    s = pts.sum(axis=1)
    rect[0] = pts[np.argmin(s)]
    rect[2] = pts[np.argmax(s)]

    # Point with smallest difference = top-right
    # Point with largest difference = bottom-left
    d = np.diff(pts, axis=1)
    rect[1] = pts[np.argmin(d)]
    rect[3] = pts[np.argmax(d)]

    return rect


def four_point_transform(img, pts):
    """Perspective transformation using 4 points"""
    rect = order_points(pts)
    (tl, tr, br, bl) = rect

    # Calculate width of new image
    width_top = np.linalg.norm(tr - tl)
    width_bottom = np.linalg.norm(br - bl)
    max_width = int(max(width_top, width_bottom))

    # Calculate height of new image
    height_left = np.linalg.norm(bl - tl)
    height_right = np.linalg.norm(br - tr)
    max_height = int(max(height_left, height_right))

    # Destination points
    dst = np.float32([
        [0, 0],
        [max_width - 1, 0],
        [max_width - 1, max_height - 1],
        [0, max_height - 1]
    ])

    # Perspective transformation
    M = cv2.getPerspectiveTransform(rect, dst)
    warped = cv2.warpPerspective(img, M, (max_width, max_height))

    return warped


def find_document(img):
    """Automatically detect document region in image"""
    # Preprocessing
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)
    edged = cv2.Canny(blurred, 75, 200)

    # Contour detection
    contours, _ = cv2.findContours(edged, cv2.RETR_EXTERNAL,
                                    cv2.CHAIN_APPROX_SIMPLE)
    contours = sorted(contours, key=cv2.contourArea, reverse=True)[:5]

    doc_contour = None
    for contour in contours:
        # Approximate contour
        peri = cv2.arcLength(contour, True)
        approx = cv2.approxPolyDP(contour, 0.02 * peri, True)

        # Consider as document if 4 points
        if len(approx) == 4:
            doc_contour = approx
            break

    return doc_contour


def scan_document(img):
    """Main document scan function"""
    # Save original size
    orig = img.copy()
    ratio = img.shape[0] / 500.0

    # Resize (improve processing speed)
    img = cv2.resize(img, (int(img.shape[1] / ratio), 500))

    # Detect document
    doc_contour = find_document(img)

    if doc_contour is None:
        print("Document not found.")
        return None

    # Adjust coordinates to original size
    doc_contour = doc_contour.reshape(4, 2) * ratio

    # Perspective transformation
    warped = four_point_transform(orig, doc_contour)

    # Post-processing (optional)
    # Grayscale + adaptive binarization
    warped_gray = cv2.cvtColor(warped, cv2.COLOR_BGR2GRAY)
    warped_binary = cv2.adaptiveThreshold(
        warped_gray, 255,
        cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
        cv2.THRESH_BINARY, 11, 10
    )

    return warped, warped_binary


# Usage example
img = cv2.imread('document_photo.jpg')
result_color, result_binary = scan_document(img)

if result_color is not None:
    cv2.imshow('Original', img)
    cv2.imshow('Scanned (Color)', result_color)
    cv2.imshow('Scanned (Binary)', result_binary)
    cv2.waitKey(0)
    cv2.destroyAllWindows()

Manual 4-Point Selection (Mouse Click)

import cv2
import numpy as np

points = []

def click_event(event, x, y, flags, param):
    global points
    if event == cv2.EVENT_LBUTTONDOWN:
        if len(points) < 4:
            points.append([x, y])
            cv2.circle(param, (x, y), 5, (0, 0, 255), -1)
            cv2.imshow('Select 4 corners', param)

            if len(points) == 4:
                print("4 points selected! Press 's' to transform.")


def manual_perspective_transform(img):
    """Select 4 points with mouse for perspective transformation"""
    global points
    points = []

    img_display = img.copy()
    cv2.imshow('Select 4 corners', img_display)
    cv2.setMouseCallback('Select 4 corners', click_event, img_display)

    print("Click 4 corners of document clockwise (starting from top-left)")

    while True:
        key = cv2.waitKey(1) & 0xFF
        if key == ord('s') and len(points) == 4:
            break
        elif key == ord('r'):  # Reset
            points = []
            img_display = img.copy()
            cv2.imshow('Select 4 corners', img_display)
        elif key == 27:  # ESC
            cv2.destroyAllWindows()
            return None

    cv2.destroyAllWindows()

    pts = np.array(points, dtype=np.float32)
    result = four_point_transform(img, pts)

    return result


# Usage example
img = cv2.imread('document.jpg')
result = manual_perspective_transform(img)

if result is not None:
    cv2.imshow('Result', result)
    cv2.waitKey(0)
    cv2.destroyAllWindows()

6. Practice Problems

Exercise 1: Batch Resize

Write a script that resizes all images in a folder to 800px width (maintaining aspect ratio) and saves them as JPEG with 90% quality.

# Hint
import os
import glob

def batch_resize(input_folder, output_folder, max_width=800):
    # Use os.listdir or glob.glob
    pass

Exercise 2: Image Rotation Animation

Write a program that shows an animation rotating an image from 0 to 360 degrees in 5-degree increments. Expand the canvas so the image doesn't get cropped.

Exercise 3: ID Card Scanner

Implement an ID card scanner with the following features: 1. Select 4 points with mouse 2. Generate front view with perspective transformation 3. Output in standard ID card size (85.6mm x 54mm) ratio

Exercise 4: Image Mosaic

Write a function that arranges multiple images in an N x M grid. Each image should be resized to the same size.

def create_mosaic(images, rows, cols, cell_size=(200, 200)):
    """Arrange images in rows x cols grid"""
    pass

Exercise 5: AR Card Effect

Implement a simple AR effect that detects a rectangular card in an image and overlays another image on top of it.

# Hint: Use reverse perspective transformation
# 1. Detect card region
# 2. Transform overlay image to fit card region
# 3. Composite with original

7. Next Steps

In 05_Image_Filtering.md, you'll learn image filtering techniques including blur, sharpening, and custom filters!

Next topics: - Kernel and convolution concepts - Blur filters (average, Gaussian, median, bilateral) - Sharpening filters - Custom filters (filter2D)


8. References

Official Documentation

Folder Related Content
03_Color_Spaces.md Color conversion, edge detection preprocessing
09_Contours.md Used for document region detection

Additional References

to navigate between lessons