Image Basic Operations
Overview
The foundation of image processing is reading, displaying, and saving image files. This document covers OpenCV's basic I/O functions, pixel-level access, and Region of Interest (ROI) setup.
Difficulty: ⭐ (Beginner)
Learning Objectives:
- Master cv2.imread(), cv2.imshow(), cv2.imwrite() functions
- Understand and utilize IMREAD flags
- Understand image coordinate system (y, x order)
- Access and modify pixels
- Set ROI (Region of Interest) and copy images
Table of Contents
- Reading Images - imread()
- Displaying Images - imshow()
- Saving Images - imwrite()
- Checking Image Properties
- Coordinate System and Pixel Access
- ROI and Image Copying
- Practice Problems
- Next Steps
- References
1. Reading Images - imread()
Basic Usage
import cv2
# Basic usage (read as color)
img = cv2.imread('image.jpg')
# imread() returns None instead of raising when reading fails, so always check
if img is None:
    print("Error: Cannot read image.")
else:
    print(f"Image loaded successfully: {img.shape}")
IMREAD Flags
IMREAD Flag Comparison

  Original Image (PNG with alpha channel)
  ┌──────────────────┬──────────────────┬──────────────────┐
  │  R   G   B   A   │  R   G   B   A   │  R   G   B   A   │
  │ 255 100  50 200  │ 255 100  50 200  │ 255 100  50 200  │
  └──────────────────┴──────────────────┴──────────────────┘
                              │
          ┌───────────────────┼───────────────────┐
          ▼                   ▼                   ▼
    IMREAD_COLOR       IMREAD_GRAYSCALE     IMREAD_UNCHANGED
  ┌──────────────┐     ┌──────────────┐   ┌──────────────────┐
  │  B   G   R   │     │     Gray     │   │  B   G   R   A   │
  │  50 100 255  │     │     141      │   │  50 100 255 200  │
  └──────────────┘     └──────────────┘   └──────────────────┘
  shape: (H,W,3)       shape: (H,W)       shape: (H,W,4)
  3-channel BGR        2D, single value   Alpha channel preserved
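A grayscale pixel is a weighted sum of the three channels, not a plain average. The sketch below assumes the luma weights documented for cv2.cvtColor (0.299 R, 0.587 G, 0.114 B); the value returned by imread() with IMREAD_GRAYSCALE can differ slightly because some decoders do their own conversion. The helper name bgr_to_gray_value is hypothetical.

```python
def bgr_to_gray_value(b, g, r):
    """Weighted sum for BGR -> grayscale (weights assumed from the cv2.cvtColor docs)."""
    return round(0.299 * r + 0.587 * g + 0.114 * b)

# Sample pixel used in the flag comparison: B=50, G=100, R=255
gray = bgr_to_gray_value(50, 100, 255)
print(gray)  # 141
```

Red contributes far less than green here, which is why a strongly red pixel still ends up mid-gray.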
Flag Details
import cv2
# 1. IMREAD_COLOR (default, 1)
# - Read as color (ignore alpha channel)
# - Always converts to 3-channel BGR
img_color = cv2.imread('image.png', cv2.IMREAD_COLOR)
img_color = cv2.imread('image.png', 1) # Same
img_color = cv2.imread('image.png') # Can omit (default)
# 2. IMREAD_GRAYSCALE (0)
# - Read as grayscale
# - Returns 2D array
img_gray = cv2.imread('image.png', cv2.IMREAD_GRAYSCALE)
img_gray = cv2.imread('image.png', 0) # Same
# 3. IMREAD_UNCHANGED (-1)
# - Read as original (including alpha channel)
# - Use when PNG transparency information is needed
img_unchanged = cv2.imread('image.png', cv2.IMREAD_UNCHANGED)
img_unchanged = cv2.imread('image.png', -1) # Same
# Compare results
print(f"COLOR: {img_color.shape}") # (H, W, 3)
print(f"GRAYSCALE: {img_gray.shape}") # (H, W)
print(f"UNCHANGED: {img_unchanged.shape}") # (H, W, 4) - for PNG
Additional Flags
import cv2
# IMREAD_ANYDEPTH: Load 16-bit/32-bit images as is (otherwise converted to 8-bit)
img_depth = cv2.imread('depth_map.png', cv2.IMREAD_ANYDEPTH)
# IMREAD_ANYCOLOR: Maintain possible color formats
img_any = cv2.imread('image.jpg', cv2.IMREAD_ANYCOLOR)
# Combining flags
# Keep the original bit depth (e.g. 16-bit) and the file's color format
img_combined = cv2.imread('image.tiff',
                          cv2.IMREAD_ANYDEPTH | cv2.IMREAD_ANYCOLOR)
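Images loaded with IMREAD_ANYDEPTH keep their original dtype (for example uint16), while many later steps assume uint8. A minimal sketch of scaling 16-bit data down to 8-bit, using a synthetic array in place of a real file. The helper to_uint8 is hypothetical, and the divide-by-257 mapping is one simple choice among several.

```python
import numpy as np

def to_uint8(img16):
    """Map the full uint16 range 0..65535 onto 0..255 (65535 // 257 == 255)."""
    return (img16 // 257).astype(np.uint8)

# Synthetic 16-bit "depth map" standing in for cv2.imread(..., cv2.IMREAD_ANYDEPTH)
depth16 = np.array([[0, 257, 32768, 65535]], dtype=np.uint16)
depth8 = to_uint8(depth16)
print(depth8.dtype, depth8.tolist())  # uint8 [[0, 1, 127, 255]]
```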
Various Image Formats
import cv2
# Supported major formats
formats = [
'image.jpg', # JPEG
'image.png', # PNG (alpha channel supported)
'image.bmp', # BMP
'image.tiff', # TIFF
'image.webp', # WebP
'image.ppm', # PPM/PGM/PBM
]
# Read by format
for filepath in formats:
    img = cv2.imread(filepath)
    if img is not None:
        print(f"{filepath}: {img.shape}")
2. Displaying Images - imshow()
Basic Usage
import cv2
img = cv2.imread('image.jpg')
# Display image in window
cv2.imshow('Window Name', img)
# Wait for key press
key = cv2.waitKey(0) # 0 = wait indefinitely
# Close all windows
cv2.destroyAllWindows()
waitKey() Details
waitKey() Behavior

  cv2.waitKey(delay)

  delay = 0  → Wait indefinitely until a key is pressed
  delay > 0  → Wait up to delay milliseconds, then proceed
  delay = 1  → Minimum wait (often used for video playback)

  Return value: code of the pressed key (-1 = timeout)

  Example (inside a display loop):
    key = cv2.waitKey(0)
    if key == 27:              # ESC key
        break
    elif key == ord('q'):      # 'q' key
        break
    elif key == ord('s'):      # 's' key
        cv2.imwrite('saved.jpg', img)
Managing Multiple Windows
import cv2
img1 = cv2.imread('image1.jpg')
img2 = cv2.imread('image2.jpg')
# Display multiple windows
cv2.imshow('Image 1', img1)
cv2.imshow('Image 2', img2)
# Set window position
cv2.namedWindow('Positioned', cv2.WINDOW_NORMAL)
cv2.moveWindow('Positioned', 100, 100) # x=100, y=100 position
cv2.imshow('Positioned', img1)
# Make window resizable
cv2.namedWindow('Resizable', cv2.WINDOW_NORMAL)
cv2.resizeWindow('Resizable', 800, 600)
cv2.imshow('Resizable', img1)
cv2.waitKey(0)
# Close specific window
cv2.destroyWindow('Image 1')
# Close all windows
cv2.destroyAllWindows()
Key Input Handling Pattern
import cv2
img = cv2.imread('image.jpg')
original = img.copy()
while True:
    cv2.imshow('Interactive', img)
    key = cv2.waitKey(1) & 0xFF # Use only lower 8 bits
    if key == 27: # ESC
        break
    elif key == ord('r'): # 'r' - restore original
        img = original.copy()
        print("Restored to original")
    elif key == ord('g'): # 'g' - grayscale
        img = cv2.cvtColor(original, cv2.COLOR_BGR2GRAY)
        img = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)
        print("Applied grayscale")
    elif key == ord('s'): # 's' - save
        cv2.imwrite('output.jpg', img)
        print("Saved")
cv2.destroyAllWindows()
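The `& 0xFF` mask in the loop above keeps only the low byte of the value returned by waitKey(), since some platforms set higher bits. A small sketch of that decoding step without opening a window. decode_key and the action names are hypothetical, and the raw value 0x10001B is only an illustrative example of a code with extra high bits.

```python
def decode_key(raw):
    """Translate a raw cv2.waitKey() return value into an action name."""
    if raw == -1:            # no key pressed before the timeout
        return None
    key = raw & 0xFF         # keep only the low 8 bits
    if key == 27:
        return 'quit'        # ESC
    actions = {ord('r'): 'restore', ord('g'): 'grayscale', ord('s'): 'save'}
    return actions.get(key, 'ignored')

print(decode_key(-1))          # None
print(decode_key(ord('s')))    # save
print(decode_key(0x10001B))    # quit
print(-1 & 0xFF)               # 255: a masked timeout no longer equals -1
```

The last line shows why the timeout check should happen before masking if the code needs to tell "no key" apart from a real key.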
Displaying Images in Jupyter Notebook
import cv2
import matplotlib.pyplot as plt
img = cv2.imread('image.jpg')
# Using matplotlib (needs BGR → RGB conversion)
img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
plt.figure(figsize=(10, 6))
plt.imshow(img_rgb)
plt.title('Image Display in Jupyter')
plt.axis('off')
plt.show()
# Display multiple images simultaneously
fig, axes = plt.subplots(1, 3, figsize=(15, 5))
axes[0].imshow(img_rgb)
axes[0].set_title('Original')
axes[0].axis('off')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
axes[1].imshow(gray, cmap='gray')
axes[1].set_title('Grayscale')
axes[1].axis('off')
# Split B, G, R channels
b, g, r = cv2.split(img)
axes[2].imshow(r, cmap='gray')
axes[2].set_title('Red Channel')
axes[2].axis('off')
plt.tight_layout()
plt.show()
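For 3-channel images, the BGR to RGB conversion is a reversal of the channel axis, so NumPy slicing can do the same job as cv2.cvtColor. A sketch on a synthetic 1x2 image (no file needed). The slice itself is a view with a negative stride, so `.copy()` is added here on the assumption that a contiguous array is wanted downstream.

```python
import numpy as np

# Synthetic BGR image: one pure-blue pixel and one pure-red pixel
bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

# Reverse the last (channel) axis: B,G,R -> R,G,B
rgb = bgr[..., ::-1].copy()

print(rgb[0, 0].tolist())  # [0, 0, 255]  blue pixel in RGB order
print(rgb[0, 1].tolist())  # [255, 0, 0]  red pixel in RGB order
```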
3. Saving Images - imwrite()
Basic Usage
import cv2
img = cv2.imread('input.jpg')
# Basic save
success = cv2.imwrite('output.jpg', img)
if success:
    print("Save successful!")
else:
    print("Save failed!")
# Save with format conversion (the format is chosen from the file extension)
cv2.imwrite('output.png', img) # JPEG → PNG
cv2.imwrite('output.bmp', img) # JPEG → BMP
Setting Compression Quality
import cv2
img = cv2.imread('input.jpg')
# JPEG quality (0-100, default 95)
cv2.imwrite('high_quality.jpg', img, [cv2.IMWRITE_JPEG_QUALITY, 95])
cv2.imwrite('low_quality.jpg', img, [cv2.IMWRITE_JPEG_QUALITY, 30])
# PNG compression level (0-9; the default depends on the OpenCV version:
# 3 in older releases, 1 in recent ones)
# 0 = no compression (fast, large file)
# 9 = maximum compression (slow, small file)
cv2.imwrite('fast_compress.png', img, [cv2.IMWRITE_PNG_COMPRESSION, 0])
cv2.imwrite('max_compress.png', img, [cv2.IMWRITE_PNG_COMPRESSION, 9])
# WebP quality (1-100; values above 100 select lossless compression)
cv2.imwrite('output.webp', img, [cv2.IMWRITE_WEBP_QUALITY, 80])
Comparing File Sizes
import cv2
import os
img = cv2.imread('input.jpg')
# Save with various qualities
qualities = [10, 30, 50, 70, 90]
for q in qualities:
    filename = f'quality_{q}.jpg'
    cv2.imwrite(filename, img, [cv2.IMWRITE_JPEG_QUALITY, q])
    size_kb = os.path.getsize(filename) / 1024
    print(f"Quality {q}: {size_kb:.1f} KB")
4. Checking Image Properties
shape, dtype, size
import cv2
img = cv2.imread('image.jpg')
# shape: (height, width, channels)
print(f"Shape: {img.shape}")
height, width, channels = img.shape
print(f"Height: {height}px")
print(f"Width: {width}px")
print(f"Channels: {channels}")
# dtype: data type
print(f"Data type: {img.dtype}") # uint8
# size: total number of elements
print(f"Total elements: {img.size}") # H * W * C
# Grayscale image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
print(f"Gray Shape: {gray.shape}") # (height, width) - no channels
# Safely check channel count
if len(img.shape) == 3:
    h, w, c = img.shape
else:
    h, w = img.shape
    c = 1
Image Info Utility Function
import cv2
import os
def get_image_info(filepath):
    """Return detailed image file information as a dictionary."""
    info = {'filepath': filepath}
    # Check file exists
    if not os.path.exists(filepath):
        info['error'] = 'File does not exist'
        return info
    # File size
    info['file_size_kb'] = os.path.getsize(filepath) / 1024
    # Load image (UNCHANGED keeps the original channels and bit depth)
    img = cv2.imread(filepath, cv2.IMREAD_UNCHANGED)
    if img is None:
        info['error'] = 'Cannot read image'
        return info
    # Basic info
    info['shape'] = img.shape
    info['dtype'] = str(img.dtype)
    info['height'] = img.shape[0]
    info['width'] = img.shape[1]
    info['channels'] = img.shape[2] if len(img.shape) == 3 else 1
    # Statistics
    info['min_value'] = int(img.min())
    info['max_value'] = int(img.max())
    info['mean_value'] = float(img.mean())
    return info
# Usage example
info = get_image_info('sample.jpg')
for key, value in info.items():
    print(f"{key}: {value}")
5. Coordinate System and Pixel Access
OpenCV Coordinate System
OpenCV Coordinate System

  (0,0) ───────────────────────────────▶ x (width, columns)
    │
    │   ┌───────────────────────────┐
    │   │ (0,0)  (1,0)  (2,0)  ...  │
    │   │ (0,1)  (1,1)  (2,1)  ...  │
    │   │ (0,2)  (1,2)  (2,2)  ...  │
    │   │  ...    ...    ...   ...  │
    │   └───────────────────────────┘
    ▼
  y (height, rows)

  Important! Array indexing: img[y, x] or img[row, column]
             OpenCV functions: (x, y) order

  e.g.: img[100, 200]                     → pixel at y=100, x=200
        cv2.circle(img, (200, 100), ...)  → at x=200, y=100
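The two orderings can be checked on a small non-square array, where mixing them up fails loudly. A minimal NumPy-only sketch; the sizes and the point are arbitrary values chosen for illustration.

```python
import numpy as np

height, width = 3, 5
img = np.zeros((height, width), dtype=np.uint8)  # shape is (rows, cols) = (H, W)

x, y = 4, 1          # a point in (x, y) order, as OpenCV drawing functions expect
img[y, x] = 255      # array indexing takes (row, col) = (y, x)

print(img.shape)                     # (3, 5)
rows, cols = np.nonzero(img)
print(rows.tolist(), cols.tolist())  # [1] [4]

try:
    img[x, y]        # swapped order: row index 4 is out of range for 3 rows
except IndexError as exc:
    print("IndexError:", exc)
```

On a square image the swapped order would not raise at all and would silently touch the wrong pixel, which is why the bug is easy to miss.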
Pixel Access
import cv2
import numpy as np
img = cv2.imread('image.jpg')
# Read single pixel (y, x order!)
pixel = img[100, 200] # position y=100, x=200
print(f"Pixel value (BGR): {pixel}") # [B, G, R]
# Access individual channels
b = img[100, 200, 0] # Blue
g = img[100, 200, 1] # Green
r = img[100, 200, 2] # Red
print(f"B={b}, G={g}, R={r}")
# Grayscale image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
pixel_gray = gray[100, 200] # single value
print(f"Grayscale value: {pixel_gray}")
Modifying Pixels
import cv2
import numpy as np
img = cv2.imread('image.jpg')
# Modify single pixel
img[100, 200] = [255, 0, 0] # Change to blue
# Modify region (100x100 region to red)
img[0:100, 0:100] = [0, 0, 255] # Red in BGR
# Modify specific channel only
img[0:100, 100:200, 0] = 0 # Blue channel to 0
img[0:100, 100:200, 1] = 0 # Green channel to 0
img[0:100, 100:200, 2] = 255 # Red channel to 255
cv2.imshow('Modified', img)
cv2.waitKey(0)
cv2.destroyAllWindows()
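One pitfall when pixel values are modified arithmetically rather than assigned: uint8 arrays wrap around on overflow instead of saturating at 255. A minimal sketch of the wrap-around and one way to avoid it (widen, add, clip). cv2.add is another option that saturates; it is not shown here.

```python
import numpy as np

pixels = np.array([100, 200, 250], dtype=np.uint8)
offset = np.full_like(pixels, 10)          # uint8 array of 10s

wrapped = pixels + offset                  # uint8 arithmetic wraps modulo 256
print(wrapped.tolist())                    # [110, 210, 4]

# Widen to a signed type, add, clip to 0..255, then convert back
safe = np.clip(pixels.astype(np.int16) + 10, 0, 255).astype(np.uint8)
print(safe.tolist())                       # [110, 210, 255]
```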
item() and itemset() (for single pixel, faster)
import cv2
import time
img = cv2.imread('image.jpg')
# item(): read a single value as a plain Python int (often faster)
b = img.item(100, 200, 0)
g = img.item(100, 200, 1)
r = img.item(100, 200, 2)
# Writing a single value: ndarray.itemset() was removed in NumPy 2.0,
# so use direct indexing. On older NumPy, img.itemset((100, 200, 0), 255)
# did the same thing.
img[100, 200, 0] = 255 # Blue = 255
img[100, 200, 1] = 0 # Green = 0
img[100, 200, 2] = 0 # Red = 0
# Performance comparison (perf_counter is better suited to short timings)
# Regular indexing
start = time.perf_counter()
for i in range(10000):
    val = img[100, 200, 0]
print(f"Regular indexing: {time.perf_counter() - start:.4f}s")
# Using item()
start = time.perf_counter()
for i in range(10000):
    val = img.item(100, 200, 0)
print(f"item(): {time.perf_counter() - start:.4f}s")
6. ROI and Image Copying
ROI (Region of Interest)
ROI Concept

  Original Image (img)
  ┌────────────────────────────────────┐
  │                                    │
  │     y1 ┌─────────────┐             │
  │        │     ROI     │             │
  │        │             │             │
  │        │             │             │
  │     y2 └─────────────┘             │
  │        x1            x2            │
  │                                    │
  └────────────────────────────────────┘

  roi = img[y1:y2, x1:x2]

  Note: NumPy slicing returns a view!
        roi modification → original also modified
        Use .copy() if a copy is needed
Setting and Using ROI
import cv2
img = cv2.imread('image.jpg')
# Extract ROI: img[y1:y2, x1:x2]
# From top-left (x=100, y=50) to bottom-right (x=300, y=250)
roi = img[50:250, 100:300]
print(f"Original size: {img.shape}")
print(f"ROI size: {roi.shape}") # (200, 200, 3)
# Display ROI
cv2.imshow('Original', img)
cv2.imshow('ROI', roi)
cv2.waitKey(0)
cv2.destroyAllWindows()
Copying and Pasting ROI
import cv2
img = cv2.imread('image.jpg')
# Copy ROI (important: use .copy())
roi = img[50:150, 100:200].copy()
# Paste to another location
img[200:300, 300:400] = roi # Sizes must match!
# Copy region within image
# Copy top-left 100x100 to bottom-right
src_region = img[0:100, 0:100].copy()
img[-100:, -100:] = src_region
cv2.imshow('Result', img)
cv2.waitKey(0)
cv2.destroyAllWindows()
View vs Copy
import cv2
import numpy as np
img = cv2.imread('image.jpg')
original_value = img[100, 100, 0]
# View - shares memory with original
roi_view = img[50:150, 50:150]
roi_view[:] = 0 # Make ROI black
print(f"Original modified: {img[100, 100, 0]}") # 0
# Restore original
img = cv2.imread('image.jpg')
# Copy - independent memory
roi_copy = img[50:150, 50:150].copy()
roi_copy[:] = 0 # Only copy becomes black
print(f"Original preserved: {img[100, 100, 0]}") # Original value
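Whether a given array is a view or a copy can also be checked directly with np.shares_memory, without modifying anything first. A minimal sketch on a synthetic image (no file needed); the fill value 128 is arbitrary.

```python
import numpy as np

img = np.full((200, 200, 3), 128, dtype=np.uint8)  # synthetic gray image

roi_view = img[50:150, 50:150]          # slicing -> view
roi_copy = img[50:150, 50:150].copy()   # explicit copy

print(np.shares_memory(img, roi_view))  # True
print(np.shares_memory(img, roi_copy))  # False

roi_copy[:] = 0
print(int(img[100, 100, 0]))            # 128: the original is untouched
roi_view[:] = 0
print(int(img[100, 100, 0]))            # 0: writing to the view changed the original
```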
Copying Entire Image
import cv2
import numpy as np
img = cv2.imread('image.jpg')
# Method 1: .copy() method
img_copy1 = img.copy()
# Method 2: NumPy copy
img_copy2 = np.copy(img)
# Method 3: Slicing then copy (not recommended)
img_copy3 = img[:].copy()
# Not a copy: plain assignment only binds a second name to the same array
img_wrong = img # Same object reference!
img_wrong[0, 0] = [0, 0, 0]
print(f"Original also changed: {img[0, 0]}") # [0 0 0]
Practical ROI Examples
import cv2
def extract_face_region(img, x, y, w, h):
    """Extract face region (with boundary check)"""
    h_img, w_img = img.shape[:2]
    # Boundary check
    x1 = max(0, x)
    y1 = max(0, y)
    x2 = min(w_img, x + w)
    y2 = min(h_img, y + h)
    return img[y1:y2, x1:x2].copy()
def apply_mosaic(img, x, y, w, h, ratio=0.1):
    """Apply mosaic to specific region (modifies img in place)"""
    roi = img[y:y+h, x:x+w]
    # Use the actual ROI size: slicing clips silently at the image border
    roi_h, roi_w = roi.shape[:2]
    # Shrink then enlarge (mosaic effect); keep at least 1x1 pixels
    small_w = max(1, int(roi_w * ratio))
    small_h = max(1, int(roi_h * ratio))
    small = cv2.resize(roi, (small_w, small_h),
                       interpolation=cv2.INTER_NEAREST)
    mosaic = cv2.resize(small, (roi_w, roi_h),
                        interpolation=cv2.INTER_NEAREST)
    img[y:y+roi_h, x:x+roi_w] = mosaic
    return img
# Usage example
img = cv2.imread('image.jpg')
img = apply_mosaic(img, 100, 100, 200, 200, ratio=0.05)
cv2.imshow('Mosaic', img)
cv2.waitKey(0)
cv2.destroyAllWindows()
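The boundary check in extract_face_region matters because NumPy slicing does not raise on out-of-range stops, and a negative start silently counts from the end of the axis. A sketch on a synthetic array; clamp_box is a hypothetical helper that mirrors the same max/min logic.

```python
import numpy as np

def clamp_box(x, y, w, h, img_w, img_h):
    """Clip the box (x, y, w, h) to the image; returns (x1, y1, x2, y2)."""
    x1, y1 = max(0, x), max(0, y)
    x2, y2 = min(img_w, x + w), min(img_h, y + h)
    return x1, y1, x2, y2

img = np.zeros((100, 120, 3), dtype=np.uint8)   # H=100, W=120
h_img, w_img = img.shape[:2]

# Box that sticks out past the top-left corner
x1, y1, x2, y2 = clamp_box(-20, -10, 50, 50, w_img, h_img)
print((x1, y1, x2, y2))                  # (0, 0, 30, 40)
print(img[y1:y2, x1:x2].shape)           # (40, 30, 3)

# Without clamping, the negative starts wrap around and the slice is empty
print(img[-10:40, -20:30].shape)         # (0, 0, 3)
```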
7. Practice Problems
Exercise 1: Compare Image Reading Modes
Read one image in three modes (COLOR, GRAYSCALE, UNCHANGED) and compare their shapes. Test with both PNG (with transparency) and JPEG files.
# Hint
import cv2
filepath = 'test.png'
# Read in COLOR, GRAYSCALE, UNCHANGED
# Compare shapes
Exercise 2: Image Quality Analyzer
Save a JPEG image at various qualities (10, 30, 50, 70, 90) and calculate the file size and PSNR (Peak Signal-to-Noise Ratio) for each.
# Hint: PSNR calculation
import numpy as np
def calculate_psnr(original, compressed):
    mse = np.mean((original.astype(float) - compressed.astype(float)) ** 2)
    if mse == 0:
        return float('inf')
    max_pixel = 255.0
    psnr = 20 * np.log10(max_pixel / np.sqrt(mse))
    return psnr
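As a sanity check for the PSNR formula in the hint: with 8-bit images, an MSE of 100 corresponds to 20·log10(255/10), about 28.13 dB. A stdlib-only sketch of that arithmetic; psnr_from_mse is a hypothetical helper, not part of OpenCV.

```python
import math

def psnr_from_mse(mse, max_pixel=255.0):
    """PSNR in dB for a given mean squared error."""
    if mse == 0:
        return float('inf')
    return 20 * math.log10(max_pixel / math.sqrt(mse))

print(f"{psnr_from_mse(100):.2f} dB")   # 28.13 dB
print(psnr_from_mse(0))                 # inf
```

Lower JPEG quality raises the MSE, so the PSNR values in the exercise should fall as the quality setting decreases.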
Exercise 3: Create Color Grid
Create a 400x400 image and divide it into 16 cells of 100x100 pixels, filling each with a different color. Use ROI.
┌────┬────┬────┬────┐
│Red │Yell│Gren│Cyan│
├────┼────┼────┼────┤
│Blue│Prpl│Wht │Blck│
├────┼────┼────┼────┤
│... │... │... │... │
└────┴────┴────┴────┘
Exercise 4: Add Image Border
Write a function that adds a 10-pixel-thick border around an image. (The image size should increase.)
def add_border(img, thickness=10, color=(0, 0, 255)):
    """Add border to image"""
    # Hint: use numpy.pad or cv2.copyMakeBorder
    pass
Exercise 5: Pixel-Based Gradient
Create a 300x300 image with a horizontal gradient from black (left) to white (right). Use NumPy broadcasting without loops.
# Hint
import numpy as np
gradient = np.linspace(0, 255, 300) # 300 evenly spaced values from 0 to 255
8. Next Steps
In 03_Color_Spaces.md, you'll learn about various color spaces like BGR, RGB, HSV, LAB and color-based object tracking!
Topics to Learn Next:
- BGR vs RGB differences
- Understanding HSV color space
- Color space conversion with cv2.cvtColor()
- Color-based object tracking
9. References
Related Learning Materials
| Folder | Related Content |
|---|---|
| Python/ | NumPy slicing, array operations |
| 01_Environment_Setup.md | Installation and basic concepts |