3D Vision Basics
3D Vision Basics¶
Overview¶
3D vision is the technology for extracting and reconstructing three-dimensional information from 2D images. This covers the fundamentals of stereo vision, depth maps, point cloud processing, and 3D reconstruction.
Difficulty: ββββ
Prerequisites: Camera calibration, feature detection/matching, linear algebra
Table of Contents¶
- 3D Vision Overview
- Stereo Vision Principles
- Depth Map Generation
- Point Clouds
- Open3D Basics
- 3D Reconstruction
- Exercises
1. 3D Vision Overview¶
Goals of 3D Vision¶
3D Vision Pipeline:
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β β
β 2D Image ββββββΆ Depth Estimation ββββββΆ 3D Reconstruction β
β β β
β β βββββββββββββββ β
β ββββββββββββΆβ Depth Info ββββββββΆ Point Cloud β
β βββββββββββββββ β β
β β β
β βΌ β
β βββββββββββββββ β
β β 3D Mesh β β
β β 3D Model β β
β βββββββββββββββ β
β β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Depth Extraction Methods:
βββββββββββββββββββββββ¬βββββββββββββββββββββββββββββββββββββββββββ
β Method β Description β
βββββββββββββββββββββββΌβββββββββββββββββββββββββββββββββββββββββββ€
β Stereo Vision β Calculate depth from disparity of 2 cams β
β Structured Light β Measure depth by projecting known patternβ
β ToF (Time-of-Flight)β Measure distance by light travel time β
β Monocular Depth Est.β Predict depth with single cam + DL β
β LiDAR β Precise depth measurement by laser scan β
βββββββββββββββββββββββ΄βββββββββββββββββββββββββββββββββββββββββββ
Understanding Coordinate Systems¶
Camera Coordinate System:
Y (up)
β
β
β
β_________ X (right)
/
/
Z (camera forward direction)
World Coordinate System β Camera Coordinate System Transform:
P_cam = R * P_world + t
Image Coordinate System:
βββββββββββββββββββββββΆ u (horizontal, pixels)
β
β β (cx, cy) principal point
β
βΌ
v (vertical, pixels)
3D β 2D Projection:
u = fx * (X/Z) + cx
v = fy * (Y/Z) + cy
2. Stereo Vision Principles¶
Epipolar Geometry¶
Epipolar Geometry:
Epipole (e)
β
ββββββββββββΌβββββββββββ
β β β
β βββββββΌβββββββββββΌββββββ Epipolar line
β P β β P'
β β β
ββββββββββββ΄βββββββββββ
Left Right
Image Image
If 3D point P projects to point p in the left image,
it projects to p' somewhere on the epipolar line in the right image.
Key Matrices:
βββββββββββββββββββββ¬ββββββββββββββββββββββββββββββββββββββββββ
β Matrix β Description β
βββββββββββββββββββββΌββββββββββββββββββββββββββββββββββββββββββ€
β Essential Matrix β Geometric relationship in normalized β
β (E) β coordinates. E = [t]x * R β
βββββββββββββββββββββΌββββββββββββββββββββββββββββββββββββββββββ€
β Fundamental Matrixβ Geometric relationship in pixel β
β (F) β coordinates. F = K'^(-T) * E * K^(-1) β
β β p'^T * F * p = 0 β
βββββββββββββββββββββ΄ββββββββββββββββββββββββββββββββββββββββββ
Disparity and Depth¶
Stereo Disparity:
Left Camera Right Camera
C_L βββββββββββββββ C_R
β β
β b (baseline) β
β βββββββββββββββΊ β
β β
β β
βΌ βΌ
p_L d p_R
βββββββββββββββββββββββ
β β
β Disparity (d) β
β d = x_L - x_R β
Depth Calculation:
Z = (f * b) / d
Where:
- Z: Depth (distance from camera)
- f: Focal length
- b: Baseline (distance between two cameras)
- d: Disparity (in pixels)
Disparity Range Example:
βββββββββββββββββββββββββββββββββββββββββββ
β Distance β Disparity (f=500, b=0.1m) β
ββββββββββββΌβββββββββββββββββββββββββββββββ€
β 1m β 50 pixels β
β 5m β 10 pixels β
β 10m β 5 pixels β
β Infinity β 0 pixels β
βββββββββββββββββββββββββββββββββββββββββββ
Stereo Rectification¶
import cv2
import numpy as np
def stereo_calibrate(obj_points, img_points_left, img_points_right,
K1, D1, K2, D2, img_size):
"""Stereo camera calibration"""
flags = (cv2.CALIB_FIX_INTRINSIC +
cv2.CALIB_RATIONAL_MODEL)
ret, K1, D1, K2, D2, R, T, E, F = cv2.stereoCalibrate(
obj_points,
img_points_left,
img_points_right,
K1, D1,
K2, D2,
img_size,
flags=flags
)
print(f"Stereo calibration RMS error: {ret:.4f}")
print(f"\nRotation matrix R:\n{R}")
print(f"\nTranslation vector T:\n{T.ravel()}")
print(f"\nBaseline: {np.linalg.norm(T):.4f} units")
return R, T, E, F
def stereo_rectify(K1, D1, K2, D2, img_size, R, T):
"""Stereo Rectification"""
# Calculate rectification transform
R1, R2, P1, P2, Q, roi1, roi2 = cv2.stereoRectify(
K1, D1,
K2, D2,
img_size,
R, T,
alpha=0, # 0: valid pixels only, 1: all pixels
newImageSize=img_size
)
# Q matrix: used for disparity β 3D conversion
# [X Y Z W]^T = Q * [x y disparity 1]^T
print("Q matrix (disparity β 3D transform):")
print(Q)
return R1, R2, P1, P2, Q, roi1, roi2
def create_rectification_maps(K, D, R, P, img_size):
"""Generate rectification maps"""
map1, map2 = cv2.initUndistortRectifyMap(
K, D, R, P, img_size, cv2.CV_32FC1
)
return map1, map2
def rectify_stereo_pair(img_left, img_right, maps_left, maps_right):
"""Rectify stereo image pair"""
rect_left = cv2.remap(img_left, maps_left[0], maps_left[1],
cv2.INTER_LINEAR)
rect_right = cv2.remap(img_right, maps_right[0], maps_right[1],
cv2.INTER_LINEAR)
return rect_left, rect_right
3. Depth Map Generation¶
StereoBM (Block Matching)¶
import cv2
import numpy as np
def compute_disparity_bm(left, right, num_disparities=64, block_size=15):
"""Compute disparity map using StereoBM"""
# Convert to grayscale
if len(left.shape) == 3:
left = cv2.cvtColor(left, cv2.COLOR_BGR2GRAY)
right = cv2.cvtColor(right, cv2.COLOR_BGR2GRAY)
# Create StereoBM
stereo = cv2.StereoBM_create(
numDisparities=num_disparities, # Multiple of 16
blockSize=block_size # Odd number, 5~21
)
# Parameter tuning (optional)
stereo.setMinDisparity(0)
stereo.setSpeckleWindowSize(100)
stereo.setSpeckleRange(32)
stereo.setPreFilterType(cv2.STEREO_BM_PREFILTER_NORMALIZED_RESPONSE)
stereo.setPreFilterSize(9)
stereo.setPreFilterCap(31)
stereo.setTextureThreshold(10)
stereo.setUniquenessRatio(15)
# Compute disparity
disparity = stereo.compute(left, right)
# Normalize disparity values (scaled by 16)
disparity = disparity.astype(np.float32) / 16.0
return disparity
def visualize_disparity(disparity):
"""Visualize disparity map"""
# Use only valid disparity
valid_mask = disparity > 0
# Normalize
disp_vis = np.zeros_like(disparity)
if np.any(valid_mask):
disp_min = np.min(disparity[valid_mask])
disp_max = np.max(disparity[valid_mask])
disp_vis = (disparity - disp_min) / (disp_max - disp_min) * 255
disp_vis = disp_vis.astype(np.uint8)
# Apply colormap
disp_color = cv2.applyColorMap(disp_vis, cv2.COLORMAP_JET)
# Black out invalid regions
disp_color[~valid_mask] = [0, 0, 0]
return disp_color
StereoSGBM (Semi-Global Block Matching)¶
def compute_disparity_sgbm(left, right, num_disparities=64, block_size=5):
"""Compute disparity map using StereoSGBM"""
# Convert to grayscale
if len(left.shape) == 3:
gray_left = cv2.cvtColor(left, cv2.COLOR_BGR2GRAY)
gray_right = cv2.cvtColor(right, cv2.COLOR_BGR2GRAY)
else:
gray_left, gray_right = left, right
# SGBM parameters
# P1, P2: Penalty for disparity difference between adjacent pixels
P1 = 8 * 3 * block_size ** 2
P2 = 32 * 3 * block_size ** 2
stereo = cv2.StereoSGBM_create(
minDisparity=0,
numDisparities=num_disparities,
blockSize=block_size,
P1=P1,
P2=P2,
disp12MaxDiff=1,
uniquenessRatio=10,
speckleWindowSize=100,
speckleRange=32,
preFilterCap=63,
mode=cv2.STEREO_SGBM_MODE_SGBM_3WAY
)
# Compute disparity
disparity = stereo.compute(gray_left, gray_right)
disparity = disparity.astype(np.float32) / 16.0
return disparity
def disparity_to_depth(disparity, Q):
"""Convert disparity map to depth map"""
# 3D reprojection using Q matrix
# points_3d[y, x] = [X, Y, Z, W]
points_3d = cv2.reprojectImageTo3D(disparity, Q)
# Extract Z value (depth)
depth = points_3d[:, :, 2]
# Filter invalid depth
valid_mask = (disparity > 0) & (depth > 0) & (depth < 10000)
depth[~valid_mask] = 0
return depth, points_3d
def create_depth_colormap(depth, max_depth=10.0):
"""Visualize depth map"""
# Clip depth
depth_clipped = np.clip(depth, 0, max_depth)
# Normalize (0-255)
depth_norm = (depth_clipped / max_depth * 255).astype(np.uint8)
# Apply colormap (close = red, far = blue)
depth_color = cv2.applyColorMap(255 - depth_norm, cv2.COLORMAP_JET)
# Mask invalid regions
depth_color[depth <= 0] = [0, 0, 0]
return depth_color
Improved Disparity with WLS Filter¶
def compute_disparity_with_wls(left, right, num_disparities=64):
"""Compute improved disparity map with WLS filter"""
# Grayscale
gray_left = cv2.cvtColor(left, cv2.COLOR_BGR2GRAY)
gray_right = cv2.cvtColor(right, cv2.COLOR_BGR2GRAY)
# Left matcher
left_matcher = cv2.StereoSGBM_create(
minDisparity=0,
numDisparities=num_disparities,
blockSize=5,
P1=8 * 3 * 5 ** 2,
P2=32 * 3 * 5 ** 2,
disp12MaxDiff=1,
uniquenessRatio=15,
speckleWindowSize=0,
speckleRange=2,
preFilterCap=63,
mode=cv2.STEREO_SGBM_MODE_SGBM_3WAY
)
# Right matcher (for left-right consistency check)
right_matcher = cv2.ximgproc.createRightMatcher(left_matcher)
# Compute disparity
left_disp = left_matcher.compute(gray_left, gray_right)
right_disp = right_matcher.compute(gray_right, gray_left)
# WLS filter
wls_filter = cv2.ximgproc.createDisparityWLSFilter(left_matcher)
wls_filter.setLambda(80000)
wls_filter.setSigmaColor(1.2)
# Apply filter
filtered_disp = wls_filter.filter(left_disp, left, None, right_disp)
filtered_disp = filtered_disp.astype(np.float32) / 16.0
return filtered_disp
4. Point Clouds¶
Point Cloud Generation¶
import cv2
import numpy as np
def create_point_cloud(depth, rgb, K):
"""Create point cloud from depth map and RGB image"""
h, w = depth.shape
fx, fy = K[0, 0], K[1, 1]
cx, cy = K[0, 2], K[1, 2]
# Pixel coordinate grid
u = np.arange(w)
v = np.arange(h)
u, v = np.meshgrid(u, v)
# Valid depth mask
valid = depth > 0
# Calculate 3D coordinates
Z = depth[valid]
X = (u[valid] - cx) * Z / fx
Y = (v[valid] - cy) * Z / fy
# Point cloud (N x 3)
points = np.stack([X, Y, Z], axis=-1)
# Color information (N x 3)
if len(rgb.shape) == 3:
colors = rgb[valid]
else:
colors = np.stack([rgb[valid]] * 3, axis=-1)
return points, colors
def subsample_point_cloud(points, colors, voxel_size=0.01):
"""Downsample point cloud using voxel grid"""
# Calculate voxel indices
voxel_indices = np.floor(points / voxel_size).astype(int)
# Select only unique voxels
_, unique_indices = np.unique(
voxel_indices, axis=0, return_index=True
)
return points[unique_indices], colors[unique_indices]
def save_point_cloud_ply(filename, points, colors):
"""Save point cloud in PLY format"""
n_points = len(points)
# PLY header
header = f"""ply
format ascii 1.0
element vertex {n_points}
property float x
property float y
property float z
property uchar red
property uchar green
property uchar blue
end_header
"""
with open(filename, 'w') as f:
f.write(header)
for i in range(n_points):
x, y, z = points[i]
r, g, b = colors[i]
f.write(f"{x:.6f} {y:.6f} {z:.6f} {int(r)} {int(g)} {int(b)}\n")
print(f"Saved: {filename} ({n_points} points)")
Point Cloud Processing¶
def remove_outliers_statistical(points, colors, nb_neighbors=20, std_ratio=2.0):
"""Statistical outlier removal"""
from scipy.spatial import KDTree
# Build KD-Tree
tree = KDTree(points)
# Calculate k-NN distance for each point
distances, _ = tree.query(points, k=nb_neighbors + 1)
mean_distances = np.mean(distances[:, 1:], axis=1) # Exclude self
# Global mean and standard deviation
global_mean = np.mean(mean_distances)
global_std = np.std(mean_distances)
# Outlier mask
threshold = global_mean + std_ratio * global_std
inlier_mask = mean_distances < threshold
print(f"Outlier removal: {len(points)} β {np.sum(inlier_mask)} points")
return points[inlier_mask], colors[inlier_mask]
def estimate_normals(points, k=30):
"""Estimate point cloud normal vectors"""
from scipy.spatial import KDTree
from numpy.linalg import eig
tree = KDTree(points)
normals = np.zeros_like(points)
for i, point in enumerate(points):
# k-NN search
_, indices = tree.query(point, k=k)
neighbors = points[indices]
# Covariance matrix
centered = neighbors - np.mean(neighbors, axis=0)
cov = np.dot(centered.T, centered) / k
# Eigenvector of smallest eigenvalue is the normal
eigenvalues, eigenvectors = eig(cov)
min_idx = np.argmin(eigenvalues)
normals[i] = eigenvectors[:, min_idx]
return normals
5. Open3D Basics¶
Open3D Installation and Basic Usage¶
# pip install open3d
import open3d as o3d
import numpy as np
def create_open3d_point_cloud(points, colors=None):
"""Create Open3D point cloud"""
pcd = o3d.geometry.PointCloud()
pcd.points = o3d.utility.Vector3dVector(points)
if colors is not None:
# Normalize colors to 0-1 range
if colors.max() > 1:
colors = colors / 255.0
pcd.colors = o3d.utility.Vector3dVector(colors)
return pcd
def visualize_point_cloud(pcd):
"""Visualize point cloud"""
# Add coordinate frame
coordinate_frame = o3d.geometry.TriangleMesh.create_coordinate_frame(
size=0.5, origin=[0, 0, 0]
)
o3d.visualization.draw_geometries(
[pcd, coordinate_frame],
window_name="Point Cloud",
width=1280,
height=720,
point_show_normal=False
)
def process_point_cloud_open3d(pcd):
"""Process point cloud with Open3D"""
print(f"Original point count: {len(pcd.points)}")
# 1. Downsampling
pcd_down = pcd.voxel_down_sample(voxel_size=0.02)
print(f"After downsampling: {len(pcd_down.points)}")
# 2. Outlier removal
pcd_clean, _ = pcd_down.remove_statistical_outlier(
nb_neighbors=20,
std_ratio=2.0
)
print(f"After outlier removal: {len(pcd_clean.points)}")
# 3. Normal estimation
pcd_clean.estimate_normals(
search_param=o3d.geometry.KDTreeSearchParamHybrid(
radius=0.1, max_nn=30
)
)
# 4. Orient normals consistently
pcd_clean.orient_normals_consistent_tangent_plane(k=15)
return pcd_clean
Mesh Reconstruction¶
def reconstruct_mesh_poisson(pcd, depth=9):
"""Poisson surface reconstruction"""
# Normals required
if not pcd.has_normals():
pcd.estimate_normals()
pcd.orient_normals_consistent_tangent_plane(k=15)
# Poisson reconstruction
mesh, densities = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(
pcd, depth=depth
)
# Remove low-density regions
densities = np.asarray(densities)
density_threshold = np.quantile(densities, 0.01)
vertices_to_remove = densities < density_threshold
mesh.remove_vertices_by_mask(vertices_to_remove)
print(f"Mesh vertices: {len(mesh.vertices)}")
print(f"Mesh triangles: {len(mesh.triangles)}")
return mesh
def reconstruct_mesh_ball_pivoting(pcd):
"""Ball pivoting surface reconstruction"""
if not pcd.has_normals():
pcd.estimate_normals()
# Estimate radii
distances = pcd.compute_nearest_neighbor_distance()
avg_dist = np.mean(distances)
radii = [avg_dist, avg_dist * 2, avg_dist * 4]
mesh = o3d.geometry.TriangleMesh.create_from_point_cloud_ball_pivoting(
pcd, o3d.utility.DoubleVector(radii)
)
return mesh
def save_mesh(mesh, filename):
"""Save mesh"""
o3d.io.write_triangle_mesh(filename, mesh)
print(f"Mesh saved: {filename}")
RGBD Image Processing¶
def create_rgbd_from_opencv(color_img, depth_img, K):
"""Convert OpenCV images to Open3D RGBD"""
# BGR β RGB
color_rgb = cv2.cvtColor(color_img, cv2.COLOR_BGR2RGB)
# Convert to Open3D images
color_o3d = o3d.geometry.Image(color_rgb)
depth_o3d = o3d.geometry.Image(depth_img.astype(np.float32))
# Create RGBD image
rgbd = o3d.geometry.RGBDImage.create_from_color_and_depth(
color_o3d, depth_o3d,
depth_scale=1000.0, # mm β m
depth_trunc=3.0, # Maximum depth
convert_rgb_to_intensity=False
)
return rgbd
def rgbd_to_point_cloud(rgbd, K, width, height):
"""Create point cloud from RGBD image"""
# Open3D camera parameters
intrinsic = o3d.camera.PinholeCameraIntrinsic(
width, height,
K[0, 0], K[1, 1], # fx, fy
K[0, 2], K[1, 2] # cx, cy
)
# Create point cloud
pcd = o3d.geometry.PointCloud.create_from_rgbd_image(
rgbd, intrinsic
)
return pcd
6. 3D Reconstruction¶
Multi-View Stereo (MVS) Concept¶
Multi-View Stereo Pipeline:
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β β
β 1. Image Acquisition β
β Capture subject from multiple angles β
β π· π· π· π· π· β
β β
β 2. Feature Detection and Matching β
β Find correspondences between images using SIFT, ORB, etc. β
β β βββββββββββ β β
β β
β 3. Structure from Motion (SfM) β
β Camera pose estimation + sparse point cloud β
β π·βββββ β β
β π·βββββΌβββββ β β
β π·βββββ β β
β β
β 4. Dense Reconstruction β
β Estimate depth for all pixels β
β [:::::::::::] β
β β
β 5. Mesh Generation β
β Point cloud β Triangle mesh β
β β²β²β²β²β²β²β²β² β
β β
β 6. Texture Mapping β
β Apply texture to mesh using original images β
β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Essential Matrix-based Pose Estimation¶
import cv2
import numpy as np
def estimate_pose_from_essential(pts1, pts2, K):
"""Estimate relative pose from Essential Matrix"""
# Calculate Essential Matrix
E, mask = cv2.findEssentialMat(
pts1, pts2, K,
method=cv2.RANSAC,
prob=0.999,
threshold=1.0
)
print(f"Inlier ratio: {np.sum(mask) / len(mask) * 100:.1f}%")
# Recover R, t from Essential Matrix
_, R, t, mask = cv2.recoverPose(E, pts1, pts2, K)
print(f"\nRotation matrix R:\n{R}")
print(f"\nTranslation vector t (unit vector):\n{t.ravel()}")
return R, t
def triangulate_points(pts1, pts2, K, R, t):
"""Triangulate 3D points from two views"""
# Construct projection matrices
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([R, t])
# Triangulation
pts1_h = pts1.T # (2, N)
pts2_h = pts2.T
points_4d = cv2.triangulatePoints(P1, P2, pts1_h, pts2_h)
# Homogeneous coordinates β 3D coordinates
points_3d = points_4d[:3] / points_4d[3]
return points_3d.T # (N, 3)
def incremental_sfm(images, K):
"""Incremental SfM (simple version)"""
# SIFT detector
sift = cv2.SIFT_create()
# Initialize with first two images
kp1, desc1 = sift.detectAndCompute(images[0], None)
kp2, desc2 = sift.detectAndCompute(images[1], None)
# Matching
bf = cv2.BFMatcher()
matches = bf.knnMatch(desc1, desc2, k=2)
# Ratio test
good_matches = []
for m, n in matches:
if m.distance < 0.75 * n.distance:
good_matches.append(m)
pts1 = np.float32([kp1[m.queryIdx].pt for m in good_matches])
pts2 = np.float32([kp2[m.trainIdx].pt for m in good_matches])
# Initial pose and 3D points
R, t = estimate_pose_from_essential(pts1, pts2, K)
points_3d = triangulate_points(pts1, pts2, K, R, t)
# Store camera poses
camera_poses = [
{'R': np.eye(3), 't': np.zeros((3, 1))}, # First camera
{'R': R, 't': t} # Second camera
]
print(f"Initial 3D points: {len(points_3d)}")
# Add subsequent images (estimate pose using PnP)
for i in range(2, len(images)):
kp_new, desc_new = sift.detectAndCompute(images[i], None)
# Match with previous image
matches = bf.knnMatch(desc2, desc_new, k=2)
good_matches = []
for m, n in matches:
if m.distance < 0.75 * n.distance:
good_matches.append(m)
# 3D-2D correspondences
obj_points = points_3d[[m.queryIdx for m in good_matches]]
img_points = np.float32([kp_new[m.trainIdx].pt for m in good_matches])
# Estimate pose using PnP
success, rvec, tvec, inliers = cv2.solvePnPRansac(
obj_points, img_points, K, None
)
if success:
R_new, _ = cv2.Rodrigues(rvec)
camera_poses.append({'R': R_new, 't': tvec})
print(f"Image {i} registered (inliers: {len(inliers)})")
# Update for next iteration
desc2 = desc_new
return points_3d, camera_poses
Bundle Adjustment¶
Bundle Adjustment:
Simultaneously optimize camera parameters and 3D point positions
Minimization Objective:
E = Ξ£_i Ξ£_j || x_ij - Ο(K, R_i, t_i, X_j) ||Β²
Where:
- x_ij: 2D coordinates of point j observed in image i
- Ο(): 3D β 2D projection function
- K: Camera intrinsic parameters
- R_i, t_i: Camera i's pose
- X_j: 3D coordinates of point j
Optimization Tools:
- Ceres Solver
- g2o
- SciPy (for small problems)
7. Exercises¶
Exercise 1: Stereo Depth Estimation¶
Generate a depth map from a stereo image pair.
Requirements: - Compare StereoBM and StereoSGBM - Visualize disparity map - Convert to depth map - Quality improvement (filtering)
Hint
# Parameter tuning needed
stereo = cv2.StereoSGBM_create(
numDisparities=128,
blockSize=5,
P1=8 * 3 * 5 ** 2,
P2=32 * 3 * 5 ** 2
)
# Improve with WLS filter
wls_filter = cv2.ximgproc.createDisparityWLSFilter(stereo)
Exercise 2: Point Cloud Filtering¶
Refine a noisy point cloud.
Requirements: - Statistical outlier removal - Voxel downsampling - Extract planar regions - Visualize results
Hint
import open3d as o3d
# Outlier removal
pcd_clean, _ = pcd.remove_statistical_outlier(
nb_neighbors=20, std_ratio=2.0
)
# Downsampling
pcd_down = pcd_clean.voxel_down_sample(0.02)
# Plane extraction (RANSAC)
plane_model, inliers = pcd_down.segment_plane(
distance_threshold=0.01,
ransac_n=3,
num_iterations=1000
)
Exercise 3: 3D Reconstruction from Two Views¶
Reconstruct 3D points from two images.
Requirements: - Feature detection and matching - Essential Matrix calculation - Camera pose recovery - Generate 3D points via triangulation
Hint
# Essential Matrix
E, mask = cv2.findEssentialMat(pts1, pts2, K)
# Pose recovery
_, R, t, _ = cv2.recoverPose(E, pts1, pts2, K)
# Triangulation
points_4d = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)
points_3d = points_4d[:3] / points_4d[3]
Exercise 4: Mesh Reconstruction¶
Generate a 3D mesh from a point cloud.
Requirements: - Point cloud preprocessing - Normal vector estimation - Poisson or ball pivoting reconstruction - Save and visualize results
Hint
# Normal estimation
pcd.estimate_normals()
pcd.orient_normals_consistent_tangent_plane(k=15)
# Poisson reconstruction
mesh, densities = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(
pcd, depth=9
)
# Remove low-density regions
densities = np.asarray(densities)
mesh.remove_vertices_by_mask(densities < np.quantile(densities, 0.01))
Exercise 5: Real-time Stereo Vision¶
Implement real-time depth estimation using webcam or stereo camera.
Requirements: - Apply camera calibration - Real-time disparity calculation - Depth visualization - FPS measurement
Hint
# Pre-compute remapping maps
map1_left, map2_left = cv2.initUndistortRectifyMap(...)
map1_right, map2_right = cv2.initUndistortRectifyMap(...)
while True:
# Rectification
rect_left = cv2.remap(left, map1_left, map2_left, cv2.INTER_LINEAR)
rect_right = cv2.remap(right, map1_right, map2_right, cv2.INTER_LINEAR)
# Disparity calculation (SGBM)
disparity = stereo.compute(rect_left, rect_right)
Next Steps¶
- 22_Depth_Estimation.md - Monocular depth estimation, MiDaS, DPT, Structure from Motion