01. Tensors and Autograd¶
PyTorch 2.x Note: This lesson is based on PyTorch 2.0+ (released in 2023).
Key PyTorch 2.0 features:
- torch.compile(): Graph compilation for significant training/inference speedups
- torch.func: Function transforms (vmap, grad, jacrev, etc.)
- Enhanced CUDA graph support
Installation:
pip install "torch>=2.0"
Learning Objectives¶
- Understand the concept of tensors and their differences from NumPy arrays
- Understand PyTorch's automatic differentiation (Autograd) system
- Learn the basics of GPU operations
- Learn the basics of torch.compile (PyTorch 2.x)
1. What is a Tensor?¶
A tensor generalizes scalars, vectors, and matrices to an arbitrary number of dimensions; in practice it is a multi-dimensional array (see the short sketch after the table below).
| Dimension | Name | Example |
|---|---|---|
| 0D | Scalar | Single number (5) |
| 1D | Vector | [1, 2, 3] |
| 2D | Matrix | [[1,2], [3,4]] |
| 3D | 3D Tensor | Image (H, W, C) |
| 4D | 4D Tensor | Batch of images (N, C, H, W) |
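To make the table concrete, here is a small sketch creating one tensor of each dimensionality and checking its shape (the sizes are arbitrary):
import torch
scalar = torch.tensor(5)                  # 0D: single number
vector = torch.tensor([1, 2, 3])          # 1D
matrix = torch.tensor([[1, 2], [3, 4]])   # 2D
image  = torch.zeros(32, 32, 3)           # 3D: (H, W, C)
batch  = torch.zeros(16, 3, 32, 32)       # 4D: (N, C, H, W)
for t in (scalar, vector, matrix, image, batch):
    print(t.ndim, tuple(t.shape))         # 0 (), 1 (3,), 2 (2, 2), ...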
2. NumPy vs PyTorch Tensor Comparison¶
Creation¶
import numpy as np
import torch
# NumPy
np_arr = np.array([1, 2, 3])
np_zeros = np.zeros((3, 4))
np_rand = np.random.randn(3, 4)
# PyTorch
pt_tensor = torch.tensor([1, 2, 3])
pt_zeros = torch.zeros(3, 4)
pt_rand = torch.randn(3, 4)
Conversion¶
# NumPy → PyTorch
tensor = torch.from_numpy(np_arr)
# PyTorch → NumPy
array = tensor.numpy() # Only works for CPU tensors
Key Differences¶
| Feature | NumPy | PyTorch |
|---|---|---|
| GPU Support | ❌ | ✅ (tensor.to('cuda')) |
| Automatic Differentiation | ❌ | ✅ (requires_grad=True) |
| Default Type | float64 | float32 |
| Memory Sharing | - | from_numpy shares memory |
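The memory-sharing row deserves a quick check. This sketch shows that torch.from_numpy shares the underlying buffer, while torch.tensor makes a copy:
import numpy as np
import torch
np_arr = np.array([1.0, 2.0, 3.0])
shared = torch.from_numpy(np_arr)   # shares memory with np_arr
copied = torch.tensor(np_arr)       # makes an independent copy
np_arr[0] = 100.0
print(shared)   # tensor([100., 2., 3.], dtype=torch.float64): reflects the change
print(copied)   # tensor([1., 2., 3.], dtype=torch.float64): unchanged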
3. Automatic Differentiation (Autograd)¶
Autograd is a core PyTorch feature that automatically computes gradients via backpropagation.
Basic Usage¶
# Enable gradient tracking with requires_grad=True
x = torch.tensor([2.0], requires_grad=True)
y = x ** 2 + 3 * x + 1 # y = x² + 3x + 1
# Backpropagation (compute dy/dx)
y.backward()
# Check gradient
print(x.grad) # tensor([7.]) # dy/dx = 2x + 3 = 2*2 + 3 = 7
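Autograd works the same way with several inputs; a minimal sketch with two leaf tensors:
import torch
x = torch.tensor([3.0], requires_grad=True)
y = torch.tensor([4.0], requires_grad=True)
z = x * y + y ** 2   # z = xy + y²
z.backward()
print(x.grad)        # dz/dx = y      = tensor([4.])
print(y.grad)        # dz/dy = x + 2y = tensor([11.])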
Computational Graph¶
x ──┬──▶ x² ──┐
    │         ├──▶ + ──▶ y
    └──▶ 3x ──┘
- Forward pass: Computation from input → output
- Backward pass: Gradient computation from output → input
Gradient Accumulation and Initialization¶
# Gradients accumulate across successive backward() calls
x.grad.zero_()  # Reset gradients at every iteration of a training loop
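A small sketch of what accumulation means in practice (values assume x = 2):
import torch
x = torch.tensor([2.0], requires_grad=True)
(x ** 2).backward()
(x ** 2).backward()    # second backward adds to the existing gradient
print(x.grad)          # tensor([8.]): 2x = 4, accumulated twice
x.grad.zero_()         # reset
(x ** 2).backward()
print(x.grad)          # tensor([4.])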
4. Operations and Broadcasting¶
a = torch.tensor([[1, 2], [3, 4]], dtype=torch.float32)
b = torch.tensor([[5, 6], [7, 8]], dtype=torch.float32)
# Basic operations
c = a + b # Element-wise addition
c = a * b # Element-wise multiplication (Hadamard product)
c = a @ b # Matrix multiplication
c = torch.matmul(a, b) # Matrix multiplication
# Broadcasting
a = torch.tensor([[1], [2], [3]]) # (3, 1)
b = torch.tensor([10, 20, 30]) # (3,)
c = a + b # (3, 3) automatic expansion
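The automatic expansion can be made explicit with expand; a quick sketch verifying the broadcast result:
import torch
a = torch.tensor([[1], [2], [3]])            # (3, 1)
b = torch.tensor([10, 20, 30])               # (3,)
c = a + b                                    # broadcast to (3, 3)
c_manual = a.expand(3, 3) + b.expand(3, 3)   # explicit expansion, same values
print(c.shape)                               # torch.Size([3, 3])
print(torch.equal(c, c_manual))              # True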
5. GPU Operations¶
# Check GPU availability
if torch.cuda.is_available():
device = torch.device('cuda')
else:
device = torch.device('cpu')
# Move tensor to GPU
x = torch.randn(1000, 1000)
x_gpu = x.to(device)
# Or
x_gpu = x.cuda()
# Operations (performed on the same device)
y_gpu = x_gpu @ x_gpu
# Bring result back to CPU
y_cpu = y_gpu.cpu()
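Note that all operands of an operation must live on the same device; a small sketch of the failure mode (only meaningful on a machine with CUDA):
import torch
x_cpu = torch.randn(2, 2)
if torch.cuda.is_available():
    x_gpu = x_cpu.to('cuda')
    try:
        _ = x_cpu @ x_gpu        # mixing CPU and GPU tensors
    except RuntimeError as err:
        print(err)               # error: tensors are expected to be on the same device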
6. Exercise: NumPy vs PyTorch Automatic Differentiation Comparison¶
Problem: Find the derivative of f(x) = x³ + 2x² - 5x + 3 at x=2¶
Mathematical solution:
- f'(x) = 3x² + 4x - 5
- f'(2) = 3(4) + 4(2) - 5 = 12 + 8 - 5 = 15
NumPy (Manual Differentiation)¶
import numpy as np
def f(x):
return x**3 + 2*x**2 - 5*x + 3
def df(x):
"""Manually compute derivative"""
return 3*x**2 + 4*x - 5
x = 2.0
print(f"f({x}) = {f(x)}")
print(f"f'({x}) = {df(x)}") # 15.0
PyTorch (Automatic Differentiation)¶
import torch
x = torch.tensor([2.0], requires_grad=True)
y = x**3 + 2*x**2 - 5*x + 3
y.backward()
print(f"f({x.item()}) = {y.item()}")
print(f"f'({x.item()}) = {x.grad.item()}") # 15.0
7. Important Notes¶
In-place Operations¶
# In-place operations can conflict with autograd
x = torch.tensor([1.0], requires_grad=True)
# x += 1  # RuntimeError: a leaf tensor that requires grad cannot be modified in-place
x = x + 1  # Creates a new tensor instead (safe)
Disabling Gradient Tracking¶
# Save memory during inference
with torch.no_grad():
y = model(x) # No gradient computation
# Or permanently disable tracking for a specific leaf tensor
x.requires_grad = False
detach()¶
# Detach a tensor from the computational graph
y = x.detach()  # y shares the data of x but does not track gradients
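A short sketch showing that detached values drop out of the graph while the original path still receives gradients:
import torch
x = torch.tensor([2.0], requires_grad=True)
y = x ** 2
z = y.detach() * 3     # cut off from the graph, no gradient flows through z
w = y * 3              # still part of the graph
w.backward()
print(x.grad)          # tensor([12.]): d(3x²)/dx = 6x = 12
print(z.requires_grad) # False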
8. PyTorch 2.x New Features¶
torch.compile()¶
The flagship feature of PyTorch 2.0: it compiles the model into optimized kernels for faster execution while keeping the familiar eager-mode API.
import torch
# Define model (MyModel stands for any torch.nn.Module; see the sketch below)
model = MyModel()
# Compile the model (PyTorch 2.0+)
compiled_model = torch.compile(model)
# Usage is the same
output = compiled_model(input_data)
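A self-contained sketch, where TinyNet is a made-up example module (not part of the PyTorch API):
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 1)
    def forward(self, x):
        return torch.relu(self.fc(x))

model = TinyNet()
compiled = torch.compile(model)          # PyTorch 2.0+
out = compiled(torch.randn(4, 10))       # first call triggers compilation
print(out.shape)                         # torch.Size([4, 1])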
Compilation Modes¶
# Default mode (balanced compile time vs. speedup)
model = torch.compile(model)
# Maximum performance mode (longest compile time)
model = torch.compile(model, mode="max-autotune")
# Overhead-reduction mode (uses CUDA graphs to cut per-call overhead)
model = torch.compile(model, mode="reduce-overhead")
torch.func (Function Transforms)¶
from torch.func import vmap, grad, jacrev
# vmap: Automatic batch operations
def single_fn(x):
return x ** 2
batched_fn = vmap(single_fn)
result = batched_fn(torch.randn(10, 3)) # Batch processing
# grad: Functional gradients
def f(x):
return (x ** 2).sum()
grad_f = grad(f)
x = torch.randn(3)
print(grad_f(x)) # 2 * x
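The two transforms compose; a common pattern is per-sample gradients, sketched here for a toy loss:
from torch.func import grad, vmap
import torch

def loss_fn(sample):                    # operates on a single sample
    return (sample ** 2).sum()

per_sample_grad = vmap(grad(loss_fn))   # gradient per sample, vectorized over the batch
batch = torch.randn(5, 3)
print(per_sample_grad(batch).shape)     # torch.Size([5, 3]), equal to 2 * batch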
Notes¶
- torch.compile has compilation overhead on the first call, so a warm-up run is recommended before production use.
- Dynamic input shapes may trigger recompilation; this can be mitigated with the dynamic=True option.
model = torch.compile(model, dynamic=True)
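A minimal warm-up sketch; the exact timings depend on the machine and backend:
import time
import torch

compiled = torch.compile(torch.nn.Linear(128, 128))
x = torch.randn(32, 128)

start = time.time()
compiled(x)                                      # first call: compile + run
print(f"first call:  {time.time() - start:.2f} s")

start = time.time()
compiled(x)                                      # later calls reuse the compiled graph
print(f"second call: {time.time() - start:.4f} s")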
Summary¶
What to Understand from NumPy¶
- Tensors are multi-dimensional arrays
- Matrix operations (multiplication, transpose, broadcasting)
What PyTorch Adds¶
- requires_grad: Enable automatic differentiation
- backward(): Perform backpropagation
- grad: Computed gradients
- GPU acceleration
PyTorch 2.x Additions¶
- torch.compile(): Performance optimization
- torch.func: Function transforms (vmap, grad)
Next Steps¶
In 02_Neural_Network_Basics.md, we'll use these tensors and automatic differentiation to build neural networks.