# 05. Linear & Logistic Regression
Previous: Training Techniques | Next: Multi-Layer Perceptron (MLP)
## Overview
Linear regression and logistic regression are the most fundamental building blocks of deep learning: each layer of a neural network is essentially a linear transformation followed by a nonlinear activation.
## Learning Objectives

- **Mathematical Understanding**
  - Gradient descent principles
  - Loss functions (MSE, Cross-Entropy)
  - Matrix differentiation
- **Implementation Skills**
  - Direct implementation of forward/backward passes
  - Weight initialization
  - Writing training loops
- **Practice**
  - MNIST binary classification
  - Overfitting/regularization experiments
## Mathematical Background
### 1. Linear Regression

Model: ŷ = Xw + b

Loss (MSE): L = (1/2n) Σ(y − ŷ)²

Gradients:

∂L/∂w = (1/n) Xᵀ(ŷ − y)
∂L/∂b = (1/n) Σ(ŷ − y)

Update:

w ← w − η · ∂L/∂w
b ← b − η · ∂L/∂b
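The update rule above can be sketched directly in NumPy. The synthetic data and hyperparameters (`lr`, epoch count) below are illustrative choices for this sketch, not values taken from the repo's scripts:

```python
import numpy as np

# Synthetic data: y ≈ 3x + 2 plus a little noise (illustrative, not from the repo)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1))
y = 3.0 * X[:, 0] + 2.0 + 0.1 * rng.normal(size=100)

n, d = X.shape
w = np.zeros(d)
b = 0.0
lr = 0.1  # learning rate η

for epoch in range(500):
    y_pred = X @ w + b               # ŷ = Xw + b
    grad_w = X.T @ (y_pred - y) / n  # ∂L/∂w = (1/n) Xᵀ(ŷ − y)
    grad_b = np.sum(y_pred - y) / n  # ∂L/∂b = (1/n) Σ(ŷ − y)
    w -= lr * grad_w                 # w ← w − η·∂L/∂w
    b -= lr * grad_b                 # b ← b − η·∂L/∂b

print(w, b)  # should approach the true parameters 3.0 and 2.0
```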
### 2. Logistic Regression

Model: z = Xw + b, ŷ = σ(z) = 1 / (1 + e^(−z))

Loss (BCE): L = −(1/n) Σ[y·log(ŷ) + (1 − y)·log(1 − ŷ)]

Gradients:

∂L/∂w = (1/n) Xᵀ(ŷ − y)   ← surprisingly, the same form as linear regression!
∂L/∂b = (1/n) Σ(ŷ − y)
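Because the gradient has the same form, the training loop is nearly identical; only the forward pass gains the sigmoid. A minimal sketch on a toy two-blob problem (the data, `lr`, and epoch count are this sketch's own choices, not the MNIST task from the exercises):

```python
import numpy as np

# Two well-separated 1-D Gaussian blobs as a toy binary problem
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(-2, 1, size=(100, 1)),
                    rng.normal(2, 1, size=(100, 1))])
y = np.concatenate([np.zeros(100), np.ones(100)])

n, d = X.shape
w = np.zeros(d)
b = 0.0
lr = 0.5

for epoch in range(300):
    z = X @ w + b
    y_hat = 1 / (1 + np.exp(-z))    # ŷ = σ(z)
    grad_w = X.T @ (y_hat - y) / n  # same form as linear regression
    grad_b = np.sum(y_hat - y) / n
    w -= lr * grad_w
    b -= lr * grad_b

acc = np.mean((y_hat > 0.5) == y)
print(acc)  # accuracy on the training blobs
```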
## File Structure
```
01_Linear_Logistic/
├── README.md                 # This file
├── theory.md                 # Detailed theory (mathematical derivations)
├── numpy/
│   ├── linear_numpy.py       # Linear Regression (NumPy)
│   ├── logistic_numpy.py     # Logistic Regression (NumPy)
│   └── test_numpy.py         # Unit tests
├── pytorch_lowlevel/
│   ├── linear_lowlevel.py    # Using PyTorch basic ops
│   └── logistic_lowlevel.py
├── paper/
│   └── linear_paper.py       # Clean nn.Module implementation
└── exercises/
    ├── 01_regularization.md  # Add L1/L2 regularization
    └── 02_softmax.md         # Extend to Softmax
```
## Quick Start
### Running the NumPy Implementation

```bash
cd numpy/
python linear_numpy.py    # Train linear regression
python logistic_numpy.py  # Train logistic regression
python test_numpy.py      # Run tests
```
### Running the PyTorch Implementation

```bash
cd pytorch_lowlevel/
python linear_lowlevel.py
```
## Core Concepts
### 1. Gradient Descent

```python
# Basic algorithm (pseudocode: model, compute_loss, compute_gradients are placeholders)
for epoch in range(n_epochs):
    # Forward
    y_pred = model.forward(X)

    # Loss
    loss = compute_loss(y, y_pred)

    # Backward (compute gradients)
    gradients = compute_gradients(y, y_pred)

    # Update
    model.weights -= learning_rate * gradients
```
### 2. Matrix Differentiation (Important!)

∂(Xw)/∂w = Xᵀ
∂(wᵀXᵀ)/∂w = X
∂(‖Xw − y‖²)/∂w = 2Xᵀ(Xw − y)
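The last identity can be sanity-checked numerically with central finite differences; this is a quick verification sketch, not part of the repo's test suite:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
y = rng.normal(size=5)
w = rng.normal(size=3)

def f(w):
    r = X @ w - y
    return r @ r  # ‖Xw − y‖²

# Analytic gradient from the identity
analytic = 2 * X.T @ (X @ w - y)

# Central finite differences, one coordinate at a time
eps = 1e-6
numeric = np.zeros_like(w)
for i in range(len(w)):
    e = np.zeros_like(w)
    e[i] = eps
    numeric[i] = (f(w + e) - f(w - e)) / (2 * eps)

print(np.max(np.abs(analytic - numeric)))  # difference should be tiny (finite-difference error only)
```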
### 3. Sigmoid and Its Derivative

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_derivative(z):
    s = sigmoid(z)
    return s * (1 - s)  # σ(z)·(1 − σ(z))
```
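One caveat: for large negative `z`, `np.exp(-z)` overflows (with a runtime warning), even though σ(z) itself is well-defined. A common numerically stable variant, sketched here as an assumption rather than the repo's implementation:

```python
import numpy as np

def sigmoid_stable(z):
    """Piecewise-stable sigmoid: np.exp never receives a large positive argument."""
    z = np.asarray(z, dtype=float)
    out = np.empty_like(z)
    pos = z >= 0
    out[pos] = 1 / (1 + np.exp(-z[pos]))   # safe: exp of a non-positive number
    ez = np.exp(z[~pos])                   # safe: z is negative here
    out[~pos] = ez / (1 + ez)
    return out

print(sigmoid_stable(np.array([-1000.0, 0.0, 1000.0])))  # extreme inputs map to 0, 0.5, 1 without overflow
```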
## Practice Problems
### Basic

- Implement linear regression without a bias term
- Observe convergence speed with different learning rates
- Compare batch vs. stochastic gradient descent
### Intermediate
- Add L2 regularization (Ridge)
- Add L1 regularization (Lasso)
- Implement Mini-batch GD
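As a starting point for the L2 exercise, one way to fold Ridge into the gradients from the Mathematical Background section; the function name, λ handling, and loss scaling are this sketch's own choices:

```python
import numpy as np

def ridge_gradients(X, y, w, b, lam):
    """Gradients for L = (1/2n)[Σ(y − ŷ)² + λ‖w‖²]; the bias b is conventionally not regularized."""
    n = X.shape[0]
    y_pred = X @ w + b
    grad_w = X.T @ (y_pred - y) / n + lam * w / n  # extra (λ/n)·w term from the L2 penalty
    grad_b = np.sum(y_pred - y) / n
    return grad_w, grad_b
```

With λ = 0 this reduces exactly to the plain linear-regression gradients, which makes it easy to test the modification in isolation.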
### Advanced
- Implement Momentum and Adam optimizers
- Implement Early Stopping
- Extend to Softmax Regression (multi-class)
## References
- CS229 (Stanford) lecture notes
- *Deep Learning* (Goodfellow, Bengio, Courville), Chapters 5 and 6
- Machine Learning (Coursera), Andrew Ng