Machine Learning Learning Guide

Overview

Machine learning is a collection of algorithms that learn patterns from data in order to make predictions or decisions. This guide covers machine learning systematically, from basic concepts through the key algorithms to practical applications.
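
To make "learn patterns from data to make predictions" concrete, here is a minimal sketch of the usual scikit-learn workflow: split labeled data, fit a model on the training part, then predict on samples the model has not seen. The Iris dataset and `LogisticRegression` are illustrative choices, not something the lessons prescribe.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Labeled data: feature matrix X (150 samples x 4 features) and class labels y
X, y = load_iris(return_X_y=True)

# Hold out 25% of the samples to check whether the learned patterns generalize
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# "Learning" means estimating the model's parameters from the training data
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# "Prediction" means applying the learned parameters to unseen samples
y_pred = model.predict(X_test)
accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.3f}")
```

The same `fit` / `predict` / `score` pattern applies to almost every estimator covered in the lessons.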


Learning Roadmap

ML Overview → Linear Regression → Logistic Regression → Model Evaluation → Cross-Validation/Hyperparameters
                                                                           │
  ┌────────────────────────────────────────────────────────────────────────┘
  ↓
Decision Trees → Ensemble (Bagging) → Ensemble (Boosting) → SVM
                                                            │
  ┌─────────────────────────────────────────────────────────┘
  ↓
k-NN/Naive Bayes → Clustering → Dimensionality Reduction → Pipelines → Practical Projects

File List

| File | Topic | Key Content |
|------|-------|-------------|
| 01_ML_Overview.md | ML Overview | Supervised/Unsupervised/Reinforcement Learning, ML Workflow, Bias-Variance Tradeoff |
| 02_Linear_Regression.md | Linear Regression | Simple/Multiple Regression, Gradient Descent, Regularization (Ridge/Lasso) |
| 03_Logistic_Regression.md | Logistic Regression | Binary Classification, Sigmoid Function, Multiclass (Softmax) |
| 04_Model_Evaluation.md | Model Evaluation | Accuracy, Precision, Recall, F1-score, ROC-AUC |
| 05_Cross_Validation_Hyperparameters.md | Cross-Validation & Hyperparameters | K-Fold CV, GridSearchCV, RandomizedSearchCV |
| 06_Decision_Trees.md | Decision Trees | CART, Entropy, Gini Impurity, Pruning |
| 07_Ensemble_Bagging.md | Ensemble - Bagging | Random Forest, Feature Importance, OOB Error |
| 08_Ensemble_Boosting.md | Ensemble - Boosting | AdaBoost, Gradient Boosting, XGBoost, LightGBM |
| 09_SVM.md | SVM | Support Vectors, Margin, Kernel Trick |
| 10_kNN_and_Naive_Bayes.md | k-NN & Naive Bayes | Distance-based Classification, Probability-based Classification |
| 11_Clustering.md | Clustering | K-Means, DBSCAN, Hierarchical Clustering |
| 12_Dimensionality_Reduction.md | Dimensionality Reduction | PCA, t-SNE, Feature Selection |
| 13_Pipelines_and_Practice.md | Pipelines & Practice | sklearn Pipeline, ColumnTransformer, Model Saving |
| 14_Practical_Projects.md | Practical Projects | Kaggle Problem Solving, Classification/Regression Practice |

Environment Setup

Install Required Libraries

# Using pip
pip install numpy pandas matplotlib seaborn scikit-learn

# Additional libraries (boosting)
pip install xgboost lightgbm catboost

# Jupyter Notebook (recommended)
pip install jupyter
jupyter notebook

Version Check

import sklearn
import xgboost
import lightgbm

print(f"scikit-learn: {sklearn.__version__}")
print(f"XGBoost: {xgboost.__version__}")
print(f"LightGBM: {lightgbm.__version__}")

Recommended Versions

  • Python: 3.9+
  • scikit-learn: 1.2+
  • XGBoost: 1.7+
  • LightGBM: 3.3+
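
To check the minimum versions listed above in one step, one option is the sketch below. It reads installed package versions with the standard library's `importlib.metadata` and compares only the leading numeric parts (major/minor/patch). The helper names `parse_version` and `meets_minimum` are hypothetical; for full PEP 440 handling (pre-releases, local versions) the `packaging` library is the usual choice.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Minimum versions from the list above, keyed by PyPI distribution name
MINIMUMS = {"scikit-learn": "1.2", "xgboost": "1.7", "lightgbm": "3.3"}


def parse_version(text):
    """Hypothetical helper: '1.3.2' -> (1, 3, 2); '2.0.0rc1' -> (2, 0, 0)."""
    parts = []
    for piece in text.split("."):
        # Keep only the leading digits of each dot-separated piece
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Hypothetical helper: numeric comparison, missing trailing parts count as 0."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)) >= b + (0,) * (width - len(b))


if __name__ == "__main__":
    py_ok = sys.version_info >= (3, 9)
    print(f"Python {sys.version.split()[0]}: {'OK' if py_ok else 'needs 3.9+'}")
    for package, minimum in MINIMUMS.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            print(f"{package}: not installed (needs {minimum}+)")
            continue
        status = "OK" if meets_minimum(installed, minimum) else f"needs {minimum}+"
        print(f"{package} {installed}: {status}")
```

Unlike importing each library, this approach also reports packages that are missing instead of stopping at the first `ImportError`.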

Recommended Learning Order

Stage 1: Basic Theory (01-04)

  • Understand machine learning concepts
  • Basics of regression and classification
  • Model evaluation methods
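
As a small illustration of the model evaluation item, the sketch below computes accuracy, precision, recall, and F1 for a binary classifier directly from confusion-matrix counts, using the standard definitions. The helper name `binary_metrics` and the labels are made up for illustration; in practice `sklearn.metrics` provides `precision_score`, `recall_score`, and `f1_score`.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Hypothetical helper: accuracy, precision, recall and F1 from label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    # Precision: of the samples predicted positive, how many are truly positive
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: of the truly positive samples, how many the model found
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1: harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels (illustrative only): TP=3, FP=2, FN=1, TN=2
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Here precision (0.6) and recall (0.75) differ, which is exactly the trade-off that accuracy alone (0.625) hides.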

Stage 2: Model Tuning (05)

  • Cross-validation
  • Hyperparameter optimization
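
Both Stage 2 topics meet in `GridSearchCV`, which runs k-fold cross-validation for every combination in a parameter grid and keeps the best one. The sketch below assumes an SVM inside a scaling pipeline on `load_breast_cancer`; the dataset, model, and grid values are illustrative assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# The scaler is refit on each training fold, so validation folds never influence it
pipe = make_pipeline(StandardScaler(), SVC())

# make_pipeline names steps after their lowercased class, so "svc__C" targets SVC's C
param_grid = {"svc__C": [0.1, 1, 10], "svc__gamma": ["scale", 0.01]}

# 5-fold CV for each of the 3 x 2 = 6 combinations
search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

print("Best params:", search.best_params_)
print(f"Mean CV accuracy: {search.best_score_:.3f}")
# The held-out test set is used only once, after tuning is finished
print(f"Test accuracy: {search.score(X_test, y_test):.3f}")
```

Swapping `GridSearchCV` for `RandomizedSearchCV` (with `param_distributions` and `n_iter`) samples the grid instead of trying every combination.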

Stage 3: Tree-based Models (06-08)

  • Decision trees
  • Ensemble techniques
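
Decision trees such as CART choose each split to reduce impurity. A minimal sketch of the Gini impurity, 1 - sum(p_k^2) over the class proportions p_k, and of the size-weighted impurity of a candidate split is below. The function names are hypothetical; scikit-learn's `DecisionTreeClassifier` uses `criterion="gini"` by default.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions (0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_impurity(left, right):
    """Size-weighted Gini impurity of the two child nodes produced by a split."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


parent = ["A", "A", "A", "B", "B", "B"]
print(gini(parent))                                      # 0.5
print(split_impurity(["A", "A", "A"], ["B", "B", "B"]))  # 0.0 (perfect split)
print(split_impurity(["A", "A", "B"], ["A", "B", "B"]))  # about 0.444
```

A tree-growing algorithm evaluates many candidate splits this way and keeps the one with the lowest weighted impurity (the largest impurity decrease from the parent).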

Stage 4: Other Algorithms (09-10)

  • SVM
  • k-NN, Naive Bayes
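
k-NN is simple enough to sketch from scratch: it stores the training data and labels a new point by majority vote among its k nearest neighbors, here using Euclidean distance. The `knn_predict` helper and the toy points are hypothetical; scikit-learn's `KNeighborsClassifier` is the practical choice.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Hypothetical helper: majority vote among the k nearest training points."""
    # Euclidean distance from the query to every training point
    distances = [(math.dist(x, query), label) for x, label in zip(train_X, train_y)]
    distances.sort(key=lambda pair: pair[0])
    # On a tie, Counter keeps the label encountered first, i.e. the nearer one
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


train_X = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 7.5), (6.5, 5.5)]
train_y = ["red", "red", "red", "blue", "blue", "blue"]

print(knn_predict(train_X, train_y, (1.8, 1.5)))  # red
print(knn_predict(train_X, train_y, (6.2, 6.8)))  # blue
```

Because k-NN relies on raw distances, features on very different scales should be standardized first.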

Stage 5: Unsupervised Learning (11-12)

  • Clustering
  • Dimensionality reduction
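
PCA can be sketched with NumPy alone: center the data, take its SVD, and project onto the leading principal directions. Each component's share of the variance is proportional to its squared singular value. The `pca` helper and the random data are illustrative assumptions; `sklearn.decomposition.PCA` does the same with more options.

```python
import numpy as np


def pca(X, n_components):
    """Hypothetical helper: project X onto its top principal components via SVD."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing singular value
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Fraction of total variance captured by each component
    explained_ratio = S**2 / np.sum(S**2)
    return X_centered @ components.T, explained_ratio[:n_components]


rng = np.random.default_rng(0)
# 200 samples in 3-D where most of the variance lies along the first axis
X = rng.normal(size=(200, 3)) * np.array([5.0, 1.0, 0.2])

X_2d, ratio = pca(X, n_components=2)
print(X_2d.shape)      # (200, 2)
print(ratio.round(3))  # the first component carries most of the variance
```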

Stage 6: Practice & Projects (13-14)

  • Building pipelines
  • Real-world problem solving
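
For Stage 6, a minimal sketch of a pipeline that preprocesses numeric and categorical columns differently with `ColumnTransformer`, fits a model, and saves and reloads the whole thing with `joblib`. The toy DataFrame, its column names, and the file name `model.joblib` are hypothetical.

```python
import joblib
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data: two numeric columns, one categorical column, binary target
df = pd.DataFrame({
    "age": [25, 32, 47, 51, 62, 23, 44, 36],
    "income": [30, 45, 80, 62, 90, 28, 70, 50],
    "city": ["A", "B", "A", "C", "B", "C", "A", "B"],
    "bought": [0, 0, 1, 1, 1, 0, 1, 0],
})
X, y = df.drop(columns="bought"), df["bought"]

# Scale the numeric columns, one-hot encode the categorical one
preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["age", "income"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
])

model = Pipeline([("preprocess", preprocess), ("clf", LogisticRegression())])
model.fit(X, y)

# Save the fitted pipeline (preprocessing + model) as one object, then load it back
joblib.dump(model, "model.joblib")
restored = joblib.load("model.joblib")

# The unseen city "D" is encoded as all zeros because of handle_unknown="ignore"
new_customer = pd.DataFrame({"age": [40], "income": [65], "city": ["D"]})
print(restored.predict(new_customer))
```

Saving the pipeline rather than the bare model keeps the exact preprocessing fitted on the training data together with the classifier.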

Algorithm Selection Guide

Identify Problem Type
    │
    ├── Has Labels (Supervised Learning)
    │       ├── Continuous Target → Regression
    │       │       ├── Linear Relationship → Linear Regression
    │       │       ├── Non-linear → Trees, Ensemble
    │       │       └── Interpretability Important → Linear Regression, Decision Trees
    │       │
    │       └── Categorical Target → Classification
    │               ├── Binary Classification → Logistic, SVM, Trees
    │               ├── Multiclass → Logistic (softmax), Trees
    │               └── Need Probabilities → Logistic, Naive Bayes
    │
    └── No Labels (Unsupervised Learning)
            ├── Grouping → Clustering
            │       ├── Spherical Clusters → K-Means
            │       └── Arbitrary Shapes → DBSCAN
            │
            └── Dimensionality Reduction → PCA, t-SNE

References

Recommended Books

  • "Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow" - Aurélien Géron
  • "An Introduction to Statistical Learning" - James et al.