Machine Learning Learning Guide
Machine Learning Learning Guide¶
Overview¶
Machine learning is a collection of algorithms that learn patterns from data to make predictions or decisions. This learning material systematically covers from basic concepts of machine learning to key algorithms and practical applications.
Learning Roadmap¶
ML Overview β Linear Regression β Logistic Regression β Model Evaluation β Cross-Validation/Hyperparameters
β
Practical Projects β Pipelines β Dimensionality Reduction β Clustering β k-NN/Naive Bayes
β
Decision Trees β Ensemble(Bagging) β Ensemble(Boosting) β SVM βββββββββββββββββ
File List¶
| File | Topic | Key Content |
|---|---|---|
| 01_ML_Overview.md | ML Overview | Supervised/Unsupervised/Reinforcement Learning, ML Workflow, Bias-Variance Tradeoff |
| 02_Linear_Regression.md | Linear Regression | Simple/Multiple Regression, Gradient Descent, Regularization (Ridge/Lasso) |
| 03_Logistic_Regression.md | Logistic Regression | Binary Classification, Sigmoid Function, Multiclass (Softmax) |
| 04_Model_Evaluation.md | Model Evaluation | Accuracy, Precision, Recall, F1-score, ROC-AUC |
| 05_Cross_Validation_Hyperparameters.md | Cross-Validation & Hyperparameters | K-Fold CV, GridSearchCV, RandomizedSearchCV |
| 06_Decision_Trees.md | Decision Trees | CART, Entropy, Gini Impurity, Pruning |
| 07_Ensemble_Bagging.md | Ensemble - Bagging | Random Forest, Feature Importance, OOB Error |
| 08_Ensemble_Boosting.md | Ensemble - Boosting | AdaBoost, Gradient Boosting, XGBoost, LightGBM |
| 09_SVM.md | SVM | Support Vectors, Margin, Kernel Trick |
| 10_kNN_and_Naive_Bayes.md | k-NN & Naive Bayes | Distance-based Classification, Probability-based Classification |
| 11_Clustering.md | Clustering | K-Means, DBSCAN, Hierarchical Clustering |
| 12_Dimensionality_Reduction.md | Dimensionality Reduction | PCA, t-SNE, Feature Selection |
| 13_Pipelines_and_Practice.md | Pipelines & Practice | sklearn Pipeline, ColumnTransformer, Model Saving |
| 14_Practical_Projects.md | Practical Projects | Kaggle Problem Solving, Classification/Regression Practice |
Environment Setup¶
Install Required Libraries¶
# Using pip
pip install numpy pandas matplotlib seaborn scikit-learn
# Additional libraries (boosting)
pip install xgboost lightgbm catboost
# Jupyter Notebook (recommended)
pip install jupyter
jupyter notebook
Version Check¶
import sklearn
import xgboost
import lightgbm
print(f"scikit-learn: {sklearn.__version__}")
print(f"XGBoost: {xgboost.__version__}")
print(f"LightGBM: {lightgbm.__version__}")
Recommended Versions¶
- Python: 3.9+
- scikit-learn: 1.2+
- XGBoost: 1.7+
- LightGBM: 3.3+
Recommended Learning Order¶
Stage 1: Basic Theory (01-04)¶
- Understand machine learning concepts
- Basics of regression and classification
- Model evaluation methods
Stage 2: Model Tuning (05)¶
- Cross-validation
- Hyperparameter optimization
Stage 3: Tree-based Models (06-08)¶
- Decision trees
- Ensemble techniques
Stage 4: Other Algorithms (09-10)¶
- SVM
- k-NN, Naive Bayes
Stage 5: Unsupervised Learning (11-12)¶
- Clustering
- Dimensionality reduction
Stage 6: Practice & Projects (13-14)¶
- Building pipelines
- Real-world problem solving
Algorithm Selection Guide¶
Identify Problem Type
β
βββ Has Labels (Supervised Learning)
β βββ Continuous Target β Regression
β β βββ Linear Relationship β Linear Regression
β β βββ Non-linear β Trees, Ensemble
β β βββ Interpretability Important β Linear Regression, Decision Trees
β β
β βββ Categorical Target β Classification
β βββ Binary Classification β Logistic, SVM, Trees
β βββ Multiclass β Logistic (softmax), Trees
β βββ Need Probabilities β Logistic, Naive Bayes
β
βββ No Labels (Unsupervised Learning)
βββ Grouping β Clustering
β βββ Spherical Clusters β K-Means
β βββ Arbitrary Shapes β DBSCAN
β
βββ Dimensionality Reduction β PCA, t-SNE
References¶
Official Documentation¶
Recommended Datasets¶
Recommended Books¶
- "Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow" - AurΓ©lien GΓ©ron
- "An Introduction to Statistical Learning" - James et al.