# Statistics Examples

This directory contains 10 standalone Python scripts demonstrating key statistical concepts and methods.

## Files Overview

### 01_probability_review.py (~250 lines)
- Normal, Binomial, Poisson, and Exponential distributions
- PDF/CDF plotting and calculations
- Central Limit Theorem demonstration
- Law of Large Numbers
- Random sampling with numpy/scipy.stats

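As a rough illustration of the Central Limit Theorem item above (a minimal sketch, not taken from the script itself), the following draws repeated samples from an Exponential(1) distribution and checks that the sample means concentrate around 1 with standard deviation close to 1/sqrt(n). The function name `sample_means` and the seed are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility


def sample_means(n, n_reps=10_000):
    """Means of n_reps samples of size n drawn from Exponential(scale=1)."""
    return rng.exponential(scale=1.0, size=(n_reps, n)).mean(axis=1)


for n in (1, 5, 30, 200):
    means = sample_means(n)
    # The CLT predicts mean close to 1 and std close to 1/sqrt(n).
    print(
        f"n={n:4d}  mean={means.mean():.3f}  "
        f"std={means.std(ddof=1):.3f}  predicted std={1 / np.sqrt(n):.3f}"
    )
```

The exponential distribution is used here because it is strongly skewed, so the shrinking spread of the means is easy to see.
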
### 02_sampling_estimation.py (~260 lines)
- Simple random sampling
- Stratified sampling (with comparison to simple random)
- Bootstrap estimation and resampling
- Confidence intervals for the mean
- Bias and variance of estimators
- Maximum Likelihood Estimation (MLE) for the normal distribution

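The bias of an estimator can be seen directly by simulation. The sketch below (an illustration with assumed parameter values, not the script's own code) compares the normal-distribution MLE of the variance, which divides by n, with the unbiased estimator, which divides by n - 1.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed values for the illustration.
true_mu, true_sigma, n = 5.0, 2.0, 10
n_reps = 20_000

# Each row is one sample of size n from N(true_mu, true_sigma^2).
samples = rng.normal(true_mu, true_sigma, size=(n_reps, n))

mle_var = samples.var(axis=1, ddof=0)       # MLE: divides by n
unbiased_var = samples.var(axis=1, ddof=1)  # divides by n - 1

print(f"true variance:           {true_sigma**2:.3f}")
print(f"mean of MLE estimates:   {mle_var.mean():.3f}")
print(f"mean of unbiased ests.:  {unbiased_var.mean():.3f}")
print(f"theoretical MLE mean:    {(n - 1) / n * true_sigma**2:.3f}")
```

With a small n the downward bias of the MLE, a factor of (n - 1)/n, is clearly visible in the averages.
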
### 03_hypothesis_testing.py (~280 lines)
- One-sample, two-sample, and paired t-tests
- Chi-square tests (goodness of fit and independence)
- One-way ANOVA
- p-value computation
- Statistical power analysis
- Multiple testing correction (Bonferroni and FDR)

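The two multiple testing corrections listed above can be sketched in a few lines. The version below is an assumption about one reasonable implementation: the FDR procedure shown is Benjamini-Hochberg, and the function names and p-values are hypothetical.

```python
import numpy as np


def bonferroni(pvals, alpha=0.05):
    """Reject H0 where p <= alpha / m (controls the family-wise error rate)."""
    pvals = np.asarray(pvals, dtype=float)
    return pvals <= alpha / pvals.size


def benjamini_hochberg(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (controls the false discovery rate)."""
    pvals = np.asarray(pvals, dtype=float)
    m = pvals.size
    order = np.argsort(pvals)
    thresholds = alpha * np.arange(1, m + 1) / m
    below = pvals[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()  # largest rank that meets its threshold
        reject[order[: k + 1]] = True
    return reject


# Hypothetical p-values from eight tests.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
print("Bonferroni rejections:", int(bonferroni(pvals).sum()))
print("BH rejections:        ", int(benjamini_hochberg(pvals).sum()))
```

On these values Bonferroni is the more conservative of the two, which is the usual trade-off between controlling the family-wise error rate and the false discovery rate.
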
### 04_regression_analysis.py (~260 lines)
- OLS regression from scratch (simple and multiple)
- Polynomial regression with model selection
- Residual analysis (normality, heteroscedasticity, autocorrelation)
- R-squared and adjusted R-squared
- Confidence and prediction intervals
- AIC/BIC model comparison

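A minimal sketch of OLS "from scratch" with R-squared and adjusted R-squared follows. It uses `numpy.linalg.lstsq` on synthetic data; the true coefficients, noise level, and seed are assumptions for the example, and the script may solve the normal equations differently.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic data: y = 1.5 + 0.8 * x + noise (assumed values).
n = 200
x = rng.uniform(0, 10, size=n)
y = 1.5 + 0.8 * x + rng.normal(0, 1.0, size=n)

# Design matrix with an intercept column.
X = np.column_stack([np.ones(n), x])

# Least-squares solution of X @ beta ~= y.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

resid = y - X @ beta
ss_res = resid @ resid
ss_tot = ((y - y.mean()) ** 2).sum()

p = X.shape[1] - 1  # number of predictors, excluding the intercept
r2 = 1 - ss_res / ss_tot
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(f"intercept={beta[0]:.3f}  slope={beta[1]:.3f}")
print(f"R^2={r2:.4f}  adjusted R^2={adj_r2:.4f}")
```

`lstsq` is used instead of explicitly inverting `X.T @ X` because it is numerically more stable when predictors are nearly collinear.
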
### 05_bayesian_basics.py (~240 lines)
- Bayes' theorem with discrete examples
- Beta-Binomial conjugate prior
- Normal-Normal conjugate prior
- Prior influence on posterior
- Credible vs confidence intervals
- Sequential Bayesian updating

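Beta-Binomial conjugacy and sequential updating fit in one short example. In the sketch below the Beta(2, 2) prior and the ten observations are assumptions chosen for illustration; each observation simply increments one of the two Beta parameters.

```python
from scipy import stats

# Hypothetical data: 1 = success, 0 = failure (7 successes in 10 trials).
observations = [1, 1, 0, 1, 1, 1, 0, 1, 0, 1]

# Beta(2, 2) prior on the success probability (an assumed choice).
a, b = 2.0, 2.0

# Sequential updating: a success adds 1 to a, a failure adds 1 to b.
for obs in observations:
    a += obs
    b += 1 - obs

posterior = stats.beta(a, b)
post_mean = posterior.mean()
lower, upper = posterior.interval(0.95)  # equal-tailed 95% credible interval

print(f"posterior: Beta({a:.0f}, {b:.0f})")
print(f"posterior mean: {post_mean:.3f}")
print(f"95% credible interval: ({lower:.3f}, {upper:.3f})")
```

Because the prior is conjugate, processing the observations one at a time gives the same posterior as a single batch update with 7 successes and 3 failures.
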
### 06_bayesian_inference.py (~280 lines)
- Metropolis-Hastings MCMC algorithm
- Gibbs sampling for bivariate normal
- Bayesian linear regression (closed-form posterior)
- Convergence diagnostics (trace plots, Gelman-Rubin R̂)
- Posterior sampling and credible intervals

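For orientation, a random-walk Metropolis-Hastings sampler can be written as below. This is a sketch under stated assumptions: a one-dimensional target N(3, 2²), a Gaussian proposal, and a hand-picked step size and burn-in length. The function name and signature are hypothetical, not the script's.

```python
import numpy as np


def metropolis_hastings(log_target, x0, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal."""
    rng = np.random.default_rng(seed)
    samples = np.empty(n_samples)
    x, logp = x0, log_target(x0)
    for i in range(n_samples):
        proposal = x + step * rng.normal()
        logp_prop = log_target(proposal)
        # Symmetric proposal: accept with probability min(1, p(prop) / p(x)).
        if np.log(rng.uniform()) < logp_prop - logp:
            x, logp = proposal, logp_prop
        samples[i] = x
    return samples


def log_target(x):
    """Unnormalized log-density of N(3, 2^2)."""
    return -0.5 * (x - 3.0) ** 2 / 4.0


chain = metropolis_hastings(log_target, x0=0.0, n_samples=20_000, step=2.5)
burned = chain[2_000:]  # discard an assumed burn-in period

print(f"posterior mean ~ {burned.mean():.3f} (target 3.0)")
print(f"posterior std  ~ {burned.std():.3f} (target 2.0)")
lo, hi = np.percentile(burned, [2.5, 97.5])
print(f"95% credible interval ~ ({lo:.3f}, {hi:.3f})")
```

Working with log-densities avoids numerical underflow, and only an unnormalized target is needed because the normalizing constant cancels in the acceptance ratio.
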
### 07_glm.py (~300 lines)
- Logistic regression from scratch (MLE optimization)
- Poisson regression
- Link functions (logit, probit, log, identity)
- Deviance analysis
- AIC/BIC model comparison for GLMs
- Likelihood ratio tests

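Logistic regression by direct likelihood maximization might look like the sketch below. It minimizes the negative log-likelihood with `scipy.optimize.minimize` (BFGS) on synthetic data; the optimizer choice, true coefficients, and seed are assumptions, and the script may use a different method such as Newton-Raphson or IRLS.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit  # logistic (inverse logit) function

rng = np.random.default_rng(3)

# Synthetic data with assumed true coefficients.
n = 2000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
true_beta = np.array([-0.5, 1.2])
y = rng.binomial(1, expit(X @ true_beta))


def neg_log_lik(beta):
    """Negative Bernoulli log-likelihood under the logit link."""
    eta = X @ beta
    # log(1 + exp(eta)) computed stably via logaddexp.
    return np.sum(np.logaddexp(0.0, eta) - y * eta)


def grad(beta):
    """Gradient of the negative log-likelihood."""
    return X.T @ (expit(X @ beta) - y)


result = minimize(neg_log_lik, x0=np.zeros(2), jac=grad, method="BFGS")
beta_hat = result.x
deviance = 2.0 * result.fun  # for 0/1 responses the saturated log-likelihood is 0

print("estimated coefficients:", np.round(beta_hat, 3))
print(f"residual deviance: {deviance:.1f}")
```

For ungrouped binary data the residual deviance is simply twice the minimized negative log-likelihood, which is why it falls out of the optimizer result directly.
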
### 08_time_series.py (~270 lines)
- Moving average smoothing
- Exponential smoothing (different alpha values)
- Autocorrelation function (ACF)
- Stationarity testing concepts
- AR, MA, and ARMA models
- Simple forecasting methods (naive, mean, AR)

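Two of the building blocks above, simple exponential smoothing and the sample ACF, are sketched here on a simulated AR(1) series. The helper names, the AR coefficient 0.7, and the series length are assumptions for the example.

```python
import numpy as np


def exponential_smoothing(y, alpha):
    """Simple exponential smoothing: s[t] = alpha*y[t] + (1 - alpha)*s[t-1]."""
    y = np.asarray(y, dtype=float)
    s = np.empty_like(y)
    s[0] = y[0]
    for t in range(1, len(y)):
        s[t] = alpha * y[t] + (1 - alpha) * s[t - 1]
    return s


def acf(y, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    y = np.asarray(y, dtype=float) - np.mean(y)
    denom = y @ y
    return np.array(
        [1.0 if k == 0 else (y[k:] @ y[:-k]) / denom for k in range(max_lag + 1)]
    )


# Simulate an AR(1) process y[t] = phi * y[t-1] + eps[t].
rng = np.random.default_rng(4)
phi, n = 0.7, 5000
eps = rng.normal(size=n)
series = np.zeros(n)
for t in range(1, n):
    series[t] = phi * series[t - 1] + eps[t]

rho = acf(series, max_lag=5)
print("sample ACF:     ", np.round(rho, 3))
print("theoretical ACF:", np.round(phi ** np.arange(6), 3))

for alpha in (0.1, 0.5, 0.9):
    smoothed = exponential_smoothing(series, alpha)
    print(f"alpha={alpha}: std of smoothed series = {smoothed.std():.3f}")
```

For an AR(1) process the theoretical ACF decays geometrically as phi to the power of the lag, which gives a convenient check on the sample estimate. Smaller alpha values smooth more aggressively and so reduce the spread of the smoothed series.
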
### 09_multivariate.py (~300 lines)
- PCA from scratch (eigendecomposition)
- PCA for dimensionality reduction
- Multivariate normal distribution
- Mahalanobis distance and outlier detection
- Canonical correlation analysis (CCA concept)

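PCA via eigendecomposition of the sample covariance matrix can be sketched as follows. The function name `pca_eig`, the covariance matrix used to generate data, and the seed are assumptions for this illustration.

```python
import numpy as np


def pca_eig(X, n_components):
    """PCA via eigendecomposition of the sample covariance matrix."""
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]       # re-sort to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    components = eigvecs[:, :n_components]  # columns are principal axes
    scores = Xc @ components                # data projected onto the axes
    explained = eigvals[:n_components] / eigvals.sum()
    return scores, components, explained


rng = np.random.default_rng(5)
# Assumed covariance: two correlated features plus one low-variance feature.
cov_true = [[3.0, 1.5, 0.0], [1.5, 1.0, 0.0], [0.0, 0.0, 0.2]]
X = rng.multivariate_normal(np.zeros(3), cov_true, size=500)

scores, components, explained = pca_eig(X, n_components=2)
print("explained variance ratio:", np.round(explained, 3))
print("reduced data shape:", scores.shape)
```

`eigh` is used rather than `eig` because a covariance matrix is symmetric, so the eigenvalues are guaranteed real and the eigenvectors orthonormal.
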
### 10_nonparametric.py (~330 lines)
- Mann-Whitney U test (non-normal two-sample)
- Wilcoxon signed-rank test (paired non-normal)
- Kruskal-Wallis test (nonparametric ANOVA)
- Kolmogorov-Smirnov test (goodness of fit, two-sample)
- Kernel density estimation (KDE with different bandwidths)
- Bootstrap confidence intervals (percentile method)

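As a small illustration of two items above, the sketch below runs a Mann-Whitney U test with `scipy.stats.mannwhitneyu` on two skewed synthetic samples and computes a percentile bootstrap confidence interval for a median. The helper `bootstrap_percentile_ci`, the sample sizes, and the distributions are assumptions for the example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)

# Two skewed (non-normal) samples with different scales.
a = rng.exponential(1.0, size=80)
b = rng.exponential(2.0, size=80)

u_stat, p_value = stats.mannwhitneyu(a, b, alternative="two-sided")
print(f"Mann-Whitney U = {u_stat:.1f}, p-value = {p_value:.4f}")


def bootstrap_percentile_ci(data, stat=np.median, n_boot=5000, level=0.95, rng=rng):
    """Percentile bootstrap confidence interval for a statistic of one sample."""
    data = np.asarray(data)
    # Each row of idx is one resample (with replacement) of the data indices.
    idx = rng.integers(0, data.size, size=(n_boot, data.size))
    boot_stats = stat(data[idx], axis=1)
    tail = 100 * (1 - level) / 2
    lo, hi = np.percentile(boot_stats, [tail, 100 - tail])
    return lo, hi


lo, hi = bootstrap_percentile_ci(b)
print(f"median of b = {np.median(b):.3f}, 95% bootstrap CI = ({lo:.3f}, {hi:.3f})")
```

The percentile method takes the interval endpoints straight from the empirical quantiles of the resampled statistic, so it needs no distributional assumption about the data.
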
## Usage

Each script is self-contained and can be run independently:

```bash
python 01_probability_review.py
python 02_sampling_estimation.py
# ... and so on
```

All scripts:
- Print detailed results to stdout
- Generate synthetic data (no external data files needed)
- Include optional matplotlib visualizations (saved to /tmp/)
- Use numpy and scipy.stats for computations
- Are ~240-330 lines each
- Include docstrings and section headers

## Dependencies

- Python 3.7+
- numpy
- scipy
- matplotlib (optional, for plots)

Install dependencies:
```bash
pip install numpy scipy matplotlib
```

## Topics Covered

1. **Probability**: Distributions, CLT, LLN
2. **Sampling**: Random, stratified, bootstrap
3. **Hypothesis Testing**: t-tests, ANOVA, chi-square, power, multiple testing
4. **Regression**: OLS, polynomial, residuals, intervals
5. **Bayesian Basics**: Conjugate priors, credible intervals
6. **Bayesian Inference**: MCMC, Gibbs, Bayesian regression
7. **GLM**: Logistic, Poisson, link functions, deviance
8. **Time Series**: Moving-average and exponential smoothing, ACF, AR/MA/ARMA, forecasting
9. **Multivariate**: PCA, Mahalanobis, CCA
10. **Nonparametric**: Rank tests, KS, KDE, bootstrap

## License

MIT License (code examples)