# Statistics Examples

This directory contains 10 standalone Python scripts demonstrating key statistical concepts and methods.

## Files Overview

### 01_probability_review.py (~250 lines)
- Normal, Binomial, Poisson, and Exponential distributions
- PDF/CDF plotting and calculations
- Central Limit Theorem demonstration
- Law of Large Numbers
- Random sampling with numpy/scipy.stats

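As a rough illustration of the Central Limit Theorem item (a sketch, not an excerpt from `01_probability_review.py`; the seed, sample sizes, and variable names are assumptions made for this example), means of skewed exponential samples should cluster around the population mean with spread close to `1 / sqrt(n)`:

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is reproducible

n, n_reps = 50, 10_000          # sample size and number of repeated samples
# Exponential(scale=1) is strongly skewed, with mean 1 and standard deviation 1.
samples = rng.exponential(scale=1.0, size=(n_reps, n))
sample_means = samples.mean(axis=1)

# CLT prediction: the sample means are roughly Normal(1, 1 / sqrt(n)).
predicted_sd = 1.0 / np.sqrt(n)
print(f"mean of sample means: {sample_means.mean():.3f} (theory: 1.000)")
print(f"sd of sample means:   {sample_means.std(ddof=1):.3f} (theory: {predicted_sd:.3f})")
```
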
### 02_sampling_estimation.py (~260 lines)
- Simple random sampling
- Stratified sampling (with comparison to simple random)
- Bootstrap estimation and resampling
- Confidence intervals for mean
- Bias and variance of estimators
- Maximum Likelihood Estimation (MLE) for normal distribution

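A minimal sketch of bootstrap resampling for a mean, assuming a synthetic normal sample and a percentile interval (the data, seed, and names such as `boot_means` are illustrative and not taken from `02_sampling_estimation.py`):

```python
import numpy as np

rng = np.random.default_rng(42)
data = rng.normal(loc=10.0, scale=2.0, size=200)  # synthetic sample

n_boot = 5_000
# Draw indices with replacement; each row indexes one bootstrap resample.
idx = rng.integers(0, data.size, size=(n_boot, data.size))
boot_means = data[idx].mean(axis=1)

# Percentile method: take the 2.5% and 97.5% quantiles of the bootstrap means.
ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])
print(f"sample mean: {data.mean():.3f}")
print(f"95% bootstrap CI: ({ci_low:.3f}, {ci_high:.3f})")
```
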
### 03_hypothesis_testing.py (~280 lines)
- One-sample, two-sample, and paired t-tests
- Chi-square tests (goodness of fit and independence)
- One-way ANOVA
- p-value computation
- Statistical power analysis
- Multiple testing correction (Bonferroni and FDR)

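The two multiple testing corrections can be sketched as follows. This assumes the FDR method is Benjamini-Hochberg, which the list does not state explicitly; the function names and p-values are hypothetical:

```python
import numpy as np

def bonferroni(p_values, alpha=0.05):
    """Reject H0_i when p_i <= alpha / m (controls the family-wise error rate)."""
    p = np.asarray(p_values, dtype=float)
    return p <= alpha / p.size

def benjamini_hochberg(p_values, alpha=0.05):
    """Reject the k smallest p-values, where k is the largest rank
    with p_(k) <= k * alpha / m (controls the false discovery rate)."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    below = p[order] <= alpha * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])  # zero-based index of the largest passing rank
        reject[order[: k + 1]] = True
    return reject

p_vals = [0.001, 0.008, 0.012, 0.041, 0.20, 0.74]  # made-up p-values
print("Bonferroni rejects:", bonferroni(p_vals).sum())
print("BH rejects:        ", benjamini_hochberg(p_vals).sum())
```

On these made-up p-values Bonferroni rejects two hypotheses and Benjamini-Hochberg rejects three, which reflects the usual trade-off: FDR control is less conservative than family-wise error control.
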
### 04_regression_analysis.py (~260 lines)
- OLS regression from scratch (simple and multiple)
- Polynomial regression with model selection
- Residual analysis (normality, heteroscedasticity, autocorrelation)
- R-squared and adjusted R-squared
- Confidence and prediction intervals
- AIC/BIC model comparison

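A short sketch of simple OLS with R-squared and adjusted R-squared, assuming synthetic data with a known intercept and slope. It solves the least-squares problem with `np.linalg.lstsq`, which may differ from how `04_regression_analysis.py` does it:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x = rng.uniform(0, 10, size=n)
y = 2.0 + 0.5 * x + rng.normal(scale=1.0, size=n)  # true intercept 2.0, slope 0.5

X = np.column_stack([np.ones(n), x])               # design matrix with intercept column
beta, *_ = np.linalg.lstsq(X, y, rcond=None)       # least-squares coefficients

residuals = y - X @ beta
ss_res = residuals @ residuals                     # residual sum of squares
ss_tot = ((y - y.mean()) ** 2).sum()               # total sum of squares
r2 = 1.0 - ss_res / ss_tot

p = X.shape[1] - 1                                 # predictors, excluding the intercept
adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
print(f"intercept={beta[0]:.3f}, slope={beta[1]:.3f}, R2={r2:.3f}, adj R2={adj_r2:.3f}")
```
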
### 05_bayesian_basics.py (~240 lines)
- Bayes' theorem with discrete examples
- Beta-Binomial conjugate prior
- Normal-Normal conjugate prior
- Prior influence on posterior
- Credible vs confidence intervals
- Sequential Bayesian updating

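For the Beta-Binomial case, the conjugate update only adds counts to the prior parameters. A minimal sketch, assuming a Beta(2, 2) prior and 7 successes in 10 trials (both chosen here for illustration):

```python
from scipy import stats

a_prior, b_prior = 2.0, 2.0   # Beta(2, 2) prior on the success probability
successes, trials = 7, 10     # hypothetical observed data

# Conjugate update: add successes to a, failures to b.
a_post = a_prior + successes
b_post = b_prior + (trials - successes)
posterior = stats.beta(a_post, b_post)

post_mean = posterior.mean()                     # equals a_post / (a_post + b_post)
ci_low, ci_high = posterior.ppf([0.025, 0.975])  # equal-tailed 95% credible interval
print(f"posterior: Beta({a_post:.0f}, {b_post:.0f}), mean={post_mean:.3f}")
print(f"95% credible interval: ({ci_low:.3f}, {ci_high:.3f})")
```

Feeding the posterior back in as the next prior gives sequential updating: processing the same observations in batches ends at the same Beta parameters.
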
### 06_bayesian_inference.py (~280 lines)
- Metropolis-Hastings MCMC algorithm
- Gibbs sampling for bivariate normal
- Bayesian linear regression (closed-form posterior)
- Convergence diagnostics (trace plots, Gelman-Rubin R̂)
- Posterior sampling and credible intervals

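A minimal random-walk Metropolis sampler, which is the special case of Metropolis-Hastings with a symmetric proposal. The standard normal target, step size, and burn-in length are assumptions for this sketch and are not taken from `06_bayesian_inference.py`:

```python
import numpy as np

def log_target(theta):
    """Unnormalized log-density of a standard normal (illustrative target)."""
    return -0.5 * theta ** 2

def metropolis_hastings(log_target, n_samples, step=1.0, init=0.0, seed=0):
    rng = np.random.default_rng(seed)
    chain = np.empty(n_samples)
    current, current_lp = init, log_target(init)
    accepted = 0
    for i in range(n_samples):
        proposal = current + step * rng.normal()  # symmetric random-walk proposal
        proposal_lp = log_target(proposal)
        # Accept with probability min(1, target(proposal) / target(current)).
        if np.log(rng.uniform()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
            accepted += 1
        chain[i] = current  # a rejected proposal repeats the current state
    return chain, accepted / n_samples

chain, acc_rate = metropolis_hastings(log_target, n_samples=20_000, step=2.4)
draws = chain[2_000:]  # discard burn-in
print(f"acceptance rate: {acc_rate:.2f}")
print(f"posterior mean ~ {draws.mean():.3f}, sd ~ {draws.std():.3f}")
```
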
### 07_glm.py (~300 lines)
- Logistic regression from scratch (MLE optimization)
- Poisson regression
- Link functions (logit, probit, log, identity)
- Deviance analysis
- AIC/BIC model comparison for GLMs
- Likelihood ratio tests

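Logistic regression by maximum likelihood can be sketched by minimizing the negative Bernoulli log-likelihood directly. The use of `scipy.optimize.minimize` with BFGS is an assumption here, since the optimizer used in `07_glm.py` is not stated, and the data are synthetic:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
n = 2_000
x = rng.normal(size=n)
true_beta = np.array([-0.5, 1.5])                # synthetic intercept and slope
X = np.column_stack([np.ones(n), x])
p_true = 1.0 / (1.0 + np.exp(-(X @ true_beta)))  # inverse logit link
y = rng.binomial(1, p_true)

def neg_log_likelihood(beta):
    eta = X @ beta
    # -sum(y * eta - log(1 + exp(eta))); logaddexp keeps the log term stable.
    return np.sum(np.logaddexp(0.0, eta) - y * eta)

fit = minimize(neg_log_likelihood, x0=np.zeros(2), method="BFGS")
beta_hat = fit.x
print(f"estimated intercept={beta_hat[0]:.3f}, slope={beta_hat[1]:.3f}")
```
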
### 08_time_series.py (~270 lines)
- Moving average smoothing
- Exponential smoothing (different alpha values)
- Autocorrelation function (ACF)
- Stationarity testing concepts
- AR, MA, and ARMA models
- Simple forecasting methods (naive, mean, AR)

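Simple exponential smoothing follows the recursion `s[t] = alpha * x[t] + (1 - alpha) * s[t-1]`. A small sketch, assuming the series is initialized with its first observation (one common convention; the function name and data are hypothetical):

```python
import numpy as np

def exponential_smoothing(series, alpha):
    """Simple exponential smoothing with s[0] = x[0]."""
    x = np.asarray(series, dtype=float)
    s = np.empty_like(x)
    s[0] = x[0]
    for t in range(1, x.size):
        s[t] = alpha * x[t] + (1.0 - alpha) * s[t - 1]
    return s

series = [10.0, 12.0, 11.0, 15.0]
smoothed = exponential_smoothing(series, alpha=0.5)
print(smoothed)  # → [10. 11. 11. 13.]
```

A larger `alpha` tracks recent observations more closely, while a smaller `alpha` gives a smoother curve; `alpha=1.0` returns the series unchanged.
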
### 09_multivariate.py (~300 lines)
- PCA from scratch (eigendecomposition)
- PCA for dimensionality reduction
- Multivariate normal distribution
- Mahalanobis distance and outlier detection
- Canonical correlation analysis (CCA concept)

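PCA via eigendecomposition of the sample covariance matrix can be sketched as below. The two-dimensional synthetic data and variable names are assumptions made for this example:

```python
import numpy as np

rng = np.random.default_rng(7)
cov = np.array([[3.0, 1.5], [1.5, 1.0]])    # correlated 2-D data (synthetic)
X = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=500)

Xc = X - X.mean(axis=0)                     # center each column
S = np.cov(Xc, rowvar=False)                # sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(S)        # eigh returns ascending eigenvalues

order = np.argsort(eigvals)[::-1]           # reorder to descending variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

explained_ratio = eigvals / eigvals.sum()   # share of variance per component
scores = Xc @ eigvecs                       # data projected onto the components
print("explained variance ratio:", np.round(explained_ratio, 3))
```

Keeping only the first column of `scores` reduces the data to one dimension while retaining most of the variance in this example.
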
### 10_nonparametric.py (~330 lines)
- Mann-Whitney U test (non-normal two-sample)
- Wilcoxon signed-rank test (paired non-normal)
- Kruskal-Wallis test (nonparametric ANOVA)
- Kolmogorov-Smirnov test (goodness of fit, two-sample)
- Kernel density estimation (KDE with different bandwidths)
- Bootstrap confidence intervals (percentile method)

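As an example of a rank-based test on non-normal data, the sketch below runs `scipy.stats.mannwhitneyu` on two skewed samples. The exponential samples, their sizes, and the seed are assumptions chosen for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
group_a = rng.exponential(scale=1.0, size=300)  # skewed, non-normal
group_b = rng.exponential(scale=2.0, size=300)  # same shape, larger scale

# Two-sided Mann-Whitney U test; it compares ranks and does not assume normality.
u_stat, p_value = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")
print(f"U = {u_stat:.1f}, p = {p_value:.2e}")
```
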
## Usage

Each script is self-contained and can be run independently:

```bash
python 01_probability_review.py
python 02_sampling_estimation.py
# ... and so on
```

All scripts:
- Print detailed results to stdout
- Generate synthetic data (no external data files needed)
- Include optional matplotlib visualizations (saved to /tmp/)
- Use numpy and scipy.stats for computations
- Are ~240-330 lines each
- Include docstrings and section headers

## Dependencies

- Python 3.7+
- numpy
- scipy
- matplotlib (optional, for plots)

Install dependencies:
```bash
pip install numpy scipy matplotlib
```

## Topics Covered

1. **Probability**: Distributions, CLT, LLN
2. **Sampling**: Random, stratified, bootstrap
3. **Hypothesis Testing**: t-tests, ANOVA, chi-square, power, multiple testing
4. **Regression**: OLS, polynomial, residuals, intervals
5. **Bayesian Basics**: Conjugate priors, credible intervals
6. **Bayesian Inference**: MCMC, Gibbs, Bayesian regression
7. **GLM**: Logistic, Poisson, link functions, deviance
8. **Time Series**: MA, ES, ACF, AR/MA/ARMA, forecasting
9. **Multivariate**: PCA, Mahalanobis, CCA
10. **Nonparametric**: Rank tests, KS, KDE, bootstrap

## License

MIT License (code examples)