08. Probability for Machine Learning

Learning Objectives

  • Understand and apply basic probability axioms, conditional probability, and Bayes' theorem
  • Learn the concept of random variables and the differences between discrete and continuous distributions
  • Calculate and interpret key statistics of random variables such as expectation, variance, and covariance
  • Learn the characteristics and applications of probability distributions commonly used in machine learning
  • Implement probabilistic inference and Bayesian updates using Bayes' theorem
  • Understand the difference between generative and discriminative models from a probabilistic perspective

1. Foundations of Probability

1.1 Axioms of Probability

Sample Space $\Omega$: set of all possible outcomes

Event $A$: subset of the sample space

Probability Measure $P$ satisfies the following axioms:

  1. Non-negativity: $P(A) \geq 0$ for all $A$
  2. Normalization: $P(\Omega) = 1$
  3. Countable Additivity: For mutually exclusive events $A_1, A_2, \ldots$ $$P\left(\bigcup_{i=1}^\infty A_i\right) = \sum_{i=1}^\infty P(A_i)$$
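
As a small sanity check, the axioms can be verified on a finite sample space. The sketch below assumes a fair six-sided die (an illustrative choice, not part of the lesson) and uses exact fractions; the helper `prob` is a hypothetical name.

```python
from fractions import Fraction

# Sample space of a fair six-sided die (assumed example)
omega = {1, 2, 3, 4, 5, 6}

def prob(event):
    """P(event) under the uniform measure on omega."""
    return Fraction(len(event & omega), len(omega))

evens = {2, 4, 6}
low_odds = {1, 3}   # disjoint from evens

non_negative = all(prob({w}) >= 0 for w in omega)                  # axiom 1
normalized = prob(omega) == 1                                      # axiom 2
additive = prob(evens | low_odds) == prob(evens) + prob(low_odds)  # axiom 3, finite case

print(non_negative, normalized, additive)
```

Only the finite case of additivity is checked here; countable additivity is a statement about infinite sequences of events and cannot be tested by enumeration.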

1.2 Conditional Probability

Probability of event $A$ occurring given that event $B$ has occurred:

$$ P(A|B) = \frac{P(A \cap B)}{P(B)}, \quad \text{if } P(B) > 0 $$

Intuition: When we know $B$ has occurred, the sample space shrinks from $\Omega$ to $B$.
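
The shrinking-sample-space intuition can be illustrated by enumeration. This is a minimal sketch assuming two fair dice; the events `A` and `B` and the helper `prob` are hypothetical choices made for illustration.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of two fair dice (assumed example)
omega = set(product(range(1, 7), repeat=2))

def prob(event):
    return Fraction(len(event), len(omega))

A = {w for w in omega if w[0] + w[1] == 8}   # the sum is 8
B = {w for w in omega if w[0] % 2 == 0}      # the first die is even

p_a_given_b = prob(A & B) / prob(B)          # definition: P(A ∩ B) / P(B)
p_a_within_b = Fraction(len(A & B), len(B))  # counting inside the reduced space B

print(p_a_given_b, p_a_within_b)
```

Both computations agree because dividing by $P(B)$ is the same as renormalizing counts within $B$.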

1.3 Independence

Events $A$ and $B$ are independent if:

$$ P(A \cap B) = P(A) \cdot P(B) $$

or equivalently, when $P(B) > 0$: $$ P(A|B) = P(A) $$
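
The product rule can be checked directly on a small sample space. The sketch below assumes two fair dice; the event names and the helper `independent` are hypothetical.

```python
from fractions import Fraction
from itertools import product

omega = set(product(range(1, 7), repeat=2))   # two fair dice (assumed example)

def prob(event):
    return Fraction(len(event), len(omega))

def independent(e1, e2):
    """True when P(e1 ∩ e2) equals P(e1) * P(e2) exactly."""
    return prob(e1 & e2) == prob(e1) * prob(e2)

first_even = {w for w in omega if w[0] % 2 == 0}
sum_is_7 = {w for w in omega if sum(w) == 7}
sum_is_8 = {w for w in omega if sum(w) == 8}

print(independent(first_even, sum_is_7))   # product rule holds
print(independent(first_even, sum_is_8))   # product rule fails
```

Exact fractions avoid the floating-point comparisons that would otherwise make an equality test unreliable.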

1.4 Law of Total Probability

If $B_1, \ldots, B_n$ form a partition of the sample space:

$$ P(A) = \sum_{i=1}^n P(A|B_i)P(B_i) $$

1.5 Bayes' Theorem

$$ P(A|B) = \frac{P(B|A)P(A)}{P(B)} $$

or, expanding $P(B)$ with the law of total probability over a partition $A_1, \ldots, A_n$ of the sample space:

$$ P(A_j|B) = \frac{P(B|A_j)P(A_j)}{\sum_{i=1}^n P(B|A_i)P(A_i)} $$

Terminology:

  • $P(A)$: prior probability
  • $P(B|A)$: likelihood
  • $P(A|B)$: posterior probability
  • $P(B)$: marginal probability or evidence

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Bayes' theorem example: medical diagnosis
# Disease prevalence: 1%
P_disease = 0.01
P_no_disease = 1 - P_disease

# Test accuracy
# Sensitivity: probability of a positive test given disease
P_positive_given_disease = 0.95
# Specificity: probability of a negative test given no disease
P_negative_given_no_disease = 0.95
P_positive_given_no_disease = 1 - P_negative_given_no_disease

# Law of total probability: overall probability of a positive test
P_positive = (P_positive_given_disease * P_disease +
              P_positive_given_no_disease * P_no_disease)

# Bayes' theorem: probability of disease given a positive test
P_disease_given_positive = (P_positive_given_disease * P_disease) / P_positive

print("Medical diagnosis example (Bayes' theorem)")
print(f"Disease prevalence (prior): {P_disease:.1%}")
print(f"Test sensitivity: {P_positive_given_disease:.1%}")
print(f"Test specificity: {P_negative_given_no_disease:.1%}")
print(f"\nProbability of a positive test (total probability): {P_positive:.4f}")
print(f"Probability of disease given a positive test (posterior): {P_disease_given_positive:.1%}")
print(f"\nInterpretation: even after a positive test, the probability of disease is only {P_disease_given_positive:.1%}")
print("       (the low prevalence produces many false positives)")

# Visualization: Bayes' theorem
fig, ax = plt.subplots(figsize=(12, 6))

categories = ['Prior\n(disease)', 'Likelihood\n(positive | disease)', 'Posterior\n(disease | positive)']
probabilities = [P_disease, P_positive_given_disease, P_disease_given_positive]
colors = ['skyblue', 'lightgreen', 'salmon']

bars = ax.bar(categories, probabilities, color=colors, edgecolor='black', linewidth=2)

# Annotate each bar with its value
for bar, prob in zip(bars, probabilities):
    height = bar.get_height()
    ax.text(bar.get_x() + bar.get_width()/2., height,
            f'{prob:.1%}', ha='center', va='bottom', fontsize=14, fontweight='bold')

ax.set_ylabel('Probability', fontsize=13)
ax.set_title("Bayes' theorem: medical diagnosis example", fontsize=15)
ax.set_ylim(0, 1.0)
ax.grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.savefig('bayes_theorem_medical.png', dpi=150)
plt.show()

2. Random Variables

2.1 Definition of Random Variables

Random Variable: a function from the sample space to real numbers $$X: \Omega \to \mathbb{R}$$

Discrete random variable: takes countable values (e.g., dice, coins)

Continuous random variable: takes continuous values (e.g., height, temperature)

2.2 Probability Mass Function (PMF)

For a discrete random variable $X$:

$$ p_X(x) = P(X = x) $$

Properties:

  • $p_X(x) \geq 0$ for all $x$
  • $\sum_{x} p_X(x) = 1$

2.3 Probability Density Function (PDF)

For a continuous random variable $X$:

$$ P(a \leq X \leq b) = \int_a^b f_X(x) dx $$

Properties:

  • $f_X(x) \geq 0$ for all $x$
  • $\int_{-\infty}^{\infty} f_X(x) dx = 1$
  • $P(X = x) = 0$ (probability at a single point is 0)

2.4 Cumulative Distribution Function (CDF)

$$ F_X(x) = P(X \leq x) $$

Properties:

  • Non-decreasing function
  • $\lim_{x \to -\infty} F_X(x) = 0$, $\lim_{x \to \infty} F_X(x) = 1$
  • For continuous random variables: $f_X(x) = \frac{d}{dx}F_X(x)$
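
The relationship between the PDF and the CDF can be checked numerically. The following is a minimal sketch assuming a standard normal variable and arbitrary illustrative values for `a`, `b`, `x0`, and `h`; it uses `scipy.integrate.quad` for the integral.

```python
from scipy import stats
from scipy.integrate import quad

a, b = -1.0, 2.0
X = stats.norm(loc=0.0, scale=1.0)   # standard normal (assumed example)

# Interval probability two ways: integrate the PDF, or difference the CDF
area, _ = quad(X.pdf, a, b)
cdf_diff = X.cdf(b) - X.cdf(a)

# A central finite difference of the CDF approximates the PDF
x0, h = 0.5, 1e-5
pdf_from_cdf = (X.cdf(x0 + h) - X.cdf(x0 - h)) / (2 * h)

print(f"integral of pdf: {area:.6f}, F(b) - F(a): {cdf_diff:.6f}")
print(f"dF/dx at {x0}: {pdf_from_cdf:.6f}, f({x0}): {X.pdf(x0):.6f}")
```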

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

fig, axes = plt.subplots(2, 3, figsize=(18, 10))

# 1. Discrete: binomial distribution
n, p = 10, 0.5
x_binom = np.arange(0, n+1)
pmf_binom = stats.binom.pmf(x_binom, n, p)
cdf_binom = stats.binom.cdf(x_binom, n, p)

axes[0, 0].bar(x_binom, pmf_binom, color='skyblue', edgecolor='black')
axes[0, 0].set_title('Binomial PMF\n$n=10, p=0.5$', fontsize=12)
axes[0, 0].set_xlabel('x')
axes[0, 0].set_ylabel('P(X=x)')
axes[0, 0].grid(True, alpha=0.3)

axes[1, 0].step(x_binom, cdf_binom, where='post', linewidth=2, color='blue')
axes[1, 0].set_title('Binomial CDF', fontsize=12)
axes[1, 0].set_xlabel('x')
axes[1, 0].set_ylabel('P(X≤x)')
axes[1, 0].grid(True, alpha=0.3)

# 2. Continuous: normal distribution
mu, sigma = 0, 1
x_norm = np.linspace(-4, 4, 1000)
pdf_norm = stats.norm.pdf(x_norm, mu, sigma)
cdf_norm = stats.norm.cdf(x_norm, mu, sigma)

axes[0, 1].plot(x_norm, pdf_norm, linewidth=2, color='red')
axes[0, 1].fill_between(x_norm, pdf_norm, alpha=0.3, color='red')
axes[0, 1].set_title('Normal PDF\n' r'$\mu=0, \sigma=1$', fontsize=12)  # raw string avoids invalid escapes
axes[0, 1].set_xlabel('x')
axes[0, 1].set_ylabel('f(x)')
axes[0, 1].grid(True, alpha=0.3)

axes[1, 1].plot(x_norm, cdf_norm, linewidth=2, color='darkred')
axes[1, 1].set_title('Normal CDF', fontsize=12)
axes[1, 1].set_xlabel('x')
axes[1, 1].set_ylabel('F(x)')
axes[1, 1].grid(True, alpha=0.3)

# 3. Continuous: exponential distribution
lam = 1.0
x_exp = np.linspace(0, 5, 1000)
pdf_exp = stats.expon.pdf(x_exp, scale=1/lam)
cdf_exp = stats.expon.cdf(x_exp, scale=1/lam)

axes[0, 2].plot(x_exp, pdf_exp, linewidth=2, color='green')
axes[0, 2].fill_between(x_exp, pdf_exp, alpha=0.3, color='green')
axes[0, 2].set_title('Exponential PDF\n' r'$\lambda=1$', fontsize=12)  # raw string avoids invalid escapes
axes[0, 2].set_xlabel('x')
axes[0, 2].set_ylabel('f(x)')
axes[0, 2].grid(True, alpha=0.3)

axes[1, 2].plot(x_exp, cdf_exp, linewidth=2, color='darkgreen')
axes[1, 2].set_title('Exponential CDF', fontsize=12)
axes[1, 2].set_xlabel('x')
axes[1, 2].set_ylabel('F(x)')
axes[1, 2].grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('pmf_pdf_cdf.png', dpi=150)
plt.show()

print("PMF vs PDF:")
print("  PMF (discrete): probability of a specific value, P(X=x)")
print("  PDF (continuous): probability density; interval probabilities come from integration")
print("  CDF: cumulative probability P(X≤x), defined for both discrete and continuous variables")

2.5 Joint, Marginal, and Conditional Distributions

Joint Distribution: $$P(X = x, Y = y)$$ or $$f_{X,Y}(x, y)$$

Marginal Distribution: $$p_X(x) = \sum_y p_{X,Y}(x, y)$$ or $$f_X(x) = \int f_{X,Y}(x, y) dy$$

Conditional Distribution: $$p_{X|Y}(x|y) = \frac{p_{X,Y}(x, y)}{p_Y(y)}$$
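
For discrete variables these three definitions reduce to sums and divisions over a table. The sketch below uses a hypothetical joint PMF (the numbers are made up for illustration) stored as a NumPy array.

```python
import numpy as np

# Hypothetical joint PMF p_{X,Y}(x, y): rows index x in {0, 1}, columns index y in {0, 1, 2}
joint = np.array([[0.10, 0.20, 0.10],
                  [0.30, 0.10, 0.20]])

p_x = joint.sum(axis=1)   # marginal of X: sum over y
p_y = joint.sum(axis=0)   # marginal of Y: sum over x

# Conditional p_{X|Y}(x | y): divide each column by the marginal p_Y(y)
p_x_given_y = joint / p_y

print("p_X:", p_x)
print("p_Y:", p_y)
print("p_X|Y (each column sums to 1):\n", p_x_given_y)
```

Broadcasting divides column `y` of `joint` by `p_y[y]`, so every column of `p_x_given_y` is a valid distribution over $x$.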

# Joint distribution example: bivariate normal
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
from scipy.stats import multivariate_normal

# Parameters
mu = np.array([0, 0])
cov = np.array([[1, 0.7],
                [0.7, 1]])

# Grid
x = np.linspace(-3, 3, 100)
y = np.linspace(-3, 3, 100)
X, Y = np.meshgrid(x, y)
pos = np.dstack((X, Y))

# Joint PDF
rv = multivariate_normal(mu, cov)
Z = rv.pdf(pos)

fig, axes = plt.subplots(1, 2, figsize=(16, 6))

# Filled contour plot of the joint density
ax = axes[0]
contour = ax.contourf(X, Y, Z, levels=15, cmap='viridis')
plt.colorbar(contour, ax=ax)
ax.set_xlabel('X', fontsize=12)
ax.set_ylabel('Y', fontsize=12)
ax.set_title('Joint distribution $f_{X,Y}(x,y)$ (bivariate normal)', fontsize=14)
ax.grid(True, alpha=0.3)

# Marginal distributions
ax = axes[1]
marginal_X = stats.norm.pdf(x, mu[0], np.sqrt(cov[0, 0]))
marginal_Y = stats.norm.pdf(y, mu[1], np.sqrt(cov[1, 1]))
ax.plot(x, marginal_X, linewidth=3, label='Marginal $f_X(x)$', color='blue')
ax.plot(y, marginal_Y, linewidth=3, label='Marginal $f_Y(y)$', color='red')
ax.set_xlabel('Value', fontsize=12)
ax.set_ylabel('Density', fontsize=12)
ax.set_title('Marginal distributions', fontsize=14)
ax.legend(fontsize=11)
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('joint_marginal_distributions.png', dpi=150)
plt.show()

print(f"Covariance matrix:\n{cov}")
print(f"Correlation coefficient: {cov[0,1] / np.sqrt(cov[0,0] * cov[1,1]):.2f}")

3. Expectation and Variance

3.1 Expectation

Discrete: $$\mathbb{E}[X] = \sum_x x \cdot p_X(x)$$

Continuous: $$\mathbb{E}[X] = \int_{-\infty}^{\infty} x \cdot f_X(x) dx$$

Expectation of a function (LOTUS - Law of the Unconscious Statistician): $$\mathbb{E}[g(X)] = \sum_x g(x) \cdot p_X(x) \quad \text{or} \quad \int g(x) \cdot f_X(x) dx$$
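
LOTUS means $\mathbb{E}[g(X)]$ can be computed from the distribution of $X$ alone, without deriving the distribution of $g(X)$. A minimal sketch, assuming a fair die and $g(x) = x^2$ as illustrative choices:

```python
import numpy as np

values = np.arange(1, 7)      # faces of a fair die (assumed example)
pmf = np.full(6, 1 / 6)

mean = np.sum(values * pmf)               # E[X]
second_moment = np.sum(values**2 * pmf)   # E[g(X)] with g(x) = x^2, via LOTUS

print(f"E[X] = {mean:.4f}, E[X^2] = {second_moment:.4f}, (E[X])^2 = {mean**2:.4f}")
```

Note that $\mathbb{E}[X^2]$ and $(\mathbb{E}[X])^2$ differ: in general $\mathbb{E}[g(X)] \neq g(\mathbb{E}[X])$ for nonlinear $g$.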

3.2 Properties of Expectation

  1. Linearity: $$\mathbb{E}[aX + bY] = a\mathbb{E}[X] + b\mathbb{E}[Y]$$

  2. Product of independent variables: if $X$ and $Y$ are independent, then $\mathbb{E}[XY] = \mathbb{E}[X]\mathbb{E}[Y]$

3.3 Variance

$$ \text{Var}(X) = \mathbb{E}[(X - \mathbb{E}[X])^2] = \mathbb{E}[X^2] - (\mathbb{E}[X])^2 $$

Standard Deviation: $$\sigma_X = \sqrt{\text{Var}(X)}$$

Properties of variance:

  • $\text{Var}(aX + b) = a^2 \text{Var}(X)$
  • If $X$ and $Y$ are independent, then $\text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y)$
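
Both variance properties can be verified exactly on a small discrete example. The sketch below assumes fair dice and arbitrary constants `a` and `b`; the helper `variance` is a hypothetical name.

```python
import numpy as np

values = np.arange(1, 7)      # faces of a fair die (assumed example)
pmf = np.full(6, 1 / 6)

def variance(vals, probs):
    """Var(X) = E[X^2] - (E[X])^2, summing over outcomes."""
    mean = np.sum(vals * probs)
    return np.sum(vals**2 * probs) - mean**2

a, b = 3.0, 5.0
var_x = variance(values, pmf)
var_affine = variance(a * values + b, pmf)   # aX + b keeps the probabilities, rescales the values

# Sum of two independent dice: the joint probability is the product of the marginals
sum_vals = np.add.outer(values, values).ravel()
sum_pmf = np.outer(pmf, pmf).ravel()
var_sum = variance(sum_vals, sum_pmf)

print(var_x, var_affine, var_sum)
```

The shift `b` drops out of the variance, and the variance of the sum is twice that of a single die because the two dice are independent.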

3.4 Covariance

$$ \text{Cov}(X, Y) = \mathbb{E}[(X - \mathbb{E}[X])(Y - \mathbb{E}[Y])] = \mathbb{E}[XY] - \mathbb{E}[X]\mathbb{E}[Y] $$

Correlation Coefficient: $$ \rho_{X,Y} = \frac{\text{Cov}(X, Y)}{\sigma_X \sigma_Y} \in [-1, 1] $$
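
Zero covariance only rules out a linear relationship; it does not imply independence. A standard counterexample, sketched here with $X$ uniform on $\{-1, 0, 1\}$ and $Y = X^2$ (an illustrative choice):

```python
import numpy as np

# X is uniform on {-1, 0, 1}; Y = X^2 is a deterministic function of X
x_vals = np.array([-1.0, 0.0, 1.0])
p = np.full(3, 1 / 3)
y_vals = x_vals**2

e_x = np.sum(x_vals * p)
e_y = np.sum(y_vals * p)
cov_xy = np.sum(x_vals * y_vals * p) - e_x * e_y   # E[XY] - E[X]E[Y]

# Dependence check: P(X=0, Y=0) versus P(X=0) * P(Y=0)
p_joint_00 = 1 / 3                 # Y = 0 exactly when X = 0
p_prod_00 = (1 / 3) * (1 / 3)

print(f"Cov(X, Y) = {cov_xy:.3f}")
print(f"P(X=0, Y=0) = {p_joint_00:.3f}, P(X=0)P(Y=0) = {p_prod_00:.3f}")
```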

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats  # needed for stats.norm.pdf below

# Monte Carlo estimation of the mean and variance
np.random.seed(42)

# Sample from a normal distribution
mu, sigma = 2, 1.5
samples = np.random.normal(mu, sigma, 10000)

# Estimate the mean and variance
estimated_mean = np.mean(samples)
estimated_var = np.var(samples, ddof=0)
estimated_std = np.std(samples, ddof=0)

print("Monte Carlo estimation")
print(f"Theoretical mean: {mu}, estimated mean: {estimated_mean:.4f}")
print(f"Theoretical variance: {sigma**2}, estimated variance: {estimated_var:.4f}")
print(f"Theoretical standard deviation: {sigma}, estimated standard deviation: {estimated_std:.4f}")

# Visualization
fig, axes = plt.subplots(1, 2, figsize=(16, 6))

# Histogram + theoretical PDF
ax = axes[0]
ax.hist(samples, bins=50, density=True, alpha=0.7, color='skyblue',
        edgecolor='black', label='Sample histogram')
x = np.linspace(mu - 4*sigma, mu + 4*sigma, 1000)
pdf = stats.norm.pdf(x, mu, sigma)
ax.plot(x, pdf, linewidth=3, color='red', label='Theoretical PDF')
ax.axvline(estimated_mean, color='green', linestyle='--', linewidth=2,
           label=f'Estimated mean = {estimated_mean:.2f}')
ax.set_xlabel('x', fontsize=12)
ax.set_ylabel('Density', fontsize=12)
ax.set_title(f'Sampling from a normal distribution (μ={mu}, σ={sigma})', fontsize=14)
ax.legend(fontsize=11)
ax.grid(True, alpha=0.3)

# Convergence as the sample size grows
ax = axes[1]
sample_sizes = np.arange(10, 10001, 10)
running_means = [np.mean(samples[:n]) for n in sample_sizes]

ax.plot(sample_sizes, running_means, linewidth=2, color='blue',
        label='Running mean')
ax.axhline(mu, color='red', linestyle='--', linewidth=2, label=f'Theoretical mean = {mu}')
ax.set_xlabel('Sample size', fontsize=12)
ax.set_ylabel('Running mean', fontsize=12)
ax.set_title('Law of Large Numbers', fontsize=14)
ax.legend(fontsize=11)
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('expectation_variance_estimation.png', dpi=150)
plt.show()

# Covariance example
np.random.seed(42)
n = 1000

# Positive correlation
X1 = np.random.randn(n)
Y1 = 0.8 * X1 + 0.3 * np.random.randn(n)

# Negative correlation
X2 = np.random.randn(n)
Y2 = -0.8 * X2 + 0.3 * np.random.randn(n)

# Independent
X3 = np.random.randn(n)
Y3 = np.random.randn(n)

fig, axes = plt.subplots(1, 3, figsize=(18, 5))

datasets = [(X1, Y1, 'Positive correlation'), (X2, Y2, 'Negative correlation'), (X3, Y3, 'Independent (uncorrelated)')]
for idx, (X, Y, title) in enumerate(datasets):
    ax = axes[idx]
    ax.scatter(X, Y, alpha=0.5, s=20, edgecolors='k', linewidths=0.5)

    # Compute the sample covariance and correlation
    cov = np.cov(X, Y)[0, 1]
    corr = np.corrcoef(X, Y)[0, 1]

    ax.set_xlabel('X', fontsize=12)
    ax.set_ylabel('Y', fontsize=12)
    ax.set_title(f'{title}\nCov={cov:.3f}, ρ={corr:.3f}', fontsize=13)
    ax.grid(True, alpha=0.3)
    ax.set_aspect('equal')

plt.tight_layout()
plt.savefig('covariance_correlation.png', dpi=150)
plt.show()

print("\nCovariance and correlation coefficient:")
print("  Cov > 0: positive relationship (as X increases, Y tends to increase)")
print("  Cov < 0: negative relationship (as X increases, Y tends to decrease)")
print("  Cov = 0: no linear relationship (independence implies Cov=0; the converse does not hold)")
print("  ρ ∈ [-1, 1]: normalized covariance (unit-free)")

4. Common Probability Distributions

4.1 Discrete Distributions

Bernoulli Distribution: $$X \sim \text{Ber}(p), \quad P(X=1) = p, \; P(X=0) = 1-p$$

  • Mean: $p$, Variance: $p(1-p)$

Binomial Distribution: $$X \sim \text{Bin}(n, p), \quad P(X=k) = \binom{n}{k}p^k(1-p)^{n-k}$$

  • Number of successes in $n$ independent Bernoulli trials
  • Mean: $np$, Variance: $np(1-p)$

Poisson Distribution: $$X \sim \text{Pois}(\lambda), \quad P(X=k) = \frac{\lambda^k e^{-\lambda}}{k!}$$

  • Number of events in unit time/space
  • Mean: $\lambda$, Variance: $\lambda$

Categorical Distribution: $$X \sim \text{Cat}(p_1, \ldots, p_K), \quad P(X=k) = p_k, \; \sum p_k = 1$$

  • Basic distribution for multiclass classification
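
The PMF formulas above translate directly into code. The sketch below writes the binomial and Poisson PMFs by hand and compares them with `scipy.stats`; the helper names `binom_pmf` and `poisson_pmf` and the parameter values are illustrative assumptions.

```python
from math import comb, exp, factorial
from scipy import stats

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p), straight from the formula."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Pois(lam), straight from the formula."""
    return lam**k * exp(-lam) / factorial(k)

n, p, lam = 10, 0.3, 4.0
for k in (0, 3, 7):
    print(k, binom_pmf(k, n, p), stats.binom.pmf(k, n, p),
          poisson_pmf(k, lam), stats.poisson.pmf(k, lam))

# The mean computed from the PMF should match np
binom_mean = sum(k * binom_pmf(k, n, p) for k in range(n + 1))
print(binom_mean, n * p)
```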

4.2 Continuous Distributions

Uniform Distribution: $$X \sim \text{Unif}(a, b), \quad f(x) = \frac{1}{b-a} \text{ for } x \in [a, b]$$

  • Mean: $\frac{a+b}{2}$, Variance: $\frac{(b-a)^2}{12}$

Normal/Gaussian Distribution: $$X \sim \mathcal{N}(\mu, \sigma^2), \quad f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

  • Mean: $\mu$, Variance: $\sigma^2$
  • Arises naturally via the Central Limit Theorem

Exponential Distribution: $$X \sim \text{Exp}(\lambda), \quad f(x) = \lambda e^{-\lambda x} \text{ for } x \geq 0$$

  • Waiting time in a Poisson process
  • Mean: $1/\lambda$, Variance: $1/\lambda^2$

Beta Distribution: $$X \sim \text{Beta}(\alpha, \beta), \quad f(x) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha, \beta)} \text{ for } x \in [0, 1]$$

  • Distribution of probabilities (prior in Bayesian inference)
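
A practical pitfall is that `scipy.stats` parameterizes continuous distributions by `loc` and `scale` rather than by the symbols used above. The sketch below shows one mapping and checks the means and variances. The parameter values are arbitrary, and the Beta mean and variance formulas are the standard ones, which the lesson does not list.

```python
from scipy import stats

a, b = 2.0, 5.0
mu, sigma = 1.0, 2.0
lam = 0.5
alpha, beta = 2.0, 5.0

# name -> (frozen scipy distribution, mean formula, variance formula)
checks = {
    "Uniform": (stats.uniform(loc=a, scale=b - a), (a + b) / 2, (b - a) ** 2 / 12),
    "Normal": (stats.norm(loc=mu, scale=sigma), mu, sigma**2),         # scale is sigma, not sigma^2
    "Exponential": (stats.expon(scale=1 / lam), 1 / lam, 1 / lam**2),  # scale is 1/lambda
    "Beta": (stats.beta(alpha, beta), alpha / (alpha + beta),
             alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1))),
}

for name, (dist, mean_formula, var_formula) in checks.items():
    print(f"{name}: mean {dist.mean():.4f} vs {mean_formula:.4f}, "
          f"var {dist.var():.4f} vs {var_formula:.4f}")
```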

from scipy import stats
import numpy as np
import matplotlib.pyplot as plt

fig, axes = plt.subplots(3, 3, figsize=(18, 15))

# 1. Bernoulli
ax = axes[0, 0]
p = 0.7
x = [0, 1]
pmf = [1-p, p]
ax.bar(x, pmf, color='skyblue', edgecolor='black', width=0.4)
ax.set_title(f'Bernoulli (p={p})', fontsize=12)
ax.set_xticks([0, 1])
ax.set_ylabel('P(X=x)')

# 2. Binomial
ax = axes[0, 1]
n, p = 20, 0.5
x = np.arange(0, n+1)
pmf = stats.binom.pmf(x, n, p)
ax.bar(x, pmf, color='lightgreen', edgecolor='black')
ax.set_title(f'Binomial (n={n}, p={p})', fontsize=12)
ax.set_xlabel('x')

# 3. Poisson
ax = axes[0, 2]
lam = 5
x = np.arange(0, 20)
pmf = stats.poisson.pmf(x, lam)
ax.bar(x, pmf, color='salmon', edgecolor='black')
ax.set_title(f'Poisson (λ={lam})', fontsize=12)
ax.set_xlabel('x')

# 4. Uniform
ax = axes[1, 0]
a, b = 0, 1
x = np.linspace(-0.5, 1.5, 1000)
pdf = stats.uniform.pdf(x, a, b-a)
ax.plot(x, pdf, linewidth=3, color='blue')
ax.fill_between(x, pdf, alpha=0.3, color='blue')
ax.set_title(f'Uniform (a={a}, b={b})', fontsize=12)
ax.set_ylabel('f(x)')

# 5. Normal (several parameter settings)
ax = axes[1, 1]
x = np.linspace(-5, 5, 1000)
params = [(0, 1), (0, 0.5), (1, 1)]
for mu, sigma in params:
    pdf = stats.norm.pdf(x, mu, sigma)
    ax.plot(x, pdf, linewidth=2, label=f'μ={mu}, σ={sigma}')
ax.set_title('Normal distribution', fontsize=12)
ax.legend(fontsize=9)

# 6. Exponential
ax = axes[1, 2]
x = np.linspace(0, 5, 1000)
lambdas = [0.5, 1, 2]
for lam in lambdas:
    pdf = stats.expon.pdf(x, scale=1/lam)
    ax.plot(x, pdf, linewidth=2, label=f'λ={lam}')
ax.set_title('Exponential distribution', fontsize=12)
ax.legend(fontsize=9)

# 7. Gamma
ax = axes[2, 0]
x = np.linspace(0, 20, 1000)
params = [(1, 1), (2, 2), (5, 1)]
for k, theta in params:
    pdf = stats.gamma.pdf(x, k, scale=theta)
    ax.plot(x, pdf, linewidth=2, label=f'k={k}, θ={theta}')
ax.set_title('Gamma distribution', fontsize=12)
ax.set_ylabel('f(x)')
ax.legend(fontsize=9)

# 8. Beta
ax = axes[2, 1]
x = np.linspace(0, 1, 1000)
params = [(0.5, 0.5), (2, 2), (5, 2)]
for alpha, beta in params:
    pdf = stats.beta.pdf(x, alpha, beta)
    ax.plot(x, pdf, linewidth=2, label=f'α={alpha}, β={beta}')
ax.set_title('Beta distribution', fontsize=12)
ax.legend(fontsize=9)

# 9. Chi-squared
ax = axes[2, 2]
x = np.linspace(0, 15, 1000)
dfs = [2, 4, 6]
for df in dfs:
    pdf = stats.chi2.pdf(x, df)
    ax.plot(x, pdf, linewidth=2, label=f'df={df}')
ax.set_title('Chi-squared distribution', fontsize=12)
ax.legend(fontsize=9)

for ax in axes.flat:
    ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('common_distributions.png', dpi=150)
plt.show()

print("Uses in machine learning:")
print("  Bernoulli/binomial: binary classification")
print("  Categorical: multiclass classification")
print("  Normal: continuous data, error models, VAE latent space")
print("  Poisson: count data (recommender systems, web traffic)")
print("  Beta: prior distribution in Bayesian inference")
print("  Exponential/gamma: waiting times, survival analysis")

4.3 Multivariate Normal Distribution

$$ \mathbf{X} \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma}), \quad f(\mathbf{x}) = \frac{1}{\sqrt{(2\pi)^d |\boldsymbol{\Sigma}|}}\exp\left(-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^T\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})\right) $$

  • $\boldsymbol{\mu} \in \mathbb{R}^d$: mean vector
  • $\boldsymbol{\Sigma} \in \mathbb{R}^{d \times d}$: covariance matrix (symmetric positive definite)

from scipy.stats import multivariate_normal
import numpy as np
import matplotlib.pyplot as plt

# Visualize the multivariate normal distribution
fig, axes = plt.subplots(1, 3, figsize=(18, 5))

mu = np.array([0, 0])
covs = [
    np.array([[1, 0], [0, 1]]),      # independent
    np.array([[1, 0.8], [0.8, 1]]),  # positive correlation
    np.array([[1, -0.8], [-0.8, 1]]) # negative correlation
]
titles = ['Independent (ρ=0)', 'Positive correlation (ρ=0.8)', 'Negative correlation (ρ=-0.8)']

x = np.linspace(-3, 3, 100)
y = np.linspace(-3, 3, 100)
X, Y = np.meshgrid(x, y)
pos = np.dstack((X, Y))

for ax, cov, title in zip(axes, covs, titles):
    rv = multivariate_normal(mu, cov)
    Z = rv.pdf(pos)

    contour = ax.contourf(X, Y, Z, levels=15, cmap='viridis')
    ax.contour(X, Y, Z, levels=15, colors='white', alpha=0.3, linewidths=0.5)
    ax.set_xlabel('$X_1$', fontsize=12)
    ax.set_ylabel('$X_2$', fontsize=12)
    ax.set_title(title, fontsize=13)
    ax.set_aspect('equal')
    ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('multivariate_normal.png', dpi=150)
plt.show()

print("Multivariate normal distribution:")
print("  - Foundation for modeling high-dimensional data")
print("  - Central to Gaussian processes, GMMs, VAEs, and similar models")
print("  - The covariance matrix encodes dependence between variables")
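
The multivariate normal density formula from this section can also be evaluated directly and compared with SciPy. The following is a minimal sketch: `mvn_pdf` is a hypothetical helper name, and the test point and covariance are arbitrary.

```python
import numpy as np
from scipy.stats import multivariate_normal

def mvn_pdf(x, mu, cov):
    """Multivariate normal density evaluated at a single point x."""
    d = mu.shape[0]
    diff = x - mu
    # Solve cov @ z = diff rather than forming the explicit inverse
    maha = diff @ np.linalg.solve(cov, diff)
    norm_const = np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(-0.5 * maha) / norm_const

mu = np.array([0.0, 0.0])
cov = np.array([[1.0, 0.8],
                [0.8, 1.0]])
x = np.array([0.5, -0.2])

manual = mvn_pdf(x, mu, cov)
reference = multivariate_normal(mean=mu, cov=cov).pdf(x)
print(manual, reference)
```

Using `np.linalg.solve` instead of `np.linalg.inv` is a common numerical choice; for high dimensions a Cholesky factorization and log-densities would typically be preferred.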

5. Advanced Bayes' Theorem

5.1 Bayesian Update

Prior → Data → Posterior

$$ P(\theta | D) = \frac{P(D | \theta) P(\theta)}{P(D)} \propto P(D | \theta) P(\theta) $$

  • $\theta$: parameter (treated as random variable)
  • $D$: observed data
  • $P(\theta)$: prior probability (belief before data)
  • $P(D | \theta)$: likelihood (plausibility of data given parameter)
  • $P(\theta | D)$: posterior probability (updated belief after data)
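
The proportionality $P(\theta | D) \propto P(D | \theta) P(\theta)$ can be applied literally on a grid of parameter values. The sketch below assumes a Beta(2, 2) prior and 7 heads in 10 flips (illustrative numbers only) and compares the grid result with the known conjugate posterior.

```python
import numpy as np
from scipy import stats

theta = np.linspace(0.001, 0.999, 999)   # grid of candidate parameter values
d_theta = theta[1] - theta[0]

prior = stats.beta.pdf(theta, 2, 2)      # assumed prior: Beta(2, 2)
heads, flips = 7, 10                     # assumed data
likelihood = stats.binom.pmf(heads, flips, theta)

unnormalized = likelihood * prior        # P(D | theta) P(theta)
# Dividing by P(D), approximated here with a Riemann sum over the grid
posterior = unnormalized / (unnormalized.sum() * d_theta)

# Conjugacy gives the exact posterior Beta(2 + heads, 2 + tails) for comparison
exact = stats.beta.pdf(theta, 2 + heads, 2 + flips - heads)
max_abs_error = np.max(np.abs(posterior - exact))
print(f"max |grid - exact| = {max_abs_error:.2e}")
```

Grid approximation needs no conjugacy, which makes it a useful check in one or two dimensions, but it scales poorly as the number of parameters grows.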

5.2 Example: Coin Flip (Beta-Binomial Model)

from scipy import stats
import numpy as np
import matplotlib.pyplot as plt

# Beta-binomial model: estimating a coin's probability of heads
# Prior: Beta(α, β)
# Likelihood: Binomial
# Posterior: Beta(α + n_heads, β + n_tails)

np.random.seed(42)

# True coin probability (assumed unknown to the learner)
true_p = 0.7

# Prior (uniform prior: Beta(1, 1))
alpha_prior, beta_prior = 1, 1

# Simulate coin flips
n_flips_list = [0, 1, 5, 20, 100]
data = np.random.binomial(1, true_p, 100)

fig, axes = plt.subplots(2, 3, figsize=(18, 10))
axes = axes.flatten()

p_vals = np.linspace(0, 1, 1000)

for idx, n_flips in enumerate(n_flips_list):
    ax = axes[idx]

    if n_flips == 0:
        # Prior only
        prior_pdf = stats.beta.pdf(p_vals, alpha_prior, beta_prior)
        ax.plot(p_vals, prior_pdf, linewidth=3, color='blue', label='Prior')
    else:
        # Data observed so far
        observed_data = data[:n_flips]
        n_heads = np.sum(observed_data)
        n_tails = n_flips - n_heads

        # Posterior
        alpha_post = alpha_prior + n_heads
        beta_post = beta_prior + n_tails
        posterior_pdf = stats.beta.pdf(p_vals, alpha_post, beta_post)

        # Prior
        prior_pdf = stats.beta.pdf(p_vals, alpha_prior, beta_prior)

        ax.plot(p_vals, prior_pdf, linewidth=2, color='blue', linestyle='--',
                label='Prior', alpha=0.7)
        ax.plot(p_vals, posterior_pdf, linewidth=3, color='red', label='Posterior')

        # MAP estimate (maximum a posteriori): mode of the Beta posterior
        map_estimate = (alpha_post - 1) / (alpha_post + beta_post - 2)
        ax.axvline(map_estimate, color='red', linestyle=':', linewidth=2,
                   label=f'MAP = {map_estimate:.3f}')

    # True probability
    ax.axvline(true_p, color='green', linestyle='--', linewidth=2,
               label=f'True p = {true_p}')

    ax.set_xlabel('p (probability of heads)', fontsize=11)
    ax.set_ylabel('Density', fontsize=11)
    ax.set_title(f'After {n_flips} flips' if n_flips > 0 else 'Prior', fontsize=12)
    ax.legend(fontsize=9)
    ax.grid(True, alpha=0.3)
    ax.set_xlim(0, 1)

# Convergence curve
ax = axes[-1]
n_range = np.arange(1, 101)
map_estimates = []
for n in n_range:
    n_heads = np.sum(data[:n])
    n_tails = n - n_heads
    alpha_post = alpha_prior + n_heads
    beta_post = beta_prior + n_tails
    map_est = (alpha_post - 1) / (alpha_post + beta_post - 2)
    map_estimates.append(map_est)

ax.plot(n_range, map_estimates, linewidth=2, color='red', label='MAP estimate')
ax.axhline(true_p, color='green', linestyle='--', linewidth=2, label=f'True p = {true_p}')
ax.set_xlabel('Number of flips', fontsize=11)
ax.set_ylabel('Estimated p', fontsize=11)
ax.set_title('Convergence of Bayesian learning', fontsize=12)
ax.legend(fontsize=10)
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('bayesian_update_coin.png', dpi=150)
plt.show()

print("Bayesian update:")
print("  - As data accumulates, the posterior concentrates around the true value")
print("  - The influence of the prior shrinks as the amount of data grows")
print("  - Uncertainty is expressed as a distribution (not a point estimate)")

6. Probability in Machine Learning

6.1 Generative vs Discriminative Models

Generative Model:

  • Models $P(X, Y) = P(Y)P(X|Y)$
  • Learns the data distribution of each class
  • Prediction: $P(Y|X) = \frac{P(X|Y)P(Y)}{P(X)}$ via Bayes' theorem
  • Examples: Naive Bayes, GMM, VAE, GAN

Discriminative Model:

  • Directly models $P(Y|X)$
  • Learns only the decision boundary
  • Examples: Logistic regression, SVM, neural networks

from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
import numpy as np
import matplotlib.pyplot as plt

# Generate data
X, y = make_classification(n_samples=300, n_features=2, n_informative=2,
                           n_redundant=0, n_clusters_per_class=1,
                           random_state=42)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Generative model: naive Bayes
generative_model = GaussianNB()
generative_model.fit(X_train, y_train)

# Discriminative model: logistic regression
discriminative_model = LogisticRegression()
discriminative_model.fit(X_train, y_train)

# Visualization
fig, axes = plt.subplots(1, 2, figsize=(16, 6))

# Grid for the decision regions
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.linspace(x_min, x_max, 200),
                     np.linspace(y_min, y_max, 200))

models = [
    (generative_model, 'Generative model (naive Bayes)', axes[0]),
    (discriminative_model, 'Discriminative model (logistic regression)', axes[1])
]

for model, title, ax in models:
    Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)

    ax.contourf(xx, yy, Z, alpha=0.3, cmap='RdYlBu')
    ax.scatter(X_train[y_train==0, 0], X_train[y_train==0, 1],
               c='blue', marker='o', s=50, edgecolors='k', label='Class 0')
    ax.scatter(X_train[y_train==1, 0], X_train[y_train==1, 1],
               c='red', marker='s', s=50, edgecolors='k', label='Class 1')

    score = model.score(X_test, y_test)
    ax.set_xlabel('Feature 1', fontsize=12)
    ax.set_ylabel('Feature 2', fontsize=12)
    ax.set_title(f'{title}\nTest Accuracy: {score:.3f}', fontsize=13)
    ax.legend(fontsize=10)
    ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('generative_vs_discriminative.png', dpi=150)
plt.show()

print("Generative vs discriminative:")
print("  Generative: models the full joint P(X,Y) → can generate samples")
print("  Discriminative: models only the conditional P(Y|X) → prediction only, usually higher accuracy")

6.2 Naive Bayes Classifier

Assumption: features are conditionally independent given the class

$$ P(X_1, \ldots, X_d | Y) = \prod_{i=1}^d P(X_i | Y) $$

Prediction: $$ \hat{y} = \arg\max_y P(Y=y) \prod_{i=1}^d P(X_i | Y=y) $$
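
The prediction rule is usually evaluated in log space so that the product of many small probabilities does not underflow. The sketch below is a from-scratch Bernoulli naive Bayes on a hypothetical toy dataset of binary word indicators; the add-one (Laplace) smoothing is an assumption of this example, not something the formula above requires.

```python
import numpy as np

# Hypothetical toy data: rows are documents, columns are binary word indicators
X = np.array([[1, 1, 0],
              [1, 0, 0],
              [1, 1, 1],
              [0, 0, 1],
              [0, 1, 1],
              [0, 0, 1]])
y = np.array([1, 1, 1, 0, 0, 0])

classes = np.unique(y)
log_prior = np.log(np.array([np.mean(y == c) for c in classes]))   # log P(Y = c)
# theta[c, i] = P(X_i = 1 | Y = c), estimated with add-one smoothing (assumption)
theta = np.array([(X[y == c].sum(axis=0) + 1) / ((y == c).sum() + 2) for c in classes])

def predict(x):
    """argmax_y [ log P(y) + sum_i log P(x_i | y) ] for one binary feature vector."""
    log_lik = (x * np.log(theta) + (1 - x) * np.log(1 - theta)).sum(axis=1)
    return classes[np.argmax(log_prior + log_lik)]

print(predict(np.array([1, 1, 0])), predict(np.array([0, 0, 1])))
```

Taking logarithms turns the product over features into a sum and leaves the argmax unchanged, since the logarithm is monotonic.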

from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import classification_report

# Text classification example
categories = ['alt.atheism', 'soc.religion.christian', 'comp.graphics', 'sci.med']
train = fetch_20newsgroups(subset='train', categories=categories, random_state=42)
test = fetch_20newsgroups(subset='test', categories=categories, random_state=42)

# Feature extraction (bag-of-words)
vectorizer = CountVectorizer(max_features=1000)
X_train = vectorizer.fit_transform(train.data)
X_test = vectorizer.transform(test.data)

# Train naive Bayes
nb_model = MultinomialNB()
nb_model.fit(X_train, train.target)

# Predict
y_pred = nb_model.predict(X_test)

print("Naive Bayes text classification:")
print(classification_report(test.target, y_pred, target_names=test.target_names))

print("\nCharacteristics of naive Bayes:")
print("  - Conditional independence assumption (naive) → computationally efficient")
print("  - Works well even on high-dimensional data (text classification)")
print("  - Admits a probabilistic interpretation")
print("  - Reasonable performance even on small datasets")

6.3 Probabilistic Graphical Models

  • Bayesian Network: represents conditional independence with directed acyclic graph (DAG)
  • Markov Random Field: undirected graph
  • Hidden Markov Model (HMM): inference of hidden states in time series data
  • Applications: speech recognition, natural language processing, computer vision

# Simple Bayesian network example (conceptual)
import networkx as nx
import matplotlib.pyplot as plt

# Bayesian network structure
# Rain → Sprinkler, Rain → Grass Wet, Sprinkler → Grass Wet
G = nx.DiGraph()
G.add_edges_from([('Rain', 'Sprinkler'), ('Rain', 'Grass Wet'),
                  ('Sprinkler', 'Grass Wet')])

plt.figure(figsize=(10, 6))
pos = {'Rain': (0.5, 1), 'Sprinkler': (0, 0), 'Grass Wet': (1, 0)}
nx.draw(G, pos, with_labels=True, node_size=3000, node_color='lightblue',
        font_size=12, font_weight='bold', arrowsize=20, arrows=True)
plt.title('Bayesian network: Rain → Sprinkler, Grass Wet', fontsize=14)
plt.tight_layout()
plt.savefig('bayesian_network_example.png', dpi=150)
plt.show()

print("Probabilistic graphical models:")
print("  - Represent dependencies between variables as a graph")
print("  - Conditional independence makes computation more efficient")
print("  - Inference: estimate hidden variables from observed ones")
print("  - Learning: learn the graph structure and probability parameters from data")
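
Inference in the Rain/Sprinkler/Grass Wet network can be sketched by brute-force enumeration. The conditional probability tables below are hypothetical numbers chosen for illustration; only the graph structure comes from the example above.

```python
from itertools import product

# Hypothetical conditional probability tables (illustrative numbers only)
p_rain = {True: 0.2, False: 0.8}
p_sprinkler_on = {True: 0.01, False: 0.4}          # P(Sprinkler=on | Rain)
p_wet = {(True, True): 0.99, (True, False): 0.8,   # P(Wet | Rain, Sprinkler)
         (False, True): 0.9, (False, False): 0.0}

def joint(rain, sprinkler, wet):
    """P(R, S, W) = P(R) P(S | R) P(W | R, S), following the DAG factorization."""
    p_s = p_sprinkler_on[rain] if sprinkler else 1 - p_sprinkler_on[rain]
    p_w = p_wet[(rain, sprinkler)] if wet else 1 - p_wet[(rain, sprinkler)]
    return p_rain[rain] * p_s * p_w

# Inference by enumeration: P(Rain | Wet) = sum_S P(R, S, W) / sum_{R,S} P(R, S, W)
numerator = sum(joint(True, s, True) for s in (True, False))
evidence = sum(joint(r, s, True) for r, s in product((True, False), repeat=2))
p_rain_given_wet = numerator / evidence

print(f"P(Rain | Grass Wet) = {p_rain_given_wet:.4f}")
```

Enumeration is exponential in the number of variables, which is why practical systems rely on the conditional independence structure (for example, variable elimination) rather than summing the full joint.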

Practice Problems

  1. Bayes' Theorem Application: Design a spam filter using Bayes' theorem. Derive the formula for calculating spam probability given the presence of specific words, and implement with simple example data.

  2. Distribution Fitting: Use scipy.stats to fit a normal distribution to real data (e.g., height, test scores) and verify goodness-of-fit with Q-Q plot. If normal distribution is inadequate, try other distributions.

  3. Monte Carlo Integration: For $X \sim \mathcal{N}(0, 1)$, compute $\mathbb{E}[e^X]$ (1) analytically and (2) estimate via Monte Carlo sampling. Verify convergence as sample size increases.

  4. Bayesian Linear Regression: Implement linear regression from a Bayesian perspective. Assign normal prior to weights and update posterior with each data observation. Visualize posterior mean and uncertainty.

  5. Naive Bayes vs Logistic Regression: Compare performance of Naive Bayes (generative) and logistic regression (discriminative) on Iris dataset. Plot learning curves as training data size varies. Analyze which model is advantageous in which situations.

References

  • Murphy, K. P. (2012). Machine Learning: A Probabilistic Perspective. MIT Press. A standard reference for ML from a probabilistic viewpoint.
  • Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer. Chapters 1-2: probability foundations and distributions.
  • Wasserman, L. (2004). All of Statistics. Springer. Concise summary of statistics and probability.
  • Koller, D., & Friedman, N. (2009). Probabilistic Graphical Models. MIT Press. Comprehensive textbook on probabilistic graphical models.
  • SciPy Stats Documentation: https://docs.scipy.org/doc/scipy/reference/stats.html
  • Seeing Theory (probability/statistics visualization): https://seeing-theory.brown.edu/
  • Bayesian Methods for Hackers (online book): https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers