18. Introduction to Bayesian Statistics
18. Introduction to Bayesian Statistics¶
Previous: Generalized Linear Models | Next: Bayesian Inference
Overview¶
Bayesian statistics interprets probability as a degree of uncertainty and combines prior knowledge with data for inference. This chapter covers the differences between frequentist and Bayesian paradigms, Bayes' theorem, and core concepts including prior distributions, likelihood, and posterior distributions.
1. Frequentist vs Bayesian Paradigm¶
1.1 Differences in Probability Interpretation¶
| Perspective | Frequentist | Bayesian |
|---|---|---|
| Meaning of probability | Long-run frequency (infinite repetition) | Degree of uncertainty (belief) |
| Parameters | Fixed unknown constants | Random variables (have distributions) |
| Inference goal | Point estimates, confidence intervals | Entire posterior distribution |
| Prior information | Not used | Reflected in prior distribution |
| Interpretation | "95% confidence interval" (95% contain true value in repeated sampling) | "95% probability that true value is in the interval" |
1.2 Comparison Example: Coin Flipping¶
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt
np.random.seed(42)
# Data: 10 flips, 7 heads
n_trials = 10
n_heads = 7
# === Frequentist Approach ===
# Point estimate: MLE
p_mle = n_heads / n_trials
print(f"[Frequentist] MLE estimate: {p_mle:.3f}")
# 95% confidence interval (Wald interval)
se = np.sqrt(p_mle * (1 - p_mle) / n_trials)
ci_freq = (p_mle - 1.96 * se, p_mle + 1.96 * se)
print(f"[Frequentist] 95% confidence interval: ({ci_freq[0]:.3f}, {ci_freq[1]:.3f})")
# === Bayesian Approach ===
# Prior: Beta(1, 1) = Uniform(0, 1)
alpha_prior, beta_prior = 1, 1
# Posterior: Beta(alpha + heads, beta + tails)
alpha_post = alpha_prior + n_heads
beta_post = beta_prior + (n_trials - n_heads)
posterior = stats.beta(alpha_post, beta_post)
# Posterior mean
p_bayes = posterior.mean()
print(f"\n[Bayesian] Posterior mean: {p_bayes:.3f}")
# 95% credible interval
ci_bayes = posterior.interval(0.95)
print(f"[Bayesian] 95% credible interval: ({ci_bayes[0]:.3f}, {ci_bayes[1]:.3f})")
1.3 Interpretation Differences¶
# Visualization
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
p_range = np.linspace(0, 1, 200)
# Frequentist: Likelihood function
likelihood = stats.binom.pmf(n_heads, n_trials, p_range)
likelihood = likelihood / likelihood.max() # Normalize
axes[0].plot(p_range, likelihood, 'b-', lw=2, label='Likelihood')
axes[0].axvline(p_mle, color='r', linestyle='--', label=f'MLE = {p_mle:.2f}')
axes[0].axvspan(ci_freq[0], ci_freq[1], alpha=0.3, color='r', label='95% CI')
axes[0].set_xlabel('p (probability of heads)')
axes[0].set_ylabel('Normalized likelihood')
axes[0].set_title('Frequentist: Likelihood Function and Confidence Interval')
axes[0].legend()
axes[0].grid(True, alpha=0.3)
# Bayesian: Posterior distribution
prior_pdf = stats.beta(alpha_prior, beta_prior).pdf(p_range)
posterior_pdf = posterior.pdf(p_range)
axes[1].plot(p_range, prior_pdf, 'g--', lw=2, label='Prior: Beta(1,1)')
axes[1].plot(p_range, posterior_pdf, 'b-', lw=2, label=f'Posterior: Beta({alpha_post},{beta_post})')
axes[1].axvline(p_bayes, color='r', linestyle='--', label=f'Posterior mean = {p_bayes:.2f}')
axes[1].axvspan(ci_bayes[0], ci_bayes[1], alpha=0.3, color='b', label='95% Credible Interval')
axes[1].set_xlabel('p (probability of heads)')
axes[1].set_ylabel('Density')
axes[1].set_title('Bayesian: Prior and Posterior Distributions')
axes[1].legend()
axes[1].grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
2. Bayes' Theorem¶
2.1 Derivation of Bayes' Theorem¶
Starting from the definition of conditional probability:
$$P(A|B) = \frac{P(A \cap B)}{P(B)}$$
$$P(B|A) = \frac{P(A \cap B)}{P(A)}$$
Eliminating P(A ∩ B) from these two equations:
$$P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}$$
Expressed in terms of parameter θ and data D:
$$P(\theta|D) = \frac{P(D|\theta) \cdot P(\theta)}{P(D)}$$
2.2 Terminology¶
| Term | Formula | Meaning |
|---|---|---|
| Posterior | P(θ|D) | Belief about parameter after observing data |
| Likelihood | P(D|θ) | Probability of observing data given parameter |
| Prior | P(θ) | Prior knowledge about parameter before data |
| Evidence (Marginal likelihood) | P(D) | Normalizing constant (integral over all θ) |
2.3 Implementing Bayes' Theorem¶
def bayes_theorem_discrete(prior: dict, likelihood: dict) -> dict:
"""
Calculate discrete Bayes' theorem
Parameters:
-----------
prior : dict
Prior probabilities {hypothesis: probability}
likelihood : dict
Likelihood {hypothesis: P(data|hypothesis)}
Returns:
--------
posterior : dict
Posterior probabilities {hypothesis: P(hypothesis|data)}
"""
# Unnormalized posterior
unnormalized = {h: prior[h] * likelihood[h] for h in prior}
# Marginal likelihood (normalizing constant)
evidence = sum(unnormalized.values())
# Normalize
posterior = {h: p / evidence for h, p in unnormalized.items()}
return posterior
# Example: Disease diagnosis
# Hypotheses: Disease present (D), Disease absent (~D)
prior = {
'D': 0.001, # Prevalence 0.1%
'~D': 0.999
}
# Likelihood when test is positive (+)
likelihood = {
'D': 0.99, # Sensitivity: P(+|D) = 99%
'~D': 0.05 # False positive rate: P(+|~D) = 5%
}
posterior = bayes_theorem_discrete(prior, likelihood)
print("=== Disease Diagnosis Bayes Update ===")
print(f"Prior P(disease) = {prior['D']:.4f}")
print(f"Likelihood P(positive|disease) = {likelihood['D']:.2f}")
print(f"Likelihood P(positive|healthy) = {likelihood['~D']:.2f}")
print(f"\nPosterior after positive test:")
print(f"P(disease|positive) = {posterior['D']:.4f} ({posterior['D']*100:.2f}%)")
print(f"P(healthy|positive) = {posterior['~D']:.4f}")
2.4 Sequential Updates¶
def sequential_bayes_update(prior_dist, data_points, likelihood_func):
"""
Sequential Bayes update (when data arrives one at a time)
Parameters:
-----------
prior_dist : scipy.stats distribution
Initial prior distribution
data_points : array-like
Observed data
likelihood_func : callable
likelihood_func(x, theta) -> P(x|theta)
"""
theta_range = np.linspace(0, 1, 1000)
current_prior = prior_dist.pdf(theta_range)
history = [current_prior.copy()]
for x in data_points:
# Calculate likelihood
likelihood = likelihood_func(x, theta_range)
# Unnormalized posterior
unnormalized_posterior = current_prior * likelihood
# Normalize (numerical integration)
posterior = unnormalized_posterior / np.trapz(unnormalized_posterior, theta_range)
history.append(posterior.copy())
current_prior = posterior
return theta_range, history
# Coin flip simulation
np.random.seed(123)
true_p = 0.7
data = np.random.binomial(1, true_p, size=20) # 20 flips
# Bernoulli likelihood
def bernoulli_likelihood(x, theta):
return theta**x * (1 - theta)**(1 - x)
# Sequential update
theta_range, history = sequential_bayes_update(
stats.beta(1, 1), # Uniform prior
data,
bernoulli_likelihood
)
# Visualization
fig, axes = plt.subplots(2, 3, figsize=(14, 8))
axes = axes.flatten()
update_points = [0, 1, 5, 10, 15, 20]
for i, n in enumerate(update_points):
axes[i].plot(theta_range, history[n], 'b-', lw=2)
axes[i].axvline(true_p, color='r', linestyle='--', alpha=0.7, label=f'True p = {true_p}')
axes[i].fill_between(theta_range, history[n], alpha=0.3)
axes[i].set_xlabel('θ')
axes[i].set_ylabel('Density')
axes[i].set_title(f'After n = {n} observations')
axes[i].legend()
axes[i].set_xlim(0, 1)
plt.tight_layout()
plt.suptitle('Sequential Bayes Update: Changes in Posterior Distribution', y=1.02)
plt.show()
print(f"Data: {data}")
print(f"Total heads: {data.sum()}, Total: {len(data)}")
3. Prior, Likelihood, and Posterior¶
3.1 Types of Prior Distributions¶
fig, axes = plt.subplots(2, 3, figsize=(15, 10))
x = np.linspace(0, 1, 200)
# 1. Non-informative prior
ax = axes[0, 0]
uniform_prior = stats.beta(1, 1).pdf(x)
ax.plot(x, uniform_prior, 'b-', lw=2)
ax.set_title('Non-informative Prior\nBeta(1, 1) = Uniform(0, 1)')
ax.set_xlabel('θ')
ax.set_ylabel('Density')
ax.fill_between(x, uniform_prior, alpha=0.3)
# 2. Jeffreys prior
ax = axes[0, 1]
jeffreys_prior = stats.beta(0.5, 0.5).pdf(x)
ax.plot(x, jeffreys_prior, 'b-', lw=2)
ax.set_title('Jeffreys Prior\nBeta(0.5, 0.5)')
ax.set_xlabel('θ')
ax.set_ylabel('Density')
ax.fill_between(x, jeffreys_prior, alpha=0.3)
# 3. Weakly informative prior
ax = axes[0, 2]
weak_prior = stats.beta(2, 2).pdf(x)
ax.plot(x, weak_prior, 'b-', lw=2)
ax.set_title('Weakly Informative Prior\nBeta(2, 2)')
ax.set_xlabel('θ')
ax.set_ylabel('Density')
ax.fill_between(x, weak_prior, alpha=0.3)
# 4. Informative prior - centered at 0.3
ax = axes[1, 0]
informative_prior1 = stats.beta(3, 7).pdf(x)
ax.plot(x, informative_prior1, 'b-', lw=2)
ax.set_title('Informative Prior (low p)\nBeta(3, 7), mean=0.3')
ax.set_xlabel('θ')
ax.set_ylabel('Density')
ax.fill_between(x, informative_prior1, alpha=0.3)
# 5. Informative prior - centered at 0.7
ax = axes[1, 1]
informative_prior2 = stats.beta(7, 3).pdf(x)
ax.plot(x, informative_prior2, 'b-', lw=2)
ax.set_title('Informative Prior (high p)\nBeta(7, 3), mean=0.7')
ax.set_xlabel('θ')
ax.set_ylabel('Density')
ax.fill_between(x, informative_prior2, alpha=0.3)
# 6. Strong informative prior
ax = axes[1, 2]
strong_prior = stats.beta(30, 30).pdf(x)
ax.plot(x, strong_prior, 'b-', lw=2)
ax.set_title('Strong Informative Prior\nBeta(30, 30), mean=0.5')
ax.set_xlabel('θ')
ax.set_ylabel('Density')
ax.fill_between(x, strong_prior, alpha=0.3)
plt.tight_layout()
plt.show()
3.2 Influence of Prior Distributions¶
# Same data, different priors
n_trials, n_heads = 10, 7
priors = [
('Non-informative Beta(1,1)', 1, 1),
('Jeffreys Beta(0.5,0.5)', 0.5, 0.5),
('Weak Beta(2,2)', 2, 2),
('Informative Beta(5,5)', 5, 5),
('Strong Beta(20,20)', 20, 20),
]
fig, axes = plt.subplots(1, 5, figsize=(18, 4))
theta = np.linspace(0, 1, 200)
for i, (name, a, b) in enumerate(priors):
# Prior
prior = stats.beta(a, b)
# Posterior
a_post = a + n_heads
b_post = b + (n_trials - n_heads)
posterior = stats.beta(a_post, b_post)
# Visualization
axes[i].plot(theta, prior.pdf(theta), 'g--', lw=2, label='Prior')
axes[i].plot(theta, posterior.pdf(theta), 'b-', lw=2, label='Posterior')
axes[i].axvline(0.7, color='r', linestyle=':', alpha=0.7, label='MLE')
axes[i].axvline(posterior.mean(), color='purple', linestyle='--',
alpha=0.7, label=f'Post Mean={posterior.mean():.2f}')
axes[i].set_xlabel('θ')
axes[i].set_title(f'{name}')
axes[i].legend(fontsize=8)
axes[i].set_xlim(0, 1)
plt.suptitle(f'Influence of Prior (Data: {n_heads}/{n_trials} successes)', y=1.02)
plt.tight_layout()
plt.show()
3.3 Likelihood Function¶
def plot_likelihood_function(data, distribution='binomial'):
"""
Visualize likelihood function
"""
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
if distribution == 'binomial':
n = len(data)
k = sum(data)
theta_range = np.linspace(0.001, 0.999, 200)
# Likelihood function: L(θ) = θ^k * (1-θ)^(n-k)
likelihood = theta_range**k * (1 - theta_range)**(n - k)
log_likelihood = k * np.log(theta_range) + (n - k) * np.log(1 - theta_range)
# MLE
mle = k / n
# Likelihood function
axes[0].plot(theta_range, likelihood, 'b-', lw=2)
axes[0].axvline(mle, color='r', linestyle='--', label=f'MLE = {mle:.3f}')
axes[0].set_xlabel('θ')
axes[0].set_ylabel('L(θ)')
axes[0].set_title('Likelihood Function')
axes[0].legend()
axes[0].fill_between(theta_range, likelihood, alpha=0.3)
# Log-likelihood function
axes[1].plot(theta_range, log_likelihood, 'b-', lw=2)
axes[1].axvline(mle, color='r', linestyle='--', label=f'MLE = {mle:.3f}')
axes[1].set_xlabel('θ')
axes[1].set_ylabel('log L(θ)')
axes[1].set_title('Log-Likelihood Function')
axes[1].legend()
plt.tight_layout()
plt.show()
return mle
# Example
data = [1, 1, 1, 0, 1, 0, 1, 1, 0, 1] # 7/10 successes
mle = plot_likelihood_function(data)
print(f"Data: {data}")
print(f"Successes: {sum(data)}, Trials: {len(data)}")
print(f"MLE: {mle}")
4. Conjugate Priors¶
4.1 What are Conjugate Priors?¶
Definition: If the prior and posterior belong to the same distribution family, they are "conjugate".
Advantages: - Closed-form posterior calculation possible - No numerical methods needed - Sequential updates are simple
4.2 Beta-Binomial Conjugate¶
Model: - Likelihood: X ~ Binomial(n, θ) - Prior: θ ~ Beta(α, β) - Posterior: θ|X ~ Beta(α + x, β + n - x)
class BetaBinomialModel:
"""Beta-Binomial conjugate model"""
def __init__(self, alpha_prior=1, beta_prior=1):
"""
Parameters:
-----------
alpha_prior : float
Alpha parameter of Beta prior
beta_prior : float
Beta parameter of Beta prior
"""
self.alpha = alpha_prior
self.beta = beta_prior
self.n_observations = 0
self.n_successes = 0
@property
def prior(self):
return stats.beta(self.alpha, self.beta)
@property
def posterior(self):
return stats.beta(
self.alpha + self.n_successes,
self.beta + self.n_observations - self.n_successes
)
def update(self, n_successes, n_trials):
"""Update posterior with data"""
self.n_successes += n_successes
self.n_observations += n_trials
return self
def posterior_mean(self):
return self.posterior.mean()
def posterior_std(self):
return self.posterior.std()
def credible_interval(self, alpha=0.05):
"""Calculate credible interval"""
return self.posterior.interval(1 - alpha)
def posterior_predictive(self, n_trials):
"""
Posterior predictive distribution: number of successes in new n_trials
Beta-Binomial distribution
"""
a = self.alpha + self.n_successes
b = self.beta + self.n_observations - self.n_successes
return stats.betabinom(n_trials, a, b)
def summary(self):
"""Model summary"""
print("=== Beta-Binomial Model Summary ===")
print(f"Prior: Beta({self.alpha}, {self.beta})")
print(f"Data: {self.n_successes} successes / {self.n_observations} trials")
print(f"Posterior: Beta({self.alpha + self.n_successes}, "
f"{self.beta + self.n_observations - self.n_successes})")
print(f"Posterior Mean: {self.posterior_mean():.4f}")
print(f"Posterior Std: {self.posterior_std():.4f}")
ci = self.credible_interval()
print(f"95% Credible Interval: ({ci[0]:.4f}, {ci[1]:.4f})")
# Example: Estimating click-through rate
model = BetaBinomialModel(alpha_prior=2, beta_prior=8) # Prior: 20% CTR
# 1st data: 35 clicks out of 100
model.update(35, 100)
print("After 1st data:")
model.summary()
# 2nd data: 22 clicks out of 50
model.update(22, 50)
print("\nAfter 2nd data:")
model.summary()
# Visualization
theta = np.linspace(0, 1, 200)
fig, ax = plt.subplots(figsize=(10, 6))
# Prior
prior = stats.beta(2, 8)
ax.plot(theta, prior.pdf(theta), 'g--', lw=2, label='Prior: Beta(2,8)')
# Intermediate posterior (after 1st data)
post1 = stats.beta(2 + 35, 8 + 100 - 35)
ax.plot(theta, post1.pdf(theta), 'orange', lw=2, linestyle='-.',
label='After 1st data: Beta(37, 73)')
# Final posterior
ax.plot(theta, model.posterior.pdf(theta), 'b-', lw=2,
label=f'Final Posterior: Beta(59, 95)')
ax.fill_between(theta, model.posterior.pdf(theta), alpha=0.3)
ax.axvline(model.posterior_mean(), color='r', linestyle=':',
label=f'Posterior Mean = {model.posterior_mean():.3f}')
ax.set_xlabel('θ (click-through rate)')
ax.set_ylabel('Density')
ax.set_title('Beta-Binomial Conjugate: Click-through Rate Estimation')
ax.legend()
ax.grid(True, alpha=0.3)
plt.show()
4.3 Normal-Normal Conjugate (Known Variance)¶
Model: - Likelihood: X ~ N(μ, σ²) (σ² known) - Prior: μ ~ N(μ₀, σ₀²) - Posterior: μ|X ~ N(μₙ, σₙ²)
$$\mu_n = \frac{\frac{\mu_0}{\sigma_0^2} + \frac{n\bar{x}}{\sigma^2}}{\frac{1}{\sigma_0^2} + \frac{n}{\sigma^2}}$$
$$\sigma_n^2 = \frac{1}{\frac{1}{\sigma_0^2} + \frac{n}{\sigma^2}}$$
class NormalNormalModel:
"""Normal-Normal conjugate model (known variance)"""
def __init__(self, mu_prior=0, sigma_prior=10, sigma_known=1):
"""
Parameters:
-----------
mu_prior : float
Prior mean
sigma_prior : float
Prior standard deviation
sigma_known : float
Known data standard deviation
"""
self.mu_0 = mu_prior
self.sigma_0 = sigma_prior
self.sigma = sigma_known
self.data = []
@property
def n(self):
return len(self.data)
@property
def x_bar(self):
return np.mean(self.data) if self.data else 0
@property
def prior(self):
return stats.norm(self.mu_0, self.sigma_0)
@property
def posterior_precision(self):
"""Posterior precision (inverse of variance)"""
return 1/self.sigma_0**2 + self.n/self.sigma**2
@property
def posterior_variance(self):
return 1 / self.posterior_precision
@property
def posterior_mean(self):
if self.n == 0:
return self.mu_0
return (self.mu_0/self.sigma_0**2 + self.n*self.x_bar/self.sigma**2) / \
self.posterior_precision
@property
def posterior_std(self):
return np.sqrt(self.posterior_variance)
@property
def posterior(self):
return stats.norm(self.posterior_mean, self.posterior_std)
def update(self, data):
"""Add data"""
if isinstance(data, (list, np.ndarray)):
self.data.extend(data)
else:
self.data.append(data)
return self
def credible_interval(self, alpha=0.05):
return self.posterior.interval(1 - alpha)
def summary(self):
print("=== Normal-Normal Model Summary ===")
print(f"Prior: N({self.mu_0}, {self.sigma_0}²)")
print(f"Known σ: {self.sigma}")
print(f"Data: n={self.n}, x̄={self.x_bar:.4f}" if self.n > 0 else "Data: none")
print(f"Posterior: N({self.posterior_mean:.4f}, {self.posterior_std:.4f}²)")
ci = self.credible_interval()
print(f"95% Credible Interval: ({ci[0]:.4f}, {ci[1]:.4f})")
# Example: Body temperature measurement
# Prior knowledge: Average temperature about 36.5°C, uncertainty σ=0.5
# Measurement error σ=0.2 (known)
model = NormalNormalModel(mu_prior=36.5, sigma_prior=0.5, sigma_known=0.2)
print("Prior:")
model.summary()
# Data collection: Patient temperature measurements
measurements = [37.1, 37.3, 36.9, 37.2, 37.0]
model.update(measurements)
print("\nAfter data update:")
model.summary()
# Visualization
mu_range = np.linspace(35.5, 38, 200)
fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(mu_range, model.prior.pdf(mu_range), 'g--', lw=2,
label=f'Prior: N({model.mu_0}, {model.sigma_0}²)')
ax.plot(mu_range, model.posterior.pdf(mu_range), 'b-', lw=2,
label=f'Posterior: N({model.posterior_mean:.2f}, {model.posterior_std:.3f}²)')
ax.fill_between(mu_range, model.posterior.pdf(mu_range), alpha=0.3)
# Show individual measurements
for i, x in enumerate(measurements):
ax.axvline(x, color='orange', linestyle=':', alpha=0.5,
label='Measurements' if i == 0 else '')
ax.axvline(np.mean(measurements), color='r', linestyle='--',
label=f'Sample Mean = {np.mean(measurements):.2f}')
ax.set_xlabel('μ (body temperature)')
ax.set_ylabel('Density')
ax.set_title('Normal-Normal Conjugate: Body Temperature Estimation')
ax.legend()
ax.grid(True, alpha=0.3)
plt.show()
4.4 Gamma-Poisson Conjugate¶
Model: - Likelihood: X ~ Poisson(λ) - Prior: λ ~ Gamma(α, β) - Posterior: λ|X ~ Gamma(α + Σxᵢ, β + n)
class GammaPoissonModel:
"""Gamma-Poisson conjugate model"""
def __init__(self, alpha_prior=1, beta_prior=1):
"""
Parameters:
-----------
alpha_prior : float (shape)
beta_prior : float (rate)
"""
self.alpha = alpha_prior
self.beta = beta_prior
self.data = []
@property
def n(self):
return len(self.data)
@property
def sum_x(self):
return sum(self.data)
@property
def prior(self):
return stats.gamma(self.alpha, scale=1/self.beta)
@property
def posterior(self):
alpha_post = self.alpha + self.sum_x
beta_post = self.beta + self.n
return stats.gamma(alpha_post, scale=1/beta_post)
def update(self, data):
if isinstance(data, (list, np.ndarray)):
self.data.extend(data)
else:
self.data.append(data)
return self
def posterior_mean(self):
return (self.alpha + self.sum_x) / (self.beta + self.n)
def posterior_std(self):
return self.posterior.std()
def credible_interval(self, alpha=0.05):
return self.posterior.interval(1 - alpha)
def posterior_predictive(self):
"""
Posterior predictive distribution: Negative Binomial
"""
alpha_post = self.alpha + self.sum_x
beta_post = self.beta + self.n
p = beta_post / (beta_post + 1)
return stats.nbinom(alpha_post, p)
def summary(self):
print("=== Gamma-Poisson Model Summary ===")
print(f"Prior: Gamma({self.alpha}, {self.beta})")
print(f"Data: n={self.n}, Σx={self.sum_x}")
alpha_post = self.alpha + self.sum_x
beta_post = self.beta + self.n
print(f"Posterior: Gamma({alpha_post}, {beta_post})")
print(f"Posterior Mean (λ): {self.posterior_mean():.4f}")
print(f"Posterior Std: {self.posterior_std():.4f}")
ci = self.credible_interval()
print(f"95% Credible Interval: ({ci[0]:.4f}, {ci[1]:.4f})")
# Example: Call center call count estimation
# Prior: Expect about 10 calls per hour, with uncertainty
model = GammaPoissonModel(alpha_prior=10, beta_prior=1) # Mean 10, variance 10
print("Prior:")
model.summary()
# Call counts over 5 hours
calls = [8, 12, 15, 9, 11]
model.update(calls)
print("\nAfter data:")
model.summary()
# Visualization
lambda_range = np.linspace(0, 25, 200)
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
# Posterior
ax = axes[0]
ax.plot(lambda_range, model.prior.pdf(lambda_range), 'g--', lw=2, label='Prior')
ax.plot(lambda_range, model.posterior.pdf(lambda_range), 'b-', lw=2, label='Posterior')
ax.fill_between(lambda_range, model.posterior.pdf(lambda_range), alpha=0.3)
ax.axvline(model.posterior_mean(), color='r', linestyle='--',
label=f'Post Mean = {model.posterior_mean():.2f}')
ax.axvline(np.mean(calls), color='orange', linestyle=':',
label=f'Sample Mean = {np.mean(calls):.1f}')
ax.set_xlabel('λ (calls per hour)')
ax.set_ylabel('Density')
ax.set_title('Gamma-Poisson Conjugate: Call Rate Estimation')
ax.legend()
ax.grid(True, alpha=0.3)
# Posterior predictive distribution
ax = axes[1]
x_pred = np.arange(0, 30)
pred_dist = model.posterior_predictive()
ax.bar(x_pred, pred_dist.pmf(x_pred), alpha=0.6, label='Posterior Predictive')
ax.axvline(pred_dist.mean(), color='r', linestyle='--',
label=f'Expected = {pred_dist.mean():.1f}')
ax.set_xlabel('Next hour call count')
ax.set_ylabel('Probability')
ax.set_title('Posterior Predictive Distribution')
ax.legend()
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
4.5 Conjugate Prior Summary Table¶
| Likelihood | Prior | Posterior | Update Rule |
|---|---|---|---|
| Binomial(n, p) | Beta(α, β) | Beta(α+x, β+n-x) | α←α+x, β←β+(n-x) |
| Poisson(λ) | Gamma(α, β) | Gamma(α+Σx, β+n) | α←α+Σx, β←β+n |
| Normal(μ, σ²) | Normal(μ₀, σ₀²) | Normal(μₙ, σₙ²) | Precision-weighted average |
| Exponential(λ) | Gamma(α, β) | Gamma(α+n, β+Σx) | α←α+n, β←β+Σx |
| Multinomial | Dirichlet | Dirichlet | α←α+counts |
5. MAP Estimation (Maximum A Posteriori)¶
5.1 MAP vs MLE¶
MLE: Maximize likelihood $$\hat{\theta}_{MLE} = \arg\max_\theta P(D|\theta)$$
MAP: Maximize posterior $$\hat{\theta}_{MAP} = \arg\max_\theta P(\theta|D) = \arg\max_\theta P(D|\theta)P(\theta)$$
def compare_mle_map(n_trials, n_successes, alpha_prior, beta_prior):
"""Compare MLE and MAP"""
# MLE
mle = n_successes / n_trials
# MAP for Beta-Binomial
# posterior: Beta(alpha + k, beta + n - k)
# mode of Beta(a, b) = (a - 1) / (a + b - 2) when a, b > 1
a = alpha_prior + n_successes
b = beta_prior + (n_trials - n_successes)
if a > 1 and b > 1:
map_est = (a - 1) / (a + b - 2)
else:
# Mode at boundary
map_est = a / (a + b) # Use posterior mean
# Posterior mean
posterior_mean = a / (a + b)
return mle, map_est, posterior_mean
# Compare across various priors and data sizes
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
scenarios = [
(10, 7, 'n=10, k=7'),
(100, 70, 'n=100, k=70'),
(10, 3, 'n=10, k=3'),
(100, 30, 'n=100, k=30'),
]
priors = [
('Uniform', 1, 1),
('Weak', 2, 2),
('Informative 0.5', 10, 10),
('Strong 0.5', 50, 50),
]
for i, (n, k, title) in enumerate(scenarios):
ax = axes.flatten()[i]
x = np.arange(len(priors))
width = 0.25
mles, maps, means = [], [], []
for name, a, b in priors:
mle, map_est, post_mean = compare_mle_map(n, k, a, b)
mles.append(mle)
maps.append(map_est)
means.append(post_mean)
ax.bar(x - width, mles, width, label='MLE', alpha=0.8)
ax.bar(x, maps, width, label='MAP', alpha=0.8)
ax.bar(x + width, means, width, label='Posterior Mean', alpha=0.8)
ax.axhline(k/n, color='r', linestyle='--', alpha=0.5, label=f'True ratio = {k/n:.2f}')
ax.set_xticks(x)
ax.set_xticklabels([p[0] for p in priors], rotation=15)
ax.set_ylabel('Estimate')
ax.set_title(title)
ax.legend(loc='best', fontsize=8)
ax.set_ylim(0, 1)
ax.grid(True, alpha=0.3, axis='y')
plt.suptitle('MLE vs MAP vs Posterior Mean: Influence of Prior', y=1.02)
plt.tight_layout()
plt.show()
5.2 Numerical MAP Estimation¶
from scipy.optimize import minimize_scalar, minimize
def map_estimation_normal(data, prior_mean, prior_std, known_std):
"""
MAP estimation for normal data (numerical method)
"""
n = len(data)
x_bar = np.mean(data)
# Negative log posterior (to minimize)
def neg_log_posterior(mu):
# Log likelihood
log_likelihood = -n * np.log(known_std) - \
0.5 * np.sum((data - mu)**2) / known_std**2
# Log prior
log_prior = -0.5 * ((mu - prior_mean)**2) / prior_std**2
return -(log_likelihood + log_prior)
# Optimization
result = minimize_scalar(neg_log_posterior, bounds=(x_bar - 3*known_std, x_bar + 3*known_std))
return result.x
# Example
np.random.seed(42)
true_mu = 5
data = np.random.normal(true_mu, 1, 10)
# MAP with various priors
priors = [
(5, 100), # Very weak prior
(5, 2), # Weak prior
(0, 1), # Wrong strong prior
(5, 0.5), # Correct strong prior
]
print(f"True μ: {true_mu}")
print(f"Sample mean: {np.mean(data):.4f}")
print(f"Sample size: {len(data)}")
print("\nMAP estimates with different priors:")
for prior_mean, prior_std in priors:
map_est = map_estimation_normal(data, prior_mean, prior_std, known_std=1)
print(f"Prior N({prior_mean}, {prior_std}²): MAP = {map_est:.4f}")
5.3 MAP and Regularization Connection¶
# MAP estimation is equivalent to regularization
# Beta(α, β) prior for Binomial → Similar effect to L2 regularization
def show_map_regularization_connection():
"""Connection between MAP and regularization"""
print("=== Relationship between MAP and Regularization ===")
print()
print("1. In logistic regression:")
print(" - Normal prior N(0, σ²) → L2 regularization (Ridge)")
print(" - Laplace prior → L1 regularization (Lasso)")
print()
print("2. In linear regression:")
print(" - MAP with N(0, σ²) prior ≡ Ridge regression")
print(" - λ = σ²_noise / σ²_prior")
print()
print("3. Bayesian perspective:")
print(" - Regularization strength = Confidence in prior")
print(" - Strong prior → Strong regularization")
print(" - Non-informative prior → No regularization (MLE)")
show_map_regularization_connection()
# Visual example
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
# Ridge vs MAP
theta = np.linspace(-3, 3, 200)
# Ridge perspective: L2 penalty
ridge_penalty = theta**2
axes[0].plot(theta, ridge_penalty, 'b-', lw=2, label='L2 penalty: λθ²')
axes[0].set_xlabel('θ')
axes[0].set_ylabel('Penalty')
axes[0].set_title('Ridge Regression: L2 Penalty')
axes[0].legend()
axes[0].grid(True, alpha=0.3)
# MAP perspective: Normal prior
prior_pdf = stats.norm(0, 1).pdf(theta)
neg_log_prior = -np.log(prior_pdf + 1e-10)
axes[1].plot(theta, neg_log_prior, 'r-', lw=2, label='-log P(θ) for N(0, 1)')
axes[1].set_xlabel('θ')
axes[1].set_ylabel('-log Prior')
axes[1].set_title('MAP: Negative Log Prior')
axes[1].legend()
axes[1].grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
6. Practice Examples¶
6.1 Bayesian A/B Test¶
class BayesianABTest:
"""Bayesian A/B Test"""
def __init__(self, alpha_prior=1, beta_prior=1):
self.alpha_prior = alpha_prior
self.beta_prior = beta_prior
def fit(self, successes_A, trials_A, successes_B, trials_B):
"""Fit data"""
self.successes_A = successes_A
self.trials_A = trials_A
self.successes_B = successes_B
self.trials_B = trials_B
# Posteriors
self.posterior_A = stats.beta(
self.alpha_prior + successes_A,
self.beta_prior + trials_A - successes_A
)
self.posterior_B = stats.beta(
self.alpha_prior + successes_B,
self.beta_prior + trials_B - successes_B
)
def prob_B_better(self, n_samples=100000):
"""Calculate P(B > A) by simulation"""
samples_A = self.posterior_A.rvs(n_samples)
samples_B = self.posterior_B.rvs(n_samples)
return (samples_B > samples_A).mean()
def expected_lift(self, n_samples=100000):
"""Expected lift"""
samples_A = self.posterior_A.rvs(n_samples)
samples_B = self.posterior_B.rvs(n_samples)
lift = (samples_B - samples_A) / samples_A
return lift.mean(), np.percentile(lift, [2.5, 97.5])
def summary(self):
print("=== Bayesian A/B Test Results ===")
print(f"\nGroup A: {self.successes_A}/{self.trials_A} "
f"({self.successes_A/self.trials_A*100:.1f}%)")
print(f"Group B: {self.successes_B}/{self.trials_B} "
f"({self.successes_B/self.trials_B*100:.1f}%)")
print(f"\nPosterior Mean A: {self.posterior_A.mean():.4f}")
print(f"Posterior Mean B: {self.posterior_B.mean():.4f}")
prob_b_better = self.prob_B_better()
print(f"\nP(B > A): {prob_b_better:.4f} ({prob_b_better*100:.1f}%)")
lift_mean, lift_ci = self.expected_lift()
print(f"Expected Lift: {lift_mean*100:.2f}%")
print(f"95% CI for Lift: ({lift_ci[0]*100:.2f}%, {lift_ci[1]*100:.2f}%)")
def plot(self):
"""Visualize results"""
fig, axes = plt.subplots(1, 3, figsize=(15, 4))
p_range = np.linspace(0, 1, 200)
# Compare posteriors
ax = axes[0]
ax.plot(p_range, self.posterior_A.pdf(p_range), 'b-', lw=2, label='A')
ax.plot(p_range, self.posterior_B.pdf(p_range), 'r-', lw=2, label='B')
ax.fill_between(p_range, self.posterior_A.pdf(p_range), alpha=0.3)
ax.fill_between(p_range, self.posterior_B.pdf(p_range), alpha=0.3)
ax.set_xlabel('Conversion rate')
ax.set_ylabel('Density')
ax.set_title('Posterior distributions of conversion rates')
ax.legend()
ax.grid(True, alpha=0.3)
# Difference distribution
ax = axes[1]
samples_A = self.posterior_A.rvs(50000)
samples_B = self.posterior_B.rvs(50000)
diff = samples_B - samples_A
ax.hist(diff, bins=50, density=True, alpha=0.7, edgecolor='black')
ax.axvline(0, color='r', linestyle='--', label='No difference')
ax.axvline(diff.mean(), color='g', linestyle='-',
label=f'Mean diff = {diff.mean():.4f}')
ax.set_xlabel('B - A')
ax.set_ylabel('Density')
ax.set_title('Conversion rate difference distribution')
ax.legend()
ax.grid(True, alpha=0.3)
# Lift distribution
ax = axes[2]
lift = (samples_B - samples_A) / samples_A
lift = lift[(lift > -1) & (lift < 2)] # Remove extremes
ax.hist(lift, bins=50, density=True, alpha=0.7, edgecolor='black')
ax.axvline(0, color='r', linestyle='--', label='No lift')
ax.axvline(np.median(lift), color='g', linestyle='-',
label=f'Median = {np.median(lift)*100:.1f}%')
ax.set_xlabel('Lift (B-A)/A')
ax.set_ylabel('Density')
ax.set_title('Lift distribution')
ax.legend()
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
# Example: Website button color A/B test
ab_test = BayesianABTest(alpha_prior=1, beta_prior=1)
ab_test.fit(
successes_A=48, trials_A=500, # Original button: 48/500 clicks
successes_B=63, trials_B=500 # New button: 63/500 clicks
)
ab_test.summary()
ab_test.plot()
6.2 Bayesian Approach to Quality Control¶
def bayesian_quality_control():
"""Bayesian analysis for manufacturing quality control"""
# Scenario: Defect rate estimation
# Prior knowledge: Defect rate about 5% from past experience, moderate confidence
# Prior: Beta(2, 38) → Mean 5%, equivalent information from sample size 40
alpha_prior = 2
beta_prior = 38
prior = stats.beta(alpha_prior, beta_prior)
print("=== Bayesian Analysis for Manufacturing Quality Control ===")
print(f"Prior: Beta({alpha_prior}, {beta_prior})")
print(f"Prior mean (expected defect rate): {prior.mean()*100:.1f}%")
print(f"Prior 95% interval: ({prior.ppf(0.025)*100:.2f}%, {prior.ppf(0.975)*100:.2f}%)")
# New data: 8 defects out of 100 samples
n_samples = 100
n_defects = 8
# Posterior
alpha_post = alpha_prior + n_defects
beta_post = beta_prior + (n_samples - n_defects)
posterior = stats.beta(alpha_post, beta_post)
print(f"\nData: {n_defects}/{n_samples} defects")
print(f"Posterior: Beta({alpha_post}, {beta_post})")
print(f"Posterior mean (estimated defect rate): {posterior.mean()*100:.2f}%")
ci = posterior.interval(0.95)
print(f"Posterior 95% credible interval: ({ci[0]*100:.2f}%, {ci[1]*100:.2f}%)")
# Decision: Probability of defect rate exceeding 10%
prob_exceed_10 = 1 - posterior.cdf(0.10)
print(f"\nP(defect rate > 10%): {prob_exceed_10*100:.2f}%")
if prob_exceed_10 > 0.05:
print("⚠️ Recommendation: Quality inspection needed (>10% probability exceeds 5%)")
else:
print("✓ Normal: Quality standard met")
# Visualization
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
p_range = np.linspace(0, 0.25, 200)
# Prior/posterior comparison
ax = axes[0]
ax.plot(p_range, prior.pdf(p_range), 'g--', lw=2, label='Prior')
ax.plot(p_range, posterior.pdf(p_range), 'b-', lw=2, label='Posterior')
ax.fill_between(p_range, posterior.pdf(p_range), alpha=0.3)
ax.axvline(0.10, color='r', linestyle=':', lw=2, label='Threshold (10%)')
ax.axvline(posterior.mean(), color='purple', linestyle='--',
label=f'Post Mean = {posterior.mean()*100:.1f}%')
ax.set_xlabel('Defect rate')
ax.set_ylabel('Density')
ax.set_title('Prior/Posterior distributions of defect rate')
ax.legend()
ax.grid(True, alpha=0.3)
# Probability of exceeding threshold
ax = axes[1]
thresholds = np.linspace(0.01, 0.20, 50)
exceed_probs = [1 - posterior.cdf(t) for t in thresholds]
ax.plot(thresholds * 100, exceed_probs, 'b-', lw=2)
ax.axvline(10, color='r', linestyle='--', label='Current threshold (10%)')
ax.axhline(0.05, color='orange', linestyle=':', label='Significance level (5%)')
ax.set_xlabel('Defect rate threshold (%)')
ax.set_ylabel('Probability of exceedance')
ax.set_title('Probability of exceeding threshold by defect rate')
ax.legend()
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
bayesian_quality_control()
7. Practice Problems¶
Problem 1: Apply Bayes' Theorem¶
You want to implement a spam filter. - P(spam) = 0.3 (30% are spam) - P("free"|spam) = 0.8 (80% of spam contains "free") - P("free"|legitimate) = 0.1 (10% of legitimate emails contain "free")
Calculate the probability that an email containing "free" is spam.
Problem 2: Choose Conjugate Prior¶
Select appropriate conjugate priors for the following situations and explain: 1. Estimating website click-through rate 2. Estimating call center calls per hour 3. Estimating average product weight (variance is known)
Problem 3: Prior Sensitivity Analysis¶
Perform MAP estimation with the following priors for the same data (100 trials, 30 successes): 1. Beta(1, 1) 2. Beta(5, 5) 3. Beta(1, 9) 4. Beta(9, 1)
Compare MAP estimates with MLE for each case and analyze the influence of priors on results.
Problem 4: Sequential Update¶
10 customers visit daily, and purchase data for 3 days is [3, 5, 4]. Starting with Beta(1, 1) prior, update posterior with each day's data. Find the mean and 95% credible interval of the final posterior.
8. Key Summary¶
Core Concepts of Bayesian Statistics¶
- Meaning of probability: Degree of uncertainty (belief)
- Bayes' theorem: Prior knowledge + Data → Posterior knowledge
- Conjugate priors: Closed-form posterior calculation possible
Major Conjugate Pairs¶
Binomial + Beta → Beta
Poisson + Gamma → Gamma
Normal(μ, σ²_known) + Normal → Normal
Comparison of Estimation Methods¶
| Method | Objective | Feature |
|---|---|---|
| MLE | max P(D|θ) | Uses data only |
| MAP | max P(θ|D) | Reflects prior |
| Posterior mean | E[θ|D] | Reflects full uncertainty |
Preview of Next Chapter¶
Chapter 09 Bayesian Inference covers: - MCMC methods (Metropolis-Hastings, Gibbs Sampling) - Bayesian modeling with PyMC - Bayesian regression analysis - Model comparison and selection