24. Experimental Design¶
Overview¶
Experimental design is a systematic methodology for inferring causal relationships. In this chapter we cover the basic principles of experimental design, A/B testing, sample size determination via power analysis, and sequential testing methods.
1. Basic Principles of Experimental Design¶
1.1 Three Core Principles¶
import numpy as np
import pandas as pd
import scipy.stats as stats
import matplotlib.pyplot as plt
from scipy.stats import norm, t
np.random.seed(42)
def experimental_design_principles():
    """The three core principles of experimental design"""
    print("""
    =================================================
    Three Core Principles of Experimental Design
    =================================================
    1. Randomization
    -------------------------------------------------
       - Assign subjects to treatment groups at random
       - Spreads the influence of confounding variables evenly
       - The foundation of causal inference
       Examples:
       - Coin flip to assign A/B groups
       - Computer-generated random numbers
       - Block randomization (stratify, then randomize)

    2. Replication
    -------------------------------------------------
       - A sufficient number of independent observations
       - Secures statistical power
       - Makes variability estimable
       Considerations:
       - Sample size calculation (power analysis)
       - Cost versus benefit
       - Practical constraints

    3. Blocking
    -------------------------------------------------
       - Group subjects by known sources of variation
       - Randomize within each group
       - Reduces error variance, raises power
       Examples:
       - Block by gender, then randomize within each block
       - Stratify by age group
       - Region, time of day, etc.
    =================================================
    Additional Principles
    =================================================
    - Control: include a control group
    - Blinding: single/double blinding
    - Balance: equal allocation across groups
    """)

experimental_design_principles()
1.2 Implementing Randomization¶
def randomize_participants(participants, n_groups=2, method='simple', block_var=None):
    """
    Randomize participants into groups.

    Parameters
    ----------
    participants : DataFrame
        Participant information
    n_groups : int
        Number of groups
    method : str
        'simple'     - simple randomization
        'stratified' - stratified randomization
    block_var : str
        Stratification variable (required when method='stratified')
    """
    n = len(participants)
    result = participants.copy()
    if method == 'simple':
        # Simple random assignment
        assignments = np.random.choice(range(n_groups), size=n)
        result['group'] = assignments
    elif method == 'stratified' and block_var is not None:
        # Stratified random assignment: randomize within each stratum
        result['group'] = -1
        for block_value in participants[block_var].unique():
            mask = participants[block_var] == block_value
            block_n = mask.sum()
            assignments = np.random.choice(range(n_groups), size=block_n)
            result.loc[mask, 'group'] = assignments
    return result

# Example: 100 participants
np.random.seed(42)
participants = pd.DataFrame({
    'id': range(100),
    'age': np.random.choice(['young', 'middle', 'old'], 100),
    'gender': np.random.choice(['M', 'F'], 100)
})

# Simple randomization
simple_rand = randomize_participants(participants, n_groups=2, method='simple')

# Stratified randomization (by gender)
stratified_rand = randomize_participants(participants, n_groups=2,
                                         method='stratified', block_var='gender')

print("=== Simple randomization ===")
print(pd.crosstab(simple_rand['gender'], simple_rand['group']))
print("\n=== Stratified randomization (by gender) ===")
print(pd.crosstab(stratified_rand['gender'], stratified_rand['group']))
1.3 Types of Experimental Designs¶
def experimental_design_types():
    """Major types of experimental designs"""
    print("""
    =================================================
    Types of Experimental Designs
    =================================================
    1. Completely Randomized Design
       - The simplest design
       - Subjects assigned to treatments completely at random
       - Analysis: independent-samples t-test, one-way ANOVA

    2. Randomized Block Design
       - Stratify by a blocking variable, then randomize
       - Every treatment level appears within each block
       - Analysis: two-way ANOVA (removes the block effect)

    3. Factorial Design
       - Studies the joint effect of several factors
       - Can detect interaction effects
       - Analysis: multi-way ANOVA

    4. Crossover Design
       - Each subject receives every treatment in sequence
       - Controls between-subject variation
       - Beware of carryover effects

    5. Split-Plot Design
       - One factor applied to whole plots, another to subplots
       - Common in agriculture and engineering
    """)

experimental_design_types()
2. A/B Testing Theory¶
2.1 A/B Testing Overview¶
def ab_test_overview():
    """A/B testing overview"""
    print("""
    =================================================
    A/B Testing
    =================================================
    Definition:
    - A randomized controlled experiment comparing the effect of two versions (A, B)
    - The most widely used experimental method on the web and in apps

    Terminology:
    - Control (A): existing version (control group)
    - Treatment (B): new version (treatment group)
    - Conversion Rate: fraction of users taking the target action
    - Lift: (B - A) / A

    Process:
    1. State the hypothesis
    2. Define the metric
    3. Calculate the sample size
    4. Run the experiment
    5. Analyze statistically
    6. Make the decision

    Cautions:
    - Consistent unit of analysis (user vs. session vs. pageview)
    - Experiment duration (at least 1-2 weeks; account for day-of-week effects)
    - Multiple-comparison correction
    - Network effects (spillover)
    """)

ab_test_overview()
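One check worth running before reading any metric: verify that the realized traffic split matches the intended allocation. A Sample Ratio Mismatch (SRM) usually signals a broken assignment or logging pipeline. A minimal sketch with made-up visitor counts, using a chi-square goodness-of-fit test (the strict `alpha=0.001` threshold is a common convention, not a universal rule):

```python
import numpy as np
from scipy.stats import chisquare

def srm_check(n_control, n_treatment, expected_ratio=0.5, alpha=0.001):
    """Chi-square goodness-of-fit test for Sample Ratio Mismatch.

    A very small p-value means the realized split is unlikely under the
    intended allocation -- investigate the assignment pipeline before
    trusting any metric.
    """
    total = n_control + n_treatment
    expected = [total * expected_ratio, total * (1 - expected_ratio)]
    _, p = chisquare([n_control, n_treatment], f_exp=expected)
    return p, p < alpha

p_ok, flagged_ok = srm_check(10000, 10050)   # mild imbalance: expected noise
p_bad, flagged_bad = srm_check(10000, 9000)  # 10% shortfall: almost surely a bug
print(f"balanced split: p={p_ok:.3f}, flagged={flagged_ok}")
print(f"skewed split:   p={p_bad:.2e}, flagged={flagged_bad}")
```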
2.2 A/B Test Analysis¶
class ABTest:
    """A/B test analysis class"""

    def __init__(self, control_visitors, control_conversions,
                 treatment_visitors, treatment_conversions):
        self.n_c = control_visitors
        self.x_c = control_conversions
        self.n_t = treatment_visitors
        self.x_t = treatment_conversions
        self.p_c = self.x_c / self.n_c
        self.p_t = self.x_t / self.n_t

    def z_test(self, alternative='two-sided'):
        """Two-proportion z-test"""
        # Pooled proportion
        p_pooled = (self.x_c + self.x_t) / (self.n_c + self.n_t)
        # Standard error
        se = np.sqrt(p_pooled * (1 - p_pooled) * (1/self.n_c + 1/self.n_t))
        # Z statistic
        z = (self.p_t - self.p_c) / se
        # p-value
        if alternative == 'two-sided':
            p_value = 2 * (1 - norm.cdf(abs(z)))
        elif alternative == 'greater':  # treatment > control
            p_value = 1 - norm.cdf(z)
        else:  # treatment < control
            p_value = norm.cdf(z)
        return z, p_value

    def confidence_interval(self, alpha=0.05):
        """Confidence interval for the difference"""
        diff = self.p_t - self.p_c
        # Variance of each proportion (unpooled)
        var_c = self.p_c * (1 - self.p_c) / self.n_c
        var_t = self.p_t * (1 - self.p_t) / self.n_t
        se = np.sqrt(var_c + var_t)
        z_crit = norm.ppf(1 - alpha/2)
        ci_lower = diff - z_crit * se
        ci_upper = diff + z_crit * se
        return diff, (ci_lower, ci_upper)

    def lift(self):
        """Relative lift"""
        if self.p_c == 0:
            return np.inf
        return (self.p_t - self.p_c) / self.p_c

    def summary(self):
        """Print a summary of the results"""
        print("=== A/B Test Summary ===")
        print(f"\nControl:   {self.x_c:,}/{self.n_c:,} = {self.p_c:.4f} ({self.p_c*100:.2f}%)")
        print(f"Treatment: {self.x_t:,}/{self.n_t:,} = {self.p_t:.4f} ({self.p_t*100:.2f}%)")
        z, p_value = self.z_test()
        diff, ci = self.confidence_interval()
        lift = self.lift()
        print(f"\nDifference: {diff:.4f} ({diff*100:.2f}%p)")
        print(f"Lift: {lift*100:.2f}%")
        print(f"95% CI: ({ci[0]*100:.2f}%p, {ci[1]*100:.2f}%p)")
        print(f"\nZ statistic: {z:.3f}")
        print(f"p-value: {p_value:.4f}")
        if p_value < 0.05:
            print("\nConclusion: statistically significant difference (p < 0.05)")
            if diff > 0:
                print("Treatment is significantly higher than Control")
            else:
                print("Treatment is significantly lower than Control")
        else:
            print("\nConclusion: no statistically significant difference (p >= 0.05)")

# Example: button-color A/B test
ab_test = ABTest(
    control_visitors=10000,
    control_conversions=350,
    treatment_visitors=10000,
    treatment_conversions=420
)
ab_test.summary()
# Visualization
fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# Conversion rate comparison
ax = axes[0]
bars = ax.bar(['Control', 'Treatment'], [ab_test.p_c, ab_test.p_t], alpha=0.7)
ax.set_ylabel('Conversion rate')
ax.set_title('A/B test: conversion rates')

# Add error bars (95% CI for each proportion)
se_c = np.sqrt(ab_test.p_c * (1 - ab_test.p_c) / ab_test.n_c)
se_t = np.sqrt(ab_test.p_t * (1 - ab_test.p_t) / ab_test.n_t)
ax.errorbar(['Control', 'Treatment'], [ab_test.p_c, ab_test.p_t],
            yerr=[1.96*se_c, 1.96*se_t], fmt='none', color='black', capsize=5)
ax.grid(True, alpha=0.3, axis='y')

# Confidence interval for the difference
ax = axes[1]
diff, ci = ab_test.confidence_interval()
ax.errorbar([0], [diff], yerr=[[diff - ci[0]], [ci[1] - diff]],
            fmt='o', markersize=10, capsize=10, capthick=2)
ax.axhline(0, color='r', linestyle='--', label='No difference')
ax.set_xlim(-1, 1)
ax.set_ylabel('Difference in conversion rate')
ax.set_title(f'95% CI for the difference\n({ci[0]:.4f}, {ci[1]:.4f})')
ax.set_xticks([])
ax.legend()
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
2.3 Bayesian A/B Testing¶
def bayesian_ab_test(n_c, x_c, n_t, x_t, alpha_prior=1, beta_prior=1, n_samples=100000):
    """
    Bayesian A/B test.

    Estimates the conversion rates using a Beta prior.
    """
    # Posterior (Beta-Binomial conjugacy)
    alpha_c = alpha_prior + x_c
    beta_c = beta_prior + n_c - x_c
    alpha_t = alpha_prior + x_t
    beta_t = beta_prior + n_t - x_t

    # Sample from the posteriors
    samples_c = np.random.beta(alpha_c, beta_c, n_samples)
    samples_t = np.random.beta(alpha_t, beta_t, n_samples)

    # P(Treatment > Control)
    prob_t_better = np.mean(samples_t > samples_c)

    # Expected lift
    lift_samples = (samples_t - samples_c) / samples_c
    expected_lift = np.mean(lift_samples)
    lift_ci = np.percentile(lift_samples, [2.5, 97.5])

    print("=== Bayesian A/B Test ===")
    print(f"\nP(Treatment > Control): {prob_t_better:.4f} ({prob_t_better*100:.1f}%)")
    print(f"Expected lift: {expected_lift*100:.2f}%")
    print(f"Lift 95% CI: ({lift_ci[0]*100:.2f}%, {lift_ci[1]*100:.2f}%)")

    # Decision criterion
    print("\nDecision:")
    if prob_t_better > 0.95:
        print("  -> Adopt Treatment (P > 95%)")
    elif prob_t_better < 0.05:
        print("  -> Keep Control (P < 5%)")
    else:
        print("  -> Collect more data")

    # Visualization
    fig, axes = plt.subplots(1, 3, figsize=(15, 4))

    # Posterior comparison
    ax = axes[0]
    x_range = np.linspace(0, 0.1, 200)
    ax.plot(x_range, stats.beta(alpha_c, beta_c).pdf(x_range), label='Control')
    ax.plot(x_range, stats.beta(alpha_t, beta_t).pdf(x_range), label='Treatment')
    ax.fill_between(x_range, stats.beta(alpha_c, beta_c).pdf(x_range), alpha=0.3)
    ax.fill_between(x_range, stats.beta(alpha_t, beta_t).pdf(x_range), alpha=0.3)
    ax.set_xlabel('Conversion rate')
    ax.set_ylabel('Density')
    ax.set_title('Posterior of the conversion rates')
    ax.legend()
    ax.grid(True, alpha=0.3)

    # Distribution of the difference
    ax = axes[1]
    diff_samples = samples_t - samples_c
    ax.hist(diff_samples, bins=50, density=True, alpha=0.7, edgecolor='black')
    ax.axvline(0, color='r', linestyle='--', label='No difference')
    ax.axvline(np.mean(diff_samples), color='g', linestyle='-',
               label=f'Mean: {np.mean(diff_samples):.4f}')
    ax.set_xlabel('Difference in conversion rate (T - C)')
    ax.set_ylabel('Density')
    ax.set_title(f'Posterior of the difference\nP(T>C)={prob_t_better:.3f}')
    ax.legend()
    ax.grid(True, alpha=0.3)

    # Lift distribution
    ax = axes[2]
    lift_samples_clipped = np.clip(lift_samples, -1, 2)
    ax.hist(lift_samples_clipped, bins=50, density=True, alpha=0.7, edgecolor='black')
    ax.axvline(0, color='r', linestyle='--', label='0%')
    ax.axvline(expected_lift, color='g', linestyle='-',
               label=f'Expected: {expected_lift*100:.1f}%')
    ax.set_xlabel('Lift')
    ax.set_ylabel('Density')
    ax.set_title('Posterior of the lift')
    ax.legend()
    ax.grid(True, alpha=0.3)
    plt.tight_layout()
    plt.show()

    return prob_t_better, expected_lift

# Bayesian analysis
prob_better, exp_lift = bayesian_ab_test(10000, 350, 10000, 420)
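Besides P(Treatment > Control), Bayesian practitioners often decide on *expected loss*: the conversion rate you expect to give up if you pick the wrong variant. The sketch below reuses the 350/10,000 vs 420/10,000 example and estimates both losses from the same Beta posteriors (the decision tolerance mentioned in the comment is an illustrative business choice, not a statistical constant):

```python
import numpy as np

rng = np.random.default_rng(42)

def expected_loss(n_c, x_c, n_t, x_t, n_samples=200_000, a0=1, b0=1):
    """Expected loss (in conversion-rate points) of each decision,
    estimated from Beta(a0, b0) posteriors."""
    pc = rng.beta(a0 + x_c, b0 + n_c - x_c, n_samples)
    pt = rng.beta(a0 + x_t, b0 + n_t - x_t, n_samples)
    loss_keep_control = np.maximum(pt - pc, 0).mean()    # regret if we keep Control
    loss_ship_treatment = np.maximum(pc - pt, 0).mean()  # regret if we ship Treatment
    return loss_keep_control, loss_ship_treatment

loss_c, loss_t = expected_loss(10000, 350, 10000, 420)
print(f"expected loss if we keep Control:   {loss_c:.5f}")
print(f"expected loss if we ship Treatment: {loss_t:.5f}")
# Decision rule: ship Treatment once its expected loss falls below a
# tolerance, e.g. 0.0001 (0.01%p) -- the tolerance is a business choice.
```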
3. Sample Size Determination (Power Analysis)¶
3.1 Power Analysis Concepts¶
def power_analysis_concepts():
    """Core concepts of power analysis"""
    print("""
    =================================================
    Power Analysis
    =================================================
    Four quantities (any one can be solved from the other three):
    -------------------------------------------------
    1. Effect size
       - The smallest effect you want to detect
       - e.g. a conversion-rate difference of 0.02 (2%p)
    2. Significance level alpha
       - Probability of a Type I error
       - Typically 0.05
    3. Power 1 - beta
       - Probability of detecting the effect when it exists
       - Typically 0.80 (minimum) to 0.90
    4. Sample size n
       - Number of observations needed

    Calculation flows:
    -------------------------------------------------
    effect size + alpha + power -> n               (a priori design)
    n + alpha + power -> minimum detectable effect (sensitivity analysis)
    n + alpha + effect size -> achieved power      (post hoc analysis)

    Rules of thumb:
    -------------------------------------------------
    - Power below 80%: underpowered
    - Power 80-90%: generally recommended
    - Power above 90%: high-powered study
    """)

power_analysis_concepts()
3.2 Sample Size for Comparing Two Proportions¶
def sample_size_two_proportions(p1, p2, alpha=0.05, power=0.80, ratio=1):
    """
    Sample size for comparing two proportions.

    Parameters
    ----------
    p1 : float
        Control conversion rate (baseline)
    p2 : float
        Treatment conversion rate (target)
    alpha : float
        Significance level
    power : float
        Statistical power
    ratio : float
        n2/n1 ratio (default 1 = equal group sizes)

    Returns
    -------
    n1, n2 : int
        Required sample size for each group
    """
    # Effect size
    effect = abs(p2 - p1)
    p_pooled = (p1 + ratio * p2) / (1 + ratio)
    # Z values
    z_alpha = norm.ppf(1 - alpha/2)  # two-sided
    z_beta = norm.ppf(power)
    # Sample size formula
    numerator = (z_alpha * np.sqrt((1 + ratio) * p_pooled * (1 - p_pooled)) +
                 z_beta * np.sqrt(p1 * (1 - p1) + ratio * p2 * (1 - p2)))**2
    n1 = numerator / (effect**2 * ratio)
    n2 = n1 * ratio
    return int(np.ceil(n1)), int(np.ceil(n2))
def plot_sample_size_analysis(p1_base, effects, alpha=0.05, power=0.80):
    """Required sample size as a function of effect size and power"""
    fig, axes = plt.subplots(1, 2, figsize=(14, 5))

    # Effect size vs. sample size
    ax = axes[0]
    sample_sizes = []
    for effect in effects:
        p2 = p1_base + effect
        n1, _ = sample_size_two_proportions(p1_base, p2, alpha, power)
        sample_sizes.append(n1)
    ax.plot(np.array(effects)*100, sample_sizes, 'bo-', linewidth=2)
    ax.set_xlabel('Effect size (conversion-rate difference, %p)')
    ax.set_ylabel('Required sample size per group')
    ax.set_title(f'Effect size vs. sample size\n(baseline rate={p1_base:.1%}, alpha={alpha}, power={power})')
    ax.grid(True, alpha=0.3)
    ax.set_yscale('log')  # log scale
    for i, (eff, n) in enumerate(zip(effects, sample_sizes)):
        ax.annotate(f'{n:,}', (eff*100, n), textcoords="offset points",
                    xytext=(0, 10), ha='center', fontsize=9)

    # Power vs. sample size
    ax = axes[1]
    effect_fixed = 0.02  # fixed at 2%p
    p2_fixed = p1_base + effect_fixed
    powers = np.linspace(0.5, 0.95, 10)
    sample_sizes_power = []
    for pwr in powers:
        n1, _ = sample_size_two_proportions(p1_base, p2_fixed, alpha, pwr)
        sample_sizes_power.append(n1)
    ax.plot(powers*100, sample_sizes_power, 'go-', linewidth=2)
    ax.set_xlabel('Power (%)')
    ax.set_ylabel('Required sample size per group')
    ax.set_title(f'Power vs. sample size\n(effect size={effect_fixed:.1%}p)')
    ax.grid(True, alpha=0.3)
    plt.tight_layout()
    plt.show()

# Example
p1 = 0.05  # baseline conversion rate: 5%
effects = [0.005, 0.01, 0.015, 0.02, 0.025, 0.03]  # 0.5%p to 3%p

print("=== Sample size calculation ===")
print(f"Baseline conversion rate: {p1:.1%}")
print(f"alpha = 0.05, Power = 0.80")
print()
for effect in effects:
    p2 = p1 + effect
    n1, n2 = sample_size_two_proportions(p1, p2)
    print(f"Effect {effect*100:.1f}%p (relative {effect/p1*100:.0f}%): n1={n1:,}, n2={n2:,}, total={n1+n2:,}")

plot_sample_size_analysis(p1, effects)
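A quick way to sanity-check the closed-form sample size is to simulate: for a 5% vs 7% comparison, the formula above gives roughly 2,213 per group, and repeated synthetic experiments at that n should reject H0 about 80% of the time. A sketch (the simulation count and seed are arbitrary):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(7)
p1, p2, n, alpha = 0.05, 0.07, 2213, 0.05  # n per group from the formula above
sims = 4000

# Simulate `sims` experiments under the alternative and run the pooled z-test
x_c = rng.binomial(n, p1, sims)
x_t = rng.binomial(n, p2, sims)
p_c, p_t = x_c / n, x_t / n
p_pool = (x_c + x_t) / (2 * n)
se = np.sqrt(p_pool * (1 - p_pool) * 2 / n)
z = (p_t - p_c) / se
power_hat = np.mean(2 * (1 - norm.cdf(np.abs(z))) < alpha)
print(f"simulated power at n={n} per group: {power_hat:.3f}")
```

The estimate carries Monte Carlo noise of about ±0.006, so anything near 0.80 confirms the formula.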
3.3 Power Analysis with statsmodels¶
from statsmodels.stats.power import TTestPower, NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

def statsmodels_power_analysis():
    """Power analysis using statsmodels"""
    # 1. Power analysis for a t-test
    print("=== t-test power analysis ===")
    # Effect size (Cohen's d)
    # d = (mu1 - mu2) / sigma
    mean_diff = 5
    std = 15
    d = mean_diff / std
    print(f"Cohen's d = {d:.3f}")

    # Required sample size
    power_analysis = TTestPower()
    n = power_analysis.solve_power(effect_size=d, alpha=0.05, power=0.80,
                                   alternative='two-sided')
    print(f"Required sample size (per group): {int(np.ceil(n))}")

    # Achieved power
    achieved_power = power_analysis.power(effect_size=d, nobs=100, alpha=0.05,
                                          alternative='two-sided')
    print(f"Power at n=100: {achieved_power:.3f}")

    # 2. Power analysis for proportions
    print("\n=== Proportion-test power analysis ===")
    p1 = 0.05
    p2 = 0.07
    effect = proportion_effectsize(p1, p2)
    print(f"Effect size (Cohen's h): {effect:.3f}")

    # Required sample size
    power_prop = NormalIndPower()
    n_prop = power_prop.solve_power(effect_size=effect, alpha=0.05, power=0.80,
                                    alternative='two-sided', ratio=1)
    print(f"Required sample size (per group): {int(np.ceil(n_prop))}")
    return n, n_prop

n_t, n_prop = statsmodels_power_analysis()
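The same `NormalIndPower` object can also run the sensitivity-analysis flow from section 3.1: fix n, alpha, and power, and solve for the smallest detectable effect. The helper below is a sketch (the name `minimum_detectable_rate` is ours); it solves for Cohen's h and then inverts the arcsine transform back to a conversion rate:

```python
import numpy as np
from statsmodels.stats.power import NormalIndPower

def minimum_detectable_rate(p1, n_per_group, alpha=0.05, power=0.80):
    """Smallest treatment rate p2 (> p1) detectable at the given n.

    Solves for Cohen's h, then inverts the arcsine transform
    h = 2*asin(sqrt(p2)) - 2*asin(sqrt(p1)).
    """
    h = NormalIndPower().solve_power(effect_size=None, nobs1=n_per_group,
                                     alpha=alpha, power=power, ratio=1,
                                     alternative='two-sided')
    return np.sin(np.arcsin(np.sqrt(p1)) + h / 2) ** 2

p2_min = minimum_detectable_rate(0.05, 5000)
print(f"n=5,000 per group, 5.0% baseline -> MDE up to about {p2_min:.4f}")
```

With 5,000 visitors per group and a 5% baseline, only lifts to roughly 6.3% or more are reliably detectable; anything smaller needs more traffic.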
3.4 Power Curves¶
def plot_power_curve(effect_sizes, n_per_group, alpha=0.05):
    """Visualize power curves"""
    power_analysis = NormalIndPower()
    fig, axes = plt.subplots(1, 2, figsize=(14, 5))

    # Effect size vs. power (n fixed)
    ax = axes[0]
    for n in n_per_group:
        powers = [power_analysis.power(effect_size=es, nobs1=n, alpha=alpha,
                                       alternative='two-sided', ratio=1)
                  for es in effect_sizes]
        ax.plot(effect_sizes, powers, '-o', label=f'n={n}')
    ax.axhline(0.80, color='r', linestyle='--', alpha=0.5, label='Power=0.80')
    ax.set_xlabel("Effect size (Cohen's h)")
    ax.set_ylabel('Power')
    ax.set_title('Power curves by sample size')
    ax.legend()
    ax.grid(True, alpha=0.3)
    ax.set_ylim(0, 1)

    # Sample size vs. power (effect size fixed)
    ax = axes[1]
    n_range = np.arange(50, 1001, 50)
    effect_fixed = [0.1, 0.2, 0.3, 0.5]
    for es in effect_fixed:
        powers = [power_analysis.power(effect_size=es, nobs1=n, alpha=alpha,
                                       alternative='two-sided', ratio=1)
                  for n in n_range]
        ax.plot(n_range, powers, '-', label=f'h={es}')
    ax.axhline(0.80, color='r', linestyle='--', alpha=0.5, label='Power=0.80')
    ax.set_xlabel('Sample size per group')
    ax.set_ylabel('Power')
    ax.set_title('Power curves by effect size')
    ax.legend()
    ax.grid(True, alpha=0.3)
    ax.set_ylim(0, 1)
    plt.tight_layout()
    plt.show()

effect_sizes = np.linspace(0.05, 0.5, 20)
n_per_group = [50, 100, 200, 500]
plot_power_curve(effect_sizes, n_per_group)
4. Sequential Testing¶
4.1 Why Sequential Testing?¶
def sequential_testing_motivation():
    """Why sequential testing is needed"""
    print("""
    =================================================
    Sequential Testing
    =================================================
    Problem: the peeking problem
    -------------------------------------------------
    - Checking A/B test results mid-experiment inflates the Type I error rate
    - Even with alpha=0.05 by design, 5 interim looks push the actual
      error rate to roughly 14%

    Example:
    - 1 look:   alpha = 0.05
    - 5 looks:  alpha ~ 0.14
    - 10 looks: alpha ~ 0.19

    Solutions:
    -------------------------------------------------
    1. Fixed-sample test: wait until the pre-specified n is reached
    2. Sequential tests: corrections that permit interim looks
       - O'Brien-Fleming
       - Pocock
       - Alpha spending functions

    Advantages:
    - Early stopping when the effect is clear -> saves cost and time
    - Fast termination when there is no effect
    - Preserves statistical validity
    """)

sequential_testing_motivation()
4.2 Simulating the Peeking Problem¶
def simulate_peeking_problem(n_simulations=10000, n_total=1000, n_looks=5):
    """
    Peeking-problem simulation:
    when the null hypothesis is true (no real difference), how often
    does the test come out significant?
    """
    np.random.seed(42)
    alpha = 0.05
    # Interim look points
    look_points = np.linspace(n_total // n_looks, n_total, n_looks).astype(int)
    false_positives_fixed = 0    # fixed sample (check only at the end)
    false_positives_peeking = 0  # check at every look

    for _ in range(n_simulations):
        # Generate data under the null (both groups identical)
        control = np.random.binomial(1, 0.1, n_total)
        treatment = np.random.binomial(1, 0.1, n_total)

        # Peeking: test at every look point
        for look in look_points:
            x_c = control[:look].sum()
            x_t = treatment[:look].sum()
            n = look
            p_c = x_c / n
            p_t = x_t / n
            p_pooled = (x_c + x_t) / (2 * n)
            if 0 < p_pooled < 1:
                se = np.sqrt(p_pooled * (1 - p_pooled) * 2 / n)
                z = (p_t - p_c) / se if se > 0 else 0
                p_value = 2 * (1 - norm.cdf(abs(z)))
                if p_value < alpha:
                    false_positives_peeking += 1
                    break  # stop as soon as any look is significant

        # Fixed sample: check only at the end
        x_c = control.sum()
        x_t = treatment.sum()
        p_c = x_c / n_total
        p_t = x_t / n_total
        p_pooled = (x_c + x_t) / (2 * n_total)
        if 0 < p_pooled < 1:
            se = np.sqrt(p_pooled * (1 - p_pooled) * 2 / n_total)
            z = (p_t - p_c) / se if se > 0 else 0
            p_value = 2 * (1 - norm.cdf(abs(z)))
            if p_value < alpha:
                false_positives_fixed += 1

    fpr_fixed = false_positives_fixed / n_simulations
    fpr_peeking = false_positives_peeking / n_simulations

    print("=== Peeking-problem simulation ===")
    print(f"Number of simulations: {n_simulations:,}")
    print(f"Total sample size: {n_total}")
    print(f"Number of interim looks: {n_looks}")
    print(f"Target alpha: {alpha}")
    print(f"\nFixed-sample false positive rate: {fpr_fixed:.4f} ({fpr_fixed*100:.2f}%)")
    print(f"Peeking false positive rate: {fpr_peeking:.4f} ({fpr_peeking*100:.2f}%)")
    print(f"False-positive inflation: {(fpr_peeking/alpha - 1)*100:.1f}%")
    return fpr_fixed, fpr_peeking

fpr_fixed, fpr_peeking = simulate_peeking_problem()
4.3 Alpha Spending Functions¶
def alpha_spending_pocock(t, alpha=0.05):
    """Pocock-type alpha spending function"""
    return alpha * np.log(1 + (np.e - 1) * t)

def alpha_spending_obrien_fleming(t, alpha=0.05):
    """O'Brien-Fleming-type alpha spending function"""
    return 2 * (1 - norm.cdf(norm.ppf(1 - alpha/2) / np.sqrt(t)))

def plot_alpha_spending():
    """Visualize alpha spending functions"""
    t = np.linspace(0.01, 1, 100)
    alpha = 0.05
    fig, ax = plt.subplots(figsize=(10, 6))
    ax.plot(t, alpha_spending_pocock(t, alpha), label='Pocock', linewidth=2)
    ax.plot(t, alpha_spending_obrien_fleming(t, alpha), label="O'Brien-Fleming", linewidth=2)
    ax.plot(t, t * alpha, '--', label='Linear (reference)', alpha=0.5)
    ax.axhline(alpha, color='r', linestyle=':', label=f'Total alpha={alpha}')
    ax.set_xlabel('Information fraction (current/final)')
    ax.set_ylabel('Cumulative alpha spent')
    ax.set_title('Alpha Spending Functions')
    ax.legend()
    ax.grid(True, alpha=0.3)
    ax.set_xlim(0, 1)
    ax.set_ylim(0, alpha * 1.1)
    plt.show()

print("=== Comparing alpha spending functions ===")
print("\nPocock:")
print("  - Roughly equal thresholds at every look")
print("  - More generous toward early stopping")
print("  - More conservative at the final analysis")
print("\nO'Brien-Fleming:")
print("  - Very conservative early (high thresholds)")
print("  - Similar to a fixed-sample test late")
print("  - Stops early only for extreme effects")

plot_alpha_spending()
4.4 Implementing a Sequential Test¶
class SequentialTest:
    """Sequential A/B test"""

    def __init__(self, max_n, n_looks, alpha=0.05, spending='obrien_fleming'):
        """
        Parameters
        ----------
        max_n : int
            Maximum sample size (per group)
        n_looks : int
            Number of interim analyses
        alpha : float
            Overall significance level
        spending : str
            'pocock' or 'obrien_fleming'
        """
        self.max_n = max_n
        self.n_looks = n_looks
        self.alpha = alpha
        self.spending = spending
        # Analysis time points (information fractions)
        self.look_times = np.linspace(1/n_looks, 1, n_looks)
        # Alpha to spend at each analysis
        self.alphas = self._compute_alphas()

    def _compute_alphas(self):
        """Incremental alpha at each analysis point"""
        if self.spending == 'pocock':
            cumulative = [alpha_spending_pocock(t, self.alpha) for t in self.look_times]
        else:
            cumulative = [alpha_spending_obrien_fleming(t, self.alpha) for t in self.look_times]
        # Incremental alpha
        alphas = [cumulative[0]]
        for i in range(1, len(cumulative)):
            alphas.append(cumulative[i] - cumulative[i-1])
        return alphas

    def critical_values(self):
        """Critical Z value at each analysis"""
        return [norm.ppf(1 - a/2) for a in self.alphas]

    def summary(self):
        """Summarize the analysis plan"""
        print("=== Sequential test plan ===")
        print(f"Maximum sample: {self.max_n} (per group)")
        print(f"Interim analyses: {self.n_looks}")
        print(f"Overall alpha: {self.alpha}")
        print(f"Spending: {self.spending}")
        print("\nPlan by analysis point:")
        print("-" * 50)
        print(f"{'Look':<6} {'n':<10} {'cum. alpha':<12} {'inc. alpha':<12} {'Z crit':<10}")
        print("-" * 50)
        cumulative_alpha = 0
        z_crits = self.critical_values()
        for i, (t, a) in enumerate(zip(self.look_times, self.alphas)):
            n = int(t * self.max_n)
            cumulative_alpha += a
            print(f"{i+1:<6} {n:<10} {cumulative_alpha:<12.4f} {a:<12.4f} {z_crits[i]:<10.3f}")

# Example
seq_test = SequentialTest(max_n=5000, n_looks=5, alpha=0.05, spending='obrien_fleming')
seq_test.summary()
print("\n")
seq_test_pocock = SequentialTest(max_n=5000, n_looks=5, alpha=0.05, spending='pocock')
seq_test_pocock.summary()
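Whether these incremental-alpha boundaries actually hold the overall error rate can be checked by rerunning the null simulation from section 4.2 against the O'Brien-Fleming thresholds. Note this sketch mirrors the same simplification `SequentialTest` makes, spending incremental alpha per look while ignoring the correlation between looks, so it is somewhat conservative: the realized rate should land at or below the nominal 0.05 (exact group-sequential boundaries account for the correlation):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
alpha, n_looks, n_max = 0.05, 5, 1000
t = np.linspace(1 / n_looks, 1, n_looks)

# O'Brien-Fleming-type spending -> per-look incremental alpha -> z thresholds
cum = 2 * (1 - norm.cdf(norm.ppf(1 - alpha / 2) / np.sqrt(t)))
inc = np.diff(np.concatenate([[0.0], cum]))
z_crit = norm.ppf(1 - inc / 2)

sims, rejections = 4000, 0
looks = (t * n_max).astype(int)
for _ in range(sims):
    c = rng.binomial(1, 0.1, n_max)   # null: both arms convert at 10%
    tr = rng.binomial(1, 0.1, n_max)
    for k, n in enumerate(looks):
        pc, pt = c[:n].mean(), tr[:n].mean()
        pp = (pc + pt) / 2
        se = np.sqrt(pp * (1 - pp) * 2 / n)
        if se > 0 and abs(pt - pc) / se > z_crit[k]:
            rejections += 1
            break  # stop at the first boundary crossing

fpr = rejections / sims
print(f"false positive rate with 5 looks and spending boundaries: {fpr:.3f}")
```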
5. Common Pitfalls and Cautions¶
5.1 The Multiple Comparisons Problem¶
def multiple_comparisons_problem():
    """The multiple comparisons problem"""
    print("""
    =================================================
    Multiple Comparisons
    =================================================
    Problem:
    - Running several tests at once inflates the Type I error rate
    - Probability of at least one false positive over k tests: 1 - (1-alpha)^k

    Example (alpha=0.05):
    - 1 test:   5%
    - 5 tests:  23%
    - 10 tests: 40%
    - 20 tests: 64%

    Corrections:
    -------------------------------------------------
    1. Bonferroni: alpha' = alpha/k (most conservative)
    2. Holm-Bonferroni: step-down Bonferroni
    3. Benjamini-Hochberg (FDR): controls the false discovery rate
    4. Pre-registration: designate one primary hypothesis
    """)

    # Visualization
    k_values = range(1, 21)
    alpha = 0.05
    fwer = [1 - (1 - alpha)**k for k in k_values]
    fig, ax = plt.subplots(figsize=(10, 6))
    ax.plot(k_values, fwer, 'b-o', label='Uncorrected (FWER)')
    ax.axhline(alpha, color='r', linestyle='--', label=f'Target alpha={alpha}')
    ax.set_xlabel('Number of tests')
    ax.set_ylabel('Probability of at least one false positive')
    ax.set_title('Multiple comparisons: number of tests vs. false positive rate')
    ax.legend()
    ax.grid(True, alpha=0.3)
    ax.set_ylim(0, 0.7)
    plt.show()

multiple_comparisons_problem()
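The corrections listed above are all available through `statsmodels.stats.multitest.multipletests`. A minimal sketch on five hypothetical segment-level p-values (the values are invented for illustration):

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

# Hypothetical p-values from A/B comparisons in 5 segments
p_values = np.array([0.008, 0.02, 0.04, 0.30, 0.75])

results = {}
for method in ['bonferroni', 'holm', 'fdr_bh']:
    reject, p_adj, _, _ = multipletests(p_values, alpha=0.05, method=method)
    results[method] = reject
    print(f"{method:<10} adjusted p: {np.round(p_adj, 3)}  reject: {reject}")
```

Note how the raw p-values 0.02 and 0.04 would look "significant" on their own, yet survive neither Bonferroni nor Holm; the less conservative FDR procedure keeps one of them.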
5.2 Other Cautions¶
def common_pitfalls():
    """Common pitfalls in A/B testing"""
    print("""
    =================================================
    A/B Testing Cautions
    =================================================
    1. Peeking (interim checking)
       - Problem: checking until the desired result appears
       - Fix: sequential testing or a fixed sample size
    2. Multiple comparisons
       - Problem: testing many metrics/segments
       - Fix: pre-registration, correction, one primary metric
    3. Inadequate sample size
       - Problem: too small misses real effects; too large wastes resources
       - Fix: a priori power analysis
    4. Novelty effect
       - Problem: newness itself produces a temporary effect
       - Fix: a sufficiently long experiment period
    5. Network effects (spillover)
       - Problem: interaction between groups
       - Fix: cluster randomization
    6. Simpson's paradox
       - Problem: overall and per-segment results conflict
       - Fix: stratified analysis, causal graphs
    7. Ignoring practical significance
       - Problem: statistical significance != practical importance
       - Fix: consider effect size, confidence interval, business impact
    8. Lack of power
       - Problem: "no effect found" != "the null hypothesis is true"
       - Fix: report power, equivalence testing
    """)

common_pitfalls()
6. Worked Example¶
6.1 An End-to-End Experimental Design¶
def complete_ab_test_workflow():
    """Full A/B test workflow"""
    print("="*60)
    print("A/B Test Workflow")
    print("="*60)

    # 1. State the hypothesis
    print("\n[Step 1] Hypothesis")
    print("  H0: the new button color has no effect on the conversion rate")
    print("  H1: the new button color changes the conversion rate")

    # 2. Define the metric
    print("\n[Step 2] Metric definition")
    baseline_rate = 0.05  # 5%
    mde = 0.01            # minimum detectable effect: 1%p
    print(f"  Baseline conversion rate: {baseline_rate:.1%}")
    print(f"  MDE (Minimum Detectable Effect): {mde:.1%}p")

    # 3. Calculate the sample size
    print("\n[Step 3] Sample size calculation")
    target_rate = baseline_rate + mde
    n1, n2 = sample_size_two_proportions(baseline_rate, target_rate, alpha=0.05, power=0.80)
    print(f"  Required sample size: {n1:,} (per group)")
    print(f"  Total traffic needed: {n1 + n2:,}")

    # 4. Run the experiment (simulated)
    print("\n[Step 4] Run the experiment (simulated)")
    np.random.seed(42)
    n_control = n1
    n_treatment = n2
    x_control = np.random.binomial(n_control, baseline_rate)
    x_treatment = np.random.binomial(n_treatment, baseline_rate + mde * 0.8)  # true effect is 80% of the MDE
    print(f"  Control:   {x_control:,}/{n_control:,} = {x_control/n_control:.2%}")
    print(f"  Treatment: {x_treatment:,}/{n_treatment:,} = {x_treatment/n_treatment:.2%}")

    # 5. Analyze
    print("\n[Step 5] Analysis")
    ab = ABTest(n_control, x_control, n_treatment, x_treatment)
    z, p = ab.z_test()
    diff, ci = ab.confidence_interval()
    lift = ab.lift()
    print(f"  Difference: {diff:.4f} ({diff*100:.2f}%p)")
    print(f"  Lift: {lift*100:.2f}%")
    print(f"  95% CI: ({ci[0]*100:.2f}%p, {ci[1]*100:.2f}%p)")
    print(f"  Z statistic: {z:.3f}")
    print(f"  p-value: {p:.4f}")

    # 6. Decide
    print("\n[Step 6] Decision")
    if p < 0.05:
        if diff > 0:
            print("  Conclusion: adopt Treatment (statistically significant improvement)")
        else:
            print("  Conclusion: keep Control (Treatment is worse)")
    else:
        print("  Conclusion: withhold the decision (no significant difference)")
        print("  Considerations: increase the sample size or test a different variant")
    return ab

ab_result = complete_ab_test_workflow()
7. Exercises¶
Problem 1: Sample Size Calculation¶
The baseline conversion rate is 3% and you want to detect at least a 20% relative lift (to 3.6%):
1. The required sample size at alpha=0.05, power=0.80
2. How the sample size changes when power is raised to 0.90
3. How the sample size changes when the MDE is lowered to a 10% relative lift
Problem 2: Estimating the Experiment Duration¶
Daily traffic is 10,000 visits, split 50:50:
1. How long it takes to reach the sample size from Problem 1
2. Accounting for weekend effects, the minimum number of whole weeks to run
Problem 3: Sequential Test Design¶
You plan 5 interim analyses:
1. The per-look thresholds under the O'Brien-Fleming method
2. A comparison with the Pocock method
3. The early-stopping condition at the first analysis
Problem 4: Multiple Comparison Correction¶
You analyze A/B test results across 5 segments (age groups):
1. The Bonferroni-corrected significance level
2. The conclusion when one segment yields p=0.02
3. How the analysis would differ with pre-registration
8. Key Takeaways¶
Experimental Design Checklist¶
- [ ] Clearly defined hypothesis and metric
- [ ] Sample size determined by power analysis
- [ ] Randomization method chosen
- [ ] Experiment duration set (account for weekly cycles)
- [ ] Interim analysis plan (sequential testing)
- [ ] Multiple comparisons considered
- [ ] Pre-registration
Sample Size Formula (Comparing Two Proportions)¶
$$n = \frac{(z_{\alpha/2}\sqrt{2\bar{p}(1-\bar{p})} + z_{\beta}\sqrt{p_1(1-p_1)+p_2(1-p_2)})^2}{(p_1-p_2)^2}$$
Power Relationships¶
| Factor | Effect on required sample size |
|---|---|
| Effect size ↑ | decreases |
| Power ↑ | increases |
| α ↓ (stricter) | increases |
| Variance ↑ | increases |
Sequential Testing¶
| Method | Early looks | Late looks |
|---|---|---|
| O'Brien-Fleming | very conservative | similar to fixed-sample |
| Pocock | roughly constant thresholds | more conservative |
Python Libraries¶
from statsmodels.stats.power import TTestPower, NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize
from scipy.stats import norm
# Sample size calculation (h is Cohen's h, e.g. for the 5% vs 7% example)
h = proportion_effectsize(0.05, 0.07)
power_analysis = NormalIndPower()
n = power_analysis.solve_power(effect_size=h, alpha=0.05, power=0.80)