10_knn_naive_bayes.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "cell-0",
   "metadata": {},
   "source": [
    "# 10. k-Nearest Neighbors (kNN) and Naive Bayes\n",
    "\n",
    "## Learning Objectives\n",
    "- Understand how kNN classifies by distance\n",
    "- Learn the distance metrics (Euclidean, Manhattan, Minkowski)\n",
    "- Learn how to choose the optimal value of k\n",
    "- Understand how Naive Bayes classifies with probabilities\n",
    "- Compare Gaussian, Multinomial, and Bernoulli NB\n",
    "- Apply these models to text classification"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-1",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import libraries\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt\n",
    "from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor\n",
    "from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB\n",
    "from sklearn.datasets import (\n",
    "    load_iris, load_breast_cancer, load_diabetes, load_digits,\n",
    "    make_classification, fetch_20newsgroups\n",
    ")\n",
    "from sklearn.model_selection import train_test_split, cross_val_score\n",
    "from sklearn.preprocessing import StandardScaler\n",
    "from sklearn.metrics import accuracy_score, classification_report, mean_squared_error, r2_score\n",
    "from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer\n",
    "from scipy.spatial.distance import euclidean, cityblock, minkowski, chebyshev\n",
    "from time import time\n",
    "\n",
    "# Plot font settings (all plot labels in this notebook are in English)\n",
    "plt.rcParams['font.family'] = 'DejaVu Sans'\n",
    "plt.rcParams['axes.unicode_minus'] = False\n",
    "np.random.seed(42)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cell-2",
   "metadata": {},
   "source": [
    "## 1. The k-Nearest Neighbors (kNN) Concept\n",
    "\n",
    "kNN is a lazy learning algorithm: it defers the real work until prediction time.\n",
    "\n",
    "**How it works**:\n",
    "1. When a new data point arrives,\n",
    "2. find the k training points closest to it;\n",
    "3. predict by majority vote of those k neighbors (classification) or by their mean (regression).\n",
    "\n",
    "**Characteristics**:\n",
    "- No model is built during training (the training data is simply stored)\n",
    "- Nonparametric (no assumption about the data distribution)\n",
    "- Slow prediction, because every query is compared against the stored training data"
   ]
  },
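  {
   "cell_type": "markdown",
   "id": "extra-knn-scratch-md",
   "metadata": {},
   "source": [
    "The three steps above can be sketched from scratch in NumPy. This is a minimal illustration, not how scikit-learn implements `KNeighborsClassifier`: it assumes Euclidean distance, brute-force search, and a plain majority vote in which ties go to the smallest label (a side effect of `np.bincount(...).argmax()`). The helper `knn_predict` and the toy data are hypothetical."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "extra-knn-scratch",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from sklearn.neighbors import KNeighborsClassifier\n",
    "\n",
    "def knn_predict(X_train, y_train, X_query, k=5):\n",
    "    # Hypothetical helper: brute-force kNN classification by majority vote\n",
    "    # 1) Euclidean distance from every query to every training point: shape (n_query, n_train)\n",
    "    diffs = X_query[:, None, :] - X_train[None, :, :]\n",
    "    dists = np.sqrt((diffs ** 2).sum(axis=2))\n",
    "    # 2) Indices of the k closest training points for each query\n",
    "    nearest = np.argsort(dists, axis=1)[:, :k]\n",
    "    # 3) Majority vote over the neighbors' labels (ties -> smallest label)\n",
    "    return np.array([np.bincount(y_train[idx]).argmax() for idx in nearest])\n",
    "\n",
    "X_toy = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1], [1.2, 0.8]])\n",
    "y_toy = np.array([0, 0, 1, 1, 1])\n",
    "queries = np.array([[0.05, 0.1], [1.0, 0.9]])\n",
    "\n",
    "print(knn_predict(X_toy, y_toy, queries, k=3))  # -> [0 1]\n",
    "# scikit-learn gives the same answer on this toy data\n",
    "print(KNeighborsClassifier(n_neighbors=3).fit(X_toy, y_toy).predict(queries))"
   ]
  },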
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-3",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Visualize kNN on 2D data\n",
    "X, y = make_classification(\n",
    "    n_samples=100, n_features=2, n_redundant=0,\n",
    "    n_informative=2, n_clusters_per_class=1, random_state=42\n",
    ")\n",
    "\n",
    "# Compare several values of k\n",
    "fig, axes = plt.subplots(1, 3, figsize=(15, 5))\n",
    "k_values = [1, 5, 15]\n",
    "\n",
    "for ax, k in zip(axes, k_values):\n",
    "    knn = KNeighborsClassifier(n_neighbors=k)\n",
    "    knn.fit(X, y)\n",
    "\n",
    "    # Decision boundary: predict on a grid that covers the data\n",
    "    x_min, x_max = X[:, 0].min() - 0.5, X[:, 0].max() + 0.5\n",
    "    y_min, y_max = X[:, 1].min() - 0.5, X[:, 1].max() + 0.5\n",
    "    xx, yy = np.meshgrid(np.linspace(x_min, x_max, 100),\n",
    "                         np.linspace(y_min, y_max, 100))\n",
    "\n",
    "    Z = knn.predict(np.c_[xx.ravel(), yy.ravel()])\n",
    "    Z = Z.reshape(xx.shape)\n",
    "\n",
    "    ax.contourf(xx, yy, Z, alpha=0.3, cmap='coolwarm')\n",
    "    ax.scatter(X[:, 0], X[:, 1], c=y, cmap='coolwarm', edgecolors='black')\n",
    "    # Training accuracy (with k=1 each point is its own nearest neighbor, so it is 1.0 here)\n",
    "    ax.set_title(f'k = {k}\\nTrain accuracy = {knn.score(X, y):.3f}')\n",
    "    ax.set_xlabel('Feature 1')\n",
    "    ax.set_ylabel('Feature 2')\n",
    "\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cell-4",
   "metadata": {},
   "source": [
    "## 2. Basic kNN Usage"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-5",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Load data\n",
    "iris = load_iris()\n",
    "X_train, X_test, y_train, y_test = train_test_split(\n",
    "    iris.data, iris.target, test_size=0.2, random_state=42\n",
    ")\n",
    "\n",
    "# kNN classifier\n",
    "knn = KNeighborsClassifier(\n",
    "    n_neighbors=5,           # k\n",
    "    weights='uniform',       # weighting: 'uniform' or 'distance'\n",
    "    algorithm='auto',        # search algorithm: 'auto', 'ball_tree', 'kd_tree', 'brute'\n",
    "    metric='minkowski',      # distance metric: 'euclidean', 'manhattan', 'minkowski', ...\n",
    "    p=2                      # Minkowski power p (2 = Euclidean, 1 = Manhattan)\n",
    ")\n",
    "\n",
    "knn.fit(X_train, y_train)\n",
    "y_pred = knn.predict(X_test)\n",
    "\n",
    "print(\"kNN classification results:\")\n",
    "print(f\"  Accuracy: {accuracy_score(y_test, y_pred):.4f}\")\n",
    "print(\"\\nClassification report:\")\n",
    "print(classification_report(y_test, y_pred, target_names=iris.target_names))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cell-6",
   "metadata": {},
   "source": [
    "## 3. Distance Metrics\n",
    "\n",
    "Distance computation is at the heart of kNN.\n",
    "\n",
    "Common distance metrics:\n",
    "- **Euclidean (L2)**: $d = \\sqrt{\\sum_i (x_i - y_i)^2}$\n",
    "- **Manhattan (L1)**: $d = \\sum_i |x_i - y_i|$\n",
    "- **Minkowski**: $d = \\left(\\sum_i |x_i - y_i|^p\\right)^{1/p}$ (p = 1 gives Manhattan, p = 2 gives Euclidean)\n",
    "- **Chebyshev (L∞)**: $d = \\max_i |x_i - y_i|$ (the limit of Minkowski as p → ∞)"
   ]
  },
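  {
   "cell_type": "markdown",
   "id": "extra-distances-md",
   "metadata": {},
   "source": [
    "The four formulas translate directly into NumPy. A minimal sketch on two made-up points whose coordinate differences are unequal (3, 4 and 1), so the metrics give clearly different values; SciPy's `minkowski` is used to check the p = 3 case and to show the Minkowski distance shrinking toward the Chebyshev distance as p grows."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "extra-distances",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from scipy.spatial.distance import minkowski\n",
    "\n",
    "a = np.array([1.0, 2.0, 5.0])\n",
    "b = np.array([4.0, 6.0, 6.0])\n",
    "diff = np.abs(a - b)                   # |x_i - y_i| = [3, 4, 1]\n",
    "\n",
    "d_l2 = np.sqrt(np.sum(diff ** 2))      # Euclidean: sqrt(26)\n",
    "d_l1 = np.sum(diff)                    # Manhattan: 8\n",
    "d_p3 = np.sum(diff ** 3) ** (1 / 3)    # Minkowski, p = 3\n",
    "d_inf = np.max(diff)                   # Chebyshev: 4\n",
    "\n",
    "print(round(d_l2, 4), d_l1, round(d_p3, 4), d_inf)\n",
    "print(np.isclose(d_p3, minkowski(a, b, p=3)))  # -> True\n",
    "\n",
    "# Minkowski distance decreases toward the Chebyshev distance (4.0) as p grows\n",
    "for p in [1, 2, 4, 8, 32]:\n",
    "    print(p, round(minkowski(a, b, p=p), 4))"
   ]
  },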
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-7",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Distance metric examples\n",
    "point1 = np.array([1, 2, 3])\n",
    "point2 = np.array([4, 5, 6])\n",
    "\n",
    "print(\"Distance examples:\")\n",
    "print(f\"  Point 1: {point1}\")\n",
    "print(f\"  Point 2: {point2}\")\n",
    "print()\n",
    "print(f\"  Euclidean:        {euclidean(point1, point2):.4f}\")\n",
    "print(f\"  Manhattan:        {cityblock(point1, point2):.4f}\")\n",
    "print(f\"  Minkowski (p=3):  {minkowski(point1, point2, p=3):.4f}\")\n",
    "print(f\"  Chebyshev:        {chebyshev(point1, point2):.4f}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-8",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Compare accuracy per distance metric\n",
    "metrics = ['euclidean', 'manhattan', 'chebyshev']\n",
    "\n",
    "print(\"Accuracy by distance metric (Iris):\")\n",
    "print(\"-\" * 40)\n",
    "for metric in metrics:\n",
    "    knn = KNeighborsClassifier(n_neighbors=5, metric=metric)\n",
    "    knn.fit(X_train, y_train)\n",
    "    acc = knn.score(X_test, y_test)\n",
    "    print(f\"  {metric:12s}: {acc:.4f}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cell-9",
   "metadata": {},
   "source": [
    "## 4. Choosing the Optimal k\n",
    "\n",
    "A k that is too small overfits (the boundary follows individual noisy points); a k that is too large underfits (the boundary becomes overly smooth).\n",
    "Cross-validation is used to find the best k."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-10",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Accuracy as a function of k\n",
    "k_range = range(1, 31)\n",
    "train_scores = []\n",
    "test_scores = []\n",
    "\n",
    "for k in k_range:\n",
    "    knn = KNeighborsClassifier(n_neighbors=k)\n",
    "    knn.fit(X_train, y_train)\n",
    "    train_scores.append(knn.score(X_train, y_train))\n",
    "    test_scores.append(knn.score(X_test, y_test))\n",
    "\n",
    "# Plot\n",
    "plt.figure(figsize=(10, 6))\n",
    "plt.plot(k_range, train_scores, 'o-', label='Train')\n",
    "plt.plot(k_range, test_scores, 's-', label='Test')\n",
    "plt.xlabel('k (Number of Neighbors)')\n",
    "plt.ylabel('Accuracy')\n",
    "plt.title('kNN: k vs Accuracy')\n",
    "plt.legend()\n",
    "plt.grid(True, alpha=0.3)\n",
    "plt.xticks(k_range[::2])\n",
    "plt.tight_layout()\n",
    "plt.show()\n",
    "\n",
    "# Best k on the test set. Choosing k this way uses the test data for model selection,\n",
    "# so the score is optimistic; the next cell selects k with cross-validation instead.\n",
    "best_k = k_range[np.argmax(test_scores)]\n",
    "print(f\"Best k: {best_k}\")\n",
    "print(f\"Best test accuracy: {max(test_scores):.4f}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-11",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Select k with 5-fold cross-validation on the training set only\n",
    "k_range = range(1, 31)\n",
    "cv_scores = []\n",
    "\n",
    "for k in k_range:\n",
    "    knn = KNeighborsClassifier(n_neighbors=k)\n",
    "    scores = cross_val_score(knn, X_train, y_train, cv=5, scoring='accuracy')\n",
    "    cv_scores.append(scores.mean())\n",
    "\n",
    "# Plot\n",
    "plt.figure(figsize=(10, 6))\n",
    "plt.plot(k_range, cv_scores, 'o-', color='green')\n",
    "plt.xlabel('k')\n",
    "plt.ylabel('Cross-Validation Accuracy')\n",
    "plt.title('kNN: k Selection with 5-Fold Cross-Validation')\n",
    "plt.grid(True, alpha=0.3)\n",
    "plt.xticks(k_range[::2])\n",
    "plt.tight_layout()\n",
    "plt.show()\n",
    "\n",
    "best_k_cv = k_range[np.argmax(cv_scores)]  # np.argmax picks the smallest k among ties\n",
    "print(f\"Best k (cross-validation): {best_k_cv}\")\n",
    "print(f\"Best CV accuracy: {max(cv_scores):.4f}\")"
   ]
  },
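  {
   "cell_type": "markdown",
   "id": "extra-gridsearch-md",
   "metadata": {},
   "source": [
    "The loop above tunes one hyperparameter at a time on unscaled features. A common alternative, shown here as a sketch rather than as this notebook's method, puts `StandardScaler` and `KNeighborsClassifier` in a `Pipeline` and lets `GridSearchCV` search `n_neighbors` and `weights` together, so the scaler is refit inside every fold and never sees that fold's validation data. The grid values are arbitrary choices."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "extra-gridsearch",
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.datasets import load_iris\n",
    "from sklearn.model_selection import GridSearchCV, train_test_split\n",
    "from sklearn.neighbors import KNeighborsClassifier\n",
    "from sklearn.pipeline import Pipeline\n",
    "from sklearn.preprocessing import StandardScaler\n",
    "\n",
    "X_iris, y_iris = load_iris(return_X_y=True)\n",
    "X_tr, X_te, y_tr, y_te = train_test_split(X_iris, y_iris, test_size=0.2, random_state=42)\n",
    "\n",
    "pipe = Pipeline([\n",
    "    ('scaler', StandardScaler()),      # refit on the training part of every CV fold\n",
    "    ('knn', KNeighborsClassifier()),\n",
    "])\n",
    "param_grid = {\n",
    "    'knn__n_neighbors': list(range(1, 31)),\n",
    "    'knn__weights': ['uniform', 'distance'],\n",
    "}\n",
    "\n",
    "search = GridSearchCV(pipe, param_grid, cv=5, scoring='accuracy')\n",
    "search.fit(X_tr, y_tr)\n",
    "\n",
    "print('Best parameters:', search.best_params_)\n",
    "print(f'Best CV accuracy: {search.best_score_:.4f}')\n",
    "# The test set is used once, only after the hyperparameters have been chosen\n",
    "print(f'Test accuracy:    {search.score(X_te, y_te):.4f}')"
   ]
  },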
  {
   "cell_type": "markdown",
   "id": "cell-12",
   "metadata": {},
   "source": [
    "## 5. Weighted kNN\n",
    "\n",
    "Each neighbor's vote can be weighted according to its distance.\n",
    "\n",
    "- **uniform**: every neighbor gets the same weight\n",
    "- **distance**: closer neighbors get larger weights (weight = 1/distance)"
   ]
  },
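  {
   "cell_type": "markdown",
   "id": "extra-weighted-md",
   "metadata": {},
   "source": [
    "The `distance` weighting can be reproduced by hand: each of the k neighbors votes with weight 1/distance, and the per-class totals are normalized to sum to 1. A sketch on Iris, assuming a query point (the midpoint between the versicolor and virginica class means) that does not coincide with any training sample; if it did, 1/distance would be infinite, and scikit-learn instead gives all the weight to the exact matches."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "extra-weighted",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from sklearn.datasets import load_iris\n",
    "from sklearn.neighbors import KNeighborsClassifier\n",
    "\n",
    "X_iris, y_iris = load_iris(return_X_y=True)\n",
    "# Query: midpoint of the versicolor (1) and virginica (2) class means, shape (1, 4)\n",
    "query = ((X_iris[y_iris == 1].mean(axis=0) + X_iris[y_iris == 2].mean(axis=0)) / 2).reshape(1, -1)\n",
    "\n",
    "knn_w = KNeighborsClassifier(n_neighbors=5, weights='distance').fit(X_iris, y_iris)\n",
    "dist, idx = knn_w.kneighbors(query)   # distances and indices of the 5 nearest samples\n",
    "\n",
    "# Each neighbor votes with weight 1/distance; normalize so the class scores sum to 1\n",
    "votes = np.bincount(y_iris[idx[0]], weights=1.0 / dist[0], minlength=3)\n",
    "manual_proba = votes / votes.sum()\n",
    "\n",
    "print('neighbor labels:', y_iris[idx[0]])\n",
    "print('manual :', np.round(manual_proba, 4))\n",
    "print('sklearn:', np.round(knn_w.predict_proba(query)[0], 4))"
   ]
  },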
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-13",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Compare weighting schemes\n",
    "weights = ['uniform', 'distance']\n",
    "\n",
    "print(\"Weighting scheme comparison:\")\n",
    "print(\"-\" * 40)\n",
    "for weight in weights:\n",
    "    knn = KNeighborsClassifier(n_neighbors=5, weights=weight)\n",
    "    knn.fit(X_train, y_train)\n",
    "    acc = knn.score(X_test, y_test)\n",
    "    print(f\"  {weight:10s}: {acc:.4f}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-14",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Visualize distance-weighted kNN on the 2D data (X, y) from section 1\n",
    "fig, axes = plt.subplots(1, 2, figsize=(14, 5))\n",
    "\n",
    "for ax, weight in zip(axes, weights):\n",
    "    knn = KNeighborsClassifier(n_neighbors=15, weights=weight)\n",
    "    knn.fit(X, y)  # X already has exactly 2 features\n",
    "\n",
    "    x_min, x_max = X[:, 0].min() - 0.5, X[:, 0].max() + 0.5\n",
    "    y_min, y_max = X[:, 1].min() - 0.5, X[:, 1].max() + 0.5\n",
    "    xx, yy = np.meshgrid(np.linspace(x_min, x_max, 100),\n",
    "                         np.linspace(y_min, y_max, 100))\n",
    "\n",
    "    Z = knn.predict(np.c_[xx.ravel(), yy.ravel()])\n",
    "    Z = Z.reshape(xx.shape)\n",
    "\n",
    "    ax.contourf(xx, yy, Z, alpha=0.3, cmap='coolwarm')\n",
    "    ax.scatter(X[:, 0], X[:, 1], c=y, cmap='coolwarm', edgecolors='black')\n",
    "    ax.set_title(f'weights = {weight}')\n",
    "    ax.set_xlabel('Feature 1')\n",
    "    ax.set_ylabel('Feature 2')\n",
    "\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cell-15",
   "metadata": {},
   "source": [
    "## 6. kNN Regression\n",
    "\n",
    "kNN can also be used for regression problems.\n",
    "It predicts the mean target value of the k neighbors (a distance-weighted mean when weights='distance')."
   ]
  },
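  {
   "cell_type": "markdown",
   "id": "extra-knn-reg-md",
   "metadata": {},
   "source": [
    "A minimal sketch of the averaging rule on made-up 1D data (a noisy sine curve): with the default `weights='uniform'`, the prediction should equal the plain mean of the targets of the neighbors returned by `kneighbors`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "extra-knn-reg",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from sklearn.neighbors import KNeighborsRegressor\n",
    "\n",
    "# Synthetic 1D data: a noisy sine curve (illustrative only)\n",
    "rng = np.random.default_rng(0)\n",
    "X_reg = rng.uniform(0, 5, size=(40, 1))\n",
    "y_reg = np.sin(X_reg).ravel() + rng.normal(scale=0.1, size=40)\n",
    "\n",
    "reg = KNeighborsRegressor(n_neighbors=3).fit(X_reg, y_reg)  # weights='uniform' by default\n",
    "x_new = np.array([[2.5]])\n",
    "\n",
    "_, idx = reg.kneighbors(x_new)\n",
    "manual = y_reg[idx[0]].mean()  # plain mean of the 3 nearest targets\n",
    "print(round(manual, 4), round(reg.predict(x_new)[0], 4))\n",
    "print(np.isclose(manual, reg.predict(x_new)[0]))  # -> True"
   ]
  },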
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-16",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Load data\n",
    "diabetes = load_diabetes()\n",
    "X_train_d, X_test_d, y_train_d, y_test_d = train_test_split(\n",
    "    diabetes.data, diabetes.target, test_size=0.2, random_state=42\n",
    ")\n",
    "\n",
    "# Scaling (essential because kNN is distance-based; the scaler is fit on training data only)\n",
    "scaler = StandardScaler()\n",
    "X_train_d_scaled = scaler.fit_transform(X_train_d)\n",
    "X_test_d_scaled = scaler.transform(X_test_d)\n",
    "\n",
    "# kNN regression (distance-weighted mean of the 5 nearest targets)\n",
    "knn_reg = KNeighborsRegressor(n_neighbors=5, weights='distance')\n",
    "knn_reg.fit(X_train_d_scaled, y_train_d)\n",
    "y_pred_d = knn_reg.predict(X_test_d_scaled)\n",
    "\n",
    "print(\"kNN regression results:\")\n",
    "print(f\"  MSE: {mean_squared_error(y_test_d, y_pred_d):.4f}\")\n",
    "print(f\"  RMSE: {np.sqrt(mean_squared_error(y_test_d, y_pred_d)):.4f}\")\n",
    "print(f\"  R²: {r2_score(y_test_d, y_pred_d):.4f}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cell-17",
   "metadata": {},
   "source": [
    "## 7. Comparing kNN Search Algorithms\n",
    "\n",
    "On large datasets, the choice of neighbor search algorithm matters.\n",
    "\n",
    "- **brute**: exhaustive search (O(n·d) per query for n training samples with d features)\n",
    "- **kd_tree**: uses a KD-Tree (efficient in low dimensions, roughly fewer than 20 features)\n",
    "- **ball_tree**: uses a Ball Tree (degrades more gracefully than a KD-Tree as the dimension grows)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-18",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Timing comparison per search algorithm\n",
    "# (Iris has only 120 training samples, so these timings are dominated by overhead)\n",
    "algorithms = ['brute', 'kd_tree', 'ball_tree']\n",
    "\n",
    "print(\"Timing by algorithm:\")\n",
    "print(\"-\" * 60)\n",
    "for algo in algorithms:\n",
    "    knn = KNeighborsClassifier(n_neighbors=5, algorithm=algo)\n",
    "\n",
    "    # Fit time\n",
    "    start = time()\n",
    "    knn.fit(X_train, y_train)\n",
    "    fit_time = time() - start\n",
    "\n",
    "    # Prediction time\n",
    "    start = time()\n",
    "    knn.predict(X_test)\n",
    "    pred_time = time() - start\n",
    "\n",
    "    print(f\"  {algo:10s}: fit={fit_time:.4f}s, predict={pred_time:.4f}s\")"
   ]
  },
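  {
   "cell_type": "markdown",
   "id": "extra-knn-timing-md",
   "metadata": {},
   "source": [
    "To see the algorithms diverge, the same comparison can be run on a larger synthetic dataset. A sketch with arbitrary sizes (18,000 training and 2,000 query points, 8 features); absolute times depend on the machine. `time.perf_counter` is used because it is intended for timing short intervals. Since only the search strategy changes, all three should return the same predictions, barring exact distance ties."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "extra-knn-timing",
   "metadata": {},
   "outputs": [],
   "source": [
    "from time import perf_counter\n",
    "import numpy as np\n",
    "from sklearn.datasets import make_classification\n",
    "from sklearn.neighbors import KNeighborsClassifier\n",
    "\n",
    "# Larger synthetic dataset (sizes are arbitrary choices for illustration)\n",
    "X_big, y_big = make_classification(n_samples=20000, n_features=8, n_informative=6, random_state=0)\n",
    "X_fit, y_fit = X_big[:18000], y_big[:18000]\n",
    "X_query = X_big[18000:]\n",
    "\n",
    "preds = {}\n",
    "for algo in ['brute', 'kd_tree', 'ball_tree']:\n",
    "    knn_big = KNeighborsClassifier(n_neighbors=5, algorithm=algo)\n",
    "    start = perf_counter()\n",
    "    knn_big.fit(X_fit, y_fit)\n",
    "    fit_time = perf_counter() - start\n",
    "    start = perf_counter()\n",
    "    preds[algo] = knn_big.predict(X_query)\n",
    "    pred_time = perf_counter() - start\n",
    "    print(f'{algo:10s}: fit={fit_time:.3f}s, predict={pred_time:.3f}s')\n",
    "\n",
    "# Only the search strategy differs, so the predictions should agree\n",
    "print(all(np.array_equal(preds['brute'], p) for p in preds.values()))"
   ]
  },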
  {
   "cell_type": "markdown",
   "id": "cell-19",
   "metadata": {},
   "source": [
    "## 8. Naive Bayes\n",
    "\n",
    "### Bayes' Theorem\n",
    "\n",
    "$$P(y \\mid X) = \\frac{P(X \\mid y)\\, P(y)}{P(X)}$$\n",
    "\n",
    "- $P(y \\mid X)$: posterior (probability of the class given the features)\n",
    "- $P(X \\mid y)$: likelihood (probability of the features given the class)\n",
    "- $P(y)$: prior (base probability of the class)\n",
    "- $P(X)$: evidence (probability of the features)\n",
    "\n",
    "### The Naive Assumption\n",
    "\n",
    "All features are assumed to be independent of one another given the class:\n",
    "$$P(X \\mid y) = P(x_1 \\mid y) \\times P(x_2 \\mid y) \\times \\cdots \\times P(x_n \\mid y) = \\prod_{i=1}^{n} P(x_i \\mid y)$$"
   ]
  },
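  {
   "cell_type": "markdown",
   "id": "extra-bayes-example-md",
   "metadata": {},
   "source": [
    "A worked example of the theorem with made-up numbers for a hypothetical spam filter: suppose 20% of emails are spam, and a given word appears in 60% of spam emails and 5% of non-spam emails. The evidence $P(X)$ comes from the law of total probability, summing likelihood times prior over both classes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "extra-bayes-example",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Hypothetical numbers, for illustration only\n",
    "p_spam = 0.20                 # prior P(y = spam)\n",
    "p_word_given_spam = 0.60      # likelihood P(X | spam)\n",
    "p_word_given_ham = 0.05       # likelihood P(X | not spam)\n",
    "\n",
    "# Evidence P(X): likelihood x prior, summed over both classes\n",
    "p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)\n",
    "\n",
    "# Posterior P(spam | X) by Bayes' theorem\n",
    "p_spam_given_word = p_word_given_spam * p_spam / p_word\n",
    "print(f'P(X) = {p_word:.2f}')                     # -> P(X) = 0.16\n",
    "print(f'P(spam | X) = {p_spam_given_word:.2f}')   # -> P(spam | X) = 0.75"
   ]
  },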
  {
   "cell_type": "markdown",
   "id": "cell-20",
   "metadata": {},
   "source": [
    "## 9. Gaussian Naive Bayes\n",
    "\n",
    "Assumes that, within each class, every continuous feature follows a Gaussian (normal) distribution:\n",
    "$$P(x_i \\mid y) = \\mathcal{N}(x_i;\\ \\mu_{y,i}, \\sigma^2_{y,i})$$\n",
    "where $\\mu_{y,i}$ and $\\sigma^2_{y,i}$ are the mean and variance of feature $i$ in class $y$."
   ]
  },
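  {
   "cell_type": "markdown",
   "id": "extra-gnb-manual-md",
   "metadata": {},
   "source": [
    "The prediction rule can be rebuilt from the fitted parameters: add $\\log P(y)$ to the sum of per-feature Gaussian log-densities, then normalize across classes, which plays the role of dividing by $P(X)$. A sketch on the full Iris data, using the fitted attributes `theta_` (means), `var_` (variances, which already include the small `var_smoothing` term) and `class_prior_`. Working in log space avoids underflow when many small densities are multiplied."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "extra-gnb-manual",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from sklearn.datasets import load_iris\n",
    "from sklearn.naive_bayes import GaussianNB\n",
    "\n",
    "X_iris, y_iris = load_iris(return_X_y=True)\n",
    "gnb_check = GaussianNB().fit(X_iris, y_iris)\n",
    "x = X_iris[[60]]  # one sample, kept 2D: shape (1, 4)\n",
    "\n",
    "# sum_i log N(x_i; mu_{y,i}, sigma^2_{y,i}) for every class y -> shape (3,)\n",
    "log_lik = -0.5 * np.sum(np.log(2 * np.pi * gnb_check.var_)\n",
    "                        + (x - gnb_check.theta_) ** 2 / gnb_check.var_, axis=1)\n",
    "joint = np.log(gnb_check.class_prior_) + log_lik  # log P(y) + log P(X | y)\n",
    "\n",
    "# Normalizing over the classes plays the role of dividing by P(X)\n",
    "posterior = np.exp(joint - joint.max())\n",
    "posterior /= posterior.sum()\n",
    "\n",
    "print(np.round(posterior, 4))\n",
    "print(np.round(gnb_check.predict_proba(x)[0], 4))"
   ]
  },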
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-21",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Gaussian Naive Bayes\n",
    "gnb = GaussianNB()\n",
    "gnb.fit(X_train, y_train)\n",
    "y_pred_nb = gnb.predict(X_test)\n",
    "\n",
    "print(\"Gaussian Naive Bayes results:\")\n",
    "print(f\"  Accuracy: {accuracy_score(y_test, y_pred_nb):.4f}\")\n",
    "\n",
    "# Inspect the learned parameters\n",
    "print(f\"\\nClass priors: {gnb.class_prior_}\")\n",
    "print(\"\\nPer-class means (first 2 features):\")\n",
    "print(gnb.theta_[:, :2])\n",
    "print(\"\\nPer-class variances (first 2 features):\")\n",
    "print(gnb.var_[:, :2])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-22",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Probability predictions\n",
    "y_proba = gnb.predict_proba(X_test[:5])\n",
    "\n",
    "print(\"Predicted probabilities (first 5 test samples):\")\n",
    "print(f\"Classes: {iris.target_names}\")\n",
    "print(y_proba)\n",
    "print(f\"\\nPredicted classes: {gnb.predict(X_test[:5])}\")\n",
    "print(f\"True classes:      {y_test[:5]}\")"
   ]
  },
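  {
   "cell_type": "markdown",
   "id": "cell-23",
   "metadata": {},
   "source": [
    "## 10. Multinomial Naive Bayes: Text Classification\n",
    "\n",
    "Used for discrete/count features, most often for text classification with word counts.\n",
    "\n",
    "$$P(x_i \\mid y) = \\frac{N_{yi} + \\alpha}{N_y + \\alpha n}$$\n",
    "\n",
    "- $N_{yi}$: total count of feature (word) $i$ in the training samples of class $y$\n",
    "- $N_y$: total count of all features in class $y$; $n$: number of features\n",
    "- $\\alpha$: smoothing parameter ($\\alpha = 1$ is Laplace smoothing); it keeps a word never seen in a class from getting probability zero (the zero-frequency problem)"
   ]
  },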
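  {
   "cell_type": "markdown",
   "id": "extra-mnb-formula-md",
   "metadata": {},
   "source": [
    "A small check of the smoothing formula on a made-up count matrix: compute $(N_{yi} + \\alpha)/(N_y + \\alpha n)$ by hand and compare it with the fitted `feature_log_prob_` attribute, which stores these probabilities as logarithms. Word 2 never occurs in class 0, yet it still receives a nonzero probability."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "extra-mnb-formula",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from sklearn.naive_bayes import MultinomialNB\n",
    "\n",
    "# Made-up word-count matrix: 4 documents x 3 vocabulary words\n",
    "X_counts = np.array([[2, 1, 0],\n",
    "                     [3, 0, 0],\n",
    "                     [0, 1, 4],\n",
    "                     [0, 2, 3]])\n",
    "y_docs = np.array([0, 0, 1, 1])\n",
    "alpha = 1.0\n",
    "\n",
    "mnb_toy = MultinomialNB(alpha=alpha).fit(X_counts, y_docs)\n",
    "\n",
    "# N_yi: count of word i in class y; N_y: all word counts in class y; n: vocabulary size\n",
    "N_yi = np.array([X_counts[y_docs == c].sum(axis=0) for c in (0, 1)])\n",
    "N_y = N_yi.sum(axis=1, keepdims=True)\n",
    "n = X_counts.shape[1]\n",
    "theta = (N_yi + alpha) / (N_y + alpha * n)\n",
    "\n",
    "print(np.round(theta, 4))  # class 0, word 2: (0 + 1) / (6 + 3) = 1/9 instead of 0\n",
    "print(np.allclose(np.log(theta), mnb_toy.feature_log_prob_))  # -> True"
   ]
  },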
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-24",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Load news data (downloaded on first use)\n",
    "categories = ['sci.space', 'rec.sport.baseball', 'talk.politics.misc']\n",
    "newsgroups = fetch_20newsgroups(\n",
    "    subset='train',\n",
    "    categories=categories,\n",
    "    remove=('headers', 'footers', 'quotes'),\n",
    "    random_state=42\n",
    ")\n",
    "\n",
    "print(f\"News data: {len(newsgroups.data)} articles\")\n",
    "# Labels 0, 1, 2 follow newsgroups.target_names (sorted), not the order of `categories`\n",
    "print(f\"Categories (label order): {newsgroups.target_names}\")\n",
    "print(f\"\\nFirst article (excerpt):\\n{newsgroups.data[0][:200]}...\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-25",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Text vectorization\n",
    "y_news = newsgroups.target\n",
    "\n",
    "# Train/test split on the raw documents, so the vocabulary is learned from training data only\n",
    "docs_train, docs_test, y_train_news, y_test_news = train_test_split(\n",
    "    newsgroups.data, y_news, test_size=0.2, random_state=42\n",
    ")\n",
    "\n",
    "vectorizer = CountVectorizer(max_features=5000, stop_words='english')\n",
    "X_train_news = vectorizer.fit_transform(docs_train)  # learn the vocabulary and count words\n",
    "X_test_news = vectorizer.transform(docs_test)        # reuse the training vocabulary\n",
    "\n",
    "print(f\"Train matrix shape: {X_train_news.shape}\")\n",
    "print(f\"Test matrix shape: {X_test_news.shape}\")\n",
    "print(f\"Number of features: {len(vectorizer.get_feature_names_out())}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-26",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Multinomial Naive Bayes\n",
    "mnb = MultinomialNB(alpha=1.0)  # alpha: smoothing strength (1.0 = Laplace smoothing)\n",
    "mnb.fit(X_train_news, y_train_news)\n",
    "\n",
    "y_pred_news = mnb.predict(X_test_news)\n",
    "\n",
    "print(\"Multinomial Naive Bayes (text classification) results:\")\n",
    "print(f\"  Accuracy: {mnb.score(X_test_news, y_test_news):.4f}\")\n",
    "print(\"\\nClassification report:\")\n",
    "# Label i corresponds to newsgroups.target_names[i] (sorted), not categories[i]\n",
    "print(classification_report(y_test_news, y_pred_news, target_names=newsgroups.target_names))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-27",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Most probable words per class (highest P(word | class); common words may rank high in several classes)\n",
    "feature_names = vectorizer.get_feature_names_out()\n",
    "\n",
    "print(\"Top 10 words per class:\")\n",
    "print(\"=\" * 60)\n",
    "for i, category in enumerate(newsgroups.target_names):  # row i of feature_log_prob_ is label i\n",
    "    top_indices = mnb.feature_log_prob_[i].argsort()[-10:][::-1]\n",
    "    top_words = [feature_names[idx] for idx in top_indices]\n",
    "    print(f\"\\n{category}:\")\n",
    "    print(f\"  {', '.join(top_words)}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cell-28",
   "metadata": {},
   "source": [
    "## 11. Bernoulli Naive Bayes\n",
    "\n",
    "Used for binary (0/1) features. For text, each feature records only whether a word is present in a document, not how often it appears."
   ]
  },
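  {
   "cell_type": "markdown",
   "id": "extra-bnb-manual-md",
   "metadata": {},
   "source": [
    "The key difference from MultinomialNB is that absent words also count as evidence: the Bernoulli likelihood is $\\prod_i p_{yi}^{x_i}(1-p_{yi})^{1-x_i}$, where $p_{yi}$ is the probability that word $i$ appears in a document of class $y$. A sketch on a made-up 4-document binary matrix that rebuilds the posterior from the fitted `feature_log_prob_` (log $p_{yi}$) and `class_log_prior_` and checks it against `predict_proba`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "extra-bnb-manual",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from sklearn.naive_bayes import BernoulliNB\n",
    "\n",
    "# Made-up binary document-word matrix: 4 documents x 3 words (1 = word present)\n",
    "X_bin_toy = np.array([[1, 1, 0],\n",
    "                      [1, 0, 0],\n",
    "                      [0, 1, 1],\n",
    "                      [0, 1, 1]])\n",
    "y_bin_toy = np.array([0, 0, 1, 1])\n",
    "bnb_toy = BernoulliNB(alpha=1.0).fit(X_bin_toy, y_bin_toy)\n",
    "\n",
    "p = np.exp(bnb_toy.feature_log_prob_)  # p_yi = P(word i present | class y)\n",
    "x = np.array([1, 0, 0])                # new document containing only word 0\n",
    "\n",
    "# log P(y) + sum_i [x_i log p_yi + (1 - x_i) log(1 - p_yi)]; absent words add log(1 - p_yi)\n",
    "joint = bnb_toy.class_log_prior_ + (x * np.log(p) + (1 - x) * np.log(1 - p)).sum(axis=1)\n",
    "posterior = np.exp(joint - joint.max())\n",
    "posterior /= posterior.sum()\n",
    "\n",
    "print(np.round(posterior, 4))  # -> [0.9474 0.0526]\n",
    "print(np.round(bnb_toy.predict_proba(x.reshape(1, -1))[0], 4))"
   ]
  },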
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-29",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Binary vectorization (word presence only)\n",
    "# Split the raw documents first so the vocabulary comes from the training split only\n",
    "docs_train_bin, docs_test_bin, y_train_bin, y_test_bin = train_test_split(\n",
    "    newsgroups.data, newsgroups.target, test_size=0.2, random_state=42\n",
    ")\n",
    "\n",
    "binary_vectorizer = CountVectorizer(max_features=5000, binary=True, stop_words='english')\n",
    "X_train_bin = binary_vectorizer.fit_transform(docs_train_bin)\n",
    "X_test_bin = binary_vectorizer.transform(docs_test_bin)\n",
    "\n",
    "# Bernoulli Naive Bayes\n",
    "bnb = BernoulliNB(alpha=1.0)\n",
    "bnb.fit(X_train_bin, y_train_bin)\n",
    "\n",
    "print(\"Bernoulli Naive Bayes results:\")\n",
    "print(f\"  Accuracy: {bnb.score(X_test_bin, y_test_bin):.4f}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cell-30",
   "metadata": {},
   "source": [
    "## 12. Comparing Naive Bayes Models"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-31",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Handwritten digit images (8x8 pixels, intensities 0-16)\n",
    "digits = load_digits()\n",
    "X_train_dig, X_test_dig, y_train_dig, y_test_dig = train_test_split(\n",
    "    digits.data, digits.target, test_size=0.2, random_state=42\n",
    ")\n",
    "\n",
    "# Compare the three Naive Bayes variants\n",
    "models = {\n",
    "    'Gaussian NB': GaussianNB(),\n",
    "    'Multinomial NB': MultinomialNB(),  # treats pixel intensities as counts\n",
    "    'Bernoulli NB': BernoulliNB()       # binarizes at 0 by default (pixel > 0 -> 1)\n",
    "}\n",
    "\n",
    "print(\"Naive Bayes model comparison (Digits):\")\n",
    "print(\"-\" * 50)\n",
    "for name, model in models.items():\n",
    "    model.fit(X_train_dig, y_train_dig)\n",
    "    acc = model.score(X_test_dig, y_test_dig)\n",
    "    print(f\"  {name:18s}: {acc:.4f}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cell-32",
   "metadata": {},
   "source": [
    "## 13. Online (Incremental) Learning\n",
    "\n",
    "Naive Bayes models support online learning through `partial_fit`, which updates the model one batch at a time.\n",
    "This is useful for datasets too large to fit in memory and for streaming data."
   ]
  },
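  {
   "cell_type": "markdown",
   "id": "extra-partial-fit-md",
   "metadata": {},
   "source": [
    "Because the per-class counts and means can be updated batch by batch, feeding all of the data through `partial_fit` should give essentially the same model as a single `fit`. A sketch on shuffled Iris with an arbitrary batch size of 40; `classes` is passed on every call, which scikit-learn accepts as long as it matches the first call. The variances may differ by a negligible amount because of the small `var_smoothing` term, so only the means and predictions are compared."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "extra-partial-fit",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from sklearn.datasets import load_iris\n",
    "from sklearn.naive_bayes import GaussianNB\n",
    "\n",
    "X_iris, y_iris = load_iris(return_X_y=True)\n",
    "order = np.random.default_rng(0).permutation(len(X_iris))  # shuffle so batches mix classes\n",
    "X_s, y_s = X_iris[order], y_iris[order]\n",
    "\n",
    "full = GaussianNB().fit(X_s, y_s)\n",
    "\n",
    "online = GaussianNB()\n",
    "batch = 40  # arbitrary; 150 samples -> batches of 40, 40, 40, 30\n",
    "for start in range(0, len(X_s), batch):\n",
    "    online.partial_fit(X_s[start:start + batch], y_s[start:start + batch],\n",
    "                       classes=np.unique(y_iris))\n",
    "\n",
    "print(np.allclose(full.theta_, online.theta_))                 # per-class means\n",
    "print(np.array_equal(full.predict(X_s), online.predict(X_s)))  # predictions"
   ]
  },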
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-33",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Simulate online learning\n",
    "gnb_online = GaussianNB()\n",
    "\n",
    "# Feed the training data in mini-batches; the last batch may be smaller,\n",
    "# so no samples are dropped (120 samples -> batches of 50, 50, 20)\n",
    "batch_size = 50\n",
    "batch_starts = range(0, len(X_train), batch_size)\n",
    "n_batches = len(batch_starts)\n",
    "\n",
    "for i, start in enumerate(batch_starts):\n",
    "    X_batch = X_train[start:start + batch_size]\n",
    "    y_batch = y_train[start:start + batch_size]\n",
    "\n",
    "    # The first call must declare every class, since a batch may not contain all of them\n",
    "    if i == 0:\n",
    "        gnb_online.partial_fit(X_batch, y_batch, classes=np.unique(y_train))\n",
    "    else:\n",
    "        gnb_online.partial_fit(X_batch, y_batch)\n",
    "\n",
    "print(\"Online learning results:\")\n",
    "print(f\"  Number of batches: {n_batches}\")\n",
    "print(f\"  Batch size: {batch_size}\")\n",
    "print(f\"  Accuracy: {gnb_online.score(X_test, y_test):.4f}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cell-34",
   "metadata": {},
   "source": [
    "## 14. kNN vs Naive Bayes"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-35",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Breast cancer data\n",
    "cancer = load_breast_cancer()\n",
    "X_train_c, X_test_c, y_train_c, y_test_c = train_test_split(\n",
    "    cancer.data, cancer.target, test_size=0.2, random_state=42\n",
    ")\n",
    "\n",
    "# Scaling (applied for kNN only; Gaussian NB is largely insensitive to feature scale)\n",
    "scaler = StandardScaler()\n",
    "X_train_c_scaled = scaler.fit_transform(X_train_c)\n",
    "X_test_c_scaled = scaler.transform(X_test_c)\n",
    "\n",
    "# Model comparison\n",
    "models = {\n",
    "    'kNN (k=5)': KNeighborsClassifier(n_neighbors=5),\n",
    "    'kNN (weighted)': KNeighborsClassifier(n_neighbors=5, weights='distance'),\n",
    "    'Gaussian NB': GaussianNB()\n",
    "}\n",
    "\n",
    "print(\"kNN vs Naive Bayes (Breast Cancer):\")\n",
    "print(\"=\" * 50)\n",
    "\n",
    "for name, model in models.items():\n",
    "    if 'kNN' in name:\n",
    "        model.fit(X_train_c_scaled, y_train_c)\n",
    "        acc = model.score(X_test_c_scaled, y_test_c)\n",
    "    else:\n",
    "        model.fit(X_train_c, y_train_c)\n",
    "        acc = model.score(X_test_c, y_test_c)\n",
    "    print(f\"  {name:18s}: {acc:.4f}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cell-36",
   "metadata": {},
   "source": [
    "## 15. A Simple Text Classification Example"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-37",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Simple sentiment classification\n",
    "texts = [\n",
    "    \"I love this movie\", \"Great film\", \"Excellent acting\",\n",
    "    \"Amazing performance\", \"Wonderful story\",\n",
    "    \"Terrible movie\", \"Bad film\", \"Worst movie ever\",\n",
    "    \"Horrible acting\", \"Disappointing story\"\n",
    "]\n",
    "labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]  # 1: positive, 0: negative\n",
    "\n",
    "# TF-IDF vectorization\n",
    "tfidf = TfidfVectorizer()\n",
    "X_sentiment = tfidf.fit_transform(texts)\n",
    "\n",
    "# Train Naive Bayes (MultinomialNB also works with fractional TF-IDF values in practice)\n",
    "mnb_sentiment = MultinomialNB()\n",
    "mnb_sentiment.fit(X_sentiment, labels)\n",
    "\n",
    "# Classify new texts (words missing from the training vocabulary, such as 'hate', are ignored)\n",
    "new_texts = [\n",
    "    \"This is a great movie\",\n",
    "    \"I hate this film\",\n",
    "    \"Excellent performance and story\",\n",
    "    \"Terrible and disappointing\"\n",
    "]\n",
    "X_new = tfidf.transform(new_texts)\n",
    "predictions = mnb_sentiment.predict(X_new)\n",
    "probabilities = mnb_sentiment.predict_proba(X_new)\n",
    "\n",
    "print(\"Sentiment classification results:\")\n",
    "print(\"=\" * 60)\n",
    "for text, pred, prob in zip(new_texts, predictions, probabilities):\n",
    "    sentiment = \"Positive\" if pred == 1 else \"Negative\"\n",
    "    confidence = max(prob) * 100\n",
    "    print(f\"'{text}'\")\n",
    "    print(f\"  → {sentiment} (confidence: {confidence:.1f}%)\\n\")"
   ]
  },
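  {
   "cell_type": "markdown",
   "id": "extra-text-pipeline-md",
   "metadata": {},
   "source": [
    "The cell above fits and applies the vectorizer and the classifier separately. A sketch using `make_pipeline` bundles them into one estimator, so new raw texts always pass through the same fitted TF-IDF transform before classification; the two test sentences are made up."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "extra-text-pipeline",
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.feature_extraction.text import TfidfVectorizer\n",
    "from sklearn.naive_bayes import MultinomialNB\n",
    "from sklearn.pipeline import make_pipeline\n",
    "\n",
    "texts = ['I love this movie', 'Great film', 'Excellent acting',\n",
    "         'Amazing performance', 'Wonderful story',\n",
    "         'Terrible movie', 'Bad film', 'Worst movie ever',\n",
    "         'Horrible acting', 'Disappointing story']\n",
    "labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]  # 1: positive, 0: negative\n",
    "\n",
    "# One estimator: TF-IDF features feed straight into Multinomial NB\n",
    "sentiment_clf = make_pipeline(TfidfVectorizer(), MultinomialNB())\n",
    "sentiment_clf.fit(texts, labels)\n",
    "\n",
    "# Raw strings go in; the pipeline applies the fitted TF-IDF transform before predicting\n",
    "new = ['What a great story', 'Worst acting ever']\n",
    "print(sentiment_clf.predict(new))\n",
    "print(sentiment_clf.predict_proba(new).round(3))"
   ]
  },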
  {
   "cell_type": "markdown",
   "id": "cell-38",
   "metadata": {},
   "source": [
    "## Summary\n",
    "\n",
    "### kNN Summary\n",
    "\n",
    "| Parameter | Description | Recommendation |\n",
    "|----------|------|------|\n",
    "| **n_neighbors** | Number of neighbors (k) | Choose with cross-validation |\n",
    "| **weights** | Weighting scheme | Default 'uniform'; compare with 'distance' via cross-validation |\n",
    "| **metric** | Distance metric | Default 'minkowski' with p=2 (= Euclidean) |\n",
    "| **algorithm** | Neighbor search algorithm | 'auto' |\n",
    "\n",
    "**Characteristics**:\n",
    "- Lazy learning (fitting only stores the data or builds a search tree)\n",
    "- Slow prediction (O(n·d) per query with brute-force search)\n",
    "- Feature scaling is essential\n",
    "- Performance degrades in high dimensions (curse of dimensionality)\n",
    "\n",
    "### Naive Bayes Summary\n",
    "\n",
    "| Variant | Feature type | Typical use |\n",
    "|------|-----------|----------|\n",
    "| **GaussianNB** | Continuous (normal distribution) | General classification |\n",
    "| **MultinomialNB** | Counts/frequencies | Text classification |\n",
    "| **BernoulliNB** | Binary (0/1) | Word presence/absence |\n",
    "\n",
    "**Characteristics**:\n",
    "- Very fast (training O(n·d), prediction O(c·d) per sample for c classes)\n",
    "- Works well even with little training data\n",
    "- Effective on high-dimensional data\n",
    "- Supports online learning (`partial_fit`)\n",
    "- Assumes features are conditionally independent given the class (often violated in practice)\n",
    "\n",
    "### kNN vs Naive Bayes\n",
    "\n",
    "| Property | kNN | Naive Bayes |\n",
    "|------|-----|---------------|\n",
    "| **Training time** | O(1) with brute force (building a KD/Ball tree takes extra time) | O(n·d) |\n",
    "| **Prediction time** | O(n·d) per query (brute force) | O(c·d) per sample |\n",
    "| **Memory** | High (keeps all training data) | Low (per-class statistics only) |\n",
    "| **Scaling** | Required | Not needed |\n",
    "| **High dimensions** | Weak | Strong |\n",
    "| **Interpretability** | Intuitive | Probability-based |\n",
    "\n",
    "### Next Steps\n",
    "- Clustering (K-Means, DBSCAN)\n",
    "- Dimensionality Reduction (PCA, t-SNE)\n",
    "- Ensemble methods (Stacking, Voting)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.9.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}