09_svm.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "cell-0",
   "metadata": {},
   "source": [
    "# 09. Support Vector Machine (SVM)\n",
    "\n",
    "## Learning Objectives\n",
    "- Understand the margin-maximization principle behind SVMs\n",
    "- Learn the role of support vectors\n",
    "- Solve nonlinear problems with the kernel trick\n",
    "- Tune the hyperparameters C and gamma\n",
    "- Solve regression problems with SVR"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-1",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import libraries\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt\n",
    "from sklearn import svm\n",
    "from sklearn.svm import SVC, SVR, LinearSVC\n",
    "from sklearn.datasets import (\n",
    "    make_blobs, make_classification, make_moons, make_circles,\n",
    "    load_iris, load_breast_cancer, load_diabetes\n",
    ")\n",
    "from sklearn.model_selection import train_test_split, GridSearchCV, cross_val_score\n",
    "from sklearn.preprocessing import StandardScaler\n",
    "from sklearn.metrics import accuracy_score, classification_report, mean_squared_error, r2_score\n",
    "\n",
    "# Plot font settings (also render the minus sign correctly)\n",
    "plt.rcParams['font.family'] = 'DejaVu Sans'\n",
    "plt.rcParams['axes.unicode_minus'] = False\n",
    "np.random.seed(42)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cell-2",
   "metadata": {},
   "source": [
    "## 1. Linear SVM - Margin Maximization\n",
    "\n",
    "The core idea of an SVM is to find the optimal hyperplane that separates two classes.\n",
    "It maximizes the margin, the distance from the hyperplane to the nearest points of each class, to improve generalization."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-3",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Generate linearly separable data\n",
    "X, y = make_blobs(n_samples=100, centers=2, random_state=6)\n",
    "\n",
    "# Train a linear SVM (a large C approximates a hard margin)\n",
    "clf = svm.SVC(kernel='linear', C=1000)\n",
    "clf.fit(X, y)\n",
    "\n",
    "print(f\"Number of support vectors: {len(clf.support_vectors_)}\")\n",
    "print(f\"Weights (w): {clf.coef_}\")\n",
    "print(f\"Intercept (b): {clf.intercept_}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-4",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Visualize the decision boundary and margin\n",
    "plt.figure(figsize=(10, 8))\n",
    "\n",
    "# Data points\n",
    "plt.scatter(X[:, 0], X[:, 1], c=y, cmap='coolwarm', s=100, edgecolors='black')\n",
    "\n",
    "# Axis limits for the decision boundary and margin\n",
    "ax = plt.gca()\n",
    "xlim = ax.get_xlim()\n",
    "ylim = ax.get_ylim()\n",
    "\n",
    "# Build the evaluation grid\n",
    "xx = np.linspace(xlim[0], xlim[1], 30)\n",
    "yy = np.linspace(ylim[0], ylim[1], 30)\n",
    "YY, XX = np.meshgrid(yy, xx)\n",
    "xy = np.vstack([XX.ravel(), YY.ravel()]).T\n",
    "Z = clf.decision_function(xy).reshape(XX.shape)\n",
    "\n",
    "# Draw the decision boundary (level 0) and the margins (levels -1 and 1)\n",
    "ax.contour(XX, YY, Z, colors='k', levels=[-1, 0, 1],\n",
    "           linestyles=['--', '-', '--'], linewidths=[1, 2, 1])\n",
    "\n",
    "# Mark the support vectors\n",
    "ax.scatter(clf.support_vectors_[:, 0], clf.support_vectors_[:, 1],\n",
    "           s=200, linewidth=2, facecolors='none', edgecolors='green',\n",
    "           label='Support Vectors')\n",
    "\n",
    "plt.xlabel('Feature 1')\n",
    "plt.ylabel('Feature 2')\n",
    "plt.title('Linear SVM: Maximum Margin Classifier')\n",
    "plt.legend()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cell-5",
   "metadata": {},
   "source": [
    "## 2. Soft Margin - The C Parameter\n",
    "\n",
    "Real-world data is rarely perfectly linearly separable.\n",
    "The C parameter controls the trade-off between misclassification and margin width.\n",
    "\n",
    "- **Large C**: heavy misclassification penalty → narrow margin, risk of overfitting\n",
    "- **Small C**: misclassification tolerated → wide margin, better generalization"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-6",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Noisy data\n",
    "X, y = make_classification(\n",
    "    n_samples=200, n_features=2, n_redundant=0,\n",
    "    n_informative=2, n_clusters_per_class=1,\n",
    "    flip_y=0.1,  # 10% label noise\n",
    "    random_state=42\n",
    ")\n",
    "\n",
    "# Compare several values of C\n",
    "fig, axes = plt.subplots(1, 3, figsize=(15, 5))\n",
    "C_values = [0.1, 1, 100]\n",
    "\n",
    "for ax, C in zip(axes, C_values):\n",
    "    clf = svm.SVC(kernel='linear', C=C)\n",
    "    clf.fit(X, y)\n",
    "\n",
    "    # Decision boundary\n",
    "    xlim = [X[:, 0].min() - 0.5, X[:, 0].max() + 0.5]\n",
    "    ylim = [X[:, 1].min() - 0.5, X[:, 1].max() + 0.5]\n",
    "    xx, yy = np.meshgrid(np.linspace(xlim[0], xlim[1], 100),\n",
    "                         np.linspace(ylim[0], ylim[1], 100))\n",
    "\n",
    "    Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()])\n",
    "    Z = Z.reshape(xx.shape)\n",
    "\n",
    "    ax.contourf(xx, yy, Z, alpha=0.3, cmap='coolwarm')\n",
    "    ax.contour(xx, yy, Z, colors='k', levels=[-1, 0, 1],\n",
    "               linestyles=['--', '-', '--'])\n",
    "    ax.scatter(X[:, 0], X[:, 1], c=y, cmap='coolwarm', edgecolors='black')\n",
    "    ax.scatter(clf.support_vectors_[:, 0], clf.support_vectors_[:, 1],\n",
    "               s=150, facecolors='none', edgecolors='green', linewidths=2)\n",
    "    ax.set_title(f'C = {C}\\nSupport Vectors: {len(clf.support_vectors_)}')\n",
    "\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cell-7",
   "metadata": {},
   "source": [
    "## 3. Kernel Trick - Nonlinear Classification\n",
    "\n",
    "A kernel function implicitly maps the data into a higher-dimensional space, which lets the SVM handle nonlinear patterns.\n",
    "\n",
    "Common kernels:\n",
    "- **linear**: K(x, y) = x·y\n",
    "- **polynomial**: K(x, y) = (γ·x·y + r)^d\n",
    "- **rbf** (Gaussian): K(x, y) = exp(-γ||x - y||²)\n",
    "- **sigmoid**: K(x, y) = tanh(γ·x·y + r)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-8",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Generate nonlinear data\n",
    "X_moons, y_moons = make_moons(n_samples=200, noise=0.1, random_state=42)\n",
    "X_circles, y_circles = make_circles(n_samples=200, noise=0.1, factor=0.5, random_state=42)\n",
    "\n",
    "# Compare kernels\n",
    "kernels = ['linear', 'poly', 'rbf']\n",
    "\n",
    "fig, axes = plt.subplots(2, 3, figsize=(15, 10))\n",
    "\n",
    "for row, (X_data, y_data, name) in enumerate([(X_moons, y_moons, 'Moons'),\n",
    "                                                (X_circles, y_circles, 'Circles')]):\n",
    "    for col, kernel in enumerate(kernels):\n",
    "        ax = axes[row, col]\n",
    "\n",
    "        # Train the SVM\n",
    "        if kernel == 'poly':\n",
    "            clf = svm.SVC(kernel=kernel, degree=3, gamma='scale')\n",
    "        else:\n",
    "            clf = svm.SVC(kernel=kernel, gamma='scale')\n",
    "        clf.fit(X_data, y_data)\n",
    "\n",
    "        # Decision boundary\n",
    "        xlim = [X_data[:, 0].min() - 0.5, X_data[:, 0].max() + 0.5]\n",
    "        ylim = [X_data[:, 1].min() - 0.5, X_data[:, 1].max() + 0.5]\n",
    "        xx, yy = np.meshgrid(np.linspace(xlim[0], xlim[1], 100),\n",
    "                             np.linspace(ylim[0], ylim[1], 100))\n",
    "\n",
    "        Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])\n",
    "        Z = Z.reshape(xx.shape)\n",
    "\n",
    "        ax.contourf(xx, yy, Z, alpha=0.3, cmap='coolwarm')\n",
    "        ax.scatter(X_data[:, 0], X_data[:, 1], c=y_data, cmap='coolwarm', edgecolors='black')\n",
    "        ax.set_title(f'{name} - {kernel}\\nAccuracy: {clf.score(X_data, y_data):.3f}')\n",
    "\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cell-9",
   "metadata": {},
   "source": [
    "## 4. RBF Kernel and the gamma Parameter\n",
    "\n",
    "In the RBF kernel, gamma determines how far the influence of each data point reaches.\n",
    "\n",
    "- **Large gamma**: narrow range of influence → complex boundary, risk of overfitting\n",
    "- **Small gamma**: wide range of influence → simple boundary, risk of underfitting"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-10",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Visualize the effect of gamma\n",
    "fig, axes = plt.subplots(1, 4, figsize=(20, 5))\n",
    "gamma_values = [0.1, 1, 10, 100]\n",
    "\n",
    "X, y = make_moons(n_samples=200, noise=0.1, random_state=42)\n",
    "\n",
    "for ax, gamma in zip(axes, gamma_values):\n",
    "    clf = svm.SVC(kernel='rbf', gamma=gamma, C=1)\n",
    "    clf.fit(X, y)\n",
    "\n",
    "    xlim = [X[:, 0].min() - 0.5, X[:, 0].max() + 0.5]\n",
    "    ylim = [X[:, 1].min() - 0.5, X[:, 1].max() + 0.5]\n",
    "    xx, yy = np.meshgrid(np.linspace(xlim[0], xlim[1], 100),\n",
    "                         np.linspace(ylim[0], ylim[1], 100))\n",
    "\n",
    "    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])\n",
    "    Z = Z.reshape(xx.shape)\n",
    "\n",
    "    ax.contourf(xx, yy, Z, alpha=0.3, cmap='coolwarm')\n",
    "    ax.scatter(X[:, 0], X[:, 1], c=y, cmap='coolwarm', edgecolors='black')\n",
    "    ax.scatter(clf.support_vectors_[:, 0], clf.support_vectors_[:, 1],\n",
    "               s=100, facecolors='none', edgecolors='green', linewidths=2)\n",
    "    ax.set_title(f'gamma = {gamma}\\nSVs: {len(clf.support_vectors_)}')\n",
    "\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cell-11",
   "metadata": {},
   "source": [
    "## 5. SVC - Classifying Real Data\n",
    "\n",
    "We perform multiclass classification on the Iris dataset.\n",
    "**Important**: SVMs are sensitive to feature scale, so scaling is essential."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-12",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Load the data\n",
    "iris = load_iris()\n",
    "X, y = iris.data, iris.target\n",
    "\n",
    "# Split the data\n",
    "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)\n",
    "\n",
    "# Scaling (SVMs are sensitive to scale); fit on the training split only\n",
    "scaler = StandardScaler()\n",
    "X_train_scaled = scaler.fit_transform(X_train)\n",
    "X_test_scaled = scaler.transform(X_test)\n",
    "\n",
    "# Train the SVM\n",
    "svm_clf = SVC(\n",
    "    C=1.0,\n",
    "    kernel='rbf',\n",
    "    gamma='scale',\n",
    "    probability=True,  # enable probability estimates\n",
    "    random_state=42\n",
    ")\n",
    "svm_clf.fit(X_train_scaled, y_train)\n",
    "\n",
    "# Predict\n",
    "y_pred = svm_clf.predict(X_test_scaled)\n",
    "\n",
    "print(\"SVM classification results:\")\n",
    "print(f\"  Accuracy: {accuracy_score(y_test, y_pred):.4f}\")\n",
    "print(f\"  Number of support vectors: {len(svm_clf.support_vectors_)}\")\n",
    "print(\"\\nClassification report:\")\n",
    "print(classification_report(y_test, y_pred, target_names=iris.target_names))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-13",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Probability estimates\n",
    "y_proba = svm_clf.predict_proba(X_test_scaled[:5])\n",
    "\n",
    "print(\"Probability estimates (first 5):\")\n",
    "print(f\"Classes: {iris.target_names}\")\n",
    "print(y_proba)\n",
    "print(f\"\\nPredicted classes: {y_pred[:5]}\")\n",
    "print(f\"Actual classes: {y_test[:5]}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cell-14",
   "metadata": {},
   "source": [
    "## 6. The Importance of Scaling\n",
    "\n",
    "SVMs rely on distances between samples, so performance degrades when features are on different scales."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-15",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Compare results with and without scaling\n",
    "cancer = load_breast_cancer()\n",
    "X_train_c, X_test_c, y_train_c, y_test_c = train_test_split(\n",
    "    cancer.data, cancer.target, test_size=0.2, random_state=42\n",
    ")\n",
    "\n",
    "# Without scaling\n",
    "svm_no_scale = SVC(kernel='rbf', C=1, gamma='scale')\n",
    "svm_no_scale.fit(X_train_c, y_train_c)\n",
    "acc_no_scale = svm_no_scale.score(X_test_c, y_test_c)\n",
    "\n",
    "# With scaling\n",
    "scaler = StandardScaler()\n",
    "X_train_c_scaled = scaler.fit_transform(X_train_c)\n",
    "X_test_c_scaled = scaler.transform(X_test_c)\n",
    "\n",
    "svm_scaled = SVC(kernel='rbf', C=1, gamma='scale')\n",
    "svm_scaled.fit(X_train_c_scaled, y_train_c)\n",
    "acc_scaled = svm_scaled.score(X_test_c_scaled, y_test_c)\n",
    "\n",
    "print(\"Effect of scaling:\")\n",
    "print(f\"  Without scaling: {acc_no_scale:.4f}\")\n",
    "print(f\"  With scaling:    {acc_scaled:.4f}\")\n",
    "print(f\"  Improvement:     {(acc_scaled - acc_no_scale) * 100:.2f} percentage points\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cell-16",
   "metadata": {},
   "source": [
    "## 7. Hyperparameter Tuning - Grid Search\n",
    "\n",
    "We tune C and gamma jointly to find the best combination."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-17",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Parameter grid\n",
    "param_grid = {\n",
    "    'C': [0.1, 1, 10, 100],\n",
    "    'gamma': ['scale', 'auto', 0.01, 0.1, 1],\n",
    "    'kernel': ['rbf', 'poly']\n",
    "}\n",
    "\n",
    "# Grid Search (uses the scaled Iris split from section 5)\n",
    "grid_search = GridSearchCV(\n",
    "    SVC(random_state=42),\n",
    "    param_grid,\n",
    "    cv=5,\n",
    "    scoring='accuracy',\n",
    "    n_jobs=-1,\n",
    "    verbose=1\n",
    ")\n",
    "\n",
    "grid_search.fit(X_train_scaled, y_train)\n",
    "\n",
    "print(\"\\nGrid Search results:\")\n",
    "print(f\"  Best parameters: {grid_search.best_params_}\")\n",
    "print(f\"  Best CV score: {grid_search.best_score_:.4f}\")\n",
    "print(f\"  Test score: {grid_search.score(X_test_scaled, y_test):.4f}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-18",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Visualize joint tuning of C and gamma (RBF kernel only)\n",
    "C_range = np.logspace(-2, 2, 5)\n",
    "gamma_range = np.logspace(-3, 1, 5)\n",
    "\n",
    "# Compute scores\n",
    "# Note: scored on the test split for illustration only;\n",
    "# use cross-validation when actually selecting hyperparameters.\n",
    "scores = np.zeros((len(C_range), len(gamma_range)))\n",
    "\n",
    "for i, C in enumerate(C_range):\n",
    "    for j, gamma in enumerate(gamma_range):\n",
    "        svm_clf = SVC(C=C, gamma=gamma, kernel='rbf')\n",
    "        svm_clf.fit(X_train_c_scaled, y_train_c)\n",
    "        scores[i, j] = svm_clf.score(X_test_c_scaled, y_test_c)\n",
    "\n",
    "# Heatmap\n",
    "plt.figure(figsize=(10, 8))\n",
    "plt.imshow(scores, interpolation='nearest', cmap='viridis')\n",
    "plt.xlabel('gamma')\n",
    "plt.ylabel('C')\n",
    "plt.colorbar(label='Accuracy')\n",
    "plt.xticks(np.arange(len(gamma_range)), [f'{g:.3f}' for g in gamma_range])\n",
    "plt.yticks(np.arange(len(C_range)), [f'{c:.2f}' for c in C_range])\n",
    "plt.title('SVM Hyperparameter Tuning (RBF Kernel)')\n",
    "\n",
    "# Mark the best point\n",
    "best_i, best_j = np.unravel_index(scores.argmax(), scores.shape)\n",
    "plt.scatter(best_j, best_i, marker='*', s=300, c='red', edgecolors='white')\n",
    "\n",
    "plt.tight_layout()\n",
    "plt.show()\n",
    "\n",
    "print(f\"Best C: {C_range[best_i]:.2f}\")\n",
    "print(f\"Best gamma: {gamma_range[best_j]:.3f}\")\n",
    "print(f\"Best accuracy: {scores.max():.4f}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cell-19",
   "metadata": {},
   "source": [
    "## 8. SVR - Support Vector Regression\n",
    "\n",
    "SVR applies the SVM idea to regression problems.\n",
    "Errors inside the epsilon-tube are ignored; only errors outside the tube are penalized."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-20",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Load the data\n",
    "diabetes = load_diabetes()\n",
    "X_train_d, X_test_d, y_train_d, y_test_d = train_test_split(\n",
    "    diabetes.data, diabetes.target, test_size=0.2, random_state=42\n",
    ")\n",
    "\n",
    "# Scaling\n",
    "scaler = StandardScaler()\n",
    "X_train_d_scaled = scaler.fit_transform(X_train_d)\n",
    "X_test_d_scaled = scaler.transform(X_test_d)\n",
    "\n",
    "# Train the SVR\n",
    "svr = SVR(\n",
    "    kernel='rbf',\n",
    "    C=100,\n",
    "    epsilon=0.1,  # tube width: errors within this range are ignored\n",
    "    gamma='scale'\n",
    ")\n",
    "svr.fit(X_train_d_scaled, y_train_d)\n",
    "\n",
    "# Predict\n",
    "y_pred_d = svr.predict(X_test_d_scaled)\n",
    "\n",
    "print(\"SVR regression results:\")\n",
    "print(f\"  MSE: {mean_squared_error(y_test_d, y_pred_d):.4f}\")\n",
    "print(f\"  RMSE: {np.sqrt(mean_squared_error(y_test_d, y_pred_d)):.4f}\")\n",
    "print(f\"  R²: {r2_score(y_test_d, y_pred_d):.4f}\")\n",
    "print(f\"  Number of support vectors: {len(svr.support_vectors_)}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-21",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Visualize predicted vs. actual values\n",
    "plt.figure(figsize=(8, 6))\n",
    "plt.scatter(y_test_d, y_pred_d, alpha=0.7, edgecolors='black')\n",
    "plt.plot([y_test_d.min(), y_test_d.max()], [y_test_d.min(), y_test_d.max()], 'r--', lw=2)\n",
    "plt.xlabel('Actual')\n",
    "plt.ylabel('Predicted')\n",
    "plt.title(f'SVR Regression (R² = {r2_score(y_test_d, y_pred_d):.4f})')\n",
    "plt.grid(True, alpha=0.3)\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cell-22",
   "metadata": {},
   "source": [
    "## 9. Multiclass Classification Strategies\n",
    "\n",
    "An SVM is a binary classifier, so multiclass problems are handled with one of the following strategies.\n",
    "\n",
    "- **OvO (One-vs-One)**: k(k-1)/2 classifiers, the strategy SVC uses for training\n",
    "- **OvR (One-vs-Rest)**: k classifiers, the LinearSVC default"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-23",
   "metadata": {},
   "outputs": [],
   "source": [
    "# OvO-shaped decision function (SVC always trains one-vs-one internally)\n",
    "svm_ovo = SVC(kernel='rbf', decision_function_shape='ovo')\n",
    "svm_ovo.fit(X_train_scaled, y_train)\n",
    "print(f\"OvO accuracy: {svm_ovo.score(X_test_scaled, y_test):.4f}\")\n",
    "\n",
    "# OvR-shaped decision function (only the output shape changes, not the training)\n",
    "svm_ovr = SVC(kernel='rbf', decision_function_shape='ovr')\n",
    "svm_ovr.fit(X_train_scaled, y_train)\n",
    "print(f\"OvR accuracy: {svm_ovr.score(X_test_scaled, y_test):.4f}\")\n",
    "\n",
    "# LinearSVC (OvR by default)\n",
    "linear_svc = LinearSVC(dual=True, max_iter=10000)\n",
    "linear_svc.fit(X_train_scaled, y_train)\n",
    "print(f\"LinearSVC accuracy: {linear_svc.score(X_test_scaled, y_test):.4f}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cell-24",
   "metadata": {},
   "source": [
    "## 10. Kernel Comparison - In Practice\n",
    "\n",
    "We compare the performance of several kernels on the breast cancer data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-25",
   "metadata": {},
   "outputs": [],
   "source": [
    "kernels = ['linear', 'poly', 'rbf', 'sigmoid']\n",
    "\n",
    "print(\"Performance by kernel (Breast Cancer):\")\n",
    "print(\"-\" * 50)\n",
    "\n",
    "for kernel in kernels:\n",
    "    if kernel == 'poly':\n",
    "        svm_model = SVC(kernel=kernel, degree=3, gamma='scale')\n",
    "    else:\n",
    "        svm_model = SVC(kernel=kernel, gamma='scale')\n",
    "\n",
    "    svm_model.fit(X_train_c_scaled, y_train_c)\n",
    "    acc = svm_model.score(X_test_c_scaled, y_test_c)\n",
    "    print(f\"  {kernel:8s}: {acc:.4f} (SVs: {len(svm_model.support_vectors_)})\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cell-26",
   "metadata": {},
   "source": [
    "## Summary\n",
    "\n",
    "### Key Concepts\n",
    "\n",
    "| Concept | Description |\n",
    "|------|------|\n",
    "| **Support vectors** | The data points on (or, with a soft margin, inside) the margin boundary that determine the decision boundary |\n",
    "| **Margin** | Distance between the decision boundary and the nearest support vectors |\n",
    "| **C** | Regularization parameter (large: narrow margin, small: wide margin) |\n",
    "| **gamma** | RBF kernel reach (large: narrow influence, small: wide influence) |\n",
    "| **Kernel** | Function that implicitly maps data into a higher-dimensional space |\n",
    "\n",
    "### SVM Usage Checklist\n",
    "\n",
    "1. ✅ **Scaling is essential**: apply StandardScaler or MinMaxScaler\n",
    "2. ✅ **Kernel choice**: linearly separable → linear, nonlinear → rbf\n",
    "3. ✅ **Parameter tuning**: tune C and gamma with GridSearchCV\n",
    "4. ✅ **Large datasets**: use LinearSVC or SGDClassifier\n",
    "5. ✅ **When probabilities are needed**: set probability=True (incurs extra cost)\n",
    "\n",
    "### Pros and Cons\n",
    "\n",
    "**Pros**:\n",
    "- Effective on high-dimensional data\n",
    "- Memory efficient (only the support vectors are stored)\n",
    "- Solves nonlinear problems through a variety of kernels\n",
    "\n",
    "**Cons**:\n",
    "- Slow on large datasets (O(n²) ~ O(n³))\n",
    "- Requires scaling\n",
    "- Requires parameter tuning\n",
    "\n",
    "### Next Steps\n",
    "- k-Nearest Neighbors (kNN)\n",
    "- Naive Bayes\n",
    "- Ensemble methods"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.9.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}