{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "cell-0",
   "metadata": {},
   "source": [
    "# 09. Support Vector Machine (SVM)\n",
    "\n",
    "## Learning Objectives\n",
    "- Understand the margin-maximization principle behind SVM\n",
    "- Learn the role of support vectors\n",
    "- Solve nonlinear problems with the kernel trick\n",
    "- Tune the hyperparameters C and gamma\n",
    "- Solve regression problems with SVR"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-1",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import libraries\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt\n",
    "from sklearn import svm\n",
    "from sklearn.svm import SVC, SVR, LinearSVC\n",
    "from sklearn.datasets import (\n",
    "    make_blobs, make_classification, make_moons, make_circles,\n",
    "    load_iris, load_breast_cancer, load_diabetes\n",
    ")\n",
    "from sklearn.model_selection import train_test_split, GridSearchCV, cross_val_score\n",
    "from sklearn.preprocessing import StandardScaler\n",
    "from sklearn.metrics import accuracy_score, classification_report, mean_squared_error, r2_score\n",
    "\n",
    "# Plot font settings (render minus signs with an ASCII hyphen)\n",
    "plt.rcParams['font.family'] = 'DejaVu Sans'\n",
    "plt.rcParams['axes.unicode_minus'] = False\n",
    "np.random.seed(42)"
   ]
  },
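  {
   "cell_type": "markdown",
   "id": "cell-2",
   "metadata": {},
   "source": [
    "## 1. Linear SVM: Margin Maximization\n",
    "\n",
    "The core idea of SVM is to find the optimal hyperplane that separates two classes.\n",
    "Among all separating hyperplanes, SVM chooses the one that maximizes the margin, i.e. the distance to the closest training points of either class, which tends to improve generalization."
   ]
  },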
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-3",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Generate linearly separable data\n",
    "X, y = make_blobs(n_samples=100, centers=2, random_state=6)\n",
    "\n",
    "# Train a linear SVM (a very large C approximates a hard margin)\n",
    "clf = svm.SVC(kernel='linear', C=1000)\n",
    "clf.fit(X, y)\n",
    "\n",
    "print(f\"Number of support vectors: {len(clf.support_vectors_)}\")\n",
    "print(f\"Weights (w): {clf.coef_}\")\n",
    "print(f\"Intercept (b): {clf.intercept_}\")"
   ]
  },
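  {
   "cell_type": "markdown",
   "id": "cell-3-margin-md",
   "metadata": {},
   "source": [
    "A quick numerical check of the margin geometry. This is a minimal sketch that refits the same blob data and model as above. For a linear SVM with decision function $f(x) = w \\cdot x + b$, the margin boundaries are the lines $f(x) = \\pm 1$, so the margin width is $2 / \\lVert w \\rVert$. If the data are linearly separable, the support vectors of this (almost) hard-margin model should have $f(x)$ close to $\\pm 1$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-3-margin-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from sklearn.datasets import make_blobs\n",
    "from sklearn.svm import SVC\n",
    "\n",
    "# Same data and (almost) hard-margin linear SVM as in the cell above\n",
    "X, y = make_blobs(n_samples=100, centers=2, random_state=6)\n",
    "clf = SVC(kernel='linear', C=1000).fit(X, y)\n",
    "\n",
    "w = clf.coef_[0]\n",
    "margin_width = 2 / np.linalg.norm(w)  # distance between the f(x) = -1 and f(x) = +1 lines\n",
    "print(f'||w|| = {np.linalg.norm(w):.4f}, margin width = {margin_width:.4f}')\n",
    "\n",
    "# Decision-function values at the support vectors (expected near -1 or +1)\n",
    "print('f(x) at support vectors:', np.round(clf.decision_function(clf.support_vectors_), 3))"
   ]
  },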
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-4",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Visualize the decision boundary and margins\n",
    "plt.figure(figsize=(10, 8))\n",
    "\n",
    "# Data points\n",
    "plt.scatter(X[:, 0], X[:, 1], c=y, cmap='coolwarm', s=100, edgecolors='black')\n",
    "\n",
    "# Use the current axis limits for the evaluation grid\n",
    "ax = plt.gca()\n",
    "xlim = ax.get_xlim()\n",
    "ylim = ax.get_ylim()\n",
    "\n",
    "# Build a grid and evaluate the decision function on it\n",
    "xx = np.linspace(xlim[0], xlim[1], 30)\n",
    "yy = np.linspace(ylim[0], ylim[1], 30)\n",
    "YY, XX = np.meshgrid(yy, xx)\n",
    "xy = np.vstack([XX.ravel(), YY.ravel()]).T\n",
    "Z = clf.decision_function(xy).reshape(XX.shape)\n",
    "\n",
    "# Draw the decision boundary (f(x) = 0) and the margins (f(x) = ±1)\n",
    "ax.contour(XX, YY, Z, colors='k', levels=[-1, 0, 1],\n",
    "           linestyles=['--', '-', '--'], linewidths=[1, 2, 1])\n",
    "\n",
    "# Highlight the support vectors\n",
    "ax.scatter(clf.support_vectors_[:, 0], clf.support_vectors_[:, 1],\n",
    "           s=200, linewidth=2, facecolors='none', edgecolors='green',\n",
    "           label='Support Vectors')\n",
    "\n",
    "plt.xlabel('Feature 1')\n",
    "plt.ylabel('Feature 2')\n",
    "plt.title('Linear SVM: Maximum Margin Classifier')\n",
    "plt.legend()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cell-5",
   "metadata": {},
   "source": [
    "## 2. Soft Margin: The C Parameter\n",
    "\n",
    "Real-world data is rarely perfectly linearly separable.\n",
    "A soft-margin SVM allows some points to violate the margin, and the C parameter controls the trade-off between misclassification and margin width.\n",
    "\n",
    "- **Large C**: heavy penalty on misclassification → narrow margin, risk of overfitting\n",
    "- **Small C**: misclassification tolerated → wide margin, often better generalization (too small a C can underfit)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-6",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Data with label noise\n",
    "X, y = make_classification(\n",
    "    n_samples=200, n_features=2, n_redundant=0,\n",
    "    n_informative=2, n_clusters_per_class=1,\n",
    "    flip_y=0.1,  # labels of ~10% of samples are assigned at random\n",
    "    random_state=42\n",
    ")\n",
    "\n",
    "# Compare several C values\n",
    "fig, axes = plt.subplots(1, 3, figsize=(15, 5))\n",
    "C_values = [0.1, 1, 100]\n",
    "\n",
    "for ax, C in zip(axes, C_values):\n",
    "    clf = svm.SVC(kernel='linear', C=C)\n",
    "    clf.fit(X, y)\n",
    "\n",
    "    # Evaluate the decision function on a grid\n",
    "    xlim = [X[:, 0].min() - 0.5, X[:, 0].max() + 0.5]\n",
    "    ylim = [X[:, 1].min() - 0.5, X[:, 1].max() + 0.5]\n",
    "    xx, yy = np.meshgrid(np.linspace(xlim[0], xlim[1], 100),\n",
    "                         np.linspace(ylim[0], ylim[1], 100))\n",
    "\n",
    "    Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()])\n",
    "    Z = Z.reshape(xx.shape)\n",
    "\n",
    "    ax.contourf(xx, yy, Z, alpha=0.3, cmap='coolwarm')\n",
    "    ax.contour(xx, yy, Z, colors='k', levels=[-1, 0, 1],\n",
    "               linestyles=['--', '-', '--'])\n",
    "    ax.scatter(X[:, 0], X[:, 1], c=y, cmap='coolwarm', edgecolors='black')\n",
    "    ax.scatter(clf.support_vectors_[:, 0], clf.support_vectors_[:, 1],\n",
    "               s=150, facecolors='none', edgecolors='green', linewidths=2)\n",
    "    ax.set_title(f'C = {C}\\nSupport Vectors: {len(clf.support_vectors_)}')\n",
    "\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
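  {
   "cell_type": "markdown",
   "id": "cell-6-slack-md",
   "metadata": {},
   "source": [
    "To make the trade-off concrete, the sketch below refits the same noisy data and computes each point's slack $\\xi_i = \\max(0, 1 - y_i f(x_i))$, with the labels mapped to $y_i \\in \\{-1, +1\\}$. A point with $\\xi_i > 0$ violates the margin, and $\\xi_i > 1$ means it is misclassified. A larger C makes slack more expensive, which typically yields a narrower margin."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-6-slack-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from sklearn.datasets import make_classification\n",
    "from sklearn.svm import SVC\n",
    "\n",
    "# Same noisy data as above\n",
    "X, y = make_classification(n_samples=200, n_features=2, n_redundant=0,\n",
    "                           n_informative=2, n_clusters_per_class=1,\n",
    "                           flip_y=0.1, random_state=42)\n",
    "y_pm = np.where(y == 1, 1, -1)  # map labels {0, 1} to {-1, +1}\n",
    "\n",
    "for C in [0.1, 1, 100]:\n",
    "    clf = SVC(kernel='linear', C=C).fit(X, y)\n",
    "    f = clf.decision_function(X)          # positive values correspond to class 1\n",
    "    slack = np.maximum(0, 1 - y_pm * f)   # per-sample hinge loss = slack xi_i\n",
    "    width = 2 / np.linalg.norm(clf.coef_[0])\n",
    "    print(f'C={C:>5}: margin width={width:.3f}, '\n",
    "          f'margin violations={np.sum(slack > 0)}, '\n",
    "          f'misclassified={np.sum(slack > 1)}, total slack={slack.sum():.2f}')"
   ]
  },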
  {
   "cell_type": "markdown",
   "id": "cell-7",
   "metadata": {},
   "source": [
    "## 3. Kernel Trick: Nonlinear Classification\n",
    "\n",
    "A kernel function implicitly maps the data into a higher-dimensional feature space, where a linear boundary can separate nonlinear patterns. The kernel returns inner products in that space directly, so the mapping never has to be computed explicitly.\n",
    "\n",
    "Common kernels:\n",
    "- **linear**: K(x, y) = x·y\n",
    "- **polynomial** (`poly`): K(x, y) = (γ·x·y + r)^d\n",
    "- **rbf** (Gaussian): K(x, y) = exp(-γ||x - y||²)\n",
    "- **sigmoid**: K(x, y) = tanh(γ·x·y + r)\n",
    "\n",
    "In scikit-learn, γ is `gamma`, r is `coef0`, and d is `degree`."
   ]
  },
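  {
   "cell_type": "markdown",
   "id": "cell-7-kernel-md",
   "metadata": {},
   "source": [
    "The formulas above match scikit-learn's kernel functions in `sklearn.metrics.pairwise`. The sketch below checks the RBF formula by hand. It also illustrates the trick itself: on 2-D inputs, the polynomial kernel with d = 2, γ = 1 and r = 0 gives $K(x, y) = (x \\cdot y)^2$. That value equals an ordinary dot product after the explicit map $\\phi(x) = (x_1^2, \\sqrt{2}\\, x_1 x_2, x_2^2)$. The SVM only ever needs the kernel value, never $\\phi$ itself. The helper `phi` is a hypothetical name used only for this illustration."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-7-kernel-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from sklearn.metrics.pairwise import polynomial_kernel, rbf_kernel\n",
    "\n",
    "rng = np.random.default_rng(0)\n",
    "x = rng.normal(size=(1, 2))\n",
    "z = rng.normal(size=(1, 2))\n",
    "gamma = 0.5\n",
    "\n",
    "# RBF kernel exp(-gamma * ||x - z||^2): by hand vs. scikit-learn\n",
    "rbf_manual = np.exp(-gamma * np.sum((x - z) ** 2))\n",
    "print(np.isclose(rbf_manual, rbf_kernel(x, z, gamma=gamma)[0, 0]))  # True\n",
    "\n",
    "# Degree-2 polynomial kernel with gamma=1, r=0, i.e. (x . z)^2\n",
    "poly_value = polynomial_kernel(x, z, degree=2, gamma=1, coef0=0)[0, 0]\n",
    "\n",
    "def phi(v):\n",
    "    # Hypothetical helper: explicit feature map whose dot product equals (x . z)^2 in 2-D\n",
    "    v1, v2 = v\n",
    "    return np.array([v1 ** 2, np.sqrt(2) * v1 * v2, v2 ** 2])\n",
    "\n",
    "# Same number, computed as a plain dot product in the 3-D feature space\n",
    "print(np.isclose(poly_value, phi(x[0]) @ phi(z[0])))  # True"
   ]
  },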
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-8",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Generate nonlinear data\n",
    "X_moons, y_moons = make_moons(n_samples=200, noise=0.1, random_state=42)\n",
    "X_circles, y_circles = make_circles(n_samples=200, noise=0.1, factor=0.5, random_state=42)\n",
    "\n",
    "# Compare kernels\n",
    "kernels = ['linear', 'poly', 'rbf']\n",
    "\n",
    "fig, axes = plt.subplots(2, 3, figsize=(15, 10))\n",
    "\n",
    "for row, (X_data, y_data, name) in enumerate([(X_moons, y_moons, 'Moons'),\n",
    "                                              (X_circles, y_circles, 'Circles')]):\n",
    "    for col, kernel in enumerate(kernels):\n",
    "        ax = axes[row, col]\n",
    "\n",
    "        # Train the SVM\n",
    "        if kernel == 'poly':\n",
    "            clf = svm.SVC(kernel=kernel, degree=3, gamma='scale')\n",
    "        else:\n",
    "            clf = svm.SVC(kernel=kernel, gamma='scale')\n",
    "        clf.fit(X_data, y_data)\n",
    "\n",
    "        # Predicted class regions on a grid\n",
    "        xlim = [X_data[:, 0].min() - 0.5, X_data[:, 0].max() + 0.5]\n",
    "        ylim = [X_data[:, 1].min() - 0.5, X_data[:, 1].max() + 0.5]\n",
    "        xx, yy = np.meshgrid(np.linspace(xlim[0], xlim[1], 100),\n",
    "                             np.linspace(ylim[0], ylim[1], 100))\n",
    "\n",
    "        Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])\n",
    "        Z = Z.reshape(xx.shape)\n",
    "\n",
    "        ax.contourf(xx, yy, Z, alpha=0.3, cmap='coolwarm')\n",
    "        ax.scatter(X_data[:, 0], X_data[:, 1], c=y_data, cmap='coolwarm', edgecolors='black')\n",
    "        # Note: this accuracy is measured on the training data\n",
    "        ax.set_title(f'{name} - {kernel}\\nTrain accuracy: {clf.score(X_data, y_data):.3f}')\n",
    "\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cell-9",
   "metadata": {},
   "source": [
    "## 4. RBF Kernel and the gamma Parameter\n",
    "\n",
    "In the RBF kernel, gamma determines how far the influence of a single training point reaches.\n",
    "\n",
    "- **Large gamma**: narrow influence → complex, wiggly boundary, risk of overfitting\n",
    "- **Small gamma**: wide influence → smooth, simple boundary, risk of underfitting"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-10",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Visualize the effect of gamma\n",
    "fig, axes = plt.subplots(1, 4, figsize=(20, 5))\n",
    "gamma_values = [0.1, 1, 10, 100]\n",
    "\n",
    "X, y = make_moons(n_samples=200, noise=0.1, random_state=42)\n",
    "\n",
    "for ax, gamma in zip(axes, gamma_values):\n",
    "    clf = svm.SVC(kernel='rbf', gamma=gamma, C=1)\n",
    "    clf.fit(X, y)\n",
    "\n",
    "    xlim = [X[:, 0].min() - 0.5, X[:, 0].max() + 0.5]\n",
    "    ylim = [X[:, 1].min() - 0.5, X[:, 1].max() + 0.5]\n",
    "    xx, yy = np.meshgrid(np.linspace(xlim[0], xlim[1], 100),\n",
    "                         np.linspace(ylim[0], ylim[1], 100))\n",
    "\n",
    "    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])\n",
    "    Z = Z.reshape(xx.shape)\n",
    "\n",
    "    ax.contourf(xx, yy, Z, alpha=0.3, cmap='coolwarm')\n",
    "    ax.scatter(X[:, 0], X[:, 1], c=y, cmap='coolwarm', edgecolors='black')\n",
    "    ax.scatter(clf.support_vectors_[:, 0], clf.support_vectors_[:, 1],\n",
    "               s=100, facecolors='none', edgecolors='green', linewidths=2)\n",
    "    ax.set_title(f'gamma = {gamma}\\nSVs: {len(clf.support_vectors_)}')\n",
    "\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
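  {
   "cell_type": "markdown",
   "id": "cell-10-gamma-md",
   "metadata": {},
   "source": [
    "To see what influence range means numerically: the RBF similarity between two points at distance d is $\\exp(-\\gamma d^2)$. The sketch below prints it for the gamma values used above. With gamma = 100, even points 0.5 apart are essentially unrelated. It also computes the value that `gamma='scale'` (used elsewhere in this notebook) resolves to, which scikit-learn documents as `1 / (n_features * X.var())`. By contrast, `gamma='auto'` uses `1 / n_features`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-10-gamma-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from sklearn.datasets import make_moons\n",
    "\n",
    "# RBF similarity exp(-gamma * d^2) at a few distances d\n",
    "distances = np.array([0.1, 0.5, 1.0, 2.0])\n",
    "print('distances:', distances)\n",
    "for g in [0.1, 1, 10, 100]:\n",
    "    print(f'gamma={g:>5}:', np.round(np.exp(-g * distances ** 2), 4))\n",
    "\n",
    "# gamma='scale' as documented by scikit-learn: 1 / (n_features * X.var())\n",
    "X_m, _ = make_moons(n_samples=200, noise=0.1, random_state=42)\n",
    "gamma_scale = 1 / (X_m.shape[1] * X_m.var())\n",
    "print(f'gamma (scale) for the moons data: {gamma_scale:.3f}')"
   ]
  },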
  {
   "cell_type": "markdown",
   "id": "cell-11",
   "metadata": {},
   "source": [
    "## 5. SVC: Classifying Real Data\n",
    "\n",
    "We perform multi-class classification on the Iris dataset.\n",
    "**Important**: SVMs are sensitive to feature scales, so scaling is essential."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-12",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Load data\n",
    "iris = load_iris()\n",
    "X, y = iris.data, iris.target\n",
    "\n",
    "# Train/test split\n",
    "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)\n",
    "\n",
    "# Scale features (SVM is scale-sensitive); fit the scaler on the training set only\n",
    "scaler = StandardScaler()\n",
    "X_train_scaled = scaler.fit_transform(X_train)\n",
    "X_test_scaled = scaler.transform(X_test)\n",
    "\n",
    "# Train the SVM\n",
    "svm_clf = SVC(\n",
    "    C=1.0,\n",
    "    kernel='rbf',\n",
    "    gamma='scale',\n",
    "    probability=True,  # enables predict_proba (uses internal cross-validation, slows down fit)\n",
    "    random_state=42\n",
    ")\n",
    "svm_clf.fit(X_train_scaled, y_train)\n",
    "\n",
    "# Predict\n",
    "y_pred = svm_clf.predict(X_test_scaled)\n",
    "\n",
    "print(\"SVM classification results:\")\n",
    "print(f\"  Accuracy: {accuracy_score(y_test, y_pred):.4f}\")\n",
    "print(f\"  Number of support vectors: {len(svm_clf.support_vectors_)}\")\n",
    "print(\"\\nClassification report:\")\n",
    "print(classification_report(y_test, y_pred, target_names=iris.target_names))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-13",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Predicted class probabilities\n",
    "y_proba = svm_clf.predict_proba(X_test_scaled[:5])\n",
    "\n",
    "print(\"Predicted probabilities (first 5 test samples):\")\n",
    "print(f\"Classes: {iris.target_names}\")\n",
    "print(y_proba)\n",
    "print(f\"\\nPredicted classes: {y_pred[:5]}\")\n",
    "print(f\"True classes: {y_test[:5]}\")"
   ]
  },
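  {
   "cell_type": "markdown",
   "id": "cell-13-proba-md",
   "metadata": {},
   "source": [
    "With `probability=True`, scikit-learn fits Platt scaling using an internal 5-fold cross-validation. Its documentation warns that the resulting probabilities can be inconsistent with the scores, so the argmax of `predict_proba` may occasionally differ from `predict`. A minimal check, refitting the same Iris model under new variable names:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-13-proba-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from sklearn.datasets import load_iris\n",
    "from sklearn.model_selection import train_test_split\n",
    "from sklearn.preprocessing import StandardScaler\n",
    "from sklearn.svm import SVC\n",
    "\n",
    "X_ir, y_ir = load_iris(return_X_y=True)\n",
    "X_ir_tr, X_ir_te, y_ir_tr, y_ir_te = train_test_split(X_ir, y_ir, test_size=0.2, random_state=42)\n",
    "scaler_ir = StandardScaler().fit(X_ir_tr)\n",
    "\n",
    "model_ir = SVC(kernel='rbf', gamma='scale', probability=True, random_state=42)\n",
    "model_ir.fit(scaler_ir.transform(X_ir_tr), y_ir_tr)\n",
    "\n",
    "X_ir_te_s = scaler_ir.transform(X_ir_te)\n",
    "pred = model_ir.predict(X_ir_te_s)\n",
    "proba_pred = model_ir.classes_[model_ir.predict_proba(X_ir_te_s).argmax(axis=1)]\n",
    "\n",
    "# Number of test samples where argmax(predict_proba) disagrees with predict\n",
    "print('Disagreements:', np.sum(pred != proba_pred), 'of', len(pred))"
   ]
  },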
  {
   "cell_type": "markdown",
   "id": "cell-14",
   "metadata": {},
   "source": [
    "## 6. Why Scaling Matters\n",
    "\n",
    "SVMs rely on distances and inner products between samples (the RBF kernel, for example, uses ||x - y||²). Features with large numeric ranges therefore dominate, and performance degrades when feature scales differ."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-15",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Compare accuracy with and without scaling\n",
    "cancer = load_breast_cancer()\n",
    "X_train_c, X_test_c, y_train_c, y_test_c = train_test_split(\n",
    "    cancer.data, cancer.target, test_size=0.2, random_state=42\n",
    ")\n",
    "\n",
    "# Without scaling\n",
    "svm_no_scale = SVC(kernel='rbf', C=1, gamma='scale')\n",
    "svm_no_scale.fit(X_train_c, y_train_c)\n",
    "acc_no_scale = svm_no_scale.score(X_test_c, y_test_c)\n",
    "\n",
    "# With scaling (scaler fitted on the training set only)\n",
    "scaler = StandardScaler()\n",
    "X_train_c_scaled = scaler.fit_transform(X_train_c)\n",
    "X_test_c_scaled = scaler.transform(X_test_c)\n",
    "\n",
    "svm_scaled = SVC(kernel='rbf', C=1, gamma='scale')\n",
    "svm_scaled.fit(X_train_c_scaled, y_train_c)\n",
    "acc_scaled = svm_scaled.score(X_test_c_scaled, y_test_c)\n",
    "\n",
    "print(\"Effect of scaling:\")\n",
    "print(f\"  Without scaling: {acc_no_scale:.4f}\")\n",
    "print(f\"  With scaling: {acc_scaled:.4f}\")\n",
    "print(f\"  Improvement: {(acc_scaled - acc_no_scale) * 100:.2f} percentage points\")"
   ]
  },
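  {
   "cell_type": "markdown",
   "id": "cell-15-pipeline-md",
   "metadata": {},
   "source": [
    "A common way to make scaling part of the model is a scikit-learn `Pipeline`. During cross-validation the scaler is then refit on each training fold only, so no statistics leak from the held-out fold. A minimal sketch on the same Breast Cancer data, comparing 5-fold CV accuracy with and without scaling:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-15-pipeline-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.datasets import load_breast_cancer\n",
    "from sklearn.model_selection import cross_val_score\n",
    "from sklearn.pipeline import make_pipeline\n",
    "from sklearn.preprocessing import StandardScaler\n",
    "from sklearn.svm import SVC\n",
    "\n",
    "X_bc, y_bc = load_breast_cancer(return_X_y=True)\n",
    "\n",
    "candidates = {\n",
    "    'no scaling': SVC(kernel='rbf', C=1, gamma='scale'),\n",
    "    # The scaler is a pipeline step, so each CV fold fits its own scaler\n",
    "    'scaler + SVC': make_pipeline(StandardScaler(), SVC(kernel='rbf', C=1, gamma='scale')),\n",
    "}\n",
    "\n",
    "for name, model in candidates.items():\n",
    "    cv_scores = cross_val_score(model, X_bc, y_bc, cv=5)\n",
    "    print(f'{name:>12}: {cv_scores.mean():.4f} +/- {cv_scores.std():.4f}')"
   ]
  },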
  {
   "cell_type": "markdown",
   "id": "cell-16",
   "metadata": {},
   "source": [
    "## 7. Hyperparameter Tuning: Grid Search\n",
    "\n",
    "We tune C and gamma jointly (together with the kernel type) using cross-validation to find the best combination."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-17",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Parameter grid (searched jointly: 4 x 5 x 2 = 40 combinations)\n",
    "param_grid = {\n",
    "    'C': [0.1, 1, 10, 100],\n",
    "    'gamma': ['scale', 'auto', 0.01, 0.1, 1],\n",
    "    'kernel': ['rbf', 'poly']\n",
    "}\n",
    "\n",
    "# Grid search with 5-fold cross-validation on the scaled Iris training set\n",
    "grid_search = GridSearchCV(\n",
    "    SVC(random_state=42),\n",
    "    param_grid,\n",
    "    cv=5,\n",
    "    scoring='accuracy',\n",
    "    n_jobs=-1,\n",
    "    verbose=1\n",
    ")\n",
    "\n",
    "grid_search.fit(X_train_scaled, y_train)\n",
    "\n",
    "print(\"\\nGrid Search results:\")\n",
    "print(f\"  Best parameters: {grid_search.best_params_}\")\n",
    "print(f\"  Best CV score: {grid_search.best_score_:.4f}\")\n",
    "print(f\"  Test score: {grid_search.score(X_test_scaled, y_test):.4f}\")"
   ]
  },
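  {
   "cell_type": "markdown",
   "id": "cell-17-results-md",
   "metadata": {},
   "source": [
    "`GridSearchCV` records every combination it evaluated in `cv_results_`. Looking at the top-ranked rows, not just `best_params_`, shows whether several settings are practically tied. The sketch below uses a smaller, arbitrarily chosen grid. It wraps the scaler and SVC in a pipeline, so the parameter names take the `svc__` prefix."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-17-results-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "from sklearn.datasets import load_iris\n",
    "from sklearn.model_selection import GridSearchCV\n",
    "from sklearn.pipeline import make_pipeline\n",
    "from sklearn.preprocessing import StandardScaler\n",
    "from sklearn.svm import SVC\n",
    "\n",
    "X_ir, y_ir = load_iris(return_X_y=True)\n",
    "pipe = make_pipeline(StandardScaler(), SVC(kernel='rbf'))\n",
    "\n",
    "# make_pipeline names each step after its lowercased class name, hence 'svc__'\n",
    "grid = GridSearchCV(pipe, {'svc__C': [0.1, 1, 10, 100],\n",
    "                           'svc__gamma': [0.01, 0.1, 1]}, cv=5)\n",
    "grid.fit(X_ir, y_ir)\n",
    "\n",
    "results = pd.DataFrame(grid.cv_results_)\n",
    "cols = ['param_svc__C', 'param_svc__gamma', 'mean_test_score', 'std_test_score', 'rank_test_score']\n",
    "print(results[cols].sort_values('rank_test_score').head())"
   ]
  },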
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-18",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Visualize joint C/gamma tuning (RBF kernel, Breast Cancer data)\n",
    "C_range = np.logspace(-2, 2, 5)\n",
    "gamma_range = np.logspace(-3, 1, 5)\n",
    "\n",
    "# Score each combination with 5-fold CV on the training set (the test set is not used for tuning)\n",
    "scores = np.zeros((len(C_range), len(gamma_range)))\n",
    "\n",
    "for i, C in enumerate(C_range):\n",
    "    for j, gamma in enumerate(gamma_range):\n",
    "        rbf_clf = SVC(C=C, gamma=gamma, kernel='rbf')\n",
    "        scores[i, j] = cross_val_score(rbf_clf, X_train_c_scaled, y_train_c, cv=5).mean()\n",
    "\n",
    "# Heatmap\n",
    "plt.figure(figsize=(10, 8))\n",
    "plt.imshow(scores, interpolation='nearest', cmap='viridis')\n",
    "plt.xlabel('gamma')\n",
    "plt.ylabel('C')\n",
    "plt.colorbar(label='Mean CV accuracy')\n",
    "plt.xticks(np.arange(len(gamma_range)), [f'{g:.3f}' for g in gamma_range])\n",
    "plt.yticks(np.arange(len(C_range)), [f'{c:.2f}' for c in C_range])\n",
    "plt.title('SVM Hyperparameter Tuning (RBF Kernel)')\n",
    "\n",
    "# Mark the best combination\n",
    "best_i, best_j = np.unravel_index(scores.argmax(), scores.shape)\n",
    "plt.scatter(best_j, best_i, marker='*', s=300, c='red', edgecolors='white')\n",
    "\n",
    "plt.tight_layout()\n",
    "plt.show()\n",
    "\n",
    "print(f\"Best C: {C_range[best_i]:.2f}\")\n",
    "print(f\"Best gamma: {gamma_range[best_j]:.3f}\")\n",
    "print(f\"Best mean CV accuracy: {scores.max():.4f}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cell-19",
   "metadata": {},
   "source": [
    "## 8. SVR: Support Vector Regression\n",
    "\n",
    "SVR applies the SVM idea to regression.\n",
    "Errors inside the epsilon-tube (|y - f(x)| ≤ ε) are ignored, and only errors outside the tube are penalized. As a result, only points on or outside the tube become support vectors."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-20",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Load data\n",
    "diabetes = load_diabetes()\n",
    "X_train_d, X_test_d, y_train_d, y_test_d = train_test_split(\n",
    "    diabetes.data, diabetes.target, test_size=0.2, random_state=42\n",
    ")\n",
    "\n",
    "# Scale features\n",
    "scaler = StandardScaler()\n",
    "X_train_d_scaled = scaler.fit_transform(X_train_d)\n",
    "X_test_d_scaled = scaler.transform(X_test_d)\n",
    "\n",
    "# Train SVR\n",
    "svr = SVR(\n",
    "    kernel='rbf',\n",
    "    C=100,\n",
    "    epsilon=0.1,  # tube half-width in target units: residuals smaller than this are ignored\n",
    "    gamma='scale'\n",
    ")\n",
    "svr.fit(X_train_d_scaled, y_train_d)\n",
    "\n",
    "# Predict\n",
    "y_pred_d = svr.predict(X_test_d_scaled)\n",
    "\n",
    "print(\"SVR regression results:\")\n",
    "print(f\"  MSE: {mean_squared_error(y_test_d, y_pred_d):.4f}\")\n",
    "print(f\"  RMSE: {np.sqrt(mean_squared_error(y_test_d, y_pred_d)):.4f}\")\n",
    "print(f\"  R²: {r2_score(y_test_d, y_pred_d):.4f}\")\n",
    "print(f\"  Number of support vectors: {len(svr.support_vectors_)}\")"
   ]
  },
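  {
   "cell_type": "markdown",
   "id": "cell-20-epsilon-md",
   "metadata": {},
   "source": [
    "The epsilon-insensitive loss behind SVR is $L_\\epsilon(y, \\hat{y}) = \\max(0, |y - \\hat{y}| - \\epsilon)$. Because `epsilon` is measured in target units, `epsilon=0.1` is tiny for the diabetes target (values in the tens to hundreds). Almost every residual therefore falls outside the tube. The sketch below refits with larger, arbitrarily chosen epsilon values to show how the support-vector count responds. `eps_insensitive_loss` is a hypothetical helper written for this illustration."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-20-epsilon-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from sklearn.datasets import load_diabetes\n",
    "from sklearn.model_selection import train_test_split\n",
    "from sklearn.preprocessing import StandardScaler\n",
    "from sklearn.svm import SVR\n",
    "\n",
    "def eps_insensitive_loss(y_true, y_pred, eps):\n",
    "    # Hypothetical helper: zero inside the tube, linear outside it\n",
    "    return np.maximum(0, np.abs(y_true - y_pred) - eps)\n",
    "\n",
    "# Residuals of 0.05, 1 and 3 with eps = 0.1\n",
    "print(eps_insensitive_loss(np.array([10.0, 10.0, 10.0]), np.array([10.05, 11.0, 7.0]), eps=0.1))\n",
    "\n",
    "# Same split and scaling as above, under new variable names\n",
    "X_db, y_db = load_diabetes(return_X_y=True)\n",
    "X_db_tr, _, y_db_tr, _ = train_test_split(X_db, y_db, test_size=0.2, random_state=42)\n",
    "X_db_tr_s = StandardScaler().fit_transform(X_db_tr)\n",
    "\n",
    "# Wider tubes ignore more residuals, so fewer training points tend to become support vectors\n",
    "for eps in [0.1, 10, 30, 50]:\n",
    "    model_db = SVR(kernel='rbf', C=100, epsilon=eps, gamma='scale').fit(X_db_tr_s, y_db_tr)\n",
    "    print(f'epsilon={eps:>4}: {len(model_db.support_)} of {len(y_db_tr)} training points are support vectors')"
   ]
  },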
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-21",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Plot predicted vs. actual values (points on the red line are perfect predictions)\n",
    "plt.figure(figsize=(8, 6))\n",
    "plt.scatter(y_test_d, y_pred_d, alpha=0.7, edgecolors='black')\n",
    "plt.plot([y_test_d.min(), y_test_d.max()], [y_test_d.min(), y_test_d.max()], 'r--', lw=2)\n",
    "plt.xlabel('Actual')\n",
    "plt.ylabel('Predicted')\n",
    "plt.title(f'SVR Regression (R² = {r2_score(y_test_d, y_pred_d):.4f})')\n",
    "plt.grid(True, alpha=0.3)\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
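  {
   "cell_type": "markdown",
   "id": "cell-22",
   "metadata": {},
   "source": [
    "## 9. Multi-class Strategies\n",
    "\n",
    "SVM is inherently a binary classifier, so multi-class problems are decomposed with one of the following strategies:\n",
    "\n",
    "- **OvO (One-vs-One)**: k(k-1)/2 classifiers, one per pair of classes; `SVC` always trains this way internally\n",
    "- **OvR (One-vs-Rest)**: k classifiers, one per class against all others; the default for `LinearSVC`\n",
    "\n",
    "For `SVC`, `decision_function_shape` ('ovr' by default) only controls the shape of the `decision_function` output. The OvR scores are derived from the OvO classifiers, so `predict` returns the same labels either way."
   ]
  },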
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-23",
   "metadata": {},
   "outputs": [],
   "source": [
    "# decision_function_shape='ovo': raw pairwise (one-vs-one) scores\n",
    "svm_ovo = SVC(kernel='rbf', decision_function_shape='ovo')\n",
    "svm_ovo.fit(X_train_scaled, y_train)\n",
    "print(f\"OvO accuracy: {svm_ovo.score(X_test_scaled, y_test):.4f}\")\n",
    "\n",
    "# decision_function_shape='ovr' (the default): one aggregated score per class\n",
    "svm_ovr = SVC(kernel='rbf', decision_function_shape='ovr')\n",
    "svm_ovr.fit(X_train_scaled, y_train)\n",
    "print(f\"OvR accuracy: {svm_ovr.score(X_test_scaled, y_test):.4f}\")\n",
    "\n",
    "# LinearSVC trains k one-vs-rest classifiers by default\n",
    "linear_svc = LinearSVC(dual=True, max_iter=10000)\n",
    "linear_svc.fit(X_train_scaled, y_train)\n",
    "print(f\"LinearSVC accuracy: {linear_svc.score(X_test_scaled, y_test):.4f}\")"
   ]
  },
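  {
   "cell_type": "markdown",
   "id": "cell-23-shape-md",
   "metadata": {},
   "source": [
    "Because `SVC` always trains one-vs-one classifiers internally, `decision_function_shape` changes only the shape of the `decision_function` output, not the predictions. With three classes the two shapes coincide (k(k-1)/2 = k = 3), so the sketch below uses synthetic 4-class blob data instead:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-23-shape-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from sklearn.datasets import make_blobs\n",
    "from sklearn.svm import SVC\n",
    "\n",
    "# Synthetic 4-class data (an arbitrary illustration, not one of the lesson datasets)\n",
    "X4, y4 = make_blobs(n_samples=200, centers=4, random_state=0)\n",
    "\n",
    "svc_ovo = SVC(kernel='rbf', decision_function_shape='ovo').fit(X4, y4)\n",
    "svc_ovr = SVC(kernel='rbf', decision_function_shape='ovr').fit(X4, y4)\n",
    "\n",
    "print(svc_ovo.decision_function(X4).shape)  # (200, 6): one column per class pair, 4*3/2\n",
    "print(svc_ovr.decision_function(X4).shape)  # (200, 4): one column per class\n",
    "print(np.array_equal(svc_ovo.predict(X4), svc_ovr.predict(X4)))  # True: identical predictions"
   ]
  },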
  {
   "cell_type": "markdown",
   "id": "cell-24",
   "metadata": {},
   "source": [
    "## 10. Kernel Comparison in Practice\n",
    "\n",
    "We compare the performance of several kernels on the Breast Cancer dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-25",
   "metadata": {},
   "outputs": [],
   "source": [
    "kernels = ['linear', 'poly', 'rbf', 'sigmoid']\n",
    "\n",
    "print(\"Test accuracy by kernel (Breast Cancer):\")\n",
    "print(\"-\" * 50)\n",
    "\n",
    "for kernel in kernels:\n",
    "    if kernel == 'poly':\n",
    "        svm_model = SVC(kernel=kernel, degree=3, gamma='scale')\n",
    "    else:\n",
    "        svm_model = SVC(kernel=kernel, gamma='scale')\n",
    "\n",
    "    svm_model.fit(X_train_c_scaled, y_train_c)\n",
    "    acc = svm_model.score(X_test_c_scaled, y_test_c)\n",
    "    print(f\"  {kernel:8s}: {acc:.4f} (SVs: {len(svm_model.support_vectors_)})\")"
   ]
  },
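  {
   "cell_type": "markdown",
   "id": "cell-25-scaling-md",
   "metadata": {},
   "source": [
    "Kernel `SVC` training cost grows quickly with the number of samples, which is why linear solvers are the usual choice for large datasets. The cell below is a rough sketch that times three models on a synthetic dataset: `SVC` (RBF), `LinearSVC`, and `SGDClassifier(loss='hinge')` (stochastic gradient descent on the linear SVM hinge loss). The 10,000-sample size is an arbitrary choice, and timings and accuracies will vary by machine and data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-25-scaling-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import time\n",
    "from sklearn.datasets import make_classification\n",
    "from sklearn.linear_model import SGDClassifier\n",
    "from sklearn.model_selection import train_test_split\n",
    "from sklearn.pipeline import make_pipeline\n",
    "from sklearn.preprocessing import StandardScaler\n",
    "from sklearn.svm import SVC, LinearSVC\n",
    "\n",
    "# Synthetic binary data (size chosen arbitrarily for illustration)\n",
    "X_big, y_big = make_classification(n_samples=10000, n_features=20, random_state=0)\n",
    "X_big_tr, X_big_te, y_big_tr, y_big_te = train_test_split(X_big, y_big, test_size=0.2, random_state=0)\n",
    "\n",
    "estimators = {\n",
    "    'SVC (rbf)': SVC(kernel='rbf', gamma='scale'),\n",
    "    'LinearSVC': LinearSVC(dual=False),\n",
    "    'SGD (hinge)': SGDClassifier(loss='hinge', random_state=0),\n",
    "}\n",
    "\n",
    "for name, estimator in estimators.items():\n",
    "    pipe = make_pipeline(StandardScaler(), estimator)\n",
    "    start = time.perf_counter()\n",
    "    pipe.fit(X_big_tr, y_big_tr)\n",
    "    elapsed = time.perf_counter() - start\n",
    "    print(f'{name:12s} fit time {elapsed:6.2f}s, test accuracy {pipe.score(X_big_te, y_big_te):.4f}')"
   ]
  },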
  {
   "cell_type": "markdown",
   "id": "cell-26",
   "metadata": {},
   "source": [
    "## Summary\n",
    "\n",
    "### Key Concepts\n",
    "\n",
    "| Concept | Description |\n",
    "|------|------|\n",
    "| **Support vectors** | Training points that lie on or inside the margin (including misclassified points); they alone determine the decision boundary |\n",
    "| **Margin** | Distance between the decision boundary and the closest support vectors |\n",
    "| **C** | Regularization parameter, i.e. the penalty on margin violations (large: narrow margin, small: wide margin) |\n",
    "| **gamma** | Width parameter of the RBF kernel (large: narrow influence, small: wide influence) |\n",
    "| **Kernel** | Function that computes inner products in an implicit higher-dimensional feature space |\n",
    "\n",
    "### SVM Checklist\n",
    "\n",
    "1. ✅ **Scale features**: apply StandardScaler or MinMaxScaler, fitted on the training data only\n",
    "2. ✅ **Choose a kernel**: linearly separable → linear, nonlinear → rbf\n",
    "3. ✅ **Tune hyperparameters**: tune C and gamma with GridSearchCV\n",
    "4. ✅ **Large datasets**: use LinearSVC or SGDClassifier\n",
    "5. ✅ **Need probabilities**: set probability=True (adds training cost)\n",
    "\n",
    "### Pros and Cons\n",
    "\n",
    "**Pros**:\n",
    "- Effective on high-dimensional data\n",
    "- Memory-efficient at prediction time (only the support vectors are stored)\n",
    "- Handles nonlinear problems through a variety of kernels\n",
    "\n",
    "**Cons**:\n",
    "- Slow on large datasets (kernel SVM training scales roughly between O(n²) and O(n³) in the number of samples)\n",
    "- Requires feature scaling\n",
    "- Requires hyperparameter tuning\n",
    "\n",
    "### Next Steps\n",
    "- k-Nearest Neighbors (kNN)\n",
    "- Naive Bayes\n",
    "- Ensemble methods"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.9.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}