{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Hands-On Project: Titanic Survival Prediction (Kaggle Style)\n",
    "\n",
    "We work through a machine learning project end to end on a real dataset, approaching it the way a Kaggle competition would in order to build practical know-how.\n",
    "\n",
    "**Learning goals:**\n",
    "- Experience a complete ML workflow\n",
    "- Perform exploratory data analysis (EDA)\n",
    "- Apply feature engineering techniques\n",
    "- Compare and select among multiple models\n",
    "- Tune hyperparameters\n",
    "- Understand Kaggle competition strategy"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt\n",
    "import seaborn as sns\n",
    "\n",
    "from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV, KFold, StratifiedKFold\n",
    "from sklearn.preprocessing import StandardScaler, LabelEncoder\n",
    "from sklearn.impute import SimpleImputer\n",
    "from sklearn.metrics import (\n",
    "    accuracy_score, classification_report, confusion_matrix,\n",
    "    roc_auc_score, roc_curve\n",
    ")\n",
    "\n",
    "from sklearn.linear_model import LogisticRegression\n",
    "from sklearn.tree import DecisionTreeClassifier\n",
    "from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier\n",
    "from sklearn.svm import SVC\n",
    "\n",
    "import warnings\n",
    "warnings.filterwarnings('ignore')\n",
    "\n",
    "# Plotting settings\n",
    "plt.style.use('seaborn-v0_8-darkgrid')\n",
    "sns.set_palette('husl')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. Problem Definition\n",
    "\n",
    "**Goal**: Build a classification model that predicts whether a Titanic passenger survived.\n",
    "\n",
    "**Evaluation metric**: Accuracy\n",
    "\n",
    "**Data**: Passenger attributes such as age, sex, cabin class, and fare"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Loading and Exploring the Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Use seaborn's built-in Titanic dataset\n",
    "# On Kaggle you would download and use train.csv / test.csv instead\n",
    "df = sns.load_dataset('titanic')\n",
    "\n",
    "print(\"=== Basic dataset info ===\")\n",
    "print(f\"Shape: {df.shape}\")\n",
    "print(f\"\\nColumns:\")\n",
    "print(df.columns.tolist())\n",
    "print(f\"\\nDtypes:\")\n",
    "print(df.dtypes)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Inspect the first few rows\n",
    "print(\"First 5 rows:\")\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Descriptive statistics\n",
    "print(\"Descriptive statistics:\")\n",
    "df.describe()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Target variable distribution\n",
    "print(\"=== Survival distribution ===\")\n",
    "print(df['survived'].value_counts())\n",
    "print(f\"\\nSurvival ratio:\")\n",
    "print(df['survived'].value_counts(normalize=True))\n",
    "\n",
    "# Visualize\n",
    "fig, ax = plt.subplots(1, 2, figsize=(12, 4))\n",
    "\n",
    "df['survived'].value_counts().plot(kind='bar', ax=ax[0])\n",
    "ax[0].set_title('Survival Count')\n",
    "ax[0].set_xlabel('Survived (0=No, 1=Yes)')\n",
    "ax[0].set_ylabel('Count')\n",
    "\n",
    "df['survived'].value_counts(normalize=True).plot(kind='pie', autopct='%1.1f%%', ax=ax[1])\n",
    "ax[1].set_title('Survival Proportion')\n",
    "ax[1].set_ylabel('')\n",
    "\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. Exploratory Data Analysis (EDA)\n",
    "\n",
    "### 3.1 Missing-Value Analysis"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Check missing values\n",
    "print(\"=== Missing values ===\")\n",
    "missing = df.isnull().sum()\n",
    "missing_pct = (missing / len(df) * 100).round(2)\n",
    "missing_df = pd.DataFrame({\n",
    "    'missing_count': missing,\n",
    "    'missing_pct(%)': missing_pct\n",
    "})\n",
    "print(missing_df[missing_df['missing_count'] > 0].sort_values(by='missing_count', ascending=False))\n",
    "\n",
    "# Visualize\n",
    "plt.figure(figsize=(10, 6))\n",
    "missing_data = missing_df[missing_df['missing_count'] > 0].sort_values(by='missing_count', ascending=False)\n",
    "plt.barh(missing_data.index, missing_data['missing_pct(%)'])\n",
    "plt.xlabel('Missing Percentage (%)')\n",
    "plt.title('Missing Values by Feature')\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.2 Categorical Variables vs. Survival"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Relationship between key categorical variables and survival\n",
    "fig, axes = plt.subplots(2, 3, figsize=(15, 10))\n",
    "\n",
    "# Sex\n",
    "sns.countplot(data=df, x='sex', hue='survived', ax=axes[0, 0])\n",
    "axes[0, 0].set_title('Survival by Sex')\n",
    "\n",
    "# Passenger class\n",
    "sns.countplot(data=df, x='pclass', hue='survived', ax=axes[0, 1])\n",
    "axes[0, 1].set_title('Survival by Class')\n",
    "\n",
    "# Port of embarkation\n",
    "sns.countplot(data=df, x='embarked', hue='survived', ax=axes[0, 2])\n",
    "axes[0, 2].set_title('Survival by Embarked')\n",
    "\n",
    "# Number of siblings/spouses aboard\n",
    "sns.countplot(data=df, x='sibsp', hue='survived', ax=axes[1, 0])\n",
    "axes[1, 0].set_title('Survival by SibSp')\n",
    "\n",
    "# Number of parents/children aboard\n",
    "sns.countplot(data=df, x='parch', hue='survived', ax=axes[1, 1])\n",
    "axes[1, 1].set_title('Survival by Parch')\n",
    "\n",
    "# Traveling alone\n",
    "df['alone'] = ((df['sibsp'] + df['parch']) == 0).astype(int)\n",
    "sns.countplot(data=df, x='alone', hue='survived', ax=axes[1, 2])\n",
    "axes[1, 2].set_title('Survival by Alone')\n",
    "\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Survival-rate statistics\n",
    "print(\"=== Survival rate by category ===\")\n",
    "print(\"\\nSex:\")\n",
    "print(df.groupby('sex')['survived'].mean())\n",
    "print(\"\\nPclass:\")\n",
    "print(df.groupby('pclass')['survived'].mean())\n",
    "print(\"\\nEmbarked:\")\n",
    "print(df.groupby('embarked')['survived'].mean())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.3 Numeric Variable Analysis"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Age and fare distributions by survival\n",
    "fig, axes = plt.subplots(2, 2, figsize=(14, 10))\n",
    "\n",
    "# Age distribution\n",
    "for survived in [0, 1]:\n",
    "    axes[0, 0].hist(df[df['survived'] == survived]['age'].dropna(),\n",
    "                    bins=30, alpha=0.5, label=f'Survived={survived}')\n",
    "axes[0, 0].set_xlabel('Age')\n",
    "axes[0, 0].set_ylabel('Count')\n",
    "axes[0, 0].set_title('Age Distribution by Survival')\n",
    "axes[0, 0].legend()\n",
    "\n",
    "# Age boxplot\n",
    "sns.boxplot(data=df, x='survived', y='age', ax=axes[0, 1])\n",
    "axes[0, 1].set_title('Age by Survival')\n",
    "\n",
    "# Fare distribution (log scale)\n",
    "for survived in [0, 1]:\n",
    "    axes[1, 0].hist(np.log1p(df[df['survived'] == survived]['fare'].dropna()),\n",
    "                    bins=30, alpha=0.5, label=f'Survived={survived}')\n",
    "axes[1, 0].set_xlabel('Log(Fare + 1)')\n",
    "axes[1, 0].set_ylabel('Count')\n",
    "axes[1, 0].set_title('Fare Distribution by Survival (Log Scale)')\n",
    "axes[1, 0].legend()\n",
    "\n",
    "# Fare boxplot\n",
    "sns.boxplot(data=df, x='survived', y='fare', ax=axes[1, 1])\n",
    "axes[1, 1].set_title('Fare by Survival')\n",
    "axes[1, 1].set_ylim(0, 300)\n",
    "\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Correlation analysis\n",
    "print(\"=== Numeric feature correlations ===\")\n",
    "numeric_cols = df.select_dtypes(include=[np.number]).columns\n",
    "correlation = df[numeric_cols].corr()\n",
    "\n",
    "plt.figure(figsize=(10, 8))\n",
    "sns.heatmap(correlation, annot=True, fmt='.2f', cmap='coolwarm', center=0)\n",
    "plt.title('Correlation Matrix')\n",
    "plt.tight_layout()\n",
    "plt.show()\n",
    "\n",
    "print(\"\\nCorrelation with the target (survived):\")\n",
    "print(correlation['survived'].sort_values(ascending=False))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4. Preprocessing and Feature Engineering"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Work on a copy of the data\n",
    "df_clean = df.copy()\n",
    "\n",
    "print(\"=== Preprocessing start ===\")\n",
    "print(f\"Initial shape: {df_clean.shape}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.1 Dropping Unneeded Columns"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Drop redundant or unneeded columns\n",
    "drop_cols = ['deck', 'embark_town', 'alive', 'who', 'adult_male', 'class']\n",
    "df_clean = df_clean.drop(columns=drop_cols, errors='ignore')\n",
    "\n",
    "print(f\"After dropping columns: {df_clean.shape}\")\n",
    "print(f\"Remaining columns: {df_clean.columns.tolist()}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.2 Handling Missing Values"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Age: fill with the median\n",
    "age_median = df_clean['age'].median()\n",
    "df_clean['age'] = df_clean['age'].fillna(age_median)\n",
    "print(f\"Filled missing age with the median ({age_median})\")\n",
    "\n",
    "# Embarked: fill with the mode\n",
    "embarked_mode = df_clean['embarked'].mode()[0]\n",
    "df_clean['embarked'] = df_clean['embarked'].fillna(embarked_mode)\n",
    "print(f\"Filled missing embarked with the mode ({embarked_mode})\")\n",
    "\n",
    "# Fare: fill with the median\n",
    "fare_median = df_clean['fare'].median()\n",
    "df_clean['fare'] = df_clean['fare'].fillna(fare_median)\n",
    "\n",
    "print(f\"\\nMissing values remaining:\")\n",
    "print(df_clean.isnull().sum()[df_clean.isnull().sum() > 0])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.3 Feature Engineering\n",
    "\n",
    "We create new features using domain knowledge."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 1. Family size\n",
    "df_clean['family_size'] = df_clean['sibsp'] + df_clean['parch'] + 1\n",
    "print(\"Created family_size: sibsp + parch + 1\")\n",
    "\n",
    "# 2. Traveling alone\n",
    "df_clean['is_alone'] = (df_clean['family_size'] == 1).astype(int)\n",
    "print(\"Created is_alone\")\n",
    "\n",
    "# 3. Age group\n",
    "df_clean['age_group'] = pd.cut(df_clean['age'],\n",
    "                               bins=[0, 12, 18, 35, 60, 100],\n",
    "                               labels=['Child', 'Teen', 'Young', 'Middle', 'Senior'])\n",
    "print(\"Created age_group\")\n",
    "\n",
    "# 4. Fare bins (quartiles)\n",
    "df_clean['fare_bin'] = pd.qcut(df_clean['fare'], q=4, labels=['Low', 'Medium', 'High', 'Very High'])\n",
    "print(\"Created fare_bin\")\n",
    "\n",
    "# 5. Title extraction (optional; needs the 'name' column from the Kaggle CSV,\n",
    "# which the seaborn dataset does not include)\n",
    "# df_clean['title'] = df_clean['name'].str.extract(' ([A-Za-z]+)\\.', expand=False)\n",
    "\n",
    "print(f\"\\nShape after feature engineering: {df_clean.shape}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Check how the new features relate to survival\n",
    "fig, axes = plt.subplots(1, 3, figsize=(15, 4))\n",
    "\n",
    "sns.countplot(data=df_clean, x='family_size', hue='survived', ax=axes[0])\n",
    "axes[0].set_title('Survival by Family Size')\n",
    "\n",
    "sns.countplot(data=df_clean, x='age_group', hue='survived', ax=axes[1])\n",
    "axes[1].set_title('Survival by Age Group')\n",
    "axes[1].tick_params(axis='x', rotation=45)\n",
    "\n",
    "sns.countplot(data=df_clean, x='fare_bin', hue='survived', ax=axes[2])\n",
    "axes[2].set_title('Survival by Fare Bin')\n",
    "axes[2].tick_params(axis='x', rotation=45)\n",
    "\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.4 Encoding Categorical Variables"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Use LabelEncoder (note: keep one encoder per column if you later need to invert the mapping)\n",
    "le = LabelEncoder()\n",
    "\n",
    "df_clean['sex'] = le.fit_transform(df_clean['sex'])\n",
    "df_clean['embarked'] = le.fit_transform(df_clean['embarked'])\n",
    "df_clean['age_group'] = le.fit_transform(df_clean['age_group'])\n",
    "df_clean['fare_bin'] = le.fit_transform(df_clean['fare_bin'])\n",
    "\n",
    "print(\"Categorical encoding complete\")\n",
    "print(f\"\\nDtypes after encoding:\")\n",
    "print(df_clean.dtypes)"
   ]
  },
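  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As an aside: `LabelEncoder` imposes an arbitrary ordinal order, so nominal variables such as `embarked` are often one-hot encoded instead. A minimal self-contained sketch (the `df_demo` frame below is illustrative, not part of this notebook's pipeline):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# One-hot encoding with pandas; drop_first=True drops the redundant first category\n",
    "df_demo = pd.DataFrame({'embarked': ['S', 'C', 'Q', 'S']})\n",
    "dummies = pd.get_dummies(df_demo, columns=['embarked'], drop_first=True)\n",
    "print(dummies.columns.tolist())  # ['embarked_Q', 'embarked_S']\n",
    "print(dummies)"
   ]
  },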
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.5 Final Feature Selection"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Select the features used for modeling\n",
    "features = ['pclass', 'sex', 'age', 'sibsp', 'parch', 'fare',\n",
    "            'embarked', 'family_size', 'is_alone', 'age_group', 'fare_bin']\n",
    "\n",
    "X = df_clean[features]\n",
    "y = df_clean['survived']\n",
    "\n",
    "print(f\"Final features: {features}\")\n",
    "print(f\"X shape: {X.shape}\")\n",
    "print(f\"y distribution: {y.value_counts().to_dict()}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5. Modeling\n",
    "\n",
    "### 5.1 Train/Test Split"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Stratified train/test split\n",
    "X_train, X_test, y_train, y_test = train_test_split(\n",
    "    X, y, test_size=0.2, random_state=42, stratify=y\n",
    ")\n",
    "\n",
    "print(f\"Train data: {X_train.shape}\")\n",
    "print(f\"Test data: {X_test.shape}\")\n",
    "print(f\"\\nTrain target distribution: {y_train.value_counts().to_dict()}\")\n",
    "print(f\"Test target distribution: {y_test.value_counts().to_dict()}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Scaling (for the linear models)\n",
    "scaler = StandardScaler()\n",
    "X_train_scaled = scaler.fit_transform(X_train)\n",
    "X_test_scaled = scaler.transform(X_test)\n",
    "\n",
    "print(\"Scaling complete\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 5.2 Baseline Model\n",
    "\n",
    "A trivial model sets the reference point."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Baseline: always predict the majority class\n",
    "baseline_pred = np.zeros(len(y_test))  # predict 0 (did not survive) for everyone\n",
    "baseline_acc = accuracy_score(y_test, baseline_pred)\n",
    "\n",
    "print(f\"Baseline accuracy (always predict death): {baseline_acc:.4f}\")\n",
    "print(\"\\nWe aim to beat this score.\")"
   ]
  },
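  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The same majority-class baseline is available off the shelf as scikit-learn's `DummyClassifier`. A self-contained sketch on toy data (`X_demo`/`y_demo` are illustrative names, not notebook state):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from sklearn.dummy import DummyClassifier\n",
    "\n",
    "# Toy data: 6 negatives, 4 positives, so the majority class is 0\n",
    "X_demo = np.zeros((10, 1))\n",
    "y_demo = np.array([0] * 6 + [1] * 4)\n",
    "\n",
    "# strategy='most_frequent' always predicts the majority class\n",
    "dummy = DummyClassifier(strategy='most_frequent')\n",
    "dummy.fit(X_demo, y_demo)\n",
    "print(dummy.score(X_demo, y_demo))  # 0.6"
   ]
  },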
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 5.3 Comparing Multiple Models"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Define a set of candidate models\n",
    "models = {\n",
    "    'Logistic Regression': LogisticRegression(max_iter=1000, random_state=42),\n",
    "    'Decision Tree': DecisionTreeClassifier(random_state=42),\n",
    "    'Random Forest': RandomForestClassifier(n_estimators=100, random_state=42),\n",
    "    'Gradient Boosting': GradientBoostingClassifier(random_state=42),\n",
    "    'SVM': SVC(random_state=42)\n",
    "}\n",
    "\n",
    "# Compare models\n",
    "print(\"=== Model comparison (5-fold cross-validation) ===\")\n",
    "results = []\n",
    "\n",
    "for name, model in models.items():\n",
    "    # Linear models use the scaled data\n",
    "    if name in ['Logistic Regression', 'SVM']:\n",
    "        X_tr, X_te = X_train_scaled, X_test_scaled\n",
    "    else:\n",
    "        X_tr, X_te = X_train, X_test\n",
    "    \n",
    "    # Cross-validation\n",
    "    cv_scores = cross_val_score(model, X_tr, y_train, cv=5, scoring='accuracy')\n",
    "    \n",
    "    # Fit and evaluate on the held-out test set\n",
    "    model.fit(X_tr, y_train)\n",
    "    test_score = model.score(X_te, y_test)\n",
    "    \n",
    "    results.append({\n",
    "        'Model': name,\n",
    "        'CV Mean': cv_scores.mean(),\n",
    "        'CV Std': cv_scores.std(),\n",
    "        'Test Score': test_score\n",
    "    })\n",
    "    \n",
    "    print(f\"{name}:\")\n",
    "    print(f\"  CV = {cv_scores.mean():.4f} (+/- {cv_scores.std():.4f})\")\n",
    "    print(f\"  Test = {test_score:.4f}\")\n",
    "    print()\n",
    "\n",
    "results_df = pd.DataFrame(results)\n",
    "results_df = results_df.sort_values(by='CV Mean', ascending=False)\n",
    "print(\"\\nModel ranking:\")\n",
    "print(results_df)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Visualize the results\n",
    "fig, axes = plt.subplots(1, 2, figsize=(14, 5))\n",
    "\n",
    "# CV scores\n",
    "axes[0].barh(results_df['Model'], results_df['CV Mean'])\n",
    "axes[0].set_xlabel('CV Accuracy')\n",
    "axes[0].set_title('Cross-Validation Scores')\n",
    "axes[0].set_xlim(0.7, 0.9)\n",
    "\n",
    "# Test scores\n",
    "axes[1].barh(results_df['Model'], results_df['Test Score'])\n",
    "axes[1].set_xlabel('Test Accuracy')\n",
    "axes[1].set_title('Test Scores')\n",
    "axes[1].set_xlim(0.7, 0.9)\n",
    "\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 5.4 Hyperparameter Tuning\n",
    "\n",
    "We tune the hyperparameters of the best-performing model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Tune the Random Forest\n",
    "rf_param_grid = {\n",
    "    'n_estimators': [100, 200, 300],\n",
    "    'max_depth': [5, 10, 15, None],\n",
    "    'min_samples_split': [2, 5, 10],\n",
    "    'min_samples_leaf': [1, 2, 4],\n",
    "    'max_features': ['sqrt', 'log2']\n",
    "}\n",
    "\n",
    "rf = RandomForestClassifier(random_state=42)\n",
    "grid_search = GridSearchCV(\n",
    "    rf, rf_param_grid,\n",
    "    cv=5,\n",
    "    scoring='accuracy',\n",
    "    n_jobs=-1,\n",
    "    verbose=1\n",
    ")\n",
    "\n",
    "print(\"Starting grid search...\")\n",
    "grid_search.fit(X_train, y_train)\n",
    "\n",
    "print(\"\\n=== Hyperparameter tuning results ===\")\n",
    "print(f\"Best params: {grid_search.best_params_}\")\n",
    "print(f\"Best CV score: {grid_search.best_score_:.4f}\")\n",
    "print(f\"Test score: {grid_search.score(X_test, y_test):.4f}\")\n",
    "\n",
    "best_model = grid_search.best_estimator_"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 6. Model Evaluation\n",
    "\n",
    "### 6.1 Classification Metrics"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Predict\n",
    "y_pred = best_model.predict(X_test)\n",
    "y_pred_proba = best_model.predict_proba(X_test)[:, 1]\n",
    "\n",
    "# Classification report\n",
    "print(\"=== Classification report ===\")\n",
    "print(classification_report(y_test, y_pred, target_names=['Not Survived', 'Survived']))\n",
    "\n",
    "# ROC AUC\n",
    "roc_auc = roc_auc_score(y_test, y_pred_proba)\n",
    "print(f\"\\nROC AUC Score: {roc_auc:.4f}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 6.2 Confusion Matrix"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Visualize the confusion matrix and ROC curve\n",
    "fig, axes = plt.subplots(1, 2, figsize=(14, 5))\n",
    "\n",
    "# Confusion matrix\n",
    "cm = confusion_matrix(y_test, y_pred)\n",
    "sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',\n",
    "            xticklabels=['Not Survived', 'Survived'],\n",
    "            yticklabels=['Not Survived', 'Survived'],\n",
    "            ax=axes[0])\n",
    "axes[0].set_xlabel('Predicted')\n",
    "axes[0].set_ylabel('Actual')\n",
    "axes[0].set_title('Confusion Matrix')\n",
    "\n",
    "# ROC curve\n",
    "fpr, tpr, thresholds = roc_curve(y_test, y_pred_proba)\n",
    "axes[1].plot(fpr, tpr, label=f'ROC Curve (AUC = {roc_auc:.4f})')\n",
    "axes[1].plot([0, 1], [0, 1], 'k--', label='Random')\n",
    "axes[1].set_xlabel('False Positive Rate')\n",
    "axes[1].set_ylabel('True Positive Rate')\n",
    "axes[1].set_title('ROC Curve')\n",
    "axes[1].legend()\n",
    "axes[1].grid(True)\n",
    "\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 6.3 Feature Importance"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Feature importances from the tuned model\n",
    "importances = best_model.feature_importances_\n",
    "indices = np.argsort(importances)[::-1]\n",
    "\n",
    "plt.figure(figsize=(12, 6))\n",
    "plt.bar(range(len(importances)), importances[indices])\n",
    "plt.xticks(range(len(importances)), [features[i] for i in indices], rotation=45)\n",
    "plt.xlabel('Feature')\n",
    "plt.ylabel('Importance')\n",
    "plt.title('Feature Importance')\n",
    "plt.tight_layout()\n",
    "plt.show()\n",
    "\n",
    "print(\"\\nFeature importance ranking:\")\n",
    "for i in indices:\n",
    "    print(f\"  {features[i]:15s}: {importances[i]:.4f}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 6.4 Error Analysis"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Analyze the misclassified cases\n",
    "X_test_df = X_test.copy()\n",
    "X_test_df['actual'] = y_test.values\n",
    "X_test_df['predicted'] = y_pred\n",
    "X_test_df['correct'] = X_test_df['actual'] == X_test_df['predicted']\n",
    "\n",
    "print(\"=== Prediction results ===\")\n",
    "print(f\"Correct predictions: {X_test_df['correct'].sum()} / {len(X_test_df)}\")\n",
    "print(f\"Incorrect predictions: {(~X_test_df['correct']).sum()} / {len(X_test_df)}\")\n",
    "\n",
    "# False positives and false negatives\n",
    "fp = X_test_df[(X_test_df['actual'] == 0) & (X_test_df['predicted'] == 1)]\n",
    "fn = X_test_df[(X_test_df['actual'] == 1) & (X_test_df['predicted'] == 0)]\n",
    "\n",
    "print(f\"\\nFalse positives (actually died, predicted survived): {len(fp)}\")\n",
    "print(f\"False negatives (actually survived, predicted died): {len(fn)}\")\n",
    "\n",
    "print(\"\\nFalse-negative samples (first 5):\")\n",
    "print(fn.head())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 7. Kaggle Competition Strategy\n",
    "\n",
    "### 7.1 Ensembling"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Combine predictions from several models\n",
    "def simple_blend(models, X_train, y_train, X_test, weights=None):\n",
    "    \"\"\"Simple blending ensemble: weighted average of predicted probabilities.\"\"\"\n",
    "    if weights is None:\n",
    "        weights = [1 / len(models)] * len(models)\n",
    "    \n",
    "    predictions = np.zeros(len(X_test))\n",
    "    \n",
    "    for model, weight in zip(models, weights):\n",
    "        model.fit(X_train, y_train)\n",
    "        pred_proba = model.predict_proba(X_test)[:, 1]\n",
    "        predictions += weight * pred_proba\n",
    "    \n",
    "    return (predictions > 0.5).astype(int)\n",
    "\n",
    "\n",
    "# Ensemble members\n",
    "ensemble_models = [\n",
    "    RandomForestClassifier(n_estimators=200, random_state=42),\n",
    "    GradientBoostingClassifier(n_estimators=100, random_state=42),\n",
    "    LogisticRegression(max_iter=1000, random_state=42)\n",
    "]\n",
    "\n",
    "# LogisticRegression would need the scaled features, so blend only the two tree-based models here\n",
    "y_pred_ensemble = simple_blend(\n",
    "    [ensemble_models[0], ensemble_models[1]],\n",
    "    X_train, y_train, X_test\n",
    ")\n",
    "\n",
    "# Evaluate\n",
    "ensemble_acc = accuracy_score(y_test, y_pred_ensemble)\n",
    "print(f\"Ensemble accuracy: {ensemble_acc:.4f}\")\n",
    "print(f\"Best single-model accuracy: {best_model.score(X_test, y_test):.4f}\")\n",
    "print(f\"Improvement: {(ensemble_acc - best_model.score(X_test, y_test)):.4f}\")"
   ]
  },
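  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The same idea is available off the shelf as scikit-learn's `VotingClassifier` with `voting='soft'` (probability averaging); wrapping the linear model in a pipeline with its own scaler lets all members share the raw features. A self-contained sketch on synthetic data (names like `X_demo` are illustrative):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.datasets import make_classification\n",
    "from sklearn.ensemble import RandomForestClassifier, VotingClassifier\n",
    "from sklearn.linear_model import LogisticRegression\n",
    "from sklearn.pipeline import make_pipeline\n",
    "from sklearn.preprocessing import StandardScaler\n",
    "\n",
    "X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=42)\n",
    "\n",
    "# Soft voting averages predict_proba across the members\n",
    "voter = VotingClassifier(\n",
    "    estimators=[\n",
    "        ('rf', RandomForestClassifier(n_estimators=100, random_state=42)),\n",
    "        ('lr', make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),\n",
    "    ],\n",
    "    voting='soft',\n",
    ")\n",
    "voter.fit(X_demo, y_demo)\n",
    "print(f\"train accuracy: {voter.score(X_demo, y_demo):.3f}\")"
   ]
  },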
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 7.2 Kaggle Submission File Format"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Build a Kaggle-style submission (on Kaggle, predict on test.csv)\n",
    "# Here we use our held-out test split as a stand-in\n",
    "\n",
    "submission = pd.DataFrame({\n",
    "    'PassengerId': range(1, len(y_pred) + 1),  # on Kaggle, use the PassengerId column from test.csv\n",
    "    'Survived': y_pred\n",
    "})\n",
    "\n",
    "print(\"Submission file format:\")\n",
    "print(submission.head(10))\n",
    "\n",
    "# Save as CSV\n",
    "# submission.to_csv('titanic_submission.csv', index=False)\n",
    "# print(\"\\nSaved titanic_submission.csv\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 8. Practical Kaggle Tips\n",
    "\n",
    "### 8.1 Competition Checklist\n",
    "\n",
    "**1. Start fast**\n",
    "- Run a baseline and make a first submission\n",
    "- Check your position on the leaderboard\n",
    "\n",
    "**2. Focus on EDA**\n",
    "- Understanding the data is the core skill\n",
    "- Identify missing values, outliers, and distributions\n",
    "- Analyze relationships with the target\n",
    "\n",
    "**3. Feature engineering**\n",
    "- Use domain knowledge\n",
    "- Create interaction features (e.g., family_size)\n",
    "- Add group statistics (e.g., per-group means)\n",
    "\n",
    "**4. Try diverse models**\n",
    "- Linear models → tree-based → ensembles\n",
    "- Tune hyperparameters\n",
    "\n",
    "**5. Ensembling**\n",
    "- Combine predictions from different models\n",
    "- Blending, stacking\n",
    "\n",
    "**6. Validation strategy**\n",
    "- Check that your local CV tracks the leaderboard score\n",
    "- Beware of overfitting (don't fit to the public LB)\n",
    "\n",
    "### 8.2 Cross-Validation Strategy"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def cross_validate_model(model, X, y, n_splits=5, stratified=True):\n",
    "    \"\"\"\n",
    "    Run K-fold cross-validation.\n",
    "    \n",
    "    Parameters:\n",
    "    -----------\n",
    "    model : sklearn estimator\n",
    "    X : features\n",
    "    y : target\n",
    "    n_splits : number of folds\n",
    "    stratified : whether to stratify the folds\n",
    "    \"\"\"\n",
    "    if stratified:\n",
    "        kf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=42)\n",
    "    else:\n",
    "        kf = KFold(n_splits=n_splits, shuffle=True, random_state=42)\n",
    "    \n",
    "    scores = []\n",
    "    \n",
    "    for fold, (train_idx, val_idx) in enumerate(kf.split(X, y)):\n",
    "        X_train_fold = X.iloc[train_idx]\n",
    "        X_val_fold = X.iloc[val_idx]\n",
    "        y_train_fold = y.iloc[train_idx]\n",
    "        y_val_fold = y.iloc[val_idx]\n",
    "        \n",
    "        model.fit(X_train_fold, y_train_fold)\n",
    "        score = model.score(X_val_fold, y_val_fold)\n",
    "        scores.append(score)\n",
    "        \n",
    "        print(f\"Fold {fold+1}: {score:.4f}\")\n",
    "    \n",
    "    print(f\"\\nMean: {np.mean(scores):.4f} (+/- {np.std(scores):.4f})\")\n",
    "    return np.mean(scores)\n",
    "\n",
    "\n",
    "# Usage example\n",
    "print(\"=== Random Forest cross-validation ===\")\n",
    "cv_score = cross_validate_model(\n",
    "    RandomForestClassifier(n_estimators=100, random_state=42),\n",
    "    X, y, n_splits=5\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Summary\n",
    "\n",
    "### Project Workflow\n",
    "\n",
    "1. **Problem definition**: set the goal and evaluation metric\n",
    "2. **Data exploration**: understand the data through EDA\n",
    "3. **Preprocessing**: handle missing values, encode categoricals\n",
    "4. **Feature engineering**: apply domain knowledge\n",
    "5. **Modeling**: compare multiple models\n",
    "6. **Tuning**: optimize hyperparameters\n",
    "7. **Evaluation**: assess performance with multiple metrics\n",
    "8. **Ensembling**: combine several models\n",
    "\n",
    "### Key Takeaways\n",
    "\n",
    "- **EDA matters most**: you cannot build a good model without understanding the data\n",
    "- **Feature engineering**: the main lever for improving model performance\n",
    "- **Cross-validation**: guards against overfitting and checks generalization\n",
    "- **Ensembling**: combining diverse models improves performance\n",
    "- **Iterate**: no model is perfect on the first try; keep improving"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}