07_visualization_matplotlib.ipynb

  1{
  2 "cells": [
  3  {
  4   "cell_type": "markdown",
  5   "metadata": {},
  6   "source": [
  7    "# Data Visualization with Matplotlib and Seaborn\n",
  8    "\n",
  9    "This notebook walks through common chart types in Matplotlib and Seaborn, using synthetic sales data generated in Section 1.\n",
 10    "\n",
 11    "## Topics Covered:\n",
 12    "- Line plots and time series\n",
 13    "- Bar charts (single and grouped)\n",
 14    "- Scatter plots\n",
 15    "- Histograms and distributions\n",
 16    "- Box plots and violin plots\n",
 17    "- Heatmaps (correlation matrices)\n",
 18    "- Pie charts\n",
       "- Seaborn pair plots and joint plots\n",
 19    "- Subplots and figure composition\n",
 20    "- Customization: titles, labels, legends, colors, styles"
 21   ]
 22  },
 23  {
 24   "cell_type": "code",
 25   "execution_count": null,
 26   "metadata": {},
 27   "outputs": [],
 28   "source": [
 29    "import numpy as np\n",
 30    "import pandas as pd\n",
 31    "import matplotlib.pyplot as plt\n",
 32    "import seaborn as sns\n",
 33    "\n",
 34    "# Enable inline plotting\n",
 35    "%matplotlib inline\n",
 36    "\n",
 37    "# Set style\n",
 38    "plt.style.use('seaborn-v0_8-darkgrid')\n",
 39    "sns.set_palette('husl')\n",
 40    "\n",
 41    "# Set random seed\n",
 42    "np.random.seed(42)\n",
 43    "\n",
 44    "# Figure size default\n",
 45    "plt.rcParams['figure.figsize'] = (10, 6)\n",
 46    "plt.rcParams['font.size'] = 10"
 47   ]
 48  },
 49  {
 50   "cell_type": "markdown",
 51   "metadata": {},
 52   "source": [
 53    "## 1. Generate Sample Data\n",
 54    "\n",
 55    "We'll create multiple datasets for demonstrating different visualization techniques."
 56   ]
 57  },
 58  {
 59   "cell_type": "code",
 60   "execution_count": null,
 61   "metadata": {},
 62   "outputs": [],
 63   "source": [
 64    "# Time series data\n",
 65    "dates = pd.date_range(start='2023-01-01', end='2023-12-31', freq='D')\n",
 66    "n_days = len(dates)\n",
 67    "\n",
 68    "# Sales data with trend and seasonality\n",
 69    "trend = np.linspace(100, 150, n_days)\n",
 70    "seasonality = 20 * np.sin(2 * np.pi * np.arange(n_days) / 365)\n",
 71    "noise = np.random.normal(0, 5, n_days)\n",
 72    "sales = trend + seasonality + noise\n",
 73    "\n",
 74    "ts_df = pd.DataFrame({\n",
 75    "    'date': dates,\n",
 76    "    'sales': sales\n",
 77    "})\n",
 78    "\n",
 79    "# Categorical data\n",
 80    "categories = ['Electronics', 'Clothing', 'Food', 'Books', 'Home & Garden']\n",
 81    "category_sales = [45000, 32000, 28000, 18000, 25000]\n",
 82    "category_counts = [450, 820, 950, 380, 520]\n",
 83    "\n",
 84    "category_df = pd.DataFrame({\n",
 85    "    'category': categories,\n",
 86    "    'sales': category_sales,\n",
 87    "    'transactions': category_counts\n",
 88    "})\n",
 89    "\n",
 90    "# Scatter plot data (with correlation)\n",
 91    "n_samples = 200\n",
 92    "advertising_spend = np.random.uniform(1000, 10000, n_samples)\n",
 93    "sales_revenue = 2.5 * advertising_spend + np.random.normal(0, 3000, n_samples)\n",
 94    "\n",
 95    "scatter_df = pd.DataFrame({\n",
 96    "    'advertising': advertising_spend,\n",
 97    "    'revenue': sales_revenue\n",
 98    "})\n",
 99    "\n",
100    "# Multi-variable data for correlation\n",
101    "n_obs = 300\n",
102    "corr_df = pd.DataFrame({\n",
103    "    'price': np.random.uniform(10, 100, n_obs),\n",
104    "    'advertising': np.random.uniform(500, 5000, n_obs),\n",
105    "    'competitor_price': np.random.uniform(15, 95, n_obs),\n",
106    "    'season': np.random.choice(['Winter', 'Spring', 'Summer', 'Fall'], n_obs)\n",
107    "})\n",
108    "corr_df['sales'] = (50 - 0.3 * corr_df['price'] + 0.002 * corr_df['advertising'] + \n",
109    "                    0.2 * corr_df['competitor_price'] + np.random.normal(0, 5, n_obs))\n",
110    "\n",
111    "print(\"Sample data generated successfully!\")\n",
112    "print(f\"Time series data: {len(ts_df)} rows\")\n",
113    "print(f\"Category data: {len(category_df)} rows\")\n",
114    "print(f\"Scatter data: {len(scatter_df)} rows\")\n",
115    "print(f\"Correlation data: {len(corr_df)} rows\")"
116   ]
117  },
118  {
119   "cell_type": "markdown",
120   "metadata": {},
121   "source": [
122    "## 2. Line Plots\n",
123    "\n",
124    "Line plots are ideal for visualizing trends over time or continuous data."
125   ]
126  },
127  {
128   "cell_type": "code",
129   "execution_count": null,
130   "metadata": {},
131   "outputs": [],
132   "source": [
133    "# Basic line plot\n",
134    "plt.figure(figsize=(12, 6))\n",
135    "plt.plot(ts_df['date'], ts_df['sales'], linewidth=2, color='steelblue', alpha=0.8)\n",
136    "plt.title('Daily Sales Over Time', fontsize=16, fontweight='bold')\n",
137    "plt.xlabel('Date', fontsize=12)\n",
138    "plt.ylabel('Sales ($)', fontsize=12)\n",
139    "plt.grid(True, alpha=0.3)\n",
140    "plt.xticks(rotation=45)\n",
141    "plt.tight_layout()\n",
142    "plt.show()"
143   ]
144  },
145  {
146   "cell_type": "code",
147   "execution_count": null,
148   "metadata": {},
149   "outputs": [],
150   "source": [
151    "# Multiple line plots with legend\n",
152    "# Calculate rolling averages\n",
153    "ts_df['sales_7d_ma'] = ts_df['sales'].rolling(window=7).mean()\n",
154    "ts_df['sales_30d_ma'] = ts_df['sales'].rolling(window=30).mean()\n",
155    "\n",
156    "plt.figure(figsize=(12, 6))\n",
157    "plt.plot(ts_df['date'], ts_df['sales'], label='Daily Sales', alpha=0.5, linewidth=1)\n",
158    "plt.plot(ts_df['date'], ts_df['sales_7d_ma'], label='7-Day Moving Average', linewidth=2, color='orange')\n",
159    "plt.plot(ts_df['date'], ts_df['sales_30d_ma'], label='30-Day Moving Average', linewidth=2, color='red')\n",
160    "plt.title('Sales with Moving Averages', fontsize=16, fontweight='bold')\n",
161    "plt.xlabel('Date', fontsize=12)\n",
162    "plt.ylabel('Sales ($)', fontsize=12)\n",
163    "plt.legend(loc='upper left', fontsize=10)\n",
164    "plt.grid(True, alpha=0.3)\n",
165    "plt.xticks(rotation=45)\n",
166    "plt.tight_layout()\n",
167    "plt.show()"
168   ]
169  },
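     {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
       "The line plots above leave tick placement to Matplotlib's defaults and rotate the labels. As a minimal sketch of an alternative, the next cell places one tick per month using `matplotlib.dates`. The names `demo_dates` and `demo_sales` and their values are illustrative assumptions, not part of the notebook's dataset."
      ]
     },
     {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
       "import numpy as np\n",
       "import pandas as pd\n",
       "import matplotlib.pyplot as plt\n",
       "import matplotlib.dates as mdates\n",
       "\n",
       "# Hypothetical demo series: one value per day for 2023\n",
       "demo_dates = pd.date_range('2023-01-01', '2023-12-31', freq='D')\n",
       "demo_sales = np.linspace(100, 150, len(demo_dates))\n",
       "\n",
       "fig, ax = plt.subplots(figsize=(12, 6))\n",
       "ax.plot(demo_dates, demo_sales, color='steelblue')\n",
       "\n",
       "# One major tick at the start of each month, labelled 'Jan', 'Feb', ...\n",
       "ax.xaxis.set_major_locator(mdates.MonthLocator())\n",
       "ax.xaxis.set_major_formatter(mdates.DateFormatter('%b'))\n",
       "ax.set_xlabel('Month (2023)')\n",
       "ax.set_ylabel('Sales ($)')\n",
       "plt.tight_layout()\n",
       "plt.show()"
      ]
     },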
170  {
171   "cell_type": "markdown",
172   "metadata": {},
173   "source": [
174    "## 3. Bar Charts\n",
175    "\n",
176    "Bar charts are excellent for comparing categorical data."
177   ]
178  },
179  {
180   "cell_type": "code",
181   "execution_count": null,
182   "metadata": {},
183   "outputs": [],
184   "source": [
185    "# Vertical bar chart\n",
186    "plt.figure(figsize=(10, 6))\n",
187    "bars = plt.bar(category_df['category'], category_df['sales'], color='skyblue', edgecolor='navy', linewidth=1.5)\n",
188    "\n",
189    "# Add value labels on bars\n",
190    "for bar in bars:\n",
191    "    height = bar.get_height()\n",
192    "    plt.text(bar.get_x() + bar.get_width()/2., height,\n",
193    "             f'${height:,.0f}',\n",
194    "             ha='center', va='bottom', fontsize=10, fontweight='bold')\n",
195    "\n",
196    "plt.title('Sales by Category', fontsize=16, fontweight='bold')\n",
197    "plt.xlabel('Category', fontsize=12)\n",
198    "plt.ylabel('Total Sales ($)', fontsize=12)\n",
199    "plt.xticks(rotation=45, ha='right')\n",
200    "plt.grid(axis='y', alpha=0.3)\n",
201    "plt.tight_layout()\n",
202    "plt.show()"
203   ]
204  },
205  {
206   "cell_type": "code",
207   "execution_count": null,
208   "metadata": {},
209   "outputs": [],
210   "source": [
211    "# Horizontal bar chart\n",
212    "plt.figure(figsize=(10, 6))\n",
213    "plt.barh(category_df['category'], category_df['transactions'], color='coral', edgecolor='darkred', linewidth=1.5)\n",
214    "plt.title('Number of Transactions by Category', fontsize=16, fontweight='bold')\n",
215    "plt.xlabel('Number of Transactions', fontsize=12)\n",
216    "plt.ylabel('Category', fontsize=12)\n",
217    "plt.grid(axis='x', alpha=0.3)\n",
218    "plt.tight_layout()\n",
219    "plt.show()"
220   ]
221  },
222  {
223   "cell_type": "code",
224   "execution_count": null,
225   "metadata": {},
226   "outputs": [],
227   "source": [
228    "# Grouped bar chart\n",
229    "x = np.arange(len(categories))\n",
230    "width = 0.35\n",
231    "\n",
232    "fig, ax = plt.subplots(figsize=(12, 6))\n",
233    "bars1 = ax.bar(x - width/2, category_df['sales']/1000, width, label='Sales ($K)', color='steelblue')\n",
234    "bars2 = ax.bar(x + width/2, category_df['transactions'], width, label='Transactions', color='orange')\n",
235    "\n",
236    "ax.set_title('Sales and Transactions by Category', fontsize=16, fontweight='bold')\n",
237    "ax.set_xlabel('Category', fontsize=12)\n",
 238    "ax.set_ylabel('Sales ($K) / Transaction count', fontsize=12)\n",
239    "ax.set_xticks(x)\n",
240    "ax.set_xticklabels(categories, rotation=45, ha='right')\n",
241    "ax.legend(fontsize=10)\n",
242    "ax.grid(axis='y', alpha=0.3)\n",
243    "plt.tight_layout()\n",
244    "plt.show()"
245   ]
246  },
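     {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
       "The grouped bar chart above draws sales (in thousands of dollars) and transaction counts against one shared y-axis, even though the two series have different units. One possible alternative, sketched in the next cell, gives each series its own y-axis with `Axes.twinx()`. The `demo_*` names are illustrative; their values copy the category data from Section 1."
      ]
     },
     {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
       "import numpy as np\n",
       "import matplotlib.pyplot as plt\n",
       "\n",
       "demo_categories = ['Electronics', 'Clothing', 'Food', 'Books', 'Home & Garden']\n",
       "demo_sales_k = [45, 32, 28, 18, 25]  # sales in $K\n",
       "demo_transactions = [450, 820, 950, 380, 520]\n",
       "\n",
       "x = np.arange(len(demo_categories))\n",
       "width = 0.35\n",
       "\n",
       "fig, ax_sales = plt.subplots(figsize=(12, 6))\n",
       "ax_tx = ax_sales.twinx()  # second y-axis sharing the same x-axis\n",
       "\n",
       "bars_sales = ax_sales.bar(x - width/2, demo_sales_k, width, color='steelblue', label='Sales ($K)')\n",
       "bars_tx = ax_tx.bar(x + width/2, demo_transactions, width, color='orange', label='Transactions')\n",
       "\n",
       "ax_sales.set_ylabel('Sales ($K)')\n",
       "ax_tx.set_ylabel('Transactions')\n",
       "ax_sales.set_xticks(x)\n",
       "ax_sales.set_xticklabels(demo_categories, rotation=45, ha='right')\n",
       "ax_tx.grid(False)  # avoid a second, misaligned set of grid lines\n",
       "\n",
       "# Combine both bar containers into a single legend\n",
       "ax_sales.legend(handles=[bars_sales, bars_tx], loc='upper right')\n",
       "plt.tight_layout()\n",
       "plt.show()"
      ]
     },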
247  {
248   "cell_type": "markdown",
249   "metadata": {},
250   "source": [
251    "## 4. Scatter Plots\n",
252    "\n",
253    "Scatter plots reveal relationships and correlations between two continuous variables."
254   ]
255  },
256  {
257   "cell_type": "code",
258   "execution_count": null,
259   "metadata": {},
260   "outputs": [],
261   "source": [
262    "# Basic scatter plot\n",
263    "plt.figure(figsize=(10, 6))\n",
264    "plt.scatter(scatter_df['advertising'], scatter_df['revenue'], \n",
265    "            alpha=0.6, s=50, color='purple', edgecolors='black', linewidth=0.5)\n",
266    "plt.title('Advertising Spend vs Revenue', fontsize=16, fontweight='bold')\n",
267    "plt.xlabel('Advertising Spend ($)', fontsize=12)\n",
268    "plt.ylabel('Revenue ($)', fontsize=12)\n",
269    "plt.grid(True, alpha=0.3)\n",
270    "plt.tight_layout()\n",
271    "plt.show()"
272   ]
273  },
274  {
275   "cell_type": "code",
276   "execution_count": null,
277   "metadata": {},
278   "outputs": [],
279   "source": [
280    "# Scatter plot with regression line\n",
281    "plt.figure(figsize=(10, 6))\n",
282    "plt.scatter(scatter_df['advertising'], scatter_df['revenue'], \n",
283    "            alpha=0.6, s=50, color='green', edgecolors='black', linewidth=0.5, label='Data points')\n",
284    "\n",
285    "# Add regression line\n",
286    "z = np.polyfit(scatter_df['advertising'], scatter_df['revenue'], 1)\n",
287    "p = np.poly1d(z)\n",
288    "plt.plot(scatter_df['advertising'], p(scatter_df['advertising']), \n",
 289    "         \"r--\", linewidth=2, label=f'Fit: y={z[0]:.2f}x{z[1]:+.2f}')\n",
290    "\n",
291    "plt.title('Advertising Spend vs Revenue (with Trend Line)', fontsize=16, fontweight='bold')\n",
292    "plt.xlabel('Advertising Spend ($)', fontsize=12)\n",
293    "plt.ylabel('Revenue ($)', fontsize=12)\n",
294    "plt.legend(fontsize=10)\n",
295    "plt.grid(True, alpha=0.3)\n",
296    "plt.tight_layout()\n",
297    "plt.show()"
298   ]
299  },
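     {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
       "A trend line shows the direction of a relationship but not its strength. As a sketch, the Pearson correlation coefficient can be computed with `np.corrcoef` and reported in the title. The arrays `demo_x` and `demo_y` are illustrative: they follow the same recipe as the advertising data in Section 1 but use a local random generator."
      ]
     },
     {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
       "import numpy as np\n",
       "import matplotlib.pyplot as plt\n",
       "\n",
       "rng = np.random.default_rng(0)  # local generator; leaves the global seed untouched\n",
       "demo_x = rng.uniform(1000, 10000, 200)\n",
       "demo_y = 2.5 * demo_x + rng.normal(0, 3000, 200)\n",
       "\n",
       "# np.corrcoef returns a 2x2 matrix; the off-diagonal entry is Pearson's r\n",
       "r = np.corrcoef(demo_x, demo_y)[0, 1]\n",
       "\n",
       "plt.figure(figsize=(10, 6))\n",
       "plt.scatter(demo_x, demo_y, alpha=0.6, s=50)\n",
       "plt.title(f'Advertising vs Revenue (Pearson r = {r:.2f})')\n",
       "plt.xlabel('Advertising Spend ($)')\n",
       "plt.ylabel('Revenue ($)')\n",
       "plt.tight_layout()\n",
       "plt.show()"
      ]
     },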
300  {
301   "cell_type": "markdown",
302   "metadata": {},
303   "source": [
304    "## 5. Histograms and Distributions\n",
305    "\n",
306    "Histograms show the distribution of a continuous variable."
307   ]
308  },
309  {
310   "cell_type": "code",
311   "execution_count": null,
312   "metadata": {},
313   "outputs": [],
314   "source": [
315    "# Basic histogram\n",
316    "plt.figure(figsize=(10, 6))\n",
317    "plt.hist(corr_df['sales'], bins=30, color='teal', edgecolor='black', alpha=0.7)\n",
318    "plt.title('Distribution of Sales', fontsize=16, fontweight='bold')\n",
319    "plt.xlabel('Sales Value', fontsize=12)\n",
320    "plt.ylabel('Frequency', fontsize=12)\n",
321    "plt.axvline(corr_df['sales'].mean(), color='red', linestyle='--', linewidth=2, label=f'Mean: {corr_df[\"sales\"].mean():.2f}')\n",
322    "plt.axvline(corr_df['sales'].median(), color='orange', linestyle='--', linewidth=2, label=f'Median: {corr_df[\"sales\"].median():.2f}')\n",
323    "plt.legend(fontsize=10)\n",
324    "plt.grid(axis='y', alpha=0.3)\n",
325    "plt.tight_layout()\n",
326    "plt.show()"
327   ]
328  },
329  {
330   "cell_type": "code",
331   "execution_count": null,
332   "metadata": {},
333   "outputs": [],
334   "source": [
335    "# Seaborn histogram with KDE (Kernel Density Estimate)\n",
336    "plt.figure(figsize=(10, 6))\n",
 337    "sns.histplot(corr_df['sales'], bins=30, kde=True, stat='density', color='purple', edgecolor='black', alpha=0.6)\n",
338    "plt.title('Sales Distribution with KDE', fontsize=16, fontweight='bold')\n",
339    "plt.xlabel('Sales Value', fontsize=12)\n",
340    "plt.ylabel('Density', fontsize=12)\n",
341    "plt.grid(axis='y', alpha=0.3)\n",
342    "plt.tight_layout()\n",
343    "plt.show()"
344   ]
345  },
346  {
347   "cell_type": "code",
348   "execution_count": null,
349   "metadata": {},
350   "outputs": [],
351   "source": [
352    "# Multiple overlapping histograms\n",
353    "plt.figure(figsize=(10, 6))\n",
354    "for season in corr_df['season'].unique():\n",
355    "    season_data = corr_df[corr_df['season'] == season]['sales']\n",
356    "    plt.hist(season_data, bins=20, alpha=0.5, label=season)\n",
357    "\n",
358    "plt.title('Sales Distribution by Season', fontsize=16, fontweight='bold')\n",
359    "plt.xlabel('Sales Value', fontsize=12)\n",
360    "plt.ylabel('Frequency', fontsize=12)\n",
361    "plt.legend(fontsize=10)\n",
362    "plt.grid(axis='y', alpha=0.3)\n",
363    "plt.tight_layout()\n",
364    "plt.show()"
365   ]
366  },
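     {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
       "In the overlapping histograms above, `bins=20` is resolved separately for each season, so the bars of different seasons do not share edges, and raw counts depend on how many rows each season has. A possible refinement, sketched in the next cell with illustrative `demo_groups` data, computes one set of edges with `np.histogram_bin_edges` and normalises each histogram with `density=True`."
      ]
     },
     {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
       "import numpy as np\n",
       "import matplotlib.pyplot as plt\n",
       "\n",
       "rng = np.random.default_rng(0)\n",
       "# Hypothetical groups of different sizes\n",
       "demo_groups = {\n",
       "    'Group A': rng.normal(50, 5, 120),\n",
       "    'Group B': rng.normal(55, 8, 60),\n",
       "}\n",
       "\n",
       "# Shared bin edges computed from all values pooled together\n",
       "all_values = np.concatenate(list(demo_groups.values()))\n",
       "edges = np.histogram_bin_edges(all_values, bins=20)\n",
       "\n",
       "plt.figure(figsize=(10, 6))\n",
       "for name, values in demo_groups.items():\n",
       "    # density=True scales each histogram so that its area is 1\n",
       "    plt.hist(values, bins=edges, density=True, alpha=0.5, label=name)\n",
       "plt.xlabel('Value')\n",
       "plt.ylabel('Density')\n",
       "plt.legend()\n",
       "plt.tight_layout()\n",
       "plt.show()"
      ]
     },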
367  {
368   "cell_type": "markdown",
369   "metadata": {},
370   "source": [
371    "## 6. Box Plots and Violin Plots\n",
372    "\n",
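 373    "Box plots summarize a distribution by its quartiles and mark outliers individually; violin plots add a kernel density estimate that shows the distribution's shape."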
374   ]
375  },
376  {
377   "cell_type": "code",
378   "execution_count": null,
379   "metadata": {},
380   "outputs": [],
381   "source": [
382    "# Box plot\n",
383    "plt.figure(figsize=(10, 6))\n",
384    "sns.boxplot(data=corr_df, x='season', y='sales', palette='Set2')\n",
385    "plt.title('Sales Distribution by Season (Box Plot)', fontsize=16, fontweight='bold')\n",
386    "plt.xlabel('Season', fontsize=12)\n",
387    "plt.ylabel('Sales Value', fontsize=12)\n",
388    "plt.grid(axis='y', alpha=0.3)\n",
389    "plt.tight_layout()\n",
390    "plt.show()"
391   ]
392  },
393  {
394   "cell_type": "code",
395   "execution_count": null,
396   "metadata": {},
397   "outputs": [],
398   "source": [
399    "# Violin plot (combines box plot and KDE)\n",
400    "plt.figure(figsize=(10, 6))\n",
401    "sns.violinplot(data=corr_df, x='season', y='sales', palette='muted', inner='quartile')\n",
402    "plt.title('Sales Distribution by Season (Violin Plot)', fontsize=16, fontweight='bold')\n",
403    "plt.xlabel('Season', fontsize=12)\n",
404    "plt.ylabel('Sales Value', fontsize=12)\n",
405    "plt.grid(axis='y', alpha=0.3)\n",
406    "plt.tight_layout()\n",
407    "plt.show()"
408   ]
409  },
410  {
411   "cell_type": "code",
412   "execution_count": null,
413   "metadata": {},
414   "outputs": [],
415   "source": [
416    "# Side-by-side comparison\n",
417    "fig, axes = plt.subplots(1, 2, figsize=(14, 6))\n",
418    "\n",
419    "sns.boxplot(data=corr_df, x='season', y='sales', palette='Set2', ax=axes[0])\n",
420    "axes[0].set_title('Box Plot', fontsize=14, fontweight='bold')\n",
421    "axes[0].set_xlabel('Season', fontsize=12)\n",
422    "axes[0].set_ylabel('Sales Value', fontsize=12)\n",
423    "axes[0].grid(axis='y', alpha=0.3)\n",
424    "\n",
425    "sns.violinplot(data=corr_df, x='season', y='sales', palette='muted', ax=axes[1])\n",
426    "axes[1].set_title('Violin Plot', fontsize=14, fontweight='bold')\n",
427    "axes[1].set_xlabel('Season', fontsize=12)\n",
428    "axes[1].set_ylabel('Sales Value', fontsize=12)\n",
429    "axes[1].grid(axis='y', alpha=0.3)\n",
430    "\n",
431    "plt.tight_layout()\n",
432    "plt.show()"
433   ]
434  },
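     {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
       "To make the box plot elements concrete, the sketch in the next cell computes the statistics behind a box plot with the common default whisker setting of 1.5: the quartiles, the interquartile range (IQR), and the 1.5 x IQR fences beyond which points are drawn as outliers. The array `demo_values` is illustrative."
      ]
     },
     {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
       "import numpy as np\n",
       "\n",
       "demo_values = np.array([12, 15, 17, 18, 19, 20, 21, 22, 24, 45])\n",
       "\n",
       "q1, median, q3 = np.percentile(demo_values, [25, 50, 75])\n",
       "iqr = q3 - q1\n",
       "lower_fence = q1 - 1.5 * iqr\n",
       "upper_fence = q3 + 1.5 * iqr\n",
       "\n",
       "# Points outside the fences are the ones a box plot marks individually\n",
       "outliers = demo_values[(demo_values < lower_fence) | (demo_values > upper_fence)]\n",
       "\n",
       "print(f'Q1={q1}, median={median}, Q3={q3}, IQR={iqr}')\n",
       "print(f'Fences: [{lower_fence}, {upper_fence}]')\n",
       "print(f'Outliers: {outliers}')"
      ]
     },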
435  {
436   "cell_type": "markdown",
437   "metadata": {},
438   "source": [
439    "## 7. Heatmaps (Correlation Matrices)\n",
440    "\n",
 441    "Heatmaps encode the values of a matrix as colors, which makes them well suited to correlation matrices and pivot tables."
442   ]
443  },
444  {
445   "cell_type": "code",
446   "execution_count": null,
447   "metadata": {},
448   "outputs": [],
449   "source": [
450    "# Correlation matrix heatmap\n",
451    "numerical_cols = corr_df.select_dtypes(include=[np.number])\n",
452    "correlation_matrix = numerical_cols.corr()\n",
453    "\n",
454    "plt.figure(figsize=(10, 8))\n",
455    "sns.heatmap(correlation_matrix, annot=True, fmt='.2f', cmap='coolwarm', \n",
456    "            square=True, linewidths=0.5, cbar_kws={'shrink': 0.8})\n",
457    "plt.title('Correlation Matrix Heatmap', fontsize=16, fontweight='bold', pad=20)\n",
458    "plt.tight_layout()\n",
459    "plt.show()"
460   ]
461  },
462  {
463   "cell_type": "code",
464   "execution_count": null,
465   "metadata": {},
466   "outputs": [],
467   "source": [
468    "# Create a pivot table for heatmap\n",
469    "pivot_data = corr_df.pivot_table(values='sales', index='season', \n",
470    "                                  columns=pd.cut(corr_df['price'], bins=3, labels=['Low', 'Medium', 'High']),\n",
471    "                                  aggfunc='mean')\n",
472    "\n",
473    "plt.figure(figsize=(8, 6))\n",
474    "sns.heatmap(pivot_data, annot=True, fmt='.1f', cmap='YlGnBu', linewidths=0.5)\n",
475    "plt.title('Average Sales by Season and Price Range', fontsize=16, fontweight='bold', pad=20)\n",
476    "plt.xlabel('Price Range', fontsize=12)\n",
477    "plt.ylabel('Season', fontsize=12)\n",
478    "plt.tight_layout()\n",
479    "plt.show()"
480   ]
481  },
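     {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
       "A correlation matrix is symmetric, so half of the cells in the first heatmap repeat the other half. As a sketch, the redundant upper triangle can be hidden by passing a boolean `mask` to `sns.heatmap`. The frame `demo_df` is an illustrative stand-in for `corr_df`."
      ]
     },
     {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
       "import numpy as np\n",
       "import pandas as pd\n",
       "import matplotlib.pyplot as plt\n",
       "import seaborn as sns\n",
       "\n",
       "rng = np.random.default_rng(0)\n",
       "demo_df = pd.DataFrame({\n",
       "    'price': rng.uniform(10, 100, 300),\n",
       "    'advertising': rng.uniform(500, 5000, 300),\n",
       "})\n",
       "demo_df['sales'] = (50 - 0.3 * demo_df['price'] + 0.002 * demo_df['advertising']\n",
       "                    + rng.normal(0, 5, 300))\n",
       "\n",
       "corr = demo_df.corr()\n",
       "# True marks cells to hide: the diagonal and everything above it\n",
       "mask = np.triu(np.ones_like(corr, dtype=bool))\n",
       "\n",
       "plt.figure(figsize=(8, 6))\n",
       "sns.heatmap(corr, mask=mask, annot=True, fmt='.2f', cmap='coolwarm',\n",
       "            vmin=-1, vmax=1, square=True)\n",
       "plt.title('Lower Triangle of the Correlation Matrix')\n",
       "plt.tight_layout()\n",
       "plt.show()"
      ]
     },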
482  {
483   "cell_type": "markdown",
484   "metadata": {},
485   "source": [
486    "## 8. Pie Charts\n",
487    "\n",
488    "Pie charts show proportions and percentages."
489   ]
490  },
491  {
492   "cell_type": "code",
493   "execution_count": null,
494   "metadata": {},
495   "outputs": [],
496   "source": [
497    "# Basic pie chart\n",
498    "plt.figure(figsize=(10, 8))\n",
499    "colors = plt.cm.Set3(range(len(categories)))\n",
500    "explode = (0.05, 0, 0, 0, 0)  # Explode the first slice\n",
501    "\n",
502    "plt.pie(category_df['sales'], labels=category_df['category'], autopct='%1.1f%%',\n",
503    "        startangle=90, colors=colors, explode=explode, shadow=True)\n",
504    "plt.title('Sales Distribution by Category', fontsize=16, fontweight='bold', pad=20)\n",
505    "plt.axis('equal')\n",
506    "plt.tight_layout()\n",
507    "plt.show()"
508   ]
509  },
510  {
511   "cell_type": "code",
512   "execution_count": null,
513   "metadata": {},
514   "outputs": [],
515   "source": [
516    "# Donut chart\n",
517    "plt.figure(figsize=(10, 8))\n",
518    "colors = plt.cm.Pastel1(range(len(categories)))\n",
519    "\n",
520    "wedges, texts, autotexts = plt.pie(category_df['sales'], labels=category_df['category'], \n",
521    "                                     autopct='%1.1f%%', startangle=90, colors=colors)\n",
522    "\n",
523    "# Draw a white circle at the center to create donut\n",
524    "centre_circle = plt.Circle((0, 0), 0.70, fc='white')\n",
525    "fig = plt.gcf()\n",
526    "fig.gca().add_artist(centre_circle)\n",
527    "\n",
528    "plt.title('Sales Distribution (Donut Chart)', fontsize=16, fontweight='bold', pad=20)\n",
529    "plt.axis('equal')\n",
530    "plt.tight_layout()\n",
531    "plt.show()"
532   ]
533  },
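     {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
       "The donut chart above is built by covering the centre of a pie with a white circle. An alternative sketch, shown in the next cell, draws ring-shaped wedges directly by passing a `width` through `wedgeprops`, so nothing has to be drawn over the pie. The `demo_*` values copy the category data from Section 1."
      ]
     },
     {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
       "import matplotlib.pyplot as plt\n",
       "\n",
       "demo_labels = ['Electronics', 'Clothing', 'Food', 'Books', 'Home & Garden']\n",
       "demo_sales = [45000, 32000, 28000, 18000, 25000]\n",
       "\n",
       "fig, ax = plt.subplots(figsize=(8, 8))\n",
       "# width=0.3 keeps only the outer 30% of each wedge's radius;\n",
       "# pctdistance=0.85 places the percentage text in the middle of that ring\n",
       "ax.pie(demo_sales, labels=demo_labels, autopct='%1.1f%%', startangle=90,\n",
       "       pctdistance=0.85, wedgeprops={'width': 0.3, 'edgecolor': 'white'})\n",
       "ax.set_title('Sales Distribution (Donut via wedgeprops)')\n",
       "plt.show()"
      ]
     },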
534  {
535   "cell_type": "markdown",
536   "metadata": {},
537   "source": [
538    "## 9. Seaborn Advanced Plots\n",
539    "\n",
540    "Seaborn provides high-level statistical visualizations."
541   ]
542  },
543  {
544   "cell_type": "code",
545   "execution_count": null,
546   "metadata": {},
547   "outputs": [],
548   "source": [
549    "# Pairplot - shows relationships between all numerical variables\n",
550    "sns.pairplot(corr_df[['price', 'advertising', 'competitor_price', 'sales']], \n",
551    "             diag_kind='kde', plot_kws={'alpha': 0.6})\n",
552    "plt.suptitle('Pairplot of Numerical Variables', y=1.02, fontsize=16, fontweight='bold')\n",
553    "plt.tight_layout()\n",
554    "plt.show()"
555   ]
556  },
557  {
558   "cell_type": "code",
559   "execution_count": null,
560   "metadata": {},
561   "outputs": [],
562   "source": [
563    "# Jointplot - scatter plot with marginal distributions\n",
564    "sns.jointplot(data=scatter_df, x='advertising', y='revenue', \n",
 565    "              kind='reg', height=8, color='purple', joint_kws={'scatter_kws': {'alpha': 0.5}})\n",
566    "plt.suptitle('Joint Plot: Advertising vs Revenue', y=1.02, fontsize=16, fontweight='bold')\n",
567    "plt.tight_layout()\n",
568    "plt.show()"
569   ]
570  },
571  {
572   "cell_type": "markdown",
573   "metadata": {},
574   "source": [
575    "## 10. Subplots and Figure Composition\n",
576    "\n",
 577    "Multiple plots can be combined in a single figure to show several views of the data side by side."
578   ]
579  },
580  {
581   "cell_type": "code",
582   "execution_count": null,
583   "metadata": {},
584   "outputs": [],
585   "source": [
586    "# 2x2 subplot grid\n",
587    "fig, axes = plt.subplots(2, 2, figsize=(14, 10))\n",
588    "\n",
589    "# Plot 1: Line plot\n",
590    "axes[0, 0].plot(ts_df['date'][:90], ts_df['sales'][:90], color='steelblue', linewidth=2)\n",
591    "axes[0, 0].set_title('Daily Sales (Q1)', fontsize=12, fontweight='bold')\n",
592    "axes[0, 0].set_xlabel('Date', fontsize=10)\n",
593    "axes[0, 0].set_ylabel('Sales ($)', fontsize=10)\n",
594    "axes[0, 0].grid(True, alpha=0.3)\n",
595    "axes[0, 0].tick_params(axis='x', rotation=45)\n",
596    "\n",
597    "# Plot 2: Bar chart\n",
598    "axes[0, 1].bar(category_df['category'], category_df['sales'], color='coral')\n",
599    "axes[0, 1].set_title('Sales by Category', fontsize=12, fontweight='bold')\n",
600    "axes[0, 1].set_xlabel('Category', fontsize=10)\n",
601    "axes[0, 1].set_ylabel('Total Sales ($)', fontsize=10)\n",
602    "axes[0, 1].tick_params(axis='x', rotation=45)\n",
603    "axes[0, 1].grid(axis='y', alpha=0.3)\n",
604    "\n",
605    "# Plot 3: Scatter plot\n",
606    "axes[1, 0].scatter(scatter_df['advertising'], scatter_df['revenue'], alpha=0.6, color='green')\n",
607    "axes[1, 0].set_title('Advertising vs Revenue', fontsize=12, fontweight='bold')\n",
608    "axes[1, 0].set_xlabel('Advertising ($)', fontsize=10)\n",
609    "axes[1, 0].set_ylabel('Revenue ($)', fontsize=10)\n",
610    "axes[1, 0].grid(True, alpha=0.3)\n",
611    "\n",
612    "# Plot 4: Histogram\n",
613    "axes[1, 1].hist(corr_df['sales'], bins=25, color='purple', edgecolor='black', alpha=0.7)\n",
614    "axes[1, 1].set_title('Sales Distribution', fontsize=12, fontweight='bold')\n",
615    "axes[1, 1].set_xlabel('Sales Value', fontsize=10)\n",
616    "axes[1, 1].set_ylabel('Frequency', fontsize=10)\n",
617    "axes[1, 1].grid(axis='y', alpha=0.3)\n",
618    "\n",
619    "plt.suptitle('Comprehensive Sales Analysis Dashboard', fontsize=18, fontweight='bold', y=1.00)\n",
620    "plt.tight_layout()\n",
621    "plt.show()"
622   ]
623  },
624  {
625   "cell_type": "code",
626   "execution_count": null,
627   "metadata": {},
628   "outputs": [],
629   "source": [
630    "# Complex layout with different sizes\n",
631    "fig = plt.figure(figsize=(14, 10))\n",
632    "gs = fig.add_gridspec(3, 3, hspace=0.3, wspace=0.3)\n",
633    "\n",
634    "# Large plot spanning 2x2\n",
635    "ax1 = fig.add_subplot(gs[0:2, 0:2])\n",
636    "ax1.plot(ts_df['date'], ts_df['sales'], linewidth=2, color='steelblue')\n",
637    "ax1.fill_between(ts_df['date'], ts_df['sales'], alpha=0.3)\n",
638    "ax1.set_title('Full Year Sales Trend', fontsize=14, fontweight='bold')\n",
639    "ax1.set_xlabel('Date', fontsize=11)\n",
640    "ax1.set_ylabel('Sales ($)', fontsize=11)\n",
641    "ax1.grid(True, alpha=0.3)\n",
642    "ax1.tick_params(axis='x', rotation=45)\n",
643    "\n",
644    "# Top right plot\n",
645    "ax2 = fig.add_subplot(gs[0, 2])\n",
 646    "top3 = category_df.nlargest(3, 'sales')  # select by value rather than by row order\n",
        "ax2.pie(top3['sales'], labels=top3['category'], autopct='%1.1f%%')\n",
647    "ax2.set_title('Top 3 Categories', fontsize=12, fontweight='bold')\n",
648    "\n",
649    "# Middle right plot\n",
650    "ax3 = fig.add_subplot(gs[1, 2])\n",
651    "ax3.bar(range(len(categories)), category_df['transactions'], color='orange')\n",
652    "ax3.set_title('Transactions', fontsize=12, fontweight='bold')\n",
653    "ax3.set_xticks(range(len(categories)))\n",
654    "ax3.set_xticklabels(categories, rotation=45, ha='right', fontsize=8)\n",
655    "ax3.grid(axis='y', alpha=0.3)\n",
656    "\n",
657    "# Bottom row - 3 small plots\n",
658    "ax4 = fig.add_subplot(gs[2, 0])\n",
659    "ax4.hist(corr_df['price'], bins=20, color='green', alpha=0.7)\n",
660    "ax4.set_title('Price Distribution', fontsize=10, fontweight='bold')\n",
661    "ax4.set_xlabel('Price', fontsize=9)\n",
662    "\n",
663    "ax5 = fig.add_subplot(gs[2, 1])\n",
664    "ax5.hist(corr_df['advertising'], bins=20, color='red', alpha=0.7)\n",
665    "ax5.set_title('Advertising Distribution', fontsize=10, fontweight='bold')\n",
666    "ax5.set_xlabel('Advertising', fontsize=9)\n",
667    "\n",
668    "ax6 = fig.add_subplot(gs[2, 2])\n",
669    "sns.boxplot(data=corr_df, y='sales', ax=ax6, color='skyblue')\n",
670    "ax6.set_title('Sales Box Plot', fontsize=10, fontweight='bold')\n",
671    "\n",
672    "plt.suptitle('Advanced Multi-Panel Dashboard', fontsize=18, fontweight='bold')\n",
673    "plt.show()"
674   ]
675  },
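     {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
       "Layouts like the one above can also be described by name. As a sketch, `plt.subplot_mosaic` takes a nested list in which each label marks the grid cells that one axes should cover. The panel names and random data are illustrative, and `layout='constrained'` assumes a reasonably recent Matplotlib release."
      ]
     },
     {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
       "import numpy as np\n",
       "import matplotlib.pyplot as plt\n",
       "\n",
       "rng = np.random.default_rng(0)\n",
       "\n",
       "# Each string names a panel; repeating a name makes that panel span cells\n",
       "mosaic = [\n",
       "    ['trend', 'trend', 'share'],\n",
       "    ['trend', 'trend', 'counts'],\n",
       "    ['hist_a', 'hist_b', 'box'],\n",
       "]\n",
       "fig, axd = plt.subplot_mosaic(mosaic, figsize=(14, 10), layout='constrained')\n",
       "\n",
       "# axd is a dict mapping each panel name to its Axes\n",
       "axd['trend'].plot(np.cumsum(rng.normal(0, 1, 365)))\n",
       "axd['trend'].set_title('trend')\n",
       "for name in ['share', 'counts', 'hist_a', 'hist_b', 'box']:\n",
       "    axd[name].hist(rng.normal(0, 1, 200), bins=20)\n",
       "    axd[name].set_title(name)\n",
       "\n",
       "fig.suptitle('Mosaic Layout Sketch')\n",
       "plt.show()"
      ]
     },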
676  {
677   "cell_type": "markdown",
678   "metadata": {},
679   "source": [
680    "## Summary\n",
681    "\n",
 682    "This notebook covered the following visualization techniques:\n",
683    "\n",
684    "### Matplotlib:\n",
685    "1. **Line Plots**: Trends over time, multiple series, moving averages\n",
686    "2. **Bar Charts**: Vertical, horizontal, grouped bars with labels\n",
687    "3. **Scatter Plots**: Relationships, correlations, regression lines\n",
688    "4. **Histograms**: Distributions, frequency analysis\n",
689    "5. **Pie Charts**: Proportions, donut charts\n",
690    "6. **Subplots**: Complex multi-panel layouts\n",
691    "\n",
692    "### Seaborn:\n",
693    "1. **Statistical Plots**: histplot with KDE, box plots, violin plots\n",
694    "2. **Heatmaps**: Correlation matrices, pivot tables\n",
695    "3. **Advanced Plots**: pairplot, jointplot\n",
696    "\n",
697    "### Customization:\n",
698    "- Titles, labels, legends\n",
699    "- Colors, styles, transparency\n",
700    "- Grid lines, annotations\n",
701    "- Figure size and layout\n",
702    "\n",
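 703    "Together, these chart types cover the core tasks of exploratory data analysis: trends (line plots), comparisons (bar charts), relationships (scatter plots, heatmaps), and distributions (histograms, box plots, violin plots)."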
704   ]
705  }
706 ],
707 "metadata": {
708  "kernelspec": {
709   "display_name": "Python 3",
710   "language": "python",
711   "name": "python3"
712  },
713  "language_info": {
714   "codemirror_mode": {
715    "name": "ipython",
716    "version": 3
717   },
718   "file_extension": ".py",
719   "mimetype": "text/x-python",
720   "name": "python",
721   "nbconvert_exporter": "python",
722   "pygments_lexer": "ipython3",
723   "version": "3.8.0"
724  }
725 },
726 "nbformat": 4,
727 "nbformat_minor": 4
728}