{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Data Visualization with Matplotlib and Seaborn\n",
    "\n",
    "This notebook walks through common data visualization techniques in Matplotlib and Seaborn, using synthetic sales data.\n",
    "\n",
    "## Topics Covered:\n",
    "- Line plots and time series\n",
    "- Bar charts (single and grouped)\n",
    "- Scatter plots\n",
    "- Histograms and distributions\n",
    "- Box plots and violin plots\n",
    "- Heatmaps (correlation matrices)\n",
    "- Pie charts\n",
    "- Pair plots and joint plots\n",
    "- Subplots and figure composition\n",
    "- Customization: titles, labels, legends, colors, styles"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt\n",
    "import seaborn as sns\n",
    "\n",
    "# Enable inline plotting\n",
    "%matplotlib inline\n",
    "\n",
    "# Set style\n",
    "plt.style.use('seaborn-v0_8-darkgrid')\n",
    "sns.set_palette('husl')\n",
    "\n",
    "# Set random seed\n",
    "np.random.seed(42)\n",
    "\n",
    "# Figure size default\n",
    "plt.rcParams['figure.figsize'] = (10, 6)\n",
    "plt.rcParams['font.size'] = 10"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. Generate Sample Data\n",
    "\n",
    "We'll create several synthetic datasets to demonstrate the different visualization techniques."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Time series data\n",
    "dates = pd.date_range(start='2023-01-01', end='2023-12-31', freq='D')\n",
    "n_days = len(dates)\n",
    "\n",
    "# Sales data with trend and seasonality\n",
    "trend = np.linspace(100, 150, n_days)\n",
    "seasonality = 20 * np.sin(2 * np.pi * np.arange(n_days) / 365)\n",
    "noise = np.random.normal(0, 5, n_days)\n",
    "sales = trend + seasonality + noise\n",
    "\n",
    "ts_df = pd.DataFrame({\n",
    "    'date': dates,\n",
    "    'sales': sales\n",
    "})\n",
    "\n",
    "# Categorical data\n",
    "categories = ['Electronics', 'Clothing', 'Food', 'Books', 'Home & Garden']\n",
    "category_sales = [45000, 32000, 28000, 18000, 25000]\n",
    "category_counts = [450, 820, 950, 380, 520]\n",
    "\n",
    "category_df = pd.DataFrame({\n",
    "    'category': categories,\n",
    "    'sales': category_sales,\n",
    "    'transactions': category_counts\n",
    "})\n",
    "\n",
    "# Scatter plot data (with correlation)\n",
    "n_samples = 200\n",
    "advertising_spend = np.random.uniform(1000, 10000, n_samples)\n",
    "sales_revenue = 2.5 * advertising_spend + np.random.normal(0, 3000, n_samples)\n",
    "\n",
    "scatter_df = pd.DataFrame({\n",
    "    'advertising': advertising_spend,\n",
    "    'revenue': sales_revenue\n",
    "})\n",
    "\n",
    "# Multi-variable data for correlation\n",
    "n_obs = 300\n",
    "corr_df = pd.DataFrame({\n",
    "    'price': np.random.uniform(10, 100, n_obs),\n",
    "    'advertising': np.random.uniform(500, 5000, n_obs),\n",
    "    'competitor_price': np.random.uniform(15, 95, n_obs),\n",
    "    'season': np.random.choice(['Winter', 'Spring', 'Summer', 'Fall'], n_obs)\n",
    "})\n",
    "corr_df['sales'] = (50 - 0.3 * corr_df['price'] + 0.002 * corr_df['advertising'] +\n",
    "                    0.2 * corr_df['competitor_price'] + np.random.normal(0, 5, n_obs))\n",
    "\n",
    "print(\"Sample data generated successfully!\")\n",
    "print(f\"Time series data: {len(ts_df)} rows\")\n",
    "print(f\"Category data: {len(category_df)} rows\")\n",
    "print(f\"Scatter data: {len(scatter_df)} rows\")\n",
    "print(f\"Correlation data: {len(corr_df)} rows\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Line Plots\n",
    "\n",
    "Line plots are ideal for visualizing trends over time or continuous data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Basic line plot\n",
    "plt.figure(figsize=(12, 6))\n",
    "plt.plot(ts_df['date'], ts_df['sales'], linewidth=2, color='steelblue', alpha=0.8)\n",
    "plt.title('Daily Sales Over Time', fontsize=16, fontweight='bold')\n",
    "plt.xlabel('Date', fontsize=12)\n",
    "plt.ylabel('Sales ($)', fontsize=12)\n",
    "plt.grid(True, alpha=0.3)\n",
    "plt.xticks(rotation=45)\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Multiple line plots with legend\n",
    "# Calculate rolling averages\n",
    "ts_df['sales_7d_ma'] = ts_df['sales'].rolling(window=7).mean()\n",
    "ts_df['sales_30d_ma'] = ts_df['sales'].rolling(window=30).mean()\n",
    "\n",
    "plt.figure(figsize=(12, 6))\n",
    "plt.plot(ts_df['date'], ts_df['sales'], label='Daily Sales', alpha=0.5, linewidth=1)\n",
    "plt.plot(ts_df['date'], ts_df['sales_7d_ma'], label='7-Day Moving Average', linewidth=2, color='orange')\n",
    "plt.plot(ts_df['date'], ts_df['sales_30d_ma'], label='30-Day Moving Average', linewidth=2, color='red')\n",
    "plt.title('Sales with Moving Averages', fontsize=16, fontweight='bold')\n",
    "plt.xlabel('Date', fontsize=12)\n",
    "plt.ylabel('Sales ($)', fontsize=12)\n",
    "plt.legend(loc='upper left', fontsize=10)\n",
    "plt.grid(True, alpha=0.3)\n",
    "plt.xticks(rotation=45)\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
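  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "By default, a rolling window returns NaN until it has a full window of observations, which is why the moving-average lines start later than the daily series. The cell below is a minimal sketch that reuses `ts_df` from above. The `min_periods=1` variant is an illustrative assumption showing one way to fill the start of the series; it is not used in the plot above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Count the leading NaN values left by each full-window rolling mean\n",
    "for window in (7, 30):\n",
    "    n_missing = ts_df['sales'].rolling(window=window).mean().isna().sum()\n",
    "    print(f\"window={window}: {n_missing} leading NaN values\")\n",
    "\n",
    "# Illustrative alternative: average over however many days are available so far\n",
    "partial_ma = ts_df['sales'].rolling(window=7, min_periods=1).mean()\n",
    "print(partial_ma.head(7))"
   ]
  },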
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. Bar Charts\n",
    "\n",
    "Bar charts are excellent for comparing categorical data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Vertical bar chart\n",
    "plt.figure(figsize=(10, 6))\n",
    "bars = plt.bar(category_df['category'], category_df['sales'], color='skyblue', edgecolor='navy', linewidth=1.5)\n",
    "\n",
    "# Add value labels on bars\n",
    "for bar in bars:\n",
    "    height = bar.get_height()\n",
    "    plt.text(bar.get_x() + bar.get_width()/2., height,\n",
    "             f'${height:,.0f}',\n",
    "             ha='center', va='bottom', fontsize=10, fontweight='bold')\n",
    "\n",
    "plt.title('Sales by Category', fontsize=16, fontweight='bold')\n",
    "plt.xlabel('Category', fontsize=12)\n",
    "plt.ylabel('Total Sales ($)', fontsize=12)\n",
    "plt.xticks(rotation=45, ha='right')\n",
    "plt.grid(axis='y', alpha=0.3)\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Horizontal bar chart\n",
    "plt.figure(figsize=(10, 6))\n",
    "plt.barh(category_df['category'], category_df['transactions'], color='coral', edgecolor='darkred', linewidth=1.5)\n",
    "plt.title('Number of Transactions by Category', fontsize=16, fontweight='bold')\n",
    "plt.xlabel('Number of Transactions', fontsize=12)\n",
    "plt.ylabel('Category', fontsize=12)\n",
    "plt.grid(axis='x', alpha=0.3)\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Grouped bar chart\n",
    "x = np.arange(len(categories))\n",
    "width = 0.35\n",
    "\n",
    "fig, ax = plt.subplots(figsize=(12, 6))\n",
    "# Sales are scaled to thousands so both series fit on one axis\n",
    "bars1 = ax.bar(x - width/2, category_df['sales']/1000, width, label='Sales ($K)', color='steelblue')\n",
    "bars2 = ax.bar(x + width/2, category_df['transactions'], width, label='Transactions', color='orange')\n",
    "\n",
    "ax.set_title('Sales and Transactions by Category', fontsize=16, fontweight='bold')\n",
    "ax.set_xlabel('Category', fontsize=12)\n",
    "ax.set_ylabel('Sales ($K) / Transactions', fontsize=12)\n",
    "ax.set_xticks(x)\n",
    "ax.set_xticklabels(categories, rotation=45, ha='right')\n",
    "ax.legend(fontsize=10)\n",
    "ax.grid(axis='y', alpha=0.3)\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4. Scatter Plots\n",
    "\n",
    "Scatter plots reveal relationships and correlations between two continuous variables."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Basic scatter plot\n",
    "plt.figure(figsize=(10, 6))\n",
    "plt.scatter(scatter_df['advertising'], scatter_df['revenue'],\n",
    "            alpha=0.6, s=50, color='purple', edgecolors='black', linewidth=0.5)\n",
    "plt.title('Advertising Spend vs Revenue', fontsize=16, fontweight='bold')\n",
    "plt.xlabel('Advertising Spend ($)', fontsize=12)\n",
    "plt.ylabel('Revenue ($)', fontsize=12)\n",
    "plt.grid(True, alpha=0.3)\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Scatter plot with regression line\n",
    "plt.figure(figsize=(10, 6))\n",
    "plt.scatter(scatter_df['advertising'], scatter_df['revenue'],\n",
    "            alpha=0.6, s=50, color='green', edgecolors='black', linewidth=0.5, label='Data points')\n",
    "\n",
    "# Add regression line (degree-1 least-squares fit)\n",
    "z = np.polyfit(scatter_df['advertising'], scatter_df['revenue'], 1)\n",
    "p = np.poly1d(z)\n",
    "plt.plot(scatter_df['advertising'], p(scatter_df['advertising']),\n",
    "         \"r--\", linewidth=2, label=f'Fit: y={z[0]:.2f}x+{z[1]:.2f}')\n",
    "\n",
    "plt.title('Advertising Spend vs Revenue (with Trend Line)', fontsize=16, fontweight='bold')\n",
    "plt.xlabel('Advertising Spend ($)', fontsize=12)\n",
    "plt.ylabel('Revenue ($)', fontsize=12)\n",
    "plt.legend(fontsize=10)\n",
    "plt.grid(True, alpha=0.3)\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
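  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The strength of the linear relationship in the scatter plot can also be summarized numerically. The cell below is a minimal sketch that reuses `scatter_df` from above and computes the Pearson correlation coefficient. For a straight-line fit with a single predictor, the coefficient of determination equals the square of that coefficient."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Pearson correlation between advertising spend and revenue\n",
    "r = np.corrcoef(scatter_df['advertising'], scatter_df['revenue'])[0, 1]\n",
    "print(f\"Pearson r: {r:.3f}\")\n",
    "\n",
    "# For simple linear regression, R^2 is the square of Pearson r\n",
    "print(f\"R^2 of the linear fit: {r**2:.3f}\")"
   ]
  },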
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5. Histograms and Distributions\n",
    "\n",
    "Histograms show the distribution of a continuous variable."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Basic histogram\n",
    "plt.figure(figsize=(10, 6))\n",
    "plt.hist(corr_df['sales'], bins=30, color='teal', edgecolor='black', alpha=0.7)\n",
    "plt.title('Distribution of Sales', fontsize=16, fontweight='bold')\n",
    "plt.xlabel('Sales Value', fontsize=12)\n",
    "plt.ylabel('Frequency', fontsize=12)\n",
    "plt.axvline(corr_df['sales'].mean(), color='red', linestyle='--', linewidth=2, label=f'Mean: {corr_df[\"sales\"].mean():.2f}')\n",
    "plt.axvline(corr_df['sales'].median(), color='orange', linestyle='--', linewidth=2, label=f'Median: {corr_df[\"sales\"].median():.2f}')\n",
    "plt.legend(fontsize=10)\n",
    "plt.grid(axis='y', alpha=0.3)\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Seaborn histogram with KDE (Kernel Density Estimate)\n",
    "plt.figure(figsize=(10, 6))\n",
    "sns.histplot(corr_df['sales'], bins=30, kde=True, color='purple', edgecolor='black', alpha=0.6)\n",
    "plt.title('Sales Distribution with KDE', fontsize=16, fontweight='bold')\n",
    "plt.xlabel('Sales Value', fontsize=12)\n",
    "# histplot defaults to stat='count', so the y-axis shows counts, not densities\n",
    "plt.ylabel('Count', fontsize=12)\n",
    "plt.grid(axis='y', alpha=0.3)\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Multiple overlapping histograms\n",
    "plt.figure(figsize=(10, 6))\n",
    "for season in corr_df['season'].unique():\n",
    "    season_data = corr_df[corr_df['season'] == season]['sales']\n",
    "    plt.hist(season_data, bins=20, alpha=0.5, label=season)\n",
    "\n",
    "plt.title('Sales Distribution by Season', fontsize=16, fontweight='bold')\n",
    "plt.xlabel('Sales Value', fontsize=12)\n",
    "plt.ylabel('Frequency', fontsize=12)\n",
    "plt.legend(fontsize=10)\n",
    "plt.grid(axis='y', alpha=0.3)\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 6. Box Plots and Violin Plots\n",
    "\n",
    "Box plots show the median, quartiles, and outliers of a distribution; violin plots add a kernel density estimate of its shape."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Box plot\n",
    "plt.figure(figsize=(10, 6))\n",
    "sns.boxplot(data=corr_df, x='season', y='sales', palette='Set2')\n",
    "plt.title('Sales Distribution by Season (Box Plot)', fontsize=16, fontweight='bold')\n",
    "plt.xlabel('Season', fontsize=12)\n",
    "plt.ylabel('Sales Value', fontsize=12)\n",
    "plt.grid(axis='y', alpha=0.3)\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
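  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The quartiles and whisker limits that a box plot draws can also be computed directly. The cell below is a minimal sketch that reuses `corr_df` from above and assumes the default whisker rule, where points more than 1.5 times the interquartile range (IQR) beyond the quartiles are drawn as outliers. The column names `lower_fence` and `upper_fence` are illustrative."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Quartiles of sales per season, one row per season\n",
    "q = corr_df.groupby('season')['sales'].quantile([0.25, 0.5, 0.75]).unstack()\n",
    "q.columns = ['Q1', 'median', 'Q3']\n",
    "\n",
    "# Interquartile range and the 1.5 * IQR limits used for the whiskers\n",
    "q['IQR'] = q['Q3'] - q['Q1']\n",
    "q['lower_fence'] = q['Q1'] - 1.5 * q['IQR']\n",
    "q['upper_fence'] = q['Q3'] + 1.5 * q['IQR']\n",
    "print(q.round(2))\n",
    "\n",
    "# Count the points outside the limits for each season\n",
    "lower = corr_df['season'].map(q['lower_fence'])\n",
    "upper = corr_df['season'].map(q['upper_fence'])\n",
    "is_outlier = (corr_df['sales'] < lower) | (corr_df['sales'] > upper)\n",
    "print(is_outlier.groupby(corr_df['season']).sum())"
   ]
  },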
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Violin plot (combines box plot and KDE)\n",
    "plt.figure(figsize=(10, 6))\n",
    "sns.violinplot(data=corr_df, x='season', y='sales', palette='muted', inner='quartile')\n",
    "plt.title('Sales Distribution by Season (Violin Plot)', fontsize=16, fontweight='bold')\n",
    "plt.xlabel('Season', fontsize=12)\n",
    "plt.ylabel('Sales Value', fontsize=12)\n",
    "plt.grid(axis='y', alpha=0.3)\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Side-by-side comparison\n",
    "fig, axes = plt.subplots(1, 2, figsize=(14, 6))\n",
    "\n",
    "sns.boxplot(data=corr_df, x='season', y='sales', palette='Set2', ax=axes[0])\n",
    "axes[0].set_title('Box Plot', fontsize=14, fontweight='bold')\n",
    "axes[0].set_xlabel('Season', fontsize=12)\n",
    "axes[0].set_ylabel('Sales Value', fontsize=12)\n",
    "axes[0].grid(axis='y', alpha=0.3)\n",
    "\n",
    "sns.violinplot(data=corr_df, x='season', y='sales', palette='muted', ax=axes[1])\n",
    "axes[1].set_title('Violin Plot', fontsize=14, fontweight='bold')\n",
    "axes[1].set_xlabel('Season', fontsize=12)\n",
    "axes[1].set_ylabel('Sales Value', fontsize=12)\n",
    "axes[1].grid(axis='y', alpha=0.3)\n",
    "\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 7. Heatmaps (Correlation Matrices)\n",
    "\n",
    "Heatmaps encode the values of a matrix as colors, which makes them well suited to correlation matrices and pivot tables."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Correlation matrix heatmap\n",
    "numerical_cols = corr_df.select_dtypes(include=[np.number])\n",
    "correlation_matrix = numerical_cols.corr()\n",
    "\n",
    "plt.figure(figsize=(10, 8))\n",
    "sns.heatmap(correlation_matrix, annot=True, fmt='.2f', cmap='coolwarm',\n",
    "            square=True, linewidths=0.5, cbar_kws={'shrink': 0.8})\n",
    "plt.title('Correlation Matrix Heatmap', fontsize=16, fontweight='bold', pad=20)\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create a pivot table for heatmap (price split into three equal-width bins)\n",
    "pivot_data = corr_df.pivot_table(values='sales', index='season',\n",
    "                                 columns=pd.cut(corr_df['price'], bins=3, labels=['Low', 'Medium', 'High']),\n",
    "                                 aggfunc='mean')\n",
    "\n",
    "plt.figure(figsize=(8, 6))\n",
    "sns.heatmap(pivot_data, annot=True, fmt='.1f', cmap='YlGnBu', linewidths=0.5)\n",
    "plt.title('Average Sales by Season and Price Range', fontsize=16, fontweight='bold', pad=20)\n",
    "plt.xlabel('Price Range', fontsize=12)\n",
    "plt.ylabel('Season', fontsize=12)\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 8. Pie Charts\n",
    "\n",
    "Pie charts show each category's share of a whole."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Basic pie chart\n",
    "plt.figure(figsize=(10, 8))\n",
    "colors = plt.cm.Set3(range(len(categories)))\n",
    "explode = (0.05, 0, 0, 0, 0)  # Explode the first slice\n",
    "\n",
    "plt.pie(category_df['sales'], labels=category_df['category'], autopct='%1.1f%%',\n",
    "        startangle=90, colors=colors, explode=explode, shadow=True)\n",
    "plt.title('Sales Distribution by Category', fontsize=16, fontweight='bold', pad=20)\n",
    "plt.axis('equal')\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Donut chart\n",
    "plt.figure(figsize=(10, 8))\n",
    "colors = plt.cm.Pastel1(range(len(categories)))\n",
    "\n",
    "wedges, texts, autotexts = plt.pie(category_df['sales'], labels=category_df['category'],\n",
    "                                   autopct='%1.1f%%', startangle=90, colors=colors)\n",
    "\n",
    "# Draw a white circle at the center to create donut\n",
    "centre_circle = plt.Circle((0, 0), 0.70, fc='white')\n",
    "fig = plt.gcf()\n",
    "fig.gca().add_artist(centre_circle)\n",
    "\n",
    "plt.title('Sales Distribution (Donut Chart)', fontsize=16, fontweight='bold', pad=20)\n",
    "plt.axis('equal')\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 9. Seaborn Advanced Plots\n",
    "\n",
    "Seaborn provides high-level statistical visualizations."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Pairplot - shows relationships between all numerical variables\n",
    "sns.pairplot(corr_df[['price', 'advertising', 'competitor_price', 'sales']],\n",
    "             diag_kind='kde', plot_kws={'alpha': 0.6})\n",
    "plt.suptitle('Pairplot of Numerical Variables', y=1.02, fontsize=16, fontweight='bold')\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Jointplot - scatter plot with marginal distributions\n",
    "# With kind='reg', joint_kws is forwarded to sns.regplot, which takes point\n",
    "# styling through scatter_kws rather than a top-level alpha argument\n",
    "sns.jointplot(data=scatter_df, x='advertising', y='revenue',\n",
    "              kind='reg', height=8, color='purple',\n",
    "              joint_kws={'scatter_kws': {'alpha': 0.5}})\n",
    "plt.suptitle('Joint Plot: Advertising vs Revenue', y=1.02, fontsize=16, fontweight='bold')\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 10. Subplots and Figure Composition\n",
    "\n",
    "Subplots combine multiple plots in a single figure so that related views can be compared side by side."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 2x2 subplot grid\n",
    "fig, axes = plt.subplots(2, 2, figsize=(14, 10))\n",
    "\n",
    "# Plot 1: Line plot (first 90 days = Q1 of 2023)\n",
    "axes[0, 0].plot(ts_df['date'][:90], ts_df['sales'][:90], color='steelblue', linewidth=2)\n",
    "axes[0, 0].set_title('Daily Sales (Q1)', fontsize=12, fontweight='bold')\n",
    "axes[0, 0].set_xlabel('Date', fontsize=10)\n",
    "axes[0, 0].set_ylabel('Sales ($)', fontsize=10)\n",
    "axes[0, 0].grid(True, alpha=0.3)\n",
    "axes[0, 0].tick_params(axis='x', rotation=45)\n",
    "\n",
    "# Plot 2: Bar chart\n",
    "axes[0, 1].bar(category_df['category'], category_df['sales'], color='coral')\n",
    "axes[0, 1].set_title('Sales by Category', fontsize=12, fontweight='bold')\n",
    "axes[0, 1].set_xlabel('Category', fontsize=10)\n",
    "axes[0, 1].set_ylabel('Total Sales ($)', fontsize=10)\n",
    "axes[0, 1].tick_params(axis='x', rotation=45)\n",
    "axes[0, 1].grid(axis='y', alpha=0.3)\n",
    "\n",
    "# Plot 3: Scatter plot\n",
    "axes[1, 0].scatter(scatter_df['advertising'], scatter_df['revenue'], alpha=0.6, color='green')\n",
    "axes[1, 0].set_title('Advertising vs Revenue', fontsize=12, fontweight='bold')\n",
    "axes[1, 0].set_xlabel('Advertising ($)', fontsize=10)\n",
    "axes[1, 0].set_ylabel('Revenue ($)', fontsize=10)\n",
    "axes[1, 0].grid(True, alpha=0.3)\n",
    "\n",
    "# Plot 4: Histogram\n",
    "axes[1, 1].hist(corr_df['sales'], bins=25, color='purple', edgecolor='black', alpha=0.7)\n",
    "axes[1, 1].set_title('Sales Distribution', fontsize=12, fontweight='bold')\n",
    "axes[1, 1].set_xlabel('Sales Value', fontsize=10)\n",
    "axes[1, 1].set_ylabel('Frequency', fontsize=10)\n",
    "axes[1, 1].grid(axis='y', alpha=0.3)\n",
    "\n",
    "plt.suptitle('Comprehensive Sales Analysis Dashboard', fontsize=18, fontweight='bold', y=1.00)\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Complex layout with different sizes\n",
    "fig = plt.figure(figsize=(14, 10))\n",
    "gs = fig.add_gridspec(3, 3, hspace=0.3, wspace=0.3)\n",
    "\n",
    "# Large plot spanning 2x2\n",
    "ax1 = fig.add_subplot(gs[0:2, 0:2])\n",
    "ax1.plot(ts_df['date'], ts_df['sales'], linewidth=2, color='steelblue')\n",
    "ax1.fill_between(ts_df['date'], ts_df['sales'], alpha=0.3)\n",
    "ax1.set_title('Full Year Sales Trend', fontsize=14, fontweight='bold')\n",
    "ax1.set_xlabel('Date', fontsize=11)\n",
    "ax1.set_ylabel('Sales ($)', fontsize=11)\n",
    "ax1.grid(True, alpha=0.3)\n",
    "ax1.tick_params(axis='x', rotation=45)\n",
    "\n",
    "# Top right plot: select the three largest categories by sales explicitly\n",
    "# rather than relying on row order\n",
    "top3 = category_df.nlargest(3, 'sales')\n",
    "ax2 = fig.add_subplot(gs[0, 2])\n",
    "ax2.pie(top3['sales'], labels=top3['category'], autopct='%1.1f%%')\n",
    "ax2.set_title('Top 3 Categories', fontsize=12, fontweight='bold')\n",
    "\n",
    "# Middle right plot\n",
    "ax3 = fig.add_subplot(gs[1, 2])\n",
    "ax3.bar(range(len(categories)), category_df['transactions'], color='orange')\n",
    "ax3.set_title('Transactions', fontsize=12, fontweight='bold')\n",
    "ax3.set_xticks(range(len(categories)))\n",
    "ax3.set_xticklabels(categories, rotation=45, ha='right', fontsize=8)\n",
    "ax3.grid(axis='y', alpha=0.3)\n",
    "\n",
    "# Bottom row - 3 small plots\n",
    "ax4 = fig.add_subplot(gs[2, 0])\n",
    "ax4.hist(corr_df['price'], bins=20, color='green', alpha=0.7)\n",
    "ax4.set_title('Price Distribution', fontsize=10, fontweight='bold')\n",
    "ax4.set_xlabel('Price', fontsize=9)\n",
    "\n",
    "ax5 = fig.add_subplot(gs[2, 1])\n",
    "ax5.hist(corr_df['advertising'], bins=20, color='red', alpha=0.7)\n",
    "ax5.set_title('Advertising Distribution', fontsize=10, fontweight='bold')\n",
    "ax5.set_xlabel('Advertising', fontsize=9)\n",
    "\n",
    "ax6 = fig.add_subplot(gs[2, 2])\n",
    "sns.boxplot(data=corr_df, y='sales', ax=ax6, color='skyblue')\n",
    "ax6.set_title('Sales Box Plot', fontsize=10, fontweight='bold')\n",
    "\n",
    "plt.suptitle('Advanced Multi-Panel Dashboard', fontsize=18, fontweight='bold')\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Summary\n",
    "\n",
    "In this notebook, we covered the following data visualization techniques:\n",
    "\n",
    "### Matplotlib:\n",
    "1. **Line Plots**: Trends over time, multiple series, moving averages\n",
    "2. **Bar Charts**: Vertical, horizontal, grouped bars with labels\n",
    "3. **Scatter Plots**: Relationships, correlations, regression lines\n",
    "4. **Histograms**: Distributions, frequency analysis\n",
    "5. **Pie Charts**: Proportions, donut charts\n",
    "6. **Subplots**: Complex multi-panel layouts\n",
    "\n",
    "### Seaborn:\n",
    "1. **Statistical Plots**: histplot with KDE, box plots, violin plots\n",
    "2. **Heatmaps**: Correlation matrices, pivot tables\n",
    "3. **Advanced Plots**: pairplot, jointplot\n",
    "\n",
    "### Customization:\n",
    "- Titles, labels, legends\n",
    "- Colors, styles, transparency\n",
    "- Grid lines, annotations\n",
    "- Figure size and layout\n",
    "\n",
    "These visualization techniques are essential for exploratory data analysis and communicating insights effectively."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}