# Lecture 21: Advanced Data Visualization with Seaborn

## Learning Objectives

By the end of this lecture, you will be able to:
- Create multi-panel figures using subplots for dashboard layouts
- Understand Seaborn's high-level interface for statistical visualization
- Build distribution plots including box plots, violin plots, and KDE plots
- Create categorical plots for comparing groups
- Generate relational plots to explore variable relationships
- Produce matrix visualizations including heatmaps and cluster maps
- Use pair plots and joint plots for multi-variable exploration
- Apply themes and color palettes for professional styling
- Create complete analytical workflows combining multiple visualization types

**Prerequisites**: Lecture 20 (Matplotlib Fundamentals)

## Setup and Imports

Advanced visualization requires a combination of libraries working together. Matplotlib provides the foundation that Seaborn builds upon. Seaborn adds statistical plotting capabilities with attractive default aesthetics. We also need pandas for data manipulation and NumPy for numerical operations.

In [None]:
# Import visualization libraries
import matplotlib.pyplot as plt
import seaborn as sns

# Import data manipulation libraries
import pandas as pd
import numpy as np

# Set default style
sns.set_style('whitegrid')
pd.set_option('display.precision', 2)

print("Libraries imported successfully!")
print(f"Seaborn version: {sns.__version__}")

## Part 1: Subplots and Multi-Panel Figures

### Creating Subplot Grids

Professional visualizations often require multiple plots arranged in a grid layout. The plt.subplots() function creates a figure with a specified number of rows and columns. Each cell in the grid is an independent axes object where you can draw different types of plots. This approach is essential for creating dashboards and comprehensive data reports.

In [None]:
# Create a 2x2 subplot grid
fig, axes = plt.subplots(2, 2, figsize=(12, 10))

# Sample data
x = np.linspace(0, 10, 100)

# Top-left: Sine wave
axes[0, 0].plot(x, np.sin(x), 'b-', linewidth=2)
axes[0, 0].set_title('Sine Wave')
axes[0, 0].set_xlabel('X')
axes[0, 0].set_ylabel('sin(x)')

# Top-right: Cosine wave
axes[0, 1].plot(x, np.cos(x), 'r-', linewidth=2)
axes[0, 1].set_title('Cosine Wave')
axes[0, 1].set_xlabel('X')
axes[0, 1].set_ylabel('cos(x)')

# Bottom-left: Exponential
axes[1, 0].plot(x, np.exp(x/3), 'g-', linewidth=2)
axes[1, 0].set_title('Exponential Growth')
axes[1, 0].set_xlabel('X')
axes[1, 0].set_ylabel('exp(x/3)')

# Bottom-right: Logarithm
axes[1, 1].plot(x[1:], np.log(x[1:]), 'm-', linewidth=2)
axes[1, 1].set_title('Natural Logarithm')
axes[1, 1].set_xlabel('X')
axes[1, 1].set_ylabel('ln(x)')

plt.tight_layout()
plt.show()
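As a quick check of what plt.subplots() returns, the illustrative sketch below inspects the shape of the axes array. A single row or column is squeezed to a 1-D array unless squeeze=False is passed. That is why indexing written for a 2x2 grid (axes[0, 0]) fails on a 1x3 grid (axes[0]). Iterating over axes.flat works for any grid shape.

```python
import matplotlib.pyplot as plt
import numpy as np

# A 2x2 grid returns a 2-D NumPy array of Axes objects
fig, axes = plt.subplots(2, 2, figsize=(8, 6))
print(type(axes).__name__, axes.shape)

# A single row is squeezed to a 1-D array, indexed as axes_row[i]
fig_row, axes_row = plt.subplots(1, 3)
print(axes_row.shape)

# squeeze=False always returns a 2-D array, indexed as axes_2d[row, col]
fig_2d, axes_2d = plt.subplots(1, 3, squeeze=False)
print(axes_2d.shape)

# axes.flat visits the panels row by row, whatever the grid shape
x = np.linspace(0, 10, 100)
for i, ax in enumerate(axes.flat):
    ax.plot(x, np.sin(x + i))
    ax.set_title(f"Phase shift {i}")

plt.close('all')  # free the figures; this sketch displays nothing
```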

### Uneven Subplot Layouts

Sometimes you need different sized panels in your figure. The GridSpec module provides fine control over subplot arrangement. You can span multiple rows or columns to create complex layouts that highlight your most important visualizations.

In [None]:
from matplotlib.gridspec import GridSpec

# Create figure with custom grid
fig = plt.figure(figsize=(12, 8))
gs = GridSpec(2, 3, figure=fig)

# Large plot spanning two columns
ax1 = fig.add_subplot(gs[0, :2])
ax1.plot(x, np.sin(x), 'b-', linewidth=2)
ax1.set_title('Main Plot (spans 2 columns)')

# Small plot in top-right
ax2 = fig.add_subplot(gs[0, 2])
ax2.bar(['A', 'B', 'C'], [3, 7, 5])
ax2.set_title('Side Panel')

# Three small plots on bottom
ax3 = fig.add_subplot(gs[1, 0])
ax3.scatter(np.random.rand(20), np.random.rand(20))
ax3.set_title('Panel 1')

ax4 = fig.add_subplot(gs[1, 1])
ax4.hist(np.random.randn(100), bins=15)
ax4.set_title('Panel 2')

ax5 = fig.add_subplot(gs[1, 2])
ax5.pie([30, 40, 30], labels=['X', 'Y', 'Z'])
ax5.set_title('Panel 3')

plt.tight_layout()
plt.show()
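GridSpec can also give rows and columns different sizes. The sketch below uses a hypothetical layout, not tied to the lecture's data. It sets the width_ratios and height_ratios arguments and then reads back which grid cells each axes spans. The rowspan and colspan attributes of SubplotSpec assume Matplotlib 3.2 or later.

```python
import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec

fig = plt.figure(figsize=(10, 6))
# First column twice as wide as the others; top row twice as tall as the bottom
gs = GridSpec(2, 3, figure=fig, width_ratios=[2, 1, 1], height_ratios=[2, 1])

ax_main = fig.add_subplot(gs[:, 0])   # both rows, first column
ax_top = fig.add_subplot(gs[0, 1:])   # top row, columns 1 and 2
ax_left = fig.add_subplot(gs[1, 1])   # bottom row, column 1
ax_right = fig.add_subplot(gs[1, 2])  # bottom row, column 2

# Each Axes remembers the slice of the grid it occupies
for name, ax in [('main', ax_main), ('top', ax_top)]:
    spec = ax.get_subplotspec()
    print(name, spec.rowspan, spec.colspan)

plt.close(fig)
```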

### Exercise 1: Multi-Panel Dashboard

Create a 2x3 subplot dashboard:
1. Row 1: Three line plots showing sin(x), cos(x), and tan(x) (limit tan to -10, 10)
2. Row 2: Bar chart, scatter plot, and histogram
3. Use different colors for each plot
4. Add titles to all subplots

In [None]:
# Your code here


In [None]:
# Solution
fig, axes = plt.subplots(2, 3, figsize=(15, 8))

x = np.linspace(0, 2*np.pi, 100)

# Row 1: Trigonometric functions
axes[0, 0].plot(x, np.sin(x), 'b-', linewidth=2)
axes[0, 0].set_title('Sine')
axes[0, 0].set_xlabel('x')

axes[0, 1].plot(x, np.cos(x), 'r-', linewidth=2)
axes[0, 1].set_title('Cosine')
axes[0, 1].set_xlabel('x')

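tan_y = np.tan(x)
# Mask values outside [-10, 10] with NaN so the line breaks at the asymptotes
# (np.clip would draw misleading vertical segments there)
tan_y[np.abs(tan_y) > 10] = np.nan
axes[0, 2].plot(x, tan_y, 'g-', linewidth=2)
axes[0, 2].set_ylim(-10, 10)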
axes[0, 2].set_title('Tangent')
axes[0, 2].set_xlabel('x')

# Row 2: Various plots
axes[1, 0].bar(['A', 'B', 'C', 'D'], [25, 40, 30, 55], color='teal')
axes[1, 0].set_title('Bar Chart')

np.random.seed(42)
axes[1, 1].scatter(np.random.rand(30), np.random.rand(30), 
                   c='purple', alpha=0.6)
axes[1, 1].set_title('Scatter Plot')

axes[1, 2].hist(np.random.randn(200), bins=20, 
                color='orange', edgecolor='black')
axes[1, 2].set_title('Histogram')

plt.tight_layout()
plt.show()

## Part 2: Introduction to Seaborn

### Why Seaborn?

Seaborn provides a high-level interface for drawing attractive statistical graphics. While matplotlib gives you complete control, seaborn makes common statistical visualizations simple. It works seamlessly with pandas DataFrames, allowing you to specify column names directly. Seaborn also provides attractive default themes and color palettes.

In [None]:
# Create sample employee dataset
np.random.seed(42)
n = 150

df = pd.DataFrame({
    'Department': np.random.choice(['Sales', 'Engineering', 'Marketing', 'HR'], n),
    'Experience': np.random.randint(1, 20, n),
    'Salary': np.random.normal(70000, 15000, n),
    'Performance': np.random.uniform(60, 100, n),
    'Age': np.random.randint(22, 60, n)
})

# Adjust salary based on experience
df['Salary'] = df['Salary'] + df['Experience'] * 2000

print("Employee Dataset:")
print(df.head(10))
print(f"\nShape: {df.shape}")

### Seaborn Themes and Styles

Seaborn provides several built-in themes that instantly improve your plots. The set_style() function changes the background and grid appearance, while set_palette() changes the colors used in plots. These simple commands can make a dramatic difference in how professional your visualizations look. Style settings are applied when axes are created, so to use a style temporarily, create the axes inside a with sns.axes_style(...) block.

In [None]:
# Compare different styles
styles = ['whitegrid', 'darkgrid', 'white', 'dark']

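fig = plt.figure(figsize=(12, 10))

for i, style in enumerate(styles, start=1):
    # A style only applies to axes created while it is active,
    # so each subplot is created inside the with block
    with sns.axes_style(style):
        ax = fig.add_subplot(2, 2, i)
        ax.plot([1, 2, 3, 4], [1, 4, 2, 3], marker='o')
        ax.set_title(f"Style: '{style}'")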

plt.tight_layout()
plt.show()
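The set_style() and set_palette() functions both work by changing Matplotlib's rcParams. The sketch below shows that sns.axes_style() returns a style's settings as a dictionary without applying them. It also shows that sns.set_palette() replaces the default color cycle. The sketch saves and restores the original cycle so later cells are unaffected. Comparing colors as hex strings is only a convenience for the check.

```python
import matplotlib.colors as mcolors
import matplotlib.pyplot as plt
import seaborn as sns

# axes_style() returns the rcParams a style would set, without applying them
whitegrid_grid = sns.axes_style('whitegrid')['axes.grid']
white_grid = sns.axes_style('white')['axes.grid']
print(whitegrid_grid, white_grid)  # grid on for 'whitegrid', off for 'white'

# set_palette() replaces the color cycle used by every subsequent plot
original_cycle = plt.rcParams['axes.prop_cycle']
sns.set_palette('colorblind')
cycle_colors = plt.rcParams['axes.prop_cycle'].by_key()['color']
palette_applied = (
    [mcolors.to_hex(c) for c in cycle_colors]
    == [mcolors.to_hex(c) for c in sns.color_palette('colorblind')]
)
print(palette_applied)

# Restore the previous color cycle
plt.rcParams['axes.prop_cycle'] = original_cycle
```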

### Color Palettes

Color choices significantly impact visualization effectiveness. Seaborn provides several palette types: qualitative (for categorical data), sequential (for ordered data), and diverging (for data with a meaningful center point). Choosing the right palette helps viewers interpret your data correctly.

In [None]:
# Display color palettes
palettes = ['deep', 'muted', 'bright', 'pastel', 'dark', 'colorblind']

fig, axes = plt.subplots(3, 2, figsize=(12, 8))

for ax, palette in zip(axes.flat, palettes):
    colors = sns.color_palette(palette)
    ax.bar(range(len(colors)), [1]*len(colors), color=colors)
    ax.set_title(f"Palette: '{palette}'")
    ax.set_ylim(0, 1.2)
    ax.set_xticks([])
    ax.set_yticks([])

plt.tight_layout()
plt.show()
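One way to see the difference between palette types is to measure how light each color is. The sketch below uses an approximate luminance formula: Rec. 709 weights applied directly to the RGB values, without gamma correction. That is enough for a rough ordering. A sequential palette such as 'Blues' gets steadily darker. A diverging palette such as 'coolwarm' is lightest in the middle and darker at both ends. Qualitative palettes like 'colorblind' carry no such ordering.

```python
import seaborn as sns

def luminance(rgb):
    """Approximate lightness of an RGB tuple in [0, 1] (Rec. 709 weights, no gamma)."""
    r, g, b = rgb
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

qualitative = sns.color_palette('colorblind')   # distinct hues, no implied order
sequential = sns.color_palette('Blues', 5)      # light -> dark
diverging = sns.color_palette('coolwarm', 7)    # dark -> light -> dark

seq_lum = [round(luminance(c), 2) for c in sequential]
div_lum = [round(luminance(c), 2) for c in diverging]
print(len(qualitative))
print(seq_lum)
print(div_lum)
```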

## Part 3: Distribution Visualizations

### Box Plots

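Box plots summarize a distribution with its quartiles. The box spans the first quartile (25th percentile) to the third quartile (75th percentile), with a line at the median (50th percentile). By default, the whiskers extend to the most extreme data points that lie within 1.5 times the interquartile range (IQR = Q3 - Q1) beyond the box. They therefore reach the minimum and maximum only when there are no outliers. Points beyond the whiskers are drawn individually as potential outliers. Box plots are excellent for comparing distributions across categories.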

In [None]:
plt.figure(figsize=(10, 6))

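# hue='Department' lets the palette color each box; legend=False hides the
# redundant legend (passing palette without hue is deprecated in seaborn 0.13+)
sns.boxplot(data=df, x='Department', y='Salary', hue='Department',
            palette='Set2', legend=False)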

plt.title('Salary Distribution by Department')
plt.xlabel('Department')
plt.ylabel('Salary ($)')

plt.show()
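To make the whisker rule concrete, here is a small NumPy sketch of the statistics a box plot draws. It assumes the standard 1.5 x IQR rule, which is the default whis=1.5 in seaborn and Matplotlib, and NumPy's default linear percentile interpolation. The helper name box_stats is hypothetical.

```python
import numpy as np

def box_stats(values, whis=1.5):
    """Quartiles, whisker ends, and outliers under the whis x IQR rule."""
    values = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1
    low_fence, high_fence = q1 - whis * iqr, q3 + whis * iqr
    inside = values[(values >= low_fence) & (values <= high_fence)]
    return {
        'q1': q1, 'median': median, 'q3': q3,
        # Whiskers stop at the most extreme points still inside the fences
        'whisker_low': inside.min(),
        'whisker_high': inside.max(),
        'outliers': values[(values < low_fence) | (values > high_fence)],
    }

stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 40])
print(stats)
```

Here the upper fence is 7.75 + 1.5 x 4.5 = 14.5. The whisker therefore stops at 9, and 40 is drawn as an outlier rather than stretching the whisker.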

### Violin Plots

Violin plots combine box plots with kernel density estimation (KDE). The width of the violin at each point shows the density of data at that value. This provides more information than a box plot alone, revealing the full shape of the distribution including multiple modes.

In [None]:
plt.figure(figsize=(10, 6))

sns.violinplot(data=df, x='Department', y='Salary', hue='Department',
               palette='muted', legend=False)

plt.title('Salary Distribution by Department (Violin)')
plt.xlabel('Department')
plt.ylabel('Salary ($)')

plt.show()

### KDE Plots (Kernel Density Estimation)

KDE plots show smooth continuous distributions by placing a kernel (small density curve) at each data point and summing them. This provides a smooth alternative to histograms. KDE plots are useful when you want to compare distribution shapes without the binning artifacts of histograms.

In [None]:
plt.figure(figsize=(10, 6))

# Plot KDE for each department
for dept in df['Department'].unique():
    dept_data = df[df['Department'] == dept]['Salary']
    sns.kdeplot(data=dept_data, label=dept, linewidth=2)

plt.title('Salary Distribution by Department (KDE)')
plt.xlabel('Salary ($)')
plt.ylabel('Density')
plt.legend()

plt.show()
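The "kernel at each data point" idea fits in a few lines of NumPy. The sketch below is a plain Gaussian KDE with Scott's rule-of-thumb bandwidth, run on a hypothetical sample. seaborn's kdeplot also uses a Gaussian kernel with Scott's rule by default, but adds options such as bw_adjust and its own evaluation grid. The function name gaussian_kde_1d is hypothetical.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth=None):
    """Evaluate a 1-D Gaussian kernel density estimate on a grid of points."""
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    if bandwidth is None:
        # Scott's rule of thumb for one dimension: h = sample std * n^(-1/5)
        bandwidth = samples.std(ddof=1) * n ** (-1 / 5)
    # Standardized distance from every grid point to every sample
    z = (grid[:, None] - samples[None, :]) / bandwidth
    # One Gaussian "bump" per sample, evaluated at every grid point
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    # Sum the bumps, then divide by n * h so the curve integrates to 1
    return kernels.sum(axis=1) / (n * bandwidth)

# Hypothetical salary sample, not the lecture's df
rng = np.random.default_rng(0)
samples = rng.normal(70000, 15000, 200)
grid = np.linspace(samples.min() - 50000, samples.max() + 50000, 2000)
density = gaussian_kde_1d(samples, grid)

# Riemann-sum approximation of the area under the curve (close to 1)
area = density.sum() * (grid[1] - grid[0])
print(round(area, 4))
```

A smaller bandwidth makes the curve follow individual points more closely. A larger one smooths away detail such as secondary peaks.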

### Histogram with KDE Overlay

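The histplot function can display histogram bars and a KDE curve together. This combination shows the binned data (histogram) and a smooth approximation of its shape (KDE). With kde=True, seaborn scales the curve to match the histogram's statistic, which is counts by default, as in the example below. Passing stat='density' instead normalizes the bars so their total area is 1, which puts both on a density scale.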

In [None]:
plt.figure(figsize=(10, 6))

sns.histplot(data=df, x='Salary', kde=True, bins=30, 
             color='steelblue', alpha=0.7)

plt.title('Overall Salary Distribution')
plt.xlabel('Salary ($)')
plt.ylabel('Count')

plt.show()
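The normalization behind stat='density' is easy to reproduce. The NumPy sketch below uses a hypothetical sample. It divides each bin count by the number of observations times the bin width, so the bar areas sum to 1. The check against NumPy's own density=True option confirms the arithmetic.

```python
import numpy as np

# Hypothetical salary sample, not the lecture's df
rng = np.random.default_rng(42)
salaries = rng.normal(90000, 20000, 150)

counts, edges = np.histogram(salaries, bins=30)
widths = np.diff(edges)

# Density = count / (number of observations * bin width)
density = counts / (counts.sum() * widths)
total_area = np.sum(density * widths)
print(total_area)  # 1.0 up to floating-point rounding

# NumPy's density=True option applies the same normalization
numpy_density, _ = np.histogram(salaries, bins=30, density=True)
print(np.allclose(density, numpy_density))
```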

### Exercise 2: Distribution Comparison

Create three plots in a 1x3 layout comparing Performance scores by Department:
1. Box plot
2. Violin plot
3. KDE plot with all departments overlaid
4. Use the 'Set2' palette for consistency

In [None]:
# Your code here


In [None]:
# Solution
fig, axes = plt.subplots(1, 3, figsize=(15, 5))

# Box plot
sns.boxplot(data=df, x='Department', y='Performance', hue='Department',
            palette='Set2', legend=False, ax=axes[0])
axes[0].set_title('Box Plot')
axes[0].tick_params(axis='x', rotation=45)

# Violin plot
sns.violinplot(data=df, x='Department', y='Performance', hue='Department',
               palette='Set2', legend=False, ax=axes[1])
axes[1].set_title('Violin Plot')
axes[1].tick_params(axis='x', rotation=45)

# KDE plot
colors = sns.color_palette('Set2')
for i, dept in enumerate(df['Department'].unique()):
    dept_data = df[df['Department'] == dept]['Performance']
    sns.kdeplot(data=dept_data, label=dept, 
                color=colors[i], ax=axes[2], linewidth=2)
axes[2].set_title('KDE Plot')
axes[2].legend()

plt.tight_layout()
plt.show()

## Part 4: Categorical Plots

### Count Plots

Count plots show the number of observations in each categorical bin. They are essentially histograms for categorical data. Use count plots to understand the distribution of your categorical variables and identify class imbalance.

In [None]:
plt.figure(figsize=(10, 6))

sns.countplot(data=df, x='Department', hue='Department',
              palette='viridis', legend=False)

plt.title('Employee Count by Department')
plt.xlabel('Department')
plt.ylabel('Number of Employees')

plt.show()
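A count plot's bar heights are simply the category frequencies, so pandas' value_counts() is a quick way to check class imbalance before plotting. The sketch below uses a small hypothetical DataFrame. It reuses the frequency order through countplot's order parameter so the bars appear largest first.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Small hypothetical dataset, not the lecture's df
toy = pd.DataFrame({'Department': ['Sales'] * 5 + ['HR'] * 2 + ['Engineering'] * 3})

counts = toy['Department'].value_counts()   # sorted from most to least frequent
imbalance = counts.max() / counts.min()     # largest class / smallest class
print(counts.to_dict(), imbalance)

# Reuse the frequency order so the bars are drawn largest first
ax = sns.countplot(data=toy, x='Department', order=counts.index, color='steelblue')
bar_heights = [p.get_height() for p in ax.patches]
plt.close(ax.figure)
```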

### Bar Plots with Error Bars

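Seaborn's barplot function computes the mean of a numeric variable for each category and draws an error bar on each bar. By default, the error bar is a 95% confidence interval estimated by bootstrapping. The errorbar parameter changes this, and errorbar='sd' (used below) shows one standard deviation instead. A confidence interval describes the uncertainty in the mean, while the standard deviation describes the spread of the individual values.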

In [None]:
plt.figure(figsize=(10, 6))

sns.barplot(data=df, x='Department', y='Salary', hue='Department',
            palette='coolwarm', errorbar='sd', legend=False)  # sd = standard deviation

plt.title('Average Salary by Department (with Std Dev)')
plt.xlabel('Department')
plt.ylabel('Average Salary ($)')

plt.show()
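The sketch below computes both kinds of error bar by hand on hypothetical two-department data. The standard deviation comes straight from a pandas groupby. The confidence interval comes from a percentile bootstrap: resample with replacement, record each resample's mean, and take the 2.5th and 97.5th percentiles. This is similar in spirit to barplot's default errorbar=('ci', 95), not a copy of seaborn's internals. The helper name bootstrap_ci is hypothetical.

```python
import numpy as np
import pandas as pd

def bootstrap_ci(values, n_boot=1000, level=95, seed=0):
    """Percentile bootstrap confidence interval for the mean."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # Draw n_boot resamples (with replacement) and record each resample's mean
    idx = rng.integers(0, values.size, size=(n_boot, values.size))
    boot_means = values[idx].mean(axis=1)
    tail = (100 - level) / 2
    return np.percentile(boot_means, [tail, 100 - tail])

# Hypothetical two-department data, not the lecture's df
rng = np.random.default_rng(42)
toy = pd.DataFrame({
    'Department': np.repeat(['Sales', 'HR'], 40),
    'Salary': np.concatenate([rng.normal(80000, 10000, 40),
                              rng.normal(65000, 8000, 40)]),
})

# errorbar='sd': bar height = group mean, whiskers = mean +/- one standard deviation
summary = toy.groupby('Department')['Salary'].agg(['mean', 'std'])
print(summary)

# A bootstrapped 95% CI for the mean is much narrower than +/- one SD
sales = toy.loc[toy['Department'] == 'Sales', 'Salary']
low, high = bootstrap_ci(sales)
print(f"Sales: mean {sales.mean():,.0f}, 95% CI {low:,.0f} to {high:,.0f}")
```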

### Point Plots

Point plots show point estimates and confidence intervals using dots and lines. They are particularly useful when comparing values across multiple categorical variables. The connecting lines help highlight patterns and differences between groups.

In [None]:
# Create experience groups
df['Exp_Group'] = pd.cut(df['Experience'], 
                         bins=[0, 5, 10, 15, 20], 
                         labels=['0-5', '6-10', '11-15', '16-20'])

plt.figure(figsize=(10, 6))

sns.pointplot(data=df, x='Exp_Group', y='Salary', 
              hue='Department', palette='Set1')

plt.title('Salary by Experience and Department')
plt.xlabel('Experience (Years)')
plt.ylabel('Average Salary ($)')
plt.legend(title='Department', bbox_to_anchor=(1.02, 1), loc='upper left')

plt.tight_layout()
plt.show()

### Strip and Swarm Plots

Strip plots show individual data points for each category. Swarm plots are similar but adjust points to avoid overlap. These plots are useful when you want to see the actual data distribution, not just summary statistics. They work best with smaller datasets.

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Strip plot
sns.stripplot(data=df, x='Department', y='Performance', hue='Department',
              palette='Set2', alpha=0.6, legend=False, ax=axes[0])
axes[0].set_title('Strip Plot - Individual Points')

# Swarm plot
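# Sample 80 rows (reproducibly) so the swarm has room to spread out
sns.swarmplot(data=df.sample(80, random_state=42), x='Department', y='Performance',
              hue='Department', palette='Set2', legend=False, ax=axes[1])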
axes[1].set_title('Swarm Plot - Avoiding Overlap')

plt.tight_layout()
plt.show()

### Exercise 3: Categorical Analysis

Create a 2x2 plot showing:
1. Count plot of Department
2. Bar plot of mean Age by Department
3. Point plot of Performance by Experience Group
4. Box plot of Salary by Experience Group

Use appropriate color palettes for each plot.

In [None]:
# Your code here


In [None]:
# Solution
fig, axes = plt.subplots(2, 2, figsize=(12, 10))

# Count plot
sns.countplot(data=df, x='Department', hue='Department',
              palette='viridis', legend=False, ax=axes[0, 0])
axes[0, 0].set_title('Employee Count by Department')

# Bar plot of Age
sns.barplot(data=df, x='Department', y='Age', hue='Department',
            palette='coolwarm', legend=False, ax=axes[0, 1])
axes[0, 1].set_title('Average Age by Department')

# Point plot
# No hue here: a single color keeps one connected line across the groups
sns.pointplot(data=df, x='Exp_Group', y='Performance', 
              color='crimson', ax=axes[1, 0])
axes[1, 0].set_title('Performance by Experience')

# Box plot
sns.boxplot(data=df, x='Exp_Group', y='Salary', hue='Exp_Group',
            palette='Set2', legend=False, ax=axes[1, 1])
axes[1, 1].set_title('Salary by Experience Group')

plt.tight_layout()
plt.show()

## Part 5: Relational Plots

### Scatter Plots with Hue

Seaborn's scatterplot function creates scatter plots that integrate beautifully with DataFrames. The hue parameter colors points by a categorical variable, automatically creating a legend. This makes it easy to see how relationships differ across groups.

In [None]:
plt.figure(figsize=(10, 6))

sns.scatterplot(data=df, x='Experience', y='Salary', 
                hue='Department', palette='deep', s=100, alpha=0.7)

plt.title('Salary vs Experience by Department')
plt.xlabel('Experience (Years)')
plt.ylabel('Salary ($)')
plt.legend(title='Department')

plt.show()

### Regression Plots

The regplot function combines a scatter plot with a fitted linear regression line and a shaded 95% confidence band for the fit. It is useful for visualizing linear relationships, but it does not report the fitted coefficients. To quantify the relationship, compute the slope and correlation separately, for example with np.polyfit or df.corr().

In [None]:
plt.figure(figsize=(10, 6))

sns.regplot(data=df, x='Experience', y='Salary',
            scatter_kws={'alpha': 0.5},
            line_kws={'color': 'red', 'linewidth': 2})

plt.title('Salary vs Experience with Regression Line')
plt.xlabel('Experience (Years)')
plt.ylabel('Salary ($)')

plt.show()
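Because regplot only draws the line, you need a separate calculation to put numbers on the relationship. The sketch below uses hypothetical data generated with a known slope. np.polyfit with degree 1 gives the ordinary least-squares line, the same kind of fit regplot draws with its default order=1. np.corrcoef gives Pearson's r.

```python
import numpy as np

# Hypothetical data generated with a known slope of 2,000 $/year
rng = np.random.default_rng(42)
experience = rng.integers(1, 20, 150)
salary = 70000 + 2000 * experience + rng.normal(0, 15000, 150)

# Least-squares line of degree 1 (coefficients are returned highest power first)
slope, intercept = np.polyfit(experience, salary, 1)
# Pearson correlation: strength of the linear relationship, from -1 to 1
r = np.corrcoef(experience, salary)[0, 1]
print(f"slope: {slope:,.0f} $/year, intercept: {intercept:,.0f} $, r = {r:.2f}")
```

The slope is expressed in the data's units (dollars per year of experience), while r is unitless. Report both when you describe a trend.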

### Regression by Category

The lmplot function fits separate regression lines for each category. This reveals whether the relationship between variables differs across groups. Different slopes indicate that the effect of one variable on another varies by category.

In [None]:
# Regression by department
g = sns.lmplot(data=df, x='Experience', y='Salary', 
               hue='Department', height=6, aspect=1.5,
               scatter_kws={'alpha': 0.5})

g.fig.suptitle('Salary vs Experience by Department', y=1.02)

plt.show()
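To check numerically whether the slopes really differ, you can fit one line per group yourself. The sketch below builds hypothetical data in which each department has a different true slope. It then fits each group with np.polyfit, which mirrors what lmplot does visually for each hue level. The helper name fitted_slope is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data: each department gets a different true slope
rng = np.random.default_rng(0)
true_slopes = {'Sales': 1000, 'Engineering': 3000}
frames = []
for dept, true_slope in true_slopes.items():
    exp = rng.integers(1, 20, 60)
    frames.append(pd.DataFrame({
        'Department': dept,
        'Experience': exp,
        'Salary': 60000 + true_slope * exp + rng.normal(0, 5000, 60),
    }))
toy = pd.concat(frames, ignore_index=True)

def fitted_slope(group):
    """Least-squares slope of Salary on Experience within one group."""
    slope, _ = np.polyfit(group['Experience'], group['Salary'], 1)
    return slope

# One fitted slope per department
slopes = {dept: fitted_slope(g) for dept, g in toy.groupby('Department')}
print({dept: round(s) for dept, s in slopes.items()})
```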

### Relational Plot with Size and Style

Scatter plots can encode multiple variables using color (hue), size, and marker style. This allows you to visualize up to five dimensions: x, y, color, size, and style. Use these encodings judiciously to avoid overwhelming the viewer.

In [None]:
plt.figure(figsize=(12, 8))

sns.scatterplot(data=df, x='Age', y='Salary',
                hue='Department',
                size='Performance',
                sizes=(50, 300),
                alpha=0.7)

plt.title('Salary by Age, Department, and Performance')
plt.xlabel('Age')
plt.ylabel('Salary ($)')
plt.legend(bbox_to_anchor=(1.02, 1), loc='upper left')

plt.tight_layout()
plt.show()

### Exercise 4: Multi-Dimensional Scatter

Create a scatter plot of Age vs Performance where:
1. Color (hue) represents Department
2. Size represents Salary (scaled appropriately)
3. Add a regression line for the overall relationship
4. Create a separate plot showing regression lines by Department

In [None]:
# Your code here


In [None]:
# Solution
# Multi-dimensional scatter
plt.figure(figsize=(12, 8))

sns.scatterplot(data=df, x='Age', y='Performance',
                hue='Department',
                size='Salary',
                sizes=(50, 300),
                alpha=0.6)

# Add overall regression line
z = np.polyfit(df['Age'], df['Performance'], 1)
p = np.poly1d(z)
x_line = np.linspace(df['Age'].min(), df['Age'].max(), 100)
plt.plot(x_line, p(x_line), 'k--', linewidth=2, label='Overall Trend')

plt.title('Performance by Age, Department, and Salary')
plt.xlabel('Age')
plt.ylabel('Performance Score')
plt.legend(bbox_to_anchor=(1.02, 1), loc='upper left')
plt.tight_layout()
plt.show()

# Regression by department
g = sns.lmplot(data=df, x='Age', y='Performance', 
               hue='Department', height=6, aspect=1.5)
g.fig.suptitle('Performance vs Age by Department', y=1.02)
plt.show()

## Part 6: Matrix Visualizations

### Correlation Heatmaps

Heatmaps visualize matrices using color. For correlation matrices, this immediately shows which variables are related. Positive correlations appear in one color, negative in another. The intensity shows the strength of the relationship.

In [None]:
# Calculate correlation matrix
numeric_cols = df.select_dtypes(include=[np.number])
correlation = numeric_cols.corr()

print("Correlation Matrix:")
print(correlation.round(2))

In [None]:
plt.figure(figsize=(10, 8))

sns.heatmap(correlation, 
            annot=True,           # Show values
            cmap='coolwarm',      # Color scheme
            center=0,             # Center color at 0
            fmt='.2f',            # Number format
            square=True,          # Square cells
            linewidths=0.5)       # Cell borders

plt.title('Correlation Heatmap of Employee Metrics')
plt.show()
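Each cell of the heatmap is a Pearson correlation coefficient. The sketch below uses a hypothetical DataFrame to compute one coefficient by hand and checks it against DataFrame.corr(). Because a correlation matrix is symmetric, it also builds the common upper-triangle mask. Passing mask=mask to sns.heatmap leaves the True cells blank, so each pair appears only once. The helper name pearson is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data: b depends on a, c is independent noise
rng = np.random.default_rng(1)
toy = pd.DataFrame({'a': rng.normal(size=100)})
toy['b'] = 2 * toy['a'] + rng.normal(size=100)
toy['c'] = rng.normal(size=100)

def pearson(x, y):
    """Pearson r: co-variation of x and y divided by the product of their spreads."""
    x_c, y_c = x - x.mean(), y - y.mean()
    return (x_c * y_c).sum() / np.sqrt((x_c ** 2).sum() * (y_c ** 2).sum())

corr = toy.corr()  # Pearson by default
print(round(pearson(toy['a'], toy['b']), 4), round(corr.loc['a', 'b'], 4))

# True on and above the diagonal: the cells that repeat information
mask = np.triu(np.ones_like(corr, dtype=bool))
print(mask)
```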

### Custom Heatmaps

Heatmaps can visualize any rectangular data, not just correlations. You can use them for pivot tables, confusion matrices, or any grid of values. Customizing colors, annotations, and labels helps communicate the data effectively.

In [None]:
# Create pivot table for heatmap
pivot = df.pivot_table(values='Salary', 
                       index='Department', 
                       columns='Exp_Group', 
                       aggfunc='mean',
                       observed=False)  # keep every experience group as a column

plt.figure(figsize=(10, 6))

sns.heatmap(pivot, 
            annot=True, 
            fmt=',.0f',
            cmap='YlOrRd',
            linewidths=0.5)

plt.title('Average Salary by Department and Experience')
plt.xlabel('Experience Group')
plt.ylabel('Department')

plt.show()
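A pivot table with aggfunc='mean' is a grouped mean with the second key moved into the columns. The sketch below uses hypothetical data to show that pivot_table and groupby().mean().unstack() produce the same grid. This can help when you are deciding how to reshape data before calling sns.heatmap.

```python
import numpy as np
import pandas as pd

# Hypothetical data, not the lecture's df
rng = np.random.default_rng(7)
toy = pd.DataFrame({
    'Department': rng.choice(['Sales', 'HR'], 40),
    'Level': rng.choice(['Junior', 'Senior'], 40),
    'Salary': rng.normal(70000, 10000, 40),
})

pivot = toy.pivot_table(values='Salary', index='Department',
                        columns='Level', aggfunc='mean')
# Same numbers: group by both keys, then move 'Level' from the index into the columns
manual = toy.groupby(['Department', 'Level'])['Salary'].mean().unstack('Level')
print(pivot.round(0))
print(np.allclose(pivot.to_numpy(), manual.to_numpy()))
```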

### Cluster Maps

Cluster maps add hierarchical clustering to heatmaps. The rows and columns are reordered to place similar items together, and dendrograms show the clustering structure. This is useful for identifying patterns and groups in complex datasets. Note that clustermap requires the SciPy package.

In [None]:
# Create clustermap of correlation matrix
g = sns.clustermap(correlation, 
                   annot=True, 
                   cmap='coolwarm',
                   center=0,
                   fmt='.2f',
                   figsize=(10, 8))

g.fig.suptitle('Clustered Correlation Matrix', y=1.02)
plt.show()
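The reordering step can be reproduced directly with SciPy. The sketch below assumes clustermap's default settings, average linkage with Euclidean distance between rows, and clusters the rows of a small hypothetical correlation-like matrix. The leaf order from the dendrogram places similar variables next to each other. clustermap also clusters the columns and draws the dendrograms, which this sketch omits.

```python
import numpy as np
from scipy.cluster.hierarchy import leaves_list, linkage

# Hypothetical matrix: variables 0 and 2 are similar, and so are 1 and 3
corr = np.array([
    [1.0, 0.1, 0.9, 0.2],
    [0.1, 1.0, 0.2, 0.8],
    [0.9, 0.2, 1.0, 0.1],
    [0.2, 0.8, 0.1, 1.0],
])

# Average-linkage clustering of the rows, using Euclidean distance between rows
Z = linkage(corr, method='average', metric='euclidean')
order = leaves_list(Z).tolist()
print(order)  # similar rows end up adjacent

# Reorder rows and columns the way a cluster map displays them
reordered = corr[np.ix_(order, order)]
print(reordered)
```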

### Exercise 5: Matrix Analysis

1. Create a pivot table showing average Performance by Department and Experience Group
2. Visualize it as a heatmap with annotations
3. Create a correlation heatmap using only numerical columns
4. Use 'RdYlGn' colormap for the pivot heatmap

In [None]:
# Your code here


In [None]:
# Solution
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Performance pivot heatmap
perf_pivot = df.pivot_table(values='Performance',
                             index='Department',
                             columns='Exp_Group',
                             aggfunc='mean',
                             observed=False)

sns.heatmap(perf_pivot, annot=True, fmt='.1f',
            cmap='RdYlGn', ax=axes[0], linewidths=0.5)
axes[0].set_title('Avg Performance by Dept and Experience')

# Correlation heatmap
numeric_data = df[['Experience', 'Salary', 'Performance', 'Age']]
corr = numeric_data.corr()

sns.heatmap(corr, annot=True, fmt='.2f',
            cmap='coolwarm', center=0, ax=axes[1], square=True)
axes[1].set_title('Correlation Matrix')

plt.tight_layout()
plt.show()

## Part 7: Pair Plots and Joint Plots

### Pair Plots

Pair plots show scatter plots for every pair of numeric variables in a dataset. The diagonal shows the distribution of each variable. This is invaluable for exploratory data analysis, quickly revealing which variables are related and identifying clusters or outliers.

In [None]:
# Create pair plot colored by Department
g = sns.pairplot(df[['Experience', 'Salary', 'Performance', 'Age', 'Department']], 
                 hue='Department',
                 palette='Set2',
                 diag_kind='kde')

g.fig.suptitle('Pair Plot of Employee Metrics', y=1.02)
plt.show()

### Joint Plots

Joint plots show the relationship between two variables with their individual distributions. The main panel shows the scatter plot (or other bivariate plot), while the margins show histograms or KDE plots of each variable. This provides a complete picture of two variables and their relationship.

In [None]:
# Basic joint plot
g = sns.jointplot(data=df, x='Experience', y='Salary',
                  kind='scatter', height=8)

g.fig.suptitle('Experience vs Salary Distribution', y=1.02)
plt.show()

### Joint Plot Variations

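Joint plots support several kinds of bivariate display through the kind parameter: 'scatter' (the default), 'reg' (scatter plus a regression fit), 'hex' (hexagonal binning, useful for large datasets), 'kde' (density estimation), 'hist' (a 2D histogram), and 'resid' (residuals from a regression). Each reveals different aspects of the relationship.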

In [None]:
# Joint plot with regression
g = sns.jointplot(data=df, x='Experience', y='Salary',
                  kind='reg', height=8)

g.fig.suptitle('Experience vs Salary with Regression', y=1.02)
plt.show()

In [None]:
# Joint plot with KDE
g = sns.jointplot(data=df, x='Age', y='Performance',
                  kind='kde', height=8, fill=True)

g.fig.suptitle('Age vs Performance Density', y=1.02)
plt.show()

### Exercise 6: Comprehensive Pair Analysis

1. Create a pair plot with Experience, Salary, and Performance
2. Use 'hist' for diagonal plots
3. Color by Department
4. Create a joint plot of Age vs Salary with kind='hex'

In [None]:
# Your code here


In [None]:
# Solution
# Pair plot
g = sns.pairplot(df[['Experience', 'Salary', 'Performance', 'Department']],
                 hue='Department',
                 diag_kind='hist',
                 palette='Set1')
g.fig.suptitle('Pair Plot Analysis', y=1.02)
plt.show()

# Hexbin joint plot
g = sns.jointplot(data=df, x='Age', y='Salary',
                  kind='hex', height=8)
g.fig.suptitle('Age vs Salary (Hexbin)', y=1.02)
plt.show()

## Part 8: Advanced Styling and Customization

### Context Settings

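Seaborn's set_context function scales fonts, line widths, and markers for different display contexts. 'notebook' is the default. The 'paper' context uses smaller elements suitable for publication figures, 'talk' uses larger elements for presentations, and 'poster' uses the largest elements for poster displays. Many of these settings are read when plot elements are created, so to use a context temporarily, create the axes inside the with sns.plotting_context(...) block.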

In [None]:
# Compare contexts
contexts = ['paper', 'notebook', 'talk', 'poster']

fig = plt.figure(figsize=(14, 10))

for i, context in enumerate(contexts, start=1):
    # Create the axes inside the context so fonts, lines, and ticks are scaled
    with sns.plotting_context(context):
        ax = fig.add_subplot(2, 2, i)
        ax.plot([1, 2, 3], [1, 2, 3], marker='o')
        ax.set_title(f"Context: {context}")
        ax.set_xlabel('X axis')
        ax.set_ylabel('Y axis')

plt.tight_layout()
plt.show()
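Like axes_style, sns.plotting_context() returns its settings as a dictionary. That lets you compare the contexts numerically instead of by eye. The sketch below prints font size and line width for each context relative to 'notebook'. It also shows that the font_scale argument enlarges text only, leaving line widths unchanged.

```python
import seaborn as sns

# plotting_context() returns the rcParams for a context as a dictionary
base = sns.plotting_context('notebook')
for name in ['paper', 'notebook', 'talk', 'poster']:
    ctx = sns.plotting_context(name)
    ratio = ctx['font.size'] / base['font.size']
    print(f"{name:>8}: font.size={ctx['font.size']:.1f}, "
          f"lines.linewidth={ctx['lines.linewidth']:.2f}, scale vs notebook={ratio:.2f}")

# font_scale multiplies the font sizes only, independently of the context
bigger_text = sns.plotting_context('notebook', font_scale=1.5)
print(bigger_text['font.size'], bigger_text['lines.linewidth'])
```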

### Custom Color Palettes

Beyond built-in palettes, you can create custom color schemes. Sequential palettes work well for continuous data, while diverging palettes suit data with a meaningful center point. You can also create palettes from specific colors.

In [None]:
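# Create custom sequential palette: one shade per department,
# skipping the first, near-white shade so no bar disappears
custom_palette = sns.light_palette('navy', n_colors=5)[1:]

plt.figure(figsize=(10, 6))
sns.barplot(data=df, x='Department', y='Salary', hue='Department',
            palette=custom_palette, errorbar='sd', legend=False)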
plt.title('Salary by Department (Custom Navy Palette)')
plt.show()

# Diverging palette
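# Index 0 is the red end, index 3 the light neutral center, index 6 the blue end
div_palette = sns.diverging_palette(10, 240, n=7)

plt.figure(figsize=(10, 6))
np.random.seed(0)
data = np.random.randn(7)
# Use the two saturated ends; the center color is nearly white
colors = [div_palette[-1] if v > 0 else div_palette[0] for v in data]
plt.bar(range(7), data, color=colors)
plt.axhline(0, color='gray', linewidth=1)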
plt.title('Diverging Palette Example')
plt.show()
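A diverging palette is most informative when each value gets a shade in proportion to its size, not just one of two colors. The sketch below uses hypothetical values. It maps them onto the seven-color palette with symmetric limits, so that 0 always lands on the neutral middle color. Rounding to the nearest palette index is one simple choice. For continuous data you could pass a colormap to Matplotlib instead.

```python
import numpy as np
import seaborn as sns

values = np.array([-2.0, -0.5, 0.0, 0.4, 1.8])   # hypothetical data
palette = sns.diverging_palette(10, 240, n=7)    # red end, neutral center, blue end

# Symmetric limits keep 0 on the middle color
limit = np.abs(values).max()
# Scale to [0, 1], then pick the nearest of the palette's 7 entries
scaled = (values + limit) / (2 * limit)
indices = np.round(scaled * (len(palette) - 1)).astype(int)
colors = [palette[i] for i in indices]
print(indices)
```

The colors list can be passed to plt.bar(..., color=colors), as in the example above.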

### Figure-Level vs Axes-Level Functions

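Seaborn has two types of functions: figure-level (like pairplot, jointplot, lmplot, and catplot) and axes-level (like scatterplot and boxplot). Figure-level functions create their own figure and return a grid object: lmplot and catplot return a FacetGrid, pairplot returns a PairGrid, and jointplot returns a JointGrid. Axes-level functions draw onto a single Axes, either the current one or the one passed with ax=, and return that Axes object. This distinction matters for customization. Axes-level plots can be placed inside your own subplot layouts, while figure-level plots are customized through the grid object they return.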

In [None]:
# Figure-level: catplot creates its own figure
g = sns.catplot(data=df, x='Department', y='Salary',
                kind='box', col='Exp_Group', col_wrap=2,
                height=4, aspect=1.2)

g.fig.suptitle('Salary Distribution: Department by Experience', y=1.02)
plt.show()
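The difference shows up directly in the return values. The sketch below uses a tiny hypothetical DataFrame. An axes-level scatterplot returns the same Axes it was given, while the figure-level relplot builds a new figure and returns a FacetGrid with one panel per column level.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Tiny hypothetical dataset
toy = pd.DataFrame({'x': [1, 2, 3, 4], 'y': [2, 4, 5, 8], 'g': ['a', 'a', 'b', 'b']})

# Axes-level: draws into the Axes passed via ax= and returns that same Axes
fig, ax = plt.subplots()
returned = sns.scatterplot(data=toy, x='x', y='y', ax=ax)
print(returned is ax)

# Figure-level: creates its own figure and returns a grid object
g = sns.relplot(data=toy, x='x', y='y', col='g')
print(type(g).__name__, g.axes.shape)  # one row, one column per level of 'g'

plt.close('all')
```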

### Exercise 7: Custom Styled Dashboard

Create a 2x2 dashboard with custom styling:
1. Use 'darkgrid' style
2. Use a custom color palette (create from 'green')
3. Include: violin plot, regression plot, heatmap, and bar plot
4. Use 'talk' context for larger elements

In [None]:
# Your code here


In [None]:
# Solution
sns.set_style('darkgrid')
# One shade per department, skipping the near-white first shade
custom_pal = sns.light_palette('green', n_colors=5)[1:]

with sns.plotting_context('talk'):
    fig, axes = plt.subplots(2, 2, figsize=(14, 12))
    
    # Violin plot
    sns.violinplot(data=df, x='Department', y='Salary', hue='Department',
                   palette=custom_pal, legend=False, ax=axes[0, 0])
    axes[0, 0].set_title('Salary Distribution')
    axes[0, 0].tick_params(axis='x', rotation=45)
    
    # Regression plot
    sns.regplot(data=df, x='Experience', y='Salary',
                ax=axes[0, 1], color='green')
    axes[0, 1].set_title('Salary vs Experience')
    
    # Heatmap
    corr = df[['Experience', 'Salary', 'Performance', 'Age']].corr()
    sns.heatmap(corr, annot=True, cmap='Greens',
                ax=axes[1, 0], fmt='.2f')
    axes[1, 0].set_title('Correlation Matrix')
    
    # Bar plot
    sns.barplot(data=df, x='Department', y='Performance', hue='Department',
                palette=custom_pal, legend=False, ax=axes[1, 1])
    axes[1, 1].set_title('Avg Performance by Dept')
    axes[1, 1].tick_params(axis='x', rotation=45)
    
    plt.tight_layout()
    plt.show()

# Reset to default
sns.set_style('whitegrid')

## Part 9: Complete Visualization Workflow

### Best Practices Summary

Effective data visualization follows several principles:
- Choose the right chart type for your data and question
- Always include titles and labels
- Use color purposefully (distinguish categories, show values, highlight)
- Consider your audience (paper, presentation, poster)
- Tell a story - what should the viewer learn?
- Keep it simple - don't overload with information

### Comprehensive Analysis Dashboard

Let's create a complete analytical dashboard that tells a story about our employee data. This demonstrates how to combine multiple visualization types into a cohesive report.

In [None]:
# Create comprehensive dashboard
fig = plt.figure(figsize=(16, 12))
gs = GridSpec(3, 3, figure=fig)

# Title for entire figure
fig.suptitle('Employee Analytics Dashboard', fontsize=16, fontweight='bold')

# 1. Salary distribution overview (large, top-left)
ax1 = fig.add_subplot(gs[0, :2])
sns.histplot(data=df, x='Salary', kde=True, ax=ax1, color='steelblue')
ax1.axvline(df['Salary'].mean(), color='red', linestyle='--', 
            label=f"Mean: ${df['Salary'].mean():,.0f}")
ax1.set_title('Overall Salary Distribution')
ax1.legend()

# 2. Department composition (top-right)
ax2 = fig.add_subplot(gs[0, 2])
dept_counts = df['Department'].value_counts()
ax2.pie(dept_counts, labels=dept_counts.index, autopct='%1.1f%%',
        colors=sns.color_palette('Set2'))
ax2.set_title('Department Composition')

# 3. Salary by department (middle-left)
ax3 = fig.add_subplot(gs[1, 0])
sns.boxplot(data=df, x='Department', y='Salary', hue='Department',
            palette='Set2', legend=False, ax=ax3)
ax3.set_title('Salary by Department')
ax3.tick_params(axis='x', rotation=45)

# 4. Experience vs Salary relationship (middle-center)
ax4 = fig.add_subplot(gs[1, 1])
sns.regplot(data=df, x='Experience', y='Salary', ax=ax4,
            scatter_kws={'alpha': 0.5})
ax4.set_title('Salary vs Experience')

# 5. Performance distribution (middle-right)
ax5 = fig.add_subplot(gs[1, 2])
sns.violinplot(data=df, x='Department', y='Performance', hue='Department',
               palette='muted', legend=False, ax=ax5)
ax5.set_title('Performance by Department')
ax5.tick_params(axis='x', rotation=45)

# 6. Correlation heatmap (bottom-left)
ax6 = fig.add_subplot(gs[2, 0])
corr = df[['Experience', 'Salary', 'Performance', 'Age']].corr()
sns.heatmap(corr, annot=True, cmap='coolwarm', center=0,
            ax=ax6, fmt='.2f', square=True)
ax6.set_title('Correlation Matrix')

# 7. Salary by experience group (bottom-center)
ax7 = fig.add_subplot(gs[2, 1])
sns.barplot(data=df, x='Exp_Group', y='Salary', hue='Exp_Group',
            palette='viridis', legend=False, ax=ax7)
ax7.set_title('Avg Salary by Experience')

# 8. Age distribution (bottom-right)
ax8 = fig.add_subplot(gs[2, 2])
for dept in df['Department'].unique():
    dept_data = df[df['Department'] == dept]['Age']
    sns.kdeplot(data=dept_data, label=dept, ax=ax8)
ax8.set_title('Age Distribution by Dept')
ax8.legend(fontsize=8)

plt.tight_layout()
plt.show()

### Saving High-Quality Outputs

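When your visualization is complete, save it in a format suited to its destination. PNG works well for the web and presentations, PDF is ideal for publications, and SVG is best when you plan to edit the figure further. For raster formats such as PNG, the dpi setting controls resolution. 150 dpi is usually enough on screen, while 300 dpi is a common target for print. Call plt.savefig() before plt.show(), because in many environments show() releases the figure and a later save produces a blank image.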

In [None]:
# Create a presentation-quality figure
sns.set_context('talk')

fig, axes = plt.subplots(1, 2, figsize=(14, 6))

# Key insight 1: Salary growth with experience
sns.regplot(data=df, x='Experience', y='Salary',
            scatter_kws={'alpha': 0.5}, ax=axes[0])
axes[0].set_title('Salary Increases with Experience')
axes[0].set_xlabel('Years of Experience')
axes[0].set_ylabel('Salary ($)')

# Key insight 2: Department salary comparison
order = df.groupby('Department')['Salary'].mean().sort_values().index
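# hue_order=order makes the RdYlGn colors follow the sorted mean salaries
sns.boxplot(data=df, x='Department', y='Salary', hue='Department',
            order=order, hue_order=order, palette='RdYlGn',
            legend=False, ax=axes[1])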
axes[1].set_title('Salary Distribution by Department')
axes[1].tick_params(axis='x', rotation=45)

plt.tight_layout()

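# Save in multiple formats: PNG (raster) and PDF (vector)
plt.savefig('employee_analysis.png', dpi=150, bbox_inches='tight')
plt.savefig('employee_analysis.pdf', bbox_inches='tight')
print("Saved as employee_analysis.png and employee_analysis.pdf")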

plt.show()
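For raster output, pixel dimensions are simply figure size in inches times dpi. The sketch below saves a small figure as PNG and PDF to a temporary directory. It then reads the PNG's width and height from the IHDR chunk of the file header. bbox_inches='tight' is left out on purpose here, because tight cropping changes the final size.

```python
import os
import struct
import tempfile

import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(6, 4))   # size in inches
ax.plot([1, 2, 3], [2, 1, 3])

out_dir = tempfile.mkdtemp()
png_path = os.path.join(out_dir, 'figure.png')
pdf_path = os.path.join(out_dir, 'figure.pdf')
fig.savefig(png_path, dpi=150)   # raster: pixels = inches * dpi
fig.savefig(pdf_path)            # vector: scales without pixelation
plt.close(fig)

# PNG layout: 8-byte signature, then the IHDR chunk whose data starts with
# width and height as big-endian 4-byte integers (bytes 16-24 of the file)
with open(png_path, 'rb') as f:
    header = f.read(24)
width, height = struct.unpack('>II', header[16:24])
print(width, height)  # 6 in * 150 dpi by 4 in * 150 dpi
```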

# Reset context
sns.set_context('notebook')

### Exercise 8: Complete Analysis Report

Create a 3x2 dashboard that tells a story:
1. Row 1: Histogram of Salary with mean line, Count plot of Departments
2. Row 2: Box plot of Salary by Department, Scatter of Experience vs Salary
3. Row 3: Heatmap of correlations, Bar plot of Performance by Department
4. Add a main title and save as 'final_report.png' with dpi=150

In [None]:
# Your code here


In [None]:
# Solution
fig, axes = plt.subplots(3, 2, figsize=(14, 15))
fig.suptitle('Complete Employee Analysis Report', fontsize=16, fontweight='bold')

# Row 1
# Salary histogram
sns.histplot(data=df, x='Salary', kde=True, ax=axes[0, 0], color='steelblue')
axes[0, 0].axvline(df['Salary'].mean(), color='red', linestyle='--',
                   label=f"Mean: ${df['Salary'].mean():,.0f}")
axes[0, 0].set_title('Salary Distribution')
axes[0, 0].legend()

# Department count
sns.countplot(data=df, x='Department', hue='Department',
              palette='Set2', legend=False, ax=axes[0, 1])
axes[0, 1].set_title('Employee Count by Department')

# Row 2
# Box plot
sns.boxplot(data=df, x='Department', y='Salary', hue='Department',
            palette='viridis', legend=False, ax=axes[1, 0])
axes[1, 0].set_title('Salary by Department')
axes[1, 0].tick_params(axis='x', rotation=45)

# Scatter plot
sns.scatterplot(data=df, x='Experience', y='Salary',
                hue='Department', ax=axes[1, 1], alpha=0.6)
axes[1, 1].set_title('Experience vs Salary')

# Row 3
# Heatmap
corr = df[['Experience', 'Salary', 'Performance', 'Age']].corr()
sns.heatmap(corr, annot=True, cmap='coolwarm', center=0,
            ax=axes[2, 0], fmt='.2f')
axes[2, 0].set_title('Correlation Matrix')

# Bar plot
sns.barplot(data=df, x='Department', y='Performance', hue='Department',
            palette='muted', legend=False, ax=axes[2, 1])
axes[2, 1].set_title('Avg Performance by Department')
axes[2, 1].tick_params(axis='x', rotation=45)

plt.tight_layout()
plt.savefig('final_report.png', dpi=150, bbox_inches='tight')
print("Report saved as final_report.png")
plt.show()

## Summary

In this lecture, you learned advanced data visualization techniques:

1. **Subplots and multi-panel figures** - Creating complex layouts with GridSpec
2. **Introduction to Seaborn** - High-level interface, themes, and palettes
3. **Distribution visualizations** - Box plots, violin plots, KDE plots
4. **Categorical plots** - Count plots, bar plots, point plots, strip plots
5. **Relational plots** - Scatter plots with hue, regression, multi-dimensional
6. **Matrix visualizations** - Heatmaps, cluster maps for correlations
7. **Pair plots and joint plots** - Multi-variable exploration
8. **Advanced styling** - Contexts, custom palettes, figure-level functions
9. **Complete workflows** - Dashboard creation and saving high-quality outputs

Key takeaways:
- Choose visualizations based on your data type and question
- Seaborn simplifies statistical visualization with pandas integration
- Use color, size, and style purposefully to encode information
- Always provide context with titles, labels, and legends
- Create dashboards that tell a coherent story about your data