Making Beautiful Plots with Seaborn in Python

Welcome to the sixth tutorial in our series on data analysis with Python! In this article, we'll introduce you to Seaborn, a powerful Python visualization library built on top of Matplotlib. Seaborn provides a high-level interface for drawing attractive and informative statistical graphics. We'll cover a variety of plot types, explain their uses and benefits, and discuss the types of analysis each is best suited for. We'll also show how to configure and customize these plots using real-world business examples.

Importing Seaborn

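Before we start, let's make sure the libraries are installed and import them. The !pip line works in a Jupyter notebook; in a terminal, run the same command without the leading "!":

!pip install matplotlib seaborn pandas   # only needed if not already installed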

import seaborn as sns
import matplotlib.pyplot as plt

Basic Configuration and Functions

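Seaborn allows extensive customization of plots. Because Seaborn draws on Matplotlib axes, you can combine Seaborn's styling functions with Matplotlib's sizing and labeling functions. Here are some common ones: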

  • Figure Size: plt.figure(figsize=(width, height)) sets the size of the plot.
  • Title: plt.title('Title') adds a title to the plot.
  • Axis Labels: plt.xlabel('X-axis Label') and plt.ylabel('Y-axis Label') add labels to the axes.
  • Style: sns.set_style('style_name') sets the style of the plot (e.g., ‘whitegrid’, ‘darkgrid’, ‘white’, ‘dark’, ‘ticks’).
  • Palette: sns.set_palette('palette_name') sets the color palette for the plot.
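The settings above can be combined in a short sketch. The revenue numbers and the df_demo name are hypothetical, made up only to have something to plot, and sns.barplot is simply one convenient plot type for the demonstration. The Seaborn calls set the look; the plt calls set the size and labels.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Seaborn settings: background style and color palette (these stay in effect for later plots)
sns.set_style("whitegrid")
sns.set_palette("deep")

# Hypothetical example data, used only to demonstrate the settings
df_demo = pd.DataFrame({"month": ["Jan", "Feb", "Mar", "Apr"],
                        "revenue": [120, 135, 150, 160]})

plt.figure(figsize=(6, 3))                     # width, height in inches
ax = sns.barplot(x="month", y="revenue", data=df_demo)
plt.title("Monthly Revenue")                   # Matplotlib functions label the same axes
plt.xlabel("Month")
plt.ylabel("Revenue")
plt.show()
```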

Example 1: Scatter Plot with Regression Line

Use and Benefits

  • Use: Scatter plots with regression lines are used to visualize the relationship between two variables and fit a regression line to the data.
  • Benefits: The fitted line, drawn by default with a shaded 95% confidence band, shows the direction and strength of a linear trend at a glance.
  • Analysis Type: Correlation analysis and trend analysis.

Data

Imagine you are analyzing the relationship between advertising spend and sales.

import pandas as pd

data = {
    'ad_spend': [1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, 5000, 5500],
    'sales': [200, 240, 300, 350, 400, 430, 500, 520, 600, 620]
}

df = pd.DataFrame(data)

Creating a Scatter Plot with Regression Line

plt.figure(figsize=(8, 4))
sns.regplot(x='ad_spend', y='sales', data=df)
plt.title('Advertising Spend vs Sales with Regression Line')
plt.xlabel('Advertising Spend (Rs)')
plt.ylabel('Sales')
plt.show()
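sns.regplot fits an ordinary least-squares straight line by default. To see the numbers behind the plotted line, a minimal sketch with NumPy's polyfit and pandas' corr recovers the slope, intercept, and Pearson correlation for the same data; the variable names slope, intercept, and r are our own choices.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'ad_spend': [1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, 5000, 5500],
    'sales': [200, 240, 300, 350, 400, 430, 500, 520, 600, 620],
})

# Least-squares fit of a degree-1 polynomial: sales ≈ slope * ad_spend + intercept
slope, intercept = np.polyfit(df['ad_spend'], df['sales'], deg=1)

# Pearson correlation coefficient between the two columns
r = df['ad_spend'].corr(df['sales'])

print(f"slope = {slope:.4f}, intercept = {intercept:.1f}, r = {r:.3f}")
```

For this data the slope is roughly 0.0955 (about 9.6 extra units of sales per additional Rs 100 of advertising) and r is above 0.99, which matches the tight fit in the plot.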

Example 2: Box Plot

Use and Benefits

  • Use: Box plots are used to visualize the distribution of data and identify outliers.
  • Benefits: They provide a summary of the data’s distribution, including median, quartiles, and outliers.
  • Analysis Type: Descriptive statistical analysis.

Data

Imagine you are analyzing the salary distribution across different departments.

data = {
    'department': ['HR', 'HR', 'IT', 'IT', 'Finance', 'Finance', 'IT', 'HR', 'Finance', 'IT'],
    'salary': [55, 65, 70, 80, 85, 90, 95, 100, 105, 110]
}

df = pd.DataFrame(data)

Creating a Box Plot

plt.figure(figsize=(10, 6))
sns.boxplot(x='department', y='salary', data=df)
plt.title('Salary Distribution by Department')
plt.xlabel('Department')
plt.ylabel('Salary')
plt.show()
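Each box spans the first to third quartile (Q1 to Q3), the line inside is the median, and by default (whis=1.5) points more than 1.5 times the interquartile range (IQR) beyond the box are drawn as outliers. A minimal sketch with pandas computes these numbers per department; the summary column names are our own.

```python
import pandas as pd

df = pd.DataFrame({
    'department': ['HR', 'HR', 'IT', 'IT', 'Finance', 'Finance', 'IT', 'HR', 'Finance', 'IT'],
    'salary': [55, 65, 70, 80, 85, 90, 95, 100, 105, 110],
})

# Quartiles per department (linear interpolation between data points)
summary = df.groupby('department')['salary'].quantile([0.25, 0.5, 0.75]).unstack()
summary.columns = ['Q1', 'median', 'Q3']

# Interquartile range and the 1.5 * IQR fences used to flag outliers
summary['IQR'] = summary['Q3'] - summary['Q1']
summary['lower_fence'] = summary['Q1'] - 1.5 * summary['IQR']
summary['upper_fence'] = summary['Q3'] + 1.5 * summary['IQR']
print(summary)
```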

Example 3: Violin Plot

Use and Benefits

  • Use: Violin plots are used to visualize the distribution of data across different categories.
  • Benefits: They combine a box plot with a mirrored kernel density estimate, so they also reveal the shape of each distribution (for example, multiple peaks) that a box plot alone hides.
  • Analysis Type: Comparative distribution analysis.

Data

Using the same salary data as above (the df created in Example 2):

Creating a Violin Plot

plt.figure(figsize=(10, 6))
sns.violinplot(x='department', y='salary', data=df)
plt.title('Salary Distribution by Department')
plt.xlabel('Department')
plt.ylabel('Salary')
plt.show()
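With only three or four salaries per department, the default density curves extend well past the real data. A minimal sketch of two violinplot options helps here: cut=0 stops each curve at the smallest and largest observed value, and inner='stick' draws one line per observation instead of the miniature box plot. The df is rebuilt so the block runs on its own.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = pd.DataFrame({
    'department': ['HR', 'HR', 'IT', 'IT', 'Finance', 'Finance', 'IT', 'HR', 'Finance', 'IT'],
    'salary': [55, 65, 70, 80, 85, 90, 95, 100, 105, 110],
})

plt.figure(figsize=(10, 6))
# cut=0: no density drawn beyond the observed range; inner='stick': one line per salary
ax = sns.violinplot(x='department', y='salary', data=df, cut=0, inner='stick')
ax.set_title('Salary Distribution by Department (observed range only)')
ax.set_xlabel('Department')
ax.set_ylabel('Salary')
plt.show()
```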

Example 4: Heatmap

Use and Benefits

  • Use: Heatmaps are used to visualize data in a matrix format, where values are represented by colors.
  • Benefits: They help identify patterns, correlations, and outliers in large datasets.
  • Analysis Type: Correlation and pattern analysis.

Data

Imagine you are analyzing the correlation between different features in a dataset. We’ll use a dataset with numerical features for this example.

data = {
    'age': [25, 30, 35, 40, 29, 32, 33, 28, 27, 26],
    'experience': [1, 3, 5, 7, 2, 4, 6, 1, 2, 3],
    'salary': [55, 65, 70, 80, 85, 90, 95, 100, 105, 110]
}

df = pd.DataFrame(data)

Creating a Heatmap

# Calculate the correlation matrix
correlation_matrix = df.corr()

plt.figure(figsize=(8, 4))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm')
plt.title('Correlation Matrix Heatmap')
plt.show()
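A correlation matrix is symmetric, so the upper triangle repeats the lower one. A minimal sketch using heatmap's mask parameter hides the cells above the diagonal, and vmin/vmax pin the color scale to the full -1 to 1 range of a correlation so colors stay comparable across datasets.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

df = pd.DataFrame({
    'age': [25, 30, 35, 40, 29, 32, 33, 28, 27, 26],
    'experience': [1, 3, 5, 7, 2, 4, 6, 1, 2, 3],
    'salary': [55, 65, 70, 80, 85, 90, 95, 100, 105, 110],
})
correlation_matrix = df.corr()

# True above the diagonal (k=1 keeps the diagonal visible); True cells are not drawn
mask = np.triu(np.ones_like(correlation_matrix, dtype=bool), k=1)

plt.figure(figsize=(6, 4))
ax = sns.heatmap(correlation_matrix, mask=mask, annot=True, fmt='.2f',
                 cmap='coolwarm', vmin=-1, vmax=1)
ax.set_title('Correlation Matrix (lower triangle)')
plt.show()
```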

Example 5: Pair Plot

Use and Benefits

  • Use: Pair plots are used to visualize the pairwise relationships between different features in a dataset.
  • Benefits: They provide a comprehensive view of interactions between multiple variables.
  • Analysis Type: Exploratory data analysis (EDA).

Data

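Seaborn ships with sample datasets. Here we load the classic Iris dataset (four flower measurements plus a species label). Note that sns.load_dataset downloads the data from the online seaborn-data repository, so it needs an internet connection the first time.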

data = sns.load_dataset('iris')

Creating a Pair Plot

sns.pairplot(data, hue='species')
plt.suptitle('Pair Plot of Iris Dataset', y=1.02)
plt.show()

Example 6: Distribution Plot (Histogram with KDE)

Use and Benefits

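  • Use: Distribution plots are used to visualize the distribution of a univariate set of observations.
  • Benefits: A histogram drawn with kde=True shows the binned counts and a smooth kernel density estimate (KDE) together. The older sns.distplot function, which produced this combined view, has been deprecated since Seaborn 0.11; sns.histplot(..., kde=True), used below, is its replacement.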
  • Analysis Type: Descriptive statistical analysis and distribution analysis.

Data

Imagine you are analyzing the distribution of customer ages in a retail store.

data = {
    'age': [23, 25, 28, 29, 31, 33, 35, 37, 39, 42, 45, 48, 50, 53, 55, 58, 60, 63, 65, 68]
}

df = pd.DataFrame(data)

Creating a Distribution Plot

plt.figure(figsize=(8, 4))
sns.histplot(df['age'], kde=True, bins=10)
plt.title('Distribution of Customer Ages')
plt.xlabel('Age')
plt.ylabel('Frequency')
plt.show()
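With bins=10, the age range from 23 to 68 is split into ten equal-width bins of 4.5 years each. A minimal sketch with np.histogram reproduces the bin edges and counts behind the bars (we assume histplot's equal-width binning matches NumPy's here). Every bin includes its left edge and excludes its right edge, except the last, which includes 68.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'age': [23, 25, 28, 29, 31, 33, 35, 37, 39, 42, 45, 48, 50, 53, 55, 58, 60, 63, 65, 68]
})

# Ten equal-width bins between the observed minimum and maximum age
counts, edges = np.histogram(df['age'], bins=10)

for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"{left:5.1f} to {right:5.1f}: {n} customers")
```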

Example 7: Joint Plot

Use and Benefits

  • Use: Joint plots are used to visualize the relationship between two variables and their distributions.
  • Benefits: They provide both scatter plots and histograms/KDEs, giving a comprehensive view of the data.
  • Analysis Type: Bivariate analysis and correlation analysis.

Data

Imagine you are analyzing the relationship between advertising spend and sales.

data = {
    'ad_spend': [1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, 5000, 5500],
    'sales': [200, 240, 300, 350, 400, 430, 500, 520, 600, 620]
}

df = pd.DataFrame(data)

Creating a Joint Plot

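# set_theme (sns.set is an older alias) changes the defaults for all later plots
sns.set_theme(style="white", palette="muted")

# jointplot creates its own figure, so size it with height= instead of plt.figure()
g = sns.jointplot(x='ad_spend', y='sales', data=df, kind='reg', height=6)
# plt.xlabel/plt.ylabel would label only one panel; set labels on the joint axes instead
g.set_axis_labels('Advertising Spend (Rs)', 'Sales')
plt.suptitle('Advertising Spend vs Sales', y=1.02)
plt.show()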

Example 8: Hexbin Plot (Hexplot)

Use and Benefits

  • Use: Hexbin plots are used to visualize the relationship between two variables, showing the counts of observations that fall within hexagonal bins.
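  • Benefits: They are particularly useful for large datasets, where points in a scatter plot would overlap; the color of each hexagon shows how many observations fall inside it, revealing where the data is densest.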
  • Analysis Type: Bivariate analysis and density estimation.

Data

Imagine you are analyzing the relationship between house prices and house sizes.

data = {
    'house_size': [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700,
                   1550, 1750, 2200, 2200, 3000, 2900, 3200, 2800, 1900, 1750],
    'house_price': [300000, 350000, 320000, 370000, 200000, 290000, 450000, 420000, 310000, 330000,
                    290000, 340000, 440000, 420000, 530000, 520000, 600000, 580000, 370000, 360000]
}

df = pd.DataFrame(data)

Creating a Hexplot

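# jointplot creates its own figure, so size it with height= instead of plt.figure()
g = sns.jointplot(x='house_size', y='house_price', data=df, kind='hex', color='purple', height=6)
# Label the central (joint) axes rather than whichever panel plt.xlabel would hit
g.set_axis_labels('House Size (sq ft)', 'House Price (Rs)')
plt.suptitle('House Size vs House Price', y=1.02)
plt.show()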

Example 9: 3D Plot

Seaborn does not natively support 3D plots, as it is primarily designed for 2D statistical graphics. However, you can enhance 3D plots created with Matplotlib using Seaborn’s styling capabilities to some extent. Here’s how you can do it:

Enhancing a 3D Plot with Seaborn

We’ll use Matplotlib’s 3D plotting capabilities and apply Seaborn’s styling to make the plot more aesthetically pleasing.

Data

Imagine you are analyzing the relationship between the number of hours studied, the number of hours slept, and the test scores of students.

import pandas as pd
import numpy as np

data = {
    'hours_studied': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'hours_slept': [7, 6.5, 6, 5.5, 5, 7.5, 8, 6, 5.5, 6.5],
    'test_scores': [65, 70, 75, 80, 85, 90, 95, 70, 80, 85]
}

df = pd.DataFrame(data)

Creating an Enhanced 3D Plot

from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection; only required on Matplotlib < 3.2
import seaborn as sns
import matplotlib.pyplot as plt

# Setting Seaborn style
sns.set(style="whitegrid")

fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111, projection='3d')

# Scatter plot
sc = ax.scatter(df['hours_studied'], df['hours_slept'], df['test_scores'], c=df['test_scores'], cmap='viridis', s=100)

# Adding labels
ax.set_title('Relationship Between Study Hours, Sleep Hours, and Test Scores', fontsize=15)
ax.set_xlabel('Hours Studied', fontsize=12)
ax.set_ylabel('Hours Slept', fontsize=12)
ax.set_zlabel('Test Scores', fontsize=12)

# Adding color bar
cbar = plt.colorbar(sc, ax=ax, shrink=0.5, aspect=5)
cbar.set_label('Test Scores', rotation=270, labelpad=15)

plt.show()

Breakdown of the Enhanced 3D Plot

  1. Seaborn Style: We set the Seaborn style to ‘whitegrid’ using sns.set(style="whitegrid") to improve the plot’s appearance.
  2. Figure and Axis: We create a figure and a 3D axis using Matplotlib.
  3. Scatter Plot: We use ax.scatter to create a 3D scatter plot, setting the color based on the test scores and using a colormap for better visualization.
  4. Labels and Titles: We add a title and labels for each axis with customized font sizes.
  5. Color Bar: We add a color bar to provide a reference for the color-coded test scores, making the plot more informative.
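sns.set(style="whitegrid") changes the style for every later plot. If you only want Seaborn styling for one 3D figure, a minimal sketch uses sns.axes_style as a context manager: axes created inside the with block get the style, and the previous defaults return afterwards.

```python
import matplotlib.pyplot as plt
import seaborn as sns

hours_studied = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
hours_slept = [7, 6.5, 6, 5.5, 5, 7.5, 8, 6, 5.5, 6.5]
test_scores = [65, 70, 75, 80, 85, 90, 95, 70, 80, 85]

# The "darkgrid" style applies only to figures/axes created inside this block
with sns.axes_style("darkgrid"):
    fig = plt.figure(figsize=(8, 6))
    ax = fig.add_subplot(111, projection='3d')
    ax.scatter(hours_studied, hours_slept, test_scores,
               c=test_scores, cmap='viridis', s=100)
    ax.set_xlabel('Hours Studied')
    ax.set_ylabel('Hours Slept')
    ax.set_zlabel('Test Scores')

plt.show()
```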

Conclusion

In this tutorial, we've introduced you to Seaborn and explored various types of plots, each with a real-world example. We've covered scatter plots with regression lines, box plots, violin plots, heatmaps, pair plots, histograms with KDE curves, joint plots, and hexbin plots, and we styled a Matplotlib 3D plot with Seaborn. Each plot type suits a different kind of analysis, from correlation and trend analysis to comparing distributions across groups. Seaborn's high-level interface and attractive default styles make it a powerful addition to your data visualization toolkit.

In the next tutorial, we’ll dive deeper into advanced Seaborn plots and explore more customization options. Stay tuned and keep exploring!

Data AI Admin

Senior AI Lead with 10+ years of overall experience in IT, data science, machine learning, AI, and related fields.

June 28, 2024