Making Beautiful Plots with Seaborn in Python
Welcome to the sixth tutorial in our series on data analysis with Python! In this article, we’ll introduce you to Seaborn, a powerful Python visualization library built on top of Matplotlib. Seaborn provides a high-level interface for drawing attractive and informative statistical graphics. We’ll cover a variety of plot types, explain their uses and benefits, and discuss the types of analysis they are best suited for. We’ll also demonstrate how to configure and customize these plots using real-world business examples.
Importing Seaborn
Before we start, let’s import Seaborn:
!pip install matplotlib # if not installed
!pip install seaborn # if not installed
import seaborn as sns
import matplotlib.pyplot as plt
Basic Configuration and Functions
Seaborn allows extensive customization of plots. Here are some common functions and configurations:
- Figure Size: plt.figure(figsize=(width, height)) sets the size of the plot.
- Title: plt.title('Title') adds a title to the plot.
- Axis Labels: plt.xlabel('X-axis Label') and plt.ylabel('Y-axis Label') add labels to the axes.
- Style: sns.set_style('style_name') sets the style of the plot (e.g., ‘whitegrid’, ‘darkgrid’, ‘white’, ‘dark’, ‘ticks’).
- Palette: sns.set_palette('palette_name') sets the color palette for the plot.
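As a rough sketch of how these settings fit together, the snippet below applies a style and a palette before drawing a simple bar chart, then sets the figure size, title, and axis labels. The revenue DataFrame and its numbers are made up purely for illustration and are not part of the tutorial's datasets.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical monthly revenue figures, used only for illustration
revenue = pd.DataFrame({
    'month': ['Jan', 'Feb', 'Mar', 'Apr'],
    'revenue': [120, 135, 128, 150],
})

# Style and palette are set first so they apply to the figure created next
sns.set_style('whitegrid')   # grid lines on a white background
sns.set_palette('muted')     # softer default colors

plt.figure(figsize=(8, 4))   # width and height in inches
ax = sns.barplot(x='month', y='revenue', data=revenue)
plt.title('Monthly Revenue')
plt.xlabel('Month')
plt.ylabel('Revenue')
plt.show()
```

Setting the style before creating the figure matters, because style settings only affect figures and axes created after the call.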
Example 1: Scatter Plot with Regression Line
Use and Benefits
- Use: Scatter plots with regression lines are used to visualize the relationship between two variables and fit a regression line to the data.
- Benefits: They provide insights into correlations and trends.
- Analysis Type: Correlation analysis and trend analysis.
Data
Imagine you are analyzing the relationship between advertising spend and sales.
import pandas as pd
data = {
'ad_spend': [1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, 5000, 5500],
'sales': [200, 240, 300, 350, 400, 430, 500, 520, 600, 620]
}
df = pd.DataFrame(data)
Creating a Scatter Plot with Regression Line
plt.figure(figsize=(8, 4))
sns.regplot(x='ad_spend', y='sales', data=df)
plt.title('Advertising Spend vs Sales with Regression Line')
plt.xlabel('Advertising Spend (Rs)')
plt.ylabel('Sales')
plt.show()
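The regression line is easier to interpret with numbers next to it. As a minimal sketch, the snippet below fits a least-squares straight line with NumPy (the same kind of line regplot draws by default) and computes the Pearson correlation with pandas. It rebuilds the advertising data so that it runs on its own; the variable names slope, intercept, and r are just illustrative.

```python
import numpy as np
import pandas as pd

# Same advertising data as in the example above
df = pd.DataFrame({
    'ad_spend': [1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, 5000, 5500],
    'sales': [200, 240, 300, 350, 400, 430, 500, 520, 600, 620],
})

# Fit sales = slope * ad_spend + intercept by ordinary least squares
slope, intercept = np.polyfit(df['ad_spend'], df['sales'], deg=1)

# Pearson correlation between the two columns
r = df['ad_spend'].corr(df['sales'])

print(f'Slope (sales per unit of ad spend): {slope:.4f}')
print(f'Intercept: {intercept:.1f}')
print(f'Pearson correlation: {r:.3f}')
```

A slope close to 0.1 here would mean that each additional 100 Rs of advertising is associated with roughly 10 more units of sales in this small sample.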
Example 2: Box Plot
Use and Benefits
- Use: Box plots are used to visualize the distribution of data and identify outliers.
- Benefits: They provide a summary of the data’s distribution, including median, quartiles, and outliers.
- Analysis Type: Descriptive statistical analysis.
Data
Imagine you are analyzing the salary distribution across different departments.
data = {
'department': ['HR', 'HR', 'IT', 'IT', 'Finance', 'Finance', 'IT', 'HR', 'Finance', 'IT'],
'salary': [55, 65, 70, 80, 85, 90, 95, 100, 105, 110]
}
df = pd.DataFrame(data)
Creating a Box Plot
plt.figure(figsize=(10, 6))
sns.boxplot(x='department', y='salary', data=df)
plt.title('Salary Distribution by Department')
plt.xlabel('Department')
plt.ylabel('Salary')
plt.show()
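To see the numbers behind the boxes, the sketch below computes the quartiles per department with pandas and applies the common 1.5 × IQR rule, which is also Seaborn's default whisker setting, to flag potential outliers. The column names lower_fence and upper_fence are illustrative choices, not Seaborn terminology.

```python
import pandas as pd

# Same salary data as in the example above
df = pd.DataFrame({
    'department': ['HR', 'HR', 'IT', 'IT', 'Finance', 'Finance', 'IT', 'HR', 'Finance', 'IT'],
    'salary': [55, 65, 70, 80, 85, 90, 95, 100, 105, 110],
})

# Quartiles per department (the box edges and the median line)
summary = df.groupby('department')['salary'].describe()[['25%', '50%', '75%']].copy()

# Interquartile range and the 1.5 * IQR fences used to flag outliers
summary['iqr'] = summary['75%'] - summary['25%']
summary['lower_fence'] = summary['25%'] - 1.5 * summary['iqr']
summary['upper_fence'] = summary['75%'] + 1.5 * summary['iqr']

# Attach each row's department fences and keep the salaries outside them
merged = df.join(summary[['lower_fence', 'upper_fence']], on='department')
outliers = merged[(merged['salary'] < merged['lower_fence']) |
                  (merged['salary'] > merged['upper_fence'])]

print(summary)
print(f'Number of flagged outliers: {len(outliers)}')
```

With only three or four salaries per department, the quartiles are rough estimates, so the box plot for this sample is best read as an illustration rather than a reliable summary.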
Example 3: Violin Plot
Use and Benefits
- Use: Violin plots are used to visualize the distribution of data across different categories.
- Benefits: They combine features of box plots and density plots, providing more information about the distribution.
- Analysis Type: Comparative distribution analysis.
Data
Using the same salary distribution data as above:
plt.figure(figsize=(10, 6))
sns.violinplot(x='department', y='salary', data=df)
plt.title('Salary Distribution by Department')
plt.xlabel('Department')
plt.ylabel('Salary')
plt.show()
Example 4: Heatmap
Use and Benefits
- Use: Heatmaps are used to visualize data in a matrix format, where values are represented by colors.
- Benefits: They help identify patterns, correlations, and outliers in large datasets.
- Analysis Type: Correlation and pattern analysis.
Data
Imagine you are analyzing the correlation between different features in a dataset. We’ll use a dataset with numerical features for this example.
data = {
'age': [25, 30, 35, 40, 29, 32, 33, 28, 27, 26],
'experience': [1, 3, 5, 7, 2, 4, 6, 1, 2, 3],
'salary': [55, 65, 70, 80, 85, 90, 95, 100, 105, 110]
}
df = pd.DataFrame(data)
Creating a Heatmap
# Calculate the correlation matrix
correlation_matrix = df.corr()
plt.figure(figsize=(8, 4))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm')
plt.title('Correlation Matrix Heatmap')
plt.show()
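A correlation matrix is symmetric, so half of the cells repeat the other half. One optional refinement, sketched below, hides the upper triangle with a mask and pins the color scale to the full correlation range of -1 to 1 so that colors are comparable across charts. The mask construction is one common way to do this, not the only one.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Same numerical data as in the example above
df = pd.DataFrame({
    'age': [25, 30, 35, 40, 29, 32, 33, 28, 27, 26],
    'experience': [1, 3, 5, 7, 2, 4, 6, 1, 2, 3],
    'salary': [55, 65, 70, 80, 85, 90, 95, 100, 105, 110],
})
corr = df.corr()

# True marks cells to hide: everything strictly above the diagonal
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)

plt.figure(figsize=(8, 4))
ax = sns.heatmap(corr, mask=mask, annot=True, fmt='.2f',
                 cmap='coolwarm', vmin=-1, vmax=1)
plt.title('Correlation Matrix Heatmap (Lower Triangle)')
plt.show()
```

Fixing vmin and vmax keeps a correlation of 0 at the neutral midpoint of the coolwarm colormap, which makes positive and negative relationships easier to tell apart.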
Example 5: Pair Plot
Use and Benefits
- Use: Pair plots are used to visualize the pairwise relationships between different features in a dataset.
- Benefits: They provide a comprehensive view of interactions between multiple variables.
- Analysis Type: Exploratory data analysis (EDA).
Data
Imagine you have a dataset with multiple features.
data = sns.load_dataset('iris')
Creating a Pair Plot
sns.pairplot(data, hue='species')
plt.suptitle('Pair Plot of Iris Dataset', y=1.02)
plt.show()
Example 6: Distribution Plot (Distplot)
Use and Benefits
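- Use: Distribution plots are used to visualize the distribution of a univariate set of observations. Seaborn’s older distplot function has been deprecated since version 0.11, so the code in this example uses histplot instead.
- Benefits: With kde=True, they provide a combined view of the histogram and kernel density estimate (KDE).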
- Analysis Type: Descriptive statistical analysis and distribution analysis.
Data
Imagine you are analyzing the distribution of customer ages in a retail store.
data = {
'age': [23, 25, 28, 29, 31, 33, 35, 37, 39, 42, 45, 48, 50, 53, 55, 58, 60, 63, 65, 68]
}
df = pd.DataFrame(data)
Creating a Distplot
plt.figure(figsize=(8, 4))
sns.histplot(df['age'], kde=True, bins=10)
plt.title('Distribution of Customer Ages')
plt.xlabel('Age')
plt.ylabel('Frequency')
plt.show()
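The bins argument controls how the ages are grouped. As a small sketch, the snippet below uses NumPy to compute ten equal-width bins over the same data, which should correspond to the bars drawn with bins=10 above. The names counts and edges are illustrative.

```python
import numpy as np
import pandas as pd

# Same customer ages as in the example above
df = pd.DataFrame({
    'age': [23, 25, 28, 29, 31, 33, 35, 37, 39, 42,
            45, 48, 50, 53, 55, 58, 60, 63, 65, 68],
})

# Ten equal-width bins spanning the minimum to the maximum age
counts, edges = np.histogram(df['age'], bins=10)

# Print each bin's range next to the number of customers in it
for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f'{left:5.1f} to {right:5.1f}: {count}')
```

Fewer bins smooth out detail, while more bins on a small sample like this one tend to produce a jagged histogram, so it is worth trying a few values.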
Example 7: Joint Plot
Use and Benefits
- Use: Joint plots are used to visualize the relationship between two variables and their distributions.
- Benefits: They provide both scatter plots and histograms/KDEs, giving a comprehensive view of the data.
- Analysis Type: Bivariate analysis and correlation analysis.
Data
Imagine you are analyzing the relationship between advertising spend and sales.
data = {
'ad_spend': [1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, 5000, 5500],
'sales': [200, 240, 300, 350, 400, 430, 500, 520, 600, 620]
}
df = pd.DataFrame(data)
Creating a Joint Plot
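sns.set(style="white", palette="muted")
# jointplot creates its own figure, so control the size with height rather than plt.figure
g = sns.jointplot(x='ad_spend', y='sales', data=df, kind='reg', height=6)
g.figure.suptitle('Advertising Spend vs Sales', y=1.02)
# Label the central (joint) axes instead of whichever axes happens to be current
g.set_axis_labels('Advertising Spend (Rs)', 'Sales')
plt.show()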
Example 8: Hexbin Plot (Hexplot)
Use and Benefits
- Use: Hexbin plots are used to visualize the relationship between two variables, showing the counts of observations that fall within hexagonal bins.
- Benefits: They are particularly useful for large datasets and help in identifying patterns and density.
- Analysis Type: Bivariate analysis and density estimation.
Data
Imagine you are analyzing the relationship between house prices and house sizes.
data = {
'house_size': [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700, 1550, 1750, 2200, 2200, 3000, 2900, 3200, 2800, 1900, 1750],
'house_price': [300000, 350000, 320000, 370000, 200000, 290000, 450000, 420000, 310000, 330000, 290000, 340000, 440000, 420000, 530000, 520000, 600000, 580000, 370000, 360000]
}
df = pd.DataFrame(data)
Creating a Hexplot
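# jointplot creates its own figure, so control the size with height rather than plt.figure
g = sns.jointplot(x='house_size', y='house_price', data=df, kind='hex', color='purple', height=6)
g.figure.suptitle('House Size vs House Price', y=1.02)
# Label the central (joint) axes instead of whichever axes happens to be current
g.set_axis_labels('House Size (sq ft)', 'House Price (Rs)')
plt.show()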
Example 9: 3D Plot
Seaborn does not natively support 3D plots, as it is primarily designed for 2D statistical graphics. However, you can enhance 3D plots created with Matplotlib using Seaborn’s styling capabilities to some extent. Here’s how you can do it:
Enhancing a 3D Plot with Seaborn
We’ll use Matplotlib’s 3D plotting capabilities and apply Seaborn’s styling to make the plot more aesthetically pleasing.
Data
Imagine you are analyzing the relationship between the number of hours studied, the number of hours slept, and the test scores of students.
import pandas as pd
import numpy as np
data = {
'hours_studied': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
'hours_slept': [7, 6.5, 6, 5.5, 5, 7.5, 8, 6, 5.5, 6.5],
'test_scores': [65, 70, 75, 80, 85, 90, 95, 70, 80, 85]
}
df = pd.DataFrame(data)
Creating an Enhanced 3D Plot
from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection; only required on Matplotlib versions before 3.2
import seaborn as sns
import matplotlib.pyplot as plt
# Setting Seaborn style
sns.set(style="whitegrid")
fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111, projection='3d')
# Scatter plot
sc = ax.scatter(df['hours_studied'], df['hours_slept'], df['test_scores'], c=df['test_scores'], cmap='viridis', s=100)
# Adding labels
ax.set_title('Relationship Between Study Hours, Sleep Hours, and Test Scores', fontsize=15)
ax.set_xlabel('Hours Studied', fontsize=12)
ax.set_ylabel('Hours Slept', fontsize=12)
ax.set_zlabel('Test Scores', fontsize=12)
# Adding color bar
cbar = plt.colorbar(sc, ax=ax, shrink=0.5, aspect=5)
cbar.set_label('Test Scores', rotation=270, labelpad=15)
plt.show()
Breakdown of the Enhanced 3D Plot
- Seaborn Style: We set the Seaborn style to ‘whitegrid’ using sns.set(style="whitegrid") to improve the plot’s appearance.
- Figure and Axis: We create a figure and a 3D axis using Matplotlib.
- Scatter Plot: We use ax.scatter to create a 3D scatter plot, setting the color based on the test scores and using a colormap for better visualization.
- Labels and Titles: We add a title and labels for each axis with customized font sizes.
- Color Bar: We add a color bar to provide a reference for the color-coded test scores, making the plot more informative.
Conclusion
In this tutorial, we’ve introduced you to Seaborn and explored various types of plots, each with a real-world example. We’ve covered scatter plots with regression lines, box plots, violin plots, heatmaps, pair plots, distribution plots, joint plots, hexbin plots, and a Matplotlib 3D plot styled with Seaborn. Each plot type has its own use case and benefits, making them essential tools for different kinds of data analysis. Seaborn’s high-level interface and beautiful aesthetics make it a powerful addition to your data visualization toolkit.
In the next tutorial, we’ll dive deeper into advanced Seaborn plots and explore more customization options. Stay tuned and keep exploring!