Objective: Statistical visualizations.
High-level API based on Matplotlib, with strong pandas integration.
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = sns.load_dataset("penguins")
df.head(3)
| | species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex |
|---|---|---|---|---|---|---|---|
| 0 | Adelie | Torgersen | 39.1 | 18.7 | 181.0 | 3750.0 | Male |
| 1 | Adelie | Torgersen | 39.5 | 17.4 | 186.0 | 3800.0 | Female |
| 2 | Adelie | Torgersen | 40.3 | 18.0 | 195.0 | 3250.0 | Female |
High-level API:
Seaborn functions work on entire datasets and take care of many steps, such as aggregating data automatically.
Example: relplot
The relplot function is designed to visualize statistical relationships of all kinds:
sns.relplot(
x="bill_length_mm", y="bill_depth_mm",
data=df,
)
<seaborn.axisgrid.FacetGrid at 0x1406d32b0>
With the help of a few arguments of the plotting function, you can add more variables to the plot.
Here, for example, the coloring of the scatter dots indicates the species of the penguins:
sns.relplot(
x="bill_length_mm", y="bill_depth_mm",
hue="species",
data=df,
)
<seaborn.axisgrid.FacetGrid at 0x1407ee700>
We can also scale the size of the dots according to the penguins' body mass:
sns.relplot(
x="bill_length_mm", y="bill_depth_mm",
hue="species",
size="body_mass_g",
data=df,
)
<seaborn.axisgrid.FacetGrid at 0x140873be0>
Using the parameters col and row, multiple plots can be created based on a categorical variable:
sns.relplot(
x="bill_length_mm", y="bill_depth_mm",
hue="sex",
size="body_mass_g",
col="species",
row="island",
data=df,
)
<seaborn.axisgrid.FacetGrid at 0x140958e80>
Continuous relationships can also be visualized using line plots (more on that later)...
The displot function generates histograms or similar distribution plots.
sns.displot(
x="body_mass_g", col="species",
hue="sex",
kde=True,
data=df
)
<seaborn.axisgrid.FacetGrid at 0x1408ca9a0>
sns.displot(
x="body_mass_g", col="species",
hue="sex",
kind="kde",
data=df
)
<seaborn.axisgrid.FacetGrid at 0x141ea0a90>
The catplot function generates plots that show distributions split by the values of a categorical variable.
sns.catplot(
x="species", y="body_mass_g",
kind="boxen",
data=df
)
<seaborn.axisgrid.FacetGrid at 0x141caab20>
However, it also works without a categorical variable:
sns.catplot(
y="body_mass_g",
kind="box",
data=df
)
<seaborn.axisgrid.FacetGrid at 0x141d48070>
sns.catplot(
x="species", y="body_mass_g",
kind="violin",
data=df
)
<seaborn.axisgrid.FacetGrid at 0x160135a90>
sns.catplot(
x="species", y="body_mass_g",
hue="sex",
data=df
)
<seaborn.axisgrid.FacetGrid at 0x160118730>
sns.catplot(
x="species", y="body_mass_g",
hue="sex",
kind="swarm",
data=df
)
<seaborn.axisgrid.FacetGrid at 0x141d48a90>
sns.catplot(
x="species", y="body_mass_g",
hue="sex",
kind="bar",
data=df
)
<seaborn.axisgrid.FacetGrid at 0x16026bfa0>
The lmplot function fits a regression model to the data and draws the fitted line, together with a confidence band, on top of the scatter plot.
It can be a neat way to visualize (linear) relations within your data.
sns.lmplot(
x="body_mass_g", y="bill_length_mm",
hue="sex",
col="species",
data=df,
)
<seaborn.axisgrid.FacetGrid at 0x160316160>
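To make concrete what "fitting a regression model" means here, the following sketch computes a straight-line least-squares fit by hand with NumPy. It assumes a first-order linear model, which is what lmplot uses unless told otherwise. The data are synthetic, and the names x, y, slope, intercept, x_grid and y_fit are only illustrative.

```python
import numpy as np

# Synthetic data (an assumption for illustration): y depends linearly on x plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=200)

# First-order least-squares fit; polyfit returns the highest-degree coefficient first.
slope, intercept = np.polyfit(x, y, deg=1)

# Evaluate the fitted line on a regular grid, as a regression plot would draw it.
x_grid = np.linspace(x.min(), x.max(), 100)
y_fit = slope * x_grid + intercept

print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

The estimated slope and intercept should land close to the values used to generate the data (2 and 1), which is the kind of relation the regression line in the plot summarizes.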
Especially in exploratory data analysis, it can be informative to plot different measurements or display formats in combination to gain more "global" insights.
The pairplot, for example, plots all numeric variables of a data set against each other:
sns.pairplot(hue="species", data=df)
<seaborn.axisgrid.PairGrid at 0x16046ae80>
The jointplot combines a scatter plot of two variables with plots of their individual distributions in the margins:
sns.jointplot(
x="flipper_length_mm", y="bill_length_mm",
hue="species",
data=df
)
<seaborn.axisgrid.JointGrid at 0x1609dc880>
Seaborn is designed to work with pandas DataFrames.
The whole DataFrame can be passed with the data parameter, and columns can then be selected by name.
data = pd.DataFrame({
"x": np.linspace(0, 20, 10000),
"y": np.sin(np.linspace(0, 20, 10000))
})
sns.lineplot(x="x", y="y", data=data)
<AxesSubplot:xlabel='x', ylabel='y'>
However, Seaborn also accepts other data types:
x = np.linspace(0, 20, 10000)
y = np.sin(x)
sns.lineplot(x=x, y=y)
<AxesSubplot:>
sns.histplot(y)
<AxesSubplot:ylabel='Count'>
And so on.
But of course you lose many of the helpful features of the DataFrame integration (most notably: automatic axis labeling!).
DataFrames: Long- vs. Wide-form
DataFrames can contain data in different formats. For example, in long-form format, where each variable has its own column and each observation its own row.
Or in wide-form format, which resembles a traditional Excel spreadsheet: the rows and columns hold the levels of two variables, and each cell holds a single value.
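To illustrate the two layouts, here is a small sketch that turns a long-form table into a wide-form one with the pandas pivot method. The table long_df is a hypothetical miniature made up for the example, not the actual dataset.

```python
import pandas as pd

# Hypothetical long-form data: one row per observation, one column per variable.
long_df = pd.DataFrame({
    "year": [1949, 1949, 1950, 1950],
    "month": ["Jan", "Feb", "Jan", "Feb"],
    "passengers": [112, 118, 115, 126],
})

# Wide-form: the levels of "year" become the index, the levels of "month" the
# columns, and each cell holds one passenger count.
wide_df = long_df.pivot(index="year", columns="month", values="passengers")
print(wide_df)
```

In the wide table, the information "this number is a passenger count" is no longer stored in a column name, which is one reason why long-form data are easier to map onto plot parameters such as x, y and hue.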

Seaborn works best with long-form data:
flights = sns.load_dataset("flights")
flights.head()
| | year | month | passengers |
|---|---|---|---|
| 0 | 1949 | Jan | 112 |
| 1 | 1949 | Feb | 118 |
| 2 | 1949 | Mar | 132 |
| 3 | 1949 | Apr | 129 |
| 4 | 1949 | May | 121 |
With long-form data, the vast majority of plotting functions aggregate and prepare the data automatically.
For example, here the monthly passenger numbers are automatically aggregated per year; the line shows the mean across months and the shaded band the uncertainty around it:
sns.lineplot(x="year", y="passengers", data=flights)
<AxesSubplot:xlabel='year', ylabel='passengers'>
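What the automatic aggregation amounts to can be reproduced by hand. This sketch assumes lineplot's default estimator (the mean) and uses a hypothetical miniature table, toy, instead of the full dataset.

```python
import pandas as pd

# Hypothetical long-form data with several observations (months) per year.
toy = pd.DataFrame({
    "year": [1949, 1949, 1949, 1950, 1950, 1950],
    "month": ["Jan", "Feb", "Mar"] * 2,
    "passengers": [112, 118, 132, 115, 126, 141],
})

# One value per year: the mean over all months of that year. With default
# settings, this is the quantity the central line of the plot follows.
yearly_mean = toy.groupby("year")["passengers"].mean()
print(yearly_mean)
```

Adding hue="month" to the plot removes the need for this aggregation, because each combination of year and month then contains exactly one observation.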
sns.lineplot(x="year", y="passengers", hue="month", data=flights)
<AxesSubplot:xlabel='year', ylabel='passengers'>
The same mechanism also works the other way round:
sns.lineplot(x="month", y="passengers", data=flights)
<AxesSubplot:xlabel='month', ylabel='passengers'>
sns.lineplot(x="month", y="passengers", hue="year", data=flights)
<AxesSubplot:xlabel='month', ylabel='passengers'>
Some datasets also come in more complex formats. For example, different hierarchical levels could be mixed.
freqs = pd.read_csv("freqs-engl.txt", sep="\t")
freqs.head()
| | ID | year | author | title | subgenre | genre | form | word_types | size | the | ... | many | against | faith | put | about | leaue | might | brother | friend | none |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1607 | Beaumont_Francis | Knight_of_the_Burning_Pestle | Burlesque_Romance | comedy | unknown | 3223 | 21105 | 2.629709 | ... | 0.061597 | 0.037906 | 0.080550 | 0.061597 | 0.080550 | 0.042644 | 0.028429 | 0.018953 | 0.075811 | 0.052120 |
| 1 | 2 | 1641 | Brome_Richard | Jovial_Crew | comedy | comedy | mixed | 3576 | 24190 | 3.137660 | ... | 0.062009 | 0.033072 | 0.008268 | 0.057875 | 0.037205 | 0.000000 | 0.045473 | 0.008268 | 0.095081 | 0.062009 |
| 2 | 3 | 1601 | Chapman_George | All_Fools | comedy | comedy | unknown | 3475 | 19143 | 2.606697 | ... | 0.041791 | 0.078358 | 0.026119 | 0.062686 | 0.073134 | 0.041791 | 0.062686 | 0.130596 | 0.073134 | 0.031343 |
| 3 | 4 | 1596 | Chapman_George | Blind_Beggar_of_Alexandria | comedy | comedy | unknown | 2352 | 13140 | 2.838661 | ... | 0.022831 | 0.053272 | 0.015221 | 0.083714 | 0.045662 | 0.129376 | 0.060883 | 0.038052 | 0.015221 | 0.053272 |
| 4 | 5 | 1604 | Chapman_George | Bussy_DAmbois | Foreign_History | history | unknown | 3695 | 19781 | 3.184874 | ... | 0.050554 | 0.070775 | 0.035387 | 0.050554 | 0.045498 | 0.070775 | 0.065720 | 0.030332 | 0.096052 | 0.045498 |
5 rows × 209 columns
Example: Comparing the frequencies of you and thou for tragedies and comedies.
To generate a histogram of the frequencies of the two words for both genres, we need to convert the data into long-form using the .melt method of DataFrames.
plot_df = freqs.query("genre == 'tragedy' or genre == 'comedy'").melt(
id_vars=["genre", "title", "year"],
value_vars=["you", "thou"],
var_name="token",
value_name="freq"
)
plot_df
| | genre | title | year | token | freq |
|---|---|---|---|---|---|
| 0 | comedy | Knight_of_the_Burning_Pestle | 1607 | you | 2.018479 |
| 1 | comedy | Jovial_Crew | 1641 | you | 2.414221 |
| 2 | comedy | All_Fools | 1601 | you | 2.021627 |
| 3 | comedy | Blind_Beggar_of_Alexandria | 1596 | you | 1.993912 |
| 4 | tragedy | Byrons_Conspiracy | 1608 | you | 1.246883 |
| ... | ... | ... | ... | ... | ... |
| 253 | tragedy | Duchess_of_Malfi | 1614 | thou | 0.383885 |
| 254 | tragedy | White_Devil | 1612 | thou | 0.365200 |
| 255 | comedy | Cobblers_Prophecy | 1590 | thou | 0.948917 |
| 256 | comedy | Three_Ladies_of_London | 1581 | thou | 1.115419 |
| 257 | comedy | Three_Lords_and_Three_Ladies | 1590 | thou | 0.662318 |
258 rows × 5 columns
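The effect of the individual melt arguments is easier to see on a tiny table. This is a sketch with made-up values: the DataFrame wide, its titles and its numbers are hypothetical and unrelated to the real frequency file.

```python
import pandas as pd

# Hypothetical wide-form table: one column per word, plus metadata columns.
wide = pd.DataFrame({
    "title": ["Play_A", "Play_B"],
    "genre": ["comedy", "tragedy"],
    "author": ["Author_1", "Author_2"],
    "you": [2.0, 1.2],
    "thou": [0.9, 0.4],
})

# id_vars are repeated for every measurement; the value_vars columns are stacked
# into a name column ("token") and a value column ("freq").
long = wide.melt(
    id_vars=["title", "genre"],
    value_vars=["you", "thou"],
    var_name="token",
    value_name="freq",
)
print(long)
```

Each play now appears once per token, and the author column is gone because it was listed neither in id_vars nor in value_vars.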
Since this transformation drops every column that is not listed in id_vars or value_vars, it is recommended to save the result in a new DataFrame rather than overwrite the original:
sns.displot(
x="freq",
hue="token",
col="genre",
kde=True,
data=plot_df
)
<seaborn.axisgrid.FacetGrid at 0x160cbb670>
seaborn uses matplotlib as a backend framework to create the plots.
This means that it is possible to extend seaborn plots with matplotlib.
However, this is not necessary in all cases where you want to customize seaborn plots, because seaborn itself also provides some functions for this.
For this you have to distinguish between two types of plots:
- axes_level plots
- figure_level plots
axes_level plots return a matplotlib axes object containing the plot, while figure_level plots return a FacetGrid object containing the plot.
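The distinction shows up in the return types. This sketch calls one function of each kind on a small made-up dataset (toy is hypothetical, chosen so that no download is needed) and inspects what comes back.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Small hypothetical dataset so the sketch needs no download.
toy = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 1.0, 4.0, 3.0],
    "group": ["u", "u", "v", "v"],
})

# Axes-level: draws into a matplotlib Axes and returns it.
ax = sns.scatterplot(x="a", y="b", data=toy)

# Figure-level: creates its own figure and returns a FacetGrid.
g = sns.relplot(x="a", y="b", col="group", data=toy)

print(type(ax).__name__, type(g).__name__)
```

The exact class name printed for the axes object depends on the installed matplotlib version, so it is not reproduced here.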
FacetGrid
FacetGrid objects are special containers that seaborn uses to encapsulate one (or more) graphic(s) and the data they are drawn from.
df.head(3)
| | species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex |
|---|---|---|---|---|---|---|---|
| 0 | Adelie | Torgersen | 39.1 | 18.7 | 181.0 | 3750.0 | Male |
| 1 | Adelie | Torgersen | 39.5 | 17.4 | 186.0 | 3800.0 | Female |
| 2 | Adelie | Torgersen | 40.3 | 18.0 | 195.0 | 3250.0 | Female |
g = sns.FacetGrid(df)
You can assign individual columns and rows of a 'FacetGrid' to specific variables from the data set.
g = sns.FacetGrid(df, col="species", row="sex", hue="island")
Using the .map method of FacetGrid, it is possible to apply various plotting functions to each subplot (and its associated data) of a FacetGrid.
g.map(sns.scatterplot, "body_mass_g", "bill_length_mm")
g.add_legend()
g.figure
/opt/homebrew/Caskroom/miniforge/base/envs/python_intro/lib/python3.9/site-packages/seaborn/axisgrid.py:703: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
plot_args = [v for k, v in plot_data.iteritems()]
Certain plotting functions of Seaborn require the data as DataFrame via the data parameter. To apply those functions to the FacetGrid too, you can use the .map_dataframe method.
g = sns.FacetGrid(df, col="species", row="sex", hue="island")
g.map_dataframe(sns.swarmplot, y="body_mass_g")
g.add_legend()
g.figure
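The .map_dataframe method is not limited to seaborn's own functions. As a sketch, the following passes a small user-defined function that receives the subset of the data for each facet through the data keyword. The function annotate_count and the table toy are hypothetical. The function accepts **kwargs on the assumption that the grid passes along further keywords such as color.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Small hypothetical dataset with three groups of two rows each.
toy = pd.DataFrame({
    "species": ["A", "A", "B", "B", "C", "C"],
    "mass": [3.1, 3.4, 4.0, 4.2, 5.0, 5.3],
    "length": [38.0, 39.5, 46.0, 47.5, 45.0, 48.0],
})

def annotate_count(data, **kwargs):
    """Write the number of rows of the current facet into the active Axes."""
    ax = plt.gca()  # the grid activates the facet's Axes before calling us
    ax.text(0.05, 0.9, f"n = {len(data)}", transform=ax.transAxes)

g = sns.FacetGrid(toy, col="species")
g.map_dataframe(sns.scatterplot, x="mass", y="length")
g.map_dataframe(annotate_count)
```

Each of the three facets then shows its scatter points plus a small "n = 2" annotation in the upper left corner.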
FacetGrid objects encapsulate the subplots they contain in the axes attribute.
g.axes
array([[<AxesSubplot:title={'center':'sex = Male | species = Adelie'}, ylabel='body_mass_g'>,
<AxesSubplot:title={'center':'sex = Male | species = Chinstrap'}>,
<AxesSubplot:title={'center':'sex = Male | species = Gentoo'}>],
[<AxesSubplot:title={'center':'sex = Female | species = Adelie'}, ylabel='body_mass_g'>,
<AxesSubplot:title={'center':'sex = Female | species = Chinstrap'}>,
<AxesSubplot:title={'center':'sex = Female | species = Gentoo'}>]],
dtype=object)
g.axes[0][0].set_title("1.")
g.axes[0][1].set_title("2.")
g.figure
The entire graphic is stored in the figure attribute.
These objects are again classic matplotlib graphics and can be adapted or processed accordingly.
g.figure.suptitle("My first custom FacetGrid :-)", y=1.1)
g.figure
The advantage of 'FacetGrids' is that you can create and customize your own plots quite flexibly without having to drop any of seaborn's convenient features.
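Besides direct access to the axes and figure attributes, FacetGrid offers a few convenience methods of its own for grid-wide adjustments. This sketch combines set_axis_labels and set_titles with a plain matplotlib call on every subplot. The data in toy and all label texts are made up for the example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd
import seaborn as sns

# Small hypothetical dataset with two groups.
toy = pd.DataFrame({
    "species": ["A", "A", "B", "B"],
    "mass": [3.1, 3.4, 4.0, 4.2],
    "length": [38.0, 39.5, 46.0, 47.5],
})

g = sns.FacetGrid(toy, col="species")
g.map_dataframe(sns.scatterplot, x="mass", y="length")

# Grid-wide helpers provided by FacetGrid itself.
g.set_axis_labels("Body mass (kg)", "Bill length (mm)")
g.set_titles(col_template="Species {col_name}")

# Per-subplot tweaks through plain matplotlib.
for ax in g.axes.flat:
    ax.grid(True)
```

The helper methods keep labels and titles consistent across all facets, while the loop over g.axes.flat is the place for anything matplotlib can do to a single axes object.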
figure_level-Plots
High-level plot functions, such as relplot, catplot or displot mostly return a FacetGrid object.
g = sns.catplot(x="species", y="body_mass_g", hue="sex", data=df)
type(g)
seaborn.axisgrid.FacetGrid
Since FacetGrid objects serve as containers for their own axes and figure, they are hard to embed into other graphics and should be used to create one self-contained graphic.
axes_level-Plots
As the name suggests, axes_level plots return a matplotlib axes object.
axes_level plots are intended to be a drop-in replacement for matplotlib functions and can be well integrated into other plots or matplotlib workflows.
data.head(3)
| | x | y |
|---|---|---|
| 0 | 0.000 | 0.000 |
| 1 | 0.002 | 0.002 |
| 2 | 0.004 | 0.004 |
fig, axes = plt.subplots(2, 1)
axes[0].plot(data["x"], data["y"])
axes[0].set_title("Sine Curve")
sns.histplot(x=data["y"], ax=axes[1])
axes[1].set_title("Histogram of sine values")
fig.tight_layout()
fig.suptitle("Example for a combined matplotlib and seaborn plot", y=1.1)
plt.show()