Objective: Statistical visualizations.
High-level API based on Matplotlib with strong pandas integration.
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = sns.load_dataset("penguins")
df.head(3)
| | species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex |
---|---|---|---|---|---|---|---|
0 | Adelie | Torgersen | 39.1 | 18.7 | 181.0 | 3750.0 | Male |
1 | Adelie | Torgersen | 39.5 | 17.4 | 186.0 | 3800.0 | Female |
2 | Adelie | Torgersen | 40.3 | 18.0 | 195.0 | 3250.0 | Female |
High-level API:
Seaborn functions work on entire datasets and take care of many steps, such as aggregating data automatically.
Example: relplot
The relplot function is designed to visualize statistical relationships of all kinds:
sns.relplot(
x="bill_length_mm", y="bill_depth_mm",
data=df,
)
<seaborn.axisgrid.FacetGrid at 0x1406d32b0>
With the help of a few arguments of the plotting function, you can add more variables to the plot.
Here, for example, the coloring of the scatter dots indicates the species of the penguins:
sns.relplot(
x="bill_length_mm", y="bill_depth_mm",
hue="species",
data=df,
)
<seaborn.axisgrid.FacetGrid at 0x1407ee700>
We can also scale the size of the dots according to the penguins' body mass:
sns.relplot(
x="bill_length_mm", y="bill_depth_mm",
hue="species",
size="body_mass_g",
data=df,
)
<seaborn.axisgrid.FacetGrid at 0x140873be0>
Using the parameters col and row, multiple plots can be created based on categorical variables:
sns.relplot(
x="bill_length_mm", y="bill_depth_mm",
hue="sex",
size="body_mass_g",
col="species",
row="island",
data=df,
)
<seaborn.axisgrid.FacetGrid at 0x140958e80>
Continuous relationships can also be visualized using line plots (more on that later)...
Seaborn provides functions for different types of visualizations:
displot generates histograms or similar distribution plots.
sns.displot(
x="body_mass_g", col="species",
hue="sex",
kde=True,
data=df
)
<seaborn.axisgrid.FacetGrid at 0x1408ca9a0>
sns.displot(
x="body_mass_g", col="species",
hue="sex",
kind="kde",
data=df
)
<seaborn.axisgrid.FacetGrid at 0x141ea0a90>
catplot generates plots showing distributions split by the values of categorical variables.
sns.catplot(
x="species", y="body_mass_g",
kind="boxen",
data=df
)
<seaborn.axisgrid.FacetGrid at 0x141caab20>
However, it also works without classes...
sns.catplot(
y="body_mass_g",
kind="box",
data=df
)
<seaborn.axisgrid.FacetGrid at 0x141d48070>
sns.catplot(
x="species", y="body_mass_g",
kind="violin",
data=df
)
<seaborn.axisgrid.FacetGrid at 0x160135a90>
sns.catplot(
x="species", y="body_mass_g",
hue="sex",
data=df
)
<seaborn.axisgrid.FacetGrid at 0x160118730>
sns.catplot(
x="species", y="body_mass_g",
hue="sex",
kind="swarm",
data=df
)
<seaborn.axisgrid.FacetGrid at 0x141d48a90>
sns.catplot(
x="species", y="body_mass_g",
hue="sex",
kind="bar",
data=df
)
<seaborn.axisgrid.FacetGrid at 0x16026bfa0>
lmplot fits a regression model to the data and plots the fitted line together with the data points.
It can be a neat way to visualize (linear) relations within your data.
sns.lmplot(
x="body_mass_g", y="bill_length_mm",
hue="sex",
col="species",
data=df,
)
<seaborn.axisgrid.FacetGrid at 0x160316160>
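To make the idea of the fitted line more concrete, here is a minimal sketch that uses the axes-level function regplot on made-up, perfectly linear data and compares the drawn line with an ordinary least-squares fit from np.polyfit. The arrays `x` and `y` are assumptions for illustration; with real data the line would of course not match a known formula.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch also runs without a display
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# Made-up data that follows y = 2x + 1 exactly
x = np.arange(10, dtype=float)
y = 2.0 * x + 1.0

# Reference fit: straight line via least squares
slope, intercept = np.polyfit(x, y, deg=1)

# regplot draws the scatter points and the fitted line;
# ci=None skips the bootstrapped confidence band.
fig, ax = plt.subplots()
sns.regplot(x=x, y=y, ci=None, ax=ax)

# The fitted line is the first Line2D object on the axes
line_x, line_y = ax.lines[0].get_data()
print(np.allclose(line_y, slope * line_x + intercept))
```

lmplot combines the same kind of fit with the faceting options (hue, col, row) used in the example above.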
Especially in exploratory data analysis, it can be informative to plot different measurements or display formats in combination to gain more "global" insights.
The pairplot, for example, plots all variables of a data set against each other:
sns.pairplot(hue="species", data=df)
<seaborn.axisgrid.PairGrid at 0x16046ae80>
The jointplot combines a scatterplot with the marginal distributions of both variables (histograms by default, density curves when hue is set):
sns.jointplot(
x="flipper_length_mm", y="bill_length_mm",
hue="species",
data=df
)
<seaborn.axisgrid.JointGrid at 0x1609dc880>
Seaborn is designed to work with pandas DataFrames.
The whole DataFrame can be passed with the data parameter, and columns can then be selected by their names.
data = pd.DataFrame({
"x": np.linspace(0, 20, 10000),
"y": np.sin(np.linspace(0, 20, 10000))
})
sns.lineplot(x="x", y="y", data=data)
<AxesSubplot:xlabel='x', ylabel='y'>
However, Seaborn also accepts other data types:
x = np.linspace(0, 20, 10000)
y = np.sin(x)
sns.lineplot(x=x, y=y)
<AxesSubplot:>
sns.histplot(y)
<AxesSubplot:ylabel='Count'>
etc.
But of course you lose many of the helpful features of the DataFrame integration (most notably, automatic axis labeling).
DataFrames: Long- vs. Wide-form
DataFrames can contain data in different formats. For example, in long-form format, each variable has its own column and each observation its own row.
In wide-form format, which resembles traditional Excel spreadsheets, the rows and columns represent the levels of two variables and each cell holds one value.
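The two layouts described above can be converted into each other with pandas. The following is a small sketch on toy values (the numbers in `long_df` are for illustration only): pivot turns the long-form table into a wide-form one, and melt turns it back.

```python
import pandas as pd

# Long-form: one column per variable, one row per observation (toy values)
long_df = pd.DataFrame({
    "year": [1949, 1949, 1950, 1950],
    "month": ["Jan", "Feb", "Jan", "Feb"],
    "passengers": [112, 118, 115, 126],
})

# Wide-form: years as rows, months as columns, passenger counts as cells
wide_df = long_df.pivot(index="year", columns="month", values="passengers")
print(wide_df)

# Back to long-form: the month columns are stacked into a single column again
back = wide_df.reset_index().melt(
    id_vars="year", var_name="month", value_name="passengers"
)
print(back)
```

Note that the wide-form table no longer states what the cell values mean; that information only exists in the long-form column name passengers.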
seaborn is best at handling long-form data:
flights = sns.load_dataset("flights")
flights.head()
| | year | month | passengers |
---|---|---|---|
0 | 1949 | Jan | 112 |
1 | 1949 | Feb | 118 |
2 | 1949 | Mar | 132 |
3 | 1949 | Apr | 129 |
4 | 1949 | May | 121 |
With long-form data, most plotting functions aggregate and prepare the data automatically.
For example, lineplot aggregates the monthly passenger numbers per year and draws their mean together with a confidence band:
sns.lineplot(x="year", y="passengers", data=flights)
<AxesSubplot:xlabel='year', ylabel='passengers'>
sns.lineplot(x="year", y="passengers", hue="month", data=flights)
<AxesSubplot:xlabel='year', ylabel='passengers'>
The same mechanism also works the other way round:
sns.lineplot(x="month", y="passengers", data=flights)
<AxesSubplot:xlabel='month', ylabel='passengers'>
sns.lineplot(x="month", y="passengers", hue="year", data=flights)
<AxesSubplot:xlabel='month', ylabel='passengers'>
Some datasets also come in more complex formats. For example, different hierarchical levels could be mixed.
freqs = pd.read_csv("freqs-engl.txt", sep="\t")
freqs.head()
| | ID | year | author | title | subgenre | genre | form | word_types | size | the | ... | many | against | faith | put | about | leaue | might | brother | friend | none |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 1607 | Beaumont_Francis | Knight_of_the_Burning_Pestle | Burlesque_Romance | comedy | unknown | 3223 | 21105 | 2.629709 | ... | 0.061597 | 0.037906 | 0.080550 | 0.061597 | 0.080550 | 0.042644 | 0.028429 | 0.018953 | 0.075811 | 0.052120 |
1 | 2 | 1641 | Brome_Richard | Jovial_Crew | comedy | comedy | mixed | 3576 | 24190 | 3.137660 | ... | 0.062009 | 0.033072 | 0.008268 | 0.057875 | 0.037205 | 0.000000 | 0.045473 | 0.008268 | 0.095081 | 0.062009 |
2 | 3 | 1601 | Chapman_George | All_Fools | comedy | comedy | unknown | 3475 | 19143 | 2.606697 | ... | 0.041791 | 0.078358 | 0.026119 | 0.062686 | 0.073134 | 0.041791 | 0.062686 | 0.130596 | 0.073134 | 0.031343 |
3 | 4 | 1596 | Chapman_George | Blind_Beggar_of_Alexandria | comedy | comedy | unknown | 2352 | 13140 | 2.838661 | ... | 0.022831 | 0.053272 | 0.015221 | 0.083714 | 0.045662 | 0.129376 | 0.060883 | 0.038052 | 0.015221 | 0.053272 |
4 | 5 | 1604 | Chapman_George | Bussy_DAmbois | Foreign_History | history | unknown | 3695 | 19781 | 3.184874 | ... | 0.050554 | 0.070775 | 0.035387 | 0.050554 | 0.045498 | 0.070775 | 0.065720 | 0.030332 | 0.096052 | 0.045498 |
5 rows × 209 columns
Example: Comparing the frequencies of you and thou for tragedies and comedies.
To generate a histogram of the frequencies of the two words for both genres, we need to convert the data into long-form using the .melt method of DataFrames.
plot_df = freqs.query("genre == 'tragedy' or genre == 'comedy'").melt(
id_vars=["genre", "title", "year"],
value_vars=["you", "thou"],
var_name="token",
value_name="freq"
)
plot_df
| | genre | title | year | token | freq |
---|---|---|---|---|---|
0 | comedy | Knight_of_the_Burning_Pestle | 1607 | you | 2.018479 |
1 | comedy | Jovial_Crew | 1641 | you | 2.414221 |
2 | comedy | All_Fools | 1601 | you | 2.021627 |
3 | comedy | Blind_Beggar_of_Alexandria | 1596 | you | 1.993912 |
4 | tragedy | Byrons_Conspiracy | 1608 | you | 1.246883 |
... | ... | ... | ... | ... | ... |
253 | tragedy | Duchess_of_Malfi | 1614 | thou | 0.383885 |
254 | tragedy | White_Devil | 1612 | thou | 0.365200 |
255 | comedy | Cobblers_Prophecy | 1590 | thou | 0.948917 |
256 | comedy | Three_Ladies_of_London | 1581 | thou | 1.115419 |
257 | comedy | Three_Lords_and_Three_Ladies | 1590 | thou | 0.662318 |
258 rows × 5 columns
Since this transformation discards all columns that are not listed in id_vars or value_vars, it is recommended to save the result in a new DataFrame.
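Which data gets lost can be seen on a small example. The sketch below applies the same query and melt call to a hypothetical three-row table; the titles and frequency values are invented and do not come from the freqs data set.

```python
import pandas as pd

# Hypothetical miniature version of the frequency table
toy = pd.DataFrame({
    "title": ["Play_A", "Play_B", "Play_C"],
    "genre": ["comedy", "tragedy", "history"],
    "year": [1600, 1605, 1610],
    "you": [2.0, 1.2, 1.5],
    "thou": [0.9, 0.4, 0.7],
    "the": [2.6, 3.1, 2.8],
})

long_toy = toy.query("genre == 'tragedy' or genre == 'comedy'").melt(
    id_vars=["genre", "title", "year"],  # kept as identifying columns
    value_vars=["you", "thou"],          # stacked into token/freq pairs
    var_name="token",
    value_name="freq",
)
print(long_toy)

# The original table is unchanged, but the result has no "the" column
# and no rows for the "history" play.
print(list(long_toy.columns))
```

Because melt returns a new DataFrame and leaves `toy` untouched, keeping both under separate names costs nothing and preserves the full table for later plots.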
sns.displot(
x="freq",
hue="token",
col="genre",
kde=True,
data=plot_df
)
<seaborn.axisgrid.FacetGrid at 0x160cbb670>
seaborn uses matplotlib as a backend framework to create the plots.
This means that it is possible to extend seaborn plots with matplotlib.
However, this is not necessary in every case where you want to customize seaborn plots, because seaborn itself also provides some functions for this.
For this you have to distinguish between two types of plots:
- axes_level plots
- figure_level plots
axes_level plots return a matplotlib axes object containing the plot, while figure_level plots return a FacetGrid object (or a related grid object such as PairGrid or JointGrid) containing the plot.
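The distinction can be checked by looking at the return values. The following sketch uses scatterplot as an example of an axes_level function and relplot as an example of a figure_level function; the DataFrame `toy` is made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch also runs without a display
import matplotlib.pyplot as plt
from matplotlib.axes import Axes
import pandas as pd
import seaborn as sns

toy = pd.DataFrame({
    "a": [1, 2, 3, 4],
    "b": [2, 1, 4, 3],
    "kind": ["u", "u", "v", "v"],
})

# axes_level: draws into the axes it is given and returns that same object
fig, ax_in = plt.subplots()
ax_out = sns.scatterplot(x="a", y="b", data=toy, ax=ax_in)
print(isinstance(ax_out, Axes), ax_out is ax_in)

# figure_level: creates its own figure and returns a grid object
g = sns.relplot(x="a", y="b", col="kind", data=toy)
print(type(g).__name__, g.axes.shape, g.figure is fig)
```

The figure_level call does not accept an existing axes object, which is why it cannot be placed into a figure created elsewhere.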
FacetGrid
FacetGrid objects are special containers that seaborn uses to encapsulate one (or more) graphic(s) and the data they are generated from.
df.head(3)
| | species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex |
---|---|---|---|---|---|---|---|
0 | Adelie | Torgersen | 39.1 | 18.7 | 181.0 | 3750.0 | Male |
1 | Adelie | Torgersen | 39.5 | 17.4 | 186.0 | 3800.0 | Female |
2 | Adelie | Torgersen | 40.3 | 18.0 | 195.0 | 3250.0 | Female |
g = sns.FacetGrid(df)
You can assign individual columns and rows of a 'FacetGrid' to specific variables from the data set.
g = sns.FacetGrid(df, col="species", row="sex", hue="island")
Using the .map method of FacetGrid, it is possible to apply various plotting functions to each subplot (and its associated data) of a FacetGrid.
g.map(sns.scatterplot, "body_mass_g", "bill_length_mm")
g.add_legend()
g.figure
Certain plotting functions of Seaborn require the data as a DataFrame via the data parameter. To apply those functions to the FacetGrid too, you can use the .map_dataframe method.
g = sns.FacetGrid(df, col="species", row="sex", hue="island")
g.map_dataframe(sns.swarmplot, y="body_mass_g")
g.add_legend()
g.figure
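The .map_dataframe method also works with your own functions, as long as they accept a data keyword argument. The sketch below is one possible way to use this: the helper `annotate_count` is hypothetical (not part of seaborn) and writes the number of rows of each subset into its subplot. The values in `toy` are made up.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch also runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up miniature data set with two species
toy = pd.DataFrame({
    "species": ["Adelie"] * 3 + ["Gentoo"] * 2,
    "mass": [3700, 3800, 3250, 5000, 5400],
    "bill": [39.1, 39.5, 40.3, 46.1, 50.0],
})

def annotate_count(data, **kwargs):
    """Hypothetical helper: write the subset size into the current subplot."""
    ax = plt.gca()  # FacetGrid makes the subplot current before calling us
    ax.text(0.05, 0.9, f"n = {len(data)}", transform=ax.transAxes)

g = sns.FacetGrid(toy, col="species")
g.map_dataframe(sns.scatterplot, x="mass", y="bill")
g.map_dataframe(annotate_count)  # receives each facet's subset as `data`

texts = [t.get_text() for ax in g.axes.flat for t in ax.texts]
print(texts)
```

The extra **kwargs in the helper is needed because FacetGrid passes further keyword arguments, such as color, to every mapped function.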
FacetGrid objects encapsulate the subplots they contain in the axes attribute.
g.axes
array([[<AxesSubplot:title={'center':'sex = Male | species = Adelie'}, ylabel='body_mass_g'>, <AxesSubplot:title={'center':'sex = Male | species = Chinstrap'}>, <AxesSubplot:title={'center':'sex = Male | species = Gentoo'}>], [<AxesSubplot:title={'center':'sex = Female | species = Adelie'}, ylabel='body_mass_g'>, <AxesSubplot:title={'center':'sex = Female | species = Chinstrap'}>, <AxesSubplot:title={'center':'sex = Female | species = Gentoo'}>]], dtype=object)
g.axes[0][0].set_title("1.")
g.axes[0][1].set_title("2.")
g.figure
The entire graphic is stored in the figure attribute.
These objects are again classic matplotlib graphics and can be adapted or processed accordingly.
g.figure.suptitle("My first custom FacetGrid :-)", y=1.1)
g.figure
The advantage of FacetGrid objects is that you can create and customize your own plots quite flexibly without having to give up any of seaborn's convenient features.
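Besides going through the axes and figure attributes, a FacetGrid offers some convenience methods of its own. The following sketch shows two of them, set_axis_labels and set_titles, on a made-up DataFrame; the label texts are arbitrary examples.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch also runs without a display
import pandas as pd
import seaborn as sns

# Made-up miniature data set with two species
toy = pd.DataFrame({
    "species": ["Adelie"] * 3 + ["Gentoo"] * 2,
    "mass": [3700, 3800, 3250, 5000, 5400],
    "bill": [39.1, 39.5, 40.3, 46.1, 50.0],
})

g = sns.relplot(x="mass", y="bill", col="species", data=toy)

# seaborn's own helpers: relabel the axes and shorten the subplot titles
g.set_axis_labels("Body mass (g)", "Bill length (mm)")
g.set_titles(col_template="{col_name}")

# matplotlib route: the figure attribute is a regular matplotlib figure
g.figure.suptitle("Toy penguins", y=1.05)

print([ax.get_title() for ax in g.axes.flat])
```

Both routes can be mixed freely, since the helper methods only modify the same matplotlib objects that are reachable through axes and figure.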
figure_level plots
High-level plot functions, such as relplot, catplot or displot, mostly return a FacetGrid object.
g = sns.catplot(x="species", y="body_mass_g", hue="sex", data=df)
type(g)
seaborn.axisgrid.FacetGrid
Since FacetGrid objects serve as containers for their own axes and figure, they are hard to integrate into other graphics and should be used to create a self-contained graphic.
axes_level plots
As the name suggests, axes_level plots return a matplotlib axes object.
axes_level plots are intended to be a drop-in replacement for matplotlib functions and can be well integrated into other plots or matplotlib workflows.
data.head(3)
| | x | y |
---|---|---|
0 | 0.000 | 0.000 |
1 | 0.002 | 0.002 |
2 | 0.004 | 0.004 |
fig, axes = plt.subplots(2, 1)
axes[0].plot(data["x"], data["y"])
axes[0].set_title("Sine Curve")
sns.histplot(x=data["y"], ax=axes[1])
axes[1].set_title("Histogram of sine values")
fig.tight_layout()
fig.suptitle("Example for a combined matplotlib and seaborn plot", y=1.1)
plt.show()