Seaborn

Objective: Statistical visualizations.

A high-level API built on top of Matplotlib, with tight pandas integration.

In [1]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
In [2]:
df = sns.load_dataset("penguins")
In [3]:
df.head(3)
Out[3]:
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
0 Adelie Torgersen 39.1 18.7 181.0 3750.0 Male
1 Adelie Torgersen 39.5 17.4 186.0 3800.0 Female
2 Adelie Torgersen 40.3 18.0 195.0 3250.0 Female

High-level API:

Seaborn functions work on entire datasets and take care of many steps, such as aggregating data automatically.

Example: relplot

The relplot function is designed to visualize statistical relationships of all kinds:

In [4]:
sns.relplot(
    x="bill_length_mm", y="bill_depth_mm",
    data=df,
)
Out[4]:
<seaborn.axisgrid.FacetGrid at 0x1406d32b0>

A few extra arguments to the plotting function let you map more variables onto the plot.

Here, for example, the color of each point indicates the penguin's species:

In [5]:
sns.relplot(
    x="bill_length_mm", y="bill_depth_mm",
    hue="species",
    data=df,
)
Out[5]:
<seaborn.axisgrid.FacetGrid at 0x1407ee700>

We can also scale the size of each point by the penguin's body mass:

In [6]:
sns.relplot(
    x="bill_length_mm", y="bill_depth_mm",
    hue="species",
    size="body_mass_g",
    data=df,
)
Out[6]:
<seaborn.axisgrid.FacetGrid at 0x140873be0>

The col and row parameters split the data by categorical variables and draw one subplot per combination of levels:

In [7]:
sns.relplot(
    x="bill_length_mm", y="bill_depth_mm",
    hue="sex",
    size="body_mass_g",
    col="species",
    row="island",
    data=df,
)
Out[7]:
<seaborn.axisgrid.FacetGrid at 0x140958e80>

Continuous relationships can also be visualized using line plots (more on that later)...

Plot types:

Seaborn provides functions for different types of visualizations:

Distributions

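displot draws histograms or related distribution plots. kde=True adds a kernel density estimate on top of the histogram, while kind="kde" draws only the density curve.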

In [8]:
sns.displot(
    x="body_mass_g", col="species",
    hue="sex",
    kde=True,
    data=df
)
Out[8]:
<seaborn.axisgrid.FacetGrid at 0x1408ca9a0>
In [9]:
sns.displot(
    x="body_mass_g", col="species",
    hue="sex",
    kind="kde",
    data=df
)
Out[9]:
<seaborn.axisgrid.FacetGrid at 0x141ea0a90>

Categorical data

catplot shows how a numeric variable is distributed across the levels of a categorical variable.

In [10]:
sns.catplot(
    x="species", y="body_mass_g",
    kind="boxen",
    data=df
)
Out[10]:
<seaborn.axisgrid.FacetGrid at 0x141caab20>

It also works without a categorical variable, in which case a single distribution is shown:

In [11]:
sns.catplot(
    y="body_mass_g",
    kind="box",
    data=df
)
Out[11]:
<seaborn.axisgrid.FacetGrid at 0x141d48070>
In [12]:
sns.catplot(
    x="species", y="body_mass_g",
    kind="violin",
    data=df
)
Out[12]:
<seaborn.axisgrid.FacetGrid at 0x160135a90>
In [13]:
sns.catplot(
    x="species", y="body_mass_g",
    hue="sex",
    data=df
)
Out[13]:
<seaborn.axisgrid.FacetGrid at 0x160118730>
In [14]:
sns.catplot(
    x="species", y="body_mass_g",
    hue="sex",
    kind="swarm",
    data=df
)
Out[14]:
<seaborn.axisgrid.FacetGrid at 0x141d48a90>
In [15]:
sns.catplot(
    x="species", y="body_mass_g",
    hue="sex",
    kind="bar",
    data=df
)
Out[15]:
<seaborn.axisgrid.FacetGrid at 0x16026bfa0>
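Note that kind="bar" does not show the raw values. Each bar is an aggregate: by default the mean, with an error bar showing a confidence interval. The sketch below checks this against a pandas groupby using a hypothetical five-row table. It uses barplot, the axes-level function behind catplot(kind="bar").

```python
import pandas as pd
import seaborn as sns

# Hypothetical toy data with two categories
toy = pd.DataFrame({
    "species": ["A", "A", "B", "B", "B"],
    "mass": [3000.0, 3400.0, 5000.0, 5200.0, 5400.0],
})

# Estimator and error-bar settings are left at their defaults
ax = sns.barplot(x="species", y="mass", data=toy)

bar_heights = [p.get_height() for p in ax.patches]
group_means = toy.groupby("species")["mass"].mean().tolist()
print(bar_heights, group_means)
```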

Regression plot

lmplot fits a regression model to the data and draws the fitted line together with a confidence band for the regression.

This can be a neat way to visualize (linear) relationships in your data.

In [16]:
sns.lmplot(
    x="body_mass_g", y="bill_length_mm",
    hue="sex",
    col="species",
    data=df,
)
Out[16]:
<seaborn.axisgrid.FacetGrid at 0x160316160>
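In the linear case, the drawn line is an ordinary least-squares fit. The sketch below compares regplot, the axes-level function that lmplot draws on each facet, with numpy.polyfit on made-up data. Setting ci=None only skips the bootstrapped confidence band.

```python
import numpy as np
import seaborn as sns

rng = np.random.default_rng(2)
x = rng.uniform(3000, 6000, 50)               # hypothetical body masses
y = 0.004 * x + 25 + rng.normal(0, 1.5, 50)   # hypothetical bill lengths

ax = sns.regplot(x=x, y=y, ci=None)

# With ci=None the fitted regression line is the only Line2D on the axes
line = ax.lines[0]
line_x, line_y = line.get_xdata(), line.get_ydata()

# The same straight line from an ordinary least-squares fit with numpy
slope, intercept = np.polyfit(x, y, deg=1)
print(np.allclose(line_y, slope * line_x + intercept))
```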

Multivariate relationships

Especially in exploratory data analysis, it can be informative to combine several variables or plot types in one figure to gain a more "global" overview.

pairplot, for example, plots every numeric variable of a dataset against every other one:

In [17]:
sns.pairplot(hue="species", data=df)
Out[17]:
<seaborn.axisgrid.PairGrid at 0x16046ae80>

jointplot combines a scatter plot of two variables with their marginal distributions along the axes (histograms by default, density curves when hue is set, as here):

In [18]:
sns.jointplot(
    x="flipper_length_mm", y="bill_length_mm",
    hue="species",
    data=df
)
Out[18]:
<seaborn.axisgrid.JointGrid at 0x1609dc880>

Seaborn and pandas: Data Formats

Seaborn is designed to work with pandas DataFrames.

The whole DataFrame can be passed via the data parameter, and columns are then selected by name.

In [19]:
data = pd.DataFrame({
    "x": np.linspace(0, 20, 10000),
    "y": np.sin(np.linspace(0, 20, 10000))
})
In [20]:
sns.lineplot(x="x", y="y", data=data)
Out[20]:
<AxesSubplot:xlabel='x', ylabel='y'>

However, seaborn also accepts other data types, such as NumPy arrays:

In [21]:
x = np.linspace(0, 20, 10000)
y = np.sin(x)

sns.lineplot(x=x, y=y)
Out[21]:
<AxesSubplot:>
In [22]:
sns.histplot(y)
Out[22]:
<AxesSubplot:ylabel='Count'>

etc.

However, this way you lose many of the helpful features of the DataFrame integration, most notably automatic axis labeling.

DataFrames: Long- vs. Wide-form

DataFrames can hold data in different formats. In long-form data, each variable has its own column and each observation its own row.

In wide-form data, which looks more like a traditional spreadsheet, the rows and columns hold the levels of two variables and the cells hold the values of a third.
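Converting between the two forms takes one pandas call in each direction. pivot turns long-form into wide-form, and melt turns it back. The sketch below uses a tiny hand-written table shaped like the flights data; the numbers are illustrative only.

```python
import pandas as pd

# Long form: one row per observation, one column per variable
long_df = pd.DataFrame({
    "year": [1949, 1949, 1950, 1950],
    "month": ["Jan", "Feb", "Jan", "Feb"],
    "passengers": [112, 118, 115, 126],
})

# Wide form: years as rows, months as columns, passenger counts in the cells
wide_df = long_df.pivot(index="year", columns="month", values="passengers")

# And back to long form
back = wide_df.reset_index().melt(id_vars="year", var_name="month",
                                  value_name="passengers")
print(wide_df)
print(back)
```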

Seaborn handles long-form data best:

In [23]:
flights = sns.load_dataset("flights")
flights.head()
Out[23]:
year month passengers
0 1949 Jan 112
1 1949 Feb 118
2 1949 Mar 132
3 1949 Apr 129
4 1949 May 121

With long-form data, most plotting functions aggregate and prepare the data automatically.

For example, here the monthly passenger numbers are aggregated per year. The line shows the mean over the months of each year, and the shaded band shows a confidence interval for that mean:

In [24]:
sns.lineplot(x="year", y="passengers", data=flights)
Out[24]:
<AxesSubplot:xlabel='year', ylabel='passengers'>
In [25]:
sns.lineplot(x="year", y="passengers", hue="month", data=flights)
Out[25]:
<AxesSubplot:xlabel='year', ylabel='passengers'>

The same mechanism also works the other way round, aggregating over the years for each month:

In [26]:
sns.lineplot(x="month", y="passengers", data=flights)
Out[26]:
<AxesSubplot:xlabel='month', ylabel='passengers'>
In [27]:
sns.lineplot(x="month", y="passengers", hue="year", data=flights)
Out[27]:
<AxesSubplot:xlabel='month', ylabel='passengers'>

Messy Data:

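Some datasets come in more complex formats. For example, different hierarchical levels may be mixed in one table. In the following dataset, metadata about each play (year, author, title, genre, ...) sits next to about two hundred word-frequency columns.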

In [28]:
freqs = pd.read_csv("freqs-engl.txt", sep="\t")
freqs.head()
Out[28]:
ID year author title subgenre genre form word_types size the ... many against faith put about leaue might brother friend none
0 1 1607 Beaumont_Francis Knight_of_the_Burning_Pestle Burlesque_Romance comedy unknown 3223 21105 2.629709 ... 0.061597 0.037906 0.080550 0.061597 0.080550 0.042644 0.028429 0.018953 0.075811 0.052120
1 2 1641 Brome_Richard Jovial_Crew comedy comedy mixed 3576 24190 3.137660 ... 0.062009 0.033072 0.008268 0.057875 0.037205 0.000000 0.045473 0.008268 0.095081 0.062009
2 3 1601 Chapman_George All_Fools comedy comedy unknown 3475 19143 2.606697 ... 0.041791 0.078358 0.026119 0.062686 0.073134 0.041791 0.062686 0.130596 0.073134 0.031343
3 4 1596 Chapman_George Blind_Beggar_of_Alexandria comedy comedy unknown 2352 13140 2.838661 ... 0.022831 0.053272 0.015221 0.083714 0.045662 0.129376 0.060883 0.038052 0.015221 0.053272
4 5 1604 Chapman_George Bussy_DAmbois Foreign_History history unknown 3695 19781 3.184874 ... 0.050554 0.070775 0.035387 0.050554 0.045498 0.070775 0.065720 0.030332 0.096052 0.045498

5 rows × 209 columns

Example: Comparing the frequencies of you and thou for tragedies and comedies.

To generate a histogram of the frequencies of the two words for both genres, we need to convert the data into long-form using the .melt method of DataFrames.

In [29]:
plot_df = freqs.query("genre == 'tragedy' or genre == 'comedy'").melt(
    id_vars=["genre", "title", "year"],
    value_vars=["you", "thou"],
    var_name="token",
    value_name="freq"
)
plot_df
Out[29]:
genre title year token freq
0 comedy Knight_of_the_Burning_Pestle 1607 you 2.018479
1 comedy Jovial_Crew 1641 you 2.414221
2 comedy All_Fools 1601 you 2.021627
3 comedy Blind_Beggar_of_Alexandria 1596 you 1.993912
4 tragedy Byrons_Conspiracy 1608 you 1.246883
... ... ... ... ... ...
253 tragedy Duchess_of_Malfi 1614 thou 0.383885
254 tragedy White_Devil 1612 thou 0.365200
255 comedy Cobblers_Prophecy 1590 thou 0.948917
256 comedy Three_Ladies_of_London 1581 thou 1.115419
257 comedy Three_Lords_and_Three_Ladies 1590 thou 0.662318

258 rows × 5 columns

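Since this transformation drops data (the query removes all other genres, and melt drops every column not listed in id_vars or value_vars), it is best to store the result in a new DataFrame (here plot_df) instead of overwriting freqs.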
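To see exactly what melt keeps and what it drops, here is a sketch on a hypothetical two-row table in the same layout as freqs. The titles and numbers are made up.

```python
import pandas as pd

wide = pd.DataFrame({
    "genre": ["comedy", "tragedy"],
    "title": ["Play_A", "Play_B"],   # hypothetical titles
    "year": [1600, 1610],
    "you": [2.0, 1.2],
    "thou": [0.9, 0.4],
    "the": [3.1, 2.9],               # not listed below, so melt drops it
})

long = wide.melt(
    id_vars=["genre", "title", "year"],  # repeated on every output row
    value_vars=["you", "thou"],          # stacked into token/freq pairs
    var_name="token",
    value_name="freq",
)
print(long)
```

Each input row turns into one output row per entry in value_vars, so two plays and two tokens give four rows.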

In [30]:
sns.displot(
    x="freq",
    hue="token",
    col="genre",
    kde=True,
    data=plot_df
)
Out[30]:
<seaborn.axisgrid.FacetGrid at 0x160cbb670>

Matplotlib as Seaborn's backend and further customization options

seaborn uses matplotlib as a backend framework to create the plots.

This means seaborn plots can be extended and customized with matplotlib.

However, this is not always necessary, because seaborn itself also provides functions for customizing its plots.

To do so, you have to distinguish between two types of plot functions:

  • axes-level plots
  • figure-level plots

Axes-level plots return a matplotlib Axes object containing the plot, while figure-level plots return a FacetGrid object containing the plot (pairplot and jointplot return the related PairGrid and JointGrid objects).

FacetGrid

FacetGrid objects are containers that seaborn uses to bundle one or more subplots together with the data they are drawn from.

In [31]:
df.head(3)
Out[31]:
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
0 Adelie Torgersen 39.1 18.7 181.0 3750.0 Male
1 Adelie Torgersen 39.5 17.4 186.0 3800.0 Female
2 Adelie Torgersen 40.3 18.0 195.0 3250.0 Female
In [32]:
g = sns.FacetGrid(df)

The col, row and hue parameters assign the columns, rows and colors of a FacetGrid to variables from the dataset.

In [33]:
g = sns.FacetGrid(df, col="species", row="sex", hue="island")

The .map method of FacetGrid applies a plotting function to each subplot of the grid, together with that subplot's subset of the data.

In [34]:
g.map(sns.scatterplot, "body_mass_g", "bill_length_mm")
g.add_legend()
g.figure
/opt/homebrew/Caskroom/miniforge/base/envs/python_intro/lib/python3.9/site-packages/seaborn/axisgrid.py:703: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
  plot_args = [v for k, v in plot_data.iteritems()]
Out[34]:

Some seaborn plotting functions expect a DataFrame via the data parameter, with columns referenced by name (e.g. y="body_mass_g"). To apply such functions to a FacetGrid, use the .map_dataframe method, which passes each facet's data subset as data.

In [35]:
g = sns.FacetGrid(df, col="species", row="sex", hue="island")

g.map_dataframe(sns.swarmplot, y="body_mass_g")
g.add_legend()
g.figure
Out[35]:
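map_dataframe also works with your own functions. They must accept a data keyword argument, plus **kwargs, because FacetGrid also passes arguments such as color. The sketch below uses a hypothetical helper, annotate_n, which writes the number of rows into each facet.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

toy = pd.DataFrame({"group": ["a", "a", "b", "b", "b"],
                    "value": [1.0, 2.0, 3.0, 4.0, 5.0]})

def annotate_n(data, **kwargs):
    """Hypothetical helper: write the facet's row count into its axes."""
    ax = plt.gca()  # FacetGrid makes the current facet the active axes
    ax.text(0.5, 0.5, f"n={len(data)}", transform=ax.transAxes, ha="center")

g = sns.FacetGrid(toy, col="group")
g.map_dataframe(annotate_n)
print([ax.texts[0].get_text() for ax in g.axes.flat])
```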

A FacetGrid stores its subplots in the axes attribute, a 2-D NumPy array of matplotlib Axes objects.

In [36]:
g.axes
Out[36]:
array([[<AxesSubplot:title={'center':'sex = Male | species = Adelie'}, ylabel='body_mass_g'>,
        <AxesSubplot:title={'center':'sex = Male | species = Chinstrap'}>,
        <AxesSubplot:title={'center':'sex = Male | species = Gentoo'}>],
       [<AxesSubplot:title={'center':'sex = Female | species = Adelie'}, ylabel='body_mass_g'>,
        <AxesSubplot:title={'center':'sex = Female | species = Chinstrap'}>,
        <AxesSubplot:title={'center':'sex = Female | species = Gentoo'}>]],
      dtype=object)
In [37]:
g.axes[0][0].set_title("1.")
g.axes[0][1].set_title("2.")
g.figure
Out[37]:

The entire graphic is stored in the figure attribute.

Both the axes and the figure are ordinary matplotlib objects and can be customized or processed with matplotlib as usual.

In [38]:
g.figure.suptitle("My first custom FacetGrid :-)", y=1.1)
g.figure
Out[38]:

The advantage of FacetGrid is that you can build and customize your own multi-panel plots flexibly without giving up any of seaborn's convenient features.

Figure-level plots

High-level (figure-level) plot functions such as relplot, catplot or displot return a FacetGrid object.

In [39]:
g = sns.catplot(x="species", y="body_mass_g", hue="sex", data=df)
In [40]:
type(g)
Out[40]:
seaborn.axisgrid.FacetGrid

Because a FacetGrid acts as a container for its own figure and axes, it is hard to embed in other figures. Figure-level functions are therefore best used to create a self-contained graphic.

Axes-level plots

As the name suggests, axes-level plots return a matplotlib Axes object. They are intended as drop-in replacements for matplotlib functions. Because they can draw onto an existing Axes via the ax parameter, they integrate well into other plots and matplotlib workflows.

In [41]:
data.head(3)
Out[41]:
x y
0 0.000 0.000
1 0.002 0.002
2 0.004 0.004
In [42]:
fig, axes = plt.subplots(2, 1)                # one figure with two stacked Axes
axes[0].plot(data["x"], data["y"])            # plain matplotlib on the first Axes
axes[0].set_title("Sine Curve")
sns.histplot(x=data["y"], ax=axes[1])         # axes-level seaborn plot on the second Axes
axes[1].set_title("Histogram of sine values")
fig.tight_layout()
fig.suptitle("Example for a combined matplotlib and seaborn plot", y=1.1)
plt.show()