After this part you'll know about:
def mean(values):
    return sum(values) / len(values)
mean([1,2,3])
2.0
def signifies the start of a function definition.
def is followed by the function name (must be unique within the current program).
# A function with two arguments
def concat_strings(string_1, string_2):
    return string_1 + string_2
# A function with no arguments
def smile():
    return "😊"
print(concat_strings("Hello ", "World"))
print(smile())
Hello World
😊
def print_greeting(name):
    print(f"Hello {name}")
type(print_greeting("Lennart"))
Hello Lennart
NoneType
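The NoneType result comes from the missing return statement: print_greeting only prints, so the call itself evaluates to None. A minimal sketch of the difference, where build_greeting is a hypothetical variant introduced only for illustration:

```python
def print_greeting(name):
    # Only prints; without a return statement the call evaluates to None
    print(f"Hello {name}")


def build_greeting(name):
    # Hypothetical variant: returns the string so the caller can reuse it
    return f"Hello {name}"


printed = print_greeting("Lennart")
built = build_greeting("Lennart")

print(printed is None)  # the printing function handed nothing back
print(built.upper())    # the returned string can be processed further
```

If you want to keep working with a result, return it; printing is only for showing it.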
Inside the function's body, you can create new variables which only live in this local scope:
%%script python --no-raise-error
def concat_strings(string_1, string_2):
    concatenated_string = string_1 + string_2
    return concatenated_string
print(concat_strings("Hello ", "World"))
print(concatenated_string)
Hello World
Traceback (most recent call last):
  File "<stdin>", line 6, in <module>
NameError: name 'concatenated_string' is not defined
Note: It is possible to go the other way round and access outer-scope, global variables from inside a function, but you should never do this unless you have a good reason to do so. In all other cases, always pass all required data explicitly as arguments!
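To see why relying on global variables is fragile, here is a small sketch. The names factor, scale_global and scale_explicit are made up for this illustration:

```python
factor = 2  # global variable (hypothetical example)


def scale_global(values):
    # Reads the global variable: the result depends on state outside the function
    return [value * factor for value in values]


def scale_explicit(values, multiplier):
    # Receives everything it needs as arguments
    return [value * multiplier for value in values]


print(scale_global([1, 2, 3]))
factor = 10  # somebody changes the global somewhere else in the program
print(scale_global([1, 2, 3]))  # same call, different result
print(scale_explicit([1, 2, 3], multiplier=2))  # depends only on its arguments
```

The explicit version is easier to test and to reuse, because everything that influences its result is visible in the call.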
Functions are objects that can be passed around like any other value.
For example, this lets you pass a function to another function:
authors = ["Goethe", "Kracht", "Fontane", "Schiller", "Hahn"]
print(sorted(authors))
print(len("Goethe"))
print(sorted(authors, key=len))
['Fontane', 'Goethe', 'Hahn', 'Kracht', 'Schiller']
6
['Hahn', 'Goethe', 'Kracht', 'Fontane', 'Schiller']
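The key argument is not limited to built-in functions such as len. As a sketch, you could pass a function you wrote yourself; last_letter is a hypothetical helper defined only for this example:

```python
authors = ["Goethe", "Kracht", "Fontane", "Schiller", "Hahn"]


def last_letter(name):
    # Returns the final character of the name, which sorted() uses for comparison
    return name[-1]


# Note: we pass the function itself (no parentheses), not the result of calling it
sorted_by_last_letter = sorted(authors, key=last_letter)
print(sorted_by_last_letter)
```

Names with the same last letter keep their original relative order, because sorted is stable.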
Sometimes you'll end up with a function whose argument(s) usually have the same value but occasionally need to change.
To avoid passing the same value every time, you can define default values for these arguments.
def product(values, start=1):
    product = start
    for value in values:
        product = product * value
    return product
print(product([1, 2, 3]))
print(product([1, 2, 3], start=2))
6
12
You can usually use the argument_name=value syntax to make explicit which value is assigned to which argument.
print(product(values=[1,2,3], start=1))
6
If you do this for all arguments, you can also rearrange the order of the arguments.
Otherwise, their mapping to the arguments is inferred by order of values.
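A short sketch of both calling styles, using a product function like the one above (the local variable is named result here only to avoid reusing the function's own name):

```python
def product(values, start=1):
    result = start
    for value in values:
        result = result * value
    return result


# All arguments named: the order no longer matters
by_keyword = product(start=2, values=[1, 2, 3])

# No names given: values are mapped to the arguments by position
by_position = product([1, 2, 3], 2)

print(by_keyword, by_position)
```

You can also mix both styles, as long as all positional values come before the named ones.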
When processing data, you often want to apply an operation to all elements in a list, tuple, etc.
len_names = []
for name in authors:
    len_names.append(len(name))
print(len_names)
[6, 6, 7, 8, 4]
Using a list expression (commonly called a list comprehension), you can shorten these recurring code blocks into one elegant expression:
len_names = [len(name) for name in authors]
print(len_names)
[6, 6, 7, 8, 4]
" ".join([token.capitalize() for token in "my thesis title".split()])
'My Thesis Title'
Doing so has several benefits:
You can also use a trailing if statement to filter out any values that do not satisfy a condition.
even_numbers = [number for number in range(1, 26) if number % 2 == 0]
print(even_numbers)
[2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24]
If-else statements in list expressions
Using the short inline form of if-else, it is also possible to build more complex forms of expressions:
even_numbers_str = [
    f"{number} is even" if number % 2 == 0 else f"{number} is odd"
    for number in range(1, 26)
]
print(even_numbers_str)
['1 is odd', '2 is even', '3 is odd', '4 is even', '5 is odd', '6 is even', '7 is odd', '8 is even', '9 is odd', '10 is even', '11 is odd', '12 is even', '13 is odd', '14 is even', '15 is odd', '16 is even', '17 is odd', '18 is even', '19 is odd', '20 is even', '21 is odd', '22 is even', '23 is odd', '24 is even', '25 is odd']
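The two forms of if are easy to mix up, because they sit at different positions and do different jobs. A minimal sketch with made-up variable names:

```python
numbers = list(range(1, 7))

# Trailing if: filters, so the result can be shorter than the input
evens = [n for n in numbers if n % 2 == 0]

# Inline if-else: transforms, so every input element yields one output element
labels = ["even" if n % 2 == 0 else "odd" for n in numbers]

# Both can be combined: the trailing if filters, the inline if-else picks the value
sizes_of_evens = ["small" if n < 4 else "large" for n in numbers if n % 2 == 0]

print(evens)
print(labels)
print(sizes_of_evens)
```

As a rule of thumb: the trailing if decides whether an element appears at all, the inline if-else decides what it looks like.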
Objects store data together with the functions that operate on that data.
text = "OOP can be helpful" # <- Data: Sequence of characters
text.upper() # <- `Function` (actually it is called method)
# that operates on the data of the string in variable `text`
'OOP CAN BE HELPFUL'
We already learned about Python's most straightforward way to bundle data, the dictionary:
my_dataset = {
    "excavation_ids": ["south_1", "south_2", "south_3", "north_1", "north_2", "east_1", "east_2"],
    "num_artifacts": [1, 3, 4, 10, 9, 2, 5]
}
Now suppose we want to check in which direction we find the most artifacts on average:
def mean_by_direction(excavation_ids, num_artifacts):
    directions = [excavation_id.split("_")[0] for excavation_id in excavation_ids]
    results = {}
    for uniq_direction in set(directions):
        aggregated_num_findings = [
            n_artifacts
            for excavation_id, n_artifacts in zip(excavation_ids, num_artifacts)
            if excavation_id.startswith(uniq_direction)
        ]
        results[uniq_direction] = mean(aggregated_num_findings)
    return results
mean_by_direction(my_dataset["excavation_ids"], my_dataset["num_artifacts"])
{'south': 2.6666666666666665, 'north': 9.5, 'east': 3.5}
This function is specifically tailored to the dataset, so it wouldn't make much sense to reuse it in other scenarios, but it would be cool to have it every time you interact with the dataset.
But, since we can also pass a function around as we do with other variables, we can also store the function in the dictionary:
excavation_dataset = {
    "excavation_ids": ["south_1", "south_2", "south_3", "north_1", "north_2", "east_1", "east_2"],
    "num_artifacts": [1, 3, 4, 10, 9, 2, 5],
    "func_mean_by_direction": mean_by_direction
}
So now, we can access the function via the dataset-dict.
excavation_dataset["func_mean_by_direction"](
    excavation_dataset["excavation_ids"],
    excavation_dataset["num_artifacts"]
)
{'south': 2.6666666666666665, 'north': 9.5, 'east': 3.5}
Even though it is - in theory - possible to bundle data and functions this way, it is messy, clunky and overly complicated!
-> Never do it this way
This is one of many problem settings where object-oriented programming shines.
Let's create a custom class representing our dataset:
class ExcavationDataset:
    # Constructor of the class, defines how an object is created.
    # Its main purpose is to define the way data is stored within the object.
    def __init__(self, excavation_ids, num_artifacts):
        # The self argument in each method represents the object itself.
        # It is used to access the attributes and methods of the object.
        # When you call a method, you don't need to care about the self argument;
        # Python will set it automatically.
        # Each following argument is exposed to the outside and must be set when calling the method.
        # By assigning fields to self you can create attributes
        # (i.e., internal variables that store data within an object).
        self.excavation_ids = excavation_ids
        self.num_artifacts = num_artifacts

    # Besides special methods such as the __init__ constructor method, you can also define custom methods.
    # The mean_by_direction method only operates on the internal data of the object,
    # so it does not receive any external arguments.
    def mean_by_direction(self):
        directions = [
            excavation_id.split("_")[0]
            # Using the self argument you can access attributes and other methods of the object
            for excavation_id in self.excavation_ids
        ]
        results = {}
        for uniq_direction in set(directions):
            aggregated_num_findings = [
                n_artifacts
                for excavation_id, n_artifacts in zip(self.excavation_ids, self.num_artifacts)
                if excavation_id.startswith(uniq_direction)
            ]
            results[uniq_direction] = mean(aggregated_num_findings)
        return results
Now let's create an object of this class, which stores our data:
dataset = ExcavationDataset(
    excavation_ids=["south_1", "south_2", "south_3", "north_1", "north_2", "east_1", "east_2"],
    num_artifacts=[1, 3, 4, 10, 9, 2, 5]
)
type(dataset)
__main__.ExcavationDataset
Similar to dicts, we can access the contents of an object, here by using dot notation:
dataset.excavation_ids
['south_1', 'south_2', 'south_3', 'north_1', 'north_2', 'east_1', 'east_2']
dataset.num_artifacts
[1, 3, 4, 10, 9, 2, 5]
And you can call your custom methods the same way you'd call methods of other objects (e.g., strings, dicts, ...):
dataset.mean_by_direction()
{'south': 2.6666666666666665, 'north': 9.5, 'east': 3.5}
Class: Definition of how to structure and create objects of the same kind (like a blueprint).
Object: Concrete manifestation of a class. Objects contain concrete data. Multiple objects of the same class can be instantiated.
Method: Function that is bound to an object and can access its data (via the self argument).
Attribute: Data of an object.
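To connect these four terms to actual code, here is a deliberately tiny sketch. The Counter class is a hypothetical example and not part of the excavation dataset:

```python
class Counter:  # class: the blueprint
    def __init__(self, start=0):
        self.count = start  # attribute: data stored in the object

    def increment(self):  # method: function bound to the object
        self.count += 1
        return self.count


# Two objects of the same class, each with its own data
first = Counter()
second = Counter(start=10)

first.increment()
print(first.count, second.count)
```

Incrementing first leaves second untouched, because every object carries its own attributes.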
Classes can inherit properties (e.g. methods and attributes) from other classes.
If class B inherits properties from class A, class A is called the superclass of B. Vice versa, class B is called a subclass (or child class) of A.
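The relationship between the two classes can be sketched in a few lines. A and B are placeholder classes; issubclass and isinstance are Python built-ins for checking the relationship:

```python
class A:
    def greet(self):
        return "Hello from A"


class B(A):  # B inherits from A
    pass     # B defines nothing itself


b = B()
print(b.greet())          # the method is inherited from A
print(issubclass(B, A))   # B is a subclass of A
print(isinstance(b, A))   # an object of B also counts as an object of A
```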
Most Python libraries are structured using the OOP paradigm, forcing you to interact with objects. Knowing their structure and rules, you can work more efficiently and concentrate on the things that matter most - the data and your research question(s).
If you want to customize things or even newly implement your own ideas, chances are high that you are required to do this in an OOP manner. So you'll need a thorough understanding of classes and the basics of inheritance.
Additionally, structuring your code in the OOP paradigm often enables you to store your data in semantically structured objects. Doing so not only spares you hatred from your collaborators but also from your future self since it makes your code much more accessible and understandable.
class RegressionModel:
    def __init__(self, learning_rate=1.0, n_iterations=100):
        self.learning_rate = learning_rate
        self.n_iterations = n_iterations

    def predict(self, x):
        y_pred = [self.weight_ * xi + self.bias_ for xi in x]
        return y_pred

    def fit(self, x, y):
        self.weight_, self.bias_ = sum(x) / len(x), 0.0
        for iteration in range(self.n_iterations):
            weight_gradient, bias_gradient = 0.0, 0.0
            for xi, yi in zip(x, y):
                weight_gradient += xi * (self.predict([xi])[0] - yi)
                bias_gradient += self.predict([xi])[0] - yi
            self.weight_ -= self.learning_rate * (weight_gradient / len(x))
            self.bias_ -= self.learning_rate * (bias_gradient / len(x))
        return self
Now you can easily:
import matplotlib.pyplot as plt
x, y = [i/100 for i in range(1, 101, 5)], [3-((i/100)) for i in range(1, 101, 5)]
regressor = RegressionModel().fit(x, y)
print(f"Learned weight: {regressor.weight_} and bias: {regressor.bias_}")
x_pred = [i/100 for i in range(-25, 126, 5)]
y_pred = regressor.predict(x_pred)
plt.scatter(x, y)
plt.plot(x_pred, y_pred, color="red")
plt.show()
Learned weight: -0.99751188719238 and bias: 2.998707444486552
import pickle
# Save model to disk
with open("my_model.pickle", "wb") as out_f:
    pickle.dump(regressor, out_f)
# Delete the model from memory
del regressor
# Load it from disk
with open("my_model.pickle", "rb") as in_f:
    regressor = pickle.load(in_f)
print(f"Learned weight: {regressor.weight_} and bias: {regressor.bias_}")
Learned weight: -0.99751188719238 and bias: 2.998707444486552
As you can see in the example above, OOP enables you to bundle some parameters (weights and bias) and the routines that describe how to apply these parameters to the data in one single location. This allows for easy access and interoperability.
An essential pillar of Object-Oriented Programming is the concept of inheritance. Inheritance enables classes (and, therefore, objects) to inherit attributes and methods from other types (=classes). Using this mechanism, you can derive subclasses from another class and only update parts incompatible with its new purpose.
Now we also want to store the depth at which the artifacts were found.
# Recall the ExcavationDataset
class ExcavationDataset:
    def __init__(self, excavation_ids, num_artifacts):
        self.excavation_ids = excavation_ids
        self.num_artifacts = num_artifacts

    def mean_by_direction(self):
        directions = [
            excavation_id.split("_")[0]
            for excavation_id in self.excavation_ids
        ]
        results = {}
        for uniq_direction in set(directions):
            aggregated_num_findings = [
                n_artifacts
                for excavation_id, n_artifacts in zip(self.excavation_ids, self.num_artifacts)
                if excavation_id.startswith(uniq_direction)
            ]
            results[uniq_direction] = mean(aggregated_num_findings)
        return results

# We state the class we want to inherit in round brackets after the class name.
class DepthExcavationDataset(ExcavationDataset):
    def __init__(self, excavation_ids, num_artifacts, depth_of_artifacts):
        # We can leverage the constructor of our superclass
        # since it already defines how to handle the values of excavation_ids and num_artifacts.
        # Calling the super function within a method returns the superclass of the current class.
        super().__init__(excavation_ids=excavation_ids, num_artifacts=num_artifacts)
        # We only define how to handle the new arguments.
        self.depth_of_artifacts = depth_of_artifacts

    # Now, we can add new methods to the subclass
    def count_artifacts_depth_range(self, min_depth, max_depth):
        return sum([
            n_artifacts
            for n_artifacts, depth in zip(self.num_artifacts, self.depth_of_artifacts)
            if min_depth <= depth <= max_depth
        ])

dataset = DepthExcavationDataset(
    excavation_ids=["south_1", "south_2", "south_3", "north_1", "north_2", "east_1", "east_2"],
    num_artifacts=[1, 3, 4, 10, 9, 2, 5],
    depth_of_artifacts=[0.3, 0.2, 1.0, 0.1, 0.2, 0.5, 0.5]
)
Now we can use the new method(s):
dataset.count_artifacts_depth_range(0.1, 0.4)
23
And all those from the superclass:
dataset.mean_by_direction()
{'south': 2.6666666666666665, 'north': 9.5, 'east': 3.5}
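Besides adding new methods, a subclass can also replace (override) a method of its superclass to update a part that does not fit its new purpose. The following is only a sketch: RoundedExcavationDataset is a hypothetical subclass, and the base class is a condensed copy of ExcavationDataset so that the example runs on its own.

```python
def mean(values):
    return sum(values) / len(values)


class ExcavationDataset:
    # Condensed copy of the class from above, so this sketch is self-contained
    def __init__(self, excavation_ids, num_artifacts):
        self.excavation_ids = excavation_ids
        self.num_artifacts = num_artifacts

    def mean_by_direction(self):
        directions = [eid.split("_")[0] for eid in self.excavation_ids]
        results = {}
        for direction in set(directions):
            results[direction] = mean([
                n for eid, n in zip(self.excavation_ids, self.num_artifacts)
                if eid.startswith(direction)
            ])
        return results


class RoundedExcavationDataset(ExcavationDataset):  # hypothetical subclass
    # Same method name as in the superclass: this version is used instead
    def mean_by_direction(self):
        # Reuse the superclass implementation and only change the rounding
        raw_means = super().mean_by_direction()
        return {direction: round(value, 2) for direction, value in raw_means.items()}


rounded_dataset = RoundedExcavationDataset(
    excavation_ids=["south_1", "south_2", "south_3", "north_1", "north_2"],
    num_artifacts=[1, 3, 4, 10, 9],
)
rounded_means = rounded_dataset.mean_by_direction()
print(rounded_means)
```

Calling super().mean_by_direction() keeps the original logic in one place, so a later fix in the superclass automatically reaches the subclass as well.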
Functions and OOP aim at making your code interoperable and reusable! Still, one mandatory ingredient is missing to ensure that your future self and your colleagues understand what your classes and functions do: Documentation.
from typing import List
from string import punctuation
def tokenize(text: str) -> List:
    """Tokenizes a text and strips all punctuation marks from single tokens.
    Args:
        text: The text to tokenize
    Returns:
        The list of tokens
    """
    tokens = [token.strip(punctuation) for token in text.split() if token.strip()]
    return tokens
print(help(tokenize))
print(tokenize("""
My dear friend,
Documenting my code is a hardly bearable chore....
But I will pull myself together for the noble cause!!!
Yours truly,
A responsible programmer
"""))
Help on function tokenize in module __main__:

tokenize(text: str) -> List
    Tokenizes a text and strips all punctuation marks from single tokens.
    Args:
        text: The text to tokenize
    Returns:
        The list of tokens

None
['My', 'dear', 'friend', 'Documenting', 'my', 'code', 'is', 'a', 'hardly', 'bearable', 'chore', 'But', 'I', 'will', 'pull', 'myself', 'together', 'for', 'the', 'noble', 'cause', 'Yours', 'truly', 'A', 'responsible', 'programmer']
In the signature of tokenize, the type of the argument follows the colon (text: str) and the type of the return value follows the -> arrow.
Luckily, methods within a class can be documented like functions. Additionally, you can add further documentation to describe the class as a whole.
from typing import List, Dict
class ExcavationDataset:
    """This class represents a dataset describing a set of excavation sites.
    Attributes:
        excavation_ids: ID of each individual excavation site of format "<direction>_<num>"
        num_artifacts: The number of found artifacts at each site.
    """
    def __init__(self, excavation_ids: List[str], num_artifacts: List[int]):
        """Constructor of the class.
        Attributes:
            excavation_ids: See class description
            num_artifacts: See class description
        """
        self.excavation_ids = excavation_ids
        self.num_artifacts = num_artifacts

    def mean_by_direction(self) -> Dict[str, float]:
        """Returns the mean number of found artifacts per direction.
        """
        directions = [
            excavation_id.split("_")[0]
            for excavation_id in self.excavation_ids
        ]
        results = {}
        for uniq_direction in set(directions):
            aggregated_num_findings = [
                n_artifacts
                for excavation_id, n_artifacts in zip(self.excavation_ids, self.num_artifacts)
                if excavation_id.startswith(uniq_direction)
            ]
            results[uniq_direction] = mean(aggregated_num_findings)
        return results
help(ExcavationDataset)
Help on class ExcavationDataset in module __main__:

class ExcavationDataset(builtins.object)
 |  ExcavationDataset(excavation_ids: List[str], num_artifacts: List[int])
 |
 |  This class represents a dataset describing a set of excavation sites.
 |  Attributes:
 |      excavation_ids: ID of each individual excavation site of format "<direction>_<num>"
 |      num_artifacts: The number of found artifacts at each site.
 |
 |  Methods defined here:
 |
 |  __init__(self, excavation_ids: List[str], num_artifacts: List[int])
 |      Constructor of the class.
 |      Attributes:
 |          excavation_ids: See class description
 |          num_artifacts: See class description
 |
 |  mean_by_direction(self) -> Dict[str, float]
 |      Returns the mean number of found artifacts per direction.
 |
 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
 |
 |  __dict__
 |      dictionary for instance variables (if defined)
 |
 |  __weakref__
 |      list of weak references to the object (if defined)
Python's typing module provides a large set of prebuilt type-annotations. We won't go through that in detail. If you want to read up on this, you can start with this tutorial.
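To give a first impression, here is a small sketch that combines a few annotations from the typing module (List, Dict, Optional and Tuple). The function most_common_token is a hypothetical example written for this illustration:

```python
from typing import Dict, List, Optional, Tuple


def most_common_token(tokens: List[str]) -> Optional[Tuple[str, int]]:
    """Returns the most frequent token together with its count.

    Args:
        tokens: The tokens to count
    Returns:
        A (token, count) tuple, or None if tokens is empty
    """
    if not tokens:
        return None
    counts: Dict[str, int] = {}
    for token in tokens:
        counts[token] = counts.get(token, 0) + 1
    # max() with key=counts.get picks the token with the highest count
    best_token = max(counts, key=counts.get)
    return best_token, counts[best_token]


print(most_common_token(["dig", "find", "dig"]))
print(most_common_token([]))
```

Optional[...] tells the reader that the function may return None, which is easy to overlook without the annotation. Keep in mind that Python does not enforce these annotations at runtime; they serve as documentation for readers and tools.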