Python Advanced – Course

Yoan Mollard

CHAPTER 1

  1. ADVANCED PROGRAMMING TECHNIQUES
    1.1. Python types
    1.2. Type annotations
    1.3. Complexity and the Big-O notation
    1.4. Iterators & generators
  2. CHARACTERISTICS AND PARADIGMS OF PYTHON
    2.1. Object-oriented programming (OOP)
    2.2. Metaclasses
    2.3. Functional programming
    2.4. Decorators
    2.5. Context manager: the with statement

CHAPTER 2

  3. CODE WITH QUALITY
    3.1. Logging
    3.2. Virtual environments (venv)
    3.3. Python Enhancement Proposals (PEPs)
    3.4. Quality control tools
    3.5. Testing
  4. PACKAGE AND DISTRIBUTE
    4.1. Reminders about Modules and packages
    4.2. The Python Package Index (PyPI.org)
    4.3. Package distribution
    4.4. Uploading your package distribution on PyPI

CHAPTER 3

  5. TOOLING FOR PERFORMANCE
    5.1. Profiling
    5.2. Python interpreters and compiling strategies
    5.3. Common design patterns
    5.4. Python protocols

  6. PARALLELIZE PYTHON APPLICATIONS
    6.1. Multithreading and multiprocessing
    6.2. Asynchronous programming (Python coroutines)

  7. SPECIALIZED FRAMEWORKS

List of mini-projects

  1. Stock investment: Complexity analysis, iterators, multiprocessing
  2. Money transfer simulator: Package and project lifecycle
  3. Breadth-First Search in a graph: Time optimization
  4. Code breaker: Time optimization
  5. Chess master: Asynchronous programming
  6. Virus spread simulator: Magic methods, operator overloading, iterators, protocols
  7. Personal project: Using specialized frameworks

CHAPTER 1

ADVANCED PROGRAMMING TECHNIQUES

(Including reminders)

Python types

Python typing is dynamic: the type is inferred from the value ➡️ runtime type

Runtime type of v can be introspected with type(v)

But pythonistas rely on 🦆 duck typing: to judge the suitability of an object obj, the runtime type type(obj) matters less than the methods it declares

Example: As soon as the method __iter__ exists in class C, instances of C are considered iterable, no matter what type(C) returns.

Primitive types

i = 9999999999999999999999999                   # int (unbounded)
f = 1.0                                         # float
b = True                                        # bool
n = None                                        # NoneType (NULL)

🚨 Beware with floats

Python floats are IEEE 754 floats, whose binary rounding makes some decimal arithmetic inexact

0.1 + 0.1 + 0.1 - 0.3 == 0    # This is False 😿
print(0.1 + 0.1 + 0.1 - 0.3)  # Prints 5.551115123125783e-17, not 0

Also, they cannot handle large differences of magnitude:

1e-10 + 1e10 == 1e10          # This is True 😿

👀 If precision counts, multiply floats by 10**k to work with ints, or use Decimal:

from decimal import Decimal
Decimal("1e-10") + Decimal("1e10") == Decimal("1e10")   # This is False 🎉

⚠️ Don't init Decimal with floats: Decimal(0.1) == Decimal('0.1000000000000000055511151231257827021181583404541015625')
⚠️ Decimal is slower than int arithmetic with the 10**k trick, especially for * and /
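The 10**k trick mentioned above can be sketched as follows (here k = 1, i.e. all values are stored in tenths; the variable names are illustrative):

```python
# Represent 0.1 and 0.3 as the ints 1 and 3 (scaled by 10**1): int arithmetic is exact
a = 1   # stands for 0.1
b = 3   # stands for 0.3

assert a + a + a - b == 0     # exact, unlike 0.1 + 0.1 + 0.1 - 0.3

print((a + a + a - b) / 10)   # convert back to float only for display: 0.0
```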

Python collections (data containers)

Collections allow storing data in structures.

General purpose built-in containers are dict, list, set, and tuple. Other containers exist in module collections.

Definition: Some Python collections are said to be mutable because they can be updated and modified at any moment at runtime.

Quiz: Do you know which of these types are mutable and which are immutable?

  • list
  • dict
  • tuple
  • str
  • bytes
  • bytearray

Mutable types: Flexible but higher memory usage
Immutable: Static but low memory usage. Interesting for safety (e.g. crypto keys)
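One way to check this empirically is to attempt an item assignment; this small sketch (the helper name is hypothetical) also reveals the quiz answer:

```python
def is_mutable_by_item_assignment(value):
    """Return True if the container accepts in-place item assignment."""
    try:
        value[0] = value[0]
        return True
    except TypeError:
        return False

assert is_mutable_by_item_assignment([1, 2]) is True            # list: mutable
assert is_mutable_by_item_assignment(bytearray(b"ab")) is True  # bytearray: mutable
assert is_mutable_by_item_assignment((1, 2)) is False           # tuple: immutable
assert is_mutable_by_item_assignment("ab") is False             # str: immutable
assert is_mutable_by_item_assignment(b"ab") is False            # bytes: immutable
# dict is also mutable: d["key"] = value works at any moment
```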

Mutable collections

The list
l = ["The", "list", "type", "is", "central", "in", "Python"]
l = list(("Conversion", "tuple", "to", "list"))
l = list("Hello")
l = "".join(["H", "e", "l", "l", "o"])
l.append("element") # Append at the end (right side)
l.insert("a", 5)    # less used. Beware to performance!
l.pop()             # Remove from the end. beware to pop(i) !
The dictionary (insertion order preserved since 3.7)
d = {}
d = dict(zip(("article", "price", "stock"), ("Logitech M180", 99.90, 5)))
d.update({"foo": "bar"})
d.keys()     # dict_keys
d.values()   # dict_values
The set

The set stores unique elements and is handy for deduplication and fast membership tests:

s = set([-4, 2, 4, 3])  # Performant for removing duplicate values
s.union([-4, 2, 8, 8])  
s.intersection([-4, 2, 8, 8])
8 in s  # Performant for "in set" operator while "in list" would be slow
The double-ended queue (deque)

A deque is handy to append or remove elements at both extremities:

from collections import deque
queue = deque(["Kylie", "Albert", "Josh"])
queue.appendleft("Anna")   # list.insert(0, "Anna") would be slow here
queue.popleft()    # list.pop(0) would be slow here

Beware of the list: prefer other data structures when they perform better.

The heap queue (heapq)
  • Binary tree to manage priorities
  • Each parent has a priority ≤ that of all its children
import heapq

patients = [(3, 'John'), (1, 'Alice'),
            (2, 'Bob'), (4, 'Sarah'), (2, 'Mike')]

heapq.heapify(patients)  # patients remains of type 'list'

# Process the patients based on their priority
while patients:
    priority, name = heapq.heappop(patients)
    print(f'Seeing patient {name} '
          f'with priority {priority}')
Dataclasses (>= 3.7)

Type intended to store data

from dataclasses import dataclass

@dataclass
class Person:
  name: str
  age: int

albert: Person = Person(name="Albert", age=42)  # type(albert) == Person

Dataclass benefits:

  • Built-in magic methods __init__, __repr__, __eq__
  • Built-in constructor
  • Pass frozen=True to make it immutable
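A minimal sketch of the frozen variant mentioned above:

```python
from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True)
class Person:
    name: str
    age: int

albert = Person(name="Albert", age=42)
assert albert == Person(name="Albert", age=42)  # generated __eq__

try:
    albert.age = 43                # frozen: assignment is rejected
except FrozenInstanceError:
    print("Frozen dataclasses cannot mutate")
```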

Immutable sequences

Definition: An immutable data structure is assigned its values only once, at creation; afterwards it cannot mutate (cannot change).

The string
s = "A string is immutable"
t = ("A", "tuple", "is", "immutable")

Example: put the first letter of these sequences in lower case:

s[0] = "a"
# TypeError: 'str' object does not support item assignment

s = "a" + s[1:]

"".join(["a"] + list(s[1:]))
The tuple

The tuple is the Python type closest to an array: a fixed-size, immutable sequence.

tuple1 = (42, -15, None, 5.0)
tuple2 = True, True, 42.5
tuple3 = 1,            # single-element tuple (the comma makes the tuple)

It is also used during unpacking:

a, b = b, a   # Value swapping

And this is the type used for returning several values in a function:

def compute(a, b):
    return a+b, a-b, a*b, a/b

results = compute(5, 5)
sum, difference, product, quotient = compute(5, 5)
sum, *other = compute(5, 5)

Type annotations

With PEP 484, function parameters, outputs, variables and attributes can be typed:

def sum(a: int, b: int) -> int:
    return a+b

my_value: int = sum(5, 5)   # OK: Type checking passes

s: bool = sum(5.0, 5)
# Linter warning: Expected "int", got "float" instead
# Linter warning: Expected "bool", got "int" instead
sum(5, 5).capitalize()
# Linter warning:  Unresolved attribute reference "capitalize" for "int"

Mistakes will NOT raise exceptions or prevent the interpreter from running the code in any way; only an (optional) type checker will notice.

To specify more complex annotations, import them from typing:

  • Any: every type
  • Union[X, Y, Z]: one among several types (e.g. int, float or str)
  • Callable[[X], Y]: function that takes X in input and returns Y
  • Optional[X]: either X or NoneType
  • ForwardRef(X): forward reference to X, used to circumvent circular imports
  • TypeVar: create your own Generic types like MyOwnList[int]
from typing import Union

def sum(a: Union[int, float], b: Union[int, float]) -> Union[int, float]:
    return a+b

sum(5.0, 5) # Now, this call is valid for the type checker
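As an illustration of TypeVar (mentioned above), here is a minimal sketch of a custom generic MyOwnList; the class body is a hypothetical example:

```python
from typing import Generic, TypeVar

T = TypeVar("T")

class MyOwnList(Generic[T]):
    def __init__(self) -> None:
        self.items: list[T] = []

    def add(self, item: T) -> None:
        self.items.append(item)

ints: MyOwnList[int] = MyOwnList()
ints.add(42)
ints.add("oops")  # Runs fine, but a type checker warns: expected "int", got "str"
```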

If you are referring to the current class, use quotes:

class Foo:
    def bar(self, foo:"Foo"):
        pass

Here is an example to limit accepted literals to only a subset:

from typing import Literal

UInt3 = Literal[0, 1, 2, 3, 4, 5, 6, 7]

def accepts_only_uint3(x: UInt3) -> None:
    pass

accepts_only_uint3(10)
# Expected type Literal[0, 1, 2, 3, 4, 5, 6, 7], got Literal[10] instead

🐍 Learn more about type annotations

Data containers can be partly or fully typed:

l: list[list[int]]
d: dict
d2: dict[str, float]

TypedDict is used for data validation and API doc to annotate the type of a dict:

from typing import TypedDict

class Person(TypedDict):  # Class intended to perform 
    name: str             # static checking of dicts
    age: int

albert: Person = {"name": "Albert", "age": 42}  

Notice that runtime type remains unchanged: type(albert) == dict

Before we practice... did you know...?

The star and double-star flags

The star * is the flag that means 0 or n values. They are received in a tuple:

def compute(*args):
    sum, difference, product, quotient = 0, 0, 1, 1
    for value in args:   # args is a tuple
        sum += value
        difference -= value
        product *= value
        quotient /= value
    return sum, difference, product, quotient

sum, *other_results = compute(42, 50, 26, 10, 15)

A named parameter is passed to a function via its name instead of its position:

def sentence(apples=1, oranges=10):
   return f"He robbed {apples} apples and {oranges} oranges"

p = sentence(2, 5)
p = sentence()
p = sentence(oranges=2) 

The double star ** is the flag that means 0 or n named parameters.
They are received as a dictionary:

def sentence(**kwargs):
    for item, quantity in kwargs.items():  # kwargs is a dict
        print(f"He robbed {quantity} {item}")

sentence(apples=2, oranges=5)
# He robbed 2 apples
# He robbed 5 oranges
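The same * and ** flags also work in the opposite direction, to unpack a sequence or a dict at call time; a sketch (the sentence function from above is repeated so the snippet is self-contained):

```python
def sentence(apples=1, oranges=10):
    return f"He robbed {apples} apples and {oranges} oranges"

quantities = {"apples": 2, "oranges": 5}
print(sentence(**quantities))   # He robbed 2 apples and 5 oranges

def compute_sum(a, b, c):
    return a + b + c

values = [1, 2, 3]
print(compute_sum(*values))     # 6: the list is unpacked into a, b, c
```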

Complexity and the Big-O notation

In computer science, optimization consists in improving:

  • Time complexity: the quantity of CPU/GPU cycles used by an operation
  • Space complexity: the quantity of memory used by an operation

Optimizing time often requires more space to solve the same problem.
Optimizing space often requires more time to solve the same problem.

The less complex an operation is in terms of time/space, the better it is optimized.

According to the usecase, we may opt for the best optimization in space or time.

An optimized program is faster, greener and more economic, since both time (CPUs and GPUs) and space (RAM, hard drives and networks) consume energy.

Big-O is a notation that helps measure complexity of programs in time & space

It describes how greedy an operation is according to the size of its input, in terms of time (CPU cycles) or space (memory). Thus, it is a function of the input size n.

Examples:

  • O(n) in time = for an input of size n, the operation requires n CPU cycles
  • O(n*n) in time = for an input of size n, the operation requires n*n CPU cycles

ℹ️ Here, CPU cycle and n do not refer to a precise quantity (e.g. bytes, assembly instructions, time in seconds...), only the order of magnitude is important.

ℹ️ Big-O usually measures complexity in the average or worst case scenario.

Table of common big-O complexities

From best to worst performance:

Big-O complexity    Complexity Name    Example of time complexity with a list
O(1)                Constant           Read value at index [i]
O(n)                Linear             A single for i in range(n) loop
O(2n), O(3n) ...    Linear             Several consecutive for loops
O(n.log(n))         Linearithmic       Sort list (with the quicksort method)
O(n²), O(n³) ...    Polynomial         for loops nested inside for loops
O(2ⁿ), O(10ⁿ) ...   Exponential        A single for i in range(k**n) loop

Deal with complexity in practice

If you wish to optimize your program in time and/or space, check the time or space complexity of any:

  • data structure that you are using
  • function that you are using
  • algorithm that you are using

The final complexity of your program depends on all of these.

Conclusion: If performance matters for your application, read the documentation about any data structure/function/algorithm that you are willing to use, and be careful about their behaviour and performance.

🐍 Time complexity of Python structures

Benchmark time complexity

timeit will repeat an instruction and return time statistics.

From an interactive interpreter with a line-magic:

import math, numpy

%timeit math.sqrt(25)
# 63.7 ns ± 0.445 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

%timeit numpy.sqrt(25)   
# 788 ns ± 3.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Use %%timeit when code is in a Jupyter Lab cell. Use it in a Python module too:

from timeit import timeit
import math, numpy

print(timeit("math.sqrt(25)", globals=globals()))
print(timeit("numpy.sqrt(25)", globals=globals()))

Benchmark Space complexity

import sys

one_million_iterator = range(1000000)
one_million_list = list(range(1000000))

sys.getsizeof(one_million_iterator)
# 48 Bytes

sys.getsizeof(one_million_list)
# 8000056 Bytes

⚠️ sys.getsizeof counts only the container itself, not the objects it stores.

🚨 Benchmarking tools may return different results according to the hardware capabilities and the CPU load.

🐍 Learn more about timeit

Iterators & generators

An iterable is a data structure on which one can iterate over: list, tuple...

An iterator is an object that performs the actual iteration on an iterable.

Once an iterator has been consumed it cannot be rewound (but an iterable can be iterated over again)

iter(l) returns an iterator on iterable l

next(i) generates the next element of iterator i

🦆 Duck typing considers that:

  • objects with a __iter__ method are iterables
  • objects with a __next__ method are iterators
class DivisorsOfIntegerIterator:
    def __init__(self, n: int):
        self.__n = n
        self.__last_divisor_tested = n // 2 + 1

    def __iter__(self):    # This magic makes this class an iterable
        return self
    
    def __next__(self):    # This magic makes this class an iterator
        if self.__last_divisor_tested == 1:
            raise StopIteration("There is no more divisor")
        divisor = self.__last_divisor_tested - 1
        self.__last_divisor_tested = divisor
        return divisor if self.__n % divisor == 0 else next(self)

An iterable would then return a new iterator every time __iter__() is called,
causing the iteration to reset every time:

class DivisorsOf:       # This is an iterable
    def __init__(self, n: int):
        self.n = n
    def __iter__(self): # __iter__ returns an iterator
        return DivisorsOfIntegerIterator(self.n)

divisors = DivisorsOf(50)
for d in divisors:
    print(d, "is a divisor")

You can also get the iterator itself and iterate with manual calls to next():

it = iter(DivisorsOf(4))
print(next(it), "is the first divisor")  # 2 is the first divisor
print(next(it), "is the second divisor") # 1 is the second divisor
print(next(it), "is the third divisor")  # StopIteration: There is no more divisor

The Generator

A generator is a specific type of iterator created via a function instead of a class:

def divisors_of(n: int):
    for i in range(1, n // 2 + 1):
        if n % i == 0:
            yield i
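Consuming the generator works like any other iterator (divisors_of is repeated here so the snippet is self-contained):

```python
def divisors_of(n: int):
    for i in range(1, n // 2 + 1):
        if n % i == 0:
            yield i

assert list(divisors_of(50)) == [1, 2, 5, 10, 25]

gen = divisors_of(50)
assert next(gen) == 1   # generators support next() like any iterator
```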

Again, an iterable can be built out of the generator:

class DivisorsOf: # This is an iterable
    def __init__(self, n: int):
        self.n = n

    def __iter__(self):              # __iter__ returns an iterator...
        for i in range(1, self.n // 2 + 1):  # ...or more precisely, a generator
            if self.n % i == 0:
                yield i

Some built-in functions return iterators:

In [1]: reversed([1, 2, 3, 4])
Out[1]: <list_reverseiterator at 0x75fad73d3070>

In [2]: zip([1,2], [3,4])
Out[2]: <zip at 0x75fad676ac40>

Getting a concrete sequence requires explicit materialization, e.g. list(reversed([1, 2]))

The 🐍 itertools module contains tools to create efficient iterators

When should we use iterators or generators?

  • when the iterable is infinite
  • when the space complexity would be too high with a regular list
  • when the time complexity would be too high with a regular list
  • when the list elements are unknown at the time the list is constructed
  • when it is more convenient to generate the list elements on demand

e.g. declare an iterator of (r, g, b) color pattern for your outdoor lights
e.g. generate an iterator of HTML pages to be retrieved when network is available
e.g. generate an iterator of thousands of 100MiB images
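The outdoor-lights use case can be sketched with itertools.cycle, an infinite iterator (the color values are illustrative):

```python
from itertools import cycle

# Infinite (r, g, b) pattern: O(1) in space, never raises StopIteration
colors = cycle([(255, 0, 0), (0, 255, 0), (0, 0, 255)])

pattern = [next(colors) for _ in range(4)]
assert pattern == [(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 0, 0)]
```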

CHARACTERISTICS AND PARADIGMS OF PYTHON

Python is multi-paradigm:

  • Imperative: instructions create state changes
  • Object-oriented: instructions are grouped with their data in objects/classes
  • Functional: instructions are math function evaluations

All 3 paradigms are popular in the Python community, and often mixed all together, e.g:

sentence = "How do you do?"
"".join(map(lambda x: x.capitalize(), sentence.split(" ")))

# Out[0]: 'How Do You Do?'

Object-oriented programming (OOP)

Reminder of the class implementation and inheritance syntax

class Apartment:
    def __init__(self, surface):
        self._surface = round(surface)
    
    def get_description(self):
        return f"This flat has {self._surface}m²"

class FurnishedApartment(Apartment):
    def __init__(self, surface, furniture=("bed", "sofa")):
        super().__init__(surface)
        self.furniture = list(furniture)

    def get_description(self):
        return f"This flat of {self.surface}m² has: {self.furniture}"

FurnishedApartment(surface=50).get_description()    

Virtual methods

In Python, all methods are virtual.

It means that when accessing a method e.g. a.method() the interpreter will first try to resolve that method in class type(a), and if not present, in parent classes.

Polymorphism

Generic definition: Using heterogeneous data types in the same scope

10 + 20.0

How does polymorphism apply to OOP?

  • ad-hoc: same method name in other classes
["H", "e", "y"].index("e")
"Hey".index("y")
  • parametric: same name with different parameter types (overloading in Java)
def f(var):
    return 42 if isinstance(var, int) else 4.2
  • inheritance: method is inherited from a different (parent) type
from collections import UserString

class MyUpperCaseStr(UserString): pass

isinstance(MyUpperCaseStr(""), UserString) # True: it is of type "UserString"

MyUpperCaseStr(1e3).split(".") # Methods & attributes inherited from the parent
# Out[0]: ['1000', '0']

Reminder about public, protected and private scopes

In C++, the scope of attributes or methods can be:

  • private: read and write access from methods of the same class only
  • protected: read and write access from methods of the same class or its child classes only
  • public: read and write access from methods of any class

Python has similar concepts but does not enforce them:

  • private attributes or methods start with a double underscore
  • protected attributes or methods start with an underscore
  • otherwise they are public
class Foo:
    def __init__(self):
        self.public = 0
        self._protected = 0
        self.__private = 0        # ⚠ Name mangling applies here

Protected attributes are not enforced at all by the interpreter, while private attributes are (through name mangling):

class BankAccount:
     def __init__(self):
         self.__balance = 3000
         
class Client:
     def make_transaction(self, bank_account: "BankAccount"):
         bank_account.__balance += 1000
         
Client().make_transaction(BankAccount())
# AttributeError: 'BankAccount' object has no attribute '_Client__balance'

Name mangling applies to all private attributes and methods, except dunder names such as __init__.

Class methods

While a regular method f(self) is an instance method because it applies to instance self, class methods apply to the class instead.

Their first parameter is no longer the instance self but the class type cls:

class Animal:
    @classmethod
    def define(cls):
        return f"An {str(cls)} is an organism in the biological kingdom Animalia."

Thus it is possible to call the class method from the class or the instance:

Animal.define()
Animal().define()

Static methods

Unlike instance methods and class methods, static methods do not receive any implicit parameter such as self or cls:

class Animal:
    @staticmethod
    def define():
        return "Animals are organisms in the biological kingdom Animalia."

They can be called on a class or an instance:

Animal.define()
Animal().define()

💡 Class and static methods are close concepts, but use the first only if you need the class type in parameter.
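A classic case where the class type is genuinely needed is an alternative constructor; a sketch with a hypothetical Temperature class:

```python
class Temperature:
    def __init__(self, celsius: float):
        self.celsius = celsius

    @classmethod
    def from_fahrenheit(cls, fahrenheit: float) -> "Temperature":
        # cls matters here: a subclass calling this gets an instance of itself
        return cls((fahrenheit - 32) * 5 / 9)

boiling = Temperature.from_fahrenheit(212)
assert boiling.celsius == 100.0
```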

Properties, getters and setters

A Python property is an entity able to get, set or delete the attribute of an object

Its C++ equivalents are getters and setters, e.g. car.getSpeed() & car.setSpeed(1.0)

Properties are useful to add code filters to public attributes

  • Example: raise exceptions when attributes are set to inconsistent values
  • Example: make sure that the self.month integer is between 1 and 12
  • Example: make an attribute read-only (with a getter but no setter)

Create a property with the property() function or the @property decorator

property(fget=None, fset=None, fdel=None, doc=None)

Where:

  • fget is a function to get the value of the attribute (the getter)
  • fset is a function to set the value of the attribute (the setter)
  • fdel is a function to delete the attribute
  • doc is a docstring
class Date:
    def __init__(self):
        self.__month = 0
    
    def get_month(self):
        if self.__month == 0:
            raise ValueError("This date has not been initialised")
        return self.__month

    def set_month(self, new_month: int):
        if not 1 <= new_month <= 12:
            raise ValueError("Month can only be set between 1 and 12")
        self.__month = new_month
    
    month = property(get_month, set_month, doc="The integer month (1-12) of this date")

d = Date()
print(d.month)    # Will raise "This date has not been initialised"
d.month = 99      # Will raise "Month can only be set between 1 and 12"

Usually, properties are used as a decorator instead of a function:

class Date:
    def __init__(self):
        self.__month = 0
    
    @property
    def month(self):
        if self.__month == 0:
            raise ValueError("This date has not been initialised")
        return self.__month

    @month.setter
    def month(self, new_month: int):
        if not 1 <= new_month <= 12:
            raise ValueError("Month can only be set between 1 and 12")
        self.__month = new_month

d = Date()
print(d.month)    # Will raise "This date has not been initialised"
d.month = 99      # Will raise "Month can only be set between 1 and 12"
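The read-only case mentioned earlier (a getter but no setter) can be sketched with a hypothetical Circle class:

```python
class Circle:
    def __init__(self, radius: float):
        self._radius = radius

    @property
    def area(self) -> float:     # getter only: no @area.setter is defined
        return 3.14159 * self._radius ** 2

c = Circle(1.0)
assert c.area == 3.14159
# c.area = 5 would raise AttributeError, since the property has no setter
```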

Magic methods

"Hidden" methods with an expected prototype and behaviour.

appart1 + appart2: Appartement.__add__(self, other)    # Addition
appart1 * appart2: Appartement.__mul__(self, other)    # Multiplication
appart1 == appart2: Appartement.__eq__(self, other)    # Equality test
str(appart): Appartement.__str__(self)                 # Readable string
repr(appart): Appartement.__repr__(self)               # Unique string
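A minimal sketch of a few of these overloads, on a bare-bones Appartement class (illustrative, not the full course class):

```python
class Appartement:
    def __init__(self, surface: int):
        self.surface = surface

    def __add__(self, other):    # appart1 + appart2
        return Appartement(self.surface + other.surface)

    def __eq__(self, other):     # appart1 == appart2
        return self.surface == other.surface

    def __repr__(self):          # repr(appart)
        return f"Appartement(surface={self.surface})"

assert Appartement(30) + Appartement(20) == Appartement(50)
assert repr(Appartement(50)) == "Appartement(surface=50)"
```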

But also...

hasattr(appart, "price")                      # returns True if an attribute exists 
getattr(appart, "price"): Appartement.__getattr__(self, name)    # Get an attribute 
setattr(appart, "price", 1): Appartement.__setattr__(self, n, v) # Set an attribute
delattr(appart, "price"): Appartement.__delattr__(self, name)    # Drop an attribute

Let's try to alter the attributes of an instance using these magic methods...

In [1]: class Example: 
   ...:     def __init__(self): 
   ...:         self.some_attribute = 1 

In [2]: example=Example()

In [3]: example.some_attribute 
Out[3]: 1

In [4]: delattr(example, "some_attribute")

In [5]: example.some_attribute           
-------------------------------------------------------------------------
AttributeError: 'Example' object has no attribute 'some_attribute'

In [6]: type(example)  # The attribute no longer exists for this object...
Out[6]: Example        # ... but the type is still the same

🦆 Do you now understand why Python relies on duck typing?
(the runtime type type(obj) is less important than the methods it declares)

Reminder about class inheritance

All overridden methods are virtual: methods of the child classes are resolved first.

class Animal:
    def adopt(self):
        raise NotImplementedError(f"Sorry {type(self)} cannot be adopted")

class Cat(Animal):
    def adopt(self):
        print("Thank you for adopting a 🐱!")

class Fly(Animal):
    pass

Cat().adopt()       # Thank you for adopting a 🐱!
Animal().adopt()    # Sorry class 'Animal' cannot be adopted
Fly().adopt()       # Sorry class 'Fly' cannot be adopted

This is how we implement abstract classes in Python, or class interfaces.
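The standard library's abc module offers a stricter alternative: instantiating the abstract class fails outright, instead of failing when the method is called. A sketch:

```python
from abc import ABC, abstractmethod

class Animal(ABC):
    @abstractmethod
    def adopt(self): ...

class Cat(Animal):
    def adopt(self):
        return "Thank you for adopting a cat!"

assert Cat().adopt() == "Thank you for adopting a cat!"

try:
    Animal()    # abstract classes cannot be instantiated at all
except TypeError as error:
    print(error)
```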

Multiple inheritance: the basic case

🐍 Learn more. Simple example: the tiger's taxonomy

class Animal:
    def avglifetime(self):
        return 67

class Mammalia:
    def avglifetime(self):
        return 35

class Felidae:
    pass

class Tiger(Felidae, Mammalia, Animal):
    pass

Tiger().avglifetime() # Returns 35: left-to-right resolution

Multiple inheritance: the MRO

Now let's consider that Mammalia & Felidae are also Animal:

class Animal:
    def avglifetime(self):
        return 67

class Mammalia(Animal):
    def avglifetime(self):
        return 35

class Felidae(Animal):
    def avglifetime(self):
        return 30

class Tiger(Felidae, Mammalia):
    pass

There are now multiple paths for Tiger.avglifetime()

Python 3 uses the C3 linearization algorithm. The resolution order is available in a class attribute:

Tiger.mro()
# [__main__.Tiger, __main__.Felidae, __main__.Mammalia, __main__.Animal, object]

In some situations though, the MRO may be inconsistent:

class Animal: pass

class Mammalia(Animal): pass
  
class Tiger(Animal, Mammalia): pass  # The superclass Animal is before Mammalia

# TypeError: Cannot create a consistent method resolution order (MRO)
#   for bases Animal, Mammalia

A consistent MRO requires the following:

  • All classes appear before their parents
  • All classes keep the same parent-class order as in their class statement

Metaclasses

A metaclass is to a class what a class is to an object.

A metaclass creates a new class at runtime.

type(int)    # Returns "type" 
type(type)   # Returns "type" 🤯

Three levels are involved:

  • The metaclass level (type is the default one, but you can create yours)
  • The class level (Dog, Cat ...)
  • The object level (dog, cat ...)

Metaprogramming: modification of code at runtime

We may want to create a metaclass if we need to create a class at runtime.

Here NewClass is a class created out of type metaclass:

NewClass = type(name, bases, methods)
# name = name of the new class
# bases = tuple of base classes, if any
# methods = dictionary of methods (keys = method names, values = functions)

Then we can define a new class at runtime:

def init(self, petname):
    self.petname = petname

Dog = type("Dog", (), {"__init__": init})
cooper = Dog("Cooper")

Creating a ClassGenerator metaclass that inherits from the type metaclass:

animals = {
    "Dog": {"bark": lambda self: "WAF!"},
    "Cat": {"meow": lambda self: "MEOW"}
}

class ClassGenerator(type):
    def __new__(cls, name, bases, methods):
        return super().__new__(cls, name, bases, methods)


Dog = ClassGenerator("Dog", (), animals["Dog"])
Cat = ClassGenerator("Cat", (), animals["Cat"])

Dog().bark()    # WAF!
Cat().meow()    # MEOW

Warning:

  • In a metaclass (inheriting from type), __new__ returns a class
  • In a class, __new__ returns an (uninitialized) object

In-between example: substitute the current class at construction time.

This is metaprogramming without a metaclass, since MyFilePathClass does not inherit from type:

import os

class MyFilePathClass:
    def __new__(cls, path: str):
        # Substitute the concrete class according to the platform
        cls = WindowsFilePathClass if os.name == 'nt' else UnixFilePathClass
        return super().__new__(cls)

    def __init__(self, path: str):
        self.path = "./" + path

class WindowsFilePathClass(MyFilePathClass): pass
class UnixFilePathClass(MyFilePathClass): pass

Some usecases of metaclasses:

  • Class generation at runtime (e.g. from classes described in a configuration file)
  • Class checking (e.g. check the existence of compulsory/forbidden methods or attributes)
  • Class mutation (e.g. remove some existing methods when inheriting from another class)
  • Class substitution (e.g. construct this object with another class)

⚠️ Use metaprogramming carefully: it makes code difficult to read, and the desired behaviour can often be achieved without it
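As an illustration of the class-checking use case above, here is a sketch of a metaclass (the name RequiresDescribe is hypothetical) that rejects classes missing a compulsory method:

```python
class RequiresDescribe(type):
    def __new__(mcls, name, bases, namespace):
        # Check subclasses (not the base class itself) for a compulsory method
        if bases and "describe" not in namespace:
            raise TypeError(f"{name} must define a describe() method")
        return super().__new__(mcls, name, bases, namespace)

class Base(metaclass=RequiresDescribe):
    pass

class Good(Base):
    def describe(self):
        return "ok"

assert Good().describe() == "ok"

try:
    RequiresDescribe("Bad", (Base,), {})   # no describe(): rejected
except TypeError as error:
    print(error)
```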

Functional programming

In functional programming your program is the result of the mathematical composition of several function calls that all take an input and return an output:

h ∘ g ∘ f or, in Python, h(g(f(x)))

That operator is the pipe in shell languages e.g. grep a file.txt | sort | uniq

In strict functional programming, no side effect is allowed, which means that even variable assignment is forbidden!

🐍 Learn more

Comprehensions

A comprehension is an inline notation to build a new sequence (list, dict, set).
Here is a list-comprehension:

l = [i*i for i in range(10)]  # Select i*i for each i in the original "range" sequence
# Returns [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

You may optionally filter the selected values with an if statement:

l = [i*i for i in range(100) if i*i % 10 == 0]  # Select values that are multiple of 10
# Returns [0, 100, 400, 900, 1600, 2500, 3600, 4900, 6400, 8100]

l = [(i, 2*i, 3*i) for i in range(5)] # Here we select tuples of integers:
# Returns [(0, 0, 0), (1, 2, 3), (2, 4, 6), (3, 6, 9), (4, 8, 12)]

Dict-comprehensions also work:

d = {i: i*i for i in range(10)}
# Returns {0: 0, 1: 1, 2: 4, 3: 9, 4: 16, 5: 25, 6: 36, 7: 49, 8: 64, 9: 81}
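Set comprehensions and generator expressions follow the same syntax:

```python
# Set-comprehension: curly braces without key: value pairs
s = {i % 3 for i in range(10)}
assert s == {0, 1, 2}

# Generator expression: parentheses, lazy evaluation (no list built in memory)
g = (i * i for i in range(10))
assert sum(g) == 285
```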

Lambda functions

Lambda functions (a.k.a. anonymous functions) use an inline definition with no name:

lambda x: x*x

This lambda is the function that takes x in input and returns x*x in output.

Like any other object, a function can then be assigned to variables ... and called:

squared = lambda x: x*x
type(squared)           # Returns "function"
squared(5)              # Returns 25

It is then equivalent to the regular function definition:

def squared(x):
    return x*x

Mapping

The map() function lazily applies a function to every element of an iterable and returns an iterator.

map(f, l)

list(map(squared, [-5, 15, 10, -20]))  # Returns [25, 225, 100, 400]

Mapping example: Get all hexadecimal notations of a list of integers

hex(1024)          # Returns the hexadecimal notation of an integer (here "0x400")

list(map(hex, [2**x for x in range(5)]))  # Returns ['0x1', '0x2', '0x4', '0x8', '0x10']
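Two other classic functional tools worth knowing are filter() and functools.reduce():

```python
from functools import reduce

# filter() keeps only the elements for which the predicate is True
evens = list(filter(lambda x: x % 2 == 0, range(10)))
assert evens == [0, 2, 4, 6, 8]

# reduce() folds an iterable into a single value, pairwise
total = reduce(lambda acc, x: acc + x, [1, 2, 3, 4])
assert total == 10
```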

Functional programming example: capitalize each word

sentence = "hello my friend"

How can we generate this string with each word capitalized "Hello My Friend"?

sentence.capitalize()   # returns "Hello my friend"

" ".join(map(lambda x: x.capitalize(), sentence.split(" ")))

If we do not want to cheat with str.capitalize():

" ".join(map(lambda x: x[0].upper() + x[1:], sentence.split(" ")))

If we do not want to cheat with str.upper() (only with ASCII lowercase strings):

" ".join(map(lambda x: chr(ord(x[0]) - 32) + x[1:], sentence.split(" ")))

Decorators

The role of a decorator is to alter the behaviour of the function that follows, with no need to modify the implementation of the function itself.

It can be seen as adding "options" to a function, in the form of a wrapper code.

@decorator
def function():
    pass

In that case, function is replaced by decorator(function): calling function() actually invokes the wrapper returned by the decorator.

Decorators can take parameters in input, independent from the parameters of the function.

🐍 Learn more

Example 1: @classmethod is a decorator that passes the class type cls as the first parameter to the following function.

class Animal:
    @classmethod
    def define(cls):
        return "An " + str(cls) + " is an organism in the biological kingdom Animalia."

Example 2: Web frameworks usually use decorators to associate a function e.g. get_bookings_list() to an endpoint (e.g. /bookings/list) plus an HTTP verb

Here is how Flask works:

from flask import Flask

app = Flask(__name__)   # We create a web app

@app.route("/bookings/list", methods=["GET"])
def get_bookings_list():
    return "<ul><li>Booking A</li><li>Booking B</li></ul>"

To define your own decorator, you need to write a function returning a function:

from functools import wraps

def log_this(f):
    @wraps(f)
    def __wrapper_function(*args, **kwargs):
        print("Call with params", args, kwargs)
        return f(*args, **kwargs)
    return __wrapper_function

@log_this
def mean(a, b, round=False):
    m = (a + b)/2
    return int(m) if round else m

mean(5, 15, round=True) # shows: Call with params (5, 15) {'round': True}
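As mentioned above, decorators can themselves take parameters; this needs one more level of nesting (the repeat decorator below is a hypothetical example):

```python
from functools import wraps

def repeat(times):                    # decorator factory: receives the parameter
    def decorator(f):                 # actual decorator: receives the function
        @wraps(f)
        def wrapper(*args, **kwargs):
            result = None
            for _ in range(times):
                result = f(*args, **kwargs)
            return result
        return wrapper
    return decorator

@repeat(times=3)
def greet(name):
    print(f"Hello {name}")
    return name

greet("Anna")   # prints "Hello Anna" three times
```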

The functools module provides decorators and higher-order functions intended to alter or combine functions.
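For instance, functools.lru_cache memoizes the results of a function, which turns the naive recursive Fibonacci from exponential to linear time; a minimal sketch:

```python
import functools

@functools.lru_cache(maxsize=None)
def fib(n):
    # Without the cache this recursion would take exponential time
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(50))  # 12586269025, computed instantly thanks to caching
```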

Closures

An inner function is a function in a block, made to limit its scope (encapsulation)
A closure is an inner function that captures variables from its enclosing scope and can be returned for later use.

# Example: Compute a price with a base amount + variable daily rate
def get_rate_calculator(base_amount):
    current_daily_rate = some_network_callback()
    # current_daily_rate and base_amount are captured

    def calculate_rate(amount):
        return amount*current_daily_rate + base_amount

    return calculate_rate

compute_amount = get_rate_calculator(100)  # capture now!

house1price = compute_amount(150000)
house2price = compute_amount(90000)

Benefit: a sort of caching; only one network call is made

Context manager: the with statement

The keyword with introduces a context manager that protects a resource, making sure it is actually torn down after allocation in all cases.

f = open("file.json", "w")
f.write()
# PROCESSING WRITING [...] 
f.write()
f.close()

What if an exception occurs during the processing of f? It wouldn't be closed.

The context manager ensures that the resource is closed in any case:

with open("file.json", "w") as f:
    f.write()

The standard library is compatible with context managers for files, locks, and synchronisation primitives. But you may also create your own:

class ProtectedResource:
    def __enter__(self):
        print("The resource is being opened")
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        print("The resource is being closed, with or without exception")

resource = ProtectedResource()
with resource:
    raise ValueError("Let's see if it works")

# The resource is being opened
# The resource is being closed, with or without exception
# Traceback (most recent call last):
# ValueError: Let's see if it works

See also the contextlib module in the standard library
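contextlib lets you write the same kind of context manager as a generator function instead of a class; a minimal sketch of the ProtectedResource example above:

```python
from contextlib import contextmanager

@contextmanager
def protected_resource():
    print("The resource is being opened")
    try:
        yield "the resource"   # value bound by "as" in the with statement
    finally:
        # Runs in all cases, with or without exception
        print("The resource is being closed, with or without exception")

with protected_resource() as r:
    print("Using", r)
```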

CHAPTER 2

CODE WITH QUALITY

Add type annotations + docstrings

PEP 257 describes how RST doc-strings should be inserted in Python code:

from typing import Optional, Union

def is_same_sign_than_or_positive(a: Union[float, int], b: Optional[Union[float, int]] = None) -> bool:
    """
    Returns True if :
       * either both parameters are of the same sign or equal
       * or if the first parameter is positive, in case no second parameter is provided
    Returns False otherwise

    :Example:

    is_same_sign_than_or_positive(5, None)
    is_same_sign_than_or_positive(5.6, -5)
    is_same_sign_than_or_positive(-1.5)

    .. warning:: if parameters are float, strict equality is not guaranteed

    :param a: First element to be compared
    :param b: Optional second element to be compared, or None
    :return: True if a and b have the same sign or equal, False otherwise
    """
    return a * b >= 0 if b is not None else a > 0

What should have a docstring?
Everything that is exported by a module:

  • modules
  • functions
  • classes
  • public methods (including the __init__(self) constructor)
  • packages (via their __init__.py)

Let your IDE (e.g. Pycharm) autocomplete the docstring syntax for you!

The docstring can be accessed with the magic __doc__ (used by help()):

print(is_same_sign_than_or_positive.__doc__)

Logging

Python has a module dedicated to logging. It classifies each log entry by severity level (debug, info, warning, error, critical) and allows filtering them. 🐍 Learn more

logging.debug('Debug message')  # Lowest priority
logging.info('Info message')    # Higher priority
# Prefer the use of lazy evaluation
%timeit logging.info("%s", 42)
# 645 ns ± 43.1 ns per loop
%timeit logging.info(f"{42}")
# 787 ns ± 51.1 ns per loop
%timeit logging.info("{}".format(42))
# 876 ns ± 54.9 ns per loop

Step 1: Produce log entries

The logging library uses a modular approach to organize logs in a big app. Usually every module has its own logger, named after the module:

logger = logging.getLogger(__name__)
# foo/bar.py will be named "foo.bar"

When a message is posted to logger L:

  1. L decides whether to handle the event based on the level/filters
  2. Handlers of L get notified and react if their own level/filter match
  3. L's parent is notified, if appropriate

Step 2: Consume log entries

h = logging.StreamHandler()
h.setLevel("INFO")   # Accept all entries at least as critical as INFO

l = logging.getLogger("mymodule.submodule.subsubmodule")
l.setLevel("DEBUG")    # Accept all entries at least as critical as DEBUG

l.addHandler(h)

Both the logger & handler must accept the minimum level so the entry is printed

The simple config is a quick way to activate a stream handler for all loggers. But the output will be noisy, since logs from all modules, including imported libraries, will be printed:

logging.basicConfig(stream=sys.stderr, level=logging.DEBUG)
logging.basicConfig(filename='app.log', level=logging.INFO)
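Putting steps 1 and 2 together, a minimal self-contained setup (the logger name "mymodule" is illustrative):

```python
import logging
import sys

logger = logging.getLogger("mymodule")       # Step 1: the producer
logger.setLevel(logging.DEBUG)               # The logger accepts everything >= DEBUG

handler = logging.StreamHandler(sys.stderr)  # Step 2: the consumer
handler.setLevel(logging.INFO)               # ...but the handler drops entries < INFO
handler.setFormatter(logging.Formatter("%(name)s %(levelname)s %(message)s"))
logger.addHandler(handler)

logger.debug("dropped by the handler (DEBUG < INFO)")
logger.info("printed to stderr")
```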

Virtual environments (venv)

Context: All installed packages go into the site-packages directory of the interpreter.

The venv module provides support for creating lightweight “virtual environments” with their own site directories, optionally isolated from system site directories.

Each virtual environment has its own Python binary (which matches the version of the binary that was used to create this environment) and can have its own independent set of installed Python packages in its site directories.

🐍 Learn more

For each new project you create or clone, create a dedicated virtual environment for it:

/usr/bin/python3.7 -m venv dev/Training/venv

Then, every time you work on this project, activate its environment first:

source Training/venv/bin/activate

Your terminal will prefix the prompt with the name of the env:

(venv) yoan@humancoders ~/dev/Training $

In an activated venv, every call to the interpreter and every package installation will target the isolated virtual environment:

(venv) yoan@humancoders ~/dev/Training $ python

will run the Python version targeted by the venv

(venv) yoan@humancoders ~/dev/Training $ pip install numpy

will install the latest numpy version into the venv

You can quit the venv every time you stop working on the project:

(venv) yoan@humancoders ~/dev/Training $ deactivate
yoan@humancoders ~/dev/Training $ 

In practice, your IDE can handle venv creation, activation and deactivation automatically for you when you create or open/close a project.

🆕 PEP 668

On systems where the Python installation is marked as externally managed, pip refuses to install packages outside a venv.

You can override this behaviour by passing --break-system-packages.

Python Enhancement Proposals (PEPs)

The PEPs rule the Python language. In their lifetime the PEPs are proposed, debated, rejected/accepted and implemented.

They are usually not very user-friendly, but they help understand some design choices of the language and the implementation of the interpreter.

  • PEP 0 is the index of all PEPs, on peps.python.org
  • PEP 8 is the style guide for Python code: Indentation, line length, blank lines...
  • PEP 257 describes how to document code using RST docstrings

Quality control tools

Pyflakes, the semantic analyser

Pyflakes only focuses on the semantics (what your code stands for) but is not concerned about style.

import logging
variable = inexisting_variable
./main.py:2: 'logging' imported but unused
./main.py:3: undefined name 'inexisting_variable'

Pyflakes and all linters presented here can be installed and invoked the same way:

pip install pyflakes
pyflakes mymodule.py

Flake8

Flake8 = PEP 8 + Pyflakes (syntax + semantic analysis)

import numpy

def f():
    print("Hello world!")
main.py:1:1: F401 'numpy' imported but unused              # Semantic
main.py:3:1: E302 expected 2 blank lines, found 1          # Style

Pytype, the typing checker for untyped code

Pytype infers types of your code and performs a static check.

def f(i):
    return i + 1

f(None)
FAILED: No attribute '__add__' on 'i: None' or '__radd__' on '1: int'

Mypy, the static type checker for annotated code

Mypy performs type checking on annotated code.

def talk(i: int) -> None:
    print(f"There are {i} potatoes in the bucket")

talk([1, 2, 3])
error: Argument 1 to "talk" has incompatible type "List[int]"; expected "int"

Pydantic, the dynamic type checker for annotated code

What if data are coming from the network? From a file?

The static type checker will not suffice... it needs to be dynamic!

from pydantic import BaseModel

class User(BaseModel):
    id: int
    name: str

user = User(id="123", name=42)
print(user)  # id=123 name='42'

Performs data validation and coercion (conversion), in lax mode (e.g. int -> float possible), or strict mode (float is only a float).

Since APIs deal with external data, Pydantic is heavily used in APIs (e.g. FastAPI)

Testing

pytest and unittest are frequently used to test Python apps

Regular stages of a single unit test:

  • Setup: Prepare every prerequisite for the test
  • Call: call the tested function with the input parameters set up before
  • Assertion: an assert is a ground truth that must be true
  • Tear down: Cleanup everything that has been created for this test

On top of these :

  • hypothesis generates representative property-based test data
  • tox runs tests in multiple envs (e.g. Python versions, numpy versions ...)

Test files are sometimes placed in a tests/ directory, file names are prefixed with test_*.py and test function names are also prefixed with test_

pyproject.toml
mypkg/
    __init__.py
    app.py
    view.py
tests/
    test_app.py
    test_view.py
    ...

Naming tests according to these conventions will allow auto-discovery of tests by the test tool: it will go through all directories and subdirectories looking for tests to execute.

# water_tank.py
class WaterTank:
    def __init__(self):
        self.level = 10

    def pour_into(self, recipient_tank: "WaterTank", quantity: int):
        self.level -= quantity
        recipient_tank.level += quantity

# test_water_tank.py
from water_tank import WaterTank

def test_water_transfer():                # Example-based testing
    a = WaterTank()
    b = WaterTank()
    a.pour_into(b, 6)
    assert a.level == 4 and b.level == 16

# test_water_tank_hypothesis.py
from hypothesis import given, strategies as st

@given(st.integers(min_value=0, max_value=1000))
def test_water_transfer(quantity):        # Property-based testing
    a = WaterTank()
    b = WaterTank()
    a.pour_into(b, quantity)
    assert a.level == 10 - quantity and b.level == 10 + quantity

PACKAGE AND DISTRIBUTE

Reminders about Modules and packages

A module is a Python file, e.g. some/folder/mymodule.py. The module name uses the dotted notation to mirror the file hierarchy: some.folder.mymodule

Either the module is made to be:

  • executed from a shell: it is a script: python some/folder/mymodule.py
  • imported from another module: import mymodule (its location needs to be in sys.path, see p97)

A Python package is a folder containing modules and optional sub-packages: some is a package, folder is a sub-package.

Shebangs of Python scripts

On UNIX OSes (Linux, MacOS...), a shebang is a header of a Python script that tells the system shell which interpreter is to be called to execute this Python module.

We invoke env to tell which is the interpreter for python3 with such header:

#!/usr/bin/env python3

Direct call to the interpreter is possible but NOT recommended, since it will force the interpreter by ignoring any virtual environment you could be in:

#!/usr/local/bin/python3

The Windows shell ignores shebangs.

Structure of Python packages

All packages and sub-packages must each contain an __init__.py file

In general __init__.py is empty but may contain code to be executed at import time

The package's hierarchy is inherited from the files-and-folders hierarchy.

Module names use the dotted notation:

import my_math.matrix.complex.arithmetic

my_math.matrix.complex.arithmetic.SomeClass().meth()

print(my_math.matrix.complex.arithmetic.__name__)
# my_math.matrix.complex.arithmetic

print(my_math.matrix.complex.arithmetic.__file__)
# /usr/[...]/my_math/matrix/complex/arithmetic.py

from my_math.matrix.complex import arithmetic
# Also works: loads arithmetic in the global scope
from my_math.matrix.complex.arithmetic import product
# Also works: loads a single function from the module

Optionally, a (sub-)package can be "executed" from the command line with the -m option:

python -m my_math.float

only if a module __main__.py has been placed at the root of the sub-package.

Then, executing the sub-package consists in running the code in __main__.py

Relative imports (Imports internal to a package)

Relative import from the same folder:

from .my_math import my_sqrt
value = my_sqrt(25)

Relative import from a parent folder:

from ..my_math import my_sqrt
value = my_sqrt(25)

  • Do not put any slash such as import ../my_math
  • Relative imports can fetch . (same dir), .. (parent), ... (parent of parent)
  • Relative imports are forbidden when run from a module outside a package
  • Using absolute imports instead of relatives could result in name collisions

Choose the exported resources

By default, Python will export all resources (classes, constants, instances...) if they do not start with a _.

A user of your package will only "see" exported names, i.e. both autocompletion and from yourpackage import * will only concern exported names.

If you need to limit the exported names to some resources only, use the magic __all__:

__all__ = ["Class1", "Class2", "Class3"]

The Python path

For absolute imports, the interpreter looks up modules in the Python path, sys.path.

This is a regular Python list and it can be modified at runtime (with append) to add paths to your libs.
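For instance (the path below is purely hypothetical):

```python
import sys

sys.path.append("/opt/mylibs")    # hypothetical folder containing your modules
print("/opt/mylibs" in sys.path)  # True: absolute imports will now search it
```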

The Python Package Index (PyPI.org)

pypi.org is a global server for finding, installing and sharing Python projects.

pypi.org is operated by the Python Packaging Authority (PyPA): a working group from the Python Software Foundation (PSF).

The command-line tool Package Installer for Python (pip) can be used to install packages by their name, e.g. bottle. It can install from various sources (link to code repos, ZIP file, local server...) and searches PyPI if no source is given:

pip install git+https://gitlab.com/bottlepy/bottle
pip install https://gitlab.com/bottlepy/bottle/archive/refs/heads/master.zip
pip install path/to/my/python/package/folder/
pip install path/to/my/python/package/zip/file.zip
pip install numpy    # Will seek on PyPI
pip install numpy==1.21.5   # Force a specific version
pip uninstall numpy

Pip installs packages in the current Python installation's site-packages directory which is, depending on the situation:

  1. Inside your virtual environment if some venv is activated
  2. Inside your local home directory /home/<user>/.local/lib if you cannot write to system directories (i.e. you are not root)
  3. Inside system's directories such as /usr/lib/python3.8/ if you are root (🚨 This is dangerous and not advisable in general)

PyPI Security warning 🚨

PyPI packages caught stealing credit card numbers & Discord tokens

Perform sanity checks before installing a package

  • Is the package still maintained and documented?
Last update: November, 2017
  • Does the developer consider bugs and improvements?
# of solved Gitlab issues
  • Is the package developer reliable?
Moral entity or individual, which company, experience...
  • If not opensource, is the development of this package likely to continue?
# of opensource users, # of clients, company financial health if not opensource, ...

PyPI Typosquatting warning 🚨

pip install -r requirements.txt
# 🚨 pip install requirements.txt

pip install rabbitmq
# 🚨 pip install rabitmq

pip install matplotlib
# 🚨 pip install matploltib

Package distribution

setuptools simplifies the package distribution. 🐍 Learn more

You need a pyproject.toml file (formerly setup.py) that tells:

  • The package name and version number
  • The list of deps on other packages from PyPI, git repos, ...
  • The entry points (executables, commands, ...)
  • How to build the package (using hatchling, setuptools...)

setuptools replaces the legacy distutils deprecated in 3.10.

The dynamic setup.py file is now discouraged in favour of pyproject.toml, but it is still used in many projects.

# pyproject.toml example file

[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"

[project]
name = "my_package"
description = "My package description"
readme = "README.rst"
requires-python = ">=3.7"
license = {text = "BSD 3-Clause License"}
classifiers = [
    "Framework :: Django",
    "Programming Language :: Python :: 3",
]
dependencies = [
    "requests",
]

[project.scripts]
my-script = "my_package.module:function"

With this TOML file in place, the distribution tools become available:

  • Install the package in the current environment:
pip install .
  • Build distribution:
    • sdist : Source distribution
    • bdist_wheel : Binary distribution
pip install build  # The latest recommended build tool by PyPA
python3 -m build   # Will build both an sdist and a bdist

-rw-rw-r-- 1 16699  nov.  12 00:00 hampy-1.4.2-py3-none-any.whl
-rw-rw-r-- 1 326913 nov.  12 00:00 hampy-1.4.2.tar.gz

🐍 Learn more about package distribution: PyPA docs

Remarks about binary distribution bdist_*

  • Binary formats are platform-dependent (OS, arch, Python implementation, ABI)
  • .egg files are just zip files containing sdist or bdist, you can unzip them
  • Several binary formats exist: wheel, egg... Nowadays, wheel is preferred
  • wheel files are named this way: my_math-3.0.4-py3-none-any.whl where:
    • my_math is your package name
    • 3.0.4 is your package version
    • py3 is the Python implementation tag
    • none is the ABI tag (the C API for Python)
    • any is the platform (x86_64, arm, macos...)

🐍 Learn more about package distribution: Python docs

Uploading your package distribution on PyPI

Once sdist and/or bdist are available, several pipelines exist to share your project.

Nowadays, uploading to PyPI with twine is the preferred option:

  1. Create an account on PyPI or in the sandbox TestPyPI if you're just testing
  2. pip install twine
  3. twine upload dist/* --repository testpypi

Drop the --repository argument to upload to the regular PyPI.
Parameter --repository can also target your own mirror server. Learn more.

Deploy Python package with Continuous Integration

GitLab CI + gitops

This example pipeline for GitLab CI will run the unit tests (from tox) and upload the new version on PyPI, every time a new tag is pushed to git.

pypi:
    image: python:3.11
    stage: release
    script:
        - pip install -U tox build twine
        - tox
        - python -m build
        - twine upload dist/*
    only:
        - tags

This pipeline requires TWINE_USERNAME and TWINE_PASSWORD to be set (e.g. __token__ and the right PyPI API token).

Optional: use poetry to handle dependencies

poetry is a package manager for Python such as pip or conda, with the following benefits:

  • Populates dependencies automatically according to your source code
  • Keeps track of the versions of dependencies in poetry.lock
  • Generates a pyproject.toml for you
  • Interacts through a prompt
  • Embeds a venv manager with dev and prod dependency groups

🐍 Poetry doc

CHAPTER 3

TOOLING FOR PERFORMANCE

Profiling

Profiling measures where the execution time of a program is spent and returns, for each function:

  • ncalls: the number of calls of this function
  • tottime: the total time spent in the internal body of the function only
  • cumtime: the total time spent in the function + all inner functions calls

cProfile produces a profile. Then snakeviz, vprof and pstats help analyse it.
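A minimal profiling session with cProfile and pstats from the standard library (slow_sum is a toy function for illustration):

```python
import cProfile
import io
import pstats

def slow_sum(n):
    # Toy CPU-bound function to profile
    total = 0
    for i in range(n):
        total += i
    return total

profiler = cProfile.Profile()
profiler.enable()
slow_sum(100_000)
profiler.disable()

stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(5)  # shows ncalls / tottime / cumtime
print(stream.getvalue())
```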

Snakeviz screenshot:

Python interpreters and compiling strategies

The reference implementation, CPython:

  • Compiles the Python code into bytecode, not machine code
  • Runs the bytecode
  • Caches it into __pycache__ directories and .pyc files, only for imported modules
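You can inspect the bytecode CPython produces with the standard dis module:

```python
import dis

def add(a, b):
    return a + b

dis.dis(add)  # prints one line per bytecode instruction, e.g. LOAD_FAST, RETURN_VALUE
```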

Other Python interpreter implementations

PyPy, Jython, IronPython (.NET)... with different strategies for building, memory management, unequal support of Python features & versions

Choose an interpreter according to your env, and the nature of your code

Other interpreters may need adapted code, depending on their strengths & weaknesses

Just-In-Time compiling

JIT = compile into machine code at runtime, not bytecode.

Up to x20 gain for CPU-bound code.

  • CPython 3.13 and PyPy have a JIT compiler, for hot (frequently executed) code

  • numba compiles numpy code with a @njit decorator

JIT gives good results with arithmetic computation, predictable memory access, and inference workloads.

Other compiling tools

Cython: C-Extensions for Python is used to call C-compiled code from Python.

SciPy is approximately 50% Python, 25% Fortran, 20% C, 3% Cython and 2% C++

Common design patterns

Design patterns are reusable patterns of code solving common software problems, applicable in any object-oriented programming language.

They were introduced in 1994 for the C++ language by Gamma et al.

A few patterns are presented here.

The factory

A factory is a function building a class instance with the right type and parameters

class Animal:
    def __init__(self, type: str, vocalization: str):
        self._type = type
        self._vocalization = vocalization

    @classmethod
    def make_dog(cls):
        return cls("dog", "bark")

class Fox: pass

class Duck: pass

def make_animal(animal: str):
    return Fox() if animal == "fox" else Duck()

The singleton

A singleton is a class that must have only one instance.

class USBJoystickController:
    __controller : "USBJoystickController" = None

    @staticmethod
    def get_controller():
        if USBJoystickController.__controller is None:
            USBJoystickController()
        return USBJoystickController.__controller

    def __init__(self):
        if USBJoystickController.__controller is not None:
            raise ValueError("Singleton already existing, use get_controller()")
        USBJoystickController.__controller = self
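An alternative, also common, implementation overrides __new__ so that plain instantiation always returns the same object (a sketch; Configuration is a made-up class, not the one above):

```python
class Configuration:
    _instance = None

    def __new__(cls, *args, **kwargs):
        # Create the unique instance on first call, reuse it afterwards
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

a = Configuration()
b = Configuration()
print(a is b)  # True: both names refer to the single instance
```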

The Interface / Abstract Base Class

The Interface class describes the structure & semantics that classes must comply with. A class implements an interface. Interfaces are usually named xxxxx-able.

  • the Car class may implement interfaces Movable and Drivable.
  • the Bird class may implement interfaces Movable and Cryable.

Classes implementing Movable must define a method to move them e.g. car_or_bird.move_to(self, x: float, y: float, z: float)

Classes implementing Drivable must define a method to associate a driver e.g. car.set_driver(self, d: Driver).

Classes implementing Cryable must define a method to make the animal cry e.g. bird.cry(self)

In Python, an Abstract Base Class is the closest concept to create an interface.

from abc import ABC, abstractmethod   # This is the Abstract Base Class module

class Cryable(ABC):
    @abstractmethod
    def cry(self):
        raise NotImplementedError("cry() must be overridden")

class Dog(Cryable): # Inheriting from the ABC means that Dog implements the ABC
    def cry(self):
        print("WAF!")

class Cat(Cryable): pass   # This class does not comply with the ABC

Here, cry() must be overridden in subclasses, otherwise TypeError is raised at instantiation. At runtime, calling super().cry() raises NotImplementedError.

If you are dealing with types for which you do not own the implementation (e.g. tuple or list), you can still state that they comply with your ABC by registering them explicitly:

MyABC.register(tuple)
MyABC.register(list)

def f(p: MyABC): pass # list and tuple are valid for p because registered

Subclassing or registering an ABC states that the class complies with both:

  • the structure of the ABC (e.g. method cry() exists)
  • the semantics of the ABC, described in the ABC's documentation

Drawback: explicit registering or inheritance is somewhat unpythonic, since duck typing is usually sufficient to decide whether a class is suitable or not.

Protocols: duck typing for static type checking

Protocols fulfill the same need without requiring explicit registration or subclassing.

from typing import Protocol

class Cryable(Protocol):
    vocalization: str

    def cry(self):
        raise NotImplementedError("cry() must be overridden")

class Cat:  # No need to subclass Cryable here
    def __init__(self):
        self.vocalization = "meow"

    def cry(self):
        pass

animal: Cryable = Cat() # Cat instance is accepted bcz Cat implements Cryable

A protocol mainly benefits static type checking. At runtime, isinstance() checks only work if the protocol is decorated with @runtime_checkable.

Python has built-in protocols:

  • Context Manager: __enter__ and __exit__
  • Iterable: __iter__
  • Iterator: __iter__ and __next__
  • Sequence: __getitem__ and __len__ (used by [] for instance)
  • Container: __contains__ (used by in for instance)
  • Callable: __call__
  • Hashable: __hash__
  • ...

You can use them in type annotations.
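A sketch of a custom protocol usable both in annotations and, thanks to @runtime_checkable, in isinstance() checks (HasLength and Basket are made-up names):

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class HasLength(Protocol):
    def __len__(self) -> int: ...

class Basket:
    def __len__(self):   # No subclassing: duck typing is enough
        return 3

def count(container: HasLength) -> int:
    return len(container)

print(isinstance(Basket(), HasLength))  # True
print(count(Basket()))                  # 3
```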

PARALLELIZE PYTHON APPLICATIONS

Multithreading and multiprocessing

Definitions:

  • Multithreading: Split work into several threads within the same process
  • Multiprocessing: Split work into several processes
  • Asynchronous tasks: Release the CPU and yield to another task when an IO is needed (e.g. network or hard disk response)

Before usage, define the nature of your code:

  • CPU-bound: CPU computations are the bottleneck
  • IO-bound: IO waiting time is the bottleneck

Depending on the answer, you will pick a suitable parallelization method & strategy


The bottleneck of Python Multithreading: the reference counter

The interpreter holds a counter of how many references point to each object.

In [1]: s = "Hello world!" 

In [2]: sys.getrefcount(s)
Out[2]: 2

In [3]: s2 = s             

In [4]: sys.getrefcount(s)
Out[4]: 3

In [5]: del s2             

In [6]: sys.getrefcount(s)
Out[6]: 2

If the counter reaches 0, the object is destroyed. This is how Python frees memory.

The Python Global Interpreter Lock (GIL)

Several implementations of the Python interpreter exist:

  • CPython (By far the most popular)
  • Jython, IronPython, PyPy...

The GIL is a mutex that protects the reference counters of CPython objects.

However, it prevents multiple threads from executing Python bytecode at once. This gives poor performance for multi-threaded programs, if they are CPU-bound.

Removing the GIL has been a recurring debate: so far considered too hard for too little benefit

But hardware multiplied CPUs so Python 3.13 introduced --disable-gil

Because of the GIL, multiprocessing is way more efficient than multithreading.

Multithreading example

import time, threading
def second_thread():
    for i in range(10):
        print("Hello from the second thread!")
        time.sleep(1)

new_thread = threading.Thread(target=second_thread)
new_thread.start()

for i in range(10):
    print("Hello from the main thread!")
    time.sleep(1)

new_thread.join()
Hello from the second thread! # We can't tell in which order the prints will happen
Hello from the main thread!   # The OS schedules threads as fairly as possible
Hello from the main thread!   # ...according to the system load.
Hello from the second thread! # The GIL limits Python to 1 CPU 

Multiprocessing example

import time, multiprocessing
def second_process():
    for i in range(10):
        print("Hello from the second process!")
        time.sleep(1)

new_process = multiprocessing.Process(target=second_process)
new_process.start()

for i in range(10):
    print("Hello from the first process!")
    time.sleep(1)

new_process.join()
Hello from the first process!  # Pretty much the same as threads
Hello from the second process! # But multiprocessing may use all CPUs
Hello from the second process! # The OS schedules the processes on the CPUs
Hello from the first process!

Thread-safe: the ability to be manipulated by several threads at once, with no risk of unintended interactions or unpredictable behaviour

shared_dict = {"counter": 0}

def some_threaded_function(shared_dict: dict):
    shared_dict["counter"] += 1

+= is not an atomic operation, so the result of a threaded run may be unexpected.

A synchronization primitive such as Lock or Semaphore must protect execution:

shared_dict = {"counter": 0}
shared_lock = threading.Lock()

def some_threaded_function(shared_dict: dict, shared_lock: threading.Lock):
    with shared_lock:
        shared_dict["counter"] += 1
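A runnable sketch of the same idea: four threads increment a shared counter under the lock, so the final count is deterministic:

```python
import threading

shared_dict = {"counter": 0}
shared_lock = threading.Lock()

def worker():
    for _ in range(100_000):
        with shared_lock:                 # Serialize the non-atomic +=
            shared_dict["counter"] += 1

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(shared_dict["counter"])  # 400000, always: the lock prevents lost updates
```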

Inter-Process Communication (IPC)

Python offers the following IPC primitives:

  • multiprocessing.Lock and threading.Lock (mutex, authorizes 1)
  • multiprocessing.Semaphore and threading.Semaphore (authorizes up to n)
  • multiprocessing.Pipe (1-to-1 FIFO)
  • multiprocessing.Queue (n-to-n FIFO)
  • multiprocessing.Event and threading.Event

Java's thread-safe collections offer some level of safety but do not solve all risks:

if (!synchronizedList.isEmpty()) {
    synchronizedList.remove(0); // NOT thread-safe despite the synchronizedList 
}               // This is Java code. Synchronized lists do not exist in Python

High-level multiprocessing API

multiprocessing has higher-level tools compatible with the with statement:

  • Pool(n) : represents a pool of n processes
  • Manager : manages the pool, e.g. by sharing variables between processes:
  • manager.dict() for a shared dict, or manager.list(), manager.Value()...
def f(shared_dict: dict):
    shared_dict[randint(0, 99)] = "running"

with multiprocessing.Manager() as manager:
    shared_dict = manager.dict()
    with multiprocessing.Pool(processes=4) as pool:
        results = pool.map(f, (shared_dict, shared_dict, shared_dict, shared_dict))
    print(shared_dict) # {26: 'running', 88: 'running', 60: 'running', 76: 'running'}

ℹ️ Unpack all args: pool.starmap(f, ((a, b), (c, d))) ➡️ f(a, b), f(c, d)
ℹ️ Single blocking call: pool.apply(f, (a, b)) ➡️ f(a, b)

Subprocessing (launch external binaries)

import subprocess
result = subprocess.run(['traceroute', 'debian.org'],
                        capture_output=True, text=True)
# run() is blocking until the end of the process
print(result.stdout)  # Prints the output of traceroute
  • capture_output=True captures the standard output and error to Python
  • text=True decodes the output bytes into a decoded string
process = subprocess.Popen(['traceroute', hostname], 
          stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True)

for line in process.stdout:
    print("New traceroute hop:", line)

process.wait()

Asynchronous programming (Python coroutines)

A coroutine is an asynchronous function. To execute it you must schedule it in an event loop. Scheduling is cooperative-based: coroutines must yield execution flow.

The OS is NOT involved with coroutines: Python handles their executions by itself.

A future is a placeholder in which a future result value will be stored later.

A task is an execution scheduling of a coroutine. It knows when a coroutine is done: whether it returned a result or raised an exception.

An awaitable is anything that can be awaited with await: coroutine, task, future

An async program is built as a concurrent one but it is a single-threaded process.

Here is a regular (synchronous) function:

def say(sentence):
    print(sentence)

say("Hello world!")

Let's make it a coroutine by adding the keyword async:

async def say(sentence):
    print(sentence)

say("Hello world!")
# Returns a <coroutine object say at 0x7fe4f837dbc0>

If we want to execute a coroutine, we can:

  • call it in asyncio.run() i.e:
asyncio.run(say("Hello"))  # It also creates an event loop
  • await it with keyword await i.e:
await say("Hello")     # It does NOT create an event loop
  • create a task from it i.e:
asyncio.create_task(say("Hello"))     # It does NOT create an event loop

The event loop is declared in the main thread (outermost scope). As a consequence, await and create_task are forbidden outside an async function

At the heart of the async app there is an event loop. This is a scheduler: you register tasks to be executed in the future, and retrieve them when they are done.

async def waiting_coro(duration):
    await asyncio.sleep(duration)
    print(f"Finished waiting {duration} secs")

async def main():
    l3 = asyncio.create_task(waiting_coro(3))   # Plan for execution asap
    l5 = asyncio.create_task(waiting_coro(5))   # Plan for execution asap
    await asyncio.wait([l3, l5])   # Don't abandon your tasks: await them

asyncio.run(main())        # Creates and destroys the event loop

Forgetting to await a task is like giving birth to a child and forgetting it: DON'T. Thus, keep track of the task and await it:

  • with await task ; or
  • with await asyncio.wait([task]) (or an equivalent such as asyncio.gather)

Grouping task creations and waiting for them in the same function is handy because asyncio.run() creates and destroys the event loop.
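This pattern can be sketched with asyncio.gather, which creates tasks from coroutines and awaits them all in one call (the coroutine name work is illustrative):

```python
import asyncio

async def work(n):
    await asyncio.sleep(0.01 * n)
    return n * n

async def main():
    # Tasks are created and awaited in the same coroutine: none is abandoned
    return await asyncio.gather(work(1), work(2), work(3))

squares = asyncio.run(main())   # Creates and destroys the event loop
print(squares)   # [1, 4, 9] -- results come back in argument order
```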

No synchronous code (e.g. time.sleep() or requests.get()) or heavy code (e.g. a while loop that will take years) should be mixed with async code.

AsyncIO is single-threaded: the event loop and all tasks share that one thread. If you block it with synchronous code, the event loop is delayed, and so are all other tasks.

If needed though, executors can run synchronous code in thread or process pools.
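A minimal sketch of this escape hatch, using asyncio.to_thread (Python 3.9+), which submits a synchronous function to a thread-pool executor so the event loop stays responsive:

```python
import asyncio
import time

def blocking_io():
    # A synchronous call: run directly inside a coroutine, it would freeze the loop
    time.sleep(0.2)
    return "io done"

async def main():
    # Run the blocking function in a worker thread and await its result
    result = await asyncio.to_thread(blocking_io)
    return result

outcome = asyncio.run(main())
print(outcome)   # io done
```

On older Pythons the equivalent is loop.run_in_executor(None, blocking_io).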

Not all coroutines have to run asap. Some will first:

  • wait for another coroutine to end: await is made for it
  • wait for a returned value: asyncio.Future is made for it
  • wait for acquiring the right to access a resource: asyncio.Lock is made for it
  • wait for an event to happen: asyncio.Event is made for it
  • wait for a specific time or delay: asyncio.sleep is made for it
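For instance, asyncio.Lock can protect a read-modify-write sequence that contains an await (a minimal sketch; the global counter is illustrative):

```python
import asyncio

counter = 0

async def increment(lock):
    global counter
    async with lock:             # Only one coroutine at a time in this block
        current = counter
        await asyncio.sleep(0)   # Yield control; the lock prevents interleaving here
        counter = current + 1

async def main():
    lock = asyncio.Lock()
    await asyncio.gather(*(increment(lock) for _ in range(100)))

asyncio.run(main())
print(counter)   # 100 -- without the lock, some increments could be lost
```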

ℹ️ These synchronization primitives are similar to those of the threading module, but they are not thread-safe.

⚠️ Do not mix primitives from threading, multiprocessing and asyncio. Even if they share the same name e.g. Lock, they are different for each library.

Example of task synchronisation with Event

async def coroutine(go):
    await go.wait()  # The coroutine waits for the event to happen
    print("I've received your go!")
    go.clear()   # The event is cleared so that it can be set again

async def trigger_event(go):
    await asyncio.sleep(5)
    go.set()

async def main():
    go = asyncio.Event()  # We trigger this event to launch the coroutine
    event_receiver = asyncio.create_task(coroutine(go))
    event_emitter = asyncio.create_task(trigger_event(go))
    await asyncio.wait([event_receiver, event_emitter])

asyncio.run(main(), debug=True)   

Example of fetching a web page (sync vs async)

# Synchronous version, with requests:
import requests
response = requests.get("http://example.com")
print("Status:", response.status_code)
print("Content-type:", response.headers['content-type'])
print(response.content[:15], "...")

# Asynchronous version, with aiohttp:
import asyncio, aiohttp
async def fetch(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:

            print("Status:", response.status)
            print("Content-type:", response.headers['content-type'])

            html = await response.text()
            print("Body:", html[:15], "...")

asyncio.run(fetch("http://example.com"), debug=True)

Async streams

Streams provide async tools made for network communication where you need to:

  • Read from the peer: using a StreamReader instance
  • Write to the peer: using a StreamWriter instance
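A minimal self-contained sketch: an echo server and its client in the same program, over localhost (port 0 lets the OS pick a free port; names handle and reply are illustrative):

```python
import asyncio

async def handle(reader, writer):
    data = await reader.read(100)     # StreamReader: read bytes from the peer
    writer.write(data.upper())        # StreamWriter: write bytes back
    await writer.drain()
    writer.close()
    await writer.wait_closed()

async def main():
    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    async with server:
        reader, writer = await asyncio.open_connection("127.0.0.1", port)
        writer.write(b"hello")
        await writer.drain()
        reply = await reader.read(100)
        writer.close()
        await writer.wait_closed()
    return reply

reply = asyncio.run(main())
print(reply)   # b'HELLO'
```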

Async transports and protocols
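Transports and protocols are the lower-level API beneath streams: the transport is the I/O channel, and the protocol is a class whose callbacks the event loop invokes on I/O events. A minimal sketch (the class name UpperEchoProtocol is illustrative):

```python
import asyncio

class UpperEchoProtocol(asyncio.Protocol):
    def connection_made(self, transport):
        self.transport = transport            # Transport: the low-level channel

    def data_received(self, data):
        # Protocol callback: invoked by the event loop when bytes arrive
        self.transport.write(data.upper())

async def main():
    loop = asyncio.get_running_loop()
    server = await loop.create_server(UpperEchoProtocol, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(b"ping")
    await writer.drain()
    answer = await reader.read(100)
    writer.close()
    await writer.wait_closed()
    server.close()
    await server.wait_closed()
    return answer

answer = asyncio.run(main())
print(answer)   # b'PING'
```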

SPECIALIZED FRAMEWORKS

  • XML : conversion to objects/dicts (untangle, xmltodict), data scraping (BeautifulSoup)
  • AI/ML : TensorFlow, Seaborn, NumPy, Pandas, Keras, Theano, PyTorch, scikit-learn
  • Sciences : NumPy, Pandas, SciPy, Matplotlib
  • Web : Django, Flask, Bottle, CherryPy, Falcon
    Side note: WSGI vs ASGI gateway interfaces
  • REST APIs: DRF, FastAPI
  • Dashboarding and observability: Datadog, Dynatrace
