Python - Type annotations

Overview
Questions:
  • What is typing?

  • How does it improve code?

  • Can it help me?

Objectives:
  • Understand the utility of annotating types in one’s code

  • Understand the limits of type annotations in Python

Requirements:
Time estimation: 30 minutes
Level: Intermediate
Supporting Materials:
Last modification: Oct 18, 2022
License: Tutorial Content is licensed under Creative Commons Attribution 4.0 International License. The GTN Framework is licensed under MIT.
Best viewed in a Jupyter Notebook

This tutorial is best viewed in a Jupyter notebook! You can load this notebook in one of the following ways:

Launching the notebook in Jupyter in Galaxy

  1. Instructions to Launch JupyterLab
  2. Open a Terminal in JupyterLab with File -> New -> Terminal
  3. Run wget https://training.galaxyproject.org/training-material/topics/data-science/tutorials/python-typing/data-science-python-typing.ipynb
  4. Select the notebook that appears in the list of files on the left.

Downloading the notebook

  1. Right click one of these links: Jupyter Notebook (With Solutions), Jupyter Notebook (Without Solutions)
  2. Save Link As..

In some languages type annotations are a core part of the language and types are checked at compile time, to ensure your code can never use the incorrect type of object. Python, and a few other dynamic languages, instead use “Duck Typing” wherein the type of the object is less important than whether or not the correct methods or attributes are available.
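As a minimal sketch of duck typing (the Duck and Robot classes and the make_it_quack function are hypothetical names invented for this illustration), any object with a quack() method is accepted, regardless of its class:

```python
class Duck:
    def quack(self) -> str:
        return "Quack!"


class Robot:
    # Unrelated to Duck, but it offers the same method.
    def quack(self) -> str:
        return "Beep, quack."


def make_it_quack(thing):
    # No type is checked here; Python only looks up .quack() when this runs.
    return thing.quack()


sounds = [make_it_quack(Duck()), make_it_quack(Robot())]

# An object without the method only fails at the moment it is used.
try:
    make_it_quack(42)
    failed = False
except AttributeError:
    failed = True
```

Nothing about the argument is verified until the method is actually looked up, which is why mistakes of this kind tend to surface late.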

However, we can provide type hints as we write Python, which allow our editor to type check code as we go, even though the hints are not enforced when the code runs.

Agenda

In this tutorial, we will cover:

  1. Types
  2. But why?
  3. Typing Variables
  4. Testing for Types
  5. Exercise
  6. Automation with MonkeyType

Types

Types used for annotations can be any of the base types:

str
int
float
bool
None
...

or they can be relabelings of existing types, letting you create new types as needed to represent your internal data structures:

from typing import NewType

NameType = NewType("NameType", str)
Point2D = NewType("Point2D", tuple[float, float])

If you are on a Python version earlier than 3.9, tuple[float, float] will raise an error. Either update Python, or rewrite these annotations with Tuple and List, which must be imported from the typing module.
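For older Python versions, the same definitions could be sketched with the capitalized generics from the typing module. The Path2D alias is a hypothetical name added only to show List in use:

```python
from typing import List, NewType, Tuple

NameType = NewType("NameType", str)
Point2D = NewType("Point2D", Tuple[float, float])

# Hypothetical alias, not part of the tutorial: a list of points.
Path2D = List[Point2D]

origin = Point2D((0.0, 0.0))
path: Path2D = [origin, Point2D((1.0, 2.5))]
```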

But why?

Imagine you have a situation like the following. Take a minute to read and understand the code:

# Fetch the user and history list
(history_id, user_id) = GetUserAndCurrentHistory("hexylena")

# And make sure all of the permissions are correct
history = History.fetch(history_id)
history.share_with(user_id)
history.save()
Question
  1. Can you be sure the history_id and user_id are in the correct order? It seems like potentially not, given the ordering of “user” and “history” in the function name, but without inspecting the definition of that function we won’t know.
  2. What happens if history_id and user_id are swapped?

Solution
  1. This is unanswerable without the code.
  2. Depending on the magnitude of history_id and user_id, the swapped values may both be within allowable ranges. Take for example

    User Id    History Id
    1          1
    1          2
    2          3
    2          4

    Given user_id=1 and history_id=2, we intend that history #2 (the second row of the table, owned by user #1) is shared with user #1, its owner. But if the two values are swapped, we instead share history #1, which belongs to user #1, with user #2. We have accidentally shared a history with the wrong user! This could be a GDPR violation for our system and cause a lot of trouble.
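To make the failure concrete, here is a small sketch that stands in for the database with a dictionary. The history_owner mapping and the share function are hypothetical helpers, not part of any real API. Because both ids are plain ints, the swapped call runs without complaint:

```python
# Hypothetical in-memory stand-in for the table above:
# history id -> id of the user who owns it.
history_owner = {1: 1, 2: 1, 3: 2, 4: 2}


def share(history_id: int, user_id: int) -> str:
    owner = history_owner[history_id]
    return f"history {history_id} (owner: user {owner}) shared with user {user_id}"


# What we meant: share history 2 with its owner, user 1.
intended = share(history_id=2, user_id=1)

# What happens if the two values are swapped: no error, wrong result.
swapped = share(history_id=1, user_id=2)
```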

However, if we have type definitions for the UserId and HistoryId that declare them as their own types:

from typing import NewType

UserId = NewType("UserId", int)
HistoryId = NewType("HistoryId", int)

And then use them in our function’s signature, e.g.

def GetUserAndCurrentHistory(username: str) -> tuple[UserId, HistoryId]:
    x = UserId(1) # Pretend this is fetching from the database
    y = HistoryId(2) # Likewise
    return (x, y)

we would be able to catch that mistake: whatever we name the variables, the assignment is still type checked against the declared types.

history_id: HistoryId
user_id: UserId

(user_id, history_id) = GetUserAndCurrentHistory("hexylena") # Correct order
(history_id, user_id) = GetUserAndCurrentHistory("hexylena") # Swapped, flagged by the type checker

If we’re using a code editor with type checking support, e.g. VSCode with Pylance, we’ll see something like:

Screenshot of VSCode showing the functions from above. The version with history_id first has a bright red line under the function call of GetUserAndCurrentHistory. A popup tab shown on hovering over the function name shows that Expression of type UserId cannot be assigned to declared type HistoryId. UserId is incompatible with HistoryId.

Here the type checker tells us that the version with history_id first is an error: a UserId cannot be assigned to a variable declared as HistoryId.

Question

What happens if you execute this code?

It executes happily. Types are not enforced at runtime. In this case both custom types wrap an integer, and at runtime UserId(1) and HistoryId(2) are just plain ints, so both versions of the assignment work fine. That is why we repeatedly call them “type hints”: they are hints that let your editor show suggestions and help catch bugs, but they are not enforced. If you changed the line y = HistoryId(2) to something like y = "test", the code would also execute fine. Python doesn’t care that there is suddenly a string where you promised an int.

However, types do matter at runtime when you perform operations on values. Trying to get the len() of an integer? That will raise a TypeError, as integers don’t support the len() call.
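Both behaviours can be observed in a few lines. This is a sketch reusing the UserId and HistoryId types from above; the runtime_type and error_message variables are introduced only for the demonstration:

```python
from typing import NewType

UserId = NewType("UserId", int)
HistoryId = NewType("HistoryId", int)

# At runtime a NewType call simply returns its argument unchanged.
user_id = UserId(1)
runtime_type = type(user_id)  # plain int, not a separate class

# Nothing stops us from breaking the annotation; only a type checker complains.
history_id: HistoryId = "test"  # type: ignore[assignment]

# Operations, however, are checked at the moment they run.
try:
    len(user_id)
    error_message = None
except TypeError as err:
    error_message = str(err)
```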

Typing Variables

Adding types to variables is easy; you’ve seen a few examples already:

a: str = "Hello"
b: int = 3
c: float = 3.14159
d: bool = True

Complex Types

But you can go further than this with things like tuple and list types:

e: list[int] = [1, 2, 3]
f: tuple[int, str] = (3, "Hi.")
g: list[tuple[int, int]] = [(1, 2), (3, 4)]
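The same bracket notation extends to other built-in containers. As a small sketch (the variable names and values are made up for this example), dict takes a key type and a value type, and a tuple of unknown length can be written with an ellipsis:

```python
h: dict[str, int] = {"apples": 3, "pears": 5}
i: tuple[int, ...] = (1, 2, 3, 4)  # any number of ints
j: dict[str, list[float]] = {"run1": [0.1, 0.2], "run2": []}

total = sum(h.values())
```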

Typing Functions

Likewise, you’ve seen an example of adding type hints to a function:

def reverse_list_of_ints(a: list[int]) -> list[int]:
    return a[::-1]

But this is a very specific function, right? We can reverse lists with more than just integers. For this, you can use Any:

from typing import Any

def reverse_list(a: list[Any]) -> list[Any]:
    return a[::-1]

But this loses the type information between the start of the function and the end. You said it was a list[Any], so your editor might not provide any useful type hints for the result, even though you know that calling it with a list[int] always returns a list[int]. Instead you can use a TypeVar:

from typing import TypeVar

T = TypeVar("T") # Implicitly any

def reverse_list(a: list[T]) -> list[T]:
    return a[::-1]

Now this will allow the function to accept a list of any type of value: int, float, etc. But it will also accept types you might not have intended:

w: list[tuple[int, int]] = [(1, 2), (3, 4), (5, 8)]
reverse_list(w)

We can lock down what types we’ll accept by using a Union instead of Any. With a Union, we can define that a type in that position might be any one of a few more specific types. Say your function can only accept strings, integers, or floats:

from typing import Union

def reverse_list(a: list[Union[int, float, str]]) -> list[Union[int, float, str]]:
    return a[::-1]

Here we have used a Union[A, B, ...] to declare that each element can only be one of these three types.
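When the same Union appears several times, it can be given a name. The sketch below assumes a hypothetical alias called Scalar; the alias is just another name for the Union, so type checkers treat both spellings identically:

```python
from typing import Union

# A type alias: a shorter name for the longer Union expression.
Scalar = Union[int, float, str]


def reverse_list(a: list[Scalar]) -> list[Scalar]:
    return a[::-1]


mixed = reverse_list([1, 2.5, "three"])
```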

Question
  1. Are both of these valid definitions?

    q1: list[Union[int, float, str]] = [1, 2, 3]
    q2: list[Union[int, float, str]] = [1, 2.3214, "asdf"]
    
  2. If that wasn’t what you expected, how would you define it so that it would be?

Solution

    Yes, both are valid, but maybe you expected a homogeneous list. If you wanted that, you could instead do:

    q3: Union[list[int], list[float], list[str]] = [1, 2, 3]
    q4: Union[list[int], list[float], list[str]] = [1, 2.3243, "asdf"] # Fails
    

Optional

Sometimes you have an argument to a function that is truly optional. Maybe you have a different code path if it isn’t there, or you simply process things differently but still correctly. You can explicitly declare this by defining it as Optional:

from typing import Optional

def pretty(lines: list[str], padding: Optional[str] = None) -> None:
    for line in lines:
        if padding:
            print(f"{padding} {line}")
        else:
            print(line)


lines = ["hello", "world", "你好", "世界"]

# Without the optional argument
pretty(lines)
# And with the optional
pretty(lines, "★")

While this superficially looks like any keyword argument with a default value, it’s subtly different. Optional[str] documents that an explicit value of None is allowed, and tells us that the argument will be either a string or None. A bare default of None, without the annotation, does not tell the reader or the type checker which other types are acceptable.
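Optional[str] is shorthand for Union[str, None]. The sketch below uses a hypothetical pad_line helper to show this, and tests for None explicitly. Note the small difference from pretty() above, where `if padding:` also treats an empty string as "no padding":

```python
from typing import Optional, Union

# Optional[X] and Union[X, None] are the same type.
same_type = Optional[str] == Union[str, None]


def pad_line(line: str, padding: Optional[str] = None) -> str:
    # Compare against None explicitly, so that an empty string
    # still counts as a (blank) padding value.
    if padding is None:
        return line
    return f"{padding} {line}"


plain = pad_line("hello")
explicit_none = pad_line("hello", None)
starred = pad_line("hello", "★")
```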

Testing for Types

You can use mypy to check that these type annotations are consistent across a project. This is a step you could add to your automated testing, if you have that. Using the HistoryId/UserId example from above, we can write it out into a script and run mypy on that file:

$ mypy tmp.py
tmp.py:15: error: Incompatible types in assignment (expression has type "UserId", variable has type "HistoryId")
tmp.py:15: error: Incompatible types in assignment (expression has type "HistoryId", variable has type "UserId")

Here it reports the errors in the console, and you can use this to prevent bad code from being committed.
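For reference, one possible tmp.py can be assembled from the snippets earlier in this tutorial. With this exact layout the swapped assignment falls on line 15, which is consistent with the report above; a different layout would give a different line number:

```python
from typing import NewType

UserId = NewType("UserId", int)
HistoryId = NewType("HistoryId", int)

def GetUserAndCurrentHistory(username: str) -> tuple[UserId, HistoryId]:
    x = UserId(1) # Pretend this is fetching from the database
    y = HistoryId(2) # Likewise
    return (x, y)

history_id: HistoryId
user_id: UserId

(user_id, history_id) = GetUserAndCurrentHistory("hexylena")
(history_id, user_id) = GetUserAndCurrentHistory("hexylena")
```

Running the file with python still succeeds, and the two variables simply end up holding each other's values. Only the mypy step catches the mistake.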

Exercise

Here is an example module that would be stored in corp/__init__.py:

def repeat(x, n):
    """Return a list containing n references to x."""
    return [x]*n


def print_capitalized(x):
    """Print x capitalized, and return x."""
    print(x.capitalize())
    return x


def concatenate(x, y):
    """Add two strings together."""
    return x + y

And here are some example invocations of that module, as found in test.py:

from corp import *

x = repeat("A", 3) # Should return ["A", "A", "A"]
y = print_capitalized("hElLo WorLd") # Should print Hello world (capitalize() lowercases the rest)
z = concatenate("Hi", "Bob") # HiBob
Hands-on: Add type annotations
  1. Add type annotations to each of those functions AND the variables x, y, z
  2. How did you know which types were appropriate?
  3. Does mypy approve of your annotations? (Run mypy test.py, once you’ve written the above files out to their appropriate locations.)
Solution
  1. The proper annotations:

    def repeat(x: str, n: int) -> list[str]: ...
    # Or, to accept any type of x:
    from typing import TypeVar
    T = TypeVar("T")
    def repeat(x: T, n: int) -> list[T]: ...
    
    def print_capitalized(x: str) -> str: ...
    
    def concatenate(x: str, y: str) -> str: ...
    

    and

    x: list[str] = ...
    y: str = ...
    z: str = ...
    
  2. You might have discovered this by a combination of looking at the function definitions and their documentation, and perhaps also the sample invocations and what types were passed there.
  3. We hope so!
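Put together, one way the fully annotated module and its invocations might look is sketched below, shown as a single file for brevity. It uses the TypeVar variant of repeat, which is one of the two options given above:

```python
from typing import TypeVar

T = TypeVar("T")


def repeat(x: T, n: int) -> list[T]:
    """Return a list containing n references to x."""
    return [x] * n


def print_capitalized(x: str) -> str:
    """Print x capitalized, and return x."""
    print(x.capitalize())
    return x


def concatenate(x: str, y: str) -> str:
    """Add two strings together."""
    return x + y


x: list[str] = repeat("A", 3)
y: str = print_capitalized("hElLo WorLd")  # returns its input unchanged
z: str = concatenate("Hi", "Bob")
```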

Automation with MonkeyType

You can use MonkeyType to automatically apply type annotations to your code. Based on the execution of the code, it will make a best guess about what types are supported.

Hands-on: Using MonkeyType to generate automatic annotations
  1. Create a folder for a module named some
  2. Touch some/__init__.py to ensure it’s importable as a python module
  3. Create some/module.py and add the following contents:

    def add(a, b):
        return a + b
    
  4. Create a script named myscript.py that uses that module:

    from some.module import add
       
    add(1, 2)
    
  5. pip install monkeytype
  6. Run MonkeyType to generate the annotations

    monkeytype run myscript.py
    
  7. View the generated annotations

    monkeytype stub some.module
    
Question
  1. What was the output of that command?
  2. This function will accept strings as well. Add a statement to exercise that in myscript.py and re-run monkeytype run and monkeytype stub. What is the new output?
Solution
  1. The expected output is:

    def add(a: int, b: int) -> int: ...
    
  2. You can add a statement like add("a", "b") below add(1, 2) to see:

    def add(a: Union[int, str], b: Union[int, str]) -> Union[int, str]: ...
    
Question

Why is it different?

Because MonkeyType works by running the code provided (myscript.py) and annotating based on what executions it saw. In the first invocation it had not seen any calls to add() with strings, so it only reported int as acceptable types. However, the second time it saw strs as well. Can you think of another type that would be supported by this operation, that was not caught? (list!)

Question
  1. Does that type annotation make sense based on what you’ve learned today?
  2. Can you write a better type annotation based on what you know?
Solution
  1. It works, but it’s not a great type annotation. As written, it looks like the function could accept two ints and return a str, which isn’t correct.
  2. Here is a better type annotation:

    from typing import TypeVar
    T = TypeVar("T", int, str, list)
       
    def add(a: T, b: T) -> T:
        return a + b
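A short usage sketch of this constrained TypeVar follows; the result variable names are invented for the example. Each call binds T to exactly one of the listed types, so a type checker should reject a mixed call such as add(1, "b"). At runtime that mixed call fails anyway, because int and str cannot be added:

```python
from typing import TypeVar

T = TypeVar("T", int, str, list)


def add(a: T, b: T) -> T:
    return a + b


total = add(1, 2)       # T is int
text = add("a", "b")    # T is str
items = add([1], [2])   # T is list

try:
    add(1, "b")  # type: ignore  # mixed types: flagged by a type checker
    mixed_ok = True
except TypeError:
    mixed_ok = False
```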
    
Key points
  • Typing improves the correctness and quality of your code

  • It lets your editor provide better and more accurate hints.

Citing this Tutorial

  1. Helena Rasche, 2022 Python - Type annotations (Galaxy Training Materials). https://training.galaxyproject.org/training-material/topics/data-science/tutorials/python-typing/tutorial.html Online; accessed TODAY
  2. Batut et al., 2018 Community-Driven Data Analysis Training for Biology Cell Systems 10.1016/j.cels.2018.05.012


@misc{data-science-python-typing,
author = "Helena Rasche",
title = "Python - Type annotations (Galaxy Training Materials)",
year = "2022",
month = "10",
day = "18",
url = "\url{https://training.galaxyproject.org/training-material/topics/data-science/tutorials/python-typing/tutorial.html}",
note = "[Online; accessed TODAY]"
}
@article{Batut_2018,
    doi = {10.1016/j.cels.2018.05.012},
    url = {https://doi.org/10.1016%2Fj.cels.2018.05.012},
    year = 2018,
    month = {jun},
    publisher = {Elsevier {BV}},
    volume = {6},
    number = {6},
    pages = {752--758.e1},
    author = {B{\'{e}}r{\'{e}}nice Batut and Saskia Hiltemann and Andrea Bagnacani and Dannon Baker and Vivek Bhardwaj and Clemens Blank and Anthony Bretaudeau and Loraine Brillet-Gu{\'{e}}guen and Martin {\v{C}}ech and John Chilton and Dave Clements and Olivia Doppelt-Azeroual and Anika Erxleben and Mallory Ann Freeberg and Simon Gladman and Youri Hoogstrate and Hans-Rudolf Hotz and Torsten Houwaart and Pratik Jagtap and Delphine Larivi{\`{e}}re and Gildas Le Corguill{\'{e}} and Thomas Manke and Fabien Mareuil and Fidel Ram{\'{\i}}rez and Devon Ryan and Florian Christoph Sigloch and Nicola Soranzo and Joachim Wolff and Pavankumar Videm and Markus Wolfien and Aisanjiang Wubuli and Dilmurat Yusuf and James Taylor and Rolf Backofen and Anton Nekrutenko and Björn Grüning},
    title = {Community-Driven Data Analysis Training for Biology},
    journal = {Cell Systems}
}
                   

Funding

These individuals or organisations provided funding support for the development of this resource

A number of our employees contribute directly to the Galaxy Training Network and seek to make our higher education learning materials more accessible to a wider audience through the GTN platform. avans.nl

Congratulations on successfully completing this tutorial!