Type Checking
Have uv? ⚡
If you have uv installed, you can instantly open this page as a Jupyter notebook using opennb:
uvx --with "pipefunc[docs]" opennb pipefunc/pipefunc/docs/source/concepts/type-checking.md
This command creates an ephemeral environment with all dependencies and launches the notebook in your browser within seconds, with no manual setup needed! ✨
Alternatively, run:
uv run https://raw.githubusercontent.com/pipefunc/pipefunc/refs/heads/main/get-notebooks.py
to download all documentation as Jupyter notebooks.
How does type checking work in pipefunc?
pipefunc supports type checking for function arguments and outputs using Python type hints.
It ensures that the output of one function matches the expected input types of the next function in the pipeline.
This is crucial for maintaining data integrity and catching errors early in pipeline-based workflows.
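The idea of comparing one function's return annotation with the next function's parameter annotation can be sketched in plain Python. This is a simplified illustration and not pipefunc's implementation: `check_connection` is a hypothetical helper, it uses strict equality (pipefunc's own compatibility check may accept more), and it assumes that a missing annotation simply means there is nothing to check.

```python
from typing import get_type_hints


def check_connection(producer, consumer, arg_name):
    """Raise TypeError if the producer's return hint differs from the consumer's hint.

    Hypothetical helper for illustration only; it compares with strict equality.
    """
    output_type = get_type_hints(producer).get("return")
    input_type = get_type_hints(consumer).get(arg_name)
    # Assumption: a missing annotation on either side means "nothing to check".
    if output_type is None or input_type is None:
        return
    if output_type != input_type:
        msg = (
            f"`{arg_name}`: {producer.__name__} returns {output_type}, "
            f"but {consumer.__name__} expects {input_type}"
        )
        raise TypeError(msg)


def f(a) -> int:
    return 2 * a


def g(y: str):
    return y.upper()


def h(y: int):
    return y + 1


check_connection(f, h, "y")  # passes silently: int matches int

try:
    check_connection(f, g, "y")
except TypeError as exc:
    mismatch = str(exc)  # message naming the int/str mismatch
```

Because only annotations are inspected, neither function is ever called, which is why a check like this can run before any data flows through the pipeline.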
Basic type checking
Here's an example of pipefunc raising a TypeError when the types don't match:
from pipefunc import Pipeline, pipefunc

# All type hints that are not relevant for this example are omitted!

@pipefunc(output_name="y")
def f(a) -> int:  # output 'y' is expected to be an `int`
    return 2 * a

@pipefunc(output_name="z")
def g(y: str):  # here 'y' is expected to be a `str`
    return y.upper()

# Creating the `Pipeline` will raise a `TypeError`
pipeline = Pipeline([f, g])
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[1], line 14
11 return y.upper()
13 # Creating the `Pipeline` will raise a `TypeError`
---> 14 pipeline = Pipeline([f, g])
File ~/checkouts/readthedocs.org/user_builds/pipefunc/checkouts/930/pipefunc/_pipeline/_base.py:208, in Pipeline.__init__(self, functions, lazy, debug, print_error, profile, cache_type, cache_kwargs, validate_type_annotations, scope, default_resources, name, description)
206 else:
207 mapspec = None
--> 208 self.add(f, mapspec=mapspec)
209 self._cache_type = cache_type
210 self._cache_kwargs = cache_kwargs
File ~/checkouts/readthedocs.org/user_builds/pipefunc/checkouts/930/pipefunc/_pipeline/_base.py:355, in Pipeline.add(self, f, mapspec)
352 f.print_error = self.print_error
354 self._clear_internal_cache() # reset cache
--> 355 self.validate()
356 return f
File ~/checkouts/readthedocs.org/user_builds/pipefunc/checkouts/930/pipefunc/_pipeline/_base.py:1578, in Pipeline.validate(self)
1576 self._validate_mapspec()
1577 if self.validate_type_annotations:
-> 1578 validate_consistent_type_annotations(self.graph)
File ~/checkouts/readthedocs.org/user_builds/pipefunc/checkouts/930/pipefunc/_pipeline/_validation.py:97, in validate_consistent_type_annotations(graph)
86 if not is_type_compatible(output_type, input_type):
87 msg = (
88 f"Inconsistent type annotations for:"
89 f"\n - Argument `{parameter_name}`"
(...) 95 " Disable this check by setting `validate_type_annotations=False`."
96 )
---> 97 raise TypeError(msg)
TypeError: Inconsistent type annotations for:
- Argument `y`
- Function `f(...)` returns:
`<class 'int'>`.
- Function `g(...)` expects:
`<class 'str'>`.
Please make sure the shared input arguments have the same type.
Note that the output type displayed above might be wrapped in `pipefunc.typing.Array` if using `MapSpec`s. Disable this check by setting `validate_type_annotations=False`.
In this example, function f outputs an int, but function g expects a str input.
Creating the pipeline therefore raises a TypeError because of this type mismatch.
Note
pipefunc only checks the type hints during pipeline construction, not during function execution.
However, soon we will add runtime type checking as an option.
To turn off this type checking, you can set the validate_type_annotations argument to False in the Pipeline constructor:
pipeline = Pipeline([f, g], validate_type_annotations=False)
Note that while disabling type checking allows the pipeline to run, it may lead to runtime errors or unexpected results if the types are not compatible.
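To see the kind of failure this warning refers to, the two functions from the example can be chained by hand as plain Python, without pipefunc and without the decorators. This is only a sketch of the mismatch itself: with validation disabled, nothing stops the `int` produced by `f` from reaching `g`, which then fails when it tries to call a string method.

```python
def f(a) -> int:
    return 2 * a


def g(y: str):
    return y.upper()  # only valid for strings


try:
    g(f(1))  # f returns the int 2, which has no .upper() method
except AttributeError as exc:
    runtime_error = exc  # the mismatch only surfaces at execution time
else:
    runtime_error = None
```

The annotation check at construction time reports the same problem earlier and with a message that names both functions and the shared argument.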
Type checking for Pipelines with MapSpec and reductions
When a pipeline contains a reduction operation (using MapSpecs), the type checking is more complex.
The results of an ND map operation are always stored in a numpy object array, which means that the original types are preserved in the elements of this array.
This means the type hint for a function that consumes such a mapped output should be numpy.ndarray[Any, numpy.dtype[numpy.object_]].
Unfortunately, it is not possible to statically check the types of the elements in the object array (e.g., with mypy).
We can, however, check the types of the elements at runtime.
To do this, we can use the Array type hint from pipefunc.typing.
This Array generic contains the correct numpy.ndarray type hint for object arrays, but is annotated with the element type using typing.Annotated.
For example, with Array[int], the type hint is numpy.ndarray[Any, numpy.dtype[numpy.object_]], with the element type int stored in the metadata of Annotated.
mypy checks only the numpy array type, whereas pipefunc checks both the numpy object array type and its element type.
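How an element type can travel in the metadata of `typing.Annotated` can be sketched with the standard library alone. The names here are stand-ins and not pipefunc's API: `ElementType` is a hypothetical marker playing the role of the element-type metadata, `list` stands in for the numpy object-array type, and `split_annotation` is a hypothetical helper.

```python
from typing import Annotated, get_args, get_origin


class ElementType:
    """Hypothetical marker that carries the element type as Annotated metadata."""

    def __init__(self, tp):
        self.tp = tp


# `list` is a stand-in for the numpy object-array type used in the real hint.
IntArray = Annotated[list, ElementType(int)]


def split_annotation(hint):
    """Return (container_type, element_type); element_type is None if absent."""
    if get_origin(hint) is Annotated:
        base, *metadata = get_args(hint)
        for item in metadata:
            if isinstance(item, ElementType):
                return base, item.tp
        return base, None
    return hint, None


container, element = split_annotation(IntArray)  # (list, int)
plain, missing = split_annotation(int)  # (int, None)
```

A static checker only looks at the first argument of `Annotated` (the container type) and ignores the metadata, while a runtime check is free to read the metadata and compare element types as well.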
Use it like this:
import numpy as np

from pipefunc import Pipeline, pipefunc
from pipefunc.typing import Array

@pipefunc(output_name="y", mapspec="x[i] -> y[i]")
def double_it(x: int) -> int:
    assert isinstance(x, int)
    return 2 * x

@pipefunc(output_name="sum")
def take_sum(y: Array[int]) -> int:
    # y is a numpy object array of integers
    # the original types are always preserved!
    assert isinstance(y, np.ndarray)
    assert y.dtype == object  # elements are stored as Python objects
    assert isinstance(y[0], int)
    return sum(y)

pipeline_map = Pipeline([double_it, take_sum])
pipeline_map.map({"x": [1, 2, 3]})
{'y': Result(function='double_it', kwargs={'x': [1, 2, 3]}, output_name='y', output=array([2, 4, 6], dtype=object), store=DictArray(folder=None, shape=(3,), internal_shape=(), shape_mask=(True,), mapping={(0,): 2, (1,): 4, (2,): 6})), 'sum': Result(function='take_sum', kwargs={'y': DictArray(folder=None, shape=(3,), internal_shape=(), shape_mask=(True,), mapping={(0,): 2, (1,): 4, (2,): 6})}, output_name='sum', output=12, store=<pipefunc.map._result.DirectValue object at 0x7f13c4138490>)}
For completeness, this is the type hint for Array[int]:
from pipefunc.typing import Array
Array[int]
typing.Annotated[numpy.ndarray[typing.Any, numpy.dtype[numpy.object_]], pipefunc.typing.ArrayElementType[int]]