Simplifying Pipelines


This section is about pipefunc.Pipeline.simplified_pipeline(), which is a convenient way to simplify a pipeline by merging multiple nodes into a single node (creating a pipefunc.NestedPipeFunc). Consider the following pipeline (look at the visualize() output to see the structure of the pipeline):

from pipefunc import Pipeline


def f1(a, b, c, d):
    return a + b + c + d


def f2(a, b, e):
    return a + b + e


def f3(a, b, f1):
    return a + b + f1


def f4(f1, f3):
    return f1 + f3


def f5(f1, f4):
    return f1 + f4


def f6(b, f5):
    return b + f5


def f7(a, f2, f6):
    return a + f2 + f6


# If the functions are not decorated with @pipefunc,
# they will be wrapped and the output_name will be the function name
pipeline_complex = Pipeline([f1, f2, f3, f4, f5, f6, f7])
pipeline_complex("f7", a=1, b=2, c=3, d=4, e=5)
pipeline_complex.visualize_matplotlib(
    color_combinable=True,
)  # combinable functions have the same color
[Figure: matplotlib visualization of pipeline_complex; combinable functions share the same color]

In the example above, the complex pipeline composed of seven functions (f1 through f7) can be simplified by merging the nodes f1, f3, f4, f5, and f6 into a single node. Merging simplifies the pipeline and reduces the number of function outputs that need to be cached or saved.
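To make the merge concrete, the following plain-Python sketch writes out by hand what a single merged node would compute. It does not use pipefunc and is not how the library implements merging; the name `merged_f6` is hypothetical and exists only for illustration.

```python
def f1(a, b, c, d):
    return a + b + c + d


def f2(a, b, e):
    return a + b + e


def f3(a, b, f1):
    return a + b + f1


def f4(f1, f3):
    return f1 + f3


def f5(f1, f4):
    return f1 + f4


def f6(b, f5):
    return b + f5


def f7(a, f2, f6):
    return a + f2 + f6


def merged_f6(a, b, c, d):
    """Hypothetical stand-in for the merged node: runs f1, f3, f4, f5, f6 in order."""
    r1 = f1(a, b, c, d)
    r3 = f3(a, b, r1)
    r4 = f4(r1, r3)
    r5 = f5(r1, r4)
    return f6(b, r5)


# With the merged node, only three steps remain visible: f2, merged_f6, and f7.
result = f7(1, f2(1, 2, 5), merged_f6(1, 2, 3, 4))
print(merged_f6(1, 2, 3, 4), result)  # 35 44
```

The merged node takes only the root arguments a, b, c, and d and exposes only f6, so f1, f3, f4, and f5 no longer appear as separate outputs.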

The simplified_pipeline method of the Pipeline class generates this simplified version of the pipeline.

simplified_pipeline_complex = pipeline_complex.simplified_pipeline("f7")
simplified_pipeline_complex.visualize()  # A `NestedPipeFunc` will have a red edge
[Figure: graph of the simplified pipeline. Arguments a, b, and e feed f2; arguments a, b, c, and d feed NestedPipeFunc_f6 (output f6); a, f2, and f6 feed f7. Legend: Argument, PipeFunc, NestedPipeFunc.]

However, simplifying a pipeline comes with a trade-off. The simplification process removes intermediate nodes that may be necessary for debugging or inspection.

For instance, if a developer wants to monitor the output of f3 while processing the pipeline, they would not be able to do so in the simplified pipeline as f3 has been merged into a pipefunc.NestedPipeFunc.
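The trade-off can be illustrated with a small plain-Python sketch, again without pipefunc. The helpers `run_unmerged` and `run_merged` are hypothetical names that only mimic the visibility difference: the first keeps every intermediate result, the second exposes just the final output, as a merged node would.

```python
def f1(a, b, c, d):
    return a + b + c + d


def f3(a, b, f1):
    return a + b + f1


def f4(f1, f3):
    return f1 + f3


def f5(f1, f4):
    return f1 + f4


def f6(b, f5):
    return b + f5


def run_unmerged(a, b, c, d):
    """Evaluate each node separately and keep every intermediate result."""
    results = {}
    results["f1"] = f1(a, b, c, d)
    results["f3"] = f3(a, b, results["f1"])
    results["f4"] = f4(results["f1"], results["f3"])
    results["f5"] = f5(results["f1"], results["f4"])
    results["f6"] = f6(b, results["f5"])
    return results


def run_merged(a, b, c, d):
    """Mimic a merged node: only the final output f6 is visible to the caller."""
    return {"f6": run_unmerged(a, b, c, d)["f6"]}


unmerged = run_unmerged(1, 2, 3, 4)
merged = run_merged(1, 2, 3, 4)
print(unmerged["f3"])  # 13
print("f3" in merged)  # False
```

Both variants produce the same f6, but only the unmerged one lets a caller read f3 afterwards.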

The simplified pipeline now contains a pipefunc.NestedPipeFunc object, which is a subclass of PipeFunc that wraps an internal pipeline.

simplified_pipeline_complex.functions
[PipeFunc(f2),
 PipeFunc(f7),
 NestedPipeFunc(pipefuncs=[PipeFunc(f6), PipeFunc(f1), PipeFunc(f3), PipeFunc(f4), PipeFunc(f5)])]
nested_func = simplified_pipeline_complex.functions[-1]
print(f"{nested_func.parameters=}, {nested_func.output_name=}, {nested_func(a=1, b=2, c=3, d=4)=}")
nested_func.pipeline.visualize()
nested_func.parameters=('a', 'b', 'c', 'd'), nested_func.output_name='f6', nested_func(a=1, b=2, c=3, d=4)=35
[Figure: graph of the internal pipeline of the NestedPipeFunc. Arguments a, b, c, and d feed f1; a, b, and f1 feed f3; f1 and f3 feed f4; f1 and f4 feed f5; b and f5 feed f6. Legend: Argument, PipeFunc.]