# Concepts

> **Getting Started:** If you're new to pipefunc, we recommend starting with the Tutorial to get a hands-on introduction to the library. Then, explore the concepts in this section to deepen your understanding.
Welcome to the Concepts section of the pipefunc documentation.
Here, we delve into the core ideas and design principles that underpin the library.
Understanding these concepts will help you use pipefunc's features effectively to build, manage, and optimize your computational workflows.
Each page in this section covers a specific aspect of pipefunc, explained in detail with examples and diagrams.
Whether you're looking to understand the intricacies of data flow with `mapspec`, learn about parallel execution, or explore advanced features like resource management, this section provides the necessary insights.
## Topics Covered

Below are the key concepts discussed in this section. Click on any topic to learn more:
- **Function Inputs and Outputs**: Manage inputs, outputs, defaults, renaming, and multiple returns. Use with `dataclasses` and `pydantic`.
- **Understanding `mapspec`**: Define data mappings with `mapspec` for element-wise operations, reductions, and parallelization.
- **Execution and Parallelism**: Control pipeline execution, sequential and parallel, with mixed executors and storage. Includes post-execution hooks.
- **Parameter Scopes**: Organize pipelines and avoid naming conflicts using parameter scopes.
- **Error Handling**: Capture detailed error information with `ErrorSnapshot` for debugging.
- **Type Checking**: How `pipefunc` uses type hints for static and runtime type checking to ensure data integrity.
- **Variants**: Use `VariantPipeline` to manage multiple function implementations within a single pipeline.
- **Caching**: Cache function results with `memoize` and `Pipeline` cache options. Understand cache types and shared memory caching.
- **Automatic CLI**: Automatically generate a CLI for your pipeline, complete with documentation.
- **SLURM Integration**: Submit `pipeline.map` calls to SLURM clusters with `pipefunc` and `adaptive-scheduler`.
- **Resource Management**: Specify and dynamically allocate resources (CPU, memory, GPU) for individual functions.
- **Function Chaining Helper**: Connect functions linearly with the `chain` helper for simple data-flow pipelines.
- **Simplifying Pipelines**: Merge nodes with `simplified_pipeline` and `NestedPipeFunc`. Understand the trade-offs.
- **Adaptive Integration**: Optimize parameter-space exploration with `adaptive` library integration.
- **Testing**: Best practices for testing, including mocking functions in pipelines.
- **Overhead and Efficiency**: Measure the performance overhead of `pipefunc`.
- **Parameter Sweeps**: Construct parameter sweeps and optimize execution with `pipefunc.sweep`.
- **MCP Server Integration**: Expose pipelines as MCP servers for AI agents and assistants. Includes async execution, job management, and an AI agent example.
## Contributing
We welcome contributions to the pipefunc documentation! If you find any issues or have suggestions for improving the concepts explained here, please open an issue or submit a pull request on our GitHub repository.