Concepts

Welcome to the Concepts section of the pipefunc documentation. Here, we delve into the core ideas and design principles that underpin the library. Understanding these concepts will help you use pipefunc's features effectively to build, manage, and optimize your computational workflows.

Getting Started

If you're new to pipefunc, we recommend starting with the Tutorial for a hands-on introduction to the library. Then explore the concepts in this section to deepen your understanding.

Each page in this section covers a specific aspect of pipefunc, explained in detail with examples and diagrams. Whether you're looking to understand the intricacies of data flow with mapspec, learn about parallel execution, or explore advanced features like resource management, this section provides the necessary insights.

Topics Covered

Below are the key concepts discussed in this section. Click on any topic to learn more:

  • Function Inputs and Outputs: Manage inputs, outputs, defaults, renaming, and multiple returns. Use with dataclasses and pydantic.

  • Understanding mapspec: Define data mappings with mapspec for element-wise operations, reductions, and parallelization.

  • Execution and Parallelism: Control pipeline execution: sequential and parallel, with mixed executors and storage. Includes post-execution hooks.

  • Parameter Scopes: Organize pipelines and avoid naming conflicts using parameter scopes.

  • Error Handling: Capture detailed error information with ErrorSnapshot for debugging.

  • Type Checking: How pipefunc uses type hints for static and runtime type checking to ensure data integrity.

  • Variants: Use VariantPipeline to manage multiple function implementations within a single pipeline.

  • Caching: Cache function results with memoize and Pipeline cache options. Understand cache types and shared memory caching.

  • Automatic CLI: Automatically generate a CLI for your pipeline, complete with documentation.

  • SLURM Integration: Submit pipeline.map calls to SLURM clusters with pipefunc and adaptive-scheduler.

  • Resource Management: Specify and dynamically allocate resources (CPU, memory, GPU) for individual functions.

  • Function Chaining Helper: Connect functions linearly with the chain helper for simple data flow pipelines.

  • Simplifying Pipelines: Merge nodes with simplified_pipeline and NestedPipeFunc. Understand the trade-offs.

  • Adaptive Integration: Optimize parameter space exploration with adaptive library integration.

  • Testing: Best practices for testing, including mocking functions in pipelines.

  • Overhead and Efficiency: Measure the performance overhead of pipefunc.

  • Parameter Sweeps: Construct parameter sweeps and optimize execution with pipefunc.sweep.

  • MCP Server Integration: Expose pipelines as MCP servers for AI agents and assistants. Includes async execution, job management, and an AI agent example.

Contributing

We welcome contributions to the pipefunc documentation! If you find any issues or have suggestions for improving the concepts explained here, please open an issue or submit a pull request on our GitHub repository.