pipefunc.cache module#

Provides cache classes and helper functions for memoization and caching.

class pipefunc.cache.HybridCache(max_size=128, access_weight=0.5, duration_weight=0.5, *, allow_cloudpickle=True, shared=True)[source]#

Bases: _CacheBase

A hybrid cache implementation.

This uses a combination of Least Frequently Used (LFU) and Least Computationally Expensive (LCE) strategies for invalidating cache entries.

The cache invalidation strategy calculates a score for each entry based on its access frequency and computation duration. The entry with the lowest score will be invalidated when the cache reaches its maximum size.
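The exact scoring formula is not given in this section. The following is a minimal sketch under the assumption that an entry's score is a weighted sum of its normalized access count and its normalized computation duration. The name `pick_entry_to_evict` and the normalization are illustrative assumptions, not necessarily what pipefunc does internally.

```python
def pick_entry_to_evict(access_counts, durations, access_weight=0.5, duration_weight=0.5):
    """Return the key with the lowest hybrid score (assumed formula)."""
    # Normalize so that counts and durations are on a comparable scale.
    total_access = sum(access_counts.values()) or 1
    total_duration = sum(durations.values()) or 1.0
    scores = {
        key: access_weight * access_counts[key] / total_access
        + duration_weight * durations[key] / total_duration
        for key in access_counts
    }
    # The entry that is both rarely used and cheap to recompute scores lowest.
    return min(scores, key=scores.get)


access_counts = {"a": 10, "b": 1, "c": 1}
durations = {"a": 0.1, "b": 5.0, "c": 0.2}
victim = pick_entry_to_evict(access_counts, durations)
```

In this example "a" is protected by its access count and "b" by its long computation time, so "c" is the entry selected for invalidation.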

max_size#

The maximum number of entries the cache can store.

access_weight#

The weight given to the access frequency in the score calculation.

duration_weight#

The weight given to the computation duration in the score calculation.

allow_cloudpickle#

Use cloudpickle for storing the data in memory if using shared memory.

shared#

Whether the cache should be shared between multiple processes.

property cache: dict[Hashable, Any]#

Return the cache entries.

property access_counts: dict[Hashable, int]#

Return the access counts of the cache entries.

property computation_durations: dict[Hashable, float]#

Return the computation durations of the cache entries.

get(key)[source]#

Retrieve a value from the cache by its key.

If the key is present in the cache, its access count is incremented.

Parameters:

key (Hashable) – The key associated with the value in the cache.

Return type:

Any | None

Returns:

The value associated with the key if the key is present in the cache, otherwise None.

put(key, value, duration)[source]#

Add a value to the cache with its associated key and computation duration.

If the cache is full, the entry with the lowest score based on the access frequency and computation duration will be invalidated.

Parameters:
  • key (Hashable) – The key associated with the value.

  • value (Any) – The value to store in the cache.

  • duration (float) – The duration of the computation that generated the value.

Return type:

None
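A typical get-or-compute pattern around `get` and `put` might look like the sketch below. `TinyCache` is a hypothetical stand-in that only mirrors the documented `get(key)` and `put(key, value, duration)` signatures, and `cached_call` is an illustrative helper; neither is part of pipefunc.

```python
import time


class TinyCache:
    """Hypothetical stand-in with the documented get/put signatures."""

    def __init__(self):
        self._data = {}
        self.durations = {}

    def get(self, key):
        # Mirrors the documented behavior: None signals a miss.
        return self._data.get(key)

    def put(self, key, value, duration):
        self._data[key] = value
        self.durations[key] = duration


def cached_call(cache, func, *args):
    """Return a cached result, or compute it, time it, and store it."""
    key = (func.__name__, args)
    value = cache.get(key)
    if value is not None:
        return value
    start = time.perf_counter()
    value = func(*args)
    cache.put(key, value, time.perf_counter() - start)
    return value


calls = []


def slow_square(x):
    calls.append(x)
    return x * x


cache = TinyCache()
first = cached_call(cache, slow_square, 4)
second = cached_call(cache, slow_square, 4)  # served from the cache
```

Because a miss is reported as None, this pattern cannot tell a cached None apart from a missing key; a function that legitimately returns None would be recomputed on every call.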

clear()[source]#

Clear the cache.

Return type:

None

class pipefunc.cache.LRUCache(*, max_size=128, allow_cloudpickle=True, shared=True)[source]#

Bases: _CacheBase

A shared memory LRU cache implementation.

Parameters:
  • max_size (int) – Cache size of the LRU cache, by default 128.

  • allow_cloudpickle (bool) – Use cloudpickle for storing the data in memory if using shared memory.

  • shared (bool) – Whether the cache should be shared between multiple processes.
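For readers unfamiliar with LRU eviction, here is a single-process sketch built on `collections.OrderedDict`. `TinyLRU` is a hypothetical class for illustration; it does not model shared memory or cloudpickle, and returning None on a miss is an assumption of the sketch.

```python
from collections import OrderedDict


class TinyLRU:
    """Illustrative least-recently-used cache (not pipefunc's implementation)."""

    def __init__(self, max_size=128):
        self.max_size = max_size
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None  # assumption: a miss yields None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.max_size:
            self._data.popitem(last=False)  # drop the least recently used entry


lru = TinyLRU(max_size=2)
lru.put("a", 1)
lru.put("b", 2)
lru.get("a")     # "a" becomes the most recently used entry
lru.put("c", 3)  # exceeds max_size, so "b" is evicted
remaining = list(lru._data)
```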

get(key)[source]#

Get a value from the cache by key.

Return type:

Any

put(key, value)[source]#

Insert a key value pair into the cache.

Return type:

None

property cache: dict#

Returns a copy of the cache.

clear()[source]#

Clear the cache.

Return type:

None

class pipefunc.cache.SimpleCache[source]#

Bases: _CacheBase

A simple cache without any eviction strategy.

get(key)[source]#

Get a value from the cache by key.

Return type:

Any

put(key, value)[source]#

Insert a key value pair into the cache.

Return type:

None

property cache: dict#

Returns a copy of the cache.

clear()[source]#

Clear the cache.

Return type:

None

class pipefunc.cache.DiskCache(cache_dir, max_size=None, *, use_cloudpickle=True, with_lru_cache=True, lru_cache_size=128, lru_shared=True, permissions=None)[source]#

Bases: _CacheBase

Disk cache implementation using pickle or cloudpickle for serialization.

Parameters:
  • cache_dir (str | Path) – The directory where the cache files are stored.

  • max_size (int | None) – The maximum number of cache files to store. If None, no limit is set.

  • use_cloudpickle (bool) – Use cloudpickle for storing the data in memory.

  • with_lru_cache (bool) – Use an in-memory LRU cache to prevent reading from disk too often.

  • lru_cache_size (int) – The maximum size of the in-memory LRU cache. Only used if with_lru_cache is True.

  • lru_shared (bool) – Whether the in-memory LRU cache should be shared between multiple processes.

  • permissions (int | None) –

    The file permissions to set for the cache files. If None, the default permissions are used. Some examples:

    • 0o660 (read/write for owner and group, no access for others)

    • 0o644 (read/write for owner, read-only for group and others)

    • 0o777 (read/write/execute for everyone - generally not recommended)

    • 0o600 (read/write for owner, no access for group and others)

    • None (use the system’s default umask)
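To illustrate what an octal `permissions` value means for a file on disk, the sketch below pickles a value to a temporary file and applies a mode with `os.chmod`. The helper `write_cache_file` is hypothetical and only demonstrates the idea on a POSIX system; it is not how DiskCache is necessarily implemented.

```python
import os
import pickle
import stat
import tempfile
from pathlib import Path


def write_cache_file(path, value, permissions=None):
    """Pickle value to path and optionally restrict the file mode."""
    with open(path, "wb") as f:
        pickle.dump(value, f)
    if permissions is not None:
        os.chmod(path, permissions)  # e.g. 0o600: owner read/write only


with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / "entry.pkl"
    write_cache_file(cache_file, {"answer": 42}, permissions=0o600)
    # S_IMODE strips the file-type bits and leaves only the permission bits.
    mode = stat.S_IMODE(cache_file.stat().st_mode)
    with open(cache_file, "rb") as f:
        loaded = pickle.load(f)
```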

get(key)[source]#

Get a value from the cache by key.

Return type:

Any

put(key, value)[source]#

Insert a key value pair into the cache.

Return type:

None

clear()[source]#

Clear the cache by deleting all cache files.

Return type:

None

property cache: dict#

Returns a copy of the cache, but only if with_lru_cache is True.

property shared: bool#

Return whether the cache is shared.

pipefunc.cache.memoize(cache=None, key_func=None, *, fallback_to_pickle=True, unhashable_action='error')[source]#

A flexible memoization decorator that works with different cache types.

Parameters:
  • cache (HybridCache | LRUCache | SimpleCache | DiskCache | None) – An instance of a cache class (_CacheBase). If None, a SimpleCache is used.

  • key_func (Callable[..., Hashable] | None) – A function to generate cache keys. If None, the default key generation is used, which attempts to make all arguments hashable.

  • fallback_to_pickle (bool) – If True, unhashable objects will be pickled to bytes using cloudpickle as a last resort. If False, an exception will be raised for unhashable objects. Only used if key_func is None.

  • unhashable_action (Literal['error', 'warning', 'ignore']) –

    Determines the behavior when encountering unhashable objects. Only used if key_func is None.

    • "error": Raise an UnhashableError (default).

    • "warning": Log a warning and skip caching for that call.

    • "ignore": Silently skip caching for that call.

Returns:

Decorated function with memoization.

Raises:

UnhashableError – If the object cannot be made hashable and fallback_to_pickle is False.

Notes

This function creates a hashable representation of both positional and keyword arguments, allowing for effective caching of function calls with various argument types.
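The general mechanism can be sketched with the standard library alone. `simple_memoize` below is a hypothetical, simplified decorator that stores results in a plain dict and builds keys from positional and keyword arguments; it assumes all arguments are already hashable and does not reproduce pipefunc's key generation or cache classes.

```python
import functools


def simple_memoize(key_func=None):
    """Illustrative memoization decorator with an optional key function."""

    def decorator(func):
        cache = {}

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if key_func is not None:
                key = key_func(*args, **kwargs)
            else:
                # Sort keyword arguments so their order does not change the key.
                key = (args, tuple(sorted(kwargs.items())))
            if key not in cache:
                cache[key] = func(*args, **kwargs)
            return cache[key]

        wrapper.cache = cache  # exposed for inspection in this sketch
        return wrapper

    return decorator


call_log = []


@simple_memoize()
def add(a, b=0):
    call_log.append((a, b))
    return a + b


results = [add(1, b=2), add(1, b=2), add(2, b=2)]
```

The second call with the same arguments is answered from the dict, so `add` runs only twice for the three calls.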

pipefunc.cache.try_to_hashable(obj, fallback_to_pickle=True, unhashable_action='error', where='function')[source]#

Try to convert an object to a hashable representation.

Wrapper around to_hashable that allows for different actions when encountering unhashable objects.

Parameters:
  • obj (Any) – The object to convert.

  • fallback_to_pickle (bool) – If True, unhashable objects will be pickled to bytes using cloudpickle as a last resort. If False, an exception will be raised for unhashable objects.

  • unhashable_action (Literal['error', 'warning', 'ignore']) –

    Determines the behavior when encountering unhashable objects:

    • "error": Raise an UnhashableError (default).

    • "warning": Log a warning and skip caching for that call.

    • "ignore": Silently skip caching for that call. Returns UnhashableError.

  • where (str) – The location where the unhashable object was encountered. Used for warning or error messages.

Return type:

Hashable | type[UnhashableError]

Returns:

A hashable representation of the input object.

Raises:

UnhashableError – If the object cannot be made hashable and fallback_to_pickle is False.

Notes

This function attempts to create a hashable representation of any input object. It handles most built-in Python types and some common third-party types like numpy arrays and pandas Series/DataFrames.
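The three `unhashable_action` modes can be pictured with the simplified sketch below. It only checks `hash(obj)` instead of converting the object, uses a hypothetical `SketchUnhashableError` class in place of UnhashableError, and assumes the warning is emitted through the `warnings` module. Returning the exception class itself as a sentinel follows the documented return type `Hashable | type[UnhashableError]`.

```python
import warnings


class SketchUnhashableError(TypeError):
    """Hypothetical stand-in for UnhashableError."""


def try_hash_key(obj, unhashable_action="error", where="function"):
    """Return obj if hashable, otherwise act according to unhashable_action."""
    try:
        hash(obj)
    except TypeError:
        if unhashable_action == "error":
            raise SketchUnhashableError(f"unhashable object in {where}: {obj!r}") from None
        if unhashable_action == "warning":
            warnings.warn(f"unhashable object in {where}; skipping cache", stacklevel=2)
        # Sentinel: the class itself (not an instance) tells the caller to skip caching.
        return SketchUnhashableError
    return obj


ok = try_hash_key((1, 2))
skipped = try_hash_key([1, 2], unhashable_action="ignore")

try:
    try_hash_key([1, 2], unhashable_action="error")
    raised = False
except SketchUnhashableError:
    raised = True
```

A caller would compare the result against the sentinel class with `is` and fall back to calling the function uncached.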

pipefunc.cache.to_hashable(obj, fallback_to_pickle=True)[source]#

Convert any object to a hashable representation if not hashable yet.

Parameters:
  • obj (Any) – The object to convert.

  • fallback_to_pickle (bool) – If True, unhashable objects will be pickled to bytes using cloudpickle as a last resort. If False, an exception will be raised for unhashable objects.

Returns:

A hashable representation of the input object.

Raises:

UnhashableError – If the object cannot be made hashable and fallback_to_pickle is False.

Notes

This function attempts to create a hashable representation of any input object. It handles most built-in Python types and some common third-party types like numpy arrays and pandas Series/DataFrames.
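As a rough illustration of the idea, the sketch below recursively converts built-in containers into hashable tuples and frozensets. `sketch_to_hashable` is a hypothetical function; it uses the standard-library `pickle` rather than cloudpickle for the fallback, raises a plain TypeError instead of UnhashableError, and does not handle numpy or pandas objects.

```python
import pickle


def sketch_to_hashable(obj, fallback_to_pickle=True):
    """Return obj if hashable, else a hashable stand-in (illustrative only)."""
    try:
        hash(obj)
        return obj
    except TypeError:
        pass
    if isinstance(obj, (list, tuple)):
        # Tag with the type name so [1] and (1,) produce different keys.
        items = tuple(sketch_to_hashable(x, fallback_to_pickle) for x in obj)
        return (type(obj).__name__, items)
    if isinstance(obj, set):
        return ("set", frozenset(obj))  # set elements are already hashable
    if isinstance(obj, dict):
        # A frozenset of pairs makes the key independent of insertion order.
        pairs = frozenset(
            (k, sketch_to_hashable(v, fallback_to_pickle)) for k, v in obj.items()
        )
        return ("dict", pairs)
    if fallback_to_pickle:
        try:
            return ("pickled", pickle.dumps(obj))
        except Exception as exc:
            raise TypeError(f"cannot pickle {type(obj).__name__}") from exc
    raise TypeError(f"cannot make {type(obj).__name__} hashable")


key = sketch_to_hashable({"xs": [1, 2, {3}], "opts": {"deep": [True]}})
same = sketch_to_hashable({"opts": {"deep": [True]}, "xs": [1, 2, {3}]})
blob = sketch_to_hashable(bytearray(b"ab"))  # falls back to pickle
```

Two dicts with the same content but different insertion order map to equal keys, which is the property a memoization key needs.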

exception pipefunc.cache.UnhashableError(obj)[source]#

Bases: TypeError

Exception raised for objects that cannot be made hashable.