Weather Simulation and Analysis Pipeline
Have uv? ⚡
If you have uv installed, you can instantly open this page as a Jupyter notebook using opennb:
uvx --with "pipefunc[docs]" opennb pipefunc/pipefunc/docs/source/examples/weather-simulation.md
This command creates an ephemeral environment with all dependencies and launches the notebook in your browser within seconds, with no manual setup needed! ✨
Alternatively, run:
uv run https://raw.githubusercontent.com/pipefunc/pipefunc/refs/heads/main/get-notebooks.py
to download all documentation as Jupyter notebooks.
In this example, we’ll generate temperature data for multiple cities over several days, compute statistics like mean and variance, and then use xarray to load and visualize the results.
import numpy as np
import pandas as pd

from pipefunc import Pipeline, pipefunc


# Step 1: Simulate Temperature Data
@pipefunc(output_name="temperature", mapspec="city[i], day[j] -> temperature[i, j]")
def simulate_temperature(city: str, day) -> float:
    # Seed from the city name. Note: str hashes are salted per process unless
    # PYTHONHASHSEED is set, so values repeat within a run but not across runs.
    np.random.seed(hash(city) % 2**32)
    mean_temp = 20 + (hash(city) % 10)  # Base temp varies by city
    temp_variation = 5 * np.sin(day.dayofyear * (2 * np.pi / 365))  # Seasonal variation
    # Because the seed is reset on every call, this draw is the same for
    # every day of a given city (a per-city offset, not a daily fluctuation).
    noise = np.random.normal(0, 2)
    return float(mean_temp + temp_variation + noise)  # Ensure this is a float


# Step 2: Compute Statistics
@pipefunc(
    output_name=("mean_temp", "variance"),
    mapspec="temperature[i, :] -> mean_temp[i], variance[i]",
    output_picker=dict.__getitem__,  # Pick each output from the returned dict by name
)
def compute_statistics(temperature):
    temp_array = np.array(temperature, dtype=float)  # Ensure it's a numeric array
    mean_temp = np.mean(temp_array)
    var_temp = np.var(temp_array)  # Population variance (ddof=0)
    return {"mean_temp": mean_temp, "variance": var_temp}


# Create the pipeline
pipeline_weather = Pipeline([simulate_temperature, compute_statistics])

# Define cities and days
cities = ["New York", "Los Angeles", "Chicago"]
days = pd.date_range("2023-01-01", "2023-01-10")  # 10 days

# Run the pipeline
results = pipeline_weather.map({"city": cities, "day": days}, run_folder="weather_simulation_results")

# Load and display the xarray dataset
weather_dataset = results.to_xarray()
display(weather_dataset)

# Plot temperatures for each city
weather_dataset.temperature.plot.line(
    x="day",
    hue="city",
    marker="o",
    figsize=(12, 6),
)
<xarray.Dataset> Size: 392B
Dimensions:      (i: 3, j: 10)
Coordinates:
    city         (i) object 24B 'New York' 'Los Angeles' 'Chicago'
    day          (j) datetime64[ns] 80B 2023-01-01 2023-01-02 ... 2023-01-10
Dimensions without coordinates: i, j
Data variables:
    temperature  (i, j) float64 240B 24.94 25.02 25.11 ... 22.18 22.27 22.35
    mean_temp    (i) object 24B 25.32 24.95 21.97
    variance     (i) object 24B 0.06048 0.06048 0.06048

[<matplotlib.lines.Line2D at 0x7408a96763c0>,
 <matplotlib.lines.Line2D at 0x7408a96b06e0>,
 <matplotlib.lines.Line2D at 0x7408a96b0830>]
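The variance column in the output above is identical for all three cities. A likely reason, reading the simulation code, is that the seed is reset to the same per-city value on every call, so the base temperature and the noise draw are both constant within a city and only the seasonal sine term varies from day to day. The sketch below checks this with the standard library alone; the `offset` value is a hypothetical stand-in for a city's base temperature plus noise, and `pvariance` is used because it computes the population variance, as `np.var` does by default.

```python
import math
from statistics import pvariance

# Day-of-year values for 2023-01-01 through 2023-01-10.
days_of_year = range(1, 11)

# Seasonal term from simulate_temperature, without base temperature or noise.
seasonal = [5 * math.sin(d * 2 * math.pi / 365) for d in days_of_year]
seasonal_variance = pvariance(seasonal)

# Hypothetical per-city constant (base temperature + fixed noise draw).
offset = 24.85
shifted_variance = pvariance([offset + s for s in seasonal])

# A constant offset leaves the variance unchanged.
print(round(seasonal_variance, 5), round(shifted_variance, 5))
```

Both printed values should agree with each other and with the variance column above, which suggests the noise contributes nothing to the within-city spread in this example.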
Explanation:
- Temperature Simulation (simulate_temperature): Each city's synthetic daily temperature is computed from a sinusoidal seasonal pattern plus noise. The mapspec city[i], day[j] -> temperature[i, j] runs the function once for every city-day combination automatically.
- Statistics Calculation (compute_statistics): Computes the mean and variance of each city's daily temperatures. The mapspec temperature[i, :] -> mean_temp[i], variance[i] passes the full day axis to each call and maps over cities.
- Automatic xarray.Dataset: The pipeline.map() call stores the outputs in an N-dimensional layout that is naturally represented as an xarray.Dataset.
- Convert to xarray.Dataset: results.to_xarray() gives quick access to the results organized by city and day indices, without constructing the dataset manually.
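To make the two mapspec strings concrete, here is a minimal plain-Python sketch of the iteration pattern they describe. It is not how pipefunc is implemented; the `base` dictionary, the toy `simulate` function, and the city names are hypothetical stand-ins chosen to keep the numbers easy to check.

```python
from statistics import fmean, pvariance

# Hypothetical inputs: cities are indexed by i, days by j.
base = {"Oslo": 5.0, "Cairo": 25.0}
cities = list(base)
days = [1, 2, 3]


def simulate(city: str, day: int) -> float:
    # Toy stand-in for simulate_temperature: deterministic, no noise.
    return base[city] + day


# "city[i], day[j] -> temperature[i, j]": one call per (i, j) pair,
# collected into a 2D structure with shape (len(cities), len(days)).
temperature = [[simulate(c, d) for d in days] for c in cities]

# "temperature[i, :] -> mean_temp[i], variance[i]": one call per row i,
# each call receiving the whole j-axis slice for that city.
mean_temp = [fmean(row) for row in temperature]
variance = [pvariance(row) for row in temperature]

print(temperature)
print(mean_temp, variance)
```

The first mapspec corresponds to an outer product over both inputs, and the second to a reduction along the day axis that keeps the city axis.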
This example shows how pipefunc manages multi-dimensional computations and structures their results, giving a compact workflow for simulating and analyzing temperature variations.
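One caveat about the simulation: it derives its seed from hash(city), and Python salts string hashes per process unless PYTHONHASHSEED is set, so the numbers can differ between runs. If run-to-run reproducibility matters, one option is a deterministic checksum such as CRC32. The helper below is a hypothetical sketch (the name `stable_seed` is not part of pipefunc or the original example); using it would change the simulated values shown above.

```python
import zlib


def stable_seed(city: str) -> int:
    """Map a city name to the same 32-bit seed in every Python process."""
    # zlib.crc32 depends only on the bytes, not on per-process hash salting.
    return zlib.crc32(city.encode("utf-8"))


seeds = {city: stable_seed(city) for city in ["New York", "Los Angeles", "Chicago"]}
print(seeds)
```

The result already fits in the 0 to 2**32 - 1 range that np.random.seed accepts, so no modulo is needed.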