Weather Simulation and Analysis Pipeline


In this example, we’ll simulate daily temperatures for several cities, compute the mean and variance of each city’s temperatures, and then use xarray to inspect and plot the results.

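import zlib

import numpy as np
import pandas as pd
from IPython.display import display  # Built in inside Jupyter; imported for plain scripts

from pipefunc import Pipeline, pipefunc


# Step 1: Simulate Temperature Data
@pipefunc(output_name="temperature", mapspec="city[i], day[j] -> temperature[i, j]")
def simulate_temperature(city: str, day: pd.Timestamp) -> float:
    # zlib.crc32 returns the same value in every Python process, unlike hash(),
    # which is salted per process for strings
    city_code = zlib.crc32(city.encode("utf-8"))
    # Seed on both city and day so each (city, day) pair gets its own noise draw
    rng = np.random.default_rng([city_code, day.toordinal()])
    mean_temp = 20 + (city_code % 10)  # Base temp varies by city
    temp_variation = 5 * np.sin(day.dayofyear * (2 * np.pi / 365))  # Seasonal variation
    noise = rng.normal(0, 2)  # Random daily fluctuation
    return float(mean_temp + temp_variation + noise)  # Return a plain Python float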


# Step 2: Compute Statistics
@pipefunc(
    output_name=("mean_temp", "variance"),
    mapspec="temperature[i, :] -> mean_temp[i], variance[i]",
    output_picker=dict.__getitem__,
)
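def compute_statistics(temperature):
    # temperature is one city's row of daily values (all j for a fixed i)
    temp_array = np.array(temperature, dtype=float)  # Ensure a numeric float array
    mean_temp = float(np.mean(temp_array))
    var_temp = float(np.var(temp_array))  # Population variance (ddof=0)
    return {"mean_temp": mean_temp, "variance": var_temp}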


# Create the pipeline
pipeline_weather = Pipeline([simulate_temperature, compute_statistics])

# Define cities and days
cities = ["New York", "Los Angeles", "Chicago"]
days = pd.date_range("2023-01-01", "2023-01-10")  # 10 days

# Run the pipeline
results = pipeline_weather.map({"city": cities, "day": days}, run_folder="weather_simulation_results")

# Load and display the xarray dataset
weather_dataset = results.to_xarray()
display(weather_dataset)

# Plot temperatures for each city
weather_dataset.temperature.plot.line(
    x="day",
    hue="city",
    marker="o",
    figsize=(12, 6),
)
<xarray.Dataset> Size: 392B
Dimensions:      (i: 3, j: 10)
Coordinates:
    city         (i) object 24B 'New York' 'Los Angeles' 'Chicago'
    day          (j) datetime64[ns] 80B 2023-01-01 2023-01-02 ... 2023-01-10
Dimensions without coordinates: i, j
Data variables:
    temperature  (i, j) float64 240B 24.94 25.02 25.11 ... 22.18 22.27 22.35
    mean_temp    (i) object 24B 25.32 24.95 21.97
    variance     (i) object 24B 0.06048 0.06048 0.06048
[Figure: line plot of temperature versus day, one line each for New York, Los Angeles and Chicago]
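The noise term needs a seed that is stable across runs and distinct for each city and day. Python’s built-in hash() of a str is salted per interpreter process (controlled by the PYTHONHASHSEED environment variable), so seeds derived from it change from run to run, and seeding from the city alone makes every day of a given city reuse the same draw. Here is a minimal standard-library sketch of one way to avoid both problems; daily_noise is a hypothetical helper, not part of pipefunc.

```python
import random
import zlib
from datetime import date


def daily_noise(city: str, day: date, sigma: float = 2.0) -> float:
    """Hypothetical helper: one reproducible N(0, sigma) draw per (city, day) pair."""
    # crc32 of the UTF-8 bytes is identical in every Python process,
    # unlike hash(city), which is salted per process for str
    city_code = zlib.crc32(city.encode("utf-8"))
    # Pack city and day into one integer seed; date ordinals never exceed
    # date.max.toordinal() == 3_652_059, so distinct pairs give distinct seeds
    seed = city_code * 10_000_000 + day.toordinal()
    return random.Random(seed).gauss(0.0, sigma)


jan1 = date(2023, 1, 1)
print(daily_noise("Chicago", jan1) == daily_noise("Chicago", jan1))  # True
print([round(daily_noise("Chicago", date(2023, 1, d)), 3) for d in range(1, 4)])
```

The same idea carries over to NumPy: np.random.default_rng also accepts a list of non-negative integers, such as a city code and a day ordinal, as its seed.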

Explanation:

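  • Temperature Simulation (simulate_temperature): Computes one synthetic temperature for each (city, day) pair as a city-specific base temperature plus a seasonal sinusoid and random noise. The mapspec city[i], day[j] -> temperature[i, j] tells pipefunc to evaluate the function for every combination of city and day, producing a 2D temperature array with cities along i and days along j.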

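  • Statistics Calculation (compute_statistics): The mapspec temperature[i, :] -> mean_temp[i], variance[i] passes each city’s full row of daily temperatures to the function, which returns that city’s mean and variance. Because the function returns a dict, output_picker=dict.__getitem__ tells pipefunc how to pick out each named output.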

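  • N-Dimensional Outputs: pipeline.map() evaluates each mapped function over its indices and returns every output as an array shaped by its mapspec: temperature has shape (3, 10), while mean_temp and variance have shape (3,).

  • Convert to xarray.Dataset: results.to_xarray() gathers these arrays into a single xarray.Dataset, with city and day attached as coordinates along the i and j dimensions, so the results can be selected and plotted without building the dataset by hand.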
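To make the two mapspecs concrete, their semantics can be mimicked in plain Python: the first evaluates a function on every (city, day) pair to fill a grid indexed [i][j], and the second reduces each row i over all of its j entries. This is a sketch under stated assumptions, not pipefunc’s implementation: it uses only the standard library, omits the noise term, and uses hypothetical base temperatures of 20, 21 and 22 for the three cities.

```python
import math
import statistics
from datetime import date, timedelta

cities = ["New York", "Los Angeles", "Chicago"]
days = [date(2023, 1, 1) + timedelta(days=k) for k in range(10)]  # Jan 1 to Jan 10
# Hypothetical per-city base temperatures (the real example derives them from the city name)
base_temp = {"New York": 20.0, "Los Angeles": 21.0, "Chicago": 22.0}


def seasonal(day: date) -> float:
    """Seasonal term from simulate_temperature (the noise term is omitted here)."""
    day_of_year = day.timetuple().tm_yday  # 1-based, like pandas' dayofyear
    return 5 * math.sin(day_of_year * (2 * math.pi / 365))


# city[i], day[j] -> temperature[i, j]:
# one value per (city, day) pair, stored as a len(cities) x len(days) grid
temperature = [[base_temp[city] + seasonal(day) for day in days] for city in cities]

# temperature[i, :] -> mean_temp[i], variance[i]:
# each row i (one city, all days) is reduced to one number per output
mean_temp = [statistics.fmean(row) for row in temperature]
variance = [statistics.pvariance(row) for row in temperature]  # ddof=0, like np.var

for city, mean, var in zip(cities, mean_temp, variance):
    print(f"{city}: mean={mean:.3f}, variance={var:.5f}")
```

With the noise removed, every city’s variance is just the variance of the seasonal term over January 1 to 10 (about 0.06), and the means differ only by the base offsets.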

This example shows how pipefunc handles multi-dimensional computations: the mapspecs describe how inputs are combined and reduced, and the results come back as a labeled xarray.Dataset that is ready for analysis and plotting.