Hierarchical evaluation

This section documents evaluation utilities for hierarchical data structures.

Hierarchical evaluation supports rollups and drill-downs across multi-level structures such as region → market → entity.

eb_evaluation.dataframe.hierarchy

Hierarchy-level evaluation (DataFrame utilities).

This module provides a convenience helper for evaluating forecasts at multiple levels of a grouping hierarchy (e.g., overall, by store, by item, by store × item).

It returns a dictionary mapping each hierarchy level name to a DataFrame of metrics for that level. Metric definitions are delegated to eb_metrics.metrics; this module focuses on grouping orchestration and tabular output suitable for reporting.

The EB metric suite here includes CWSL and related service/readiness diagnostics (NSL, UD, HR@tau, FRS) as well as wMAPE.

evaluate_hierarchy_df(df, levels, actual_col, forecast_col, cu, co, tau=None)

Evaluate EB metrics at multiple hierarchy levels.

This helper evaluates forecast performance across several grouping levels, each defined by a list of column names. For each level, it computes:

  • CWSL
  • NSL
  • UD
  • wMAPE
  • HR@tau (optional)
  • FRS

Each metric is computed separately for every group that the level's columns define.

The levels mapping accepts an empty list to represent the overall aggregate, e.g. {"overall": []}.
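The grouping orchestration described above can be sketched with plain pandas. This is a minimal illustration, not the module's actual implementation: `summarize_levels` is a hypothetical helper, the sample data is invented, and only the two bookkeeping columns (`n_intervals`, `total_demand`) are computed, since the metric formulas live in eb_metrics.metrics.

```python
import pandas as pd


def summarize_levels(df, levels, actual_col):
    """Hypothetical helper: per-level row counts and demand totals."""
    results = {}
    for name, cols in levels.items():
        cols = list(cols)
        if not cols:
            # Empty column list: treat the whole frame as a single group.
            results[name] = pd.DataFrame(
                {"n_intervals": [len(df)], "total_demand": [df[actual_col].sum()]}
            )
        else:
            # dropna=False keeps rows whose grouping key is missing.
            results[name] = (
                df.groupby(cols, dropna=False)
                .agg(
                    n_intervals=(actual_col, "size"),
                    total_demand=(actual_col, "sum"),
                )
                .reset_index()
            )
    return results


# Invented sample data for illustration only.
df = pd.DataFrame(
    {
        "store_id": ["A", "A", "B", "B"],
        "item_id": ["x", "y", "x", "y"],
        "actual": [10.0, 20.0, 30.0, 40.0],
    }
)
levels = {
    "overall": [],
    "by_store": ["store_id"],
    "by_store_item": ["store_id", "item_id"],
}
summary = summarize_levels(df, levels, actual_col="actual")
```

The empty list is special-cased because pandas cannot group by zero columns. With the package installed, the corresponding call would follow the documented signature, for example `evaluate_hierarchy_df(df, levels, actual_col="actual", forecast_col="forecast", cu=2.0, co=1.0)`, where the cost values are purely illustrative.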

Parameters:

Name Type Description Default
df DataFrame

Input DataFrame containing at minimum actual_col and forecast_col plus any grouping columns referenced by levels.

required
levels dict[str, Sequence[str]]

Mapping from level name to the column names used to group at that level.

Example:

levels = {
    "overall": [],
    "by_store": ["store_id"],
    "by_item": ["item_id"],
    "by_store_item": ["store_id", "item_id"],
}

An empty sequence means evaluate the entire DataFrame as a single group.

required
actual_col str

Column name for actual demand / realized values.

required
forecast_col str

Column name for forecast values.

required
cu

Underbuild (shortfall) cost coefficient passed through to eb_metrics.metrics.cwsl and eb_metrics.metrics.frs.

required
co

Overbuild (excess) cost coefficient passed through to eb_metrics.metrics.cwsl and eb_metrics.metrics.frs.

required
tau float | None

Tolerance parameter for HR@tau. If None, HR@tau is omitted from outputs.

None
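The effect of the optional tau parameter can be sketched as follows. This is an assumption-laden illustration: `hit_rate_within_tau` and `metric_row` are hypothetical names, and the hit-rate formula used here (share of intervals whose absolute error is at most tau) is an assumed reading of "hit rate within tolerance"; the authoritative definition is the one in eb_metrics.metrics.

```python
def hit_rate_within_tau(actual, forecast, tau):
    # Assumed definition: share of intervals with |actual - forecast| <= tau.
    hits = sum(1 for a, f in zip(actual, forecast) if abs(a - f) <= tau)
    return hits / len(actual)


def metric_row(actual, forecast, tau=None):
    """Hypothetical helper: build one output row for a single group."""
    row = {"n_intervals": len(actual), "total_demand": sum(actual)}
    if tau is not None:
        # The column is added only when a tolerance is supplied.
        row["hr_at_tau"] = hit_rate_within_tau(actual, forecast, tau)
    return row


without_tau = metric_row([10, 20, 30], [12, 20, 25])
with_tau = metric_row([10, 20, 30], [12, 20, 25], tau=2)
```

In this sketch `without_tau` has no `hr_at_tau` key at all, rather than a NaN placeholder, which matches the documented behavior that the metric is omitted from the outputs when tau is None.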

Returns:

Type Description
dict[str, DataFrame]

Dictionary mapping level name to a DataFrame of metrics for that level.

Each DataFrame includes:

  • the level's grouping columns (if any), first
  • n_intervals : number of rows evaluated in that group
  • total_demand : sum of actual_col for that group
  • cwsl : cost-weighted service loss
  • nsl : no-shortage level
  • ud : underbuild deviation
  • wmape : weighted mean absolute percentage error (per eb_metrics definition)
  • hr_at_tau : hit rate within tolerance tau (only if tau is provided)
  • frs : forecast readiness score
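Because the result is a dictionary of DataFrames with different grouping columns, a common reporting step is to stack them into one long table. The sketch below shows one way to do that with `pd.concat`; the `results` dictionary is a hand-built stand-in for the function's return value, and its numbers are made up for illustration.

```python
import pandas as pd

# Illustrative stand-in for the returned dictionary; the numbers are made up.
results = {
    "overall": pd.DataFrame(
        {"n_intervals": [4], "total_demand": [100.0], "wmape": [0.10]}
    ),
    "by_store": pd.DataFrame(
        {
            "store_id": ["A", "B"],
            "n_intervals": [2, 2],
            "total_demand": [30.0, 70.0],
            "wmape": [0.20, 0.05],
        }
    ),
}

# Stack the per-level frames and keep the level name as a regular column.
report = (
    pd.concat(results, names=["level", "row"])
    .reset_index(level="level")
    .reset_index(drop=True)
)
```

Grouping columns that a level does not use (here `store_id` for the overall row) come out as missing values in the stacked table.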

Raises:

Type Description
KeyError

If df is missing actual_col, forecast_col, or any grouping column referenced in levels.

ValueError

If df is empty, or if levels is empty.
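The documented error conditions can be mirrored in a small pre-flight check, which is handy when you want to fail early in a pipeline. `check_inputs` is a hypothetical helper, and the order in which the conditions are tested is an assumption; the source does not say which error wins when several apply.

```python
import pandas as pd


def check_inputs(df, levels, actual_col, forecast_col):
    """Hypothetical validator mirroring the documented error conditions."""
    if not levels:
        raise ValueError("levels must contain at least one entry")
    if df.empty:
        raise ValueError("df is empty")
    # Collect every column the evaluation would need to touch.
    required = {actual_col, forecast_col}
    for cols in levels.values():
        required.update(cols)
    missing = sorted(required - set(df.columns))
    if missing:
        raise KeyError(f"missing columns: {missing}")


# Invented one-row frame for illustration.
df_ok = pd.DataFrame({"actual": [10.0], "forecast": [9.0]})
check_inputs(df_ok, {"overall": []}, "actual", "forecast")  # passes silently
```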

Notes
  • This function does not catch per-group metric exceptions. If eb_metrics raises a ValueError for a specific group (e.g., invalid inputs), that error will propagate. If you want best-effort reporting (NaN on failure), wrap metric calls similarly to evaluate_groups_df.
  • groupby(..., dropna=False) is used so that missing values in grouping keys form explicit groups, which is often desirable in operational reporting.
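The best-effort pattern mentioned in the first note can be sketched as a thin wrapper around each metric call. This is only an illustration of the idea: `safe_metric` and `ratio_metric` are hypothetical names, `ratio_metric` is a stand-in rather than an eb_metrics function, and the way evaluate_groups_df actually handles failures may differ.

```python
import math


def safe_metric(metric_fn, *args, **kwargs):
    """Hypothetical wrapper: return NaN instead of raising ValueError."""
    try:
        return float(metric_fn(*args, **kwargs))
    except ValueError:
        return math.nan


def ratio_metric(actual, forecast):
    # Stand-in metric (not an eb_metrics function) that rejects zero demand.
    total = sum(actual)
    if total == 0:
        raise ValueError("total demand is zero")
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / total


ok = safe_metric(ratio_metric, [10, 20], [12, 18])
failed = safe_metric(ratio_metric, [0, 0], [1, 2])
```

Catching only ValueError keeps unexpected failures (for example a TypeError from malformed inputs) visible instead of silently turning them into NaN.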