Hierarchical evaluation

This section documents evaluation utilities for hierarchical data structures.

Hierarchical evaluation supports rollups and drill-downs across multi-level structures such as region → market → entity.

`eb_evaluation.dataframe.hierarchy`

Hierarchy-level evaluation (DataFrame utilities).

This module provides a convenience helper for evaluating forecasts at multiple levels of a grouping hierarchy (e.g., overall, by store, by item, by store x item).

It returns a dictionary mapping each hierarchy level name to a DataFrame of metrics for that level. Metric definitions are delegated to `eb_metrics.metrics`; this module handles only the grouping orchestration and produces tabular output suitable for reporting.

The EB metric suite here includes CWSL and related service/readiness diagnostics (NSL, UD, HR@tau, FRS) as well as wMAPE.

`evaluate_hierarchy_df(df, levels, actual_col, forecast_col, cu, co, tau=None)`

Evaluate EB metrics at multiple hierarchy levels.

This helper evaluates forecast performance across several grouping levels, each defined by a list of column names. For each level, it computes:

CWSL
NSL
UD
wMAPE
HR@tau (optional)
FRS

where each metric is computed over the subset (group) implied by that level.

The levels mapping accepts an empty list to represent the overall aggregate, e.g. {"overall": []}.
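The grouping orchestration described above can be sketched without any dependencies. This is only an illustration, not the module's implementation: rows are plain dicts standing in for a DataFrame, `evaluate_levels` is a hypothetical helper, and `wmape` uses the conventional definition (sum of absolute errors over sum of absolute actuals), which is an assumption and may differ from the one in `eb_metrics.metrics`.

```python
from collections import defaultdict


def wmape(actuals, forecasts):
    """Stand-in metric (assumed definition, not taken from eb_metrics)."""
    denom = sum(abs(a) for a in actuals)
    if denom == 0:
        raise ValueError("total absolute demand is zero")
    return sum(abs(a - f) for a, f in zip(actuals, forecasts)) / denom


def evaluate_levels(rows, levels, actual_col, forecast_col):
    """Hypothetical helper: compute one metric per group at every level."""
    results = {}
    for level_name, keys in levels.items():
        groups = defaultdict(list)
        for row in rows:
            # An empty key list produces the single key (), i.e. one overall group.
            groups[tuple(row[k] for k in keys)].append(row)

        level_rows = []
        for key, members in groups.items():
            actuals = [r[actual_col] for r in members]
            forecasts = [r[forecast_col] for r in members]
            out = dict(zip(keys, key))  # grouping columns come first
            out["n_intervals"] = len(members)
            out["total_demand"] = sum(actuals)
            out["wmape"] = wmape(actuals, forecasts)
            level_rows.append(out)
        results[level_name] = level_rows
    return results


rows = [
    {"store_id": "A", "item_id": 1, "actual": 10.0, "forecast": 8.0},
    {"store_id": "A", "item_id": 2, "actual": 20.0, "forecast": 25.0},
    {"store_id": "B", "item_id": 1, "actual": 30.0, "forecast": 30.0},
]
levels = {"overall": [], "by_store": ["store_id"]}
report = evaluate_levels(rows, levels, "actual", "forecast")
```

Here `report["overall"]` holds a single row covering all three intervals, while `report["by_store"]` holds one row per `store_id`.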

Parameters:

Name	Type	Description	Default
`df`	`DataFrame`	Input DataFrame containing at minimum `actual_col` and `forecast_col` plus any grouping columns referenced by `levels`.	required
`levels`	`dict[str, Sequence[str]]`	Mapping from level name to the column names used to group at that level. Example: `levels = {"overall": [], "by_store": ["store_id"], "by_item": ["item_id"], "by_store_item": ["store_id", "item_id"]}`. An empty sequence means the entire DataFrame is evaluated as a single group.	required
`actual_col`	`str`	Column name for actual demand / realized values.	required
`forecast_col`	`str`	Column name for forecast values.	required
`cu`		Underbuild (shortfall) cost coefficient passed through to `eb_metrics.metrics.cwsl` and `eb_metrics.metrics.frs`.	required
`co`		Overbuild (excess) cost coefficient passed through to `eb_metrics.metrics.cwsl` and `eb_metrics.metrics.frs`.	required
`tau`	`float \| None`	Tolerance parameter for HR@tau. If `None`, HR@tau is omitted from outputs.	`None`

Returns:

Type	Description
`dict[str, DataFrame]`	Dictionary mapping level name to a DataFrame of metrics for that level.

Each DataFrame includes:

the level's grouping columns (if any), first
n_intervals : number of rows evaluated in that group
total_demand : sum of actual_col for that group
cwsl : cost-weighted service loss
nsl : no-shortage level
ud : underbuild deviation
wmape : weighted mean absolute percentage error (per eb_metrics definition)
hr_at_tau : hit rate within tolerance tau (only if tau is provided)
frs : forecast readiness score
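The optional `hr_at_tau` column can be illustrated with a small sketch. The helper names `hit_rate` and `metrics_row` are hypothetical, and the hit-rate definition used here (share of intervals whose absolute error is at most `tau`) is an assumption; the authoritative definition lives in `eb_metrics.metrics`.

```python
def hit_rate(actuals, forecasts, tau):
    """Assumed definition: fraction of intervals with |actual - forecast| <= tau.

    Expects non-empty inputs of equal length.
    """
    hits = sum(1 for a, f in zip(actuals, forecasts) if abs(a - f) <= tau)
    return hits / len(actuals)


def metrics_row(actuals, forecasts, tau=None):
    """Hypothetical builder for one output row of a level's table."""
    row = {
        "n_intervals": len(actuals),
        "total_demand": sum(actuals),
    }
    # The hit-rate column is added only when a tolerance is supplied.
    if tau is not None:
        row["hr_at_tau"] = hit_rate(actuals, forecasts, tau)
    return row


actuals = [10.0, 20.0, 30.0]
forecasts = [8.0, 25.0, 30.0]
with_tau = metrics_row(actuals, forecasts, tau=2.0)
without_tau = metrics_row(actuals, forecasts)
```

With `tau=None` the row has no `hr_at_tau` key at all, rather than a NaN placeholder, which matches the documented behavior of omitting the column.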

Raises:

Type	Description
`KeyError`	If required columns are missing from `df` (actual/forecast and any columns referenced in `levels`).
`ValueError`	If `df` is empty, or if `levels` is empty.
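The error contract above can be approximated by a pre-flight check like the following. `validate_inputs` is a hypothetical helper written for illustration; it takes column names and a row count instead of a DataFrame, and the order in which the real function performs these checks is not documented here, so the ordering is an assumption.

```python
def validate_inputs(columns, n_rows, levels, actual_col, forecast_col):
    """Hypothetical check mirroring the documented KeyError/ValueError cases."""
    if n_rows == 0:
        raise ValueError("df is empty")
    if not levels:
        raise ValueError("levels is empty")

    # Required columns: actual, forecast, and every grouping column in any level.
    required = {actual_col, forecast_col}
    for keys in levels.values():
        required.update(keys)

    missing = sorted(required - set(columns))
    if missing:
        raise KeyError(f"missing columns: {missing}")


# Passes silently: every referenced column is present.
validate_inputs(
    ["store_id", "actual", "forecast"],
    3,
    {"overall": [], "by_store": ["store_id"]},
    "actual",
    "forecast",
)
```

Running a check like this before calling the evaluator surfaces all missing columns at once instead of failing on the first level that references one.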

Notes

This function does not catch per-group metric exceptions. If `eb_metrics` raises a `ValueError` for a specific group (e.g., invalid inputs), that error propagates to the caller. If you want best-effort reporting (NaN on failure), wrap metric calls similarly to `evaluate_groups_df`.
`groupby(..., dropna=False)` is used so that missing values in grouping keys form explicit groups, which is often desirable in operational reporting.
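One way to get best-effort reporting is a thin wrapper that converts a `ValueError` into NaN. This is a sketch of the general pattern only: `safe_metric` is a hypothetical name, the `wmape` stand-in uses an assumed definition, and how `evaluate_groups_df` actually handles failures is not shown in this section.

```python
import math


def wmape(actuals, forecasts):
    """Stand-in metric (assumed definition) that rejects zero total demand."""
    denom = sum(abs(a) for a in actuals)
    if denom == 0:
        raise ValueError("total absolute demand is zero")
    return sum(abs(a - f) for a, f in zip(actuals, forecasts)) / denom


def safe_metric(fn, *args, **kwargs):
    """Return NaN instead of propagating a ValueError from a metric call."""
    try:
        return fn(*args, **kwargs)
    except ValueError:
        return math.nan


ok_value = safe_metric(wmape, [10.0], [8.0])      # valid group: metric is computed
failed_value = safe_metric(wmape, [0.0], [1.0])   # invalid group: NaN, no exception
```

Only `ValueError` is caught, so programming errors such as a `TypeError` still surface instead of being silently reported as NaN.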
