Hierarchical evaluation
This section documents evaluation utilities for hierarchical data structures.
Hierarchical evaluation supports rollups and drill-downs across multi-level structures such as region → market → entity.
eb_evaluation.dataframe.hierarchy
Hierarchy-level evaluation (DataFrame utilities).
This module provides a convenience helper for evaluating forecasts at multiple levels of a grouping hierarchy (e.g., overall, by store, by item, by store x item).
It returns a dictionary mapping each hierarchy level name to a DataFrame of metrics for that
level. Metric definitions are delegated to eb_metrics.metrics; this module focuses on
grouping orchestration and tabular output suitable for reporting.
The EB metric suite here includes CWSL and related service/readiness diagnostics (NSL, UD, HR@tau, FRS) as well as wMAPE.
evaluate_hierarchy_df(df, levels, actual_col, forecast_col, cu, co, tau=None)
Evaluate EB metrics at multiple hierarchy levels.
This helper evaluates forecast performance across several grouping levels, each defined by a list of column names. For each level, it computes:
- CWSL
- NSL
- UD
- wMAPE
- HR@tau (optional)
- FRS
where each metric is computed over the subset (group) implied by that level.
The levels mapping accepts an empty list to represent the overall aggregate, e.g.
{"overall": []}.
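The per-level grouping described above can be sketched as follows. This is a minimal illustration, not the library's implementation: `evaluate_levels_sketch` and `wmape` are hypothetical helpers written for this example, only wMAPE is computed (using the common definition sum(|actual - forecast|) / sum(|actual|), which may differ in detail from eb_metrics), and the column names are assumptions.

```python
import pandas as pd


def wmape(actual: pd.Series, forecast: pd.Series) -> float:
    """Stand-in metric: sum(|actual - forecast|) / sum(|actual|)."""
    denom = actual.abs().sum()
    if denom == 0:
        return float("nan")
    return float((actual - forecast).abs().sum() / denom)


def evaluate_levels_sketch(df, levels, actual_col, forecast_col):
    """Hypothetical sketch: one metrics DataFrame per hierarchy level."""
    results = {}
    for name, cols in levels.items():
        cols = list(cols)
        if not cols:
            # An empty level means the whole frame is a single group.
            value = wmape(df[actual_col], df[forecast_col])
            results[name] = pd.DataFrame([{"wMAPE": value}])
            continue
        rows = []
        for key, group in df.groupby(cols, dropna=False):
            # Normalize the group key to a tuple across pandas versions.
            key = key if isinstance(key, tuple) else (key,)
            row = dict(zip(cols, key))
            row["wMAPE"] = wmape(group[actual_col], group[forecast_col])
            rows.append(row)
        results[name] = pd.DataFrame(rows)
    return results


# Illustrative data; the column names are assumptions for this example.
df = pd.DataFrame(
    {
        "store": ["A", "A", "B", "B"],
        "item": ["x", "y", "x", "y"],
        "actual": [10.0, 20.0, 30.0, 40.0],
        "forecast": [12.0, 18.0, 30.0, 50.0],
    }
)
levels = {
    "overall": [],
    "by_store": ["store"],
    "by_store_item": ["store", "item"],
}
out = evaluate_levels_sketch(df, levels, "actual", "forecast")
```

Here `out` has one entry per level name: a single-row frame for "overall", two rows for "by_store", and four rows for "by_store_item".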
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | Input DataFrame containing at minimum | required |
| levels | dict[str, Sequence[str]] | Mapping from level name to the column names used to group at that level. Example: An empty sequence means evaluate the entire DataFrame as a single group. | required |
| actual_col | str | Column name for actual demand / realized values. | required |
| forecast_col | str | Column name for forecast values. | required |
| cu | | Underbuild (shortfall) cost coefficient passed through to | required |
| co | | Overbuild (excess) cost coefficient passed through to | required |
| tau | float \| None | Tolerance parameter for HR@tau. If | None |
Returns:
| Type | Description |
|---|---|
| dict[str, DataFrame] | Dictionary mapping level name to a DataFrame of metrics for that level. Each DataFrame includes: |
Raises:
| Type | Description |
|---|---|
| KeyError | If required columns are missing from |
| ValueError | If |
Notes
- This function does not catch per-group metric exceptions. If eb_metrics raises a ValueError for a specific group (e.g., invalid inputs), that error will propagate. If you want best-effort reporting (NaN on failure), wrap metric calls similarly to evaluate_groups_df.
- groupby(..., dropna=False) is used so that missing values in grouping keys form explicit groups, which is often desirable in operational reporting.
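Both notes can be illustrated with a short sketch. It assumes nothing about how evaluate_groups_df is implemented: `strict_metric` and `best_effort` are hypothetical helpers, and the data is made up. The wrapper turns a per-group ValueError into NaN, and `dropna=False` keeps rows with a missing grouping key as their own group.

```python
import math

import pandas as pd


def strict_metric(actual: pd.Series, forecast: pd.Series) -> float:
    """Hypothetical metric that raises ValueError on invalid input."""
    if (actual < 0).any():
        raise ValueError("actual values must be non-negative")
    total = actual.sum()
    if total == 0:
        raise ValueError("actual values sum to zero")
    return float((actual - forecast).abs().sum() / total)


def best_effort(metric, actual, forecast) -> float:
    """Return NaN instead of propagating a ValueError from the metric."""
    try:
        return metric(actual, forecast)
    except ValueError:
        return math.nan


# One row has a missing store key; store "B" has an invalid (negative) actual.
df = pd.DataFrame(
    {
        "store": ["A", "A", None, "B"],
        "actual": [10.0, 20.0, 5.0, -1.0],
        "forecast": [12.0, 18.0, 5.0, 0.0],
    }
)

rows = []
# dropna=False keeps the missing-key row as an explicit group.
for key, group in df.groupby("store", dropna=False):
    value = best_effort(strict_metric, group["actual"], group["forecast"])
    rows.append({"store": key, "metric": value})
report = pd.DataFrame(rows)
```

With the default `dropna=True`, the row whose store is missing would be silently excluded from the report. Without the wrapper, the invalid group for store "B" would abort the whole evaluation.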