
Group-level evaluation

This section documents evaluation utilities for computing metrics and diagnostics across grouped entities.

Group-level evaluation supports aggregation and comparison across cohorts, segments, or experimental groups.

eb_evaluation.dataframe.group

Group-level evaluation (DataFrame utilities).

This module provides helpers for evaluating forecasts on grouped subsets of a DataFrame (e.g., by store, item, daypart, region). It orchestrates grouping, parameter handling, and tabular output while delegating metric definitions to eb_metrics.metrics.

The primary entry point is evaluate_groups_df, which computes the Electric Barometer metric suite (CWSL, NSL, UD, HR@tau, FRS) plus common symmetric diagnostics (wMAPE, MAE, RMSE, MAPE) for each group.

evaluate_groups_df(df, group_cols, *, actual_col='actual_qty', forecast_col='forecast_qty', cu=2.0, co=1.0, tau=2.0, sample_weight_col=None)

Evaluate core EB metrics per group from a DataFrame.

For each group defined by group_cols, this helper computes:

  • CWSL
  • NSL
  • UD
  • wMAPE
  • HR@tau
  • FRS
  • MAE
  • RMSE
  • MAPE

Cost parameters can be provided either globally (scalar) or per-row (column name).
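A minimal usage sketch of both cost conventions, assuming the package is installed and that evaluate_groups_df is importable from the module path named above. The toy data and the cu_cost column are made up for illustration; the import is guarded so the snippet still runs where the package is unavailable.

```python
import pandas as pd

# Toy data; "actual_qty" and "forecast_qty" match the documented default column names.
# "cu_cost" is a hypothetical per-row underbuild cost column.
df = pd.DataFrame(
    {
        "store_id": ["A", "A", "B", "B"],
        "actual_qty": [10.0, 12.0, 8.0, 9.0],
        "forecast_qty": [9.0, 13.0, 8.0, 12.0],
        "cu_cost": [2.0, 2.0, 3.0, 3.0],
    }
)

try:
    from eb_evaluation.dataframe.group import evaluate_groups_df
except ImportError:
    evaluate_groups_df = None  # package not available; the calls below are skipped

if evaluate_groups_df is not None:
    # Global (scalar) costs applied to every row.
    scalar_result = evaluate_groups_df(df, ["store_id"], cu=2.0, co=1.0, tau=2.0)
    # Per-row underbuild cost read from a column, scalar overbuild cost.
    per_row_result = evaluate_groups_df(df, ["store_id"], cu="cu_cost", co=1.0)
    print(scalar_result)
    print(per_row_result)
```

Passing a string for cu or co switches that cost from a single global value to the values stored in the named column.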

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | Input data containing actuals, forecasts, and grouping columns. | required |
| group_cols | list[str] | Column names used to define groups (e.g., ["store_id", "item_id"]). | required |
| actual_col | str | Name of the column containing actual demand values. | "actual_qty" |
| forecast_col | str | Name of the column containing forecast values. | "forecast_qty" |
| cu | float \| str | Underbuild (shortfall) cost coefficient. If float: scalar cost applied uniformly across all rows/groups. If str: name of a column in df containing per-row underbuild costs. | 2.0 |
| co | float \| str | Overbuild (excess) cost coefficient. If float: scalar cost applied uniformly across all rows/groups. If str: name of a column in df containing per-row overbuild costs. | 1.0 |
| tau | float | Absolute-error tolerance parameter for the hit-rate metric HR@tau. | 2.0 |
| sample_weight_col | str \| None | Optional column name containing non-negative sample weights per row. If provided, weights are passed only to metrics that accept a sample_weight argument; wMAPE, MAE, RMSE, and MAPE are computed unweighted (see Notes). | None |
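
The float-or-column convention used by cu and co can be pictured as expanding the argument into one cost value per row. The sketch below shows one way this might be done; resolve_cost is a hypothetical helper written for illustration, not a function of eb_evaluation.

```python
import numpy as np
import pandas as pd


def resolve_cost(df: pd.DataFrame, cost) -> np.ndarray:
    """Hypothetical helper: expand a float-or-column-name cost into one value per row."""
    if isinstance(cost, str):
        # Column name: use the per-row values stored in df (KeyError if the column is absent).
        return df[cost].to_numpy(dtype=float)
    # Scalar: broadcast the same cost to every row.
    return np.full(len(df), float(cost))


# "cu_cost" is an illustrative column name, not one required by the library.
df = pd.DataFrame({"actual_qty": [10, 12, 8], "cu_cost": [2.0, 2.5, 3.0]})
print(resolve_cost(df, 2.0))        # same cost on every row
print(resolve_cost(df, "cu_cost"))  # costs taken from the column
```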

Returns:

| Type | Description |
|---|---|
| DataFrame | DataFrame with one row per group and columns group_cols + ["CWSL", "NSL", "UD", "wMAPE", "HR@tau", "FRS", "MAE", "RMSE", "MAPE"]. If a metric is undefined for a particular group (e.g., invalid values for that group), the corresponding value is returned as NaN rather than raising an error for the entire evaluation. |
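
The NaN-on-failure behaviour can be sketched with a small group loop. This is an illustration under stated assumptions, not the library's implementation: mape_strict and evaluate_by_group are hypothetical names, and a MAPE that rejects zero actuals stands in for any metric that can be undefined for a group.

```python
import math

import pandas as pd


def mape_strict(actual: pd.Series, forecast: pd.Series) -> float:
    """Illustrative MAPE that refuses zero actuals (stand-in for an undefined metric)."""
    if (actual == 0).any():
        raise ValueError("MAPE is undefined when any actual value is zero")
    return float(((actual - forecast).abs() / actual.abs()).mean())


def evaluate_by_group(df: pd.DataFrame, group_cols: list[str]) -> pd.DataFrame:
    """Hypothetical sketch: one output row per group, NaN where the metric fails."""
    rows = []
    for keys, group in df.groupby(group_cols, sort=True):
        if not isinstance(keys, tuple):
            keys = (keys,)  # older pandas yields a scalar key for a single column
        try:
            value = mape_strict(group["actual_qty"], group["forecast_qty"])
        except ValueError:
            value = math.nan  # record the failure for this group only
        rows.append({**dict(zip(group_cols, keys)), "MAPE": value})
    return pd.DataFrame(rows)


df = pd.DataFrame(
    {
        "store_id": ["A", "A", "B", "B"],
        "actual_qty": [10.0, 20.0, 0.0, 5.0],  # store B contains a zero actual
        "forecast_qty": [9.0, 22.0, 1.0, 5.0],
    }
)
result = evaluate_by_group(df, ["store_id"])
print(result)
```

Store A receives a finite MAPE while store B receives NaN, and the evaluation as a whole still completes.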

Raises:

| Type | Description |
|---|---|
| KeyError | If required columns are missing from df. |
| ValueError | If df is empty, or if group_cols is empty. |
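
The documented error conditions can be illustrated with a small pre-flight check. This is a sketch only: validate_inputs and error_name are hypothetical helpers, and the order in which the checks run is an assumption, since the source does not specify it.

```python
import pandas as pd


def validate_inputs(df: pd.DataFrame, group_cols: list[str], required_cols: list[str]) -> None:
    """Hypothetical pre-flight checks mirroring the documented error types."""
    if len(group_cols) == 0:
        raise ValueError("group_cols must contain at least one column name")
    if df.empty:
        raise ValueError("df is empty")
    missing = [c for c in [*group_cols, *required_cols] if c not in df.columns]
    if missing:
        raise KeyError(f"missing required columns: {missing}")


def error_name(df: pd.DataFrame, group_cols: list[str], required_cols: list[str]) -> str:
    """Return the name of the exception validate_inputs raises, or "ok" if none."""
    try:
        validate_inputs(df, group_cols, required_cols)
    except (KeyError, ValueError) as exc:
        return type(exc).__name__
    return "ok"


df = pd.DataFrame({"store_id": ["A"], "actual_qty": [10.0]})
print(error_name(df, ["store_id"], ["actual_qty", "forecast_qty"]))  # forecast_qty is absent
print(error_name(df, [], ["actual_qty"]))                            # no grouping columns
print(error_name(df.iloc[0:0], ["store_id"], ["actual_qty"]))        # no rows
print(error_name(df, ["store_id"], ["actual_qty"]))                  # all checks pass
```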

Notes
  • wmape in eb_metrics.metrics does not take sample_weight, so it is computed unweighted here.
  • Symmetric diagnostics (MAE, RMSE, MAPE) are computed unweighted to match the current eb_metrics signatures.
  • Metrics are evaluated group-by-group; a failure in one group does not prevent evaluation of other groups.
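
Since the notes above state that MAE is computed unweighted, a caller who needs a weighted variant would have to compute it separately. The following sketch shows one possible approach with plain pandas. The weight column and the weighted-mean formula (sum of weight times absolute error, divided by sum of weights) are assumptions for illustration and are not part of eb_metrics.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "store_id": ["A", "A", "B", "B"],
        "actual_qty": [10.0, 20.0, 5.0, 5.0],
        "forecast_qty": [8.0, 20.0, 6.0, 9.0],
        "weight": [1.0, 3.0, 1.0, 1.0],  # hypothetical sample-weight column
    }
)

# Per-row absolute error.
abs_err = (df["actual_qty"] - df["forecast_qty"]).abs()

# Weighted MAE per group: sum(w * |error|) / sum(w).
work = pd.DataFrame(
    {"store_id": df["store_id"], "w_err": df["weight"] * abs_err, "w": df["weight"]}
)
sums = work.groupby("store_id")[["w_err", "w"]].sum()
weighted_mae = sums["w_err"] / sums["w"]

# Unweighted MAE per group, for comparison.
unweighted_mae = abs_err.groupby(df["store_id"]).mean()

print(pd.DataFrame({"weighted_mae": weighted_mae, "unweighted_mae": unweighted_mae}))
```

In this toy data the two values differ for store A, where the rows carry unequal weights, and coincide for store B, where the weights are equal.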