
Model comparison

This section documents model comparison utilities provided by eb-evaluation.

Model comparison utilities support side-by-side evaluation of multiple models across metrics, diagnostics, and operational outcomes.

eb_evaluation.model_selection.compare

Forecast comparison and cost-aware model selection helpers.

This module provides evaluation-oriented utilities built on top of eb_metrics.metrics:

  • compare_forecasts computes CWSL and related diagnostics for multiple forecast vectors against a common target series.
  • select_model_by_cwsl fits candidate estimators (using their native training objective) and selects the model with the lowest validation CWSL.
  • select_model_by_cwsl_cv performs K-fold cross-validation, selecting the model with the lowest mean CWSL and refitting it on the full dataset.

CWSL is evaluated with asymmetric costs for underbuild and overbuild, typically summarized by a cost ratio:

\[ R = \frac{c_u}{c_o} \]

where \(c_u\) is the cost per unit of shortfall and \(c_o\) is the cost per unit of excess.
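The exact CWSL formula lives in eb_metrics and is not reproduced here. As a rough illustration of the asymmetric-cost idea, the following sketch treats CWSL as a (possibly weighted) mean of under- and overbuild costs; the function name `cwsl_sketch` and this normalization are assumptions, not the library's actual definition:

```python
import numpy as np

def cwsl_sketch(y_true, y_pred, cu, co, sample_weight=None):
    """Illustrative cost-weighted loss (assumption: the real
    eb_metrics CWSL may normalize or scale differently)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    shortfall = np.maximum(y_true - y_pred, 0.0)  # underbuild amount
    excess = np.maximum(y_pred - y_true, 0.0)     # overbuild amount
    return float(np.average(cu * shortfall + co * excess,
                            weights=sample_weight))

# With cu=2 and co=1 (R = 2), each unit of shortfall costs twice
# as much as each unit of excess:
loss = cwsl_sketch([10.0, 10.0], [8.0, 12.0], cu=2.0, co=1.0)  # → 3.0
```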

compare_forecasts(y_true, forecasts, cu, co, sample_weight=None, tau=2.0)

Compare multiple forecast models on the same target series.

For each forecast vector, compute CWSL and a standard set of diagnostics:

  • CWSL
  • NSL
  • UD
  • wMAPE
  • HR@tau
  • FRS
  • MAE
  • RMSE
  • MAPE

Parameters:

  • y_true (array-like of shape (n_samples,), required):
    Actual (ground-truth) values.

  • forecasts (Mapping[str, array-like], required):
    Mapping from model name to forecast vector. Each forecast must have shape (n_samples,).

  • cu (float or array-like of shape (n_samples,), required):
    Underbuild (shortfall) cost per unit.

  • co (float or array-like of shape (n_samples,), required):
    Overbuild (excess) cost per unit.

  • sample_weight (array-like of shape (n_samples,), default None):
    Optional non-negative weights per interval. Passed to metrics that support sample_weight (CWSL, NSL, UD, HR@tau, FRS). Metrics that are currently unweighted in eb_metrics (e.g., wMAPE, MAE, RMSE, MAPE) are computed without weights.

  • tau (float or array-like, default 2.0):
    Tolerance parameter for HR@tau. May be scalar or per-interval.

Returns:

  • DataFrame:
    DataFrame indexed by model name with columns ["CWSL", "NSL", "UD", "wMAPE", "HR@tau", "FRS", "MAE", "RMSE", "MAPE"].

Raises:

  • ValueError:
    If y_true is not 1D, if forecasts is empty, or if any forecast length is incompatible with y_true.
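The following self-contained sketch mirrors the documented shape of compare_forecasts (one row per model, metrics as columns) for a small subset of the metrics. The CWSL formula here is an assumption, and `compare_forecasts_sketch` is an illustrative stand-in, not the eb_evaluation implementation:

```python
import numpy as np
import pandas as pd

def compare_forecasts_sketch(y_true, forecasts, cu, co):
    """Hedged sketch: per-model CWSL (assumed formula), MAE, RMSE."""
    y = np.asarray(y_true, dtype=float)
    if y.ndim != 1:
        raise ValueError("y_true must be 1D")
    if not forecasts:
        raise ValueError("forecasts must not be empty")
    rows = {}
    for name, f in forecasts.items():
        f = np.asarray(f, dtype=float)
        if f.shape != y.shape:
            raise ValueError(f"forecast {name!r} has incompatible shape")
        err = f - y  # positive = overbuild, negative = underbuild
        rows[name] = {
            "CWSL": float(np.mean(cu * np.maximum(-err, 0.0)
                                  + co * np.maximum(err, 0.0))),
            "MAE": float(np.mean(np.abs(err))),
            "RMSE": float(np.sqrt(np.mean(err ** 2))),
        }
    return pd.DataFrame.from_dict(rows, orient="index")

df = compare_forecasts_sketch(
    [10.0, 20.0], {"a": [10, 20], "b": [12, 18]}, cu=2.0, co=1.0)
```

A perfect forecast ("a") scores CWSL 0; "b" pays co for its 2-unit excess and cu for its 2-unit shortfall.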

select_model_by_cwsl(models, X_train, y_train, X_val, y_val, *, cu, co, sample_weight_val=None)

Fit multiple models, then select the best by validation CWSL.

Each estimator is fit on (X_train, y_train) using its native objective (typically MSE/RMSE), then evaluated on the validation set via CWSL:

\[ \text{CWSL} = \mathrm{cwsl}(y_{\mathrm{val}}, \hat{y}_{\mathrm{val}}; c_u, c_o) \]

The model with the lowest CWSL is returned, along with a compact results table.

Parameters:

  • models (dict[str, Any], required):
    Mapping from model name to an unfitted estimator implementing fit(X, y) and predict(X).

  • X_train (required):
    Training features used to fit each model.

  • y_train (required):
    Training targets used to fit each model.

  • X_val (required):
    Validation features, used only for evaluation.

  • y_val (required):
    Validation targets, used only for evaluation.

  • cu (float, required):
    Underbuild (shortfall) cost per unit for CWSL.

  • co (float, required):
    Overbuild (excess) cost per unit for CWSL.

  • sample_weight_val (array-like or None, default None):
    Optional per-interval weights for the validation set, passed to CWSL.

Returns:

  • best_name (str):
    Name of the model with the lowest CWSL on the validation set.

  • best_model (Any):
    Fitted estimator corresponding to best_name.

  • results (DataFrame):
    DataFrame indexed by model name with columns ["CWSL", "RMSE", "wMAPE"].

Raises:

  • ValueError:
    If no models are evaluated.

Notes
  • RMSE and wMAPE are computed unweighted (consistent with current eb_metrics behavior).
  • This function is intentionally simple and does not handle time-series splitting; callers should ensure the split is appropriate.
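A minimal sketch of the fit-then-select flow, assuming the CWSL formula below (the real eb_metrics definition may differ) and using a toy `ConstantModel` estimator invented here for illustration:

```python
import numpy as np
import pandas as pd

class ConstantModel:
    """Toy estimator (illustrative only): predicts a fixed
    quantile of the training targets."""
    def __init__(self, q):
        self.q = q
    def fit(self, X, y):
        self.level_ = float(np.quantile(y, self.q))
        return self
    def predict(self, X):
        return np.full(len(X), self.level_)

def select_model_by_cwsl_sketch(models, X_train, y_train, X_val, y_val,
                                *, cu, co):
    """Hedged sketch: fit each candidate with its native objective,
    score validation CWSL (assumed formula), return the minimum."""
    rows, fitted = {}, {}
    y_val = np.asarray(y_val, dtype=float)
    for name, model in models.items():
        fitted[name] = model.fit(X_train, y_train)
        err = np.asarray(model.predict(X_val), dtype=float) - y_val
        rows[name] = {
            "CWSL": float(np.mean(cu * np.maximum(-err, 0.0)
                                  + co * np.maximum(err, 0.0))),
            "RMSE": float(np.sqrt(np.mean(err ** 2))),
        }
    if not rows:
        raise ValueError("no models were evaluated")
    results = pd.DataFrame.from_dict(rows, orient="index")
    best_name = results["CWSL"].idxmin()
    return best_name, fitted[best_name], results

# With shortfalls 10x as costly as excess, the higher-quantile
# candidate tends to win on CWSL even if its RMSE is worse:
X, y = np.zeros((8, 1)), np.arange(8.0)
best, _, table = select_model_by_cwsl_sketch(
    {"median": ConstantModel(0.5), "p90": ConstantModel(0.9)},
    X[:6], y[:6], X[6:], y[6:], cu=10.0, co=1.0)
```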

select_model_by_cwsl_cv(models, X, y, *, cu, co, cv=5, sample_weight=None)

Select a model by cross-validated CWSL and refit on the full dataset.

This is a simple K-fold cross-validation loop:

  1. Split indices into cv folds.
  2. For each model and fold:
       • fit on the other (cv - 1) folds
       • evaluate on the held-out fold using CWSL, RMSE, and wMAPE
  3. Aggregate metrics across folds for each model.
  4. Choose the model with the lowest mean CWSL.
  5. Refit the chosen model once on all data (X, y).
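The per-model scoring loop can be sketched as follows, using numpy.array_split for contiguous folds as the Notes describe. The `model_factory` argument and the CWSL formula are assumptions for illustration; the real helper also tracks RMSE/wMAPE and refits the winner:

```python
import numpy as np

def cv_cwsl_sketch(model_factory, X, y, *, cu, co, cv=5):
    """Hedged sketch: mean/std of per-fold CWSL (assumed formula)
    for one candidate, over contiguous K folds."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if cv < 2:
        raise ValueError("cv must be >= 2")
    indices = np.arange(len(y))
    scores = []
    for held_out in np.array_split(indices, cv):
        train = np.setdiff1d(indices, held_out)
        model = model_factory()          # fresh, unfitted estimator
        model.fit(X[train], y[train])
        err = (np.asarray(model.predict(X[held_out]), dtype=float)
               - y[held_out])
        scores.append(float(np.mean(cu * np.maximum(-err, 0.0)
                                    + co * np.maximum(err, 0.0))))
    return float(np.mean(scores)), float(np.std(scores))
```

Running this per candidate and picking the lowest mean score reproduces steps 2 through 4 of the loop above.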

Parameters:

  • models (dict[str, Any], required):
    Mapping from model name to an unfitted estimator implementing fit and predict.

  • X (array-like of shape (n_samples, n_features), required):
    Feature matrix.

  • y (array-like of shape (n_samples,), required):
    Target vector.

  • cu (float, required):
    Underbuild (shortfall) cost per unit for CWSL.

  • co (float, required):
    Overbuild (excess) cost per unit for CWSL.

  • cv (int, default 5):
    Number of folds. Must be >= 2.

  • sample_weight (numpy.ndarray of shape (n_samples,), default None):
    Optional per-sample weights used only for CWSL metric calculation. RMSE and wMAPE remain unweighted.

Returns:

  • best_name (str):
    Model name with the lowest mean CWSL across folds.

  • best_model (Any):
    The chosen estimator refit on all data.

  • results (DataFrame):
    DataFrame indexed by model name with columns CWSL_mean, CWSL_std, RMSE_mean, RMSE_std, wMAPE_mean, wMAPE_std, and n_folds.

Raises:

  • ValueError:
    If X/y dimensions mismatch, cv < 2, sample_weight length mismatch, or no models are evaluated.

Notes

This function uses a naive split of indices into contiguous folds via numpy.array_split. For time-series problems, callers should prefer time-aware splitting outside this helper.