Model comparison¶
This section documents model comparison utilities provided by eb-evaluation.
Model comparison utilities support side-by-side evaluation of multiple models across metrics, diagnostics, and operational outcomes.
eb_evaluation.model_selection.compare
¶
Forecast comparison and cost-aware model selection helpers.
This module provides evaluation-oriented utilities built on top of
eb_metrics.metrics:
- `compare_forecasts` computes CWSL and related diagnostics for multiple forecast vectors against a common target series.
- `select_model_by_cwsl` fits candidate estimators (using their native training objective) and selects the model with the lowest validation CWSL.
- `select_model_by_cwsl_cv` performs K-fold cross-validation, selecting the model with the lowest mean CWSL and refitting it on the full dataset.
CWSL is evaluated with asymmetric costs for underbuild and overbuild, typically summarized by a cost ratio:

\[
\text{cost ratio} = \frac{c_u}{c_o},
\]

where \(c_u\) is the cost per unit of shortfall and \(c_o\) is the cost per unit of excess.
compare_forecasts(y_true, forecasts, cu, co, sample_weight=None, tau=2.0)
¶
Compare multiple forecast models on the same target series.
For each forecast vector, compute CWSL and a standard set of diagnostics:
- CWSL
- NSL
- UD
- wMAPE
- HR@tau
- FRS
- MAE
- RMSE
- MAPE
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `y_true` | array-like of shape (n_samples,) | Actual (ground-truth) values. | *required* |
| `forecasts` | Mapping[str, array-like] | Mapping from model name to forecast vector. Each forecast must be shape (n_samples,). | *required* |
| `cu` | float or array-like of shape (n_samples,) | Underbuild (shortfall) cost per unit. | *required* |
| `co` | float or array-like of shape (n_samples,) | Overbuild (excess) cost per unit. | *required* |
| `sample_weight` | array-like of shape (n_samples,) | Optional non-negative weights per interval. Passed to metrics that support weighting. | None |
| `tau` | float or array-like | Tolerance parameter for HR@tau. May be scalar or per-interval. | 2.0 |
Returns:

| Type | Description |
|---|---|
| DataFrame | DataFrame indexed by model name, with one column per diagnostic listed above. |

Raises:

| Type | Description |
|---|---|
| ValueError | If inputs have mismatched shapes or are otherwise invalid. |
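A minimal sketch of the comparison loop described above, using numpy only. The `cwsl` and `compare_forecasts_sketch` functions here are illustrative stand-ins, not the eb_evaluation API; the exact CWSL normalization in `eb_metrics.metrics` may differ from this assumed cost-weighted mean.

```python
import numpy as np

# Assumed illustrative CWSL: cost-weighted mean of shortfall and excess.
def cwsl(y_true, y_pred, cu, co, sample_weight=None):
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    shortfall = np.maximum(y_true - y_pred, 0.0)   # underbuild
    excess = np.maximum(y_pred - y_true, 0.0)      # overbuild
    loss = cu * shortfall + co * excess
    if sample_weight is None:
        return float(loss.mean())
    w = np.asarray(sample_weight, dtype=float)
    return float((loss * w).sum() / w.sum())

def compare_forecasts_sketch(y_true, forecasts, cu, co):
    # Score every named forecast vector against the same target series.
    return {name: cwsl(y_true, f, cu, co) for name, f in forecasts.items()}

y = np.array([10.0, 12.0, 9.0, 11.0])
scores = compare_forecasts_sketch(
    y,
    {"naive": np.array([10.0, 10.0, 10.0, 10.0]),
     "biased_up": y + 1.0},   # always overbuilds by one unit
    cu=2.0, co=1.0,
)
best = min(scores, key=scores.get)
```

With `cu > co`, the model that always overbuilds slightly is penalized less than one that sometimes runs short, which is exactly the asymmetry the cost ratio encodes.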
select_model_by_cwsl(models, X_train, y_train, X_val, y_val, *, cu, co, sample_weight_val=None)
¶
Fit multiple models, then select the best by validation CWSL.
Each estimator is fit on (X_train, y_train) using its native objective
(typically MSE/RMSE), then evaluated on the validation set via CWSL.
The model with the lowest CWSL is returned, along with a compact results table.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `models` | dict[str, Any] | Mapping from model name to an unfitted estimator implementing `fit(X, y)` and `predict(X)`. | *required* |
| `X_train` | | Training data used to fit each model. | *required* |
| `y_train` | | Training data used to fit each model. | *required* |
| `X_val` | | Validation data used only for evaluation. | *required* |
| `y_val` | | Validation data used only for evaluation. | *required* |
| `cu` | float | Underbuild (shortfall) cost per unit for CWSL. | *required* |
| `co` | float | Overbuild (excess) cost per unit for CWSL. | *required* |
| `sample_weight_val` | array-like or None | Optional per-interval weights for the validation set, passed to CWSL. | None |
Returns:

| Name | Type | Description |
|---|---|---|
| best_name | str | Name of the model with the lowest CWSL on the validation set. |
| best_model | Any | Fitted estimator corresponding to `best_name`. |
| results | DataFrame | DataFrame indexed by model name with CWSL, RMSE, and wMAPE columns. |

Raises:

| Type | Description |
|---|---|
| ValueError | If no models are evaluated. |
Notes
- RMSE and wMAPE are computed unweighted (consistent with current eb_metrics behavior).
- This function is intentionally simple and does not handle time-series splitting; callers should ensure the split is appropriate.
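The fit-then-select flow can be sketched as follows. The estimator classes, the `select_by_cwsl_sketch` helper, and the CWSL formula are hypothetical illustrations built on numpy, not the eb_evaluation implementations.

```python
import numpy as np

# Assumed illustrative CWSL (cost-weighted mean of shortfall and excess).
def cwsl(y_true, y_pred, cu, co):
    err = np.asarray(y_true, float) - np.asarray(y_pred, float)
    return float(np.mean(cu * np.maximum(err, 0) + co * np.maximum(-err, 0)))

class MeanModel:
    """Predicts the training mean everywhere."""
    def fit(self, X, y):
        self.mu_ = float(np.mean(y)); return self
    def predict(self, X):
        return np.full(len(X), self.mu_)

class LinearModel:
    """Ordinary least squares with an intercept (native MSE objective)."""
    def fit(self, X, y):
        A = np.column_stack([np.ones(len(X)), X])
        self.coef_, *_ = np.linalg.lstsq(A, y, rcond=None)
        return self
    def predict(self, X):
        return np.column_stack([np.ones(len(X)), X]) @ self.coef_

def select_by_cwsl_sketch(models, X_tr, y_tr, X_va, y_va, *, cu, co):
    # Fit each candidate with its native objective, score by validation CWSL.
    results = {}
    for name, est in models.items():
        est.fit(X_tr, y_tr)
        results[name] = cwsl(y_va, est.predict(X_va), cu, co)
    best = min(results, key=results.get)
    return best, models[best], results

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 1))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=40)
best, model, res = select_by_cwsl_sketch(
    {"mean": MeanModel(), "linear": LinearModel()},
    X[:30], y[:30], X[30:], y[30:], cu=2.0, co=1.0,
)
```

Note the deliberate mismatch: training minimizes squared error, while selection uses the asymmetric business cost, which is the point of this helper.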
select_model_by_cwsl_cv(models, X, y, *, cu, co, cv=5, sample_weight=None)
¶
Select a model by cross-validated CWSL and refit on the full dataset.
This is a simple K-fold cross-validation loop:

- Split indices into `cv` folds.
- For each model and fold:
    - fit on the remaining `cv - 1` folds,
    - evaluate on the held-out fold using CWSL, RMSE, and wMAPE.
- Aggregate metrics across folds for each model.
- Choose the model with the lowest mean CWSL.
- Refit the chosen model once on all data `(X, y)`.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `models` | dict[str, Any] | Mapping from model name to an unfitted estimator implementing `fit(X, y)` and `predict(X)`. | *required* |
| `X` | array-like of shape (n_samples, n_features) | Feature matrix. | *required* |
| `y` | array-like of shape (n_samples,) | Target vector. | *required* |
| `cu` | float | Underbuild (shortfall) cost per unit for CWSL. | *required* |
| `co` | float | Overbuild (excess) cost per unit for CWSL. | *required* |
| `cv` | int | Number of folds. Must be >= 2. | 5 |
| `sample_weight` | numpy.ndarray of shape (n_samples,) | Optional per-sample weights used only for CWSL metric calculation. RMSE and wMAPE remain unweighted. | None |
Returns:

| Name | Type | Description |
|---|---|---|
| best_name | str | Model name with the lowest mean CWSL across folds. |
| best_model | Any | The chosen estimator refit on all data. |
| results | DataFrame | DataFrame indexed by model name with aggregated CWSL, RMSE, and wMAPE across folds. |

Raises:

| Type | Description |
|---|---|
| ValueError | If X/y dimensions mismatch or `cv` < 2. |
Notes
This function uses a naive split of indices into contiguous folds via
numpy.array_split. For time-series problems, callers should prefer
time-aware splitting outside this helper.
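The fold loop above, including the naive contiguous split via `numpy.array_split`, can be sketched as follows. The `cwsl` definition, `MeanModel` estimator, and `cv_cwsl_sketch` helper are illustrative assumptions, not the eb_evaluation implementations.

```python
import numpy as np

# Assumed illustrative CWSL (cost-weighted mean of shortfall and excess).
def cwsl(y_true, y_pred, cu, co):
    err = np.asarray(y_true, float) - np.asarray(y_pred, float)
    return float(np.mean(cu * np.maximum(err, 0) + co * np.maximum(-err, 0)))

class MeanModel:
    """Predicts the training mean everywhere."""
    def fit(self, X, y):
        self.mu_ = float(np.mean(y)); return self
    def predict(self, X):
        return np.full(len(X), self.mu_)

def cv_cwsl_sketch(model, X, y, *, cu, co, cv=5):
    if cv < 2:
        raise ValueError("cv must be >= 2")
    # Naive contiguous folds, as in the notes above -- not time-aware.
    folds = np.array_split(np.arange(len(y)), cv)
    scores = []
    for held_out in folds:
        train = np.setdiff1d(np.arange(len(y)), held_out)
        model.fit(X[train], y[train])                  # fit on cv - 1 folds
        scores.append(cwsl(y[held_out], model.predict(X[held_out]), cu, co))
    model.fit(X, y)                                    # refit once on all data
    return float(np.mean(scores)), model

X = np.arange(20, dtype=float).reshape(-1, 1)
y = np.ones(20)
score, fitted = cv_cwsl_sketch(MeanModel(), X, y, cu=2.0, co=1.0, cv=4)
```

Because the folds are contiguous slices of the index, later observations can inform predictions for earlier ones; for time-series data, replace the split with a forward-chaining scheme before scoring.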