Electric Barometer model selection

This section documents Electric Barometer–specific model selection utilities.

These utilities integrate evaluation metrics, diagnostics, and readiness signals to support holistic model ranking and selection.

eb_evaluation.model_selection.electric_barometer

Cost-aware model selection using the Electric Barometer workflow.

This module defines ElectricBarometer, a lightweight selector that evaluates a set of candidate regressors using Cost-Weighted Service Loss (CWSL) as the primary objective and selects the model that minimizes expected operational cost.

Selection preference is governed by asymmetric unit costs:

  • cu: underbuild (shortfall) cost per unit
  • co: overbuild (excess) cost per unit

A convenient summary is the cost ratio:

\[ R = \frac{c_u}{c_o} \]
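This section names CWSL as the objective but does not spell out its formula. As an illustration only, assuming CWSL is the (sample-weighted) mean of cu-priced shortfall plus co-priced excess, a minimal sketch:

```python
# Illustrative sketch only: the CWSL formula is not given in this section.
# ASSUMPTION: CWSL is the weighted mean of asymmetric unit costs, with
# cu per unit of shortfall (y_pred < y_true) and co per unit of excess.

def cwsl(y_true, y_pred, cu=2.0, co=1.0, sample_weight=None):
    """Cost-Weighted Service Loss under the assumed definition above."""
    if sample_weight is None:
        sample_weight = [1.0] * len(y_true)
    total_cost = 0.0
    total_weight = 0.0
    for yt, yp, w in zip(y_true, y_pred, sample_weight):
        shortfall = max(yt - yp, 0.0)   # underbuild: demand not met
        excess = max(yp - yt, 0.0)      # overbuild: unused capacity
        total_cost += w * (cu * shortfall + co * excess)
        total_weight += w
    return total_cost / total_weight

# With cu=2, co=1 (R = 2), under-prediction is penalized twice as hard:
print(cwsl([10, 10], [8, 12]))  # shortfall 2 and excess 2 -> (2*2 + 1*2) / 2 = 3.0
```

Under this assumed form, R = cu/co is exactly the factor by which a unit of shortfall outweighs a unit of excess.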
Notes

ElectricBarometer is intentionally a selector (not a trainer that optimizes CWSL directly). Candidate models are trained using their native objectives (e.g., squared error) and are selected using a chosen selection objective on validation data (holdout) or across folds (CV).
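The distinction above (native training objective, external cost-aware selection) can be sketched in a self-contained toy, which is not the library's code: a mean predictor (native objective: squared error) and a quantile predictor (native objective: pinball loss) are each fit normally, then ranked by an assumed CWSL-style cost.

```python
# Conceptual sketch, NOT the library implementation: candidates train on
# their own objectives; the asymmetric cost is used only for selection.
import math

def cwsl(y_true, y_pred, cu=2.0, co=1.0):
    # Assumed CWSL form: mean of cu*shortfall + co*excess per observation.
    return sum(cu * max(t - p, 0) + co * max(p - t, 0)
               for t, p in zip(y_true, y_pred)) / len(y_true)

class MeanModel:
    """Native objective: squared error -> predicts the mean."""
    def fit(self, X, y):
        self.c_ = sum(y) / len(y)
        return self
    def predict(self, X):
        return [self.c_] * len(X)

class QuantileModel:
    """Native objective: pinball loss -> predicts a quantile."""
    def __init__(self, q):
        self.q = q
    def fit(self, X, y):
        ys = sorted(y)
        idx = self.q * (len(ys) - 1)
        lo, hi = math.floor(idx), math.ceil(idx)
        self.c_ = ys[lo] + (ys[hi] - ys[lo]) * (idx - lo)
        return self
    def predict(self, X):
        return [self.c_] * len(X)

X = [[i] for i in range(10)]
y = list(range(1, 11))                     # toy demand series
models = {"mean": MeanModel(), "q67": QuantileModel(2 / 3)}
scores = {name: cwsl(y, m.fit(X, y).predict(X)) for name, m in models.items()}
best = min(scores, key=scores.get)
print(scores, "->", best)   # with cu=2, co=1 the upward-shifted q67 model wins
```

With cu=2, co=1, the 2/3-quantile model scores lower cost than the mean model even though each was trained on its own objective, which is the behavior the selector exploits.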

ElectricBarometer

Cost-aware selector that chooses the best model by minimizing a selection objective.

ElectricBarometer evaluates each candidate model on either:

  • a provided train/validation split (selection_mode="holdout"), or
  • K-fold cross-validation on the provided dataset (selection_mode="cv"),

and selects the model with the best (lowest) score under the chosen selection objective. For interpretability, it also reports reference diagnostics (CWSL, RMSE, wMAPE).

Operational preference is captured by asymmetric costs and the induced ratio:

$$
R = \frac{c_u}{c_o}
$$

Parameters:

models : dict[str, Any] (required)
    Mapping of candidate model name to an unfitted estimator implementing:
      • fit(X, y)
      • predict(X)
    Models can be scikit-learn regressors/pipelines or EB adapters implementing the same interface.

include : set[str] | None (default: None)
    Optional allowlist of model names to include from models. If provided, only these names are retained (after validation).

exclude : set[str] | None (default: None)
    Optional blocklist of model names to exclude from models (after validation). Applied after include filtering.

metric : {"cwsl", "rmse", "wmape"} (default: "cwsl")
    Selection objective used to choose the winning model. All metrics are computed and reported; this parameter determines which column is optimized.

tie_tol : float (default: 0.0)
    Absolute tolerance applied to the selection metric when determining ties. Any model with score <= best_score + tie_tol is considered tied.

tie_breaker : {"metric", "simpler", "name"} (default: "metric")
    How to break ties among models within tie_tol of the best score.
      • "metric": choose the tied model with the lowest metric (deterministic by insertion/index order)
      • "simpler": prefer a "simpler" model based on a lightweight heuristic
      • "name": choose the lexicographically smallest model name

validate_inputs : {"strict", "coerce", "off"} (default: "strict")
    Input validation level.
      • "strict": require numeric arrays and error on NaN/inf
      • "coerce": coerce to float and error on NaN/inf
      • "off": minimal validation (legacy behavior)

error_policy : {"raise", "skip", "warn_skip"} (default: "raise")
    Behavior when a candidate model fails to fit/predict or otherwise errors.
      • "raise": raise immediately
      • "skip": skip failing models silently (recorded in failures_)
      • "warn_skip": warn and skip (recorded in failures_)

time_budget_s : float | None (default: None)
    Optional wall-clock time budget (seconds) for the full selection run. If exceeded, remaining models are not evaluated. Note: this cannot forcibly interrupt a model that is already running; it gates starting new candidates and can mark a candidate as timed out if it exceeds its budget.

per_model_time_budget_s : float | None (default: None)
    Optional wall-clock time budget (seconds) per candidate model (across folds in CV). If exceeded, that model is marked as timed out and skipped (or raises under error_policy="raise").

cu : float (default: 2.0)
    Underbuild (shortfall) cost per unit. Must be strictly positive.

co : float (default: 1.0)
    Overbuild (excess) cost per unit. Must be strictly positive.

tau : float (default: 2.0)
    Reserved for downstream diagnostics (e.g., HR@τ) that may be integrated into selection reporting. Currently not used in the selection criterion.

training_mode : "selection_only" (default: "selection_only")
    Training behavior. In the current implementation, candidate models are trained using their native objectives and only selection is external.

refit_on_full : bool (default: False)
    Refit behavior in holdout mode:
      • If True, after selecting the best model by the chosen metric on validation data, refit a fresh clone of the winning model on the combined train and validation data.
      • If False, keep the winning model as fitted on the training split (and selected on the validation split).
    In CV mode, the selected model is always refit on the full dataset provided to fit (i.e., X_train, y_train).

selection_mode : {"holdout", "cv"} (default: "holdout")
    Selection strategy:
      • "holdout": use the provided (X_train, y_train, X_val, y_val).
      • "cv": ignore X_val and y_val and run K-fold selection on X_train, y_train.

cv : int (default: 3)
    Number of folds when selection_mode="cv". Must be at least 2.

random_state : int | None (default: None)
    Seed used for CV shuffling/splitting.

Attributes:

best_name_ : str | None
    Name of the winning model after calling fit.

best_model_ : Any | None
    Fitted estimator corresponding to best_name_.

results_ : DataFrame | None
    Per-model comparison table.
      • In holdout mode: one row per model with columns ["CWSL", "RMSE", "wMAPE"]
      • In CV mode: mean scores across folds with the same columns

failures_ : dict[str, str]
    Mapping of model name to a failure reason for models that errored or timed out.

validation_cwsl_ : float | None
    CWSL of the winning model on validation (holdout) or the mean across folds (CV).

validation_rmse_ : float | None
    RMSE of the winning model on validation (holdout) or the mean across folds (CV).

validation_wmape_ : float | None
    wMAPE of the winning model on validation (holdout) or the mean across folds (CV).

candidate_names_ : list[str]
    Names of candidate models remaining after include/exclude filtering.

evaluated_names_ : list[str]
    Names of models actually attempted during the most recent fit.

stopped_early_ : bool
    Whether evaluation stopped early due to the global time budget.

stop_reason_ : str | None
    Human-readable reason string when stopped_early_ is True.
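The failures_ mapping interacts directly with error_policy. A self-contained sketch of how such a policy could populate a failure map while evaluating candidates (illustrative only; evaluate_all and its arguments are hypothetical, not the library's API):

```python
# Hypothetical mirror of error_policy handling; not the library's code.
import warnings

def evaluate_all(models, evaluate, error_policy="raise"):
    """Evaluate each candidate, recording failures per the policy."""
    results, failures = {}, {}
    for name, model in models.items():
        try:
            results[name] = evaluate(model)
        except Exception as exc:
            if error_policy == "raise":
                raise                      # fail fast
            failures[name] = f"{type(exc).__name__}: {exc}"
            if error_policy == "warn_skip":
                warnings.warn(f"skipping {name}: {exc}")
    return results, failures

def evaluate(model):
    if model is None:
        raise ValueError("model failed to fit")
    return 1.0

results, failures = evaluate_all({"ok": object(), "bad": None}, evaluate,
                                 error_policy="skip")
print(results, failures)   # {'ok': 1.0} {'bad': 'ValueError: model failed to fit'}
```

Under "skip" the bad candidate is silently excluded from the results but its reason survives in the failure map, which matches the documented intent of failures_.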

r_ property

Cost ratio.

Returns:

float
    The cost ratio:

$$
R = \frac{c_u}{c_o}
$$

fit(X_train, y_train, X_val, y_val, sample_weight_train=None, sample_weight_val=None, refit_on_full=None)

Fit candidate models and select the best one by minimizing the chosen metric.

predict(X)

Predict using the selected best model.
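End to end, holdout selection reduces to: fit every candidate on the training split, score each on the validation split, keep the argmin of the selection metric, then predict with the winner. A self-contained mirror of that flow under the assumed CWSL form (not the library's code; ConstantModel is a toy stand-in):

```python
# Self-contained mirror of holdout selection followed by predict();
# illustrative only, not the library implementation.

def cwsl(y_true, y_pred, cu=2.0, co=1.0):
    # Assumed CWSL form: mean of cu*shortfall + co*excess.
    return sum(cu * max(t - p, 0) + co * max(p - t, 0)
               for t, p in zip(y_true, y_pred)) / len(y_true)

class ConstantModel:
    """Toy estimator exposing the required fit/predict interface."""
    def __init__(self, value):
        self.value = value
    def fit(self, X, y):
        return self
    def predict(self, X):
        return [self.value] * len(X)

models = {"low": ConstantModel(4.0), "high": ConstantModel(7.0)}
X_train, y_train = [[0]] * 3, [5, 6, 7]
X_val, y_val = [[0]] * 3, [5, 7, 9]

# 1) fit on train, 2) score on validation, 3) select the argmin
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = cwsl(y_val, model.predict(X_val))
best_name = min(scores, key=scores.get)
best_model = models[best_name]

# 4) predict with the selected model (refit_on_full=False behavior:
#    the winner stays fitted on the training split only)
print(best_name, best_model.predict([[0]]))   # high [7.0]
```

Because cu=2 makes shortfall twice as expensive as excess, the higher constant wins here even though it over-predicts two of the three validation points.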

cwsl_score(y_true, y_pred, sample_weight=None, cu=None, co=None)

Compute CWSL using this selector's costs (or overrides).