Metrics API

This section documents the core metric implementations provided by eb-metrics. All content below is generated automatically from NumPy-style docstrings in the source code.

Metrics Package

`eb_metrics.metrics`

Public metric API for the Electric Barometer ecosystem.

The eb_metrics.metrics package provides a curated, stable import surface for Electric Barometer evaluation metrics.

This package groups metrics into four main categories:

- Asymmetric loss metrics (`loss`): cost-aware losses that encode directional operational asymmetry.
- Service and readiness diagnostics (`service`): metrics that quantify shortfall avoidance, tolerance coverage, and readiness.
- Classical regression metrics (`regression`): standard symmetric error metrics used for baseline comparison.
- Cost-ratio utilities (`cost_ratio`): helpers for selecting and analyzing the asymmetric cost ratio \(R = c_u / c_o\).

Conceptual definitions and interpretation of Electric Barometer metrics are documented in the companion research repository (eb-papers). This package is the executable reference implementation.

Notes

Users are encouraged to import from eb_metrics.metrics or from the relevant submodule (e.g., eb_metrics.metrics.service) rather than internal helpers.

Examples:

Import from the package surface:

>>> from eb_metrics.metrics import cwsl, nsl, frs

Or import from a submodule:

>>> from eb_metrics.metrics.loss import cwsl
>>> from eb_metrics.metrics.service import nsl

`estimate_R_cost_balance(y_true, y_pred, R_grid=(0.5, 1.0, 2.0, 3.0), co=1.0, sample_weight=None)`

Estimate a global cost ratio \(R = c_u / c_o\) via cost balance.

This routine selects a single, global cost ratio \(R\) by searching a candidate grid and choosing the value where the total weighted underbuild cost is closest to the total weighted overbuild cost.

For each candidate \(R\) in R_grid:

\[ \begin{aligned} c_{u,i} &= R \cdot c_{o,i} \\ s_i &= \max(0, y_i - \hat{y}_i) \\ e_i &= \max(0, \hat{y}_i - y_i) \\ C_u(R) &= \sum_i w_i \; c_{u,i} \; s_i \\ C_o(R) &= \sum_i w_i \; c_{o,i} \; e_i \end{aligned} \]

and the selected value is:

\[ R^* = \arg\min_R \; \left| C_u(R) - C_o(R) \right|. \]

The returned \(R^*\) can be used as:

- a reasonable default global cost ratio for evaluation, and/or
- the center of a sensitivity sweep (e.g., \(\{R^*/2, R^*, 2R^*\}\)).

Parameters:

Name	Type	Description	Default
`y_true`	`array-like of shape (n_samples,)`	Realized demand (non-negative).	required
`y_pred`	`array-like of shape (n_samples,)`	Forecast demand (non-negative). Must have the same shape as `y_true`.	required
`R_grid`	`sequence of float`	Candidate cost ratios \(R\) to search over. Only strictly positive values are considered.	`(0.5, 1.0, 2.0, 3.0)`
`co`	`float or array-like of shape (n_samples,)`	Overbuild cost \(c_o\) per unit. Either a scalar (same overbuild cost for all intervals) or a 1D array (per-interval overbuild cost). For each \(R\), the implied underbuild cost is \(c_{u,i} = R \cdot c_{o,i}\).	`1.0`
`sample_weight`	`float or array-like of shape (n_samples,)`	Optional non-negative weights per interval used to weight the cost aggregation. If `None`, all intervals receive weight `1.0`.	`None`

Returns:

Type	Description
`float`	The value in `R_grid` that minimizes \(\left| C_u(R) - C_o(R) \right|\). If multiple values yield the same minimal gap, the first such value in the (filtered) grid is returned, except in the degenerate perfect-forecast case (zero error everywhere), where the candidate closest to `1.0` is returned.

Raises:

Type	Description
`ValueError`	If inputs are invalid (e.g., negative `y_true` or `y_pred`), if `R_grid` is empty, or if it contains no positive values.

Notes

This helper is intentionally simple: it does not infer cost structure from business inputs, nor does it estimate per-item costs. It provides a reproducible, data-driven way to select a reasonable global \(R\) given realized outcomes and forecast behavior.
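The grid search described above can be sketched in plain Python. This is an illustrative reimplementation, not the eb-metrics source: the name `cost_balance_ratio` is hypothetical, `co` is assumed to be a scalar, all weights are assumed to be 1, and input validation is mostly omitted.

```python
def cost_balance_ratio(y_true, y_pred, r_grid=(0.5, 1.0, 2.0, 3.0), co=1.0):
    """Pick the R in r_grid whose underbuild cost best matches overbuild cost.

    Hypothetical sketch: scalar `co`, unit weights, minimal validation.
    """
    candidates = [r for r in r_grid if r > 0]  # keep strictly positive ratios only
    if not candidates:
        raise ValueError("r_grid must contain at least one positive value")

    shortfall = sum(max(0.0, y - p) for y, p in zip(y_true, y_pred))
    overbuild = sum(max(0.0, p - y) for y, p in zip(y_true, y_pred))

    if shortfall == 0 and overbuild == 0:
        # Degenerate perfect forecast: use the candidate closest to 1.0.
        return min(candidates, key=lambda r: abs(r - 1.0))

    # C_u(R) = R * co * shortfall and C_o = co * overbuild.
    # min() returns the first candidate among ties.
    return min(candidates, key=lambda r: abs(r * co * shortfall - co * overbuild))


y_true = [10, 12, 8, 15]
y_pred = [9, 14, 8, 11]   # total shortfall 5, total overbuild 2
r_star = cost_balance_ratio(y_true, y_pred)
print(r_star)
```

With these inputs the gap \(|5R - 2|\) is smallest at the grid value 0.5, so a follow-up sensitivity sweep would be centered there.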

References

Electric Barometer Technical Note: Cost Ratio Estimation (Choosing \(R\)).

`cwsl(y_true, y_pred, cu, co, sample_weight=None)`

Compute Cost-Weighted Service Loss (CWSL).

CWSL is a demand-normalized, directionally-aware loss that penalizes shortfalls and overbuilds using explicit per-unit costs.

For each interval \(i\):

\[ \begin{aligned} s_i &= \max(0, y_i - \hat{y}_i) \\ o_i &= \max(0, \hat{y}_i - y_i) \\ \text{cost}_i &= c_{u,i} \; s_i + c_{o,i} \; o_i \end{aligned} \]

and the aggregated metric is:

\[ \mathrm{CWSL} = \frac{\sum_i w_i \; \text{cost}_i}{\sum_i w_i \; y_i} \]

where \(w_i\) are optional sample weights (default \(w_i = 1\)). Lower values indicate better performance.

Parameters:

Name	Type	Description	Default
`y_true`	`array-like of shape (n_samples,)`	Realized demand \(y\). Must be non-negative.	required
`y_pred`	`array-like of shape (n_samples,)`	Forecast demand \(\hat{y}\). Must be non-negative and have the same shape as `y_true`.	required
`cu`	`float or array-like of shape (n_samples,)`	Per-unit shortfall cost \(c_u\). Can be a scalar (global cost) or a 1D array specifying per-interval costs. Must be non-negative.	required
`co`	`float or array-like of shape (n_samples,)`	Per-unit overbuild cost \(c_o\). Can be a scalar (global cost) or a 1D array specifying per-interval costs. Must be non-negative.	required
`sample_weight`	`float or array-like of shape (n_samples,)`	Optional non-negative weights per interval. If `None`, all intervals receive weight `1.0`.	`None`

Returns:

Type	Description
`float`	The CWSL value. Lower is better.

Raises:

Type	Description
`ValueError`	If `y_true` and `y_pred` have different shapes, if any demand or forecast values are negative, if any costs are negative, or if the metric is undefined due to zero total (weighted) demand with positive total (weighted) cost.

Notes

When cu == co (up to a constant scaling), CWSL behaves similarly to a demand-normalized absolute error (wMAPE-like), but retains explicit cost semantics.
If total (weighted) demand is zero and total (weighted) cost is zero, this implementation returns 0.0.
If total (weighted) demand is zero but total (weighted) cost is positive, the metric is undefined under this formulation and a ValueError is raised.
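As a worked illustration of the formula and the zero-demand conventions above, here is a minimal pure-Python sketch. It is not the library implementation: the name `cwsl_sketch` is hypothetical, costs are assumed to be scalars, and only the zero-demand cases are validated.

```python
def cwsl_sketch(y_true, y_pred, cu, co, sample_weight=None):
    """Cost-weighted service loss for scalar costs (hypothetical sketch)."""
    weights = [1.0] * len(y_true) if sample_weight is None else list(sample_weight)
    total_cost = 0.0
    total_demand = 0.0
    for y, p, w in zip(y_true, y_pred, weights):
        shortfall = max(0.0, y - p)  # s_i
        overbuild = max(0.0, p - y)  # o_i
        total_cost += w * (cu * shortfall + co * overbuild)
        total_demand += w * y
    if total_demand == 0:
        if total_cost == 0:
            return 0.0  # documented convention: zero demand and zero cost
        raise ValueError("CWSL is undefined: zero demand with positive cost")
    return total_cost / total_demand


y_true = [10, 12, 8]
y_pred = [8, 15, 8]  # one shortfall of 2 units, one overbuild of 3 units
asymmetric = cwsl_sketch(y_true, y_pred, cu=2.0, co=1.0)  # (2*2 + 1*3) / 30
symmetric = cwsl_sketch(y_true, y_pred, cu=1.0, co=1.0)   # (2 + 3) / 30
print(asymmetric, symmetric)
```

With equal unit costs the result is total absolute error divided by total demand, which is the wMAPE-like behavior mentioned in the first note.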

References

Electric Barometer Technical Note: Cost-Weighted Service Loss (CWSL).

`mae(y_true, y_pred)`

Mean Absolute Error (MAE).

Parameters:

Name	Type	Description	Default
`y_true`	`array-like of shape (n_samples,)`	Ground-truth values.	required
`y_pred`	`array-like of shape (n_samples,)`	Predicted values.	required

Returns:

Type	Description
`float`	Mean absolute error. Lower is better.

`mase(y_true, y_pred, y_naive)`

Mean Absolute Scaled Error (MASE).

MASE scales the model MAE by the MAE of a naive forecast:

\[ \mathrm{MASE} = \frac{\mathrm{MAE}(y,\hat{y})}{\mathrm{MAE}(y, y^{\text{naive}})} \]

where \(y^{\text{naive}}\) (`y_naive`) is a baseline forecast, typically the previous observation \(y_{t-1}\).

Parameters:

Name	Type	Description	Default
`y_true`	`array-like of shape (n_samples,)`	Ground-truth values.	required
`y_pred`	`array-like of shape (n_samples,)`	Predicted values.	required
`y_naive`	`array-like of shape (n_samples,)`	Naive forecast values aligned to `y_true`.	required

Returns:

Type	Description
`float`	MASE value. Lower is better.

Raises:

Type	Description
`ValueError`	If the naive MAE is zero (MASE undefined).

Notes

MASE is scale-free and can be compared across series with different magnitudes, assuming the naive baseline is meaningful.
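To show how `y_naive` is typically aligned with `y_true`, here is a small pure-Python sketch using a lag-1 baseline. The helper names `mae_sketch` and `mase_sketch` are hypothetical and the code is not taken from eb-metrics.

```python
def mae_sketch(y_true, y_pred):
    """Mean absolute error over paired values (hypothetical helper)."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


def mase_sketch(y_true, y_pred, y_naive):
    """Model MAE divided by the MAE of an aligned naive forecast."""
    naive_mae = mae_sketch(y_true, y_naive)
    if naive_mae == 0:
        raise ValueError("MASE is undefined when the naive MAE is zero")
    return mae_sketch(y_true, y_pred) / naive_mae


series = [10, 12, 11, 15, 14]
y_true = series[1:]     # [12, 11, 15, 14]
y_naive = series[:-1]   # lag-1 baseline: [10, 12, 11, 15]
y_pred = [11, 12, 14, 14]

print(mase_sketch(y_true, y_pred, y_naive))  # 0.75 / 2.0 = 0.375
```

A value below 1 means the model's MAE is smaller than that of the naive baseline on the same observations.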

`medae(y_true, y_pred)`

Median Absolute Error (MedAE).

Parameters:

Name	Type	Description	Default
`y_true`	`array-like of shape (n_samples,)`	Ground-truth values.	required
`y_pred`	`array-like of shape (n_samples,)`	Predicted values.	required

Returns:

Type	Description
`float`	Median absolute error. Lower is better.

`mape(y_true, y_pred)`

Mean Absolute Percentage Error (MAPE).

MAPE is computed over samples where y_true != 0:

\[ \mathrm{MAPE} = 100 \cdot \mathrm{mean}\left(\left|\frac{y-\hat{y}}{y}\right|\right) \]

Parameters:

Name	Type	Description	Default
`y_true`	`array-like of shape (n_samples,)`	Ground-truth values.	required
`y_pred`	`array-like of shape (n_samples,)`	Predicted values.	required

Returns:

Type	Description
`float`	Mean absolute percentage error in percent. Lower is better.

Raises:

Type	Description
`ValueError`	If all values of `y_true` are zero (MAPE undefined).

Notes

MAPE can be unstable when y_true is near zero. Consider WMAPE or a domain-specific metric (e.g., CWSL) when percentage behavior is undesirable.
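The zero-skipping rule and the near-zero instability can both be seen in a short pure-Python sketch. The name `mape_sketch` is hypothetical and this is an illustration of the formula, not the library code.

```python
def mape_sketch(y_true, y_pred):
    """MAPE in percent over samples where y_true != 0 (hypothetical sketch)."""
    ratios = [abs((y - p) / y) for y, p in zip(y_true, y_pred) if y != 0]
    if not ratios:
        raise ValueError("MAPE is undefined when all y_true values are zero")
    return 100.0 * sum(ratios) / len(ratios)


# The zero-demand sample is skipped; the remaining ratios are 0.10 and 0.20.
typical = mape_sketch([100, 0, 50], [90, 5, 60])

# An absolute error of 1 unit on a true value of 0.1 is a 1000% error,
# which dominates the mean even though the second sample is exact.
unstable = mape_sketch([0.1, 100], [1.1, 100])

print(typical, unstable)
```

The first call evaluates to roughly 15 percent and the second to roughly 500 percent, although the absolute errors in the second case are smaller.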

`mse(y_true, y_pred)`

Mean Squared Error (MSE).

Parameters:

Name	Type	Description	Default
`y_true`	`array-like of shape (n_samples,)`	Ground-truth values.	required
`y_pred`	`array-like of shape (n_samples,)`	Predicted values.	required

Returns:

Type	Description
`float`	Mean squared error. Lower is better.

`msle(y_true, y_pred)`

Mean Squared Log Error (MSLE).

MSLE is defined as:

\[ \mathrm{MSLE} = \mathrm{mean}\left((\log(1+y) - \log(1+\hat{y}))^2\right) \]

Parameters:

Name	Type	Description	Default
`y_true`	`array-like of shape (n_samples,)`	Ground-truth values. Must be non-negative.	required
`y_pred`	`array-like of shape (n_samples,)`	Predicted values. Must be non-negative.	required

Returns:

Type	Description
`float`	Mean squared log error. Lower is better.

Raises:

Type	Description
`ValueError`	If any value in `y_true` or `y_pred` is negative.

Notes

MSLE down-weights large absolute errors at high magnitudes and is commonly used when relative error is more meaningful than absolute error.
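The down-weighting effect can be illustrated with a small pure-Python sketch of the formula. The name `msle_sketch` is hypothetical; `math.log1p(x)` computes \(\log(1+x)\).

```python
import math


def msle_sketch(y_true, y_pred):
    """Mean squared log error (hypothetical sketch of the formula above)."""
    if any(v < 0 for v in list(y_true) + list(y_pred)):
        raise ValueError("MSLE requires non-negative values")
    squared = [(math.log1p(y) - math.log1p(p)) ** 2 for y, p in zip(y_true, y_pred)]
    return sum(squared) / len(squared)


# The same absolute error of 10 units at two different magnitudes.
small_scale = msle_sketch([10], [20])
large_scale = msle_sketch([1000], [1010])
print(small_scale, large_scale)
```

The 10-unit miss on a true value of 10 produces a far larger MSLE than the same miss on a true value of 1000, which is the relative-error behavior described in the note.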

`rmse(y_true, y_pred)`

Root Mean Squared Error (RMSE).

Parameters:

Name	Type	Description	Default
`y_true`	`array-like of shape (n_samples,)`	Ground-truth values.	required
`y_pred`	`array-like of shape (n_samples,)`	Predicted values.	required

Returns:

Type	Description
`float`	Root mean squared error. Lower is better.

`rmsle(y_true, y_pred)`

Root Mean Squared Log Error (RMSLE).

Parameters:

Name	Type	Description	Default
`y_true`	`array-like of shape (n_samples,)`	Ground-truth values. Must be non-negative.	required
`y_pred`	`array-like of shape (n_samples,)`	Predicted values. Must be non-negative.	required

Returns:

Type	Description
`float`	Root mean squared log error. Lower is better.

`smape(y_true, y_pred)`

Symmetric Mean Absolute Percentage Error (sMAPE).

This implementation follows a common competition definition:

\[ \mathrm{sMAPE} = 200 \cdot \mathrm{mean}\left(\frac{|y-\hat{y}|}{|y| + |\hat{y}|}\right) \]

Parameters:

Name	Type	Description	Default
`y_true`	`array-like of shape (n_samples,)`	Ground-truth values.	required
`y_pred`	`array-like of shape (n_samples,)`	Predicted values.	required

Returns:

Type	Description
`float`	sMAPE in percent. Lower is better.

Notes

If |y| + |y_pred| == 0 for all samples, this function returns 0.0.

`wmape(y_true, y_pred)`

Weighted Mean Absolute Percentage Error (WMAPE).

WMAPE is also commonly described as demand-normalized absolute error:

\[ \mathrm{WMAPE} = 100 \cdot \frac{\sum_i |y_i-\hat{y}_i|}{\sum_i |y_i|} \]

Parameters:

Name	Type	Description	Default
`y_true`	`array-like of shape (n_samples,)`	Ground-truth values.	required
`y_pred`	`array-like of shape (n_samples,)`	Predicted values.	required

Returns:

Type	Description
`float`	WMAPE in percent. Lower is better.

Raises:

Type	Description
`ValueError`	If `sum(|y_true|) == 0` (WMAPE undefined).

Notes

WMAPE is symmetric: it does not distinguish underprediction from overprediction. Use CWSL when directional cost asymmetry matters.
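The symmetry noted above is easy to check numerically. The following pure-Python sketch uses a hypothetical helper name, `wmape_sketch`, and is meant only to illustrate the formula.

```python
def wmape_sketch(y_true, y_pred):
    """WMAPE in percent: total absolute error over total absolute demand."""
    denominator = sum(abs(y) for y in y_true)
    if denominator == 0:
        raise ValueError("WMAPE is undefined when sum(|y_true|) == 0")
    return 100.0 * sum(abs(y - p) for y, p in zip(y_true, y_pred)) / denominator


y_true = [10, 10]
under = [8, 10]    # 2 units short in the first interval
over = [12, 10]    # 2 units over in the first interval

print(wmape_sketch(y_true, under), wmape_sketch(y_true, over))  # 10.0 10.0
```

Both forecasts score 10 percent, so WMAPE alone cannot tell an operator which one risks a shortfall.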

`cwsl_sensitivity(y_true, y_pred, R_list=(0.5, 1.0, 2.0, 3.0), co=1.0, sample_weight=None)`

Evaluate CWSL across a grid of cost ratios (cost sensitivity analysis).

This helper computes Cost-Weighted Service Loss (CWSL) for each candidate cost ratio:

\[ R = c_u / c_o \]

holding co fixed and setting:

\[ c_u = R \cdot c_o \]

for each value in R_list.

This provides a simple way to assess how model ranking or absolute loss changes under alternative assumptions about shortfall vs. overbuild cost.

Parameters:

Name	Type	Description	Default
`y_true`	`array-like of shape (n_samples,)`	Realized demand. Must be non-negative.	required
`y_pred`	`array-like of shape (n_samples,)`	Forecast demand. Must be non-negative and have the same shape as `y_true`.	required
`R_list`	`sequence of float`	Candidate cost ratios to evaluate. Only strictly positive values are used.	`(0.5, 1.0, 2.0, 3.0)`
`co`	`float or array-like of shape (n_samples,)`	Overbuild cost \(c_o\). Can be scalar or per-interval.	`1.0`
`sample_weight`	`float or array-like of shape (n_samples,)`	Optional non-negative weights per interval, passed through to CWSL.	`None`

Returns:

Type	Description
`dict[float, float]`	Mapping `{R: cwsl_value}` for each valid `R` in `R_list`.

Raises:

Type	Description
`ValueError`	If `R_list` contains no positive values, or if inputs are invalid or CWSL is undefined for the given data slice.

Notes

This is a pure evaluation utility; it does not attempt to infer the “correct” cost ratio. For that, see cost-ratio estimation utilities.
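A sweep of this kind can change which model looks better. The sketch below is a pure-Python illustration under simplifying assumptions (scalar `co`, unit weights, positive total demand); `cwsl_scalar` and `cwsl_sensitivity_sketch` are hypothetical names, not the library functions.

```python
def cwsl_scalar(y_true, y_pred, cu, co):
    """CWSL with scalar costs and unit weights (assumes positive total demand)."""
    cost = sum(cu * max(0.0, y - p) + co * max(0.0, p - y)
               for y, p in zip(y_true, y_pred))
    return cost / sum(y_true)


def cwsl_sensitivity_sketch(y_true, y_pred, r_list=(0.5, 1.0, 2.0, 3.0), co=1.0):
    """Map each strictly positive R to CWSL evaluated with cu = R * co."""
    valid = [r for r in r_list if r > 0]
    if not valid:
        raise ValueError("r_list must contain at least one positive value")
    return {r: cwsl_scalar(y_true, y_pred, cu=r * co, co=co) for r in valid}


y_true = [10, 10]
model_a = [8, 10]    # under-forecasts by 2 units
model_b = [13, 10]   # over-forecasts by 3 units

sweep_a = cwsl_sensitivity_sketch(y_true, model_a)
sweep_b = cwsl_sensitivity_sketch(y_true, model_b)
for r in sweep_a:
    print(r, sweep_a[r], sweep_b[r])
```

Model A has the lower loss at R = 0.5 and R = 1.0, while model B wins at R = 2.0 and R = 3.0, because only model A's shortfall cost scales with R.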

References

Electric Barometer Technical Note: Cost Sensitivity Utilities for CWSL.

`frs(y_true, y_pred, cu, co, sample_weight=None)`

Compute Forecast Readiness Score (FRS).

FRS is a simple composite score defined as:

\[ \mathrm{FRS} = \mathrm{NSL} - \mathrm{CWSL} \]

where:

- NSL measures the frequency of avoiding shortfall (higher is better)
- CWSL measures asymmetric, demand-normalized cost (lower is better)

This construction rewards forecasts that simultaneously:

- maintain high service reliability (high NSL), and
- avoid costly asymmetric error (low CWSL).

Parameters:

Name	Type	Description	Default
`y_true`	`array-like of shape (n_samples,)`	Realized demand. Must be non-negative.	required
`y_pred`	`array-like of shape (n_samples,)`	Forecast demand. Must be non-negative and have the same shape as `y_true`.	required
`cu`	`float or array-like of shape (n_samples,)`	Per-unit shortfall cost passed through to CWSL.	required
`co`	`float or array-like of shape (n_samples,)`	Per-unit overbuild cost passed through to CWSL.	required
`sample_weight`	`float or array-like of shape (n_samples,)`	Optional non-negative weights per interval, applied consistently to NSL and CWSL.	`None`

Returns:

Type	Description
`float`	Forecast Readiness Score. Higher indicates better readiness. Values are bounded above by 1 (NSL is at most 1 and CWSL is non-negative) and can be negative when cost-weighted error is large.

Raises:

Type	Description
`ValueError`	If inputs are invalid or CWSL is undefined for the given data slice.

Notes

This metric is intentionally simple and should be interpreted as a readiness-oriented summary rather than a standalone loss function.
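The composition \(\mathrm{FRS} = \mathrm{NSL} - \mathrm{CWSL}\) can be traced by hand with a short pure-Python sketch. The name `frs_sketch` is hypothetical, and the code assumes scalar costs, unit weights, and positive total demand.

```python
def frs_sketch(y_true, y_pred, cu, co):
    """FRS = NSL - CWSL with scalar costs and unit weights (hypothetical sketch)."""
    pairs = list(zip(y_true, y_pred))
    # NSL: fraction of intervals where the forecast covers demand.
    nsl = sum(1 for y, p in pairs if p >= y) / len(pairs)
    # CWSL: cost-weighted error divided by total demand.
    cost = sum(cu * max(0.0, y - p) + co * max(0.0, p - y) for y, p in pairs)
    cwsl = cost / sum(y_true)
    return nsl - cwsl


# NSL = 2/3 and CWSL = 7/30, so FRS = 13/30.
mixed = frs_sketch([10, 12, 8], [8, 15, 8], cu=2.0, co=1.0)

# A perfect forecast reaches the upper bound of 1.
perfect = frs_sketch([10, 12, 8], [10, 12, 8], cu=2.0, co=1.0)

# A forecast of zero against demand of 10 gives NSL = 0 and CWSL = 2.
poor = frs_sketch([10], [0], cu=2.0, co=1.0)

print(mixed, perfect, poor)
```

The three cases show the range of the score: about 0.43 for a mixed forecast, exactly 1 for a perfect one, and a negative value when shortfall cost exceeds demand.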

References

Electric Barometer Technical Note: Forecast Readiness Score (FRS).

`hr_at_tau(y_true, y_pred, tau, sample_weight=None)`

Compute Hit Rate within Tolerance (HR@τ).

HR@τ measures the (optionally weighted) fraction of intervals whose absolute error falls within a tolerance band \(\tau\).

Define absolute error and hit indicator:

\[ \begin{aligned} e_i &= |y_i - \hat{y}_i| \\ h_i &= \mathbb{1}[e_i \le \tau_i] \end{aligned} \]

Then:

\[ \mathrm{HR@\tau} = \frac{\sum_i w_i \; h_i}{\sum_i w_i} \]

Higher values are better, with \(\mathrm{HR@\tau} \in [0, 1]\).

Parameters:

Name	Type	Description	Default
`y_true`	`array-like of shape (n_samples,)`	Realized demand. Must be non-negative.	required
`y_pred`	`array-like of shape (n_samples,)`	Forecast demand. Must be non-negative and have the same shape as `y_true`.	required
`tau`	`float or array-like of shape (n_samples,)`	Non-negative absolute error tolerance. Either a scalar (same tolerance for all intervals) or a 1D array (per-interval tolerance).	required
`sample_weight`	`float or array-like of shape (n_samples,)`	Optional non-negative weights per interval. If provided, HR@τ is computed as a weighted fraction. If total weight is zero, HR@τ is undefined and a `ValueError` is raised.	`None`

Returns:

Type	Description
`float`	HR@τ value in [0, 1]. Higher indicates more intervals within tolerance.

Raises:

Type	Description
`ValueError`	If inputs are invalid (shape mismatch, negative values), if `tau` is negative anywhere, or if total sample weight is zero.

Notes

HR@τ is a symmetric tolerance measure; it treats underbuild and overbuild equally within the tolerance band.
Use HR@τ alongside asymmetric metrics (e.g., CWSL) when operational costs differ by direction.
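Scalar tolerance, per-interval tolerance, and weighting can be compared in a minimal pure-Python sketch. The name `hr_at_tau_sketch` is hypothetical, and non-negativity checks on demand and forecast are omitted for brevity.

```python
def hr_at_tau_sketch(y_true, y_pred, tau, sample_weight=None):
    """Weighted fraction of intervals with |y - y_hat| <= tau (hypothetical sketch)."""
    n = len(y_true)
    taus = [tau] * n if isinstance(tau, (int, float)) else list(tau)
    weights = [1.0] * n if sample_weight is None else list(sample_weight)
    if any(t < 0 for t in taus):
        raise ValueError("tau must be non-negative")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("HR@tau is undefined when total weight is zero")
    hit_weight = sum(w for y, p, t, w in zip(y_true, y_pred, taus, weights)
                     if abs(y - p) <= t)
    return hit_weight / total_weight


y_true = [10, 12, 8, 20]
y_pred = [11, 15, 8, 18]   # absolute errors: 1, 3, 0, 2

scalar_tau = hr_at_tau_sketch(y_true, y_pred, tau=2)                   # 3 of 4 hit
per_interval = hr_at_tau_sketch(y_true, y_pred, tau=[0, 3, 0, 1])      # 2 of 4 hit
weighted = hr_at_tau_sketch(y_true, y_pred, tau=2, sample_weight=[1, 3, 0, 0])
print(scalar_tau, per_interval, weighted)
```

In the weighted call, the only miss carries three quarters of the total weight, so the hit rate drops to 0.25 even though three of the four intervals are within tolerance.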

References

Electric Barometer Technical Note: HR@τ (Hit Rate within Tolerance).

`nsl(y_true, y_pred, sample_weight=None)`

Compute No-Shortfall Level (NSL).

NSL is the (optionally weighted) fraction of evaluation intervals in which the forecast does not underpredict realized demand.

For each interval \(i\), define a hit indicator:

\[ h_i = \mathbb{1}[\hat{y}_i \ge y_i] \]

Then:

\[ \mathrm{NSL} = \frac{\sum_i w_i \; h_i}{\sum_i w_i} \]

where \(w_i\) are optional sample weights (default \(w_i = 1\)). Higher values are better, with \(\mathrm{NSL} \in [0, 1]\).

Parameters:

Name	Type	Description	Default
`y_true`	`array-like of shape (n_samples,)`	Realized demand. Must be non-negative.	required
`y_pred`	`array-like of shape (n_samples,)`	Forecast demand. Must be non-negative and have the same shape as `y_true`.	required
`sample_weight`	`float or array-like of shape (n_samples,)`	Optional non-negative weights per interval. If provided, NSL is computed as a weighted fraction. If total weight is zero, NSL is undefined and a `ValueError` is raised.	`None`

Returns:

Type	Description
`float`	NSL value in [0, 1]. Higher indicates better shortfall avoidance.

Raises:

Type	Description
`ValueError`	If inputs are invalid (shape mismatch, negative values), or if total sample weight is zero.

Notes

NSL is a service reliability measure: it does not quantify how large a shortfall is—only whether a shortfall occurred.
UD complements NSL by measuring shortfall magnitude when shortfalls occur.
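A minimal pure-Python sketch of the NSL formula, showing the effect of sample weights. The name `nsl_sketch` is hypothetical, and checks for negative inputs are left out.

```python
def nsl_sketch(y_true, y_pred, sample_weight=None):
    """Weighted fraction of intervals with y_hat >= y (hypothetical sketch)."""
    weights = [1.0] * len(y_true) if sample_weight is None else list(sample_weight)
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("NSL is undefined when total weight is zero")
    covered = sum(w for y, p, w in zip(y_true, y_pred, weights) if p >= y)
    return covered / total_weight


y_true = [10, 12, 8, 20]
y_pred = [11, 10, 8, 25]   # only the second interval is a shortfall

unweighted = nsl_sketch(y_true, y_pred)                            # 3 of 4 covered
weighted = nsl_sketch(y_true, y_pred, sample_weight=[1, 2, 1, 0])  # 2 of 4 weight units
print(unweighted, weighted)
```

An exact match (the third interval) counts as covered because the indicator uses \(\hat{y}_i \ge y_i\).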

References

Electric Barometer Technical Note: No Shortfall Level (NSL).

`ud(y_true, y_pred, sample_weight=None)`

Compute Underbuild Depth (UD).

UD measures the (optionally weighted) average magnitude of shortfall. Unlike NSL, which only records whether a shortfall occurred, UD quantifies how large shortfalls are. Intervals without a shortfall contribute zero to the average.

Define per-interval shortfall:

\[ s_i = \max(0, y_i - \hat{y}_i) \]

Then:

\[ \mathrm{UD} = \frac{\sum_i w_i \; s_i}{\sum_i w_i} \]

Higher values indicate deeper average shortfall; lower is better.

Parameters:

Name	Type	Description	Default
`y_true`	`array-like of shape (n_samples,)`	Realized demand. Must be non-negative.	required
`y_pred`	`array-like of shape (n_samples,)`	Forecast demand. Must be non-negative and have the same shape as `y_true`.	required
`sample_weight`	`float or array-like of shape (n_samples,)`	Optional non-negative weights per interval. If provided, UD is computed as a weighted average. If total weight is zero, UD is undefined and a `ValueError` is raised.	`None`

Returns:

Type	Description
`float`	UD value (units match `y_true`/`y_pred`). Lower indicates better shortfall control.

Raises:

Type	Description
`ValueError`	If inputs are invalid (shape mismatch, negative values), or if total sample weight is zero.

Notes

UD ignores overbuild entirely; it is a pure shortfall severity measure.
UD is often interpreted alongside NSL: high NSL + low UD indicates strong service consistency.
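To see why UD is read alongside NSL, the pure-Python sketch below compares forecasts with the same shortfall frequency but different shortfall depth. The names `ud_sketch` and `nsl_unweighted` are hypothetical, and input validation is limited to the zero-weight case.

```python
def ud_sketch(y_true, y_pred, sample_weight=None):
    """Weighted mean of max(0, y - y_hat) over all intervals (hypothetical sketch)."""
    weights = [1.0] * len(y_true) if sample_weight is None else list(sample_weight)
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("UD is undefined when total weight is zero")
    depth = sum(w * max(0.0, y - p) for y, p, w in zip(y_true, y_pred, weights))
    return depth / total_weight


def nsl_unweighted(y_true, y_pred):
    """Unweighted fraction of intervals without a shortfall."""
    return sum(1 for y, p in zip(y_true, y_pred) if p >= y) / len(y_true)


y_true = [10, 10, 10, 10]
shallow = [9, 10, 10, 10]       # one shortfall of 1 unit
deep = [2, 10, 10, 10]          # one shortfall of 8 units
overbuilt = [9, 50, 10, 10]     # same shortfall as `shallow`, plus a large overbuild

print(nsl_unweighted(y_true, shallow), ud_sketch(y_true, shallow))   # 0.75 0.25
print(nsl_unweighted(y_true, deep), ud_sketch(y_true, deep))         # 0.75 2.0
print(ud_sketch(y_true, overbuilt))                                  # 0.25
```

The first two forecasts have identical NSL but an eightfold difference in UD, and the third shows that overbuild leaves UD unchanged.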

References

Electric Barometer Technical Note: Underbuild Depth (UD).
