Metrics API

This section documents the core metric implementations provided by eb-metrics. All content below is generated automatically from NumPy-style docstrings in the source code.

Metrics Package

eb_metrics.metrics

Public metric API for the Electric Barometer ecosystem.

The eb_metrics.metrics package provides a curated, stable import surface for Electric Barometer evaluation metrics.

This package groups metrics into four main categories:

  • Asymmetric loss metrics (loss): cost-aware losses that encode directional operational asymmetry.

  • Service and readiness diagnostics (service): metrics that quantify shortfall avoidance, tolerance coverage, and readiness.

  • Classical regression metrics (regression): standard symmetric error metrics used for baseline comparison.

  • Cost-ratio utilities (cost_ratio): helpers for selecting and analyzing the asymmetric cost ratio \(R = c_u / c_o\).

Conceptual definitions and interpretation of Electric Barometer metrics are documented in the companion research repository (eb-papers). This package is the executable reference implementation.

Notes

Users are encouraged to import from eb_metrics.metrics or from the relevant submodule (e.g., eb_metrics.metrics.service) rather than from internal helper modules.

Examples:

Import from the package surface:

>>> from eb_metrics.metrics import cwsl, nsl, frs

Or import from a submodule:

>>> from eb_metrics.metrics.loss import cwsl
>>> from eb_metrics.metrics.service import nsl

estimate_R_cost_balance(y_true, y_pred, R_grid=(0.5, 1.0, 2.0, 3.0), co=1.0, sample_weight=None)

Estimate a global cost ratio \(R = c_u / c_o\) via cost balance.

This routine selects a single, global cost ratio \(R\) by searching a candidate grid and choosing the value where the total weighted underbuild cost is closest to the total weighted overbuild cost.

For each candidate \(R\) in R_grid:

\[ \begin{aligned} c_{u,i} &= R \cdot c_{o,i} \\ s_i &= \max(0, y_i - \hat{y}_i) \\ e_i &= \max(0, \hat{y}_i - y_i) \\ C_u(R) &= \sum_i w_i \; c_{u,i} \; s_i \\ C_o(R) &= \sum_i w_i \; c_{o,i} \; e_i \end{aligned} \]

and the selected value is:

\[ R^* = \arg\min_R \; \left| C_u(R) - C_o(R) \right|. \]

The returned \(R^*\) can be used as:

  • a reasonable default global cost ratio for evaluation, and/or
  • the center of a sensitivity sweep (e.g., \(\{R^*/2,\; R^*,\; 2R^*\}\)).
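As a rough illustration of the grid search defined above, the following pure-Python sketch assumes a scalar co and unit sample weights. The name estimate_r_cost_balance_sketch is hypothetical and not part of eb-metrics; the library function additionally validates inputs and supports per-interval costs and weights.

```python
def estimate_r_cost_balance_sketch(y_true, y_pred, r_grid=(0.5, 1.0, 2.0, 3.0), co=1.0):
    """Pick the R in r_grid whose underbuild cost best balances overbuild cost.

    Simplifying assumptions (not the library implementation): scalar co,
    unit weights, and no validation beyond filtering non-positive candidates.
    """
    candidates = [r for r in r_grid if r > 0]
    if not candidates:
        raise ValueError("r_grid must contain at least one positive value")

    # Total shortfall and total overbuild across all intervals.
    shortfall = sum(max(0.0, y - f) for y, f in zip(y_true, y_pred))
    overbuild = sum(max(0.0, f - y) for y, f in zip(y_true, y_pred))

    if shortfall == 0 and overbuild == 0:
        # Degenerate perfect forecast: use the candidate closest to 1.0.
        return min(candidates, key=lambda r: abs(r - 1.0))

    # C_u(R) = R * co * shortfall and C_o = co * overbuild.
    # min() keeps the first minimizer, so ties resolve to the earliest candidate.
    return min(candidates, key=lambda r: abs(r * co * shortfall - co * overbuild))


y_true = [10, 12, 8, 15]
y_pred = [9, 14, 8, 12]
r_star = estimate_r_cost_balance_sketch(y_true, y_pred)
print(r_star)  # 0.5
```

Here total shortfall is 4 units and total overbuild is 2 units, so the two costs balance exactly at R = 0.5.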

Parameters:

Name Type Description Default
y_true array-like of shape (n_samples,)

Realized demand (non-negative).

required
y_pred array-like of shape (n_samples,)

Forecast demand (non-negative). Must have the same shape as y_true.

required
R_grid sequence of float

Candidate cost ratios \(R\) to search over. Only strictly positive values are considered.

(0.5, 1.0, 2.0, 3.0)
co float or array-like of shape (n_samples,)

Overbuild cost \(c_o\) per unit. Can be:

  • scalar: same overbuild cost for all intervals
  • 1D array: per-interval overbuild cost

For each \(R\), the implied underbuild cost is \(c_{u,i} = R \cdot c_{o,i}\).

1.0
sample_weight float or array-like of shape (n_samples,)

Optional non-negative weights per interval used to weight the cost aggregation. If None, all intervals receive weight 1.0.

None

Returns:

Type Description
float

The value in R_grid that minimizes \(\left| C_u(R) - C_o(R) \right|\).

If multiple values yield the same minimal gap, the first such value in the (filtered) grid is returned, except in the degenerate perfect forecast case (zero error everywhere), where the candidate closest to 1.0 is returned.

Raises:

Type Description
ValueError

If inputs are invalid (e.g., negative y_true or y_pred), if R_grid is empty, or if it contains no positive values.

Notes

This helper is intentionally simple: it does not infer cost structure from business inputs, nor does it estimate per-item costs. It provides a reproducible, data-driven way to select a reasonable global \(R\) given realized outcomes and forecast behavior.

References

Electric Barometer Technical Note: Cost Ratio Estimation (Choosing \(R\)).

cwsl(y_true, y_pred, cu, co, sample_weight=None)

Compute Cost-Weighted Service Loss (CWSL).

CWSL is a demand-normalized, directionally-aware loss that penalizes shortfalls and overbuilds using explicit per-unit costs.

For each interval \(i\):

\[ \begin{aligned} s_i &= \max(0, y_i - \hat{y}_i) \\ o_i &= \max(0, \hat{y}_i - y_i) \\ \text{cost}_i &= c_{u,i} \; s_i + c_{o,i} \; o_i \end{aligned} \]

and the aggregated metric is:

\[ \mathrm{CWSL} = \frac{\sum_i w_i \; \text{cost}_i}{\sum_i w_i \; y_i} \]

where \(w_i\) are optional sample weights (default \(w_i = 1\)). Lower values indicate better performance.
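The definition above can be sketched in a few lines of pure Python. This is an illustrative re-implementation under simplifying assumptions (scalar cu and co, no input validation), and cwsl_sketch is a hypothetical name rather than the library function.

```python
def cwsl_sketch(y_true, y_pred, cu, co, sample_weight=None):
    """Cost-weighted service loss for scalar cu/co (illustrative only)."""
    n = len(y_true)
    weights = [1.0] * n if sample_weight is None else list(sample_weight)

    total_cost = 0.0
    total_demand = 0.0
    for y, f, w in zip(y_true, y_pred, weights):
        shortfall = max(0.0, y - f)   # s_i: demand not covered by the forecast
        overbuild = max(0.0, f - y)   # o_i: forecast in excess of demand
        total_cost += w * (cu * shortfall + co * overbuild)
        total_demand += w * y

    if total_demand == 0:
        if total_cost == 0:
            return 0.0
        raise ValueError("zero total demand with positive cost; CWSL is undefined")
    return total_cost / total_demand


y_true = [10, 12, 8, 15]
y_pred = [9, 14, 8, 12]
value = cwsl_sketch(y_true, y_pred, cu=2.0, co=1.0)
print(round(value, 4))  # 0.2222
```

In this example the weighted cost is 2·1 + 1·2 + 2·3 = 10 against a total demand of 45.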

Parameters:

Name Type Description Default
y_true array-like of shape (n_samples,)

Realized demand \(y\). Must be non-negative.

required
y_pred array-like of shape (n_samples,)

Forecast demand \(\hat{y}\). Must be non-negative and have the same shape as y_true.

required
cu float or array-like of shape (n_samples,)

Per-unit shortfall cost \(c_u\). Can be a scalar (global cost) or a 1D array specifying per-interval costs. Must be non-negative.

required
co float or array-like of shape (n_samples,)

Per-unit overbuild cost \(c_o\). Can be a scalar (global cost) or a 1D array specifying per-interval costs. Must be non-negative.

required
sample_weight float or array-like of shape (n_samples,)

Optional non-negative weights per interval. If None, all intervals receive weight 1.0.

None

Returns:

Type Description
float

The CWSL value. Lower is better.

Raises:

Type Description
ValueError

If y_true and y_pred have different shapes, if any demand or forecast values are negative, if any costs are negative, or if the metric is undefined due to zero total (weighted) demand with positive total (weighted) cost.

Notes

  • When cu == co (up to a constant scaling), CWSL behaves similarly to a demand-normalized absolute error (wMAPE-like), but retains explicit cost semantics.
  • If total (weighted) demand is zero and total (weighted) cost is zero, this implementation returns 0.0.
  • If total (weighted) demand is zero but total (weighted) cost is positive, the metric is undefined under this formulation and a ValueError is raised.

References

Electric Barometer Technical Note: Cost-Weighted Service Loss (CWSL).

mae(y_true, y_pred)

Mean Absolute Error (MAE).

Parameters:

Name Type Description Default
y_true array-like of shape (n_samples,)

Ground-truth values.

required
y_pred array-like of shape (n_samples,)

Predicted values.

required

Returns:

Type Description
float

Mean absolute error. Lower is better.

mase(y_true, y_pred, y_naive)

Mean Absolute Scaled Error (MASE).

MASE scales the model MAE by the MAE of a naive forecast:

\[ \mathrm{MASE} = \frac{\mathrm{MAE}(y,\hat{y})}{\mathrm{MAE}(y, y^{\text{naive}})} \]

where y_naive (\(y^{\text{naive}}\)) is a baseline forecast aligned to y_true, typically the previous observation \(y_{t-1}\).
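A minimal sketch of this ratio, with the naive baseline built by shifting the series one step, might look as follows. The helper names mae_sketch and mase_sketch are hypothetical and the example data is made up for illustration.

```python
def mae_sketch(y_true, y_pred):
    """Mean absolute error over paired observations."""
    return sum(abs(y - f) for y, f in zip(y_true, y_pred)) / len(y_true)


def mase_sketch(y_true, y_pred, y_naive):
    """Model MAE divided by the MAE of the naive baseline."""
    naive_mae = mae_sketch(y_true, y_naive)
    if naive_mae == 0:
        raise ValueError("naive MAE is zero; MASE is undefined")
    return mae_sketch(y_true, y_pred) / naive_mae


series = [10, 12, 8, 15, 11]
y_true = series[1:]    # observations from the second period onward
y_naive = series[:-1]  # previous-period value used as the naive forecast
y_pred = [11, 9, 13, 12]

score = mase_sketch(y_true, y_pred, y_naive)
print(round(score, 4))  # 0.2941
```

A value below 1 means the model's MAE is smaller than that of the naive baseline on the same observations.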

Parameters:

Name Type Description Default
y_true array-like of shape (n_samples,)

Ground-truth values.

required
y_pred array-like of shape (n_samples,)

Predicted values.

required
y_naive array-like of shape (n_samples,)

Naive forecast values aligned to y_true.

required

Returns:

Type Description
float

MASE value. Lower is better.

Raises:

Type Description
ValueError

If the naive MAE is zero (MASE undefined).

Notes

MASE is scale-free and can be compared across series with different magnitudes, assuming the naive baseline is meaningful.

medae(y_true, y_pred)

Median Absolute Error (MedAE).

Parameters:

Name Type Description Default
y_true array-like of shape (n_samples,)

Ground-truth values.

required
y_pred array-like of shape (n_samples,)

Predicted values.

required

Returns:

Type Description
float

Median absolute error. Lower is better.

mape(y_true, y_pred)

Mean Absolute Percentage Error (MAPE).

MAPE is computed over samples where y_true != 0:

\[ \mathrm{MAPE} = 100 \cdot \mathrm{mean}\left(\left|\frac{y-\hat{y}}{y}\right|\right) \]
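The zero-filtering behavior can be illustrated with a short sketch. It assumes the mean is taken only over the samples that remain after dropping y_true == 0; mape_sketch is a hypothetical name, not the library function.

```python
def mape_sketch(y_true, y_pred):
    """MAPE in percent over samples with non-zero ground truth (illustrative)."""
    ratios = [abs((y - f) / y) for y, f in zip(y_true, y_pred) if y != 0]
    if not ratios:
        raise ValueError("all y_true values are zero; MAPE is undefined")
    return 100.0 * sum(ratios) / len(ratios)


# The second sample has y_true == 0 and is excluded from the mean.
result = mape_sketch([100, 0, 50], [90, 5, 60])
print(round(result, 6))  # 15.0
```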

Parameters:

Name Type Description Default
y_true array-like of shape (n_samples,)

Ground-truth values.

required
y_pred array-like of shape (n_samples,)

Predicted values.

required

Returns:

Type Description
float

Mean absolute percentage error in percent. Lower is better.

Raises:

Type Description
ValueError

If all values of y_true are zero (MAPE undefined).

Notes

MAPE can be unstable when y_true is near zero. Consider WMAPE or a domain-specific metric (e.g., CWSL) when percentage behavior is undesirable.

mse(y_true, y_pred)

Mean Squared Error (MSE).

Parameters:

Name Type Description Default
y_true array-like of shape (n_samples,)

Ground-truth values.

required
y_pred array-like of shape (n_samples,)

Predicted values.

required

Returns:

Type Description
float

Mean squared error. Lower is better.

msle(y_true, y_pred)

Mean Squared Log Error (MSLE).

MSLE is defined as:

\[ \mathrm{MSLE} = \mathrm{mean}\left((\log(1+y) - \log(1+\hat{y}))^2\right) \]
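The formula can be sketched with math.log1p, which evaluates log(1 + x) accurately for small x. The names msle_sketch and rmsle_sketch are hypothetical; the second function assumes the usual definition of RMSLE as the square root of MSLE.

```python
import math


def msle_sketch(y_true, y_pred):
    """Mean squared difference of log(1 + y) and log(1 + y_hat)."""
    if any(v < 0 for v in list(y_true) + list(y_pred)):
        raise ValueError("MSLE requires non-negative inputs")
    squared = [(math.log1p(y) - math.log1p(f)) ** 2 for y, f in zip(y_true, y_pred)]
    return sum(squared) / len(squared)


def rmsle_sketch(y_true, y_pred):
    """Square root of MSLE (assumed standard definition)."""
    return math.sqrt(msle_sketch(y_true, y_pred))


# log(1 + 3) - log(1 + 1) = log(2), so MSLE = log(2)**2 / 2 for these two samples.
msle_value = msle_sketch([0.0, 3.0], [0.0, 1.0])
rmsle_value = rmsle_sketch([0.0, 3.0], [0.0, 1.0])
print(round(msle_value, 4))  # 0.2402
```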

Parameters:

Name Type Description Default
y_true array-like of shape (n_samples,)

Ground-truth values. Must be non-negative.

required
y_pred array-like of shape (n_samples,)

Predicted values. Must be non-negative.

required

Returns:

Type Description
float

Mean squared log error. Lower is better.

Raises:

Type Description
ValueError

If any value in y_true or y_pred is negative.

Notes

MSLE down-weights large absolute errors at high magnitudes and is commonly used when relative error is more meaningful than absolute error.

rmse(y_true, y_pred)

Root Mean Squared Error (RMSE).

Parameters:

Name Type Description Default
y_true array-like of shape (n_samples,)

Ground-truth values.

required
y_pred array-like of shape (n_samples,)

Predicted values.

required

Returns:

Type Description
float

Root mean squared error. Lower is better.

rmsle(y_true, y_pred)

Root Mean Squared Log Error (RMSLE).

Parameters:

Name Type Description Default
y_true array-like of shape (n_samples,)

Ground-truth values. Must be non-negative.

required
y_pred array-like of shape (n_samples,)

Predicted values. Must be non-negative.

required

Returns:

Type Description
float

Root mean squared log error. Lower is better.

smape(y_true, y_pred)

Symmetric Mean Absolute Percentage Error (sMAPE).

This implementation follows a common competition definition:

\[ \mathrm{sMAPE} = 200 \cdot \mathrm{mean}\left(\frac{|y-\hat{y}|}{|y| + |\hat{y}|}\right) \]

Parameters:

Name Type Description Default
y_true array-like of shape (n_samples,)

Ground-truth values.

required
y_pred array-like of shape (n_samples,)

Predicted values.

required

Returns:

Type Description
float

sMAPE in percent. Lower is better.

Notes

If |y| + |y_pred| == 0 for all samples, this function returns 0.0.
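The formula and the all-zero case can be sketched as below. How the library treats an individual sample whose denominator is zero is not stated here, so the sketch assumes such samples are skipped; smape_sketch is a hypothetical name.

```python
def smape_sketch(y_true, y_pred):
    """sMAPE in percent; zero-denominator samples are skipped (an assumption)."""
    terms = [
        abs(y - f) / (abs(y) + abs(f))
        for y, f in zip(y_true, y_pred)
        if abs(y) + abs(f) != 0
    ]
    if not terms:
        # Every sample has |y| + |y_hat| == 0, so there is no error to report.
        return 0.0
    return 200.0 * sum(terms) / len(terms)


result = smape_sketch([100, 50], [110, 40])
print(round(result, 2))  # 15.87
```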

wmape(y_true, y_pred)

Weighted Mean Absolute Percentage Error (WMAPE).

WMAPE is also commonly described as demand-normalized absolute error:

\[ \mathrm{WMAPE} = 100 \cdot \frac{\sum_i |y_i-\hat{y}_i|}{\sum_i |y_i|} \]
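A small sketch of the ratio, using hypothetical names and made-up data, shows that zero-demand samples still contribute their absolute error to the numerator:

```python
def wmape_sketch(y_true, y_pred):
    """Total absolute error divided by total absolute demand, in percent."""
    total_abs_demand = sum(abs(y) for y in y_true)
    if total_abs_demand == 0:
        raise ValueError("sum(|y_true|) is zero; WMAPE is undefined")
    total_abs_error = sum(abs(y - f) for y, f in zip(y_true, y_pred))
    return 100.0 * total_abs_error / total_abs_demand


# Absolute errors are 10, 5, 10 against a total demand of 150.
result = wmape_sketch([100, 0, 50], [90, 5, 60])
print(round(result, 2))  # 16.67
```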

Parameters:

Name Type Description Default
y_true array-like of shape (n_samples,)

Ground-truth values.

required
y_pred array-like of shape (n_samples,)

Predicted values.

required

Returns:

Type Description
float

WMAPE in percent. Lower is better.

Raises:

Type Description
ValueError

If sum(|y_true|) == 0 (WMAPE undefined).

Notes

WMAPE is symmetric: it does not distinguish underprediction from overprediction. Use CWSL when directional cost asymmetry matters.

cwsl_sensitivity(y_true, y_pred, R_list=(0.5, 1.0, 2.0, 3.0), co=1.0, sample_weight=None)

Evaluate CWSL across a grid of cost ratios (cost sensitivity analysis).

This helper computes Cost-Weighted Service Loss (CWSL) for each candidate cost ratio:

\[ R = c_u / c_o \]

holding co fixed and setting:

\[ c_u = R \cdot c_o \]

for each value in R_list.

This provides a simple way to assess how model ranking or absolute loss changes under alternative assumptions about shortfall vs. overbuild cost.
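The sweep can be sketched as a dictionary comprehension over the valid ratios. The helper below re-implements CWSL inline for scalar costs and unit weights so the example is self-contained; both function names are hypothetical.

```python
def _cwsl_unit_weight(y_true, y_pred, cu, co):
    """CWSL with scalar costs and unit weights (illustrative helper)."""
    cost = sum(cu * max(0.0, y - f) + co * max(0.0, f - y) for y, f in zip(y_true, y_pred))
    demand = sum(y_true)
    if demand == 0:
        if cost == 0:
            return 0.0
        raise ValueError("zero total demand with positive cost; CWSL is undefined")
    return cost / demand


def cwsl_sensitivity_sketch(y_true, y_pred, r_list=(0.5, 1.0, 2.0, 3.0), co=1.0):
    """Map each strictly positive R to CWSL evaluated with cu = R * co."""
    valid = [r for r in r_list if r > 0]
    if not valid:
        raise ValueError("r_list must contain at least one positive value")
    return {r: _cwsl_unit_weight(y_true, y_pred, cu=r * co, co=co) for r in valid}


y_true = [10, 12, 8, 15]
y_pred = [9, 14, 8, 12]
curve = cwsl_sensitivity_sketch(y_true, y_pred)
for r, value in curve.items():
    print(r, round(value, 4))
```

Because this forecast under-predicts more than it over-predicts (4 units of shortfall against 2 of overbuild), the loss rises as R grows.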

Parameters:

Name Type Description Default
y_true array-like of shape (n_samples,)

Realized demand. Must be non-negative.

required
y_pred array-like of shape (n_samples,)

Forecast demand. Must be non-negative and have the same shape as y_true.

required
R_list sequence of float

Candidate cost ratios to evaluate. Only strictly positive values are used.

(0.5, 1.0, 2.0, 3.0)
co float or array-like of shape (n_samples,)

Overbuild cost \(c_o\). Can be scalar or per-interval.

1.0
sample_weight float or array-like of shape (n_samples,)

Optional non-negative weights per interval, passed through to CWSL.

None

Returns:

Type Description
dict[float, float]

Mapping {R: cwsl_value} for each valid R in R_list.

Raises:

Type Description
ValueError

If R_list contains no positive values, or if inputs are invalid or CWSL is undefined for the given data slice.

Notes

  • This is a pure evaluation utility; it does not attempt to infer the “correct” cost ratio. For that, see cost-ratio estimation utilities.

References

Electric Barometer Technical Note: Cost Sensitivity Utilities for CWSL.

frs(y_true, y_pred, cu, co, sample_weight=None)

Compute Forecast Readiness Score (FRS).

FRS is a simple composite score defined as:

\[ \mathrm{FRS} = \mathrm{NSL} - \mathrm{CWSL} \]

where:

  • NSL measures the frequency of avoiding shortfall (higher is better)
  • CWSL measures asymmetric, demand-normalized cost (lower is better)

This construction rewards forecasts that simultaneously:

  • maintain high service reliability (high NSL), and
  • avoid costly asymmetric error (low CWSL).
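To make the composition concrete, the sketch below computes both components for a small made-up example, assuming scalar costs and unit weights. All function names are hypothetical stand-ins, not the library's own functions.

```python
def nsl_unit_weight(y_true, y_pred):
    """Fraction of intervals where the forecast covers realized demand."""
    hits = sum(1 for y, f in zip(y_true, y_pred) if f >= y)
    return hits / len(y_true)


def cwsl_unit_weight(y_true, y_pred, cu, co):
    """Demand-normalized asymmetric cost; assumes positive total demand."""
    cost = sum(cu * max(0.0, y - f) + co * max(0.0, f - y) for y, f in zip(y_true, y_pred))
    return cost / sum(y_true)


def frs_sketch(y_true, y_pred, cu, co):
    """FRS = NSL - CWSL, as defined in this entry."""
    return nsl_unit_weight(y_true, y_pred) - cwsl_unit_weight(y_true, y_pred, cu, co)


y_true = [10, 12, 8, 15]
y_pred = [9, 14, 8, 12]
score = frs_sketch(y_true, y_pred, cu=2.0, co=1.0)
print(round(score, 4))  # 0.2778
```

Two of the four intervals avoid shortfall (NSL = 0.5) and the cost term is 10/45, giving 0.5 - 0.2222.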

Parameters:

Name Type Description Default
y_true array-like of shape (n_samples,)

Realized demand. Must be non-negative.

required
y_pred array-like of shape (n_samples,)

Forecast demand. Must be non-negative and have the same shape as y_true.

required
cu float or array-like of shape (n_samples,)

Per-unit shortfall cost passed through to CWSL.

required
co float or array-like of shape (n_samples,)

Per-unit overbuild cost passed through to CWSL.

required
sample_weight float or array-like of shape (n_samples,)

Optional non-negative weights per interval, applied consistently to NSL and CWSL.

None

Returns:

Type Description
float

Forecast Readiness Score. Higher indicates better readiness. Because NSL is at most 1 and CWSL is non-negative, FRS is bounded above by 1; it can be negative when the cost-weighted error is large relative to demand.

Raises:

Type Description
ValueError

If inputs are invalid or CWSL is undefined for the given data slice.

Notes

This metric is intentionally simple and should be interpreted as a readiness-oriented summary rather than a standalone loss function.

References

Electric Barometer Technical Note: Forecast Readiness Score (FRS).

hr_at_tau(y_true, y_pred, tau, sample_weight=None)

Compute Hit Rate within Tolerance (HR@τ).

HR@τ measures the (optionally weighted) fraction of intervals whose absolute error falls within a tolerance band \(\tau\).

Define absolute error and hit indicator:

\[ \begin{aligned} e_i &= |y_i - \hat{y}_i| \\ h_i &= \mathbb{1}[e_i \le \tau_i] \end{aligned} \]

Then:

\[ \mathrm{HR@\tau} = \frac{\sum_i w_i \; h_i}{\sum_i w_i} \]

Higher values are better, with \(\mathrm{HR@\tau} \in [0, 1]\).
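The hit indicator and weighted fraction can be sketched in pure Python as follows. The name hr_at_tau_sketch is hypothetical, and the sketch omits the non-negativity and shape checks the library performs.

```python
def hr_at_tau_sketch(y_true, y_pred, tau, sample_weight=None):
    """Weighted share of intervals with absolute error <= tau (illustrative)."""
    n = len(y_true)
    # Broadcast a scalar tolerance to one value per interval.
    taus = [tau] * n if isinstance(tau, (int, float)) else list(tau)
    weights = [1.0] * n if sample_weight is None else list(sample_weight)

    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("total sample weight is zero; HR@tau is undefined")

    hit_weight = sum(
        w for y, f, t, w in zip(y_true, y_pred, taus, weights) if abs(y - f) <= t
    )
    return hit_weight / total_weight


y_true = [10, 12, 8, 15]
y_pred = [9, 14, 8, 12]
print(hr_at_tau_sketch(y_true, y_pred, tau=2))  # 0.75
print(hr_at_tau_sketch(y_true, y_pred, tau=2, sample_weight=[1, 1, 1, 3]))  # 0.5
```

Absolute errors here are 1, 2, 0 and 3, so only the last interval misses a tolerance of 2; up-weighting that interval lowers the score.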

Parameters:

Name Type Description Default
y_true array-like of shape (n_samples,)

Realized demand. Must be non-negative.

required
y_pred array-like of shape (n_samples,)

Forecast demand. Must be non-negative and have the same shape as y_true.

required
tau float or array-like of shape (n_samples,)

Non-negative absolute error tolerance. Can be:

  • scalar: same tolerance for all intervals
  • 1D array: per-interval tolerance

required
sample_weight float or array-like of shape (n_samples,)

Optional non-negative weights per interval. If provided, HR@τ is computed as a weighted fraction. If total weight is zero, HR@τ is undefined and a ValueError is raised.

None

Returns:

Type Description
float

HR@τ value in [0, 1]. Higher indicates more intervals within tolerance.

Raises:

Type Description
ValueError

If inputs are invalid (shape mismatch, negative values), if tau is negative anywhere, or if total sample weight is zero.

Notes

  • HR@τ is a symmetric tolerance measure; it treats underbuild and overbuild equally within the tolerance band.
  • Use HR@τ alongside asymmetric metrics (e.g., CWSL) when operational costs differ by direction.

References

Electric Barometer Technical Note: HR@τ (Hit Rate within Tolerance).

nsl(y_true, y_pred, sample_weight=None)

Compute No-Shortfall Level (NSL).

NSL is the (optionally weighted) fraction of evaluation intervals in which the forecast does not underpredict realized demand.

For each interval \(i\), define a hit indicator:

\[ h_i = \mathbb{1}[\hat{y}_i \ge y_i] \]

Then:

\[ \mathrm{NSL} = \frac{\sum_i w_i \; h_i}{\sum_i w_i} \]

where \(w_i\) are optional sample weights (default \(w_i = 1\)). Higher values are better, with \(\mathrm{NSL} \in [0, 1]\).
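A minimal sketch of this weighted fraction, using a hypothetical function name and made-up data, could be:

```python
def nsl_sketch(y_true, y_pred, sample_weight=None):
    """Weighted fraction of intervals with forecast >= realized demand."""
    weights = [1.0] * len(y_true) if sample_weight is None else list(sample_weight)
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("total sample weight is zero; NSL is undefined")
    # An interval counts as a hit when the forecast meets or exceeds demand.
    hit_weight = sum(w for y, f, w in zip(y_true, y_pred, weights) if f >= y)
    return hit_weight / total_weight


y_true = [10, 12, 8, 15]
y_pred = [9, 14, 8, 12]
print(nsl_sketch(y_true, y_pred))  # 0.5
```

An exact match (8 against 8) counts as a hit because the indicator uses a non-strict inequality.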

Parameters:

Name Type Description Default
y_true array-like of shape (n_samples,)

Realized demand. Must be non-negative.

required
y_pred array-like of shape (n_samples,)

Forecast demand. Must be non-negative and have the same shape as y_true.

required
sample_weight float or array-like of shape (n_samples,)

Optional non-negative weights per interval. If provided, NSL is computed as a weighted fraction. If total weight is zero, NSL is undefined and a ValueError is raised.

None

Returns:

Type Description
float

NSL value in [0, 1]. Higher indicates better shortfall avoidance.

Raises:

Type Description
ValueError

If inputs are invalid (shape mismatch, negative values), or if total sample weight is zero.

Notes

  • NSL is a service reliability measure: it does not quantify how large a shortfall is, only whether a shortfall occurred.
  • UD complements NSL by measuring shortfall magnitude when shortfalls occur.

References

Electric Barometer Technical Note: No Shortfall Level (NSL).

ud(y_true, y_pred, sample_weight=None)

Compute Underbuild Depth (UD).

UD measures the (optionally weighted) average magnitude of shortfall. Unlike NSL (which counts shortfalls), UD quantifies how severe shortfalls are when they occur.

Define per-interval shortfall:

\[ s_i = \max(0, y_i - \hat{y}_i) \]

Then:

\[ \mathrm{UD} = \frac{\sum_i w_i \; s_i}{\sum_i w_i} \]

Higher values indicate deeper average shortfall; lower is better.
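The following sketch, with a hypothetical function name, illustrates the formula. Note that the average runs over all intervals, so intervals without shortfall contribute zero rather than being excluded.

```python
def ud_sketch(y_true, y_pred, sample_weight=None):
    """Weighted mean shortfall max(0, y - y_hat) across all intervals."""
    weights = [1.0] * len(y_true) if sample_weight is None else list(sample_weight)
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("total sample weight is zero; UD is undefined")
    weighted_shortfall = sum(
        w * max(0.0, y - f) for y, f, w in zip(y_true, y_pred, weights)
    )
    return weighted_shortfall / total_weight


y_true = [10, 12, 8, 15]
y_pred = [9, 14, 8, 12]
print(ud_sketch(y_true, y_pred))  # 1.0
```

Shortfalls are 1, 0, 0 and 3 units, so the unweighted mean is 4 / 4 = 1.0 in the units of y_true.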

Parameters:

Name Type Description Default
y_true array-like of shape (n_samples,)

Realized demand. Must be non-negative.

required
y_pred array-like of shape (n_samples,)

Forecast demand. Must be non-negative and have the same shape as y_true.

required
sample_weight float or array-like of shape (n_samples,)

Optional non-negative weights per interval. If provided, UD is computed as a weighted average. If total weight is zero, UD is undefined and a ValueError is raised.

None

Returns:

Type Description
float

UD value (units match y_true/y_pred). Lower indicates better shortfall control.

Raises:

Type Description
ValueError

If inputs are invalid (shape mismatch, negative values), or if total sample weight is zero.

Notes

  • UD ignores overbuild entirely; it is a pure shortfall severity measure.
  • UD is often interpreted alongside NSL: high NSL + low UD indicates strong service consistency.

References

Electric Barometer Technical Note: Underbuild Depth (UD).