Metrics API
This section documents the core metric implementations provided by eb-metrics. All content below is generated automatically from NumPy-style docstrings in the source code.
Metrics Package
eb_metrics.metrics
Public metric API for the Electric Barometer ecosystem.
The eb_metrics.metrics package provides a curated, stable import surface for
Electric Barometer evaluation metrics.
This package groups metrics into four main categories:
- Asymmetric loss metrics (loss): Cost-aware losses that encode directional operational asymmetry.
- Service and readiness diagnostics (service): Metrics that quantify shortfall avoidance, tolerance coverage, and readiness.
- Classical regression metrics (regression): Standard symmetric error metrics used for baseline comparison.
- Cost-ratio utilities (cost_ratio): Helpers for selecting and analyzing the asymmetric cost ratio \(R = c_u / c_o\).
Conceptual definitions and interpretation of Electric Barometer metrics are
documented in the companion research repository (eb-papers). This package is
the executable reference implementation.
Notes
Users are encouraged to import from eb_metrics.metrics or from the relevant
submodule (e.g., eb_metrics.metrics.service) rather than internal helpers.
Examples:
Import from the package surface:
>>> from eb_metrics.metrics import cwsl, nsl, frs
Or import from a submodule:
>>> from eb_metrics.metrics.loss import cwsl
>>> from eb_metrics.metrics.service import nsl
estimate_R_cost_balance(y_true, y_pred, R_grid=(0.5, 1.0, 2.0, 3.0), co=1.0, sample_weight=None)
Estimate a global cost ratio \(R = c_u / c_o\) via cost balance.
This routine selects a single, global cost ratio \(R\) by searching a candidate grid and choosing the value where the total weighted underbuild cost is closest to the total weighted overbuild cost.
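For each candidate \(R\) in R_grid:
\[
C_u(R) = \sum_i w_i \, R \, c_{o,i} \, \max(0,\, y_i - \hat{y}_i), \qquad
C_o = \sum_i w_i \, c_{o,i} \, \max(0,\, \hat{y}_i - y_i)
\]
and the selected value is:
\[
R^* = \arg\min_{R \in \text{R\_grid}} \left| C_u(R) - C_o \right|
\]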
The returned \(R^*\) can be used as:
- a reasonable default global cost ratio for evaluation, and/or
- the center of a sensitivity sweep (e.g., {R*/2, R*, 2*R*}).
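The grid search described above can be sketched in a few lines of pure Python. This is an illustration only, not the eb-metrics implementation: the name estimate_r_sketch is hypothetical, the sketch assumes a scalar co and unit weights, and it does not reproduce the special handling of the perfect-forecast case.

```python
def estimate_r_sketch(y_true, y_pred, r_grid=(0.5, 1.0, 2.0, 3.0), co=1.0):
    """Return the R in r_grid that best balances underbuild and overbuild cost."""
    # Total shortfall and overbuild units across all intervals.
    shortfall = sum(max(0.0, y - p) for y, p in zip(y_true, y_pred))
    overbuild = sum(max(0.0, p - y) for y, p in zip(y_true, y_pred))

    # Only strictly positive candidates are considered.
    candidates = [r for r in r_grid if r > 0]
    if not candidates:
        raise ValueError("r_grid must contain at least one positive value")

    # Gap between underbuild cost (R * co * shortfall) and overbuild cost
    # (co * overbuild). min() keeps the first minimizer, so ties resolve to
    # the earliest grid value.
    return min(candidates, key=lambda r: abs(r * co * shortfall - co * overbuild))


y_true = [10, 12, 8, 15]
y_pred = [14, 9, 10, 15]  # 3 units of shortfall, 6 units of overbuild

r_star = estimate_r_sketch(y_true, y_pred)
sweep = [r_star / 2, r_star, 2 * r_star]  # sensitivity sweep centered on R*
print(r_star, sweep)  # 2.0 [1.0, 2.0, 4.0]
```

With twice as many overbuilt units as short units, the costs balance when a unit of shortfall is priced at twice a unit of overbuild, so the sketch selects 2.0 from the default grid.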
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| y_true | array-like of shape (n_samples,) | Realized demand (non-negative). | required |
| y_pred | array-like of shape (n_samples,) | Forecast demand (non-negative). Must have the same shape as y_true. | required |
| R_grid | sequence of float | Candidate cost ratios \(R\) to search over. Only strictly positive values are considered. | (0.5, 1.0, 2.0, 3.0) |
| co | float or array-like of shape (n_samples,) | Overbuild cost \(c_o\) per unit. Can be a scalar (one cost for all intervals) or a 1D array of per-interval costs. For each \(R\), the implied underbuild cost is \(c_{u,i} = R \cdot c_{o,i}\). | 1.0 |
| sample_weight | float or array-like of shape (n_samples,) | Optional non-negative weights per interval used to weight the cost aggregation. If None, all intervals are weighted equally. | None |
Returns:
| Type | Description |
|---|---|
| float | The value in R_grid that minimizes the gap between total underbuild cost and total overbuild cost. If multiple values yield the same minimal gap, the first such value in the (filtered) grid is returned, except in the degenerate perfect forecast case (zero error everywhere), where the candidate closest to … is returned. |
Raises:
| Type | Description |
|---|---|
| ValueError | If inputs are invalid (e.g., negative …). |
Notes
This helper is intentionally simple: it does not infer cost structure from business inputs, nor does it estimate per-item costs. It provides a reproducible, data-driven way to select a reasonable global \(R\) given realized outcomes and forecast behavior.
References
Electric Barometer Technical Note: Cost Ratio Estimation (Choosing \(R\)).
cwsl(y_true, y_pred, cu, co, sample_weight=None)
Compute Cost-Weighted Service Loss (CWSL).
CWSL is a demand-normalized, directionally-aware loss that penalizes shortfalls and overbuilds using explicit per-unit costs.
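For each interval \(i\):
\[
s_i = \max(0,\, y_i - \hat{y}_i), \qquad
o_i = \max(0,\, \hat{y}_i - y_i), \qquad
\mathrm{cost}_i = c_{u,i} \, s_i + c_{o,i} \, o_i
\]
and the aggregated metric is:
\[
\mathrm{CWSL} = \frac{\sum_i w_i \, \mathrm{cost}_i}{\sum_i w_i \, y_i}
\]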
where \(w_i\) are optional sample weights (default \(w_i = 1\)). Lower values indicate better performance.
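As a worked illustration, the sketch below computes a demand-normalized asymmetric cost in pure Python: per-interval shortfall and overbuild are priced at cu and co, summed with weights, and divided by weighted demand. The name cwsl_sketch is hypothetical, only scalar costs are handled, and the zero-demand branches follow the behavior described in this function's Notes. It is not the library code.

```python
def cwsl_sketch(y_true, y_pred, cu, co, sample_weight=None):
    """Demand-normalized asymmetric cost with scalar cu and co."""
    weights = sample_weight if sample_weight is not None else [1.0] * len(y_true)
    cost = 0.0
    demand = 0.0
    for y, p, w in zip(y_true, y_pred, weights):
        shortfall = max(0.0, y - p)  # units of unmet demand
        overbuild = max(0.0, p - y)  # units of excess forecast
        cost += w * (cu * shortfall + co * overbuild)
        demand += w * y
    if demand == 0.0:
        if cost == 0.0:
            return 0.0  # nothing demanded, nothing lost
        raise ValueError("CWSL is undefined: zero demand with positive cost")
    return cost / demand


# One interval is 2 units short, one is 2 units over, one is exact.
value = cwsl_sketch([10, 12, 8], [8, 12, 10], cu=2.0, co=1.0)
print(value)  # 0.2
```

The shortfall costs 4 and the overbuild costs 2, for a total of 6 against 30 units of demand. With cu equal to co, the same data would give 4 / 30, which is the demand-normalized absolute error.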
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| y_true | array-like of shape (n_samples,) | Realized demand \(y\). Must be non-negative. | required |
| y_pred | array-like of shape (n_samples,) | Forecast demand \(\hat{y}\). Must be non-negative and have the same shape as y_true. | required |
| cu | float or array-like of shape (n_samples,) | Per-unit shortfall cost \(c_u\). Can be a scalar (global cost) or a 1D array specifying per-interval costs. Must be non-negative. | required |
| co | float or array-like of shape (n_samples,) | Per-unit overbuild cost \(c_o\). Can be a scalar (global cost) or a 1D array specifying per-interval costs. Must be non-negative. | required |
| sample_weight | float or array-like of shape (n_samples,) | Optional non-negative weights per interval. If None, all intervals are weighted equally. | None |
Returns:
| Type | Description |
|---|---|
| float | The CWSL value. Lower is better. |
Raises:
| Type | Description |
|---|---|
| ValueError | If inputs are invalid, or if total (weighted) demand is zero while total (weighted) cost is positive. |
Notes
- When cu == co (up to a constant scaling), CWSL behaves similarly to a demand-normalized absolute error (wMAPE-like), but retains explicit cost semantics.
- If total (weighted) demand is zero and total (weighted) cost is zero, this implementation returns 0.0.
- If total (weighted) demand is zero but total (weighted) cost is positive, the metric is undefined under this formulation and a ValueError is raised.
References
Electric Barometer Technical Note: Cost-Weighted Service Loss (CWSL).
mae(y_true, y_pred)
Mean Absolute Error (MAE).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| y_true | array-like of shape (n_samples,) | Ground-truth values. | required |
| y_pred | array-like of shape (n_samples,) | Predicted values. | required |
Returns:
| Type | Description |
|---|---|
| float | Mean absolute error. Lower is better. |
mase(y_true, y_pred, y_naive)
Mean Absolute Scaled Error (MASE).
MASE scales the model MAE by the MAE of a naive forecast:
\[
\mathrm{MASE} = \frac{\mathrm{MAE}(y, \hat{y})}{\mathrm{MAE}(y, y_{\text{naive}})}
\]
where y_naive is typically a naive baseline such as \(y_{t-1}\).
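The ratio can be illustrated with a lag-1 baseline built by hand. The sketch below is pure Python with hypothetical names (mae_sketch, mase_sketch); it assumes the caller has already aligned y_naive with y_true, as the parameter description requires.

```python
def mae_sketch(y_true, y_pred):
    """Mean absolute error over paired values."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


def mase_sketch(y_true, y_pred, y_naive):
    """Model MAE divided by the MAE of a naive baseline."""
    naive_mae = mae_sketch(y_true, y_naive)
    if naive_mae == 0:
        raise ValueError("naive MAE is zero; MASE is undefined")
    return mae_sketch(y_true, y_pred) / naive_mae


series = [10, 12, 11, 15, 14]
y_true = series[1:]    # 12, 11, 15, 14
y_naive = series[:-1]  # 10, 12, 11, 15 (previous observation as forecast)
y_pred = [11, 12, 14, 14]

print(mase_sketch(y_true, y_pred, y_naive))  # 0.375
```

A value below 1 means the model's average absolute error is smaller than the naive baseline's. Here it is 0.75 against 2.0.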
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| y_true | array-like of shape (n_samples,) | Ground-truth values. | required |
| y_pred | array-like of shape (n_samples,) | Predicted values. | required |
| y_naive | array-like of shape (n_samples,) | Naive forecast values aligned to y_true. | required |
Returns:
| Type | Description |
|---|---|
| float | MASE value. Lower is better. |
Raises:
| Type | Description |
|---|---|
| ValueError | If the naive MAE is zero (MASE undefined). |
Notes
MASE is scale-free and can be compared across series with different magnitudes, assuming the naive baseline is meaningful.
medae(y_true, y_pred)
Median Absolute Error (MedAE).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| y_true | array-like of shape (n_samples,) | Ground-truth values. | required |
| y_pred | array-like of shape (n_samples,) | Predicted values. | required |
Returns:
| Type | Description |
|---|---|
| float | Median absolute error. Lower is better. |
mape(y_true, y_pred)
Mean Absolute Percentage Error (MAPE).
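MAPE is computed over samples where y_true != 0:
\[
\mathrm{MAPE} = \frac{100}{|\mathcal{I}|} \sum_{i \in \mathcal{I}} \frac{|y_i - \hat{y}_i|}{|y_i|},
\qquad \mathcal{I} = \{\, i : y_i \neq 0 \,\}
\]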
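A minimal pure-Python sketch of the zero-filtering behavior follows. The name mape_sketch is hypothetical, and the sketch assumes that samples with y_true equal to zero are simply dropped before averaging; it is an illustration rather than the library code.

```python
def mape_sketch(y_true, y_pred):
    """Mean absolute percentage error over samples with non-zero truth."""
    # Keep only pairs where the ground truth is non-zero.
    pairs = [(y, p) for y, p in zip(y_true, y_pred) if y != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when all y_true values are zero")
    return 100.0 * sum(abs(y - p) / abs(y) for y, p in pairs) / len(pairs)


# The middle sample has zero demand and is excluded from the average.
result = mape_sketch([100, 0, 50], [90, 5, 60])
print(round(result, 6))  # 15.0
```

The two retained samples have percentage errors of 10% and 20%, so the result is about 15%. The error on the zero-demand sample does not influence the score at all, which is one reason a demand-normalized metric may be preferable.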
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| y_true | array-like of shape (n_samples,) | Ground-truth values. | required |
| y_pred | array-like of shape (n_samples,) | Predicted values. | required |
Returns:
| Type | Description |
|---|---|
| float | Mean absolute percentage error in percent. Lower is better. |
Raises:
| Type | Description |
|---|---|
| ValueError | If all values of y_true are zero. |
Notes
MAPE can be unstable when y_true is near zero. Consider WMAPE or a
domain-specific metric (e.g., CWSL) when percentage behavior is undesirable.
mse(y_true, y_pred)
Mean Squared Error (MSE).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| y_true | array-like of shape (n_samples,) | Ground-truth values. | required |
| y_pred | array-like of shape (n_samples,) | Predicted values. | required |
Returns:
| Type | Description |
|---|---|
| float | Mean squared error. Lower is better. |
msle(y_true, y_pred)
Mean Squared Log Error (MSLE).
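MSLE is defined as:
\[
\mathrm{MSLE} = \frac{1}{n} \sum_{i=1}^{n} \left( \log(1 + y_i) - \log(1 + \hat{y}_i) \right)^2
\]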
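The sketch below assumes the conventional log(1 + x) form of MSLE and uses math.log1p from the standard library. The name msle_sketch is hypothetical. It also shows why the metric is described as relative: a forecast that is off by a factor of two scores similarly at small and large magnitudes.

```python
import math


def msle_sketch(y_true, y_pred):
    """Mean squared difference of log(1 + value), for non-negative inputs."""
    if any(v < 0 for v in y_true) or any(v < 0 for v in y_pred):
        raise ValueError("MSLE requires non-negative inputs")
    squared_log_errors = [
        (math.log1p(y) - math.log1p(p)) ** 2 for y, p in zip(y_true, y_pred)
    ]
    return sum(squared_log_errors) / len(squared_log_errors)


# Both forecasts double the true value; absolute errors differ by 100x.
small = msle_sketch([10], [20])
large = msle_sketch([1000], [2000])
rmsle_small = math.sqrt(small)  # root form, assuming RMSLE = sqrt(MSLE)
print(round(small, 3), round(large, 3))
```

The two scores stay within the same order of magnitude even though the squared errors are 100 and 1,000,000.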
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| y_true | array-like of shape (n_samples,) | Ground-truth values. Must be non-negative. | required |
| y_pred | array-like of shape (n_samples,) | Predicted values. Must be non-negative. | required |
Returns:
| Type | Description |
|---|---|
| float | Mean squared log error. Lower is better. |
Raises:
| Type | Description |
|---|---|
| ValueError | If any value in y_true or y_pred is negative. |
Notes
MSLE down-weights large absolute errors at high magnitudes and is commonly used when relative error is more meaningful than absolute error.
rmse(y_true, y_pred)
Root Mean Squared Error (RMSE).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| y_true | array-like of shape (n_samples,) | Ground-truth values. | required |
| y_pred | array-like of shape (n_samples,) | Predicted values. | required |
Returns:
| Type | Description |
|---|---|
| float | Root mean squared error. Lower is better. |
rmsle(y_true, y_pred)
Root Mean Squared Log Error (RMSLE).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| y_true | array-like of shape (n_samples,) | Ground-truth values. Must be non-negative. | required |
| y_pred | array-like of shape (n_samples,) | Predicted values. Must be non-negative. | required |
Returns:
| Type | Description |
|---|---|
| float | Root mean squared log error. Lower is better. |
smape(y_true, y_pred)
Symmetric Mean Absolute Percentage Error (sMAPE).
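This implementation follows a common competition definition:
\[
\mathrm{sMAPE} = \frac{100}{n} \sum_{i=1}^{n} \frac{2 \, |y_i - \hat{y}_i|}{|y_i| + |\hat{y}_i|}
\]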
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| y_true | array-like of shape (n_samples,) | Ground-truth values. | required |
| y_pred | array-like of shape (n_samples,) | Predicted values. | required |
Returns:
| Type | Description |
|---|---|
| float | sMAPE in percent. Lower is better. |
Notes
If |y| + |y_pred| == 0 for all samples, this function returns 0.0.
wmape(y_true, y_pred)
Weighted Mean Absolute Percentage Error (WMAPE).
WMAPE is also commonly described as demand-normalized absolute error:
\[
\mathrm{WMAPE} = 100 \cdot \frac{\sum_i |y_i - \hat{y}_i|}{\sum_i |y_i|}
\]
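To make "demand-normalized" concrete, the pure-Python sketch below divides total absolute error by total absolute demand. The name wmape_sketch and the zero-demand error are assumptions made for this illustration and are not taken from the library.

```python
def wmape_sketch(y_true, y_pred):
    """Total absolute error as a percentage of total absolute demand."""
    total_demand = sum(abs(y) for y in y_true)
    if total_demand == 0:
        # Assumed behavior for this sketch: the ratio has no denominator.
        raise ValueError("WMAPE is undefined when total demand is zero")
    total_error = sum(abs(y - p) for y, p in zip(y_true, y_pred))
    return 100.0 * total_error / total_demand


# A 1-unit miss on a tiny interval barely moves the score, because errors
# are pooled before dividing by demand.
score = wmape_sketch([100, 1], [90, 2])
print(round(score, 2))  # 10.89
```

Averaging the per-sample percentage errors for the same data would give (10% + 100%) / 2 = 55%, which shows how strongly low-volume intervals can dominate a per-sample percentage metric.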
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| y_true | array-like of shape (n_samples,) | Ground-truth values. | required |
| y_pred | array-like of shape (n_samples,) | Predicted values. | required |
Returns:
| Type | Description |
|---|---|
| float | WMAPE in percent. Lower is better. |
Raises:
| Type | Description |
|---|---|
| ValueError | If … |
Notes
WMAPE is symmetric: it does not distinguish underprediction from overprediction. Use CWSL when directional cost asymmetry matters.
cwsl_sensitivity(y_true, y_pred, R_list=(0.5, 1.0, 2.0, 3.0), co=1.0, sample_weight=None)
Evaluate CWSL across a grid of cost ratios (cost sensitivity analysis).
This helper computes Cost-Weighted Service Loss (CWSL) for each candidate cost ratio:
\[
R = \frac{c_u}{c_o}
\]
holding co fixed and setting:
\[
c_u = R \cdot c_o
\]
for each value in R_list.
This provides a simple way to assess how model ranking or absolute loss changes under alternative assumptions about shortfall vs. overbuild cost.
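The ranking effect can be seen with two toy forecasts. The sketch below is pure Python with hypothetical names (cwsl_sketch, cwsl_sensitivity_sketch); it assumes a scalar co, unit weights, and positive total demand, and it re-implements the loss inline rather than calling the library.

```python
def cwsl_sketch(y_true, y_pred, cu, co):
    """Asymmetric cost divided by total demand (scalar costs, unit weights)."""
    cost = sum(
        cu * max(0, y - p) + co * max(0, p - y) for y, p in zip(y_true, y_pred)
    )
    return cost / sum(y_true)


def cwsl_sensitivity_sketch(y_true, y_pred, r_list=(0.5, 1.0, 2.0, 3.0), co=1.0):
    """Map each strictly positive R to the loss obtained with cu = R * co."""
    return {
        r: cwsl_sketch(y_true, y_pred, cu=r * co, co=co) for r in r_list if r > 0
    }


y_true = [10, 12, 8]
cautious = [12, 13, 9]  # always overbuilds (4 units in total)
lean = [9, 11, 8]       # sometimes falls short (2 units in total)

cautious_curve = cwsl_sensitivity_sketch(y_true, cautious)
lean_curve = cwsl_sensitivity_sketch(y_true, lean)
for r in cautious_curve:
    print(r, round(cautious_curve[r], 4), round(lean_curve[r], 4))
```

The cautious forecast has the same loss at every R because it never falls short, while the lean forecast's loss grows linearly with R. The lean forecast wins at R = 0.5 and loses at R = 3.0, so the preferred model depends on the assumed cost ratio.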
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| y_true | array-like of shape (n_samples,) | Realized demand. Must be non-negative. | required |
| y_pred | array-like of shape (n_samples,) | Forecast demand. Must be non-negative and have the same shape as y_true. | required |
| R_list | sequence of float | Candidate cost ratios to evaluate. Only strictly positive values are used. | (0.5, 1.0, 2.0, 3.0) |
| co | float or array-like of shape (n_samples,) | Overbuild cost \(c_o\). Can be scalar or per-interval. | 1.0 |
| sample_weight | float or array-like of shape (n_samples,) | Optional non-negative weights per interval, passed through to CWSL. | None |
Returns:
| Type | Description |
|---|---|
| dict[float, float] | Mapping from each cost ratio \(R\) to the corresponding CWSL value. |
Raises:
| Type | Description |
|---|---|
| ValueError | If … |
Notes
- This is a pure evaluation utility; it does not attempt to infer the “correct” cost ratio. For that, see cost-ratio estimation utilities.
References
Electric Barometer Technical Note: Cost Sensitivity Utilities for CWSL.
frs(y_true, y_pred, cu, co, sample_weight=None)
Compute Forecast Readiness Score (FRS).
FRS is a simple composite score defined as:
\[
\mathrm{FRS} = \mathrm{NSL} - \mathrm{CWSL}
\]
where:
- NSL measures the frequency of avoiding shortfall (higher is better)
- CWSL measures asymmetric, demand-normalized cost (lower is better)
This construction rewards forecasts that simultaneously:
- maintain high service reliability (high NSL), and
- avoid costly asymmetric error (low CWSL).
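A small numeric sketch may help with interpretation. It assumes the composite is the difference NSL minus CWSL, which is consistent with the score being bounded above by 1 and able to go negative. All function names are hypothetical, and the helpers use scalar costs and unit weights.

```python
def nsl_sketch(y_true, y_pred):
    """Fraction of intervals where the forecast covers realized demand."""
    return sum(1 for y, p in zip(y_true, y_pred) if p >= y) / len(y_true)


def cwsl_sketch(y_true, y_pred, cu, co):
    """Asymmetric cost divided by total demand (scalar costs, unit weights)."""
    cost = sum(
        cu * max(0, y - p) + co * max(0, p - y) for y, p in zip(y_true, y_pred)
    )
    return cost / sum(y_true)


def frs_sketch(y_true, y_pred, cu, co):
    """Readiness score under the assumption FRS = NSL - CWSL."""
    return nsl_sketch(y_true, y_pred) - cwsl_sketch(y_true, y_pred, cu, co)


y_true = [10, 12, 8, 10]
y_pred = [10, 11, 9, 12]  # one shortfall of 1 unit, 3 units of overbuild

print(frs_sketch(y_true, y_pred, cu=2.0, co=1.0))  # 0.625
```

Three of four intervals are covered (NSL = 0.75) and the cost is 5 against 40 units of demand (CWSL = 0.125). A forecast that is short everywhere with a high cu drives the score below zero.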
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| y_true | array-like of shape (n_samples,) | Realized demand. Must be non-negative. | required |
| y_pred | array-like of shape (n_samples,) | Forecast demand. Must be non-negative and have the same shape as y_true. | required |
| cu | float or array-like of shape (n_samples,) | Per-unit shortfall cost passed through to CWSL. | required |
| co | float or array-like of shape (n_samples,) | Per-unit overbuild cost passed through to CWSL. | required |
| sample_weight | float or array-like of shape (n_samples,) | Optional non-negative weights per interval, applied consistently to NSL and CWSL. | None |
Returns:
| Type | Description |
|---|---|
| float | Forecast Readiness Score. Higher indicates better readiness. Values are typically bounded above by 1, but can be negative depending on cost and forecast error. |
Raises:
| Type | Description |
|---|---|
| ValueError | If inputs are invalid or CWSL is undefined for the given data slice. |
Notes
This metric is intentionally simple and should be interpreted as a readiness-oriented summary rather than a standalone loss function.
References
Electric Barometer Technical Note: Forecast Readiness Score (FRS).
hr_at_tau(y_true, y_pred, tau, sample_weight=None)
Compute Hit Rate within Tolerance (HR@τ).
HR@τ measures the (optionally weighted) fraction of intervals whose absolute error falls within a tolerance band \(\tau\).
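Define absolute error and hit indicator:
\[
e_i = |y_i - \hat{y}_i|, \qquad h_i = \mathbb{1}[\, e_i \le \tau_i \,]
\]
Then:
\[
\mathrm{HR@\tau} = \frac{\sum_i w_i \, h_i}{\sum_i w_i}
\]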
Higher values are better, with \(\mathrm{HR@\tau} \in [0, 1]\).
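The weighted hit fraction can be sketched as follows in pure Python. The name hr_at_tau_sketch is hypothetical, and the sketch assumes an inclusive tolerance band (an error exactly equal to tau counts as a hit); it is not the library implementation.

```python
def hr_at_tau_sketch(y_true, y_pred, tau, sample_weight=None):
    """Weighted fraction of intervals with absolute error within tolerance."""
    n = len(y_true)
    # A scalar tolerance applies to every interval; a sequence is per-interval.
    taus = list(tau) if isinstance(tau, (list, tuple)) else [tau] * n
    weights = list(sample_weight) if sample_weight is not None else [1.0] * n
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("HR@tau is undefined when total weight is zero")
    hit_weight = sum(
        w for y, p, t, w in zip(y_true, y_pred, taus, weights) if abs(y - p) <= t
    )
    return hit_weight / total_weight


y_true = [10, 12, 8, 15]
y_pred = [11, 15, 8, 13]  # absolute errors: 1, 3, 0, 2

print(hr_at_tau_sketch(y_true, y_pred, tau=2))             # 0.75
print(hr_at_tau_sketch(y_true, y_pred, tau=[1, 3, 0, 1]))  # 0.75
```

With a scalar tolerance of 2, only the second interval misses. With per-interval tolerances, the second interval is now a hit and the fourth becomes the miss, so the fraction is unchanged even though the set of hits differs.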
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| y_true | array-like of shape (n_samples,) | Realized demand. Must be non-negative. | required |
| y_pred | array-like of shape (n_samples,) | Forecast demand. Must be non-negative and have the same shape as y_true. | required |
| tau | float or array-like of shape (n_samples,) | Non-negative absolute error tolerance. Can be a scalar (same tolerance for all intervals) or a 1D array (per-interval tolerance). | required |
| sample_weight | float or array-like of shape (n_samples,) | Optional non-negative weights per interval. If provided, HR@τ is computed as a weighted fraction. If total weight is zero, HR@τ is undefined and a ValueError is raised. | None |
Returns:
| Type | Description |
|---|---|
| float | HR@τ value in [0, 1]. Higher indicates more intervals within tolerance. |
Raises:
| Type | Description |
|---|---|
| ValueError | If inputs are invalid (shape mismatch, negative values), if … |
Notes
- HR@τ is a symmetric tolerance measure; it treats underbuild and overbuild equally within the tolerance band.
- Use HR@τ alongside asymmetric metrics (e.g., CWSL) when operational costs differ by direction.
References
Electric Barometer Technical Note: HR@τ (Hit Rate within Tolerance).
nsl(y_true, y_pred, sample_weight=None)
Compute No-Shortfall Level (NSL).
NSL is the (optionally weighted) fraction of evaluation intervals in which the forecast does not underpredict realized demand.
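For each interval \(i\), define a hit indicator:
\[
h_i = \mathbb{1}[\, \hat{y}_i \ge y_i \,]
\]
Then:
\[
\mathrm{NSL} = \frac{\sum_i w_i \, h_i}{\sum_i w_i}
\]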
where \(w_i\) are optional sample weights (default \(w_i = 1\)). Higher values are better, with \(\mathrm{NSL} \in [0, 1]\).
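A short pure-Python sketch of this weighted fraction is shown below. The name nsl_sketch is hypothetical, and an interval is treated as covered when the forecast is greater than or equal to realized demand, following the "does not underpredict" wording above.

```python
def nsl_sketch(y_true, y_pred, sample_weight=None):
    """Weighted fraction of intervals where the forecast covers demand."""
    weights = (
        list(sample_weight) if sample_weight is not None else [1.0] * len(y_true)
    )
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("NSL is undefined when total weight is zero")
    # An exact match counts as covered: no shortfall occurred.
    covered = sum(w for y, p, w in zip(y_true, y_pred, weights) if p >= y)
    return covered / total_weight


y_true = [10, 12, 8, 15]
y_pred = [11, 11, 8, 16]  # only the second interval falls short

print(nsl_sketch(y_true, y_pred))                              # 0.75
print(nsl_sketch(y_true, y_pred, sample_weight=[1, 3, 1, 1]))  # 0.5
```

Weighting the short interval three times as heavily drops the score from 0.75 to 0.5, although the size of the shortfall (1 unit) plays no role in either number.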
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| y_true | array-like of shape (n_samples,) | Realized demand. Must be non-negative. | required |
| y_pred | array-like of shape (n_samples,) | Forecast demand. Must be non-negative and have the same shape as y_true. | required |
| sample_weight | float or array-like of shape (n_samples,) | Optional non-negative weights per interval. If provided, NSL is computed as a weighted fraction. If total weight is zero, NSL is undefined and a ValueError is raised. | None |
Returns:
| Type | Description |
|---|---|
| float | NSL value in [0, 1]. Higher indicates better shortfall avoidance. |
Raises:
| Type | Description |
|---|---|
| ValueError | If inputs are invalid (shape mismatch, negative values), or if total sample weight is zero. |
Notes
- NSL is a service reliability measure: it does not quantify how large a shortfall is—only whether a shortfall occurred.
- UD complements NSL by measuring shortfall magnitude when shortfalls occur.
References
Electric Barometer Technical Note: No Shortfall Level (NSL).
ud(y_true, y_pred, sample_weight=None)
Compute Underbuild Depth (UD).
UD measures the (optionally weighted) average magnitude of shortfall. Unlike NSL (which counts shortfalls), UD quantifies how severe shortfalls are when they occur.
Define per-interval shortfall:
\[
s_i = \max(0,\, y_i - \hat{y}_i)
\]
Then:
\[
\mathrm{UD} = \frac{\sum_i w_i \, s_i}{\sum_i w_i}
\]
Higher values indicate deeper average shortfall; lower is better.
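The sketch below illustrates one reading of this definition in pure Python: the weighted mean of per-interval shortfall, averaged over all intervals rather than only those with a shortfall. That averaging choice and the name ud_sketch are assumptions made for this illustration.

```python
def ud_sketch(y_true, y_pred, sample_weight=None):
    """Weighted mean shortfall, averaged over all intervals (assumed)."""
    weights = (
        list(sample_weight) if sample_weight is not None else [1.0] * len(y_true)
    )
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("UD is undefined when total weight is zero")
    # Overbuild contributes zero: only unmet demand is counted.
    shortfall_total = sum(
        w * max(0.0, y - p) for y, p, w in zip(y_true, y_pred, weights)
    )
    return shortfall_total / total_weight


y_true = [10, 12, 8, 15]
y_pred = [11, 9, 8, 13]  # shortfalls: 0, 3, 0, 2

print(ud_sketch(y_true, y_pred))  # 1.25
```

The result is in the same units as demand: 5 short units spread over 4 intervals. A forecast that only overbuilds scores 0.0 regardless of how large the excess is.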
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| y_true | array-like of shape (n_samples,) | Realized demand. Must be non-negative. | required |
| y_pred | array-like of shape (n_samples,) | Forecast demand. Must be non-negative and have the same shape as y_true. | required |
| sample_weight | float or array-like of shape (n_samples,) | Optional non-negative weights per interval. If provided, UD is computed as a weighted average. If total weight is zero, UD is undefined and a ValueError is raised. | None |
Returns:
| Type | Description |
|---|---|
| float | UD value (units match y_true). |
Raises:
| Type | Description |
|---|---|
| ValueError | If inputs are invalid (shape mismatch, negative values), or if total sample weight is zero. |
Notes
- UD ignores overbuild entirely; it is a pure shortfall severity measure.
- UD is often interpreted alongside NSL: high NSL + low UD indicates strong service consistency.
References
Electric Barometer Technical Note: Underbuild Depth (UD).