
Cost ratio tuning

This section documents utilities for estimating and tuning cost-ratio parameters in eb-optimization.

Cost ratio tuning supports calibration of asymmetric over- and under-prediction penalties based on observed outcomes, enabling alignment between optimization behavior and business risk preferences.

eb_optimization.tuning.cost_ratio

Cost ratio (R) tuning utilities.

This module provides calibration helpers for selecting the underbuild-to-overbuild cost ratio:

\[ R = \frac{c_u}{c_o} \]

These routines belong in eb-optimization because they choose/govern parameters from data over a candidate set (grid search + calibration diagnostics). They are not metric primitives (eb-metrics) and are not runtime policies (eb-optimization/policies).

Layering:

  • search/ : reusable candidate-space utilities (grids, kernels)
  • tuning/ : define candidate grids + objectives + return calibration artifacts
  • policies/ : frozen artifacts that apply parameters deterministically at runtime

Selection strategy

estimate_R_cost_balance can select the optimal ratio in two equivalent ways:

1) selection="curve" (default)

  • Materialize the full sensitivity curve over the candidate grid.
  • Select \(R^*\) by a direct reduction on the curve (NumPy argmin over gap).
  • This is typically faster and emphasizes that the curve is the primary audit artifact.

2) selection="kernel"

  • Materialize the full sensitivity curve over the candidate grid.
  • Select \(R^*\) using the generic candidate-search kernel (argmin_over_candidates), scoring each candidate by its curve-derived gap.
  • This preserves a consistent “kernel-governed” selection pathway that matches other search utilities in the library.

Both strategies are deterministic and use the same tie-breaking semantics: the first candidate (in filtered grid order) achieving the minimum gap is chosen.
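As an illustration of the "first candidate wins" rule (not the library's code), both pathways reduce to a first-index argmin, and NumPy's argmin already returns the first occurrence of the minimum:

```python
import numpy as np

# Hypothetical gap values over a candidate grid; two candidates tie at the minimum.
grid = np.array([0.5, 1.0, 2.0, 3.0])
gap = np.array([4.0, 1.5, 1.5, 6.0])

# np.argmin returns the index of the FIRST occurrence of the minimum,
# which matches the "first" tie-break rule described above.
idx = int(np.argmin(gap))
R_star = float(grid[idx])  # 1.0 is selected, not the tied 2.0
```

Because the filtered grid order is preserved, the same inputs always yield the same selection under either strategy.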

Returning governance artifacts

If return_curve=False (default), estimate_R_cost_balance returns the selected ratio R_star as a plain float (backwards compatible).

If return_curve=True, it returns a CostRatioEstimate object that mirrors the intent of TauEstimate: it includes the selected ratio plus the candidate grid actually searched, the selection strategy, the tie-break rule, sample count, and a diagnostics payload alongside the full sensitivity curve.

Entity-level calibration (recommended API)

Entity-level calibration produces richer outputs than a flat DataFrame can cleanly represent (because each entity has its own sensitivity curve).

To keep both:

  • analysis convenience (the legacy DataFrame), and
  • gold-standard “persistable artifact” structure,

estimate_entity_R_from_balance supports two return modes:

1) Default (backwards compatible):

  • Returns a DataFrame with one row per entity and scalar outputs (no curves).
  • Preserves the legacy column names (R, diff) for compatibility.

2) return_result=True:

  • Returns an EntityCostRatioEstimate artifact containing:
    • table: one row per entity with standardized scalar outputs and per-entity diagnostics (no DataFrame-in-cell curves)
    • curves: dict mapping entity_id -> sensitivity curve DataFrame
    • shared governance metadata (method, grid, selection, tie_break)

Serialization guidance
  • The entity-level table is designed to be Parquet/CSV friendly.
  • The diagnostics dict is intended to remain JSON-serializable (floats/bools/ints/str).
  • Full per-entity curves are kept in the curves mapping to avoid object dtype columns.
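A minimal pandas sketch of this layout (all names and values are illustrative, not the library's artifact; diagnostics are shown serialized to JSON strings to emphasize the JSON-friendly constraint):

```python
import json
import pandas as pd

# Hypothetical entity-level table: scalar columns only, so it stays
# Parquet/CSV friendly with no object-dtype DataFrame cells.
table = pd.DataFrame({
    "entity_id": ["store_1", "store_2"],
    "R_star": [1.0, 2.0],
    "n": [120, 98],
    "gap": [0.4, 0.1],
    # Diagnostics restricted to floats/bools/ints/str, serialized as JSON.
    "diagnostics": [
        json.dumps({"is_identifiable": True, "rel_min_gap": 0.01}),
        json.dumps({"is_identifiable": False, "rel_min_gap": 0.30}),
    ],
})

# Full per-entity curves live in a separate mapping, one tidy frame per
# entity, rather than being stuffed into cells of the main table.
curves = {
    "store_1": pd.DataFrame({"R": [0.5, 1.0],
                             "under_cost": [3.0, 6.0],
                             "over_cost": [5.0, 5.0],
                             "gap": [2.0, 1.0]}),
}

csv_text = table.to_csv(index=False)  # flat table round-trips cleanly
```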

CostRatioEstimate dataclass

Result container for cost-ratio (R) calibration.

This is intentionally "TauEstimate-like": downstream code can persist or log a single object that fully describes the calibration decision.

Attributes:

  • R_star (float): Selected cost ratio from the candidate grid.
  • method (str): Method identifier used to produce the estimate (e.g., "cost_balance").
  • n (int): Number of samples used in the calibration (after validation).
  • grid (ndarray): The filtered candidate grid actually searched (strictly positive values). The order of this grid defines tie-breaking.
  • selection (str): Selection strategy used once the curve is computed ("curve" or "kernel").
  • tie_break (str): Tie-breaking rule applied when multiple candidates achieve the same objective. For this routine it is always "first".
  • diagnostics (dict[str, Any]): Method-specific diagnostic metadata intended for governance and reporting. Keys include "over_cost_const" (float), "min_gap" (float), "degenerate_perfect_forecast" (bool), "rel_min_gap" (float), "grid_sensitivity" (dict[str, float]), "grid_instability_log" (float), "identifiability_thresholds" (dict[str, float]), and "is_identifiable" (bool).
  • rel_min_gap (float): Relative imbalance at the chosen point: min_gap / over_cost_const (or inf if over_cost_const == 0 and min_gap > 0). This is a cost-separation diagnostic.
  • R_min (float): Minimum R_star across built-in grid perturbations (base/exclude/shift).
  • R_max (float): Maximum R_star across built-in grid perturbations (base/exclude/shift).
  • grid_instability_log (float): log(R_max / R_min) across built-in perturbations. This is a grid-sensitivity diagnostic capturing weak identifiability due to discretization.
  • is_identifiable (bool): Boolean summary derived from conservative thresholds on rel_min_gap and grid_instability_log. This does not change selection; it only reports stability.
  • curve (DataFrame): Sensitivity curve / diagnostics for each candidate ratio. Columns: R, under_cost, over_cost, gap (= |under_cost - over_cost|).

EntityCostRatioEstimate dataclass

Entity-level cost ratio calibration artifact.

This container is designed to be persistable and ergonomic:

  • table can be saved to Parquet/CSV without DataFrame-in-cell object columns.
  • curves retains the full per-entity sensitivity curves for audit/plotting.
  • Shared metadata captures the governance context of the run.

Attributes:

  • entity_col (str): Name of the entity identifier column used in table.
  • method (str): Method identifier used to produce the estimate (e.g., "cost_balance").
  • grid (ndarray): The filtered candidate grid actually searched (strictly positive values). The order of this grid defines tie-breaking for each entity.
  • selection (str): Selection strategy used once each curve is computed ("curve" or "kernel").
  • tie_break (str): Tie-breaking rule applied when multiple candidates achieve the same objective. For this routine it is always "first".
  • table (DataFrame): One row per entity with scalar results and summary diagnostics. Columns include entity_col, R_star, n, under_cost, over_cost, gap, and diagnostics (dict; intended to be JSON-serializable).
  • curves (dict[Any, DataFrame]): Mapping of entity_id -> sensitivity curve DataFrame for that entity. Each curve has columns [R, under_cost, over_cost, gap].

estimate_R_cost_balance(y_true, y_pred, R_grid=(0.5, 1.0, 2.0, 3.0), co=1.0, sample_weight=None, *, return_curve=False, selection='curve')

Estimate a global cost ratio \(R = c_u / c_o\) via cost balance.

This routine selects a single, global cost ratio \(R\) by searching a candidate grid and choosing the value where the total weighted underbuild cost is closest to the total weighted overbuild cost.

For each candidate \(R\) in R_grid:

\[ \begin{aligned} c_{u,i} &= R \cdot c_{o,i} \\ s_i &= \max(0, y_i - \hat{y}_i) \\ e_i &= \max(0, \hat{y}_i - y_i) \\ C_u(R) &= \sum_i w_i \; c_{u,i} \; s_i \\ C_o(R) &= \sum_i w_i \; c_{o,i} \; e_i \end{aligned} \]

and the selected value is:

\[ R^* = \arg\min_R \; \left| C_u(R) - C_o(R) \right|. \]
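The objective above can be sketched in plain NumPy (a simplified illustration, not the eb_optimization implementation; it omits input validation and the degenerate perfect-forecast branch):

```python
import numpy as np

def cost_balance_R(y_true, y_pred, R_grid=(0.5, 1.0, 2.0, 3.0), co=1.0, w=None):
    """Minimal sketch of the cost-balance grid search (illustrative only)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    w = np.ones_like(y_true) if w is None else np.asarray(w, dtype=float)
    co_arr = np.broadcast_to(np.asarray(co, dtype=float), y_true.shape)

    shortfall = np.maximum(0.0, y_true - y_pred)    # s_i: underbuild units
    excess = np.maximum(0.0, y_pred - y_true)       # e_i: overbuild units
    over_cost = float(np.sum(w * co_arr * excess))  # C_o does not depend on R

    grid = np.array([r for r in R_grid if r > 0.0], dtype=float)  # filter grid
    under = np.array([np.sum(w * r * co_arr * shortfall) for r in grid])  # C_u(R)
    gap = np.abs(under - over_cost)                 # |C_u(R) - C_o|
    return float(grid[np.argmin(gap)])              # first-minimum tie-break
```

For example, with y_true = [10, 8] and y_pred = [7, 9], the shortfall total is 3 and the excess total is 1, so the gap |3R - 1| is minimized at the smallest candidate, R = 0.5.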

Parameters:

  • y_true (ArrayLike, required): Realized demand (non-negative), shape (n_samples,).
  • y_pred (ArrayLike, required): Forecast demand (non-negative), shape (n_samples,). Must match y_true.
  • R_grid (Sequence[float], default (0.5, 1.0, 2.0, 3.0)): Candidate ratios to search. Only strictly positive values are considered. The filtered grid order is preserved for tie-breaking.
  • co (float | ArrayLike, default 1.0): Overbuild cost coefficient \(c_o\). May be scalar or 1D array of shape (n_samples,). Underbuild cost is implied as \(c_{u,i} = R \cdot c_{o,i}\).
  • sample_weight (ArrayLike | None, default None): Optional non-negative weights. If None, all intervals receive weight 1.0.
  • return_curve (bool, default False): If True, return a CostRatioEstimate containing both the chosen R and the full sensitivity curve diagnostics over the filtered grid, plus governance metadata (grid used, selection strategy, tie-break rule, sample size).
  • selection (Literal['curve', 'kernel'], default 'curve'): Strategy used to select \(R^*\) once the sensitivity curve has been computed:
    • "curve": select via NumPy argmin over the curve's gap column. This is typically faster and treats the curve as the primary artifact.
    • "kernel": select via argmin_over_candidates, scoring each candidate using curve-derived gap values. This maintains consistency with other candidate-search kernels.
    Both strategies are deterministic and share the same tie-breaking behavior.

Returns:

float or CostRatioEstimate: by default, the selected cost ratio minimizing |under_cost - over_cost|.

If return_curve=True, returns a CostRatioEstimate with:

  • R_star
  • method, n, grid, selection, tie_break, diagnostics
  • rel_min_gap, R_min, R_max, grid_instability_log, is_identifiable
  • curve: pd.DataFrame with columns [R, under_cost, over_cost, gap]

Tie-breaking:

  • In the degenerate perfect-forecast case (zero error everywhere), the candidate closest to 1.0 is chosen.
  • Otherwise, if multiple candidates yield the same minimal gap, the first candidate encountered (in filtered grid order) is returned.
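The degenerate branch can be sketched as follows (illustrative, not library code): with a perfect forecast every candidate yields gap == 0, so the balance objective cannot discriminate between candidates, and falling back to the R nearest the neutral ratio 1.0 keeps the result deterministic:

```python
import numpy as np

grid = np.array([0.5, 2.0, 3.0])
gap = np.zeros_like(grid)  # perfect forecast: every candidate balances exactly

if np.all(gap == 0.0):
    # Degenerate case: choose the candidate closest to the neutral ratio 1.0.
    R_star = float(grid[np.argmin(np.abs(grid - 1.0))])
else:
    # Normal case: first candidate achieving the minimal gap.
    R_star = float(grid[np.argmin(gap)])
```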

estimate_entity_R_from_balance(df, entity_col, y_true_col, y_pred_col, ratios=(0.5, 1.0, 2.0, 3.0), co=1.0, sample_weight_col=None, *, return_result=False, selection='curve')

Estimate an entity-level cost ratio via a cost-balance grid search.

This function estimates a per-entity underbuild-to-overbuild cost ratio:

\[ R_e = \frac{c_{u,e}}{c_o} \]

by searching over a user-provided grid of candidate ratios.
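A compact pandas sketch of this per-entity search (a simplified illustration under the same cost-balance objective, not the library implementation; it omits weights, validation, and curve artifacts):

```python
import numpy as np
import pandas as pd

def entity_R(df, entity_col, y_true_col, y_pred_col,
             ratios=(0.5, 1.0, 2.0, 3.0), co=1.0):
    """Per-entity cost-balance grid search (illustrative only)."""
    rows = []
    for entity, g in df.groupby(entity_col, sort=False):
        short = np.maximum(0.0, g[y_true_col] - g[y_pred_col]).sum()
        over = co * np.maximum(0.0, g[y_pred_col] - g[y_true_col]).sum()
        grid = np.array([r for r in ratios if r > 0.0])  # filtered grid
        gap = np.abs(grid * co * short - over)           # |C_u(R) - C_o|
        rows.append({entity_col: entity,
                     "R_star": float(grid[np.argmin(gap)]),  # first-min tie-break
                     "gap": float(gap.min()),
                     "n": len(g)})
    return pd.DataFrame(rows)
```

For example, an entity whose shortfall total is 1 and overbuild total is 2 selects R = 2.0, since |R - 2| vanishes there.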

Parameters:

  • df (DataFrame, required): Input DataFrame containing entity identifiers, true values, and predictions.
  • entity_col (str, required): Column identifying the entity (e.g., store, restaurant, item).
  • y_true_col (str, required): Column containing realized demand.
  • y_pred_col (str, required): Column containing forecast demand.
  • ratios (Sequence[float], default (0.5, 1.0, 2.0, 3.0)): Candidate ratio grid. For backward compatibility, the default return mode requires all candidates to be positive. In artifact mode (return_result=True), the grid is filtered to strictly positive candidates and the filtered order governs tie-breaking.
  • co (float, default 1.0): Overbuild cost coefficient (scalar). The underbuild cost coefficient for a given candidate is cu = R * co.
  • sample_weight_col (str | None, default None): Optional column containing non-negative sample weights.
  • return_result (bool, default False): If False (default), returns a backward-compatible DataFrame with one row per entity using legacy column names (R, diff). If True, returns an EntityCostRatioEstimate artifact with:
    • table: one row per entity with standardized columns (R_star, gap, etc.)
    • curves: dict mapping entity_id -> curve DataFrame with columns [R, under_cost, over_cost, gap]
    • shared governance metadata (method, grid, selection, tie_break)
  • selection (Literal['curve', 'kernel'], default 'curve'): Strategy used to select each entity's \(R^*\) once its curve has been computed:
    • "curve": select via NumPy argmin over the curve's gap column.
    • "kernel": select via argmin_over_candidates, scoring each candidate using curve-derived gap.
    Both strategies are deterministic and share the same tie-breaking behavior ("first").

Returns:

DataFrame or EntityCostRatioEstimate: if return_result=False, a DataFrame with one row per entity (legacy schema); if return_result=True, a persistable artifact with per-entity curves.