
Validation utilities

This section documents validation utilities provided by eb-evaluation.

These utilities perform structural and semantic validation of inputs and outputs used during evaluation workflows, ensuring consistency and correctness before metrics, diagnostics, or model-selection logic is applied.

eb_evaluation.utils.validation

Lightweight DataFrame validation utilities.

This module provides small, explicit validation helpers for pandas DataFrames used throughout the Electric Barometer evaluation and model-selection stack.

The intent is to:

  • fail fast with clear error messages
  • keep validation logic centralized and reusable
  • distinguish data validation errors from other ValueError instances

These helpers are intentionally minimal and do not attempt schema inference or coercion; they only assert required structural properties.

DataFrameValidationError

Bases: ValueError

Raised when a pandas DataFrame fails a validation check.

This is a thin subclass of ValueError that allows callers to explicitly catch DataFrame-related validation issues and distinguish them from other value errors (e.g., numerical domain errors).
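Because DataFrameValidationError subclasses ValueError, existing handlers written as "except ValueError" still catch it, and callers that want the distinction can handle it first. The sketch below is illustrative only. It defines a local stand-in class with the documented base instead of importing the library, and load_scores and classify are hypothetical functions invented for this example.

```python
import pandas as pd


class DataFrameValidationError(ValueError):
    """Local stand-in: a thin ValueError subclass, mirroring the documented base."""


def load_scores(df: pd.DataFrame) -> float:
    """Hypothetical caller that can raise both kinds of error."""
    if "score" not in df.columns:
        # Structural problem with the DataFrame itself.
        raise DataFrameValidationError("missing required column: 'score'")
    total = float(df["score"].sum())
    if total < 0:
        # Numerical domain problem: a plain ValueError.
        raise ValueError("total score must be non-negative")
    return total


def classify(df: pd.DataFrame) -> str:
    """Report which kind of failure (if any) load_scores produced."""
    try:
        load_scores(df)
    except DataFrameValidationError:
        # Catch the subclass first; otherwise the ValueError branch would absorb it.
        return "data"
    except ValueError:
        return "value"
    return "ok"


print(classify(pd.DataFrame({"other": [1]})))
print(classify(pd.DataFrame({"score": [-5]})))
print(classify(pd.DataFrame({"score": [3]})))
```

The order of the except clauses matters: listing ValueError first would also capture DataFrameValidationError and erase the distinction.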

ensure_columns_present(df, required, *, context=None)

Ensure that all required columns are present in a DataFrame.

Parameters:

  • df (DataFrame): DataFrame to validate. Default: required.
  • required (Sequence[str]): Column names that must be present in df. Default: required.
  • context (str | None): Optional context string (e.g., function or module name) included in the error message to aid debugging. Default: None.

Raises:

  • DataFrameValidationError: If one or more required columns are missing.

Notes

This function performs a presence-only check. It does not validate column dtypes or contents.
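The presence-only check can be sketched as a membership test against df.columns. This is a minimal illustration under stated assumptions, not the library's implementation: the exception is a local stand-in, and the error-message wording (how context is rendered, whether available columns are listed) is assumed. The column names y_true, y_pred and model_id and the context string select_model are hypothetical.

```python
from typing import Optional, Sequence

import pandas as pd


class DataFrameValidationError(ValueError):
    """Stand-in for the documented exception (a plain ValueError subclass)."""


def ensure_columns_present(
    df: pd.DataFrame,
    required: Sequence[str],
    *,
    context: Optional[str] = None,
) -> None:
    """Sketch: raise if any name in `required` is not a column of `df`."""
    # Collect every missing name, preserving the caller's order.
    missing = [col for col in required if col not in df.columns]
    if missing:
        prefix = f"[{context}] " if context else ""
        raise DataFrameValidationError(
            f"{prefix}Missing required columns: {missing}. "
            f"Available columns: {list(df.columns)}"
        )


df = pd.DataFrame({"y_true": [1.0, 2.0], "y_pred": [1.5, 1.0]})
ensure_columns_present(df, ["y_true", "y_pred"])  # passes silently

# Presence-only: a column with non-numeric contents still passes.
ensure_columns_present(pd.DataFrame({"y_true": ["not a number"]}), ["y_true"])

try:
    ensure_columns_present(df, ["y_true", "model_id"], context="select_model")
except DataFrameValidationError as exc:
    print(exc)
```

In this sketch all missing columns are reported in one message rather than stopping at the first, so a single failure tells the caller everything that needs fixing.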

ensure_non_empty(df, *, context=None)

Ensure that a DataFrame is not empty.

Parameters:

  • df (DataFrame): DataFrame to validate. Default: required.
  • context (str | None): Optional context string (e.g., function or module name) included in the error message to aid debugging. Default: None.

Raises:

  • DataFrameValidationError: If the DataFrame has zero rows.

Notes

This check is commonly used after filtering or grouping operations to ensure downstream computations have at least one observation.
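A typical use is guarding an aggregation after a filter, as the note above describes. The sketch assumes the check is len(df) == 0, which matches the documented "zero rows" condition; whether the library uses len(df) or DataFrame.empty (which is also True when a frame has no columns) is not stated here. The exception is a local stand-in, and the scores data, its horizon column and the context string are hypothetical.

```python
from typing import Optional

import pandas as pd


class DataFrameValidationError(ValueError):
    """Stand-in for the documented exception (a plain ValueError subclass)."""


def ensure_non_empty(df: pd.DataFrame, *, context: Optional[str] = None) -> None:
    """Sketch: raise if `df` has zero rows."""
    # len(df) counts rows only; df.empty would also be True for zero columns.
    if len(df) == 0:
        prefix = f"[{context}] " if context else ""
        raise DataFrameValidationError(f"{prefix}DataFrame is empty (0 rows).")


scores = pd.DataFrame(
    {"model": ["a", "a", "b"], "horizon": [1, 2, 1], "error": [0.2, 0.4, 0.3]}
)

# Filtering can silently produce an empty frame; check before aggregating.
subset = scores[scores["horizon"] == 3]
try:
    ensure_non_empty(subset, context="horizon=3")
    mean_error = subset.groupby("model")["error"].mean()
except DataFrameValidationError as exc:
    print(f"skipping: {exc}")
```

Without the guard, the groupby would return an empty Series and the problem would surface later, if at all, far from the filter that caused it.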