Lag Features

This section documents lag-based feature construction utilities provided by eb-features.

Lag features are computed strictly within entity boundaries and expressed in index steps relative to the input frequency.

All content below is generated automatically from NumPy-style docstrings in the source code.

Lag Feature API

eb_features.panel.lags

Lag feature construction for panel time series.

This module provides stateless utilities to construct lagged versions of a target series within each entity of a panel (entity-by-timestamp) dataset.

Lag features are expressed in index steps (rows) at the input frequency rather than wall-clock units. This makes the transformation frequency-agnostic (5-min, 30-min, hourly, daily, etc.), assuming the panel is sorted by timestamp within each entity.
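
Because lags are counted in rows, a wall-clock horizon has to be translated into steps by the caller. The sketch below illustrates that conversion under the assumption of a regularly sampled panel. The helper steps_for_horizon is hypothetical and not part of eb-features.

```python
from datetime import timedelta


def steps_for_horizon(horizon: timedelta, frequency: timedelta) -> int:
    """Convert a wall-clock horizon into a number of index steps.

    Hypothetical helper, not part of eb-features. Assumes the panel is
    regularly sampled at `frequency`.
    """
    if frequency <= timedelta(0):
        raise ValueError("frequency must be positive")
    steps, remainder = divmod(horizon, frequency)
    if remainder != timedelta(0):
        raise ValueError("horizon is not a whole multiple of frequency")
    return steps


# A one-hour lag expressed at three different input frequencies.
one_hour = timedelta(hours=1)
print(steps_for_horizon(one_hour, timedelta(minutes=5)))   # 12
print(steps_for_horizon(one_hour, timedelta(minutes=30)))  # 2
print(steps_for_horizon(one_hour, timedelta(hours=1)))     # 1
```

The same lag step therefore covers a different wall-clock span at each frequency, which is why the step count, not the duration, is what gets passed as a lag.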

Definition

For a target series y_t and lag step k:

\[ \mathrm{lag}_k(t) = y_{t-k} \]

The resulting feature column is named lag_{k}.

Notes
  • Lag features are computed strictly within each entity using grouped shifts.
  • The calling pipeline is responsible for handling missing values introduced by lagging (e.g., dropping rows or applying imputation).
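
To make the entity-boundary behaviour in the notes above concrete, here is a minimal pandas sketch of a grouped shift next to an ungrouped one. The column names store and sales are illustrative assumptions, and the code shows the general technique, not necessarily the module's exact implementation.

```python
import pandas as pd

# Toy panel with two entities, already sorted by timestamp within each entity.
panel = pd.DataFrame(
    {
        "store": ["A", "A", "A", "B", "B", "B"],
        "sales": [10.0, 20.0, 30.0, 100.0, 200.0, 300.0],
    }
)

# Grouped shift: each entity is shifted independently, so the first row of
# store B is NaN rather than the last value of store A.
panel["lag_1"] = panel.groupby("store")["sales"].shift(1)

# Ungrouped shift, shown only for contrast: values leak across entities.
panel["naive_lag_1"] = panel["sales"].shift(1)

print(panel)
```

In the printed frame, the first row of store B has a missing lag_1, while naive_lag_1 carries over 30.0 from store A.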

add_lag_features(df, *, entity_col, target_col, lag_steps)

Add target lag features to a panel DataFrame.

Parameters:

  • df (DataFrame, required): Input DataFrame containing at least entity_col and target_col.
  • entity_col (str, required): Name of the entity identifier column.
  • target_col (str, required): Name of the numeric target column to be lagged.
  • lag_steps (Sequence[int] | None, required): Positive lag offsets (in steps). For each k in lag_steps, the feature lag_{k} is added. If None or empty, no lag features are added.

Returns:

  • df_out (DataFrame): Copy of df with lag feature columns added.
  • feature_cols (list[str]): Names of the lag feature columns added.

Raises:

  • KeyError: If entity_col or target_col is missing from df.
  • ValueError: If any lag step is non-positive.
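
The documented contract (parameters, return values, and exceptions) can be sketched as a stand-in function. This is an assumption-laden illustration, not the eb-features source: the name add_lag_features_sketch, the error messages, and the toy column names are all hypothetical.

```python
from typing import Optional, Sequence

import pandas as pd


def add_lag_features_sketch(
    df: pd.DataFrame,
    *,
    entity_col: str,
    target_col: str,
    lag_steps: Optional[Sequence[int]],
) -> tuple[pd.DataFrame, list[str]]:
    """Illustrative stand-in that mirrors the documented contract."""
    # KeyError if a required column is missing.
    for col in (entity_col, target_col):
        if col not in df.columns:
            raise KeyError(f"Column {col!r} not found in DataFrame")

    df_out = df.copy()  # the input frame is left untouched
    feature_cols: list[str] = []
    if lag_steps is None or len(lag_steps) == 0:
        return df_out, feature_cols

    # ValueError if any lag step is zero or negative.
    for k in lag_steps:
        if k <= 0:
            raise ValueError(f"Lag steps must be positive; got {k}")

    # Shift within each entity so values never cross entity boundaries.
    grouped = df.groupby(entity_col, sort=False)[target_col]
    for k in lag_steps:
        name = f"lag_{k}"
        df_out[name] = grouped.shift(k)
        feature_cols.append(name)
    return df_out, feature_cols


panel = pd.DataFrame(
    {"store": ["A", "A", "A", "B", "B"], "sales": [1.0, 2.0, 3.0, 10.0, 20.0]}
)
out, cols = add_lag_features_sketch(
    panel, entity_col="store", target_col="sales", lag_steps=[1, 2]
)
print(cols)
print(out)
```

Returning the feature names alongside the frame lets a caller select or drop exactly the columns that were added without re-deriving their names.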

Notes

For each lag step k, the lag_{k} column is missing for the first k observations of each entity (and for every observation of an entity with k or fewer rows). These rows are typically removed downstream when dropna=True or handled via imputation.
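
The two downstream options mentioned above might look as follows in plain pandas. This is a sketch under assumptions: the toy data and column names are illustrative, and the constant fill value of 0.0 is an arbitrary choice made for the example, not a recommendation from eb-features.

```python
import pandas as pd

panel = pd.DataFrame(
    {
        "store": ["A", "A", "A", "B", "B", "B"],
        "sales": [10.0, 20.0, 30.0, 100.0, 200.0, 300.0],
    }
)

# Build lag_1 and lag_2 with grouped shifts (stand-in for the library call).
feature_cols = ["lag_1", "lag_2"]
for k in (1, 2):
    panel[f"lag_{k}"] = panel.groupby("store")["sales"].shift(k)

# Option 1: drop every row where at least one lag feature is missing.
dropped = panel.dropna(subset=feature_cols)

# Option 2: keep all rows and fill missing lags with a constant (assumed 0.0).
imputed = panel.copy()
imputed[feature_cols] = imputed[feature_cols].fillna(0.0)

print(dropped)
print(imputed)
```

Dropping costs the largest k rows per entity, so with many short entities imputation may preserve noticeably more training data.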