Lag Features
This section documents lag-based feature construction utilities provided by eb-features.
Lag features are computed strictly within entity boundaries and expressed in index steps relative to the input frequency.
All content below is generated automatically from NumPy-style docstrings in the source code.
Lag Feature API
eb_features.panel.lags
Lag feature construction for panel time series.
This module provides stateless utilities to construct lagged versions of a target series within each entity of a panel (entity-by-timestamp) dataset.
Lag features are expressed in index steps (rows) at the input frequency rather than wall-clock units. This makes the transformation frequency-agnostic (5-min, 30-min, hourly, daily, etc.), assuming the panel is sorted by timestamp within each entity.
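As a rough illustration of step-based lagging, the sketch below sorts a small panel by entity and timestamp, then shifts the target by one row within each entity. The column names (`store`, `ts`, `sales`) and the data are hypothetical, and the code uses plain pandas rather than eb-features itself.

```python
import pandas as pd

# Hypothetical panel: entity "A" is sampled every 30 minutes, entity "B" daily.
df = pd.DataFrame(
    {
        "store": ["B", "A", "A", "B", "A"],
        "ts": pd.to_datetime(
            [
                "2024-01-02 00:00",
                "2024-01-01 00:30",
                "2024-01-01 00:00",
                "2024-01-01 00:00",
                "2024-01-01 01:00",
            ]
        ),
        "sales": [20.0, 2.0, 1.0, 10.0, 3.0],
    }
)

# Sort by timestamp within each entity so that "one step back" means
# "the previous observation of the same entity".
df = df.sort_values(["store", "ts"]).reset_index(drop=True)

# A lag of 1 step is the previous row, whatever the sampling interval is.
df["lag_1"] = df.groupby("store")["sales"].shift(1)
```

Here the same one-step lag reaches back 30 minutes for entity "A" and one day for entity "B", because the offset counts rows rather than wall-clock time.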
Definition
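For a target series y_t and lag step k, the lag feature at time t is the value observed k steps earlier within the same entity:

$$
\mathrm{lag}_k(t) = y_{t-k}
$$

The resulting feature column is named lag_{k}.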
Notes
- Lag features are computed strictly within each entity using grouped shifts.
- The calling pipeline is responsible for handling missing values introduced by lagging (e.g., dropping rows or applying imputation).
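The difference between a grouped shift and a plain shift can be seen in a small sketch. The column names (`entity`, `y`) and values are made up for illustration, and the code uses plain pandas, not the library's internals.

```python
import pandas as pd

# Hypothetical two-entity panel, already sorted by time within each entity.
df = pd.DataFrame(
    {
        "entity": ["A", "A", "A", "B", "B", "B"],
        "y": [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
    }
)

# Naive shift ignores entity boundaries: B's first row receives A's last value.
naive = df["y"].shift(1)

# Grouped shift restarts at each entity: B's first row is missing instead.
grouped = df.groupby("entity")["y"].shift(1)

leaked_value = naive.iloc[3]                    # taken from entity A
first_b_is_missing = pd.isna(grouped.iloc[3])   # no leakage across entities
```

The grouped version leaves one missing value at the start of each entity, which is the gap the calling pipeline is expected to handle.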
add_lag_features(df, *, entity_col, target_col, lag_steps)
Add target lag features to a panel DataFrame.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `df` | `DataFrame` | Input DataFrame containing at least `entity_col` and `target_col`. | required |
| `entity_col` | `str` | Name of the entity identifier column. | required |
| `target_col` | `str` | Name of the numeric target column to be lagged. | required |
| `lag_steps` | `Sequence[int] \| None` | Positive lag offsets (in steps). For each `k`, a feature column `lag_{k}` is added. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| `df_out` | `DataFrame` | Copy of `df` with the lag feature columns added. |
| `feature_cols` | `list[str]` | Names of the lag feature columns added. |
Raises:
| Type | Description |
|---|---|
| `KeyError` | If required columns are missing from `df`. |
| `ValueError` | If any lag step is non-positive. |
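To make the documented contract concrete, here is a minimal stand-in that mirrors the signature and return values described above. It is a sketch, not the eb-features implementation: the name `add_lag_features_sketch` is hypothetical, and two behaviours are assumptions, namely that `lag_steps=None` means "no lags" and that `KeyError` is raised when `entity_col` or `target_col` is absent.

```python
from typing import Optional, Sequence

import pandas as pd


def add_lag_features_sketch(
    df: pd.DataFrame,
    *,
    entity_col: str,
    target_col: str,
    lag_steps: Optional[Sequence[int]],
) -> tuple[pd.DataFrame, list[str]]:
    """Hypothetical stand-in that mirrors the documented signature."""
    # Assumption: a missing entity or target column raises KeyError.
    missing = [c for c in (entity_col, target_col) if c not in df.columns]
    if missing:
        raise KeyError(f"Missing required columns: {missing}")

    # Assumption: None is treated as an empty list of lags.
    steps = list(lag_steps) if lag_steps is not None else []
    if any(k <= 0 for k in steps):
        raise ValueError("All lag steps must be positive.")

    df_out = df.copy()  # leave the caller's DataFrame untouched
    feature_cols: list[str] = []
    for k in steps:
        name = f"lag_{k}"
        # Grouped shift keeps each lag inside its own entity.
        df_out[name] = df_out.groupby(entity_col)[target_col].shift(k)
        feature_cols.append(name)
    return df_out, feature_cols


# Usage on a hypothetical panel, sorted by time within each entity.
panel = pd.DataFrame(
    {
        "entity": ["A", "A", "A", "B", "B"],
        "y": [1.0, 2.0, 3.0, 10.0, 20.0],
    }
)
df_out, feature_cols = add_lag_features_sketch(
    panel, entity_col="entity", target_col="y", lag_steps=[1, 2]
)
```

The input frame is copied before any column is added, matching the documented return of a copy plus the list of new column names.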
Notes
Lagging introduces missing values for the first k observations of each entity. These are typically removed downstream when dropna=True or handled via imputation.
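The two downstream options mentioned above might look like the following in plain pandas. The panel, the column names, and the constant fill value are all hypothetical, and the constant fill is only a placeholder for whatever imputation the pipeline actually uses.

```python
import pandas as pd

# Hypothetical panel with lag columns built via grouped shifts.
df = pd.DataFrame(
    {
        "entity": ["A", "A", "A", "B", "B"],
        "y": [1.0, 2.0, 3.0, 10.0, 20.0],
    }
)
feature_cols = []
for k in (1, 2):
    df[f"lag_{k}"] = df.groupby("entity")["y"].shift(k)
    feature_cols.append(f"lag_{k}")

# Each entity contributes min(k, number of rows) missing values to lag_{k}.
n_missing = df[feature_cols].isna().sum()

# Option 1: drop every row where any lag feature is missing.
dropped = df.dropna(subset=feature_cols)

# Option 2: impute, here with a constant as a deliberately simple placeholder.
imputed = df.copy()
imputed[feature_cols] = imputed[feature_cols].fillna(0.0)
```

Note that entity "B" has only two rows, so its `lag_2` column is entirely missing and dropping removes the entity altogether. Short entities are worth checking before choosing the drop strategy.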