Feature Engineering

This section documents the panel feature engineering orchestration utilities provided by eb-features.

These tools transform entity × timestamp panel data into model-ready feature matrices while enforcing leakage safety, monotonic time ordering, and deterministic feature construction.

All content below is generated automatically from NumPy-style docstrings in the source code.

Feature Engineering API

eb_features.panel.engineering

Panel feature engineering orchestrator.

This module defines a lightweight, frequency-agnostic feature engineering utility for panel time-series data (entity-by-timestamp). The implementation is intentionally stateless: each call constructs features from the provided input DataFrame and configuration.

The output is designed for classical supervised learning pipelines that expect a fixed-width design matrix X and target vector y.

Features

Given an entity identifier column and a target series y_t (per entity), the feature pipeline can construct:

1) Lag features:

\[ \mathrm{lag}_k(t) = y_{t-k} \]

2) Rolling window statistics over the previous w observations, excluding the current one (leakage-safe by default):

\[ \mathrm{roll\_mean}_w(t) = \frac{1}{w}\sum_{j=1}^{w} y_{t-j} \]

3) Calendar features derived from the timestamp: hour, day-of-week, day-of-month, month, and a weekend indicator.

4) Optional cyclical encodings for periodic calendar features:

\[ \sin\left(2\pi \frac{\mathrm{hour}}{24}\right), \quad \cos\left(2\pi \frac{\mathrm{hour}}{24}\right) \]

and similarly for day-of-week with period 7.

5) Optional passthrough features: numeric regressors and static metadata columns.
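
The lag, rolling-mean, and cyclical-hour formulas above can be sketched in plain pandas. This is an illustration of the definitions only, not the eb-features implementation: the helper names `add_lag_and_rolling` and `add_cyclical_hour`, the column names, and the output feature names are all hypothetical, and the sketch assumes rows are already sorted by timestamp within each entity.

```python
import numpy as np
import pandas as pd


def add_lag_and_rolling(df, entity_col, target_col, lags, window):
    """Hypothetical helper: per-entity lags and a leakage-safe rolling mean."""
    out = df.copy()
    grouped = df.groupby(entity_col, sort=False)[target_col]
    for k in lags:
        # lag_k(t) = y_{t-k}, shifted within each entity only
        out[f"lag_{k}"] = grouped.shift(k)
    # Shift by one row before rolling so the window covers y_{t-1}..y_{t-w}
    # and never includes the current observation y_t.
    out[f"roll_mean_{window}"] = grouped.transform(
        lambda s: s.shift(1).rolling(window, min_periods=window).mean()
    )
    return out


def add_cyclical_hour(df, timestamp_col):
    """Hypothetical helper: sin/cos encoding of hour-of-day with period 24."""
    out = df.copy()
    hour = out[timestamp_col].dt.hour
    out["hour_sin"] = np.sin(2 * np.pi * hour / 24)
    out["hour_cos"] = np.cos(2 * np.pi * hour / 24)
    return out


# Two entities, four observations each, six hours apart.
stamps = ["2024-01-01 00:00", "2024-01-01 06:00",
          "2024-01-01 12:00", "2024-01-01 18:00"]
df = pd.DataFrame({
    "entity": ["A"] * 4 + ["B"] * 4,
    "timestamp": pd.to_datetime(stamps * 2),
    "y": [1.0, 2.0, 3.0, 4.0, 10.0, 20.0, 30.0, 40.0],
})

features = add_lag_and_rolling(df, "entity", "y", lags=[1], window=2)
features = add_cyclical_hour(features, "timestamp")
```

In this sketch the first row of each entity has a missing `lag_1`, and the first two rows have a missing `roll_mean_2`, because there is not yet enough history. How the library itself treats such warm-up rows is not stated here.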

Notes
  • Lags and rolling windows are expressed in index steps (rows) at the input frequency.
  • All time-dependent features are computed strictly within each entity.
  • Passthrough non-numeric columns are encoded using stable integer category codes for the values present in the provided DataFrame.
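
As a rough illustration of the last note, one way to derive integer category codes from the values present in a single DataFrame is shown below. The helper `encode_categories` is hypothetical, and the choice of sorted unique values as the category order is an assumption made for this sketch; the library's actual ordering rule is not documented here.

```python
import pandas as pd


def encode_categories(series):
    """Hypothetical helper: map non-numeric values to integer codes.

    Assumption: categories are the sorted unique non-null values of this
    series, so the mapping depends only on the values present in it.
    Missing values receive the code -1 (pandas convention).
    """
    categories = sorted(series.dropna().unique())
    codes = pd.Categorical(series, categories=categories).codes
    return pd.Series(codes, index=series.index, name=series.name)


region = pd.Series(["north", "south", "east", "north"], name="region")
region_codes = encode_categories(region)  # east=0, north=1, south=2

# A frame with a different set of values yields a different mapping.
other = pd.Series(["south", "west"], name="region")
other_codes = encode_categories(other)  # south=0, west=1
```

Under these assumptions the same label can map to different integers in different DataFrames ("south" is 2 in the first series and 0 in the second), so training and scoring data would need to be encoded from a consistent set of values.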

FeatureConfig dataclass

Configuration for panel time-series feature engineering.

FeatureEngineer

Transform panel time-series data into a model-ready (X, y, feature_names) triple.

transform(df, config)

Transform a panel DataFrame into (X, y, feature_names).