Feature Encoding

This section documents encoding utilities provided by eb-features for transforming non-numeric features into numeric representations.

These encoders are intentionally stateless and operate only on the provided dataset.

All content below is generated automatically from NumPy-style docstrings in the source code.

Encoding API

eb_features.panel.encoders

Encoding utilities for panel feature engineering.

This module provides small, stateless helpers to make feature matrices numeric.

Current scope

The panel feature engineering pipeline produces a feature frame that may contain a mixture of numeric and non-numeric columns (e.g., entity metadata strings). Many downstream estimators expect purely numeric arrays. The helper in this module encodes non-numeric columns using pandas categorical codes.

Important

Categorical codes are stable only within the provided DataFrame. Because this module is intentionally stateless (no fitted mapping is persisted), codes may differ between training and inference if category sets or ordering differ.
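The instability described above can be reproduced with plain pandas categorical codes. The following is a minimal sketch that does not call eb-features; the `train` and `infer` series are made-up illustration data.

```python
import pandas as pd

# Hypothetical data: the inference set is missing the "north" category.
train = pd.Series(["north", "south", "west"])
infer = pd.Series(["south", "west"])

# Codes are assigned independently per dataset, from each dataset's own categories.
train_codes = train.astype("category").cat.codes
infer_codes = infer.astype("category").cat.codes

# "south" is code 1 in the training data but code 0 at inference time.
south_in_train = int(train_codes[train == "south"].iloc[0])
south_in_infer = int(infer_codes[infer == "south"].iloc[0])
print(south_in_train, south_in_infer)
```

A model trained on the first encoding would silently misread the second, which is why the warning matters for any pipeline that encodes training and inference data separately.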

For production modeling pipelines that require consistent encodings across datasets, consider:
  • pre-encoding categoricals upstream (one-hot, target encoding, etc.), or
  • introducing a fitted encoder with persisted category mappings.
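One way to approach the fitted-encoder option is to record the category list once and reuse it for every later dataset. This is a sketch under assumptions, not part of eb-features: `fit_categories` and `apply_categories` are hypothetical helper names, and the data is invented.

```python
import pandas as pd

def fit_categories(series):
    """Hypothetical helper: record observed non-missing values in a fixed order."""
    return sorted(series.dropna().unique(), key=str)

def apply_categories(series, categories):
    """Hypothetical helper: encode using a previously recorded category list."""
    # Values absent from `categories` (and missing values) receive the code -1.
    return pd.Categorical(series, categories=categories).codes

train = pd.Series(["north", "south", "west"])
infer = pd.Series(["south", "east"])  # "east" was never seen during fitting

mapping = fit_categories(train)       # persist this list alongside the model
train_codes = apply_categories(train, mapping)
infer_codes = apply_categories(infer, mapping)
print(list(train_codes), list(infer_codes))
```

With a persisted mapping, "south" keeps the same code in both datasets. Note that in this sketch an unseen category and a missing value both map to -1, so a pipeline that needs to tell them apart would have to handle that separately.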

encode_non_numeric_as_category_codes(feature_frame, *, columns=None, dtype='int32')

Encode non-numeric feature columns as categorical codes.

Parameters:

  • feature_frame (DataFrame, required): Feature DataFrame whose columns will be encoded as needed.
  • columns (Iterable[str] | None, default None): Columns to consider for encoding. If None, all columns are considered.
  • dtype (str, default "int32"): Output dtype for encoded columns. Booleans are converted to 0/1 and cast to this dtype; categorical codes are integers cast to this dtype.

Returns:

DataFrame: Copy of feature_frame where:
  • boolean columns are converted to {0, 1}
  • non-numeric, non-boolean columns are replaced by categorical integer codes

Notes
  • Missing values in non-numeric columns are assigned the code -1 by pandas.
  • Category ordering is made deterministic by sorting observed values by their string representation before assigning codes.
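The documented behavior can be approximated with a short stand-in, shown here only to make the rules concrete. `encode_sketch` is a hypothetical re-implementation written from the description on this page, not the library's source, and the example frame is invented.

```python
import pandas as pd
from pandas.api.types import is_bool_dtype, is_numeric_dtype

def encode_sketch(frame, columns=None, dtype="int32"):
    """Hypothetical stand-in that mirrors the behavior documented above."""
    out = frame.copy()  # the input frame is left untouched
    selected = list(out.columns) if columns is None else list(columns)
    for col in selected:
        series = out[col]
        if is_bool_dtype(series):
            # Booleans become 0/1 in the requested dtype.
            out[col] = series.astype(dtype)
        elif not is_numeric_dtype(series):
            # Sort observed values by string representation for a deterministic order.
            categories = sorted(series.dropna().unique(), key=str)
            codes = pd.Categorical(series, categories=categories).codes
            out[col] = codes.astype(dtype)  # missing values carry the code -1
    return out

frame = pd.DataFrame(
    {
        "region": ["west", None, "east", "west"],
        "is_promo": [True, False, True, False],
        "price": [1.5, 2.0, 2.5, 3.0],
    }
)
encoded = encode_sketch(frame)
print(encoded)
```

Here "east" sorts before "west", so it receives code 0, the missing region receives -1, and the numeric `price` column passes through unchanged.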