Utils API
The utils module provides utilities for entity key generation and transformation.
Functions
mlforge.utils.surrogate_key
Generate a surrogate key by hashing column values.
Concatenates column values with null handling, then produces a hash. Useful for creating stable identifiers from natural keys.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `*columns` | `str` | Column names to include in the hash | `()` |
Returns:
| Type | Description |
|---|---|
| `Expr` | Polars expression that produces a string hash |
Raises:
| Type | Description |
|---|---|
| `ValueError` | If no columns are provided |
Example
    df.with_columns(
        surrogate_key("first_name", "last_name", "dob").alias("user_id")
    )
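
To illustrate why null handling matters when column values are concatenated before hashing, here is a minimal plain-Python sketch of the idea. It is not mlforge's implementation: the helper `surrogate_key_for_row`, the `NULL_TOKEN` placeholder, the `SEPARATOR` delimiter, and the choice of SHA-256 are all assumptions made for this example, and it works on a single row dictionary rather than on a Polars expression.

```python
import hashlib

NULL_TOKEN = "__NULL__"  # hypothetical placeholder substituted for missing values
SEPARATOR = "||"         # hypothetical delimiter placed between column values


def surrogate_key_for_row(row: dict, *columns: str) -> str:
    """Hash the named column values of one row into a hex string."""
    if not columns:
        raise ValueError("at least one column is required")
    # Replace missing values with a fixed token so that position is preserved.
    parts = [NULL_TOKEN if row.get(c) is None else str(row[c]) for c in columns]
    joined = SEPARATOR.join(parts)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


# Two rows that differ only in which column is null.
row_a = {"first_name": "Ada", "last_name": None, "dob": "1815-12-10"}
row_b = {"first_name": None, "last_name": "Ada", "dob": "1815-12-10"}

key_a = surrogate_key_for_row(row_a, "first_name", "last_name", "dob")
key_b = surrogate_key_for_row(row_b, "first_name", "last_name", "dob")

# The keys differ because the null token and separator keep each value in its slot.
print(key_a != key_b)
```

Without a placeholder and a separator, both rows would reduce to the same concatenated string and collide on the same key.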
Source code in src/mlforge/utils.py
mlforge.utils.entity_key
Create a reusable entity key transformation function.
Returns a function that adds a surrogate key column to a DataFrame by hashing the specified source columns. Useful for defining entity relationships and passing to get_training_data().
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `*columns` | `str` | Source column names to hash | `()` |
| `alias` | `str` | Name for the generated surrogate key column | required |
Returns:
| Type | Description |
|---|---|
| `EntityKeyTransform` | Transform function compatible with `df.pipe()` and `get_training_data()` |
Raises:
| Type | Description |
|---|---|
| `ValueError` | If no columns are provided or `alias` is empty |
Example
    # Define reusable transform
    with_user_id = entity_key("first", "last", "dob", alias="user_id")

    # Apply to DataFrame
    users_df = df.pipe(with_user_id)

    # Use in feature retrieval
    training_data = get_training_data(
        features=["user_age"],
        entity_df=transactions,
        entities=[with_user_id],
    )
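
The factory pattern described for `entity_key` (validate the arguments, return a reusable function, attach metadata about the source columns and alias) can be sketched in plain Python. This is an illustration under stated assumptions, not mlforge's code: `make_entity_key` is a hypothetical stand-in, it operates on a list of row dictionaries instead of a DataFrame, and the separator, null token, and SHA-256 hash are arbitrary choices for the example.

```python
import hashlib
from typing import Callable


def make_entity_key(*columns: str, alias: str) -> Callable[[list], list]:
    """Hypothetical stand-in for entity_key(), working on lists of dicts."""
    if not columns:
        raise ValueError("at least one source column is required")
    if not alias:
        raise ValueError("alias must be a non-empty string")

    def transform(rows: list) -> list:
        out = []
        for row in rows:
            # Join values with a separator, substituting a token for nulls.
            joined = "||".join(
                "__NULL__" if row.get(c) is None else str(row[c]) for c in columns
            )
            key = hashlib.sha256(joined.encode("utf-8")).hexdigest()
            # Build a new row so the input is left unmodified.
            out.append({**row, alias: key})
        return out

    # Attach metadata so callers can inspect what the transform does.
    transform._entity_key_columns = columns
    transform._entity_key_alias = alias
    return transform


with_user_id = make_entity_key("first", "last", alias="user_id")
source_rows = [{"first": "Ada", "last": "Lovelace"}]
users = with_user_id(source_rows)

print(with_user_id._entity_key_columns, with_user_id._entity_key_alias)
```

Because the transform takes the data as its only argument, the same object can be defined once and applied wherever that entity appears, which is the property that makes the real `entity_key` result usable with `df.pipe()`.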
Source code in src/mlforge/utils.py
Protocols
mlforge.utils.EntityKeyTransform
Bases: Protocol
Protocol for entity key transformation functions.
Defines the interface for functions created by entity_key() that add surrogate keys to DataFrames. Includes metadata attributes for column tracking.
Attributes:
| Name | Type | Description |
|---|---|---|
| `_entity_key_columns` | `tuple[str, ...]` | Source columns used to generate the key |
| `_entity_key_alias` | `str` | Name of the generated key column |
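
As a rough picture of what a callable protocol with metadata attributes looks like, the sketch below defines a comparable protocol using `typing.Protocol`. The class name `KeyTransformLike`, the `Any`-typed call signature, and the `describe` helper are assumptions made for this example; the actual definition of `EntityKeyTransform` may differ.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class KeyTransformLike(Protocol):
    """Hypothetical protocol mirroring the attributes listed above."""

    _entity_key_columns: tuple[str, ...]
    _entity_key_alias: str

    def __call__(self, df: Any) -> Any: ...


def describe(transform: KeyTransformLike) -> str:
    """Summarize a transform using only its metadata attributes."""
    columns = ", ".join(transform._entity_key_columns)
    return f"{transform._entity_key_alias} <- {columns}"


def add_user_id(df: Any) -> Any:
    return df  # placeholder body; a real transform would add the key column


# Functions accept arbitrary attributes, which is how the metadata travels.
add_user_id._entity_key_columns = ("first", "last", "dob")
add_user_id._entity_key_alias = "user_id"


def plain(df: Any) -> Any:
    return df  # no metadata attached


print(describe(add_user_id))
print(isinstance(add_user_id, KeyTransformLike), isinstance(plain, KeyTransformLike))
```

The runtime check only confirms that the members exist, not that they have the annotated types, so it is best treated as a quick sanity check rather than validation.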