Retrieval API
The retrieval module provides functions for retrieving features for both training (offline) and inference (online) use cases.
Functions
Offline Retrieval (Training)
mlforge.retrieval.get_training_data

    get_training_data(
        features: list[FeatureSpec],
        entity_df: DataFrame,
        store: str | Path | Store = "./feature_store",
        entities: list[EntityKeyTransform] | None = None,
        timestamp: str | None = None,
    ) -> pl.DataFrame

Retrieve features and join to an entity DataFrame.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `features` | `list[FeatureSpec]` | Feature specifications. Each item is either `"feature_name"` (uses the latest version) or `("feature_name", "1.0.0")` (uses a specific version). | required |
| `entity_df` | `DataFrame` | DataFrame with entity keys to join on | required |
| `store` | `str \| Path \| Store` | Path to feature store or Store instance | `'./feature_store'` |
| `entities` | `list[EntityKeyTransform] \| None` | Entity key transforms to apply to entity_df before joining | `None` |
| `timestamp` | `str \| None` | Column in entity_df to use for point-in-time joins. If provided, features with timestamps will be asof-joined. | `None` |
Returns:
| Type | Description |
|---|---|
| `DataFrame` | entity_df with feature columns joined |
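The two documented forms of a feature specification (a bare name, or a name paired with a version) can be told apart with a simple type check. The sketch below is only an illustration of that idea: `FeatureSpecSketch` and `normalize_spec` are hypothetical names, and representing "latest version" as `None` is an assumption, not mlforge's internal behavior.

```python
from typing import Optional, Union

# Hypothetical alias mirroring the two documented forms; the real
# FeatureSpec type in mlforge may be defined differently.
FeatureSpecSketch = Union[str, tuple[str, str]]


def normalize_spec(spec: FeatureSpecSketch) -> tuple[str, Optional[str]]:
    """Return (name, version); version is None when the latest is wanted."""
    if isinstance(spec, str):
        # Bare name: no version pinned
        return spec, None
    # (name, version) pair: version is pinned explicitly
    name, version = spec
    return name, version


specs = ["user_spend_mean_30d", ("merchant_features", "1.0.0")]
normalized = [normalize_spec(s) for s in specs]
```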
Example

    import polars as pl

    from mlforge import get_training_data
    from transactions.entities import with_user_id

    transactions = pl.read_parquet("data/transactions.parquet")

    # Point-in-time correct training data with mixed versions
    training_df = get_training_data(
        features=[
            "user_spend_mean_30d",  # latest version
            ("merchant_features", "1.0.0"),  # pinned version
        ],
        entity_df=transactions,
        entities=[with_user_id],
        timestamp="trans_date_trans_time",
    )

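To make the point-in-time behavior of the `timestamp` parameter concrete, here is a minimal standard-library sketch of an as-of lookup. It assumes the usual backward as-of semantics (each row receives the latest feature value recorded at or before its own timestamp); the source does not state which strategy mlforge uses. `asof_lookup` and the sample data are hypothetical.

```python
from bisect import bisect_right


def asof_lookup(history, ts):
    """Return the latest value whose timestamp is <= ts, or None.

    history: list of (timestamp, value) pairs sorted by timestamp.
    """
    timestamps = [t for t, _ in history]
    # Index of the first timestamp strictly greater than ts
    i = bisect_right(timestamps, ts)
    return history[i - 1][1] if i > 0 else None


# Hypothetical feature history per user: (timestamp, feature value).
# ISO-8601 date strings sort correctly as plain strings.
feature_history = {
    "user_123": [("2024-01-01", 10.0), ("2024-01-15", 25.0), ("2024-02-01", 40.0)],
}

# Hypothetical entity rows: (user_id, transaction timestamp)
entity_rows = [
    ("user_123", "2023-12-31"),  # before any feature value exists
    ("user_123", "2024-01-20"),  # between the second and third values
    ("user_123", "2024-02-01"),  # exactly on the third value
    ("user_999", "2024-01-20"),  # unknown user
]

joined = [
    (user_id, ts, asof_lookup(feature_history.get(user_id, []), ts))
    for user_id, ts in entity_rows
]
```

Each row only sees values that existed at its own timestamp, which is what keeps future information out of training data.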
Source code in src/mlforge/retrieval.py
Online Retrieval (Inference)
mlforge.retrieval.get_online_features

    get_online_features(
        features: list[str],
        entity_df: DataFrame,
        store: OnlineStore,
        entities: list[EntityKeyTransform] | None = None,
    ) -> pl.DataFrame

Retrieve features from an online store for inference.
Unlike get_training_data(), this function:

- Always returns the latest values (no point-in-time joins)
- Does not support versioning (online stores hold only the latest version)
- Uses direct key lookups instead of DataFrame joins
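The direct-key-lookup pattern can be illustrated with a plain dictionary standing in for the online store. This is a conceptual sketch only: `latest_values` and `lookup_features` are hypothetical, and a real store such as RedisStore will have its own key format and client calls. It also shows an entity with no stored value coming back as `None`.

```python
# Hypothetical stand-in for an online store: one latest value per
# (feature name, entity key). Real stores such as RedisStore will differ.
latest_values = {
    ("user_spend", "user_123"): 42.5,
}


def lookup_features(feature_names, entity_keys):
    """Build one row per entity key using direct key lookups."""
    rows = []
    for key in entity_keys:
        row = {"user_id": key}
        for name in feature_names:
            # dict.get returns None when the entity has no stored value
            row[name] = latest_values.get((name, key))
        rows.append(row)
    return rows


rows = lookup_features(["user_spend"], ["user_123", "user_456"])
```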
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `features` | `list[str]` | List of feature names to retrieve | required |
| `entity_df` | `DataFrame` | DataFrame with entity keys (e.g., inference requests) | required |
| `store` | `OnlineStore` | Online store instance (e.g., RedisStore) | required |
| `entities` | `list[EntityKeyTransform] \| None` | Optional entity key transforms to apply before lookup | `None` |
Returns:
| Type | Description |
|---|---|
| `DataFrame` | entity_df with feature columns joined (None for missing entities) |
Example

    import polars as pl

    from mlforge import get_online_features, RedisStore
    from myproject.entities import with_user_id

    store = RedisStore(host="localhost")
    request_df = pl.DataFrame({"user_id": ["user_123", "user_456"]})

    features_df = get_online_features(
        features=["user_spend"],
        entity_df=request_df,
        entities=[with_user_id],
        store=store,
    )