Manifest API
The manifest module provides dataclasses and utilities for tracking feature metadata.
Overview
When features are built, mlforge automatically captures metadata about each feature, including:
- Feature configuration (keys, timestamp, interval)
- Storage details (path, row count)
- Column information (names, types, aggregations)
- Build timestamp and source data
This metadata is stored in .meta.json files alongside the feature parquet files and can be queried using the CLI or programmatically.
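As a rough illustration of the storage layout described above, the sketch below writes a small stand-in .meta.json file into a temporary directory and loads it back using only the standard library. The helper name load_meta_files, the file location, and the field values are assumptions made for this example; real metadata files written by mlforge contain more fields.

```python
import json
import tempfile
from pathlib import Path


def load_meta_files(root: Path) -> dict[str, dict]:
    """Load every *.meta.json file under root, keyed by its "name" field."""
    found = {}
    for meta_path in sorted(root.rglob("*.meta.json")):
        data = json.loads(meta_path.read_text())
        found[data["name"]] = data
    return found


with tempfile.TemporaryDirectory() as tmp:
    store_root = Path(tmp)
    # Hypothetical metadata file, reduced to a few fields for illustration.
    sample = {"name": "user_spend", "row_count": 1200, "keys": ["user_id"]}
    (store_root / "user_spend.meta.json").write_text(json.dumps(sample))

    loaded = load_meta_files(store_root)
    summary = {name: meta["row_count"] for name, meta in loaded.items()}
    print(summary)
```

For everyday use, the store methods shown under Usage Examples are the intended entry point; reading the JSON directly is mainly useful for ad hoc inspection.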
Dataclasses
mlforge.manifest.ColumnMetadata
dataclass
Metadata for a single column in a feature.
For columns derived from Rolling metrics, captures the source column, aggregation type, and window size. For other columns, captures dtype. For base columns, captures validator information.
Attributes:
| Name | Type | Description |
|---|---|---|
| name | str | Column name in the output |
| dtype | str \| None | Data type string (e.g., "Int64", "Float64") |
| input | str \| None | Source column name for aggregations |
| agg | str \| None | Aggregation type (count, mean, sum, etc.) |
| window | str \| None | Time window for rolling aggregations (e.g., "7d") |
| validators | list[dict[str, Any]] \| None | List of validator specifications applied to this column |
Source code in src/mlforge/manifest.py
from_dict
classmethod
Create from dictionary.
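To make the attribute list concrete, here is a minimal stand-in dataclass that mirrors the documented ColumnMetadata fields and builds instances from plain dictionaries. ColumnMetadataSketch is a hypothetical name, and defaulting missing keys to None is an assumption of this sketch rather than confirmed mlforge behaviour.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ColumnMetadataSketch:
    """Stand-in mirroring the documented ColumnMetadata attributes."""

    name: str
    dtype: Optional[str] = None
    input: Optional[str] = None
    agg: Optional[str] = None
    window: Optional[str] = None
    validators: Optional[list[dict[str, Any]]] = None

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "ColumnMetadataSketch":
        # Assumption: optional keys that are absent become None.
        return cls(
            name=data["name"],
            dtype=data.get("dtype"),
            input=data.get("input"),
            agg=data.get("agg"),
            window=data.get("window"),
            validators=data.get("validators"),
        )


# A rolling metric column carries input, agg, and window.
rolling = ColumnMetadataSketch.from_dict(
    {"name": "amt__sum__7d", "dtype": "Float64", "input": "amt", "agg": "sum", "window": "7d"}
)
# A base column only carries a dtype here.
base = ColumnMetadataSketch.from_dict({"name": "merchant_id", "dtype": "Utf8"})
print(rolling)
print(base)
```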
mlforge.manifest.FeatureMetadata
dataclass
Metadata for a single materialized feature.
Captures all information about a feature from both its definition and the results of building it.
Attributes:
| Name | Type | Description |
|---|---|---|
| name | str | Feature identifier |
| path | str | Storage path for the parquet file |
| entity | str | Primary entity key (first key in keys list) |
| keys | list[str] | All entity key columns |
| source | str | Source data file path |
| row_count | int | Number of rows in materialized feature |
| updated_at | str | ISO 8601 timestamp of last build (renamed from last_updated in v0.5.0) |
| version | str | Semantic version string (v0.5.0) |
| created_at | str | ISO 8601 timestamp when version was first created (v0.5.0) |
| content_hash | str | Hash of data.parquet for integrity verification (v0.5.0) |
| schema_hash | str | Hash of column names + dtypes for change detection (v0.5.0) |
| config_hash | str | Hash of keys, timestamp, interval, metrics config (v0.5.0) |
| source_hash | str | Hash of source data file for reproducibility verification (v0.5.0) |
| timestamp | str \| None | Timestamp column for temporal features |
| interval | str \| None | Time interval for rolling aggregations |
| columns | list[ColumnMetadata] | Base column metadata (from feature function before metrics) |
| features | list[ColumnMetadata] | Generated feature column metadata (from metrics) |
| tags | list[str] | Feature grouping tags |
| description | str \| None | Human-readable description |
| change_summary | dict[str, Any] \| None | Documents why version was bumped (v0.5.0) |
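The hash fields exist so that changes can be detected without comparing full files. The sketch below shows one way a schema hash over column names and dtypes could work. The function name schema_hash_sketch, the choice of SHA-256, and the decision to ignore column order are all assumptions of this example; the algorithm mlforge actually uses is not described here.

```python
import hashlib
import json


def schema_hash_sketch(schema: dict[str, str]) -> str:
    """Hash column names and dtypes; sorted so column order does not matter."""
    canonical = json.dumps(sorted(schema.items()), separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


before = {"merchant_id": "Utf8", "amt__sum__7d": "Float64"}
reordered = {"amt__sum__7d": "Float64", "merchant_id": "Utf8"}
retyped = {"merchant_id": "Utf8", "amt__sum__7d": "Float32"}

# Same columns and dtypes in a different order hash identically.
same = schema_hash_sketch(before) == schema_hash_sketch(reordered)
# Changing a dtype produces a different hash.
changed = schema_hash_sketch(before) != schema_hash_sketch(retyped)
print(same, changed)
```

Comparing a stored hash against a freshly computed one is enough to tell whether a rebuild changed the schema, which is the change-detection role the table assigns to schema_hash.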
from_dict
classmethod
Create from dictionary.
to_dict
Convert to dictionary for JSON serialization.
mlforge.manifest.Manifest
dataclass
Consolidated manifest containing all feature metadata.
Aggregates individual feature metadata into a single view. Generated on demand from per-feature .meta.json files.
Attributes:
| Name | Type | Description |
|---|---|---|
| version | str | Schema version for compatibility |
| generated_at | str | ISO 8601 timestamp when manifest was generated |
| features | dict[str, FeatureMetadata] | Mapping of feature names to their metadata |
add_feature
from_dict
classmethod
Create from dictionary.
get_feature
remove_feature
to_dict
Convert to dictionary for JSON serialization.
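The methods add_feature, get_feature, and remove_feature are listed above without descriptions. The following stand-in shows one plausible reading of them, given that features maps feature names to metadata. ManifestSketch is a hypothetical class that stores plain dictionaries instead of FeatureMetadata, and the replace-on-add and silent-remove behaviour are assumptions of this sketch.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Optional


@dataclass
class ManifestSketch:
    """Stand-in for the documented Manifest attributes; not mlforge's code."""

    version: str = "1.0"
    generated_at: str = ""
    features: dict[str, dict[str, Any]] = field(default_factory=dict)

    def add_feature(self, metadata: dict[str, Any]) -> None:
        # Assumed behaviour: key by feature name, replacing any earlier entry.
        self.features[metadata["name"]] = metadata

    def get_feature(self, name: str) -> Optional[dict[str, Any]]:
        # Assumed behaviour: None when the feature is unknown.
        return self.features.get(name)

    def remove_feature(self, name: str) -> None:
        # Assumed behaviour: removing an unknown feature is a no-op.
        self.features.pop(name, None)

    def to_dict(self) -> dict[str, Any]:
        return {
            "version": self.version,
            "generated_at": self.generated_at,
            "features": dict(self.features),
        }


manifest = ManifestSketch(
    generated_at=datetime.now(timezone.utc).isoformat().replace("+00:00", "Z")
)
manifest.add_feature({"name": "user_spend", "row_count": 1200})
manifest.add_feature({"name": "merchant_spend", "row_count": 15482})
manifest.remove_feature("user_spend")
print(sorted(manifest.to_dict()["features"]))
```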
Functions
mlforge.manifest.derive_column_metadata
derive_column_metadata(
    feature: Feature,
    schema: dict[str, str],
    base_schema: dict[str, str] | None = None,
    schema_source: str = "polars",
) -> tuple[list[ColumnMetadata], list[ColumnMetadata]]
Derive column metadata from feature definition and result schema.
Separates base columns (keys, timestamp, other non-metric columns) from generated feature columns (rolling metrics). Uses base_schema when available for accurate separation, and falls back to regex parsing of column names for backward compatibility.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| feature | Feature | The Feature definition object | required |
| schema | dict[str, str] | Dictionary mapping column names to dtype strings (final schema after metrics) | required |
| base_schema | dict[str, str] \| None | Dictionary mapping column names to dtype strings (before metrics). When provided, enables accurate column separation. Defaults to None. | None |
| schema_source | str | Engine source for type normalization ("polars" or "duckdb") | 'polars' |
Returns:
| Type | Description |
|---|---|
| tuple[list[ColumnMetadata], list[ColumnMetadata]] | Tuple of (base_columns, feature_columns), where base_columns holds keys, timestamp, and other non-metric columns with validators, and feature_columns holds rolling metric columns with aggregation metadata |
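The separation strategy described for derive_column_metadata (prefer base_schema, otherwise fall back to name parsing) can be sketched as follows. This is not the library's implementation: split_columns, the ROLLING_NAME pattern, and the plain-dictionary output are hypothetical, and the fallback pattern assumes three-part names such as amt__sum__7d.

```python
import re
from typing import Optional

# Hypothetical fallback pattern: <input>__<agg>__<window>, e.g. "amt__sum__7d".
ROLLING_NAME = re.compile(r"^(?P<input>.+)__(?P<agg>[a-z]+)__(?P<window>\d+[a-z]+)$")


def split_columns(
    schema: dict[str, str],
    base_schema: Optional[dict[str, str]] = None,
) -> tuple[list[dict[str, str]], list[dict[str, str]]]:
    """Split the final schema into (base_columns, feature_columns)."""
    base, generated = [], []
    for name, dtype in schema.items():
        if base_schema is not None:
            # Accurate path: a column is a base column if it existed before metrics.
            is_base = name in base_schema
        else:
            # Fallback path: anything that does not look like a rolling name is base.
            is_base = ROLLING_NAME.match(name) is None
        (base if is_base else generated).append({"name": name, "dtype": dtype})
    return base, generated


schema = {"merchant_id": "Utf8", "transaction_date": "Date", "amt__sum__7d": "Float64"}
base_schema = {"merchant_id": "Utf8", "transaction_date": "Date"}

with_base = split_columns(schema, base_schema)
fallback = split_columns(schema)
print(with_base)
```

The name-based fallback can misclassify a base column whose name happens to match the pattern, which is one reason supplying base_schema gives a more accurate separation.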
mlforge.manifest.write_metadata_file
Write feature metadata to a JSON file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | Path | Path to write the .meta.json file | required |
| metadata | FeatureMetadata | FeatureMetadata to serialize | required |
mlforge.manifest.read_metadata_file
Read feature metadata from a JSON file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | Path | Path to the .meta.json file | required |
Returns:
| Type | Description |
|---|---|
| FeatureMetadata \| None | FeatureMetadata if file exists and is valid, None otherwise |
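The "None if missing or invalid" contract means callers can branch on the result instead of catching exceptions. Below is a minimal standard-library sketch of that pattern. The helper names write_meta_sketch and read_meta_sketch are hypothetical, and treating "valid" as "parses as JSON" is an assumption; the real function may apply stricter checks.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Optional


def write_meta_sketch(path: Path, metadata: dict[str, Any]) -> None:
    """Write metadata as indented JSON, creating parent directories if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(metadata, indent=2))


def read_meta_sketch(path: Path) -> Optional[dict[str, Any]]:
    """Return parsed metadata, or None if the file is missing or not valid JSON."""
    try:
        return json.loads(path.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        return None


with tempfile.TemporaryDirectory() as tmp:
    meta_dir = Path(tmp) / "_metadata"
    good = meta_dir / "user_spend.meta.json"
    write_meta_sketch(good, {"name": "user_spend", "row_count": 1200})
    corrupt = meta_dir / "broken.meta.json"
    corrupt.write_text("{not json")

    round_trip = read_meta_sketch(good)
    missing = read_meta_sketch(meta_dir / "absent.meta.json")
    invalid = read_meta_sketch(corrupt)
    print(round_trip, missing, invalid)
```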
mlforge.manifest.write_manifest_file
Write consolidated manifest to a JSON file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | Path \| str | Path to write the manifest.json file | required |
| manifest | Manifest | Manifest to serialize | required |
mlforge.manifest.read_manifest_file
Read consolidated manifest from a JSON file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | Path | Path to the manifest.json file | required |
Returns:
| Type | Description |
|---|---|
| Manifest \| None | Manifest if file exists and is valid, None otherwise |
Usage Examples
Reading Feature Metadata
from mlforge import LocalStore

store = LocalStore("./feature_store")

# Read metadata for a specific feature
metadata = store.read_metadata("user_spend")
if metadata:
    print(f"Feature: {metadata.name}")
    print(f"Rows: {metadata.row_count:,}")
    # updated_at replaced last_updated in v0.5.0
    print(f"Last updated: {metadata.updated_at}")
    print(f"Columns: {len(metadata.columns)}")
Listing All Metadata
from mlforge import LocalStore

store = LocalStore("./feature_store")

# Get all feature metadata
all_metadata = store.list_metadata()
for meta in all_metadata:
    print(f"{meta.name}: {meta.row_count:,} rows")
Creating a Consolidated Manifest
from datetime import datetime, timezone

from mlforge import LocalStore
from mlforge.manifest import Manifest, write_manifest_file

store = LocalStore("./feature_store")

# Create manifest from all features
manifest = Manifest(
    generated_at=datetime.now(timezone.utc).isoformat().replace("+00:00", "Z")
)
for meta in store.list_metadata():
    manifest.add_feature(meta)

# Write to file
write_manifest_file("manifest.json", manifest)
Inspecting Column Metadata
from mlforge import LocalStore

store = LocalStore("./feature_store")

metadata = store.read_metadata("user_spend")
if metadata:
    # Base columns are in `columns`; rolling metric columns are in `features`
    for col in [*metadata.columns, *metadata.features]:
        if col.agg:
            # Rolling aggregation column
            print(f"{col.name}: {col.agg}({col.input}) over {col.window}")
        else:
            # Regular column
            print(f"{col.name}: {col.dtype}")
Metadata Schema
Feature Metadata JSON
Per-feature metadata is stored in _metadata/<feature_name>.meta.json:
{
  "name": "merchant_spend",
  "path": "merchant_spend.parquet",
  "entity": "merchant_id",
  "keys": ["merchant_id"],
  "source": "data/transactions.parquet",
  "row_count": 15482,
  "updated_at": "2024-01-16T08:30:00Z",
  "timestamp": "transaction_date",
  "interval": "1d",
  "columns": [
    {"name": "merchant_id", "dtype": "Utf8"},
    {"name": "transaction_date", "dtype": "Date"},
    {
      "name": "amt__count__7d",
      "dtype": "UInt32",
      "input": "amt",
      "agg": "count",
      "window": "7d"
    },
    {
      "name": "amt__sum__7d",
      "dtype": "Float64",
      "input": "amt",
      "agg": "sum",
      "window": "7d"
    }
  ],
  "tags": ["merchants"],
  "description": "Merchant spend aggregations"
}
Consolidated Manifest JSON
The manifest consolidates all feature metadata into a single file:
{
  "version": "1.0",
  "generated_at": "2024-01-16T08:30:00Z",
  "features": {
    "merchant_spend": {
      "name": "merchant_spend",
      "path": "merchant_spend.parquet",
      ...
    },
    "user_spend": {
      "name": "user_spend",
      "path": "user_spend.parquet",
      ...
    }
  }
}
Column Naming Convention
For features with Rolling metrics, generated column names join their components with double underscores.
Examples:
- user_spend__amt__sum__1d__7d - Sum of amt over 7-day window with 1-day interval
- user_spend__amt__count__1d__30d - Count of amt over 30-day window with 1-day interval
The derive_column_metadata() function parses these column names to extract:
- input: Source column name (amt)
- agg: Aggregation type (sum, count, etc.)
- window: Time window (7d, 30d, etc.)
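As an illustration of this kind of name parsing, the sketch below splits a column name on double underscores, assuming the five-part layout seen in the examples (a leading prefix, then input, aggregation, interval, and window). The function parse_rolling_column and its validation rule are hypothetical and are not taken from derive_column_metadata itself.

```python
import re
from typing import Optional

# A duration token such as "1d", "7d", or "30d".
DURATION = re.compile(r"\d+[a-z]+")


def parse_rolling_column(name: str) -> Optional[dict[str, str]]:
    """Parse '<prefix>__<input>__<agg>__<interval>__<window>' or return None."""
    parts = name.rsplit("__", 4)
    if len(parts) != 5:
        return None
    _prefix, input_col, agg, interval, window = parts
    # Reject names whose trailing parts are not duration tokens.
    if not (DURATION.fullmatch(interval) and DURATION.fullmatch(window)):
        return None
    return {"input": input_col, "agg": agg, "interval": interval, "window": window}


parsed = parse_rolling_column("user_spend__amt__sum__1d__7d")
not_rolling = parse_rolling_column("merchant_id")
print(parsed)
print(not_rolling)
```

Splitting from the right keeps any double underscores inside the leading prefix intact, at the cost of assuming the input column name itself contains none.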
See Also
- CLI Reference - inspect and manifest commands
- Building Features - How metadata is generated during builds
- Store API - Store metadata methods