CLI Reference¶
mlforge provides a command-line interface for building, inspecting, validating, syncing, and listing features.
Installation Verification¶
Check that the CLI is installed:
Commands¶
build¶
Materialize features to offline storage.
Arguments¶
- TARGET (optional): Path to definitions file. Auto-discovers definitions.py if not specified.
Options¶
- --features NAMES: Comma-separated list of feature names to build. Defaults to all features.
- --tags TAGS: Comma-separated list of feature tags to build. Mutually exclusive with --features.
- --force, -f: Overwrite existing features. Defaults to False.
- --online: Build to online store (e.g., Redis) instead of offline store. Defaults to False.
- --no-preview: Disable feature preview output. Defaults to False (preview enabled).
- --preview-rows N: Number of preview rows to display. Defaults to 5.
- --verbose, -v: Enable debug logging. Defaults to False.
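The selection rules for --features and --tags can be pictured with a small Python sketch. This is only an illustration of comma-separated parsing and mutual exclusivity: the `select_features` helper and the `registry` mapping are hypothetical and are not part of mlforge, and the real CLI may handle unknown names differently.

```python
def select_features(registry, features=None, tags=None):
    """Return feature names selected by --features or --tags style values.

    registry maps feature name -> set of tags (a hypothetical structure).
    """
    if features and tags:
        raise ValueError("--features and --tags are mutually exclusive")
    if features:
        wanted = [name.strip() for name in features.split(",") if name.strip()]
        # Assumption: unknown names are silently dropped in this sketch.
        return [name for name in wanted if name in registry]
    if tags:
        wanted_tags = {tag.strip() for tag in tags.split(",") if tag.strip()}
        return [name for name, feature_tags in registry.items() if feature_tags & wanted_tags]
    return list(registry)  # default: all features


registry = {
    "user_total_spend": {"user_metrics"},
    "user_spend_mean_30d": {"user_metrics"},
    "merchant_total_spend": set(),
}
by_name = select_features(registry, features="user_total_spend, merchant_total_spend")
by_tag = select_features(registry, tags="user_metrics")
everything = select_features(registry)

try:
    select_features(registry, features="user_total_spend", tags="user_metrics")
    conflict_rejected = False
except ValueError:
    conflict_rejected = True
```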
Examples¶
Build all features:
Build all features (auto-discovers definitions.py):
Build specific features:
Build features by tag:
Force rebuild all features:
Build with verbose logging:
Build without preview:
Custom preview size:
Build to online store (Redis):
This extracts the latest value per entity and writes to the configured online store.
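As a rough illustration of "latest value per entity", the sketch below reduces timestamped rows to one row per entity in plain Python. The row layout and the `latest_per_entity` helper are hypothetical and not part of mlforge; the actual online-store write path may differ.

```python
from datetime import date


def latest_per_entity(rows, key, timestamp):
    """Keep only the most recent row for each entity key."""
    latest = {}
    for row in rows:
        current = latest.get(row[key])
        if current is None or row[timestamp] > current[timestamp]:
            latest[row[key]] = row
    return latest


rows = [
    {"user_id": "u1", "transaction_date": date(2024, 1, 14), "total_spend": 100.0},
    {"user_id": "u1", "transaction_date": date(2024, 1, 16), "total_spend": 150.0},
    {"user_id": "u2", "transaction_date": date(2024, 1, 15), "total_spend": 250.0},
]
# One row per user_id survives: the one with the newest transaction_date.
online_rows = latest_per_entity(rows, key="user_id", timestamp="transaction_date")
```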
Output¶
The command displays:
- Progress messages for each feature
- Preview of materialized data (unless --no-preview)
- Summary of built features
- Storage paths for each feature
Example output:
Materializing user_total_spend
┌─────────┬─────────────┐
│ user_id │ total_spend │
├─────────┼─────────────┤
│ u1      │ 150.0       │
│ u2      │ 250.0       │
│ u3      │ 100.0       │
└─────────┴─────────────┘
Materializing user_avg_spend
┌─────────┬───────────┐
│ user_id │ avg_spend │
├─────────┼───────────┤
│ u1      │ 50.0      │
│ u2      │ 83.3      │
│ u3      │ 100.0     │
└─────────┴───────────┘
Built features:
user_total_spend -> feature_store/user_total_spend.parquet
user_avg_spend -> feature_store/user_avg_spend.parquet
Built 2 features
Error Handling¶
DefinitionsLoadError: If the definitions file cannot be loaded:
FeatureMaterializationError: If a feature function fails:
$ mlforge build definitions.py
Error: Feature 'user_age' failed: Feature function returned None
Hint: Make sure your feature function returns a DataFrame.
inspect¶
Display detailed metadata for a specific feature.
Arguments¶
- FEATURE_NAME (required): Name of the feature to inspect
- TARGET (optional): Path to definitions file. Auto-discovers definitions.py if not specified.
Examples¶
Inspect a specific feature:
Inspect with custom definitions file:
Output¶
Displays detailed feature metadata including:
- Feature configuration (entity, keys, timestamp, interval)
- Storage details (path, source, row count)
- Column information with types and aggregations
- Tags and description
- Last build timestamp
Example output:
┌─ Feature: user_spend ──────────────────────────────────────┐
│ Total spend features for users                             │
│                                                            │
│ Path: ./feature_store/user_spend.parquet                   │
│ Source: data/transactions.parquet                          │
│ Entity: user_id                                            │
│ Keys: user_id                                              │
│ Timestamp: transaction_date                                │
│ Interval: 1d                                               │
│ Tags: users, spending                                      │
│ Row Count: 50,000                                          │
│ Last Updated: 2024-01-16T08:30:00Z                         │
└────────────────────────────────────────────────────────────┘
Columns
┌──────────────────────────┬────────┬────────┬─────────────┬────────┐
│ Name                     │ Type   │ Input  │ Aggregation │ Window │
├──────────────────────────┼────────┼────────┼─────────────┼────────┤
│ user_id                  │ Utf8   │ -      │ -           │ -      │
│ transaction_date         │ Date   │ -      │ -           │ -      │
│ amt__sum__7d             │ Float64│ amt    │ sum         │ 7d     │
│ amt__count__7d           │ UInt32 │ amt    │ count       │ 7d     │
│ amt__mean__30d           │ Float64│ amt    │ mean        │ 30d    │
└──────────────────────────┴────────┴────────┴─────────────┴────────┘
Error Handling¶
No metadata found: If the feature hasn't been built yet:
$ mlforge inspect user_spend
Error: No metadata found for feature 'user_spend'.
Run 'mlforge build' to generate metadata.
manifest¶
Display or regenerate the feature store manifest.
Arguments¶
- TARGET (optional): Path to definitions file. Auto-discovers definitions.py if not specified.
Options¶
- --regenerate: Rebuild manifest.json from individual .meta.json files. Defaults to False.
Examples¶
Display manifest summary:
Regenerate consolidated manifest file:
With custom definitions file:
Output¶
Without --regenerate - Displays a summary table:
Feature Store Manifest
┌──────────────────┬──────────┬────────┬─────────┬─────────────────────┐
│ Feature          │ Entity   │ Rows   │ Columns │ Last Updated        │
├──────────────────┼──────────┼────────┼─────────┼─────────────────────┤
│ merchant_spend   │ merchant │ 15,482 │ 8       │ 2024-01-16T08:30:00 │
│ user_spend       │ user_id  │ 50,000 │ 12      │ 2024-01-16T08:25:00 │
│ account_spend    │ account  │ 8,234  │ 10      │ 2024-01-16T08:28:00 │
└──────────────────┴──────────┴────────┴─────────┴─────────────────────┘
With --regenerate - Creates manifest.json and displays confirmation:
The consolidated manifest is written to:
- LocalStore: <store_path>/manifest.json
- S3Store: s3://<bucket>/<prefix>/manifest.json
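Conceptually, regenerating the manifest means collecting the per-feature metadata files into one document. The sketch below shows that idea for a local directory only. The `regenerate_manifest` helper, the `name` and `row_count` fields, and the `{"features": ...}` layout are assumptions made for illustration; mlforge's real metadata schema and manifest format are not shown in this reference.

```python
import json
import tempfile
from pathlib import Path


def regenerate_manifest(store_path):
    """Merge every *.meta.json under store_path into a single manifest.json."""
    store = Path(store_path)
    features = {}
    for meta_file in sorted(store.rglob("*.meta.json")):
        meta = json.loads(meta_file.read_text())
        # Assumption: each metadata file carries a "name" field.
        features[meta["name"]] = meta
    manifest_path = store / "manifest.json"
    manifest_path.write_text(json.dumps({"features": features}, indent=2))
    return manifest_path


# Demo against a throwaway directory with two made-up metadata files.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "user_spend.meta.json").write_text(
        json.dumps({"name": "user_spend", "row_count": 50000})
    )
    (Path(tmp) / "merchant_spend.meta.json").write_text(
        json.dumps({"name": "merchant_spend", "row_count": 15482})
    )
    manifest = json.loads(regenerate_manifest(tmp).read_text())
```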
Error Handling¶
No features found: If no features have been built:
validate¶
Run validation on features without building them.
Arguments¶
- TARGET (optional): Path to definitions file. Auto-discovers definitions.py if not specified.
Options¶
- --features NAMES: Comma-separated list of feature names to validate. Defaults to all features.
- --tags TAGS: Comma-separated list of feature tags to validate. Mutually exclusive with --features.
- --verbose, -v: Enable debug logging. Defaults to False.
Examples¶
Validate all features:
Validate specific features:
Validate features by tag:
Output¶
Displays validation results for each feature:
Validating merchant_transactions...
✓ All validations passed for merchant_transactions
Validating user_transactions...
✗ Validation failed for user_transactions
- Column 'amount': 3 values < 0 (greater_than_or_equal(0))
Validated 2 features (1 passed, 1 failed)
Error Handling¶
ValidationError: If any feature fails validation:
The command exits with code 1 if any validations fail, making it suitable for CI/CD pipelines.
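To make the relationship between a failed check and the exit code concrete, here is a minimal sketch of a validator that counts violations and maps the result to an exit code. The name `greater_than_or_equal` is taken from the example output above, but its signature and behavior here are assumptions, not mlforge's actual validator API.

```python
def greater_than_or_equal(threshold):
    """Build a check that counts values below the threshold (hypothetical signature)."""
    def check(values):
        return sum(1 for value in values if value < threshold)
    return check


amounts = [12.5, -3.0, 40.0, -1.0, -7.25]
violations = greater_than_or_equal(0)(amounts)

# Mirror the documented convention: exit code 1 when any validation fails.
exit_code = 1 if violations else 0
```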
sync¶
Rebuild features from metadata files (for Git-based collaboration).
LocalStore Only
The sync command only works with LocalStore. Cloud stores (S3, etc.) already share data between teammates, so syncing is not needed.
Arguments¶
- TARGET (optional): Path to definitions file. Auto-discovers definitions.py if not specified.
Options¶
- --features NAMES: Comma-separated list of feature names to sync. Defaults to all features with missing data.
- --dry-run: Show what would be synced without actually rebuilding. Defaults to False.
- --force: Rebuild even if source data hash differs. Defaults to False.
- --verbose, -v: Enable debug logging. Defaults to False.
How It Works¶
The sync command helps teams collaborate on feature definitions via Git:
- Metadata is committed to Git: .meta.json and _latest.json files
- Data files are ignored: data.parquet files are excluded via auto-generated .gitignore
- Teammates rebuild locally: Run mlforge sync to recreate data from metadata
For each feature, sync will:
- Check if metadata exists but data file is missing
- Compute hash of current source data file
- Compare with source_hash stored in metadata
- If hashes match → rebuild data from feature function
- If hashes differ → error (use --force to override)
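The per-feature decision described above can be sketched as a small pure function. This is an illustration only: the `source_hash` and `sync_action` helpers are hypothetical, and SHA-256 is an assumed hash algorithm, since this reference does not say which one mlforge uses.

```python
import hashlib


def source_hash(data: bytes) -> str:
    """Hex digest of the source bytes (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def sync_action(data_exists, current_hash, stored_hash, force=False):
    """Decide what sync would do for one feature."""
    if data_exists:
        return "skip"      # nothing is missing, so nothing to sync
    if current_hash == stored_hash:
        return "rebuild"   # source unchanged: safe to recreate the data
    return "rebuild" if force else "error"


stored = source_hash(b"user_id,amt\nu1,10\n")
unchanged = sync_action(False, source_hash(b"user_id,amt\nu1,10\n"), stored)
changed = sync_action(False, source_hash(b"user_id,amt\nu1,99\n"), stored)
forced = sync_action(False, source_hash(b"user_id,amt\nu1,99\n"), stored, force=True)
present = sync_action(True, stored, stored)
```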
Examples¶
Preview what would be synced:
Sync all features with missing data:
Sync specific features:
Force sync even if source data changed:
With custom definitions file:
Output¶
Dry-run mode - Shows what would be synced:
[Dry Run] Would sync 2 features:
- user_spend (v1.2.0)
- merchant_spend (v2.0.1)
Run without --dry-run to sync
Normal mode - Rebuilds features and shows progress:
Syncing user_spend (v1.2.0)...
✓ Source hash matches (abc123def456)
✓ Rebuilt user_spend
Syncing merchant_spend (v2.0.1)...
✓ Source hash matches (789abc012def)
✓ Rebuilt merchant_spend
Synced 2 features
No features to sync:
Error Handling¶
Source data changed: If source hash differs from metadata:
$ mlforge sync --features user_spend
Error: Source data has changed for feature 'user_spend' (v1.2.0)
Expected hash: abc123def456
Current hash: xyz789abc012
This means the source data file has been modified since this version
was built. Rebuilding with different source data may produce different
results.
Options:
- Restore the original source data file
- Use --force to rebuild anyway (creates new version)
- Check with your team if source data should have changed
Not a LocalStore: If using S3Store or other cloud storage:
$ mlforge sync
Error: Sync only works with LocalStore.
Cloud stores (S3Store) already share data between teammates.
Missing source file: If the source data file doesn't exist:
$ mlforge sync --features user_spend
Error: Source file not found: data/transactions.parquet
Cannot verify source hash or rebuild feature.
Use Cases¶
After pulling changes from Git:
Setting up new development environment:
Checking if features are out of sync:
When NOT to Use Sync¶
- Cloud stores: Data is already shared via S3/GCS
- Source data changed intentionally: Use mlforge build --force to create a new version
- Initial setup: Use mlforge build for first-time feature creation
versions¶
List all versions of a feature.
Arguments¶
- FEATURE_NAME (required): Name of the feature to list versions for
- TARGET (optional): Path to definitions file. Auto-discovers definitions.py if not specified.
Examples¶
List versions of a feature:
Output¶
Displays a table of all versions:
Versions of user_spend
┌─────────┬─────────────────────┬─────────────────────┬────────────┐
│ Version │ Created             │ Updated             │ Rows       │
├─────────┼─────────────────────┼─────────────────────┼────────────┤
│ 1.0.0   │ 2024-01-10T08:00:00 │ 2024-01-10T08:00:00 │ 50,000     │
│ 1.1.0   │ 2024-01-15T10:30:00 │ 2024-01-15T10:30:00 │ 52,500     │
│ 2.0.0   │ 2024-01-20T14:00:00 │ 2024-01-20T14:00:00 │ 55,000     │
└─────────┴─────────────────────┴─────────────────────┴────────────┘
Latest: 2.0.0
list¶
Display all registered features in a table.
Arguments¶
- TARGET (optional): Path to definitions file. Auto-discovers definitions.py if not specified.
Options¶
--tags TAGS: Comma-separated list of tags to filter features by. Defaults to showing all features.
Examples¶
List all features:
List from current directory (auto-discovers definitions.py):
List features by tag:
Output¶
Displays a formatted table with feature metadata:
┌──────────────────────┬──────────────────┬───────────────────────────┬──────────────┬───────────────────────────┐
│ Name                 │ Keys             │ Source                    │ Tags         │ Description               │
├──────────────────────┼──────────────────┼───────────────────────────┼──────────────┼───────────────────────────┤
│ user_total_spend     │ [user_id]        │ data/transactions.parquet │ user_metrics │ Total spend by user       │
│ user_spend_mean_30d  │ [user_id]        │ data/transactions.parquet │ user_metrics │ 30d rolling avg spend     │
│ merchant_total_spend │ [merchant_id]    │ data/transactions.parquet │ -            │ Total spend by merchant   │
└──────────────────────┴──────────────────┴───────────────────────────┴──────────────┴───────────────────────────┘
Global Options¶
These options work with any command:
--verbose, -v¶
Enable debug logging:
Debug output includes:
- Module loading details
- Feature registration logs
- Source data loading information
- Storage operations
Example verbose output:
DEBUG: Loading definitions from definitions.py
DEBUG: Registered feature: user_total_spend
DEBUG: Registered feature: user_avg_spend
INFO: Materializing user_total_spend
DEBUG: Loading source: data/transactions.parquet
DEBUG: Writing to: feature_store/user_total_spend.parquet
Definitions File¶
The TARGET parameter specifies a Python file containing a Definitions object. If not provided, mlforge will automatically search for definitions.py in your project directory.
Auto-Discovery¶
When TARGET is omitted, mlforge searches for definitions.py:
# Automatically finds definitions.py in current directory or subdirectories
mlforge build
mlforge list
The search starts from the project root (identified by pyproject.toml, .git, etc.) and looks recursively, skipping common directories like .venv and node_modules.
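The search described above can be approximated with pathlib. The sketch below is not mlforge's implementation: the helper names are hypothetical, the marker and skip lists contain only the examples named in the text, and returning the first match in sorted order is an assumption about tie-breaking.

```python
import tempfile
from pathlib import Path

ROOT_MARKERS = ("pyproject.toml", ".git")   # examples named in the text
SKIP_DIRS = {".venv", "node_modules"}       # examples named in the text


def find_project_root(start):
    """Walk upward from start until a directory containing a root marker is found."""
    start = Path(start).resolve()
    for candidate in (start, *start.parents):
        if any((candidate / marker).exists() for marker in ROOT_MARKERS):
            return candidate
    return start


def find_definitions(start="."):
    """Return the first definitions.py under the project root, ignoring skipped dirs."""
    root = find_project_root(start)
    for path in sorted(root.rglob("definitions.py")):
        if not SKIP_DIRS.intersection(path.relative_to(root).parts):
            return path
    return None


# Demo: the copy inside .venv is ignored, the one under src is found.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    (root / "pyproject.toml").write_text("")
    for folder in (".venv", "src"):
        (root / folder).mkdir()
        (root / folder / "definitions.py").write_text("")
    found_rel = find_definitions(root / "src").relative_to(root).as_posix()
```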
Structure¶
# definitions.py
from mlforge import Definitions, LocalStore
import features
defs = Definitions(
    name="my-project",
    features=[features],
    offline_store=LocalStore("./feature_store"),
)
Naming Convention¶
The Definitions object must be named defs:
# Good
defs = Definitions(...)
# Bad - won't be found
definitions = Definitions(...)
feature_store = Definitions(...)
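One way to see why the variable name matters: a loader that imports the file and looks up a module-level attribute called `defs` finds nothing under any other name. The sketch below demonstrates that lookup with the standard library. The `load_defs` helper is hypothetical, and a plain dict stands in for the Definitions object so the example runs without mlforge installed; how mlforge actually loads the file is an assumption here.

```python
import importlib.util
import tempfile
from pathlib import Path


def load_defs(path):
    """Import a Python file and return its module-level `defs`, or None if absent."""
    spec = importlib.util.spec_from_file_location("user_definitions", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return getattr(module, "defs", None)


with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "good.py"
    good.write_text("defs = {'name': 'my-project'}\n")
    bad = Path(tmp) / "bad.py"
    bad.write_text("definitions = {'name': 'my-project'}\n")

    found = load_defs(good)    # the object bound to `defs`
    missing = load_defs(bad)   # None: no attribute named `defs`
```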
Module vs. File Path¶
You can use either a file path or a module path:
# File path (recommended)
mlforge build definitions.py
mlforge build path/to/definitions.py
# Module path (if installed)
mlforge build mypackage.definitions
Exit Codes¶
The CLI uses these exit codes:
- 0: Success
- 1: Error (load failure, materialization failure, etc.)
Use in scripts:
mlforge build definitions.py
if [ $? -eq 0 ]; then
    echo "Build succeeded"
else
    echo "Build failed"
    exit 1
fi
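The same exit-code check can be done from Python with subprocess. In this sketch the `run_step` helper is hypothetical, and stand-in commands are used so it runs without mlforge installed; in a real script the command would be something like `["mlforge", "build", "definitions.py"]`.

```python
import subprocess
import sys


def run_step(command):
    """Run a command and report success based on its exit code (0 means success)."""
    result = subprocess.run(command)
    return result.returncode == 0


# Stand-ins that exit with 0 and 1, mimicking a successful and a failed build.
ok = run_step([sys.executable, "-c", "raise SystemExit(0)"])
failed = run_step([sys.executable, "-c", "raise SystemExit(1)"])
```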
Environment Variables¶
Currently, mlforge does not use environment variables for configuration. All settings are specified via:
- Command-line options
- Definitions file configuration
Shell Completion¶
mlforge uses cyclopts for CLI parsing. Shell completion may be supported in future versions.
Integration Examples¶
Makefile¶
.PHONY: features
features:
	mlforge build definitions.py --force

.PHONY: list-features
list-features:
	mlforge list definitions.py

.PHONY: build-prod
build-prod:
	mlforge build definitions.py --no-preview
CI/CD Pipeline¶
# .github/workflows/build-features.yml
name: Build Features
on:
  push:
    branches: [main]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.13"
      - name: Install dependencies
        run: pip install mlforge-sdk
      - name: Build features
        run: mlforge build definitions.py --no-preview
Pre-commit Hook¶
#!/bin/bash
# .git/hooks/pre-commit
mlforge build definitions.py --no-preview
if [ $? -ne 0 ]; then
    echo "Feature build failed. Fix errors before committing."
    exit 1
fi
Next Steps¶
- Building Features - Detailed build guide
- Defining Features - Feature definition reference
- API Reference - Python API documentation