Version API
The version module provides types and functions for semantic versioning, change detection, and Git integration.
Enums
mlforge.version.ChangeType
Bases: Enum
Type of change detected between feature versions.
Used to determine semantic version bumps:
- INITIAL: First build → 1.0.0
- MAJOR: Breaking change → X+1.0.0
- MINOR: Additive change → X.Y+1.0
- PATCH: Data refresh → X.Y.Z+1
Source code in src/mlforge/version.py
Data Classes
mlforge.version.ChangeSummary dataclass
Summary of changes that triggered a version bump.
Stored in FeatureMetadata.change_summary for auditability.
Attributes:
| Name | Type | Description |
|---|---|---|
| `change_type` | `ChangeType` | Type of version bump applied |
| `reason` | `str` | Human-readable reason code |
| `details` | `list[str]` | List of specific changes (e.g., column names added/removed) |
Version Parsing and Bumping
mlforge.version.parse_version
Parse semantic version string to tuple.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `version_str` | `str` | Version string like "1.2.3" | required |
Returns:
| Type | Description |
|---|---|
| `tuple[int, int, int]` | Tuple of (major, minor, patch) |
Raises:
| Type | Description |
|---|---|
| `ValueError` | If version string is invalid |
Example

    >>> parse_version("1.2.3")
    (1, 2, 3)
mlforge.version.format_version
Format version tuple to string.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `major` | `int` | Major version number | required |
| `minor` | `int` | Minor version number | required |
| `patch` | `int` | Patch version number | required |
Returns:
| Type | Description |
|---|---|
| `str` | Version string like "1.2.3" |
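
The round trip between `parse_version` and `format_version` can be illustrated with a minimal sketch. The names `parse_version_sketch` and `format_version_sketch`, the regular expression, and the error message are assumptions made for illustration; the real functions in `src/mlforge/version.py` may accept or reject different inputs.

```python
import re

# Assumed pattern: exactly three dot-separated non-negative integers.
_VERSION_RE = re.compile(r"(\d+)\.(\d+)\.(\d+)")


def parse_version_sketch(version_str: str) -> tuple[int, int, int]:
    """Split "MAJOR.MINOR.PATCH" into a tuple of ints, or raise ValueError."""
    match = _VERSION_RE.fullmatch(version_str)
    if match is None:
        raise ValueError(f"Invalid version string: {version_str!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def format_version_sketch(major: int, minor: int, patch: int) -> str:
    """Inverse of parse_version_sketch."""
    return f"{major}.{minor}.{patch}"


parsed = parse_version_sketch("1.2.3")
print(parsed)                          # (1, 2, 3)
print(format_version_sketch(*parsed))  # 1.2.3
```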
mlforge.version.bump_version
Increment version by change type.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `current` | `str` | Current version string (e.g., "1.2.3") | required |
| `change_type` | `ChangeType` | Type of version increment | required |
Returns:
| Type | Description |
|---|---|
| `str` | New version string |
Raises:
| Type | Description |
|---|---|
| `ValueError` | If change_type is INITIAL (use "1.0.0" directly) |
Example

    >>> bump_version("1.2.3", ChangeType.MINOR)
    "1.3.0"
    >>> bump_version("1.2.3", ChangeType.MAJOR)
    "2.0.0"
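
The bump rules can be sketched as follows. `ChangeKind` is a hypothetical stand-in for `ChangeType` (its member values are invented), and `bump_version_sketch` is an illustration of the documented behavior, not the library's implementation.

```python
from enum import Enum


class ChangeKind(Enum):
    """Hypothetical stand-in for mlforge.version.ChangeType."""
    INITIAL = "initial"
    MAJOR = "major"
    MINOR = "minor"
    PATCH = "patch"


def bump_version_sketch(current: str, kind: ChangeKind) -> str:
    """Increment one component and reset the components to its right."""
    major, minor, patch = (int(part) for part in current.split("."))
    if kind is ChangeKind.MAJOR:
        return f"{major + 1}.0.0"
    if kind is ChangeKind.MINOR:
        return f"{major}.{minor + 1}.0"
    if kind is ChangeKind.PATCH:
        return f"{major}.{minor}.{patch + 1}"
    # INITIAL has no predecessor to increment.
    raise ValueError("INITIAL cannot be bumped; use '1.0.0' directly")


print(bump_version_sketch("1.2.3", ChangeKind.MINOR))  # 1.3.0
print(bump_version_sketch("1.2.3", ChangeKind.MAJOR))  # 2.0.0
print(bump_version_sketch("1.2.3", ChangeKind.PATCH))  # 1.2.4
```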
mlforge.version.sort_versions
Sort version strings semantically.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `versions` | `list[str]` | List of version strings | required |
Returns:
| Type | Description |
|---|---|
| `list[str]` | Sorted list (oldest to newest) |
Example

    >>> sort_versions(["1.10.0", "1.2.0", "2.0.0"])
    ["1.2.0", "1.10.0", "2.0.0"]
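
Semantic ordering differs from plain string ordering because "1.10.0" sorts before "1.2.0" character by character. A minimal sketch of the difference, using a hypothetical `version_key` helper rather than the library's own code:

```python
def version_key(version: str) -> tuple[int, ...]:
    """Compare versions numerically, component by component."""
    return tuple(int(part) for part in version.split("."))


versions = ["1.10.0", "1.2.0", "2.0.0"]

lexical = sorted(versions)                    # plain string comparison
semantic = sorted(versions, key=version_key)  # numeric comparison

print(lexical)   # ['1.10.0', '1.2.0', '2.0.0']
print(semantic)  # ['1.2.0', '1.10.0', '2.0.0']
```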
mlforge.version.is_valid_version
Check if a string is a valid semantic version.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `version_str` | `str` | String to validate | required |
Returns:
| Type | Description |
|---|---|
| `bool` | True if valid version format, False otherwise |
Path Construction
mlforge.version.versioned_data_path
Get path to versioned feature data file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `store_root` | `Path` | Root path of the feature store | required |
| `feature_name` | `str` | Name of the feature | required |
| `version` | `str` | Semantic version string (e.g., "1.0.0") | required |
Returns:
| Type | Description |
|---|---|
| `Path` | Path to data.parquet file |
Example

    >>> versioned_data_path(Path("./store"), "user_spend", "1.0.0")
    Path("./store/user_spend/1.0.0/data.parquet")
mlforge.version.versioned_metadata_path
Get path to versioned feature metadata file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `store_root` | `Path` | Root path of the feature store | required |
| `feature_name` | `str` | Name of the feature | required |
| `version` | `str` | Semantic version string | required |
Returns:
| Type | Description |
|---|---|
| `Path` | Path to .meta.json file |
mlforge.version.latest_pointer_path
Get path to _latest.json pointer file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `store_root` | `Path` | Root path of the feature store | required |
| `feature_name` | `str` | Name of the feature | required |
Returns:
| Type | Description |
|---|---|
| `Path` | Path to _latest.json file within feature directory |
mlforge.version.feature_versions_dir
Get path to feature's version directory (parent of all versions).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `store_root` | `Path` | Root path of the feature store | required |
| `feature_name` | `str` | Name of the feature | required |
Returns:
| Type | Description |
|---|---|
| `Path` | Path to feature directory containing version subdirectories |
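
Taken together, the four path helpers describe a directory layout that can be sketched with `pathlib`. The function `layout_sketch` is hypothetical, and placing `.meta.json` inside the version subdirectory is an assumption; the documentation above only states the file name.

```python
from pathlib import Path


def layout_sketch(store_root: Path, feature_name: str, version: str) -> dict[str, Path]:
    """Return the paths the helpers above are documented to produce."""
    feature_dir = store_root / feature_name  # parent of all versions
    version_dir = feature_dir / version      # one subdirectory per version
    return {
        "feature_dir": feature_dir,
        "data": version_dir / "data.parquet",
        "metadata": version_dir / ".meta.json",  # assumed location
        "latest_pointer": feature_dir / "_latest.json",
    }


for name, path in layout_sketch(Path("store"), "user_spend", "1.0.0").items():
    print(f"{name}: {path.as_posix()}")
```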
Hash Computation
mlforge.version.compute_schema_hash
Compute hash of column names and dtypes for schema change detection.
Captures structural schema changes (columns added/removed, dtype changes).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `columns` | `list[ColumnMetadata]` | List of ColumnMetadata from feature result | required |
Returns:
| Type | Description |
|---|---|
| `str` | Hex string hash (first 12 characters of SHA256) |
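
A minimal sketch of a schema hash along these lines is shown below. It takes plain `(name, dtype)` pairs instead of `ColumnMetadata`, and both the canonical string format and the decision to sort columns (so that column order does not affect the hash) are assumptions.

```python
import hashlib


def schema_hash_sketch(columns: list[tuple[str, str]]) -> str:
    """Hash (name, dtype) pairs and keep the first 12 hex characters."""
    # Assumption: sorting makes the hash independent of column order.
    canonical = "|".join(f"{name}:{dtype}" for name, dtype in sorted(columns))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


before = schema_hash_sketch([("user_id", "int64"), ("spend", "float64")])
after = schema_hash_sketch([("user_id", "int64"), ("spend", "float32")])

print(before != after)  # True: a dtype change alters the hash
```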
mlforge.version.compute_config_hash
compute_config_hash(
    keys: list[str],
    timestamp: str | None,
    interval: str | None,
    metrics_config: list[dict[str, Any]] | None,
) -> str
Compute hash of feature configuration for config change detection.
Captures configuration changes that affect computation (keys, timing, metrics).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `keys` | `list[str]` | Entity key columns | required |
| `timestamp` | `str \| None` | Timestamp column name | required |
| `interval` | `str \| None` | Rolling interval string | required |
| `metrics_config` | `list[dict[str, Any]] \| None` | Serialized metrics configuration | required |
Returns:
| Type | Description |
|---|---|
| `str` | Hex string hash (first 12 characters of SHA256) |
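
One common way to hash a configuration deterministically is to serialize it as canonical JSON first. The sketch below follows that approach; the payload field names and the JSON settings are assumptions, and the library may serialize its configuration differently.

```python
import hashlib
import json
from typing import Any, Optional


def config_hash_sketch(
    keys: list[str],
    timestamp: Optional[str],
    interval: Optional[str],
    metrics_config: Optional[list[dict[str, Any]]],
) -> str:
    """Hash a canonical JSON form of the configuration."""
    payload = {
        "keys": keys,
        "timestamp": timestamp,
        "interval": interval,
        "metrics": metrics_config,
    }
    # sort_keys makes the output independent of dict insertion order.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


daily = config_hash_sketch(["user_id"], "event_time", "1d", [{"agg": "sum", "col": "spend"}])
weekly = config_hash_sketch(["user_id"], "event_time", "7d", [{"agg": "sum", "col": "spend"}])

print(daily != weekly)  # True: changing the interval changes the hash
```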
mlforge.version.compute_content_hash
Compute hash of parquet file content for data change detection.
Uses file-based hashing for efficiency with large files.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `Path` | Path to parquet file | required |
Returns:
| Type | Description |
|---|---|
| `str` | Hex string hash (first 12 characters of SHA256) |
Raises:
| Type | Description |
|---|---|
| `FileNotFoundError` | If path doesn't exist |
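
File-based hashing usually means reading the file in fixed-size chunks so that memory use stays constant regardless of file size. A minimal sketch, assuming a 1 MiB chunk size and a hypothetical function name `file_hash_sketch`:

```python
import hashlib
import tempfile
from pathlib import Path


def file_hash_sketch(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 and keep the first 12 hex characters."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(path)
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()[:12]


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "data.parquet"
    sample.write_bytes(b"placeholder bytes, not a real parquet file")
    print(len(file_hash_sketch(sample)))  # 12
```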
mlforge.version.compute_source_hash
Compute hash of source data file for reproducibility verification.
Uses file-based hashing for efficiency with large files. This hash
is stored in metadata and used by mlforge sync to verify that
teammates have the same source data before rebuilding.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `Path \| str` | Path to source data file (parquet, csv, etc.) | required |
Returns:
| Type | Description |
|---|---|
| `str` | Hex string hash (first 12 characters of SHA256) |
Raises:
| Type | Description |
|---|---|
| `FileNotFoundError` | If path doesn't exist |
Change Detection
mlforge.version.detect_change_type
detect_change_type(
    previous_columns: list[str] | None,
    current_columns: list[str],
    previous_schema_hash: str | None,
    current_schema_hash: str,
    previous_config_hash: str | None,
    current_config_hash: str,
) -> ChangeType
Determine version bump type based on schema and config changes.
Change detection logic (from roadmap):
- No previous version → INITIAL (1.0.0)
- Columns removed → MAJOR (breaking)
- Dtype changed → MAJOR (breaking, detected via schema_hash)
- Columns added → MINOR (additive)
- Config changed → MINOR
- Same schema/config → PATCH (data refresh)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `previous_columns` | `list[str] \| None` | Column names from previous version (None if first build) | required |
| `current_columns` | `list[str]` | Column names from current build | required |
| `previous_schema_hash` | `str \| None` | Schema hash from previous version | required |
| `current_schema_hash` | `str` | Schema hash from current build | required |
| `previous_config_hash` | `str \| None` | Config hash from previous version | required |
| `current_config_hash` | `str` | Config hash from current build | required |
Returns:
| Type | Description |
|---|---|
| `ChangeType` | ChangeType indicating required version bump |
Example

    >>> detect_change_type(None, ["a", "b"], None, "abc123", None, "def456")
    ChangeType.INITIAL
    >>> detect_change_type(
    ...     ["a", "b", "c"], ["a", "b"], "abc", "def", "123", "123"
    ... )
    ChangeType.MAJOR  # Column removed
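
The decision rules listed above can be sketched as a chain of checks. `ChangeKind` is a hypothetical stand-in for `ChangeType`, and the order of the checks is an assumption. In particular, this sketch treats a schema hash difference as a dtype change only when the column names are unchanged, so a dtype change that coincides with an added column would be reported as MINOR here; the library may handle that case differently.

```python
from enum import Enum
from typing import Optional


class ChangeKind(Enum):
    """Hypothetical stand-in for mlforge.version.ChangeType."""
    INITIAL = "initial"
    MAJOR = "major"
    MINOR = "minor"
    PATCH = "patch"


def detect_change_sketch(
    previous_columns: Optional[list[str]],
    current_columns: list[str],
    previous_schema_hash: Optional[str],
    current_schema_hash: str,
    previous_config_hash: Optional[str],
    current_config_hash: str,
) -> ChangeKind:
    if previous_columns is None:
        return ChangeKind.INITIAL  # first build
    removed = set(previous_columns) - set(current_columns)
    added = set(current_columns) - set(previous_columns)
    if removed:
        return ChangeKind.MAJOR    # breaking: consumers lose columns
    if added:
        return ChangeKind.MINOR    # additive
    if previous_schema_hash != current_schema_hash:
        return ChangeKind.MAJOR    # same names, different hash: dtype changed
    if previous_config_hash != current_config_hash:
        return ChangeKind.MINOR    # config changed
    return ChangeKind.PATCH        # data refresh only


print(detect_change_sketch(None, ["a", "b"], None, "abc123", None, "def456").name)  # INITIAL
print(detect_change_sketch(["a", "b", "c"], ["a", "b"], "abc", "def", "123", "123").name)  # MAJOR
```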
mlforge.version.build_change_summary
build_change_summary(
    change_type: ChangeType,
    previous_columns: list[str] | None,
    current_columns: list[str],
) -> ChangeSummary
Build structured change summary for metadata.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `change_type` | `ChangeType` | Detected change type | required |
| `previous_columns` | `list[str] \| None` | Previous version columns | required |
| `current_columns` | `list[str]` | Current version columns | required |
Returns:
| Type | Description |
|---|---|
| `ChangeSummary` | ChangeSummary with change_type, reason, and details |
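
A summary of this shape can be derived from the two column lists with set differences. In the sketch below, `SummarySketch` mirrors the three documented `ChangeSummary` attributes, but the reason codes and the wording of the detail strings are invented for illustration and are not the library's actual values.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SummarySketch:
    """Mirrors the documented attributes of ChangeSummary."""
    change_type: str
    reason: str
    details: list[str] = field(default_factory=list)


def summarize_columns_sketch(
    change_type: str,
    previous_columns: Optional[list[str]],
    current_columns: list[str],
) -> SummarySketch:
    if previous_columns is None:
        return SummarySketch(change_type, "first_build")  # hypothetical reason code
    added = sorted(set(current_columns) - set(previous_columns))
    removed = sorted(set(previous_columns) - set(current_columns))
    details = [f"added: {name}" for name in added]
    details += [f"removed: {name}" for name in removed]
    if removed:
        reason = "columns_removed"   # hypothetical reason code
    elif added:
        reason = "columns_added"     # hypothetical reason code
    else:
        reason = "no_column_change"  # hypothetical reason code
    return SummarySketch(change_type, reason, details)


print(summarize_columns_sketch("minor", ["a"], ["a", "b"]))
```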
Version Discovery
mlforge.version.list_versions
List all versions of a feature, sorted semantically.
Scans the feature directory for version subdirectories.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `store_root` | `Path` | Root path of the feature store | required |
| `feature_name` | `str` | Name of the feature | required |
Returns:
| Type | Description |
|---|---|
| `list[str]` | Sorted list of version strings (oldest to newest), empty if none |
Example

    >>> list_versions(Path("./store"), "user_spend")
    ["1.0.0", "1.0.1", "1.1.0"]
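
Scanning a feature directory for version subdirectories can be sketched with `pathlib`. The function name `list_versions_sketch` and the filtering rule (keep only directories whose names look like `MAJOR.MINOR.PATCH`) are assumptions about how non-version entries such as `_latest.json` are skipped.

```python
import re
import tempfile
from pathlib import Path

_VERSION_RE = re.compile(r"\d+\.\d+\.\d+")


def list_versions_sketch(store_root: Path, feature_name: str) -> list[str]:
    """Return version subdirectory names, oldest to newest."""
    feature_dir = store_root / feature_name
    if not feature_dir.is_dir():
        return []
    found = [
        entry.name
        for entry in feature_dir.iterdir()
        if entry.is_dir() and _VERSION_RE.fullmatch(entry.name)
    ]
    return sorted(found, key=lambda v: tuple(int(part) for part in v.split(".")))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for version in ("1.1.0", "1.0.0", "1.0.1"):
        (root / "user_spend" / version).mkdir(parents=True)
    (root / "user_spend" / "_latest.json").write_text("{}")  # ignored: not a directory
    print(list_versions_sketch(root, "user_spend"))  # ['1.0.0', '1.0.1', '1.1.0']
```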
mlforge.version.get_latest_version
Get latest version from _latest.json pointer.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `store_root` | `Path` | Root path of the feature store | required |
| `feature_name` | `str` | Name of the feature | required |
Returns:
| Type | Description |
|---|---|
| `str \| None` | Latest version string, or None if no versions exist |
mlforge.version.write_latest_pointer
Write _latest.json pointer file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `store_root` | `Path` | Root path of the feature store | required |
| `feature_name` | `str` | Name of the feature | required |
| `version` | `str` | Version to mark as latest | required |
mlforge.version.resolve_version
Resolve version string to actual version.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `store_root` | `Path` | Root path of the feature store | required |
| `feature_name` | `str` | Name of the feature | required |
| `version` | `str \| None` | Explicit version or None for latest | required |
Returns:
| Type | Description |
|---|---|
| `str \| None` | Resolved version string, or None if feature doesn't exist |
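
The interplay between the pointer file and version resolution can be sketched as follows. The JSON shape `{"version": ...}` is an assumption, since the actual contents of `_latest.json` are not documented here, and returning an explicit version unchanged without checking that it exists on disk is also an assumption.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def write_pointer_sketch(feature_dir: Path, version: str) -> None:
    """Record which version is latest (assumed JSON shape)."""
    feature_dir.mkdir(parents=True, exist_ok=True)
    (feature_dir / "_latest.json").write_text(json.dumps({"version": version}))


def read_pointer_sketch(feature_dir: Path) -> Optional[str]:
    """Return the recorded latest version, or None if no pointer exists."""
    pointer = feature_dir / "_latest.json"
    if not pointer.is_file():
        return None
    return json.loads(pointer.read_text())["version"]


def resolve_sketch(feature_dir: Path, version: Optional[str]) -> Optional[str]:
    """An explicit version wins; None falls back to the pointer."""
    return version if version is not None else read_pointer_sketch(feature_dir)


with tempfile.TemporaryDirectory() as tmp:
    feature_dir = Path(tmp) / "user_spend"
    print(resolve_sketch(feature_dir, None))     # None: no pointer yet
    write_pointer_sketch(feature_dir, "1.1.0")
    print(resolve_sketch(feature_dir, None))     # 1.1.0
    print(resolve_sketch(feature_dir, "1.0.0"))  # 1.0.0
```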
Git Integration
mlforge.version.write_feature_gitignore
Write .gitignore to feature directory if not already present.
Creates a .gitignore file that ignores data.parquet files in version subdirectories. This allows committing metadata (.meta.json, _latest.json) while excluding large data files that can be rebuilt from source.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `store_root` | `Path` | Root path of the feature store | required |
| `feature_name` | `str` | Name of the feature | required |
Returns:
| Type | Description |
|---|---|
| `bool` | True if .gitignore was created, False if it already existed |
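
The create-if-missing behavior and its boolean return value can be sketched as below. The ignore pattern `*/data.parquet` is an assumption about what the generated file contains, and `write_gitignore_sketch` is a hypothetical name.

```python
import tempfile
from pathlib import Path

# Assumed pattern: ignore data.parquet one level down, in each version subdirectory.
_IGNORE_PATTERN = "*/data.parquet\n"


def write_gitignore_sketch(feature_dir: Path) -> bool:
    """Create .gitignore once; report whether this call created it."""
    target = feature_dir / ".gitignore"
    if target.exists():
        return False  # leave an existing file untouched
    feature_dir.mkdir(parents=True, exist_ok=True)
    target.write_text(_IGNORE_PATTERN)
    return True


with tempfile.TemporaryDirectory() as tmp:
    feature_dir = Path(tmp) / "user_spend"
    print(write_gitignore_sketch(feature_dir))  # True: file created
    print(write_gitignore_sketch(feature_dir))  # False: already present
```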