ML Feature Table

The ML Feature Table entity represents a collection of related machine learning features organized together in a feature store. Feature tables are fundamental building blocks in the ML feature management ecosystem, grouping features that share common characteristics such as the same primary keys, update cadence, or data source. They bridge the gap between raw data in data warehouses and the features consumed by ML models during training and inference.

Identity

ML Feature Tables are identified by two pieces of information:

  • The platform that hosts the feature table: this is the specific feature store or ML platform technology. Examples include feast, tecton, sagemaker, etc. See dataplatform for more details.
  • The name of the feature table: a unique identifier within the specific platform that represents this collection of features.

An example of an ML Feature Table identifier is urn:li:mlFeatureTable:(urn:li:dataPlatform:feast,users_feature_table).

The identity is defined by the mlFeatureTableKey aspect, which contains:

  • platform: A URN reference to the data platform hosting the feature table
  • name: The unique name of the feature table within that platform
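As a rough illustration of how the two identity fields combine, the sketch below builds and parses the URN string with plain Python. The helper names make_feature_table_urn and parse_feature_table_urn are hypothetical and exist only for this example; the string layout is taken from the example identifier above.

```python
import re
from typing import Tuple

# Hypothetical helpers for illustration only; they mirror the URN layout
# shown above and are not part of any DataHub package.
_URN_PATTERN = re.compile(
    r"^urn:li:mlFeatureTable:\(urn:li:dataPlatform:([^,()]+),(.+)\)$"
)


def make_feature_table_urn(platform: str, name: str) -> str:
    """Combine the platform and table name into a feature table URN string."""
    return f"urn:li:mlFeatureTable:(urn:li:dataPlatform:{platform},{name})"


def parse_feature_table_urn(urn: str) -> Tuple[str, str]:
    """Split a feature table URN string back into (platform, name)."""
    match = _URN_PATTERN.match(urn)
    if match is None:
        raise ValueError(f"not an ML Feature Table URN: {urn}")
    return match.group(1), match.group(2)


urn = make_feature_table_urn("feast", "users_feature_table")
print(urn)
print(parse_feature_table_urn(urn))
```

In the SDK examples later in this document, the same URN is produced by builder.make_ml_feature_table_urn, which is the call to prefer in real code.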

Important Capabilities

Feature Table Properties

ML Feature Tables support comprehensive metadata through the mlFeatureTableProperties aspect. This aspect captures the essential characteristics of the feature table:

Description and Documentation

Feature tables can have detailed descriptions explaining their purpose, the type of features they contain, and when they should be used. This documentation helps data scientists and ML engineers discover and understand feature tables in their organization.

Python SDK: Create an ML Feature Table with properties
# Inlined from /metadata-ingestion/examples/library/mlfeature_table_create_with_properties.py
import datahub.emitter.mce_builder as builder
import datahub.metadata.schema_classes as models
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter

gms_endpoint = "http://localhost:8080"
emitter = DatahubRestEmitter(gms_server=gms_endpoint, extra_headers={})

feature_table_urn = builder.make_ml_feature_table_urn(
    feature_table_name="customer_features", platform="feast"
)

feature_table_properties = models.MLFeatureTablePropertiesClass(
    description="Customer demographic and behavioral features for churn prediction models. "
    "Updated daily from the customer data warehouse.",
    customProperties={
        "update_frequency": "daily",
        "feature_count": "25",
        "team": "customer-analytics",
        "sla_hours": "24",
    },
)

metadata_change_proposal = MetadataChangeProposalWrapper(
    entityUrn=feature_table_urn,
    aspect=feature_table_properties,
)

emitter.emit(metadata_change_proposal)

Features

The most important property of a feature table is the collection of features it contains. Feature tables maintain explicit relationships to their constituent features through the mlFeatures property. This creates a "Contains" relationship between the feature table and each individual feature, enabling:

  • Discovery of all features in a table
  • Navigation from feature table to individual features
  • Understanding of feature organization and grouping

Python SDK: Add features to a feature table
# Inlined from /metadata-ingestion/examples/library/mlfeature_table_add_features.py
import datahub.emitter.mce_builder as builder
import datahub.metadata.schema_classes as models
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.ingestion.graph.client import DatahubClientConfig, DataHubGraph

gms_endpoint = "http://localhost:8080"
emitter = DatahubRestEmitter(gms_server=gms_endpoint, extra_headers={})

feature_table_urn = builder.make_ml_feature_table_urn(
    feature_table_name="customer_features", platform="feast"
)

new_feature_urns = [
    builder.make_ml_feature_urn(
        feature_name="customer_lifetime_value",
        feature_table_name="customer_features",
    ),
    builder.make_ml_feature_urn(
        feature_name="days_since_last_purchase",
        feature_table_name="customer_features",
    ),
    builder.make_ml_feature_urn(
        feature_name="total_purchase_count",
        feature_table_name="customer_features",
    ),
]

# Read existing features to avoid overwriting them
graph = DataHubGraph(DatahubClientConfig(server=gms_endpoint))
feature_table_properties = graph.get_aspect(
    entity_urn=feature_table_urn,
    aspect_type=models.MLFeatureTablePropertiesClass,
)

if feature_table_properties and feature_table_properties.mlFeatures:
    existing_features = feature_table_properties.mlFeatures
    all_feature_urns = list(set(existing_features + new_feature_urns))
else:
    all_feature_urns = new_feature_urns

# Note: this builds a fresh aspect with only mlFeatures and description set,
# so other existing fields (e.g. mlPrimaryKeys, customProperties) are not
# carried over when the aspect is emitted.
updated_properties = models.MLFeatureTablePropertiesClass(
    mlFeatures=all_feature_urns,
    description="Customer features with newly added purchase metrics",
)

metadata_change_proposal = MetadataChangeProposalWrapper(
    entityUrn=feature_table_urn,
    aspect=updated_properties,
)

emitter.emit(metadata_change_proposal)
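
The example above merges URNs with list(set(...)), which removes duplicates but does not guarantee a stable order. If a stable order matters to you, one possible alternative is an order-preserving merge. The helper merge_urns below is hypothetical (not part of the DataHub SDK), and the feature name customer_age is made up for the illustration.

```python
from typing import Iterable, List


def merge_urns(existing: Iterable[str], new: Iterable[str]) -> List[str]:
    """Return existing URNs followed by unseen new ones, without duplicates."""
    # dict preserves insertion order, so the first occurrence of each URN wins.
    return list(dict.fromkeys([*existing, *new]))


existing = [
    "urn:li:mlFeature:(customer_features,total_purchase_count)",
    "urn:li:mlFeature:(customer_features,customer_age)",  # made-up feature
]
new = [
    "urn:li:mlFeature:(customer_features,customer_lifetime_value)",
    "urn:li:mlFeature:(customer_features,total_purchase_count)",  # duplicate
]

merged = merge_urns(existing, new)
print(merged)
```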

Primary Keys

Feature tables define one or more primary keys that uniquely identify each row in the table. These primary keys are critical for:

  • Joining features with training datasets
  • Looking up feature values during model inference
  • Understanding the entity granularity of the features (e.g., user-level, transaction-level)

When multiple primary keys are specified, they act as a composite key. The mlPrimaryKeys property creates a "KeyedBy" relationship to each primary key entity.
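
To make the composite-key idea concrete, here is a minimal sketch that joins training examples to feature rows on two key fields at once. It uses plain in-memory dictionaries rather than a real feature store; the key names (customer_id, merchant_id), the feature purchase_count, and all values are invented for the example.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical stand-in for a feature table keyed by two primary keys.
feature_rows: Dict[Tuple[str, ...], Dict[str, float]] = {
    ("c1", "m1"): {"purchase_count": 3.0},
    ("c1", "m2"): {"purchase_count": 1.0},
    ("c2", "m1"): {"purchase_count": 7.0},
}

training_examples: List[dict] = [
    {"customer_id": "c1", "merchant_id": "m2", "label": 0},
    {"customer_id": "c2", "merchant_id": "m1", "label": 1},
]


def join_features(
    examples: List[dict],
    rows: Dict[Tuple[str, ...], Dict[str, float]],
    key_fields: Sequence[str],
) -> List[dict]:
    """Attach feature values to each example by matching on every key field."""
    joined = []
    for example in examples:
        # The composite key is the tuple of all primary key values, in order.
        composite_key = tuple(example[field] for field in key_fields)
        joined.append({**example, **rows.get(composite_key, {})})
    return joined


joined = join_features(training_examples, feature_rows, ("customer_id", "merchant_id"))
print(joined)
```

Matching on only one of the two fields would be ambiguous here, since customer c1 has a row for each merchant; that is the situation a composite key resolves.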

Python SDK: Add primary keys to a feature table
# Inlined from /metadata-ingestion/examples/library/mlfeature_table_add_primary_keys.py
import datahub.emitter.mce_builder as builder
import datahub.metadata.schema_classes as models
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.ingestion.graph.client import DatahubClientConfig, DataHubGraph

gms_endpoint = "http://localhost:8080"
emitter = DatahubRestEmitter(gms_server=gms_endpoint, extra_headers={})

feature_table_urn = builder.make_ml_feature_table_urn(
    feature_table_name="customer_features", platform="feast"
)

primary_key_urns = [
    builder.make_ml_primary_key_urn(
        feature_table_name="customer_features",
        primary_key_name="customer_id",
    )
]

# Read existing properties to preserve other fields
graph = DataHubGraph(DatahubClientConfig(server=gms_endpoint))
feature_table_properties = graph.get_aspect(
    entity_urn=feature_table_urn,
    aspect_type=models.MLFeatureTablePropertiesClass,
)

if feature_table_properties:
    feature_table_properties.mlPrimaryKeys = primary_key_urns
    updated_properties = feature_table_properties
else:
    updated_properties = models.MLFeatureTablePropertiesClass(
        mlPrimaryKeys=primary_key_urns,
    )

metadata_change_proposal = MetadataChangeProposalWrapper(
    entityUrn=feature_table_urn,
    aspect=updated_properties,
)

emitter.emit(metadata_change_proposal)

# Also create the primary key entity with its properties
dataset_urn = builder.make_dataset_urn(
    name="customers", platform="snowflake", env="PROD"
)
primary_key_urn = primary_key_urns[0]

primary_key_properties = models.MLPrimaryKeyPropertiesClass(
    description="Unique identifier for customers in the system",
    dataType="TEXT",
    sources=[dataset_urn],
)

pk_metadata_change_proposal = MetadataChangeProposalWrapper(
    entityUrn=primary_key_urn,
    aspect=primary_key_properties,
)

emitter.emit(pk_metadata_change_proposal)

Custom Properties

Feature tables support custom properties through the customProperties field, allowing you to capture platform-specific or organization-specific metadata that doesn't fit into the standard schema. This might include information like:

  • Update frequency or freshness SLAs
  • Feature store configuration settings
  • Cost or resource usage information
  • Team or project ownership details
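
The SDK examples in this document pass every custom property value as a string (for instance "25" and "24"). If your source metadata holds numbers or booleans, a small conversion step along these lines may help. This is only a sketch: to_custom_properties is a hypothetical helper, and lowercasing booleans is a choice made here to match the "true" style used elsewhere on this page.

```python
from typing import Any, Dict


def to_custom_properties(raw: Dict[str, Any]) -> Dict[str, str]:
    """Convert arbitrary values to strings for use as custom properties."""
    converted: Dict[str, str] = {}
    for key, value in raw.items():
        if isinstance(value, bool):
            # Lowercase booleans ("true"/"false") instead of Python's "True"/"False".
            converted[key] = str(value).lower()
        else:
            converted[key] = str(value)
    return converted


props = to_custom_properties(
    {
        "update_frequency": "daily",
        "feature_count": 25,
        "sla_hours": 24,
        "critical": True,
    }
)
print(props)
```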

Primary Key Properties

While primary keys are referenced from feature tables, they are separate entities with their own properties defined in the mlPrimaryKeyProperties aspect. Understanding primary key metadata is essential for proper feature table usage:

Data Type

Primary keys have a data type (defined using MLFeatureDataType) that specifies the type of values:

  • ORDINAL: Integer values
  • NOMINAL: Categorical values
  • BINARY: Boolean values
  • COUNT: Count values
  • TIME: Timestamp values
  • TEXT: String values
  • Other numeric types like CONTINUOUS, INTERVAL
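
Because the SDK examples pass the data type as a plain string (e.g. dataType="TEXT"), a typo would not be caught locally. One way to guard against that is a small validation step such as the sketch below. FeatureDataType is a local, hypothetical mirror containing only the names listed above; the actual MLFeatureDataType definition may include additional values.

```python
from enum import Enum


class FeatureDataType(str, Enum):
    """Local mirror of the type names listed above (not the SDK class)."""

    ORDINAL = "ORDINAL"
    NOMINAL = "NOMINAL"
    BINARY = "BINARY"
    COUNT = "COUNT"
    TIME = "TIME"
    TEXT = "TEXT"
    CONTINUOUS = "CONTINUOUS"
    INTERVAL = "INTERVAL"


def validate_data_type(value: str) -> str:
    """Return the normalized type name, or raise ValueError if it is unknown."""
    try:
        return FeatureDataType(value.upper()).value
    except ValueError:
        allowed = ", ".join(t.value for t in FeatureDataType)
        raise ValueError(
            f"unknown data type {value!r}; expected one of: {allowed}"
        ) from None


print(validate_data_type("text"))
```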

Source Lineage

Primary keys can declare their source datasets through the sources property. This creates lineage relationships showing which upstream datasets the primary key values are derived from. This is crucial for understanding data provenance and impact analysis.

Versioning

Primary keys support versioning through the version property, allowing teams to track changes to key definitions over time and maintain multiple versions in parallel.

Tags and Glossary Terms

Like other DataHub entities, ML Feature Tables support tags and glossary terms for classification and discovery:

  • Tags (via globalTags aspect) provide lightweight categorization
  • Glossary Terms (via glossaryTerms aspect) link to business definitions and concepts

Read this blog to understand when to use tags vs terms.

Ownership

Ownership is associated with feature tables using the ownership aspect. Owners can be individuals or teams responsible for maintaining the feature table. Clear ownership is essential for:

  • Knowing who to contact with questions about features
  • Understanding responsibility for feature quality and updates
  • Governance and access control decisions

Domains and Organization

Feature tables can be organized into domains (via the domains aspect) to represent organizational structure or functional areas. This helps teams manage large feature catalogs by grouping related feature tables together.

Code Examples

Creating a Complete ML Feature Table

Here's a comprehensive example that creates a feature table with all core aspects:

Python SDK: Create a complete ML Feature Table
# Inlined from /metadata-ingestion/examples/library/mlfeature_table_create_complete.py
import datahub.emitter.mce_builder as builder
import datahub.metadata.schema_classes as models
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter

gms_endpoint = "http://localhost:8080"
emitter = DatahubRestEmitter(gms_server=gms_endpoint, extra_headers={})

# Step 1: Create the source dataset for lineage
dataset_urn = builder.make_dataset_urn(
    name="customer_transactions", platform="snowflake", env="PROD"
)

# Step 2: Create the primary key entity
primary_key_urn = builder.make_ml_primary_key_urn(
    feature_table_name="transaction_features",
    primary_key_name="transaction_id",
)

primary_key_properties = models.MLPrimaryKeyPropertiesClass(
    description="Unique identifier for each transaction",
    dataType="TEXT",
    sources=[dataset_urn],
)

emitter.emit(
    MetadataChangeProposalWrapper(
        entityUrn=primary_key_urn,
        aspect=primary_key_properties,
    )
)

# Step 3: Create the feature entities
feature_1_urn = builder.make_ml_feature_urn(
    feature_name="transaction_amount",
    feature_table_name="transaction_features",
)

emitter.emit(
    MetadataChangeProposalWrapper(
        entityUrn=feature_1_urn,
        aspect=models.MLFeaturePropertiesClass(
            description="Total amount of the transaction in USD",
            dataType="CONTINUOUS",
            sources=[dataset_urn],
        ),
    )
)

feature_2_urn = builder.make_ml_feature_urn(
    feature_name="is_fraud",
    feature_table_name="transaction_features",
)

emitter.emit(
    MetadataChangeProposalWrapper(
        entityUrn=feature_2_urn,
        aspect=models.MLFeaturePropertiesClass(
            description="Binary indicator of fraudulent transaction",
            dataType="BINARY",
            sources=[dataset_urn],
        ),
    )
)

# Step 4: Create the feature table with all properties
feature_table_urn = builder.make_ml_feature_table_urn(
    feature_table_name="transaction_features", platform="feast"
)

feature_table_properties = models.MLFeatureTablePropertiesClass(
    description="Real-time transaction features for fraud detection models",
    mlFeatures=[feature_1_urn, feature_2_urn],
    mlPrimaryKeys=[primary_key_urn],
    customProperties={
        "update_frequency": "real-time",
        "team": "fraud-detection",
        "critical": "true",
    },
)

emitter.emit(
    MetadataChangeProposalWrapper(
        entityUrn=feature_table_urn,
        aspect=feature_table_properties,
    )
)

# Step 5: Add tags for categorization
emitter.emit(
    MetadataChangeProposalWrapper(
        entityUrn=feature_table_urn,
        aspect=models.GlobalTagsClass(
            tags=[
                models.TagAssociationClass(tag=builder.make_tag_urn("Fraud Detection")),
                models.TagAssociationClass(
                    tag=builder.make_tag_urn("Real-time Features")
                ),
            ]
        ),
    )
)

# Step 6: Add ownership
emitter.emit(
    MetadataChangeProposalWrapper(
        entityUrn=feature_table_urn,
        aspect=models.OwnershipClass(
            owners=[
                models.OwnerClass(
                    owner=builder.make_user_urn("data_science_team"),
                    type=models.OwnershipTypeClass.DATAOWNER,
                )
            ]
        ),
    )
)

print(f"Successfully created feature table: {feature_table_urn}")

Querying ML Feature Tables

You can retrieve ML Feature Table metadata using either the Python SDK or the REST API:

Python SDK: Read an ML Feature Table
# Inlined from /metadata-ingestion/examples/library/mlfeature_table_read.py
from datahub.sdk import DataHubClient, MLFeatureTableUrn

client = DataHubClient.from_env()

# Or get this from the UI (share -> copy urn) and use MLFeatureTableUrn.from_string(...)
mlfeature_table_urn = MLFeatureTableUrn(
    "feast", "test_feature_table_all_feature_dtypes"
)

mlfeature_table_entity = client.entities.get(mlfeature_table_urn)
print("MLFeature Table name:", mlfeature_table_entity.name)
print("MLFeature Table platform:", mlfeature_table_entity.platform)
print("MLFeature Table description:", mlfeature_table_entity.description)

REST API: Fetch ML Feature Table metadata
# Get the complete entity with all aspects
curl 'http://localhost:8080/entities/urn%3Ali%3AmlFeatureTable%3A(urn%3Ali%3AdataPlatform%3Afeast,users_feature_table)'

# Get relationships to see features and primary keys
curl 'http://localhost:8080/relationships?direction=OUTGOING&urn=urn%3Ali%3AmlFeatureTable%3A(urn%3Ali%3AdataPlatform%3Afeast,users_feature_table)&types=Contains,KeyedBy'
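
The URLs in the curl commands percent-encode the colons in the URN while leaving parentheses and commas as they are. If you need to build such URLs programmatically, a sketch with the Python standard library could look like the following. The helper names are hypothetical, and the endpoint is the same local address assumed by the curl examples.

```python
from typing import Sequence
from urllib.parse import quote

GMS_ENDPOINT = "http://localhost:8080"  # local endpoint, as in the curl examples


def encode_urn(urn: str) -> str:
    """Percent-encode a URN, keeping parentheses and commas unescaped."""
    return quote(urn, safe="(),")


def entity_url(urn: str) -> str:
    """URL that returns the entity with its aspects."""
    return f"{GMS_ENDPOINT}/entities/{encode_urn(urn)}"


def relationships_url(
    urn: str, types: Sequence[str], direction: str = "OUTGOING"
) -> str:
    """URL that lists relationships of the given types for the entity."""
    return (
        f"{GMS_ENDPOINT}/relationships?direction={direction}"
        f"&urn={encode_urn(urn)}&types={','.join(types)}"
    )


table_urn = "urn:li:mlFeatureTable:(urn:li:dataPlatform:feast,users_feature_table)"
print(entity_url(table_urn))
print(relationships_url(table_urn, ["Contains", "KeyedBy"]))
```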

Integration Points

ML Feature Tables integrate with multiple other entities in DataHub's metadata model:

Relationships with ML Features

Feature tables contain ML Features through the "Contains" relationship. Each feature in the mlFeatures array represents an individual feature that can be:

  • Used independently by ML models
  • Have its own metadata, lineage, and documentation
  • Shared across multiple feature tables in some feature store implementations

Navigation works in both directions: from a feature table to its features, and from a feature back to its parent tables.

Relationships with ML Primary Keys

Feature tables reference ML Primary Keys through the "KeyedBy" relationship. Primary keys:

  • Define the entity granularity of the feature table
  • Enable joining features with entity identifiers in training datasets
  • Can be shared across multiple feature tables when they represent the same entity type
  • Have their own lineage to upstream datasets through the sources property

Relationships with ML Models

While not directly referenced in feature table metadata, ML Models consume features through the mlFeatures property in MLModelProperties. This creates a "Consumes" lineage relationship showing which models use features from a particular feature table. This lineage enables:

  • Understanding downstream impact when feature tables change
  • Discovering which models depend on specific feature tables
  • Tracking feature usage and adoption across models

Relationships with Datasets

Feature tables have indirect relationships to datasets through two paths:

  1. Via ML Features: Individual features can declare source datasets through their sources property, creating "DerivedFrom" lineage
  2. Via ML Primary Keys: Primary keys can declare source datasets, showing where entity identifiers originate

This lineage connects the feature store to upstream data warehouses, enabling end-to-end data lineage from raw data to model predictions.

Platform Integration

Feature tables are associated with a specific data platform (e.g., Feast, Tecton) through the platform property in the key aspect. This creates a "SourcePlatform" relationship that:

  • Identifies which feature store system hosts the feature table
  • Enables filtering and organization by platform
  • Supports multi-platform feature store environments

Notable Exceptions

Feature Store Platform Variations

Different feature store platforms have different capabilities and concepts:

  • Feast: Uses the term "feature table" directly. Feature tables in Feast correspond 1:1 with this entity.
  • Tecton: Uses "feature views" and "feature services" as similar concepts. These can be modeled as feature tables.
  • SageMaker Feature Store: Uses "feature groups" which map to feature tables.
  • Databricks Feature Store: Uses "feature tables" but with database.schema.table naming patterns.

When ingesting from these platforms, ensure the naming conventions match the platform's terminology for consistency.

Custom Properties Usage

Datasets have both datasetProperties and editableDatasetProperties. Feature tables follow a similar split, but the editable aspect is limited to the description:

  • mlFeatureTableProperties: The main properties aspect (usually from ingestion)
  • editableMlFeatureTableProperties: UI-editable description only

For custom metadata, use the customProperties map in mlFeatureTableProperties rather than creating custom aspects.

Entity References vs. Entity Creation

When using the SDK to create feature tables:

  • You must create the referenced entities first: Create individual ML Features and ML Primary Keys before referencing them in the feature table
  • The feature table only stores URN references - it doesn't create the feature or primary key entities
  • If you reference non-existent entities, they will appear as broken references in the UI

This is different from some other DataHub entities where child entities can be created inline.
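
One way to catch broken references early is to check every URN a feature table will point at before emitting the table. The sketch below assumes you can supply an exists callable that answers whether a URN is already known; how that lookup is implemented against your DataHub instance is left open, and find_missing_references is a hypothetical helper.

```python
from typing import Callable, Iterable, List


def find_missing_references(
    referenced_urns: Iterable[str], exists: Callable[[str], bool]
) -> List[str]:
    """Return the referenced URNs for which the exists check fails."""
    return [urn for urn in referenced_urns if not exists(urn)]


# Stand-in for a real lookup: a set of URNs assumed to be created already.
known_urns = {
    "urn:li:mlFeature:(transaction_features,transaction_amount)",
    "urn:li:mlPrimaryKey:(transaction_features,transaction_id)",
}

references = [
    "urn:li:mlFeature:(transaction_features,transaction_amount)",
    "urn:li:mlFeature:(transaction_features,is_fraud)",
    "urn:li:mlPrimaryKey:(transaction_features,transaction_id)",
]

missing = find_missing_references(references, known_urns.__contains__)
if missing:
    print("Create these entities before emitting the feature table:", missing)
```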

Lineage Considerations

Feature table lineage is typically established through the features and primary keys it contains:

  • Feature tables themselves don't have direct upstreamLineage aspects
  • Instead, lineage flows through the contained features' sources properties
  • When querying lineage, you'll need to traverse through the "Contains" relationships to find upstream datasets

This design reflects that features are the atomic unit of lineage in ML systems, while feature tables are organizational constructs.
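
The traversal described above can be sketched over plain dictionaries. This is not a DataHub API call: the two mappings below are a hypothetical in-memory snapshot of the "Contains" relationships and of each feature's sources property, and the second dataset name (fraud_labels) is invented.

```python
from typing import Dict, List

TABLE = "urn:li:mlFeatureTable:(urn:li:dataPlatform:feast,transaction_features)"
AMOUNT = "urn:li:mlFeature:(transaction_features,transaction_amount)"
IS_FRAUD = "urn:li:mlFeature:(transaction_features,is_fraud)"
TRANSACTIONS = "urn:li:dataset:(urn:li:dataPlatform:snowflake,customer_transactions,PROD)"
LABELS = "urn:li:dataset:(urn:li:dataPlatform:snowflake,fraud_labels,PROD)"  # invented

# Feature table -> features it contains ("Contains" relationships).
contains: Dict[str, List[str]] = {TABLE: [AMOUNT, IS_FRAUD]}
# Feature -> upstream datasets declared in its sources property.
sources: Dict[str, List[str]] = {
    AMOUNT: [TRANSACTIONS],
    IS_FRAUD: [TRANSACTIONS, LABELS],
}


def upstream_datasets(
    table_urn: str,
    contains: Dict[str, List[str]],
    sources: Dict[str, List[str]],
) -> List[str]:
    """Collect the distinct source datasets of every feature in the table."""
    found: List[str] = []
    seen = set()
    for feature_urn in contains.get(table_urn, []):
        for dataset_urn in sources.get(feature_urn, []):
            if dataset_urn not in seen:
                seen.add(dataset_urn)
                found.append(dataset_urn)
    return found


upstream = upstream_datasets(TABLE, contains, sources)
print(upstream)
```

The same loop could be repeated over the "KeyedBy" relationships to include datasets declared by primary keys.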

Technical Reference Guide

The sections above provide an overview of how to use this entity. The following sections provide detailed technical information about how metadata is stored and represented in DataHub.

Aspects are the individual pieces of metadata that can be attached to an entity. Each aspect contains specific information (like ownership, tags, or properties) and is stored as a separate record, allowing for flexible and incremental metadata updates.

Relationships show how this entity connects to other entities in the metadata graph. These connections are derived from the fields within each aspect and form the foundation of DataHub's knowledge graph.

Reading the Field Tables

Each aspect's field table includes an Annotations column that provides additional metadata about how fields are used:

  • ⚠️ Deprecated: This field is deprecated and may be removed in a future version. Check the description for the recommended alternative
  • Searchable: This field is indexed and can be searched in DataHub's search interface
  • Searchable (fieldname): When the field name in parentheses is shown, it indicates the field is indexed under a different name in the search index. For example, dashboardTool is indexed as tool
  • → RelationshipName: This field creates a relationship to another entity. The arrow indicates this field contains a reference (URN) to another entity, and the name indicates the type of relationship (e.g., → Contains, → OwnedBy)

Fields with complex types (like Edge, AuditStamp) link to their definitions in the Common Types section below.

Aspects

mlFeatureTableKey

Key for an MLFeatureTable

| Field | Type | Required | Description | Annotations |
|---|---|---|---|---|
| platform | string | | Data platform urn associated with the feature table | → SourcePlatform |
| name | string | | Name of the feature table | Searchable |

mlFeatureTableProperties

Properties associated with a MLFeatureTable

| Field | Type | Required | Description | Annotations |
|---|---|---|---|---|
| customProperties | map | | Custom property bag. | Searchable |
| description | string | | Documentation of the MLFeatureTable | Searchable |
| mlFeatures | string[] | | List of features contained in the feature table | Searchable, → Contains |
| mlPrimaryKeys | string[] | | List of primary keys in the feature table (if multiple, assumed to act as a composite key) | Searchable, → KeyedBy |

ownership

Ownership information of an entity.

| Field | Type | Required | Description | Annotations |
|---|---|---|---|---|
| owners | Owner[] | | List of owners of the entity. | |
| ownerTypes | map | | Ownership type to Owners map, populated via mutation hook. | Searchable |
| lastModified | AuditStamp | | Audit stamp containing who last modified the record and when. A value of 0 in the time field indi... | |

institutionalMemory

Institutional memory of an entity. This is a way to link to relevant documentation and provide description of the documentation. Institutional or tribal knowledge is very important for users to leverage the entity.

| Field | Type | Required | Description | Annotations |
|---|---|---|---|---|
| elements | InstitutionalMemoryMetadata[] | | List of records that represent institutional memory of an entity. Each record consists of a link,... | |

status

The lifecycle status metadata of an entity, e.g. dataset, metric, feature, etc. This aspect is used to represent soft deletes conventionally.

| Field | Type | Required | Description | Annotations |
|---|---|---|---|---|
| removed | boolean | | Whether the entity has been removed (soft-deleted). | Searchable |

deprecation

Deprecation status of an entity

| Field | Type | Required | Description | Annotations |
|---|---|---|---|---|
| deprecated | boolean | | Whether the entity is deprecated. | Searchable |
| decommissionTime | long | | The time user plan to decommission this entity. | |
| note | string | | Additional information about the entity deprecation plan, such as the wiki, doc, RB. | |
| actor | string | | The user URN which will be credited for modifying this deprecation content. | |
| replacement | string | | | |

browsePaths

Shared aspect containing Browse Paths to be indexed for an entity.

| Field | Type | Required | Description | Annotations |
|---|---|---|---|---|
| paths | string[] | | A list of valid browse paths for the entity. Browse paths are expected to be forward slash-separ... | Searchable |

globalTags

Tag aspect used for applying tags to an entity

| Field | Type | Required | Description | Annotations |
|---|---|---|---|---|
| tags | TagAssociation[] | | Tags associated with a given entity | Searchable, → TaggedWith |

dataPlatformInstance

The specific instance of the data platform that this entity belongs to

| Field | Type | Required | Description | Annotations |
|---|---|---|---|---|
| platform | string | | Data Platform | Searchable |
| instance | string | | Instance of the data platform (e.g. db instance) | Searchable (platformInstance) |

browsePathsV2

Shared aspect containing a Browse Path to be indexed for an entity.

| Field | Type | Required | Description | Annotations |
|---|---|---|---|---|
| path | BrowsePathEntry[] | | A valid browse path for the entity. This field is provided by DataHub by default. This aspect is ... | Searchable |

glossaryTerms

Related business terms information

| Field | Type | Required | Description | Annotations |
|---|---|---|---|---|
| terms | GlossaryTermAssociation[] | | The related business terms | |
| auditStamp | AuditStamp | | Audit stamp containing who reported the related business term | |

editableMlFeatureTableProperties

Properties associated with a MLFeatureTable editable from the ui

| Field | Type | Required | Description | Annotations |
|---|---|---|---|---|
| description | string | | Documentation of the MLFeatureTable | Searchable (editedDescription) |

domains

Links from an Asset to its Domains

| Field | Type | Required | Description | Annotations |
|---|---|---|---|---|
| domains | string[] | | The Domains attached to an Asset | Searchable, → AssociatedWith |

applications

Links from an Asset to its Applications

| Field | Type | Required | Description | Annotations |
|---|---|---|---|---|
| applications | string[] | | The Applications attached to an Asset | Searchable, → AssociatedWith |

structuredProperties

Properties about an entity governed by StructuredPropertyDefinition

| Field | Type | Required | Description | Annotations |
|---|---|---|---|---|
| properties | StructuredPropertyValueAssignment[] | | Custom property bag. | |

forms

Forms that are assigned to this entity to be filled out

| Field | Type | Required | Description | Annotations |
|---|---|---|---|---|
| incompleteForms | FormAssociation[] | | All incomplete forms assigned to the entity. | Searchable |
| completedForms | FormAssociation[] | | All complete forms assigned to the entity. | Searchable |
| verifications | FormVerificationAssociation[] | | Verifications that have been applied to the entity via completed forms. | Searchable |

testResults

Information about a Test Result

| Field | Type | Required | Description | Annotations |
|---|---|---|---|---|
| failing | TestResult[] | | Results that are failing | Searchable, → IsFailing |
| passing | TestResult[] | | Results that are passing | Searchable, → IsPassing |

subTypes

Sub Types. Use this aspect to specialize a generic Entity e.g. Making a Dataset also be a View or also be a LookerExplore

| Field | Type | Required | Description | Annotations |
|---|---|---|---|---|
| typeNames | string[] | | The names of the specific types. | Searchable |

Common Types

These types are used across multiple aspects in this entity.

AuditStamp

Data captured on a resource/association/sub-resource level giving insight into when that resource/association/sub-resource moved into a particular lifecycle stage, and who acted to move it into that specific lifecycle stage.

Fields:

  • time (long): When did the resource/association/sub-resource move into the specific lifecyc...
  • actor (string): The entity (e.g. a member URN) which will be credited for moving the resource...
  • impersonator (string?): The entity (e.g. a service URN) which performs the change on behalf of the Ac...
  • message (string?): Additional context around how DataHub was informed of the particular change. ...

FormAssociation

Properties of an applied form.

Fields:

  • urn (string): Urn of the applied form
  • incompletePrompts (FormPromptAssociation[]): A list of prompts that are not yet complete for this form.
  • completedPrompts (FormPromptAssociation[]): A list of prompts that have been completed for this form.

TestResult

Information about a Test Result

Fields:

  • test (string): The urn of the test
  • type (TestResultType): The type of the result
  • testDefinitionMd5 (string?): The md5 of the test definition that was used to compute this result. See Test...
  • lastComputed (AuditStamp?): The audit stamp of when the result was computed, including the actor who comp...

Relationships

Outgoing

These are the relationships stored in this entity's aspects

  • SourcePlatform
    • DataPlatform via mlFeatureTableKey.platform
  • Contains
    • MlFeature via mlFeatureTableProperties.mlFeatures
  • KeyedBy
    • MlPrimaryKey via mlFeatureTableProperties.mlPrimaryKeys
  • OwnedBy
    • Corpuser via ownership.owners.owner
    • CorpGroup via ownership.owners.owner
  • ownershipType
    • OwnershipType via ownership.owners.typeUrn
  • TaggedWith
    • Tag via globalTags.tags
  • TermedWith
    • GlossaryTerm via glossaryTerms.terms.urn
  • AssociatedWith
    • Domain via domains.domains
    • Application via applications.applications
  • IsFailing
    • Test via testResults.failing
  • IsPassing
    • Test via testResults.passing
