
ML Feature

The ML Feature entity represents an individual input variable used by machine learning models. Features are the building blocks of feature engineering: they transform raw data into meaningful signals that ML algorithms can learn from. In modern ML systems, features are first-class citizens that can be discovered, documented, versioned, and reused across multiple models and teams.

Identity

ML Features are identified by two pieces of information:

  • The feature namespace: A logical grouping or namespace that organizes related features together. This often corresponds to a feature table name, domain area, or team. Examples include user_features, transaction_features, product_features.
  • The feature name: The unique name of the feature within its namespace. Examples include age, lifetime_value, days_since_signup.

An example of an ML Feature identifier is urn:li:mlFeature:(user_features,age).

The identity is defined by the mlFeatureKey aspect, which contains:

  • featureNamespace: A string representing the logical namespace or grouping for the feature
  • name: The unique name of the feature within that namespace

URN Structure Examples

urn:li:mlFeature:(user_features,age)
urn:li:mlFeature:(user_features,lifetime_value)
urn:li:mlFeature:(transaction_features,amount_last_7d)
urn:li:mlFeature:(product_features,price)
urn:li:mlFeature:(product_features,category_embedding)

The namespace and name together form a globally unique identifier. Multiple features can share the same namespace (representing a logical grouping), but each feature name must be unique within its namespace.
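In the Python SDK, these URNs are typically constructed with a builder helper rather than assembled by hand. A minimal sketch (the emitter setup shown in later examples is omitted here):

import datahub.emitter.mce_builder as builder

# The builder derives the feature namespace from the feature table name.
feature_urn = builder.make_ml_feature_urn(
    feature_table_name="user_features",
    feature_name="age",
)
print(feature_urn)  # urn:li:mlFeature:(user_features,age)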

Important Capabilities

Feature Properties

ML Features support comprehensive metadata through the mlFeatureProperties aspect. This aspect captures the essential characteristics that define a feature:

Description and Documentation

Features should have clear descriptions explaining what they represent, how they're calculated, and when they should be used. Good feature documentation is critical for:

  • Helping data scientists discover relevant features for their models
  • Preventing duplicate feature creation
  • Understanding feature semantics and appropriate use cases
  • Facilitating feature reuse across teams
Python SDK: Create an ML Feature with description
# Inlined from /metadata-ingestion/examples/library/mlfeature_create_with_description.py
import os

import datahub.emitter.mce_builder as builder
import datahub.metadata.schema_classes as models
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter

gms_server = os.getenv("DATAHUB_GMS_URL", "http://localhost:8080")
token = os.getenv("DATAHUB_GMS_TOKEN")
emitter = DatahubRestEmitter(gms_server=gms_server, token=token)

feature_urn = builder.make_ml_feature_urn(
    feature_table_name="user_features",
    feature_name="age",
)

metadata_change_proposal = MetadataChangeProposalWrapper(
    entityUrn=feature_urn,
    aspect=models.MLFeaturePropertiesClass(
        description="Age of the user in years, calculated as the difference between current date and birth date. "
        "This feature is commonly used for demographic segmentation and age-based personalization. "
        "Values range from 18-100 for registered users (age verification required).",
        dataType="CONTINUOUS",
    ),
)

emitter.emit(metadata_change_proposal)

Data Type

Features have a data type, specified using MLFeatureDataType, that describes the nature of the feature values. Understanding the data type is essential for proper feature handling, preprocessing, and model training. DataHub supports a rich taxonomy of data types:

Categorical Types:

  • NOMINAL: Discrete values with no inherent order (e.g., country, product category)
  • ORDINAL: Discrete values that can be ranked (e.g., education level, rating)
  • BINARY: Two-category values (e.g., is_subscriber, has_clicked)

Numeric Types:

  • CONTINUOUS: Real-valued numeric data (e.g., height, price, temperature)
  • COUNT: Non-negative integer counts (e.g., number of purchases, page views)
  • INTERVAL: Numeric data with equal spacing (e.g., percentages, scores)

Temporal:

  • TIME: Time-based cyclical features (e.g., hour_of_day, day_of_week)

Unstructured:

  • TEXT: Text data requiring NLP processing
  • IMAGE: Image data
  • VIDEO: Video data
  • AUDIO: Audio data

Structured:

  • MAP: Dictionary or mapping structures
  • SEQUENCE: Lists, arrays, or sequences
  • SET: Unordered collections
  • BYTE: Binary-encoded complex objects

Special:

  • USELESS: High-cardinality unique values with no predictive relationship (e.g., random IDs)
  • UNKNOWN: Type is not yet determined
Python SDK: Create features with different data types
# Inlined from /metadata-ingestion/examples/library/mlfeature_create_with_datatypes.py
import os

import datahub.emitter.mce_builder as builder
import datahub.metadata.schema_classes as models
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter

gms_server = os.getenv("DATAHUB_GMS_URL", "http://localhost:8080")
token = os.getenv("DATAHUB_GMS_TOKEN")
emitter = DatahubRestEmitter(gms_server=gms_server, token=token)

features = [
    {
        "name": "user_country",
        "description": "Country of user residence",
        "data_type": "NOMINAL",
    },
    {
        "name": "subscription_tier",
        "description": "User subscription level: free, basic, premium, enterprise",
        "data_type": "ORDINAL",
    },
    {
        "name": "is_email_verified",
        "description": "Whether the user has verified their email address",
        "data_type": "BINARY",
    },
    {
        "name": "total_purchases",
        "description": "Total number of purchases made by the user",
        "data_type": "COUNT",
    },
    {
        "name": "signup_hour",
        "description": "Hour of day when user signed up (0-23)",
        "data_type": "TIME",
    },
    {
        "name": "lifetime_value",
        "description": "Total revenue generated by user in USD",
        "data_type": "CONTINUOUS",
    },
    {
        "name": "user_bio",
        "description": "User profile biography text",
        "data_type": "TEXT",
    },
]

for feature_def in features:
    feature_urn = builder.make_ml_feature_urn(
        feature_table_name="user_features",
        feature_name=feature_def["name"],
    )

    metadata_change_proposal = MetadataChangeProposalWrapper(
        entityUrn=feature_urn,
        aspect=models.MLFeaturePropertiesClass(
            description=feature_def["description"],
            dataType=feature_def["data_type"],
        ),
    )

    emitter.emit(metadata_change_proposal)

Source Lineage

One of the most powerful capabilities of ML Features in DataHub is their ability to declare source datasets through the sources property. This creates explicit "DerivedFrom" lineage relationships between features and the upstream datasets they're computed from.

Source lineage enables:

  • End-to-end traceability: Track a model prediction back to the raw data that generated its features
  • Impact analysis: Understand which features (and downstream models) are affected when a dataset changes
  • Data quality: Identify the root cause when feature values appear incorrect
  • Compliance: Document data provenance for regulatory requirements
  • Discovery: Find all features derived from a particular dataset
Python SDK: Add source lineage to a feature
# Inlined from /metadata-ingestion/examples/library/mlfeature_add_source_lineage.py
import datahub.emitter.mce_builder as builder
import datahub.metadata.schema_classes as models
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter

emitter = DatahubRestEmitter(gms_server="http://localhost:8080", extra_headers={})

feature_urn = builder.make_ml_feature_urn(
    feature_table_name="user_features",
    feature_name="days_since_signup",
)

users_table_urn = builder.make_dataset_urn(
    name="analytics.users",
    platform="snowflake",
    env="PROD",
)

metadata_change_proposal = MetadataChangeProposalWrapper(
    entityUrn=feature_urn,
    aspect=models.MLFeaturePropertiesClass(
        description="Number of days since the user created their account, "
        "calculated as the difference between current date and signup_date. "
        "Used for cohort analysis and lifecycle stage segmentation.",
        dataType="COUNT",
        sources=[users_table_urn],
    ),
)

emitter.emit(metadata_change_proposal)

Versioning

Features support versioning through the version property. Version information helps teams:

  • Track breaking changes to feature definitions or calculations
  • Maintain multiple feature versions during migration periods
  • Understand which feature version a model was trained with
  • Coordinate feature rollouts across training and serving systems
Python SDK: Create a versioned feature
# Inlined from /metadata-ingestion/examples/library/mlfeature_create_versioned.py
import os

import datahub.emitter.mce_builder as builder
import datahub.metadata.schema_classes as models
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter

gms_server = os.getenv("DATAHUB_GMS_URL", "http://localhost:8080")
token = os.getenv("DATAHUB_GMS_TOKEN")
emitter = DatahubRestEmitter(gms_server=gms_server, token=token)

feature_urn = builder.make_ml_feature_urn(
    feature_table_name="user_features",
    feature_name="total_spend",
)

dataset_urn = builder.make_dataset_urn(
    name="analytics.orders",
    platform="snowflake",
    env="PROD",
)

metadata_change_proposal = MetadataChangeProposalWrapper(
    entityUrn=feature_urn,
    aspect=models.MLFeaturePropertiesClass(
        description="Total amount spent by user across all orders. "
        "Version 2.0 now includes refunds and returns, providing net spend instead of gross. "
        "Changed from gross spend calculation in v1.0.",
        dataType="CONTINUOUS",
        version=models.VersionTagClass(versionTag="2.0"),
        sources=[dataset_urn],
    ),
)

emitter.emit(metadata_change_proposal)

Custom Properties

Features support arbitrary key-value custom properties through the customProperties field, allowing you to capture platform-specific or organization-specific metadata:

  • Feature importance scores
  • Update frequency or freshness SLAs
  • Cost or compute requirements
  • Feature store specific configuration
  • Team or project ownership details
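As a sketch, custom properties are plain string-to-string pairs on MLFeaturePropertiesClass; the keys below are illustrative examples rather than a fixed schema. Note that emitting this aspect replaces any previously written mlFeatureProperties.

import os

import datahub.emitter.mce_builder as builder
import datahub.metadata.schema_classes as models
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter

emitter = DatahubRestEmitter(
    gms_server=os.getenv("DATAHUB_GMS_URL", "http://localhost:8080"),
    token=os.getenv("DATAHUB_GMS_TOKEN"),
)

feature_urn = builder.make_ml_feature_urn(
    feature_table_name="user_features",
    feature_name="lifetime_value",
)

emitter.emit(
    MetadataChangeProposalWrapper(
        entityUrn=feature_urn,
        aspect=models.MLFeaturePropertiesClass(
            description="Total revenue generated by the user in USD.",
            dataType="CONTINUOUS",
            # Example keys only -- use whatever naming convention fits your organization.
            customProperties={
                "update_frequency": "daily",
                "feature_importance": "0.42",
                "owner_team": "growth-ml",
            },
        ),
    )
)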

Tags and Glossary Terms

ML Features support tags and glossary terms for classification, discovery, and governance:

  • Tags (via globalTags aspect) provide lightweight categorization such as PII indicators, feature maturity levels, or domain areas
  • Glossary Terms (via glossaryTerms aspect) link features to standardized business definitions and concepts

Read this blog to understand when to use tags vs terms.

Python SDK: Add tags and terms to a feature
# Inlined from /metadata-ingestion/examples/library/mlfeature_add_tags_terms.py
import datahub.emitter.mce_builder as builder
import datahub.metadata.schema_classes as models
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.ingestion.graph.client import DatahubClientConfig, DataHubGraph

gms_endpoint = "http://localhost:8080"
emitter = DatahubRestEmitter(gms_server=gms_endpoint, extra_headers={})
graph = DataHubGraph(DatahubClientConfig(server=gms_endpoint))

feature_urn = builder.make_ml_feature_urn(
    feature_table_name="user_features",
    feature_name="email_address",
)

current_tags = graph.get_aspect(
    entity_urn=feature_urn, aspect_type=models.GlobalTagsClass
)

tag_to_add = builder.make_tag_urn("PII")
term_to_add = builder.make_term_urn("CustomerData")

if current_tags:
    if tag_to_add not in [tag.tag for tag in current_tags.tags]:
        current_tags.tags.append(models.TagAssociationClass(tag=tag_to_add))
else:
    current_tags = models.GlobalTagsClass(
        tags=[models.TagAssociationClass(tag=tag_to_add)]
    )

emitter.emit(
    MetadataChangeProposalWrapper(
        entityUrn=feature_urn,
        aspect=current_tags,
    )
)

current_terms = graph.get_aspect(
    entity_urn=feature_urn, aspect_type=models.GlossaryTermsClass
)

if current_terms:
    if term_to_add not in [term.urn for term in current_terms.terms]:
        current_terms.terms.append(models.GlossaryTermAssociationClass(urn=term_to_add))
else:
    current_terms = models.GlossaryTermsClass(
        terms=[models.GlossaryTermAssociationClass(urn=term_to_add)],
        auditStamp=models.AuditStampClass(time=0, actor="urn:li:corpuser:datahub"),
    )

emitter.emit(
    MetadataChangeProposalWrapper(
        entityUrn=feature_urn,
        aspect=current_terms,
    )
)

Ownership

Ownership is associated with features using the ownership aspect. Clear feature ownership is essential for:

  • Knowing who to contact with questions about feature behavior
  • Understanding responsibility for feature quality and updates
  • Managing access control and governance
  • Coordinating feature changes across teams
Python SDK: Add ownership to a feature
# Inlined from /metadata-ingestion/examples/library/mlfeature_add_ownership.py
import datahub.emitter.mce_builder as builder
import datahub.metadata.schema_classes as models
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.ingestion.graph.client import DatahubClientConfig, DataHubGraph

gms_endpoint = "http://localhost:8080"
emitter = DatahubRestEmitter(gms_server=gms_endpoint, extra_headers={})
graph = DataHubGraph(DatahubClientConfig(server=gms_endpoint))

feature_urn = builder.make_ml_feature_urn(
    feature_table_name="user_features",
    feature_name="age",
)

owner_to_add = builder.make_user_urn("data_science_team")

current_ownership = graph.get_aspect(
    entity_urn=feature_urn, aspect_type=models.OwnershipClass
)

if current_ownership:
    if owner_to_add not in [owner.owner for owner in current_ownership.owners]:
        current_ownership.owners.append(
            models.OwnerClass(
                owner=owner_to_add,
                type=models.OwnershipTypeClass.DATAOWNER,
            )
        )
else:
    current_ownership = models.OwnershipClass(
        owners=[
            models.OwnerClass(
                owner=owner_to_add,
                type=models.OwnershipTypeClass.DATAOWNER,
            )
        ]
    )

emitter.emit(
    MetadataChangeProposalWrapper(
        entityUrn=feature_urn,
        aspect=current_ownership,
    )
)

Domains and Organization

Features can be organized into domains (via the domains aspect) to represent organizational structure or functional areas. Domain organization helps teams:

  • Manage large feature catalogs by grouping related features
  • Apply consistent governance policies to related features
  • Facilitate discovery within organizational boundaries
  • Track feature adoption by business unit
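A minimal sketch that attaches a feature to a domain via the domains aspect; the user_analytics domain id is assumed to already exist in your instance.

import datahub.emitter.mce_builder as builder
import datahub.metadata.schema_classes as models
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter

emitter = DatahubRestEmitter(gms_server="http://localhost:8080")

feature_urn = builder.make_ml_feature_urn(
    feature_table_name="user_features",
    feature_name="age",
)
domain_urn = builder.make_domain_urn("user_analytics")  # assumed to exist already

# The domains aspect is written as a full replacement: emit the complete list of domains.
emitter.emit(
    MetadataChangeProposalWrapper(
        entityUrn=feature_urn,
        aspect=models.DomainsClass(domains=[domain_urn]),
    )
)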

Code Examples

Creating a Complete ML Feature

Here's a comprehensive example that creates a feature with all core metadata:

Python SDK: Create a complete ML Feature
# Inlined from /metadata-ingestion/examples/library/mlfeature_create_complete.py
import os

import datahub.emitter.mce_builder as builder
import datahub.metadata.schema_classes as models
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter

gms_server = os.getenv("DATAHUB_GMS_URL", "http://localhost:8080")
token = os.getenv("DATAHUB_GMS_TOKEN")
emitter = DatahubRestEmitter(gms_server=gms_server, token=token)

feature_urn = builder.make_ml_feature_urn(
    feature_table_name="user_features",
    feature_name="user_engagement_score",
)

source_dataset_urn = builder.make_dataset_urn(
    name="analytics.user_activity",
    platform="snowflake",
    env="PROD",
)

owner_urn = builder.make_user_urn("ml_platform_team")
tag_urn = builder.make_tag_urn("HighValue")
term_urn = builder.make_term_urn("EngagementMetrics")
domain_urn = builder.make_domain_urn("user_analytics")

feature_properties = MetadataChangeProposalWrapper(
    entityUrn=feature_urn,
    aspect=models.MLFeaturePropertiesClass(
        description="Composite engagement score calculated from user activity metrics including "
        "page views, time on site, feature usage, and interaction frequency. "
        "Higher scores indicate more engaged users. Range: 0-100.",
        dataType="CONTINUOUS",
        version=models.VersionTagClass(versionTag="1.0"),
        sources=[source_dataset_urn],
        customProperties={
            "update_frequency": "daily",
            "calculation_method": "weighted_sum",
            "min_value": "0",
            "max_value": "100",
        },
    ),
)

ownership = MetadataChangeProposalWrapper(
    entityUrn=feature_urn,
    aspect=models.OwnershipClass(
        owners=[
            models.OwnerClass(
                owner=owner_urn,
                type=models.OwnershipTypeClass.DATAOWNER,
            )
        ]
    ),
)

tags = MetadataChangeProposalWrapper(
    entityUrn=feature_urn,
    aspect=models.GlobalTagsClass(tags=[models.TagAssociationClass(tag=tag_urn)]),
)

terms = MetadataChangeProposalWrapper(
    entityUrn=feature_urn,
    aspect=models.GlossaryTermsClass(
        terms=[models.GlossaryTermAssociationClass(urn=term_urn)],
        auditStamp=models.AuditStampClass(time=0, actor="urn:li:corpuser:datahub"),
    ),
)

domains = MetadataChangeProposalWrapper(
    entityUrn=feature_urn,
    aspect=models.DomainsClass(domains=[domain_urn]),
)

for mcp in [feature_properties, ownership, tags, terms, domains]:
    emitter.emit(mcp)

Linking Features to Feature Tables

Features are typically organized into feature tables. While the feature entity itself doesn't directly reference its parent table (the relationship is inverse: tables reference features), you can discover the containing table through relationships:

Python SDK: Find feature table containing a feature
# Inlined from /metadata-ingestion/examples/library/mlfeature_find_table.py
from datahub.ingestion.graph.client import DatahubClientConfig, DataHubGraph
from datahub.metadata.urns import MlFeatureUrn

graph = DataHubGraph(DatahubClientConfig(server="http://localhost:8080"))

feature_urn = MlFeatureUrn(
    feature_namespace="user_features",
    name="age",
)

# Materialize the generator so we can check whether any relationships were returned.
relationships = list(
    graph.get_related_entities(
        entity_urn=str(feature_urn),
        relationship_types=["Contains"],
        direction=DataHubGraph.RelationshipDirection.INCOMING,
    )
)

if relationships:
    feature_table_urns = [rel.urn for rel in relationships]
    print(f"Feature {feature_urn} is contained in tables:")
    for table_urn in feature_table_urns:
        print(f" - {table_urn}")
else:
    print(f"Feature {feature_urn} is not associated with any feature table")

Querying ML Features

You can retrieve ML Feature metadata using both the Python SDK and REST API:

Python SDK: Read an ML Feature
# Inlined from /metadata-ingestion/examples/library/mlfeature_read.py
from datahub.sdk import DataHubClient, MLFeatureUrn

client = DataHubClient.from_env()

# Or get this from the UI (share -> copy urn) and use MLFeatureUrn.from_string(...)
mlfeature_urn = MLFeatureUrn(
    "test_feature_table_all_feature_dtypes", "test_BOOL_feature"
)

mlfeature_entity = client.entities.get(mlfeature_urn)
print("MLFeature name:", mlfeature_entity.name)
print("MLFeature table:", mlfeature_entity.feature_table_urn)
print("MLFeature description:", mlfeature_entity.description)

REST API: Fetch ML Feature metadata
# Get the complete entity with all aspects
curl 'http://localhost:8080/entities/urn%3Ali%3AmlFeature%3A(user_features,age)'

# Get relationships to see source datasets and consuming models
curl 'http://localhost:8080/relationships?direction=OUTGOING&urn=urn%3Ali%3AmlFeature%3A(user_features,age)&types=DerivedFrom'
curl 'http://localhost:8080/relationships?direction=INCOMING&urn=urn%3Ali%3AmlFeature%3A(user_features,age)&types=Consumes'

Batch Feature Creation

When creating many features at once (e.g., from a feature store ingestion connector), batch operations improve performance:

Python SDK: Create multiple features efficiently
# Inlined from /metadata-ingestion/examples/library/mlfeature_create_batch.py
import os

import datahub.emitter.mce_builder as builder
import datahub.metadata.schema_classes as models
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter

gms_server = os.getenv("DATAHUB_GMS_URL", "http://localhost:8080")
token = os.getenv("DATAHUB_GMS_TOKEN")
emitter = DatahubRestEmitter(gms_server=gms_server, token=token)

source_dataset = builder.make_dataset_urn(
    name="analytics.users",
    platform="snowflake",
    env="PROD",
)

features_config = [
    {
        "name": "age",
        "description": "User age in years",
        "data_type": "CONTINUOUS",
    },
    {
        "name": "country",
        "description": "User country of residence",
        "data_type": "NOMINAL",
    },
    {
        "name": "is_verified",
        "description": "Whether user email is verified",
        "data_type": "BINARY",
    },
    {
        "name": "total_orders",
        "description": "Total number of orders placed",
        "data_type": "COUNT",
    },
    {
        "name": "signup_hour",
        "description": "Hour of day user signed up",
        "data_type": "TIME",
    },
]

mcps = []

for feature_config in features_config:
    feature_urn = builder.make_ml_feature_urn(
        feature_table_name="user_features",
        feature_name=feature_config["name"],
    )

    mcp = MetadataChangeProposalWrapper(
        entityUrn=feature_urn,
        aspect=models.MLFeaturePropertiesClass(
            description=feature_config["description"],
            dataType=feature_config["data_type"],
            sources=[source_dataset],
        ),
    )
    mcps.append(mcp)

for mcp in mcps:
    emitter.emit(mcp)

print(f"Created {len(mcps)} features in feature namespace 'user_features'")

Integration Points

ML Features integrate with multiple other entities in DataHub's metadata model to form a comprehensive ML metadata ecosystem:

Relationships with Datasets

Features declare their source datasets through the sources property in mlFeatureProperties. This creates a "DerivedFrom" lineage relationship that:

  • Shows which raw data tables feed into each feature
  • Enables impact analysis when datasets change
  • Provides end-to-end lineage from data warehouse to model predictions
  • Supports data quality root cause analysis

The relationship is directional: features point to their source datasets. Multiple features can derive from the same dataset, and a single feature can derive from multiple datasets if it's computed via a join or union.
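For example, a feature computed by joining two upstream tables can list both datasets in sources, producing one DerivedFrom edge per dataset. A sketch with illustrative table names:

import datahub.emitter.mce_builder as builder
import datahub.metadata.schema_classes as models
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter

emitter = DatahubRestEmitter(gms_server="http://localhost:8080")

feature_urn = builder.make_ml_feature_urn(
    feature_table_name="transaction_features",
    feature_name="amount_last_7d",
)

# Hypothetical upstream tables joined to compute the feature.
orders = builder.make_dataset_urn(name="analytics.orders", platform="snowflake", env="PROD")
refunds = builder.make_dataset_urn(name="analytics.refunds", platform="snowflake", env="PROD")

emitter.emit(
    MetadataChangeProposalWrapper(
        entityUrn=feature_urn,
        aspect=models.MLFeaturePropertiesClass(
            description="Net transaction amount over the trailing 7 days (orders minus refunds).",
            dataType="CONTINUOUS",
            sources=[orders, refunds],  # one DerivedFrom edge per source dataset
        ),
    )
)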

Relationships with ML Models

ML Models consume features through the mlFeatures property in MLModelProperties. This creates a "Consumes" lineage relationship showing:

  • Which features are used by each model
  • Which models depend on a particular feature
  • The complete set of inputs for model training and inference
  • Impact analysis for feature changes on downstream models

This relationship enables critical use cases like:

  • Feature usage tracking: Identify unused features that can be deprecated
  • Model impact analysis: Find all models affected when a feature changes
  • Feature importance correlation: Link model performance to feature changes
  • Compliance documentation: Show exactly what data influences model decisions
Python SDK: Link features to a model
# Inlined from /metadata-ingestion/examples/library/mlfeature_add_to_mlmodel.py
import datahub.emitter.mce_builder as builder
import datahub.metadata.schema_classes as models
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.ingestion.graph.client import DatahubClientConfig, DataHubGraph
from datahub.metadata.schema_classes import MLModelPropertiesClass

gms_endpoint = "http://localhost:8080"
# Create an emitter to DataHub over REST
emitter = DatahubRestEmitter(gms_server=gms_endpoint, extra_headers={})

model_urn = builder.make_ml_model_urn(
    model_name="my-test-model", platform="science", env="PROD"
)
feature_urns = [
    builder.make_ml_feature_urn(
        feature_name="my-feature3", feature_table_name="my-feature-table"
    ),
]

# The block below concatenates the new features with any features already on the model.
# If you want to replace all existing features with only the new ones, skip this lookup.
graph = DataHubGraph(DatahubClientConfig(server=gms_endpoint))
model_properties = graph.get_aspect(
    entity_urn=model_urn, aspect_type=MLModelPropertiesClass
)
if model_properties:
    current_features = model_properties.mlFeatures
    print("current_features:", current_features)
    if current_features:
        feature_urns += current_features

model_properties = models.MLModelPropertiesClass(mlFeatures=feature_urns)

# MCP creation
metadata_change_proposal = MetadataChangeProposalWrapper(
    entityUrn=model_urn,
    aspect=model_properties,
)

# Emit metadata!
emitter.emit(metadata_change_proposal)

Relationships with ML Feature Tables

Feature tables contain ML Features through the "Contains" relationship. The feature table's mlFeatures property lists the URNs of features it contains. This relationship:

  • Organizes features into logical groupings
  • Enables navigation from table to features and back
  • Represents the physical or logical organization in the feature store
  • Helps discover related features that share characteristics

While features don't explicitly store their parent table, you can discover it by querying incoming "Contains" relationships.

Python SDK: Add a feature to a feature table
# Inlined from /metadata-ingestion/examples/library/mlfeature_add_to_mlfeature_table.py
import datahub.emitter.mce_builder as builder
import datahub.metadata.schema_classes as models
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.ingestion.graph.client import DatahubClientConfig, DataHubGraph
from datahub.metadata.schema_classes import MLFeatureTablePropertiesClass

gms_endpoint = "http://localhost:8080"
# Create an emitter to DataHub over REST
emitter = DatahubRestEmitter(gms_server=gms_endpoint, extra_headers={})

feature_table_urn = builder.make_ml_feature_table_urn(
    feature_table_name="my-feature-table", platform="feast"
)
feature_urns = [
    builder.make_ml_feature_urn(
        feature_name="my-feature2", feature_table_name="my-feature-table"
    ),
]

# The block below concatenates the new features with any features already on the feature table.
# If you want to replace all existing features with only the new ones, skip this lookup.
graph = DataHubGraph(DatahubClientConfig(server=gms_endpoint))
feature_table_properties = graph.get_aspect(
    entity_urn=feature_table_urn, aspect_type=MLFeatureTablePropertiesClass
)
if feature_table_properties:
    current_features = feature_table_properties.mlFeatures
    print("current_features:", current_features)
    if current_features:
        feature_urns += current_features

feature_table_properties = models.MLFeatureTablePropertiesClass(mlFeatures=feature_urns)

# MCP creation
metadata_change_proposal = MetadataChangeProposalWrapper(
    entityUrn=feature_table_urn,
    aspect=feature_table_properties,
)

# Emit metadata! This is a blocking call
emitter.emit(metadata_change_proposal)

Platform Integration

Features are often associated with a platform through their namespace or through related entities (feature tables). While features themselves don't have a direct platform reference in their key, the namespace often encodes platform-specific organization, and related feature tables declare their platform explicitly.
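To make the contrast concrete, the sketch below builds both URNs: the feature table URN carries an explicit platform (feast here, as an example), while the feature URN carries only namespace and name.

import datahub.emitter.mce_builder as builder

# Feature table URN: platform is explicit.
table_urn = builder.make_ml_feature_table_urn(
    platform="feast",
    feature_table_name="user_features",
)
# Feature URN: no platform, just namespace + name.
feature_urn = builder.make_ml_feature_urn(
    feature_table_name="user_features",
    feature_name="age",
)

print(table_urn)    # urn:li:mlFeatureTable:(urn:li:dataPlatform:feast,user_features)
print(feature_urn)  # urn:li:mlFeature:(user_features,age)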

GraphQL Resolvers

Features are accessible through DataHub's GraphQL API via the MLFeatureType class. The GraphQL interface provides:

  • Search and discovery capabilities for features
  • Autocomplete for feature names during searches
  • Batch loading of feature metadata
  • Filtering features by properties and relationships
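As a hedged sketch, the same GraphQL API can be exercised from Python through DataHubGraph.execute_graphql using the generic search query scoped to the MLFEATURE entity type; exact field names may vary between DataHub versions.

from datahub.ingestion.graph.client import DatahubClientConfig, DataHubGraph

graph = DataHubGraph(DatahubClientConfig(server="http://localhost:8080"))

# Generic search scoped to ML Features; response shape may differ slightly by version.
query = """
query searchFeatures($query: String!) {
  search(input: { type: MLFEATURE, query: $query, start: 0, count: 10 }) {
    searchResults {
      entity {
        urn
      }
    }
  }
}
"""

result = graph.execute_graphql(query, variables={"query": "age"})
for hit in result["search"]["searchResults"]:
    print(hit["entity"]["urn"])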

Notable Exceptions

Feature Namespace vs Feature Table

The featureNamespace in the feature key is a logical grouping concept and doesn't necessarily correspond 1:1 with feature tables:

  • In many feature stores: The namespace matches the feature table name. A feature table named user_features contains features with namespace user_features.
  • In some systems: The namespace might represent a broader domain or project, with multiple feature tables sharing a namespace.
  • Best practice: Use consistent namespace naming that aligns with your feature table organization for clarity.

When ingesting features, ensure namespace values match the corresponding feature table names for proper relationship establishment.
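For example, a sketch that reuses one namespace string for both the feature and its containing table so the Contains relationship lines up (names and platform are illustrative):

import datahub.emitter.mce_builder as builder
import datahub.metadata.schema_classes as models
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter

emitter = DatahubRestEmitter(gms_server="http://localhost:8080")

NAMESPACE = "user_features"  # used as both the feature namespace and the table name

feature_urn = builder.make_ml_feature_urn(
    feature_table_name=NAMESPACE,
    feature_name="age",
)
table_urn = builder.make_ml_feature_table_urn(
    platform="feast",
    feature_table_name=NAMESPACE,
)

# The table declares which features it contains; this emit replaces the
# mlFeatureTableProperties aspect, so include every feature the table should list.
emitter.emit(
    MetadataChangeProposalWrapper(
        entityUrn=table_urn,
        aspect=models.MLFeatureTablePropertiesClass(mlFeatures=[feature_urn]),
    )
)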

Feature Identity and Feature Tables

A feature's identity (featureNamespace + name) is independent of any feature table. This means:

  • The same feature URN could theoretically be referenced by multiple feature tables (though this is uncommon)
  • Changing a feature's containing table requires updating the table's metadata, not the feature itself
  • Features can exist without being part of any feature table (though this reduces discoverability)

Most feature stores enforce 1:1 relationships between features and feature tables to avoid ambiguity.

Versioning Strategies

There are multiple approaches to versioning features:

Option 1: Version in the URN (namespace or name)

urn:li:mlFeature:(user_features_v2,age)
urn:li:mlFeature:(user_features,age_v2)
  • Pros: Each version is a separate entity with independent metadata
  • Cons: Harder to track version history; requires manual version management

Option 2: Version in the properties

MLFeatureProperties(
    description="User age in years",
    version=VersionTag(versionTag="2.0")
)
  • Pros: Version history attached to single entity; easier lineage tracking
  • Cons: Point-in-time queries are harder; version changes mutate entity

Recommendation: Use the version property in mlFeatureProperties for most use cases. Only use versioned URNs when breaking changes require fully separate entities (e.g., changing data type from continuous to categorical).

Composite Features and Feature Engineering

Composite features (features derived from other features) can be modeled in two ways:

Approach 1: Intermediate features as entities Create explicit feature entities for each transformation step, with lineage between them:

raw_feature -> transformed_feature -> composite_feature

Approach 2: Direct source lineage Skip intermediate features and link composite features directly to source datasets, documenting the transformation in the description.

Choose Approach 1 when:

  • Intermediate features are reused by multiple downstream features/models
  • You need to track transformations explicitly for governance
  • Feature engineering pipelines are complex and multi-stage

Choose Approach 2 when:

  • Transformations are simple and one-off
  • Intermediate features have no independent value
  • You want to reduce metadata entity count

Feature Drift and Monitoring

While DataHub's ML Feature entity doesn't include built-in drift monitoring aspects, you can use:

  • Custom Properties: Store drift metrics or monitoring status
  • Tags: Apply tags like HIGH_DRIFT_DETECTED or MONITORING_ENABLED
  • Documentation: Link to external monitoring dashboards via institutionalMemory
  • External Systems: Integrate with specialized feature monitoring tools and reference them in feature metadata

Feature drift detection typically happens in runtime feature stores or model monitoring systems, with DataHub serving as the metadata catalog that links to those systems.
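A sketch combining two of these options: a monitoring tag plus an institutionalMemory link pointing at an assumed external drift dashboard (the URL is a placeholder).

import time

import datahub.emitter.mce_builder as builder
import datahub.metadata.schema_classes as models
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter

emitter = DatahubRestEmitter(gms_server="http://localhost:8080")

feature_urn = builder.make_ml_feature_urn(
    feature_table_name="user_features",
    feature_name="lifetime_value",
)

# Link to an external monitoring dashboard (placeholder URL).
memory = models.InstitutionalMemoryClass(
    elements=[
        models.InstitutionalMemoryMetadataClass(
            url="https://monitoring.example.com/features/lifetime_value",
            description="Drift dashboard for this feature",
            createStamp=models.AuditStampClass(
                time=int(time.time() * 1000),
                actor="urn:li:corpuser:datahub",
            ),
        )
    ]
)

# Tag indicating monitoring is enabled. Note: emitting globalTags replaces existing
# tags; use the read-modify-write pattern shown earlier to append instead.
tags = models.GlobalTagsClass(
    tags=[models.TagAssociationClass(tag=builder.make_tag_urn("MONITORING_ENABLED"))]
)

for aspect in [memory, tags]:
    emitter.emit(MetadataChangeProposalWrapper(entityUrn=feature_urn, aspect=aspect))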

Search and Discovery

Features are searchable by:

  • Name (with autocomplete)
  • Namespace (partial text search)
  • Description (full text search)
  • Tags and glossary terms
  • Owner
  • Source datasets (via lineage)

The name field has the highest search boost score (8.0), making feature name the primary discovery mechanism. Ensure feature names are descriptive and follow consistent naming conventions across your organization.
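Programmatically, the same index can be queried from the Python SDK; a brief sketch using DataHubGraph.get_urns_by_filter (argument names follow recent SDK versions):

from datahub.ingestion.graph.client import DatahubClientConfig, DataHubGraph

graph = DataHubGraph(DatahubClientConfig(server="http://localhost:8080"))

# Full-text search over ML Features; yields matching URNs.
for urn in graph.get_urns_by_filter(entity_types=["mlFeature"], query="age"):
    print(urn)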

Technical Reference Guide

The sections above provide an overview of how to use this entity. The following sections provide detailed technical information about how metadata is stored and represented in DataHub.

Aspects are the individual pieces of metadata that can be attached to an entity. Each aspect contains specific information (like ownership, tags, or properties) and is stored as a separate record, allowing for flexible and incremental metadata updates.

Relationships show how this entity connects to other entities in the metadata graph. These connections are derived from the fields within each aspect and form the foundation of DataHub's knowledge graph.

Reading the Field Tables

Each aspect's field table includes an Annotations column that provides additional metadata about how fields are used:

  • ⚠️ Deprecated: This field is deprecated and may be removed in a future version. Check the description for the recommended alternative
  • Searchable: This field is indexed and can be searched in DataHub's search interface
  • Searchable (fieldname): When the field name in parentheses is shown, it indicates the field is indexed under a different name in the search index. For example, dashboardTool is indexed as tool
  • → RelationshipName: This field creates a relationship to another entity. The arrow indicates this field contains a reference (URN) to another entity, and the name indicates the type of relationship (e.g., → Contains, → OwnedBy)

Fields with complex types (like Edge, AuditStamp) link to their definitions in the Common Types section below.

Aspects

mlFeatureKey

Key for an MLFeature

Field | Type | Required | Description | Annotations
featureNamespace | string | | Namespace for the feature | Searchable
name | string | | Name of the feature | Searchable

mlFeatureProperties

Properties associated with a MLFeature

Field | Type | Required | Description | Annotations
customProperties | map | | Custom property bag. | Searchable
description | string | | Documentation of the MLFeature | Searchable
dataType | MLFeatureDataType | | Data Type of the MLFeature |
version | VersionTag | | Version of the MLFeature |
sources | string[] | | Source of the MLFeature | → DerivedFrom

ownership

Ownership information of an entity.

Field | Type | Required | Description | Annotations
owners | Owner[] | | List of owners of the entity. |
ownerTypes | map | | Ownership type to Owners map, populated via mutation hook. | Searchable
lastModified | AuditStamp | | Audit stamp containing who last modified the record and when. A value of 0 in the time field indi... |

institutionalMemory

Institutional memory of an entity. This is a way to link to relevant documentation and provide description of the documentation. Institutional or tribal knowledge is very important for users to leverage the entity.

Field | Type | Required | Description | Annotations
elements | InstitutionalMemoryMetadata[] | | List of records that represent institutional memory of an entity. Each record consists of a link,... |

status

The lifecycle status metadata of an entity, e.g. dataset, metric, feature, etc. This aspect is used to represent soft deletes conventionally.

Field | Type | Required | Description | Annotations
removed | boolean | | Whether the entity has been removed (soft-deleted). | Searchable

deprecation

Deprecation status of an entity

Field | Type | Required | Description | Annotations
deprecated | boolean | | Whether the entity is deprecated. | Searchable
decommissionTime | long | | The time user plan to decommission this entity. |
note | string | | Additional information about the entity deprecation plan, such as the wiki, doc, RB. |
actor | string | | The user URN which will be credited for modifying this deprecation content. |
replacement | string | | |

browsePaths

Shared aspect containing Browse Paths to be indexed for an entity.

Field | Type | Required | Description | Annotations
paths | string[] | | A list of valid browse paths for the entity. Browse paths are expected to be forward slash-separ... | Searchable

globalTags

Tag aspect used for applying tags to an entity

Field | Type | Required | Description | Annotations
tags | TagAssociation[] | | Tags associated with a given entity | Searchable, → TaggedWith

dataPlatformInstance

The specific instance of the data platform that this entity belongs to

Field | Type | Required | Description | Annotations
platform | string | | Data Platform | Searchable
instance | string | | Instance of the data platform (e.g. db instance) | Searchable (platformInstance)

browsePathsV2

Shared aspect containing a Browse Path to be indexed for an entity.

Field | Type | Required | Description | Annotations
path | BrowsePathEntry[] | | A valid browse path for the entity. This field is provided by DataHub by default. This aspect is ... | Searchable

glossaryTerms

Related business terms information

Field | Type | Required | Description | Annotations
terms | GlossaryTermAssociation[] | | The related business terms |
auditStamp | AuditStamp | | Audit stamp containing who reported the related business term |

editableMlFeatureProperties

Properties associated with a MLFeature editable from the UI

Field | Type | Required | Description | Annotations
description | string | | Documentation of the MLFeature | Searchable (editedDescription)

domains

Links from an Asset to its Domains

Field | Type | Required | Description | Annotations
domains | string[] | | The Domains attached to an Asset | Searchable, → AssociatedWith

applications

Links from an Asset to its Applications

Field | Type | Required | Description | Annotations
applications | string[] | | The Applications attached to an Asset | Searchable, → AssociatedWith

structuredProperties

Properties about an entity governed by StructuredPropertyDefinition

Field | Type | Required | Description | Annotations
properties | StructuredPropertyValueAssignment[] | | Custom property bag. |

forms

Forms that are assigned to this entity to be filled out

Field | Type | Required | Description | Annotations
incompleteForms | FormAssociation[] | | All incomplete forms assigned to the entity. | Searchable
completedForms | FormAssociation[] | | All complete forms assigned to the entity. | Searchable
verifications | FormVerificationAssociation[] | | Verifications that have been applied to the entity via completed forms. | Searchable

testResults

Information about a Test Result

Field | Type | Required | Description | Annotations
failing | TestResult[] | | Results that are failing | Searchable, → IsFailing
passing | TestResult[] | | Results that are passing | Searchable, → IsPassing

subTypes

Sub Types. Use this aspect to specialize a generic Entity e.g. Making a Dataset also be a View or also be a LookerExplore

Field | Type | Required | Description | Annotations
typeNames | string[] | | The names of the specific types. | Searchable

Common Types

These types are used across multiple aspects in this entity.

AuditStamp

Data captured on a resource/association/sub-resource level giving insight into when that resource/association/sub-resource moved into a particular lifecycle stage, and who acted to move it into that specific lifecycle stage.

Fields:

  • time (long): When did the resource/association/sub-resource move into the specific lifecyc...
  • actor (string): The entity (e.g. a member URN) which will be credited for moving the resource...
  • impersonator (string?): The entity (e.g. a service URN) which performs the change on behalf of the Ac...
  • message (string?): Additional context around how DataHub was informed of the particular change. ...

FormAssociation

Properties of an applied form.

Fields:

  • urn (string): Urn of the applied form
  • incompletePrompts (FormPromptAssociation[]): A list of prompts that are not yet complete for this form.
  • completedPrompts (FormPromptAssociation[]): A list of prompts that have been completed for this form.

TestResult

Information about a Test Result

Fields:

  • test (string): The urn of the test
  • type (TestResultType): The type of the result
  • testDefinitionMd5 (string?): The md5 of the test definition that was used to compute this result. See Test...
  • lastComputed (AuditStamp?): The audit stamp of when the result was computed, including the actor who comp...

Relationships

Outgoing

These are the relationships stored in this entity's aspects

  • DerivedFrom

    • Dataset via mlFeatureProperties.sources
  • OwnedBy

    • Corpuser via ownership.owners.owner
    • CorpGroup via ownership.owners.owner
  • ownershipType

    • OwnershipType via ownership.owners.typeUrn
  • TaggedWith

    • Tag via globalTags.tags
  • TermedWith

    • GlossaryTerm via glossaryTerms.terms.urn
  • AssociatedWith

    • Domain via domains.domains
    • Application via applications.applications
  • IsFailing

    • Test via testResults.failing
  • IsPassing

    • Test via testResults.passing

Incoming

These are the relationships stored in other entity's aspects

  • Consumes

    • MlModel via mlModelProperties.mlFeatures
  • Contains

    • MlFeatureTable via mlFeatureTableProperties.mlFeatures

Global Metadata Model

Global Graph