ML Feature Table
The ML Feature Table entity represents a collection of related machine learning features organized together in a feature store. Feature tables are fundamental building blocks in the ML feature management ecosystem, grouping features that share common characteristics such as the same primary keys, update cadence, or data source. They bridge the gap between raw data in data warehouses and the features consumed by ML models during training and inference.
Identity
ML Feature Tables are identified by two pieces of information:
- The platform that hosts the feature table: this is the specific feature store or ML platform technology. Examples include feast, tecton, sagemaker, etc. See dataplatform for more details.
- The name of the feature table: a unique identifier within the specific platform that represents this collection of features.
An example of an ML Feature Table identifier is urn:li:mlFeatureTable:(urn:li:dataPlatform:feast,users_feature_table).
The identity is defined by the mlFeatureTableKey aspect, which contains:
- platform: A URN reference to the data platform hosting the feature table
- name: The unique name of the feature table within that platform
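To make the two-part identity concrete, the following minimal sketch composes and parses the URN string by hand. The helper names `make_feature_table_urn` and `parse_feature_table_urn` are hypothetical and exist only for illustration; the SDK examples on this page use `builder.make_ml_feature_table_urn` instead.

```python
import re
from typing import Tuple

# Pattern for the URN layout shown above: platform URN first, then the table name.
URN_PATTERN = re.compile(
    r"^urn:li:mlFeatureTable:\(urn:li:dataPlatform:([^,()]+),(.+)\)$"
)


def make_feature_table_urn(platform: str, name: str) -> str:
    """Compose the URN from the two fields of the key aspect."""
    return f"urn:li:mlFeatureTable:(urn:li:dataPlatform:{platform},{name})"


def parse_feature_table_urn(urn: str) -> Tuple[str, str]:
    """Split a feature table URN back into (platform, name)."""
    match = URN_PATTERN.match(urn)
    if match is None:
        raise ValueError(f"not an ML Feature Table URN: {urn}")
    return match.group(1), match.group(2)


urn = make_feature_table_urn("feast", "users_feature_table")
print(urn)
print(parse_feature_table_urn(urn))
```

Parsing by hand is shown only to illustrate the structure; in practice a URN helper from the SDK is likely the safer choice.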
Important Capabilities
Feature Table Properties
ML Feature Tables support comprehensive metadata through the mlFeatureTableProperties aspect. This aspect captures the essential characteristics of the feature table:
Description and Documentation
Feature tables can have detailed descriptions explaining their purpose, the type of features they contain, and when they should be used. This documentation helps data scientists and ML engineers discover and understand feature tables in their organization.
Python SDK: Create an ML Feature Table with properties
# Inlined from /metadata-ingestion/examples/library/mlfeature_table_create_with_properties.py
import datahub.emitter.mce_builder as builder
import datahub.metadata.schema_classes as models
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter

gms_endpoint = "http://localhost:8080"
emitter = DatahubRestEmitter(gms_server=gms_endpoint, extra_headers={})

feature_table_urn = builder.make_ml_feature_table_urn(
    feature_table_name="customer_features", platform="feast"
)

feature_table_properties = models.MLFeatureTablePropertiesClass(
    description="Customer demographic and behavioral features for churn prediction models. "
    "Updated daily from the customer data warehouse.",
    customProperties={
        "update_frequency": "daily",
        "feature_count": "25",
        "team": "customer-analytics",
        "sla_hours": "24",
    },
)

metadata_change_proposal = MetadataChangeProposalWrapper(
    entityUrn=feature_table_urn,
    aspect=feature_table_properties,
)
emitter.emit(metadata_change_proposal)
Features
The most important property of a feature table is the collection of features it contains. Feature tables maintain explicit relationships to their constituent features through the mlFeatures property. This creates a "Contains" relationship between the feature table and each individual feature, enabling:
- Discovery of all features in a table
- Navigation from feature table to individual features
- Understanding of feature organization and grouping
Python SDK: Add features to a feature table
# Inlined from /metadata-ingestion/examples/library/mlfeature_table_add_features.py
import datahub.emitter.mce_builder as builder
import datahub.metadata.schema_classes as models
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.ingestion.graph.client import DatahubClientConfig, DataHubGraph

gms_endpoint = "http://localhost:8080"
emitter = DatahubRestEmitter(gms_server=gms_endpoint, extra_headers={})

feature_table_urn = builder.make_ml_feature_table_urn(
    feature_table_name="customer_features", platform="feast"
)

new_feature_urns = [
    builder.make_ml_feature_urn(
        feature_name="customer_lifetime_value",
        feature_table_name="customer_features",
    ),
    builder.make_ml_feature_urn(
        feature_name="days_since_last_purchase",
        feature_table_name="customer_features",
    ),
    builder.make_ml_feature_urn(
        feature_name="total_purchase_count",
        feature_table_name="customer_features",
    ),
]

# Read existing features to avoid overwriting them
graph = DataHubGraph(DatahubClientConfig(server=gms_endpoint))
feature_table_properties = graph.get_aspect(
    entity_urn=feature_table_urn,
    aspect_type=models.MLFeatureTablePropertiesClass,
)

if feature_table_properties and feature_table_properties.mlFeatures:
    existing_features = feature_table_properties.mlFeatures
    all_feature_urns = list(set(existing_features + new_feature_urns))
else:
    all_feature_urns = new_feature_urns

updated_properties = models.MLFeatureTablePropertiesClass(
    mlFeatures=all_feature_urns,
    description="Customer features with newly added purchase metrics",
)

metadata_change_proposal = MetadataChangeProposalWrapper(
    entityUrn=feature_table_urn,
    aspect=updated_properties,
)
emitter.emit(metadata_change_proposal)
Primary Keys
Feature tables define one or more primary keys that uniquely identify each row in the table. These primary keys are critical for:
- Joining features with training datasets
- Looking up feature values during model inference
- Understanding the entity granularity of the features (e.g., user-level, transaction-level)
When multiple primary keys are specified, they act as a composite key. The mlPrimaryKeys property creates a "KeyedBy" relationship to each primary key entity.
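As an illustration of how a composite key behaves when joining features with a training dataset, here is a minimal in-memory sketch. The column names (`customer_id`, `merchant_id`), the sample rows, and the helper `join_features` are assumptions made for this example and are not part of DataHub.

```python
from typing import Dict, List, Tuple

# Feature rows keyed by a composite key of (customer_id, merchant_id).
feature_rows: Dict[Tuple[str, ...], Dict[str, float]] = {
    ("c1", "m1"): {"total_purchase_count": 4.0},
    ("c1", "m2"): {"total_purchase_count": 1.0},
}

training_rows: List[Dict[str, str]] = [
    {"customer_id": "c1", "merchant_id": "m2", "label": "churned"},
    {"customer_id": "c2", "merchant_id": "m1", "label": "retained"},
]

KEY_COLUMNS = ("customer_id", "merchant_id")


def join_features(rows, features, key_columns):
    """Left-join feature values onto training rows using all key columns."""
    joined = []
    for row in rows:
        # Every key column takes part in the lookup, so the key is composite.
        key = tuple(row[column] for column in key_columns)
        joined.append({**row, **features.get(key, {})})
    return joined


for row in join_features(training_rows, feature_rows, KEY_COLUMNS):
    print(row)
```

A row only picks up feature values when all key columns match, which is why the second training row above comes back without any features.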
Python SDK: Add primary keys to a feature table
# Inlined from /metadata-ingestion/examples/library/mlfeature_table_add_primary_keys.py
import datahub.emitter.mce_builder as builder
import datahub.metadata.schema_classes as models
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.ingestion.graph.client import DatahubClientConfig, DataHubGraph

gms_endpoint = "http://localhost:8080"
emitter = DatahubRestEmitter(gms_server=gms_endpoint, extra_headers={})

feature_table_urn = builder.make_ml_feature_table_urn(
    feature_table_name="customer_features", platform="feast"
)

primary_key_urns = [
    builder.make_ml_primary_key_urn(
        feature_table_name="customer_features",
        primary_key_name="customer_id",
    )
]

# Read existing properties to preserve other fields
graph = DataHubGraph(DatahubClientConfig(server=gms_endpoint))
feature_table_properties = graph.get_aspect(
    entity_urn=feature_table_urn,
    aspect_type=models.MLFeatureTablePropertiesClass,
)

if feature_table_properties:
    feature_table_properties.mlPrimaryKeys = primary_key_urns
    updated_properties = feature_table_properties
else:
    updated_properties = models.MLFeatureTablePropertiesClass(
        mlPrimaryKeys=primary_key_urns,
    )

metadata_change_proposal = MetadataChangeProposalWrapper(
    entityUrn=feature_table_urn,
    aspect=updated_properties,
)
emitter.emit(metadata_change_proposal)

# Also create the primary key entity with its properties
dataset_urn = builder.make_dataset_urn(
    name="customers", platform="snowflake", env="PROD"
)
primary_key_urn = primary_key_urns[0]

primary_key_properties = models.MLPrimaryKeyPropertiesClass(
    description="Unique identifier for customers in the system",
    dataType="TEXT",
    sources=[dataset_urn],
)

pk_metadata_change_proposal = MetadataChangeProposalWrapper(
    entityUrn=primary_key_urn,
    aspect=primary_key_properties,
)
emitter.emit(pk_metadata_change_proposal)
Custom Properties
Feature tables support custom properties through the customProperties field, allowing you to capture platform-specific or organization-specific metadata that doesn't fit into the standard schema. This might include information like:
- Update frequency or freshness SLAs
- Feature store configuration settings
- Cost or resource usage information
- Team or project ownership details
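The raw schema for mlFeatureTableProperties on this page declares customProperties as a map with string values, so numbers and booleans need to be converted before they are stored. The sketch below shows one way to do that; the helper `to_custom_properties` is hypothetical, and the lowercase "true"/"false" convention is an assumption that mirrors the SDK examples on this page.

```python
from typing import Any, Dict


def to_custom_properties(values: Dict[str, Any]) -> Dict[str, str]:
    """Convert arbitrary values to a string-to-string map."""
    result: Dict[str, str] = {}
    for key, value in values.items():
        if isinstance(value, bool):
            # Assumed convention: lowercase booleans, as in "critical": "true".
            result[key] = "true" if value else "false"
        else:
            result[key] = str(value)
    return result


properties = to_custom_properties(
    {"update_frequency": "daily", "sla_hours": 24, "critical": True}
)
print(properties)
```

The resulting dictionary could then be passed as the `customProperties` argument when building the properties aspect.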
Primary Key Properties
While primary keys are referenced from feature tables, they are separate entities with their own properties defined in the mlPrimaryKeyProperties aspect. Understanding primary key metadata is essential for proper feature table usage:
Data Type
Primary keys have a data type (defined using MLFeatureDataType) that specifies the type of values:
- ORDINAL: Integer values
- NOMINAL: Categorical values
- BINARY: Boolean values
- COUNT: Count values
- TIME: Timestamp values
- TEXT: String values
- Other numeric types like CONTINUOUS, INTERVAL
Source Lineage
Primary keys can declare their source datasets through the sources property. This creates lineage relationships showing which upstream datasets the primary key values are derived from. This is crucial for understanding data provenance and impact analysis.
Versioning
Primary keys support versioning through the version property, allowing teams to track changes to key definitions over time and maintain multiple versions in parallel.
Tags and Glossary Terms
Like other DataHub entities, ML Feature Tables support tags and glossary terms for classification and discovery:
- Tags (via globalTags aspect) provide lightweight categorization
- Glossary Terms (via glossaryTerms aspect) link to business definitions and concepts
Read this blog to understand when to use tags vs terms.
Ownership
Ownership is associated with feature tables using the ownership aspect. Owners can be individuals or teams responsible for maintaining the feature table. Clear ownership is essential for:
- Knowing who to contact with questions about features
- Understanding responsibility for feature quality and updates
- Governance and access control decisions
Domains and Organization
Feature tables can be organized into domains (via the domains aspect) to represent organizational structure or functional areas. This helps teams manage large feature catalogs by grouping related feature tables together.
Code Examples
Creating a Complete ML Feature Table
Here's a comprehensive example that creates a feature table with all core aspects:
Python SDK: Create a complete ML Feature Table
# Inlined from /metadata-ingestion/examples/library/mlfeature_table_create_complete.py
import datahub.emitter.mce_builder as builder
import datahub.metadata.schema_classes as models
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter

gms_endpoint = "http://localhost:8080"
emitter = DatahubRestEmitter(gms_server=gms_endpoint, extra_headers={})

# Step 1: Create the source dataset for lineage
dataset_urn = builder.make_dataset_urn(
    name="customer_transactions", platform="snowflake", env="PROD"
)

# Step 2: Create the primary key entity
primary_key_urn = builder.make_ml_primary_key_urn(
    feature_table_name="transaction_features",
    primary_key_name="transaction_id",
)
primary_key_properties = models.MLPrimaryKeyPropertiesClass(
    description="Unique identifier for each transaction",
    dataType="TEXT",
    sources=[dataset_urn],
)
emitter.emit(
    MetadataChangeProposalWrapper(
        entityUrn=primary_key_urn,
        aspect=primary_key_properties,
    )
)

# Step 3: Create the feature entities
feature_1_urn = builder.make_ml_feature_urn(
    feature_name="transaction_amount",
    feature_table_name="transaction_features",
)
emitter.emit(
    MetadataChangeProposalWrapper(
        entityUrn=feature_1_urn,
        aspect=models.MLFeaturePropertiesClass(
            description="Total amount of the transaction in USD",
            dataType="CONTINUOUS",
            sources=[dataset_urn],
        ),
    )
)

feature_2_urn = builder.make_ml_feature_urn(
    feature_name="is_fraud",
    feature_table_name="transaction_features",
)
emitter.emit(
    MetadataChangeProposalWrapper(
        entityUrn=feature_2_urn,
        aspect=models.MLFeaturePropertiesClass(
            description="Binary indicator of fraudulent transaction",
            dataType="BINARY",
            sources=[dataset_urn],
        ),
    )
)

# Step 4: Create the feature table with all properties
feature_table_urn = builder.make_ml_feature_table_urn(
    feature_table_name="transaction_features", platform="feast"
)
feature_table_properties = models.MLFeatureTablePropertiesClass(
    description="Real-time transaction features for fraud detection models",
    mlFeatures=[feature_1_urn, feature_2_urn],
    mlPrimaryKeys=[primary_key_urn],
    customProperties={
        "update_frequency": "real-time",
        "team": "fraud-detection",
        "critical": "true",
    },
)
emitter.emit(
    MetadataChangeProposalWrapper(
        entityUrn=feature_table_urn,
        aspect=feature_table_properties,
    )
)

# Step 5: Add tags for categorization
emitter.emit(
    MetadataChangeProposalWrapper(
        entityUrn=feature_table_urn,
        aspect=models.GlobalTagsClass(
            tags=[
                models.TagAssociationClass(tag=builder.make_tag_urn("Fraud Detection")),
                models.TagAssociationClass(
                    tag=builder.make_tag_urn("Real-time Features")
                ),
            ]
        ),
    )
)

# Step 6: Add ownership
# TECHNICAL_OWNER is used because the ownership schema marks DATAOWNER as deprecated.
emitter.emit(
    MetadataChangeProposalWrapper(
        entityUrn=feature_table_urn,
        aspect=models.OwnershipClass(
            owners=[
                models.OwnerClass(
                    owner=builder.make_user_urn("data_science_team"),
                    type=models.OwnershipTypeClass.TECHNICAL_OWNER,
                )
            ]
        ),
    )
)

print(f"Successfully created feature table: {feature_table_urn}")
Querying ML Feature Tables
You can retrieve ML Feature Table metadata using both the Python SDK and REST API:
Python SDK: Read an ML Feature Table
# Inlined from /metadata-ingestion/examples/library/mlfeature_table_read.py
from datahub.sdk import DataHubClient, MLFeatureTableUrn

client = DataHubClient.from_env()

# Or get this from the UI (share -> copy urn) and use MLFeatureTableUrn.from_string(...)
mlfeature_table_urn = MLFeatureTableUrn(
    "feast", "test_feature_table_all_feature_dtypes"
)

mlfeature_table_entity = client.entities.get(mlfeature_table_urn)
print("MLFeature Table name:", mlfeature_table_entity.name)
print("MLFeature Table platform:", mlfeature_table_entity.platform)
print("MLFeature Table description:", mlfeature_table_entity.description)
REST API: Fetch ML Feature Table metadata
# Get the complete entity with all aspects
curl 'http://localhost:8080/entities/urn%3Ali%3AmlFeatureTable%3A(urn%3Ali%3AdataPlatform%3Afeast,users_feature_table)'
# Get relationships to see features and primary keys
curl 'http://localhost:8080/relationships?direction=OUTGOING&urn=urn%3Ali%3AmlFeatureTable%3A(urn%3Ali%3AdataPlatform%3Afeast,users_feature_table)&types=Contains,KeyedBy'
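The request URLs above percent-encode the colons in the URN while leaving parentheses and commas untouched. A minimal sketch of building the same URLs in Python follows; the helper `encode_urn` is hypothetical, and the code only constructs the strings without sending any request.

```python
from urllib.parse import quote

GMS_ENDPOINT = "http://localhost:8080"
FEATURE_TABLE_URN = (
    "urn:li:mlFeatureTable:(urn:li:dataPlatform:feast,users_feature_table)"
)


def encode_urn(urn: str) -> str:
    """Percent-encode a URN, keeping parentheses and commas as in the curl examples."""
    return quote(urn, safe="(),")


# URL for fetching the complete entity with all aspects.
entity_url = f"{GMS_ENDPOINT}/entities/{encode_urn(FEATURE_TABLE_URN)}"

# URL for listing outgoing "Contains" and "KeyedBy" relationships.
relationships_url = (
    f"{GMS_ENDPOINT}/relationships?direction=OUTGOING"
    f"&urn={encode_urn(FEATURE_TABLE_URN)}&types=Contains,KeyedBy"
)

print(entity_url)
print(relationships_url)
```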
Integration Points
ML Feature Tables integrate with multiple other entities in DataHub's metadata model:
Relationships with ML Features
Feature tables contain ML Features through the "Contains" relationship. Each feature in the mlFeatures array represents an individual feature that can be:
- Used independently by ML models
- Have its own metadata, lineage, and documentation
- Shared across multiple feature tables in some feature store implementations
Navigation works bidirectionally - from feature table to features, and from features back to their parent tables.
Relationships with ML Primary Keys
Feature tables reference ML Primary Keys through the "KeyedBy" relationship. Primary keys:
- Define the entity granularity of the feature table
- Enable joining features with entity identifiers in training datasets
- Can be shared across multiple feature tables when they represent the same entity type
- Have their own lineage to upstream datasets through the sources property
Relationships with ML Models
While not directly referenced in feature table metadata, ML Models consume features through the mlFeatures property in MLModelProperties. This creates a "Consumes" lineage relationship showing which models use features from a particular feature table. This lineage enables:
- Understanding downstream impact when feature tables change
- Discovering which models depend on specific feature tables
- Tracking feature usage and adoption across models
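The impact analysis described above can be sketched with plain dictionaries standing in for the "Contains" and "Consumes" relationships. All table, feature, and model names here are made up for illustration, and `models_depending_on` is a hypothetical helper rather than a DataHub API.

```python
from typing import Dict, List, Set

# Features contained in each feature table (the "Contains" relationship).
table_features: Dict[str, Set[str]] = {
    "transaction_features": {"transaction_amount", "is_fraud"},
    "customer_features": {"customer_lifetime_value"},
}

# Features each model consumes (the "Consumes" relationship).
model_features: Dict[str, Set[str]] = {
    "fraud_classifier": {"transaction_amount", "customer_lifetime_value"},
    "churn_model": {"customer_lifetime_value"},
}


def models_depending_on(table: str) -> List[str]:
    """Return models that consume at least one feature from the given table."""
    features = table_features.get(table, set())
    return sorted(
        model for model, used in model_features.items() if used & features
    )


print(models_depending_on("transaction_features"))
print(models_depending_on("customer_features"))
```

Because the dependency is resolved per feature, a model counts as downstream of a table as soon as it consumes any one of the table's features.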
Relationships with Datasets
Feature tables have indirect relationships to datasets through two paths:
- Via ML Features: Individual features can declare source datasets through their sources property, creating "DerivedFrom" lineage
- Via ML Primary Keys: Primary keys can declare source datasets, showing where entity identifiers originate
This lineage connects the feature store to upstream data warehouses, enabling end-to-end data lineage from raw data to model predictions.
Platform Integration
Feature tables are associated with a specific data platform (e.g., Feast, Tecton) through the platform property in the key aspect. This creates a "SourcePlatform" relationship that:
- Identifies which feature store system hosts the feature table
- Enables filtering and organization by platform
- Supports multi-platform feature store environments
Notable Exceptions
Feature Store Platform Variations
Different feature store platforms have different capabilities and concepts:
- Feast: Uses the term "feature table" directly. Feature tables in Feast correspond 1:1 with this entity.
- Tecton: Uses "feature views" and "feature services" as similar concepts. These can be modeled as feature tables.
- SageMaker Feature Store: Uses "feature groups" which map to feature tables.
- Databricks Feature Store: Uses "feature tables" but with database.schema.table naming patterns.
When ingesting from these platforms, ensure the naming conventions match the platform's terminology for consistency.
Custom Properties Usage
Unlike datasets which have both datasetProperties and editableDatasetProperties, feature tables have:
- mlFeatureTableProperties: The main properties aspect (usually from ingestion)
- editableMlFeatureTableProperties: UI-editable description only
For custom metadata, use the customProperties map in mlFeatureTableProperties rather than creating custom aspects.
Entity References vs. Entity Creation
When using the SDK to create feature tables:
- You must create the referenced entities first: Create individual ML Features and ML Primary Keys before referencing them in the feature table
- The feature table only stores URN references - it doesn't create the feature or primary key entities
- If you reference non-existent entities, they will appear as broken references in the UI
This is different from some other DataHub entities where child entities can be created inline.
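One way to guard against dangling references is to compare the URNs a feature table points at with the URNs that have actually been emitted. The sketch below assumes you track emitted URNs yourself in a set; `missing_references` is a hypothetical helper and the URN strings are illustrative.

```python
from typing import Iterable, List, Set


def missing_references(referenced: Iterable[str], existing: Set[str]) -> List[str]:
    """Return referenced URNs that have no emitted entity, in input order."""
    return [urn for urn in referenced if urn not in existing]


# URNs of entities emitted so far (illustrative values).
emitted = {
    "urn:li:mlFeature:(transaction_features,transaction_amount)",
    "urn:li:mlPrimaryKey:(transaction_features,transaction_id)",
}

# URNs the feature table is about to reference.
references = [
    "urn:li:mlFeature:(transaction_features,transaction_amount)",
    "urn:li:mlFeature:(transaction_features,is_fraud)",
    "urn:li:mlPrimaryKey:(transaction_features,transaction_id)",
]

print(missing_references(references, emitted))
```

Running a check like this before emitting the feature table properties surfaces references that would otherwise show up as broken in the UI.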
Lineage Considerations
Feature table lineage is typically established through the features and primary keys it contains:
- Feature tables themselves don't have direct upstreamLineage aspects
- Instead, lineage flows through the contained features' sources properties
- When querying lineage, you'll need to traverse through the "Contains" relationships to find upstream datasets
This design reflects that features are the atomic unit of lineage in ML systems, while feature tables are organizational constructs.
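The traversal described in this section can be sketched with two lookups: table to features, then features to their sources. The dictionaries below stand in for relationship queries against DataHub, and every identifier in them is illustrative rather than a real URN.

```python
from typing import Dict, List

# "Contains" edges from a feature table to its features (illustrative ids).
contains: Dict[str, List[str]] = {
    "table:transaction_features": [
        "feature:transaction_amount",
        "feature:is_fraud",
    ],
}

# The sources declared by each feature.
feature_sources: Dict[str, List[str]] = {
    "feature:transaction_amount": ["dataset:customer_transactions"],
    "feature:is_fraud": ["dataset:customer_transactions", "dataset:fraud_labels"],
}


def upstream_datasets(table_id: str) -> List[str]:
    """Collect distinct upstream datasets by walking table -> features -> sources."""
    seen: Dict[str, None] = {}  # dict keeps first-seen order while de-duplicating
    for feature_id in contains.get(table_id, []):
        for dataset_id in feature_sources.get(feature_id, []):
            seen.setdefault(dataset_id, None)
    return list(seen)


print(upstream_datasets("table:transaction_features"))
```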
Technical Reference Guide
The sections above provide an overview of how to use this entity. The following sections provide detailed technical information about how metadata is stored and represented in DataHub.
Aspects are the individual pieces of metadata that can be attached to an entity. Each aspect contains specific information (like ownership, tags, or properties) and is stored as a separate record, allowing for flexible and incremental metadata updates.
Relationships show how this entity connects to other entities in the metadata graph. These connections are derived from the fields within each aspect and form the foundation of DataHub's knowledge graph.
Reading the Field Tables
Each aspect's field table includes an Annotations column that provides additional metadata about how fields are used:
- ⚠️ Deprecated: This field is deprecated and may be removed in a future version. Check the description for the recommended alternative
- Searchable: This field is indexed and can be searched in DataHub's search interface
- Searchable (fieldname): When the field name in parentheses is shown, it indicates the field is indexed under a different name in the search index. For example, dashboardTool is indexed as tool
- → RelationshipName: This field creates a relationship to another entity. The arrow indicates this field contains a reference (URN) to another entity, and the name indicates the type of relationship (e.g., → Contains, → OwnedBy)
Fields with complex types (like Edge, AuditStamp) link to their definitions in the Common Types section below.
Aspects
mlFeatureTableKey
Key for an MLFeatureTable
- Fields
- Raw Schema
| Field | Type | Required | Description | Annotations |
|---|---|---|---|---|
| platform | string | ✓ | Data platform urn associated with the feature table | → SourcePlatform |
| name | string | ✓ | Name of the feature table | Searchable |
{
"type": "record",
"Aspect": {
"name": "mlFeatureTableKey"
},
"name": "MLFeatureTableKey",
"namespace": "com.linkedin.metadata.key",
"fields": [
{
"Relationship": {
"entityTypes": [
"dataPlatform"
],
"name": "SourcePlatform"
},
"java": {
"class": "com.linkedin.common.urn.Urn"
},
"type": "string",
"name": "platform",
"doc": "Data platform urn associated with the feature table"
},
{
"Searchable": {
"boostScore": 8.0,
"enableAutocomplete": true,
"fieldNameAliases": [
"_entityName"
],
"fieldType": "WORD_GRAM"
},
"type": "string",
"name": "name",
"doc": "Name of the feature table"
}
],
"doc": "Key for an MLFeatureTable"
}
mlFeatureTableProperties
Properties associated with a MLFeatureTable
- Fields
- Raw Schema
| Field | Type | Required | Description | Annotations |
|---|---|---|---|---|
| customProperties | map | ✓ | Custom property bag. | Searchable |
| description | string | | Documentation of the MLFeatureTable | Searchable |
| mlFeatures | string[] | | List of features contained in the feature table | Searchable, → Contains |
| mlPrimaryKeys | string[] | | List of primary keys in the feature table (if multiple, assumed to act as a composite key) | Searchable, → KeyedBy |
{
"type": "record",
"Aspect": {
"name": "mlFeatureTableProperties"
},
"name": "MLFeatureTableProperties",
"namespace": "com.linkedin.ml.metadata",
"fields": [
{
"Searchable": {
"/*": {
"fieldType": "TEXT",
"queryByDefault": true
}
},
"type": {
"type": "map",
"values": "string"
},
"name": "customProperties",
"default": {},
"doc": "Custom property bag."
},
{
"Searchable": {
"fieldType": "TEXT",
"hasValuesFieldName": "hasDescription"
},
"type": [
"null",
"string"
],
"name": "description",
"default": null,
"doc": "Documentation of the MLFeatureTable"
},
{
"Relationship": {
"/*": {
"entityTypes": [
"mlFeature"
],
"name": "Contains"
}
},
"Searchable": {
"/*": {
"fieldName": "features",
"fieldType": "URN"
}
},
"type": [
"null",
{
"type": "array",
"items": "string"
}
],
"name": "mlFeatures",
"default": null,
"doc": "List of features contained in the feature table"
},
{
"Relationship": {
"/*": {
"entityTypes": [
"mlPrimaryKey"
],
"name": "KeyedBy"
}
},
"Searchable": {
"/*": {
"fieldName": "primaryKeys",
"fieldType": "URN"
}
},
"type": [
"null",
{
"type": "array",
"items": "string"
}
],
"name": "mlPrimaryKeys",
"default": null,
"doc": "List of primary keys in the feature table (if multiple, assumed to act as a composite key)"
}
],
"doc": "Properties associated with a MLFeatureTable"
}
ownership
Ownership information of an entity.
- Fields
- Raw Schema
| Field | Type | Required | Description | Annotations |
|---|---|---|---|---|
| owners | Owner[] | ✓ | List of owners of the entity. | |
| ownerTypes | map | | Ownership type to Owners map, populated via mutation hook. | Searchable |
| lastModified | AuditStamp | ✓ | Audit stamp containing who last modified the record and when. A value of 0 in the time field indi... | |
{
"type": "record",
"Aspect": {
"name": "ownership"
},
"name": "Ownership",
"namespace": "com.linkedin.common",
"fields": [
{
"type": {
"type": "array",
"items": {
"type": "record",
"name": "Owner",
"namespace": "com.linkedin.common",
"fields": [
{
"Relationship": {
"entityTypes": [
"corpuser",
"corpGroup"
],
"name": "OwnedBy"
},
"Searchable": {
"addToFilters": true,
"fieldName": "owners",
"fieldType": "URN",
"filterNameOverride": "Owned By",
"hasValuesFieldName": "hasOwners",
"queryByDefault": false
},
"java": {
"class": "com.linkedin.common.urn.Urn"
},
"type": "string",
"name": "owner",
"doc": "Owner URN, e.g. urn:li:corpuser:ldap, urn:li:corpGroup:group_name, and urn:li:multiProduct:mp_name\n(Caveat: only corpuser is currently supported in the frontend.)"
},
{
"deprecated": true,
"type": {
"type": "enum",
"symbolDocs": {
"BUSINESS_OWNER": "A person or group who is responsible for logical, or business related, aspects of the asset.",
"CONSUMER": "A person, group, or service that consumes the data\nDeprecated! Use TECHNICAL_OWNER or BUSINESS_OWNER instead.",
"CUSTOM": "Set when ownership type is unknown or a when new one is specified as an ownership type entity for which we have no\nenum value for. This is used for backwards compatibility",
"DATAOWNER": "A person or group that is owning the data\nDeprecated! Use TECHNICAL_OWNER instead.",
"DATA_STEWARD": "A steward, expert, or delegate responsible for the asset.",
"DELEGATE": "A person or a group that overseas the operation, e.g. a DBA or SRE.\nDeprecated! Use TECHNICAL_OWNER instead.",
"DEVELOPER": "A person or group that is in charge of developing the code\nDeprecated! Use TECHNICAL_OWNER instead.",
"NONE": "No specific type associated to the owner.",
"PRODUCER": "A person, group, or service that produces/generates the data\nDeprecated! Use TECHNICAL_OWNER instead.",
"STAKEHOLDER": "A person or a group that has direct business interest\nDeprecated! Use TECHNICAL_OWNER, BUSINESS_OWNER, or STEWARD instead.",
"TECHNICAL_OWNER": "person or group who is responsible for technical aspects of the asset."
},
"deprecatedSymbols": {
"CONSUMER": true,
"DATAOWNER": true,
"DELEGATE": true,
"DEVELOPER": true,
"PRODUCER": true,
"STAKEHOLDER": true
},
"name": "OwnershipType",
"namespace": "com.linkedin.common",
"symbols": [
"CUSTOM",
"TECHNICAL_OWNER",
"BUSINESS_OWNER",
"DATA_STEWARD",
"NONE",
"DEVELOPER",
"DATAOWNER",
"DELEGATE",
"PRODUCER",
"CONSUMER",
"STAKEHOLDER"
],
"doc": "Asset owner types"
},
"name": "type",
"doc": "The type of the ownership"
},
{
"Relationship": {
"entityTypes": [
"ownershipType"
],
"name": "ownershipType"
},
"java": {
"class": "com.linkedin.common.urn.Urn"
},
"type": [
"null",
"string"
],
"name": "typeUrn",
"default": null,
"doc": "The type of the ownership\nUrn of type O"
},
{
"type": [
"null",
{
"type": "record",
"name": "OwnershipSource",
"namespace": "com.linkedin.common",
"fields": [
{
"type": {
"type": "enum",
"symbolDocs": {
"AUDIT": "Auditing system or audit logs",
"DATABASE": "Database, e.g. GRANTS table",
"FILE_SYSTEM": "File system, e.g. file/directory owner",
"ISSUE_TRACKING_SYSTEM": "Issue tracking system, e.g. Jira",
"MANUAL": "Manually provided by a user",
"OTHER": "Other sources",
"SERVICE": "Other ownership-like service, e.g. Nuage, ACL service etc",
"SOURCE_CONTROL": "SCM system, e.g. GIT, SVN"
},
"name": "OwnershipSourceType",
"namespace": "com.linkedin.common",
"symbols": [
"AUDIT",
"DATABASE",
"FILE_SYSTEM",
"ISSUE_TRACKING_SYSTEM",
"MANUAL",
"SERVICE",
"SOURCE_CONTROL",
"OTHER"
]
},
"name": "type",
"doc": "The type of the source"
},
{
"type": [
"null",
"string"
],
"name": "url",
"default": null,
"doc": "A reference URL for the source"
}
],
"doc": "Source/provider of the ownership information"
}
],
"name": "source",
"default": null,
"doc": "Source information for the ownership"
},
{
"Searchable": {
"/actor": {
"fieldName": "ownerAttributionActors",
"fieldType": "URN",
"queryByDefault": false
},
"/source": {
"fieldName": "ownerAttributionSources",
"fieldType": "URN",
"queryByDefault": false
},
"/time": {
"fieldName": "ownerAttributionDates",
"fieldType": "DATETIME",
"queryByDefault": false
}
},
"type": [
"null",
{
"type": "record",
"name": "MetadataAttribution",
"namespace": "com.linkedin.common",
"fields": [
{
"type": "long",
"name": "time",
"doc": "When this metadata was updated."
},
{
"java": {
"class": "com.linkedin.common.urn.Urn"
},
"type": "string",
"name": "actor",
"doc": "The entity (e.g. a member URN) responsible for applying the assocated metadata. This can\neither be a user (in case of UI edits) or the datahub system for automation."
},
{
"java": {
"class": "com.linkedin.common.urn.Urn"
},
"type": [
"null",
"string"
],
"name": "source",
"default": null,
"doc": "The DataHub source responsible for applying the associated metadata. This will only be filled out\nwhen a DataHub source is responsible. This includes the specific metadata test urn, the automation urn."
},
{
"type": {
"type": "map",
"values": "string"
},
"name": "sourceDetail",
"default": {},
"doc": "The details associated with why this metadata was applied. For example, this could include\nthe actual regex rule, sql statement, ingestion pipeline ID, etc."
}
],
"doc": "Information about who, why, and how this metadata was applied"
}
],
"name": "attribution",
"default": null,
"doc": "Information about who, why, and how this metadata was applied"
}
],
"doc": "Ownership information"
}
},
"name": "owners",
"doc": "List of owners of the entity."
},
{
"Searchable": {
"/*": {
"fieldType": "MAP_ARRAY",
"queryByDefault": false
}
},
"type": [
{
"type": "map",
"values": {
"type": "array",
"items": "string"
}
},
"null"
],
"name": "ownerTypes",
"default": {},
"doc": "Ownership type to Owners map, populated via mutation hook."
},
{
"type": {
"type": "record",
"name": "AuditStamp",
"namespace": "com.linkedin.common",
"fields": [
{
"type": "long",
"name": "time",
"doc": "When did the resource/association/sub-resource move into the specific lifecycle stage represented by this AuditEvent."
},
{
"java": {
"class": "com.linkedin.common.urn.Urn"
},
"type": "string",
"name": "actor",
"doc": "The entity (e.g. a member URN) which will be credited for moving the resource/association/sub-resource into the specific lifecycle stage. It is also the one used to authorize the change."
},
{
"java": {
"class": "com.linkedin.common.urn.Urn"
},
"type": [
"null",
"string"
],
"name": "impersonator",
"default": null,
"doc": "The entity (e.g. a service URN) which performs the change on behalf of the Actor and must be authorized to act as the Actor."
},
{
"type": [
"null",
"string"
],
"name": "message",
"default": null,
"doc": "Additional context around how DataHub was informed of the particular change. For example: was the change created by an automated process, or manually."
}
],
"doc": "Data captured on a resource/association/sub-resource level giving insight into when that resource/association/sub-resource moved into a particular lifecycle stage, and who acted to move it into that specific lifecycle stage."
},
"name": "lastModified",
"default": {
"actor": "urn:li:corpuser:unknown",
"impersonator": null,
"time": 0,
"message": null
},
"doc": "Audit stamp containing who last modified the record and when. A value of 0 in the time field indicates missing data."
}
],
"doc": "Ownership information of an entity."
}
institutionalMemory
Institutional memory of an entity. This is a way to link to relevant documentation and provide description of the documentation. Institutional or tribal knowledge is very important for users to leverage the entity.
- Fields
- Raw Schema
| Field | Type | Required | Description | Annotations |
|---|---|---|---|---|
| elements | InstitutionalMemoryMetadata[] | ✓ | List of records that represent institutional memory of an entity. Each record consists of a link,... | |
{
"type": "record",
"Aspect": {
"name": "institutionalMemory"
},
"name": "InstitutionalMemory",
"namespace": "com.linkedin.common",
"fields": [
{
"type": {
"type": "array",
"items": {
"type": "record",
"name": "InstitutionalMemoryMetadata",
"namespace": "com.linkedin.common",
"fields": [
{
"java": {
"class": "com.linkedin.common.url.Url",
"coercerClass": "com.linkedin.common.url.UrlCoercer"
},
"type": "string",
"name": "url",
"doc": "Link to an engineering design document or a wiki page."
},
{
"type": "string",
"name": "description",
"doc": "Description of the link."
},
{
"type": {
"type": "record",
"name": "AuditStamp",
"namespace": "com.linkedin.common",
"fields": [
{
"type": "long",
"name": "time",
"doc": "When did the resource/association/sub-resource move into the specific lifecycle stage represented by this AuditEvent."
},
{
"java": {
"class": "com.linkedin.common.urn.Urn"
},
"type": "string",
"name": "actor",
"doc": "The entity (e.g. a member URN) which will be credited for moving the resource/association/sub-resource into the specific lifecycle stage. It is also the one used to authorize the change."
},
{
"java": {
"class": "com.linkedin.common.urn.Urn"
},
"type": [
"null",
"string"
],
"name": "impersonator",
"default": null,
"doc": "The entity (e.g. a service URN) which performs the change on behalf of the Actor and must be authorized to act as the Actor."
},
{
"type": [
"null",
"string"
],
"name": "message",
"default": null,
"doc": "Additional context around how DataHub was informed of the particular change. For example: was the change created by an automated process, or manually."
}
],
"doc": "Data captured on a resource/association/sub-resource level giving insight into when that resource/association/sub-resource moved into a particular lifecycle stage, and who acted to move it into that specific lifecycle stage."
},
"name": "createStamp",
"doc": "Audit stamp associated with creation of this record"
},
{
"type": [
"null",
"com.linkedin.common.AuditStamp"
],
"name": "updateStamp",
"default": null,
"doc": "Audit stamp associated with updation of this record"
},
{
"type": [
"null",
{
"type": "record",
"name": "InstitutionalMemoryMetadataSettings",
"namespace": "com.linkedin.common",
"fields": [
{
"type": "boolean",
"name": "showInAssetPreview",
"default": false,
"doc": "Show record in asset preview like on entity header and search previews"
}
],
"doc": "Settings related to a record of InstitutionalMemoryMetadata"
}
],
"name": "settings",
"default": null,
"doc": "Settings for this record"
}
],
"doc": "Metadata corresponding to a record of institutional memory."
}
},
"name": "elements",
"doc": "List of records that represent institutional memory of an entity. Each record consists of a link, description, creator and timestamps associated with that record."
}
],
"doc": "Institutional memory of an entity. This is a way to link to relevant documentation and provide description of the documentation. Institutional or tribal knowledge is very important for users to leverage the entity."
}
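As a rough illustration of the record layout above, the sketch below assembles an institutionalMemory payload as a plain Python dictionary. The helper name `make_institutional_memory`, the example URL, and the actor URN are hypothetical, and the timestamp is assumed to be epoch milliseconds, which the schema does not state. It is not the DataHub SDK; it only mirrors the required fields (`url`, `description`, `createStamp`) and spells out the nullable defaults.

```python
import json


def make_institutional_memory(links, actor, time_ms):
    """Build an institutionalMemory payload from (url, description) pairs.

    Hypothetical helper: it mirrors the record layout above, nothing more.
    """
    elements = []
    for url, description in links:
        elements.append(
            {
                "url": url,
                "description": description,
                # createStamp is required; time is assumed to be epoch milliseconds.
                "createStamp": {"time": time_ms, "actor": actor},
                # Optional fields, shown with their schema default of null.
                "updateStamp": None,
                "settings": None,
            }
        )
    return {"elements": elements}


memory = make_institutional_memory(
    [("https://wiki.example.com/users-feature-table", "Design notes")],
    actor="urn:li:corpuser:jdoe",
    time_ms=1700000000000,
)
print(json.dumps(memory, indent=2))
```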
status
The lifecycle status metadata of an entity, e.g. dataset, metric, feature, etc. This aspect is used to represent soft deletes conventionally.
- Fields
- Raw Schema
| Field | Type | Required | Description | Annotations |
|---|---|---|---|---|
| removed | boolean | ✓ | Whether the entity has been removed (soft-deleted). | Searchable |
{
"type": "record",
"Aspect": {
"name": "status"
},
"name": "Status",
"namespace": "com.linkedin.common",
"fields": [
{
"Searchable": {
"fieldType": "BOOLEAN"
},
"type": "boolean",
"name": "removed",
"default": false,
"doc": "Whether the entity has been removed (soft-deleted)."
}
],
"doc": "The lifecycle status metadata of an entity, e.g. dataset, metric, feature, etc.\nThis aspect is used to represent soft deletes conventionally."
}
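Because `removed` defaults to false, a consumer can treat a missing status aspect the same as an entity that has not been soft-deleted. The following is a minimal sketch of that check over plain dictionaries; `is_soft_deleted` and the sample records are hypothetical and only illustrate the default handling.

```python
def is_soft_deleted(status_aspect):
    """Return True when the entity is marked removed.

    A missing aspect or missing field falls back to the schema default (false).
    """
    if status_aspect is None:
        return False
    return bool(status_aspect.get("removed", False))


# Hypothetical records keyed by URN, each with an optional status aspect.
tables = {
    "urn:li:mlFeatureTable:(urn:li:dataPlatform:feast,users_feature_table)": None,
    "urn:li:mlFeatureTable:(urn:li:dataPlatform:feast,old_table)": {"removed": True},
}
visible = [urn for urn, status in tables.items() if not is_soft_deleted(status)]
print(visible)
```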
deprecation
Deprecation status of an entity
- Fields
- Raw Schema
| Field | Type | Required | Description | Annotations |
|---|---|---|---|---|
| deprecated | boolean | ✓ | Whether the entity is deprecated. | Searchable |
| decommissionTime | long | | The time user plan to decommission this entity. | |
| note | string | ✓ | Additional information about the entity deprecation plan, such as the wiki, doc, RB. | |
| actor | string | ✓ | The user URN which will be credited for modifying this deprecation content. | |
| replacement | string | | | |
{
"type": "record",
"Aspect": {
"name": "deprecation"
},
"name": "Deprecation",
"namespace": "com.linkedin.common",
"fields": [
{
"Searchable": {
"addToFilters": true,
"fieldType": "BOOLEAN",
"filterNameOverride": "Deprecated",
"weightsPerFieldValue": {
"true": 0.5
}
},
"type": "boolean",
"name": "deprecated",
"doc": "Whether the entity is deprecated."
},
{
"type": [
"null",
"long"
],
"name": "decommissionTime",
"default": null,
"doc": "The time user plan to decommission this entity."
},
{
"type": "string",
"name": "note",
"doc": "Additional information about the entity deprecation plan, such as the wiki, doc, RB."
},
{
"java": {
"class": "com.linkedin.common.urn.Urn"
},
"type": "string",
"name": "actor",
"doc": "The user URN which will be credited for modifying this deprecation content."
},
{
"java": {
"class": "com.linkedin.common.urn.Urn"
},
"type": [
"null",
"string"
],
"name": "replacement",
"default": null
}
],
"doc": "Deprecation status of an entity"
}
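The schema marks `deprecated`, `note`, and `actor` as required, while `decommissionTime` and `replacement` default to null. A small sketch of a builder that enforces this split is shown below; `make_deprecation` and the example URNs are hypothetical, and the decommission time is assumed to be epoch milliseconds.

```python
def make_deprecation(deprecated, note, actor, decommission_time=None, replacement=None):
    """Build a deprecation payload, enforcing the required fields."""
    if not isinstance(deprecated, bool):
        raise TypeError("deprecated must be a boolean")
    if not isinstance(note, str) or not isinstance(actor, str):
        raise TypeError("note and actor must be strings")
    return {
        "deprecated": deprecated,
        "decommissionTime": decommission_time,  # optional, defaults to null
        "note": note,
        "actor": actor,
        "replacement": replacement,  # optional URN, defaults to null
    }


deprecation = make_deprecation(
    True,
    note="Superseded by users_feature_table_v2",
    actor="urn:li:corpuser:jdoe",
    replacement="urn:li:mlFeatureTable:(urn:li:dataPlatform:feast,users_feature_table_v2)",
)
print(deprecation)
```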
browsePaths
Shared aspect containing Browse Paths to be indexed for an entity.
- Fields
- Raw Schema
| Field | Type | Required | Description | Annotations |
|---|---|---|---|---|
| paths | string[] | ✓ | A list of valid browse paths for the entity. Browse paths are expected to be forward slash-separ... | Searchable |
{
"type": "record",
"Aspect": {
"name": "browsePaths"
},
"name": "BrowsePaths",
"namespace": "com.linkedin.common",
"fields": [
{
"Searchable": {
"/*": {
"fieldName": "browsePaths",
"fieldType": "BROWSE_PATH"
}
},
"type": {
"type": "array",
"items": "string"
},
"name": "paths",
"doc": "A list of valid browse paths for the entity.\n\nBrowse paths are expected to be forward slash-separated strings. For example: 'prod/snowflake/datasetName'"
}
],
"doc": "Shared aspect containing Browse Paths to be indexed for an entity."
}
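Since browse paths are forward slash-separated strings, a consumer can recover the individual levels with a simple split. The sketch below uses the example path from the schema doc; `split_browse_path` is a hypothetical helper, and dropping empty segments (for instance from a leading slash) is a choice made here, not something the schema prescribes.

```python
def split_browse_path(path):
    """Split a forward-slash-separated browse path into its segments."""
    return [segment for segment in path.split("/") if segment]


segments = split_browse_path("prod/snowflake/datasetName")
print(segments)
```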
globalTags
Tag aspect used for applying tags to an entity
- Fields
- Raw Schema
| Field | Type | Required | Description | Annotations |
|---|---|---|---|---|
| tags | TagAssociation[] | ✓ | Tags associated with a given entity | Searchable, → TaggedWith |
{
"type": "record",
"Aspect": {
"name": "globalTags"
},
"name": "GlobalTags",
"namespace": "com.linkedin.common",
"fields": [
{
"Relationship": {
"/*/tag": {
"entityTypes": [
"tag"
],
"name": "TaggedWith"
}
},
"Searchable": {
"/*/tag": {
"addToFilters": true,
"boostScore": 0.5,
"fieldName": "tags",
"fieldType": "URN",
"filterNameOverride": "Tag",
"hasValuesFieldName": "hasTags",
"queryByDefault": true
}
},
"type": {
"type": "array",
"items": {
"type": "record",
"name": "TagAssociation",
"namespace": "com.linkedin.common",
"fields": [
{
"java": {
"class": "com.linkedin.common.urn.TagUrn"
},
"type": "string",
"name": "tag",
"doc": "Urn of the applied tag"
},
{
"type": [
"null",
"string"
],
"name": "context",
"default": null,
"doc": "Additional context about the association"
},
{
"Searchable": {
"/actor": {
"fieldName": "tagAttributionActors",
"fieldType": "URN",
"queryByDefault": false
},
"/source": {
"fieldName": "tagAttributionSources",
"fieldType": "URN",
"queryByDefault": false
},
"/time": {
"fieldName": "tagAttributionDates",
"fieldType": "DATETIME",
"queryByDefault": false
}
},
"type": [
"null",
{
"type": "record",
"name": "MetadataAttribution",
"namespace": "com.linkedin.common",
"fields": [
{
"type": "long",
"name": "time",
"doc": "When this metadata was updated."
},
{
"java": {
"class": "com.linkedin.common.urn.Urn"
},
"type": "string",
"name": "actor",
"doc": "The entity (e.g. a member URN) responsible for applying the assocated metadata. This can\neither be a user (in case of UI edits) or the datahub system for automation."
},
{
"java": {
"class": "com.linkedin.common.urn.Urn"
},
"type": [
"null",
"string"
],
"name": "source",
"default": null,
"doc": "The DataHub source responsible for applying the associated metadata. This will only be filled out\nwhen a DataHub source is responsible. This includes the specific metadata test urn, the automation urn."
},
{
"type": {
"type": "map",
"values": "string"
},
"name": "sourceDetail",
"default": {},
"doc": "The details associated with why this metadata was applied. For example, this could include\nthe actual regex rule, sql statement, ingestion pipeline ID, etc."
}
],
"doc": "Information about who, why, and how this metadata was applied"
}
],
"name": "attribution",
"default": null,
"doc": "Information about who, why, and how this metadata was applied"
}
],
"doc": "Properties of an applied tag. For now, just an Urn. In the future we can extend this with other properties, e.g.\npropagation parameters."
}
},
"name": "tags",
"doc": "Tags associated with a given entity"
}
],
"doc": "Tag aspect used for applying tags to an entity"
}
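Each entry in `tags` is a TagAssociation whose only required field is `tag`. As a sketch of how a client might keep the list free of duplicates, the helper below returns a new payload with a tag applied at most once. `add_tag` and the tag name are hypothetical; the function works on plain dictionaries rather than SDK classes.

```python
def add_tag(global_tags, tag_urn, context=None):
    """Return a new globalTags payload with tag_urn applied at most once."""
    tags = list(global_tags.get("tags", []))
    if any(association["tag"] == tag_urn for association in tags):
        return {"tags": tags}
    # context and attribution are optional and default to null in the schema.
    tags.append({"tag": tag_urn, "context": context, "attribution": None})
    return {"tags": tags}


aspect = {"tags": []}
aspect = add_tag(aspect, "urn:li:tag:pii")
aspect = add_tag(aspect, "urn:li:tag:pii")  # second call leaves the list unchanged
print([association["tag"] for association in aspect["tags"]])
```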
dataPlatformInstance
The specific instance of the data platform that this entity belongs to
- Fields
- Raw Schema
| Field | Type | Required | Description | Annotations |
|---|---|---|---|---|
| platform | string | ✓ | Data Platform | Searchable |
| instance | string | | Instance of the data platform (e.g. db instance) | Searchable (platformInstance) |
{
"type": "record",
"Aspect": {
"name": "dataPlatformInstance"
},
"name": "DataPlatformInstance",
"namespace": "com.linkedin.common",
"fields": [
{
"Searchable": {
"addToFilters": true,
"fieldType": "URN",
"filterNameOverride": "Platform"
},
"java": {
"class": "com.linkedin.common.urn.Urn"
},
"type": "string",
"name": "platform",
"doc": "Data Platform"
},
{
"Searchable": {
"addToFilters": true,
"fieldName": "platformInstance",
"fieldType": "URN",
"filterNameOverride": "Platform Instance"
},
"java": {
"class": "com.linkedin.common.urn.Urn"
},
"type": [
"null",
"string"
],
"name": "instance",
"default": null,
"doc": "Instance of the data platform (e.g. db instance)"
}
],
"doc": "The specific instance of the data platform that this entity belongs to"
}
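Both fields of this aspect hold URNs: `platform` is required and `instance` is nullable. The sketch below builds the payload from short names. `make_data_platform_instance` is hypothetical, and the layout used for the instance URN is an assumption that should be checked against your deployment.

```python
def make_data_platform_instance(platform, instance=None):
    """Build a dataPlatformInstance payload from a platform name and optional instance."""
    platform_urn = f"urn:li:dataPlatform:{platform}"
    instance_urn = None
    if instance is not None:
        # Assumed URN layout for a platform instance; verify before relying on it.
        instance_urn = f"urn:li:dataPlatformInstance:({platform_urn},{instance})"
    return {"platform": platform_urn, "instance": instance_urn}


aspect = make_data_platform_instance("feast", "prod_cluster")
print(aspect)
```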
browsePathsV2
Shared aspect containing a Browse Path to be indexed for an entity.
- Fields
- Raw Schema
| Field | Type | Required | Description | Annotations |
|---|---|---|---|---|
| path | BrowsePathEntry[] | ✓ | A valid browse path for the entity. This field is provided by DataHub by default. This aspect is ... | Searchable |
{
"type": "record",
"Aspect": {
"name": "browsePathsV2"
},
"name": "BrowsePathsV2",
"namespace": "com.linkedin.common",
"fields": [
{
"Searchable": {
"/*/id": {
"fieldName": "browsePathV2",
"fieldType": "BROWSE_PATH_V2"
}
},
"type": {
"type": "array",
"items": {
"type": "record",
"name": "BrowsePathEntry",
"namespace": "com.linkedin.common",
"fields": [
{
"type": "string",
"name": "id",
"doc": "The ID of the browse path entry. This is what gets stored in the index.\nIf there's an urn associated with this entry, id and urn will be the same"
},
{
"java": {
"class": "com.linkedin.common.urn.Urn"
},
"type": [
"null",
"string"
],
"name": "urn",
"default": null,
"doc": "Optional urn pointing to some entity in DataHub"
}
],
"doc": "Represents a single level in an entity's browsePathV2"
}
},
"name": "path",
"doc": "A valid browse path for the entity. This field is provided by DataHub by default.\nThis aspect is a newer version of browsePaths where we can encode more information in the path.\nThis path is also based on containers for a given entity if it has containers.\n\nThis is stored in elasticsearch as unit-separator delimited strings and only includes platform specific folders or containers.\nThese paths should not include high level info captured elsewhere ie. Platform and Environment."
}
],
"doc": "Shared aspect containing a Browse Path to be indexed for an entity."
}
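The BrowsePathEntry doc notes that `id` and `urn` are the same whenever an entry has an associated URN. A minimal sketch of building a path under that rule follows; `make_browse_path_v2`, the container URN, and the folder name are hypothetical, and detecting URNs by the `urn:li:` prefix is a simplification.

```python
def make_browse_path_v2(levels):
    """Build a browsePathsV2 payload from a list of ids or URNs."""
    path = []
    for level in levels:
        is_urn = level.startswith("urn:li:")
        # Per the schema doc, id and urn are the same when a URN is present.
        path.append({"id": level, "urn": level if is_urn else None})
    return {"path": path}


aspect = make_browse_path_v2(["urn:li:container:abc123", "user_features"])
print(aspect)
```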
glossaryTerms
Related business terms information
- Fields
- Raw Schema
| Field | Type | Required | Description | Annotations |
|---|---|---|---|---|
| terms | GlossaryTermAssociation[] | ✓ | The related business terms | |
| auditStamp | AuditStamp | ✓ | Audit stamp containing who reported the related business term | |
{
"type": "record",
"Aspect": {
"name": "glossaryTerms"
},
"name": "GlossaryTerms",
"namespace": "com.linkedin.common",
"fields": [
{
"type": {
"type": "array",
"items": {
"type": "record",
"name": "GlossaryTermAssociation",
"namespace": "com.linkedin.common",
"fields": [
{
"Relationship": {
"entityTypes": [
"glossaryTerm"
],
"name": "TermedWith"
},
"Searchable": {
"addToFilters": true,
"fieldName": "glossaryTerms",
"fieldType": "URN",
"filterNameOverride": "Glossary Term",
"hasValuesFieldName": "hasGlossaryTerms",
"includeSystemModifiedAt": true,
"systemModifiedAtFieldName": "termsModifiedAt"
},
"java": {
"class": "com.linkedin.common.urn.GlossaryTermUrn"
},
"type": "string",
"name": "urn",
"doc": "Urn of the applied glossary term"
},
{
"java": {
"class": "com.linkedin.common.urn.Urn"
},
"type": [
"null",
"string"
],
"name": "actor",
"default": null,
"doc": "The user URN which will be credited for adding associating this term to the entity"
},
{
"type": [
"null",
"string"
],
"name": "context",
"default": null,
"doc": "Additional context about the association"
},
{
"Searchable": {
"/actor": {
"fieldName": "termAttributionActors",
"fieldType": "URN",
"queryByDefault": false
},
"/source": {
"fieldName": "termAttributionSources",
"fieldType": "URN",
"queryByDefault": false
},
"/time": {
"fieldName": "termAttributionDates",
"fieldType": "DATETIME",
"queryByDefault": false
}
},
"type": [
"null",
{
"type": "record",
"name": "MetadataAttribution",
"namespace": "com.linkedin.common",
"fields": [
{
"type": "long",
"name": "time",
"doc": "When this metadata was updated."
},
{
"java": {
"class": "com.linkedin.common.urn.Urn"
},
"type": "string",
"name": "actor",
"doc": "The entity (e.g. a member URN) responsible for applying the assocated metadata. This can\neither be a user (in case of UI edits) or the datahub system for automation."
},
{
"java": {
"class": "com.linkedin.common.urn.Urn"
},
"type": [
"null",
"string"
],
"name": "source",
"default": null,
"doc": "The DataHub source responsible for applying the associated metadata. This will only be filled out\nwhen a DataHub source is responsible. This includes the specific metadata test urn, the automation urn."
},
{
"type": {
"type": "map",
"values": "string"
},
"name": "sourceDetail",
"default": {},
"doc": "The details associated with why this metadata was applied. For example, this could include\nthe actual regex rule, sql statement, ingestion pipeline ID, etc."
}
],
"doc": "Information about who, why, and how this metadata was applied"
}
],
"name": "attribution",
"default": null,
"doc": "Information about who, why, and how this metadata was applied"
}
],
"doc": "Properties of an applied glossary term."
}
},
"name": "terms",
"doc": "The related business terms"
},
{
"type": {
"type": "record",
"name": "AuditStamp",
"namespace": "com.linkedin.common",
"fields": [
{
"type": "long",
"name": "time",
"doc": "When did the resource/association/sub-resource move into the specific lifecycle stage represented by this AuditEvent."
},
{
"java": {
"class": "com.linkedin.common.urn.Urn"
},
"type": "string",
"name": "actor",
"doc": "The entity (e.g. a member URN) which will be credited for moving the resource/association/sub-resource into the specific lifecycle stage. It is also the one used to authorize the change."
},
{
"java": {
"class": "com.linkedin.common.urn.Urn"
},
"type": [
"null",
"string"
],
"name": "impersonator",
"default": null,
"doc": "The entity (e.g. a service URN) which performs the change on behalf of the Actor and must be authorized to act as the Actor."
},
{
"type": [
"null",
"string"
],
"name": "message",
"default": null,
"doc": "Additional context around how DataHub was informed of the particular change. For example: was the change created by an automated process, or manually."
}
],
"doc": "Data captured on a resource/association/sub-resource level giving insight into when that resource/association/sub-resource moved into a particular lifecycle stage, and who acted to move it into that specific lifecycle stage."
},
"name": "auditStamp",
"doc": "Audit stamp containing who reported the related business term"
}
],
"doc": "Related business terms information"
}
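Unlike globalTags, this aspect requires a top-level `auditStamp` alongside the `terms` array. The sketch below shows one way to assemble both parts as plain dictionaries. `make_glossary_terms` and the example URNs are hypothetical, and the timestamp is assumed to be epoch milliseconds.

```python
def make_glossary_terms(term_urns, actor, time_ms):
    """Build a glossaryTerms payload with the required top-level audit stamp."""
    return {
        "terms": [
            # Only urn is required; the other fields default to null.
            {"urn": urn, "actor": None, "context": None, "attribution": None}
            for urn in term_urns
        ],
        "auditStamp": {"time": time_ms, "actor": actor},
    }


aspect = make_glossary_terms(
    ["urn:li:glossaryTerm:CustomerData"],
    actor="urn:li:corpuser:jdoe",
    time_ms=1700000000000,
)
print(aspect)
```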
editableMlFeatureTableProperties
Properties associated with a MLFeatureTable editable from the ui
- Fields
- Raw Schema
| Field | Type | Required | Description | Annotations |
|---|---|---|---|---|
| description | string | | Documentation of the MLFeatureTable | Searchable (editedDescription) |
{
"type": "record",
"Aspect": {
"name": "editableMlFeatureTableProperties"
},
"name": "EditableMLFeatureTableProperties",
"namespace": "com.linkedin.ml.metadata",
"fields": [
{
"Searchable": {
"fieldName": "editedDescription",
"fieldType": "TEXT"
},
"type": [
"null",
"string"
],
"name": "description",
"default": null,
"doc": "Documentation of the MLFeatureTable"
}
],
"doc": "Properties associated with a MLFeatureTable editable from the ui"
}
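A consumer that reads both the ingested description and this UI-editable one needs a rule for choosing between them. The sketch below assumes the edited description wins when it is present and non-empty; that precedence is an assumption of this example and is not stated by the schema. `effective_description` is a hypothetical helper.

```python
def effective_description(editable_properties, ingested_description):
    """Pick the description to display, preferring the UI edit when present."""
    edited = None
    if editable_properties is not None:
        edited = editable_properties.get("description")
    # Assumption: a non-empty edited description overrides the ingested one.
    return edited if edited else ingested_description


shown = effective_description(
    {"description": "User features, refreshed daily"},
    "Features about users",
)
print(shown)
```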
domains
Links from an Asset to its Domains
- Fields
- Raw Schema
| Field | Type | Required | Description | Annotations |
|---|---|---|---|---|
| domains | string[] | ✓ | The Domains attached to an Asset | Searchable, → AssociatedWith |
{
"type": "record",
"Aspect": {
"name": "domains"
},
"name": "Domains",
"namespace": "com.linkedin.domain",
"fields": [
{
"Relationship": {
"/*": {
"entityTypes": [
"domain"
],
"name": "AssociatedWith"
}
},
"Searchable": {
"/*": {
"addToFilters": true,
"fieldName": "domains",
"fieldType": "URN",
"filterNameOverride": "Domain",
"hasValuesFieldName": "hasDomain"
}
},
"type": {
"type": "array",
"items": "string"
},
"name": "domains",
"doc": "The Domains attached to an Asset"
}
],
"doc": "Links from an Asset to its Domains"
}
applications
Links from an Asset to its Applications
- Fields
- Raw Schema
| Field | Type | Required | Description | Annotations |
|---|---|---|---|---|
| applications | string[] | ✓ | The Applications attached to an Asset | Searchable, → AssociatedWith |
{
"type": "record",
"Aspect": {
"name": "applications"
},
"name": "Applications",
"namespace": "com.linkedin.application",
"fields": [
{
"Relationship": {
"/*": {
"entityTypes": [
"application"
],
"name": "AssociatedWith"
}
},
"Searchable": {
"/*": {
"addToFilters": true,
"fieldName": "applications",
"fieldType": "URN",
"filterNameOverride": "Application",
"hasValuesFieldName": "hasApplication"
}
},
"type": {
"type": "array",
"items": "string"
},
"name": "applications",
"doc": "The Applications attached to an Asset"
}
],
"doc": "Links from an Asset to its Applications"
}
structuredProperties
Properties about an entity governed by StructuredPropertyDefinition
- Fields
- Raw Schema
| Field | Type | Required | Description | Annotations |
|---|---|---|---|---|
| properties | StructuredPropertyValueAssignment[] | ✓ | Custom property bag. | |
{
"type": "record",
"Aspect": {
"name": "structuredProperties"
},
"name": "StructuredProperties",
"namespace": "com.linkedin.structured",
"fields": [
{
"type": {
"type": "array",
"items": {
"type": "record",
"name": "StructuredPropertyValueAssignment",
"namespace": "com.linkedin.structured",
"fields": [
{
"java": {
"class": "com.linkedin.common.urn.Urn"
},
"type": "string",
"name": "propertyUrn",
"doc": "The property that is being assigned a value."
},
{
"type": {
"type": "array",
"items": [
"string",
"double"
]
},
"name": "values",
"doc": "The value assigned to the property."
},
{
"type": [
"null",
{
"type": "record",
"name": "AuditStamp",
"namespace": "com.linkedin.common",
"fields": [
{
"type": "long",
"name": "time",
"doc": "When did the resource/association/sub-resource move into the specific lifecycle stage represented by this AuditEvent."
},
{
"java": {
"class": "com.linkedin.common.urn.Urn"
},
"type": "string",
"name": "actor",
"doc": "The entity (e.g. a member URN) which will be credited for moving the resource/association/sub-resource into the specific lifecycle stage. It is also the one used to authorize the change."
},
{
"java": {
"class": "com.linkedin.common.urn.Urn"
},
"type": [
"null",
"string"
],
"name": "impersonator",
"default": null,
"doc": "The entity (e.g. a service URN) which performs the change on behalf of the Actor and must be authorized to act as the Actor."
},
{
"type": [
"null",
"string"
],
"name": "message",
"default": null,
"doc": "Additional context around how DataHub was informed of the particular change. For example: was the change created by an automated process, or manually."
}
],
"doc": "Data captured on a resource/association/sub-resource level giving insight into when that resource/association/sub-resource moved into a particular lifecycle stage, and who acted to move it into that specific lifecycle stage."
}
],
"name": "created",
"default": null,
"doc": "Audit stamp containing who created this relationship edge and when"
},
{
"type": [
"null",
"com.linkedin.common.AuditStamp"
],
"name": "lastModified",
"default": null,
"doc": "Audit stamp containing who last modified this relationship edge and when"
},
{
"Searchable": {
"/actor": {
"fieldName": "structuredPropertyAttributionActors",
"fieldType": "URN",
"queryByDefault": false
},
"/source": {
"fieldName": "structuredPropertyAttributionSources",
"fieldType": "URN",
"queryByDefault": false
},
"/time": {
"fieldName": "structuredPropertyAttributionDates",
"fieldType": "DATETIME",
"queryByDefault": false
}
},
"type": [
"null",
{
"type": "record",
"name": "MetadataAttribution",
"namespace": "com.linkedin.common",
"fields": [
{
"type": "long",
"name": "time",
"doc": "When this metadata was updated."
},
{
"java": {
"class": "com.linkedin.common.urn.Urn"
},
"type": "string",
"name": "actor",
"doc": "The entity (e.g. a member URN) responsible for applying the assocated metadata. This can\neither be a user (in case of UI edits) or the datahub system for automation."
},
{
"java": {
"class": "com.linkedin.common.urn.Urn"
},
"type": [
"null",
"string"
],
"name": "source",
"default": null,
"doc": "The DataHub source responsible for applying the associated metadata. This will only be filled out\nwhen a DataHub source is responsible. This includes the specific metadata test urn, the automation urn."
},
{
"type": {
"type": "map",
"values": "string"
},
"name": "sourceDetail",
"default": {},
"doc": "The details associated with why this metadata was applied. For example, this could include\nthe actual regex rule, sql statement, ingestion pipeline ID, etc."
}
],
"doc": "Information about who, why, and how this metadata was applied"
}
],
"name": "attribution",
"default": null,
"doc": "Information about who, why, and how this metadata was applied"
}
]
}
},
"name": "properties",
"doc": "Custom property bag."
}
],
"doc": "Properties about an entity governed by StructuredPropertyDefinition"
}
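The `values` array accepts only strings and doubles, so a client may want to check types before emitting an assignment. The sketch below does that over plain Python values; `make_assignment` and the property URN are hypothetical, and widening integers to floats is a choice of this example rather than documented behavior.

```python
def make_assignment(property_urn, values):
    """Build a StructuredPropertyValueAssignment, checking the value types."""
    checked = []
    for value in values:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (str, int, float)):
            raise TypeError(f"unsupported value type: {type(value).__name__}")
        # The schema allows string or double; integers are widened to float here.
        checked.append(value if isinstance(value, str) else float(value))
    return {"propertyUrn": property_urn, "values": checked}


assignment = make_assignment("urn:li:structuredProperty:retentionDays", [30])
aspect = {"properties": [assignment]}
print(aspect)
```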
forms
Forms that are assigned to this entity to be filled out
- Fields
- Raw Schema
| Field | Type | Required | Description | Annotations |
|---|---|---|---|---|
| incompleteForms | FormAssociation[] | ✓ | All incomplete forms assigned to the entity. | Searchable |
| completedForms | FormAssociation[] | ✓ | All complete forms assigned to the entity. | Searchable |
| verifications | FormVerificationAssociation[] | ✓ | Verifications that have been applied to the entity via completed forms. | Searchable |
{
"type": "record",
"Aspect": {
"name": "forms"
},
"name": "Forms",
"namespace": "com.linkedin.common",
"fields": [
{
"Searchable": {
"/*/completedPrompts/*/id": {
"fieldName": "incompleteFormsCompletedPromptIds",
"fieldType": "KEYWORD",
"queryByDefault": false
},
"/*/completedPrompts/*/lastModified/time": {
"fieldName": "incompleteFormsCompletedPromptResponseTimes",
"fieldType": "DATETIME",
"queryByDefault": false
},
"/*/incompletePrompts/*/id": {
"fieldName": "incompleteFormsIncompletePromptIds",
"fieldType": "KEYWORD",
"queryByDefault": false
},
"/*/urn": {
"fieldName": "incompleteForms",
"fieldType": "URN",
"queryByDefault": false
}
},
"type": {
"type": "array",
"items": {
"type": "record",
"name": "FormAssociation",
"namespace": "com.linkedin.common",
"fields": [
{
"java": {
"class": "com.linkedin.common.urn.Urn"
},
"type": "string",
"name": "urn",
"doc": "Urn of the applied form"
},
{
"type": {
"type": "array",
"items": {
"type": "record",
"name": "FormPromptAssociation",
"namespace": "com.linkedin.common",
"fields": [
{
"type": "string",
"name": "id",
"doc": "The id for the prompt. This must be GLOBALLY UNIQUE."
},
{
"type": {
"type": "record",
"name": "AuditStamp",
"namespace": "com.linkedin.common",
"fields": [
{
"type": "long",
"name": "time",
"doc": "When did the resource/association/sub-resource move into the specific lifecycle stage represented by this AuditEvent."
},
{
"java": {
"class": "com.linkedin.common.urn.Urn"
},
"type": "string",
"name": "actor",
"doc": "The entity (e.g. a member URN) which will be credited for moving the resource/association/sub-resource into the specific lifecycle stage. It is also the one used to authorize the change."
},
{
"java": {
"class": "com.linkedin.common.urn.Urn"
},
"type": [
"null",
"string"
],
"name": "impersonator",
"default": null,
"doc": "The entity (e.g. a service URN) which performs the change on behalf of the Actor and must be authorized to act as the Actor."
},
{
"type": [
"null",
"string"
],
"name": "message",
"default": null,
"doc": "Additional context around how DataHub was informed of the particular change. For example: was the change created by an automated process, or manually."
}
],
"doc": "Data captured on a resource/association/sub-resource level giving insight into when that resource/association/sub-resource moved into a particular lifecycle stage, and who acted to move it into that specific lifecycle stage."
},
"name": "lastModified",
"doc": "The last time this prompt was touched for the entity (set, unset)"
},
{
"type": [
"null",
{
"type": "record",
"name": "FormPromptFieldAssociations",
"namespace": "com.linkedin.common",
"fields": [
{
"type": [
"null",
{
"type": "array",
"items": {
"type": "record",
"name": "FieldFormPromptAssociation",
"namespace": "com.linkedin.common",
"fields": [
{
"type": "string",
"name": "fieldPath",
"doc": "The field path on a schema field."
},
{
"type": "com.linkedin.common.AuditStamp",
"name": "lastModified",
"doc": "The last time this prompt was touched for the field on the entity (set, unset)"
}
],
"doc": "Information about the status of a particular prompt for a specific schema field\non an entity."
}
}
],
"name": "completedFieldPrompts",
"default": null,
"doc": "A list of field-level prompt associations that are not yet complete for this form."
},
{
"type": [
"null",
{
"type": "array",
"items": "com.linkedin.common.FieldFormPromptAssociation"
}
],
"name": "incompleteFieldPrompts",
"default": null,
"doc": "A list of field-level prompt associations that are complete for this form."
}
],
"doc": "Information about the field-level prompt associations on a top-level prompt association."
}
],
"name": "fieldAssociations",
"default": null,
"doc": "Optional information about the field-level prompt associations."
}
],
"doc": "Information about the status of a particular prompt.\nNote that this is where we can add additional information about individual responses:\nactor, timestamp, and the response itself."
}
},
"name": "incompletePrompts",
"default": [],
"doc": "A list of prompts that are not yet complete for this form."
},
{
"type": {
"type": "array",
"items": "com.linkedin.common.FormPromptAssociation"
},
"name": "completedPrompts",
"default": [],
"doc": "A list of prompts that have been completed for this form."
}
],
"doc": "Properties of an applied form."
}
},
"name": "incompleteForms",
"doc": "All incomplete forms assigned to the entity."
},
{
"Searchable": {
"/*/completedPrompts/*/id": {
"fieldName": "completedFormsCompletedPromptIds",
"fieldType": "KEYWORD",
"queryByDefault": false
},
"/*/completedPrompts/*/lastModified/time": {
"fieldName": "completedFormsCompletedPromptResponseTimes",
"fieldType": "DATETIME",
"queryByDefault": false
},
"/*/incompletePrompts/*/id": {
"fieldName": "completedFormsIncompletePromptIds",
"fieldType": "KEYWORD",
"queryByDefault": false
},
"/*/urn": {
"fieldName": "completedForms",
"fieldType": "URN",
"queryByDefault": false
}
},
"type": {
"type": "array",
"items": "com.linkedin.common.FormAssociation"
},
"name": "completedForms",
"doc": "All complete forms assigned to the entity."
},
{
"Searchable": {
"/*/form": {
"fieldName": "verifiedForms",
"fieldType": "URN",
"queryByDefault": false
}
},
"type": {
"type": "array",
"items": {
"type": "record",
"name": "FormVerificationAssociation",
"namespace": "com.linkedin.common",
"fields": [
{
"java": {
"class": "com.linkedin.common.urn.Urn"
},
"type": "string",
"name": "form",
"doc": "The urn of the form that granted this verification."
},
{
"type": [
"null",
"com.linkedin.common.AuditStamp"
],
"name": "lastModified",
"default": null,
"doc": "An audit stamp capturing who and when verification was applied for this form."
}
],
"doc": "An association between a verification and an entity that has been granted\nvia completion of one or more forms of type 'VERIFICATION'."
}
},
"name": "verifications",
"default": [],
"doc": "Verifications that have been applied to the entity via completed forms."
}
],
"doc": "Forms that are assigned to this entity to be filled out"
}
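Each FormAssociation carries two prompt lists, `incompletePrompts` and `completedPrompts`, both defaulting to empty. As an illustration, the sketch below derives a simple progress count from them. `prompt_progress`, the form URN, and the prompt ids are hypothetical, and the prompt records omit the required `lastModified` stamp for brevity.

```python
def prompt_progress(form_association):
    """Return (completed, total) prompt counts for one FormAssociation."""
    completed = len(form_association.get("completedPrompts", []))
    incomplete = len(form_association.get("incompletePrompts", []))
    return completed, completed + incomplete


# Hypothetical form; lastModified is left out of each prompt to keep this short.
form = {
    "urn": "urn:li:form:feature-table-docs",
    "incompletePrompts": [{"id": "owner"}, {"id": "retention"}],
    "completedPrompts": [{"id": "description"}],
}
done, total = prompt_progress(form)
print(f"{done} of {total} prompts completed")
```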
testResults
Information about a Test Result
- Fields
- Raw Schema
| Field | Type | Required | Description | Annotations |
|---|---|---|---|---|
| failing | TestResult[] | ✓ | Results that are failing | Searchable, → IsFailing |
| passing | TestResult[] | ✓ | Results that are passing | Searchable, → IsPassing |
{
"type": "record",
"Aspect": {
"name": "testResults"
},
"name": "TestResults",
"namespace": "com.linkedin.test",
"fields": [
{
"Relationship": {
"/*/test": {
"entityTypes": [
"test"
],
"name": "IsFailing"
}
},
"Searchable": {
"/*/test": {
"fieldName": "failingTests",
"fieldType": "URN",
"hasValuesFieldName": "hasFailingTests",
"queryByDefault": false
}
},
"type": {
"type": "array",
"items": {
"type": "record",
"name": "TestResult",
"namespace": "com.linkedin.test",
"fields": [
{
"java": {
"class": "com.linkedin.common.urn.Urn"
},
"type": "string",
"name": "test",
"doc": "The urn of the test"
},
{
"type": {
"type": "enum",
"symbolDocs": {
"FAILURE": " The Test Failed",
"SUCCESS": " The Test Succeeded"
},
"name": "TestResultType",
"namespace": "com.linkedin.test",
"symbols": [
"SUCCESS",
"FAILURE"
]
},
"name": "type",
"doc": "The type of the result"
},
{
"type": [
"null",
"string"
],
"name": "testDefinitionMd5",
"default": null,
"doc": "The md5 of the test definition that was used to compute this result.\nSee TestInfo.testDefinition.md5 for more information."
},
{
"type": [
"null",
{
"type": "record",
"name": "AuditStamp",
"namespace": "com.linkedin.common",
"fields": [
{
"type": "long",
"name": "time",
"doc": "When did the resource/association/sub-resource move into the specific lifecycle stage represented by this AuditEvent."
},
{
"java": {
"class": "com.linkedin.common.urn.Urn"
},
"type": "string",
"name": "actor",
"doc": "The entity (e.g. a member URN) which will be credited for moving the resource/association/sub-resource into the specific lifecycle stage. It is also the one used to authorize the change."
},
{
"java": {
"class": "com.linkedin.common.urn.Urn"
},
"type": [
"null",
"string"
],
"name": "impersonator",
"default": null,
"doc": "The entity (e.g. a service URN) which performs the change on behalf of the Actor and must be authorized to act as the Actor."
},
{
"type": [
"null",
"string"
],
"name": "message",
"default": null,
"doc": "Additional context around how DataHub was informed of the particular change. For example: was the change created by an automated process, or manually."
}
],
"doc": "Data captured on a resource/association/sub-resource level giving insight into when that resource/association/sub-resource moved into a particular lifecycle stage, and who acted to move it into that specific lifecycle stage."
}
],
"name": "lastComputed",
"default": null,
"doc": "The audit stamp of when the result was computed, including the actor who computed it."
}
],
"doc": "Information about a Test Result"
}
},
"name": "failing",
"doc": "Results that are failing"
},
{
"Relationship": {
"/*/test": {
"entityTypes": [
"test"
],
"name": "IsPassing"
}
},
"Searchable": {
"/*/test": {
"fieldName": "passingTests",
"fieldType": "URN",
"hasValuesFieldName": "hasPassingTests",
"queryByDefault": false
}
},
"type": {
"type": "array",
"items": "com.linkedin.test.TestResult"
},
"name": "passing",
"doc": "Results that are passing"
}
],
"doc": "Information about a Test Result"
}
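The aspect keeps failing and passing results in separate arrays, while each TestResult also records its own `type`. The sketch below groups a flat list of results into that layout based on the `SUCCESS` and `FAILURE` symbols. `group_test_results` and the test URNs are hypothetical.

```python
def group_test_results(results):
    """Split TestResult records into the failing and passing arrays."""
    aspect = {"failing": [], "passing": []}
    for result in results:
        if result["type"] == "FAILURE":
            aspect["failing"].append(result)
        elif result["type"] == "SUCCESS":
            aspect["passing"].append(result)
        else:
            raise ValueError(f"unknown TestResultType: {result['type']}")
    return aspect


aspect = group_test_results(
    [
        {"test": "urn:li:test:has-owner", "type": "SUCCESS"},
        {"test": "urn:li:test:has-description", "type": "FAILURE"},
    ]
)
print(aspect)
```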
subTypes
Sub Types. Use this aspect to specialize a generic Entity e.g. Making a Dataset also be a View or also be a LookerExplore
- Fields
- Raw Schema
| Field | Type | Required | Description | Annotations |
|---|---|---|---|---|
| typeNames | string[] | ✓ | The names of the specific types. | Searchable |
{
"type": "record",
"Aspect": {
"name": "subTypes"
},
"name": "SubTypes",
"namespace": "com.linkedin.common",
"fields": [
{
"Searchable": {
"/*": {
"addToFilters": true,
"fieldType": "KEYWORD",
"filterNameOverride": "Sub Type",
"queryByDefault": false
}
},
"type": {
"type": "array",
"items": "string"
},
"name": "typeNames",
"doc": "The names of the specific types."
}
],
"doc": "Sub Types. Use this aspect to specialize a generic Entity\ne.g. Making a Dataset also be a View or also be a LookerExplore"
}
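As a minimal sketch of what the schema above implies, the following plain-Python helper builds a payload shaped like the SubTypes record: a single required field, typeNames, holding an array of strings. It deliberately does not use the DataHub SDK; the helper name `make_sub_types_aspect` and the sub type label "Feature View" are hypothetical and chosen only for illustration.

```python
import json


def make_sub_types_aspect(type_names):
    """Return a dict shaped like the SubTypes record (one required string array)."""
    # A bare string is iterable, so reject it explicitly to avoid splitting it into characters.
    if isinstance(type_names, str):
        raise TypeError("type_names must be a list of strings, not a single string")
    names = list(type_names)
    if not all(isinstance(name, str) for name in names):
        raise TypeError("every type name must be a string")
    return {"typeNames": names}


# Hypothetical sub type label; the schema accepts any string values.
aspect = make_sub_types_aspect(["Feature View"])
print(json.dumps(aspect))
```

Because typeNames is annotated as a KEYWORD search field with addToFilters enabled, the values supplied here are what would surface under the "Sub Type" filter.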
Common Types
These types are used across multiple aspects in this entity.
AuditStamp
Data captured on a resource/association/sub-resource level giving insight into when that resource/association/sub-resource moved into a particular lifecycle stage, and who acted to move it into that specific lifecycle stage.
Fields:
- time (long): When did the resource/association/sub-resource move into the specific lifecyc...
- actor (string): The entity (e.g. a member URN) which will be credited for moving the resource...
- impersonator (string?): The entity (e.g. a service URN) which performs the change on behalf of the Ac...
- message (string?): Additional context around how DataHub was informed of the particular change. ...
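To make the AuditStamp shape concrete, here is a small plain-Python sketch that assembles a dict with the four fields listed above. Two points are assumptions rather than facts stated on this page: that time is expressed in epoch milliseconds, and that checking for a `urn:li:` prefix is a reasonable sanity check for the actor. The helper name `make_audit_stamp` and the actor `urn:li:corpuser:jdoe` are hypothetical.

```python
import time


def make_audit_stamp(actor, impersonator=None, message=None, now_ms=None):
    """Build a dict with the AuditStamp fields: time, actor, impersonator, message."""
    # Assumption: URNs on this page all start with "urn:li:", so use that as a light check.
    if not actor.startswith("urn:li:"):
        raise ValueError(f"actor does not look like a URN: {actor!r}")
    # Assumption: 'time' is epoch milliseconds; the schema only says it is a long.
    timestamp = int(time.time() * 1000) if now_ms is None else int(now_ms)
    return {
        "time": timestamp,
        "actor": actor,
        # Optional fields default to None, matching the schema's null defaults.
        "impersonator": impersonator,
        "message": message,
    }


# Hypothetical actor URN and a fixed timestamp so the result is reproducible.
stamp = make_audit_stamp(
    "urn:li:corpuser:jdoe",
    message="computed by an automated test run",
    now_ms=1700000000000,
)
print(stamp["actor"], stamp["time"])
```

Leaving impersonator and message unset keeps them as None, which mirrors the `"default": null` entries in the raw schema.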
FormAssociation
Properties of an applied form.
Fields:
- urn (string): Urn of the applied form
- incompletePrompts (FormPromptAssociation[]): A list of prompts that are not yet complete for this form.
- completedPrompts (FormPromptAssociation[]): A list of prompts that have been completed for this form.
TestResult
Information about a Test Result
Fields:
- test (string): The urn of the test
- type (TestResultType): The type of the result
- testDefinitionMd5 (string?): The md5 of the test definition that was used to compute this result. See Test...
- lastComputed (AuditStamp?): The audit stamp of when the result was computed, including the actor who comp...
Relationships
Outgoing
These are the relationships stored in this entity's aspects
SourcePlatform
- DataPlatform via mlFeatureTableKey.platform
Contains
- MlFeature via mlFeatureTableProperties.mlFeatures
KeyedBy
- MlPrimaryKey via mlFeatureTableProperties.mlPrimaryKeys
OwnedBy
- Corpuser via ownership.owners.owner
- CorpGroup via ownership.owners.owner
ownershipType
- OwnershipType via ownership.owners.typeUrn
TaggedWith
- Tag via globalTags.tags
TermedWith
- GlossaryTerm via glossaryTerms.terms.urn
AssociatedWith
- Domain via domains.domains
- Application via applications.applications
IsFailing
- Test via testResults.failing
IsPassing
- Test via testResults.passing
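Since each outgoing relationship is derived from a field path inside one of this entity's aspects, it can help to look up which relationships a given aspect produces. The sketch below transcribes the list above into a plain dictionary and filters it by aspect name. The names `OUTGOING_RELATIONSHIPS` and `relationships_for_aspect` are hypothetical and not part of any DataHub API.

```python
# Relationship name -> list of (target entity type, source field path), as listed above.
OUTGOING_RELATIONSHIPS = {
    "SourcePlatform": [("DataPlatform", "mlFeatureTableKey.platform")],
    "Contains": [("MlFeature", "mlFeatureTableProperties.mlFeatures")],
    "KeyedBy": [("MlPrimaryKey", "mlFeatureTableProperties.mlPrimaryKeys")],
    "OwnedBy": [
        ("Corpuser", "ownership.owners.owner"),
        ("CorpGroup", "ownership.owners.owner"),
    ],
    "ownershipType": [("OwnershipType", "ownership.owners.typeUrn")],
    "TaggedWith": [("Tag", "globalTags.tags")],
    "TermedWith": [("GlossaryTerm", "glossaryTerms.terms.urn")],
    "AssociatedWith": [
        ("Domain", "domains.domains"),
        ("Application", "applications.applications"),
    ],
    "IsFailing": [("Test", "testResults.failing")],
    "IsPassing": [("Test", "testResults.passing")],
}


def relationships_for_aspect(aspect_name):
    """Return the sorted relationship names whose source field lives in the given aspect."""
    return sorted(
        name
        for name, edges in OUTGOING_RELATIONSHIPS.items()
        # The aspect name is the first segment of the dotted field path.
        if any(path.split(".")[0] == aspect_name for _, path in edges)
    )


print(relationships_for_aspect("testResults"))
print(relationships_for_aspect("ownership"))
```

Read this way, writing the testResults aspect affects the IsFailing and IsPassing edges, while writing ownership affects both OwnedBy and ownershipType.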
Global Metadata Model
