Version: Next

GlossaryTerm

A GlossaryTerm represents a standardized business definition or vocabulary term that can be associated with data assets across your organization. GlossaryTerms are the fundamental building blocks of DataHub's Business Glossary feature, enabling teams to establish and maintain a shared vocabulary for describing data concepts.

In practice, GlossaryTerms allow you to:

  • Define business terminology with clear, authoritative definitions
  • Create relationships between related business concepts (inheritance, containment, etc.)
  • Tag data assets (datasets, dashboards, charts, etc.) with standardized business terms
  • Establish governance and ownership over business vocabulary
  • Link to external resources and documentation

For example, a GlossaryTerm might define "Customer Lifetime Value (CLV)" with a precise business definition, relate it to other terms like "Revenue" and "Customer", and be applied to specific dataset columns that store CLV calculations.

Identity

GlossaryTerms are uniquely identified by a single field: their name. This name serves as the persistent identifier for the term throughout its lifecycle.

URN Structure

The URN (Uniform Resource Name) for a GlossaryTerm follows this pattern:

urn:li:glossaryTerm:<term_name>

Where:

  • <term_name>: A unique string identifier for the term. This can be human-readable (e.g., "CustomerLifetimeValue") or a generated ID (e.g., "clv-001" or a UUID).

Examples

# Simple term name
urn:li:glossaryTerm:Revenue

# Hierarchical naming convention (common pattern)
urn:li:glossaryTerm:Finance.Revenue
urn:li:glossaryTerm:Classification.PII
urn:li:glossaryTerm:Classification.Confidential

# UUID-based identifier
urn:li:glossaryTerm:41516e31-0acb-fd90-76ff-fc2c98d2d1a3

# Descriptive identifier
urn:li:glossaryTerm:CustomerLifetimeValue
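
Because the URN is just a fixed prefix followed by the term name, it can be built and taken apart with plain string handling. The sketch below uses only the standard library; `build_term_urn` and `parse_term_urn` are hypothetical helper names chosen for illustration, not DataHub APIs (the SDK's `make_term_urn`, used in the code examples later on this page, covers construction).

```python
TERM_URN_PREFIX = "urn:li:glossaryTerm:"


def build_term_urn(term_name: str) -> str:
    """Build a GlossaryTerm URN string from a term name (hypothetical helper)."""
    if not term_name:
        raise ValueError("term name must not be empty")
    return f"{TERM_URN_PREFIX}{term_name}"


def parse_term_urn(urn: str) -> str:
    """Return the term-name portion of a GlossaryTerm URN (hypothetical helper)."""
    if not urn.startswith(TERM_URN_PREFIX):
        raise ValueError(f"not a glossaryTerm URN: {urn}")
    return urn[len(TERM_URN_PREFIX):]


print(build_term_urn("Finance.Revenue"))
print(parse_term_urn("urn:li:glossaryTerm:Classification.PII"))
```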

Best Practices for Term Names

  1. Use hierarchical notation: Prefix terms with their category (e.g., Classification.PII, Finance.Revenue) to indicate structure even though the name is flat.
  2. Be consistent: Choose a naming convention (camelCase, dot notation, etc.) and apply it uniformly.
  3. Keep it permanent: The term name is the identifier and should not change. Use the name field in glossaryTermInfo for the display name.
  4. Consider organization: While the URN is flat, you can use glossaryNodes (term groups) to create hierarchical organization in the UI.
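
One way to enforce the "be consistent" advice is to check candidate names against the chosen convention before creating terms. The following is a minimal sketch that assumes a dot-notation convention of alphanumeric segments; the pattern and the `follows_dot_notation` helper are illustrative assumptions, not rules that DataHub itself imposes (UUID-style names, for example, would need a different pattern).

```python
import re

# Assumed convention: one or more alphanumeric segments separated by single dots,
# each segment starting with a letter (e.g. "Classification.PII").
DOTTED_NAME = re.compile(r"^[A-Za-z][A-Za-z0-9]*(\.[A-Za-z][A-Za-z0-9]*)*$")


def follows_dot_notation(term_name: str) -> bool:
    """Return True when the name matches the assumed dot-notation convention."""
    return DOTTED_NAME.fullmatch(term_name) is not None


for candidate in ["Classification.PII", "CustomerLifetimeValue", "customer lifetime value"]:
    print(candidate, follows_dot_notation(candidate))
```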

Important Capabilities

Core Business Definition (glossaryTermInfo)

The glossaryTermInfo aspect contains the essential business information about a term:

  • definition (required): The authoritative business definition of the term. This should be clear, concise, and provide sufficient context for anyone to understand the term's meaning.
  • name: The display name shown in the UI. This can be more human-friendly than the URN identifier (e.g., "Customer Lifetime Value" vs. "CustomerLifetimeValue").
  • parentNode: A reference to a GlossaryNode (term group) that acts as a folder for organizing terms hierarchically.
  • termSource: Indicates whether the term is "INTERNAL" (defined within your organization) or "EXTERNAL" (from an external standard like FIBO).
  • sourceRef: A reference identifier for external term sources (e.g., "FIBO" for Financial Industry Business Ontology).
  • sourceUrl: A URL pointing to the external definition of the term.
  • customProperties: Key-value pairs for additional metadata specific to your organization.

Example:

{
  "name": "Customer Lifetime Value",
  "definition": "The total revenue a business can expect from a single customer account throughout the business relationship.",
  "termSource": "INTERNAL",
  "parentNode": "urn:li:glossaryNode:Finance"
}

Term Relationships (glossaryRelatedTerms)

GlossaryTerms support several relationship types that help model the semantic connections between business concepts:

1. IsA Relationships (Inheritance)

Indicates that one term is a specialized type of another term. This creates an "Is-A" hierarchy where more specific terms inherit the characteristics of broader terms.

Use case: Email IsA PersonalInformation, SocialSecurityNumber IsA PersonalInformation

2. HasA Relationships (Containment)

Indicates that one term contains or is composed of another term. This creates a "Has-A" relationship where a complex concept consists of simpler parts.

Use case: Address HasA ZipCode, Address HasA Street, Address HasA City

3. Values Relationships

Defines the allowed values for an enumerated term. Useful for controlled vocabularies where a term has a fixed set of valid values.

Use case: ColorEnum HasValue Red, ColorEnum HasValue Green, ColorEnum HasValue Blue

4. RelatedTo Relationships

General-purpose relationship for terms that are semantically related but don't fit the other categories.

Use case: Revenue RelatedTo Profit, Customer RelatedTo Account
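
To make the IsA semantics concrete, the sketch below walks IsA edges upward to collect every broader term for a given term. It works on a small in-memory dictionary rather than on DataHub itself; the `IS_A` data, the extra `SensitiveData` term, and the `broader_terms` helper are all hypothetical and exist only for illustration.

```python
from typing import Dict, List, Set

# Hypothetical in-memory copy of IsA edges: term -> directly broader terms.
IS_A: Dict[str, List[str]] = {
    "Email": ["PersonalInformation"],
    "SocialSecurityNumber": ["PersonalInformation"],
    "PersonalInformation": ["SensitiveData"],
}


def broader_terms(term: str, is_a: Dict[str, List[str]]) -> Set[str]:
    """Collect every term reachable by following IsA edges upward."""
    seen: Set[str] = set()
    stack = list(is_a.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(is_a.get(current, []))
    return seen


print(sorted(broader_terms("Email", IS_A)))
```

The `seen` set also guards against accidental cycles in the relationship data.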

Hierarchical Organization

GlossaryTerms can be organized hierarchically through GlossaryNodes (term groups). The parentNode field in glossaryTermInfo establishes this relationship:

GlossaryNode: Classification
├── GlossaryTerm: Sensitive
├── GlossaryTerm: Confidential
└── GlossaryTerm: HighlyConfidential

GlossaryNode: PersonalInformation
├── GlossaryTerm: Email
├── GlossaryTerm: Address
└── GlossaryTerm: PhoneNumber

This hierarchy is visible in the DataHub UI and helps users navigate large glossaries.
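
The same structure can be reproduced from the parentNode values alone. The sketch below groups terms by their parent group using plain dictionaries; the `PARENT_NODE` mapping mirrors the tree above, and `group_by_node` is a hypothetical helper rather than part of the DataHub SDK.

```python
from collections import defaultdict
from typing import Dict, List, Optional

# Term name -> parent GlossaryNode name (None would mean a root-level term).
PARENT_NODE: Dict[str, Optional[str]] = {
    "Sensitive": "Classification",
    "Confidential": "Classification",
    "HighlyConfidential": "Classification",
    "Email": "PersonalInformation",
    "Address": "PersonalInformation",
    "PhoneNumber": "PersonalInformation",
}


def group_by_node(parent_of: Dict[str, Optional[str]]) -> Dict[Optional[str], List[str]]:
    """Group term names under their parent node, sorted within each group."""
    groups: Dict[Optional[str], List[str]] = defaultdict(list)
    for term, node in parent_of.items():
        groups[node].append(term)
    return {node: sorted(terms) for node, terms in groups.items()}


for node, terms in group_by_node(PARENT_NODE).items():
    print(f"GlossaryNode: {node}")
    for term in terms:
        print(f"  GlossaryTerm: {term}")
```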

Applying Terms to Data Assets

GlossaryTerms become valuable when applied to actual data assets. Terms can be attached to:

  • Datasets (tables, views, files)
  • Dataset fields (columns)
  • Dashboards
  • Charts
  • Data Jobs
  • Containers
  • And many other entity types

When a term is applied to a data asset, it creates a TermedWith relationship, which enables:

  • Discovery: Find all assets tagged with a specific business concept
  • Governance: Track which assets contain sensitive data types
  • Documentation: Provide business context for technical assets
  • Compliance: Identify datasets subject to regulatory requirements

Code Examples

Creating a GlossaryTerm

Python SDK: Create a basic GlossaryTerm
# Inlined from /metadata-ingestion/examples/library/glossary_term_create.py
import os

from datahub.emitter.mce_builder import make_term_urn
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import GlossaryTermInfoClass

# Get DataHub connection details from environment
gms_server = os.getenv("DATAHUB_GMS_URL", "http://localhost:8080")
token = os.getenv("DATAHUB_GMS_TOKEN")

# Create a term URN - the unique identifier for the glossary term
term_urn = make_term_urn("CustomerLifetimeValue")

# Define the term's core information
term_info = GlossaryTermInfoClass(
    name="Customer Lifetime Value",
    definition="The total revenue a business can expect from a single customer account throughout the business relationship. This metric helps prioritize customer retention efforts and marketing spend.",
    termSource="INTERNAL",
)

# Create a metadata change proposal
event = MetadataChangeProposalWrapper(
    entityUrn=term_urn,
    aspect=term_info,
)

# Emit the metadata
rest_emitter = DatahubRestEmitter(gms_server=gms_server, token=token)
rest_emitter.emit(event)
print(f"Created glossary term: {term_urn}")

Python SDK: Create a GlossaryTerm with full metadata
# Inlined from /metadata-ingestion/examples/library/glossary_term_create_with_metadata.py
import os

from datahub.emitter.mce_builder import make_term_urn
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import (
    AuditStampClass,
    GlossaryTermInfoClass,
    InstitutionalMemoryClass,
    InstitutionalMemoryMetadataClass,
    OwnerClass,
    OwnershipClass,
    OwnershipSourceClass,
    OwnershipSourceTypeClass,
    OwnershipTypeClass,
)
from datahub.metadata.urns import GlossaryNodeUrn

# Create the term URN
term_urn = make_term_urn("Classification.PII")

# Create GlossaryTermInfo with full metadata
term_info = GlossaryTermInfoClass(
    name="Personally Identifiable Information",
    definition="Information that can be used to identify, contact, or locate a single person, or to identify an individual in context. Examples include name, email address, phone number, and social security number.",
    termSource="INTERNAL",
    # Link to a parent term group (glossary node)
    parentNode=str(GlossaryNodeUrn("Classification")),
    # Custom properties for additional metadata
    customProperties={
        "sensitivity_level": "HIGH",
        "data_retention_period": "7_years",
        "regulatory_framework": "GDPR,CCPA",
    },
)

# Add ownership information
ownership = OwnershipClass(
    owners=[
        OwnerClass(
            owner="urn:li:corpuser:datahub",
            type=OwnershipTypeClass.DATAOWNER,
            source=OwnershipSourceClass(type=OwnershipSourceTypeClass.MANUAL),
        ),
        OwnerClass(
            owner="urn:li:corpGroup:privacy-team",
            type=OwnershipTypeClass.DATAOWNER,
            source=OwnershipSourceClass(type=OwnershipSourceTypeClass.MANUAL),
        ),
    ]
)

# Add links to related documentation
institutional_memory = InstitutionalMemoryClass(
    elements=[
        InstitutionalMemoryMetadataClass(
            url="https://wiki.company.com/privacy/pii-guidelines",
            description="Internal PII Handling Guidelines",
            createStamp=AuditStampClass(time=0, actor="urn:li:corpuser:datahub"),
        ),
        InstitutionalMemoryMetadataClass(
            url="https://gdpr.eu/",
            description="GDPR Official Documentation",
            createStamp=AuditStampClass(time=0, actor="urn:li:corpuser:datahub"),
        ),
    ]
)

# Emit all aspects for the glossary term
# Get DataHub connection details from environment
gms_server = os.getenv("DATAHUB_GMS_URL", "http://localhost:8080")
token = os.getenv("DATAHUB_GMS_TOKEN")

rest_emitter = DatahubRestEmitter(gms_server=gms_server, token=token)

# Emit term info
rest_emitter.emit(MetadataChangeProposalWrapper(entityUrn=term_urn, aspect=term_info))

# Emit ownership
rest_emitter.emit(MetadataChangeProposalWrapper(entityUrn=term_urn, aspect=ownership))

# Emit institutional memory (documentation links)
rest_emitter.emit(
    MetadataChangeProposalWrapper(entityUrn=term_urn, aspect=institutional_memory)
)

print(f"Created glossary term with full metadata: {term_urn}")

Managing Term Relationships

Python SDK: Add relationships between GlossaryTerms
# Inlined from /metadata-ingestion/examples/library/glossary_term_add_relationships.py
from datahub.emitter.mce_builder import make_term_urn
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import GlossaryRelatedTermsClass
from datahub.metadata.urns import GlossaryTermUrn

# First, ensure the related terms exist (you would have created these previously)
# For this example, assume we have:
# - Classification.PII (a broad category)
# - Classification.Sensitive (another category)
# - PersonalInformation.Email (a specific term)
# - PersonalInformation.Address (another specific term)

# Create relationships for the Email term
email_term_urn = make_term_urn("PersonalInformation.Email")

# Define relationships
email_relationships = GlossaryRelatedTermsClass(
    # IsA relationship: Email is a type of PII
    # This creates an inheritance hierarchy
    isRelatedTerms=[
        str(GlossaryTermUrn("Classification.PII")),
        str(GlossaryTermUrn("Classification.Sensitive")),
    ],
    # RelatedTo: General semantic relationship
    relatedTerms=[
        str(GlossaryTermUrn("PersonalInformation.PhoneNumber")),
        str(GlossaryTermUrn("PersonalInformation.Contact")),
    ],
)

# Create relationships for the Address term
address_term_urn = make_term_urn("PersonalInformation.Address")

address_relationships = GlossaryRelatedTermsClass(
    # IsA: Address is also a type of PII
    isRelatedTerms=[str(GlossaryTermUrn("Classification.PII"))],
    # HasA: Address contains these components
    hasRelatedTerms=[
        str(GlossaryTermUrn("PersonalInformation.ZipCode")),
        str(GlossaryTermUrn("PersonalInformation.Street")),
        str(GlossaryTermUrn("PersonalInformation.City")),
        str(GlossaryTermUrn("PersonalInformation.Country")),
    ],
)

# Create an enumeration term with fixed values
color_enum_urn = make_term_urn("ColorEnum")

color_enum_relationships = GlossaryRelatedTermsClass(
    # Values: Define the allowed values for this enumeration
    values=[
        str(GlossaryTermUrn("Colors.Red")),
        str(GlossaryTermUrn("Colors.Green")),
        str(GlossaryTermUrn("Colors.Blue")),
        str(GlossaryTermUrn("Colors.Yellow")),
    ]
)

# Emit the relationships
rest_emitter = DatahubRestEmitter(gms_server="http://localhost:8080")

# Emit Email term relationships
rest_emitter.emit(
    MetadataChangeProposalWrapper(entityUrn=email_term_urn, aspect=email_relationships)
)
print(f"Added relationships to: {email_term_urn}")

# Emit Address term relationships
rest_emitter.emit(
    MetadataChangeProposalWrapper(
        entityUrn=address_term_urn, aspect=address_relationships
    )
)
print(f"Added relationships to: {address_term_urn}")

# Emit Color enumeration relationships
rest_emitter.emit(
    MetadataChangeProposalWrapper(
        entityUrn=color_enum_urn, aspect=color_enum_relationships
    )
)
print(f"Added value relationships to: {color_enum_urn}")

print("\nRelationship types explained:")
print("- isRelatedTerms (IsA): Inheritance relationship - term is a type of another")
print("- hasRelatedTerms (HasA): Containment relationship - term contains other terms")
print("- values: Enumeration values - defines allowed values for the term")
print("- relatedTerms: General semantic relationship between terms")

Applying Terms to Assets

Python SDK: Add a GlossaryTerm to a dataset
# Inlined from /metadata-ingestion/examples/library/dataset_add_term.py
from typing import List, Optional, Union

from datahub.sdk import DataHubClient, DatasetUrn, GlossaryTermUrn


def add_terms_to_dataset(
    client: DataHubClient,
    dataset_urn: DatasetUrn,
    term_urns: List[Union[GlossaryTermUrn, str]],
) -> None:
    """
    Add glossary terms to a dataset.

    Args:
        client: DataHub client to use
        dataset_urn: URN of the dataset to update
        term_urns: List of term URNs or term names to add
    """
    dataset = client.entities.get(dataset_urn)

    for term in term_urns:
        if isinstance(term, str):
            resolved_term_urn = client.resolve.term(name=term)
            dataset.add_term(resolved_term_urn)
        else:
            dataset.add_term(term)

    client.entities.update(dataset)


def main(client: Optional[DataHubClient] = None) -> None:
    """
    Main function to add terms to dataset example.

    Args:
        client: Optional DataHub client (for testing). If not provided, creates one from env.
    """
    client = client or DataHubClient.from_env()

    dataset_urn = DatasetUrn(platform="hive", name="realestate_db.sales", env="PROD")

    # Add terms using both URN and name resolution
    add_terms_to_dataset(
        client=client,
        dataset_urn=dataset_urn,
        term_urns=[
            GlossaryTermUrn("Classification.HighlyConfidential"),
            "PII",  # Will be resolved by name
        ],
    )


if __name__ == "__main__":
    main()

Python SDK: Add a GlossaryTerm to a dataset column
# Inlined from /metadata-ingestion/examples/library/dataset_add_column_term.py
from datahub.sdk import DataHubClient, DatasetUrn, GlossaryTermUrn

client = DataHubClient.from_env()

dataset = client.entities.get(
    DatasetUrn(platform="hive", name="realestate_db.sales", env="PROD")
)

dataset["address.zipcode"].add_term(GlossaryTermUrn("Classification.Location"))

client.entities.update(dataset)

Querying GlossaryTerms

REST API: Get a GlossaryTerm by URN
# Fetch a GlossaryTerm entity
curl -X GET 'http://localhost:8080/entities/urn%3Ali%3AglossaryTerm%3ACustomerLifetimeValue' \
  -H 'Authorization: Bearer <token>'

# Response includes all aspects:
# - glossaryTermKey (identity)
# - glossaryTermInfo (definition, name, etc.)
# - glossaryRelatedTerms (relationships)
# - ownership (who owns this term)
# - institutionalMemory (links to documentation)
# - etc.
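
The URN in the request path above is percent-encoded (each `:` becomes `%3A`). A minimal sketch of producing that URL with the Python standard library follows; `entity_url` is a hypothetical helper name, and the server address is the same localhost default used in the curl example.

```python
from urllib.parse import quote


def entity_url(gms_server: str, urn: str) -> str:
    """Build the entity endpoint URL with the URN percent-encoded (hypothetical helper)."""
    # safe="" makes quote() encode every reserved character, including ":".
    return f"{gms_server.rstrip('/')}/entities/{quote(urn, safe='')}"


print(entity_url("http://localhost:8080", "urn:li:glossaryTerm:CustomerLifetimeValue"))
```
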

REST API: Search for assets tagged with a term
# Find all datasets tagged with a specific term
curl -X POST 'http://localhost:8080/entities?action=search' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer <token>' \
  -d '{
    "entity": "dataset",
    "input": "*",
    "filter": {
      "or": [
        {
          "and": [
            {
              "field": "glossaryTerms",
              "value": "urn:li:glossaryTerm:Classification.PII",
              "condition": "EQUAL"
            }
          ]
        }
      ]
    },
    "start": 0,
    "count": 10
  }'
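
When the search is issued from a script, it can be easier to assemble the request body programmatically than to hand-edit JSON. The sketch below rebuilds the same payload as the curl example using only the standard library; `term_search_payload` is a hypothetical helper name, and the sketch stops short of sending the request.

```python
import json
from typing import Any, Dict


def term_search_payload(
    term_urn: str, entity: str = "dataset", start: int = 0, count: int = 10
) -> Dict[str, Any]:
    """Build the search request body that filters one entity type by glossary term."""
    return {
        "entity": entity,
        "input": "*",
        "filter": {
            "or": [
                {
                    "and": [
                        {"field": "glossaryTerms", "value": term_urn, "condition": "EQUAL"}
                    ]
                }
            ]
        },
        "start": start,
        "count": count,
    }


print(json.dumps(term_search_payload("urn:li:glossaryTerm:Classification.PII"), indent=2))
```
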

Python SDK: Query terms applied to a dataset
# Inlined from /metadata-ingestion/examples/library/dataset_query_terms.py
from datahub.sdk import DataHubClient, DatasetUrn

client = DataHubClient.from_env()

dataset = client.entities.get(
    DatasetUrn(platform="hive", name="realestate_db.sales", env="PROD")
)

print(dataset.terms)

Bulk Operations

YAML Ingestion: Create multiple terms from a Business Glossary file
# business_glossary.yml
version: "1"
source: MyOrganization
owners:
  users:
    - datahub
nodes:
  - name: Classification
    description: Data classification categories
    terms:
      - name: PII
        description: Personally Identifiable Information
      - name: Confidential
        description: Confidential business data
      - name: Public
        description: Publicly available data

  - name: Finance
    description: Financial domain terms
    terms:
      - name: Revenue
        description: Total income from business operations
      - name: Profit
        description: Financial gain after expenses
        related_terms:
          - Finance.Revenue

# Ingest using the DataHub CLI. The -c option expects an ingestion recipe rather than
# the glossary file itself; the recipe's datahub-business-glossary source references
# this file through its `file` setting (the recipe file name below is illustrative):
# datahub ingest -c business_glossary_recipe.yml

See the Business Glossary Source documentation for the full YAML format specification.
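
The `related_terms` entry in the file above refers to another term by its dotted path (`Finance.Revenue`), that is, the enclosing node names followed by the term name. The sketch below derives those dotted paths from a Python mirror of the `nodes` section. It assumes names without spaces or special characters and assumes path-based identifiers rather than auto-generated ones; `dotted_term_names` is a hypothetical helper, not part of the ingestion source.

```python
from typing import Any, Dict, List

# Python mirror of the `nodes` section of the YAML file (descriptions omitted).
GLOSSARY_NODES: List[Dict[str, Any]] = [
    {"name": "Classification", "terms": [{"name": "PII"}, {"name": "Confidential"}, {"name": "Public"}]},
    {"name": "Finance", "terms": [{"name": "Revenue"}, {"name": "Profit"}]},
]


def dotted_term_names(nodes: List[Dict[str, Any]], prefix: str = "") -> List[str]:
    """Return the dotted path of every term, walking nested nodes depth-first."""
    names: List[str] = []
    for node in nodes:
        path = f"{prefix}{node['name']}"
        names.extend(f"{path}.{term['name']}" for term in node.get("terms", []))
        # Nested term groups, if any, extend the path with their own name.
        names.extend(dotted_term_names(node.get("nodes", []), prefix=f"{path}."))
    return names


print(dotted_term_names(GLOSSARY_NODES))
```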

Integration Points

Relationship with GlossaryNode

GlossaryNodes (term groups) provide hierarchical organization for GlossaryTerms. Think of GlossaryNodes as folders and GlossaryTerms as files within those folders.

  • A GlossaryTerm can have at most one parent GlossaryNode (specified via parentNode in glossaryTermInfo)
  • GlossaryNodes can contain both GlossaryTerms and other GlossaryNodes (creating nested hierarchies)
  • Terms at the root level (no parent) appear at the top of the glossary

Application to Data Assets

GlossaryTerms can be applied to most entity types in DataHub through the glossaryTerms aspect:

Supported entities:

  • dataset, schemaField (dataset columns)
  • dashboard, chart
  • dataJob, dataFlow
  • mlModel, mlFeature, mlFeatureTable, mlPrimaryKey
  • notebook
  • container
  • dataProduct, application
  • erModelRelationship, businessAttribute

When you apply a term to an entity, DataHub creates:

  1. A glossaryTerms aspect on the target entity containing the term association
  2. A TermedWith relationship edge in the graph
  3. A searchable index entry allowing you to find all assets with that term

GraphQL API

The GraphQL API provides rich querying and mutation capabilities for GlossaryTerms:

Queries:

  • Fetch term details with related entities
  • Browse terms hierarchically
  • Search terms by name or definition
  • Get all entities tagged with a term

Mutations:

  • createGlossaryTerm: Create a new term
  • addTerms, addTerm: Apply terms to entities
  • removeTerm, batchRemoveTerms: Remove terms from entities
  • updateParentNode: Move a term to a different parent group

See the GraphQL API documentation for detailed examples.

Integration with Search and Discovery

GlossaryTerms enhance discoverability in multiple ways:

  1. Faceted Search: Users can filter search results by glossary terms
  2. Term Propagation: When a term is applied at the dataset level, it can be inherited by downstream assets
  3. Related Entities: The term's page shows all assets tagged with that term
  4. Autocomplete: Terms are suggested as users type in search or when tagging assets

Governance and Access Control

GlossaryTerms support fine-grained access control through DataHub's policy system:

  • Manage Direct Glossary Children: Permission to create/edit/delete terms directly under a specific term group
  • Manage All Glossary Children: Permission to manage any term within a term group's entire subtree
  • Standard entity policies (view, edit, delete) apply to individual terms

See the Business Glossary documentation for details on privileges.

Notable Exceptions

Term Name vs Display Name

The URN identifier (name in glossaryTermKey) is separate from the display name (name in glossaryTermInfo). Best practice:

  • URN name: Use a stable, unchanging identifier (e.g., "clv-001", "Classification.PII")
  • Display name: Use a human-friendly label that can be updated (e.g., "Customer Lifetime Value", "Personally Identifiable Information")

External Term Sources

When using terms from external standards (FIBO, ISO, industry glossaries):

  • Set termSource to "EXTERNAL"
  • Populate sourceRef with the standard name (e.g., "FIBO")
  • Include sourceUrl linking to the authoritative definition
  • Consider using the external standard's identifier as your URN name for consistency
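
As an illustration of the checklist above, the sketch below shows what the glossaryTermInfo fields for an externally sourced term might look like, along with a small completeness check. The term itself, its definition, and the `example.com` URL are placeholders, and `missing_external_fields` is a hypothetical helper rather than a DataHub validation rule.

```python
from typing import Dict, List

# Placeholder values for illustration only; substitute the real standard's details.
external_term_info: Dict[str, str] = {
    "name": "Legal Entity",
    "definition": "Definition text copied or summarized from the external standard.",
    "termSource": "EXTERNAL",
    "sourceRef": "FIBO",
    "sourceUrl": "https://example.com/path/to/authoritative-definition",
}


def missing_external_fields(term_info: Dict[str, str]) -> List[str]:
    """List the source fields an EXTERNAL term is missing (empty list when complete)."""
    if term_info.get("termSource") != "EXTERNAL":
        return []
    return [field for field in ("sourceRef", "sourceUrl") if not term_info.get(field)]


print(missing_external_fields(external_term_info))
```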

Term Relationships vs Hierarchy

Don't confuse:

  • Parent-child hierarchy (via parentNode → GlossaryNode): Organizational structure for browsing
  • Semantic relationships (via glossaryRelatedTerms): Meaning connections between concepts

A term can have a parentNode for organization (e.g., term "Email" under node "PersonalInformation") AND semantic relationships (e.g., "Email" IsA "PII", "Email" RelatedTo "Contact").

Schema Metadata on GlossaryTerm

GlossaryTerms support the schemaMetadata aspect, which is rarely used but can be helpful for defining structured attributes on terms themselves. This is an advanced feature for when terms need to carry typed properties beyond simple custom properties.

Deprecation Behavior

When a GlossaryTerm is deprecated (via the deprecation aspect):

  • The term remains in the system and its relationships are preserved
  • Assets tagged with the term retain those associations
  • The UI displays a deprecation warning
  • The term may be hidden from autocomplete and suggestions
  • Consider creating a new term and migrating assets rather than reusing deprecated term names

Technical Reference Guide

The sections above provide an overview of how to use this entity. The following sections provide detailed technical information about how metadata is stored and represented in DataHub.

Aspects are the individual pieces of metadata that can be attached to an entity. Each aspect contains specific information (like ownership, tags, or properties) and is stored as a separate record, allowing for flexible and incremental metadata updates.

Relationships show how this entity connects to other entities in the metadata graph. These connections are derived from the fields within each aspect and form the foundation of DataHub's knowledge graph.

Reading the Field Tables

Each aspect's field table includes an Annotations column that provides additional metadata about how fields are used:

  • ⚠️ Deprecated: This field is deprecated and may be removed in a future version. Check the description for the recommended alternative
  • Searchable: This field is indexed and can be searched in DataHub's search interface
  • Searchable (fieldname): When the field name in parentheses is shown, it indicates the field is indexed under a different name in the search index. For example, dashboardTool is indexed as tool
  • → RelationshipName: This field creates a relationship to another entity. The arrow indicates this field contains a reference (URN) to another entity, and the name indicates the type of relationship (e.g., → Contains, → OwnedBy)

Fields with complex types (like Edge, AuditStamp) link to their definitions in the Common Types section below.
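
For readers who process these tables programmatically, the annotation notation described above can be split into its parts with a few lines of string handling. This is a minimal sketch that assumes annotations are comma-separated exactly as shown in the tables on this page; the `Annotation` type and `parse_annotations` function are hypothetical names.

```python
from typing import List, NamedTuple, Optional


class Annotation(NamedTuple):
    searchable: bool
    search_field: Optional[str]  # set when indexed under a different name
    relationships: List[str]
    deprecated: bool


def parse_annotations(cell: str) -> Annotation:
    """Split one Annotations cell into its searchable, relationship, and deprecation parts."""
    searchable, search_field, deprecated = False, None, False
    relationships: List[str] = []
    for part in (piece.strip() for piece in cell.split(",")):
        if part.startswith("Searchable"):
            searchable = True
            if "(" in part and part.endswith(")"):
                search_field = part[part.index("(") + 1 : -1]
        elif part.startswith("→"):
            relationships.append(part[1:].strip())
        elif "Deprecated" in part:
            deprecated = True
    return Annotation(searchable, search_field, relationships, deprecated)


print(parse_annotations("Searchable, → IsPartOf"))
print(parse_annotations("Searchable (id)"))
```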

Aspects

glossaryTermKey

Key for a GlossaryTerm

Field | Type | Required | Description | Annotations
name | string | | The term name, which serves as a unique id | Searchable (id)

glossaryTermInfo

Properties associated with a GlossaryTerm

Field | Type | Required | Description | Annotations
customProperties | map | | Custom property bag. | Searchable
id | string | | Optional id for the term | Searchable
name | string | | Display name of the term | Searchable
definition | string | | Definition of business term. | Searchable
parentNode | string | | Parent node of the glossary term | Searchable, → IsPartOf
termSource | string | | Source of the Business Term (INTERNAL or EXTERNAL) with default value as INTERNAL | Searchable
sourceRef | string | | External Reference to the business-term | Searchable
sourceUrl | string | | The abstracted URL such as https://spec.edmcouncil.org/fibo/ontology/FBC/FinancialInstruments/Fin... |
rawSchema | string | | Schema definition of the glossary term | ⚠️ Deprecated

ownership

Ownership information of an entity.

Field | Type | Required | Description | Annotations
owners | Owner[] | | List of owners of the entity. |
ownerTypes | map | | Ownership type to Owners map, populated via mutation hook. | Searchable
lastModified | AuditStamp | | Audit stamp containing who last modified the record and when. A value of 0 in the time field indi... |

status

The lifecycle status metadata of an entity, e.g. dataset, metric, feature, etc. This aspect is used to represent soft deletes conventionally.

Field | Type | Required | Description | Annotations
removed | boolean | | Whether the entity has been removed (soft-deleted). | Searchable

browsePaths

Shared aspect containing Browse Paths to be indexed for an entity.

Field | Type | Required | Description | Annotations
paths | string[] | | A list of valid browse paths for the entity. Browse paths are expected to be forward slash-separ... | Searchable

glossaryRelatedTerms

Has A / Is A lineage information about a glossary Term reporting the lineage

Field | Type | Required | Description | Annotations
isRelatedTerms | string[] | | The relationship Is A with glossary term | Searchable, → IsA
hasRelatedTerms | string[] | | The relationship Has A with glossary term | Searchable, → HasA
values | string[] | | The relationship Has Value with glossary term. These are fixed value a term has. For example a Co... | Searchable, → HasValue
relatedTerms | string[] | | The relationship isRelatedTo with glossary term | Searchable, → IsRelatedTo

institutionalMemory

Institutional memory of an entity. This is a way to link to relevant documentation and provide description of the documentation. Institutional or tribal knowledge is very important for users to leverage the entity.

Field | Type | Required | Description | Annotations
elements | InstitutionalMemoryMetadata[] | | List of records that represent institutional memory of an entity. Each record consists of a link,... |

schemaMetadata

SchemaMetadata to describe metadata related to store schema

Field | Type | Required | Description | Annotations
schemaName | string | | Schema name e.g. PageViewEvent, identity.Profile, ams.account_management_tracking |
platform | string | | Standardized platform urn where schema is defined. The data platform Urn (urn:li:platform:{platfo... |
version | long | | Every change to SchemaMetadata in the resource results in a new version. Version is server assign... |
created | AuditStamp | | An AuditStamp corresponding to the creation of this resource/association/sub-resource. A value of... |
lastModified | AuditStamp | | An AuditStamp corresponding to the last modification of this resource/association/sub-resource. I... |
deleted | AuditStamp | | An AuditStamp corresponding to the deletion of this resource/association/sub-resource. Logically,... |
dataset | string | | Dataset this schema metadata is associated with. |
cluster | string | | The cluster this schema metadata resides from |
hash | string | | the SHA1 hash of the schema content |
platformSchema | union | | The native schema in the dataset's platform. |
fields | SchemaField[] | | Client provided a list of fields from document schema. |
primaryKeys | string[] | | Client provided list of fields that define primary keys to access record. Field order defines hie... |
foreignKeysSpecs | map | | Map captures all the references schema makes to external datasets. Map key is ForeignKeySpecName ... | ⚠️ Deprecated
foreignKeys | ForeignKeyConstraint[] | | List of foreign key constraints for the schema |

deprecation

Deprecation status of an entity

Field | Type | Required | Description | Annotations
deprecated | boolean | | Whether the entity is deprecated. | Searchable
decommissionTime | long | | The time user plan to decommission this entity. |
note | string | | Additional information about the entity deprecation plan, such as the wiki, doc, RB. |
actor | string | | The user URN which will be credited for modifying this deprecation content. |
replacement | string | | |

domains

Links from an Asset to its Domains

Field | Type | Required | Description | Annotations
domains | string[] | | The Domains attached to an Asset | Searchable, → AssociatedWith

applications

Links from an Asset to its Applications

Field | Type | Required | Description | Annotations
applications | string[] | | The Applications attached to an Asset | Searchable, → AssociatedWith

structuredProperties

Properties about an entity governed by StructuredPropertyDefinition

Field | Type | Required | Description | Annotations
properties | StructuredPropertyValueAssignment[] | | Custom property bag. |

forms

Forms that are assigned to this entity to be filled out

Field | Type | Required | Description | Annotations
incompleteForms | FormAssociation[] | | All incomplete forms assigned to the entity. | Searchable
completedForms | FormAssociation[] | | All complete forms assigned to the entity. | Searchable
verifications | FormVerificationAssociation[] | | Verifications that have been applied to the entity via completed forms. | Searchable

testResults

Information about a Test Result

Field | Type | Required | Description | Annotations
failing | TestResult[] | | Results that are failing | Searchable, → IsFailing
passing | TestResult[] | | Results that are passing | Searchable, → IsPassing

subTypes

Sub Types. Use this aspect to specialize a generic Entity e.g. Making a Dataset also be a View or also be a LookerExplore

Field | Type | Required | Description | Annotations
typeNames | string[] | | The names of the specific types. | Searchable

assetSettings

Settings associated with this asset

Field | Type | Required | Description | Annotations
assetSummary | AssetSummarySettings | | Information related to the asset summary for this asset |

Common Types

These types are used across multiple aspects in this entity.

AuditStamp

Data captured on a resource/association/sub-resource level giving insight into when that resource/association/sub-resource moved into a particular lifecycle stage, and who acted to move it into that specific lifecycle stage.

Fields:

  • time (long): When did the resource/association/sub-resource move into the specific lifecyc...
  • actor (string): The entity (e.g. a member URN) which will be credited for moving the resource...
  • impersonator (string?): The entity (e.g. a service URN) which performs the change on behalf of the Ac...
  • message (string?): Additional context around how DataHub was informed of the particular change. ...

FormAssociation

Properties of an applied form.

Fields:

  • urn (string): Urn of the applied form
  • incompletePrompts (FormPromptAssociation[]): A list of prompts that are not yet complete for this form.
  • completedPrompts (FormPromptAssociation[]): A list of prompts that have been completed for this form.

TestResult

Information about a Test Result

Fields:

  • test (string): The urn of the test
  • type (TestResultType): The type of the result
  • testDefinitionMd5 (string?): The md5 of the test definition that was used to compute this result. See Test...
  • lastComputed (AuditStamp?): The audit stamp of when the result was computed, including the actor who comp...

Relationships

Self

These are relationships from this entity to other entities of the same type, stored in this entity's aspects

  • IsA (via glossaryRelatedTerms.isRelatedTerms)
  • HasA (via glossaryRelatedTerms.hasRelatedTerms)
  • HasValue (via glossaryRelatedTerms.values)
  • IsRelatedTo (via glossaryRelatedTerms.relatedTerms)
  • SchemaFieldWithGlossaryTerm (via schemaMetadata.fields.glossaryTerms)
  • TermedWith (via schemaMetadata.fields.glossaryTerms.terms.urn)

Outgoing

These are the relationships stored in this entity's aspects

  • IsPartOf

    • GlossaryNode via glossaryTermInfo.parentNode
  • OwnedBy

    • Corpuser via ownership.owners.owner
    • CorpGroup via ownership.owners.owner
  • ownershipType

    • OwnershipType via ownership.owners.typeUrn
  • SchemaFieldTaggedWith

    • Tag via schemaMetadata.fields.globalTags
  • TaggedWith

    • Tag via schemaMetadata.fields.globalTags.tags
  • ForeignKeyTo

    • SchemaField via schemaMetadata.foreignKeys.foreignFields
  • ForeignKeyToDataset

    • Dataset via schemaMetadata.foreignKeys.foreignDataset
  • AssociatedWith

    • Domain via domains.domains
    • Application via applications.applications
  • IsFailing

    • Test via testResults.failing
  • IsPassing

    • Test via testResults.passing
  • HasSummaryTemplate

    • DataHubPageTemplate via assetSettings.assetSummary.templates

Incoming

These are the relationships stored in other entities' aspects

  • SchemaFieldWithGlossaryTerm

    • Dataset via schemaMetadata.fields.glossaryTerms
    • Chart via inputFields.fields.schemaField.glossaryTerms
    • Dashboard via inputFields.fields.schemaField.glossaryTerms
  • TermedWith

    • Dataset via schemaMetadata.fields.glossaryTerms.terms.urn
    • Dataset via editableSchemaMetadata.editableSchemaFieldInfo.glossaryTerms.terms.urn
    • Dataset via glossaryTerms.terms.urn
    • DataJob via glossaryTerms.terms.urn
    • DataFlow via glossaryTerms.terms.urn
    • Chart via glossaryTerms.terms.urn
    • Chart via inputFields.fields.schemaField.glossaryTerms.terms.urn
    • Dashboard via glossaryTerms.terms.urn
    • Dashboard via inputFields.fields.schemaField.glossaryTerms.terms.urn
    • Notebook via glossaryTerms.terms.urn
    • Container via glossaryTerms.terms.urn
  • EditableSchemaFieldWithGlossaryTerm

    • Dataset via editableSchemaMetadata.editableSchemaFieldInfo.glossaryTerms
