ER Model Relationship

Entity-Relationship (ER) Model Relationships represent the connections between entities in an entity-relationship diagram, specifically modeling how dataset fields relate to each other through foreign key constraints, joins, and other referential relationships. In DataHub, these relationships capture the semantic connections between tables, enabling users to understand data structure, enforce referential integrity, and trace data lineage at the field level.

ER Model Relationships are particularly valuable for documenting database schemas, data warehouse models, and any structured data system where understanding table relationships is critical for data governance, impact analysis, and query optimization.

Identity

ER Model Relationships are uniquely identified by a single identifier:

  • id: A unique string identifier for the relationship. When created programmatically, this is typically generated as an MD5 hash based on the relationship name and the two datasets involved (sorted alphabetically to ensure consistency).

The URN structure follows the pattern:

urn:li:erModelRelationship:<id>

Example URNs

urn:li:erModelRelationship:employee_to_company
urn:li:erModelRelationship:a1b2c3d4e5f6g7h8i9j0

ID Generation

When creating relationships through the UI or API, the ID is often generated deterministically using a hash function to ensure consistency:

  1. Create a JSON string with keys in alphabetical order: Destination, ERModelRelationName, Source
  2. Use the lower lexicographic dataset URN as "Destination" and the higher as "Source"
  3. Generate an MD5 hash of this JSON string

This ensures that the same relationship between two datasets always gets the same ID, regardless of creation order.
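
The three steps above can be sketched in Python as follows. This is a minimal sketch rather than DataHub's own implementation: the helper name generate_relationship_id is hypothetical, and the exact JSON serialization (separators, whitespace, encoding) is an assumption, so IDs produced here may not match those generated by the DataHub UI.

```python
import hashlib
import json


def generate_relationship_id(name: str, dataset_urn_a: str, dataset_urn_b: str) -> str:
    """Hypothetical helper: derive a deterministic ID from a name and two dataset URNs."""
    # Step 2: the lexicographically lower URN becomes "Destination", the higher "Source".
    lower, higher = sorted([dataset_urn_a, dataset_urn_b])
    payload = {
        "Destination": lower,
        "ERModelRelationName": name,
        "Source": higher,
    }
    # Step 1: serialize with keys in alphabetical order (compact separators are assumed).
    json_string = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    # Step 3: take the MD5 hex digest of the JSON string.
    return hashlib.md5(json_string.encode("utf-8")).hexdigest()


employee = "urn:li:dataset:(urn:li:dataPlatform:mysql,Employee,PROD)"
company = "urn:li:dataset:(urn:li:dataPlatform:mysql,Company,PROD)"

# Passing the datasets in either order yields the same ID.
print(generate_relationship_id("Employee to Company Relationship", employee, company))
print(generate_relationship_id("Employee to Company Relationship", company, employee))
```

Sorting the two URNs before hashing is what makes the result independent of argument order.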

Important Capabilities

Relationship Properties

ER Model Relationships capture essential metadata about how datasets connect to each other through the erModelRelationshipProperties aspect. This core aspect contains:

Core Attributes

  • name: A human-readable name for the relationship (e.g., "Employee to Company Relationship")
  • source: The URN of the source dataset (first entity in the relationship)
  • destination: The URN of the destination dataset (second entity in the relationship)
  • cardinality: Defines the relationship type between datasets

Cardinality Types

DataHub supports four cardinality types that describe how records in one dataset relate to records in another:

  • ONE_ONE: One-to-one relationship. Each record in the source dataset corresponds to exactly one record in the destination dataset.

    • Example: Employee → EmployeeDetails (one employee has one detail record)
  • ONE_N: One-to-many relationship. Each record in the source dataset can correspond to multiple records in the destination dataset.

    • Example: Department → Employee (one department has many employees)
  • N_ONE: Many-to-one relationship. Multiple records in the source dataset can correspond to one record in the destination dataset.

    • Example: Employee → Company (many employees belong to one company)
  • N_N: Many-to-many relationship. Records in both datasets can have multiple corresponding records in the other dataset.

    • Example: Student → Course (students take many courses, courses have many students)
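
Because cardinality is read from source to destination, swapping the two datasets turns ONE_N into N_ONE and vice versa, while ONE_ONE and N_N are unchanged. The sketch below illustrates this with a local Cardinality enum and a hypothetical flip_cardinality helper; neither is part of the DataHub SDK.

```python
from enum import Enum


class Cardinality(Enum):
    """Local stand-in for the four cardinality values described above."""

    ONE_ONE = "ONE_ONE"
    ONE_N = "ONE_N"
    N_ONE = "N_ONE"
    N_N = "N_N"


# Cardinality as seen when source and destination are swapped.
_FLIPPED = {
    Cardinality.ONE_ONE: Cardinality.ONE_ONE,
    Cardinality.ONE_N: Cardinality.N_ONE,
    Cardinality.N_ONE: Cardinality.ONE_N,
    Cardinality.N_N: Cardinality.N_N,
}


def flip_cardinality(cardinality: Cardinality) -> Cardinality:
    """Return the cardinality read from destination to source."""
    return _FLIPPED[cardinality]


# Department -> Employee is ONE_N, so Employee -> Department is N_ONE.
print(flip_cardinality(Cardinality.ONE_N).value)
```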

Field Mappings

The relationshipFieldMappings array defines which specific fields connect the two datasets. Each mapping contains:

  • sourceField: The field path in the source dataset (e.g., "company_id")
  • destinationField: The field path in the destination dataset (e.g., "id")

Multiple field mappings can be specified for composite keys where the relationship depends on multiple fields.
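
One way to read a list of field mappings is as the equality conditions of a join, with one condition per mapping. The sketch below uses a local FieldMapping dataclass as a stand-in for a relationshipFieldMappings entry; the join_condition helper and the order_id/line_number fields are hypothetical and only illustrate the composite-key case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldMapping:
    """Local stand-in for one entry of relationshipFieldMappings."""

    source_field: str
    destination_field: str


def join_condition(
    source_alias: str, destination_alias: str, mappings: list[FieldMapping]
) -> str:
    """Render the field mappings as a SQL-style join condition."""
    if not mappings:
        raise ValueError("at least one field mapping is required")
    # Each mapping contributes one equality; a composite key joins them with AND.
    return " AND ".join(
        f"{source_alias}.{m.source_field} = {destination_alias}.{m.destination_field}"
        for m in mappings
    )


# Single-field key, as in the Employee -> Company example.
print(join_condition("e", "c", [FieldMapping("company_id", "id")]))

# Composite key: the relationship holds only when both fields match.
print(
    join_condition(
        "s",
        "o",
        [FieldMapping("order_id", "order_id"), FieldMapping("line_number", "line_number")],
    )
)
```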

Custom Properties

Like other DataHub entities, ER Model Relationships support custom properties for storing additional metadata such as:

  • Constraint types (e.g., "Foreign Key", "Referential Integrity")
  • Index information
  • Database-specific metadata
  • Business rules or validation logic

Timestamps

Relationships include optional timestamp information to track when they were created and last modified in the source system:

  • created: AuditStamp with creation time and actor
  • lastModified: AuditStamp with last modification time and actor

Creating an ER Model Relationship

Here's a complete example showing how to create two datasets and establish a many-to-one relationship between them:

Python SDK: Create an ER Model Relationship
# metadata-ingestion/examples/library/ermodelrelationship_create_basic.py
import os
import time

from datahub.emitter.mce_builder import make_data_platform_urn, make_dataset_urn
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import (
    AuditStampClass,
    ERModelRelationshipCardinalityClass,
    ERModelRelationshipKeyClass,
    ERModelRelationshipPropertiesClass,
    NumberTypeClass,
    OtherSchemaClass,
    RelationshipFieldMappingClass,
    SchemaFieldClass,
    SchemaFieldDataTypeClass,
    SchemaMetadataClass,
    StringTypeClass,
)

GMS_ENDPOINT = os.getenv("DATAHUB_GMS_URL", "http://localhost:8080")
GMS_TOKEN = os.getenv("DATAHUB_GMS_TOKEN")
PLATFORM = "mysql"
ENV = "PROD"

emitter = DatahubRestEmitter(gms_server=GMS_ENDPOINT, token=GMS_TOKEN)


def create_dataset_with_schema(
    dataset_name: str, fields: list[SchemaFieldClass]
) -> str:
    """Helper function to create a dataset with schema."""
    dataset_urn = make_dataset_urn(PLATFORM, dataset_name, ENV)

    schema_metadata = SchemaMetadataClass(
        schemaName=dataset_name,
        platform=make_data_platform_urn(PLATFORM),
        fields=fields,
        version=0,
        hash="",
        platformSchema=OtherSchemaClass(rawSchema=""),
    )

    emitter.emit_mcp(
        MetadataChangeProposalWrapper(
            entityUrn=dataset_urn,
            aspect=schema_metadata,
        )
    )

    return dataset_urn


def create_schema_field(
    field_path: str, native_type: str, data_type: SchemaFieldDataTypeClass
) -> SchemaFieldClass:
    """Helper function to create a schema field."""
    return SchemaFieldClass(
        fieldPath=field_path,
        type=data_type,
        nativeDataType=native_type,
        description=f"Field: {field_path}",
        lastModified=AuditStampClass(
            time=int(time.time() * 1000),
            actor="urn:li:corpuser:datahub",
        ),
    )


# Create Employee table
employee_fields = [
    create_schema_field("id", "int", SchemaFieldDataTypeClass(type=NumberTypeClass())),
    create_schema_field(
        "name", "varchar(100)", SchemaFieldDataTypeClass(type=StringTypeClass())
    ),
    create_schema_field(
        "email", "varchar(255)", SchemaFieldDataTypeClass(type=StringTypeClass())
    ),
    create_schema_field(
        "company_id", "int", SchemaFieldDataTypeClass(type=NumberTypeClass())
    ),
]
employee_urn = create_dataset_with_schema("Employee", employee_fields)
print(f"Created Employee dataset: {employee_urn}")

# Create Company table
company_fields = [
    create_schema_field("id", "int", SchemaFieldDataTypeClass(type=NumberTypeClass())),
    create_schema_field(
        "name", "varchar(200)", SchemaFieldDataTypeClass(type=StringTypeClass())
    ),
    create_schema_field(
        "industry", "varchar(100)", SchemaFieldDataTypeClass(type=StringTypeClass())
    ),
]
company_urn = create_dataset_with_schema("Company", company_fields)
print(f"Created Company dataset: {company_urn}")

# Create ER Model Relationship
relationship_id = "employee_to_company"
relationship_urn = f"urn:li:erModelRelationship:{relationship_id}"

# Emit the key aspect
relationship_key = ERModelRelationshipKeyClass(id=relationship_id)
emitter.emit_mcp(
    MetadataChangeProposalWrapper(
        entityUrn=relationship_urn,
        aspect=relationship_key,
    )
)

# Emit the properties aspect
relationship_properties = ERModelRelationshipPropertiesClass(
    name="Employee to Company Relationship",
    source=employee_urn,
    destination=company_urn,
    relationshipFieldMappings=[
        RelationshipFieldMappingClass(
            sourceField="company_id",
            destinationField="id",
        )
    ],
    cardinality=ERModelRelationshipCardinalityClass.N_ONE,
    customProperties={
        "constraint_type": "FOREIGN_KEY",
        "on_delete": "CASCADE",
        "on_update": "CASCADE",
    },
    created=AuditStampClass(
        time=int(time.time() * 1000),
        actor="urn:li:corpuser:datahub",
    ),
)

emitter.emit_mcp(
    MetadataChangeProposalWrapper(
        entityUrn=relationship_urn,
        aspect=relationship_properties,
    )
)

print(f"Created ER Model Relationship: {relationship_urn}")
print(
    "This N:1 relationship connects Employee.company_id to Company.id, "
    "representing that many employees belong to one company."
)

Editable Properties

The editableERModelRelationshipProperties aspect allows users to add or modify relationship metadata through the DataHub UI without overwriting information ingested from source systems. This separation follows the same pattern used across DataHub entities.

Editable properties include:

  • description: Documentation explaining the relationship's purpose, constraints, or business logic
  • name: An alternative display name that overrides the source system name

Updating Editable Properties

Python SDK: Update editable relationship properties
# metadata-ingestion/examples/library/ermodelrelationship_update_properties.py
import time

from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import (
    AuditStampClass,
    EditableERModelRelationshipPropertiesClass,
)

GMS_ENDPOINT = "http://localhost:8080"
relationship_urn = "urn:li:erModelRelationship:employee_to_company"

emitter = DatahubRestEmitter(gms_server=GMS_ENDPOINT, extra_headers={})

# Create or update editable properties
audit_stamp = AuditStampClass(
    time=int(time.time() * 1000), actor="urn:li:corpuser:datahub"
)

editable_properties = EditableERModelRelationshipPropertiesClass(
    name="Employee-Company Foreign Key",
    description=(
        "This relationship establishes referential integrity between the Employee "
        "and Company tables. Each employee record must reference a valid company. "
        "The relationship enforces CASCADE on both UPDATE and DELETE operations, "
        "meaning changes to company IDs will propagate to employee records, and "
        "deleting a company will delete all associated employees."
    ),
    created=audit_stamp,
)

emitter.emit_mcp(
    MetadataChangeProposalWrapper(
        entityUrn=relationship_urn,
        aspect=editable_properties,
    )
)

print(f"Updated editable properties for ER Model Relationship {relationship_urn}")
print(f"Name: {editable_properties.name}")
print(f"Description: {editable_properties.description}")

Tags and Glossary Terms

ER Model Relationships support tagging and glossary term attachment just like other DataHub entities. This allows you to categorize relationships, mark them with data classification tags, or link them to business concepts.

Adding Tags

Tags can be used to classify relationships by type, importance, or data domain:

Python SDK: Add a tag to an ER Model Relationship
# metadata-ingestion/examples/library/ermodelrelationship_add_tag.py
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import (
    GlobalTagsClass,
    TagAssociationClass,
)

GMS_ENDPOINT = "http://localhost:8080"
relationship_urn = "urn:li:erModelRelationship:employee_to_company"
tag_urn = "urn:li:tag:ForeignKey"

emitter = DatahubRestEmitter(gms_server=GMS_ENDPOINT, extra_headers={})

# Read current tags
# FIXME: emitter.get not available
# gms_response = emitter.get(relationship_urn, aspects=["globalTags"])
# NOTE: because the read above is stubbed out, current_tags is always empty and
# the emit below replaces any tags already stored on the relationship.
current_tags: dict[str, object] = {}  # gms_response.get("globalTags", {}) if gms_response else {}

# Build new tags list
existing_tags = []
if isinstance(current_tags, dict) and "tags" in current_tags:
    tags_list = current_tags["tags"]
    if isinstance(tags_list, list):
        existing_tags = [tag["tag"] for tag in tags_list]

# Add new tag if not already present
if tag_urn not in existing_tags:
    tag_associations = [
        TagAssociationClass(tag=existing_tag) for existing_tag in existing_tags
    ]
    tag_associations.append(TagAssociationClass(tag=tag_urn))

    global_tags = GlobalTagsClass(tags=tag_associations)

    emitter.emit_mcp(
        MetadataChangeProposalWrapper(
            entityUrn=relationship_urn,
            aspect=global_tags,
        )
    )

    print(f"Added tag {tag_urn} to ER Model Relationship {relationship_urn}")
else:
    print(f"Tag {tag_urn} already exists on {relationship_urn}")

Adding Glossary Terms

Glossary terms connect relationships to business concepts and terminology:

Python SDK: Add a glossary term to an ER Model Relationship
# metadata-ingestion/examples/library/ermodelrelationship_add_term.py
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import (
    AuditStampClass,
    GlossaryTermAssociationClass,
    GlossaryTermsClass,
)

GMS_ENDPOINT = "http://localhost:8080"
relationship_urn = "urn:li:erModelRelationship:employee_to_company"
term_urn = "urn:li:glossaryTerm:ReferentialIntegrity"

emitter = DatahubRestEmitter(gms_server=GMS_ENDPOINT, extra_headers={})

# Read current glossary terms
# FIXME: emitter.get not available
# gms_response = emitter.get(relationship_urn, aspects=["glossaryTerms"])
# NOTE: because the read above is stubbed out, current_terms is always empty and
# the emit below replaces any glossary terms already stored on the relationship.
current_terms: dict[str, object] = {}  # gms_response.get("glossaryTerms", {}) if gms_response else {}

# Build new terms list
existing_terms = []
if isinstance(current_terms, dict) and "terms" in current_terms:
    terms_list = current_terms["terms"]
    if isinstance(terms_list, list):
        existing_terms = [term["urn"] for term in terms_list]

# Add new term if not already present
if term_urn not in existing_terms:
    term_associations = [
        GlossaryTermAssociationClass(urn=existing_term)
        for existing_term in existing_terms
    ]
    term_associations.append(GlossaryTermAssociationClass(urn=term_urn))

    glossary_terms = GlossaryTermsClass(
        terms=term_associations,
        auditStamp=AuditStampClass(time=0, actor="urn:li:corpuser:datahub"),
    )

    emitter.emit_mcp(
        MetadataChangeProposalWrapper(
            entityUrn=relationship_urn,
            aspect=glossary_terms,
        )
    )

    print(f"Added glossary term {term_urn} to ER Model Relationship {relationship_urn}")
else:
    print(f"Glossary term {term_urn} already exists on {relationship_urn}")

Ownership

Ownership can be assigned to ER Model Relationships to indicate who is responsible for maintaining the relationship definition or who should be consulted about changes to the connected datasets.

Python SDK: Add an owner to an ER Model Relationship
# metadata-ingestion/examples/library/ermodelrelationship_add_owner.py
import time

from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import (
    AuditStampClass,
    OwnerClass,
    OwnershipClass,
    OwnershipTypeClass,
)

GMS_ENDPOINT = "http://localhost:8080"
relationship_urn = "urn:li:erModelRelationship:employee_to_company"
owner_urn = "urn:li:corpuser:jdoe"

emitter = DatahubRestEmitter(gms_server=GMS_ENDPOINT, extra_headers={})

# Read current ownership
# FIXME: emitter.get not available
# gms_response = emitter.get(relationship_urn, aspects=["ownership"])
# NOTE: because the read above is stubbed out, current_ownership is always empty
# and the emit below replaces any owners already stored on the relationship.
current_ownership: dict[str, object] = {}  # gms_response.get("ownership", {}) if gms_response else {}

# Build new owners list
existing_owners = []
if isinstance(current_ownership, dict) and "owners" in current_ownership:
    owners_list = current_ownership["owners"]
    if isinstance(owners_list, list):
        existing_owners = [owner["owner"] for owner in owners_list]

# Add new owner if not already present
if owner_urn not in existing_owners:
    owner_list = [
        OwnerClass(owner=existing_owner, type=OwnershipTypeClass.DATAOWNER)
        for existing_owner in existing_owners
    ]
    owner_list.append(
        OwnerClass(
            owner=owner_urn,
            type=OwnershipTypeClass.DATAOWNER,
        )
    )

    ownership = OwnershipClass(
        owners=owner_list,
        lastModified=AuditStampClass(
            time=int(time.time() * 1000),
            actor="urn:li:corpuser:datahub",
        ),
    )

    emitter.emit_mcp(
        MetadataChangeProposalWrapper(
            entityUrn=relationship_urn,
            aspect=ownership,
        )
    )

    print(f"Added owner {owner_urn} to ER Model Relationship {relationship_urn}")
else:
    print(f"Owner {owner_urn} already exists on {relationship_urn}")

Complex Relationships

ER Model Relationships can model sophisticated data structures including composite keys and many-to-many relationships through junction tables:

Python SDK: Create a many-to-many relationship with composite keys
# metadata-ingestion/examples/library/ermodelrelationship_complex_many_to_many.py
import time

from datahub.emitter.mce_builder import make_data_platform_urn, make_dataset_urn
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import (
    AuditStampClass,
    ERModelRelationshipCardinalityClass,
    ERModelRelationshipKeyClass,
    ERModelRelationshipPropertiesClass,
    NumberTypeClass,
    OtherSchemaClass,
    RelationshipFieldMappingClass,
    SchemaFieldClass,
    SchemaFieldDataTypeClass,
    SchemaMetadataClass,
    StringTypeClass,
)

GMS_ENDPOINT = "http://localhost:8080"
PLATFORM = "postgres"
ENV = "PROD"

emitter = DatahubRestEmitter(gms_server=GMS_ENDPOINT, extra_headers={})


def create_dataset_with_schema(
    dataset_name: str, fields: list[SchemaFieldClass]
) -> str:
    """Helper function to create a dataset with schema."""
    dataset_urn = make_dataset_urn(PLATFORM, dataset_name, ENV)

    schema_metadata = SchemaMetadataClass(
        schemaName=dataset_name,
        platform=make_data_platform_urn(PLATFORM),
        fields=fields,
        version=0,
        hash="",
        platformSchema=OtherSchemaClass(rawSchema=""),
    )

    emitter.emit_mcp(
        MetadataChangeProposalWrapper(
            entityUrn=dataset_urn,
            aspect=schema_metadata,
        )
    )

    return dataset_urn


def create_schema_field(
    field_path: str, native_type: str, data_type: SchemaFieldDataTypeClass
) -> SchemaFieldClass:
    """Helper function to create a schema field."""
    return SchemaFieldClass(
        fieldPath=field_path,
        type=data_type,
        nativeDataType=native_type,
        description=f"Field: {field_path}",
        lastModified=AuditStampClass(
            time=int(time.time() * 1000),
            actor="urn:li:corpuser:datahub",
        ),
    )


# Create Student table
student_fields = [
    create_schema_field("id", "int", SchemaFieldDataTypeClass(type=NumberTypeClass())),
    create_schema_field(
        "name", "varchar(100)", SchemaFieldDataTypeClass(type=StringTypeClass())
    ),
    create_schema_field(
        "email", "varchar(255)", SchemaFieldDataTypeClass(type=StringTypeClass())
    ),
]
student_urn = create_dataset_with_schema("Student", student_fields)
print(f"Created Student dataset: {student_urn}")

# Create Course table
course_fields = [
    create_schema_field("id", "int", SchemaFieldDataTypeClass(type=NumberTypeClass())),
    create_schema_field(
        "code", "varchar(20)", SchemaFieldDataTypeClass(type=StringTypeClass())
    ),
    create_schema_field(
        "title", "varchar(200)", SchemaFieldDataTypeClass(type=StringTypeClass())
    ),
]
course_urn = create_dataset_with_schema("Course", course_fields)
print(f"Created Course dataset: {course_urn}")

# Create StudentCourse junction table with composite key
student_course_fields = [
    create_schema_field(
        "student_id", "int", SchemaFieldDataTypeClass(type=NumberTypeClass())
    ),
    create_schema_field(
        "course_id", "int", SchemaFieldDataTypeClass(type=NumberTypeClass())
    ),
    create_schema_field(
        "enrollment_date", "date", SchemaFieldDataTypeClass(type=StringTypeClass())
    ),
    create_schema_field(
        "grade", "varchar(2)", SchemaFieldDataTypeClass(type=StringTypeClass())
    ),
]
student_course_urn = create_dataset_with_schema("StudentCourse", student_course_fields)
print(f"Created StudentCourse junction table: {student_course_urn}")

# Create relationship: StudentCourse -> Student (many-to-one)
student_relationship_id = "student_course_to_student"
student_relationship_urn = f"urn:li:erModelRelationship:{student_relationship_id}"

student_relationship_key = ERModelRelationshipKeyClass(id=student_relationship_id)
emitter.emit_mcp(
    MetadataChangeProposalWrapper(
        entityUrn=student_relationship_urn,
        aspect=student_relationship_key,
    )
)

student_relationship_properties = ERModelRelationshipPropertiesClass(
    name="StudentCourse to Student Relationship",
    source=student_course_urn,
    destination=student_urn,
    relationshipFieldMappings=[
        RelationshipFieldMappingClass(
            sourceField="student_id",
            destinationField="id",
        )
    ],
    cardinality=ERModelRelationshipCardinalityClass.N_ONE,
    customProperties={
        "constraint_type": "FOREIGN_KEY",
        "part_of_composite_key": "true",
    },
    created=AuditStampClass(
        time=int(time.time() * 1000),
        actor="urn:li:corpuser:datahub",
    ),
)

emitter.emit_mcp(
    MetadataChangeProposalWrapper(
        entityUrn=student_relationship_urn,
        aspect=student_relationship_properties,
    )
)

print(f"Created relationship: {student_relationship_urn}")

# Create relationship: StudentCourse -> Course (many-to-one)
course_relationship_id = "student_course_to_course"
course_relationship_urn = f"urn:li:erModelRelationship:{course_relationship_id}"

course_relationship_key = ERModelRelationshipKeyClass(id=course_relationship_id)
emitter.emit_mcp(
    MetadataChangeProposalWrapper(
        entityUrn=course_relationship_urn,
        aspect=course_relationship_key,
    )
)

course_relationship_properties = ERModelRelationshipPropertiesClass(
    name="StudentCourse to Course Relationship",
    source=student_course_urn,
    destination=course_urn,
    relationshipFieldMappings=[
        RelationshipFieldMappingClass(
            sourceField="course_id",
            destinationField="id",
        )
    ],
    cardinality=ERModelRelationshipCardinalityClass.N_ONE,
    customProperties={
        "constraint_type": "FOREIGN_KEY",
        "part_of_composite_key": "true",
    },
    created=AuditStampClass(
        time=int(time.time() * 1000),
        actor="urn:li:corpuser:datahub",
    ),
)

emitter.emit_mcp(
    MetadataChangeProposalWrapper(
        entityUrn=course_relationship_urn,
        aspect=course_relationship_properties,
    )
)

print(f"Created relationship: {course_relationship_urn}")

print("\nMany-to-many relationship established through junction table:")
print("- Student N:N Course (via StudentCourse junction table)")
print("- StudentCourse has composite primary key (student_id, course_id)")
print("- Each component of the composite key is a foreign key to its respective table")

Querying Relationships

ER Model Relationships can be queried using the standard DataHub REST API:

Fetch an ER Model Relationship
curl 'http://localhost:8080/entities/urn%3Ali%3AerModelRelationship%3Aemployee_to_company'

The response includes all aspects of the relationship:

{
  "urn": "urn:li:erModelRelationship:employee_to_company",
  "aspects": {
    "erModelRelationshipKey": {
      "id": "employee_to_company"
    },
    "erModelRelationshipProperties": {
      "name": "Employee to Company Relationship",
      "source": "urn:li:dataset:(urn:li:dataPlatform:mysql,Employee,PROD)",
      "destination": "urn:li:dataset:(urn:li:dataPlatform:mysql,Company,PROD)",
      "relationshipFieldMappings": [
        {
          "sourceField": "company_id",
          "destinationField": "id"
        }
      ],
      "cardinality": "N_ONE",
      "customProperties": {
        "constraint": "Foreign Key"
      }
    }
  }
}

Find all relationships for a dataset

You can discover relationships connected to a specific dataset by querying the relationships API:

# The ermodelrelationA/ermodelrelationB edges point from the ER Model Relationship
# to the dataset, so both queries look at INCOMING edges on the dataset URN.

# Find relationships where the dataset is the source
curl 'http://localhost:8080/relationships?direction=INCOMING&urn=urn%3Ali%3Adataset%3A(urn%3Ali%3AdataPlatform%3Amysql,Employee,PROD)&types=ermodelrelationA'

# Find relationships where the dataset is the destination
curl 'http://localhost:8080/relationships?direction=INCOMING&urn=urn%3Ali%3Adataset%3A(urn%3Ali%3AdataPlatform%3Amysql,Company,PROD)&types=ermodelrelationB'
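
The URNs in these requests must be percent-encoded before they are placed in a URL. As a minimal sketch, the helpers below build the same kinds of URLs with Python's standard library. The helper names entity_url and relationships_url are hypothetical, and the endpoint paths and query parameters are simply those shown in the curl examples above.

```python
from urllib.parse import quote, urlencode

GMS_ENDPOINT = "http://localhost:8080"  # same local endpoint as the curl examples


def entity_url(urn: str) -> str:
    """Build the entity lookup URL, percent-encoding the whole URN."""
    return f"{GMS_ENDPOINT}/entities/{quote(urn, safe='')}"


def relationships_url(urn: str, direction: str, relationship_type: str) -> str:
    """Build the relationships query URL; urlencode escapes each parameter value."""
    query = urlencode(
        {"direction": direction, "urn": urn, "types": relationship_type}
    )
    return f"{GMS_ENDPOINT}/relationships?{query}"


print(entity_url("urn:li:erModelRelationship:employee_to_company"))
print(
    relationships_url(
        "urn:li:dataset:(urn:li:dataPlatform:mysql,Company,PROD)",
        "INCOMING",
        "ermodelrelationB",
    )
)
```

Unlike the hand-written curl commands, urlencode also escapes the parentheses and commas inside the dataset URN, which is equally valid in a query string.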

Integration Points

ER Model Relationships integrate with several other DataHub entities and features:

Dataset Integration

ER Model Relationships are fundamentally connected to Dataset entities. Each relationship references exactly two datasets, and DataHub surfaces it alongside them:

  • Relationships are discoverable from dataset pages in the UI
  • The GraphQL API automatically resolves source and destination dataset details
  • Relationship information enriches dataset schema views

Schema Field Integration

While the entity stores field paths as strings, these correspond to SchemaField entities within the referenced datasets. This enables:

  • Visual representation of foreign key relationships in the UI
  • Field-level lineage analysis
  • Impact analysis when schema changes occur

Data Lineage

ER Model Relationships complement but are distinct from DataHub's lineage features:

  • ER Model Relationships: Model the static structure and referential constraints between datasets
  • Upstream/Downstream Lineage: Captures how data flows through transformations and pipelines

Together, these features provide a complete picture of both data structure and data flow.

GraphQL API

The DataHub GraphQL API provides rich querying capabilities for ER Model Relationships:

  • erModelRelationship(urn: String!): Fetch a specific relationship
  • Create and update relationships through mutations
  • Traverse from datasets to their relationships
  • Bulk query capabilities for building ER diagrams

Authorization

Creating and modifying ER Model Relationships requires appropriate permissions in DataHub's policy framework. Users must have edit permissions on both the source and destination datasets to create a relationship between them.

Notable Exceptions

Non-directional Relationships

While ER Model Relationships have "source" and "destination" fields, these do not necessarily imply directionality in the traditional sense of foreign keys:

  • The source/destination ordering is primarily for internal consistency
  • When generating IDs, datasets are ordered alphabetically to ensure the same relationship always produces the same ID
  • Cardinality types (ONE_N vs N_ONE) explicitly capture the actual relationship direction

Relationship Lifecycle

ER Model Relationships are currently separate from the datasets they connect:

  • Deleting a dataset does not automatically delete its relationships
  • Orphaned relationships (pointing to non-existent datasets) may exist after dataset deletion
  • Applications should handle cases where relationship endpoints may not exist

Schema Evolution

ER Model Relationships reference field paths as strings, not versioned schema references:

  • If field names change in a dataset schema, the relationship may reference outdated field names
  • No automatic validation ensures that referenced fields exist in current schemas
  • Applications should implement field validation when creating relationships
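
Since DataHub does not perform this check itself, an application can compare the mapped field paths against the current schema fields before emitting a relationship. The sketch below is one possible approach using plain Python collections. The find_missing_fields helper is hypothetical, and it assumes the caller has already loaded each dataset's field paths.

```python
def find_missing_fields(
    mappings: list[tuple[str, str]],
    source_fields: set[str],
    destination_fields: set[str],
) -> list[str]:
    """Return a message for every mapped field path absent from its dataset schema."""
    problems = []
    for source_field, destination_field in mappings:
        if source_field not in source_fields:
            problems.append(f"source field not found: {source_field}")
        if destination_field not in destination_fields:
            problems.append(f"destination field not found: {destination_field}")
    return problems


# Field paths from the Employee and Company examples earlier on this page.
employee_fields = {"id", "name", "email", "company_id"}
company_fields = {"id", "name", "industry"}

# A valid mapping produces no problems; a renamed or misspelled field is reported.
print(find_missing_fields([("company_id", "id")], employee_fields, company_fields))
print(find_missing_fields([("org_id", "id")], employee_fields, company_fields))
```

The same check can be re-run after a schema change to detect relationships that now reference outdated field names.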

Platform Support

Not all data platforms have first-class support for ER Model Relationships:

  • Relational databases (MySQL, PostgreSQL, Oracle) naturally map to this model
  • NoSQL databases and data lakes may not have explicit relationship metadata
  • Some ingestion connectors automatically extract foreign key relationships, others do not

Future Considerations

The ER Model Relationship entity may evolve to include:

  • Additional relationship types beyond cardinality (inheritance, composition)
  • Versioning to track relationship changes over time
  • Bidirectional field mappings for complex transformation logic
  • Integration with data quality rules and constraint validation

Technical Reference Guide

The sections above provide an overview of how to use this entity. The following sections provide detailed technical information about how metadata is stored and represented in DataHub.

Aspects are the individual pieces of metadata that can be attached to an entity. Each aspect contains specific information (like ownership, tags, or properties) and is stored as a separate record, allowing for flexible and incremental metadata updates.

Relationships show how this entity connects to other entities in the metadata graph. These connections are derived from the fields within each aspect and form the foundation of DataHub's knowledge graph.

Reading the Field Tables

Each aspect's field table includes an Annotations column that provides additional metadata about how fields are used:

  • ⚠️ Deprecated: This field is deprecated and may be removed in a future version. Check the description for the recommended alternative
  • Searchable: This field is indexed and can be searched in DataHub's search interface
  • Searchable (fieldname): When the field name in parentheses is shown, it indicates the field is indexed under a different name in the search index. For example, dashboardTool is indexed as tool
  • → RelationshipName: This field creates a relationship to another entity. The arrow indicates this field contains a reference (URN) to another entity, and the name indicates the type of relationship (e.g., → Contains, → OwnedBy)

Fields with complex types (like Edge, AuditStamp) link to their definitions in the Common Types section below.

Aspects

erModelRelationshipProperties

Properties associated with an ERModelRelationship

| Field | Type | Required | Description | Annotations |
|---|---|---|---|---|
| customProperties | map | | Custom property bag. | Searchable |
| name | string | | Name of the ERModelRelation | Searchable |
| source | string | | First dataset in the erModelRelationship (no directionality) | Searchable, → ermodelrelationA |
| destination | string | | Second dataset in the erModelRelationship (no directionality) | Searchable, → ermodelrelationB |
| relationshipFieldMappings | RelationshipFieldMapping[] | | ERModelRelationFieldMapping (in future we can make it an array) | |
| created | AuditStamp | | A timestamp documenting when the asset was created in the source Data Platform (not on DataHub) | Searchable |
| lastModified | AuditStamp | | A timestamp documenting when the asset was last modified in the source Data Platform (not on Data... | Searchable |
| cardinality | ERModelRelationshipCardinality | | Cardinality of the relationship | |

editableERModelRelationshipProperties

EditableERModelRelationProperties stores editable changes made to erModelRelationship properties. This separates changes made from ingestion pipelines and edits in the UI to avoid accidental overwrites of user-provided data by ingestion pipelines

| Field | Type | Required | Description | Annotations |
|---|---|---|---|---|
| created | AuditStamp | | An AuditStamp corresponding to the creation of this resource/association/sub-resource. A value of... | |
| lastModified | AuditStamp | | An AuditStamp corresponding to the last modification of this resource/association/sub-resource. I... | |
| deleted | AuditStamp | | An AuditStamp corresponding to the deletion of this resource/association/sub-resource. Logically,... | |
| description | string | | Documentation of the erModelRelationship | Searchable (editedDescription) |
| name | string | | Display name of the ERModelRelation | Searchable (editedName) |

institutionalMemory

Institutional memory of an entity. This is a way to link to relevant documentation and provide description of the documentation. Institutional or tribal knowledge is very important for users to leverage the entity.

| Field | Type | Required | Description | Annotations |
|---|---|---|---|---|
| elements | InstitutionalMemoryMetadata[] | | List of records that represent institutional memory of an entity. Each record consists of a link,... | |

ownership

Ownership information of an entity.

| Field | Type | Required | Description | Annotations |
|---|---|---|---|---|
| owners | Owner[] | | List of owners of the entity. | |
| ownerTypes | map | | Ownership type to Owners map, populated via mutation hook. | Searchable |
| lastModified | AuditStamp | | Audit stamp containing who last modified the record and when. A value of 0 in the time field indi... | |

status

The lifecycle status metadata of an entity, e.g. dataset, metric, feature, etc. This aspect is used to represent soft deletes conventionally.

| Field | Type | Required | Description | Annotations |
|---|---|---|---|---|
| removed | boolean | | Whether the entity has been removed (soft-deleted). | Searchable |

globalTags

Tag aspect used for applying tags to an entity

| Field | Type | Required | Description | Annotations |
|---|---|---|---|---|
| tags | TagAssociation[] | | Tags associated with a given entity | Searchable, → TaggedWith |

glossaryTerms

Related business terms information

| Field | Type | Required | Description | Annotations |
|---|---|---|---|---|
| terms | GlossaryTermAssociation[] | | The related business terms | |
| auditStamp | AuditStamp | | Audit stamp containing who reported the related business term | |

Common Types

These types are used across multiple aspects in this entity.

AuditStamp

Data captured on a resource/association/sub-resource level giving insight into when that resource/association/sub-resource moved into a particular lifecycle stage, and who acted to move it into that specific lifecycle stage.

Fields:

  • time (long): When did the resource/association/sub-resource move into the specific lifecyc...
  • actor (string): The entity (e.g. a member URN) which will be credited for moving the resource...
  • impersonator (string?): The entity (e.g. a service URN) which performs the change on behalf of the Ac...
  • message (string?): Additional context around how DataHub was informed of the particular change. ...

Relationships

Outgoing

These are the relationships stored in this entity's aspects

  • ermodelrelationA

    • Dataset via erModelRelationshipProperties.source
  • ermodelrelationB

    • Dataset via erModelRelationshipProperties.destination
  • OwnedBy

    • Corpuser via ownership.owners.owner
    • CorpGroup via ownership.owners.owner
  • ownershipType

    • OwnershipType via ownership.owners.typeUrn
  • TaggedWith

    • Tag via globalTags.tags
  • TermedWith

    • GlossaryTerm via glossaryTerms.terms.urn
