
Domain

Domains are curated, top-level categories for organizing data assets within an organization. They represent logical groupings that typically align with business units, departments, or functional areas. Unlike tags, which are informal labels, Domains provide a structured way to organize assets under either centralized or distributed management. A data asset can belong to only one Domain at a time.

Identity

Domains are identified by a single piece of information:

  • A unique domain id: This is a string identifier that uniquely identifies the domain within DataHub. The id can be either auto-generated by DataHub or manually specified during domain creation. When creating a domain via the UI or API without specifying an id, DataHub will auto-generate a UUID-based identifier. For programmatic access or when human-readable identifiers are desired, you can specify a custom id like "marketing", "engineering", or "finance".

An example of a domain identifier is urn:li:domain:marketing.

For auto-generated domains, the URN might look like urn:li:domain:6289fccc-4af2-4cbb-96ed-051e7d1de93c.
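The SDK examples on this page build URNs with make_domain_urn. To make the identity rules concrete without a DataHub installation, here is a stdlib-only sketch. build_domain_urn and domain_id_from_urn are hypothetical helpers, and using uuid4 for the generated id is an assumption standing in for however DataHub actually generates it.

```python
import uuid
from typing import Optional

DOMAIN_URN_PREFIX = "urn:li:domain:"


def build_domain_urn(domain_id: Optional[str] = None) -> str:
    """Hypothetical helper: turn a domain id into a domain URN.

    A custom id such as "marketing" is used verbatim. With no id, a random
    UUID4 string stands in for an auto-generated identifier (assumption).
    """
    if domain_id is None:
        domain_id = str(uuid.uuid4())
    return DOMAIN_URN_PREFIX + domain_id


def domain_id_from_urn(urn: str) -> str:
    """Hypothetical helper: recover the id portion of a domain URN."""
    if not urn.startswith(DOMAIN_URN_PREFIX):
        raise ValueError(f"not a domain URN: {urn}")
    return urn[len(DOMAIN_URN_PREFIX):]


print(build_domain_urn("marketing"))  # urn:li:domain:marketing
print(build_domain_urn())  # urn:li:domain:<random UUID>
```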

Important Capabilities

Domain Properties

Domain properties are stored in the domainProperties aspect and contain the core metadata about a domain:

  • Name: The display name of the domain (e.g., "Marketing", "Platform Engineering")
  • Description: An optional detailed description of what the domain represents
  • Parent Domain: Domains can be hierarchical, with child domains nested under parent domains. This allows for organizational structures like "Engineering" > "Data Engineering" > "Data Platform"
  • Created Timestamp: Audit information about when the domain was created

Here is an example of creating a domain with properties:

Python SDK: Create a domain
# Inlined from /metadata-ingestion/examples/library/domain_create.py
import logging
import os

from datahub.emitter.mce_builder import make_domain_urn
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import DomainPropertiesClass

log = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)

# Get DataHub connection details from environment
gms_server = os.getenv("DATAHUB_GMS_URL", "http://localhost:8080")
token = os.getenv("DATAHUB_GMS_TOKEN")

domain_urn = make_domain_urn("marketing")
domain_properties_aspect = DomainPropertiesClass(
    name="Marketing", description="Entities related to the marketing department"
)

event: MetadataChangeProposalWrapper = MetadataChangeProposalWrapper(
    entityUrn=domain_urn,
    aspect=domain_properties_aspect,
)

rest_emitter = DatahubRestEmitter(gms_server=gms_server, token=token)
rest_emitter.emit(event)
log.info(f"Created domain {domain_urn}")

Nested Domain Hierarchies

Domains support hierarchical organization through parent-child relationships. This enables representing organizational structures with multiple levels. For example, you might have a top-level "Engineering" domain with child domains for "Data Engineering", "ML Engineering", and "Infrastructure Engineering".

Python SDK: Create a nested domain
# Inlined from /metadata-ingestion/examples/library/domain_create_nested.py
import logging
import os

from datahub.emitter.mce_builder import make_domain_urn
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import DomainPropertiesClass

log = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)

# The child domain needs its own id; reusing "marketing" would overwrite the parent
domain_urn = make_domain_urn("verticals")
domain_properties_aspect = DomainPropertiesClass(
    name="Verticals",
    description="Entities related to the verticals sub-domain",
    # Point the child at the parent domain's URN
    parentDomain=make_domain_urn("marketing"),
)

event: MetadataChangeProposalWrapper = MetadataChangeProposalWrapper(
    entityUrn=domain_urn,
    aspect=domain_properties_aspect,
)

# Get DataHub connection details from environment
gms_server = os.getenv("DATAHUB_GMS_URL", "http://localhost:8080")
token = os.getenv("DATAHUB_GMS_TOKEN")

rest_emitter = DatahubRestEmitter(gms_server=gms_server, token=token)
rest_emitter.emit(event)
log.info(f"Created domain {domain_urn}")
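Walking parentDomain links upward yields the ancestry path of a nested domain, such as Data Platform > Data Engineering > Engineering. The stdlib sketch below illustrates this with a hypothetical in-memory map from domain URN to parent URN; parent_chain is not a DataHub API. It also rejects cycles, such as a domain that names itself as its own parent.

```python
from typing import Dict, List, Optional


def parent_chain(domain_urn: str, parents: Dict[str, Optional[str]]) -> List[str]:
    """Return the ancestors of domain_urn, nearest parent first.

    `parents` is a hypothetical map from a domain URN to the value of its
    domainProperties.parentDomain field (None for top-level domains).
    """
    chain: List[str] = []
    seen = {domain_urn}
    current = parents.get(domain_urn)
    while current is not None:
        if current in seen:
            # A domain that is its own ancestor would make this loop forever.
            raise ValueError(f"cycle in domain hierarchy at {current}")
        chain.append(current)
        seen.add(current)
        current = parents.get(current)
    return chain


parents = {
    "urn:li:domain:engineering": None,
    "urn:li:domain:data-engineering": "urn:li:domain:engineering",
    "urn:li:domain:data-platform": "urn:li:domain:data-engineering",
}
print(parent_chain("urn:li:domain:data-platform", parents))
# ['urn:li:domain:data-engineering', 'urn:li:domain:engineering']
```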

Ownership

Like other entities in DataHub, domains can have owners assigned to them using the ownership aspect. Domain owners are typically responsible for:

  • Managing which assets belong to the domain
  • Maintaining domain metadata and documentation
  • Governing data quality standards within the domain
  • Serving as points of contact for domain-related questions

Ownership types for domains follow the same patterns as other entities, including TECHNICAL_OWNER, BUSINESS_OWNER, DATA_STEWARD, etc.

Python SDK: Add an owner to a domain
# Inlined from /metadata-ingestion/examples/library/domain_add_owner.py
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.ingestion.graph.client import DatahubClientConfig, DataHubGraph
from datahub.metadata.schema_classes import (
    OwnerClass,
    OwnershipClass,
    OwnershipTypeClass,
)
from datahub.metadata.urns import CorpUserUrn, DomainUrn

graph = DataHubGraph(DatahubClientConfig(server="http://localhost:8080"))
emitter = DatahubRestEmitter(gms_server="http://localhost:8080")

domain_urn = DomainUrn(id="marketing")

# Get existing ownership
existing_ownership = graph.get_aspect(str(domain_urn), OwnershipClass)
owner_list = (
    list(existing_ownership.owners)
    if existing_ownership and existing_ownership.owners
    else []
)

# Add new owner with the TECHNICAL_OWNER type
owner_list.append(
    OwnerClass(owner=str(CorpUserUrn("jdoe")), type=OwnershipTypeClass.TECHNICAL_OWNER)
)

# Emit ownership
emitter.emit_mcp(
    MetadataChangeProposalWrapper(
        entityUrn=str(domain_urn), aspect=OwnershipClass(owners=owner_list)
    )
)
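The ownership example reads the current aspect, appends to it, and writes the whole list back, so running it twice can leave jdoe listed twice. A minimal way to make that read-modify-write idempotent is to skip an owner who is already present with the same type. The sketch models owners as plain (urn, type) tuples, a simplification assumed for illustration; with the SDK the equivalent check would compare the owner and type fields of each existing OwnerClass before appending.

```python
from typing import List, Tuple

# Hypothetical stand-in for OwnerClass: (owner URN, ownership type).
Owner = Tuple[str, str]


def merge_owner(owners: List[Owner], new_owner: Owner) -> List[Owner]:
    """Return a copy of owners that contains new_owner exactly once."""
    if new_owner in owners:
        return list(owners)  # already present: re-running changes nothing
    return [*owners, new_owner]


existing = [("urn:li:corpuser:alice", "BUSINESS_OWNER")]
jdoe = ("urn:li:corpuser:jdoe", "TECHNICAL_OWNER")
once = merge_owner(existing, jdoe)
twice = merge_owner(once, jdoe)
print(twice)
```

Because the (urn, type) pair is treated as the identity here, the same user can still be added again under a different ownership type.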

Documentation

Domains support documentation through the institutionalMemory aspect, which allows linking to external resources such as:

  • Confluence pages describing the domain's purpose and scope
  • Documentation about data governance policies
  • Team wikis or handbooks
  • Onboarding guides for the domain

Python SDK: Add documentation links to a domain
# Inlined from /metadata-ingestion/examples/library/domain_add_documentation.py
import time

from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.ingestion.graph.client import DatahubClientConfig, DataHubGraph
from datahub.metadata.schema_classes import (
    AuditStampClass,
    DomainPropertiesClass,
    InstitutionalMemoryClass,
    InstitutionalMemoryMetadataClass,
)
from datahub.metadata.urns import CorpUserUrn, DomainUrn

graph = DataHubGraph(DatahubClientConfig(server="http://localhost:8080"))
emitter = DatahubRestEmitter(gms_server="http://localhost:8080")

domain_urn = DomainUrn(id="marketing")

# Get existing properties
existing_properties = graph.get_aspect(str(domain_urn), DomainPropertiesClass)

# Update the description, creating the properties aspect if it does not exist yet
description = (
    "The Marketing domain contains all data assets related to marketing operations, "
    "campaigns, customer analytics, and brand management."
)
if existing_properties:
    existing_properties.description = description
    properties = existing_properties
else:
    properties = DomainPropertiesClass(name="Marketing", description=description)

# Emit properties
emitter.emit_mcp(
    MetadataChangeProposalWrapper(entityUrn=str(domain_urn), aspect=properties)
)

# Get existing institutional memory
existing_memory = graph.get_aspect(str(domain_urn), InstitutionalMemoryClass)
links_list = (
    list(existing_memory.elements)
    if existing_memory and existing_memory.elements
    else []
)

# Add new links
audit_stamp = AuditStampClass(
    time=int(time.time() * 1000), actor=str(CorpUserUrn("datahub"))
)

links_list.append(
    InstitutionalMemoryMetadataClass(
        url="https://wiki.company.com/domains/marketing",
        description="Marketing Domain Wiki - Overview and Guidelines",
        createStamp=audit_stamp,
    )
)

links_list.append(
    InstitutionalMemoryMetadataClass(
        url="https://confluence.company.com/marketing-data-governance",
        description="Marketing Data Governance Policies",
        createStamp=audit_stamp,
    )
)

# Emit institutional memory
emitter.emit_mcp(
    MetadataChangeProposalWrapper(
        entityUrn=str(domain_urn), aspect=InstitutionalMemoryClass(elements=links_list)
    )
)

Assigning Assets to Domains

The primary purpose of domains is to organize data assets. Assets are assigned to domains using the domains aspect on the asset entity (not on the domain entity itself). This creates a relationship between the asset and the domain.

Python SDK: Assign a dataset to a domain
# Inlined from /metadata-ingestion/examples/library/dataset_add_domain.py
from datahub.metadata.urns import DatasetUrn, DomainUrn
from datahub.sdk import DataHubClient

client = DataHubClient.from_env()

dataset = client.entities.get(DatasetUrn(platform="snowflake", name="example_dataset"))

# If you don't know the domain urn, you can look it up:
# domain_urn = client.resolve.domain(name="marketing")

# NOTE: This will overwrite the existing domain
dataset.set_domain(DomainUrn(id="marketing"))

client.entities.update(dataset)

When you assign an asset to a domain, it will:

  • Appear in the domain's entity list in the UI
  • Be filterable by domain in search results
  • Show the domain badge on the asset's profile page

Querying Domains

You can query domains and their associated entities using both the REST API and GraphQL API.

Fetching Domain Information via REST API

REST API: Get domain by URN
curl 'http://localhost:8080/entities/urn%3Ali%3Adomain%3Amarketing' \
-H 'Authorization: Bearer <token>'

This will return the domain entity with all its aspects, including:

  • domainKey: The unique identifier
  • domainProperties: Name, description, parent domain
  • ownership: Owners of the domain
  • institutionalMemory: Links and documentation

Listing Assets in a Domain

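Each asset assigned to a domain stores an AssociatedWith relationship to that domain in the asset's domains aspect. Querying these relationships in the incoming direction, relative to the domain, returns all entities within the domain.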

REST API: Find all assets in a domain
curl 'http://localhost:8080/relationships?direction=INCOMING&urn=urn%3Ali%3Adomain%3Amarketing&types=AssociatedWith' \
-H 'Authorization: Bearer <token>'

This returns all entities that have been associated with the specified domain.
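Both curl examples percent-encode the URN because it contains ':' (and dataset URNs also contain '(' and ','). The relationships query uses direction=INCOMING because AssociatedWith is stored on the assets and points at the domain. A stdlib sketch that builds the same two URLs; entity_url and incoming_relationships_url are hypothetical helpers, and the server address is the local default used above.

```python
from urllib.parse import quote, urlencode


def entity_url(gms_server: str, urn: str) -> str:
    """URL for fetching one entity; the URN is encoded as a single path segment."""
    # safe="" makes quote escape every reserved character, including ':' and '/'
    return f"{gms_server.rstrip('/')}/entities/{quote(urn, safe='')}"


def incoming_relationships_url(gms_server: str, urn: str, relationship_type: str) -> str:
    """URL listing entities whose aspects point at `urn` via `relationship_type`."""
    params = {"direction": "INCOMING", "urn": urn, "types": relationship_type}
    return f"{gms_server.rstrip('/')}/relationships?{urlencode(params)}"


GMS = "http://localhost:8080"
print(entity_url(GMS, "urn:li:domain:marketing"))
# http://localhost:8080/entities/urn%3Ali%3Adomain%3Amarketing
print(incoming_relationships_url(GMS, "urn:li:domain:marketing", "AssociatedWith"))
# http://localhost:8080/relationships?direction=INCOMING&urn=urn%3Ali%3Adomain%3Amarketing&types=AssociatedWith
```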

Python SDK: Query domain from a dataset
# Inlined from /metadata-ingestion/examples/library/dataset_query_domain.py
from datahub.sdk import DataHubClient, DatasetUrn

client = DataHubClient.from_env()

dataset = client.entities.get(
    DatasetUrn(platform="hive", name="fct_users_created", env="PROD")
)

# Print the dataset domain
print(dataset.domain)

Searching and Filtering by Domain

Once assets are assigned to domains, you can:

  • Filter search results by domain using the domain filter
  • Search within a specific domain to find assets
  • Use domains as part of complex search queries

The domains field on assets is indexed, so filtering even a large catalog by domain membership stays efficient.

Python SDK: Search for entities in a domain
# Inlined from /metadata-ingestion/examples/library/search_filter_by_domain.py
from datahub.sdk import DataHubClient
from datahub.sdk.search_filters import FilterDsl as F

# search for all assets in the marketing domain
client = DataHubClient.from_env()
results = client.search.get_urns(filter=F.domain("urn:li:domain:marketing"))
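Indexing the domains field is what makes this filter cheap: with an inverted index from domain to assets, a filter becomes a lookup rather than a scan over every asset. The toy below only illustrates that idea with sample URNs; DataHub's real index lives in its search backend and is not modeled here, and build_domain_index is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, Set


def build_domain_index(asset_domains: Dict[str, str]) -> Dict[str, Set[str]]:
    """Invert an asset -> domain map into domain -> set of assets."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for asset_urn, domain_urn in asset_domains.items():
        index[domain_urn].add(asset_urn)
    return dict(index)  # plain dict: looking up an unknown domain inserts nothing


asset_domains = {
    "urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_created,PROD)": "urn:li:domain:marketing",
    "urn:li:dataset:(urn:li:dataPlatform:snowflake,example_dataset,PROD)": "urn:li:domain:marketing",
    "urn:li:dataset:(urn:li:dataPlatform:hive,ledger,PROD)": "urn:li:domain:finance",
}
index = build_domain_index(asset_domains)
# Filtering by domain is now one dictionary lookup instead of a full scan.
print(sorted(index.get("urn:li:domain:marketing", set())))
```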

Integration Points

Domains integrate with several key DataHub features:

Relationship to Other Entities

Domains have relationships with:

  • Data Assets: Datasets, dashboards, charts, ML models, and other data assets can be assigned to domains via the domains aspect
  • Parent Domains: Domains can have parent-child relationships, creating hierarchical organizational structures
  • Users and Groups: Domains have owners (via the ownership aspect) who are responsible for managing the domain

GraphQL Resolvers

The domain entity is supported by several GraphQL resolvers in the datahub-graphql-core module:

  • CreateDomainResolver: Creates new domains
  • SetDomainResolver: Assigns assets to domains
  • UnsetDomainResolver: Removes assets from domains
  • ListDomainsResolver: Lists all available domains
  • DeleteDomainResolver: Deletes a domain
  • DomainEntitiesResolver: Retrieves all entities within a domain
  • ParentDomainsResolver: Resolves the parent hierarchy of a domain
  • BatchSetDomainResolver: Assigns multiple assets to a domain in one operation
  • MoveDomainResolver: Moves a domain to a different parent
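These resolvers back GraphQL operations that can also be called directly over HTTP. As a hedged sketch, assuming GMS serves GraphQL at http://localhost:8080/api/graphql and that the assignment mutation is named setDomain with entityUrn and domainUrn arguments (confirm both against your deployment's schema), the following builds, but does not send, a request that assigns a dataset to a domain using only the standard library.

```python
import json
import urllib.request

# Assumption: GraphQL endpoint on GMS; adjust for your deployment.
GRAPHQL_URL = "http://localhost:8080/api/graphql"

# Assumption: mutation and argument names; verify against your schema.
SET_DOMAIN_MUTATION = """
mutation setDomain($entityUrn: String!, $domainUrn: String!) {
  setDomain(entityUrn: $entityUrn, domainUrn: $domainUrn)
}
"""


def build_set_domain_request(entity_urn: str, domain_urn: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST carrying the mutation and its variables."""
    body = json.dumps(
        {
            "query": SET_DOMAIN_MUTATION,
            "variables": {"entityUrn": entity_urn, "domainUrn": domain_urn},
        }
    ).encode("utf-8")
    return urllib.request.Request(
        GRAPHQL_URL,
        data=body,
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {token}"},
        method="POST",
    )


request = build_set_domain_request(
    "urn:li:dataset:(urn:li:dataPlatform:snowflake,example_dataset,PROD)",
    "urn:li:domain:marketing",
    token="<token>",
)
# urllib.request.urlopen(request) would send it to a running server.
```

Passing the URNs as GraphQL variables rather than splicing them into the query string avoids having to escape the parentheses and commas inside dataset URNs.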

Usage Patterns

Common usage patterns include:

  1. Data Mesh Organization: Using domains to represent different data product teams or business domains
  2. Departmental Structure: Organizing assets by company departments (Finance, Marketing, Engineering)
  3. Product Lines: Grouping assets by product or business line
  4. Regulatory Boundaries: Separating assets by compliance requirements or data residency rules
  5. Nested Structures: Creating hierarchical organizations like Region > Country > Business Unit

Integration with Ingestion

During metadata ingestion, domains can be automatically assigned using the domain configuration in ingestion recipes. This allows:

  • Bulk assignment of domains based on naming patterns
  • Automated domain assignment during discovery
  • Consistent domain tagging across similar assets

See the Domains feature guide for detailed ingestion configuration examples.
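Pattern-based assignment can be pictured as a first-match lookup from asset names to domains. The sketch below uses hypothetical rules, names, and helper (assign_domain); it does not reproduce the recipe configuration keys, which are documented in the Domains feature guide.

```python
import re
from typing import Dict, List, Optional

# Hypothetical rules: domain URN -> regexes matched against full asset names.
# Illustration only; this is not the recipe schema used by DataHub sources.
DOMAIN_RULES: Dict[str, List[str]] = {
    "urn:li:domain:marketing": [r"analytics\.marketing\..*"],
    "urn:li:domain:finance": [r"analytics\.finance\..*", r".*\.ledger"],
}


def assign_domain(asset_name: str, rules: Dict[str, List[str]]) -> Optional[str]:
    """Return the first domain with a pattern matching the whole name, else None."""
    for domain_urn, patterns in rules.items():  # dicts preserve insertion order
        if any(re.fullmatch(pattern, asset_name) for pattern in patterns):
            return domain_urn
    return None


for name in ["analytics.marketing.campaigns", "raw.ops.ledger", "raw.ops.events"]:
    print(name, "->", assign_domain(name, DOMAIN_RULES))
```

Because each asset can hold only one domain, rule order matters whenever two patterns could match the same name.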

Notable Exceptions

Single Domain Assignment

Unlike tags and glossary terms which support multiple assignments, an asset can belong to only one domain at a time. If you assign an asset to a new domain, it will automatically be removed from its previous domain.
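The field that records the assignment, domains.domains (listed under Incoming relationships), is a list, so single-domain behavior comes from replacing its contents rather than appending to them. This matches the overwrite note in the dataset example. The toy class below is hypothetical and only illustrates those replace semantics; it is not the SDK's aspect class.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AssetDomains:
    """Hypothetical toy model of an asset's domains aspect."""

    domains: List[str] = field(default_factory=list)

    def set_domain(self, domain_urn: str) -> None:
        # Replace instead of append, so the list never holds more than one domain.
        self.domains = [domain_urn]

    def unset_domain(self) -> None:
        self.domains = []


asset = AssetDomains()
asset.set_domain("urn:li:domain:marketing")
asset.set_domain("urn:li:domain:finance")
print(asset.domains)  # ['urn:li:domain:finance']
```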

Domain Resolution During Ingestion

When using bare domain names (like "Marketing") in ingestion recipes, DataHub will attempt to resolve them to provisioned domains. The resolution process checks:

  1. First, for a domain with URN urn:li:domain:Marketing
  2. Then, for any domain with the name "Marketing"

If resolution fails, ingestion will fail to ensure data integrity. To avoid resolution issues, you can use fully-qualified domain URNs in ingestion configurations.
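The two resolution checks can be sketched as follows. resolve_domain and the provisioned-domain map are hypothetical; treating an already fully-qualified URN as the direct candidate and picking the first of several same-named domains are assumptions of this sketch, not documented behavior.

```python
from typing import Dict

DOMAIN_PREFIX = "urn:li:domain:"


def resolve_domain(value: str, domains: Dict[str, str]) -> str:
    """Resolve a recipe's domain value to an existing domain URN.

    `domains` is a hypothetical map of provisioned domain URN -> display name.
    """
    # A fully-qualified URN is used as the candidate directly (assumption).
    candidate = value if value.startswith(DOMAIN_PREFIX) else DOMAIN_PREFIX + value
    # Check 1: a domain whose URN is urn:li:domain:<value>.
    if candidate in domains:
        return candidate
    # Check 2: any domain whose display name equals <value>.
    by_name = [urn for urn, name in domains.items() if name == value]
    if by_name:
        return by_name[0]  # tie-breaking between duplicate names is unspecified
    # Fail instead of attaching assets to a domain that does not exist.
    raise LookupError(f"could not resolve domain {value!r}")


provisioned = {
    "urn:li:domain:Marketing": "Marketing",
    "urn:li:domain:6289fccc-4af2-4cbb-96ed-051e7d1de93c": "Finance",
}
print(resolve_domain("Finance", provisioned))
```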

Hierarchical Considerations

When organizing domains hierarchically:

  • Assets are assigned to the most specific (leaf) domain, not to parent domains
  • Parent domains do not automatically inherit assets from child domains
  • Domain hierarchies are primarily for organizational clarity in the UI
  • Deleting a parent domain does not automatically delete or reassign child domains
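Since parent domains do not inherit their children's assets, a roll-up view of everything under a parent has to be computed by walking the hierarchy downward. A stdlib sketch over hypothetical in-memory maps (descendants and rolled_up_assets are illustrative helpers, not DataHub APIs):

```python
from typing import Dict, List, Optional, Set


def descendants(root: str, parents: Dict[str, Optional[str]]) -> Set[str]:
    """All domains nested under root, found by inverting parentDomain links."""
    children: Dict[str, List[str]] = {}
    for child, parent in parents.items():
        if parent is not None:
            children.setdefault(parent, []).append(child)
    found: Set[str] = set()
    stack = [root]
    while stack:
        for child in children.get(stack.pop(), []):
            if child not in found:  # guards against revisits in malformed data
                found.add(child)
                stack.append(child)
    return found


def rolled_up_assets(root: str, parents: Dict[str, Optional[str]], asset_domains: Dict[str, str]) -> Set[str]:
    """Assets assigned to root or to any of its descendant domains."""
    scope = {root} | descendants(root, parents)
    return {asset for asset, domain in asset_domains.items() if domain in scope}


parents = {
    "urn:li:domain:engineering": None,
    "urn:li:domain:data-engineering": "urn:li:domain:engineering",
    "urn:li:domain:data-platform": "urn:li:domain:data-engineering",
    "urn:li:domain:ml-engineering": "urn:li:domain:engineering",
}
asset_domains = {
    "table_a": "urn:li:domain:data-platform",
    "model_b": "urn:li:domain:ml-engineering",
    "table_c": "urn:li:domain:marketing",
}
print(sorted(rolled_up_assets("urn:li:domain:engineering", parents, asset_domains)))
```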

Permissions

Managing domains requires the "Manage Domains" platform privilege. This includes:

  • Creating new domains
  • Modifying domain properties
  • Assigning assets to domains
  • Deleting domains

Individual asset assignment can also be controlled by "Edit Domain" metadata policies on specific entity types.

Technical Reference Guide

The sections above provide an overview of how to use this entity. The following sections provide detailed technical information about how metadata is stored and represented in DataHub.

Aspects are the individual pieces of metadata that can be attached to an entity. Each aspect contains specific information (like ownership, tags, or properties) and is stored as a separate record, allowing for flexible and incremental metadata updates.

Relationships show how this entity connects to other entities in the metadata graph. These connections are derived from the fields within each aspect and form the foundation of DataHub's knowledge graph.

Reading the Field Tables

Each aspect's field table includes an Annotations column that provides additional metadata about how fields are used:

  • ⚠️ Deprecated: This field is deprecated and may be removed in a future version. Check the description for the recommended alternative
  • Searchable: This field is indexed and can be searched in DataHub's search interface
  • Searchable (fieldname): When the field name in parentheses is shown, it indicates the field is indexed under a different name in the search index. For example, dashboardTool is indexed as tool
  • → RelationshipName: This field creates a relationship to another entity. The arrow indicates this field contains a reference (URN) to another entity, and the name indicates the type of relationship (e.g., → Contains, → OwnedBy)

Fields with complex types (like Edge, AuditStamp) link to their definitions in the Common Types section below.

Aspects

domainProperties

Information about a Domain

| Field | Type | Required | Description | Annotations |
| --- | --- | --- | --- | --- |
| customProperties | map | | Custom property bag. | Searchable |
| name | string | | Display name of the Domain | Searchable |
| description | string | | Description of the Domain | Searchable |
| created | AuditStamp | | Created Audit stamp | Searchable |
| parentDomain | string | | Optional: Parent of the domain | Searchable, → IsPartOf |

institutionalMemory

Institutional memory of an entity. This is a way to link to relevant documentation and provide description of the documentation. Institutional or tribal knowledge is very important for users to leverage the entity.

| Field | Type | Required | Description | Annotations |
| --- | --- | --- | --- | --- |
| elements | InstitutionalMemoryMetadata[] | | List of records that represent institutional memory of an entity. Each record consists of a link,... | |

ownership

Ownership information of an entity.

| Field | Type | Required | Description | Annotations |
| --- | --- | --- | --- | --- |
| owners | Owner[] | | List of owners of the entity. | |
| ownerTypes | map | | Ownership type to Owners map, populated via mutation hook. | Searchable |
| lastModified | AuditStamp | | Audit stamp containing who last modified the record and when. A value of 0 in the time field indi... | |

structuredProperties

Properties about an entity governed by StructuredPropertyDefinition

| Field | Type | Required | Description | Annotations |
| --- | --- | --- | --- | --- |
| properties | StructuredPropertyValueAssignment[] | | Custom property bag. | |

forms

Forms that are assigned to this entity to be filled out

| Field | Type | Required | Description | Annotations |
| --- | --- | --- | --- | --- |
| incompleteForms | FormAssociation[] | | All incomplete forms assigned to the entity. | Searchable |
| completedForms | FormAssociation[] | | All complete forms assigned to the entity. | Searchable |
| verifications | FormVerificationAssociation[] | | Verifications that have been applied to the entity via completed forms. | Searchable |

testResults

Information about a Test Result

| Field | Type | Required | Description | Annotations |
| --- | --- | --- | --- | --- |
| failing | TestResult[] | | Results that are failing | Searchable, → IsFailing |
| passing | TestResult[] | | Results that are passing | Searchable, → IsPassing |

displayProperties

Properties related to how the entity is displayed in the DataHub UI

| Field | Type | Required | Description | Annotations |
| --- | --- | --- | --- | --- |
| colorHex | string | | The color associated with the entity in Hex. For example #FFFFFF. | |
| icon | IconProperties | | The icon associated with the entity | |

assetSettings

Settings associated with this asset

| Field | Type | Required | Description | Annotations |
| --- | --- | --- | --- | --- |
| assetSummary | AssetSummarySettings | | Information related to the asset summary for this asset | |

Common Types

These types are used across multiple aspects in this entity.

AuditStamp

Data captured on a resource/association/sub-resource level giving insight into when that resource/association/sub-resource moved into a particular lifecycle stage, and who acted to move it into that specific lifecycle stage.

Fields:

  • time (long): When did the resource/association/sub-resource move into the specific lifecyc...
  • actor (string): The entity (e.g. a member URN) which will be credited for moving the resource...
  • impersonator (string?): The entity (e.g. a service URN) which performs the change on behalf of the Ac...
  • message (string?): Additional context around how DataHub was informed of the particular change. ...

FormAssociation

Properties of an applied form.

Fields:

  • urn (string): Urn of the applied form
  • incompletePrompts (FormPromptAssociation[]): A list of prompts that are not yet complete for this form.
  • completedPrompts (FormPromptAssociation[]): A list of prompts that have been completed for this form.

TestResult

Information about a Test Result

Fields:

  • test (string): The urn of the test
  • type (TestResultType): The type of the result
  • testDefinitionMd5 (string?): The md5 of the test definition that was used to compute this result. See Test...
  • lastComputed (AuditStamp?): The audit stamp of when the result was computed, including the actor who comp...

Relationships

Self

These are the relationships to itself, stored in this entity's aspects

  • IsPartOf (via domainProperties.parentDomain)

Outgoing

These are the relationships stored in this entity's aspects

  • OwnedBy

    • Corpuser via ownership.owners.owner
    • CorpGroup via ownership.owners.owner
  • ownershipType

    • OwnershipType via ownership.owners.typeUrn
  • IsFailing

    • Test via testResults.failing
  • IsPassing

    • Test via testResults.passing
  • HasSummaryTemplate

    • DataHubPageTemplate via assetSettings.assetSummary.templates

Incoming

These are the relationships stored in other entities' aspects

  • AssociatedWith

    • Dataset via domains.domains
    • DataJob via domains.domains
    • DataFlow via domains.domains
    • Chart via domains.domains
    • Dashboard via domains.domains
    • Notebook via domains.domains
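Each relationship listed here is derived from a URN-valued field in an aspect, and its direction depends on where that aspect lives: AssociatedWith is stored in a dataset's domains aspect, so it is outgoing from the dataset and appears as incoming on the domain. The toy derivation below makes that explicit for three relationship types; derive_edges and the plain-dict aspect shapes are hypothetical simplifications of the field paths above, not DataHub's indexing code.

```python
from typing import Any, Dict, List, Tuple

# (source URN, relationship name, destination URN)
Edge = Tuple[str, str, str]


def derive_edges(source_urn: str, aspects: Dict[str, Any]) -> List[Edge]:
    """Derive IsPartOf, OwnedBy, and AssociatedWith edges from plain-dict aspects.

    Every edge starts at the entity that stores the aspect.
    """
    edges: List[Edge] = []
    parent = (aspects.get("domainProperties") or {}).get("parentDomain")
    if parent:
        edges.append((source_urn, "IsPartOf", parent))
    for owner in (aspects.get("ownership") or {}).get("owners", []):
        edges.append((source_urn, "OwnedBy", owner["owner"]))
    for domain in (aspects.get("domains") or {}).get("domains", []):
        edges.append((source_urn, "AssociatedWith", domain))
    return edges


dataset = "urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_created,PROD)"
print(derive_edges(dataset, {"domains": {"domains": ["urn:li:domain:marketing"]}}))
```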

Global Metadata Model

[Figure: Global Graph]