Domain

Domains are curated, top-level categories for organizing data assets within an organization. They represent logical groupings that typically align with business units, departments, or functional areas. Unlike tags, which are informal labels, domains provide a structured way to organize assets, with either centralized or distributed management. A data asset can belong to only one domain at a time.

Identity

Domains are identified by a single piece of information:

  • A unique domain id: This is a string identifier that uniquely identifies the domain within DataHub. The id can be either auto-generated by DataHub or manually specified during domain creation. When creating a domain via the UI or API without specifying an id, DataHub will auto-generate a UUID-based identifier. For programmatic access or when human-readable identifiers are desired, you can specify a custom id like "marketing", "engineering", or "finance".

An example of a domain identifier is urn:li:domain:marketing.

For auto-generated domains, the URN might look like urn:li:domain:6289fccc-4af2-4cbb-96ed-051e7d1de93c.
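
The two identifier styles can be sketched in plain Python. The helper below is hypothetical: `build_domain_urn` is not a DataHub API, and it assumes only that an auto-generated id is a UUID string, as described above. For custom ids, the SDK's `make_domain_urn` (used in the examples that follow) covers the same ground.

```python
import uuid
from typing import Optional

DOMAIN_URN_PREFIX = "urn:li:domain:"


def build_domain_urn(domain_id: Optional[str] = None) -> str:
    """Return a domain URN, generating a UUID-based id when none is given.

    Hypothetical helper for illustration; not part of the DataHub SDK.
    """
    if domain_id is None:
        domain_id = str(uuid.uuid4())
    return f"{DOMAIN_URN_PREFIX}{domain_id}"


print(build_domain_urn("marketing"))  # urn:li:domain:marketing
print(build_domain_urn())  # urn:li:domain:<random UUID>
```

A human-readable id keeps URNs stable and easy to reference in scripts, while a generated id avoids naming collisions.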

Important Capabilities

Domain Properties

Domain properties are stored in the domainProperties aspect and contain the core metadata about a domain:

  • Name: The display name of the domain (e.g., "Marketing", "Platform Engineering")
  • Description: An optional detailed description of what the domain represents
  • Parent Domain: Domains can be hierarchical, with child domains nested under parent domains. This allows for organizational structures like "Engineering" > "Data Engineering" > "Data Platform"
  • Created Timestamp: Audit information about when the domain was created

Here is an example of creating a domain with properties:

Python SDK: Create a domain
import logging
import os

from datahub.emitter.mce_builder import make_domain_urn
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import DomainPropertiesClass

log = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)

# Get DataHub connection details from environment
gms_server = os.getenv("DATAHUB_GMS_URL", "http://localhost:8080")
token = os.getenv("DATAHUB_GMS_TOKEN")

domain_urn = make_domain_urn("marketing")
domain_properties_aspect = DomainPropertiesClass(
    name="Marketing", description="Entities related to the marketing department"
)

event: MetadataChangeProposalWrapper = MetadataChangeProposalWrapper(
    entityUrn=domain_urn,
    aspect=domain_properties_aspect,
)

rest_emitter = DatahubRestEmitter(gms_server=gms_server, token=token)
rest_emitter.emit(event)
log.info(f"Created domain {domain_urn}")

Nested Domain Hierarchies

Domains support hierarchical organization through parent-child relationships. This enables representing organizational structures with multiple levels. For example, you might have a top-level "Engineering" domain with child domains for "Data Engineering", "ML Engineering", and "Infrastructure Engineering".

Python SDK: Create a nested domain
import logging
import os

from datahub.emitter.mce_builder import make_domain_urn
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import DomainPropertiesClass

log = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)

# The child domain needs its own id, distinct from its parent's
# ("verticals" is an illustrative choice)
domain_urn = make_domain_urn("verticals")
domain_properties_aspect = DomainPropertiesClass(
    name="Verticals",
    description="Entities related to the verticals sub-domain",
    parentDomain="urn:li:domain:marketing",
)

event: MetadataChangeProposalWrapper = MetadataChangeProposalWrapper(
    entityUrn=domain_urn,
    aspect=domain_properties_aspect,
)

# Get DataHub connection details from environment
gms_server = os.getenv("DATAHUB_GMS_URL", "http://localhost:8080")
token = os.getenv("DATAHUB_GMS_TOKEN")

rest_emitter = DatahubRestEmitter(gms_server=gms_server, token=token)
rest_emitter.emit(event)
log.info(f"Created domain {domain_urn}")
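
Because each domain stores only a pointer to its parent, a full path such as "Engineering" > "Data Engineering" > "Data Platform" has to be reconstructed by following those pointers. The sketch below illustrates that walk over a hypothetical in-memory mapping; the `PARENTS` dictionary and `ancestor_chain` function are assumptions for illustration and do not call DataHub.

```python
from typing import Dict, List, Optional

# Hypothetical view of parentDomain pointers: domain URN -> parent URN
# (None marks a top-level domain).
PARENTS: Dict[str, Optional[str]] = {
    "urn:li:domain:engineering": None,
    "urn:li:domain:data-engineering": "urn:li:domain:engineering",
    "urn:li:domain:data-platform": "urn:li:domain:data-engineering",
}


def ancestor_chain(domain_urn: str, parents: Dict[str, Optional[str]]) -> List[str]:
    """Return the ancestors of domain_urn, nearest parent first."""
    chain: List[str] = []
    seen = {domain_urn}
    current = parents.get(domain_urn)
    while current is not None:
        # Guard against malformed data where a domain is its own ancestor.
        if current in seen:
            raise ValueError(f"Cycle detected at {current}")
        seen.add(current)
        chain.append(current)
        current = parents.get(current)
    return chain


print(ancestor_chain("urn:li:domain:data-platform", PARENTS))
# ['urn:li:domain:data-engineering', 'urn:li:domain:engineering']
```

The cycle guard matters in practice: nothing in the aspect itself stops a domain from naming itself as its parent.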

Ownership

Like other entities in DataHub, domains can have owners assigned to them using the ownership aspect. Domain owners are typically responsible for:

  • Managing which assets belong to the domain
  • Maintaining domain metadata and documentation
  • Governing data quality standards within the domain
  • Serving as points of contact for domain-related questions

Ownership types for domains follow the same patterns as other entities, including TECHNICAL_OWNER, BUSINESS_OWNER, DATA_STEWARD, etc.

Python SDK: Add an owner to a domain
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.ingestion.graph.client import DatahubClientConfig, DataHubGraph
from datahub.metadata.schema_classes import (
    OwnerClass,
    OwnershipClass,
    OwnershipTypeClass,
)
from datahub.metadata.urns import CorpUserUrn, DomainUrn

graph = DataHubGraph(DatahubClientConfig(server="http://localhost:8080"))
emitter = DatahubRestEmitter(gms_server="http://localhost:8080")

domain_urn = DomainUrn(id="marketing")

# Get existing ownership
existing_ownership = graph.get_aspect(str(domain_urn), OwnershipClass)
owner_list = (
    list(existing_ownership.owners)
    if existing_ownership and existing_ownership.owners
    else []
)

# Add new owner with the TECHNICAL_OWNER type
owner_list.append(
    OwnerClass(owner=str(CorpUserUrn("jdoe")), type=OwnershipTypeClass.TECHNICAL_OWNER)
)

# Emit ownership
emitter.emit_mcp(
    MetadataChangeProposalWrapper(
        entityUrn=str(domain_urn), aspect=OwnershipClass(owners=owner_list)
    )
)

Documentation and Links

Domains support documentation through the institutionalMemory aspect, which allows linking to external resources such as:

  • Confluence pages describing the domain's purpose and scope
  • Documentation about data governance policies
  • Team wikis or handbooks
  • Onboarding guides for the domain

Python SDK: Add documentation links to a domain
import time

from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.ingestion.graph.client import DatahubClientConfig, DataHubGraph
from datahub.metadata.schema_classes import (
    AuditStampClass,
    DomainPropertiesClass,
    InstitutionalMemoryClass,
    InstitutionalMemoryMetadataClass,
)
from datahub.metadata.urns import CorpUserUrn, DomainUrn

graph = DataHubGraph(DatahubClientConfig(server="http://localhost:8080"))
emitter = DatahubRestEmitter(gms_server="http://localhost:8080")

domain_urn = DomainUrn(id="marketing")

# Get existing properties
existing_properties = graph.get_aspect(str(domain_urn), DomainPropertiesClass)

# Update description
if existing_properties:
    existing_properties.description = (
        "The Marketing domain contains all data assets related to marketing operations, "
        "campaigns, customer analytics, and brand management."
    )
    properties = existing_properties
else:
    properties = DomainPropertiesClass(
        name="Marketing",
        description=(
            "The Marketing domain contains all data assets related to marketing operations, "
            "campaigns, customer analytics, and brand management."
        ),
    )

# Emit properties
emitter.emit_mcp(
    MetadataChangeProposalWrapper(entityUrn=str(domain_urn), aspect=properties)
)

# Get existing institutional memory
existing_memory = graph.get_aspect(str(domain_urn), InstitutionalMemoryClass)
links_list = (
    list(existing_memory.elements)
    if existing_memory and existing_memory.elements
    else []
)

# Add new links
audit_stamp = AuditStampClass(
    time=int(time.time() * 1000), actor=str(CorpUserUrn("datahub"))
)

links_list.append(
    InstitutionalMemoryMetadataClass(
        url="https://wiki.company.com/domains/marketing",
        description="Marketing Domain Wiki - Overview and Guidelines",
        createStamp=audit_stamp,
    )
)

links_list.append(
    InstitutionalMemoryMetadataClass(
        url="https://confluence.company.com/marketing-data-governance",
        description="Marketing Data Governance Policies",
        createStamp=audit_stamp,
    )
)

# Emit institutional memory
emitter.emit_mcp(
    MetadataChangeProposalWrapper(
        entityUrn=str(domain_urn), aspect=InstitutionalMemoryClass(elements=links_list)
    )
)

Assigning Assets to Domains

The primary purpose of domains is to organize data assets. Assets are assigned to domains using the domains aspect on the asset entity (not on the domain entity itself). This creates a relationship between the asset and the domain.

Python SDK: Assign a dataset to a domain
from datahub.metadata.urns import DatasetUrn, DomainUrn
from datahub.sdk import DataHubClient

client = DataHubClient.from_env()

dataset = client.entities.get(DatasetUrn(platform="snowflake", name="example_dataset"))

# If you don't know the domain urn, you can look it up:
# domain_urn = client.resolve.domain(name="marketing")

# NOTE: This will overwrite the existing domain
dataset.set_domain(DomainUrn(id="marketing"))

client.entities.update(dataset)

When you assign an asset to a domain, it will:

  • Appear in the domain's entity list in the UI
  • Be filterable by domain in search results
  • Show the domain badge on the asset's profile page

Querying Domains

You can query domains and their associated entities using both the REST API and GraphQL API.

Fetching Domain Information via REST API

REST API: Get domain by URN
curl 'http://localhost:8080/entities/urn%3Ali%3Adomain%3Amarketing' \
-H 'Authorization: Bearer <token>'

This will return the domain entity with all its aspects, including:

  • domainKey: The unique identifier
  • domainProperties: Name, description, parent domain
  • ownership: Owners of the domain
  • institutionalMemory: Links and documentation
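
The URN in the curl path is percent-encoded (":" becomes "%3A"). When building such a URL programmatically, the standard library can do the encoding. This is a minimal sketch: `entity_url` is a hypothetical helper, and the host is the same placeholder used in the curl example.

```python
from urllib.parse import quote, unquote

GMS_SERVER = "http://localhost:8080"  # placeholder host from the curl example


def entity_url(urn: str, gms_server: str = GMS_SERVER) -> str:
    """Build the entity endpoint URL with the URN percent-encoded."""
    # safe="" encodes every reserved character, including ":" and "/"
    return f"{gms_server}/entities/{quote(urn, safe='')}"


url = entity_url("urn:li:domain:marketing")
print(url)  # http://localhost:8080/entities/urn%3Ali%3Adomain%3Amarketing
```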

Listing Assets in a Domain

Domains maintain relationships to all assets assigned to them. You can query these relationships to find all entities within a domain.

REST API: Find all assets in a domain
curl 'http://localhost:8080/relationships?direction=INCOMING&urn=urn%3Ali%3Adomain%3Amarketing&types=AssociatedWith' \
-H 'Authorization: Bearer <token>'

This returns all entities that have been associated with the specified domain.
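
The query string of the relationships request can be assembled the same way rather than by hand. The sketch below only constructs the URL and does not send a request; `relationships_url` is a hypothetical helper, and the parameter names and values are taken from the curl command above.

```python
from urllib.parse import urlencode


def relationships_url(domain_urn: str, gms_server: str = "http://localhost:8080") -> str:
    """Build the URL that lists entities associated with a domain."""
    params = {
        "direction": "INCOMING",  # edges point from the asset to the domain
        "urn": domain_urn,
        "types": "AssociatedWith",
    }
    # urlencode percent-encodes the URN and joins the parameters with "&"
    return f"{gms_server}/relationships?{urlencode(params)}"


rel_url = relationships_url("urn:li:domain:marketing")
print(rel_url)
```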

Python SDK: Query domain from a dataset
from datahub.sdk import DataHubClient, DatasetUrn

client = DataHubClient.from_env()

dataset = client.entities.get(
    DatasetUrn(platform="hive", name="fct_users_created", env="PROD")
)

# Print the dataset domain
print(dataset.domain)

Searching and Filtering by Domain

Once assets are assigned to domains, you can:

  • Filter search results by domain using the domain filter
  • Search within a specific domain to find assets
  • Use domains as part of complex search queries

The domains field on assets is indexed and searchable, making it efficient to filter large datasets by domain membership.

Python SDK: Search for entities in a domain
from datahub.sdk import DataHubClient
from datahub.sdk.search_filters import FilterDsl as F

# search for all assets in the marketing domain
client = DataHubClient.from_env()
results = client.search.get_urns(filter=F.domain("urn:li:domain:marketing"))

Integration Points

Domains integrate with several key DataHub features:

Relationship to Other Entities

Domains have relationships with:

  • Data Assets: Datasets, dashboards, charts, ML models, and other data assets can be assigned to domains via the domains aspect
  • Parent Domains: Domains can have parent-child relationships, creating hierarchical organizational structures
  • Users and Groups: Domains have owners (via the ownership aspect) who are responsible for managing the domain

GraphQL Resolvers

The domain entity is supported by several GraphQL resolvers in the datahub-graphql-core module:

  • CreateDomainResolver: Creates new domains
  • SetDomainResolver: Assigns assets to domains
  • UnsetDomainResolver: Removes assets from domains
  • ListDomainsResolver: Lists all available domains
  • DeleteDomainResolver: Deletes a domain
  • DomainEntitiesResolver: Retrieves all entities within a domain
  • ParentDomainsResolver: Resolves the parent hierarchy of a domain
  • BatchSetDomainResolver: Assigns multiple assets to a domain in one operation
  • MoveDomainResolver: Moves a domain to a different parent

Usage Patterns

Common usage patterns include:

  1. Data Mesh Organization: Using domains to represent different data product teams or business domains
  2. Departmental Structure: Organizing assets by company departments (Finance, Marketing, Engineering)
  3. Product Lines: Grouping assets by product or business line
  4. Regulatory Boundaries: Separating assets by compliance requirements or data residency rules
  5. Nested Structures: Creating hierarchical organizations like Region > Country > Business Unit

Integration with Ingestion

During metadata ingestion, domains can be automatically assigned using the domain configuration in ingestion recipes. This allows:

  • Bulk assignment of domains based on naming patterns
  • Automated domain assignment during discovery
  • Consistent domain tagging across similar assets

See the Domains feature guide for detailed ingestion configuration examples.
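
The idea behind pattern-based assignment can be illustrated without the ingestion framework. The sketch below is a stand-in under stated assumptions: the `DOMAIN_PATTERNS` mapping, the regexes, and `domain_for` are all hypothetical, and real recipes express this through the source's domain configuration rather than Python code.

```python
import re
from typing import Dict, List, Optional

# Hypothetical mapping: domain id -> regexes that an asset name may match.
DOMAIN_PATTERNS: Dict[str, List[str]] = {
    "marketing": [r"^marketing\..*", r".*_campaigns$"],
    "finance": [r"^finance\..*"],
}


def domain_for(
    asset_name: str, patterns: Dict[str, List[str]] = DOMAIN_PATTERNS
) -> Optional[str]:
    """Return the URN of the first domain with a matching pattern, else None."""
    for domain_id, regexes in patterns.items():
        if any(re.match(regex, asset_name) for regex in regexes):
            return f"urn:li:domain:{domain_id}"
    return None


for name in ["marketing.leads", "sales.email_campaigns", "hr.payroll"]:
    print(name, "->", domain_for(name))
```

Returning only the first match mirrors the rule that an asset belongs to a single domain, so overlapping patterns need a deliberate ordering.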

Notable Exceptions

Single Domain Assignment

Unlike tags and glossary terms, which support multiple assignments, an asset can belong to only one domain at a time. Assigning an asset to a new domain automatically removes it from its previous domain.
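
The overwrite behavior can be modeled with a plain dictionary keyed by asset. This is a toy in-memory model, not DataHub code; the `DomainAssignments` class and the asset identifier are hypothetical.

```python
from typing import Dict, Optional, Set


class DomainAssignments:
    """Toy model in which each asset maps to at most one domain."""

    def __init__(self) -> None:
        self._domain_of: Dict[str, str] = {}

    def set_domain(self, asset: str, domain_urn: str) -> Optional[str]:
        """Assign the asset; return the domain it was removed from, if any."""
        previous = self._domain_of.get(asset)
        self._domain_of[asset] = domain_urn  # overwrites, never appends
        return previous if previous != domain_urn else None

    def assets_in(self, domain_urn: str) -> Set[str]:
        """Return the assets currently assigned to the given domain."""
        return {a for a, d in self._domain_of.items() if d == domain_urn}


assignments = DomainAssignments()
assignments.set_domain("example_dataset", "urn:li:domain:marketing")
removed_from = assignments.set_domain("example_dataset", "urn:li:domain:finance")
print(removed_from)  # urn:li:domain:marketing
print(assignments.assets_in("urn:li:domain:marketing"))  # set()
```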

Domain Resolution During Ingestion

When using bare domain names (like "Marketing") in ingestion recipes, DataHub will attempt to resolve them to provisioned domains. The resolution process checks:

  1. First, for a domain with URN urn:li:domain:Marketing
  2. Then, for any domain with the name "Marketing"

To protect data integrity, ingestion fails if a domain name cannot be resolved. To avoid resolution issues, use fully qualified domain URNs in ingestion configurations.
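
The two-step lookup order can be sketched as follows. This is an illustration under stated assumptions, not the ingestion framework's implementation: the `PROVISIONED` registry and `resolve_domain` are hypothetical, and passing a full URN through unchanged is an assumption about how fully qualified values are treated.

```python
from typing import Dict

# Hypothetical registry of provisioned domains: URN -> display name.
PROVISIONED: Dict[str, str] = {
    "urn:li:domain:Marketing": "Marketing",
    "urn:li:domain:6289fccc-4af2-4cbb-96ed-051e7d1de93c": "Finance",
}


def resolve_domain(value: str, provisioned: Dict[str, str] = PROVISIONED) -> str:
    """Resolve a bare name or a full URN to a domain URN."""
    if value.startswith("urn:li:domain:"):
        return value  # assumption: full URNs need no resolution
    # Step 1: a domain whose URN is built directly from the bare name.
    candidate = f"urn:li:domain:{value}"
    if candidate in provisioned:
        return candidate
    # Step 2: any domain whose display name equals the bare name.
    for urn, name in provisioned.items():
        if name == value:
            return urn
    # Fail loudly instead of guessing at a domain.
    raise ValueError(f"Could not resolve domain {value!r}")


print(resolve_domain("Marketing"))  # matched in step 1
print(resolve_domain("Finance"))  # matched in step 2, by name
```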

Hierarchical Considerations

When organizing domains hierarchically:

  • Assets are assigned to the most specific (leaf) domain, not to parent domains
  • Parent domains do not automatically inherit assets from child domains
  • Domain hierarchies are primarily for organizational clarity in the UI
  • Deleting a parent domain does not automatically delete or reassign child domains
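
Since parents do not inherit assets from their children, collecting everything under a parent domain means visiting each descendant explicitly. The sketch below shows one way to do that over hypothetical in-memory data; `PARENT_OF`, `DIRECT_ASSETS`, the asset names, and `assets_in_subtree` are illustrative assumptions and do not call DataHub.

```python
from typing import Dict, Optional, Set

# Hypothetical hierarchy: domain id -> parent id (None for top level).
PARENT_OF: Dict[str, Optional[str]] = {
    "engineering": None,
    "data-engineering": "engineering",
    "data-platform": "data-engineering",
}

# Hypothetical direct assignments: assets sit on the most specific domain.
DIRECT_ASSETS: Dict[str, Set[str]] = {
    "data-engineering": {"orders_table"},
    "data-platform": {"events_stream"},
}


def assets_in_subtree(
    root: str,
    parent_of: Dict[str, Optional[str]],
    direct_assets: Dict[str, Set[str]],
) -> Set[str]:
    """Collect assets assigned to root or to any of its descendants."""
    collected: Set[str] = set()
    visited: Set[str] = set()
    stack = [root]
    while stack:
        domain = stack.pop()
        if domain in visited:  # skip repeats so malformed data cannot loop
            continue
        visited.add(domain)
        collected |= direct_assets.get(domain, set())
        stack.extend(c for c, p in parent_of.items() if p == domain)
    return collected


print(DIRECT_ASSETS.get("engineering", set()))  # set(): nothing assigned directly
print(sorted(assets_in_subtree("engineering", PARENT_OF, DIRECT_ASSETS)))
# ['events_stream', 'orders_table']
```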

Permissions

Managing domains requires the "Manage Domains" platform privilege. This includes:

  • Creating new domains
  • Modifying domain properties
  • Assigning assets to domains
  • Deleting domains

Individual asset assignment can also be controlled by "Edit Domain" metadata policies on specific entity types.

Technical Reference

For technical details about fields, searchability, and relationships, view the Columns tab in DataHub.