DataHubApply

Overview

DataHub Apply is a DataHub utility that applies metadata changes to entities that already exist in DataHub, rather than extracting metadata from an external system. Learn more in the official DataHub Apply documentation.

The integration operates on the entities listed explicitly in the recipe and writes ownership, domain, tag, and glossary-term assignments to them. Capability metadata is not declared for this module, so features such as lineage, usage, profiling, and stateful deletion detection should not be assumed.

Concept Mapping

Source Concept | DataHub Concept | Notes
Apply operation input | Metadata Change Proposal (MCP) updates | Input drives metadata updates rather than discovery.
Asset target list | Dataset / Container (and other supported entities) | Targets are selected explicitly in recipe configuration.
Ownership / domain / tag / term assignment | Ownership, Domain, GlobalTags, GlossaryTerms aspects | Applied directly to existing DataHub entities.

Module datahub-apply

Testing

Important Capabilities

Capability metadata is not explicitly declared for this module. Refer to module documentation and configuration sections below.

Overview

The datahub-apply module applies metadata changes directly to existing DataHub entities. It is useful for programmatic curation tasks such as bulk ownership, domain, tag, and glossary-term updates.

Prerequisites

  • Access to a DataHub instance with permissions to update target entities.
  • Valid authentication configuration for the ingestion run.
  • Existing target entities in DataHub for each configured apply operation.

Install the Plugin

pip install 'acryl-datahub[datahub-apply]'

Starter Recipe

Check out the following recipe to get started with ingestion! See below for full configuration options.

For general pointers on writing and running a recipe, see our main recipe guide.

source:
  type: datahub-apply
  config:
    owner_apply:
      - owner_urn: "urn:li:corpuser:datahub"
        assets:
          - "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)"

sink:
  # sink configs
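
For bulk curation it can be convenient to generate the recipe rather than write it by hand. The sketch below builds the same structure as the starter recipe as a plain Python dict. `build_apply_recipe` and `URN_KEYS` are hypothetical names introduced here, the tag URN is a placeholder, and the field names are taken from the configuration table on this page. The code only constructs and prints the recipe; it does not call DataHub.

```python
import json

# Field that carries the applied URN in each section, per the config table.
URN_KEYS = {
    "owner_apply": "owner_urn",
    "domain_apply": "domain_urn",
    "tag_apply": "tag_urn",
    "term_apply": "term_urn",
}


def build_apply_recipe(operations, sink=None):
    """Build a datahub-apply recipe dict (hypothetical helper).

    operations maps a section name to {urn_to_apply: [asset_urns]}.
    """
    config = {}
    for section, assignments in operations.items():
        urn_key = URN_KEYS[section]  # KeyError for an unknown section name
        config[section] = [
            {urn_key: urn, "assets": list(assets)}
            for urn, assets in assignments.items()
        ]
    recipe = {"source": {"type": "datahub-apply", "config": config}}
    if sink is not None:
        recipe["sink"] = sink
    return recipe


SAMPLE_DATASET = "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)"

recipe = build_apply_recipe(
    {
        "owner_apply": {"urn:li:corpuser:datahub": [SAMPLE_DATASET]},
        # Placeholder tag URN, used only for illustration.
        "tag_apply": {"urn:li:tag:Legacy": [SAMPLE_DATASET]},
    }
)
print(json.dumps(recipe, indent=2))
```

The resulting dict mirrors the YAML layout of the recipe, so it can be written to a recipe file with any JSON or YAML serializer once a sink section is supplied.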

Config Details

Note that a . is used to denote nested fields in the YAML recipe.

Field | Description
domain_apply (One of array, null) | List to apply domains to assets. Default: None
domain_apply.DomainApplyConfig (DomainApplyConfig) |
domain_apply.DomainApplyConfig.domain_urn (string) | Default:
domain_apply.DomainApplyConfig.assets (array) | List of assets to apply domain hierarchically. Currently only containers and datasets are supported.
domain_apply.DomainApplyConfig.assets.string (string) |
owner_apply (One of array, null) | List to apply owners to assets. Default: None
owner_apply.OwnerApplyConfig (OwnerApplyConfig) |
owner_apply.OwnerApplyConfig.owner_urn (string) | Default:
owner_apply.OwnerApplyConfig.assets (array) | List of assets to apply owner hierarchically. Currently only containers and datasets are supported.
owner_apply.OwnerApplyConfig.assets.string (string) |
tag_apply (One of array, null) | List to apply tags to assets. Default: None
tag_apply.TagApplyConfig (TagApplyConfig) |
tag_apply.TagApplyConfig.tag_urn (string) | Default:
tag_apply.TagApplyConfig.assets (array) | List of assets to apply tag hierarchically. Currently only containers and datasets are supported.
tag_apply.TagApplyConfig.assets.string (string) |
term_apply (One of array, null) | List to apply terms to assets. Default: None
term_apply.TermApplyConfig (TermApplyConfig) |
term_apply.TermApplyConfig.term_urn (string) | Default:
term_apply.TermApplyConfig.assets (array) | List of assets to apply term hierarchically. Currently only containers and datasets are supported.
term_apply.TermApplyConfig.assets.string (string) |
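
The field layout above can be checked locally before a recipe is run. The following is a minimal sketch of such a pre-check, assuming the `config` block has already been loaded into a Python dict. `check_apply_config` and `EXPECTED_URN_KEY` are hypothetical names, and the check reflects only the shapes listed in the table; DataHub's own validation may differ.

```python
# URN field expected in each entry of a section, per the table above.
EXPECTED_URN_KEY = {
    "domain_apply": "domain_urn",
    "owner_apply": "owner_urn",
    "tag_apply": "tag_urn",
    "term_apply": "term_urn",
}


def check_apply_config(config):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for section, entries in config.items():
        if section not in EXPECTED_URN_KEY:
            problems.append(f"{section}: not a section listed in the table")
            continue
        if entries is None:
            continue  # the table lists null as an allowed value
        if not isinstance(entries, list):
            problems.append(f"{section}: expected a list or null")
            continue
        urn_key = EXPECTED_URN_KEY[section]
        for i, entry in enumerate(entries):
            if not isinstance(entry, dict):
                problems.append(f"{section}[{i}]: expected a mapping")
                continue
            if not isinstance(entry.get(urn_key), str):
                problems.append(f"{section}[{i}]: missing string field {urn_key}")
            assets = entry.get("assets")
            if not isinstance(assets, list) or not all(
                isinstance(a, str) for a in assets
            ):
                problems.append(f"{section}[{i}]: assets must be a list of strings")
    return problems


good = {
    "owner_apply": [
        {
            "owner_urn": "urn:li:corpuser:datahub",
            "assets": [
                "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)"
            ],
        }
    ],
    "tag_apply": None,
}
# Wrong key on purpose: tag_apply entries use tag_urn, not owner_urn.
bad = {"tag_apply": [{"owner_urn": "urn:li:corpuser:datahub", "assets": []}]}

print(check_apply_config(good))
print(check_apply_config(bad))
```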

Capabilities

Use the Important Capabilities table above as the source of truth for supported features. This module focuses on applying metadata updates rather than extracting metadata from external systems.

Limitations

  • This module does not discover source metadata; it only applies configured updates to existing DataHub entities.
  • Incorrect URNs or selectors can lead to partial updates or no-ops.
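
Because a malformed or unsupported asset URN can silently turn into a no-op, it may help to screen asset lists first. The sketch below is a syntax-only filter, assuming URNs follow the `urn:li:<entityType>:<id>` shape used in the starter recipe; `asset_entity_type` and `split_assets` are hypothetical helpers, and the container URN is a placeholder. It does not confirm that an entity exists in DataHub.

```python
# Asset types the config table lists as currently supported.
SUPPORTED_ASSET_TYPES = {"dataset", "container"}


def asset_entity_type(urn):
    """Return the entity type of a urn:li:<type>:<id> string, or None."""
    parts = urn.split(":", 3)  # keep any colons inside the id intact
    if len(parts) != 4:
        return None
    scheme, namespace, entity_type, entity_id = parts
    if scheme != "urn" or namespace != "li" or not entity_type or not entity_id:
        return None
    return entity_type


def split_assets(urns):
    """Partition URNs into (supported, rejected) by entity type."""
    supported, rejected = [], []
    for urn in urns:
        if asset_entity_type(urn) in SUPPORTED_ASSET_TYPES:
            supported.append(urn)
        else:
            rejected.append(urn)
    return supported, rejected


candidates = [
    "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)",
    "urn:li:container:example-container-id",  # placeholder id
    "urn:li:corpuser:datahub",  # valid URN, but not a dataset or container
    "SampleHiveDataset",  # not a URN at all
]
supported, rejected = split_assets(candidates)
print("supported:", supported)
print("rejected:", rejected)
```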

Troubleshooting

  • Validate target URNs and entity existence before running large apply jobs.
  • Start with a small scoped recipe to verify permissions and expected update behavior.
  • Review ingestion logs for validation or authorization errors returned by DataHub APIs.
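
One way to follow the "start small" advice above is to split a large asset list into batches and run only the first batch as a pilot. This is a minimal sketch under that assumption; `chunk_assets` and `owner_apply_config` are hypothetical helpers, the table names are made up for illustration, and the code only prepares config dicts without contacting DataHub.

```python
def chunk_assets(assets, size):
    """Split a list of asset URNs into consecutive batches of at most size."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [assets[i:i + size] for i in range(0, len(assets), size)]


def owner_apply_config(owner_urn, assets):
    """Build the config block for a single owner assignment."""
    return {"owner_apply": [{"owner_urn": owner_urn, "assets": list(assets)}]}


# Illustrative dataset URNs; the table names are placeholders.
all_assets = [
    f"urn:li:dataset:(urn:li:dataPlatform:hive,table_{n},PROD)" for n in range(5)
]

batches = chunk_assets(all_assets, 2)
# Run the first batch alone, inspect the logs, then continue with the rest.
pilot = owner_apply_config("urn:li:corpuser:datahub", batches[0])
print(len(batches), "batches; pilot covers", len(pilot["owner_apply"][0]["assets"]), "assets")
```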

Code Coordinates

  • Class Name: datahub.ingestion.source.apply.datahub_apply.DataHubApplySource

💡 Contributing to this documentation

This page is auto-generated from the underlying source code. To make changes, please edit the relevant source files in the metadata-ingestion directory.
