DataHubApply
Overview
DataHub Apply is a DataHub utility that applies metadata changes, such as owners, domains, tags, and glossary terms, to entities that already exist in DataHub. Learn more in the official DataHub Apply documentation.
The DataHub integration for DataHub Apply covers metadata entities and operational objects relevant to this connector. Depending on module capabilities, it can also capture features such as lineage, usage, profiling, ownership, tags, and stateful deletion detection.
Concept Mapping
| Source Concept | DataHub Concept | Notes |
|---|---|---|
| Apply operation input | Metadata Change Proposal (MCP) updates | Input drives metadata updates rather than discovery. |
| Asset target list | Dataset / Container (and other supported entities) | Targets are selected explicitly in recipe configuration. |
| Ownership / domain / tag / term assignment | Ownership, Domain, GlobalTags, GlossaryTerms aspects | Applied directly to existing DataHub entities. |
Module datahub-apply
Important Capabilities
Capability metadata is not explicitly declared for this module. Refer to module documentation and configuration sections below.
Overview
The datahub-apply module applies metadata changes directly to existing DataHub entities. It is useful for programmatic curation tasks such as bulk ownership, domain, tag, and glossary-term updates.
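For bulk curation, per-asset assignments usually have to be grouped into the list-of-entries shape this module expects. The following is a minimal sketch, assuming the assignments arrive as (tag URN, asset URN) pairs. `group_tag_assignments` is a hypothetical helper, not part of DataHub, and the tag URNs and the second dataset name are made-up sample values; the field names `tag_urn` and `assets` come from the configuration schema on this page.

```python
from collections import defaultdict


def group_tag_assignments(pairs):
    """Group (tag_urn, asset_urn) pairs into tag_apply entries.

    Hypothetical helper for illustration; not a DataHub API.
    """
    grouped = defaultdict(list)
    for tag_urn, asset_urn in pairs:
        # Skip duplicates so each asset appears once per tag.
        if asset_urn not in grouped[tag_urn]:
            grouped[tag_urn].append(asset_urn)
    # One entry per tag URN, in first-seen order.
    return [{"tag_urn": t, "assets": a} for t, a in grouped.items()]


# Sample values only; replace with URNs that exist in your DataHub instance.
pairs = [
    ("urn:li:tag:pii", "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)"),
    ("urn:li:tag:pii", "urn:li:dataset:(urn:li:dataPlatform:hive,OtherDataset,PROD)"),
    ("urn:li:tag:legacy", "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)"),
]
tag_apply = group_tag_assignments(pairs)
```

The same grouping would work for owners, domains, and terms by swapping `tag_urn` for `owner_urn`, `domain_urn`, or `term_urn`.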
Prerequisites
- Access to a DataHub instance with permissions to update target entities.
- Valid authentication configuration for the ingestion run.
- Existing target entities in DataHub for each configured apply operation.
Install the Plugin
pip install 'acryl-datahub[datahub-apply]'
Starter Recipe
Check out the following recipe to get started with ingestion! See below for full configuration options.
For general pointers on writing and running a recipe, see our main recipe guide.
source:
  type: datahub-apply
  config:
    owner_apply:
      - owner_urn: "urn:li:corpuser:datahub"
        assets:
          - "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)"

sink:
  # sink configs
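When recipes are generated rather than hand-written, the same structure can be assembled as a plain dictionary. This is a sketch under stated assumptions: `build_recipe` is a hypothetical helper, the `datahub-rest` sink is one possible sink choice, and the `server` URL is an assumed local endpoint that you would replace with your own.

```python
import json


def build_recipe(owner_apply, server="http://localhost:8080"):
    """Return a recipe dict for the datahub-apply source.

    `server` is an assumed local DataHub endpoint, not a required value.
    """
    return {
        "source": {
            "type": "datahub-apply",
            "config": {"owner_apply": owner_apply},
        },
        "sink": {
            "type": "datahub-rest",
            "config": {"server": server},
        },
    }


recipe = build_recipe(
    [
        {
            "owner_urn": "urn:li:corpuser:datahub",
            "assets": [
                "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)"
            ],
        }
    ]
)
# Serialize for inspection or for writing to a recipe file.
recipe_text = json.dumps(recipe, indent=2)
print(recipe_text)
```

The dictionary mirrors the starter recipe; once serialized to YAML (for example with PyYAML) and saved as `recipe.yml`, it can be run with `datahub ingest -c recipe.yml`.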
Config Details
Note that a . is used to denote nested fields in the YAML recipe.
| Field | Type | Description |
|---|---|---|
| domain_apply | One of array, null | List to apply domains to assets. Default: None |
| domain_apply.DomainApplyConfig | DomainApplyConfig | |
| domain_apply.DomainApplyConfig.domain_urn | string | Default: "" |
| domain_apply.DomainApplyConfig.assets | array | List of assets to apply domain hierarchically. Currently only containers and datasets are supported. |
| domain_apply.DomainApplyConfig.assets.string | string | |
| owner_apply | One of array, null | List to apply owners to assets. Default: None |
| owner_apply.OwnerApplyConfig | OwnerApplyConfig | |
| owner_apply.OwnerApplyConfig.owner_urn | string | Default: "" |
| owner_apply.OwnerApplyConfig.assets | array | List of assets to apply owner hierarchically. Currently only containers and datasets are supported. |
| owner_apply.OwnerApplyConfig.assets.string | string | |
| tag_apply | One of array, null | List to apply tags to assets. Default: None |
| tag_apply.TagApplyConfig | TagApplyConfig | |
| tag_apply.TagApplyConfig.tag_urn | string | Default: "" |
| tag_apply.TagApplyConfig.assets | array | List of assets to apply tag hierarchically. Currently only containers and datasets are supported. |
| tag_apply.TagApplyConfig.assets.string | string | |
| term_apply | One of array, null | List to apply terms to assets. Default: None |
| term_apply.TermApplyConfig | TermApplyConfig | |
| term_apply.TermApplyConfig.term_urn | string | Default: "" |
| term_apply.TermApplyConfig.assets | array | List of assets to apply term hierarchically. Currently only containers and datasets are supported. |
| term_apply.TermApplyConfig.assets.string | string | |
The JSONSchema for this configuration is inlined below.
{
  "$defs": {
    "DomainApplyConfig": {
      "additionalProperties": false,
      "properties": {
        "assets": {
          "description": "List of assets to apply domain hierarchically. Currently only containers and datasets are supported",
          "items": {
            "type": "string"
          },
          "title": "Assets",
          "type": "array"
        },
        "domain_urn": {
          "default": "",
          "title": "Domain Urn",
          "type": "string"
        }
      },
      "title": "DomainApplyConfig",
      "type": "object"
    },
    "OwnerApplyConfig": {
      "additionalProperties": false,
      "properties": {
        "assets": {
          "description": "List of assets to apply owner hierarchically. Currently only containers and datasets are supported",
          "items": {
            "type": "string"
          },
          "title": "Assets",
          "type": "array"
        },
        "owner_urn": {
          "default": "",
          "title": "Owner Urn",
          "type": "string"
        }
      },
      "title": "OwnerApplyConfig",
      "type": "object"
    },
    "TagApplyConfig": {
      "additionalProperties": false,
      "properties": {
        "assets": {
          "description": "List of assets to apply tag hierarchically. Currently only containers and datasets are supported",
          "items": {
            "type": "string"
          },
          "title": "Assets",
          "type": "array"
        },
        "tag_urn": {
          "default": "",
          "title": "Tag Urn",
          "type": "string"
        }
      },
      "title": "TagApplyConfig",
      "type": "object"
    },
    "TermApplyConfig": {
      "additionalProperties": false,
      "properties": {
        "assets": {
          "description": "List of assets to apply term hierarchically. Currently only containers and datasets are supported",
          "items": {
            "type": "string"
          },
          "title": "Assets",
          "type": "array"
        },
        "term_urn": {
          "default": "",
          "title": "Term Urn",
          "type": "string"
        }
      },
      "title": "TermApplyConfig",
      "type": "object"
    }
  },
  "additionalProperties": false,
  "properties": {
    "domain_apply": {
      "anyOf": [
        {
          "items": {
            "$ref": "#/$defs/DomainApplyConfig"
          },
          "type": "array"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "List to apply domains to assets",
      "title": "Domain Apply"
    },
    "tag_apply": {
      "anyOf": [
        {
          "items": {
            "$ref": "#/$defs/TagApplyConfig"
          },
          "type": "array"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "List to apply tags to assets",
      "title": "Tag Apply"
    },
    "term_apply": {
      "anyOf": [
        {
          "items": {
            "$ref": "#/$defs/TermApplyConfig"
          },
          "type": "array"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "List to apply terms to assets",
      "title": "Term Apply"
    },
    "owner_apply": {
      "anyOf": [
        {
          "items": {
            "$ref": "#/$defs/OwnerApplyConfig"
          },
          "type": "array"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "List to apply owners to assets",
      "title": "Owner Apply"
    }
  },
  "title": "DataHubApplyConfig",
  "type": "object"
}
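The constraints in the schema above (four optional top-level lists, no extra properties, string URN fields, string-array `assets`) can be checked before a run with a few lines of standard-library Python. This is an illustrative sketch only: `check_apply_config` and `URN_FIELDS` are hypothetical names, and the check mirrors the schema as printed here rather than calling DataHub's own config validation.

```python
# URN field used by each apply list, taken from the schema on this page.
URN_FIELDS = {
    "domain_apply": "domain_urn",
    "owner_apply": "owner_urn",
    "tag_apply": "tag_urn",
    "term_apply": "term_urn",
}


def check_apply_config(config):
    """Return a list of problems found in a datahub-apply config dict."""
    problems = []
    for key in config:
        if key not in URN_FIELDS:
            # additionalProperties is false at the top level.
            problems.append(f"unknown top-level field: {key}")
    for key, urn_field in URN_FIELDS.items():
        entries = config.get(key)
        if entries is None:  # null is allowed and is the default
            continue
        if not isinstance(entries, list):
            problems.append(f"{key} must be a list or null")
            continue
        for i, entry in enumerate(entries):
            if not isinstance(entry, dict):
                problems.append(f"{key}[{i}] must be an object")
                continue
            for field in entry:
                if field not in (urn_field, "assets"):
                    problems.append(f"{key}[{i}] has unknown field: {field}")
            if not isinstance(entry.get(urn_field, ""), str):
                problems.append(f"{key}[{i}].{urn_field} must be a string")
            assets = entry.get("assets", [])
            if not isinstance(assets, list) or not all(
                isinstance(a, str) for a in assets
            ):
                problems.append(f"{key}[{i}].assets must be a list of strings")
    return problems


good = {"owner_apply": [{"owner_urn": "urn:li:corpuser:datahub", "assets": []}]}
bad = {"owner_apply": [{"owner": "x"}], "extra": 1}
print(check_apply_config(good))
print(check_apply_config(bad))
```

A misspelled key such as `owner` instead of `owner_urn` is reported as an unknown field, which matches the `additionalProperties: false` setting on each entry type.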
Capabilities
No capability table is declared for this module (see Important Capabilities above), so rely on the configuration details on this page to see what it supports. This module focuses on applying metadata updates rather than extracting metadata from external systems.
Limitations
- This module does not discover source metadata; it only applies configured updates to existing DataHub entities.
- Incorrect owner, domain, tag, term, or asset URNs can lead to partial updates or no-ops.
Troubleshooting
- Validate target URNs and entity existence before running large apply jobs.
- Start with a small scoped recipe to verify permissions and expected update behavior.
- Review ingestion logs for validation or authorization errors returned by DataHub APIs.
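As a first pass at validating target URNs, a simple syntactic screen can separate dataset and container URNs (the only asset types the configuration lists as supported) from everything else. This is a rough sketch: the regular expressions are loose assumptions for illustration, not DataHub's URN parser, `split_assets` is a hypothetical helper, and the check does not confirm that an entity actually exists in DataHub.

```python
import re

# Loose shape checks only; assumptions for illustration, not DataHub's URN parser.
DATASET_URN = re.compile(
    r"^urn:li:dataset:\(urn:li:dataPlatform:[^,()]+,[^()]+,[A-Z_]+\)$"
)
CONTAINER_URN = re.compile(r"^urn:li:container:[^\s:()]+$")


def split_assets(assets):
    """Split asset URNs into (accepted, rejected) by syntactic shape."""
    accepted, rejected = [], []
    for urn in assets:
        if DATASET_URN.match(urn) or CONTAINER_URN.match(urn):
            accepted.append(urn)
        else:
            rejected.append(urn)
    return accepted, rejected


# Sample values; only the first URN comes from the starter recipe.
assets = [
    "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)",
    "urn:li:dataset:(hive,SampleHiveDataset,PROD)",  # missing dataPlatform URN
    "urn:li:container:abc123",
    "urn:li:dashboard:(looker,1)",  # not a dataset or container
]
accepted, rejected = split_assets(assets)
print("accepted:", accepted)
print("rejected:", rejected)
```

Running a first apply job on only a handful of accepted URNs keeps the blast radius small while you confirm permissions and the resulting updates.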
Code Coordinates
- Class Name: datahub.ingestion.source.apply.datahub_apply.DataHubApplySource
If you've got any questions on configuring ingestion for DataHubApply, feel free to ping us on our Slack.
This page is auto-generated from the underlying source code. To make changes, please edit the relevant source files in the metadata-ingestion directory.