Version: Next

Document

Technical Reference Guide

The sections above provide an overview of how to use this entity. The following sections provide detailed technical information about how metadata is stored and represented in DataHub.

Aspects are the individual pieces of metadata that can be attached to an entity. Each aspect contains specific information (like ownership, tags, or properties) and is stored as a separate record, allowing for flexible and incremental metadata updates.

Relationships show how this entity connects to other entities in the metadata graph. These connections are derived from the fields within each aspect and form the foundation of DataHub's knowledge graph.
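The idea of one record per aspect can be illustrated with a minimal in-memory sketch. This is not DataHub's storage layer: the `store` dict, the `upsert_aspect` and `get_entity` helpers, and the example URN are all hypothetical names chosen for illustration.

```python
from typing import Any, Dict, Tuple

# Hypothetical in-memory store: one record per (entity URN, aspect name).
AspectKey = Tuple[str, str]
store: Dict[AspectKey, Dict[str, Any]] = {}


def upsert_aspect(urn: str, aspect_name: str, value: Dict[str, Any]) -> None:
    """Write a single aspect without touching the entity's other aspects."""
    store[(urn, aspect_name)] = value


def get_entity(urn: str) -> Dict[str, Dict[str, Any]]:
    """Assemble an entity view from all aspect records stored for the URN."""
    return {name: value for (u, name), value in store.items() if u == urn}


doc_urn = "urn:li:document:example-doc"  # illustrative URN, not a real document
upsert_aspect(doc_urn, "status", {"removed": False})
upsert_aspect(doc_urn, "subTypes", {"typeNames": ["Runbook"]})

# Incremental update: only the status record is replaced.
upsert_aspect(doc_urn, "status", {"removed": True})
```

Because each aspect is keyed separately, the second `status` write leaves the `subTypes` record untouched.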

Reading the Field Tables

Each aspect's field table includes an Annotations column that provides additional metadata about how fields are used:

  • ⚠️ Deprecated: This field is deprecated and may be removed in a future version. Check the description for the recommended alternative.
  • Searchable: This field is indexed and can be searched in DataHub's search interface
  • Searchable (fieldname): A field name in parentheses means the field is indexed under a different name in the search index. For example, dashboardTool is indexed as tool
  • → RelationshipName: This field creates a relationship to another entity. The arrow indicates this field contains a reference (URN) to another entity, and the name indicates the type of relationship (e.g., → Contains, → OwnedBy)

Fields with complex types (like Edge, AuditStamp) link to their definitions in the Common Types section below.
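For readers who process these tables programmatically, the annotation conventions above can be parsed with a small helper. This is a sketch only: `Annotations` and `parse_annotations` are hypothetical names, and the parser assumes annotations are comma-separated exactly as they appear in the Annotations column.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Annotations:
    deprecated: bool = False
    searchable: bool = False
    search_field_name: Optional[str] = None  # set only when indexed under another name
    relationship: Optional[str] = None       # relationship name after the arrow


def parse_annotations(cell: str) -> Annotations:
    """Parse one Annotations cell, e.g. 'Searchable, → TaggedWith'."""
    result = Annotations()
    for part in (p.strip() for p in cell.split(",")):
        if not part:
            continue
        if part.startswith("→"):
            result.relationship = part.lstrip("→").strip()
        elif "Deprecated" in part:
            result.deprecated = True
        else:
            match = re.fullmatch(r"Searchable(?:\s*\((\w+)\))?", part)
            if match:
                result.searchable = True
                result.search_field_name = match.group(1)
    return result
```

For example, `parse_annotations("Searchable (platformInstance)")` marks the field as searchable and records `platformInstance` as its name in the search index.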

Aspects

documentInfo

Information about a document

| Field | Type | Required | Description | Annotations |
|---|---|---|---|---|
| customProperties | map | | Custom property bag. | Searchable |
| title | string | | Optional title for the document. | Searchable |
| source | DocumentSource | | Information about the external source of this document. Only populated for third-party documents... | |
| status | DocumentStatus | | Status of the document (published, unpublished). | |
| contents | DocumentContents | | Content of the document | |
| created | AuditStamp | | The time and actor who created the document | Searchable |
| lastModified | AuditStamp | | The time and actor who last modified the document (any field) | Searchable |
| relatedAssets | RelatedAsset[] | | Assets referenced by or related to this document. | |
| relatedDocuments | RelatedDocument[] | | Documents referenced by or related to this document. | |
| parentDocument | ParentDocument | | Parent article for this asset. | |
| draftOf | DraftOf | | If this document is a draft, the document it is a draft of. When set, this document should be hid... | |
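As an illustration, a documentInfo value might look like the dictionary below. The top-level keys come from the field list above, and the `asset` and `document` keys come from the relationship paths listed later on this page. The inner keys of `contents` and `status` are not shown on this page, so `text` and `state` are assumptions, as are all URNs and values. `unknown_fields` is a hypothetical helper for catching misspelled field names.

```python
from typing import Any, Dict, List

# Field names taken from the documentInfo field list.
DOCUMENT_INFO_FIELDS = {
    "customProperties", "title", "source", "status", "contents",
    "created", "lastModified", "relatedAssets", "relatedDocuments",
    "parentDocument", "draftOf",
}

document_info: Dict[str, Any] = {
    "title": "Orders pipeline runbook",
    "customProperties": {"team": "data-platform"},
    # ASSUMPTION: the inner layout of DocumentContents and DocumentStatus is
    # not documented here; "text" and "state" are hypothetical keys.
    "contents": {"text": "Restart the job, then backfill."},
    "status": {"state": "PUBLISHED"},
    # AuditStamp fields (time, actor) as listed under Common Types.
    "created": {"time": 1700000000000, "actor": "urn:li:corpuser:jdoe"},
    "lastModified": {"time": 1700000500000, "actor": "urn:li:corpuser:jdoe"},
    "relatedAssets": [
        {"asset": "urn:li:dataset:(urn:li:dataPlatform:hive,orders,PROD)"},
    ],
    "parentDocument": {"document": "urn:li:document:runbooks"},
}


def unknown_fields(payload: Dict[str, Any]) -> List[str]:
    """Return payload keys that are not documentInfo fields."""
    return sorted(set(payload) - DOCUMENT_INFO_FIELDS)
```

Running `unknown_fields(document_info)` returns an empty list for this payload, while a typo such as `titel` would be reported.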

documentSettings

Settings specific to a document entity

| Field | Type | Required | Description | Annotations |
|---|---|---|---|---|
| showInGlobalContext | boolean | | Whether or not this document should be visible in the global context (e.g., global navigation, kn... | Searchable |
| lastModified | AuditStamp | | Last Modified Audit stamp | Searchable |

status

The lifecycle status metadata of an entity, e.g. dataset, metric, feature, etc. By convention, this aspect is used to represent soft deletes.

| Field | Type | Required | Description | Annotations |
|---|---|---|---|---|
| removed | boolean | | Whether the entity has been removed (soft-deleted). | Searchable |
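A consumer that wants to hide soft-deleted entities can check the `removed` flag. The sketch below uses a hypothetical entity shape (`{"urn": ..., "aspects": {...}}`) and assumes that an entity without a status aspect is treated as not removed.

```python
from typing import Any, Dict, List

# Hypothetical shape: {"urn": str, "aspects": {aspect_name: aspect_value}}
Entity = Dict[str, Any]


def is_soft_deleted(entity: Entity) -> bool:
    """True when the status aspect marks the entity as removed."""
    # ASSUMPTION: a missing status aspect means the entity is not removed.
    status = entity.get("aspects", {}).get("status", {})
    return bool(status.get("removed", False))


def visible_entities(entities: List[Entity]) -> List[Entity]:
    """Filter out soft-deleted entities, keeping the original order."""
    return [e for e in entities if not is_soft_deleted(e)]


entities: List[Entity] = [
    {"urn": "urn:li:document:a", "aspects": {"status": {"removed": False}}},
    {"urn": "urn:li:document:b", "aspects": {"status": {"removed": True}}},
    {"urn": "urn:li:document:c", "aspects": {}},  # no status aspect at all
]
```

The soft-deleted record stays in storage; it is only excluded from the filtered view.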

ownership

Ownership information of an entity.

| Field | Type | Required | Description | Annotations |
|---|---|---|---|---|
| owners | Owner[] | | List of owners of the entity. | |
| ownerTypes | map | | Ownership type to Owners map, populated via mutation hook. | Searchable |
| lastModified | AuditStamp | | Audit stamp containing who last modified the record and when. A value of 0 in the time field indi... | |
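The ownerTypes map groups owners by ownership type. The sketch below shows one way such a map could be derived from the owners list; it is not DataHub's actual mutation hook. It relies on the `owner` and `typeUrn` keys that appear in the relationship paths on this page, and skipping owners without a `typeUrn` is an assumption. `build_owner_types` and the example URNs are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List


def build_owner_types(owners: List[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Group owner URNs by their ownership type URN."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for entry in owners:
        type_urn = entry.get("typeUrn")
        if type_urn is None:
            # ASSUMPTION: owners without an ownership type are left out.
            continue
        grouped[type_urn].append(entry["owner"])
    return dict(grouped)


owners = [
    {"owner": "urn:li:corpuser:jdoe", "typeUrn": "urn:li:ownershipType:technical"},
    {"owner": "urn:li:corpGroup:data-eng", "typeUrn": "urn:li:ownershipType:technical"},
    {"owner": "urn:li:corpuser:asmith", "typeUrn": "urn:li:ownershipType:business"},
    {"owner": "urn:li:corpuser:legacy"},  # no typeUrn
]
owner_types = build_owner_types(owners)
```

A map shaped like this lets a search index answer "who owns this as a technical owner" without scanning the full owners list.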

domains

Links from an Asset to its Domains

| Field | Type | Required | Description | Annotations |
|---|---|---|---|---|
| domains | string[] | | The Domains attached to an Asset | Searchable, → AssociatedWith |

structuredProperties

Properties about an entity governed by StructuredPropertyDefinition

| Field | Type | Required | Description | Annotations |
|---|---|---|---|---|
| properties | StructuredPropertyValueAssignment[] | | Custom property bag. | |

subTypes

Sub Types. Use this aspect to specialize a generic entity, e.g., making a Dataset also a View or a LookerExplore.

| Field | Type | Required | Description | Annotations |
|---|---|---|---|---|
| typeNames | string[] | | The names of the specific types. | Searchable |

dataPlatformInstance

The specific instance of the data platform that this entity belongs to

| Field | Type | Required | Description | Annotations |
|---|---|---|---|---|
| platform | string | | Data Platform | Searchable |
| instance | string | | Instance of the data platform (e.g. db instance) | Searchable (platformInstance) |

browsePathsV2

Shared aspect containing a Browse Path to be indexed for an entity.

| Field | Type | Required | Description | Annotations |
|---|---|---|---|---|
| path | BrowsePathEntry[] | | A valid browse path for the entity. This field is provided by DataHub by default. This aspect is ... | Searchable |

globalTags

Tag aspect used for applying tags to an entity

| Field | Type | Required | Description | Annotations |
|---|---|---|---|---|
| tags | TagAssociation[] | | Tags associated with a given entity | Searchable, → TaggedWith |

glossaryTerms

Related business terms information

| Field | Type | Required | Description | Annotations |
|---|---|---|---|---|
| terms | GlossaryTermAssociation[] | | The related business terms | |
| auditStamp | AuditStamp | | Audit stamp containing who reported the related business term | |

Common Types

These types are used across multiple aspects in this entity.

AuditStamp

Data captured on a resource/association/sub-resource level giving insight into when that resource/association/sub-resource moved into a particular lifecycle stage, and who acted to move it into that specific lifecycle stage.

Fields:

  • time (long): When did the resource/association/sub-resource move into the specific lifecyc...
  • actor (string): The entity (e.g. a member URN) which will be credited for moving the resource...
  • impersonator (string?): The entity (e.g. a service URN) which performs the change on behalf of the Ac...
  • message (string?): Additional context around how DataHub was informed of the particular change. ...
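The four fields above map naturally onto a small record type. The dataclass below is a sketch, not DataHub's generated class. It assumes `time` is expressed in milliseconds since the Unix epoch, which the truncated description above does not state, and `as_datetime` is a hypothetical convenience method.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class AuditStamp:
    time: int                           # ASSUMPTION: epoch milliseconds
    actor: str                          # URN credited with the change
    impersonator: Optional[str] = None  # URN acting on behalf of the actor
    message: Optional[str] = None       # extra context about the change

    def as_datetime(self) -> datetime:
        """Convert the millisecond timestamp to a timezone-aware UTC datetime."""
        return datetime.fromtimestamp(self.time / 1000, tz=timezone.utc)


stamp = AuditStamp(time=86_400_000, actor="urn:li:corpuser:jdoe")
```

The optional fields default to `None`, matching the `string?` types in the list above.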

Relationships

Self

These are relationships from this entity to other entities of the same type, stored in this entity's aspects

  • RelatedDocument (via documentInfo.relatedDocuments.document)
  • IsChildOf (via documentInfo.parentDocument.document)
  • IsDraftOf (via documentInfo.draftOf.document)

Outgoing

These are relationships from this entity to other entity types, stored in this entity's aspects

  • RelatedAsset

    • Container via documentInfo.relatedAssets.asset
    • Dataset via documentInfo.relatedAssets.asset
    • DataJob via documentInfo.relatedAssets.asset
    • DataFlow via documentInfo.relatedAssets.asset
    • Dashboard via documentInfo.relatedAssets.asset
    • Chart via documentInfo.relatedAssets.asset
    • Application via documentInfo.relatedAssets.asset
    • DataPlatform via documentInfo.relatedAssets.asset
    • MlModel via documentInfo.relatedAssets.asset
    • MlModelGroup via documentInfo.relatedAssets.asset
    • MlPrimaryKey via documentInfo.relatedAssets.asset
    • MlFeatureTable via documentInfo.relatedAssets.asset
    • Corpuser via documentInfo.relatedAssets.asset
    • CorpGroup via documentInfo.relatedAssets.asset
    • DataProduct via documentInfo.relatedAssets.asset
    • Domain via documentInfo.relatedAssets.asset
    • GlossaryTerm via documentInfo.relatedAssets.asset
    • GlossaryNode via documentInfo.relatedAssets.asset
    • Tag via documentInfo.relatedAssets.asset
    • StructuredProperty via documentInfo.relatedAssets.asset
  • OwnedBy

    • Corpuser via ownership.owners.owner
    • CorpGroup via ownership.owners.owner
  • ownershipType

    • OwnershipType via ownership.owners.typeUrn
  • AssociatedWith

    • Domain via domains.domains
  • TaggedWith

    • Tag via globalTags.tags
  • TermedWith

    • GlossaryTerm via glossaryTerms.terms.urn
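Each "via" path above names the aspect and field from which an edge is derived. The sketch below walks a subset of those dotted paths over a plain dictionary of aspects and collects (source, relationship, target) triples. It illustrates the idea only and is not DataHub's graph indexer: `resolve`, `extract_edges`, the aspect dictionary shape, and the example URNs are hypothetical. Paths whose final field holds a nested record rather than a URN string (such as globalTags.tags) are left out.

```python
from typing import Any, Dict, Iterator, List, Tuple

# A subset of the relationship paths listed above.
RELATIONSHIP_PATHS: Dict[str, str] = {
    "RelatedDocument": "documentInfo.relatedDocuments.document",
    "IsChildOf": "documentInfo.parentDocument.document",
    "IsDraftOf": "documentInfo.draftOf.document",
    "RelatedAsset": "documentInfo.relatedAssets.asset",
    "OwnedBy": "ownership.owners.owner",
    "AssociatedWith": "domains.domains",
}


def resolve(node: Any, keys: List[str]) -> Iterator[Any]:
    """Follow keys through nested dicts, fanning out over any lists on the way."""
    if node is None:
        return
    if isinstance(node, list):
        for item in node:
            yield from resolve(item, keys)
    elif not keys:
        yield node
    elif isinstance(node, dict):
        yield from resolve(node.get(keys[0]), keys[1:])


def extract_edges(source_urn: str, aspects: Dict[str, Any]) -> List[Tuple[str, str, str]]:
    """Collect (source, relationship, target) triples from the aspect values."""
    edges = []
    for name, path in RELATIONSHIP_PATHS.items():
        for target in resolve(aspects, path.split(".")):
            edges.append((source_urn, name, target))
    return edges


aspects = {
    "documentInfo": {
        "relatedAssets": [{"asset": "urn:li:dataset:a"}, {"asset": "urn:li:chart:b"}],
        "parentDocument": {"document": "urn:li:document:parent"},
    },
    "domains": {"domains": ["urn:li:domain:finance"]},
}
edges = extract_edges("urn:li:document:child", aspects)
```

Aspects or fields that are absent simply produce no edges, which matches the incremental nature of aspect updates.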

Global Metadata Model

[Figure: Global Graph]