Version: Next

DataHubFile

Technical Reference Guide

The sections above provide an overview of how to use this entity. The following sections provide detailed technical information about how metadata is stored and represented in DataHub.

Aspects are the individual pieces of metadata that can be attached to an entity. Each aspect contains specific information (like ownership, tags, or properties) and is stored as a separate record, allowing for flexible and incremental metadata updates.
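
To illustrate why storing each aspect as a separate record allows incremental updates, here is a minimal in-memory sketch. It is not DataHub's storage layer: the `AspectStore` type, the `upsert_aspect` helper, the example URN, and the `exampleOtherAspect` name are all hypothetical and exist only for illustration.

```python
from typing import Any, Dict, Tuple

# Hypothetical in-memory store: one record per (entity URN, aspect name).
AspectStore = Dict[Tuple[str, str], Dict[str, Any]]


def upsert_aspect(
    store: AspectStore, urn: str, aspect_name: str, value: Dict[str, Any]
) -> None:
    """Write one aspect record without touching other aspects of the entity."""
    store[(urn, aspect_name)] = value


store: AspectStore = {}
file_urn = "urn:li:dataHubFile:example-file-id"  # illustrative URN only

upsert_aspect(store, file_urn, "dataHubFileInfo", {"originalFileName": "diagram.png"})
upsert_aspect(store, file_urn, "exampleOtherAspect", {"note": "unchanged"})

# Updating dataHubFileInfo replaces only that record.
upsert_aspect(store, file_urn, "dataHubFileInfo", {"originalFileName": "diagram-v2.png"})
```

Because the key includes the aspect name, the second write to `dataHubFileInfo` leaves the other aspect's record as it was.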

Relationships show how this entity connects to other entities in the metadata graph. These connections are derived from the fields within each aspect and form the foundation of DataHub's knowledge graph.

Reading the Field Tables

Each aspect's field table includes an Annotations column that provides additional metadata about how fields are used:

  • ⚠️ Deprecated: This field is deprecated and may be removed in a future version. Check the description for the recommended alternative
  • Searchable: This field is indexed and can be searched in DataHub's search interface
  • Searchable (fieldname): This field is indexed under a different name in the search index; the name in parentheses is the indexed name. For example, dashboardTool is indexed as tool
  • → RelationshipName: This field creates a relationship to another entity. The arrow indicates this field contains a reference (URN) to another entity, and the name indicates the type of relationship (e.g., → Contains, → OwnedBy)

Fields with complex types (like Edge, AuditStamp) link to their definitions in the Common Types section below.

Aspects

dataHubFileInfo

Information about a DataHub file - a file stored in S3 for use within DataHub platform features like documentation, home pages, and announcements.

| Field | Type | Required | Description | Annotations |
|---|---|---|---|---|
| bucketStorageLocation | BucketStorageLocation | | Info about where a file is stored | |
| originalFileName | string | | The original filename as uploaded by the user | Searchable |
| mimeType | string | | MIME type of the file (e.g., image/png, application/pdf) | Searchable |
| sizeInBytes | long | | Size of the file in bytes | |
| scenario | FileUploadScenario | | The scenario/context in which this file was uploaded | Searchable |
| referencedByAsset | string | | Optional URN of the entity this file is associated with (e.g., the dataset whose docs contain thi... | Searchable, → ReferencedBy |
| schemaField | string | | The dataset schema field urn this file is referenced by | Searchable, → ReferencedBy |
| created | AuditStamp | | Timestamp when this file was created and by whom | Searchable |
| contentHash | string | | SHA-256 hash of file contents | Searchable |
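
As a rough sketch of how the content-derived fields listed above (originalFileName, mimeType, sizeInBytes, contentHash) could be computed for an uploaded file, the following uses only the Python standard library. The `describe_file` helper is hypothetical, and both the hex encoding of the hash and the fallback MIME type are assumptions; the source only states that contentHash is a SHA-256 hash of the file contents.

```python
import hashlib
import mimetypes
from typing import Any, Dict


def describe_file(original_file_name: str, content: bytes) -> Dict[str, Any]:
    """Build the content-derived fields of a dataHubFileInfo-like record."""
    # Guess the MIME type from the filename extension; None if unknown.
    mime_type, _encoding = mimetypes.guess_type(original_file_name)
    return {
        "originalFileName": original_file_name,
        # Assumption: fall back to a generic binary type when the guess fails.
        "mimeType": mime_type or "application/octet-stream",
        "sizeInBytes": len(content),
        # Assumption: the hash is stored as a lowercase hex string.
        "contentHash": hashlib.sha256(content).hexdigest(),
    }


info = describe_file("report.pdf", b"hello")
```

The remaining fields (bucketStorageLocation, scenario, referencedByAsset, schemaField, created) depend on where and why the file was uploaded, so they are left out of this sketch.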

Common Types

These types are used across multiple aspects in this entity.

AuditStamp

Data captured on a resource/association/sub-resource level giving insight into when that resource/association/sub-resource moved into a particular lifecycle stage, and who acted to move it into that specific lifecycle stage.

Fields:

  • time (long): When did the resource/association/sub-resource move into the specific lifecyc...
  • actor (string): The entity (e.g. a member URN) which will be credited for moving the resource...
  • impersonator (string?): The entity (e.g. a service URN) which performs the change on behalf of the Ac...
  • message (string?): Additional context around how DataHub was informed of the particular change. ...
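
The field list above can be mirrored as a small record type. This is a sketch, not DataHub's generated class: the `stamp_now` helper and the example actor URN are hypothetical, and treating `time` as milliseconds since the Unix epoch is an assumption, since the description above is truncated.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class AuditStamp:
    time: int                            # assumed: milliseconds since the Unix epoch
    actor: str                           # URN credited with the change
    impersonator: Optional[str] = None   # optional (string?)
    message: Optional[str] = None        # optional (string?)


def stamp_now(actor: str, message: Optional[str] = None) -> AuditStamp:
    """Create a stamp for the current moment on behalf of the given actor."""
    now_ms = int(datetime.now(timezone.utc).timestamp() * 1000)
    return AuditStamp(time=now_ms, actor=actor, message=message)


stamp = stamp_now("urn:li:corpuser:jdoe")  # illustrative actor URN
```

The two fields marked `string?` default to `None`, which matches their optional status in the list.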

Relationships

Outgoing

These are the relationships stored in this entity's aspects:

  • ReferencedBy

    • Dataset via dataHubFileInfo.referencedByAsset
    • Chart via dataHubFileInfo.referencedByAsset
    • Container via dataHubFileInfo.referencedByAsset
    • Dashboard via dataHubFileInfo.referencedByAsset
    • DataFlow via dataHubFileInfo.referencedByAsset
    • DataJob via dataHubFileInfo.referencedByAsset
    • GlossaryTerm via dataHubFileInfo.referencedByAsset
    • GlossaryNode via dataHubFileInfo.referencedByAsset
    • MlModel via dataHubFileInfo.referencedByAsset
    • MlFeature via dataHubFileInfo.referencedByAsset
    • Notebook via dataHubFileInfo.referencedByAsset
    • MlFeatureTable via dataHubFileInfo.referencedByAsset
    • MlPrimaryKey via dataHubFileInfo.referencedByAsset
    • MlModelGroup via dataHubFileInfo.referencedByAsset
    • Domain via dataHubFileInfo.referencedByAsset
    • DataProduct via dataHubFileInfo.referencedByAsset
    • BusinessAttribute via dataHubFileInfo.referencedByAsset
    • SchemaField via dataHubFileInfo.schemaField
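
The list above says that ReferencedBy edges come from two fields of dataHubFileInfo. A minimal sketch of that derivation follows; the helper names, the edge tuple shape, and the example URNs are hypothetical, and the `urn:li:<type>:<id>` layout used to read the target's entity type is an assumption about URN structure rather than something stated in this section.

```python
from typing import Any, Dict, List, Tuple

# Fields of dataHubFileInfo annotated with "→ ReferencedBy".
REFERENCED_BY_FIELDS = ("referencedByAsset", "schemaField")

Edge = Tuple[str, str, str]  # (source URN, relationship name, target URN)


def entity_type_of(urn: str) -> str:
    """Return the entity type segment, assuming a urn:li:<type>:<id> layout."""
    parts = urn.split(":", 3)
    if len(parts) < 4 or parts[0] != "urn" or parts[1] != "li":
        raise ValueError(f"not a recognised URN: {urn!r}")
    return parts[2]


def outgoing_edges(file_urn: str, info: Dict[str, Any]) -> List[Edge]:
    """Emit one ReferencedBy edge per populated reference field."""
    edges: List[Edge] = []
    for field in REFERENCED_BY_FIELDS:
        target = info.get(field)
        if target:  # both fields may be absent
            edges.append((file_urn, "ReferencedBy", target))
    return edges


dataset_urn = "urn:li:dataset:(urn:li:dataPlatform:hive,db.table,PROD)"
example_info = {
    "referencedByAsset": dataset_urn,
    "schemaField": f"urn:li:schemaField:({dataset_urn},col)",
}
edges = outgoing_edges("urn:li:dataHubFile:example-file-id", example_info)
target_types = [entity_type_of(target) for _, _, target in edges]
```

With both fields populated, the sketch yields two edges, one to a dataset and one to a schema field; with neither populated it yields none.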

Global Metadata Model
