Container

The container entity is a core entity in the metadata model that represents a grouping of related data assets. Containers provide hierarchical organization for datasets, charts, dashboards, and other containers, enabling navigation and structure discovery within data platforms.

Identity

Containers are uniquely identified by a GUID (Globally Unique Identifier) that is typically derived from a combination of attributes specific to the container type. Unlike datasets, which are identified by platform, name, and environment, containers use a more flexible identification scheme based on their hierarchical properties.

The URN structure for a container is: urn:li:container:{guid}

The GUID is typically computed from container-specific properties such as:

  • Database containers: platform + instance + database name
  • Schema containers: platform + instance + database + schema name
  • Project containers: platform + instance + project_id
  • Folder containers: platform + instance + folder_abs_path
  • Bucket containers: platform + instance + bucket_name

URN Examples

urn:li:container:b5e95fce839e7d78151ed7e0a7420d84

The GUID is generated using the datahub_guid() function from a dictionary of properties. For example, a Snowflake schema container would be identified by:

{
  "platform": "snowflake",
  "instance": "prod_instance",
  "database": "analytics",
  "schema": "reporting"
}
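
As a rough illustration of how a property dictionary can map to a stable GUID, the sketch below hashes a canonical JSON serialization with MD5. The helper name sketch_guid and the exact serialization are assumptions made for illustration; the real datahub_guid() may differ in detail, so the printed value is not guaranteed to match the URN of an ingested container.

```python
import hashlib
import json


def sketch_guid(properties: dict) -> str:
    """Hypothetical stand-in for datahub_guid(): hash a canonical JSON form."""
    # Sorting keys makes the result independent of dictionary insertion order.
    canonical = json.dumps(properties, separators=(",", ":"), sort_keys=True)
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


schema_properties = {
    "platform": "snowflake",
    "instance": "prod_instance",
    "database": "analytics",
    "schema": "reporting",
}

guid = sketch_guid(schema_properties)
urn = f"urn:li:container:{guid}"
print(urn)
```

Because the keys are sorted before hashing, the same set of properties yields the same GUID regardless of the order in which they were supplied.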

Real-World Concepts

Containers represent various hierarchical structures in data platforms:

  • Databases: Top-level organizational units in relational systems (MySQL, PostgreSQL, Snowflake)
  • Schemas: Logical groupings within databases (Snowflake schemas, PostgreSQL schemas)
  • Projects: Organizational units in cloud platforms (BigQuery projects)
  • Datasets: Logical groupings in cloud platforms (BigQuery datasets)
  • Folders: Directory structures in file systems and data lakes (S3 folders, ADLS directories)
  • Buckets: Top-level storage containers in cloud object stores (S3 buckets, GCS buckets)
  • Workspaces: Organizational units in BI platforms (Power BI workspaces, Tableau sites)
  • Catalogs: Top-level organizational units in data catalogs (Unity Catalog, Iceberg catalogs)
  • Metastores: Storage metadata repositories (Hive metastore, Unity metastore)

Important Capabilities

Container Properties

The containerProperties aspect contains metadata inherited from the source system:

  • name: Display name of the container (required)
  • qualifiedName: Fully-qualified name (optional, e.g., "prod.analytics.reporting")
  • description: Description from the source system
  • env: Environment indicator (PROD, DEV, QA, etc.)
  • customProperties: Additional key-value properties from the source system
  • externalUrl: Link to the container in the source system
  • created: Timestamp when the container was created in the source system
  • lastModified: Timestamp when the container was last modified in the source system

Editable Container Properties

The editableContainerProperties aspect allows users to override or add information via the UI:

  • description: User-provided description that supplements or overrides the source system description

This separation ensures that metadata from source systems doesn't conflict with user-provided annotations.
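
One way to picture the separation is a lookup that prefers the user-provided value and falls back to the source system's value. The function below is a hypothetical sketch of that precedence, not DataHub's actual resolution logic.

```python
from typing import Optional


def effective_description(source: Optional[str], edited: Optional[str]) -> Optional[str]:
    """Prefer the user-edited description; fall back to the source system's."""
    # An empty or missing edited value means no override was provided.
    return edited if edited else source


print(effective_description("Raw reporting schema", None))
print(effective_description("Raw reporting schema", "Curated finance reports"))
```

Since the two values live in separate aspects, a later ingestion run can refresh the source description without touching the user's edit.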

Hierarchical Relationships

Containers support nested hierarchies through the container aspect, which links a container to its parent container. This enables multi-level organizational structures:

Platform (implicit)
└── Database Container
    └── Schema Container
        └── Dataset

For example, in Snowflake:

Snowflake Platform
└── ANALYTICS_DB (Database Container)
    └── REPORTING (Schema Container)
        ├── SALES_METRICS (Dataset)
        └── REVENUE_TABLE (Dataset)

Subtypes

The subTypes aspect specifies the type of container, which helps the UI render appropriate icons and behaviors. Common subtypes include:

  • Database: Relational database containers
  • Schema: Schema-level containers within databases
  • Project: Cloud project containers (GCP, Azure)
  • Dataset: BigQuery dataset containers
  • Folder: File system folders
  • Bucket: Object storage buckets
  • Workspace: BI platform workspaces
  • Catalog: Data catalog containers
  • Metastore: Metadata storage containers
  • MLflow Experiment (MLAssetSubTypes.MLFLOW_EXPERIMENT): ML experiment containers that organize training runs

ML Experiments as Containers

Machine learning experiments are modeled as containers with the MLFLOW_EXPERIMENT subtype. This pattern enables organizing related training runs (which are dataProcessInstance entities) into logical groups for comparison and tracking:

ML Experiment (Container)
├── Training Run 1 (DataProcessInstance)
├── Training Run 2 (DataProcessInstance)
└── Training Run 3 (DataProcessInstance)

Training runs belong to experiments through the container aspect. This structure mirrors common ML platform patterns (like MLflow) and enables:

  • Comparing metrics across multiple training attempts
  • Tracking the evolution of a model through iterations
  • Organizing training work by project or objective

For more information on ML experiments and training runs, see:

Containable Entities

The following entity types can be contained within a container:

  • Datasets
  • Charts
  • Dashboards
  • DataProcessInstances (e.g., training runs in ML experiments)
  • Other Containers (for nested hierarchies)

Code Examples

Create a Database Container

Python SDK: Create a database container
# metadata-ingestion/examples/library/container_create_database.py
from datahub.emitter.mcp_builder import DatabaseKey
from datahub.sdk import Container, DataHubClient

client = DataHubClient.from_env()

container = Container(
    container_key=DatabaseKey(
        platform="snowflake",
        instance="production",
        database="analytics_db",
    ),
    display_name="Analytics Database",
    description="Main analytics database containing reporting and metrics data",
    subtype="Database",
    external_url="https://app.snowflake.com/analytics_db",
    parent_container=None,
)

client.entities.upsert(container)

print(f"Created database container with URN: {container.urn}")

Create a Schema Container with Parent

Python SDK: Create a schema container with parent database
# metadata-ingestion/examples/library/container_create_schema.py
from datahub.emitter.mcp_builder import DatabaseKey, SchemaKey
from datahub.sdk import Container, DataHubClient

client = DataHubClient.from_env()

# First, create the database container
database_key = DatabaseKey(
    platform="snowflake",
    instance="production",
    database="analytics_db",
)

database_container = Container(
    container_key=database_key,
    display_name="Analytics Database",
    description="Main analytics database",
    subtype="Database",
)

client.entities.upsert(database_container)
print(f"Created database container: {database_container.urn}")

# Create a schema container within the database
schema_key = SchemaKey(
    platform="snowflake",
    instance="production",
    database="analytics_db",
    schema="reporting",
)

# parent_container is not passed here; the example relies on the parent
# database being derived from schema_key.
schema_container = Container(
    container_key=schema_key,
    display_name="Reporting Schema",
    description="Schema containing all reporting tables and views",
    subtype="Schema",
)

client.entities.upsert(schema_container)
print(f"Created schema container: {schema_container.urn}")
print("Schema container is nested under database container")

Add Metadata to a Container

Python SDK: Add tags, terms, and ownership to a container
# metadata-ingestion/examples/library/container_add_metadata.py
from datahub.emitter.mcp_builder import DatabaseKey
from datahub.sdk import ContainerUrn, CorpUserUrn, DataHubClient, DomainUrn, TagUrn

client = DataHubClient.from_env()

database_key = DatabaseKey(
    platform="snowflake",
    instance="production",
    database="analytics_db",
)

container = client.entities.get(ContainerUrn.from_string(database_key.as_urn()))

container.set_display_name("Analytics Database")
container.set_description(
    "Main analytics database containing reporting and metrics data"
)
container.set_subtype("Database")
container.set_external_url("https://app.snowflake.com/analytics_db")

container.set_tags([TagUrn("production"), TagUrn("analytics"), TagUrn("pii")])

container.set_terms(["urn:li:glossaryTerm:Finance.ReportingData"])

container.set_owners(
    [
        (CorpUserUrn("john.doe"), "DATAOWNER"),
        (CorpUserUrn("analytics-team"), "TECHNICAL_OWNER"),
    ]
)

container.set_domain(DomainUrn("Analytics"))

container.set_links(
    [
        (
            "https://wiki.company.com/analytics-db",
            "Database Documentation",
        ),
        (
            "https://jira.company.com/ANALYTICS-123",
            "Setup Ticket",
        ),
    ]
)

client.entities.update(container)

print(f"Updated container with comprehensive metadata: {container.urn}")
print(f" - Tags: {len(container.tags or [])} tags")
print(f" - Terms: {len(container.terms or [])} terms")
print(f" - Owners: {len(container.owners or [])} owners")
print(f" - Links: {len(container.links or [])} links")
print(f" - Domain: {container.domain}")

Query Container via REST API

Containers can be retrieved using the standard entity retrieval APIs:

Fetch container entity including all aspects
curl 'http://localhost:8080/entities/urn%3Ali%3Acontainer%3Ab5e95fce839e7d78151ed7e0a7420d84'

The response will include all aspects associated with the container, including properties, ownership, tags, terms, etc.

To find all entities within a container, use the relationships API:

Find all entities contained within a container
curl 'http://localhost:8080/relationships?direction=INCOMING&urn=urn%3Ali%3Acontainer%3Ab5e95fce839e7d78151ed7e0a7420d84&types=IsPartOf'

This returns all entities (datasets, charts, dashboards, sub-containers) that have this container as their parent.
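
The percent-encoded URNs in the curl commands above can be produced programmatically. The sketch below uses only the Python standard library to rebuild both URLs; the variable names are illustrative, and the host is the same localhost address used in the curl examples.

```python
from urllib.parse import quote, urlencode

GMS_BASE = "http://localhost:8080"  # same host as the curl examples
container_urn = "urn:li:container:b5e95fce839e7d78151ed7e0a7420d84"

# quote(..., safe="") percent-encodes the colons in the URN.
entity_url = f"{GMS_BASE}/entities/{quote(container_urn, safe='')}"

# urlencode percent-encodes each query value in the same way.
query = urlencode(
    {"direction": "INCOMING", "urn": container_urn, "types": "IsPartOf"}
)
relationships_url = f"{GMS_BASE}/relationships?{query}"

print(entity_url)
print(relationships_url)
```

Building the URLs this way avoids hand-encoding mistakes when the container URN comes from a variable rather than a literal.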

Integration Points

Relationship with Datasets

Datasets are the most common entities contained within containers. The relationship is established through the container aspect on the dataset, which points to the container URN.

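from datahub.sdk import Dataset

# schema_key is the SchemaKey defined in the schema container example above.
# Dataset links to its parent container (schema)
dataset = Dataset(
    platform="snowflake",
    name="analytics_db.reporting.sales_table",
    env="PROD",
    parent_container=schema_key,  # Links to schema container
)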

Hierarchical Navigation

Containers enable hierarchical navigation in the DataHub UI through parent-child relationships:

  1. Top-down browsing: Users can navigate from databases to schemas to tables
  2. Bottom-up breadcrumbs: Datasets show their parent containers in breadcrumb trails
  3. Browse paths: Containers are used to generate browse paths automatically
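
To make the bottom-up breadcrumb idea concrete, the sketch below walks parent links from an entity up to the top-level container. The in-memory PARENTS mapping is a hypothetical stand-in for the container aspect and reuses names from the Snowflake hierarchy shown earlier; it is not how DataHub stores or resolves these links.

```python
from typing import Dict, List, Optional

# Hypothetical child -> parent links (None marks a top-level container).
PARENTS: Dict[str, Optional[str]] = {
    "SALES_METRICS": "REPORTING",
    "REPORTING": "ANALYTICS_DB",
    "ANALYTICS_DB": None,
}


def breadcrumbs(entity: str, parents: Dict[str, Optional[str]]) -> List[str]:
    """Return ancestors ordered from the top-level container to the direct parent."""
    trail: List[str] = []
    current = parents.get(entity)
    # Assumes the parent links are acyclic; a cycle would loop forever.
    while current is not None:
        trail.append(current)
        current = parents.get(current)
    trail.reverse()
    return trail


print(" > ".join(breadcrumbs("SALES_METRICS", PARENTS)))
```

Top-down browsing is the inverse of this walk: starting from a container and listing every entity whose parent link points at it.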

GraphQL Resolvers

The container entity has specialized GraphQL resolvers:

  • ContainerEntitiesResolver: Retrieves all entities (datasets, charts, dashboards, sub-containers) within a container
  • ParentContainersResolver: Retrieves the full hierarchy of parent containers for any entity

These resolvers power the UI's hierarchical navigation and container overview pages.

Common Usage Patterns

  1. Database/Schema Hierarchy: Relational databases use Database and Schema containers
  2. Project/Dataset Hierarchy: BigQuery uses Project and Dataset containers
  3. Workspace/Folder Hierarchy: BI tools use Workspace containers for organization
  4. Bucket/Folder Hierarchy: Data lakes use Bucket and Folder containers
  5. Catalog/Schema Hierarchy: Modern catalogs (Unity, Iceberg) use Catalog and Schema containers

Notable Exceptions

GUID Stability

Container GUIDs must remain stable across ingestion runs. Since containers are identified by GUID rather than explicit properties in the URN, changing the GUID computation will create a new container entity instead of updating the existing one.

When creating custom containers, ensure that the properties used to generate the GUID are:

  • Stable across time
  • Unique within the platform
  • Derived from immutable source system identifiers
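
The effect of an unstable property can be seen in a small experiment. The sketch below uses a hypothetical sketch_guid helper (MD5 over canonical JSON, an assumption that may not match datahub_guid() exactly) and a made-up ingested_on field to show how a volatile value changes the resulting GUID.

```python
import hashlib
import json


def sketch_guid(properties: dict) -> str:
    """Hypothetical GUID helper: MD5 over a canonical JSON serialization."""
    canonical = json.dumps(properties, separators=(",", ":"), sort_keys=True)
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


stable_key = {
    "platform": "snowflake",
    "instance": "production",
    "database": "analytics_db",
}

# Re-running ingestion with the same key fields yields the same GUID.
first_run = sketch_guid(stable_key)
second_run = sketch_guid(dict(stable_key))

# Adding a volatile field (a hypothetical ingestion date) changes the GUID,
# which would surface as a brand-new container instead of an update.
unstable_key = {**stable_key, "ingested_on": "2024-01-01"}
drifted = sketch_guid(unstable_key)

print(first_run == second_run)
print(first_run == drifted)
```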

Self-Referential Containers

While containers can contain other containers, be careful not to create circular references. The parent-child relationship should form a directed acyclic graph (DAG), not a cycle.
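
A simple guard against circular references is to walk the parent chain and stop if a container is seen twice. The sketch below assumes a hypothetical in-memory mapping of child to parent; it illustrates the check and is not a DataHub API.

```python
from typing import Dict, Optional


def has_cycle(start: str, parents: Dict[str, Optional[str]]) -> bool:
    """Walk parent links from `start`; return True if any container repeats."""
    seen = set()
    current: Optional[str] = start
    while current is not None:
        if current in seen:
            return True
        seen.add(current)
        current = parents.get(current)
    return False


valid = {"schema": "database", "database": None}
circular = {"folder_a": "folder_b", "folder_b": "folder_a"}

print(has_cycle("schema", valid))
print(has_cycle("folder_a", circular))
```

Running a check like this before emitting container links keeps a bad parent assignment from reaching the metadata graph.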

Environment Handling

The env field in ContainerKey has special handling for backwards compatibility. In some sources, the platform instance was incorrectly set to the environment value. The backcompat_env_as_instance flag handles this case.

When using the env field:

  • Set it to a valid FabricType (PROD, DEV, QA, etc.)
  • Don't use it for platform instance identification
  • Use the separate instance field for multi-instance deployments

Platform Instance Association

Unlike datasets, which embed the platform instance in their URN, containers associate platform instances through the dataPlatformInstance aspect. This allows containers to be associated with specific instances of a data platform while maintaining a stable GUID.

Access Control

Containers support the access aspect, which can be used to model access control policies at the container level. This is particularly useful for:

  • Database-level permissions
  • Schema-level access control
  • Project-level authorization
  • Workspace-level security

Access controls set on containers can be inherited by contained entities, though this behavior depends on the specific platform's implementation.

Technical Reference Guide

The sections above provide an overview of how to use this entity. The following sections provide detailed technical information about how metadata is stored and represented in DataHub.

Aspects are the individual pieces of metadata that can be attached to an entity. Each aspect contains specific information (like ownership, tags, or properties) and is stored as a separate record, allowing for flexible and incremental metadata updates.

Relationships show how this entity connects to other entities in the metadata graph. These connections are derived from the fields within each aspect and form the foundation of DataHub's knowledge graph.

Reading the Field Tables

Each aspect's field table includes an Annotations column that provides additional metadata about how fields are used:

  • ⚠️ Deprecated: This field is deprecated and may be removed in a future version. Check the description for the recommended alternative
  • Searchable: This field is indexed and can be searched in DataHub's search interface
  • Searchable (fieldname): When the field name in parentheses is shown, it indicates the field is indexed under a different name in the search index. For example, dashboardTool is indexed as tool
  • → RelationshipName: This field creates a relationship to another entity. The arrow indicates this field contains a reference (URN) to another entity, and the name indicates the type of relationship (e.g., → Contains, → OwnedBy)

Fields with complex types (like Edge, AuditStamp) link to their definitions in the Common Types section below.

Aspects

containerProperties

Information about an Asset Container as received from a 3rd party source system

| Field | Type | Required | Description | Annotations |
|---|---|---|---|---|
| customProperties | map | | Custom property bag. | Searchable |
| externalUrl | string | | URL where the reference exists | Searchable |
| name | string | | Display name of the Asset Container | Searchable |
| qualifiedName | string | | Fully-qualified name of the Container | Searchable |
| description | string | | Description of the Asset Container as it exists inside a source system | Searchable |
| env | FabricType | | Environment for this flow | Searchable |
| created | TimeStamp | | A timestamp documenting when the asset was created in the source Data Platform (not on DataHub) | Searchable |
| lastModified | TimeStamp | | A timestamp documenting when the asset was last modified in the source Data Platform (not on Data... | Searchable |

editableContainerProperties

Editable information about an Asset Container as defined on the DataHub Platform

| Field | Type | Required | Description | Annotations |
|---|---|---|---|---|
| description | string | | Description of the Asset Container as it is received on the DataHub Platform | Searchable (editedDescription) |

dataPlatformInstance

The specific instance of the data platform that this entity belongs to

| Field | Type | Required | Description | Annotations |
|---|---|---|---|---|
| platform | string | | Data Platform | Searchable |
| instance | string | | Instance of the data platform (e.g. db instance) | Searchable (platformInstance) |

subTypes

Sub Types. Use this aspect to specialize a generic Entity e.g. Making a Dataset also be a View or also be a LookerExplore

| Field | Type | Required | Description | Annotations |
|---|---|---|---|---|
| typeNames | string[] | | The names of the specific types. | Searchable |

ownership

Ownership information of an entity.

| Field | Type | Required | Description | Annotations |
|---|---|---|---|---|
| owners | Owner[] | | List of owners of the entity. | |
| ownerTypes | map | | Ownership type to Owners map, populated via mutation hook. | Searchable |
| lastModified | AuditStamp | | Audit stamp containing who last modified the record and when. A value of 0 in the time field indi... | |

deprecation

Deprecation status of an entity

| Field | Type | Required | Description | Annotations |
|---|---|---|---|---|
| deprecated | boolean | | Whether the entity is deprecated. | Searchable |
| decommissionTime | long | | The time the user plans to decommission this entity. | |
| note | string | | Additional information about the entity deprecation plan, such as the wiki, doc, RB. | |
| actor | string | | The user URN which will be credited for modifying this deprecation content. | |
| replacement | string | | | |

container

Link from an asset to its parent container

| Field | Type | Required | Description | Annotations |
|---|---|---|---|---|
| container | string | | The parent container of an asset | Searchable, → IsPartOf |

globalTags

Tag aspect used for applying tags to an entity

| Field | Type | Required | Description | Annotations |
|---|---|---|---|---|
| tags | TagAssociation[] | | Tags associated with a given entity | Searchable, → TaggedWith |

glossaryTerms

Related business terms information

| Field | Type | Required | Description | Annotations |
|---|---|---|---|---|
| terms | GlossaryTermAssociation[] | | The related business terms | |
| auditStamp | AuditStamp | | Audit stamp containing who reported the related business term | |

institutionalMemory

Institutional memory of an entity. This is a way to link to relevant documentation and provide description of the documentation. Institutional or tribal knowledge is very important for users to leverage the entity.

| Field | Type | Required | Description | Annotations |
|---|---|---|---|---|
| elements | InstitutionalMemoryMetadata[] | | List of records that represent institutional memory of an entity. Each record consists of a link,... | |

browsePaths

Shared aspect containing Browse Paths to be indexed for an entity.

| Field | Type | Required | Description | Annotations |
|---|---|---|---|---|
| paths | string[] | | A list of valid browse paths for the entity. Browse paths are expected to be forward slash-separ... | Searchable |

status

The lifecycle status metadata of an entity, e.g. dataset, metric, feature, etc. This aspect is used to represent soft deletes conventionally.

| Field | Type | Required | Description | Annotations |
|---|---|---|---|---|
| removed | boolean | | Whether the entity has been removed (soft-deleted). | Searchable |

domains

Links from an Asset to its Domains

| Field | Type | Required | Description | Annotations |
|---|---|---|---|---|
| domains | string[] | | The Domains attached to an Asset | Searchable, → AssociatedWith |

applications

Links from an Asset to its Applications

| Field | Type | Required | Description | Annotations |
|---|---|---|---|---|
| applications | string[] | | The Applications attached to an Asset | Searchable, → AssociatedWith |

browsePathsV2

Shared aspect containing a Browse Path to be indexed for an entity.

| Field | Type | Required | Description | Annotations |
|---|---|---|---|---|
| path | BrowsePathEntry[] | | A valid browse path for the entity. This field is provided by DataHub by default. This aspect is ... | Searchable |

structuredProperties

Properties about an entity governed by StructuredPropertyDefinition

| Field | Type | Required | Description | Annotations |
|---|---|---|---|---|
| properties | StructuredPropertyValueAssignment[] | | Custom property bag. | |

forms

Forms that are assigned to this entity to be filled out

| Field | Type | Required | Description | Annotations |
|---|---|---|---|---|
| incompleteForms | FormAssociation[] | | All incomplete forms assigned to the entity. | Searchable |
| completedForms | FormAssociation[] | | All complete forms assigned to the entity. | Searchable |
| verifications | FormVerificationAssociation[] | | Verifications that have been applied to the entity via completed forms. | Searchable |

testResults

Information about a Test Result

| Field | Type | Required | Description | Annotations |
|---|---|---|---|---|
| failing | TestResult[] | | Results that are failing | Searchable, → IsFailing |
| passing | TestResult[] | | Results that are passing | Searchable, → IsPassing |

access

Aspect used for associating roles to a dataset or any asset

| Field | Type | Required | Description | Annotations |
|---|---|---|---|---|
| roles | RoleAssociation[] | | List of Roles which needs to be associated | |

Common Types

These types are used across multiple aspects in this entity.

AuditStamp

Data captured on a resource/association/sub-resource level giving insight into when that resource/association/sub-resource moved into a particular lifecycle stage, and who acted to move it into that specific lifecycle stage.

Fields:

  • time (long): When did the resource/association/sub-resource move into the specific lifecyc...
  • actor (string): The entity (e.g. a member URN) which will be credited for moving the resource...
  • impersonator (string?): The entity (e.g. a service URN) which performs the change on behalf of the Ac...
  • message (string?): Additional context around how DataHub was informed of the particular change. ...

FormAssociation

Properties of an applied form.

Fields:

  • urn (string): Urn of the applied form
  • incompletePrompts (FormPromptAssociation[]): A list of prompts that are not yet complete for this form.
  • completedPrompts (FormPromptAssociation[]): A list of prompts that have been completed for this form.

TestResult

Information about a Test Result

Fields:

  • test (string): The urn of the test
  • type (TestResultType): The type of the result
  • testDefinitionMd5 (string?): The md5 of the test definition that was used to compute this result. See Test...
  • lastComputed (AuditStamp?): The audit stamp of when the result was computed, including the actor who comp...

TimeStamp

A standard event timestamp

Fields:

  • time (long): When did the event occur
  • actor (string?): Optional: The actor urn involved in the event.

Relationships

Self

These are the relationships to itself, stored in this entity's aspects

  • IsPartOf (via container.container)

Outgoing

These are the relationships stored in this entity's aspects

  • OwnedBy
    • Corpuser via ownership.owners.owner
    • CorpGroup via ownership.owners.owner
  • ownershipType
    • OwnershipType via ownership.owners.typeUrn
  • TaggedWith
    • Tag via globalTags.tags
  • TermedWith
    • GlossaryTerm via glossaryTerms.terms.urn
  • AssociatedWith
    • Domain via domains.domains
    • Application via applications.applications
    • Role via access.roles.urn
  • IsFailing
    • Test via testResults.failing
  • IsPassing
    • Test via testResults.passing

Incoming

These are the relationships stored in other entities' aspects

  • IsPartOf
    • Dataset via container.container
    • DataJob via container.container
    • DataFlow via container.container
    • DataProcessInstance via container.container
    • Chart via container.container
    • Dashboard via container.container
