Container
Why Would You Use Containers?
The Container entity represents a logical grouping of entities, such as datasets, data processing instances, or even other containers. It helps users organize and manage metadata in a hierarchical structure, making it easier to navigate and understand relationships between different entities.
How is a Container related to other entities?
Parent-Child Relationship : Containers can contain other entities such as datasets, charts, dashboards, data jobs, ML models, and even other containers (nested containers). For example, a dataset can have a container aspect that links it to the schema or folder (container) it belongs to.
Hierarchical Organization : Containers can be nested, forming a hierarchy (e.g., a database container contains schema containers, which contain table containers). This enables a folder-like browsing experience in the DataHub UI.
Relationships in the Metadata Model : Many entities (datasets, data jobs, ML models, etc.) have a container aspect that links them to their parent container. Containers themselves can have a parentContainer aspect for nesting.
Goal Of This Guide
This guide will show you how to
- Create a container
Prerequisites
For this tutorial, you need to deploy DataHub Quickstart and ingest sample data. For detailed steps, please refer to Datahub Quickstart Guide.
Create Container
- Python
# Inlined from /metadata-ingestion/examples/library/create_container.py
from datahub.emitter.mcp_builder import ContainerKey
from datahub.sdk import Container, DataHubClient
client = DataHubClient.from_env()
# datajob will inherit the platform and platform instance from the flow
container = Container(
container_key=ContainerKey(platform="mlflow", name="airline_forecast_experiment"),
display_name="Airline Forecast Experiment",
)
client.entities.upsert(container)