
Demo Data

Overview

Demo Data is a DataHub utility source: rather than extracting metadata from an external system, it loads curated sample metadata (data packs) into a DataHub instance.

The DataHub integration for Demo Data covers metadata entities and operational objects relevant to this connector. Depending on module capabilities, it can also capture features such as lineage, usage, profiling, ownership, tags, and stateful deletion detection.

Concept Mapping

A source-specific concept mapping is not yet available for this connector; the table below shows DataHub's generic concept mapping.

| Source Concept | DataHub Concept | Notes |
| --- | --- | --- |
| Platform/account/project scope | Platform Instance, Container | Organizes assets within the platform context. |
| Core technical asset (for example table/view/topic/file) | Dataset | Primary ingested technical asset. |
| Schema fields / columns | SchemaField | Included when schema extraction is supported. |
| Ownership and collaboration principals | CorpUser, CorpGroup | Emitted by modules that support ownership and identity metadata. |
| Dependencies and processing relationships | Lineage edges | Available when lineage extraction is supported and enabled. |

Module demo-data

Important Capabilities

Capability metadata is not explicitly declared for this module. Refer to module documentation and configuration sections below.

Overview

The demo-data source loads curated data packs into DataHub. By default it loads the bootstrap sample data (datasets, dashboards, users, and tags) with original timestamps.

It also supports loading named packs from the DataHub registry (e.g. showcase-ecommerce, covid-bigquery) or custom URLs, with optional time-shifting to make timestamps appear recent.

Use this source for demos, testing, or bootstrapping a DataHub instance with realistic metadata.

Prerequisites

A running DataHub instance. No external credentials are required, and the only network access needed is to the pack URL.

Install the Plugin

pip install 'acryl-datahub[demo-data]'

Starter Recipe

Check out the following recipe to get started with ingestion! See below for full configuration options.

For general pointers on writing and running a recipe, see our main recipe guide.

# Zero-config: load bootstrap sample data
source:
  type: demo-data
  config: {}

# Or load a specific data pack with time-shifting:
# source:
#   type: demo-data
#   config:
#     pack_name: "showcase-ecommerce"
#     no_time_shift: false
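
The same recipe can also be expressed as a Python dictionary and run programmatically. The sketch below is illustrative rather than the documented way to run this source: it assumes the acryl-datahub package is installed, and the helper names build_recipe and run_recipe, the datahub-rest sink, and the server URL http://localhost:8080 are assumptions made for the example.

```python
# Sketch only: build a demo-data recipe as a dict and run it from Python.
# build_recipe / run_recipe are hypothetical helper names, and the server
# URL is an assumed local DataHub endpoint; adjust both to your setup.


def build_recipe(pack_name="showcase-ecommerce", time_shift=True,
                 server="http://localhost:8080"):
    """Return a recipe dict equivalent to the YAML starter recipe."""
    return {
        "source": {
            "type": "demo-data",
            "config": {
                "pack_name": pack_name,
                # The config flag is inverted: no_time_shift=False enables shifting.
                "no_time_shift": not time_shift,
            },
        },
        "sink": {"type": "datahub-rest", "config": {"server": server}},
    }


def run_recipe(recipe):
    """Run the recipe with DataHub's ingestion Pipeline (requires acryl-datahub)."""
    # Imported lazily so that building a recipe does not require the package.
    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(recipe)
    pipeline.run()
    pipeline.raise_from_status()


# Usage (needs a reachable DataHub instance):
#   run_recipe(build_recipe())
```

Keeping the recipe construction separate from the run step makes it possible to inspect or log the configuration before anything is sent to DataHub.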

Config Details

Note that a . is used to denote nested fields in the YAML recipe.

| Field | Type | Description | Default |
| --- | --- | --- | --- |
| as_of | One of string, null | ISO 8601 datetime to use as the time-shift target (default: current time). | None |
| no_cache | boolean | Force re-download even if the pack is cached. | False |
| no_time_shift | boolean | If true, load with original timestamps (no time-shifting). | True |
| pack_name | One of string, null | Name of a data pack from the registry (e.g. 'bootstrap', 'showcase-ecommerce'). | bootstrap |
| pack_url | One of string, null | HTTP(S) URL to an MCP/MCE JSON file. Use instead of pack_name for custom packs. | None |
| trust_community | boolean | Allow loading community-contributed packs without warning. | False |
| trust_custom | boolean | Allow loading from unverified URLs without warning. | False |

Capabilities

  • Loads any named pack from the DataHub registry (bootstrap, showcase-ecommerce, covid-bigquery)
  • Loads custom packs from arbitrary HTTP(S) URLs via pack_url
  • Time-shifts timestamps so ingested metadata appears current (set no_time_shift: false)
  • SHA256 integrity verification for registry packs
  • Local caching to avoid repeated downloads
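
To make the integrity check concrete, here is a minimal sketch of SHA256 verification over a downloaded pack. It is not the source's actual implementation: the function names sha256_hex and verify_pack are hypothetical, and where the expected checksum comes from (for example a registry entry) is an assumption.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the lowercase hex SHA256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def verify_pack(data: bytes, expected_sha256: str) -> bool:
    """Check downloaded pack bytes against an expected checksum.

    The expected value is assumed to be a hex string published alongside
    the pack; comparison ignores case and surrounding whitespace.
    """
    return sha256_hex(data) == expected_sha256.strip().lower()


# Usage: compute the checksum once, then verify later downloads against it.
payload = b'[{"example": "pack contents"}]'
checksum = sha256_hex(payload)
print(verify_pack(payload, checksum))         # True: bytes are unchanged
print(verify_pack(payload + b" ", checksum))  # False: any change alters the digest
```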

Limitations

  • Data packs are read-only collections of MCPs; they cannot be modified before loading.
  • Time-shifting adjusts all temporal fields by a fixed offset — relative ordering is preserved but absolute times may not match real-world events.
  • Custom URL packs (pack_url) bypass SHA256 verification unless the pack includes a checksum.
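
The fixed-offset behaviour described above can be sketched as follows. This is an illustration under stated assumptions, not the source's code: the function name time_shift is hypothetical, timestamps are taken to be epoch milliseconds, and the offset is assumed to be chosen so that the newest timestamp lands on the as_of target.

```python
from datetime import datetime, timezone


def time_shift(timestamps_ms, as_of=None):
    """Shift epoch-millisecond timestamps by one fixed offset.

    Assumption: the offset moves the newest timestamp to `as_of`
    (defaulting to the current time). Because every value moves by the
    same amount, gaps and relative ordering are preserved.
    """
    if not timestamps_ms:
        return []
    target = as_of if as_of is not None else datetime.now(timezone.utc)
    target_ms = int(target.timestamp() * 1000)
    offset = target_ms - max(timestamps_ms)
    return [t + offset for t in timestamps_ms]


# Usage: three events, the newest of which is moved to 2024-01-01 UTC.
original = [1_000, 5_000, 3_000]
shifted = time_shift(original, as_of=datetime(2024, 1, 1, tzinfo=timezone.utc))
print(shifted)
```

The gaps between events are unchanged after shifting, which is why relative ordering survives while absolute times no longer correspond to when the events really happened.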

Troubleshooting

  • "Pack not found": Verify the pack name with datahub datapack list. Pack names are case-sensitive.
  • Trust errors: Community and custom packs require explicit opt-in via trust_community: true or trust_custom: true.
  • Download failures: Check network connectivity to the pack URL. Use no_cache: true to force a fresh download if a cached file is corrupted.

Code Coordinates

  • Class Name: datahub.ingestion.source.demo_data.DemoDataSource
Questions?

If you've got any questions on configuring ingestion for Demo Data, feel free to ping us on our Slack.
