Version: Next

DataHubMockData

Testing

This source is for generating mock data for testing purposes. Expect breaking changes as we iterate on the mock data source.

CLI-based Ingestion

Config Details

Note that a period (.) in a field name denotes a nested field in the YAML recipe.
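To make the dot notation concrete, the following Python sketch expands dotted field names into the nested mapping that would sit under the source's config section of a recipe. The expand_dotted helper is hypothetical and written only for illustration. DataHub is not assumed to expose anything like it, and the values shown are examples rather than recommendations.

```python
# Hypothetical helper (not part of DataHub): expands dotted field names
# such as "gen_1.lineage_hops" into the nested mapping a YAML recipe uses.
from typing import Any, Dict


def expand_dotted(flat: Dict[str, Any]) -> Dict[str, Any]:
    """Turn {"a.b": 1} into {"a": {"b": 1}}."""
    nested: Dict[str, Any] = {}
    for dotted_key, value in flat.items():
        parts = dotted_key.split(".")
        node = nested
        # Walk (and create) an intermediate mapping for every part but the last.
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return nested


# Field names come from the table; the values are illustrative only.
config = expand_dotted(
    {
        "enabled": True,
        "gen_1.emit_lineage": True,
        "gen_1.lineage_fan_out": 3,
        "gen_1.lineage_hops": 2,
    }
)
print(config)
# {'enabled': True, 'gen_1': {'emit_lineage': True, 'lineage_fan_out': 3, 'lineage_hops': 2}}
```

In YAML, the same result is written by indenting emit_lineage, lineage_fan_out and lineage_hops beneath a gen_1 key.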

Each field below is followed by its type, a description, and its default value where one exists.

enabled (boolean)
Whether this source is enabled.
Default: True

gen_1 (LineageConfigGen1)
Configuration for lineage data generation.

gen_1.emit_lineage (boolean)
Whether to emit lineage data for testing purposes. When False, no lineage data is generated, regardless of the other settings.
Default: False

gen_1.level_subtypes (map(str,string))

gen_1.lineage_fan_out (integer)
Number of downstream tables that each upstream table connects to. This controls the 'width' of the lineage graph: higher values create more parallel downstream tables per level.
Default: 3

gen_1.lineage_fan_out_after_first_hop (integer)
Optional limit on fan-out for hops after the first hop. When set, it prevents exponential growth by limiting the number of downstream tables per upstream table at levels 2 and beyond. When None, the standard exponential growth (lineage_fan_out^level) applies.

gen_1.lineage_hops (integer)
Number of hops (levels) in the lineage graph. This controls the 'depth' of the lineage graph. Level 0 is the root table, and each subsequent level contains downstream tables. Higher values create deeper lineage chains.
Default: 2

gen_1.subtype_pattern (Enum)
Pattern for determining SubTypes. Options: 'alternating', 'all_table', 'all_view', 'level_based'.
Default: alternating
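The fan-out and hop settings together determine how large the generated lineage graph becomes. The Python sketch below estimates the number of tables at each level under two stated assumptions: that lineage_hops counts the levels after the root (so levels 0 through lineage_hops exist), and that lineage_fan_out_after_first_hop, when set, replaces lineage_fan_out as the per-table fan-out from level 2 onward. The tables_per_level function is hypothetical and is not the source's actual implementation. It only reproduces the lineage_fan_out^level growth described above and shows how the cap changes it.

```python
from typing import List, Optional


def tables_per_level(
    lineage_fan_out: int = 3,
    lineage_hops: int = 2,
    lineage_fan_out_after_first_hop: Optional[int] = None,
) -> List[int]:
    """Hypothetical estimate of how many tables sit at each level 0..lineage_hops.

    Assumptions (not confirmed by the docs): level 0 is a single root table,
    and when lineage_fan_out_after_first_hop is set, every table at level >= 1
    connects to exactly that many downstream tables.
    """
    counts = [1]  # level 0: the root table
    for level in range(1, lineage_hops + 1):
        if level == 1 or lineage_fan_out_after_first_hop is None:
            per_upstream = lineage_fan_out  # uncapped: fan_out ** level overall
        else:
            per_upstream = lineage_fan_out_after_first_hop  # capped from level 2 on
        counts.append(counts[-1] * per_upstream)
    return counts


print(tables_per_level())  # documented defaults: [1, 3, 9]
print(tables_per_level(lineage_fan_out=3, lineage_hops=4))  # [1, 3, 9, 27, 81]
print(tables_per_level(lineage_fan_out=3, lineage_hops=4,
                       lineage_fan_out_after_first_hop=1))  # [1, 3, 3, 3, 3]
```

Under these assumptions, four hops with a fan-out of 3 yields 121 tables in total, while capping later hops at 1 keeps the graph at 13 tables. That difference is why the cap is useful for deep but manageable test graphs.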

Code Coordinates

  • Class Name: datahub.ingestion.source.mock_data.datahub_mock_data.DataHubMockDataSource

Questions

If you have any questions about configuring ingestion for DataHubMockData, feel free to ping us on our Slack.