Automatic Lineage Extraction

DataHub supports automatic table- and column-level lineage detection from BigQuery, Snowflake, dbt, Looker, PowerBI, and 20+ modern data tools. For data tools with limited native lineage tracking, DataHub's SQL Parser detects lineage with 97-99% accuracy, so teams get high-quality lineage graphs across all corners of their data stack.

Types of Lineage Connections

The types of lineage connections supported in DataHub, along with example code, are as follows.

Automatic Lineage Extraction Support

This is a summary of automatic lineage extraction support across DataHub's data sources. For details, refer to the Important Capabilities table in each source's documentation. Note that even if a source does not support automatic extraction, you can still add lineage manually using our API & SDKs.

Source | Table-Level Lineage | Column-Level Lineage | Related Configs
ABS Data Lake
Athena
- incremental_lineage
- include_table_location_lineage
- include_view_lineage
- include_view_column_lineage
BigQuery
- enable_stateful_lineage_ingestion
- incremental_lineage
- include_table_location_lineage
- include_view_lineage
- include_view_column_lineage
- gcs_lineage_config
- lineage_use_sql_parser
- lineage_sql_parser_use_raw_names
- extract_column_lineage
- extract_lineage_from_catalog
- include_table_lineage
- include_column_lineage_with_gcs
- upstream_lineage_in_report
Business Glossary
Cassandra
ClickHouse (clickhouse)
- incremental_lineage
- include_table_location_lineage
- include_view_lineage
- include_view_column_lineage
- include_table_lineage
ClickHouse (clickhouse-usage)
CockroachDB
- incremental_lineage
- include_table_location_lineage
- include_view_lineage
- include_view_column_lineage
CSV Enricher
Databricks
- incremental_lineage
- include_table_location_lineage
- include_view_lineage
- include_view_column_lineage
- include_table_lineage
- include_external_lineage
- include_column_lineage
- column_lineage_column_limit
DataHub
DataHubApply
DataHubDebug
DataHubGc
DataHubMockData
dbt (dbt)
- incremental_lineage
- prefer_sql_parser_lineage
- skip_sources_in_lineage
- include_column_lineage
dbt (dbt-cloud)
- incremental_lineage
- prefer_sql_parser_lineage
- skip_sources_in_lineage
- include_column_lineage
Delta Lake
Dremio
- include_query_lineage
Druid
- incremental_lineage
- include_table_location_lineage
- include_view_lineage
- include_view_column_lineage
Elasticsearch
Feast
File Based Lineage
Fivetran
- include_column_lineage
Glue
- emit_s3_lineage
- glue_s3_lineage_direction
- include_column_lineage
Google Cloud Storage
Grafana
Hex
Hive
- incremental_lineage
- include_view_lineage
- include_view_column_lineage
- emit_storage_lineage
- hive_storage_lineage_direction
- include_column_lineage
Hive Metastore (hive-metastore)
- incremental_lineage
- include_table_location_lineage
- include_view_column_lineage
Hive Metastore (presto-on-hive)
- incremental_lineage
- include_table_location_lineage
- include_view_column_lineage
Kafka
Kafka Connect
- convert_lineage_urns_to_lowercase
Looker (looker)
- extract_column_level_lineage
Looker (lookml)
- extract_column_level_lineage
MariaDB
- incremental_lineage
- include_table_location_lineage
- include_view_lineage
- include_view_column_lineage
Metabase
Metadata File
Microsoft SQL Server
- incremental_lineage
- include_table_location_lineage
- include_view_lineage
- include_view_column_lineage
- include_lineage
MLflow
Mode
MongoDB
MySQL
- incremental_lineage
- include_table_location_lineage
- include_view_lineage
- include_view_column_lineage
Neo4j
NiFi
- incremental_lineage
Okta
Oracle
- incremental_lineage
- include_table_location_lineage
- include_view_lineage
- include_view_column_lineage
Postgres
- incremental_lineage
- include_table_location_lineage
- include_view_lineage
- include_view_column_lineage
PowerBI
- incremental_lineage
- extract_lineage
- convert_lineage_urns_to_lowercase
- enable_advance_lineage_sql_construct
- extract_column_level_lineage
PowerBI Report Server
Preset
Presto
- incremental_lineage
- include_table_location_lineage
- include_view_lineage
- include_view_column_lineage
- ingest_lineage_to_connectors
Qlik Sense
Redash
Redshift
- enable_stateful_lineage_ingestion
- incremental_lineage
- s3_lineage_config
- include_table_location_lineage
- include_view_lineage
- include_view_column_lineage
- use_lineage_v2
- lineage_v2_generate_queries
- include_table_lineage
- include_copy_lineage
- include_share_lineage
- include_unload_lineage
- include_table_rename_lineage
- table_lineage_mode
- extract_column_level_lineage
- resolve_temp_table_in_lineage
S3 / Local Files
SageMaker
Salesforce
SAP Analytics Cloud
- incremental_lineage
SAP HANA
- incremental_lineage
- include_table_location_lineage
- include_view_lineage
- include_view_column_lineage
Sigma
- extract_lineage
- workbook_lineage_pattern
Slack
Snowflake
- enable_stateful_lineage_ingestion
- incremental_lineage
- include_table_location_lineage
- include_view_lineage
- include_view_column_lineage
- include_table_lineage
- ignore_start_time_lineage
- upstream_lineage_in_report
- include_column_lineage
SQL Queries
Superset
Tableau
- extract_column_level_lineage
- lineage_overrides
- extract_lineage_from_unsupported_custom_sql_queries
- force_extraction_of_lineage_from_custom_sql_queries
Teradata
- incremental_lineage
- include_table_location_lineage
- include_view_lineage
- include_view_column_lineage
- include_historical_lineage
- include_table_lineage
Trino (trino)
- incremental_lineage
- include_table_location_lineage
- include_view_lineage
- include_view_column_lineage
- ingest_lineage_to_connectors
Trino (starburst-trino-usage)
Vertex AI
Vertica
- incremental_lineage
- include_table_location_lineage
- include_view_lineage
- include_view_column_lineage
- include_projection_lineage
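
The configs listed in the table are set under a source's config block in an ingestion recipe. As a minimal sketch, assuming a Snowflake source, the recipe below is written as a Python dict and turns on two of the flags listed for Snowflake. The connection values are placeholders, `run_recipe` is a hypothetical helper name, and running it assumes the `acryl-datahub` package with the Snowflake plugin is installed. Defaults and deprecations for individual flags vary by version, so check the source's documentation before relying on them.

```python
import os

# A recipe expressed as a plain dict; the same structure can be written as YAML.
# All connection values below are placeholders for illustration only.
recipe = {
    "source": {
        "type": "snowflake",
        "config": {
            "account_id": "my_account",
            "username": "datahub_user",
            "password": os.environ.get("SNOWFLAKE_PASSWORD", ""),
            "warehouse": "COMPUTE_WH",
            # Lineage flags taken from the Snowflake entry in the table above.
            "include_table_lineage": True,
            "include_column_lineage": True,
        },
    },
    "sink": {
        "type": "datahub-rest",
        "config": {"server": "http://localhost:8080"},
    },
}


def run_recipe(recipe_dict):
    """Hypothetical helper: run an ingestion recipe through the DataHub SDK."""
    # Imported lazily so the recipe can be inspected without the SDK installed.
    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(recipe_dict)
    pipeline.run()
    pipeline.raise_from_status()
```

Calling `run_recipe(recipe)` would start the ingestion. Keeping the recipe as data makes it easy to toggle a single lineage flag per environment.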

SQL Parser Lineage Extraction

If you're using a different database system for which we don't support column-level lineage out of the box, but you do have a database query log available, we have a SQL queries connector that generates column-level lineage and detailed table usage statistics from the query log.
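
As a rough sketch of how a query log could be prepared for the SQL queries connector, the snippet below writes one JSON object per line and builds a matching recipe. The field names (`query`, `timestamp`, `user`), the `sql-queries` source type, and the `query_file` and `platform` options are assumptions based on the connector's documentation and should be checked against your DataHub version. The helper names and sample query are hypothetical.

```python
import json
from pathlib import Path


def write_query_log(path, entries):
    """Write one JSON object per line, the layout assumed for the query file."""
    with open(path, "w", encoding="utf-8") as f:
        for entry in entries:
            f.write(json.dumps(entry) + "\n")
    return path


def build_sql_queries_recipe(query_file, platform):
    """Build a recipe dict pointing the connector at the query file."""
    return {
        "source": {
            "type": "sql-queries",
            "config": {"query_file": str(query_file), "platform": platform},
        },
        "sink": {
            "type": "datahub-rest",
            "config": {"server": "http://localhost:8080"},
        },
    }


# Hypothetical sample entry: an INSERT ... SELECT whose lineage the parser can infer.
sample_entries = [
    {
        "query": (
            "INSERT INTO analytics.orders_summary "
            "SELECT customer_id, SUM(amount) AS total "
            "FROM raw.orders GROUP BY customer_id"
        ),
        "timestamp": 1700000000,  # seconds since the epoch
        "user": "etl_user",
    }
]
```

In practice the entries would be exported from the database's query history rather than written by hand.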

If this does not suit your needs, you can use the new DataHubGraph.parse_sql_lineage() method in our SDK. (See the source code here)
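
A minimal sketch of calling `DataHubGraph.parse_sql_lineage()` from Python is shown below. It assumes a reachable DataHub server (the URL is a placeholder), that the method accepts `platform`, `default_db`, and `default_schema` keyword arguments, and that the result exposes `in_tables`, `out_tables`, and `column_lineage`. Verify these against the SDK source for your version. `parse_lineage` and `summarize` are hypothetical helper names.

```python
def parse_lineage(sql, platform, server="http://localhost:8080",
                  default_db=None, default_schema=None):
    """Hypothetical wrapper: parse one SQL statement via the DataHub SDK."""
    # Imported lazily so this module loads even without the SDK installed.
    from datahub.ingestion.graph.client import DatahubClientConfig, DataHubGraph

    graph = DataHubGraph(DatahubClientConfig(server=server))
    return graph.parse_sql_lineage(
        sql,
        platform=platform,
        default_db=default_db,
        default_schema=default_schema,
    )


def summarize(result):
    """Flatten a parsing result into plain Python values for logging."""
    return {
        "upstreams": list(result.in_tables),
        "downstreams": list(result.out_tables),
        # column_lineage may be None when column-level parsing is unavailable.
        "column_edges": len(result.column_lineage or []),
    }
```

For example, `summarize(parse_lineage("INSERT INTO b SELECT id FROM a", platform="snowflake"))` would list the upstream and downstream dataset URNs, provided the server can resolve the table schemas.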

For more information, refer to Extracting Column-Level Lineage from SQL.

Our Roadmap

We're actively working on expanding lineage support for new data sources. Visit our Official Roadmap for upcoming updates!

References