Automatic Lineage Extraction
DataHub supports automatic table- and column-level lineage detection from BigQuery, Snowflake, dbt, Looker, PowerBI, and 20+ other modern data tools. For data tools with limited native lineage tracking, DataHub's SQL parser detects lineage with 97-99% accuracy, so teams get high-quality lineage graphs across every corner of their data stack.
Types of Lineage Connections
The types of lineage connections supported in DataHub, along with example code, are as follows.
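As a minimal sketch of one connection type, dataset-to-dataset lineage at the table level, the Python below builds the lineage payload by hand using only the standard library. It assumes the dataset URN format `urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>)` and the shape of the `upstreamLineage` aspect; the helper functions and table names are hypothetical. In practice you would use the SDK's `make_dataset_urn`, `UpstreamClass`, and `UpstreamLineageClass` and send the result with `DatahubRestEmitter`.

```python
import json


def dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    # Hypothetical helper mirroring the dataset URN format that the SDK's
    # make_dataset_urn() produces.
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


def table_lineage_aspect(upstream_urns: list[str]) -> dict:
    # Plain-dict shape of an UpstreamLineage aspect with table-level edges only.
    # "TRANSFORMED" is one of DataHub's DatasetLineageType values.
    return {"upstreams": [{"dataset": urn, "type": "TRANSFORMED"} for urn in upstream_urns]}


# Hypothetical tables: order_report is built from orders and customers.
orders = dataset_urn("snowflake", "analytics.public.orders")
customers = dataset_urn("snowflake", "analytics.public.customers")
report = dataset_urn("snowflake", "analytics.public.order_report")

# Lineage is attached to the downstream dataset, which lists its upstreams.
lineage = {report: table_lineage_aspect([orders, customers])}
print(json.dumps(lineage, indent=2))
```

Attaching the edge to the downstream dataset means each dataset owns the list of its own inputs, so re-emitting that one aspect replaces the dataset's upstreams in a single write.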
Automatic Lineage Extraction Support
This is a summary of automatic lineage extraction support across our data sources. For details, refer to the Important Capabilities table in each source's documentation. Even if a source does not support automatic extraction, you can still add lineage manually using our API and SDKs.
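When adding column-level lineage manually, the lineage aspect also carries `fineGrainedLineages` entries that map upstream schema fields to a downstream field. The standard-library sketch below assumes the `urn:li:schemaField:(<dataset urn>,<field path>)` URN format and the `FIELD_SET` / `FIELD` type values; all helper names, datasets, and columns are hypothetical. With the SDK, the corresponding building blocks are `make_schema_field_urn` and `FineGrainedLineageClass`.

```python
def dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    # Hypothetical helper mirroring the SDK's dataset URN format.
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


def field_urn(dataset: str, field_path: str) -> str:
    # Hypothetical helper mirroring the schemaField URN format.
    return f"urn:li:schemaField:({dataset},{field_path})"


def column_lineage(upstream_fields: list[str], downstream_field: str) -> dict:
    # One fine-grained edge: a set of upstream columns feeds one downstream column.
    return {
        "upstreamType": "FIELD_SET",
        "upstreams": upstream_fields,
        "downstreamType": "FIELD",
        "downstreams": [downstream_field],
    }


orders = dataset_urn("postgres", "shop.public.orders")
report = dataset_urn("postgres", "shop.public.order_report")

aspect = {
    # Table-level edge, kept alongside the column-level detail.
    "upstreams": [{"dataset": orders, "type": "TRANSFORMED"}],
    "fineGrainedLineages": [
        # order_report.total_amount is derived from orders.amount and orders.tax.
        column_lineage(
            [field_urn(orders, "amount"), field_urn(orders, "tax")],
            field_urn(report, "total_amount"),
        )
    ],
}
```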
| Source | Table-Level Lineage | Column-Level Lineage | Related Configs |
| --- | --- | --- | --- |
| ABS Data Lake | ❌ | ❌ |  |
| Athena | ✅ | ✅ | incremental_lineage, include_table_location_lineage, include_view_lineage, include_view_column_lineage |
| BigQuery | ✅ | ✅ | enable_stateful_lineage_ingestion, incremental_lineage, include_table_location_lineage, include_view_lineage, include_view_column_lineage, gcs_lineage_config, lineage_use_sql_parser, lineage_sql_parser_use_raw_names, extract_column_lineage, extract_lineage_from_catalog, include_table_lineage, include_column_lineage_with_gcs, upstream_lineage_in_report |
| Business Glossary | ❌ | ❌ |  |
| Cassandra | ❌ | ❌ |  |
| ClickHouse (clickhouse) | ✅ | ✅ | incremental_lineage, include_table_location_lineage, include_view_lineage, include_view_column_lineage, include_table_lineage |
| ClickHouse (clickhouse-usage) | ❌ | ❌ |  |
| CockroachDB | ✅ | ✅ | incremental_lineage, include_table_location_lineage, include_view_lineage, include_view_column_lineage |
| CSV Enricher | ❌ | ❌ |  |
| Databricks | ✅ | ✅ | incremental_lineage, include_table_location_lineage, include_view_lineage, include_view_column_lineage, include_table_lineage, include_external_lineage, include_column_lineage, column_lineage_column_limit |
| DataHub | ❌ | ❌ |  |
| DataHubApply | ❌ | ❌ |  |
| DataHubDebug | ❌ | ❌ |  |
| DataHubGc | ❌ | ❌ |  |
| DataHubMockData | ❌ | ❌ |  |
| dbt (dbt) | ✅ | ✅ | incremental_lineage, prefer_sql_parser_lineage, skip_sources_in_lineage, include_column_lineage |
| dbt (dbt-cloud) | ✅ | ✅ | incremental_lineage, prefer_sql_parser_lineage, skip_sources_in_lineage, include_column_lineage |
| Delta Lake | ❌ | ❌ |  |
| Dremio | ✅ | ❌ | include_query_lineage |
| Druid | ✅ | ✅ | incremental_lineage, include_table_location_lineage, include_view_lineage, include_view_column_lineage |
| Elasticsearch | ❌ | ❌ |  |
| Feast | ✅ | ❌ |  |
| File Based Lineage | ✅ | ✅ |  |
| Fivetran | ❌ | ✅ | include_column_lineage |
| Glue | ✅ | ✅ | emit_s3_lineage, glue_s3_lineage_direction, include_column_lineage |
| Google Cloud Storage | ❌ | ❌ |  |
| Grafana | ❌ | ❌ |  |
| Hex | ❌ | ❌ |  |
| Hive | ✅ | ✅ | incremental_lineage, include_view_lineage, include_view_column_lineage, emit_storage_lineage, hive_storage_lineage_direction, include_column_lineage |
| Hive Metastore (hive-metastore) | ❌ | ✅ | incremental_lineage, include_table_location_lineage, include_view_column_lineage |
| Hive Metastore (presto-on-hive) | ❌ | ✅ | incremental_lineage, include_table_location_lineage, include_view_column_lineage |
| Kafka | ❌ | ❌ |  |
| Kafka Connect | ✅ | ❌ | convert_lineage_urns_to_lowercase |
| Looker (looker) | ✅ | ✅ | extract_column_level_lineage |
| Looker (lookml) | ✅ | ✅ | extract_column_level_lineage |
| MariaDB | ✅ | ✅ | incremental_lineage, include_table_location_lineage, include_view_lineage, include_view_column_lineage |
| Metabase | ✅ | ❌ |  |
| Metadata File | ❌ | ❌ |  |
| Microsoft SQL Server | ✅ | ✅ | incremental_lineage, include_table_location_lineage, include_view_lineage, include_view_column_lineage, include_lineage |
| MLflow | ❌ | ❌ |  |
| Mode | ✅ | ✅ |  |
| MongoDB | ❌ | ❌ |  |
| MySQL | ✅ | ✅ | incremental_lineage, include_table_location_lineage, include_view_lineage, include_view_column_lineage |
| Neo4j | ❌ | ❌ |  |
| NiFi | ✅ | ❌ | incremental_lineage |
| Okta | ❌ | ❌ |  |
| Oracle | ✅ | ✅ | incremental_lineage, include_table_location_lineage, include_view_lineage, include_view_column_lineage |
| Postgres | ✅ | ✅ | incremental_lineage, include_table_location_lineage, include_view_lineage, include_view_column_lineage |
| PowerBI | ✅ | ✅ | incremental_lineage, extract_lineage, convert_lineage_urns_to_lowercase, enable_advance_lineage_sql_construct, extract_column_level_lineage |
| PowerBI Report Server | ❌ | ❌ |  |
| Preset | ✅ | ❌ |  |
| Presto | ✅ | ✅ | incremental_lineage, include_table_location_lineage, include_view_lineage, include_view_column_lineage, ingest_lineage_to_connectors |
| Qlik Sense | ✅ | ✅ |  |
| Redash | ✅ | ❌ |  |
| Redshift | ✅ | ✅ | enable_stateful_lineage_ingestion, incremental_lineage, s3_lineage_config, include_table_location_lineage, include_view_lineage, include_view_column_lineage, use_lineage_v2, lineage_v2_generate_queries, include_table_lineage, include_copy_lineage, include_share_lineage, include_unload_lineage, include_table_rename_lineage, table_lineage_mode, extract_column_level_lineage, resolve_temp_table_in_lineage |
| S3 / Local Files | ❌ | ❌ |  |
| SageMaker | ✅ | ❌ |  |
| Salesforce | ❌ | ❌ |  |
| SAP Analytics Cloud | ✅ | ❌ | incremental_lineage |
| SAP HANA | ✅ | ✅ | incremental_lineage, include_table_location_lineage, include_view_lineage, include_view_column_lineage |
| Sigma | ✅ | ❌ | extract_lineage, workbook_lineage_pattern |
| Slack | ❌ | ❌ |  |
| Snowflake | ✅ | ✅ | enable_stateful_lineage_ingestion, incremental_lineage, include_table_location_lineage, include_view_lineage, include_view_column_lineage, include_table_lineage, ignore_start_time_lineage, upstream_lineage_in_report, include_column_lineage |
| SQL Queries | ✅ | ✅ |  |
| Superset | ✅ | ❌ |  |
| Tableau | ✅ | ✅ | extract_column_level_lineage, lineage_overrides, extract_lineage_from_unsupported_custom_sql_queries, force_extraction_of_lineage_from_custom_sql_queries |
| Teradata | ✅ | ✅ | incremental_lineage, include_table_location_lineage, include_view_lineage, include_view_column_lineage, include_historical_lineage, include_table_lineage |
| Trino (trino) | ✅ | ✅ | incremental_lineage, include_table_location_lineage, include_view_lineage, include_view_column_lineage, ingest_lineage_to_connectors |
| Trino (starburst-trino-usage) | ❌ | ❌ |  |
| Vertex AI | ❌ | ❌ |  |
| Vertica | ✅ | ✅ | incremental_lineage, include_table_location_lineage, include_view_lineage, include_view_column_lineage, include_projection_lineage |
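The Related Configs column lists options you set in a source's ingestion recipe. As a sketch, here is a Snowflake recipe written as a Python dict that turns on the four lineage options named in the Snowflake row; the account, credentials, warehouse, and server URL are placeholders, not real values. The same structure can be written as YAML for `datahub ingest -c`, or passed to `Pipeline.create(recipe).run()` from `datahub.ingestion.run.pipeline`; that call is left commented out so the sketch runs without DataHub installed.

```python
import json

recipe = {
    "source": {
        "type": "snowflake",
        "config": {
            # Connection placeholders; replace with real values.
            "account_id": "my_account",
            "username": "datahub_user",
            "password": "change-me",
            "warehouse": "COMPUTE_WH",
            # Lineage options taken from the Snowflake row of the table.
            "include_table_lineage": True,
            "include_column_lineage": True,
            "include_view_lineage": True,
            "include_view_column_lineage": True,
        },
    },
    "sink": {"type": "datahub-rest", "config": {"server": "http://localhost:8080"}},
}

# Quick sanity check: which lineage switches are turned on?
lineage_flags = sorted(
    key for key, value in recipe["source"]["config"].items()
    if "lineage" in key and value is True
)
print(json.dumps(recipe, indent=2))

# With DataHub installed, the same dict could be run with:
#   from datahub.ingestion.run.pipeline import Pipeline
#   Pipeline.create(recipe).run()
```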
SQL Parser Lineage Extraction
If you're using a database system for which we don't support column-level lineage out of the box, but you do have a query log available, our SQL Queries connector can generate column-level lineage and detailed table usage statistics from that log.
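To feed a query log to that connector, you first export it to a file. The sketch below assumes the connector reads newline-delimited JSON in which each object has a `query`, a `timestamp` (seconds since the epoch), and a `user`, and that the recipe points at the file through `platform` and `query_file` options; check the SQL Queries source documentation before relying on these names. The raw log rows and their field names (`sql`, `ts`, `who`) are hypothetical stand-ins for whatever your database exports.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical rows exported from a database's query history.
raw_log = [
    {"sql": "INSERT INTO sales.daily SELECT * FROM sales.orders", "ts": 1700000000.0, "who": "etl_bot"},
    {"sql": "SELECT region, SUM(amount) FROM sales.daily GROUP BY region", "ts": 1700000600.0, "who": "analyst"},
]


def write_query_file(rows: list[dict], path: Path) -> int:
    """Write one JSON object per line using the assumed field names."""
    with path.open("w", encoding="utf-8") as fh:
        for row in rows:
            record = {"query": row["sql"], "timestamp": row["ts"], "user": row["who"]}
            fh.write(json.dumps(record) + "\n")
    return len(rows)


out = Path(tempfile.mkdtemp()) / "queries.json"
count = write_query_file(raw_log, out)

# Assumed recipe fragment for the connector:
#   source:
#     type: sql-queries
#     config:
#       platform: postgres
#       query_file: /path/to/queries.json
```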
If this does not suit your needs, you can use the new DataHubGraph.parse_sql_lineage() method in our SDK (see the method's source code in the DataHub repository).
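To make "table-level lineage from SQL" concrete, here is a deliberately naive, regex-based sketch. It is not how DataHub does it: the SDK method parses full SQL dialects with a sqlglot-based parser and also returns column-level lineage, whereas this toy only handles a simple `INSERT INTO ... SELECT ... FROM/JOIN` statement. The function name is hypothetical. With the SDK, the rough equivalent is `graph.parse_sql_lineage(sql, platform="...")`, whose result exposes the input and output tables as `in_tables` and `out_tables`.

```python
import re

# Naive patterns: the target of INSERT INTO, and every table after FROM or JOIN.
TARGET = re.compile(r"\bINSERT\s+INTO\s+([\w.]+)", re.IGNORECASE)
SOURCES = re.compile(r"\b(?:FROM|JOIN)\s+([\w.]+)", re.IGNORECASE)


def naive_table_lineage(sql: str) -> tuple[list[str], list[str]]:
    """Return (upstream tables, downstream tables) for a simple INSERT ... SELECT."""
    downstream = TARGET.findall(sql)
    # A table that is written to is not counted as its own upstream.
    upstream = sorted(set(SOURCES.findall(sql)) - set(downstream))
    return upstream, downstream


up, down = naive_table_lineage(
    "INSERT INTO mart.order_report "
    "SELECT o.id, c.name FROM sales.orders o JOIN sales.customers c ON o.cid = c.id"
)
print(up, down)
```

A regex approach breaks down quickly on CTEs, subqueries, quoted identifiers, and dialect-specific syntax, which is exactly why a real parser is used for production lineage.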
For more information, refer to the Extracting Column-Level Lineage from SQL blog post.
We're actively working on expanding lineage support for new data sources. Visit our Official Roadmap for upcoming updates!