Hex

Overview

Hex is a collaborative data workspace where teams build interactive notebooks combining SQL, Python, and visualizations.

The DataHub integration emits Hex Projects (Dashboards) and Components (Charts) along with workspace containers, ownership, tags from Collections/Status/Categories, usage statistics, and upstream lineage to the warehouses Hex queries. It also emits per-project run history, and per-project context documents for AI agent retrieval (opt-in via include_context_documents). Upstream lineage is produced directly from Hex's own APIs (SQL parsing by default; Hex's queriedTables API can be enabled on Hex Enterprise workspaces) — no warehouse ingestion dependency is required.

Concept Mapping

Hex Concept	DataHub Concept	Notes
`"hex"`	Data Platform
Workspace	Container	Parent container for all projects and components in the workspace.
Project	Dashboard	Subtype `Project`. Carries usage statistics, last refresh time from run history, and upstream lineage edges to warehouse datasets.
Component	Chart	Subtype `Component`. Reusable shared cell group with its own visualization; linked to importing projects via `DashboardInfo.charts`.
Collection	Tag	Emitted as `hex:collection:<name>` when `collections_as_tags` is enabled.
Status	Tag	Emitted as `hex:status:<name>` when `status_as_tag` is enabled.
Category	Tag	Emitted as `hex:category:<name>` when `categories_as_tags` is enabled.
Project Doc	Document	One per Project and per Component when `include_context_documents` is enabled. Hidden from global search; linked to the Dashboard/Chart for AI agents.

Other Hex concepts are not mapped to DataHub entities yet.

Module `hex`

Important Capabilities

Capability	Status	Notes
Asset Containers	✅	Enabled by default.
Column-level Lineage	✅	Column-level lineage via SQL parsing when datahub-api is configured. The graph-backed SchemaResolver fetches table schemas from DataHub on demand to expand SELECT * and resolve column references. Graceful degradation to dataset-level when datahub-api is absent.
Dataset Usage	✅	Supported by default. Supported for types - Project.
Descriptions	✅	Supported by default.
Detect Deleted Entities	✅	Enabled by default via stateful ingestion.
Extract Ownership	✅	Supported by default.
Extract Tags	✅	Status, categories, and collections emitted as tags.
Platform Instance	✅	Enabled by default.
Table-Level Lineage	✅	Enabled by default via queriedTables API (Hex Enterprise workspaces) or SQL parsing from cells (all Hex tiers). Applied to both projects and components. Unpublished entities always use SQL parsing. No warehouse ingestion dependency required.

Overview

The hex module ingests Hex Projects, Components, workspaces, and upstream lineage directly from the Hex REST API.

Prerequisites

Workspace Name

Open the workspace switcher dropdown in the top-left corner of the Hex app — the workspace name (and its slug) is shown next to each workspace entry. Use the slug value for workspace_name.

Authentication

The connector authenticates with a Hex Workspace token issued from Settings → API → Workspace tokens. Grant the token these read-only scopes:

Projects → Read access — list projects/components and read their detail and run history.
Cells → Read access — read SQL cells for lineage and context documents.
Read project queried tables — lineage from Hex's pre-resolved table list. Available on Hex Enterprise workspaces only; skip this scope on lower Hex tiers — the connector falls back to SQL parsing.
Data connections → Read access — map each Hex connection to its warehouse platform/database/schema.
Users → Read access — optional, only needed to auto-discover the workspace (org) UUID used in external URLs. Skip this scope and set workspace_id in the recipe instead.

No write scopes are required — the connector never modifies state in Hex.

Personal Access Tokens (PATs) also work but ingest with the issuing user's permissions, so projects the user cannot see in Hex will be skipped. Workspace tokens are recommended for production ingestion. See the Hex API overview for the full list of token types.

Lineage URN Alignment

Upstream URNs are built from Hex's /v1/data-connections response — platform, database, and schema all come from there. Configure connection_platform_map (keyed by Hex dataConnectionId) in two cases:

the upstream warehouse was ingested under a platform_instance — set the matching platform_instance so the URNs collide with the warehouse-ingested ones,
a Hex connection's type is unrecognized (deleted, custom, or the token lacks scope on /v1/data-connections) — set platform explicitly so its cells aren't skipped.

See Connection Platform Resolution in the sections below for the full configuration shape.

Install the Plugin

pip install 'acryl-datahub[hex]'

Starter Recipe

Check out the following recipe to get started with ingestion! See below for full configuration options.

For general pointers on writing and running a recipe, see our main recipe guide.

source:
  type: hex
  config:
    # Hex workspace name — find it in the workspace switcher dropdown in the top-left corner of the Hex app
    workspace_name: my-workspace
    workspace_id: id  # optional override for workspace ID (UUID); if not set, the source will call the Hex API to fetch it
    token: "${HEX_TOKEN}"

    # (Optional) platform_instance / env for the Hex side (Dashboards, Charts).
    # platform_instance: prod_hex
    # env: PROD

    # (Optional) Feature toggles — all default to true. Uncomment to opt out.
    # include_components: false
    # include_lineage: false
    # include_run_history: false
    # set_ownership_from_email: false
    # collections_as_tags: false
    # status_as_tag: false
    # categories_as_tags: false

    # (Optional) Emit a DataHub Document per Project and per Component for
    # AI agent retrieval. Off by default — opt in if you use AI agents and
    # want context documents in your catalog.
    # include_context_documents: true

    # (Optional) Hex Enterprise workspaces only — use Hex's queriedTables API
    # as the primary lineage source for published projects/components.
    # Defaults to false (SQL-cell parsing for everything).
    # use_queried_tables_lineage: true

    # (Optional) Match the platform_instance under which the upstream warehouses
    # were ingested. Required so Hex's lineage URNs collide with the
    # warehouse-ingested ones. Keyed by Hex dataConnectionId (UUID).
    # connection_platform_map:
    #   "8f3a1c2d-4b5e-6789-abcd-ef0123456789":
    #     platform: snowflake
    #     platform_instance: prod_snowflake
    #     default_database: ANALYTICS
    #     default_schema: PUBLIC
    #   "1a2b3c4d-5e6f-7890-abcd-1234567890ab":
    #     platform: bigquery
    #     default_database: my-gcp-project
    #     default_schema: analytics

    # (Optional) Filter projects and components by title or category.
    # project_title_pattern:
    #   allow:
    #     - "^Production .*"
    # component_title_pattern:
    #   allow:
    #     - "^Shared .*"
    # category_pattern:
    #   deny:
    #     - "^Sandbox$"

    # (Optional) Cap projects per run — useful for staged rollouts.
    # WARNING: with stateful_ingestion enabled, projects beyond the limit are
    # soft-deleted on the next run.
    # max_projects: 50

    # Enable stale-entity removal (projects deleted in Hex are soft-deleted in DataHub).
    stateful_ingestion:
      enabled: true

# sink configs — see https://docs.datahub.com/docs/metadata-ingestion/sink_docs/datahub
sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"

Config Details

Options
Schema

Note that a . is used to denote nested fields in the YAML recipe.

Field	Description
token ✅ string(password)	Hex Workspace Token with the 'Read projects' scope. Create one at Settings → API → Workspace tokens. The 'Read projects' scope is required to access project cells for lineage; tokens without it can enumerate projects but not read their content. See https://learn.hex.tech/docs/api-integrations/api/overview for token types.
workspace_name ✅ string	Hex workspace name. Find it in the workspace switcher dropdown in the top-left corner of the Hex app.
base_url string	Hex API base URL. For most Hex users, this will be https://app.hex.tech/api/v1. Single-tenant app users should replace this with the URL they use to access Hex. Default: https://app.hex.tech/api/v1
categories_as_tags boolean	Emit Hex Category as tags Default: True
collections_as_tags boolean	Emit Hex Collections as tags Default: True
include_components boolean	Include Hex Components in the ingestion Default: True
include_context_documents boolean	Emit a DataHub Document per Project and per Component containing SQL sources, visualisation metadata, and notebook documentation. Documents are hidden from global search and linked to the Dashboard/Chart for AI agent retrieval. Default: False
include_lineage boolean	Extract upstream lineage. Uses queriedTables API (Hex Enterprise workspaces) or falls back to parsing SQL from cells (all workspaces). No warehouse ingestion dependency required. Default: True
include_run_history boolean	Emit the most recent COMPLETED run as a DashboardInfo PATCH setting lastRefreshed. Default: True
max_projects One of integer, null	Maximum number of projects to process. Useful for testing or staged rollouts. Components discovered during project processing are not counted. Defaults to None (process all projects). WARNING: with stateful ingestion enabled, projects beyond this limit are soft-deleted on the next run. Default: None
page_size integer	Number of items to fetch per Hex API call. Default: 100
patch_metadata boolean	Emit metadata as patch events Default: False
platform_instance One of string, null	The instance of the platform that all assets produced by this recipe belong to. This should be unique within the platform. See https://docs.datahub.com/docs/platform-instances/ for more details. Default: None
set_ownership_from_email boolean	Set ownership identity from owner/creator email Default: True
status_as_tag boolean	Emit Hex Status as tags Default: True
use_queried_tables_lineage boolean	Use Hex's queriedTables API (Hex Enterprise workspaces only) as the primary lineage source for published projects and components. Unpublished entities always fall back to SQL-cell parsing since queriedTables is only populated for published runs. Set to False to force SQL-cell parsing for everything. Default: False
workspace_id One of string, null	Hex workspace (org) UUID, used to build external URLs to the Hex app (e.g. https://app.hex.tech/<workspace_id>/hex/<project_id>). If left unset, the connector calls /users/me to auto-discover it — which requires the token to have 'Users → Read access'. Set this explicitly to avoid granting that scope. Find the UUID in any Hex project URL. Default: None
env string	The environment that all assets produced by this connector belong to Default: PROD
category_pattern AllowDenyPattern	A class to store allow deny regexes
category_pattern.ignoreCase One of boolean, null	Whether to ignore case sensitivity during pattern matching. Default: True
component_title_pattern AllowDenyPattern	A class to store allow deny regexes
component_title_pattern.ignoreCase One of boolean, null	Whether to ignore case sensitivity during pattern matching. Default: True
connection_platform_map map(str,HexConnectionDetail)	Per-connection override for upstream lineage URN construction.
connection_platform_map.`key`.platform One of string, null	DataHub platform name. Required only when Hex's connection type cannot be auto-resolved (deleted connections, permission gaps, custom types). Default: None
connection_platform_map.`key`.default_database One of string, null	Default outer-scope qualifier for unqualified table refs in SQL cells. For BigQuery this is the GCP project ID; for Snowflake/Postgres/Redshift/MSSQL the database; for Trino/Databricks/Presto the catalog. Leave empty for 2-part platforms (MySQL/MariaDB/Clickhouse) — set only `default_schema` there. Overrides the value auto-extracted from Hex's /v1/data-connections response. Default: None
connection_platform_map.`key`.default_schema One of string, null	Default inner-scope qualifier for unqualified table refs in SQL cells. For BigQuery this is the dataset; for Snowflake/Postgres/Redshift/MSSQL/Trino/Databricks/Presto/Athena the schema; for MySQL/MariaDB/Clickhouse the database name. Overrides the value auto-extracted from Hex's /v1/data-connections response. Default: None
connection_platform_map.`key`.platform_instance One of string, null	DataHub platform_instance the underlying warehouse was ingested under. Leave unset for warehouses ingested without one (e.g. typical BigQuery). Default: None
project_title_pattern AllowDenyPattern	A class to store allow deny regexes
project_title_pattern.ignoreCase One of boolean, null	Whether to ignore case sensitivity during pattern matching. Default: True
stateful_ingestion One of StatefulStaleMetadataRemovalConfig, null	Configuration for stateful ingestion and stale metadata removal. Default: None
stateful_ingestion.enabled boolean	Whether or not to enable stateful ingest. Default: True if a pipeline_name is set and either a datahub-rest sink or `datahub_api` is specified, otherwise False Default: False
stateful_ingestion.fail_safe_threshold number	Prevents large amount of soft deletes & the state from committing from accidental changes to the source configuration if the relative change percent in entities compared to the previous state is above the 'fail_safe_threshold'. Default: 75.0
stateful_ingestion.remove_stale_metadata boolean	Soft-deletes the entities present in the last successful run but missing in the current run with stateful_ingestion enabled. Default: True

The JSONSchema for this configuration is inlined below.

{
  "$defs": {
    "AllowDenyPattern": {
      "additionalProperties": false,
      "description": "A class to store allow deny regexes",
      "properties": {
        "allow": {
          "default": [
            ".*"
          ],
          "description": "List of regex patterns to include in ingestion",
          "items": {
            "type": "string"
          },
          "title": "Allow",
          "type": "array"
        },
        "deny": {
          "default": [],
          "description": "List of regex patterns to exclude from ingestion.",
          "items": {
            "type": "string"
          },
          "title": "Deny",
          "type": "array"
        },
        "ignoreCase": {
          "anyOf": [
            {
              "type": "boolean"
            },
            {
              "type": "null"
            }
          ],
          "default": true,
          "description": "Whether to ignore case sensitivity during pattern matching.",
          "title": "Ignorecase"
        }
      },
      "title": "AllowDenyPattern",
      "type": "object"
    },
    "HexConnectionDetail": {
      "additionalProperties": false,
      "description": "Per-connection override for upstream lineage URN construction.",
      "properties": {
        "platform": {
          "anyOf": [
            {
              "type": "string"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "description": "DataHub platform name. Required only when Hex's connection type cannot be auto-resolved (deleted connections, permission gaps, custom types).",
          "title": "Platform"
        },
        "platform_instance": {
          "anyOf": [
            {
              "type": "string"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "description": "DataHub platform_instance the underlying warehouse was ingested under. Leave unset for warehouses ingested without one (e.g. typical BigQuery).",
          "title": "Platform Instance"
        },
        "default_database": {
          "anyOf": [
            {
              "type": "string"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "description": "Default outer-scope qualifier for unqualified table refs in SQL cells. For BigQuery this is the GCP project ID; for Snowflake/Postgres/Redshift/MSSQL the database; for Trino/Databricks/Presto the catalog. Leave empty for 2-part platforms (MySQL/MariaDB/Clickhouse) \u2014 set only `default_schema` there. Overrides the value auto-extracted from Hex's /v1/data-connections response.",
          "title": "Default Database"
        },
        "default_schema": {
          "anyOf": [
            {
              "type": "string"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "description": "Default inner-scope qualifier for unqualified table refs in SQL cells. For BigQuery this is the dataset; for Snowflake/Postgres/Redshift/MSSQL/Trino/Databricks/Presto/Athena the schema; for MySQL/MariaDB/Clickhouse the database name. Overrides the value auto-extracted from Hex's /v1/data-connections response.",
          "title": "Default Schema"
        }
      },
      "title": "HexConnectionDetail",
      "type": "object"
    },
    "StatefulStaleMetadataRemovalConfig": {
      "additionalProperties": false,
      "description": "Base specialized config for Stateful Ingestion with stale metadata removal capability.",
      "properties": {
        "enabled": {
          "default": false,
          "description": "Whether or not to enable stateful ingest. Default: True if a pipeline_name is set and either a datahub-rest sink or `datahub_api` is specified, otherwise False",
          "title": "Enabled",
          "type": "boolean"
        },
        "remove_stale_metadata": {
          "default": true,
          "description": "Soft-deletes the entities present in the last successful run but missing in the current run with stateful_ingestion enabled.",
          "title": "Remove Stale Metadata",
          "type": "boolean"
        },
        "fail_safe_threshold": {
          "default": 75.0,
          "description": "Prevents large amount of soft deletes & the state from committing from accidental changes to the source configuration if the relative change percent in entities compared to the previous state is above the 'fail_safe_threshold'.",
          "maximum": 100.0,
          "minimum": 0.0,
          "title": "Fail Safe Threshold",
          "type": "number"
        }
      },
      "title": "StatefulStaleMetadataRemovalConfig",
      "type": "object"
    }
  },
  "additionalProperties": false,
  "properties": {
    "env": {
      "default": "PROD",
      "description": "The environment that all assets produced by this connector belong to",
      "title": "Env",
      "type": "string"
    },
    "platform_instance": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "The instance of the platform that all assets produced by this recipe belong to. This should be unique within the platform. See https://docs.datahub.com/docs/platform-instances/ for more details.",
      "title": "Platform Instance"
    },
    "stateful_ingestion": {
      "anyOf": [
        {
          "$ref": "#/$defs/StatefulStaleMetadataRemovalConfig"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Configuration for stateful ingestion and stale metadata removal."
    },
    "workspace_name": {
      "description": "Hex workspace name. Find it in the workspace switcher dropdown in the top-left corner of the Hex app.",
      "title": "Workspace Name",
      "type": "string"
    },
    "workspace_id": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Hex workspace (org) UUID, used to build external URLs to the Hex app (e.g. https://app.hex.tech/<workspace_id>/hex/<project_id>). If left unset, the connector calls /users/me to auto-discover it \u2014 which requires the token to have 'Users \u2192 Read access'. Set this explicitly to avoid granting that scope. Find the UUID in any Hex project URL.",
      "title": "Workspace Id"
    },
    "token": {
      "description": "Hex Workspace Token with the 'Read projects' scope. Create one at Settings \u2192 API \u2192 Workspace tokens. The 'Read projects' scope is required to access project cells for lineage; tokens without it can enumerate projects but not read their content. See https://learn.hex.tech/docs/api-integrations/api/overview for token types.",
      "format": "password",
      "title": "Token",
      "type": "string",
      "writeOnly": true
    },
    "base_url": {
      "default": "https://app.hex.tech/api/v1",
      "description": "Hex API base URL. For most Hex users, this will be https://app.hex.tech/api/v1. Single-tenant app users should replace this with the URL they use to access Hex.",
      "title": "Base Url",
      "type": "string"
    },
    "include_components": {
      "default": true,
      "description": "Include Hex Components in the ingestion",
      "title": "Include Components",
      "type": "boolean"
    },
    "page_size": {
      "default": 100,
      "description": "Number of items to fetch per Hex API call.",
      "minimum": 1,
      "title": "Page Size",
      "type": "integer"
    },
    "patch_metadata": {
      "default": false,
      "description": "Emit metadata as patch events",
      "title": "Patch Metadata",
      "type": "boolean"
    },
    "collections_as_tags": {
      "default": true,
      "description": "Emit Hex Collections as tags",
      "title": "Collections As Tags",
      "type": "boolean"
    },
    "status_as_tag": {
      "default": true,
      "description": "Emit Hex Status as tags",
      "title": "Status As Tag",
      "type": "boolean"
    },
    "categories_as_tags": {
      "default": true,
      "description": "Emit Hex Category as tags",
      "title": "Categories As Tags",
      "type": "boolean"
    },
    "project_title_pattern": {
      "$ref": "#/$defs/AllowDenyPattern",
      "default": {
        "allow": [
          ".*"
        ],
        "deny": [],
        "ignoreCase": true
      },
      "description": "Regex pattern for project titles to filter in ingestion."
    },
    "component_title_pattern": {
      "$ref": "#/$defs/AllowDenyPattern",
      "default": {
        "allow": [
          ".*"
        ],
        "deny": [],
        "ignoreCase": true
      },
      "description": "Regex pattern for component titles to filter in ingestion."
    },
    "set_ownership_from_email": {
      "default": true,
      "description": "Set ownership identity from owner/creator email",
      "title": "Set Ownership From Email",
      "type": "boolean"
    },
    "include_lineage": {
      "default": true,
      "description": "Extract upstream lineage. Uses queriedTables API (Hex Enterprise workspaces) or falls back to parsing SQL from cells (all workspaces). No warehouse ingestion dependency required.",
      "title": "Include Lineage",
      "type": "boolean"
    },
    "use_queried_tables_lineage": {
      "default": false,
      "description": "Use Hex's queriedTables API (Hex Enterprise workspaces only) as the primary lineage source for published projects and components. Unpublished entities always fall back to SQL-cell parsing since queriedTables is only populated for published runs. Set to False to force SQL-cell parsing for everything.",
      "title": "Use Queried Tables Lineage",
      "type": "boolean"
    },
    "connection_platform_map": {
      "additionalProperties": {
        "$ref": "#/$defs/HexConnectionDetail"
      },
      "description": "Per-connection lineage configuration, keyed by Hex dataConnectionId (UUID). Pins platform and platform_instance so upstream URNs match the warehouse's ingestion. Example: {\"<uuid>\": {\"platform\": \"snowflake\", \"platform_instance\": \"prod_snowflake\"}}",
      "title": "Connection Platform Map",
      "type": "object"
    },
    "include_run_history": {
      "default": true,
      "description": "Emit the most recent COMPLETED run as a DashboardInfo PATCH setting lastRefreshed.",
      "title": "Include Run History",
      "type": "boolean"
    },
    "include_context_documents": {
      "default": false,
      "description": "Emit a DataHub Document per Project and per Component containing SQL sources, visualisation metadata, and notebook documentation. Documents are hidden from global search and linked to the Dashboard/Chart for AI agent retrieval.",
      "title": "Include Context Documents",
      "type": "boolean"
    },
    "category_pattern": {
      "$ref": "#/$defs/AllowDenyPattern",
      "default": {
        "allow": [
          ".*"
        ],
        "deny": [],
        "ignoreCase": true
      },
      "description": "Regex pattern for categories to filter in ingestion. This will exclude any project or component that has any category denied or not explicitly allowed."
    },
    "max_projects": {
      "anyOf": [
        {
          "minimum": 1,
          "type": "integer"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Maximum number of projects to process. Useful for testing or staged rollouts. Components discovered during project processing are not counted. Defaults to None (process all projects). WARNING: with stateful ingestion enabled, projects beyond this limit are soft-deleted on the next run.",
      "title": "Max Projects"
    }
  },
  "required": [
    "workspace_name",
    "token"
  ],
  "title": "HexSourceConfig",
  "type": "object"
}

Capabilities

Upstream Lineage

Lineage is tiered, with both tiers opt-out via include_lineage: false:

Tier 1 — queriedTables (Hex Enterprise workspaces only, opt-in via use_queried_tables_lineage: true): Hex's own runtime-proven table list for published projects and components, served by /v1/projects/{id}/queriedTables. Unpublished entities always fall back to Tier 2 since queriedTables is only populated for published runs. A 403 (non-Enterprise Hex workspace) falls back to Tier 2 for everything and emits a warning.
Tier 2 — SQL parsing via sqlglot (all workspaces, default): each cell is parsed with its connection's dialect.

Both tiers resolve warehouse URNs via /v1/data-connections (platform + default database/schema), overridable per-connection via connection_platform_map. For projects that import components, native project SQL is separated from inlined component SQL via the export API so component lineage isn't attributed twice. Cells whose dataConnectionId cannot be resolved are skipped with a structured warning — see Missing Upstream Lineage for triage.

Connection Platform Resolution

Hex's /v1/data-connections endpoint returns a type field that the connector maps to a DataHub platform via CONNECTION_TYPE_TO_DATAHUB_PLATFORM. Default database/schema qualifiers come from the same response.

Configure connection_platform_map (keyed by Hex dataConnectionId UUID) when:

The warehouse was ingested under a platform_instance — set the matching value so URNs collide.
The connection is deleted, permission-gapped, or a custom type — set platform explicitly so its cells aren't skipped.

Example:

connection_platform_map:
  "8f3a1c2d-4b5e-6789-abcd-ef0123456789":
    platform: snowflake
    platform_instance: prod_snowflake
    default_database: ANALYTICS
    default_schema: PUBLIC
  "1a2b3c4d-5e6f-7890-abcd-1234567890ab":
    platform: bigquery
    default_database: my-gcp-project

Migration from `query_fetcher`

Earlier versions of this connector derived lineage by querying DataHub for prior Hex-emitted query metadata (query_fetcher.py). That path has been removed: lineage now comes from SQL parsing of cells by default, or from Hex's queriedTables API when use_queried_tables_lineage: true is set on a Hex Enterprise workspace.

The following config fields fed only the old path and are now removed — drop them from your recipe (the connector will emit a warning if they are still present):

lineage_start_time
lineage_end_time
datahub_page_size

Migration: Components are now Charts

Components were previously emitted as Dashboard entities (subtype Component); they are now Chart entities, linked from their Project's DashboardInfo.charts. This changes their URN entity type, so any saved views, glossary/tag/ownership assignments, and policies that targeted the old Dashboard-typed Component URNs are lost and must be manually reapplied to the new Chart URNs.

Legacy Dashboard-typed Components left over from the old version are soft-deleted by stale-entity removal when stateful_ingestion was enabled on the old run. Because every Component changes URN type, a component-heavy workspace can exceed the stale-removal fail-safe (fail_safe_threshold, default 75%); if that happens, raise the threshold or perform a one-time bulk cleanup via the DataHub UI or CLI.

Stale Entity Removal

Enable by configuring stateful_ingestion. Projects deleted in Hex are soft-deleted in DataHub on the next run.

max_projects caps projects per run. With stateful_ingestion enabled, projects beyond the limit are treated as stale and soft-deleted — only set it if that is the intended behavior.

Context Documents

Opt-in via include_context_documents: true. When enabled, the connector emits a DataHub Document per Project and per Component containing SQL sources, visualization metadata, and notebook documentation.

Run History

When include_run_history is enabled (default), the most recent scheduled run is emitted as an Operation aspect, and last_run_status / last_run_elapsed_seconds are written to the project's custom properties — ERRORED runs surface there so operators can see failures. Only COMPLETED runs additionally update DashboardInfo.lastRefreshed via a targeted PATCH, so projects with sustained failures keep their last known-good refresh time as a freshness signal.

Usage Statistics

Each Project and Component emits an all-time viewsCount and a rolling 7-day window with lastViewedAt. Hex counts app views only when the published app is accessed — unpublished drafts have no view counts, so usage statistics are only emitted for published Projects and Components.

Limitations

queriedTables requires a Hex Enterprise workspace and opt-in. Defaults to SQL parsing; enable use_queried_tables_lineage on Hex Enterprise workspaces to use Hex's API as the primary source.
Non-SQL query paths produce no lineage. SQL parsing cannot recover table references from hextoolkit Python cells, dynamic SQL built from variables, or parameterized table names — the resulting projects will be missing those upstreams.
Context documents are not a complete mirror of the Hex notebook. Only a subset of cell types is captured, so the rendered document will not match the source notebook exactly.
Upstream lineage may be missing or mismatched when Hex's /v1/data-connections metadata is incomplete or uses an unrecognized connectionDetails shape. Without default_database / default_schema, neither SQL parsing nor queriedTables can assemble fully-qualified URNs; without the right platform_instance, URNs won't align with the warehouse ingestion. Set the affected dataConnectionId under connection_platform_map with the correct platform_instance / default_database / default_schema, or report the new connection shape to the DataHub team so the parser can be updated.

Troubleshooting

If ingestion fails, validate credentials, permissions, connectivity, and scope filters first, then review ingestion logs for source-specific errors.

Missing Upstream Lineage

The source report lists every skipped cell with its dataConnectionId and a reason (missing_connection_id or unresolved_platform). For each unresolved connection, add an entry under connection_platform_map and re-run. Cells with no dataConnectionId are non-SQL cells or cells without a Hex connection assigned — these cannot be recovered.

Column Lineage Looks Sparse

When use_queried_tables_lineage is enabled on a Hex Enterprise workspace, the report exposes enterprise_cells_with_mismatch and enterprise_sample_mismatched_cells — SQL cells whose parsed table URN did not match the queriedTables result. Adjusting default_database / default_schema in connection_platform_map resolves most cases.

Code Coordinates

Class Name: datahub.ingestion.source.hex.hex.HexSource
Browse on GitHub

Questions?

If you've got any questions on configuring ingestion for Hex, feel free to ping us on our Slack.

💡 Contributing to this documentation

This page is auto-generated from the underlying source code. To make changes, please edit the relevant source files in the metadata-ingestion directory.

Tip: For quick typo fixes or documentation updates, you can click the ✏️ Edit icon directly in the GitHub UI to open a Pull Request. For larger changes and PR naming conventions, please refer to our Contributing Guide.

Hex

Overview​

Concept Mapping​

Module hex​

Important Capabilities​

Overview​

Prerequisites​

Workspace Name​

Authentication​

Lineage URN Alignment​

Install the Plugin​

Starter Recipe​

Config Details​

Capabilities​

Upstream Lineage​

Connection Platform Resolution​

Migration from query_fetcher​

Migration: Components are now Charts​

Stale Entity Removal​

Context Documents​

Run History​

Usage Statistics​

Limitations​

Troubleshooting​

Missing Upstream Lineage​

Column Lineage Looks Sparse​

Code Coordinates​

Overview

Concept Mapping

Module `hex`

Important Capabilities

Overview

Prerequisites

Workspace Name

Authentication

Lineage URN Alignment

Install the Plugin

Starter Recipe

Config Details

Capabilities

Upstream Lineage

Connection Platform Resolution

Migration from `query_fetcher`

Migration: Components are now Charts

Stale Entity Removal

Context Documents

Run History

Usage Statistics

Limitations

Troubleshooting

Missing Upstream Lineage

Column Lineage Looks Sparse

Code Coordinates