Sigma
Overview
Sigma is a business intelligence and analytics platform. Learn more in the official Sigma documentation.
The DataHub integration for Sigma covers BI entities such as dashboards, charts, datasets, and related ownership context. It also captures table-level lineage, ownership, tags, and stateful deletion detection.
Concept Mapping
| Sigma | DataHub | Notes |
|---|---|---|
| Workspace | Container | SubType "Sigma Workspace" |
| Workbook | Dashboard | SubType "Sigma Workbook" |
| Page | Dashboard | |
| Element | Chart | |
| Dataset | Dataset | SubType "Sigma Dataset" |
| User | User (a.k.a. CorpUser) | Optionally extracted |
Module sigma
Important Capabilities
| Capability | Status | Notes |
|---|---|---|
| Asset Containers | ✅ | Enabled by default. Supported for types - Sigma Workspace, Sigma Data Model. |
| Descriptions | ✅ | Enabled by default. |
| Detect Deleted Entities | ✅ | Enabled by default via stateful ingestion. |
| Extract Ownership | ✅ | Enabled by default, configured using ingest_owner. |
| Extract Tags | ✅ | Enabled by default. |
| Platform Instance | ✅ | Enabled by default. |
| Schema Metadata | ✅ | Enabled by default. |
| Table-Level Lineage | ✅ | Enabled by default. |
| Test Connection | ✅ | Enabled by default. |
Overview
The sigma module ingests metadata from Sigma into DataHub. It is intended for production ingestion workflows; module-specific capabilities are documented below.
This source extracts the following:
- Workspaces, and the workbooks within those workspaces, as Containers.
- Sigma Datasets as DataHub Datasets.
- Pages as DataHub Dashboards, and the elements within each page as Charts.
- Sigma Data Models as Containers, with each element as a Dataset inside the Container. Opt in via `ingest_data_models: true` (default `false`). Each element Dataset carries `SchemaMetadata` and `UpstreamLineage` for intra-Data-Model element references and external upstreams (Sigma Datasets ingested in the same run). Cross-Data-Model lineage and workbook-to-Data-Model element links are not emitted in this release and will arrive in a follow-up.

Notes:

- Element Dataset URNs are keyed by the immutable Data Model UUID (`urn:li:dataset:(sigma,<dataModelId>.<elementId>,env)`), so attachments survive Sigma slug rotation; the slug is captured on `customProperties.dataModelUrlId`.
- Column types are emitted as `NullType` with `nativeDataType: "unknown"`, because Sigma's `/columns` API does not return per-column types today. Earlier pre-releases hardcoded `String`; nothing Sigma-side confirms that, so it has been softened so downstream type-aware features can tell "unknown" from "actually a string".
- Column-level lineage is not emitted yet; `SchemaMetadata` is present, so CLL can be added later without re-ingestion.
- External upstreams to Sigma Datasets only resolve when the referenced dataset is ingested in the same recipe run. Splitting Sigma Datasets and Data Models into separate recipes will leave those upstreams unresolved. The report tracks these under `data_model_element_upstreams_unresolved_external` (split out from `data_model_element_upstreams_unknown_shape`, which counts `source_id` shapes this release does not parse, e.g. cross-Data-Model refs). The aggregate `data_model_element_upstreams_unresolved` is kept for dashboards that already read it. Keep Sigma Datasets and Data Models in the same recipe, or tolerate the gap, until a follow-up adds an opt-in URN-pattern fallback.
- Setting `ingest_data_models: true` issues `/dataModels/{id}/elements` and `/columns` calls per Data Model unconditionally, but the per-Data-Model `/lineage` call is gated on `extract_lineage: true`. If you opt out of lineage at the workbook surface, the Data Model connector also stops hitting any `/lineage` endpoint; Data Model Containers, element Datasets, and `SchemaMetadata` are still emitted, but without `UpstreamLineage`.
- The Data Model Container URN is keyed on `platform` and `platform_instance` but not `env`. Multi-environment deployments against the same tenant should set a distinct `platform_instance` per env so that Data Model Containers do not collide on a single URN. Element Datasets are already env-scoped.
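The element URN shape described in the notes above can be sketched as a small helper. This is illustrative only: `make_element_dataset_urn` is a hypothetical name, and the full `urn:li:dataPlatform:sigma` spelling is the standard DataHub dataset URN form that the shorthand in the note abbreviates.

```python
def make_element_dataset_urn(data_model_id: str, element_id: str, env: str = "PROD") -> str:
    """Build the env-scoped Dataset URN for a Data Model element.

    The name component is the immutable Data Model UUID plus the element ID,
    so the URN survives Sigma slug rotation; the human-readable slug is
    captured separately on customProperties.dataModelUrlId.
    """
    return f"urn:li:dataset:(urn:li:dataPlatform:sigma,{data_model_id}.{element_id},{env})"

# Example with made-up IDs:
urn = make_element_dataset_urn("0f3a9c2e", "revenue_by_region")
```

Because the UUID rather than the slug is embedded in the URN, re-running ingestion after a slug change yields the same URN, preserving lineage and attachments.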
Prerequisites
Before running ingestion, ensure network connectivity to the source, valid authentication credentials, and read permissions for metadata APIs required by this module.
- Refer to the Sigma documentation to generate API client credentials.
- Provide the generated Client ID and Client Secret in the recipe.
We have observed issues with the Sigma API, where certain API endpoints do not return the expected results, even when the user is an admin. In those cases, a workaround is to manually add the user associated with the Client ID/Secret to each workspace with missing metadata.
Empty workspaces are listed in the ingestion report in the logs under the key `empty_workspaces`.
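If you want to flag that condition programmatically after a run, a minimal sketch is below. The surrounding report structure here is an assumption made for illustration; only the `empty_workspaces` key name comes from the connector's documented behavior.

```python
import json


def find_empty_workspaces(report_json: str) -> list:
    """Return the workspace names listed under the report's empty_workspaces key.

    The report shape used here (a JSON object with an empty_workspaces list)
    is hypothetical; adapt it to however you capture the ingestion report.
    """
    report = json.loads(report_json)
    return report.get("empty_workspaces", [])


# Hypothetical report fragment for illustration:
sample = '{"empty_workspaces": ["Finance", "Ops Sandbox"]}'
```

A non-empty result usually means the API user lacks membership in those workspaces; see the workaround above.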
Install the Plugin
```shell
pip install 'acryl-datahub[sigma]'
```
Starter Recipe
Check out the following recipe to get started with ingestion! See below for full configuration options.
For general pointers on writing and running a recipe, see our main recipe guide.
```yaml
source:
  type: sigma
  config:
    # Coordinates
    api_url: "https://aws-api.sigmacomputing.com/v2"

    # Credentials
    client_id: "CLIENTID"
    client_secret: "CLIENT_SECRET"

    # Optional - filter for certain workspace names instead of ingesting everything.
    # workspace_pattern:
    #   allow:
    #     - workspace_name

    # Optional - filter for certain workbook names instead of ingesting everything.
    # workbook_pattern:
    #   allow:
    #     - workbook_name

    ingest_owner: true

    # Optional - mapping of sigma workspace/workbook/chart folder path to all chart's data sources platform details present inside that folder path.
    # chart_sources_platform_mapping:
    #   folder_path:
    #     data_source_platform: postgres
    #     platform_instance: cloud_instance
    #     env: DEV

sink:
  # sink configs
```
Config Details
- Options
- Schema
Note that a `.` is used to denote nested fields in the YAML recipe.
| Field | Description |
|---|---|
| client_id ✅ string | Sigma Client ID |
| client_secret ✅ string(password) | Sigma Client Secret |
| api_url string | Sigma API hosted URL. |
| extract_lineage One of boolean, null | Whether to extract lineage of workbook's elements and datasets or not. Default: True |
| ingest_data_models boolean | Whether to ingest Sigma Data Models. Each Data Model is emitted as a Container with one Dataset per element inside it (plus per-element SchemaMetadata and, when extract_lineage is also enabled, UpstreamLineage). Default is False because enabling this introduces a new entity class to the graph: existing tenants will see new Containers and Datasets appear on first ingest and will need to factor those into any soft-delete policy if they later disable this flag. Enabling this issues /dataModels/{id}/elements and /columns calls per Data Model unconditionally; the /lineage call is only issued when extract_lineage is also True (so users who opt out of lineage at the workbook surface don't get a lineage endpoint hit under a different flag). Default: False |
| ingest_owner One of boolean, null | Ingest Owner from source. This will override Owner info entered from UI. Default: True |
| ingest_shared_entities One of boolean, null | Whether to ingest the shared entities or not. Default: False |
| platform_instance One of string, null | The instance of the platform that all assets produced by this recipe belong to. This should be unique within the platform. See https://docs.datahub.com/docs/platform-instances/ for more details. Default: None |
| env string | The environment that all assets produced by this connector belong to. Default: PROD |
| chart_sources_platform_mapping map(str,PlatformDetail) | |
| chart_sources_platform_mapping.key.env string | The environment that all assets produced by this connector belong to. Default: PROD |
| chart_sources_platform_mapping.key.data_source_platform ❓ string | A chart's data sources platform name. |
| chart_sources_platform_mapping.key.default_db One of string, null | Default database name to use when parsing SQL queries. Used to generate fully qualified table URNs (e.g., 'prod' for 'prod.public.table'). Default: None |
| chart_sources_platform_mapping.key.default_schema One of string, null | Default schema name to use when parsing SQL queries. Used to generate fully qualified table URNs (e.g., 'public' for 'prod.public.table'). Default: None |
| chart_sources_platform_mapping.key.platform_instance One of string, null | The instance of the platform that all assets produced by this recipe belong to. This should be unique within the platform. See https://docs.datahub.com/docs/platform-instances/ for more details. Default: None |
| data_model_pattern AllowDenyPattern | A class to store allow deny regexes |
| data_model_pattern.ignoreCase One of boolean, null | Whether to ignore case sensitivity during pattern matching. Default: True |
| workbook_lineage_pattern AllowDenyPattern | A class to store allow deny regexes |
| workbook_lineage_pattern.ignoreCase One of boolean, null | Whether to ignore case sensitivity during pattern matching. Default: True |
| workbook_pattern AllowDenyPattern | A class to store allow deny regexes |
| workbook_pattern.ignoreCase One of boolean, null | Whether to ignore case sensitivity during pattern matching. Default: True |
| workspace_pattern AllowDenyPattern | A class to store allow deny regexes |
| workspace_pattern.ignoreCase One of boolean, null | Whether to ignore case sensitivity during pattern matching. Default: True |
| stateful_ingestion One of StatefulStaleMetadataRemovalConfig, null | Sigma Stateful Ingestion Config. Default: None |
| stateful_ingestion.enabled boolean | Whether or not to enable stateful ingest. Default: True if a pipeline_name is set and either a datahub-rest sink or datahub_api is specified, otherwise False Default: False |
| stateful_ingestion.fail_safe_threshold number | Prevents large amount of soft deletes & the state from committing from accidental changes to the source configuration if the relative change percent in entities compared to the previous state is above the 'fail_safe_threshold'. Default: 75.0 |
| stateful_ingestion.remove_stale_metadata boolean | Soft-deletes the entities present in the last successful run but missing in the current run with stateful_ingestion enabled. Default: True |
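The various `*_pattern` fields above share the same `AllowDenyPattern` semantics: `allow` defaults to `.*`, a `deny` match rejects a name regardless of `allow`, and `ignoreCase` toggles case-insensitive matching. The sketch below mirrors those documented semantics; it is a simplified stand-in for illustration, not the actual DataHub class.

```python
import re
from dataclasses import dataclass, field


@dataclass
class AllowDenyPattern:
    allow: list = field(default_factory=lambda: [".*"])
    deny: list = field(default_factory=list)
    ignore_case: bool = True  # corresponds to ignoreCase in the recipe

    def allowed(self, name: str) -> bool:
        flags = re.IGNORECASE if self.ignore_case else 0
        # A name is rejected if any deny pattern matches it...
        if any(re.match(p, name, flags) for p in self.deny):
            return False
        # ...and accepted only if some allow pattern matches.
        return any(re.match(p, name, flags) for p in self.allow)


# e.g. a workspace_pattern that keeps Sales workspaces but skips archives:
pattern = AllowDenyPattern(allow=["Sales.*"], deny=[".*_archive"])
```

Deny taking precedence over allow means you can keep a broad `allow` and carve out exceptions with a short `deny` list.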
The JSONSchema for this configuration is inlined below.
{
"$defs": {
"AllowDenyPattern": {
"additionalProperties": false,
"description": "A class to store allow deny regexes",
"properties": {
"allow": {
"default": [
".*"
],
"description": "List of regex patterns to include in ingestion",
"items": {
"type": "string"
},
"title": "Allow",
"type": "array"
},
"deny": {
"default": [],
"description": "List of regex patterns to exclude from ingestion.",
"items": {
"type": "string"
},
"title": "Deny",
"type": "array"
},
"ignoreCase": {
"anyOf": [
{
"type": "boolean"
},
{
"type": "null"
}
],
"default": true,
"description": "Whether to ignore case sensitivity during pattern matching.",
"title": "Ignorecase"
}
},
"title": "AllowDenyPattern",
"type": "object"
},
"PlatformDetail": {
"additionalProperties": false,
"properties": {
"env": {
"default": "PROD",
"description": "The environment that all assets produced by this connector belong to",
"title": "Env",
"type": "string"
},
"platform_instance": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "The instance of the platform that all assets produced by this recipe belong to. This should be unique within the platform. See https://docs.datahub.com/docs/platform-instances/ for more details.",
"title": "Platform Instance"
},
"data_source_platform": {
"description": "A chart's data sources platform name.",
"title": "Data Source Platform",
"type": "string"
},
"default_db": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Default database name to use when parsing SQL queries. Used to generate fully qualified table URNs (e.g., 'prod' for 'prod.public.table').",
"title": "Default Db"
},
"default_schema": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Default schema name to use when parsing SQL queries. Used to generate fully qualified table URNs (e.g., 'public' for 'prod.public.table').",
"title": "Default Schema"
}
},
"required": [
"data_source_platform"
],
"title": "PlatformDetail",
"type": "object"
},
"StatefulStaleMetadataRemovalConfig": {
"additionalProperties": false,
"description": "Base specialized config for Stateful Ingestion with stale metadata removal capability.",
"properties": {
"enabled": {
"default": false,
"description": "Whether or not to enable stateful ingest. Default: True if a pipeline_name is set and either a datahub-rest sink or `datahub_api` is specified, otherwise False",
"title": "Enabled",
"type": "boolean"
},
"remove_stale_metadata": {
"default": true,
"description": "Soft-deletes the entities present in the last successful run but missing in the current run with stateful_ingestion enabled.",
"title": "Remove Stale Metadata",
"type": "boolean"
},
"fail_safe_threshold": {
"default": 75.0,
"description": "Prevents large amount of soft deletes & the state from committing from accidental changes to the source configuration if the relative change percent in entities compared to the previous state is above the 'fail_safe_threshold'.",
"maximum": 100.0,
"minimum": 0.0,
"title": "Fail Safe Threshold",
"type": "number"
}
},
"title": "StatefulStaleMetadataRemovalConfig",
"type": "object"
}
},
"additionalProperties": false,
"properties": {
"env": {
"default": "PROD",
"description": "The environment that all assets produced by this connector belong to",
"title": "Env",
"type": "string"
},
"platform_instance": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "The instance of the platform that all assets produced by this recipe belong to. This should be unique within the platform. See https://docs.datahub.com/docs/platform-instances/ for more details.",
"title": "Platform Instance"
},
"stateful_ingestion": {
"anyOf": [
{
"$ref": "#/$defs/StatefulStaleMetadataRemovalConfig"
},
{
"type": "null"
}
],
"default": null,
"description": "Sigma Stateful Ingestion Config."
},
"api_url": {
"default": "https://aws-api.sigmacomputing.com/v2",
"description": "Sigma API hosted URL.",
"title": "Api Url",
"type": "string"
},
"client_id": {
"description": "Sigma Client ID",
"title": "Client Id",
"type": "string"
},
"client_secret": {
"description": "Sigma Client Secret",
"format": "password",
"title": "Client Secret",
"type": "string",
"writeOnly": true
},
"workspace_pattern": {
"$ref": "#/$defs/AllowDenyPattern",
"default": {
"allow": [
".*"
],
"deny": [],
"ignoreCase": true
},
      "description": "Regex patterns to filter Sigma workspaces in ingestion. Mention 'My documents' if personal entities also need to be ingested."
},
"ingest_owner": {
"anyOf": [
{
"type": "boolean"
},
{
"type": "null"
}
],
"default": true,
"description": "Ingest Owner from source. This will override Owner info entered from UI.",
"title": "Ingest Owner"
},
"ingest_shared_entities": {
"anyOf": [
{
"type": "boolean"
},
{
"type": "null"
}
],
"default": false,
"description": "Whether to ingest the shared entities or not.",
"title": "Ingest Shared Entities"
},
"extract_lineage": {
"anyOf": [
{
"type": "boolean"
},
{
"type": "null"
}
],
"default": true,
"description": "Whether to extract lineage of workbook's elements and datasets or not.",
"title": "Extract Lineage"
},
"workbook_lineage_pattern": {
"$ref": "#/$defs/AllowDenyPattern",
"default": {
"allow": [
".*"
],
"deny": [],
"ignoreCase": true
},
      "description": "Regex patterns to filter workbook's elements and datasets lineage in ingestion. Requires extract_lineage to be enabled."
},
"chart_sources_platform_mapping": {
"additionalProperties": {
"$ref": "#/$defs/PlatformDetail"
},
"default": {},
"description": "A mapping of the sigma workspace/workbook/chart folder path to all chart's data sources platform details present inside that folder path.",
"title": "Chart Sources Platform Mapping",
"type": "object"
},
"workbook_pattern": {
"$ref": "#/$defs/AllowDenyPattern",
"default": {
"allow": [
".*"
],
"deny": [],
"ignoreCase": true
},
"description": "Regex patterns to filter Sigma workbook names in ingestion."
},
"ingest_data_models": {
"default": false,
"description": "Whether to ingest Sigma Data Models. Each Data Model is emitted as a Container with one Dataset per element inside it (plus per-element ``SchemaMetadata`` and, when ``extract_lineage`` is also enabled, ``UpstreamLineage``). Default is ``False`` because enabling this introduces a new entity class to the graph \u2014 existing tenants will see new Containers and Datasets appear on first ingest and will need to factor those into any soft-delete policy if they later disable this flag. Enabling this issues ``/dataModels/{id}/elements`` and ``/columns`` calls per Data Model unconditionally; the ``/lineage`` call is only issued when ``extract_lineage`` is also ``True`` (so users who opt out of lineage at the workbook surface don't get a lineage endpoint hit under a different flag).",
"title": "Ingest Data Models",
"type": "boolean"
},
"data_model_pattern": {
"$ref": "#/$defs/AllowDenyPattern",
"default": {
"allow": [
".*"
],
"deny": [],
"ignoreCase": true
},
"description": "Regex patterns to filter Sigma Data Model names in ingestion. Requires ingest_data_models to be enabled."
}
},
"required": [
"client_id",
"client_secret"
],
"title": "SigmaSourceConfig",
"type": "object"
}
Capabilities
Use the Important Capabilities table above as the source of truth for supported features and whether additional configuration is required.
Chart source platform mapping
To provide platform details (platform name, platform instance, and env) for all of a chart's external upstream data sources, use `chart_sources_platform_mapping` as shown below:
Example - For just one specific chart's external upstream data sources
```yaml
chart_sources_platform_mapping:
  "workspace_name/workbook_name/chart_name_1":
    data_source_platform: snowflake
    platform_instance: new_instance
    env: PROD

  "workspace_name/folder_name/workbook_name/chart_name_2":
    data_source_platform: postgres
    platform_instance: cloud_instance
    env: DEV
```
Example - For all charts within one specific workbook
```yaml
chart_sources_platform_mapping:
  "workspace_name/workbook_name_1":
    data_source_platform: snowflake
    platform_instance: new_instance
    env: PROD

  "workspace_name/folder_name/workbook_name_2":
    data_source_platform: snowflake
    platform_instance: new_instance
    env: PROD
```
Example - For all charts across all workbooks within one specific workspace
```yaml
chart_sources_platform_mapping:
  "workspace_name":
    data_source_platform: snowflake
    platform_instance: new_instance
    env: PROD
```
Example - All workbooks use the same connection
```yaml
chart_sources_platform_mapping:
  "*":
    data_source_platform: snowflake
    platform_instance: new_instance
    env: PROD
```
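The mapping keys above form a hierarchy: a full chart path, a workbook path, a workspace name, or the `"*"` wildcard. One plausible way to resolve a chart's folder path against such a mapping, assuming the most specific matching entry wins (the exact precedence is an assumption, and `resolve_platform_detail` is a hypothetical helper, not connector code):

```python
def resolve_platform_detail(chart_path: str, mapping: dict):
    """Pick the mapping entry for a chart, preferring the longest
    (most specific) matching path prefix and falling back to "*"."""
    candidates = [
        key for key in mapping
        if key != "*" and (chart_path == key or chart_path.startswith(key + "/"))
    ]
    if candidates:
        return mapping[max(candidates, key=len)]
    return mapping.get("*")


# Illustrative mapping with entries at three levels of specificity:
mapping = {
    "ws/wb/chart_1": {"data_source_platform": "snowflake"},
    "ws": {"data_source_platform": "postgres"},
    "*": {"data_source_platform": "bigquery"},
}
```

Under this scheme a chart-level key overrides a workspace-level key for that one chart, while unmatched paths fall through to the wildcard.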
Limitations
Module behavior is constrained by source APIs, permissions, and metadata exposed by the platform. Refer to capability notes for unsupported or conditional features.
Troubleshooting
If ingestion fails, validate credentials, permissions, connectivity, and scope filters first. Then review ingestion logs for source-specific errors and adjust configuration accordingly.
Code Coordinates
- Class Name: `datahub.ingestion.source.sigma.sigma.SigmaSource`
- Browse on GitHub
If you've got any questions on configuring ingestion for Sigma, feel free to ping us on our Slack.
This page is auto-generated from the underlying source code. To make changes, please edit the relevant source files in the metadata-ingestion directory.
Tip: For quick typo fixes or documentation updates, you can click the ✏️ Edit icon directly in the GitHub UI to open a Pull Request. For larger changes and PR naming conventions, please refer to our Contributing Guide.