
Sigma

Overview

Sigma is a business intelligence and analytics platform. Learn more in the official Sigma documentation.

The DataHub integration for Sigma covers BI entities such as dashboards, charts, and datasets. It also captures table-level lineage, ownership, and tags, and supports stateful deletion detection.

Concept Mapping

Sigma     | DataHub                | Notes
--------- | ---------------------- | -------------------------
Workspace | Container              | SubType "Sigma Workspace"
Workbook  | Dashboard              | SubType "Sigma Workbook"
Page      | Dashboard              |
Element   | Chart                  |
Dataset   | Dataset                | SubType "Sigma Dataset"
User      | User (a.k.a. CorpUser) | Optionally extracted

Module sigma

Incubating

Important Capabilities

Capability              | Notes
----------------------- | -----------------------------------------------
Asset Containers        | Enabled by default. Supported for types: Sigma Workspace, Sigma Data Model.
Descriptions            | Enabled by default.
Detect Deleted Entities | Enabled by default via stateful ingestion.
Extract Ownership       | Enabled by default; configured using ingest_owner.
Extract Tags            | Enabled by default.
Platform Instance       | Enabled by default.
Schema Metadata         | Enabled by default.
Table-Level Lineage     | Enabled by default.
Test Connection         | Enabled by default.

Overview

The sigma module ingests metadata from Sigma into DataHub. It is intended for production ingestion workflows; module-specific capabilities are documented below.

This source extracts the following:

  • Workspaces, and the workbooks within those workspaces, as Containers.

  • Sigma Datasets as DataHub Datasets.

  • Pages as DataHub Dashboards, and the elements inside each page as Charts.

  • Sigma Data Models as Containers, with each element as a Dataset inside the Container. Opt-in via ingest_data_models: true (default false). Each element Dataset carries SchemaMetadata and UpstreamLineage for intra-DM element references and external upstreams (Sigma Datasets ingested in the same run). Cross-DM lineage and workbook-to-DM element links are not emitted in this release and will arrive in a follow-up.

    Notes:

    • Element Dataset URNs are keyed by the immutable Data Model UUID (urn:li:dataset:(sigma,<dataModelId>.<elementId>,env)) so attachments survive Sigma slug rotation; the slug is captured on customProperties.dataModelUrlId.
    • Column types are emitted as NullType with nativeDataType: "unknown", because Sigma's /columns API does not currently return per-column types. Earlier pre-releases hardcoded String; that claim had no backing from the Sigma API, so it has been softened so downstream type-aware features can distinguish "unknown" from "actually a string."
    • Column-level lineage is not emitted yet; SchemaMetadata is present, so CLL can be added later without re-ingestion.
    • External upstreams to Sigma Datasets only resolve when the referenced dataset is ingested in the same recipe run. Splitting Sigma Datasets and Data Models into separate recipes will leave those upstreams unresolved. The report tracks these under data_model_element_upstreams_unresolved_external (split out from data_model_element_upstreams_unknown_shape, which counts source_id shapes this release does not parse — e.g. cross-DM refs). The aggregate data_model_element_upstreams_unresolved is kept for dashboards that already read it. Keep Sigma Datasets and Data Models in the same recipe, or tolerate the gap, until a follow-up adds an opt-in URN-pattern fallback.
    • Setting ingest_data_models: true issues /dataModels/{id}/elements and /columns calls per DM unconditionally, but the per-DM /lineage call is gated on extract_lineage: true. If you opt out of lineage at the workbook surface, the DM connector also stops hitting any /lineage endpoint — DM Containers, element Datasets, and SchemaMetadata are still emitted, but without UpstreamLineage.
    • The DM Container URN is keyed on platform and platform_instance but not env. Multi-environment deployments against the same tenant should set a distinct platform_instance per env so DM Containers do not collide on a single URN. Element Datasets are already env-scoped.
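The URN keying described in the notes above can be sketched as a small helper. This is illustrative only, not the connector's actual code, and the IDs below are hypothetical placeholders; it simply mirrors the documented urn:li:dataset:(sigma,&lt;dataModelId&gt;.&lt;elementId&gt;,env) pattern:

```python
def sigma_element_dataset_urn(data_model_id: str, element_id: str, env: str = "PROD") -> str:
    """Illustrative sketch of the documented element Dataset URN pattern.

    Keyed on the immutable Data Model UUID rather than the mutable URL
    slug, so the URN survives Sigma slug rotation; the slug itself is
    recorded separately under customProperties.dataModelUrlId.
    """
    return f"urn:li:dataset:(sigma,{data_model_id}.{element_id},{env})"


# Hypothetical IDs, for illustration only.
print(sigma_element_dataset_urn("2b6f0e1c-dm", "elem-42"))
# urn:li:dataset:(sigma,2b6f0e1c-dm.elem-42,PROD)
```

Note that because env is part of the element Dataset URN but not the DM Container URN, multi-environment tenants should also set a distinct platform_instance, as described above.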

Prerequisites

Before running ingestion, ensure network connectivity to the source, valid authentication credentials, and read permissions for metadata APIs required by this module.

  1. Refer to the Sigma documentation to generate API client credentials.
  2. Provide the generated Client ID and Secret in the recipe.

We have observed issues with the Sigma API where certain endpoints do not return the expected results, even for admin users. In those cases, a workaround is to manually add the user associated with the Client ID/Secret to each workspace with missing metadata. Empty workspaces are listed in the ingestion report (in the logs) under the key empty_workspaces.

Install the Plugin

pip install 'acryl-datahub[sigma]'

Starter Recipe

Check out the following recipe to get started with ingestion! See below for full configuration options.

For general pointers on writing and running a recipe, see our main recipe guide.

source:
  type: sigma
  config:
    # Coordinates
    api_url: "https://aws-api.sigmacomputing.com/v2"
    # Credentials
    client_id: "CLIENTID"
    client_secret: "CLIENT_SECRET"

    # Optional - filter for certain workspace names instead of ingesting everything.
    # workspace_pattern:
    #   allow:
    #     - workspace_name

    # Optional - filter for certain workbook names instead of ingesting everything.
    # workbook_pattern:
    #   allow:
    #     - workbook_name

    ingest_owner: true

    # Optional - map a Sigma workspace/workbook/chart folder path to the platform
    # details of all chart data sources under that folder path.
    # chart_sources_platform_mapping:
    #   folder_path:
    #     data_source_platform: postgres
    #     platform_instance: cloud_instance
    #     env: DEV

sink:
  # sink configs

Config Details

Note that a . is used to denote nested fields in the YAML recipe.

Field | Description
client_id 
string
Sigma Client ID
client_secret 
string(password)
Sigma Client Secret
api_url
string
Sigma API hosted URL.
extract_lineage
One of boolean, null
Whether to extract lineage for workbook elements and datasets.
Default: True
ingest_data_models
boolean
Whether to ingest Sigma Data Models. Each Data Model is emitted as a Container with one Dataset per element inside it (plus per-element SchemaMetadata and, when extract_lineage is also enabled, UpstreamLineage). Default is False because enabling this introduces a new entity class to the graph — existing tenants will see new Containers and Datasets appear on first ingest and will need to factor those into any soft-delete policy if they later disable this flag. Enabling this issues /dataModels/{id}/elements and /columns calls per Data Model unconditionally; the /lineage call is only issued when extract_lineage is also True (so users who opt out of lineage at the workbook surface don't get a lineage endpoint hit under a different flag).
Default: False
ingest_owner
One of boolean, null
Ingest owners from the source. This will override owner info entered from the UI.
Default: True
ingest_shared_entities
One of boolean, null
Whether to ingest the shared entities or not.
Default: False
platform_instance
One of string, null
The instance of the platform that all assets produced by this recipe belong to. This should be unique within the platform. See https://docs.datahub.com/docs/platform-instances/ for more details.
Default: None
env
string
The environment that all assets produced by this connector belong to
Default: PROD
chart_sources_platform_mapping
map(str,PlatformDetail)
chart_sources_platform_mapping.key.env
string
The environment that all assets produced by this connector belong to
Default: PROD
chart_sources_platform_mapping.key.data_source_platform 
string
The platform name of the chart's data sources.
chart_sources_platform_mapping.key.default_db
One of string, null
Default database name to use when parsing SQL queries. Used to generate fully qualified table URNs (e.g., 'prod' for 'prod.public.table').
Default: None
chart_sources_platform_mapping.key.default_schema
One of string, null
Default schema name to use when parsing SQL queries. Used to generate fully qualified table URNs (e.g., 'public' for 'prod.public.table').
Default: None
chart_sources_platform_mapping.key.platform_instance
One of string, null
The instance of the platform that all assets produced by this recipe belong to. This should be unique within the platform. See https://docs.datahub.com/docs/platform-instances/ for more details.
Default: None
data_model_pattern
AllowDenyPattern
Allow/deny regex patterns for filtering Sigma Data Models.
data_model_pattern.ignoreCase
One of boolean, null
Whether to ignore case sensitivity during pattern matching.
Default: True
workbook_lineage_pattern
AllowDenyPattern
Allow/deny regex patterns for filtering which workbooks have lineage extracted.
workbook_lineage_pattern.ignoreCase
One of boolean, null
Whether to ignore case sensitivity during pattern matching.
Default: True
workbook_pattern
AllowDenyPattern
Allow/deny regex patterns for filtering workbooks.
workbook_pattern.ignoreCase
One of boolean, null
Whether to ignore case sensitivity during pattern matching.
Default: True
workspace_pattern
AllowDenyPattern
Allow/deny regex patterns for filtering workspaces.
workspace_pattern.ignoreCase
One of boolean, null
Whether to ignore case sensitivity during pattern matching.
Default: True
stateful_ingestion
One of StatefulStaleMetadataRemovalConfig, null
Sigma Stateful Ingestion Config.
Default: None
stateful_ingestion.enabled
boolean
Whether or not to enable stateful ingest. Default: True if a pipeline_name is set and either a datahub-rest sink or datahub_api is specified, otherwise False
Default: False
stateful_ingestion.fail_safe_threshold
number
Prevents a large number of soft deletes, and blocks the state from committing, when the relative change in entity count compared to the previous state exceeds the fail_safe_threshold. This guards against accidental changes to the source configuration.
Default: 75.0
stateful_ingestion.remove_stale_metadata
boolean
Soft-deletes the entities present in the last successful run but missing in the current run with stateful_ingestion enabled.
Default: True
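As a concrete example of the stateful_ingestion options above, a recipe that opts into stale-metadata removal might look like the following sketch. The pipeline_name value is illustrative; stateful ingestion needs a pipeline_name set so state can be tracked across runs, and the thresholds shown are the documented defaults.

```yaml
pipeline_name: sigma_prod_ingestion  # illustrative name; required for state tracking
source:
  type: sigma
  config:
    api_url: "https://aws-api.sigmacomputing.com/v2"
    client_id: "CLIENTID"
    client_secret: "CLIENT_SECRET"
    stateful_ingestion:
      enabled: true
      remove_stale_metadata: true
      fail_safe_threshold: 75.0
```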

Capabilities

Use the Important Capabilities table above as the source of truth for supported features and whether additional configuration is required.

Chart source platform mapping

If you want to provide platform details (platform name, platform instance, and env) for all of a chart's external upstream data sources, use chart_sources_platform_mapping as shown below:

Example - For just one specific chart's external upstream data sources:

chart_sources_platform_mapping:
  "workspace_name/workbook_name/chart_name_1":
    data_source_platform: snowflake
    platform_instance: new_instance
    env: PROD

  "workspace_name/folder_name/workbook_name/chart_name_2":
    data_source_platform: postgres
    platform_instance: cloud_instance
    env: DEV

Example - For all charts within one specific workbook:

chart_sources_platform_mapping:
  "workspace_name/workbook_name_1":
    data_source_platform: snowflake
    platform_instance: new_instance
    env: PROD

  "workspace_name/folder_name/workbook_name_2":
    data_source_platform: snowflake
    platform_instance: new_instance
    env: PROD

Example - For all charts in all workbooks within one specific workspace:

chart_sources_platform_mapping:
  "workspace_name":
    data_source_platform: snowflake
    platform_instance: new_instance
    env: PROD

Example - All workbooks use the same connection:

chart_sources_platform_mapping:
  "*":
    data_source_platform: snowflake
    platform_instance: new_instance
    env: PROD

Limitations

Module behavior is constrained by source APIs, permissions, and metadata exposed by the platform. Refer to capability notes for unsupported or conditional features.

Troubleshooting

If ingestion fails, validate credentials, permissions, connectivity, and scope filters first. Then review ingestion logs for source-specific errors and adjust configuration accordingly.

Code Coordinates

  • Class Name: datahub.ingestion.source.sigma.sigma.SigmaSource
  • Browse on GitHub
Questions?

If you've got any questions on configuring ingestion for Sigma, feel free to ping us on our Slack.

💡 Contributing to this documentation

This page is auto-generated from the underlying source code. To make changes, please edit the relevant source files in the metadata-ingestion directory.

Tip: For quick typo fixes or documentation updates, you can click the ✏️ Edit icon directly in the GitHub UI to open a Pull Request. For larger changes and PR naming conventions, please refer to our Contributing Guide.