Cube
Overview
Cube is a headless semantic layer that defines metrics, dimensions, and joins once and exposes them to BI tools, data apps, and AI agents through SQL, REST, and GraphQL APIs. Its data model is organised into cubes (business entities such as orders or customers) and views (curated, query-ready datasets built on top of cubes).
This source ingests the Cube data model into DataHub as datasets: each cube and view becomes a dataset whose measures and dimensions are modelled as schema fields. It supports both Cube Core (self-hosted, via the /v1/meta REST endpoint) and Cube Cloud, where it merges /v1/meta (structural and presentation metadata) with the richer Metadata API (warehouse and column-level lineage). On Cube Cloud the connector can mint the metadata-scoped token automatically via the Control Plane API. DataHub captures descriptions, measure/dimension classification, view-to-cube lineage, and — where the deployment exposes it — column-level lineage down to the underlying warehouse tables. On Cube Cloud it also ingests saved reports as charts and workbooks as dashboards via the Platform API, extending lineage to the BI consumption layer. Stateful ingestion removes cubes and views that have been deleted from the model.
Concept Mapping
| Source Concept | DataHub Concept | Notes |
|---|---|---|
| Deployment / data model | Container | Subtype Cube Deployment |
| Cube | Dataset | Subtype Cube |
| View | Dataset | Subtype View |
| Measure | Schema Field | Tagged Measure; aggregation in native type |
| Dimension | Schema Field | Tagged Dimension; primary keys marked as key |
| `format` / `drillMembers` / `cumulative` | Schema Field `jsonProps` | Measure presentation hints |
| `joins` / `hierarchies` / `folders` / `preAggregations` | Dataset custom properties | Structural model metadata |
| `public` / `isVisible` | Ingestion filter | Hidden cubes/members skipped unless `include_hidden` is set |
| `table_references` / cube `sql` | Lineage | Lineage to upstream warehouse tables |
| View member `aliasMember` | Fine-Grained Lineage | Column-level view-to-cube lineage |
| `meta` | Tags / Terms / Owners / Domains | Mapped via `meta_mapping` / `column_meta_mapping` |
| Report (Cube Cloud) | Chart | Input lineage to queried cubes/views |
| Workbook (Cube Cloud) | Dashboard | Contains its reports' charts |
Module cube
Important Capabilities
| Capability | Status | Notes |
|---|---|---|
| Asset Containers | ✅ | Enabled by default. |
| Column-level Lineage | ✅ | Enabled by default, can be disabled via include_column_lineage. |
| Descriptions | ✅ | Enabled by default. |
| Detect Deleted Entities | ✅ | Enabled via stateful ingestion. |
| Domains | ✅ | Enabled via the domain config and meta_mapping. |
| Extract Ownership | ✅ | Enabled via meta_mapping against Cube meta. |
| Extract Tags | ✅ | Enabled via meta_mapping/column_meta_mapping, plus Measure/Dimension/Temporal field tags. |
| Glossary Terms | ✅ | Enabled via meta_mapping/column_meta_mapping. |
| Platform Instance | ✅ | Enabled by default. |
| Schema Metadata | ✅ | Enabled by default. |
| Table-Level Lineage | ✅ | Enabled by default. Includes view->cube lineage and, where available, lineage to upstream warehouse tables. |
| Test Connection | ✅ | Enabled by default. |
Overview
The cube module ingests the Cube semantic layer data model into DataHub. Every cube and view is emitted as a dataset, with its measures and dimensions modelled as schema fields, organised under a container that represents the Cube deployment. The module works against both Cube Core and Cube Cloud.
Prerequisites
Choose a deployment type
Set deployment_type to match your Cube installation:
- `CORE`: a self-hosted Cube Core instance. Metadata is read from the `/v1/meta` REST endpoint.
- `CLOUD`: a Cube Cloud deployment. When `use_metadata_api` is enabled, the connector reads from the Metadata API, which additionally exposes lineage to upstream warehouse tables. If the supplied token lacks the required scope, the connector automatically falls back to `/v1/meta`.
Obtain an API token
The connector authenticates with a token sent in the Authorization header.
- Cube Core: generate a JWT signed with your deployment's `CUBEJS_API_SECRET`. See Security context.
- Cube Cloud (`/v1/meta`): copy a token from the deployment's Playground → API tab, or sign one with the deployment's API secret.
- Cube Cloud Metadata API: obtain a token via the Control Plane API. This token is required for warehouse lineage.
Configure the API URL
api_url is the base URL of the REST API, including the base path (defaults to /cubejs-api):
- Cube Core: `http://localhost:4000/cubejs-api`
- Cube Cloud: `https://<deployment>.cubecloud.dev/cubejs-api`
Warehouse lineage (optional)
To connect cubes to the warehouse tables they read from, set warehouse_platform (e.g. snowflake, bigquery, postgres) and, if your existing datasets use them, warehouse_platform_instance and warehouse_env. On Cube Cloud with the Metadata API enabled, the warehouse platform and database are auto-detected from the deployment's data sources. On Cube Core, set parse_sql_for_lineage to derive table lineage from each cube's SQL definition (requires warehouse_platform).
Note that cubes marked `public: false` are not returned by the `/v1/meta` endpoint, so they are not ingested as datasets. Views that reference them still produce lineage edges to those cubes.
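As an illustration of the warehouse lineage settings described above, a Cube Core recipe fragment might look like the following sketch. The database name and the environment variable name are placeholder assumptions, not values taken from a real deployment.

```yaml
source:
  type: cube
  config:
    api_url: "http://localhost:4000/cubejs-api"
    # Assumed env var holding a JWT signed with CUBEJS_API_SECRET.
    api_token: "${CUBE_API_TOKEN}"
    deployment_type: "CORE"
    # Required for SQL-based lineage on Cube Core.
    warehouse_platform: "postgres"
    # Placeholder: prepended to table references that lack a database.
    warehouse_database: "analytics"
    parse_sql_for_lineage: true
    # Keep in line with the warehouse connector's convert_urns_to_lowercase.
    convert_lineage_urns_to_lowercase: true
```

If the warehouse datasets were ingested with a platform instance or a non-default environment, `warehouse_platform_instance` and `warehouse_env` would need to be set to the same values for the lineage edges to resolve.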
Install the Plugin
```shell
pip install 'acryl-datahub[cube]'
```
Starter Recipe
Check out the following recipe to get started with ingestion! See below for full configuration options.
For general pointers on writing and running a recipe, see our main recipe guide.
```yaml
source:
  type: cube
  config:
    # Base URL of the Cube REST API, including the base path.
    api_url: "https://your-deployment.cubecloud.dev/cubejs-api"
    api_token: "${CUBE_API_TOKEN}"
    # CORE (self-hosted) or CLOUD.
    deployment_type: "CLOUD"
    # Connect cubes to their upstream warehouse tables. Auto-detected on Cube
    # Cloud via the Metadata API; set explicitly for Cube Core.
    # warehouse_platform: "snowflake"
    # warehouse_database: "ANALYTICS"
    # Cube Cloud only: ingest reports as charts and workbooks as dashboards, and
    # auto-mint a Metadata API token. cloud_api_key + deployment_id are required;
    # environment_id is needed only for the Metadata API token.
    # cloud_api_key: "${CUBE_CLOUD_API_KEY}"
    # deployment_id: "12345"
    # environment_id: "production"
    stateful_ingestion:
      enabled: true
sink:
  type: datahub-rest
  config:
    server: "http://localhost:8080"
```
Config Details
Note that a . is used to denote nested fields in the YAML recipe.
| Field | Description |
|---|---|
| `api_token` ✅ string(password) | API token used to authenticate against Cube. For Cube Core this is a JWT signed with `CUBEJS_API_SECRET`; for the Cube Cloud Metadata API use a token obtained from the Control Plane API. |
| `api_url` ✅ string | Base URL of the Cube REST API, including the base path. For Cube Core this is typically `http://localhost:4000/cubejs-api`; for Cube Cloud it looks like `https://<name>.cubecloud.dev/cubejs-api`. |
| `cloud_api_key` One of string(password), null | Cube Cloud Control Plane API key (Account → API keys). When set together with `deployment_id` and `environment_id`, the connector automatically mints a metadata-scoped JWT via the Control Plane `tokens-for-meta-sync` endpoint to access the Metadata API, instead of requiring a pre-generated token in `api_token`. Default: None |
| `cloud_api_url` One of string, null | Base URL of the Cube Cloud Control Plane API (e.g. `https://<tenant>.cubecloud.dev`). If unset, it is derived from the scheme and host of `api_url`. Only used when `cloud_api_key` is set. Default: None |
| `column_meta_mapping` map(str,object) | Mapping rules applied to the `meta` of each measure/dimension to derive schema-field tags and glossary terms. |
| `convert_lineage_urns_to_lowercase` boolean | Whether to lowercase upstream warehouse table and column names when building lineage URNs. Must match the `convert_urns_to_lowercase` setting of the warehouse connector (e.g. Snowflake ingests lowercased URNs by default) so that the lineage edges resolve. Default: True |
| `deployment_id` One of string, null | Cube Cloud deployment id, used to mint a Metadata API token via the Control Plane API. Default: None |
| `deployment_type` Enum | Whether the target is Cube Core (`CORE`) or Cube Cloud (`CLOUD`). One of: "CORE", "CLOUD". Default: CORE |
| `deployment_url` One of string, null | Base URL of the Cube deployment UI, used to build an external link on the deployment container. If unset, it is derived from `api_url` by stripping the API base path. Default: None |
| `emit_member_details` boolean | Whether to capture Cube member presentation hints (format, drill-down members, cumulative flag) as schema-field `jsonProps`, and structural metadata (joins, hierarchies, folders, pre-aggregations) as dataset custom properties. Default: True |
| `enable_meta_mapping` boolean | Whether to process `meta_mapping` and `column_meta_mapping` rules. Default: True |
| `environment_id` One of string, null | Cube Cloud environment id, used to mint a Metadata API token via the Control Plane API. Default: None |
| `include_column_lineage` boolean | Whether to emit column-level (fine-grained) lineage. Requires `include_lineage` to be enabled. Default: True |
| `include_cubes` boolean | Whether to ingest base cubes as datasets. Default: True |
| `include_hidden` boolean | Whether to ingest cubes, views, and members that Cube marks as hidden (`public: false` / `isVisible: false`). Hidden cubes are typically excluded from Cube's own API consumers; enable this to surface them in DataHub anyway. Default: False |
| `include_lineage` boolean | Whether to emit lineage. This includes view->cube lineage and, where available, lineage from cubes to their upstream warehouse tables. Default: True |
| `include_reports` boolean | Cube Cloud only. Whether to ingest saved reports as DataHub charts, with lineage to the cubes/views they query. Requires Platform API access (`cloud_api_key` + `deployment_id`). Default: True |
| `include_views` boolean | Whether to ingest views as datasets. Default: True |
| `include_workbooks` boolean | Cube Cloud only. Whether to ingest workbooks as DataHub dashboards containing their reports' charts. Requires Platform API access (`cloud_api_key` + `deployment_id`). Default: True |
| `incremental_lineage` boolean | When enabled, emits lineage as incremental to existing lineage already in DataHub. When disabled, re-states lineage on each run. Default: False |
| `meta_mapping` map(str,object) | Mapping rules applied to the `meta` of each cube/view to derive tags, glossary terms, owners, domains, and documentation links. Uses the same syntax as the dbt connector's `meta_mapping`. |
| `meta_sync_token_expires_in` integer | Expiry (in seconds) of the minted Metadata API token. Defaults to 24 hours. Default: 86400 |
| `parse_sql_for_lineage` boolean | Cube Core only. When the `/v1/meta?extended` response includes a cube's SQL definition, parse it to derive upstream warehouse lineage. Requires `warehouse_platform` to be set. The Cloud Metadata API provides lineage directly, so this is ignored for Cube Cloud. Default: True |
| `platform_instance` One of string, null | The instance of the platform that all assets produced by this recipe belong to. This should be unique within the platform. See https://docs.datahub.com/docs/platform-instances/ for more details. Default: None |
| `request_timeout_sec` integer | Per-request timeout, in seconds. Default: 30 |
| `security_context` object | Security context embedded in the minted Metadata API token. Controls which parts of the data model are visible, following Cube's multi-tenancy rules. |
| `strip_user_ids_from_email` boolean | Whether to strip the email domain from owners derived via `meta_mapping`. Default: False |
| `tag_measures_and_dimensions` boolean | Whether to tag schema fields with `Measure`/`Dimension` (and `Temporal` for time dimensions) so the kinds of Cube members can be distinguished and filtered in DataHub. Default: True |
| `tag_prefix` string | Prefix added to tags created via `meta_mapping`. Default: empty string |
| `use_metadata_api` boolean | Cube Cloud only. When enabled, the richer Metadata API (`/v1/entities`) is used to extract warehouse and column-level lineage, which is merged with the structural metadata from `/v1/meta`. When disabled, only the `/v1/meta` endpoint is used. Has no effect for Cube Core deployments. Default: True |
| `warehouse_database` One of string, null | Database name to prepend to upstream warehouse table references that do not already include one. If unset, it is taken from the Cube data source definition when available. Default: None |
| `warehouse_env` string | Environment of the upstream warehouse datasets referenced by lineage. Default: PROD |
| `warehouse_platform` One of string, null | DataHub platform name of the warehouse that backs the Cube data model (e.g. `snowflake`, `bigquery`, `postgres`). Used to build upstream lineage URNs. If unset, it is auto-detected from the Cube data source type when the Metadata API is available. Default: None |
| `warehouse_platform_instance` One of string, null | Platform instance of the upstream warehouse, used when building lineage URNs. Default: None |
| `env` string | The environment that all assets produced by this connector belong to. Default: PROD |
| `cube_pattern` AllowDenyPattern | Regex patterns for filtering cubes to ingest (matched on the cube name). |
| `cube_pattern.ignoreCase` One of boolean, null | Whether to ignore case sensitivity during pattern matching. Default: True |
| `domain` map(str,AllowDenyPattern) | Regex patterns matched against a cube/view name to assign it to a DataHub domain (keyed by domain id or urn). |
| `domain.<key>.allow` array | List of regex patterns to include in ingestion. Default: ['.*'] |
| `domain.<key>.allow.string` string | |
| `domain.<key>.ignoreCase` One of boolean, null | Whether to ignore case sensitivity during pattern matching. Default: True |
| `domain.<key>.deny` array | List of regex patterns to exclude from ingestion. Default: [] |
| `domain.<key>.deny.string` string | |
| `report_pattern` AllowDenyPattern | Regex patterns for filtering reports to ingest (matched on the report name). |
| `report_pattern.ignoreCase` One of boolean, null | Whether to ignore case sensitivity during pattern matching. Default: True |
| `view_pattern` AllowDenyPattern | Regex patterns for filtering views to ingest (matched on the view name). |
| `view_pattern.ignoreCase` One of boolean, null | Whether to ignore case sensitivity during pattern matching. Default: True |
| `workbook_pattern` AllowDenyPattern | Regex patterns for filtering workbooks to ingest (matched on the workbook name). |
| `workbook_pattern.ignoreCase` One of boolean, null | Whether to ignore case sensitivity during pattern matching. Default: True |
| `stateful_ingestion` One of StatefulStaleMetadataRemovalConfig, null | Stateful ingestion configuration. Default: None |
| `stateful_ingestion.enabled` boolean | Whether to enable stateful ingestion. Defaults to True if a `pipeline_name` is set and either a `datahub-rest` sink or `datahub_api` is specified; otherwise False. |
| `stateful_ingestion.fail_safe_threshold` number | Guards against accidental changes to the source configuration: if the relative change percent in entities compared to the previous state is above `fail_safe_threshold`, the soft deletes are not applied and the state is not committed. Default: 75.0 |
| `stateful_ingestion.remove_stale_metadata` boolean | Soft-deletes the entities present in the last successful run but missing in the current run when `stateful_ingestion` is enabled. Default: True |
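
To illustrate the dotted notation used in the table, a field such as `cube_pattern.ignoreCase` or `stateful_ingestion.fail_safe_threshold` corresponds to a nested YAML key. The sketch below is only an example: the cube and view name patterns and the `pipeline_name` value are hypothetical.

```yaml
# Hypothetical pipeline name; stateful ingestion defaults to enabled only
# when a pipeline_name is set.
pipeline_name: "cube_ingestion"
source:
  type: cube
  config:
    api_url: "https://your-deployment.cubecloud.dev/cubejs-api"
    api_token: "${CUBE_API_TOKEN}"
    # cube_pattern.allow / cube_pattern.deny / cube_pattern.ignoreCase
    cube_pattern:
      allow:
        - "^orders.*"      # hypothetical cube names
      deny:
        - ".*_staging$"
      ignoreCase: true
    # view_pattern.deny
    view_pattern:
      deny:
        - "^tmp_.*"        # hypothetical view names
    # stateful_ingestion.enabled, .remove_stale_metadata, .fail_safe_threshold
    stateful_ingestion:
      enabled: true
      remove_stale_metadata: true
      fail_safe_threshold: 75.0
```
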
The JSONSchema for this configuration is inlined below.
{
"$defs": {
"AllowDenyPattern": {
"additionalProperties": false,
"description": "A class to store allow deny regexes",
"properties": {
"allow": {
"default": [
".*"
],
"description": "List of regex patterns to include in ingestion",
"items": {
"type": "string"
},
"title": "Allow",
"type": "array"
},
"deny": {
"default": [],
"description": "List of regex patterns to exclude from ingestion.",
"items": {
"type": "string"
},
"title": "Deny",
"type": "array"
},
"ignoreCase": {
"anyOf": [
{
"type": "boolean"
},
{
"type": "null"
}
],
"default": true,
"description": "Whether to ignore case sensitivity during pattern matching.",
"title": "Ignorecase"
}
},
"title": "AllowDenyPattern",
"type": "object"
},
"CubeDeploymentType": {
"enum": [
"CORE",
"CLOUD"
],
"title": "CubeDeploymentType",
"type": "string"
},
"StatefulStaleMetadataRemovalConfig": {
"additionalProperties": false,
"description": "Base specialized config for Stateful Ingestion with stale metadata removal capability.",
"properties": {
"enabled": {
"default": false,
"description": "Whether or not to enable stateful ingest. Default: True if a pipeline_name is set and either a datahub-rest sink or `datahub_api` is specified, otherwise False",
"title": "Enabled",
"type": "boolean"
},
"remove_stale_metadata": {
"default": true,
"description": "Soft-deletes the entities present in the last successful run but missing in the current run with stateful_ingestion enabled.",
"title": "Remove Stale Metadata",
"type": "boolean"
},
"fail_safe_threshold": {
"default": 75.0,
"description": "Prevents large amount of soft deletes & the state from committing from accidental changes to the source configuration if the relative change percent in entities compared to the previous state is above the 'fail_safe_threshold'.",
"maximum": 100.0,
"minimum": 0.0,
"title": "Fail Safe Threshold",
"type": "number"
}
},
"title": "StatefulStaleMetadataRemovalConfig",
"type": "object"
}
},
"additionalProperties": false,
"properties": {
"incremental_lineage": {
"default": false,
"description": "When enabled, emits lineage as incremental to existing lineage already in DataHub. When disabled, re-states lineage on each run.",
"title": "Incremental Lineage",
"type": "boolean"
},
"env": {
"default": "PROD",
"description": "The environment that all assets produced by this connector belong to",
"title": "Env",
"type": "string"
},
"platform_instance": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "The instance of the platform that all assets produced by this recipe belong to. This should be unique within the platform. See https://docs.datahub.com/docs/platform-instances/ for more details.",
"title": "Platform Instance"
},
"stateful_ingestion": {
"anyOf": [
{
"$ref": "#/$defs/StatefulStaleMetadataRemovalConfig"
},
{
"type": "null"
}
],
"default": null,
"description": "Stateful ingestion configuration."
},
"api_url": {
"description": "Base URL of the Cube REST API, including the base path. For Cube Core this is typically `http://localhost:4000/cubejs-api`; for Cube Cloud it looks like `https://<name>.cubecloud.dev/cubejs-api`.",
"title": "Api Url",
"type": "string"
},
"api_token": {
"description": "API token used to authenticate against Cube. For Cube Core this is a JWT signed with `CUBEJS_API_SECRET`; for the Cube Cloud Metadata API use a token obtained from the Control Plane API.",
"format": "password",
"title": "Api Token",
"type": "string",
"writeOnly": true
},
"deployment_type": {
"$ref": "#/$defs/CubeDeploymentType",
"default": "CORE",
"description": "Whether the target is Cube Core (`CORE`) or Cube Cloud (`CLOUD`)."
},
"use_metadata_api": {
"default": true,
"description": "Cube Cloud only. When enabled, the richer Metadata API (`/v1/entities`) is used to extract warehouse and column-level lineage, which is merged with the structural metadata from `/v1/meta`. When disabled, only the `/v1/meta` endpoint is used. Has no effect for Cube Core deployments.",
"title": "Use Metadata Api",
"type": "boolean"
},
"cloud_api_key": {
"anyOf": [
{
"format": "password",
"type": "string",
"writeOnly": true
},
{
"type": "null"
}
],
"default": null,
"description": "Cube Cloud Control Plane API key (Account \u2192 API keys). When set together with `deployment_id` and `environment_id`, the connector automatically mints a metadata-scoped JWT via the Control Plane `tokens-for-meta-sync` endpoint to access the Metadata API, instead of requiring a pre-generated token in `api_token`.",
"title": "Cloud Api Key"
},
"cloud_api_url": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Base URL of the Cube Cloud Control Plane API (e.g. `https://<tenant>.cubecloud.dev`). If unset, it is derived from the scheme and host of `api_url`. Only used when `cloud_api_key` is set.",
"title": "Cloud Api Url"
},
"deployment_id": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Cube Cloud deployment id, used to mint a Metadata API token via the Control Plane API.",
"title": "Deployment Id"
},
"environment_id": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Cube Cloud environment id, used to mint a Metadata API token via the Control Plane API.",
"title": "Environment Id"
},
"security_context": {
"additionalProperties": true,
"description": "Security context embedded in the minted Metadata API token. Controls which parts of the data model are visible, following Cube's multi-tenancy rules.",
"title": "Security Context",
"type": "object"
},
"meta_sync_token_expires_in": {
"default": 86400,
"description": "Expiry (in seconds) of the minted Metadata API token. Defaults to 24 hours.",
"title": "Meta Sync Token Expires In",
"type": "integer"
},
"request_timeout_sec": {
"default": 30,
"description": "Per-request timeout, in seconds.",
"title": "Request Timeout Sec",
"type": "integer"
},
"include_cubes": {
"default": true,
"description": "Whether to ingest base cubes as datasets.",
"title": "Include Cubes",
"type": "boolean"
},
"include_views": {
"default": true,
"description": "Whether to ingest views as datasets.",
"title": "Include Views",
"type": "boolean"
},
"include_reports": {
"default": true,
"description": "Cube Cloud only. Whether to ingest saved reports as DataHub charts, with lineage to the cubes/views they query. Requires Platform API access (`cloud_api_key` + `deployment_id`).",
"title": "Include Reports",
"type": "boolean"
},
"include_workbooks": {
"default": true,
"description": "Cube Cloud only. Whether to ingest workbooks as DataHub dashboards containing their reports' charts. Requires Platform API access (`cloud_api_key` + `deployment_id`).",
"title": "Include Workbooks",
"type": "boolean"
},
"cube_pattern": {
"$ref": "#/$defs/AllowDenyPattern",
"default": {
"allow": [
".*"
],
"deny": [],
"ignoreCase": true
},
"description": "Regex patterns for filtering cubes to ingest (matched on the cube name)."
},
"view_pattern": {
"$ref": "#/$defs/AllowDenyPattern",
"default": {
"allow": [
".*"
],
"deny": [],
"ignoreCase": true
},
"description": "Regex patterns for filtering views to ingest (matched on the view name)."
},
"report_pattern": {
"$ref": "#/$defs/AllowDenyPattern",
"default": {
"allow": [
".*"
],
"deny": [],
"ignoreCase": true
},
"description": "Regex patterns for filtering reports to ingest (matched on the report name)."
},
"workbook_pattern": {
"$ref": "#/$defs/AllowDenyPattern",
"default": {
"allow": [
".*"
],
"deny": [],
"ignoreCase": true
},
"description": "Regex patterns for filtering workbooks to ingest (matched on the workbook name)."
},
"include_lineage": {
"default": true,
"description": "Whether to emit lineage. This includes view->cube lineage and, where available, lineage from cubes to their upstream warehouse tables.",
"title": "Include Lineage",
"type": "boolean"
},
"include_column_lineage": {
"default": true,
"description": "Whether to emit column-level (fine-grained) lineage. Requires `include_lineage` to be enabled.",
"title": "Include Column Lineage",
"type": "boolean"
},
"parse_sql_for_lineage": {
"default": true,
"description": "Cube Core only. When the `/v1/meta?extended` response includes a cube's SQL definition, parse it to derive upstream warehouse lineage. Requires `warehouse_platform` to be set. The Cloud Metadata API provides lineage directly, so this is ignored for Cube Cloud.",
"title": "Parse Sql For Lineage",
"type": "boolean"
},
"warehouse_platform": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "DataHub platform name of the warehouse that backs the Cube data model (e.g. `snowflake`, `bigquery`, `postgres`). Used to build upstream lineage URNs. If unset, it is auto-detected from the Cube data source type when the Metadata API is available.",
"title": "Warehouse Platform"
},
"warehouse_platform_instance": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Platform instance of the upstream warehouse, used when building lineage URNs.",
"title": "Warehouse Platform Instance"
},
"warehouse_env": {
"default": "PROD",
"description": "Environment of the upstream warehouse datasets referenced by lineage.",
"title": "Warehouse Env",
"type": "string"
},
"warehouse_database": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Database name to prepend to upstream warehouse table references that do not already include one. If unset, it is taken from the Cube data source definition when available.",
"title": "Warehouse Database"
},
"convert_lineage_urns_to_lowercase": {
"default": true,
"description": "Whether to lowercase upstream warehouse table and column names when building lineage URNs. Must match the `convert_urns_to_lowercase` setting of the warehouse connector (e.g. Snowflake ingests lowercased URNs by default) so that the lineage edges resolve.",
"title": "Convert Lineage Urns To Lowercase",
"type": "boolean"
},
"deployment_url": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Base URL of the Cube deployment UI, used to build an external link on the deployment container. If unset, it is derived from `api_url` by stripping the API base path.",
"title": "Deployment Url"
},
"tag_measures_and_dimensions": {
"default": true,
"description": "Whether to tag schema fields with `Measure`/`Dimension` (and `Temporal` for time dimensions) so the kinds of Cube members can be distinguished and filtered in DataHub.",
"title": "Tag Measures And Dimensions",
"type": "boolean"
},
"include_hidden": {
"default": false,
"description": "Whether to ingest cubes, views, and members that Cube marks as hidden (`public: false` / `isVisible: false`). Hidden cubes are typically excluded from Cube's own API consumers; enable this to surface them in DataHub anyway.",
"title": "Include Hidden",
"type": "boolean"
},
"emit_member_details": {
"default": true,
"description": "Whether to capture Cube member presentation hints (format, drill-down members, cumulative flag) as schema-field `jsonProps`, and structural metadata (joins, hierarchies, folders, pre-aggregations) as dataset custom properties.",
"title": "Emit Member Details",
"type": "boolean"
},
"enable_meta_mapping": {
"default": true,
"description": "Whether to process `meta_mapping` and `column_meta_mapping` rules.",
"title": "Enable Meta Mapping",
"type": "boolean"
},
"meta_mapping": {
"additionalProperties": {
"additionalProperties": true,
"type": "object"
},
"description": "Mapping rules applied to the `meta` of each cube/view to derive tags, glossary terms, owners, domains, and documentation links. Uses the same syntax as the dbt connector's `meta_mapping`.",
"title": "Meta Mapping",
"type": "object"
},
"column_meta_mapping": {
"additionalProperties": {
"additionalProperties": true,
"type": "object"
},
"description": "Mapping rules applied to the `meta` of each measure/dimension to derive schema-field tags and glossary terms.",
"title": "Column Meta Mapping",
"type": "object"
},
"tag_prefix": {
"default": "",
"description": "Prefix added to tags created via `meta_mapping`.",
"title": "Tag Prefix",
"type": "string"
},
"strip_user_ids_from_email": {
"default": false,
"description": "Whether to strip the email domain from owners derived via `meta_mapping`.",
"title": "Strip User Ids From Email",
"type": "boolean"
},
"domain": {
"additionalProperties": {
"$ref": "#/$defs/AllowDenyPattern"
},
"description": "Regex patterns matched against a cube/view name to assign it to a DataHub domain (keyed by domain id or urn).",
"title": "Domain",
"type": "object"
}
},
"required": [
"api_url",
"api_token"
],
"title": "CubeSourceConfig",
"type": "object"
}
Capabilities
The connector extracts the following metadata:
- Cubes and views as datasets, grouped under a container representing the deployment. The container links back to the deployment UI (derived from `api_url`, or set `deployment_url`).
- Schema: each measure and dimension becomes a schema field. Measures carry their aggregation type (e.g. `count`, `sum`) in the native data type; primary-key dimensions are flagged as part of the key. Fields are tagged `Measure` or `Dimension`, and `Temporal` for time dimensions (disable with `tag_measures_and_dimensions: false`).
- Descriptions and properties: titles, descriptions, segment names, source file name, and any custom `meta` defined in the model.
- Structural metadata: joins (with relationship), hierarchies (with levels), folders/nested folders (with members), and pre-aggregation names are captured as dataset custom properties (disable with `emit_member_details: false`).
- Measure presentation hints: each measure's `format`, drill-down members, and cumulative flag are stored on the schema field as `jsonProps`.
- Hidden members: cubes, views, and members marked `public: false` / `isVisible: false` are skipped by default; set `include_hidden: true` to ingest them.
- Tags, glossary terms, owners, domains, and documentation links: derived from the `meta` defined on cubes/views via `meta_mapping`, and from member `meta` via `column_meta_mapping` (same syntax as the dbt connector). Domains can also be assigned by name pattern via the `domain` config.
- Reports and workbooks (Cube Cloud only): saved reports become DataHub charts with input lineage to the cubes/views they query, and workbooks become DataHub dashboards containing those charts. Owners and titles are carried across. Disable with `include_reports: false` / `include_workbooks: false`, and filter with `report_pattern` / `workbook_pattern`.
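
The mapping options mentioned in this list can be sketched as a recipe fragment. This is an assumption-laden example rather than a reference: the `meta` keys (`owner`, `has_pii`, `glossary_term`), the tag and term names, and the domain id are all hypothetical, and the `match` / `operation` / `config` structure is borrowed from the dbt connector's `meta_mapping` syntax, which this connector is described as sharing.

```yaml
source:
  type: cube
  config:
    api_url: "https://your-deployment.cubecloud.dev/cubejs-api"
    api_token: "${CUBE_API_TOKEN}"
    enable_meta_mapping: true
    tag_prefix: "cube:"
    meta_mapping:
      # Hypothetical `owner` key in a cube's or view's meta.
      owner:
        match: ".*"
        operation: "add_owner"
        config:
          owner_type: user
      # Hypothetical boolean `has_pii` key.
      has_pii:
        match: True
        operation: "add_tag"
        config:
          tag: "pii"
    column_meta_mapping:
      # Hypothetical `glossary_term` key on a measure or dimension.
      glossary_term:
        match: "Revenue"
        operation: "add_term"
        config:
          term: "Finance.Revenue"
    domain:
      # Hypothetical domain id; matching cube/view names are assigned to it.
      finance:
        allow:
          - "^orders.*"
```
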
Lineage
Lineage is emitted when include_lineage is enabled (the default):
- View to cube: views are linked to the cubes they are built on, including column-level lineage derived from each member's `aliasMember`.
- Cube to warehouse: on Cube Cloud with the Metadata API, table and column references are read directly. On Cube Core, table-level lineage is parsed from each cube's SQL definition when `parse_sql_for_lineage` and `warehouse_platform` are set. Column-level lineage on Cube Core is best-effort: since `/v1/meta` does not expose per-member SQL, members are matched by name against the upstream table's columns as found in DataHub. The warehouse must therefore be ingested first, and members whose name differs from the underlying column (e.g. aggregate measures) are not linked.
- Report and workbook to view: on Cube Cloud, charts (reports) carry input lineage to the cubes/views in their query, and dashboards (workbooks) contain those charts, extending the chain to `warehouse → cube → view → chart → dashboard`.
Disable column-level lineage with include_column_lineage: false.
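
As a small illustration of these lineage switches, the fragment below keeps table-level lineage while turning column-level lineage off, and aligns the upstream URNs with an already-ingested warehouse. The platform instance name is a hypothetical placeholder.

```yaml
source:
  type: cube
  config:
    api_url: "https://your-deployment.cubecloud.dev/cubejs-api"
    api_token: "${CUBE_API_TOKEN}"
    deployment_type: "CLOUD"
    # Table-level lineage (view->cube, cube->warehouse) stays on.
    include_lineage: true
    # Column-level (fine-grained) lineage is switched off.
    include_column_lineage: false
    warehouse_platform: "snowflake"
    # Hypothetical: must match the instance used by the warehouse recipe.
    warehouse_platform_instance: "prod_snowflake"
    warehouse_env: "PROD"
```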
Cube Cloud authentication and metadata merging
On Cube Cloud the connector reads both endpoints and merges them: /v1/meta supplies the structural and presentation metadata (joins, hierarchies, folders, formats, visibility), while the Metadata API (/v1/entities, /v1/data-sources) supplies warehouse and column-level lineage. This gives a Cloud ingestion the union of both.
The Metadata API requires a metadata-scoped JWT. You can either:
- Provide a pre-generated token in `api_token`, or
- Let the connector mint one automatically: set `cloud_api_key` (a Cube Cloud API key from Account → API keys) together with `deployment_id` and `environment_id`. The connector calls the Control Plane `tokens-for-meta-sync` endpoint to obtain a short-lived, metadata-only token. Override the Control Plane host with `cloud_api_url` if it differs from the `api_url` host, and embed a `security_context` to scope multi-tenant visibility.
If the Metadata API cannot be reached, the connector logs a warning and continues with /v1/meta only (structural metadata and view-to-cube lineage, but no warehouse lineage).
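
The automatic minting path described above might be configured as in the following sketch. The deployment id, environment id, tenant host, and shortened expiry are illustrative assumptions; `api_token` is kept because the config schema lists it as required.

```yaml
source:
  type: cube
  config:
    api_url: "https://your-deployment.cubecloud.dev/cubejs-api"
    # Listed as required in the config schema.
    api_token: "${CUBE_API_TOKEN}"
    deployment_type: "CLOUD"
    use_metadata_api: true
    # All three are needed to mint a Metadata API token.
    cloud_api_key: "${CUBE_CLOUD_API_KEY}"
    deployment_id: "12345"
    environment_id: "production"
    # Only needed if the Control Plane host differs from the api_url host.
    # cloud_api_url: "https://your-tenant.cubecloud.dev"
    # Illustrative: one hour instead of the 24-hour default.
    meta_sync_token_expires_in: 3600
```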
Reports and workbooks (Cube Cloud Platform API)
Reports and workbooks are read from the Cube Cloud Platform API, which is authenticated with a Cube Cloud API key as a Bearer token. Set cloud_api_key and deployment_id to enable this (environment_id is not required for reports/workbooks — it is only needed when minting a Metadata API token). When these are absent, or for Cube Core, report/workbook ingestion is skipped silently. A failed Platform API call logs a warning and does not abort the run.
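
A recipe fragment for report and workbook ingestion could look like the sketch below. The name patterns are hypothetical; `environment_id` is left out because, as noted above, it is only needed for minting a Metadata API token.

```yaml
source:
  type: cube
  config:
    api_url: "https://your-deployment.cubecloud.dev/cubejs-api"
    api_token: "${CUBE_API_TOKEN}"
    deployment_type: "CLOUD"
    # Platform API access: API key plus deployment id.
    cloud_api_key: "${CUBE_CLOUD_API_KEY}"
    deployment_id: "12345"
    include_reports: true
    include_workbooks: true
    report_pattern:
      deny:
        - "^draft.*"       # hypothetical report names to skip
    workbook_pattern:
      allow:
        - "^Finance.*"     # hypothetical workbook names to keep
```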
Multi-tenancy and context variables
Cube context variables (COMPILE_CONTEXT, SECURITY_CONTEXT, FILTER_PARAMS, FILTER_GROUP, SQL_UTILS) are data-model authoring constructs, not metadata the APIs expose as structured fields — there is nothing separate to ingest. They affect the connector only indirectly:
- COMPILE_CONTEXT (multi-tenancy). Cube compiles a different data model per security context. The connector ingests the single compiled model that matches the security context carried by its token: set security_context when minting a token via the Control Plane API, or rely on the claims baked into a directly supplied api_token. To catalog multiple tenants, run one ingestion per tenant. Because their cubes and views share names, distinguish them with platform_instance / env (or cube_pattern / view_pattern) to avoid URN collisions.
- FILTER_PARAMS / SQL_UTILS in cube SQL. The SQL returned by /v1/meta is already compiled (FILTER_PARAMS render to their defaults and COMPILE_CONTEXT is resolved), so Cube Core SQL lineage parsing operates on the resolved SQL, and the parse is wrapped defensively in case a template still cannot be parsed. On Cube Cloud the Metadata API returns resolved table_references / column_references, so templating is irrelevant there.
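One way to run one ingestion per tenant is to derive a recipe per tenant from a shared base config. The sketch below makes several assumptions that should be checked against the actual config reference: the source type string "cube", the shape of security_context (a dictionary with a hypothetical tenant_id claim), and the helper name tenant_recipes.

```python
from typing import Any, Dict, List


def tenant_recipes(
    base_config: Dict[str, Any], tenants: List[str]
) -> List[Dict[str, Any]]:
    """Build one recipe per tenant, each with its own platform_instance."""
    recipes = []
    for tenant in tenants:
        config = dict(base_config)  # shallow copy; only top-level keys change
        # Hypothetical claim name; use whatever your data model reads.
        config["security_context"] = {"tenant_id": tenant}
        # A distinct platform_instance keeps same-named cubes from colliding.
        config["platform_instance"] = tenant
        # Source type "cube" is assumed here, not confirmed by this page.
        recipes.append({"source": {"type": "cube", "config": config}})
    return recipes


base = {"api_url": "<cube-api-url>", "cloud_api_key": "<cube-cloud-api-key>"}
recipes = tenant_recipes(base, ["acme", "globex"])
instances = [r["source"]["config"]["platform_instance"] for r in recipes]
```

Each resulting recipe would then be run as its own ingestion, so the same cube name under two tenants produces two different URNs.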
Limitations
- The /v1/meta endpoint does not return cubes or views marked public: false. On Cube Cloud the Metadata API may still return them (and the connector merges them in); on Cube Core such cubes are not ingested as datasets, though lineage edges to them are still emitted.
- Warehouse lineage on Cube Cloud requires a metadata-scoped token for the Metadata API (supplied via api_token, or minted automatically with cloud_api_key + deployment_id + environment_id). Without it, the connector falls back to /v1/meta and only view-to-cube lineage is available.
- The Control Plane audit-logs export and Orchestration API (pre-aggregation build jobs) are intentionally not used: they are operational/governance surfaces rather than data-catalog metadata, and the audit-logs export is an Enterprise-only CSV stream.
- Column-level lineage on Cube Core relies on member names matching the warehouse column names (Cube's default convention) and on the upstream table's schema already being present in DataHub. Members backed by a renamed or computed expression (e.g. total_amount over amount, or any aggregate measure) are not column-linked, since Cube Core's /v1/meta does not expose the underlying member SQL. Cube Cloud's Metadata API provides exact references and has no such limitation.
- Usage statistics and query profiling are not ingested. Cube does not expose query history through a pull API; it is only available via Query History export, which pushes logs to an external sink (e.g. S3). Ingesting that exported data would be a separate pipeline rather than a Metadata API feature.
- Pre-aggregation definitions are not exposed by Cube Core's /v1/meta (it returns only measures, dimensions, segments, hierarchies, and folders); they are an internal caching concern. Where a payload does include them, their names are captured as custom properties.
Troubleshooting
"Required scope is missing" / Metadata API falls back to /v1/meta
The configured api_token is a regular REST/data token rather than a metadata-scoped token. Do one of the following: set cloud_api_key + deployment_id + environment_id so the connector mints a metadata token via the Control Plane API; supply a pre-generated metadata token in api_token; or set use_metadata_api: false to silence the fallback warning.
No warehouse lineage appears
Confirm warehouse_platform is set (or auto-detected), and that the upstream datasets were ingested with the same warehouse_platform_instance and warehouse_env you configured here.
Warehouse lineage edges do not connect to existing datasets
When run against a DataHub instance (the usual case), the connector reconciles the casing of upstream warehouse table URNs and column names against what the warehouse connector actually ingested — it looks up the real schema in DataHub and snaps Cube's reported identifiers to it. This handles platforms that fold identifiers differently (Postgres/Redshift lower-case, Snowflake upper-case, BigQuery case-sensitive) without per-platform configuration.
When the upstream schema is not yet in DataHub (e.g. the warehouse has not been ingested, or a dry run with no server), there is nothing to reconcile against, so the connector falls back to its configured behaviour: it lowercases upstream warehouse table and column names by default. If the warehouse connector was configured with convert_urns_to_lowercase: false, set convert_lineage_urns_to_lowercase: false here so the fallback URNs match. Ingesting the warehouse first is the most reliable fix.
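The reconciliation and its fallback can be sketched as a single lookup. This is an illustration only: reconcile_identifier is a hypothetical name, lowercase_fallback stands in for convert_lineage_urns_to_lowercase, and applying the fallback when a schema is present but has no match is an assumption of this sketch.

```python
from typing import Iterable, Optional


def reconcile_identifier(
    reported: str,
    known: Optional[Iterable[str]],
    lowercase_fallback: bool = True,
) -> str:
    """Snap a reported identifier to the casing already known to the catalog."""
    if known is not None:
        for candidate in known:
            # Case-insensitive match; return the catalog's own casing.
            if candidate.casefold() == reported.casefold():
                return candidate
    # Nothing to reconcile against (or no match): use the configured fallback.
    return reported.lower() if lowercase_fallback else reported


snapped = reconcile_identifier("Amount", ["ID", "AMOUNT"])
fallback = reconcile_identifier("Amount", None)
preserved = reconcile_identifier("Amount", None, lowercase_fallback=False)
```

The second and third calls show why the fallback setting has to agree with how the warehouse connector built its URNs: with no schema to consult, the flag alone decides the casing.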
Code Coordinates
- Class Name: datahub.ingestion.source.cube.cube.CubeSource
- Browse on GitHub
If you've got any questions on configuring ingestion for Cube, feel free to ping us on our Slack.
This page is auto-generated from the underlying source code. To make changes, please edit the relevant source files in the metadata-ingestion directory.