Fabric Data Factory
Overview
Microsoft Fabric Data Factory is a cloud-based data integration service within the Microsoft Fabric platform. Learn more in the official Microsoft Fabric Data Factory documentation.
The DataHub integration for Fabric Data Factory covers pipeline and orchestration entities such as workspaces, data pipelines, and activities. Depending on module capabilities, it can also capture features such as lineage, execution history, platform instance mapping, and stateful deletion detection.
Concept Mapping
| Fabric Data Factory Concept | DataHub Entity | Notes |
|---|---|---|
| Workspace | Container (subtype: Fabric Workspace) | Top-level organizational unit |
| Data Pipeline | DataFlow | Orchestration pipeline containing activities |
| Activity | DataJob | Individual task within a pipeline (Copy, Lookup, Spark, etc.) |
| Pipeline Run | DataProcessInstance | Execution record for a pipeline run |
| Activity Run | DataProcessInstance | Execution record for an individual activity within a pipeline |
| Connection | (resolved to external Dataset) | Used for lineage resolution to datasets on external platforms |
Hierarchy Structure
Platform (fabric-data-factory)
└── Workspace (Container)
└── Data Pipeline (DataFlow)
└── Activity (DataJob)
├── Pipeline Run (DataProcessInstance)
└── Activity Run (DataProcessInstance)
Module fabric-data-factory
Important Capabilities
| Capability | Status | Notes |
|---|---|---|
| Asset Containers | ✅ | Enabled by default. |
| Detect Deleted Entities | ✅ | Optionally enabled via stateful_ingestion config. |
| Platform Instance | ✅ | Enabled by default. |
| Table-Level Lineage | ✅ | Enabled by default via Copy and InvokePipeline activities. |
Overview
The fabric-data-factory module ingests metadata from Microsoft Fabric Data Factory into DataHub. It extracts workspaces, data pipelines, activities, and execution history, and resolves lineage from Copy activities to external datasets.
- Set up authentication — Configure Azure credentials (see Prerequisites)
- Enable API access — Ensure a Fabric admin has enabled service principal API access (if using SP or managed identity)
- Grant permissions — Add your identity as a workspace Contributor (required for pipeline definitions and lineage)
- Configure recipe — Use `fabric-data-factory_recipe.yml` as a template
- Run ingestion — Execute `datahub ingest -c fabric-data-factory_recipe.yml`
Key Features
- Workspaces as containers, data pipelines as DataFlows (DataHub entity type), activities as DataJobs
- Dataset-level lineage from Copy and InvokePipeline activities
- Pipeline and activity execution history as DataProcessInstances
- Cross-recipe lineage via `platform_instance_map` for connecting to externally ingested datasets
- Pattern-based filtering for workspaces and pipelines
- Stateful ingestion for stale entity removal
- Multiple authentication methods (Service Principal, Managed Identity, Azure CLI, DefaultAzureCredential)
References
Azure Authentication
- Register an application with Microsoft Entra ID
- Azure Identity Library
- Service Principal Authentication
- Managed Identities
Fabric Data Factory Concepts
Prerequisites
Required Permissions
The connector requires the Contributor role on each workspace; without it, pipeline definitions cannot be fetched. With the Reader role only, the connector will list workspaces and pipelines but will not extract pipeline activities, activity run details, or lineage.
Delegated (on behalf of a user) authentication
If using delegated auth (e.g., Azure CLI), the signed-in user's existing Fabric permissions apply directly. The connector requires the following delegated scopes:
- `Workspace.Read.All` or `Workspace.ReadWrite.All` — for listing workspaces and items
- `Item.ReadWrite.All` or `DataPipeline.ReadWrite.All` — for Get Item Definition, List Item Connections, and Query Activity Runs (`Item.Read.All` is not sufficient for definitions and connections)
- `Item.Read.All` or `DataPipeline.Read.All` — sufficient for List Item Job Instances (execution history)
The Azure CLI token includes the necessary Fabric API scopes by default.
Service Principal and Managed Identity authentication
Service principals and managed identities do not inherit any permissions by default. You need to:
- Enable API access: A Fabric admin must enable the service principal tenant settings (see Fabric Admin Settings below)
- Grant workspace access: Add the SP or MI as a workspace Contributor for each workspace you want to ingest
Fabric Admin Settings
For service principal and managed identity authentication, a Fabric administrator must enable API access for service principals in the Fabric admin portal. Without this, API calls will fail with 401 errors even if workspace permissions are correctly assigned.
As of mid-2025, Microsoft split the original single tenant setting into two separate settings. Configure them as follows:
- Go to the Fabric Admin Portal > Tenant settings
- Under Developer settings, enable the applicable setting(s):
- Service principals can call Fabric public APIs — Controls access to CRUD APIs protected by the Fabric permission model (e.g., reading workspaces and items). This is enabled by default for new tenants since August 2025.
- Service principals can create workspaces, connections, and deployment pipelines — Controls access to global APIs not protected by Fabric permissions. This is disabled by default. Enable only if needed.
- Restrict access to a dedicated security group containing only the service principals that need API access. This is the recommended approach.
If you are on an older tenant where the legacy single setting Service principals can use Fabric APIs is still visible, enable that instead. It will be automatically migrated to the two new settings.
Tenant setting changes can take up to 15 minutes to propagate. If you receive 401 errors immediately after enabling, wait and retry.
For detailed instructions, see Developer admin settings and Identity support for Fabric REST APIs.
Authentication
The connector supports four authentication methods via the shared credential config block. All methods use Azure's TokenCredential interface.
Service Principal (recommended for production)
Register an application in Microsoft Entra ID and note the client_id, client_secret, and tenant_id. Then:
- Ensure the Fabric admin has enabled service principal API access (see Fabric Admin Settings above)
- Create a security group in Entra ID and add the service principal as a member
- Add the security group as Contributor in each target workspace (Contributor role grants access to pipeline definitions and item connections for lineage)
credential:
authentication_method: service_principal
client_id: ${AZURE_CLIENT_ID}
client_secret: ${AZURE_CLIENT_SECRET}
tenant_id: ${AZURE_TENANT_ID}
All three fields are required when using this method.
Managed Identity (for Azure-hosted deployments)
Use this when running DataHub ingestion on an Azure VM, AKS, App Service, or other Azure compute that supports managed identities. The managed identity must be added as a workspace Contributor in Fabric. A Fabric admin must also enable the tenant settings described in Fabric Admin Settings above — these settings govern API access for both service principals and managed identities, despite the setting name referencing only service principals.
# System-assigned managed identity (no additional config needed)
credential:
authentication_method: managed_identity
For user-assigned managed identity, provide the client ID:
credential:
authentication_method: managed_identity
managed_identity_client_id: "<your-managed-identity-client-id>"
Azure CLI (for local development and testing)
Uses the credentials from your local az login session. The signed-in user's existing Fabric permissions apply directly — no additional setup needed beyond workspace access.
credential:
authentication_method: cli
Run `az login` before starting ingestion. For remote servers without a browser, use `az login --use-device-code`.
DefaultAzureCredential (flexible auto-detection)
Uses Azure's DefaultAzureCredential chain, which tries multiple credential sources in order: environment variables, workload identity, managed identity, shared token cache, Azure CLI, Azure PowerShell, Azure Developer CLI, and more.
credential:
authentication_method: default
You can exclude specific credential sources from the chain to speed up detection or avoid unintended auth in mixed environments:
credential:
authentication_method: default
exclude_cli_credential: true # Skip Azure CLI (recommended in production)
exclude_environment_credential: false
exclude_managed_identity_credential: false
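The chain-with-exclusions behavior above can be sketched in plain Python. This is an illustrative model only, not the azure-identity implementation: the source names and ordering here are a simplification of the chain listed earlier, and the functions are hypothetical helpers.

```python
# Illustrative sketch (not azure-identity itself): a DefaultAzureCredential-
# style chain tries each source in order, and the exclude_* flags in the
# recipe correspond to dropping links from that chain.

def build_chain(exclude_environment=False, exclude_managed_identity=False,
                exclude_cli=False):
    """Return the ordered list of credential sources to try."""
    chain = []
    if not exclude_environment:
        chain.append("environment")
    if not exclude_managed_identity:
        chain.append("managed_identity")
    if not exclude_cli:
        chain.append("cli")
    return chain

def first_available(chain, available):
    """Return the first source in the chain that can produce a token."""
    for source in chain:
        if source in available:
            return source
    raise RuntimeError("no credential source in the chain succeeded")

# Excluding the CLI credential (as recommended for production) guarantees a
# developer's local `az login` session can never be picked up by accident.
chain = build_chain(exclude_cli=True)
```

Excluding sources you know are absent also speeds up detection, since each failed link in the chain costs a probe attempt.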
Setup
- Choose an authentication method from above and configure the `credential` block.
- If using service principal or managed identity:
- Ensure the Fabric admin has enabled the appropriate developer settings (see Fabric Admin Settings)
- Create a security group, add your identity, and grant Contributor on target workspaces
- If using Azure CLI, run `az login` (or `az login --use-device-code` on remote servers).
- Configure the ingestion recipe with optional workspace and pipeline filters.
Install the Plugin
pip install 'acryl-datahub[fabric-data-factory]'
Starter Recipe
Check out the following recipe to get started with ingestion! See below for full configuration options.
For general pointers on writing and running a recipe, see our main recipe guide.
# Example recipe for Fabric Data Factory source
# See README.md for full configuration options
source:
type: fabric-data-factory
config:
# Authentication (using service principal)
credential:
authentication_method: service_principal
client_id: ${AZURE_CLIENT_ID}
client_secret: ${AZURE_CLIENT_SECRET}
tenant_id: ${AZURE_TENANT_ID}
# Optional: Filter workspaces by name pattern
workspace_pattern:
allow:
- ".*" # Allow all workspaces by default
deny: []
# Optional: Filter pipelines by name pattern
pipeline_pattern:
allow:
- ".*" # Allow all pipelines by default
deny: []
# Feature flags
extract_pipelines: true
include_lineage: true
include_execution_history: true
execution_history_days: 7 # 1-90 days
# Optional: Map Fabric connection names to platform instances for accurate lineage
# platform_instance_map:
# "my-snowflake-connection": "prod_snowflake"
# "my-bigquery-connection": "analytics_project"
# Optional: Platform instance for this Fabric Data Factory connector
# platform_instance: "my-fabric-tenant"
# Environment
env: PROD
# Optional: Stateful ingestion for stale entity removal
# stateful_ingestion:
# enabled: true
sink:
type: datahub-rest
config:
server: "http://localhost:8080"
token: ${DATAHUB_GMS_TOKEN}
Config Details
- Options
- Schema
Note that a . is used to denote nested fields in the YAML recipe.
| Field | Description |
|---|---|
api_timeout integer | Timeout for REST API calls in seconds. Default: 30 |
execution_history_days integer | Number of days of execution history to extract. Only used when include_execution_history is True. Higher values increase ingestion time. Note: Fabric API returns at most 100 recently completed runs per pipeline. Default: 7 |
extract_pipelines boolean | Whether to extract Data Pipelines and their activities. Default: True |
include_execution_history boolean | Extract pipeline and activity execution history as DataProcessInstance. Includes run status, duration, and parameters. Enables lineage extraction from parameterized activities using actual runtime values. Default: True |
include_lineage boolean | Extract lineage from activity inputs/outputs. Maps Fabric connections to DataHub datasets based on connection type. Default: True |
platform_instance One of string, null | The instance of the platform that all assets produced by this recipe belong to. This should be unique within the platform. See https://docs.datahub.com/docs/platform-instances/ for more details. Default: None |
platform_instance_map map(str,string) | |
env string | The environment that all assets produced by this connector belong to Default: PROD |
credential AzureCredentialConfig | Unified Azure authentication configuration. This class provides a reusable authentication configuration that can be composed into any Azure connector's configuration. It supports multiple authentication methods and returns a TokenCredential that works with any Azure SDK client. Example usage in a connector config: class MyAzureConnectorConfig(ConfigModel): credential: AzureCredentialConfig = Field( default_factory=AzureCredentialConfig, description="Azure authentication configuration" ) subscription_id: str = Field(...) |
credential.authentication_method Enum | One of: "default", "service_principal", "managed_identity", "cli" |
credential.client_id One of string, null | Azure Application (client) ID. Required for service_principal authentication. Find this in Azure Portal > App registrations > Your app > Overview. Default: None |
credential.client_secret One of string(password), null | Azure client secret. Required for service_principal authentication. Create in Azure Portal > App registrations > Your app > Certificates & secrets. Default: None |
credential.exclude_cli_credential boolean | When using 'default' authentication, exclude Azure CLI credential. Useful in production to avoid accidentally using developer credentials. Default: False |
credential.exclude_environment_credential boolean | When using 'default' authentication, exclude environment variables. Environment variables checked: AZURE_CLIENT_ID, AZURE_CLIENT_SECRET, AZURE_TENANT_ID. Default: False |
credential.exclude_managed_identity_credential boolean | When using 'default' authentication, exclude managed identity. Useful during local development when managed identity is not available. Default: False |
credential.managed_identity_client_id One of string, null | Client ID for user-assigned managed identity. Leave empty to use system-assigned managed identity. Only used when authentication_method is 'managed_identity'. Default: None |
credential.tenant_id One of string, null | Azure tenant (directory) ID. Required for service_principal authentication. Find this in Azure Portal > Microsoft Entra ID > Overview. Default: None |
pipeline_pattern AllowDenyPattern | A class to store allow deny regexes |
pipeline_pattern.ignoreCase One of boolean, null | Whether to ignore case sensitivity during pattern matching. Default: True |
workspace_pattern AllowDenyPattern | A class to store allow deny regexes |
workspace_pattern.ignoreCase One of boolean, null | Whether to ignore case sensitivity during pattern matching. Default: True |
stateful_ingestion One of StatefulStaleMetadataRemovalConfig, null | Configuration for stateful ingestion and stale entity removal. When enabled, tracks ingested entities and removes those that no longer exist in Fabric. Default: None |
stateful_ingestion.enabled boolean | Whether or not to enable stateful ingest. Default: True if a pipeline_name is set and either a datahub-rest sink or datahub_api is specified, otherwise False Default: False |
stateful_ingestion.fail_safe_threshold number | Prevents large amount of soft deletes & the state from committing from accidental changes to the source configuration if the relative change percent in entities compared to the previous state is above the 'fail_safe_threshold'. Default: 75.0 |
stateful_ingestion.remove_stale_metadata boolean | Soft-deletes the entities present in the last successful run but missing in the current run with stateful_ingestion enabled. Default: True |
The JSONSchema for this configuration is inlined below.
{
"$defs": {
"AllowDenyPattern": {
"additionalProperties": false,
"description": "A class to store allow deny regexes",
"properties": {
"allow": {
"default": [
".*"
],
"description": "List of regex patterns to include in ingestion",
"items": {
"type": "string"
},
"title": "Allow",
"type": "array"
},
"deny": {
"default": [],
"description": "List of regex patterns to exclude from ingestion.",
"items": {
"type": "string"
},
"title": "Deny",
"type": "array"
},
"ignoreCase": {
"anyOf": [
{
"type": "boolean"
},
{
"type": "null"
}
],
"default": true,
"description": "Whether to ignore case sensitivity during pattern matching.",
"title": "Ignorecase"
}
},
"title": "AllowDenyPattern",
"type": "object"
},
"AzureAuthenticationMethod": {
"description": "Supported Azure authentication methods.\n\n- DEFAULT: Uses DefaultAzureCredential which auto-detects credentials from\n environment variables, managed identity, Azure CLI, etc.\n- SERVICE_PRINCIPAL: Uses client ID, client secret, and tenant ID\n- MANAGED_IDENTITY: Uses Azure Managed Identity (system or user-assigned)\n- CLI: Uses Azure CLI credential (requires `az login`)",
"enum": [
"default",
"service_principal",
"managed_identity",
"cli"
],
"title": "AzureAuthenticationMethod",
"type": "string"
},
"AzureCredentialConfig": {
"additionalProperties": false,
"description": "Unified Azure authentication configuration.\n\nThis class provides a reusable authentication configuration that can be\ncomposed into any Azure connector's configuration. It supports multiple\nauthentication methods and returns a TokenCredential that works with\nany Azure SDK client.\n\nExample usage in a connector config:\n class MyAzureConnectorConfig(ConfigModel):\n credential: AzureCredentialConfig = Field(\n default_factory=AzureCredentialConfig,\n description=\"Azure authentication configuration\"\n )\n subscription_id: str = Field(...)",
"properties": {
"authentication_method": {
"$ref": "#/$defs/AzureAuthenticationMethod",
"default": "default",
"description": "Authentication method to use. Options: 'default' (auto-detects from environment), 'service_principal' (client ID + secret + tenant), 'managed_identity' (Azure Managed Identity), 'cli' (Azure CLI credential). Recommended: Use 'default' which tries multiple methods automatically."
},
"client_id": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Azure Application (client) ID. Required for service_principal authentication. Find this in Azure Portal > App registrations > Your app > Overview.",
"title": "Client Id"
},
"client_secret": {
"anyOf": [
{
"format": "password",
"type": "string",
"writeOnly": true
},
{
"type": "null"
}
],
"default": null,
"description": "Azure client secret. Required for service_principal authentication. Create in Azure Portal > App registrations > Your app > Certificates & secrets.",
"title": "Client Secret"
},
"tenant_id": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Azure tenant (directory) ID. Required for service_principal authentication. Find this in Azure Portal > Microsoft Entra ID > Overview.",
"title": "Tenant Id"
},
"managed_identity_client_id": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Client ID for user-assigned managed identity. Leave empty to use system-assigned managed identity. Only used when authentication_method is 'managed_identity'.",
"title": "Managed Identity Client Id"
},
"exclude_cli_credential": {
"default": false,
"description": "When using 'default' authentication, exclude Azure CLI credential. Useful in production to avoid accidentally using developer credentials.",
"title": "Exclude Cli Credential",
"type": "boolean"
},
"exclude_environment_credential": {
"default": false,
"description": "When using 'default' authentication, exclude environment variables. Environment variables checked: AZURE_CLIENT_ID, AZURE_CLIENT_SECRET, AZURE_TENANT_ID.",
"title": "Exclude Environment Credential",
"type": "boolean"
},
"exclude_managed_identity_credential": {
"default": false,
"description": "When using 'default' authentication, exclude managed identity. Useful during local development when managed identity is not available.",
"title": "Exclude Managed Identity Credential",
"type": "boolean"
}
},
"title": "AzureCredentialConfig",
"type": "object"
},
"StatefulStaleMetadataRemovalConfig": {
"additionalProperties": false,
"description": "Base specialized config for Stateful Ingestion with stale metadata removal capability.",
"properties": {
"enabled": {
"default": false,
"description": "Whether or not to enable stateful ingest. Default: True if a pipeline_name is set and either a datahub-rest sink or `datahub_api` is specified, otherwise False",
"title": "Enabled",
"type": "boolean"
},
"remove_stale_metadata": {
"default": true,
"description": "Soft-deletes the entities present in the last successful run but missing in the current run with stateful_ingestion enabled.",
"title": "Remove Stale Metadata",
"type": "boolean"
},
"fail_safe_threshold": {
"default": 75.0,
"description": "Prevents large amount of soft deletes & the state from committing from accidental changes to the source configuration if the relative change percent in entities compared to the previous state is above the 'fail_safe_threshold'.",
"maximum": 100.0,
"minimum": 0.0,
"title": "Fail Safe Threshold",
"type": "number"
}
},
"title": "StatefulStaleMetadataRemovalConfig",
"type": "object"
}
},
"additionalProperties": false,
"description": "Configuration for Fabric Data Factory source.\n\nThis connector extracts metadata from Microsoft Fabric Data Factory items:\n- Workspaces as Containers\n- Data Pipelines as DataFlows with Activities as DataJobs\n- Copy Jobs as DataFlows with dataset-level lineage\n- Dataflow Gen2 as DataFlows (metadata only)",
"properties": {
"env": {
"default": "PROD",
"description": "The environment that all assets produced by this connector belong to",
"title": "Env",
"type": "string"
},
"platform_instance": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "The instance of the platform that all assets produced by this recipe belong to. This should be unique within the platform. See https://docs.datahub.com/docs/platform-instances/ for more details.",
"title": "Platform Instance"
},
"stateful_ingestion": {
"anyOf": [
{
"$ref": "#/$defs/StatefulStaleMetadataRemovalConfig"
},
{
"type": "null"
}
],
"default": null,
"description": "Configuration for stateful ingestion and stale entity removal. When enabled, tracks ingested entities and removes those that no longer exist in Fabric."
},
"credential": {
"$ref": "#/$defs/AzureCredentialConfig",
"description": "Azure authentication configuration. Supports service principal, managed identity, Azure CLI, or auto-detection (DefaultAzureCredential). See AzureCredentialConfig for detailed options."
},
"workspace_pattern": {
"$ref": "#/$defs/AllowDenyPattern",
"default": {
"allow": [
".*"
],
"deny": [],
"ignoreCase": true
},
"description": "Regex patterns to filter workspaces by name. Example: allow=['prod-.*'], deny=['.*-test']"
},
"pipeline_pattern": {
"$ref": "#/$defs/AllowDenyPattern",
"default": {
"allow": [
".*"
],
"deny": [],
"ignoreCase": true
},
"description": "Regex patterns to filter data pipelines by name. Applied to all workspaces matching workspace_pattern."
},
"extract_pipelines": {
"default": true,
"description": "Whether to extract Data Pipelines and their activities.",
"title": "Extract Pipelines",
"type": "boolean"
},
"include_lineage": {
"default": true,
"description": "Extract lineage from activity inputs/outputs. Maps Fabric connections to DataHub datasets based on connection type.",
"title": "Include Lineage",
"type": "boolean"
},
"include_execution_history": {
"default": true,
"description": "Extract pipeline and activity execution history as DataProcessInstance. Includes run status, duration, and parameters. Enables lineage extraction from parameterized activities using actual runtime values.",
"title": "Include Execution History",
"type": "boolean"
},
"execution_history_days": {
"default": 7,
"description": "Number of days of execution history to extract. Only used when include_execution_history is True. Higher values increase ingestion time. Note: Fabric API returns at most 100 recently completed runs per pipeline.",
"maximum": 90,
"minimum": 1,
"title": "Execution History Days",
"type": "integer"
},
"platform_instance_map": {
"additionalProperties": {
"type": "string"
},
"description": "Map connection names to DataHub platform instances. Example: {'my-snowflake-connection': 'prod_snowflake'}. Used for accurate lineage resolution to existing datasets.",
"title": "Platform Instance Map",
"type": "object"
},
"api_timeout": {
"default": 30,
"description": "Timeout for REST API calls in seconds.",
"maximum": 300,
"minimum": 1,
"title": "Api Timeout",
"type": "integer"
}
},
"title": "FabricDataFactorySourceConfig",
"type": "object"
}
Capabilities
Use the Important Capabilities table above as the source of truth for supported features and whether additional configuration is required.
Lineage Extraction
Which Activities Produce Lineage?
The connector extracts dataset-level lineage from these Fabric activity types:
| Activity Type | Lineage Behavior |
|---|---|
| Copy | Creates lineage from input dataset(s) to output dataset |
| InvokePipeline | Creates pipeline-to-pipeline lineage to the child pipeline |
Lineage is enabled by default (include_lineage: true).
How Lineage Resolution Works
For lineage to connect properly to datasets ingested from other sources (e.g., Snowflake, BigQuery), the connector resolves Fabric connections to DataHub platforms.
Step 1: Automatic Connection Mapping
The connector automatically maps Fabric connection types to DataHub platforms (e.g., a Snowflake connection maps to the snowflake platform). See FABRIC_CONNECTION_PLATFORM_MAP for the full list of supported mappings. Unsupported connection types fall back to using the connection type string as the platform name.
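The fallback behavior can be sketched as a simple dictionary lookup. The mapping entries below are illustrative assumptions — the real `FABRIC_CONNECTION_PLATFORM_MAP` lives in the connector source and may differ:

```python
# Hypothetical sketch of connection-type resolution. The entries here are
# examples only; consult FABRIC_CONNECTION_PLATFORM_MAP for the real list.

CONNECTION_PLATFORM_MAP = {
    "Snowflake": "snowflake",
    "GoogleBigQuery": "bigquery",
    "AzureSqlDatabase": "mssql",
}

def resolve_platform(connection_type: str) -> str:
    # Unmapped types fall back to the connection type string itself, which
    # may not match the platform name used by your other DataHub recipes.
    return CONNECTION_PLATFORM_MAP.get(connection_type, connection_type.lower())
```

If the fallback name does not match your existing platform names, lineage edges will point at datasets that do not exist; Step 2 below addresses this.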
Step 2: Platform Instance Mapping (for cross-recipe lineage)
If you're ingesting the same data sources with other DataHub connectors (e.g., Snowflake, BigQuery), you need to ensure the platform_instance values match. Use platform_instance_map to map your Fabric connection names to the platform instance used in your other recipes:
# Fabric Data Factory Recipe
source:
type: fabric-data-factory
config:
credential:
authentication_method: service_principal
client_id: ${AZURE_CLIENT_ID}
client_secret: ${AZURE_CLIENT_SECRET}
tenant_id: ${AZURE_TENANT_ID}
platform_instance_map:
# Key: Your Fabric connection name (exact match required)
# Value: The platform_instance from your other source recipe
"snowflake-prod-connection": "prod_warehouse"
"bigquery-analytics": "analytics_project"
# Corresponding Snowflake Recipe (platform_instance must match)
source:
type: snowflake
config:
platform_instance: "prod_warehouse" # Must match the value in platform_instance_map
# ... other config
Without matching platform_instance values, lineage will create separate dataset entities instead of connecting to your existing ingested datasets.
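The reason matching matters is that DataHub embeds the platform instance directly in the dataset URN. A minimal sketch (the helper below is illustrative, not the connector's internal builder):

```python
# Sketch: a DataHub dataset URN prefixes the platform instance onto the
# dataset name, so two recipes that disagree on platform_instance emit two
# distinct entities even for the same physical table.

def make_dataset_urn(platform, name, env="PROD", platform_instance=None):
    qualified = f"{platform_instance}.{name}" if platform_instance else name
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{qualified},{env})"

# Fabric lineage edge, resolved via platform_instance_map:
fabric_side = make_dataset_urn("snowflake", "db.schema.orders",
                               platform_instance="prod_warehouse")
# Snowflake recipe with platform_instance: "prod_warehouse":
snowflake_side = make_dataset_urn("snowflake", "db.schema.orders",
                                  platform_instance="prod_warehouse")
assert fabric_side == snowflake_side  # lineage attaches to the same entity
```

Change either side's instance to a different value and the URNs diverge, which is exactly the "separate dataset entities" failure mode described above.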
Execution History
Pipeline and activity runs are extracted as DataProcessInstance entities by default:
source:
type: fabric-data-factory
config:
include_execution_history: true # default
execution_history_days: 7 # 1-90 days
This provides run status, duration, timestamps, invoke type, and activity-level details including error messages and retry attempts.
The Fabric API returns at most 100 recently completed runs per pipeline. Run ingestion more frequently to capture deeper history.
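The interaction between the lookback window and the per-pipeline cap can be sketched as follows. This is a simplified model under stated assumptions — `select_runs` is a hypothetical helper, not connector code, and the cap value comes from the note above:

```python
# Sketch: runs are kept only if they fall inside the lookback window, and
# even then at most ~100 recently completed runs per pipeline are returned
# by the Fabric API, so a long window does not guarantee full coverage.

from datetime import datetime, timedelta, timezone

MAX_RUNS_PER_PIPELINE = 100  # Fabric API cap noted above

def select_runs(runs, execution_history_days=7, now=None):
    """Keep runs inside the window, newest first, capped at the API limit."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=execution_history_days)
    recent = sorted((r for r in runs if r["end_time"] >= cutoff),
                    key=lambda r: r["end_time"], reverse=True)
    return recent[:MAX_RUNS_PER_PIPELINE]
```

For busy pipelines that complete more than 100 runs inside the window, scheduling ingestion more frequently is the only way to capture the full history.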
Advanced: Multi-Tenant Setup
When to Use platform_instance
Use the connector's platform_instance config to distinguish separate Fabric tenants when ingesting from multiple environments:
| Scenario | Risk | Solution |
|---|---|---|
| Single tenant | None | Not needed |
| Multiple tenants | High - name collision risk | Required |
# Multi-tenant example
source:
type: fabric-data-factory
config:
platform_instance: "contoso-tenant" # Prevents URN collisions
Different Fabric tenants could have identically-named workspaces and pipelines. Use platform_instance to prevent entity overwrites.
URN Format
Pipeline URNs follow this format:
urn:li:dataFlow:(fabric-data-factory,{workspace_id}.{pipeline_id},{env})
With platform_instance:
urn:li:dataFlow:(fabric-data-factory,{platform_instance}.{workspace_id}.{pipeline_id},{env})
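A minimal helper matching the two formats above (illustrative only, not the connector's internal URN builder):

```python
# Sketch: with a platform_instance configured, it is prepended to the
# workspace/pipeline key, which is what prevents collisions across tenants.

def make_pipeline_urn(workspace_id, pipeline_id, env="PROD",
                      platform_instance=None):
    parts = [p for p in (platform_instance, workspace_id, pipeline_id) if p]
    return f"urn:li:dataFlow:(fabric-data-factory,{'.'.join(parts)},{env})"
```

Two tenants with identical workspace and pipeline IDs would produce identical URNs without the instance prefix, so the later ingestion run would silently overwrite the earlier one.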
Limitations
- Run history limit: The Fabric API returns at most 100 recently completed runs per pipeline. If `execution_history_days` covers more runs than this limit, only the most recent 100 are returned. Run ingestion more frequently to capture deeper history.
- No Dataflow Gen2 support: Dataflow Gen2 items (standalone workspace-level items with transformation logic) are not extracted.
- No CopyJob support: Standalone CopyJob items at the workspace level are not extracted. Only Copy activities embedded within pipelines produce lineage.
- No trigger/schedule metadata: Pipeline triggers and schedules are not extracted.
- ExecutePipeline not supported: The `ExecutePipeline` activity type is marked as legacy in Fabric and is not supported for cross-pipeline lineage.
Lineage
- Lineage scope: Only Copy and InvokePipeline activities produce dataset or pipeline lineage. Other activity types (Lookup, Wait, ForEach, Script, etc.) are ingested as DataJobs without dataset-level lineage.
- InvokePipeline activity operation types: Only the `InvokeFabricPipeline` operation type is supported for cross-pipeline lineage. Other operation types (`InvokeAdfPipeline`, `InvokeExternalPipeline`) are not resolved and will be skipped.
- Query-based Copy sources: When a Copy activity uses `sqlReaderQuery` or `sqlReaderStoredProcedureName` instead of a direct table reference, lineage is not extracted.
- No column-level lineage: The connector extracts dataset-level lineage only. Column-to-column mappings from Copy activity translator configurations are not extracted.
- No Notebook/SparkJobDefinition lineage: Notebook and SparkJobDefinition activities are ingested as DataJobs but their lineage is not resolved.
- Connection resolution: Unmapped connection types fall back to using the connection type string as the platform name, which may not match your existing DataHub platform names. Use `platform_instance_map` to explicitly map connection names.
Troubleshooting
- 401/403 errors: Ensure the service principal has the correct Fabric API permissions and is added as a workspace member.
- Empty results: Check that `workspace_pattern` and `pipeline_pattern` are not filtering out all items.
- Missing lineage: Verify that `include_lineage: true` is set and that Fabric connections are properly configured for the pipelines. Also review the Lineage limitations section for unsupported activity types and scenarios.
- Stale entities: Enable `stateful_ingestion` to automatically remove entities that no longer exist in Fabric.
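For the "empty results" case, you can check your regexes locally before re-running ingestion. The helper below is a sketch of the allow/deny semantics under stated assumptions (a name passes if some allow regex matches and no deny regex does, with `ignoreCase` defaulting to true, matched from the start of the name):

```python
# Sketch of AllowDenyPattern-style filtering using only the stdlib.

import re

def allowed(name, allow=(".*",), deny=(), ignore_case=True):
    flags = re.IGNORECASE if ignore_case else 0
    # Deny wins: any matching deny pattern excludes the name outright.
    if any(re.match(d, name, flags) for d in deny):
        return False
    return any(re.match(a, name, flags) for a in allow)

# e.g. allow=['prod-.*'], deny=['.*-test'] keeps "prod-sales" but drops
# "prod-sales-test" and "dev-sales".
```

If every workspace name in your tenant returns `False` here, the recipe patterns are the reason for the empty run.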
Code Coordinates
- Class Name: `datahub.ingestion.source.fabric.data_factory.source.FabricDataFactorySource`
- Browse on GitHub
If you've got any questions on configuring ingestion for Fabric Data Factory, feel free to ping us on our Slack.
This page is auto-generated from the underlying source code. To make changes, please edit the relevant source files in the metadata-ingestion directory.
Tip: For quick typo fixes or documentation updates, you can click the ✏️ Edit icon directly in the GitHub UI to open a Pull Request. For larger changes and PR naming conventions, please refer to our Contributing Guide.