Informatica

Overview

Informatica Intelligent Data Management Cloud (IDMC) is a cloud-native data integration and management platform. Learn more in the official Informatica documentation.

The DataHub integration for Informatica models projects and folders as containers, Mapping Tasks as DataFlows (each with a transform DataJob), and Taskflows as DataFlows with a single orchestrate DataJob that chains the steps via inputDatajobs. It resolves table-level lineage across the data estate from each mapping's source/target connections, and also supports ownership extraction and stateful deletion detection.

Concept Mapping

| Source Concept | DataHub Concept | Notes |
|---|---|---|
| "informatica" | Data Platform | |
| Project | Container | SubType "Project" |
| Folder | Container | SubType "Folder" |
| Taskflow | DataFlow + one orchestrate DataJob | SubTypes "Taskflow" / "Taskflow Orchestration"; the orchestrate sits at the end of the chain with inputDatajobs = [last MT] |
| Mapping Task | DataFlow + inner transform DataJob | SubTypes "Mapping Task" / "Task Logic"; MTs chain to each other via inputDatajobs in Taskflow step order |
| Mapping | Not emitted as a standalone entity | Only Mapping Tasks (runnable schedules) are emitted; the Mapping reference is surfaced via customProperties on the Task |
| Mapplet | Not emitted | Internal sub-mappings included in other mappings; skipped |
| Source/Target | Dataset | Upstream/downstream lineage; external dataset URNs receive a minimal Status stub so they resolve in lineage search |

Module informatica

Incubating

Important Capabilities

| Capability | Notes |
|---|---|
| Asset Containers | Projects and folders as containers. |
| Detect Deleted Entities | Via stateful ingestion. |
| Extract Ownership | From IDMC object createdBy/updatedBy. |
| Extract Tags | IDMC object tags emitted as DataHub GlobalTags. |
| Platform Instance | Enabled by default. |
| Table-Level Lineage | Table-level lineage via v3 Export API. |
| Test Connection | Enabled by default. |

Overview

The informatica module ingests metadata from Informatica Cloud (IDMC) into DataHub. It extracts projects, folders, Mapping Tasks, and Taskflows, and resolves table-level lineage from the Mapping each Task references. Standalone Mappings (ones without a Mapping Task) and Mapplets are not emitted.

Quick Start
  1. Create a service account — Use a dedicated IDMC user with minimum permissions (see Required Permissions)
  2. Identify your pod URL — Determine the IDMC regional login URL (US, US2, EMEA, or APAC)
  3. Configure recipe — Use informatica_recipe.yml as a template (a minimal sketch follows these steps)
  4. Run ingestion — Execute datahub ingest -c informatica_recipe.yml
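
A minimal informatica_recipe.yml looks like the sketch below (credentials supplied via environment variables; see the Starter Recipe further down for filtering, lineage, and performance options):

source:
  type: informatica
  config:
    login_url: "https://dm-us.informaticacloud.com"  # your pod's regional URL
    username: "${IDMC_USERNAME}"
    password: "${IDMC_PASSWORD}"

sink:
  type: datahub-rest
  config:
    server: "http://localhost:8080"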

Key Features

  • Projects and folders as Containers
  • Mapping Tasks as DataFlows with a transform DataJob each; Taskflows as DataFlows with one orchestrate DataJob that chains the MTs in step order
  • Table-level lineage (source → mapping → target) resolved via the v3 Export API and connection metadata; Mapping Tasks chain to each other in Taskflow step order and the Taskflow orchestrate DataJob anchors the end of the chain
  • Three-layer filtering: tag-based (recommended for large orgs), project/folder pattern, and mapping/taskflow name pattern
  • Cross-source lineage to datasets ingested by other connectors (Snowflake, Oracle, BigQuery, etc.) via connection type mapping
  • Manual connection type overrides for unusual or custom connectors
  • Stateful ingestion for stale entity removal
  • Ownership extraction from createdBy/updatedBy

Concept Mapping

| IDMC concept | DataHub entity | Subtype |
|---|---|---|
| Project | Container | Project |
| Folder | Container | Folder |
| Taskflow | DataFlow and one orchestrate DataJob | Taskflow / Taskflow Orchestration |
| Mapping Task | DataFlow and one transform DataJob | Mapping Task / Task Logic |
| Mapping | not emitted — see notes | |
| Mapplet | not emitted — see notes | |
| Source/target | Dataset (upstream/downstream lineage) | |

Mapping Tasks are the runnable schedules in IDMC, and that's what we emit as first-class entities. Each MT's inner transform DataJob carries the dataJobInputOutput aspect with the source/target tables resolved from the Mapping it references — so cross-source lineage lands on the thing users actually schedule and operate.

Mappings without a Mapping Task are not emitted (they're not runnable on their own). Mapplets are not emitted either — they're internal sub-mappings included in other mappings. The referenced Mapping's friendly name, v2 id, and v3 GUID are still surfaced as customProperties.mappingName / mappingId / mappingV3Id on every MT so you can cross-reference back to IDMC without leaving DataHub.
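
As an illustration, the cross-reference surfaced on each MT looks roughly like this (the values shown are hypothetical):

customProperties:
  mappingName: "m_orders_to_dwh"                       # hypothetical friendly name
  mappingId: "0001J8"                                  # hypothetical v2 id
  mappingV3Id: "a1b2c3d4-e5f6-7890-abcd-ef1234567890"  # hypothetical v3 GUID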

Taskflow step DAG

The Taskflow step order is resolved from the v3 Export API (.TASKFLOW.xml), parsed from the IDMC taskflowModel <eventContainer> / <service> / <link> graph. All Taskflow GUIDs for a single ingestion run are submitted as one export job for efficiency.

Rather than emitting a separate DataJob per step, the connector collapses step references into the MT they run and chains the MT transform DataJobs directly via dataJobInputOutput.inputDatajobs. A single orchestrate DataJob is emitted per Taskflow and anchored at the end of the chain: inputDatajobs = [last MT], outputDatasets mirrors the last MT's outputs.

The resulting Taskflow lineage reads cleanly end to end:

input_dataset → MT1.transform → MT2.transform → … → MTn.transform → orchestrate → output_dataset

Non-data steps (command / decision / notification / …) don't participate in the chain but are summarized in customProperties.stepSummary on the orchestrate DataJob for auditing.
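
To make the shape concrete, here is a sketch (names and URNs are hypothetical) of how a two-MT Taskflow's chain lands in the dataJobInputOutput aspects of the emitted DataJobs:

mt_stage.transform:
  inputDatasets:  ["urn:li:dataset:(urn:li:dataPlatform:oracle,sales.orders,PROD)"]
mt_publish.transform:
  inputDatajobs:  ["<mt_stage transform DataJob URN>"]
  outputDatasets: ["urn:li:dataset:(urn:li:dataPlatform:snowflake,analytics.public.orders,PROD)"]
nightly_load.orchestrate:
  inputDatajobs:  ["<mt_publish transform DataJob URN>"]
  outputDatasets: ["urn:li:dataset:(urn:li:dataPlatform:snowflake,analytics.public.orders,PROD)"]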

Prerequisites

Required Permissions

| Capability | IDMC privilege | Notes |
|---|---|---|
| Authenticate | Any active IDMC user | Uses the v2 login endpoint |
| List projects, folders, taskflows | Asset - read (or the Observer role) | Needed for all container/flow emission |
| List mappings / mapping tasks | Asset - read | Mapping Tasks are optional and skipped with a warning if 403 |
| Extract table-level lineage | Asset - export | Submits v3 export jobs; skip by setting extract_lineage: false |
| List connections | Connection - read | Needed for lineage to resolve to dataset URNs |

Regional login URLs

Set login_url to your IDMC pod's regional URL (not the API runtime URL — the connector discovers that from the login response):

| Region | login_url |
|---|---|
| US | https://dm-us.informaticacloud.com |
| US2 | https://dm2-us.informaticacloud.com |
| EMEA | https://dm-em.informaticacloud.com |
| APAC | https://dm-ap.informaticacloud.com |

Install the Plugin

pip install 'acryl-datahub[informatica]'

Starter Recipe

Check out the following recipe to get started with ingestion! See below for full configuration options.

For general pointers on writing and running a recipe, see our main recipe guide.

source:
  type: informatica
  config:
    # -------------------------------------------------------------------------
    # Connection
    # -------------------------------------------------------------------------

    # Regional login URL for your IDMC pod. The connector discovers the runtime
    # API URL from the login response. Common values:
    #   US    https://dm-us.informaticacloud.com
    #   US2   https://dm2-us.informaticacloud.com
    #   EMEA  https://dm-em.informaticacloud.com
    #   APAC  https://dm-ap.informaticacloud.com
    login_url: "https://dm-us.informaticacloud.com"

    # IDMC service account. Prefer a dedicated user with the Observer role
    # plus "Asset - export" (required for lineage, see README).
    username: "${IDMC_USERNAME}"
    password: "${IDMC_PASSWORD}"

    # Optional: group entities into a platform instance if you ingest more than
    # one IDMC org/pod into the same DataHub instance.
    # platform_instance: "idmc_prod"
    # env: "PROD"

    # -------------------------------------------------------------------------
    # Filtering — combine any or all three layers
    # -------------------------------------------------------------------------

    # Layer 1 (recommended for large orgs): only ingest objects tagged in IDMC
    # with at least one of these names. Tags are matched exactly.
    # Applies to Projects, Folders, Taskflows, and Mapping Tasks only —
    # Mappings and Connections are always fetched in full regardless of this filter.
    # tag_filter_names: ["datahub", "critical"]

    # Layer 2: filter by project/folder name (regex).
    # project_pattern:
    #   allow:
    #     - "^Production_.*"
    #   deny:
    #     - ".*_sandbox$"
    # folder_pattern:
    #   allow:
    #     - ".*"

    # Layer 3: filter by mapping/taskflow name (regex, applied across all matches).
    # mapping_pattern:
    #   allow:
    #     - ".*"
    # taskflow_pattern:
    #   allow:
    #     - ".*"

    # -------------------------------------------------------------------------
    # Features
    # -------------------------------------------------------------------------

    # Requires the "Asset - export" privilege on the service account.
    extract_lineage: true

    # Derives owners from IDMC createdBy/updatedBy fields.
    extract_ownership: true

    # Emits IDMC object tags as DataHub GlobalTags on Projects, Folders,
    # Taskflows, and Mapping Tasks. Defaults to true — tags will be ingested
    # even if this field is not specified.
    extract_tags: true

    # -------------------------------------------------------------------------
    # Connection → platform overrides
    # -------------------------------------------------------------------------

    # Use when IDMC reports a connection type the connector doesn't know about.
    # Keys are IDMC connection IDs; values are DataHub platform names.
    # connection_type_overrides:
    #   "01DM180B000000000008": "snowflake"

    # -------------------------------------------------------------------------
    # Performance (tune for large orgs)
    # -------------------------------------------------------------------------

    # page_size: 200                 # v3 objects per page (max 200)
    # export_batch_size: 1000        # mappings per export job (max 1000)
    # export_poll_timeout_secs: 300  # seconds to wait for an export job
    # export_poll_interval_secs: 5   # seconds between export polls

    # -------------------------------------------------------------------------
    # Stateful ingestion — recommended for automatic stale-entity removal
    # -------------------------------------------------------------------------

    stateful_ingestion:
      enabled: true

sink:
  type: datahub-rest
  config:
    server: "http://localhost:8080"

Config Details

Note that a . is used to denote nested fields in the YAML recipe.

password 
string(password)
Informatica Cloud password.
username 
string
Informatica Cloud username (email or service account name).
connection_to_platform_instance
map(str,string)
connection_type_overrides
map(str,string)
connection_type_platform_map
map(str,string)
convert_urns_to_lowercase
boolean
Lowercase the dataset qualifier in emitted upstream URNs to match the default behavior of the Snowflake, Postgres, and BigQuery sources (which lowercase by default). Set to False only if you've disabled lowercasing on every source this connector produces lineage to.
Default: True
export_batch_size
integer
Number of mappings per v3 export batch job (max 1000).
Default: 1000
export_poll_interval_secs
integer
Interval in seconds between export job status polls.
Default: 5
export_poll_timeout_secs
integer
Timeout in seconds for polling export job completion.
Default: 300
extract_lineage
boolean
Whether to extract table-level lineage from mapping definitions. Requires the 'Asset - export' privilege on the service account. When enabled, uses the v3 Export API to fetch full mapping definitions.
Default: True
extract_ownership
boolean
Whether to extract ownership from IDMC object createdBy/updatedBy fields.
Default: True
extract_tags
boolean
Emit IDMC object tags as DataHub GlobalTags on Projects, Folders, Taskflows, and Mapping Tasks. Set to False to skip tag extraction.
Default: True
login_url
string
Informatica Cloud login URL. This is the regional pod URL, not the runtime serverUrl. After login, the connector discovers the actual API base URL from the login response. Common values: https://dm-us.informaticacloud.com (US), https://dm2-us.informaticacloud.com (US2), https://dm-em.informaticacloud.com (EMEA), https://dm-ap.informaticacloud.com (APAC).
max_concurrent_export_jobs
integer
Maximum number of v3 export jobs to run concurrently. Each job covers one batch of mappings. Increase to reduce lineage wall-clock time on large orgs; decrease if hitting IDMC rate limits.
Default: 4
page_size
integer
Number of objects to fetch per API page (max 200 for v3 objects).
Default: 200
platform_instance
One of string, null
The instance of the platform that all assets produced by this recipe belong to. This should be unique within the platform. See https://docs.datahub.com/docs/platform-instances/ for more details.
Default: None
request_timeout_secs
integer
HTTP timeout in seconds for IDMC API requests. Raise this for large deployments where /api/v2/mapping or /api/v2/connection returns many records and the default 60s is insufficient.
Default: 60
strip_user_email_domain
boolean
Strip the domain from IDMC user identifiers before forming the CorpUser URN (e.g. alice@acme.com → urn:li:corpuser:alice). Enable when your Okta/AzureAD source ingests users without the email domain so ownership edges align with existing CorpUser URNs.
Default: False
env
string
The environment that all assets produced by this connector belong to
Default: PROD
folder_pattern
AllowDenyPattern
A class to store allow deny regexes
folder_pattern.ignoreCase
One of boolean, null
Whether to ignore case sensitivity during pattern matching.
Default: True
mapping_task_pattern
AllowDenyPattern
A class to store allow deny regexes
mapping_task_pattern.ignoreCase
One of boolean, null
Whether to ignore case sensitivity during pattern matching.
Default: True
project_pattern
AllowDenyPattern
A class to store allow deny regexes
project_pattern.ignoreCase
One of boolean, null
Whether to ignore case sensitivity during pattern matching.
Default: True
tag_filter_names
array
List of literal IDMC tag names. When set, only objects tagged with at least one of these tags will be ingested. Tags are matched exactly (not regex). This is the recommended filtering approach for large orgs — IDMC admins tag objects in the UI and the connector picks them up.
Default: []
tag_filter_names.string
string
taskflow_pattern
AllowDenyPattern
A class to store allow deny regexes
taskflow_pattern.ignoreCase
One of boolean, null
Whether to ignore case sensitivity during pattern matching.
Default: True
stateful_ingestion
One of StatefulStaleMetadataRemovalConfig, null
Configuration for stateful ingestion and stale entity removal.
Default: None
stateful_ingestion.enabled
boolean
Whether or not to enable stateful ingest. Default: True if a pipeline_name is set and either a datahub-rest sink or datahub_api is specified, otherwise False
Default: False
stateful_ingestion.fail_safe_threshold
number
Prevents large amount of soft deletes & the state from committing from accidental changes to the source configuration if the relative change percent in entities compared to the previous state is above the 'fail_safe_threshold'.
Default: 75.0
stateful_ingestion.remove_stale_metadata
boolean
Soft-deletes the entities present in the last successful run but missing in the current run with stateful_ingestion enabled.
Default: True

Capabilities

Use the Important Capabilities table above as the source of truth for supported features and whether additional configuration is required.

Filtering

Three filter layers can be combined and are applied in order (a combined example follows the list):

  1. Tag-based (tag_filter_names, recommended for large orgs) — an allowlist of IDMC tags; only tagged objects are ingested.
  2. Path-based (project_pattern, folder_pattern) — regex allow/deny on project and folder names.
  3. Name-based (mapping_pattern, taskflow_pattern) — regex allow/deny on mapping and taskflow names.
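
For example, a recipe combining all three layers might look like this (tag names and patterns are illustrative):

source:
  type: informatica
  config:
    # ... connection settings ...
    tag_filter_names: ["datahub"]    # Layer 1: only IDMC objects tagged "datahub"
    project_pattern:
      allow: ["^Production_.*"]      # Layer 2: only Production_* projects
      deny: [".*_sandbox$"]
    taskflow_pattern:
      allow: ["^tf_daily_.*"]        # Layer 3: only taskflows named tf_daily_*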

Connection Type Mapping

When emitting lineage, each IDMC connection is mapped to a DataHub platform (e.g. Snowflake_Cloud_Data_Warehouse → snowflake). The mapping is driven by connParams["Connection Type"]. If IDMC returns an unknown type (or a customer-specific connector), set connection_type_overrides to map that connection ID to a DataHub platform name. The connector will warn about unknown platforms at config-parse time.
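
For example (the connection ID shown is illustrative):

connection_type_overrides:
  "01DM180B000000000008": "snowflake"   # IDMC connection ID -> DataHub platform name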

External Dataset Stubs

Every input/output dataset URN referenced by mapping lineage receives a minimal Status aspect when it is first seen. Without this stub, DataHub treats URNs that no other connector has ingested as non-existent and searchAcrossLineage filters them out of results — which would leave the left-side chevron on a Mapping Task's transform DataJob unable to expand upstream datasets. The stub is idempotent and does not override Schema, Ownership, or other metadata written by the source platform's own connector when it runs.
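
The stub is nothing more than a status aspect with removed: false on the external dataset URN; its shape is roughly (URN illustrative):

entityUrn: "urn:li:dataset:(urn:li:dataPlatform:oracle,sales.orders,PROD)"
aspectName: "status"
aspect:
  removed: false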

Limitations

  • No column-level lineage — the v3 export gives us transformation-level source/target tables but not column mappings.
  • No execution history — the connector does not ingest Activity Monitor runs as DataProcessInstances.
  • Taskflow step DAG requires Asset - export — Taskflow step ordering lives in a taskflowModel XML document fetched via the v3 Export API. Ingestion will silently no-op the step chain for Taskflows the user can't export (the Taskflow itself is still emitted as a DataFlow with its orchestrate DataJob, but that orchestrate won't have an inputDatajobs chain). The report includes taskflows_with_steps so you can confirm coverage.
  • Single-user auth only — service-principal / federated SSO login is not supported; use a native IDMC user.
  • v2 API endpoints are not paginated — /api/v2/mapping and /api/v2/connection return all records in a single response; the IDMC v2 API does not honour limit, skip, or maxRecordsCount parameters (verified against a live instance). For orgs with very large numbers of mappings (>10k) or connections (>1k), the single call may exceed request_timeout_secs or produce a very large response. Mitigations: raise request_timeout_secs, or use tag_filter_names to scope ingestion to a tagged subset of objects.

Troubleshooting

IDMC login failed at startup

The connector raises this when the v2 login endpoint returns non-200 or a body without icSessionId/serverUrl. Common causes:

  • Wrong login_url for your pod (see the region table in the Prerequisites section).
  • Service account locked out, MFA-protected, or password-expired. Use a dedicated IDMC user without interactive MFA.
  • Firewall blocking egress to *.informaticacloud.com.

The raised error includes the HTTP status, a truncated response body, and the login_url used.

connections_unresolved entries in the report

The connector resolves lineage dataset URNs by matching the mapping's connectionId (e.g. saas:@fed-xyz) against the IDMC connection catalog. If a connection cannot be mapped to a DataHub platform, the lineage edge is dropped and the connection is recorded in connections_unresolved. Two typical causes:

  1. The connection uses a type not in the built-in CONNECTION_TYPE_MAP (e.g. a custom connector). Add it to connection_type_overrides with the connection ID → DataHub platform.
  2. The Connection - read privilege is missing from the service account, so list_connections fetches an empty or partial catalog.

Failed to fetch mapping tasks warning

Mapping Tasks live at /api/v2/mttask, which is often restricted to specific roles. The connector treats this as a warning (not a failure) because mapping and lineage ingestion can still complete without it. Grant Asset - read on mapping tasks if you need them.

Export job timed out

The v3 Export API is asynchronous; for very large orgs, the default export_poll_timeout_secs: 300 may be too short. Try:

  • Reduce export_batch_size (default 1000) — smaller batches finish faster individually.
  • Raise export_poll_timeout_secs (max 3600).
  • Use tag_filter_names to scope the export to tagged mappings only.

The connector emits a report warning titled "IDMC export job timed out" for each timed-out batch and records it under export_jobs_failed.
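
A tuned configuration for a large org might start from values like these (illustrative):

export_batch_size: 250          # smaller batches finish faster individually
export_poll_timeout_secs: 900   # allow slower export jobs to finish (max 3600)
tag_filter_names: ["datahub"]   # scope the export to tagged mappings only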

Add-On Bundles showing up

IDMC ships several marketplace bundles (e.g. Cloud Data Integration templates). The connector filters these out automatically by checking path.startswith("Add-On Bundles/") or updated_by == "bundle-license-notifier". If you see bundle mappings leaking through, open an issue with the offending object's path.

Code Coordinates

  • Class Name: datahub.ingestion.source.informatica.source.InformaticaSource
  • Browse on GitHub

Questions?

If you've got any questions on configuring ingestion for Informatica, feel free to ping us on our Slack.

💡 Contributing to this documentation

This page is auto-generated from the underlying source code. To make changes, please edit the relevant source files in the metadata-ingestion directory.

Tip: For quick typo fixes or documentation updates, you can click the ✏️ Edit icon directly in the GitHub UI to open a Pull Request. For larger changes and PR naming conventions, please refer to our Contributing Guide.