Airbyte
Overview
Airbyte is an open-source data integration platform that syncs data from sources to destinations through configurable connections. It supports hundreds of pre-built connectors and lets you build custom ones.
This integration extracts metadata from Airbyte to give DataHub visibility into your data pipelines — including connections, sources, destinations, streams, and job execution history. It captures lineage between source and destination datasets at both the table and column level.
Concept Mapping
The table below shows how Airbyte concepts map to DataHub entities:
| Source Concept | DataHub Concept | Notes |
|---|---|---|
| Workspace | DataFlow | Top-level container for Airbyte resources |
| Connection | DataFlow | Represents an Airbyte connection between source and destination |
| Source | Dataset | Source datasets are mapped to DataHub datasets |
| Destination | Dataset | Destination datasets are mapped to DataHub datasets |
| Stream | DataJob | Each stream is represented as a DataJob within the Connection DataFlow |
| Connection Job | DataProcessInstance | Execution information for a connection run |
| Source Schema | SchemaMetadata | Schema information from source datasets |
| Column Mapping | FineGrainedLineage | Column-level lineage between source and destination |
Module airbyte
Important Capabilities
| Capability | Status | Notes |
|---|---|---|
| Column-level Lineage | ✅ | Enabled by default. |
| Detect Deleted Entities | ✅ | Enabled by default when stateful ingestion is turned on. |
| Extract Tags | ✅ | Requires recipe configuration. |
| Platform Instance | ✅ | Enabled by default. |
| Table-Level Lineage | ✅ | Enabled by default. |
Overview
This integration extracts metadata from Airbyte's API to capture information about your connections, sources, destinations, and the lineage between them.
Prerequisites
You'll need to have an Airbyte instance running with configured sources and destinations, and access to the Airbyte API.
Steps to Get the Required Information
Determine Your Deployment Type:
- Open Source (OSS): If you're running a self-hosted Airbyte instance
- Cloud: If you're using Airbyte Cloud
Authentication Credentials:
For Open Source (OSS):
- The URL of your Airbyte instance (host and port)
- OAuth2 client credentials (Airbyte 1.0+), obtained in one of two ways:
  - UI: Navigate to User > User settings > Applications, create an application, and copy its credentials
  - CLI: Run `abctl local credentials` (abctl v0.11.0+)
- Username and password if basic authentication is enabled
- API token if available
For Airbyte Cloud:
- OAuth2 client ID and client secret (required)
- OAuth2 refresh token (optional; omit it to use the `client_credentials` grant, or provide it to use the `refresh_token` grant)
- Your Airbyte Cloud workspace ID
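The choice between the two grants can be sketched as follows. This is a minimal illustration, not the connector's implementation: the function name `build_token_request` is hypothetical, and the field names are the standard OAuth2 parameter names, which are assumed here rather than confirmed against the Airbyte token endpoint.

```python
from typing import Dict, Optional


def build_token_request(
    client_id: str,
    client_secret: str,
    refresh_token: Optional[str] = None,
) -> Dict[str, str]:
    """Return OAuth2 token-request fields for the grant implied by the inputs.

    Hypothetical helper: field names follow standard OAuth2 naming and are
    an assumption, not taken from the Airbyte API reference.
    """
    payload = {"client_id": client_id, "client_secret": client_secret}
    if refresh_token:
        # A refresh token was supplied, so use the refresh_token grant.
        payload["grant_type"] = "refresh_token"
        payload["refresh_token"] = refresh_token
    else:
        # No refresh token, so fall back to the client_credentials grant.
        payload["grant_type"] = "client_credentials"
    return payload


print(build_token_request("my-client-id", "my-client-secret")["grant_type"])
# client_credentials
```

Leaving `oauth2_refresh_token` unset in the recipe corresponds to the second branch.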
API Access:
- For OSS users, ensure the API is accessible at the `/api/public/v1` path prefix
- Verify connectivity by testing the health endpoint: http://localhost:8000/api/public/v1/health
- Ensure there is network connectivity between your DataHub instance and the Airbyte API
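A connectivity check along these lines can be scripted with the Python standard library. The helper names `health_url` and `is_reachable` are hypothetical, and the sketch assumes the health endpoint answers an unauthenticated GET with HTTP 200, which may not hold for every deployment.

```python
import urllib.error
import urllib.request

API_PREFIX = "/api/public/v1"


def health_url(host_port: str) -> str:
    """Join the instance URL and the public API prefix into the health endpoint."""
    return host_port.rstrip("/") + API_PREFIX + "/health"


def is_reachable(url: str, timeout: float = 5.0) -> bool:
    """Return True if a GET to the URL answers with HTTP 200 (assumed success code)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, or a non-2xx response.
        return False


print(health_url("http://localhost:8000"))
# http://localhost:8000/api/public/v1/health
```

Calling `is_reachable(health_url("http://localhost:8000"))` from the machine that runs ingestion also tests the network path between DataHub and Airbyte.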
Permissions:
- The authentication credentials should have permissions to:
  - Read workspace information
  - List and read sources, destinations, and connections
  - Access connection schemas and sync catalogs
  - View job execution history (if extracting job statuses)
Install the Plugin
pip install 'acryl-datahub[airbyte]'
Starter Recipe
Check out the following recipe to get started with ingestion! See below for full configuration options.
For general pointers on writing and running a recipe, see our main recipe guide.
source:
  type: airbyte
  config:
    # Deployment type - required
    deployment_type: oss # Options: "oss" (self-hosted) or "cloud" (Airbyte Cloud)
    # Connection details for OSS deployment
    host_port: http://localhost:8000 # Airbyte API endpoint URL
    # Authentication for OSS deployment
    username: your_username # Username for basic auth
    password: your_password # Password for basic auth
    # api_key: your_api_key # Alternative: API token if available
    # Authentication for Cloud deployment - uncomment if using Airbyte Cloud
    #deployment_type: cloud
    #oauth2_client_id: your_client_id # OAuth2 client ID for Airbyte Cloud
    #oauth2_client_secret: your_client_secret # OAuth2 client secret
    #oauth2_refresh_token: your_refresh_token # OAuth2 refresh token
    #cloud_workspace_id: your_workspace_id # Airbyte Cloud workspace ID
    # SSL configuration
    verify_ssl: false # Whether to verify SSL certificates
    #ssl_ca_cert: /path/to/cert.pem # Path to CA certificate file (optional)
    # Data extraction options
    extract_column_level_lineage: true # Extract column-level lineage information
    include_statuses: true # Include connection job statuses
    job_statuses_limit: 100 # Max number of job statuses to retrieve
    # Lineage emission mode
    incremental_lineage: true # Emit lineage as patch (incremental) rather than full replacement
    # Set to false to re-state all lineage on each run
    # Optional: Extract tags
    extract_tags: false # Extract tags from Airbyte metadata
    # Filtering options - uncomment to use
    #workspace_pattern:
    #  allow:
    #    - ".*" # Pattern to filter workspaces
    #connection_pattern:
    #  allow:
    #    - ".*" # Pattern to filter connections
    #source_pattern:
    #  allow:
    #    - ".*MySQL.*" # Pattern to filter sources
    #destination_pattern:
    #  allow:
    #    - ".*Postgres.*" # Pattern to filter destinations
    # Platform instance configuration
    platform_instance: airbyte-instance # Custom platform instance name
    # Performance settings
    request_timeout: 30 # Timeout for API requests in seconds
    max_retries: 3 # Max retries for failed requests
    retry_backoff_factor: 0.5 # Backoff factor for retries
    page_size: 20 # Items per page in API requests
sink:
  type: datahub-rest
  config:
    server: http://localhost:8080
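The filtering options in the recipe take lists of allow and deny regexes. The sketch below is a simplified model of how such a filter could be evaluated. It assumes that deny patterns take precedence and that patterns are matched from the start of the name, and it is not the connector's code. The function name `is_allowed` is hypothetical.

```python
import re
from typing import List, Optional


def is_allowed(
    name: str,
    allow: Optional[List[str]] = None,
    deny: Optional[List[str]] = None,
    ignore_case: bool = True,
) -> bool:
    """Deny patterns win; otherwise the name must match at least one allow pattern."""
    allow = allow if allow is not None else [".*"]  # default: allow everything
    deny = deny or []                               # default: deny nothing
    flags = re.IGNORECASE if ignore_case else 0
    if any(re.match(pattern, name, flags) for pattern in deny):
        return False
    return any(re.match(pattern, name, flags) for pattern in allow)


print(is_allowed("Prod MySQL replica", allow=[".*MySQL.*"]))  # True
print(is_allowed("prod mysql replica", allow=[".*MySQL.*"]))  # True (case ignored)
print(is_allowed("Prod Postgres", allow=[".*MySQL.*"]))       # False
```

With the defaults (allow `.*`, empty deny list), every name passes, which matches the default pattern values in the configuration schema.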
Config Details
Note that a . is used to denote nested fields in the YAML recipe.
| Field | Type | Description |
|---|---|---|
| api_key | One of string(password), null | API key or Personal Access Token for authentication (OSS deployment). Default: None |
| cloud_api_url | string | Base URL for Airbyte Cloud API (defaults to production URL). Default: https://api.airbyte.com/v1 |
| cloud_oauth_token_url | string | OAuth token URL for Airbyte Cloud (defaults to production URL). Default: https://auth.airbyte.com/oauth/token |
| cloud_workspace_id | One of string, null | Workspace ID for Airbyte Cloud (required for cloud deployment). Default: None |
| deployment_type | Enum | Type of Airbyte deployment. One of: "oss", "cloud". Default: oss |
| extra_headers | One of map(str,string), null | Additional HTTP headers to send with each request. Default: None |
| extract_column_level_lineage | boolean | Extract column-level lineage. Default: True |
| extract_tags | boolean | Extract tags from Airbyte metadata. Default: False |
| host_port | One of string, null | Airbyte API host and port (e.g., http://localhost:8000); required for OSS deployment. Default: None |
| include_statuses | boolean | Whether to ingest run statuses. Default: True |
| incremental_lineage | boolean | When enabled, emits lineage as incremental to existing lineage already in DataHub. When disabled, re-states lineage on each run. Default: False |
| job_status_end_date | One of string, null | End date for job status retrieval (format: yyyy-mm-ddTHH:MM:SSZ). If unset, the current time is used. Default: None |
| job_status_start_date | One of string, null | Start date for job status retrieval (format: yyyy-mm-ddTHH:MM:SSZ). If unset, 7 days ago is used. Default: None |
| job_statuses_limit | integer | Maximum number of job statuses to retrieve per connection. Default: 100 |
| max_retries | integer | Maximum number of retries for failed API requests. Default: 3 |
| oauth2_client_id | One of string, null | OAuth2 client ID for OSS (Airbyte 1.0+) and Cloud deployments. Default: None |
| oauth2_client_secret | One of string(password), null | OAuth2 client secret for OSS (Airbyte 1.0+) and Cloud deployments. Default: None |
| oauth2_refresh_token | One of string(password), null | OAuth2 refresh token (Cloud only). If provided, uses refresh_token grant; otherwise uses client_credentials. Default: None |
| page_size | integer | Number of items to fetch per page in API requests. Default: 20 |
| password | One of string(password), null | Password for basic authentication (OSS deployment). Default: None |
| platform_instance | One of string, null | The instance of the platform that all assets produced by this recipe belong to. This should be unique within the platform. See https://docs.datahub.com/docs/platform-instances/ for more details. Default: None |
| request_timeout | integer | Timeout for API requests in seconds. Default: 30 |
| retry_backoff_factor | number | Backoff factor for retries (wait time is {factor} * (2 ^ retry_number)). Default: 0.5 |
| source_type_mapping | map(str,string) | Mapping from Airbyte sourceType/destinationType to DataHub platform names. Use this to normalize Airbyte's source types to DataHub platform names. Example: {'PostgreSQL': 'postgres', 'MySQL': 'mysql'}. If not specified, the sourceType/destinationType from Airbyte is sanitized and used directly. |
| ssl_ca_cert | One of string, null | Path to CA certificate file (.pem) for SSL verification. Default: None |
| username | One of string, null | Username for basic authentication (OSS deployment). Default: None |
| verify_ssl | boolean | Whether to verify SSL certificates. Default: True |
| env | string | The environment that all assets produced by this connector belong to. Default: PROD |
| connection_pattern | AllowDenyPattern | Regex patterns to filter connections. |
| connection_pattern.ignoreCase | One of boolean, null | Whether to ignore case sensitivity during pattern matching. Default: True |
| destination_pattern | AllowDenyPattern | Regex patterns to filter destinations. |
| destination_pattern.ignoreCase | One of boolean, null | Whether to ignore case sensitivity during pattern matching. Default: True |
| destinations_to_platform_instance | map(str,PlatformDetail) | A mapping from Airbyte destination ID to its platform/instance/env/database details. Use this to override platform details for specific destinations. |
| destinations_to_platform_instance.key.platform | One of string, null | Override the platform type detection (e.g., 'postgres', 'mysql'). Default: None |
| destinations_to_platform_instance.key.convert_urns_to_lowercase | boolean | Whether to convert dataset urns to lowercase. Recommended for case-insensitive platforms to ensure lineage compatibility. Note: For Snowflake destinations, this also lowercases column names in lineage to match DataHub's native Snowflake connector behavior. For other platforms (MSSQL, Postgres, BigQuery, etc.), only dataset names are lowercased, not column names. Default: True |
| destinations_to_platform_instance.key.include_schema_in_urn | One of boolean, null | Include schema in the dataset URN when database is present. If None (default), automatically detects 2-tier vs 3-tier platforms by checking if schema equals database. Set to True to force 3-tier (database.schema.table), or False to force 2-tier (database.table). Default: None |
| destinations_to_platform_instance.key.platform_instance | One of string, null | The instance of the platform that all assets belong to. Default: None |
| destinations_to_platform_instance.key.env | One of string, null | Environment to use for dataset URNs (e.g., PROD, DEV, STAGING). Default: None |
| source_pattern | AllowDenyPattern | Regex patterns to filter sources. |
| source_pattern.ignoreCase | One of boolean, null | Whether to ignore case sensitivity during pattern matching. Default: True |
| sources_to_platform_instance | map(str,PlatformDetail) | A mapping from Airbyte source ID to its platform/instance/env/database details. Use this to override platform details for specific sources. |
| sources_to_platform_instance.key.platform | One of string, null | Override the platform type detection (e.g., 'postgres', 'mysql'). Default: None |
| sources_to_platform_instance.key.convert_urns_to_lowercase | boolean | Whether to convert dataset urns to lowercase. Recommended for case-insensitive platforms to ensure lineage compatibility. Note: For Snowflake destinations, this also lowercases column names in lineage to match DataHub's native Snowflake connector behavior. For other platforms (MSSQL, Postgres, BigQuery, etc.), only dataset names are lowercased, not column names. Default: True |
| sources_to_platform_instance.key.include_schema_in_urn | One of boolean, null | Include schema in the dataset URN when database is present. If None (default), automatically detects 2-tier vs 3-tier platforms by checking if schema equals database. Set to True to force 3-tier (database.schema.table), or False to force 2-tier (database.table). Default: None |
| sources_to_platform_instance.key.platform_instance | One of string, null | The instance of the platform that all assets belong to. Default: None |
| sources_to_platform_instance.key.env | One of string, null | Environment to use for dataset URNs (e.g., PROD, DEV, STAGING). Default: None |
| workspace_pattern | AllowDenyPattern | Regex patterns to filter workspaces. |
| workspace_pattern.ignoreCase | One of boolean, null | Whether to ignore case sensitivity during pattern matching. Default: True |
| stateful_ingestion | One of StatefulStaleMetadataRemovalConfig, null | Stateful ingestion settings with stale metadata removal. Default: None |
| stateful_ingestion.enabled | boolean | Whether or not to enable stateful ingest. Default: True if a pipeline_name is set and either a datahub-rest sink or datahub_api is specified, otherwise False. Default: False |
| stateful_ingestion.fail_safe_threshold | number | Prevents large amount of soft deletes & the state from committing from accidental changes to the source configuration if the relative change percent in entities compared to the previous state is above the 'fail_safe_threshold'. Default: 75.0 |
| stateful_ingestion.remove_stale_metadata | boolean | Soft-deletes the entities present in the last successful run but missing in the current run with stateful_ingestion enabled. Default: True |
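The retry wait time documented for `retry_backoff_factor` can be worked through numerically. The sketch below reads `^` as exponentiation and assumes `retry_number` starts at 0; the source does not state the starting index, so the first delay may differ in practice. The function name `backoff_delays` is hypothetical.

```python
from typing import List


def backoff_delays(factor: float = 0.5, max_retries: int = 3) -> List[float]:
    """Wait time before each retry: factor * 2 ** retry_number.

    Assumption: retry_number counts from 0. Note that Python spells
    exponentiation as **, since ^ is bitwise XOR.
    """
    return [factor * (2 ** retry_number) for retry_number in range(max_retries)]


print(backoff_delays())  # [0.5, 1.0, 2.0]
```

Under these assumptions, the defaults (`retry_backoff_factor: 0.5`, `max_retries: 3`) add at most 3.5 seconds of waiting per failed request, on top of up to `request_timeout` seconds per attempt.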
The JSONSchema for this configuration is inlined below.
{
"$defs": {
"AirbyteDeploymentType": {
"enum": [
"oss",
"cloud"
],
"title": "AirbyteDeploymentType",
"type": "string"
},
"AllowDenyPattern": {
"additionalProperties": false,
"description": "A class to store allow deny regexes",
"properties": {
"allow": {
"default": [
".*"
],
"description": "List of regex patterns to include in ingestion",
"items": {
"type": "string"
},
"title": "Allow",
"type": "array"
},
"deny": {
"default": [],
"description": "List of regex patterns to exclude from ingestion.",
"items": {
"type": "string"
},
"title": "Deny",
"type": "array"
},
"ignoreCase": {
"anyOf": [
{
"type": "boolean"
},
{
"type": "null"
}
],
"default": true,
"description": "Whether to ignore case sensitivity during pattern matching.",
"title": "Ignorecase"
}
},
"title": "AllowDenyPattern",
"type": "object"
},
"PlatformDetail": {
"additionalProperties": false,
"description": "Configuration for mapping a specific Airbyte source/destination to DataHub URNs.",
"properties": {
"platform": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Override the platform type detection (e.g., 'postgres', 'mysql')",
"title": "Platform"
},
"platform_instance": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "The instance of the platform that all assets belong to",
"title": "Platform Instance"
},
"env": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Environment to use for dataset URNs (e.g., PROD, DEV, STAGING)",
"title": "Env"
},
"include_schema_in_urn": {
"anyOf": [
{
"type": "boolean"
},
{
"type": "null"
}
],
"default": null,
"description": "Include schema in the dataset URN when database is present. If None (default), automatically detects 2-tier vs 3-tier platforms by checking if schema equals database. Set to True to force 3-tier (database.schema.table), or False to force 2-tier (database.table).",
"title": "Include Schema In Urn"
},
"convert_urns_to_lowercase": {
"default": true,
"description": "Whether to convert dataset urns to lowercase. Recommended for case-insensitive platforms to ensure lineage compatibility. Note: For Snowflake destinations, this also lowercases column names in lineage to match DataHub's native Snowflake connector behavior. For other platforms (MSSQL, Postgres, BigQuery, etc.), only dataset names are lowercased, not column names.",
"title": "Convert Urns To Lowercase",
"type": "boolean"
}
},
"title": "PlatformDetail",
"type": "object"
},
"StatefulStaleMetadataRemovalConfig": {
"additionalProperties": false,
"description": "Base specialized config for Stateful Ingestion with stale metadata removal capability.",
"properties": {
"enabled": {
"default": false,
"description": "Whether or not to enable stateful ingest. Default: True if a pipeline_name is set and either a datahub-rest sink or `datahub_api` is specified, otherwise False",
"title": "Enabled",
"type": "boolean"
},
"remove_stale_metadata": {
"default": true,
"description": "Soft-deletes the entities present in the last successful run but missing in the current run with stateful_ingestion enabled.",
"title": "Remove Stale Metadata",
"type": "boolean"
},
"fail_safe_threshold": {
"default": 75.0,
"description": "Prevents large amount of soft deletes & the state from committing from accidental changes to the source configuration if the relative change percent in entities compared to the previous state is above the 'fail_safe_threshold'.",
"maximum": 100.0,
"minimum": 0.0,
"title": "Fail Safe Threshold",
"type": "number"
}
},
"title": "StatefulStaleMetadataRemovalConfig",
"type": "object"
}
},
"additionalProperties": false,
"description": "Airbyte source configuration for metadata ingestion",
"properties": {
"incremental_lineage": {
"default": false,
"description": "When enabled, emits lineage as incremental to existing lineage already in DataHub. When disabled, re-states lineage on each run.",
"title": "Incremental Lineage",
"type": "boolean"
},
"env": {
"default": "PROD",
"description": "The environment that all assets produced by this connector belong to",
"title": "Env",
"type": "string"
},
"platform_instance": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "The instance of the platform that all assets produced by this recipe belong to. This should be unique within the platform. See https://docs.datahub.com/docs/platform-instances/ for more details.",
"title": "Platform Instance"
},
"stateful_ingestion": {
"anyOf": [
{
"$ref": "#/$defs/StatefulStaleMetadataRemovalConfig"
},
{
"type": "null"
}
],
"default": null
},
"deployment_type": {
"$ref": "#/$defs/AirbyteDeploymentType",
"default": "oss",
"description": "Type of Airbyte deployment ('oss' or 'cloud')"
},
"host_port": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Airbyte API host and port (e.g., http://localhost:8000) - required for OSS deployment",
"title": "Host Port"
},
"username": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Username for basic authentication (OSS deployment)",
"title": "Username"
},
"password": {
"anyOf": [
{
"format": "password",
"type": "string",
"writeOnly": true
},
{
"type": "null"
}
],
"default": null,
"description": "Password for basic authentication (OSS deployment)",
"title": "Password"
},
"api_key": {
"anyOf": [
{
"format": "password",
"type": "string",
"writeOnly": true
},
{
"type": "null"
}
],
"default": null,
"description": "API key or Personal Access Token for authentication (OSS deployment)",
"title": "Api Key"
},
"oauth2_client_id": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "OAuth2 client ID for OSS (Airbyte 1.0+) and Cloud deployments",
"title": "Oauth2 Client Id"
},
"oauth2_client_secret": {
"anyOf": [
{
"format": "password",
"type": "string",
"writeOnly": true
},
{
"type": "null"
}
],
"default": null,
"description": "OAuth2 client secret for OSS (Airbyte 1.0+) and Cloud deployments",
"title": "Oauth2 Client Secret"
},
"oauth2_refresh_token": {
"anyOf": [
{
"format": "password",
"type": "string",
"writeOnly": true
},
{
"type": "null"
}
],
"default": null,
"description": "OAuth2 refresh token (Cloud only). If provided, uses refresh_token grant; otherwise uses client_credentials",
"title": "Oauth2 Refresh Token"
},
"verify_ssl": {
"default": true,
"description": "Whether to verify SSL certificates",
"title": "Verify Ssl",
"type": "boolean"
},
"ssl_ca_cert": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Path to CA certificate file (.pem) for SSL verification",
"title": "Ssl Ca Cert"
},
"extra_headers": {
"anyOf": [
{
"additionalProperties": {
"type": "string"
},
"type": "object"
},
{
"type": "null"
}
],
"default": null,
"description": "Additional HTTP headers to send with each request",
"title": "Extra Headers"
},
"request_timeout": {
"default": 30,
"description": "Timeout for API requests in seconds",
"title": "Request Timeout",
"type": "integer"
},
"max_retries": {
"default": 3,
"description": "Maximum number of retries for failed API requests",
"title": "Max Retries",
"type": "integer"
},
"retry_backoff_factor": {
"default": 0.5,
"description": "Backoff factor for retries (wait time is {factor} * (2 ^ retry_number))",
"title": "Retry Backoff Factor",
"type": "number"
},
"page_size": {
"default": 20,
"description": "Number of items to fetch per page in API requests",
"title": "Page Size",
"type": "integer"
},
"cloud_workspace_id": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Workspace ID for Airbyte Cloud (required for cloud deployment)",
"title": "Cloud Workspace Id"
},
"cloud_api_url": {
"default": "https://api.airbyte.com/v1",
"description": "Base URL for Airbyte Cloud API (defaults to production URL)",
"title": "Cloud Api Url",
"type": "string"
},
"cloud_oauth_token_url": {
"default": "https://auth.airbyte.com/oauth/token",
"description": "OAuth token URL for Airbyte Cloud (defaults to production URL)",
"title": "Cloud Oauth Token Url",
"type": "string"
},
"extract_column_level_lineage": {
"default": true,
"description": "Extract column-level lineage",
"title": "Extract Column Level Lineage",
"type": "boolean"
},
"workspace_pattern": {
"$ref": "#/$defs/AllowDenyPattern",
"default": {
"allow": [
".*"
],
"deny": [],
"ignoreCase": true
},
"description": "Regex patterns to filter workspaces. Use the pattern format as in other DataHub sources."
},
"connection_pattern": {
"$ref": "#/$defs/AllowDenyPattern",
"default": {
"allow": [
".*"
],
"deny": [],
"ignoreCase": true
},
"description": "Regex patterns to filter connections. Use the pattern format as in other DataHub sources."
},
"source_pattern": {
"$ref": "#/$defs/AllowDenyPattern",
"default": {
"allow": [
".*"
],
"deny": [],
"ignoreCase": true
},
"description": "Regex patterns to filter sources. Use the pattern format as in other DataHub sources."
},
"destination_pattern": {
"$ref": "#/$defs/AllowDenyPattern",
"default": {
"allow": [
".*"
],
"deny": [],
"ignoreCase": true
},
"description": "Regex patterns to filter destinations. Use the pattern format as in other DataHub sources."
},
"source_type_mapping": {
"additionalProperties": {
"type": "string"
},
"description": "Mapping from Airbyte sourceType/destinationType to DataHub platform names. Use this to normalize Airbyte's source types to DataHub platform names. Example: {'PostgreSQL': 'postgres', 'MySQL': 'mysql'}. If not specified, the sourceType/destinationType from Airbyte is sanitized and used directly.",
"title": "Source Type Mapping",
"type": "object"
},
"sources_to_platform_instance": {
"additionalProperties": {
"$ref": "#/$defs/PlatformDetail"
},
"description": "A mapping from Airbyte source ID to its platform/instance/env/database details. Use this to override platform details for specific sources. Example: {'11111111-1111-1111-1111-111111111111': {'platform': 'postgres', 'platform_instance': 'prod-postgres', 'env': 'PROD'}}",
"title": "Sources To Platform Instance",
"type": "object"
},
"destinations_to_platform_instance": {
"additionalProperties": {
"$ref": "#/$defs/PlatformDetail"
},
"description": "A mapping from Airbyte destination ID to its platform/instance/env/database details. Use this to override platform details for specific destinations.",
"title": "Destinations To Platform Instance",
"type": "object"
},
"include_statuses": {
"default": true,
"description": "Whether to ingest run statuses",
"title": "Include Statuses",
"type": "boolean"
},
"job_status_start_date": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Start date for job status retrieval (format: yyyy-mm-ddTHH:MM:SSZ). Default is 7 days ago.",
"title": "Job Status Start Date"
},
"job_status_end_date": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "End date for job status retrieval (format: yyyy-mm-ddTHH:MM:SSZ). Default is current time.",
"title": "Job Status End Date"
},
"job_statuses_limit": {
"default": 100,
"description": "Maximum number of job statuses to retrieve per connection",
"title": "Job Statuses Limit",
"type": "integer"
},
"extract_tags": {
"default": false,
"description": "Extract tags from Airbyte metadata",
"title": "Extract Tags",
"type": "boolean"
}
},
"title": "AirbyteSourceConfig",
"type": "object"
}
Capabilities
Use the Important Capabilities table above as the source of truth for supported features and whether additional configuration is required.
Lineage
Column-level lineage is extracted from Airbyte's sync catalog when field mapping information is available in the connection configuration. Table-level lineage is always captured between source and destination datasets.
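To make the lineage endpoints concrete, the sketch below builds dataset and schema-field URNs in DataHub's URN format, applying the documented `include_schema_in_urn` and `convert_urns_to_lowercase` options. It is an illustration under assumptions, not the connector's logic: the helper names and the example database, schema, and table names are hypothetical, and platform instances are left out.

```python
from typing import Optional


def dataset_name(
    database: str,
    schema: str,
    table: str,
    include_schema_in_urn: Optional[bool] = None,
    convert_urns_to_lowercase: bool = True,
) -> str:
    """Build a 2-tier or 3-tier dataset name."""
    if include_schema_in_urn is None:
        # Auto-detect: treat the platform as 2-tier when schema equals database.
        include_schema_in_urn = schema != database
    parts = [database, schema, table] if include_schema_in_urn else [database, table]
    name = ".".join(parts)
    return name.lower() if convert_urns_to_lowercase else name


def dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Format a DataHub dataset URN."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


def field_urn(dataset: str, column: str) -> str:
    """Format a DataHub schema-field URN, the endpoint of column-level lineage."""
    return f"urn:li:schemaField:({dataset},{column})"


# Hypothetical MySQL -> Postgres connection syncing an "Orders" stream.
upstream = dataset_urn("mysql", dataset_name("shop", "shop", "Orders"))
downstream = dataset_urn("postgres", dataset_name("analytics", "public", "Orders"))
print(upstream)    # urn:li:dataset:(urn:li:dataPlatform:mysql,shop.orders,PROD)
print(downstream)  # urn:li:dataset:(urn:li:dataPlatform:postgres,analytics.public.orders,PROD)
print(field_urn(upstream, "id"), "->", field_urn(downstream, "id"))
```

Lineage only connects to datasets ingested by other connectors when both sides produce identical URNs, which is why the casing and tier options matter.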
Job History
Connection job execution history is ingested as DataProcessInstance entities, capturing run status, start time, and duration for each sync job.
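The window of job history that gets retrieved is bounded by `job_status_start_date` and `job_status_end_date`. The sketch below shows how the documented defaults (7 days ago, and the current time) could be resolved into the `yyyy-mm-ddTHH:MM:SSZ` format. The function name `job_status_window` is hypothetical, and UTC is assumed because of the trailing `Z`.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple

DATE_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # yyyy-mm-ddTHH:MM:SSZ


def job_status_window(
    start: Optional[str] = None,
    end: Optional[str] = None,
    now: Optional[datetime] = None,
) -> Tuple[str, str]:
    """Resolve the (start, end) window, falling back to the documented defaults."""
    now = now or datetime.now(timezone.utc)
    end_value = end or now.strftime(DATE_FORMAT)
    start_value = start or (now - timedelta(days=7)).strftime(DATE_FORMAT)
    return start_value, end_value


fixed_now = datetime(2024, 5, 8, 12, 0, 0, tzinfo=timezone.utc)
print(job_status_window(now=fixed_now))
# ('2024-05-01T12:00:00Z', '2024-05-08T12:00:00Z')
```

Within that window, `job_statuses_limit` still caps how many runs are ingested per connection.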
Limitations
Module behavior is constrained by source APIs, permissions, and metadata exposed by Airbyte.
- Schema information is only available for sources that expose a sync catalog. Sources without schema discovery will produce datasets without schema metadata.
- Column-level lineage requires field mapping to be configured in the Airbyte connection.
- Job history depth is limited by the Airbyte API's pagination and retention settings.
- The Airbyte Public API only supports `limit` + `offset` pagination on list endpoints; cursor pagination is not exposed. Ingestion runs against an actively mutating Airbyte instance may therefore skip or double-count entries inserted or deleted mid-scan. Schedule ingestion during quiet periods if exactness is required.
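The pagination hazard in the last item can be reproduced with a small in-memory simulation. Nothing here calls Airbyte: `list_page` stands in for a list endpoint, and the function names and the four-item list are hypothetical.

```python
from typing import Callable, List


def list_page(items: List[str], limit: int, offset: int) -> List[str]:
    """Stand-in for a list endpoint that supports only limit + offset."""
    return items[offset : offset + limit]


def scan(
    items: List[str],
    limit: int,
    mutate_after_first_page: Callable[[List[str]], None],
) -> List[str]:
    """Page through items, applying one mutation after the first page is read."""
    seen: List[str] = []
    offset = 0
    first = True
    while True:
        page = list_page(items, limit, offset)
        if not page:
            break
        seen.extend(page)
        offset += limit
        if first:
            mutate_after_first_page(items)  # simulate a concurrent change
            first = False
    return seen


connections = ["a", "b", "c", "d"]
# Deleting "a" after page 1 shifts everything left, so "c" is never returned.
print(scan(list(connections), 2, lambda items: items.remove("a")))
# ['a', 'b', 'd']
# Inserting at the front shifts everything right, so "b" is returned twice.
print(scan(list(connections), 2, lambda items: items.insert(0, "z")))
# ['a', 'b', 'b', 'c', 'd']
```

A cursor-based API would avoid both outcomes because each page would be anchored to the last item returned instead of a numeric position.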
Troubleshooting
If ingestion fails, validate credentials, permissions, connectivity, and scope filters first. Then review ingestion logs for source-specific errors and adjust configuration accordingly.
Authentication Errors
Verify that your OAuth2 client credentials are correct and have not expired. For OSS deployments, confirm the API is reachable at the /api/public/v1 path prefix.
Missing Schema Metadata
If datasets are ingested without schema information, confirm that the Airbyte source supports schema discovery and that the sync catalog is populated in the connection settings.
Code Coordinates
- Class Name: datahub.ingestion.source.airbyte.source.AirbyteSource
If you've got any questions on configuring ingestion for Airbyte, feel free to ping us on our Slack.
This page is auto-generated from the underlying source code. To make changes, please edit the relevant source files in the metadata-ingestion directory.