Confluence

Important Capabilities

Capability	Status	Notes
Detect Deleted Entities	✅	Enabled by default.
Platform Instance	✅	Enabled by default.
Test Connection	✅	Enabled by default.

Overview

The Confluence source ingests pages and spaces from Confluence (Cloud or Data Center) as DataHub Document entities, optionally generating embeddings for semantic search.

Key Features

1. Content Extraction

Page Content: Full text extraction from Confluence pages including all content types
Space Discovery: Automatic discovery of all pages within specified spaces
Hierarchical Structure: Maintains parent-child relationships between pages
Metadata Extraction: Captures creation/modification timestamps, authors, labels, and custom properties

2. Hierarchical Relationships

Parent-Child Links: Preserves Confluence page hierarchy in DataHub
Recursive Discovery: Recursively discovers nested pages starting from root pages or entire spaces
Space Organization: Maintains space context as custom properties
Flexible Navigation: Browse documentation structure in DataHub UI

3. Embedding Generation

Optional semantic search support with sensible defaults:

Supported providers: Cohere (API key), AWS Bedrock (IAM roles)
Automatic chunking: Documents are automatically chunked for optimal embedding generation
Automatic deduplication: Prevents duplicate chunk embeddings

See Semantic Search Configuration for detailed setup and advanced options.

4. Stateful Ingestion

Supports incremental updates via stateful ingestion:

Content Change Detection: Only reprocesses documents when content or embeddings config changes
Deletion Detection: Automatically removes stale entities from DataHub
Flexible Discovery: Ingest entire spaces, specific pages, or page trees
State Persistence: Maintains processing state between runs to skip unchanged documents

Prerequisites

1. Confluence API Access

For Confluence Cloud

Create an API token:

Go to https://id.atlassian.com/manage-profile/security/api-tokens
Click "Create API token"
Give it a name (e.g., "DataHub Integration")
Copy the token (you won't be able to see it again)

You'll need:

Base URL: Your Confluence Cloud URL (e.g., https://your-domain.atlassian.net/wiki)
Username: Your Atlassian account email
API Token: The token you just created

For Confluence Data Center / Server

Create a Personal Access Token:

Go to your Confluence → Profile → Personal Access Tokens
Click "Create token"
Give it a name and set expiration
Copy the token

You'll need:

Base URL: Your Confluence server URL (e.g., https://confluence.company.com)
Personal Access Token: The token you created

Note: For Data Center, you can also use username/password, but Personal Access Tokens are recommended.
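To make the difference between the two credential types concrete, the sketch below builds the HTTP Authorization header for each deployment type. It assumes the usual Atlassian conventions (HTTP Basic auth with email and API token on Cloud, a Bearer token for Data Center Personal Access Tokens). The helper name build_auth_header is hypothetical and is not part of the connector.

```python
import base64
from typing import Dict, Optional


def build_auth_header(
    cloud: bool,
    username: Optional[str] = None,
    api_token: Optional[str] = None,
    personal_access_token: Optional[str] = None,
) -> Dict[str, str]:
    """Hypothetical helper: Authorization header for a Confluence REST call."""
    if cloud:
        if not username or not api_token:
            raise ValueError("Cloud requires both username (email) and api_token")
        # Cloud: HTTP Basic auth over base64("email:api_token")
        raw = f"{username}:{api_token}".encode("utf-8")
        return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}
    if not personal_access_token:
        raise ValueError("Data Center requires personal_access_token")
    # Data Center / Server: the Personal Access Token is sent as a Bearer token
    return {"Authorization": f"Bearer {personal_access_token}"}
```

A mismatch between the cloud flag and the credential type is a common cause of 401 responses, which is why the helper fails early when the expected fields are missing.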

2. Required Permissions

The API credentials must have:

Read access to all spaces and pages you want to ingest
For Cloud: User must be added to spaces or have site-wide read access
For Data Center: User must have "View" permissions on spaces

3. Embedding Provider (Optional)

If you want semantic search capabilities, configure an embedding provider in your DataHub instance.

Supported providers include Cohere (API key) and AWS Bedrock (IAM roles). The connector will use sensible defaults for chunking and embedding configuration.

See Semantic Search Configuration for detailed provider setup and configuration options.

Common Use Cases

1. Auto-Discover All Spaces (Default)

By default, the connector discovers and ingests all accessible spaces:

source:
  type: confluence
  config:
    # Confluence Cloud
    cloud: true
    url: "https://your-domain.atlassian.net/wiki"
    username: "user@company.com"
    api_token: "${CONFLUENCE_API_TOKEN}"

    # No filtering - discovers all accessible spaces
    # Optional: limit number of spaces for large instances
    max_spaces: 100

2. Include Specific Spaces

Ingest only specific Confluence spaces:

source:
  type: confluence
  config:
    cloud: true
    url: "https://your-domain.atlassian.net/wiki"
    username: "user@company.com"
    api_token: "${CONFLUENCE_API_TOKEN}"

    # Include only these spaces
    spaces:
      allow:
        - "ENGINEERING"
        - "PRODUCT"
        - "DESIGN"

3. Exclude Personal and Archive Spaces

Ingest all spaces except specific ones:

source:
  type: confluence
  config:
    cloud: true
    url: "https://your-domain.atlassian.net/wiki"
    username: "user@company.com"
    api_token: "${CONFLUENCE_API_TOKEN}"

    # Exclude personal spaces and archived content
    spaces:
      deny:
        - "~john.doe"
        - "~jane.smith"
        - "ARCHIVE"
        - "OLD_DOCS"

4. Specific Page Trees Only

Ingest specific pages and their descendants:

source:
  type: confluence
  config:
    cloud: true
    url: "https://your-domain.atlassian.net/wiki"
    username: "user@company.com"
    api_token: "${CONFLUENCE_API_TOKEN}"

    # Start from specific pages
    pages:
      allow:
        - "123456789" # API Documentation page tree
        - "987654321" # User Guides page tree
    recursive: true # Include all child pages

5. Combined Space and Page Filtering

Combine space and page filters for fine-grained control:

source:
  type: confluence
  config:
    cloud: true
    url: "https://your-domain.atlassian.net/wiki"
    username: "user@company.com"
    api_token: "${CONFLUENCE_API_TOKEN}"

    # Include specific spaces
    spaces:
      allow:
        - "ENGINEERING"
        - "PRODUCT"
      # Deny takes precedence: a space listed here is excluded even if it also appears in allow
      deny:
        - "~admin"

    # Exclude specific pages (e.g., drafts, archived content)
    pages:
      deny:
        - "999999" # Draft page
        - "888888" # Archived page

6. Data Center / Server Setup

Connect to Confluence Data Center or Server:

source:
  type: confluence
  config:
    # Data Center / Server
    cloud: false
    url: "https://confluence.company.com"
    personal_access_token: "${CONFLUENCE_PAT}"

    spaces:
      allow:
        - "WIKI"
        - "DOCS"

7. Production Setup with Stateful Ingestion

Enterprise setup with incremental updates:

source:
  type: confluence
  config:
    cloud: true
    url: "https://your-domain.atlassian.net/wiki"
    username: "user@company.com"
    api_token: "${CONFLUENCE_API_TOKEN}"

    spaces:
      allow:
        - "COMPANY"
        - "PUBLIC"

    # Enable stateful ingestion for incremental updates
    stateful_ingestion:
      enabled: true

Note: Embedding configuration is managed by your DataHub instance. See Semantic Search Configuration for setup.

8. Using URLs for Allow/Deny

You can specify spaces and pages using full URLs for both allow and deny lists:

source:
  type: confluence
  config:
    cloud: true
    url: "https://your-domain.atlassian.net/wiki"
    username: "user@company.com"
    api_token: "${CONFLUENCE_API_TOKEN}"

    # Use full URLs - connector extracts keys/IDs automatically
    spaces:
      allow:
        - "https://your-domain.atlassian.net/wiki/spaces/ENG"
        - "https://your-domain.atlassian.net/wiki/spaces/PRODUCT"
      deny:
        - "https://your-domain.atlassian.net/wiki/spaces/ARCHIVE"
        - "~john.doe" # Can mix URLs and keys

    pages:
      allow:
        - "https://your-domain.atlassian.net/wiki/spaces/ENG/pages/123456/Getting+Started"
      deny:
        - "https://your-domain.atlassian.net/wiki/spaces/ENG/pages/999999/Draft"

Filtering Content

The connector provides flexible filtering options through allow and deny lists for both spaces and pages.

Space Filtering

Control which Confluence spaces are ingested:

spaces.allow: Include only specific spaces (by default, all accessible spaces are discovered)

spaces:
  allow:
    - "ENGINEERING" # Space key
    - "PRODUCT"
    - "https://your-domain.atlassian.net/wiki/spaces/DESIGN" # Or full URL

spaces.deny: Exclude specific spaces (applied after spaces.allow)

spaces:
  deny:
    - "~john.doe" # Personal space
    - "ARCHIVE" # Archived content
    - "TEST" # Test space

Page Filtering

Control which pages are ingested:

pages.allow: Include only specific pages (triggers page-based mode, bypasses space discovery)

pages:
  allow:
    - "123456789" # Page ID
    - "987654321"
    - "https://your-domain.atlassian.net/wiki/spaces/ENG/pages/111111/API+Docs" # Or full URL
recursive: true # Include child pages

pages.deny: Exclude specific pages (works in both space-based and page-based modes)

pages:
  deny:
    - "999999" # Draft page
    - "888888" # Archived page

Filtering Rules

Precedence:

Deny lists always take precedence over allow lists
If a space/page is in both allow and deny lists, it will be excluded
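The precedence rule above can be sketched as a single predicate. This is an illustration only: the function name is_included is hypothetical, and it assumes exact, case-sensitive matching on keys or IDs that have already been extracted from any URLs.

```python
from typing import Iterable, Optional


def is_included(
    identifier: str,
    allow: Optional[Iterable[str]] = None,
    deny: Optional[Iterable[str]] = None,
) -> bool:
    """Hypothetical check mirroring the documented rule: deny wins over allow."""
    if deny is not None and identifier in set(deny):
        return False
    if allow is None:
        # No allow list configured: every accessible item is a candidate
        return True
    return identifier in set(allow)
```

For example, is_included("ENGINEERING", allow=["ENGINEERING"], deny=["ENGINEERING"]) evaluates to False, because the deny list is checked first.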

Modes:

Space-based mode (default): Discovers spaces, then ingests all pages within allowed spaces
Page-based mode: When pages.allow is specified, bypasses space discovery and fetches specific page trees

Format Support:

Space keys: "ENGINEERING", "~username" (for personal spaces)
Page IDs: "123456789" (numeric string)
Full URLs: Both space URLs and page URLs are automatically parsed

Common Filtering Patterns

Exclude personal spaces (wildcards are not supported, so list each one explicitly):

spaces:
  deny:
    # A pattern such as "~*" does not work; use the explicit personal space keys
    - "~john.doe"
    - "~jane.smith"

Ingest only documentation spaces:

spaces:
  allow:
    - "DOCS"
    - "API_DOCS"
    - "USER_GUIDES"

Focus on specific documentation trees:

pages:
  allow:
    - "123456" # API Documentation root page
    - "789012" # User Guides root page
recursive: true

Exclude drafts and WIP pages:

pages:
  deny:
    - "999999" # Draft page ID
    - "888888" # WIP page ID

How It Works

Processing Pipeline

Discovery: Discovers spaces and pages through the Confluence API
Download: Downloads page content via Confluence REST API
Extraction: Extracts text, metadata, and hierarchy from pages
Chunking: Splits documents into semantic chunks (if embeddings enabled)
Embedding: Generates vector embeddings for each chunk (if embeddings enabled)
Emission: Emits Document entities with SemanticContent aspects to DataHub
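The Chunking step can be pictured with a simplified fixed-size splitter. This is a sketch under stated assumptions, not the connector's implementation: the connector's own strategies ("basic", "by_title") are not reproduced here, and the function names chunk_text and dedupe_chunks are hypothetical. The parameters mirror chunking.max_characters and chunking.overlap from the config, and deduplication is assumed to compare a SHA-256 digest of each chunk's text.

```python
import hashlib
from typing import List


def chunk_text(text: str, max_characters: int = 500, overlap: int = 0) -> List[str]:
    """Split text into windows of at most max_characters, sharing `overlap` characters."""
    if max_characters <= 0:
        raise ValueError("max_characters must be positive")
    if not 0 <= overlap < max_characters:
        raise ValueError("overlap must be in [0, max_characters)")
    step = max_characters - overlap
    chunks: List[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_characters])
        if start + max_characters >= len(text):
            break  # the window already reached the end of the text
        start += step
    return chunks


def dedupe_chunks(chunks: List[str]) -> List[str]:
    """Drop chunks whose text was already seen, keeping first occurrences in order."""
    seen = set()
    unique: List[str] = []
    for chunk in chunks:
        digest = hashlib.sha256(chunk.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(chunk)
    return unique
```

With the documented defaults (max_characters of 500 and overlap of 0), a 1,200-character page yields three chunks under this simplified scheme.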

URL Format Support

The connector supports multiple input formats for spaces and pages in allow/deny lists:

Space Identifiers:

Space key: "ENGINEERING", "~username" (for personal spaces)
Full URL: "https://your-domain.atlassian.net/wiki/spaces/ENGINEERING"

Page Identifiers:

Page ID: "123456789" (numeric string)
Full URL (Cloud): "https://your-domain.atlassian.net/wiki/spaces/ENG/pages/123456/Page+Title"
Full URL (Data Center): "https://confluence.company.com/pages/viewpage.action?pageId=123456"

The connector automatically extracts space keys and page IDs from URLs, so you can use either format interchangeably in the spaces.allow, spaces.deny, pages.allow, and pages.deny lists.
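A minimal sketch of how such identifiers could be normalized is shown below, using only the URL shapes listed above. The function names extract_space_key and extract_page_id are hypothetical; the connector's actual parsing may differ.

```python
import re
from urllib.parse import parse_qs, urlparse


def extract_space_key(value: str) -> str:
    """Return the space key from a bare key or a .../spaces/<KEY> URL."""
    if not value.startswith(("http://", "https://")):
        return value  # already a key such as "ENGINEERING" or "~username"
    match = re.search(r"/spaces/([^/]+)", urlparse(value).path)
    if match is None:
        raise ValueError(f"No space key found in URL: {value}")
    return match.group(1)


def extract_page_id(value: str) -> str:
    """Return the page ID from a bare ID, a Cloud page URL, or a Data Center URL."""
    if not value.startswith(("http://", "https://")):
        return value  # already a numeric page ID string
    parsed = urlparse(value)
    # Cloud style: .../spaces/<KEY>/pages/<ID>/<Title>
    match = re.search(r"/pages/(\d+)", parsed.path)
    if match:
        return match.group(1)
    # Data Center style: .../pages/viewpage.action?pageId=<ID>
    page_ids = parse_qs(parsed.query).get("pageId")
    if page_ids:
        return page_ids[0]
    raise ValueError(f"No page ID found in URL: {value}")
```

Bare keys and IDs pass through unchanged, so a list that mixes both formats can be normalized in a single pass.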

Stateful Ingestion Details

The source uses content-based change detection:

Calculates SHA-256 hash of document content + embedding configuration
Compares hash with previous run to detect changes
Only reprocesses documents when hash changes
Tracks all emitted URNs to detect deletions

This means:

First run: Processes all documents
Subsequent runs: Only processes new/changed documents
Deleted pages: Automatically soft-deleted from DataHub
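The change-detection logic described above can be sketched as follows. This is an illustration, not the connector's code: the function names, the way the embedding configuration is serialized before hashing, and the in-memory dictionaries standing in for persisted state are all assumptions.

```python
import hashlib
import json
from typing import Dict, Set, Tuple


def content_hash(text: str, embedding_config: dict) -> str:
    """SHA-256 over the document text plus a canonical dump of the embedding config."""
    canonical_config = json.dumps(embedding_config, sort_keys=True)
    payload = text + "\n" + canonical_config
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def plan_run(
    previous: Dict[str, str],  # URN -> hash recorded by the last successful run
    current: Dict[str, str],   # URN -> hash computed during this run
) -> Tuple[Set[str], Set[str]]:
    """Return (URNs to reprocess, URNs to soft-delete)."""
    # New documents have no previous hash; changed documents have a different one
    to_process = {urn for urn, digest in current.items() if previous.get(urn) != digest}
    # Anything emitted last time but absent now is treated as deleted
    to_delete = set(previous) - set(current)
    return to_process, to_delete
```

Because the embedding configuration is part of the hashed payload, switching the embedding model changes every hash and causes all documents to be reprocessed on the next run.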

Limitations and Considerations

Confluence API Limits

Rate Limits: Confluence enforces rate limits (Cloud: varies by plan, Data Center: configurable)
Content Types: Complex macros may not extract perfectly (e.g., embedded content, custom macros)
Attachments: File attachments are not ingested (only page content)
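When rate limits are hit, retrying with exponential backoff is the usual remedy. The sketch below shows one way the advanced.retry.max_attempts and advanced.retry.backoff_factor settings could interact. The delay formula (backoff_factor raised to the attempt number, in seconds) and the function name call_with_retry are assumptions for illustration; the connector's actual retry schedule is not documented here.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    max_attempts: int = 3,
    backoff_factor: int = 2,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on any exception with exponentially growing delays."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:  # a real client would catch only retryable errors (e.g. HTTP 429)
            if attempt == max_attempts:
                raise  # attempts exhausted: surface the last error
            sleep(backoff_factor ** attempt)  # assumed schedule: 2s, 4s, 8s, ...
    raise ValueError("max_attempts must be at least 1")
```

The sleep parameter is injectable so the schedule can be inspected in tests without actually waiting.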

Performance Considerations

Large Spaces: First run may take significant time for large spaces (1000+ pages)
Embedding Generation: Adds processing time proportional to content volume
API Costs: Embedding providers may incur costs based on usage

Content Extraction

Supported Content: Text, headings, lists, code blocks, tables, panels
Limited Support: Some macros extract as text/links
Not Supported: Attachments, complex custom macros, embedded Jira issues (content only)

Troubleshooting

Common Issues

"401 Unauthorized" or "Authentication failed" errors:

Cloud: Verify username (email) and api_token are correct
Data Center: Verify personal_access_token is valid and not expired
Check that cloud: true/false matches your Confluence type
Ensure the URL includes /wiki suffix for Cloud (e.g., https://domain.atlassian.net/wiki)

"403 Forbidden" or "Space not found" errors:

Verify the user has read access to the specified spaces
Check that space keys are correct (case-sensitive)
For Cloud, ensure user is added to private spaces
For Data Center, verify "View Space" permissions

Empty or missing content:

Verify pages contain text (empty pages are skipped by default with filtering.skip_empty_documents: true)
Check the filtering.min_text_length setting (default: 50 characters); pages with less text are skipped
Ensure recursive: true if expecting child pages
Check that pages are not restricted or have special permissions

Slow ingestion:

Increase processing.parallelism.num_processes (default: 2)
Consider filtering specific spaces instead of all spaces
First run is always slower; with stateful ingestion enabled, subsequent runs skip unchanged pages
Large spaces with 1000+ pages may take several minutes

Embedding generation failures:

Verify provider API key is correct
Check provider-specific rate limits (Cohere: 10k requests/min)
Ensure embedding model name is valid for your provider
For Bedrock: verify IAM permissions and model access is enabled in AWS Console

Stateful ingestion not working:

Ensure stateful_ingestion.enabled: true in config
Check DataHub connection (source needs to query previous state)
Verify state file path is writable (if using file-based state)
Look for state persistence logs in ingestion output

Missing hierarchy/parent relationships:

Verify hierarchy.enabled: true (default)
Check that parent pages are being ingested
Ensure recursive: true to discover parent-child relationships
Parent pages must be accessible to the API credentials

Page IDs not working:

For Cloud, use the numeric page ID from the URL (after /pages/)
For Data Center, page IDs may differ - use the ID from the page URL or query param ?pageId=
Alternatively, use full page URLs instead of IDs in pages.allow or pages.deny

How to find space keys and page IDs:

Space key: Visible in the space URL: https://domain.atlassian.net/wiki/spaces/ENGINEERING → key is ENGINEERING
Page ID (Cloud): In the page URL after /pages/: https://domain.atlassian.net/wiki/spaces/ENG/pages/123456/Title → ID is 123456
Page ID (Data Center): In the URL query parameter: https://confluence.company.com/pages/viewpage.action?pageId=123456 → ID is 123456
Personal space key: Format is ~username (e.g., ~john.doe for user john.doe)

Performance Tuning

Parallelism Settings

processing:
  parallelism:
    num_processes: 4 # Increase for faster processing (default: 2)
    max_connections: 20 # Concurrent API connections (default: 10)

Guidelines:

Small spaces (<100 pages): num_processes: 2
Medium spaces (100-500 pages): num_processes: 4
Large spaces (>500 pages): num_processes: 8
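The guidelines above can be expressed as a small lookup, for instance when generating recipes for many spaces. The function name suggested_num_processes is hypothetical, and the thresholds are just the rules of thumb listed above.

```python
def suggested_num_processes(page_count: int) -> int:
    """Map a space's approximate page count to the guideline num_processes value."""
    if page_count < 100:
        return 2  # small spaces
    if page_count <= 500:
        return 4  # medium spaces
    return 8      # large spaces
```

Treat the result as a starting point; the best value also depends on available CPU cores and on the rate limits of your Confluence instance.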

Filtering

filtering:
  min_text_length: 100 # Skip short pages (default: 50)
  skip_empty_documents: true # Skip empty pages (default: true)
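As a rough picture of what these two settings do, the sketch below decides whether a page would be skipped. It assumes that length is measured on whitespace-stripped text and that the two checks are applied independently; the connector's exact behavior may differ, and the function name should_skip is hypothetical.

```python
def should_skip(
    text: str,
    min_text_length: int = 50,
    skip_empty_documents: bool = True,
) -> bool:
    """Return True if a page would be filtered out under the assumed rules."""
    stripped = text.strip()
    if skip_empty_documents and not stripped:
        return True  # nothing but whitespace
    # Assumed: pages shorter than the threshold are skipped
    return len(stripped) < min_text_length
```

Raising min_text_length to 100, as in the fragment above, would additionally drop pages with between 50 and 99 characters of text.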

Space Selection

Instead of ingesting all spaces, select specific ones:

spaces:
  allow:
    - "ENGINEERING" # High-value documentation space
    - "PRODUCT" # Product requirements space
  deny:
    - "~john.doe" # Exclude personal spaces (list each one; wildcards such as "~*" are not supported)
    - "ARCHIVE" # Exclude archived content
    - "TEST" # Exclude test spaces

CLI based Ingestion

Config Details


Note that a . is used to denote nested fields in the YAML recipe.

Field	Description
url ✅ string	Base URL of your Confluence instance. Examples: 'https://your-domain.atlassian.net/wiki' (Cloud) or 'https://confluence.your-company.com' (Data Center)
api_token One of string(password), null	API token for Confluence Cloud authentication. Generate at: https://id.atlassian.com/manage-profile/security/api-tokens Default: None
cloud boolean	Whether this is a Confluence Cloud instance (True) or Data Center/Server (False). Default: True
max_pages_per_space integer	Maximum number of pages to ingest per space. Default: 1000
max_spaces integer	Maximum number of spaces to ingest when auto-discovering (applies when urls is not set). Default: 100
personal_access_token One of string(password), null	Personal Access Token for Confluence Data Center authentication. Generate from: User Profile > Settings > Personal Access Tokens Default: None
platform_instance One of string, null	Optional human-readable identifier for this Confluence instance (e.g., 'mycompany-prod', 'team-a-confluence'). If not provided, automatically generated by hashing the base URL, which guarantees global uniqueness across all Confluence installations (both Cloud and Data Center). Use explicit values for more readable URNs, but auto-generated hashes are perfectly fine and require no manual configuration. Default: None
recursive boolean	Whether to recursively fetch child pages (applies to page URLs only). Default: True
username One of string, null	Username for Confluence Cloud authentication (required for Cloud). Default: None
advanced AdvancedConfig	Advanced configuration options.
advanced.continue_on_failure boolean	Default: True
advanced.max_errors integer	Default: 10
advanced.output_format Enum	One of: "json", "xml" Default: json
advanced.preserve_outputs boolean	Default: False
advanced.raise_on_error boolean	Default: False
advanced.work_dir string	Default: /tmp/unstructured_datahub
advanced.cache CacheConfig	Cache configuration.
advanced.cache.cache_dir string	Default: ~/.cache/unstructured_datahub
advanced.cache.enabled boolean	Default: True
advanced.cache.ttl integer	Cache TTL in seconds Default: 86400
advanced.retry RetryConfig	Retry configuration.
advanced.retry.backoff_factor integer	Default: 2
advanced.retry.enabled boolean	Default: True
advanced.retry.max_attempts integer	Default: 3
advanced.retry.retry_on_timeout boolean	Default: True
chunking ChunkingConfig	Chunking strategy configuration.
chunking.combine_text_under_n_chars integer	Combine chunks smaller than this size Default: 100
chunking.max_characters integer	Maximum characters per chunk Default: 500
chunking.overlap integer	Character overlap between chunks Default: 0
chunking.strategy Enum	One of: "basic", "by_title" Default: by_title
document_mapping DocumentMappingConfig	Document entity mapping configuration.
document_mapping.id_pattern string	Pattern for generating document IDs Default: {source_type}-{directory}-{basename}
document_mapping.status Enum	One of: "PUBLISHED", "UNPUBLISHED" Default: PUBLISHED
document_mapping.id_normalization IdNormalizationConfig	Document ID normalization rules.
document_mapping.id_normalization.lowercase boolean	Convert to lowercase Default: True
document_mapping.id_normalization.max_length integer	Maximum ID length Default: 200
document_mapping.id_normalization.remove_special_chars boolean	Remove special characters except _ and - Default: True
document_mapping.id_normalization.replace_spaces_with string	Replace spaces with this character Default: -
document_mapping.source SourceConfig	Document source configuration.
document_mapping.source.include_external_id boolean	Include external ID in DocumentSource Default: True
document_mapping.source.include_external_url boolean	Include external URL in DocumentSource Default: True
document_mapping.source.type Enum	One of: "NATIVE", "EXTERNAL" Default: EXTERNAL
document_mapping.title TitleExtractionConfig	Title extraction configuration.
document_mapping.title.extract_from_content boolean	Try to extract title from document content Default: True
document_mapping.title.fallback_to_filename boolean	Use filename as title if not found in content Default: True
document_mapping.title.max_length integer	Maximum title length Default: 500
embedding EmbeddingConfig	Embedding generation configuration. Default behavior: Fetches configuration from DataHub server automatically. Override behavior: Validates local config against server when explicitly set.
embedding.allow_local_embedding_config boolean	BREAK-GLASS: Allow local config without server validation. NOT RECOMMENDED - may break semantic search. Default: False
embedding.api_key One of string, null	API key for Cohere (not needed for Bedrock with IAM roles) Default: None
embedding.aws_region One of string, null	AWS region for Bedrock. If not set, loads from server. Default: None
embedding.batch_size integer	Batch size for embedding API calls Default: 25
embedding.input_type One of string, null	Input type for Cohere embeddings Default: search_document
embedding.model One of string, null	Model name. If not set, loads from server. Default: None
embedding.model_embedding_key One of string, null	Storage key for embeddings (e.g., 'cohere_embed_v3'). Required if overriding server config. If not set, loads from server. Default: None
embedding.provider One of Enum, null	Embedding provider (bedrock uses AWS, cohere/openai use API key). If not set, loads from server. Default: None
filtering FilteringConfig	File filtering configuration.
filtering.max_file_size One of integer, null	Maximum file size in bytes Default: None
filtering.min_file_size One of integer, null	Minimum file size in bytes Default: None
filtering.min_text_length integer	Minimum text length in characters Default: 50
filtering.modified_after One of string, null	Only files modified after this date (ISO format) Default: None
filtering.modified_before One of string, null	Only files modified before this date (ISO format) Default: None
filtering.skip_empty_documents boolean	Skip documents with no text content Default: True
filtering.exclude_patterns array	Glob patterns to exclude
filtering.exclude_patterns.string string
filtering.include_patterns array	Glob patterns to include
filtering.include_patterns.string string
hierarchy HierarchyConfig	Hierarchy configuration.
hierarchy.enabled boolean	Enable parent-child relationships Default: True
hierarchy.parent_strategy Enum	One of: "folder", "none", "custom", "notion", "confluence" Default: folder
hierarchy.custom_mapping One of CustomMappingConfig, null	Custom mapping configuration Default: None
hierarchy.custom_mapping.rules array	Custom parent mapping rules
hierarchy.custom_mapping.rules.CustomParentRule CustomParentRule	Custom parent mapping rule.
hierarchy.custom_mapping.rules.CustomParentRule.parent_id ❓ string	Parent document ID for matching files
hierarchy.custom_mapping.rules.CustomParentRule.pattern ❓ string	Glob pattern to match file paths
hierarchy.folder_mapping FolderMappingConfig	Folder hierarchy mapping configuration.
hierarchy.folder_mapping.create_parent_docs boolean	Create Document entities for folders Default: True
hierarchy.folder_mapping.max_depth integer	Maximum hierarchy depth Default: 10
hierarchy.folder_mapping.parent_id_pattern string	Pattern for parent document IDs Default: {source_type}-{directory}
hierarchy.folder_mapping.root_parent One of string, null	Optional root document URN Default: None
pages PageFilterConfig	Configuration for filtering Confluence pages.
pages.allow One of array, null	List of specific Confluence pages to include in ingestion. By default, all pages in discovered spaces are included. Specify page IDs or URLs to limit ingestion to specific pages and their children. Examples: - Page IDs: ['123456', '789012'] - Page URLs: ['https://domain.atlassian.net/wiki/spaces/ENG/pages/123456/API-Docs'] When specified, only these page trees will be ingested (if recursive=true). This allows focusing on specific documentation sections. Default: None
pages.allow.string string
pages.deny One of array, null	List of specific Confluence pages to exclude from ingestion. Applies after allow filtering. Examples: - Exclude specific pages: ['123456', '789012'] - Page URLs: ['https://domain.atlassian.net/wiki/spaces/ENG/pages/999999/Draft'] Useful for excluding specific pages within otherwise included spaces. Default: None
pages.deny.string string
processing ProcessingConfig	Processing configuration (partitioning only, no chunking).
processing.parallelism ParallelismConfig	Parallelism configuration.
processing.parallelism.disable_parallelism boolean	Disable all parallelism Default: False
processing.parallelism.max_connections integer	Max concurrent connections for async operations Default: 10
processing.parallelism.num_processes integer	Number of worker processes Default: 2
processing.partition PartitionConfig	Unstructured partitioning configuration.
processing.partition.additional_args object	Additional partition arguments
processing.partition.api_key One of string, null	Unstructured API key Default: None
processing.partition.partition_by_api boolean	Use Unstructured API for partitioning Default: False
processing.partition.split_pdf_concurrency_level integer	Number of parallel requests for PDF pages Default: 5
processing.partition.split_pdf_page boolean	Enable page-level splitting for large PDFs Default: False
processing.partition.strategy Enum	One of: "auto", "hi_res", "fast", "ocr_only" Default: auto
processing.partition.ocr_languages array	Languages for OCR Default: ['eng']
processing.partition.ocr_languages.string string
spaces SpaceFilterConfig	Configuration for filtering Confluence spaces.
spaces.allow One of array, null	List of Confluence spaces to include in ingestion. By default, all accessible spaces are discovered. Specify space keys or URLs to limit ingestion to specific spaces. Examples: - Space keys: ['ENGINEERING', 'PRODUCT', 'DESIGN'] - Space URLs: ['https://domain.atlassian.net/wiki/spaces/TEAM'] - Mixed: ['ENGINEERING', 'https://domain.atlassian.net/wiki/spaces/PRODUCT'] If specified, only these spaces will be ingested. Use deny to exclude specific spaces from discovery. Default: None
spaces.allow.string string
spaces.deny One of array, null	List of Confluence spaces to exclude from ingestion. Applies after allow filtering. Examples: - Exclude personal spaces: ['~user1', '~user2'] - Exclude specific spaces: ['ARCHIVE', 'OLD_DOCS'] - Space URLs: ['https://domain.atlassian.net/wiki/spaces/TEST'] Useful for excluding personal spaces or archived content. Default: None
spaces.deny.string string
stateful_ingestion One of StatefulStaleMetadataRemovalConfig, null	Stateful Ingestion Config Default: None
stateful_ingestion.enabled boolean	Whether or not to enable stateful ingest. Default: True if a pipeline_name is set and either a datahub-rest sink or `datahub_api` is specified, otherwise False Default: False
stateful_ingestion.fail_safe_threshold number	Prevents large amount of soft deletes & the state from committing from accidental changes to the source configuration if the relative change percent in entities compared to the previous state is above the 'fail_safe_threshold'. Default: 75.0
stateful_ingestion.remove_stale_metadata boolean	Soft-deletes the entities present in the last successful run but missing in the current run with stateful_ingestion enabled. Default: True

The JSONSchema for this configuration is inlined below.

{
  "$defs": {
    "AdvancedConfig": {
      "additionalProperties": false,
      "description": "Advanced configuration options.",
      "properties": {
        "work_dir": {
          "default": "/tmp/unstructured_datahub",
          "title": "Work Dir",
          "type": "string"
        },
        "preserve_outputs": {
          "default": false,
          "title": "Preserve Outputs",
          "type": "boolean"
        },
        "output_format": {
          "default": "json",
          "enum": [
            "json",
            "xml"
          ],
          "title": "Output Format",
          "type": "string"
        },
        "raise_on_error": {
          "default": false,
          "title": "Raise On Error",
          "type": "boolean"
        },
        "max_errors": {
          "default": 10,
          "title": "Max Errors",
          "type": "integer"
        },
        "continue_on_failure": {
          "default": true,
          "title": "Continue On Failure",
          "type": "boolean"
        },
        "retry": {
          "$ref": "#/$defs/RetryConfig"
        },
        "cache": {
          "$ref": "#/$defs/CacheConfig"
        }
      },
      "title": "AdvancedConfig",
      "type": "object"
    },
    "CacheConfig": {
      "additionalProperties": false,
      "description": "Cache configuration.",
      "properties": {
        "enabled": {
          "default": true,
          "title": "Enabled",
          "type": "boolean"
        },
        "cache_dir": {
          "default": "~/.cache/unstructured_datahub",
          "title": "Cache Dir",
          "type": "string"
        },
        "ttl": {
          "default": 86400,
          "description": "Cache TTL in seconds",
          "title": "Ttl",
          "type": "integer"
        }
      },
      "title": "CacheConfig",
      "type": "object"
    },
    "ChunkingConfig": {
      "additionalProperties": false,
      "description": "Chunking strategy configuration.",
      "properties": {
        "strategy": {
          "default": "by_title",
          "description": "Chunking strategy to use",
          "enum": [
            "basic",
            "by_title"
          ],
          "title": "Strategy",
          "type": "string"
        },
        "max_characters": {
          "default": 500,
          "description": "Maximum characters per chunk",
          "title": "Max Characters",
          "type": "integer"
        },
        "overlap": {
          "default": 0,
          "description": "Character overlap between chunks",
          "title": "Overlap",
          "type": "integer"
        },
        "combine_text_under_n_chars": {
          "default": 100,
          "description": "Combine chunks smaller than this size",
          "title": "Combine Text Under N Chars",
          "type": "integer"
        }
      },
      "title": "ChunkingConfig",
      "type": "object"
    },
    "CustomMappingConfig": {
      "additionalProperties": false,
      "description": "Custom parent mapping configuration.",
      "properties": {
        "rules": {
          "description": "Custom parent mapping rules",
          "items": {
            "$ref": "#/$defs/CustomParentRule"
          },
          "title": "Rules",
          "type": "array"
        }
      },
      "title": "CustomMappingConfig",
      "type": "object"
    },
    "CustomParentRule": {
      "additionalProperties": false,
      "description": "Custom parent mapping rule.",
      "properties": {
        "pattern": {
          "description": "Glob pattern to match file paths",
          "title": "Pattern",
          "type": "string"
        },
        "parent_id": {
          "description": "Parent document ID for matching files",
          "title": "Parent Id",
          "type": "string"
        }
      },
      "required": [
        "pattern",
        "parent_id"
      ],
      "title": "CustomParentRule",
      "type": "object"
    },
    "DocumentMappingConfig": {
      "additionalProperties": false,
      "description": "Document entity mapping configuration.",
      "properties": {
        "id_pattern": {
          "default": "{source_type}-{directory}-{basename}",
          "description": "Pattern for generating document IDs",
          "title": "Id Pattern",
          "type": "string"
        },
        "id_normalization": {
          "$ref": "#/$defs/IdNormalizationConfig",
          "description": "ID normalization rules"
        },
        "title": {
          "$ref": "#/$defs/TitleExtractionConfig",
          "description": "Title extraction configuration"
        },
        "source": {
          "$ref": "#/$defs/SourceConfig",
          "description": "Source configuration"
        },
        "status": {
          "default": "PUBLISHED",
          "description": "Default publication status",
          "enum": [
            "PUBLISHED",
            "UNPUBLISHED"
          ],
          "title": "Status",
          "type": "string"
        }
      },
      "title": "DocumentMappingConfig",
      "type": "object"
    },
    "EmbeddingConfig": {
      "additionalProperties": false,
      "description": "Embedding generation configuration.\n\nDefault behavior: Fetches configuration from DataHub server automatically.\nOverride behavior: Validates local config against server when explicitly set.",
      "properties": {
        "provider": {
          "anyOf": [
            {
              "enum": [
                "bedrock",
                "cohere",
                "openai"
              ],
              "type": "string"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "description": "Embedding provider (bedrock uses AWS, cohere/openai use API key). If not set, loads from server.",
          "title": "Provider"
        },
        "model": {
          "anyOf": [
            {
              "type": "string"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "description": "Model name. If not set, loads from server.",
          "title": "Model"
        },
        "model_embedding_key": {
          "anyOf": [
            {
              "type": "string"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "description": "Storage key for embeddings (e.g., 'cohere_embed_v3'). Required if overriding server config. If not set, loads from server.",
          "title": "Model Embedding Key"
        },
        "aws_region": {
          "anyOf": [
            {
              "type": "string"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "description": "AWS region for Bedrock. If not set, loads from server.",
          "title": "Aws Region"
        },
        "api_key": {
          "anyOf": [
            {
              "type": "string"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "description": "API key for the Cohere or OpenAI provider (not needed for Bedrock with IAM roles)",
          "title": "Api Key"
        },
        "batch_size": {
          "default": 25,
          "description": "Batch size for embedding API calls",
          "title": "Batch Size",
          "type": "integer"
        },
        "input_type": {
          "anyOf": [
            {
              "type": "string"
            },
            {
              "type": "null"
            }
          ],
          "default": "search_document",
          "description": "Input type for Cohere embeddings",
          "title": "Input Type"
        },
        "allow_local_embedding_config": {
          "default": false,
          "description": "BREAK-GLASS: Allow local config without server validation. NOT RECOMMENDED - may break semantic search.",
          "title": "Allow Local Embedding Config",
          "type": "boolean"
        }
      },
      "title": "EmbeddingConfig",
      "type": "object"
    },
    "FilteringConfig": {
      "additionalProperties": false,
      "description": "File filtering configuration.",
      "properties": {
        "include_patterns": {
          "description": "Glob patterns to include",
          "items": {
            "type": "string"
          },
          "title": "Include Patterns",
          "type": "array"
        },
        "exclude_patterns": {
          "description": "Glob patterns to exclude",
          "items": {
            "type": "string"
          },
          "title": "Exclude Patterns",
          "type": "array"
        },
        "min_file_size": {
          "anyOf": [
            {
              "type": "integer"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "description": "Minimum file size in bytes",
          "title": "Min File Size"
        },
        "max_file_size": {
          "anyOf": [
            {
              "type": "integer"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "description": "Maximum file size in bytes",
          "title": "Max File Size"
        },
        "modified_after": {
          "anyOf": [
            {
              "type": "string"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "description": "Only files modified after this date (ISO format)",
          "title": "Modified After"
        },
        "modified_before": {
          "anyOf": [
            {
              "type": "string"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "description": "Only files modified before this date (ISO format)",
          "title": "Modified Before"
        },
        "skip_empty_documents": {
          "default": true,
          "description": "Skip documents with no text content",
          "title": "Skip Empty Documents",
          "type": "boolean"
        },
        "min_text_length": {
          "default": 50,
          "description": "Minimum text length in characters",
          "title": "Min Text Length",
          "type": "integer"
        }
      },
      "title": "FilteringConfig",
      "type": "object"
    },
    "FolderMappingConfig": {
      "additionalProperties": false,
      "description": "Folder hierarchy mapping configuration.",
      "properties": {
        "create_parent_docs": {
          "default": true,
          "description": "Create Document entities for folders",
          "title": "Create Parent Docs",
          "type": "boolean"
        },
        "parent_id_pattern": {
          "default": "{source_type}-{directory}",
          "description": "Pattern for parent document IDs",
          "title": "Parent Id Pattern",
          "type": "string"
        },
        "max_depth": {
          "default": 10,
          "description": "Maximum hierarchy depth",
          "maximum": 50,
          "minimum": 1,
          "title": "Max Depth",
          "type": "integer"
        },
        "root_parent": {
          "anyOf": [
            {
              "type": "string"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "description": "Optional root document URN",
          "title": "Root Parent"
        }
      },
      "title": "FolderMappingConfig",
      "type": "object"
    },
    "HierarchyConfig": {
      "additionalProperties": false,
      "description": "Hierarchy configuration.",
      "properties": {
        "enabled": {
          "default": true,
          "description": "Enable parent-child relationships",
          "title": "Enabled",
          "type": "boolean"
        },
        "parent_strategy": {
          "default": "folder",
          "description": "Parent document creation strategy. 'notion' extracts parent from Notion API metadata. 'confluence' extracts parent from Confluence page ancestors.",
          "enum": [
            "folder",
            "none",
            "custom",
            "notion",
            "confluence"
          ],
          "title": "Parent Strategy",
          "type": "string"
        },
        "folder_mapping": {
          "$ref": "#/$defs/FolderMappingConfig",
          "description": "Folder mapping configuration"
        },
        "custom_mapping": {
          "anyOf": [
            {
              "$ref": "#/$defs/CustomMappingConfig"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "description": "Custom mapping configuration"
        }
      },
      "title": "HierarchyConfig",
      "type": "object"
    },
    "IdNormalizationConfig": {
      "additionalProperties": false,
      "description": "Document ID normalization rules.",
      "properties": {
        "lowercase": {
          "default": true,
          "description": "Convert to lowercase",
          "title": "Lowercase",
          "type": "boolean"
        },
        "replace_spaces_with": {
          "default": "-",
          "description": "Replace spaces with this character",
          "title": "Replace Spaces With",
          "type": "string"
        },
        "remove_special_chars": {
          "default": true,
          "description": "Remove special characters except _ and -",
          "title": "Remove Special Chars",
          "type": "boolean"
        },
        "max_length": {
          "default": 200,
          "description": "Maximum ID length",
          "title": "Max Length",
          "type": "integer"
        }
      },
      "title": "IdNormalizationConfig",
      "type": "object"
    },
    "PageFilterConfig": {
      "additionalProperties": false,
      "description": "Configuration for filtering Confluence pages.",
      "properties": {
        "allow": {
          "anyOf": [
            {
              "items": {
                "type": "string"
              },
              "type": "array"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "description": "List of specific Confluence pages to include in ingestion. By default, all pages in discovered spaces are included. Specify page IDs or URLs to limit ingestion to specific pages and their children.\n\nExamples:\n  - Page IDs: ['123456', '789012']\n  - Page URLs: ['https://domain.atlassian.net/wiki/spaces/ENG/pages/123456/API-Docs']\n\nWhen specified, only these page trees will be ingested (if recursive=true). This allows focusing on specific documentation sections.",
          "title": "Allow"
        },
        "deny": {
          "anyOf": [
            {
              "items": {
                "type": "string"
              },
              "type": "array"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "description": "List of specific Confluence pages to exclude from ingestion. Applies after allow filtering.\n\nExamples:\n  - Exclude specific pages: ['123456', '789012']\n  - Page URLs: ['https://domain.atlassian.net/wiki/spaces/ENG/pages/999999/Draft']\n\nUseful for excluding specific pages within otherwise included spaces.",
          "title": "Deny"
        }
      },
      "title": "PageFilterConfig",
      "type": "object"
    },
    "ParallelismConfig": {
      "additionalProperties": false,
      "description": "Parallelism configuration.",
      "properties": {
        "num_processes": {
          "default": 2,
          "description": "Number of worker processes",
          "maximum": 32,
          "minimum": 1,
          "title": "Num Processes",
          "type": "integer"
        },
        "disable_parallelism": {
          "default": false,
          "description": "Disable all parallelism",
          "title": "Disable Parallelism",
          "type": "boolean"
        },
        "max_connections": {
          "default": 10,
          "description": "Max concurrent connections for async operations",
          "title": "Max Connections",
          "type": "integer"
        }
      },
      "title": "ParallelismConfig",
      "type": "object"
    },
    "PartitionConfig": {
      "additionalProperties": false,
      "description": "Unstructured partitioning configuration.",
      "properties": {
        "strategy": {
          "default": "auto",
          "description": "Partitioning strategy",
          "enum": [
            "auto",
            "hi_res",
            "fast",
            "ocr_only"
          ],
          "title": "Strategy",
          "type": "string"
        },
        "partition_by_api": {
          "default": false,
          "description": "Use Unstructured API for partitioning",
          "title": "Partition By Api",
          "type": "boolean"
        },
        "api_key": {
          "anyOf": [
            {
              "type": "string"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "description": "Unstructured API key",
          "title": "Api Key"
        },
        "split_pdf_page": {
          "default": false,
          "description": "Enable page-level splitting for large PDFs",
          "title": "Split Pdf Page",
          "type": "boolean"
        },
        "split_pdf_concurrency_level": {
          "default": 5,
          "description": "Number of parallel requests for PDF pages",
          "title": "Split Pdf Concurrency Level",
          "type": "integer"
        },
        "ocr_languages": {
          "default": [
            "eng"
          ],
          "description": "Languages for OCR",
          "items": {
            "type": "string"
          },
          "title": "Ocr Languages",
          "type": "array"
        },
        "additional_args": {
          "additionalProperties": true,
          "description": "Additional partition arguments",
          "title": "Additional Args",
          "type": "object"
        }
      },
      "title": "PartitionConfig",
      "type": "object"
    },
    "ProcessingConfig": {
      "additionalProperties": false,
      "description": "Processing configuration (partitioning only, no chunking).",
      "properties": {
        "partition": {
          "$ref": "#/$defs/PartitionConfig",
          "description": "Partition configuration"
        },
        "parallelism": {
          "$ref": "#/$defs/ParallelismConfig",
          "description": "Parallelism configuration"
        }
      },
      "title": "ProcessingConfig",
      "type": "object"
    },
    "RetryConfig": {
      "additionalProperties": false,
      "description": "Retry configuration.",
      "properties": {
        "enabled": {
          "default": true,
          "title": "Enabled",
          "type": "boolean"
        },
        "max_attempts": {
          "default": 3,
          "title": "Max Attempts",
          "type": "integer"
        },
        "backoff_factor": {
          "default": 2,
          "title": "Backoff Factor",
          "type": "integer"
        },
        "retry_on_timeout": {
          "default": true,
          "title": "Retry On Timeout",
          "type": "boolean"
        }
      },
      "title": "RetryConfig",
      "type": "object"
    },
    "SourceConfig": {
      "additionalProperties": false,
      "description": "Document source configuration.",
      "properties": {
        "type": {
          "default": "EXTERNAL",
          "description": "Source type (always EXTERNAL for ingested docs)",
          "enum": [
            "NATIVE",
            "EXTERNAL"
          ],
          "title": "Type",
          "type": "string"
        },
        "include_external_url": {
          "default": true,
          "description": "Include external URL in DocumentSource",
          "title": "Include External Url",
          "type": "boolean"
        },
        "include_external_id": {
          "default": true,
          "description": "Include external ID in DocumentSource",
          "title": "Include External Id",
          "type": "boolean"
        }
      },
      "title": "SourceConfig",
      "type": "object"
    },
    "SpaceFilterConfig": {
      "additionalProperties": false,
      "description": "Configuration for filtering Confluence spaces.",
      "properties": {
        "allow": {
          "anyOf": [
            {
              "items": {
                "type": "string"
              },
              "type": "array"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "description": "List of Confluence spaces to include in ingestion. By default, all accessible spaces are discovered. Specify space keys or URLs to limit ingestion to specific spaces.\n\nExamples:\n  - Space keys: ['ENGINEERING', 'PRODUCT', 'DESIGN']\n  - Space URLs: ['https://domain.atlassian.net/wiki/spaces/TEAM']\n  - Mixed: ['ENGINEERING', 'https://domain.atlassian.net/wiki/spaces/PRODUCT']\n\nIf specified, only these spaces will be ingested. Use deny to exclude specific spaces from discovery.",
          "title": "Allow"
        },
        "deny": {
          "anyOf": [
            {
              "items": {
                "type": "string"
              },
              "type": "array"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "description": "List of Confluence spaces to exclude from ingestion. Applies after allow filtering.\n\nExamples:\n  - Exclude personal spaces: ['~user1', '~user2']\n  - Exclude specific spaces: ['ARCHIVE', 'OLD_DOCS']\n  - Space URLs: ['https://domain.atlassian.net/wiki/spaces/TEST']\n\nUseful for excluding personal spaces or archived content.",
          "title": "Deny"
        }
      },
      "title": "SpaceFilterConfig",
      "type": "object"
    },
    "StatefulStaleMetadataRemovalConfig": {
      "additionalProperties": false,
      "description": "Base specialized config for Stateful Ingestion with stale metadata removal capability.",
      "properties": {
        "enabled": {
          "default": false,
          "description": "Whether or not to enable stateful ingest. Default: True if a pipeline_name is set and either a datahub-rest sink or `datahub_api` is specified, otherwise False",
          "title": "Enabled",
          "type": "boolean"
        },
        "remove_stale_metadata": {
          "default": true,
          "description": "Soft-deletes the entities present in the last successful run but missing in the current run with stateful_ingestion enabled.",
          "title": "Remove Stale Metadata",
          "type": "boolean"
        },
        "fail_safe_threshold": {
          "default": 75.0,
          "description": "Guards against accidental changes to the source configuration: if the relative change in entities compared to the previous state exceeds 'fail_safe_threshold' (a percentage), the soft deletes are skipped and the new state is not committed.",
          "maximum": 100.0,
          "minimum": 0.0,
          "title": "Fail Safe Threshold",
          "type": "number"
        }
      },
      "title": "StatefulStaleMetadataRemovalConfig",
      "type": "object"
    },
    "TitleExtractionConfig": {
      "additionalProperties": false,
      "description": "Title extraction configuration.",
      "properties": {
        "extract_from_content": {
          "default": true,
          "description": "Try to extract title from document content",
          "title": "Extract From Content",
          "type": "boolean"
        },
        "fallback_to_filename": {
          "default": true,
          "description": "Use filename as title if not found in content",
          "title": "Fallback To Filename",
          "type": "boolean"
        },
        "max_length": {
          "default": 500,
          "description": "Maximum title length",
          "title": "Max Length",
          "type": "integer"
        }
      },
      "title": "TitleExtractionConfig",
      "type": "object"
    }
  },
  "additionalProperties": false,
  "description": "Configuration for Confluence source connector.",
  "properties": {
    "stateful_ingestion": {
      "anyOf": [
        {
          "$ref": "#/$defs/StatefulStaleMetadataRemovalConfig"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Stateful Ingestion Config"
    },
    "url": {
      "description": "Base URL of your Confluence instance. Examples: 'https://your-domain.atlassian.net/wiki' (Cloud) or 'https://confluence.your-company.com' (Data Center)",
      "title": "Url",
      "type": "string"
    },
    "platform_instance": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Optional human-readable identifier for this Confluence instance (e.g., 'mycompany-prod', 'team-a-confluence'). If not provided, automatically generated by hashing the base URL, which guarantees global uniqueness across all Confluence installations (both Cloud and Data Center). Use explicit values for more readable URNs, but auto-generated hashes are perfectly fine and require no manual configuration.",
      "title": "Platform Instance"
    },
    "cloud": {
      "default": true,
      "description": "Whether this is a Confluence Cloud instance (True) or Data Center/Server (False).",
      "title": "Cloud",
      "type": "boolean"
    },
    "username": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Username for Confluence Cloud authentication (required for Cloud).",
      "title": "Username"
    },
    "api_token": {
      "anyOf": [
        {
          "format": "password",
          "type": "string",
          "writeOnly": true
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "API token for Confluence Cloud authentication. Generate at: https://id.atlassian.com/manage-profile/security/api-tokens",
      "title": "Api Token"
    },
    "personal_access_token": {
      "anyOf": [
        {
          "format": "password",
          "type": "string",
          "writeOnly": true
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Personal Access Token for Confluence Data Center authentication. Generate from: User Profile > Settings > Personal Access Tokens",
      "title": "Personal Access Token"
    },
    "spaces": {
      "$ref": "#/$defs/SpaceFilterConfig"
    },
    "pages": {
      "$ref": "#/$defs/PageFilterConfig"
    },
    "max_spaces": {
      "default": 100,
      "description": "Maximum number of spaces to ingest when auto-discovering (applies when spaces.allow is not set).",
      "title": "Max Spaces",
      "type": "integer"
    },
    "max_pages_per_space": {
      "default": 1000,
      "description": "Maximum number of pages to ingest per space.",
      "title": "Max Pages Per Space",
      "type": "integer"
    },
    "recursive": {
      "default": true,
      "description": "Whether to recursively fetch child pages (applies to page URLs only).",
      "title": "Recursive",
      "type": "boolean"
    },
    "processing": {
      "$ref": "#/$defs/ProcessingConfig",
      "description": "Document processing configuration (partitioning strategy, OCR, etc.)."
    },
    "document_mapping": {
      "$ref": "#/$defs/DocumentMappingConfig",
      "description": "Configuration for mapping Confluence pages to DataHub documents."
    },
    "hierarchy": {
      "$ref": "#/$defs/HierarchyConfig",
      "description": "Parent-child relationship configuration."
    },
    "filtering": {
      "$ref": "#/$defs/FilteringConfig",
      "description": "Filtering options for document content."
    },
    "chunking": {
      "$ref": "#/$defs/ChunkingConfig",
      "description": "Configuration for document chunking (required for embeddings)."
    },
    "embedding": {
      "$ref": "#/$defs/EmbeddingConfig",
      "description": "Configuration for generating vector embeddings for semantic search."
    },
    "advanced": {
      "$ref": "#/$defs/AdvancedConfig",
      "description": "Advanced ingestion options."
    }
  },
  "required": [
    "url"
  ],
  "title": "ConfluenceSourceConfig",
  "type": "object"
}
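
The `spaces.allow` and `spaces.deny` descriptions in the schema state that entries may be space keys or space URLs, and that deny is applied after allow. The Python sketch below illustrates that rule only. `to_space_key` and `select_spaces` are hypothetical helpers written for this example, not functions from the connector, and the exact-match comparison of space keys is an assumption.

```python
import re
from typing import Iterable, List, Optional

# Matches the space key in URLs such as
# https://domain.atlassian.net/wiki/spaces/TEAM
_SPACE_URL = re.compile(r"/spaces/([^/?#]+)")


def to_space_key(entry: str) -> str:
    """Return the space key for a bare key or a space URL (hypothetical helper)."""
    match = _SPACE_URL.search(entry)
    return match.group(1) if match else entry


def select_spaces(
    discovered: Iterable[str],
    allow: Optional[List[str]] = None,
    deny: Optional[List[str]] = None,
) -> List[str]:
    """Keep allowed spaces first, then drop denied ones.

    allow=None mirrors the schema default: every discovered space is kept.
    """
    allowed = {to_space_key(e) for e in allow} if allow else None
    denied = {to_space_key(e) for e in deny} if deny else set()
    return [
        key
        for key in discovered
        if (allowed is None or key in allowed) and key not in denied
    ]
```

With `allow` left unset, every discovered space passes through and only `deny` removes entries. A space listed in both `allow` and `deny` is dropped, because deny runs last.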

Code Coordinates

Class Name: datahub.ingestion.source.confluence.confluence_source.ConfluenceSource
Browse on GitHub
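
The source class is normally driven through a recipe. As a rough sketch, the same recipe can be assembled as a Python dict and run in-process with DataHub's `Pipeline` class. The source type name `confluence`, the pipeline name, and the helper names `build_recipe` and `run_recipe` are assumptions made for illustration. The keys under `source.config` are taken from the schema on this page.

```python
from typing import Any, Dict


def build_recipe(
    url: str, username: str, api_token: str, gms_server: str
) -> Dict[str, Any]:
    """Assemble a recipe dict for a Confluence Cloud instance."""
    return {
        # Stateful ingestion is only enabled when a pipeline_name is set.
        "pipeline_name": "confluence_docs",  # hypothetical name
        "source": {
            "type": "confluence",  # assumed registry name for this source
            "config": {
                "url": url,
                "cloud": True,
                "username": username,
                "api_token": api_token,
                "stateful_ingestion": {"enabled": True},
            },
        },
        "sink": {"type": "datahub-rest", "config": {"server": gms_server}},
    }


def run_recipe(recipe: Dict[str, Any]) -> None:
    """Run the recipe in-process; requires the acryl-datahub package."""
    # Imported lazily so build_recipe stays usable without DataHub installed.
    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(recipe)
    pipeline.run()
    pipeline.raise_from_status()
```

Calling `run_recipe(build_recipe(...))` would need a reachable DataHub server and valid Confluence credentials. For Data Center, the schema indicates setting `cloud` to false and supplying `personal_access_token` in place of `username` and `api_token`.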

Questions

If you've got any questions on configuring ingestion for Confluence, feel free to ping us on our Slack.
