v1.0.0

Release Availability Date

29-Apr-2026

  • CLI/SDK: v1.5.0.14
  • Remote Executor: v.1.0.0-cloud
  • On-Prem Versions:
    • Helm: v1.6.107
    • API Gateway: v0.7.0
    • Actions: v0.0.3

Known Issues

  • Async APIs - DataHub's asynchronous APIs perform only basic schema validation when receiving MCP requests, similar to direct production to MCP Kafka topics. While requests must conform to the MCP schema to be accepted, actual processing happens later in the pipeline. Any processing failures that occur after the initial acceptance are captured in the Failed MCP topic, but these failures are not immediately surfaced to the API caller since they happen asynchronously.
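
The behavior described above can be pictured with a toy model. The following Python sketch is not DataHub code: `accept`, `process_pending`, the required-field set, and the `failed` list (standing in for the Failed MCP topic) are all hypothetical names chosen for illustration. It only shows why a caller can see a successful acceptance even though processing fails later.

```python
from collections import deque

# Hypothetical stand-ins: a queue for accepted requests and a list
# playing the role of the Failed MCP topic.
REQUIRED_FIELDS = {"entityUrn", "aspectName", "aspect"}  # assumed minimal schema
pending = deque()
failed = []


def accept(request):
    """Basic schema check only; returns True if the request is queued."""
    if not REQUIRED_FIELDS.issubset(request):
        return False
    pending.append(request)
    return True


def process_pending(validate_aspect):
    """Runs later, outside the caller's request; failures go to `failed`."""
    while pending:
        request = pending.popleft()
        try:
            validate_aspect(request)
        except ValueError as exc:
            failed.append({"request": request, "error": str(exc)})


def reject_empty_aspect(request):
    # Hypothetical processing rule, used here only to trigger a late failure.
    if not request["aspect"]:
        raise ValueError("aspect payload is empty")


# Illustrative request; the URN is a placeholder, not a real entity.
ok = accept({"entityUrn": "urn:li:dataset:example", "aspectName": "status", "aspect": {}})
print(ok)            # what the caller observes at submission time
process_pending(reject_empty_aspect)
print(len(failed))   # what only the failed list records afterwards
```

In this model the caller's only signal is the return value of `accept`. Finding the later failure means inspecting the failed list, which mirrors the need to monitor the Failed MCP topic.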

Release Changelog

v1.0.0

New Feature Highlights

  • Data Observability Agent: AI-powered agent for seamless monitoring setup and management, available in Private Beta via the Data Health dashboard.

  • Observe Assertion Notification Threading: Threaded assertion notifications with feature flag and UI controls

  • Observe Assertion Failure Severity: Computed severity levels for assertion failures

  • Observe False Positive Feedback Loop: Mark smart assertion notifications as false positives right in Slack and Teams

  • Observe Assertion Ownership: Assign owners to assertions, with owner select in the builder drawer and alignment within assignment rule lists

  • Ask DataHub Memory: Persistent user context across conversations, with server-side conversation memory for the Slack bot

  • Contrastive Description Generation v4: Improved AI-generated descriptions using contrastive techniques

  • SMTP Email Notification System: Full email notifications with connection pooling, circuit breaker, and metrics

  • Change History Sidebar: View change history for Dataset, Domain, Glossary Term, and Data Product entities

  • Structured Property Proposals: Ability to propose structured property changes through the governance workflow

  • Full-text search for structured property values including URN-type values

  • Aerospike: New ingestion source

  • Apache Flink: New connector for metadata and lineage

  • Microsoft Fabric Data Factory: New connector

  • Omni BI Platform: New source (INCUBATING)

  • Pinecone Vector DB: New source

  • StarRocks: New source connector

  • All changes in https://github.com/datahub-project/datahub/releases/tag/v1.5.0

Helm

  • Helm value changes required to support this release:
    global.datahub.systemUpdate.consolidatedUpgrade: false → true
    global.datahub.monitoring.metricsMode: legacy → jmx_and_actuator
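
The two value changes above can be written into a values override file or passed as `--set` flags. The Python sketch below expands the dotted keys into the nested mapping that a values file uses and renders equivalent `--set` arguments. The helper names (`expand_dotted`, `to_set_flags`) are hypothetical and exist only for illustration. The keys and new values come from the list above, and the allowed metricsMode values come from the Platform notes of this release.

```python
import json

# Helm value changes listed for this release (dotted key -> new value).
HELM_CHANGES = {
    "global.datahub.systemUpdate.consolidatedUpgrade": True,
    "global.datahub.monitoring.metricsMode": "jmx_and_actuator",
}

# Modes named in this release's Platform notes.
METRICS_MODES = {"legacy", "jmx_and_actuator", "actuator_only"}


def expand_dotted(flat):
    """Turn {"a.b.c": v} into {"a": {"b": {"c": v}}} (hypothetical helper)."""
    nested = {}
    for dotted_key, value in flat.items():
        node = nested
        *parents, leaf = dotted_key.split(".")
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


def to_set_flags(flat):
    """Render `--set key=value` arguments; booleans are lowercased."""
    flags = []
    for key, value in flat.items():
        rendered = str(value).lower() if isinstance(value, bool) else str(value)
        flags.append(f"--set {key}={rendered}")
    return flags


values = expand_dotted(HELM_CHANGES)
# Guard against a typo in the mode name before applying anything.
assert values["global"]["datahub"]["monitoring"]["metricsMode"] in METRICS_MODES

print(json.dumps(values, indent=2))          # nested form of the overrides
print(" ".join(to_set_flags(HELM_CHANGES)))  # same overrides as --set flags
```

The nested output is printed as JSON, which YAML parsers also accept, so it can be pasted into an override file as-is or rewritten in block-style YAML.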

Breaking Changes

  • V1 UI has been fully removed. The V2 UI is now the only supported interface.
  • Spring Boot has been upgraded from 3.5.6 to 4.0.5 (Spring Framework 7.0, Spring Kafka 4.0). This may affect custom plugins or extensions.
  • The deprecated corpUserInfo.active field is no longer considered for session eligibility. Users relying on this field for login gating should migrate to the supported status mechanisms.

Deprecations

  • Dynamic ownership reassignment for proposals is now an opt-in feature. Proposals continue to work as expected: existing asset owners still receive proposals and can act on them. If your workflow depends on ownership reassignment automatically updating who sees proposals, enable the option in the Automations section.
  • (Operations / Helm) Per-workload monitoring configuration is deprecated in favor of cluster-wide settings (global.datahub.monitoring).
  • (Operations / Helm) Enable global.datahub.systemUpdate.consolidatedUpgrade so upgrades run through the consolidated system-update path and the chart no longer relies on separate one-off setup jobs (e.g. mysql/ES setup jobs) for that flow.

Product

  • Data Observability Agent: AI-powered agent for assertion management, wired into the Data Health page chat panel
  • Ask DataHub Memory: Persistent user context across conversations, with server-side conversation memory for the Slack bot
  • Contrastive Description Generation v4: Improved AI-generated descriptions using contrastive techniques
  • SMTP Email Notification System: Full email notifications with connection pooling, circuit breaker, and metrics
  • Change History Sidebar: View change history for Dataset, Domain, Glossary Term, and Data Product entities
  • Structured Property Proposals: Ability to propose structured property changes through the governance workflow
  • Assertion Failure Severity: Computed severity levels for assertion failures
  • Assertion Notification Threading: Threaded assertion notifications with feature flag and UI controls
  • False Positive Feedback Loop: Mark smart assertion notifications as false positives
  • Custom Assertion Grouping: Group related assertions for organized management
  • Assertion Notes: Add notes to assertions
  • Assertion Ownership: Assign owners to assertions, with owner select in the builder drawer and alignment within assignment rule lists
  • Subscription Auto-enrollment: Automatically enroll in subscriptions during assertion creation
  • Assertion Tag Filtering: Filter assertions by tags in the web UI
  • Data Health Incidents Assignee Filter: Filter incidents by assignee on the data health page
  • Lifecycle Stages: New concept for tracking entity lifecycle in DataHub
  • Incident External Links and Notes: Enrich incident records with external links and notes
  • Teams Manifest Bundle Download: Download Microsoft Teams app manifests
  • OAuth2 Authorization Support: Full OAuth2 authorization support added to GMS
  • Configurable OIDC Redirect: Configurable redirect URL for OIDC access_denied errors
  • Structured Login Denial Reasons: Detailed session eligibility feedback on login failures
  • Default Views for Service Accounts: Set default views for service accounts
  • View unsupported warnings for freshness and volume assertions
  • Default descriptions added for freshness and volume assertions
  • Stale assertion cleanup added to assertion assignment rules
  • Hard entity limit for search-based assertion rules to prevent runaway rules
  • Remote executor gating for assertions
  • MCP tools for Ask DataHub with readOnlyHint support and register_feedback for metadata gap tracking
  • Dynamic optional scopes for Slack integration (channel history, group history, IM history)
  • Slack Socket Mode support
  • Smart search keyword expansion for better recall
  • Acronym matching for glossary terms in search
  • TAG_BOOST signal for custom per-tag boosting in search
  • V2.5 cross-entity ranking with DisMax scoring, name-match signals, and diversity promotion
  • V2.5 latency optimization for query understanding, multi_match, and focused fields
  • Full-text search for structured property values including URN-type values
  • Structured property filter support in Ask DataHub search and lineage tools
  • Configurable OAuth client ID and secret in the advanced bot token config
  • Updated DataHub logo with new branding
  • UI Redesign: Full migration to Alchemy design system across entity sidebar, Policies, Groups, Roles & Permissions, Access Tokens, Manage Ownership, Home Page posts, Checkbox/Select components, owners/avatars/pills, and complete semantic color token migration
  • Frontend performance: Phosphor icons tree-shaken and migrated to v2; moment.js replaced with dayjs; Monaco Editor static assets reduced from 23 MB to 3 MB
  • Remote execution actions support with modal for viewing workers, logs, and Kafka metrics in Automations
  • Propagation V2 restructured to support batch processing
  • Applications server-side search and pagination
  • Multi-entity-type search support in SelectItemsPopover
  • Sorting conversations by last message time in chat
  • Chat tabs synced across sibling pages
  • Hide visible properties from summary tab menu
  • Improved "try it out" and disabled "run this assertion" for bucketed assertions
  • Ingestion UX: save and run without browser state; redirect to manage page instead of create source page; message reactions added to ingestion chats; schedule disabled by default

Platform

  • (Operations / Helm) Added global.datahub.monitoring.metricsMode with three modes: legacy (default), jmx_and_actuator, and actuator_only, so JMX vs Spring Boot Actuator scraping can be chosen cluster-wide. See the Micrometer transition plan documentation for more details.
  • Elasticsearch Zero-Downtime Upgrade: Server-side ZDU infrastructure with feature flags, allowDocCountMismatch, background aspect schema version migration sweep, and K8s scale-down with conditional evaluation
  • Cascade Operation Observability: Read-side fan-out observability for lineage search and filter rewriting, cascade context extended to all deleteReferencesTo phases
  • AspectMigrationMutator Framework: Infrastructure for schema version migrations with schemaVersion in systemMetadata
  • Batch-Processing Actions: Support for batch-processing actions with default async Kafka offset commits
  • RelationshipChange Platform Event: Emitted on relationship changes
  • Documentation Change Event Generator: For entity change events
  • Relationship Scroll Endpoints: Generic OpenAPI and GraphQL scroll endpoints for relationships
  • Custom TrustStore and mTLS: For GMS client connections
  • GCS Workload Identity Federation (WIF): Support for GCS connections
  • Sentry Instrumentation: Web metrics on cloud deployments
  • GraphQL Shape Logging: For query analysis and debugging
  • Proposal Reassignment Action: Reassign proposals via automation recipe
  • Aligned metadata_aspect_v2 indexes between Postgres and MySQL, added corpuser/corpGroup partial indexes, added schemaVersionIndex
  • Parallel reindexing with concurrency control and optimization
  • GMS startup time optimized
  • Reduced log volume across DataHub Java services (multi-phase)
  • Set ES_BULK_REFRESH_POLICY=NONE in Docker configs for improved performance
  • Executor: recognizes the new Kafka-based channel; supports running actions in workers on startup; caps bootstrap monitor fetch to available slots; limits each job to 1 thread; revamps the Databricks authentication pattern for observe; adds verbose logging capability
  • Helm pre-delete cleanup upgrade infrastructure
  • K8s upgrade: wait for full deployment rollout during scale-down
  • Spot termination configs for GMS and datahub-frontend
  • searchTier annotations added to glossary and document definitions
  • Patch support for * in path with arrayPrimaryKeys and documentation patch
  • OpenAPI dynamic config for /poll based on application.yaml
  • Actions: expose event source on PipelineContext
  • Configurable connection pool settings for REST emitter
  • Configurable report sample sizes for progress and final reports
  • Fail-fast GMS auth check on integrations service startup
  • MCP mutator to truncate long monitor error messages
  • Various dependency bumps and CVE fixes

Ingestion

  • Aerospike: New ingestion source
  • Apache Flink: New connector for metadata and lineage
  • Microsoft Fabric Data Factory: New connector
  • Omni BI Platform: New source (INCUBATING)
  • Pinecone Vector DB: New source
  • RDF Ingestion: Initial feature
  • StarRocks: New source connector
  • PowerBI: Replaced Lark M-Query parser with Microsoft powerquery-parser; support for Sql.Databases M-Query function, NativeQuery on BigQuery, IfExpression branch walking, external URL for Apps, browse paths V2 with proper hierarchy, column-level lineage enabled by default
  • Snowflake: Emulate tag inheritance in-memory to eliminate N+1 queries (major performance improvement)
  • BigQuery: Enrich external table metadata (source format, URIs, compression, max bad records)
  • dbt: Extract and emit stats from catalog.json; support glob patterns in run_results_paths for S3 and local paths; support dbt semantic models
  • Dataplex: Parallelized entry detail fetching and lineage lookups; support cross-project lineage with explicit region scan pairs; support Vertex AI datasets; additional entry groups and hierarchy mapping; Dataplex metadata sync integration
  • Glue: JDBC upstream lineage for Glue jobs; Iceberg lineage; updateTime as lastModified
  • Trino: Column-level lineage on upstreamLineage
  • Redshift: Extract table, view, and schema ownership from catalog; redshift-slim extra added
  • Teradata: Performance and scalability improvements
  • Kafka Connect: Support Debezium and Confluent JDBC sink connectors; ClickHouse Sink Connector; BigQuery topics2TableMap; JVM bundled via jdk4py to remove system Java dependency
  • Mode: Performance improvements with threading, SQL caching, and instrumentation
  • MSSQL: Skip inaccessible databases when database=null; convert_column_urns_to_lowercase config
  • Dagster: Emit StatusClass aspect for soft-deleted asset handling
  • Hex: Allow category filters
  • VertexAI: ExperimentRun→TrainingJob and Dataset→ModelVersion lineage with improved type annotations
  • JSON Schema: dataset_name_strategy for display name control
  • Sigma: Convert to using ownerID and email
  • Delta Lake: Azure connection support with domain assignment
  • Iceberg: Ingestion-time domain assignment
  • Data Factory: Support linkedService typeProperties for OneLake lineage resolution
  • Multi-entity domain and ownership transformers
  • Airflow: Fallback for MappedOperator import
  • Databricks Sync: OAuth and unified auth support
  • SQLAlchemy profiler achieves feature parity with GE profiler
  • sqlglot upgraded to v30.0.3
  • Ingestion migrated from setup.py to pyproject.toml (PEP 621) with uv.lock and constraints.txt for reproducible builds
  • Configurable report sample sizes and failure logging
  • Schema resolver bulk-fetch caching fix for performance
  • Parallel processing of resources
  • DataHub CLI improvements: datahub init --sso for browser-based SSO login; datahub lineage command for upstream/downstream traversal; datahub search with semantic search, projection, and agent context; datahub graphql agent-friendly improvements; datahub datapack experimental command; datahub delete undo-by-filter prompt; --context / -C flag for event properties; caller context in User-Agent header; GraphQL query projection system for schema comparison; version-based cache invalidation for datapack index files
  • Agent Context Kit: Cloud tools for Ask DataHub, Databricks integration, Cursor + Claude Code + Microsoft Copilot guides, Vertex AI builder, improved LangChain and ADK example agents, Google ADK agent-context setup, DataHub Skills Registry tutorial and connector development guide

Bug Fixes

  • Fixed patch stability by reverting template attribution changes; clients now use generic JSON patch
  • Fixed propagation_v2_action bugs and UI issues
  • Fixed smart assertion error surfacing when no operations reported on freshness assertions
  • Fixed smart assertion errors when online pipeline is disabled
  • Fixed on-demand smart assertion returning INIT/Training status incorrectly
  • Fixed Databricks freshness assertion validation failure
  • Fixed assertion monitor training period sync with backfill config
  • Fixed assertion notes not preserved when saving settings tab changes
  • Fixed assertion source created bug for missing sources
  • Fixed floor/ceiling value check in smart assertion predictions
  • Fixed deleted assertions still being executed
  • Fixed only the first owner being saved on assertion rule creation
  • Fixed handling of legacy filter format when editing assertion rules
  • Fixed inverted tooltip message on disabled Create button in Quality tab
  • Fixed silent null in AssertionStdParameter.value
  • Fixed column metric assertion operator validation to use metric result type
  • Fixed circular init in MonitorInfoPatchBuilder causing NPE
  • Fixed nightly batch job errors not surfacing in assertion assignment rule UI
  • Fixed semantic inversion in SystemMetadataUtils breaking multi-platform connections
  • Fixed GraphQL enum mapping for assertions
  • Fixed minor bucketing bugs in time-series assertions
  • Fixed dbt freshness assertion custom type
  • Fixed monitoring rule ownership graceful failure and GraphQL nullness
  • Fixed glossary term hover crash on Summary Panel (JSON parse error in HoverCard)
  • Fixed glossary crash on certain operations
  • Fixed copy name on glossary term returning UUID instead of term name
  • Fixed nested select component behavior
  • Fixed ownership type creation form retaining previous name
  • Fixed duplication of owners in applications
  • Fixed inconsistent data product updates on the asset profile page
  • Fixed occasional stale cache when adding/removing multiple data products
  • Fixed removing data products in the select
  • Fixed Impact Analysis sort crash
  • Fixed product tour button
  • Fixed toolbar focus bug and allowed file uploads in summary page widgets
  • Fixed loading state of entity sidebar header
  • Fixed percent-encoding preserved in URL params during sort option change
  • Fixed conversation ordering in chat
  • Fixed disappearing conversation titles in chat
  • Fixed chat send button not appearing after pasting text
  • Fixed chat agent name resolution (release blocking)
  • Fixed Slack cached thinking termination to prevent echo
  • Fixed double-adding of user messages to agent history
  • Fixed chat URN links formatting for assertion URLs
  • Fixed AI Memory tab hidden when AI features are disabled
  • Fixed Slack user_urn passed to agent so AI memories are loaded in bot integrations
  • Fixed column level lineage search failing for nested fields
  • Fixed orFilters usage for degree filter in getSearchAcrossLineageCounts
  • Fixed special character escaping in CONTAINS_STR predicate queries
  • Fixed user filters applied to V2.5 supplementary non-dataset queries
  • Fixed GraphQL minified adapted query to avoid 200K whitespace-token limit
  • Fixed missing applications field on glossary term and data product GraphQL queries
  • Fixed OIDC redirect cookie creation
  • Fixed OIDC clientSecret exposure via globalSettings GraphQL query
  • Fixed hasClientSecret and __typename leaking into OIDC mutation payload
  • Fixed login path exact matcher
  • Fixed SCIM config flag to include non-SCIM entities in API responses
  • Fixed MANAGE_GLOBAL_SETTINGS enforcement for globalSettings REST mutations
  • Fixed Spring Boot URL encoding to allow // and other encoded URN properties in URLs
  • Fixed OAuth authentication on /auth/oauth2/clients endpoint
  • Fixed custom theming for FIS
  • Fixed governance inbox scroll-based pagination (listActionRequests)
  • Fixed stale view URN when selected view no longer exists
  • Fixed context path overflow on entity header
  • Fixed glossary section added to policy view-only modal
  • Fixed selected glossaries shown in policy dropdown
  • Fixed selected alignment for bulk editing
  • Fixed data product icons position
  • Fixed stale asset summary properties display
  • Fixed document editor toolbar disappearing when clicked
  • Fixed form on Snowflake / BigQuery / Databricks automation syncs
  • Fixed issue with resetting remote executor in advanced settings
  • Fixed ingestion form: unable to switch back to form tab in edit source; discard changes shown without changes; navigating to source from executions; schedule disabled by default
  • Fixed SchemaAspects always resulting in MCLs even when no change
  • Fixed delete-refs cascade stall when deleting entities referenced by >5000 entities
  • Fixed race condition and broken timeout in GraphQueryPITDAO
  • Fixed slice futures not being drained before PIT/scroll cleanup
  • Fixed VersionPropertiesSideEffect mutating shared DataMap causing 422 on transaction retry
  • Fixed document count logic for zero-downtime upgrade
  • Fixed correct document filtering in global context when searching documents
  • Fixed MCP plugin setup issues
  • Fixed anomaly state updates for existing run events
  • Fixed actions async_commit_enabled to actually use async offset commits
  • Fixed OTel context propagation for async requests
  • Fixed custom SSLContext from ShimConfiguration used instead of JVM default in search client shims
  • Fixed search index swap using removeIndex action in incremental reindex
  • Fixed Kafka authExceptionRetryInterval for MSK IAM resilience
  • Fixed Kafka replication factor configurable per topic
  • Fixed executor on CLI guard in get_connection_from_entity
  • Fixed executor lazy-load inference_v2 ML stack to cut idle memory
  • Fixed executor ingestion source listing consolidation and inefficiencies
  • Fixed ingestion system source updates not triggering on DataHub upgrades
  • Fixed ingestion JSON schema inference streaming to prevent OOM on large files
  • Fixed MultipleAspectTransformer silently dropping non-matching MCPWs
  • Fixed duplicate [CLI] source creation during executor-managed ingestion
  • Fixed Snowflake conditional multi-table INSERT statement support
  • Fixed Snowflake governance DDL stripped before SQL parsing to preserve lineage
  • Fixed Snowflake COPY query type mapped to INSERT instead of UNKNOWN
  • Fixed Snowflake private_key excluded from config serialization
  • Fixed dbt: always emit sibling relationships to fix duplicate search results for semantic views
  • Fixed dbt: preserved manually added owners during PATCH mode ingestion
  • Fixed dbt: removed hard generate_docs requirement from cloud auto-discovery
  • Fixed PowerBI lowercase normalization applied to all upstream lineage URN code paths
  • Fixed PowerBI Power Query M parser optional whitespace rule (10x speedup)
  • Fixed PowerBI T-SQL control statement cleanup
  • Fixed PowerBI create_corp_user default reverted to True to prevent user soft-deletion during upgrade
  • Fixed Tableau embedded datasource IDs leaking into published datasource filter
  • Fixed Tableau project filters applied to embedded datasources when emit_all_embedded_datasources is enabled
  • Fixed Hive memory regression and slow view lineage in metastore source
  • Fixed Glue Jobs not modifying S3 URNs or overriding datasetProperties
  • Fixed Glue fine-grained lineage generation without a graph
  • Fixed Glue table UpdateTime treated as UTC
  • Fixed DB2 issues on z/OS and allowed using the connector on windows/amd64
  • Fixed LookML by adding glue to the 2-part platform name set
  • Fixed MSSQL convert_urns_to_lowercase in default recipes
  • Fixed Teradata convert_urns_to_lowercase applied in get_identifier()
  • Fixed Fivetran Databricks destination starter recipe
  • Fixed VertexAI stateful ingestion AttributeError and unthrottled pagination
  • Fixed Kafka Connect separate session for REST API to prevent credential leak
  • Fixed Kafka Connect JDBC sink datajob production when runtime topics API is empty
  • Fixed Omni stateful ingestion wiring for stale metadata removal
  • Fixed Fabric OneLake split workspace platform
  • Fixed Azure AD groups_pattern filter applied before URN construction
  • Fixed BigQuery explicit credentials passed to GCP clients for thread safety
  • Fixed BigQuery reverted __TABLES__ replacement for observe assertions
  • Fixed SQL statement parsing: split statement support for end/END and IF NOT EXISTS
  • Fixed sqlglot switched to pure Python to avoid memory leak in sqlglot[c]
  • Fixed Airflow fallback for MappedOperator import
  • Various security CVE fixes (Rhino, Logback, Commons Lang3, ion-java, Authlib, pyOpenSSL, litellm, ujson, Jetty, Netty, pyarrow, Spring, tornado, pillow, and more)