v1.1.0
Release Changelog
Includes all upstream OSS changes through DataHub Core v1.6.0 — see Updating DataHub for breaking changes.
Release Availability Date
01-June-2026
Recommended Versions
- CLI / SDK: v1.6.0
- Remote Executor: v1.1.0-cloud
- On-Prem Versions:
  - Helm: 1.6.129
  - API Gateway: v0.7.0
  - Actions: v0.0.3
New Feature Highlights
- Assertion Failure Severity — Assertion failures now include a severity classification that indicates how significant the failure is. Thresholds can be configured manually through Severity Assignment Rules, or determined automatically by DataHub using signals such as deviation from expected bounds, asset importance, and potential downstream impact. See the SDK tutorial.
- Slack Assertion Notification Redesign — Notifications now include a downstream impact summary, an assertion run result chart, and the failure severity, and use threading to reduce noise.
- Data Observability Agent (expanded) — New capabilities: start chats from the Quality tab, manage subscriptions from the agent, and bulk-create smart assertions.
- Column-Level Metadata via MCP — get_entities now returns aspects on schemaField URNs, and column-level structured properties and documentation are exposed to agents, closing a long-standing gap for column-aware agentic workflows.
- Ask DataHub — Follow-up suggestions in the UI; external MCP tools (AI Plugins) now usable in Slack.
- OAuth2 Dynamic Client Registration (RFC 7591) — Third-party apps and MCP clients can self-register via standard OAuth2 endpoints without manual setup.
- CDE Governance MCP — New MCP toolset exposing governance, fitness, lineage, and reliability capabilities to AI agents and IDE assistants.
- DataHub Bridge — New Kafka worker in Remote Executor unifying MCL flow across all workloads (removes AWS SQS / SNS dependency).
- Glossary Relationships — Glossary terms can now be linked in new ways through several new relationship types: synonym of, antonym of, and translates to. These relationships, along with the existing ones, are now supported with an updated UI in the Related Terms tab!
Breaking Changes
v1.1.0 inherits breaking changes from upstream OSS v1.6.0. Scan the categories below to see if any apply to your deployment — full details, exact configs, and migration actions are in Updating DataHub — v1.6.0.
Platform & runtime — likely action required for self-hosted, custom plugin, and Compose deployments
- Spring Boot 4 (GMS / Java services) — custom plugins, Spring extensions, and Kafka listener customizations need recompile and retest.
- V1 UI removed — V2 UI is now the only supported interface.
- Play 3 + DATAHUB_SECRET — secret must be ≥32 bytes or datahub-frontend fails on startup.
- Micrometer / JMX scrape paths — Actuator now on port 4319; JMX agent metrics path is now /metrics. Prometheus and Grafana configs need updating.
- Docker custom builds — BASE_IMAGE and apkRepositoryUrl build args required.
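To check a DATAHUB_SECRET value before upgrading, a minimal Python sketch like the following generates a random secret and verifies the ≥32-byte requirement. It assumes the limit applies to the UTF-8 encoded length of the value; `generate_frontend_secret` and `is_valid_frontend_secret` are hypothetical helpers, not part of DataHub.

```python
import secrets

MIN_SECRET_BYTES = 32  # datahub-frontend minimum stated in this release


def generate_frontend_secret(num_random_bytes: int = 32) -> str:
    """Return a random hex string; 32 random bytes become 64 hex characters."""
    return secrets.token_hex(num_random_bytes)


def is_valid_frontend_secret(value: str) -> bool:
    """Assumption: the startup check is on the UTF-8 encoded byte length."""
    return len(value.encode("utf-8")) >= MIN_SECRET_BYTES


if __name__ == "__main__":
    secret = generate_frontend_secret()
    print(is_valid_frontend_secret(secret))       # True
    print(is_valid_frontend_secret("too-short"))  # False
```

The generated value can then be stored wherever your deployment sources DATAHUB_SECRET (for example, a Kubernetes secret).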
Auth & API contracts — affects custom integrations
- corpUserInfo.active no longer considered for session eligibility.
- Search filter value → values — the singular value field was removed from filterCriterion; custom REST, GraphQL, and SDK clients must send a values array.
- Actions / Kafka default offset commits now async — up to 25× throughput, but ~10s of events may be redelivered on crash. Non-idempotent custom actions should restore sync commits.
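For custom clients that still build filter criteria with a singular value, a small helper can rewrite them before sending. This is a sketch over plain dicts: it assumes a criterion shaped like `{"field": ..., "value": ..., "condition": ...}`, and `migrate_criterion` is a hypothetical name, not a DataHub SDK function.

```python
from typing import Any, Dict, List


def migrate_criterion(criterion: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a filter criterion that uses `values` instead of `value`."""
    migrated = dict(criterion)  # leave the caller's dict untouched
    if "value" in migrated:
        legacy = migrated.pop("value")
        existing: List[Any] = list(migrated.get("values") or [])
        # Keep any values already present; append the legacy one if it is missing.
        if legacy not in existing:
            existing.append(legacy)
        migrated["values"] = existing
    return migrated


old = {"field": "platform", "value": "urn:li:dataPlatform:snowflake", "condition": "EQUAL"}
print(migrate_criterion(old))
# {'field': 'platform', 'condition': 'EQUAL', 'values': ['urn:li:dataPlatform:snowflake']}
```

Criteria that already use values pass through unchanged, so the helper is safe to apply to every filter a client sends.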
Ingestion — URN & lineage shifts — may break dashboards, saved searches, and lineage queries keyed on old URNs
- Athena — Glue-backed catalogs now emit Glue upstream URNs (previously S3); non-Glue Iceberg catalogs emit Iceberg URNs.
- Fivetran (hybrid multi-destination) — per-destination platform discovery; URNs shift away from the log warehouse's platform.
- Two-Tier SQL stored procedures (MySQL, MariaDB, Hive, ClickHouse, Teradata) — DataFlow / DataJob URN shape no longer duplicates the database name.
- Sigma — multiple changes for warehouse-backed charts: chart InputFields self-references dropped, column field paths now use warehouse-native names, workbook charts emit warehouse-table edges in ChartInfo.inputs, and DM element schema fixes may surface previously suppressed columns. Redshift users: set default_database / default_schema per connection in the new connection_to_platform_map.
- Deep-path SQL sources (Dremio and similar) — table references with 5+ path components no longer drop middle components, so corrected URNs replace previously truncated ones. Old URNs, and any tags / owners / glossary terms / lineage attached to them, are orphaned unless stateful ingestion with stale-entity removal is enabled.
- Vertex AI — model version set URNs now scoped per project.
Ingestion — config & behavior changes — likely action required for affected recipes
- SQL profiling default → sqlalchemy — recipes with method: ge need pip install 'acryl-datahub[profiling-ge]'. Applies to all SQL connectors, including Unity Catalog.
- Unity Catalog ownership / properties UPSERT by default — manually added owners or properties are overwritten on each run. Set incremental_ownership: true / incremental_properties: true to preserve them.
- BigQuery extract_policy_tags_from_catalog rewritten on INFORMATION_SCHEMA + batched Data Catalog API; old path removed with no fallback.
- dbt assertion result types — runtime errors now map to type = ERROR instead of FAILURE; new severity field (LOW for warn, HIGH for fail). Alerts filtering on type == FAILURE will no longer include infra issues.
- Structured properties writes — assignments whose property definition is missing are now silently stripped from the write (server-side WARN) instead of hard-failing the request. Pipelines that relied on hard failures to detect broken property URNs should set STRUCTURED_PROPERTIES_DROP_MISSING_PROPERTY_VALUES_WITH_WARNING=false to restore the old behavior.
- Dataplex filter_config field renames — recipes using old names will fail.
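Because dbt runtime errors now arrive as ERROR rather than FAILURE, an alert rule that only matches FAILURE will skip them. A hedged sketch of a routing function that treats both outcomes as alert-worthy and uses the new severity field; the result dict shape (`type` and `severity` keys) and the channel names are assumptions for illustration, not DataHub's event schema.

```python
from typing import Dict, Optional

ALERTABLE_TYPES = {"FAILURE", "ERROR"}  # ERROR now carries dbt runtime errors


def route_assertion_result(result: Dict[str, str]) -> Optional[str]:
    """Return a hypothetical alert channel, or None when no alert is needed."""
    result_type = result.get("type")
    if result_type not in ALERTABLE_TYPES:
        return None
    if result_type == "ERROR":
        # Runtime or infrastructure problem rather than a data-quality failure.
        return "platform-oncall"
    # Per this release, dbt `fail` maps to HIGH and `warn` maps to LOW.
    return "page" if result.get("severity") == "HIGH" else "ticket"
```

Splitting ERROR from FAILURE this way sends infrastructure problems to the team that owns the warehouse connection, while data-quality failures are routed by severity.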
Deprecations
- Helm per-workload monitoring → use cluster-wide global.datahub.monitoring
- Legacy system-update path → use consolidated system-update path
- Great Expectations profiler → SQLAlchemy is the new default
- Glossary Term AI automation
Security & Dependencies
- Spring Boot 4.0.6 (CVE-2026-22751 and related)
- langchain-core ≥1.2.28 (CVE-2026-40087)
- nbconvert 7.17.1 (CVE-2026-39378)
- python-dotenv ≥1.2.0 (CVE-2026-28684)
- requests ≥2.33.0 (CVE-2026-25645)
- langsmith ≥0.7.31 (GHSA-rr7j-v2q5-chgv)
- mako ≥1.3.11 (CVE-2026-41205)
- hadoop 3.4.1 (CVE-2024-23454)
- python-multipart 0.0.26 (CVE-2026-40347)
- pycurl, GitPython, Jupyter stack, mistune, observe-models (CVE-2026-1839) — minimum versions raised
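To confirm that a Python ingestion environment meets the raised minimums, a quick standard-library sketch can compare installed versions. It compares only the leading numeric release components, so pre-release suffixes are ignored (2.33.0rc1 counts as 2.33.0); for full PEP 440 semantics, `packaging.version.Version` is the safer choice. `MINIMUMS` copies a subset of the versions listed above.

```python
import re
from importlib import metadata
from typing import Dict, Tuple

# A subset of the raised minimums listed above.
MINIMUMS: Dict[str, str] = {
    "langchain-core": "1.2.28",
    "python-dotenv": "1.2.0",
    "requests": "2.33.0",
    "langsmith": "0.7.31",
    "mako": "1.3.11",
}


def release_tuple(version: str) -> Tuple[int, ...]:
    """Leading dotted integers with trailing zeros dropped: '2.33.0rc1' -> (2, 33)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unparseable version: {version!r}")
    parts = [int(p) for p in match.group(0).split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()  # so that '1.2' and '1.2.0' compare equal
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    return release_tuple(installed) >= release_tuple(minimum)


def outdated_packages() -> Dict[str, str]:
    """Map each installed-but-too-old package to its version; missing ones are skipped."""
    report: Dict[str, str] = {}
    for name, minimum in MINIMUMS.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            continue
        if not meets_minimum(installed, minimum):
            report[name] = installed
    return report


if __name__ == "__main__":
    print(outdated_packages())
```

An empty dict means none of the checked packages that are installed fall below the listed minimums.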
Product
- Ask DataHub — Follow-up suggestions in the UI; external MCP tools (AI Plugins) enabled in Slack; chat_id / token telemetry linkage; markdown escape sequences handled correctly in conversation titles.
- Slack Bot — Responds to bot mentions; dynamic optional scopes for channel, group, and IM history.
- Agents — Run agent tasks as a specific user; runAs restricted to service accounts.
- MCP Tools — Read-only tools are now correctly annotated as such, letting agents skip approval prompts for safe operations. Entity description truncation is now configurable (default raised to 5000) so operators can balance context completeness against token usage.
- Column-Level Metadata via MCP — get_entities now returns aspects when called on a schemaField URN, and column-level structured properties and documentation are exposed (behind env vars), giving agents richer column context without dataset-level lookups.
- Action Workflows — Workflows can now be created without the expiration date or additional notes fields.
- Semantic Search — Elasticsearch 8.18+ semantic search with Vertex AI and local Ollama embedding providers.
- Structured Properties — Stricter GMS validation; CSV enricher support; search enabled on single-select dropdown.
- Duplicate User Resolution — Finds users whose emails match except for casing and merges them into one primary user. The UI lets admins override this automated merge. Disabled by default behind a feature flag.
- Glossary Versioning — Glossary terms can now be versioned, currently only through Ask DataHub or the API. The entity versioning feature flag must be enabled.
- Glossary Lifecycle Stages — Glossary terms now support lifecycle stages such as "Draft" or "In Review", which are displayed in the UI. Stages can currently be set only through Ask DataHub or the API.
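The duplicate-user matching described above (emails that differ only in casing) can be illustrated with a small grouping sketch. The `User` record and the primary-selection rule (earliest-created account wins) are hypothetical; these notes do not say how DataHub chooses the primary user.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class User(NamedTuple):
    urn: str
    email: str
    created_at: int  # hypothetical creation time in epoch millis


def find_case_duplicates(users: List[User]) -> Dict[str, List[User]]:
    """Group users whose emails match case-insensitively; keep only real duplicates."""
    groups: Dict[str, List[User]] = defaultdict(list)
    for user in users:
        groups[user.email.casefold()].append(user)
    return {key: members for key, members in groups.items() if len(members) > 1}


def pick_primary(members: List[User]) -> User:
    # Hypothetical rule: the earliest-created account becomes the primary.
    return min(members, key=lambda u: u.created_at)
```

In practice the override UI exists precisely because no single automated rule for choosing the primary fits every organization.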
DataHub Observe
- Data Observability Agent — Start chatting directly from the Quality tab; subscription management tools added to the agent; bulk smart column assertions flow replaced with AI chat flow.
- Assertion Failure Severity — Assertion failures now include a severity classification that indicates how significant the failure is. Thresholds can be configured manually through Severity Assignment Rules, or determined automatically by DataHub using signals such as deviation from expected bounds, asset importance, and potential downstream impact. See the SDK tutorial.
- Slack Assertion Notification Redesign — Notifications now include a downstream impact summary, an assertion run result chart, and the failure severity, and use threading to reduce noise.
- Teams Notifications — Redundant pass notifications suppressed; user resolution requirement added.
- Assertion Errors in Alerts — Assertion errors now flow through to the alert UI so on-call engineers see the underlying failure reason without drilling into the assertion.
- Volume Assertions — Decimal threshold values supported; EQUAL_TO operator added.
- Smart Assertions — Minimum lookback enforced in the tuning modal and executor; INFERRED smart assertions with non-uniform crons rejected; day-aligned backfill validation prevents midnight UTC boundary flips.
- SQL Assertions — Set operations and subqueries now allowed in assertion SQL.
- Databricks Improvements — TABLE_STATISTICS metric source added. This runs ANALYZE TABLE ... COMPUTE STATISTICS followed by DESCRIBE TABLE EXTENDED, which reads the cached numRows from the catalog rather than scanning data. On Delta tables this is a metadata-only operation, making it significantly cheaper than a COUNT(*) query on large tables. Read more.
- BigQuery Improvements — New PLATFORM_API metric source added. Uses the tables.get API to retrieve the current row count for a table. This is the most cost-effective option because it does not run any SQL queries or scan any data; the row count is retrieved via a free API call. Read more.
Private Beta Preview
These capabilities are in active development and gated behind feature flags. Reach out to your DataHub representative to enable them.
Context Hub
A new workspace for proposing, reviewing, and publishing AI-generated context (documentation, glossary terms, annotations) on assets — with human-in-the-loop governance.
- Detects common SQL patterns from query logs and proposes them as context documents with LLM-generated descriptions, related questions, and summarized logic — improving agentic text-to-SQL accuracy
- Inbox surfaces proposals with smart routing to likely domain experts for validation prior to publishing for agentic use
- Document history tracks lifecycle and annotation changes as business context evolves
- New MCP tools for agentic SQL workflows: find_sql_context, draft_sql_for_tables, generate_sql_sketch, and note_metadata_observation
Context Evals
Keep agent-facing context accurate and consistent. Evals continuously check context quality in DataHub and flag divergent or contradictory documentation before it's published.
- Create, edit, and delete eval questions from a dedicated Evals page
- Run evals individually or in groups against a dedicated eval-runner agent
- Surface the documents each eval referenced to debug answer quality
- Attach evals to context doc proposals to validate documentation changes before publishing
Ingestion — New Sources
- Airbyte — Workspaces, connections, sources, destinations, and streams, with asset- and column-level lineage from source → destination.
- dltHub — dlt pipelines and destination tables, with asset-level lineage end-to-end and column-level lineage for direct-copy sql_database pipelines.
- Matillion DPC — Projects, pipelines (orchestration, transformation, streaming/CDC), and execution history, with asset- and column-level lineage.
- Informatica Cloud (IDMC) — Taskflows, Mapping Tasks, and source/target datasets with asset-level lineage.
Ingestion — Expanded Functionality
- Sigma
  - Data Model ingestion enabled by default
  - Lineage support for formula-resolved charts, workbook element-to-element edges, CustomSQL warehouses, and cross-DM fine-grained lineage
  - Map each Sigma connection to its underlying warehouse platform independently, so lineage stays accurate when Sigma sits on top of multiple warehouses
- BigQuery
  - Faster policy-tag extraction via INFORMATION_SCHEMA
  - Richer external table metadata
  - Graceful handling of native identifiers to support hyphenated project IDs
- Snowflake
  - All identifiers quoted to preserve mixed-case names
- Redshift
  - All identifiers quoted to preserve mixed-case schema names
  - Spurious connection failures resolved; error messaging improved
- Databricks Unity Catalog
  - Ingest Metric Views by enabling include_metric_views in the ingestion recipe
- Glue
  - JDBC upstream lineage and Iceberg lineage supported
  - Job entity subtype introduced
  - Automated mapping to apply structured properties on schema fields
- Power BI
  - Sql.Databases M-Query supported
  - PBI entities in DataHub automatically linked to their workspace external URLs
- Kafka Connect
  - Column-level lineage for sink connectors
- Fivetran
  - Per-destination platform discovery in hybrid API + log mode
- Athena
  - Improved automated lineage detection to upstream Glue- and Iceberg-backed tables
- Fabric OneLake
  - Column-level lineage supported for View entities
  - Query usage extracted from queryinsights
- dbt
  - Configurable URN lowercasing
  - Stats extracted from catalog.json
  - Assertion severity and improved ERROR vs FAILURE mapping
- Postgres
  - Stored-procedure SQL bodies and lineage improvements
- SQL Profiling (all SQL connectors)
  - SQLAlchemy profiler is the default (faster, no Great Expectations dependency)
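A Unity Catalog recipe that enables the options mentioned in these notes might look like the following, written as a Python dict. include_metric_views, incremental_ownership, and incremental_properties come from this release; the unity-catalog source type and the workspace_url / token keys follow the connector's recipe shape, while the URL and the DATABRICKS_TOKEN environment variable name are placeholders. Treat it as a sketch and check the connector docs for your version.

```python
import os
from typing import Any, Dict


def build_unity_catalog_recipe() -> Dict[str, Any]:
    """Assemble an ingestion recipe as a dict (sketch; values are placeholders)."""
    return {
        "source": {
            "type": "unity-catalog",
            "config": {
                "workspace_url": "https://example.cloud.databricks.com",  # placeholder
                "token": os.environ.get("DATABRICKS_TOKEN", ""),  # hypothetical env var name
                "include_metric_views": True,  # ingest Metric Views (new in this release)
                # Preserve manually added owners / properties instead of
                # overwriting them on each run (default is now UPSERT).
                "incremental_ownership": True,
                "incremental_properties": True,
            },
        },
    }
```

With the DataHub Python package installed, a dict like this can be passed to `Pipeline.create(...)` from `datahub.ingestion.run.pipeline`, or the same keys can be written into a YAML recipe for `datahub ingest -c`.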
DataHub Remote Executor
- AWS Secrets Manager & GCP Secret Manager — Native support for secret resolution.
- DataHub Bridge — New Kafka-based worker replacing SQS for moving metadata events from Remote Executor back to DataHub Cloud. Streamlines infrastructure dependencies (no AWS SQS / SNS required) and unifies MCL conversion across all workloads.
- OAuth-Secured Kafka — The executor now supports OAuth authentication when producing to Kafka, enabling integration with managed services (e.g., Confluent Cloud OAuth) and self-managed clusters using OIDC.
- Faster, More Reliable Health Checks — Health endpoint reduced to a sub-10ms liveness check. No longer fails during transient GMS outages, eliminating spurious pod restarts.
- Proactive Background Task Monitoring — Fetcher staleness, config resolver retries, scheduler jobs, and sweeper activity now emit Prometheus counters. Alerts fire within minutes of an issue, instead of waiting for retry exhaustion (up to 24h).
Operator note: Use WORKER_CONFIG_FETCHER_RETRIES for alerting on sustained retries. The legacy WORKER_CONFIG_FETCHER_ERRORS gauge is preserved for backwards compatibility but no longer increments on every retry, so existing alerts on it will silently miss issues.
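Alerting on sustained retries means firing when the WORKER_CONFIG_FETCHER_RETRIES counter has risen by some amount over a trailing window. The evaluation logic can be sketched in Python over scraped samples; the `(timestamp, value)` sample format, window, and threshold are assumptions, the reset handling mirrors how Prometheus treats counter drops, and the metric's exact exported name may differ in your setup.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (unix timestamp in seconds, counter value)


def counter_increase(samples: List[Sample]) -> float:
    """Sum positive deltas, treating a decrease as a counter reset (e.g. pod restart)."""
    total = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        total += curr - prev if curr >= prev else curr
    return total


def should_alert(samples: List[Sample], window_s: float, threshold: float) -> bool:
    """Fire when retries within the trailing window reach the threshold."""
    if not samples:
        return False
    window_start = samples[-1][0] - window_s
    recent = [s for s in samples if s[0] >= window_start]
    return counter_increase(recent) >= threshold
```

In Prometheus itself, the equivalent is typically an `increase(...[15m])` expression on the retries counter compared against a threshold.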
Platform
- Zero-Downtime Upgrade (ZDU) — Optional Elasticsearch side-upgrade path via Helm; overwrites during background sweep prevented; ZDU flags enabled in quickstart.
- Java 25 LTS runtime in official Docker images; Java 21 build toolchain.
- Spring Boot 4.0.6 — Bumped to address CVE-2026-22751 and related advisories.
- Frontend on Play 3 + Apache Pekko — Improved security and maintainability; DATAHUB_SECRET must be ≥32 bytes (see Breaking Changes).
- Frontend Security Hardening — Content-Security-Policy, sanitized API error responses, URL validation before rendering links, home page template / module scope checks.
- OAuth2 — RFC 7591 Dynamic Client Registration; single-use auth codes; public-client refresh and revoke; MCP token validation; authorize requests with no scope handled correctly.
- Micrometer / Prometheus — Actuator on port 4319 by default; JMX agent 1.0.1 with /metrics scrape path (see Breaking Changes).
- Bootstrap — Startup performance optimized.
- Identity — Siblings-aware CorpUser resolution.
- Transformers — Multi-entity domain and ownership transformers.
- Spring Boot URL Encoding — // and other encoded URN properties now allowed in URLs.
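Under RFC 7591, a client registers itself by POSTing a JSON document of client metadata to the authorization server's registration endpoint. A minimal standard-library sketch of building that request follows; the endpoint URL and client values are placeholders, and since these notes do not give DataHub's registration endpoint path or accepted metadata, check the DataHub OAuth2 docs or the server's discovery metadata (RFC 8414 `registration_endpoint`) before relying on it.

```python
import json
import urllib.request
from typing import List


def build_registration_request(registration_endpoint: str, client_name: str,
                               redirect_uris: List[str]) -> urllib.request.Request:
    """Build (but do not send) an RFC 7591 client registration request."""
    client_metadata = {
        "client_name": client_name,
        "redirect_uris": redirect_uris,
        "grant_types": ["authorization_code", "refresh_token"],
        "response_types": ["code"],
        # "none" registers a public client (e.g. a CLI or desktop MCP client).
        "token_endpoint_auth_method": "none",
    }
    return urllib.request.Request(
        registration_endpoint,
        data=json.dumps(client_metadata).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_registration_request(
    "https://datahub.example.com/oauth/register",  # placeholder endpoint
    "my-mcp-client",
    ["http://127.0.0.1:8765/callback"],
)
```

Sending it with `urllib.request.urlopen(req)` would return a JSON response that, per RFC 7591, includes the issued `client_id`.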
Miscellaneous Improvements
- Ask DataHub — Duplicate messages in agent history, echo responses in Slack threads, and broken link rendering on chat responses all fixed.
- Ask DataHub Performance — Intermittent 60-second stalls on response reranking eliminated; long-running chat session stability improved.
- Data Health Search — Queries in the Assertions and Incidents tabs now rank by relevance instead of recency, so exact matches surface first and typing more text narrows results.
- Documents — Tags and linked assets now render in the document sidebar; documents can now be added as members of a Data Product.
- Dashboard Usage Statistics — Server errors on multi-day usage queries with partial user-count data resolved.
- Metadata Tests — setDomain action failures resolved.