v0.3.16
This contains detailed release notes. Read the highlights in this blog post.
Release Availability Date
January 19, 2024
Recommended Versions
- CLI/SDK: 1.3.1.5
- Remote Executor: v0.3.16-acryl (recommended), v0.3.15-acryl, v0.3.14.1-acryl
- On-Prem Versions:
  - Helm: 1.5.173
  - API Gateway: 0.6.0
  - Actions: 0.0.3
Release Changelog
v0.3.16-acryl
New Features:
Ask DataHub (Public Beta) - Ask DataHub chat assistant is now in Public Beta! Available in DataHub, Slack, & Microsoft Teams. This means it's available for all customers to use. If you do NOT want Ask DataHub enabled for your instance, please reach out to your DataHub customer representative. Learn more about Ask DataHub here.
Context Documents (Public Beta) - Context Documents are now available in Public Beta! Create documents on DataHub - Runbooks, FAQs, Business Definitions, Decision Logs - that can be used to help Ask DataHub (and humans) answer a broader set of questions. Link documents to related data assets & docs, and safely publish documents when you're ready for Ask DataHub and the rest of your team to see them.
Editing Tools in Ask DataHub & MCP Server: Ask DataHub & DataHub MCP Server now include tools that let you make changes to your data assets. Easily update descriptions and add or remove tags, terms, domains, structured properties, and owners using natural language.
AI Features: Support for Gemini on GCP - For customers hosted on Google Cloud Platform, DataHub AI features including Ask DataHub are now available for use through Gemini. Reach out to your DataHub Customer Representative to enable.
Ask DataHub for Ingestion: Streamline your ingestion setup and troubleshooting with AI-powered assistance. Get step-by-step configuration guidance, instant error interpretation, performance tuning recommendations, and advanced configuration help—all through conversational chat embedded directly in the ingestion UI.
Ingestion Configuration Redesign: Refreshed UI/UX for a more intuitive ingestion setup experience.
Observe Python SDK: Introducing complete support for managing all native assertion types, including column value, smart SQL, and schema assertions.
On-demand assertions with parameters: Use template parameters within the SQL statements of assertions, and trigger them on demand with dynamic parameters substituted in at runtime!
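As a rough illustration of the idea only, runtime parameter substitution into an assertion's SQL can be sketched with Python's standard `string.Template`. This is not DataHub's templating syntax or SDK: the `${name}` placeholder style, the `render_assertion_sql` helper, and the table and parameter names are all assumptions made for this example.

```python
from string import Template


def render_assertion_sql(sql_template: str, params: dict[str, str]) -> str:
    """Substitute runtime parameters into a SQL template (hypothetical helper).

    Template.substitute raises KeyError when a placeholder has no matching
    parameter, so a missing value fails loudly instead of producing bad SQL.
    """
    return Template(sql_template).substitute(params)


# Hypothetical assertion SQL with two template parameters.
sql_template = (
    "SELECT COUNT(*) FROM orders "
    "WHERE region = '${region}' AND order_date = '${run_date}'"
)

# Parameters supplied at trigger time (illustrative values).
rendered = render_assertion_sql(
    sql_template, {"region": "EMEA", "run_date": "2024-01-19"}
)
print(rendered)
```

Plain string substitution like this is open to SQL injection when parameter values come from untrusted input; bound query parameters are the safer choice wherever the database driver supports them.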
Freshness Assertion Tuning: You can now tune your smart freshness assertions by labelling historical anomalies, adding exclusion windows, and adjusting sensitivity to improve future freshness anomaly detection.
Observe Query Attribution: Query tags have been added to warehouse queries issued by Observe, to aid in cost attribution. Snowflake query tags and BigQuery job labels are used for those respective platforms, and standard SQL comments are used for Redshift and Databricks.
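For platforms without a native tagging mechanism, the SQL-comment approach can be pictured with a minimal sketch like the one below. The comment layout, the `tag_query_with_comment` helper, and the attribution keys are assumptions for illustration; the format Observe actually emits is not specified in these notes.

```python
import json


def tag_query_with_comment(sql: str, attribution: dict[str, str]) -> str:
    """Prefix a SQL statement with a block comment carrying attribution metadata."""
    payload = json.dumps(attribution, sort_keys=True)
    # Make sure the metadata cannot close the comment early.
    payload = payload.replace("*/", "* /")
    return f"/* {payload} */\n{sql}"


# Hypothetical attribution fields; real tag keys may differ.
tagged = tag_query_with_comment(
    "SELECT MAX(updated_at) FROM analytics.orders",
    {"source": "observe", "assertion_type": "freshness"},
)
print(tagged)
```

Because the comment travels with the query text, it shows up in the warehouse's query history, where it can be filtered or grouped for cost reporting.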
Observe Data Consistency: Additional validation and checks were added for Assertion and Monitor entities.
Azure Data Factory Connector - New first-class connector for ingesting metadata from Azure Data Factory pipelines, enabling native tracking of Azure-based data orchestration workflows.
Snowflake Semantic Views Support - Added support for ingesting Snowflake Semantic Views, expanding metadata coverage for Snowflake environments and enabling better discovery of semantic layer assets.
Enhanced Oracle Connector - Oracle connector now supports materialized views, stored procedures, and usage statistics, providing comprehensive metadata extraction for Oracle databases.
Kafka Connect Intelligent Lineage - Kafka Connect connector can now automatically infer lineage from DataHub, significantly reducing manual lineage configuration for Kafka-based data pipelines.
Hive Metastore Upstream Lineage - Added lineage capabilities to the Hive Metastore connector, enabling tracking of data flow through Hive tables and improved data lineage visibility.
S3 File Sink Support - Restored ability to write ingestion output directly to S3, enabling S3-based metadata storage, backup workflows, and integration with S3-native data pipelines.
OAuth Support for Kafka Integrations - Added OAuth callback support for Kafka producers and sinks, providing enterprise-grade authentication for Kafka-based ingestion workflows.
PowerBI Amazon Athena Lineage - PowerBI connector now supports lineage extraction for Amazon Athena data sources, enabling cross-platform lineage tracking from PowerBI dashboards to AWS data.
Redshift Query Tagging - Redshift queries issued by DataHub ingestion now include query tags for improved cost attribution and query tracking.
Hive Thrift Connection with Kerberos - New Thrift connection mode for Hive Metastore with Kerberos authentication support, enabling secure connections in enterprise Hadoop environments.
Context Base Python APIs - New Python SDK APIs with comprehensive documentation for working with context documents, enabling programmatic management of DataHub context.
Tags to Structured Properties Transformer - New transformer to convert tags to structured properties, making it easier to migrate from tags to the more powerful structured properties model.
Ingestion Performance Optimization - Significant performance improvement for high-volume ingestion through compiled regex patterns in the filtering hot path, reducing CPU usage during pattern matching operations.
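The general technique behind this kind of optimization, compiling regexes once rather than passing pattern strings on every call, can be sketched as follows. The `CompiledPatternFilter` class and the example patterns are hypothetical and are not taken from the DataHub codebase.

```python
import re


class CompiledPatternFilter:
    """Allow/deny filter that compiles its regexes once, at construction."""

    def __init__(self, allow: list[str], deny: list[str]) -> None:
        # Compile up front so the per-name check below does no pattern
        # parsing and no pattern-cache lookups.
        self._allow = [re.compile(p, re.IGNORECASE) for p in allow]
        self._deny = [re.compile(p, re.IGNORECASE) for p in deny]

    def allowed(self, name: str) -> bool:
        # Deny rules take precedence over allow rules.
        if any(p.match(name) for p in self._deny):
            return False
        return any(p.match(name) for p in self._allow)


table_filter = CompiledPatternFilter(
    allow=[r"analytics\..*"],
    deny=[r".*\.tmp_.*"],
)

names = ["analytics.orders", "analytics.tmp_orders", "raw.events"]
kept = [name for name in names if table_filter.allowed(name)]
print(kept)
```

Python's `re` module caches recently compiled patterns internally, but each call to `re.match(pattern_string, ...)` still pays for a cache lookup, which adds up when the filter runs once per table or column.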
Improvements & Fixes:
Enhanced SQL Parsing - Multiple improvements to SQL parsing including: MSSQL statement splitting when expressions end with parentheses, support for 3+ part table names in Dremio and other sources, and consistent table-column lineage extraction in Snowflake.
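To make the "3+ part table names" case concrete, here is a minimal sketch of splitting a qualified name with an arbitrary number of parts. It assumes double-quoted identifiers and ignores escaped quotes; the `split_qualified_name` helper and the sample name are hypothetical and do not reflect DataHub's parser.

```python
def split_qualified_name(name: str) -> list[str]:
    """Split a dotted table name into parts, keeping dots inside double quotes."""
    parts: list[str] = []
    current: list[str] = []
    in_quotes = False
    for ch in name:
        if ch == '"':
            # Toggle quoted mode; the quote character itself is dropped.
            in_quotes = not in_quotes
        elif ch == "." and not in_quotes:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


# A four-part name, as can occur with nested folders (illustrative).
parts = split_qualified_name('s3_source."my.folder".sales.orders')
print(parts)
```

A parser that assumes at most `database.schema.table` would mis-assign or drop the extra leading parts of a name like this.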
Critical Compatibility Updates - Tactical fix for Confluent Kafka 2.13 breaking changes, Metabase 0.57+ compatibility with legacy-mbql parameter support, and Docker Compose 5.x compatibility for quickstart environments.
Security & Stability Fixes - Fixed SQLAlchemy password masking encoding issues, improved Fivetran REST API error handling and reporting, resolved multiple Grafana connector issues (ownership ingestion, uniqueness bugs, text panel failures), and fixed MSSQL stored procedure lineage extraction.
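These notes do not describe the password masking bug itself. As a general illustration of why encoding matters when masking, the sketch below hides the password in a connection URL whose password contains percent-encoded characters. It uses only the standard library rather than SQLAlchemy, and the `mask_password` helper and the example URL are assumptions.

```python
from urllib.parse import urlsplit, urlunsplit


def mask_password(url: str, mask: str = "***") -> str:
    """Return the URL with its password replaced by a mask (hypothetical helper)."""
    parts = urlsplit(url)
    if parts.password is None:
        return url
    # Work on the raw netloc so percent-encoded characters in the
    # username are preserved exactly as written.
    userinfo, _, hostinfo = parts.netloc.rpartition("@")
    username, _, _ = userinfo.partition(":")
    return urlunsplit(parts._replace(netloc=f"{username}:{mask}@{hostinfo}"))


# The password "p@ss:word" is percent-encoded as "p%40ss%3Aword".
masked = mask_password(
    "postgresql://svc_user:p%40ss%3Aword@db.example.com:5432/analytics"
)
print(masked)
```

Masking by searching for the decoded password in the URL string would miss the encoded form, which is one way secrets can end up in logs.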
Connector Reliability Improvements - Fixed Redshift lineage extraction ignoring disabled flags, resolved BigQuery temp table case normalization, corrected Unity Catalog schema pattern handling, added proper HTTP status checking for CSV enricher remote file fetches, and fixed Fivetran quoted identifier handling for database/schema names.
SDK Enhancements - Added support for parametrized assertion runs in Python SDK, fixed DataJob environment defaults when using flow_urn, and improved structured properties with version field support.
Fixes:
- Snowflake: Added support for private key authentication.
- Data health dashboard link sharing on the 'By Assertion' tab was broken. This is now patched.
- Slack alerts for legacy assertion failures now correctly route to the assertion.
- In some instances with corrupted legacy assertions, other non-corrupted assertions stopped showing on dataset pages due to failures of an MCP Upgrade Step. This is now resolved.
- Fixed a view authorization issue in the home page domains module, where the module failed to load if a user couldn't view one of the domains.
- Fixed an error page that occasionally appeared after clicking the executor name in the ingestion list to open the list of executors.
- Fixed some assertions not showing up in the dataset quality tab
- Externally reported assertion run events were not triggering notifications. This is now fixed.
- Pagination of inactive metadata-tests is fixed, ensuring processing of all metadata tests in batch mode.
- Various fixes for ElasticSearch setup in system-update flow
- Fix for impact analysis partial results
- CI stability fixes
Known Issues
- TODO