Data Quality & Observability

Feature Availability
Self-Hosted DataHub
DataHub Cloud

DataHub treats observability as a first-class capability of the metadata platform. Quality signals live alongside the data assets they describe, the lineage they propagate through, and the people accountable for them — so problems are surfaced in context and routed to the right owner.

The capability area is organized around three jobs:

  • Detection — find quality issues before consumers do.
  • Resolution — react fast, communicate clearly, and coordinate a fix.
  • Governance & Improvement — codify expectations and raise the bar over time.

Detection

Catch issues as close to the source as possible.

  • Data Observability Agent — an AI assistant that scans your data landscape and provisions the right assertions for the right tables in minutes, not weeks. Tell it which slice of your data matters most (or let it figure that out from usage and ownership signals), and it creates Freshness, Volume, Field, and other checks automatically — closing coverage gaps without manual setup per table. DataHub Cloud only. Private Beta.
  • Assertions — the core data-quality test primitive in DataHub. Assertions can be active (DataHub Cloud issues queries against your warehouse on a schedule) or ingestion-driven (DataHub Cloud evaluates the assertion against profiles and operations already reported during ingestion, on any platform). Active, ingestion-driven, and anomaly-detection assertions are all DataHub Cloud features. DataHub Core can ingest and display assertion results that you self-report — from dbt, Great Expectations, Snowflake DMFs, or any custom source pushed via the SDK.
    • Freshness — has the table updated recently?
    • Volume — is row count in the expected range?
    • Column — column-level metrics and value constraints (e.g. status must be one of active, pending, or closed, or the null rate must stay below 5%).
    • Custom SQL — arbitrary SQL returning a numeric value.
    • Schema — expected columns and types are present.
    • Anomaly Detection — DataHub Cloud auto-learns normal behavior for freshness, volume, and column metrics. DataHub Cloud only.
  • Data Health Dashboard — a single pane for the health of your data landscape, including Monitoring Rules that automatically apply anomaly-detection monitors and schema assertions across matching datasets as your landscape evolves. DataHub Cloud only.
  • SQL Profiling — dataset and column profiles produced by ingestion. Profiles power ingestion-driven Volume and Column assertions and feed the asset profile pages users already browse.
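
To make the assertion types above concrete, here is a minimal, self-contained sketch of what each check evaluates. It does not use the DataHub SDK: the TableSnapshot class, the evaluate function, and every threshold are hypothetical names and values chosen for illustration only.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class TableSnapshot:
    """Hypothetical summary of one table, similar in spirit to an ingested profile."""
    last_updated: datetime
    row_count: int
    status_values: set        # distinct values seen in the `status` column
    status_null_rate: float   # fraction of NULLs in `status`, 0.0 to 1.0
    columns: dict             # column name -> type name


def evaluate(snap: TableSnapshot, now: datetime) -> dict:
    """Return pass/fail per assertion type. All thresholds are illustrative."""
    expected_schema = {"id": "NUMBER", "status": "VARCHAR"}
    return {
        # Freshness: has the table updated recently?
        "freshness": now - snap.last_updated <= timedelta(hours=6),
        # Volume: is the row count in the expected range?
        "volume": 1_000 <= snap.row_count <= 5_000,
        # Column: value constraint and null-rate metric.
        "column_values": snap.status_values <= {"active", "pending", "closed"},
        "column_null_rate": snap.status_null_rate < 0.05,
        # Schema: expected columns are present with the expected types.
        "schema": all(
            snap.columns.get(name) == type_
            for name, type_ in expected_schema.items()
        ),
    }


now = datetime(2024, 1, 1, 12, 0)
snapshot = TableSnapshot(
    last_updated=now - timedelta(hours=2),
    row_count=1_200,
    status_values={"active", "pending", "archived"},
    status_null_rate=0.02,
    columns={"id": "NUMBER", "status": "VARCHAR", "created_at": "TIMESTAMP"},
)
results = evaluate(snapshot, now)
failed = sorted(name for name, passed in results.items() if not passed)
print(failed)
```

In this sample only the column value constraint fails, because "archived" is not in the allowed set. A real assertion would be defined and evaluated by DataHub or by an external tool rather than by hand-written functions like these.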

Resolution

Once an issue is detected, route and fix it.

  • Incidents — formal tracking and triage for data issues. Incidents are tied to the affected assets and are visible to consumers exploring lineage. Available in DataHub Core; DataHub Cloud adds Slack and Teams notifications.
  • Subscriptions & Notifications — let users and teams subscribe to assets, assertions, incidents, and changes, with delivery via email, Slack, or Microsoft Teams. DataHub Cloud only.
  • Data Observability Agent (root-cause assistance) — when an assertion fails or an incident opens, the agent uses DataHub's lineage, ownership, and metadata, together with MCP-style connectors into Snowflake, dbt, and other source systems, to investigate likely causes (recent schema or query changes, upstream failures, pipeline regressions) and propose next steps. DataHub Cloud only. Private Beta.
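
The idea that an incident on one asset matters to everything downstream of it can be sketched as a breadth-first walk over lineage. This is an illustration only and not how DataHub implements it: the lineage and owners dictionaries, the asset names, and the downstream_assets helper are all hypothetical.

```python
from collections import deque


def downstream_assets(lineage: dict, start: str) -> list:
    """Breadth-first traversal returning every asset downstream of `start`."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in lineage.get(node, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


# Hypothetical lineage: upstream asset -> direct downstream assets.
lineage = {
    "raw.orders": ["staging.orders"],
    "staging.orders": ["marts.revenue", "marts.order_counts"],
    "marts.revenue": ["dashboard.finance"],
}
# Hypothetical ownership: asset -> accountable team.
owners = {
    "staging.orders": "data-eng",
    "marts.revenue": "analytics",
    "dashboard.finance": "finance-bi",
}

impacted = downstream_assets(lineage, "raw.orders")
to_notify = sorted({owners[asset] for asset in impacted if asset in owners})
print(impacted)
print(to_notify)
```

Assets without a recorded owner (here marts.order_counts) are still listed as impacted but produce no notification target, which is one reason ownership coverage matters for routing.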

Governance & Improvement

Encode quality expectations as durable artifacts and track improvement over time.

  • Data Contracts — the verifiable agreement between producers and consumers about what a dataset's freshness, volume, schema, and column-level quality should look like. Assertions are the checks that prove the contract is met. Data contracts are available in DataHub Core.
  • Failure trends in the Data Health Dashboard — filter the By Assertion view by time range and result status to surface which checks are failing repeatedly, which are flaky, and which tables consistently lack coverage. Use these patterns to prioritize the structural fixes — better tests, contract changes, ingestion improvements — that move quality forward over months, not just minutes. DataHub Cloud only.
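
The failure patterns described above can be approximated from a per-assertion run history. The sketch below is a hedged illustration, not the dashboard's logic: the classify function, the "SUCCESS"/"FAILURE" strings, the seven-run window, and the streak and flip thresholds are all assumptions made for the example.

```python
def classify(history: list, window: int = 7, streak_threshold: int = 3) -> str:
    """Label an assertion from its most recent results (oldest first)."""
    recent = history[-window:]
    if not recent:
        return "no coverage"
    if "FAILURE" not in recent:
        return "healthy"

    # Length of the run of failures at the end of the window.
    trailing_failures = 0
    for result in reversed(recent):
        if result != "FAILURE":
            break
        trailing_failures += 1
    if trailing_failures >= streak_threshold:
        return "failing repeatedly"

    # Number of times the result changed between consecutive runs.
    flips = sum(1 for a, b in zip(recent, recent[1:]) if a != b)
    if flips >= 3:
        return "flaky"
    return "occasional failure"


# Hypothetical run histories keyed by assertion name.
histories = {
    "orders_freshness": ["SUCCESS"] * 7,
    "orders_volume": ["SUCCESS"] * 4 + ["FAILURE"] * 3,
    "users_null_rate": ["SUCCESS", "FAILURE"] * 3 + ["SUCCESS"],
    "payments_schema": [],
}
labels = {name: classify(runs) for name, runs in histories.items()}
print(labels)
```

Separating a sustained failure streak from frequent pass/fail flips is a design choice: the first usually points at a real regression to fix, while the second more often points at a test or threshold that needs tuning.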

Bring your own quality signals — integrations

DataHub also captures assertion results from external quality tools you may already run, so the asset view stays unified.

  • dbt — dbt tests are ingested as assertions linked to the corresponding tables; failures show up alongside DataHub-native assertions on the asset page.
  • Great Expectations — GX expectation results are pushed into DataHub as assertions via the action handler.
  • Snowflake Data Metric Functions — author assertions in YAML and compile them to native Snowflake DMFs that run inside your warehouse; results stream back into DataHub as assertion results. Externally managed DMFs you've already created in Snowflake can also be ingested as assertions via the Snowflake source's include_externally_managed_dmfs option.
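
As a rough sketch of where the include_externally_managed_dmfs option mentioned above would sit, the following builds a Snowflake ingestion recipe as a Python dictionary. The connection values are placeholders, the sink address is an assumed local default, and other options may be required in practice, so treat this as a shape to check against the Snowflake source reference rather than a complete recipe.

```python
import json

# Placeholder values throughout; only the option name comes from the text above.
recipe = {
    "source": {
        "type": "snowflake",
        "config": {
            "account_id": "<your-account-id>",
            "username": "<user>",
            "password": "<password>",
            # Ingest DMFs that were created directly in Snowflake as assertions.
            "include_externally_managed_dmfs": True,
        },
    },
    "sink": {
        "type": "datahub-rest",
        "config": {"server": "http://localhost:8080"},  # assumed local endpoint
    },
}

print(json.dumps(recipe, indent=2))
```

Recipes are more commonly written as YAML files; the dictionary form is used here only so the example stays self-contained.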

Programmatic access — APIs & SDKs

Every observability capability is scriptable. Common entry points: