
Assertion

The assertion entity represents a data quality rule that can be applied to one or more datasets. Assertions are the foundation of DataHub's data quality framework, enabling organizations to define, monitor, and enforce expectations about their data. They encompass various types of checks including field-level validation, volume monitoring, freshness tracking, schema validation, and custom SQL-based rules.

Assertions can originate from multiple sources: they can be defined natively within DataHub, ingested from external data quality tools (such as Great Expectations, dbt tests, or Snowflake Data Quality), or inferred by ML-based systems. Each assertion tracks its evaluation history over time, maintaining a complete audit trail of passes, failures, and errors.

Identity

An Assertion is uniquely identified by an assertionId, which is a globally unique identifier that remains constant across runs of the assertion. The URN format is:

urn:li:assertion:<assertionId>

The assertionId is typically a generated GUID that uniquely identifies the assertion definition. For example:

urn:li:assertion:432475190cc846f2894b5b3aa4d55af2

Generating Stable Assertion IDs

The logic for generating stable assertion IDs differs based on the source of the assertion:

  • Native Assertions: Created through DataHub Cloud's UI or API; the platform generates a UUID
  • External Assertions: Each integration tool generates IDs based on its own conventions:
    • Great Expectations: Combines expectation suite name, expectation type, and parameters
    • dbt Tests: Uses the test's unique_id from the manifest
    • Snowflake Data Quality: Uses the native DMF rule ID
  • Inferred Assertions: ML-based systems generate IDs based on the inference model and target

The key requirement is that the same assertion definition should always produce the same assertionId, enabling DataHub to track the assertion's history over time even as it's re-evaluated.
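
For example, the Python SDK's datahub_guid and make_assertion_urn helpers (used throughout the examples below) can derive a deterministic URN from an assertion's defining properties. The key structure in this sketch is an illustrative choice, not a required format; any stable, canonical dictionary works, as long as the same definition always produces the same GUID.

import datahub.emitter.mce_builder as builder

dataset_urn = builder.make_dataset_urn(platform="snowflake", name="mydb.myschema.users")

# Derive a stable assertion ID from the assertion's defining properties.
assertion_id = builder.datahub_guid(
    {"entity": dataset_urn, "field": "user_id", "type": "uniqueness"}
)
assertion_urn = builder.make_assertion_urn(assertion_id)

# Re-running this code yields the same URN, so evaluation history accumulates
# under a single assertion entity.
print(assertion_urn)  # urn:li:assertion:<deterministic-guid>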

Important Capabilities

Assertion Types

DataHub supports several types of assertions, each designed to validate different aspects of data quality:

1. Field Assertions (FIELD)

Field assertions validate individual columns or fields within a dataset. They come in two sub-types:

Field Values Assertions: Validate that each value in a column meets certain criteria. For example:

  • Values must be within a specific range
  • Values must match a regex pattern
  • Values must be one of a set of allowed values
  • Values must not be null
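
Below is a hedged sketch of a field values assertion (every email must be non-null) using the same MCP pattern as the examples that follow. The FieldValuesAssertionClass, FieldValuesFailThresholdClass, and FieldValuesFailThresholdTypeClass names and their fields (field, operator, failThreshold, excludeNulls) are assumptions to verify against your installed SDK version.

import os

import datahub.emitter.mce_builder as builder
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import (
    AssertionInfoClass,
    AssertionStdOperatorClass,
    AssertionTypeClass,
    FieldAssertionInfoClass,
    FieldAssertionTypeClass,
    FieldValuesAssertionClass,
    FieldValuesFailThresholdClass,
    FieldValuesFailThresholdTypeClass,
    SchemaFieldSpecClass,
)

emitter = DatahubRestEmitter(
    gms_server=os.getenv("DATAHUB_GMS_URL", "http://localhost:8080"),
    token=os.getenv("DATAHUB_GMS_TOKEN"),
)

dataset_urn = builder.make_dataset_urn(platform="snowflake", name="mydb.myschema.users")

# Assert that every value in the "email" column is non-null.
# Class and field names here are assumptions; check them against your SDK version.
field_assertion_info = FieldAssertionInfoClass(
    type=FieldAssertionTypeClass.FIELD_VALUES,
    entity=dataset_urn,
    fieldValuesAssertion=FieldValuesAssertionClass(
        field=SchemaFieldSpecClass(path="email", type="VARCHAR", nativeType="VARCHAR"),
        operator=AssertionStdOperatorClass.NOT_NULL,
        failThreshold=FieldValuesFailThresholdClass(
            type=FieldValuesFailThresholdTypeClass.COUNT,
            value=0,  # zero violating rows tolerated
        ),
        excludeNulls=False,
    ),
)

assertion_info = AssertionInfoClass(
    type=AssertionTypeClass.FIELD,
    fieldAssertion=field_assertion_info,
    description="Email must never be null",
)

assertion_urn = builder.make_assertion_urn(
    builder.datahub_guid({"entity": dataset_urn, "field": "email", "type": "not-null"})
)

emitter.emit_mcp(MetadataChangeProposalWrapper(entityUrn=assertion_urn, aspect=assertion_info))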

Field Metric Assertions: Validate aggregated statistics about a column. For example:

  • Null percentage must be less than 5%
  • Unique count must equal row count (uniqueness check)
  • Mean value must be between 0 and 100
  • Standard deviation must be less than 10
Python SDK: Create a field uniqueness assertion
# metadata-ingestion/examples/library/assertion_field_uniqueness.py
import os

import datahub.emitter.mce_builder as builder
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import (
    AssertionInfoClass,
    AssertionStdOperatorClass,
    AssertionTypeClass,
    FieldAssertionInfoClass,
    FieldAssertionTypeClass,
    FieldMetricAssertionClass,
    FieldMetricTypeClass,
    SchemaFieldSpecClass,
)

emitter = DatahubRestEmitter(
    gms_server=os.getenv("DATAHUB_GMS_URL", "http://localhost:8080"),
    token=os.getenv("DATAHUB_GMS_TOKEN"),
)

dataset_urn = builder.make_dataset_urn(platform="snowflake", name="mydb.myschema.users")

field_assertion_info = FieldAssertionInfoClass(
    type=FieldAssertionTypeClass.FIELD_METRIC,
    entity=dataset_urn,
    fieldMetricAssertion=FieldMetricAssertionClass(
        field=SchemaFieldSpecClass(
            path="user_id",
            type="VARCHAR",
            nativeType="VARCHAR",
        ),
        metric=FieldMetricTypeClass.UNIQUE_COUNT,
        operator=AssertionStdOperatorClass.EQUAL_TO,
        parameters=None,
    ),
)

assertion_info = AssertionInfoClass(
    type=AssertionTypeClass.FIELD,
    fieldAssertion=field_assertion_info,
    description="User ID must be unique across all rows",
)

assertion_urn = builder.make_assertion_urn(
    builder.datahub_guid(
        {"entity": dataset_urn, "field": "user_id", "type": "uniqueness"}
    )
)

assertion_info_mcp = MetadataChangeProposalWrapper(
    entityUrn=assertion_urn,
    aspect=assertion_info,
)

emitter.emit_mcp(assertion_info_mcp)
print(f"Created field uniqueness assertion: {assertion_urn}")

2. Volume Assertions (VOLUME)

Volume assertions monitor the amount of data in a dataset. They support several sub-types:

  • ROW_COUNT_TOTAL: Total number of rows must meet expectations
  • ROW_COUNT_CHANGE: Change in row count over time must meet expectations
  • INCREMENTING_SEGMENT_ROW_COUNT_TOTAL: Latest partition/segment row count
  • INCREMENTING_SEGMENT_ROW_COUNT_CHANGE: Change between consecutive partitions

Volume assertions are critical for detecting data pipeline failures, incomplete loads, or unexpected data growth.

Python SDK: Create a row count volume assertion
# metadata-ingestion/examples/library/assertion_volume_rows.py
import os

import datahub.emitter.mce_builder as builder
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import (
    AssertionInfoClass,
    AssertionStdOperatorClass,
    AssertionStdParameterClass,
    AssertionStdParametersClass,
    AssertionStdParameterTypeClass,
    AssertionTypeClass,
    RowCountTotalClass,
    VolumeAssertionInfoClass,
    VolumeAssertionTypeClass,
)

emitter = DatahubRestEmitter(
    gms_server=os.getenv("DATAHUB_GMS_URL", "http://localhost:8080"),
    token=os.getenv("DATAHUB_GMS_TOKEN"),
)

dataset_urn = builder.make_dataset_urn(
    platform="bigquery", name="project.dataset.orders"
)

volume_assertion_info = VolumeAssertionInfoClass(
    type=VolumeAssertionTypeClass.ROW_COUNT_TOTAL,
    entity=dataset_urn,
    rowCountTotal=RowCountTotalClass(
        operator=AssertionStdOperatorClass.BETWEEN,
        parameters=AssertionStdParametersClass(
            minValue=AssertionStdParameterClass(
                type=AssertionStdParameterTypeClass.NUMBER,
                value="1000",
            ),
            maxValue=AssertionStdParameterClass(
                type=AssertionStdParameterTypeClass.NUMBER,
                value="1000000",
            ),
        ),
    ),
)

assertion_info = AssertionInfoClass(
    type=AssertionTypeClass.VOLUME,
    volumeAssertion=volume_assertion_info,
    description="Orders table must contain between 1,000 and 1,000,000 rows",
)

assertion_urn = builder.make_assertion_urn(
    builder.datahub_guid({"entity": dataset_urn, "type": "row-count-range"})
)

assertion_info_mcp = MetadataChangeProposalWrapper(
    entityUrn=assertion_urn,
    aspect=assertion_info,
)

emitter.emit_mcp(assertion_info_mcp)
print(f"Created volume assertion: {assertion_urn}")

3. Freshness Assertions (FRESHNESS)

Freshness assertions ensure data is updated within expected time windows. Two types are supported:

  • DATASET_CHANGE: Based on dataset change operations (insert, update, delete) captured from audit logs
  • DATA_JOB_RUN: Based on successful execution of a data job

Freshness assertions define a schedule that specifies when updates should occur (e.g., daily by 9 AM, every 4 hours) and what tolerance is acceptable.

Python SDK: Create a dataset change freshness assertion
# metadata-ingestion/examples/library/assertion_freshness.py
import os

import datahub.emitter.mce_builder as builder
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import (
    AssertionInfoClass,
    AssertionTypeClass,
    FreshnessAssertionInfoClass,
    FreshnessAssertionScheduleClass,
    FreshnessAssertionScheduleTypeClass,
    FreshnessAssertionTypeClass,
    FreshnessCronScheduleClass,
)

emitter = DatahubRestEmitter(
    gms_server=os.getenv("DATAHUB_GMS_URL", "http://localhost:8080"),
    token=os.getenv("DATAHUB_GMS_TOKEN"),
)

dataset_urn = builder.make_dataset_urn(
    platform="redshift", name="prod.analytics.daily_metrics"
)

freshness_assertion_info = FreshnessAssertionInfoClass(
    type=FreshnessAssertionTypeClass.DATASET_CHANGE,
    entity=dataset_urn,
    schedule=FreshnessAssertionScheduleClass(
        type=FreshnessAssertionScheduleTypeClass.CRON,
        cron=FreshnessCronScheduleClass(
            cron="0 9 * * *",
            timezone="America/Los_Angeles",
            windowStartOffsetMs=None,
        ),
    ),
)

assertion_info = AssertionInfoClass(
    type=AssertionTypeClass.FRESHNESS,
    freshnessAssertion=freshness_assertion_info,
    description="Daily metrics table must be updated every day by 9 AM Pacific Time",
)

assertion_urn = builder.make_assertion_urn(
    builder.datahub_guid({"entity": dataset_urn, "type": "freshness-daily-9am"})
)

assertion_info_mcp = MetadataChangeProposalWrapper(
    entityUrn=assertion_urn,
    aspect=assertion_info,
)

emitter.emit_mcp(assertion_info_mcp)
print(f"Created freshness assertion: {assertion_urn}")

4. Schema Assertions (DATA_SCHEMA)

Schema assertions validate that a dataset's structure matches expectations. They verify:

  • Presence or absence of specific columns
  • Column data types
  • Column ordering (optional)
  • Schema compatibility modes:
    • EXACT_MATCH: Schema must match exactly
    • SUPERSET: Actual schema can have additional columns
    • SUBSET: Actual schema can have fewer columns

Schema assertions are valuable for detecting breaking changes in upstream data sources.

Python SDK: Create a schema assertion
# metadata-ingestion/examples/library/assertion_schema.py
import os
import time

import datahub.emitter.mce_builder as builder
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import (
    AssertionInfoClass,
    AssertionTypeClass,
    AuditStampClass,
    NumberTypeClass,
    SchemaAssertionCompatibilityClass,
    SchemaAssertionInfoClass,
    SchemaFieldClass,
    SchemaFieldDataTypeClass,
    SchemalessClass,
    SchemaMetadataClass,
    StringTypeClass,
)

emitter = DatahubRestEmitter(
    gms_server=os.getenv("DATAHUB_GMS_URL", "http://localhost:8080"),
    token=os.getenv("DATAHUB_GMS_TOKEN"),
)

dataset_urn = builder.make_dataset_urn(platform="kafka", name="prod.user_events")

current_timestamp = int(time.time() * 1000)
audit_stamp = AuditStampClass(
    time=current_timestamp,
    actor="urn:li:corpuser:datahub",
)

expected_schema = SchemaMetadataClass(
    schemaName="user_events",
    platform=builder.make_data_platform_urn("kafka"),
    version=0,
    created=audit_stamp,
    lastModified=audit_stamp,
    fields=[
        SchemaFieldClass(
            fieldPath="user_id",
            type=SchemaFieldDataTypeClass(type=StringTypeClass()),
            nativeDataType="string",
            lastModified=audit_stamp,
        ),
        SchemaFieldClass(
            fieldPath="event_type",
            type=SchemaFieldDataTypeClass(type=StringTypeClass()),
            nativeDataType="string",
            lastModified=audit_stamp,
        ),
        SchemaFieldClass(
            fieldPath="timestamp",
            type=SchemaFieldDataTypeClass(type=NumberTypeClass()),
            nativeDataType="long",
            lastModified=audit_stamp,
        ),
        SchemaFieldClass(
            fieldPath="properties",
            type=SchemaFieldDataTypeClass(type=StringTypeClass()),
            nativeDataType="string",
            lastModified=audit_stamp,
        ),
    ],
    hash="",
    platformSchema=SchemalessClass(),
)

schema_assertion_info = SchemaAssertionInfoClass(
    entity=dataset_urn,
    schema=expected_schema,
    compatibility=SchemaAssertionCompatibilityClass.SUPERSET,
)

assertion_info = AssertionInfoClass(
    type=AssertionTypeClass.DATA_SCHEMA,
    schemaAssertion=schema_assertion_info,
    description="User events stream must have required schema fields (can include additional fields)",
)

assertion_urn = builder.make_assertion_urn(
    builder.datahub_guid({"entity": dataset_urn, "type": "schema-check"})
)

assertion_info_mcp = MetadataChangeProposalWrapper(
    entityUrn=assertion_urn,
    aspect=assertion_info,
)

emitter.emit_mcp(assertion_info_mcp)
print(f"Created schema assertion: {assertion_urn}")

5. SQL Assertions (SQL)

SQL assertions allow custom validation logic using arbitrary SQL queries. Two types are supported:

  • METRIC: Execute SQL and assert the returned metric meets expectations
  • METRIC_CHANGE: Assert the change in a SQL metric over time

SQL assertions provide maximum flexibility for complex validation scenarios that don't fit other assertion types, such as cross-table referential integrity checks or business rule validation.

Python SDK: Create a SQL metric assertion
# metadata-ingestion/examples/library/assertion_sql_metric.py
import os

import datahub.emitter.mce_builder as builder
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import (
    AssertionInfoClass,
    AssertionStdOperatorClass,
    AssertionStdParameterClass,
    AssertionStdParametersClass,
    AssertionStdParameterTypeClass,
    AssertionTypeClass,
    SqlAssertionInfoClass,
    SqlAssertionTypeClass,
)

emitter = DatahubRestEmitter(
    gms_server=os.getenv("DATAHUB_GMS_URL", "http://localhost:8080"),
    token=os.getenv("DATAHUB_GMS_TOKEN"),
)

dataset_urn = builder.make_dataset_urn(platform="postgres", name="public.transactions")

sql_assertion_info = SqlAssertionInfoClass(
    type=SqlAssertionTypeClass.METRIC,
    entity=dataset_urn,
    statement="SELECT SUM(amount) FROM public.transactions WHERE status = 'completed' AND date = CURRENT_DATE",
    operator=AssertionStdOperatorClass.GREATER_THAN_OR_EQUAL_TO,
    parameters=AssertionStdParametersClass(
        value=AssertionStdParameterClass(
            type=AssertionStdParameterTypeClass.NUMBER,
            value="0",
        )
    ),
)

assertion_info = AssertionInfoClass(
    type=AssertionTypeClass.SQL,
    sqlAssertion=sql_assertion_info,
    description="Total completed transaction amount today must be non-negative",
)

assertion_urn = builder.make_assertion_urn(
    builder.datahub_guid(
        {"entity": dataset_urn, "type": "sql-completed-transactions-sum"}
    )
)

assertion_info_mcp = MetadataChangeProposalWrapper(
    entityUrn=assertion_urn,
    aspect=assertion_info,
)

emitter.emit_mcp(assertion_info_mcp)
print(f"Created SQL assertion: {assertion_urn}")

6. Custom Assertions (CUSTOM)

Custom assertions provide an extension point for assertion types not directly modeled in DataHub. They're useful when:

  • Integrating third-party data quality tools with unique assertion types
  • Starting integration before fully mapping to DataHub's type system
  • Implementing organization-specific validation logic
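
Below is a hedged sketch of emitting a custom assertion with the Python SDK, following the same MCP pattern as the sections above. It assumes CustomAssertionInfoClass exposes type, entity, and an optional logic field (consistent with the aspect reference later in this page); the dataset, type string, and logic text are placeholders.

import os

import datahub.emitter.mce_builder as builder
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import (
    AssertionInfoClass,
    AssertionTypeClass,
    CustomAssertionInfoClass,
)

emitter = DatahubRestEmitter(
    gms_server=os.getenv("DATAHUB_GMS_URL", "http://localhost:8080"),
    token=os.getenv("DATAHUB_GMS_TOKEN"),
)

dataset_urn = builder.make_dataset_urn(platform="postgres", name="public.payments")

# "type" is a free-form category defined by the external tool; "logic" optionally
# captures a human-readable definition of the check. Both values are illustrative.
custom_assertion_info = CustomAssertionInfoClass(
    type="ledger-balance-check",
    entity=dataset_urn,
    logic="Debits and credits must net to zero per accounting period",
)

assertion_info = AssertionInfoClass(
    type=AssertionTypeClass.CUSTOM,
    customAssertion=custom_assertion_info,
    description="Ledger must balance within each accounting period",
)

assertion_urn = builder.make_assertion_urn(
    builder.datahub_guid({"entity": dataset_urn, "type": "ledger-balance-check"})
)

emitter.emit_mcp(MetadataChangeProposalWrapper(entityUrn=assertion_urn, aspect=assertion_info))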

Assertion Source

The assertionInfo aspect includes an AssertionSource that identifies the origin of the assertion:

  • NATIVE: Defined directly in DataHub (DataHub Cloud feature)
  • EXTERNAL: Ingested from external tools (Great Expectations, dbt, Snowflake, etc.)
  • INFERRED: Generated by ML-based inference systems (DataHub Cloud feature)

External assertions should have a corresponding dataPlatformInstance aspect that identifies the specific platform instance they originated from.
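
As a minimal sketch, the dataPlatformInstance aspect can be emitted on the assertion itself; the platform name ("great-expectations") and instance below are illustrative placeholders, not values mandated by any integration.

import datahub.emitter.mce_builder as builder
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import DataPlatformInstanceClass

emitter = DatahubRestEmitter("http://localhost:8080")

assertion_urn = "urn:li:assertion:432475190cc846f2894b5b3aa4d55af2"

# Record which external platform this assertion was ingested from.
data_platform_instance = DataPlatformInstanceClass(
    platform=builder.make_data_platform_urn("great-expectations"),
    instance=builder.make_dataplatform_instance_urn("great-expectations", "prod-ge"),
)

emitter.emit_mcp(
    MetadataChangeProposalWrapper(entityUrn=assertion_urn, aspect=data_platform_instance)
)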

Assertion Run Events

Assertion evaluations are tracked using the assertionRunEvent timeseries aspect. Each evaluation creates a new event with:

  • timestampMillis: When the evaluation occurred
  • runId: Platform-specific identifier for this evaluation run
  • asserteeUrn: The entity being asserted (typically a dataset)
  • assertionUrn: The assertion being evaluated
  • status: COMPLETE, RUNNING, or ERROR
  • result: SUCCESS, FAILURE, or ERROR with details
  • batchSpec: Optional information about the data batch evaluated
  • runtimeContext: Optional key-value pairs with runtime parameters

Run events enable tracking assertion health over time, identifying trends, and debugging failures.
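
The following is a minimal sketch of reporting a single successful evaluation with the Python SDK, assuming the standard AssertionRunEventClass, AssertionRunStatusClass, AssertionResultClass, and AssertionResultTypeClass classes; the URNs, run ID, observed value, and URL are placeholders.

import os
import time

from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import (
    AssertionResultClass,
    AssertionResultTypeClass,
    AssertionRunEventClass,
    AssertionRunStatusClass,
)

emitter = DatahubRestEmitter(
    gms_server=os.getenv("DATAHUB_GMS_URL", "http://localhost:8080"),
    token=os.getenv("DATAHUB_GMS_TOKEN"),
)

assertion_urn = "urn:li:assertion:432475190cc846f2894b5b3aa4d55af2"
dataset_urn = "urn:li:dataset:(urn:li:dataPlatform:snowflake,mydb.myschema.users,PROD)"

# Report one completed, successful evaluation of the assertion.
run_event = AssertionRunEventClass(
    timestampMillis=int(time.time() * 1000),
    runId="2024-01-01T09:00:00Z-manual",  # platform-specific run identifier (placeholder)
    asserteeUrn=dataset_urn,              # the entity that was checked
    assertionUrn=assertion_urn,
    status=AssertionRunStatusClass.COMPLETE,
    result=AssertionResultClass(
        type=AssertionResultTypeClass.SUCCESS,
        actualAggValue=42,  # observed metric value, if numeric
        externalUrl="https://example.com/quality/runs/123",  # illustrative link
    ),
)

emitter.emit_mcp(MetadataChangeProposalWrapper(entityUrn=assertion_urn, aspect=run_event))

Because assertionRunEvent is a timeseries aspect, each emitted event is appended to the assertion's history rather than overwriting previous evaluations.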

Assertion Actions

The assertionActions aspect defines automated responses to assertion outcomes:

  • onSuccess: Actions triggered when assertion passes
  • onFailure: Actions triggered when assertion fails

Common actions include:

  • Sending notifications (email, Slack, PagerDuty)
  • Creating incidents
  • Triggering downstream workflows
  • Updating metadata
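
For example, a hedged sketch of attaching the assertionActions aspect so that failures raise an incident and subsequent passes resolve it; it assumes the AssertionActionsClass / AssertionActionClass classes and the RAISE_INCIDENT / RESOLVE_INCIDENT action types, so verify these against your SDK version.

from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import (
    AssertionActionClass,
    AssertionActionsClass,
    AssertionActionTypeClass,
)

emitter = DatahubRestEmitter("http://localhost:8080")

assertion_urn = "urn:li:assertion:432475190cc846f2894b5b3aa4d55af2"

# Raise an incident when the assertion fails; resolve it when it passes again.
actions = AssertionActionsClass(
    onFailure=[AssertionActionClass(type=AssertionActionTypeClass.RAISE_INCIDENT)],
    onSuccess=[AssertionActionClass(type=AssertionActionTypeClass.RESOLVE_INCIDENT)],
)

emitter.emit_mcp(MetadataChangeProposalWrapper(entityUrn=assertion_urn, aspect=actions))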

Tags and Metadata

Like other DataHub entities, assertions support standard metadata capabilities:

  • globalTags: Categorize and organize assertions
  • glossaryTerms: Link assertions to business concepts
  • status: Mark assertions as active or deprecated
Python SDK: Add tags to an assertion
# metadata-ingestion/examples/library/assertion_add_tags.py
import datahub.emitter.mce_builder as builder
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.ingestion.graph.client import DataHubGraph, DatahubClientConfig
from datahub.metadata.schema_classes import (
    GlobalTagsClass,
    TagAssociationClass,
)

graph = DataHubGraph(DatahubClientConfig(server="http://localhost:8080"))
emitter = DatahubRestEmitter("http://localhost:8080")

assertion_urn = "urn:li:assertion:432475190cc846f2894b5b3aa4d55af2"

existing_tags = graph.get_aspect(
    entity_urn=assertion_urn,
    aspect_type=GlobalTagsClass,
)

if existing_tags is None:
    existing_tags = GlobalTagsClass(tags=[])

tag_to_add = builder.make_tag_urn("data-quality")

tag_association = TagAssociationClass(tag=tag_to_add)

if tag_association not in existing_tags.tags:
    existing_tags.tags.append(tag_association)

    tags_mcp = MetadataChangeProposalWrapper(
        entityUrn=assertion_urn,
        aspect=existing_tags,
    )

    emitter.emit_mcp(tags_mcp)
    print(f"Added tag '{tag_to_add}' to assertion {assertion_urn}")
else:
    print(f"Tag '{tag_to_add}' already exists on assertion {assertion_urn}")

Standard Operators and Parameters

Assertions use a standard set of operators for comparisons:

Numeric: BETWEEN, LESS_THAN, LESS_THAN_OR_EQUAL_TO, GREATER_THAN, GREATER_THAN_OR_EQUAL_TO, EQUAL_TO, NOT_EQUAL_TO

String: CONTAIN, START_WITH, END_WITH, REGEX_MATCH, IN, NOT_IN

Boolean: IS_TRUE, IS_FALSE, NULL, NOT_NULL

Native: _NATIVE_ for platform-specific operators

Parameters are provided via AssertionStdParameters:

  • value: Single value for most operators
  • minValue, maxValue: Range endpoints for BETWEEN
  • Parameter types: NUMBER, STRING, SET

Standard Aggregations

Field and volume assertions can apply aggregation functions before evaluation:

Statistical: MEAN, MEDIAN, STDDEV, MIN, MAX, SUM

Count-based: ROW_COUNT, COLUMN_COUNT, UNIQUE_COUNT, NULL_COUNT

Proportional: UNIQUE_PROPORTION, NULL_PROPORTION

Identity: IDENTITY (no aggregation), COLUMNS (all columns)

Integration Points

Relationship to Datasets

Assertions have a strong relationship with datasets through the Asserts relationship:

  • Field assertions target specific dataset columns
  • Volume assertions monitor dataset row counts
  • Freshness assertions track dataset update times
  • Schema assertions validate dataset structure
  • SQL assertions query dataset contents

Datasets maintain a reverse relationship, showing all assertions that validate them. This enables users to understand the quality checks applied to any dataset.

Relationship to Data Jobs

Freshness assertions can target data jobs (pipelines) to ensure they execute on schedule. When a FreshnessAssertionInfo has type=DATA_JOB_RUN, the entity field references a dataJob URN rather than a dataset.
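
A minimal sketch of that variant is shown below; only the assertion definition differs from the dataset freshness example above, so the AssertionInfo wrapping and emit steps are identical. The Airflow flow and job names are placeholders.

import datahub.emitter.mce_builder as builder
from datahub.metadata.schema_classes import (
    FreshnessAssertionInfoClass,
    FreshnessAssertionScheduleClass,
    FreshnessAssertionScheduleTypeClass,
    FreshnessAssertionTypeClass,
    FreshnessCronScheduleClass,
)

# Target a pipeline task instead of a dataset: the asserted entity is a dataJob URN.
data_job_urn = builder.make_data_job_urn(
    orchestrator="airflow", flow_id="daily_metrics_dag", job_id="load_daily_metrics"
)

freshness_assertion_info = FreshnessAssertionInfoClass(
    type=FreshnessAssertionTypeClass.DATA_JOB_RUN,
    entity=data_job_urn,
    schedule=FreshnessAssertionScheduleClass(
        type=FreshnessAssertionScheduleTypeClass.CRON,
        cron=FreshnessCronScheduleClass(cron="0 9 * * *", timezone="UTC"),
    ),
)
# Wrap in AssertionInfoClass and emit exactly as in the dataset freshness example above.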

Relationship to Data Platforms

External assertions maintain a relationship to their source platform through the dataPlatformInstance aspect. This enables:

  • Filtering assertions by source tool
  • Deep-linking back to the source platform
  • Understanding the assertion's external context

GraphQL API

Assertions are fully accessible via DataHub's GraphQL API:

  • Query assertions and their run history
  • Create and update native assertions
  • Delete assertions
  • Retrieve assertions for a specific dataset

Key GraphQL types:

  • Assertion: The main assertion entity
  • AssertionInfo: Assertion definition and type
  • AssertionRunEvent: Evaluation results
  • AssertionSource: Origin metadata
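
As a sketch, assertions for a dataset can be fetched from Python via DataHubGraph.execute_graphql. The query shape below follows DataHub's GraphQL schema for dataset assertions and their run events, but field names should be verified against your server version (for example in the GraphiQL explorer).

from datahub.ingestion.graph.client import DataHubGraph, DatahubClientConfig

graph = DataHubGraph(DatahubClientConfig(server="http://localhost:8080"))

# Fetch the assertions attached to a dataset, with a summary of completed runs.
query = """
query datasetAssertions($urn: String!) {
  dataset(urn: $urn) {
    assertions(start: 0, count: 100) {
      total
      assertions {
        urn
        info { type description }
        runEvents(status: COMPLETE, limit: 1) {
          total
          failed
          succeeded
        }
      }
    }
  }
}
"""

dataset_urn = "urn:li:dataset:(urn:li:dataPlatform:snowflake,mydb.myschema.users,PROD)"
result = graph.execute_graphql(query=query, variables={"urn": dataset_urn})
print(result["dataset"]["assertions"]["total"])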

Integration with dbt

DataHub's dbt integration automatically converts dbt tests into assertions:

  • Schema Tests: Mapped to field assertions (not_null, unique, accepted_values, relationships)
  • Data Tests: Mapped to SQL assertions
  • Test Metadata: Test severity, tags, and descriptions are preserved

Integration with Great Expectations

The Great Expectations integration maps expectations to assertion types:

  • Column expectations → Field assertions
  • Table expectations → Volume or schema assertions
  • Custom expectations → Custom assertions

Each expectation suite becomes a collection of assertions in DataHub.

Integration with Snowflake Data Quality

Snowflake DMF (Data Metric Functions) rules are ingested as assertions:

  • Row count rules → Volume assertions
  • Uniqueness rules → Field metric assertions
  • Freshness rules → Freshness assertions
  • Custom metric rules → SQL assertions

Notable Exceptions

Legacy Dataset Assertion Type

The DATASET assertion type is a legacy format that predates the more specific field, volume, freshness, and schema assertion types. It uses DatasetAssertionInfo with a generic structure. New integrations should use the more specific assertion types (FIELD, VOLUME, FRESHNESS, DATA_SCHEMA, SQL) as they provide better type safety and UI rendering.

Assertion Results vs. Assertion Metrics

While assertions track pass/fail status, DataHub also supports more detailed metrics through the AssertionResult object:

  • actualAggValue: The actual value observed (for numeric assertions)
  • externalUrl: Link to detailed results in the source system
  • nativeResults: Platform-specific result details

This enables richer debugging and understanding of why assertions fail.

Assertion Scheduling

DataHub tracks when assertions run through assertionRunEvent timeseries data, but does not directly schedule assertion evaluations. Scheduling is handled by:

  • Native Assertions: DataHub Cloud's built-in scheduler
  • External Assertions: The source platform's scheduler (dbt, Airflow, etc.)
  • On-Demand: Manual or API-triggered evaluations

DataHub provides monitoring and alerting based on the assertion run events, regardless of the scheduling mechanism.

Assertion vs. Test Results

DataHub has two related concepts:

  • Assertions: First-class entities that define data quality rules
  • Test Results: A simpler aspect that can be attached to datasets

Test results are lightweight pass/fail indicators without the full expressiveness of assertions. Use assertions for production data quality monitoring and test results for simple ingestion-time validation.

Technical Reference Guide

The sections above provide an overview of how to use this entity. The following sections provide detailed technical information about how metadata is stored and represented in DataHub.

Aspects are the individual pieces of metadata that can be attached to an entity. Each aspect contains specific information (like ownership, tags, or properties) and is stored as a separate record, allowing for flexible and incremental metadata updates.

Relationships show how this entity connects to other entities in the metadata graph. These connections are derived from the fields within each aspect and form the foundation of DataHub's knowledge graph.

Reading the Field Tables

Each aspect's field table includes an Annotations column that provides additional metadata about how fields are used:

  • ⚠️ Deprecated: This field is deprecated and may be removed in a future version. Check the description for the recommended alternative
  • Searchable: This field is indexed and can be searched in DataHub's search interface
  • Searchable (fieldname): When the field name in parentheses is shown, it indicates the field is indexed under a different name in the search index. For example, dashboardTool is indexed as tool
  • → RelationshipName: This field creates a relationship to another entity. The arrow indicates this field contains a reference (URN) to another entity, and the name indicates the type of relationship (e.g., → Contains, → OwnedBy)

Fields with complex types (like Edge, AuditStamp) link to their definitions in the Common Types section below.

Aspects

assertionInfo

Information about an assertion

Field | Type | Description | Annotations
customProperties | map | Custom property bag. | Searchable
externalUrl | string | URL where the reference exists | Searchable
type | AssertionType | Type of assertion. Assertion types can evolve to span Datasets, Flows (Pipelines), Models, Features... | Searchable
datasetAssertion | DatasetAssertionInfo | A Dataset Assertion definition. This field is populated when the type is DATASET.
freshnessAssertion | FreshnessAssertionInfo | A Freshness Assertion definition. This field is populated when the type is FRESHNESS.
volumeAssertion | VolumeAssertionInfo | A Volume Assertion definition. This field is populated when the type is VOLUME.
sqlAssertion | SqlAssertionInfo | A SQL Assertion definition. This field is populated when the type is SQL.
fieldAssertion | FieldAssertionInfo | A Field Assertion definition. This field is populated when the type is FIELD.
schemaAssertion | SchemaAssertionInfo | A schema Assertion definition. This field is populated when the type is DATA_SCHEMA.
customAssertion | CustomAssertionInfo | A Custom Assertion definition. This field is populated when the type is CUSTOM.
source | AssertionSource | The source or origin of the Assertion definition. If the source type of the Assertion is EXTERNAL...
lastUpdated | AuditStamp | The time at which the assertion was last updated and the actor who updated it. This field is only...
description | string | An optional human-readable description of the assertion

dataPlatformInstance

The specific instance of the data platform that this entity belongs to

Field | Type | Description | Annotations
platform | string | Data Platform | Searchable
instance | string | Instance of the data platform (e.g. db instance) | Searchable (platformInstance)

assertionActions

The Actions about an Assertion

Field | Type | Description | Annotations
onSuccess | AssertionAction[] | Actions to be executed on successful assertion run.
onFailure | AssertionAction[] | Actions to be executed on failed assertion run.

status

The lifecycle status metadata of an entity, e.g. dataset, metric, feature, etc. This aspect is used to represent soft deletes conventionally.

Field | Type | Description | Annotations
removed | boolean | Whether the entity has been removed (soft-deleted). | Searchable

globalTags

Tag aspect used for applying tags to an entity

Field | Type | Description | Annotations
tags | TagAssociation[] | Tags associated with a given entity | Searchable, → TaggedWith

assertionRunEvent (Timeseries)

An event representing the current status of evaluating an assertion on a batch. AssertionRunEvent should be used for reporting the status of a run as an assertion evaluation progresses.

Field | Type | Description | Annotations
timestampMillis | long | The event timestamp field as epoch at UTC in milliseconds. | Searchable (lastCompletedTime)
runId | string | Native (platform-specific) identifier for this run
asserteeUrn | string
status | AssertionRunStatus | The status of the assertion run as per this timeseries event.
result | AssertionResult | Results of assertion, present if the status is COMPLETE
runtimeContext | map | Runtime parameters of evaluation
batchSpec | BatchSpec | Specification of the batch which this run is evaluating
assertionUrn | string
eventGranularity | TimeWindowSize | Granularity of the event if applicable
partitionSpec | PartitionSpec | The optional partition specification.
messageId | string | The optional messageId, if provided serves as a custom user-defined unique identifier for an aspect...

Common Types

These types are used across multiple aspects in this entity.

AssertionAction

The Actions about an Assertion. In the future, we'll likely extend this model to support additional parameters or options related to the assertion actions.

Fields:

  • type (AssertionActionType): The type of the Action

AuditStamp

Data captured on a resource/association/sub-resource level giving insight into when that resource/association/sub-resource moved into a particular lifecycle stage, and who acted to move it into that specific lifecycle stage.

Fields:

  • time (long): When did the resource/association/sub-resource move into the specific lifecyc...
  • actor (string): The entity (e.g. a member URN) which will be credited for moving the resource...
  • impersonator (string?): The entity (e.g. a service URN) which performs the change on behalf of the Ac...
  • message (string?): Additional context around how DataHub was informed of the particular change. ...

Relationships

Outgoing

These are the relationships stored in this entity's aspects

  • Asserts

    • Dataset via assertionInfo.datasetAssertion.dataset
    • SchemaField via assertionInfo.datasetAssertion.fields
    • Dataset via assertionInfo.freshnessAssertion.entity
    • DataJob via assertionInfo.freshnessAssertion.entity
    • Dataset via assertionInfo.volumeAssertion.entity
    • Dataset via assertionInfo.sqlAssertion.entity
    • Dataset via assertionInfo.fieldAssertion.entity
    • Dataset via assertionInfo.schemaAssertion.entity
    • DataJob via assertionInfo.schemaAssertion.entity
    • Dataset via assertionInfo.customAssertion.entity
    • SchemaField via assertionInfo.customAssertion.field
  • SchemaFieldTaggedWith

    • Tag via assertionInfo.schemaAssertion.schema.fields.globalTags
  • TaggedWith

    • Tag via assertionInfo.schemaAssertion.schema.fields.globalTags.tags
    • Tag via globalTags.tags
  • SchemaFieldWithGlossaryTerm

    • GlossaryTerm via assertionInfo.schemaAssertion.schema.fields.glossaryTerms
  • TermedWith

    • GlossaryTerm via assertionInfo.schemaAssertion.schema.fields.glossaryTerms.terms.urn
  • ForeignKeyTo

    • SchemaField via assertionInfo.schemaAssertion.schema.foreignKeys.foreignFields
  • ForeignKeyToDataset

    • Dataset via assertionInfo.schemaAssertion.schema.foreignKeys.foreignDataset

Global Metadata Model

(Diagram: DataHub's global metadata model graph.)