Structured Property
Structured Properties are custom, typed metadata fields that can be attached to any entity type in DataHub. They enable organizations to extend the core metadata model with domain-specific attributes that support governance, compliance, and data discovery initiatives.
Identity
Structured Properties are identified by a single piece of information:
- The id or qualified name of the property: This is a unique string identifier that follows a namespace convention, typically using dot notation (e.g., io.acryl.privacy.retentionTime or companyName.department.propertyName). The qualified name becomes part of the URN and must be globally unique across all structured properties in your DataHub instance.
An example of a structured property identifier is urn:li:structuredProperty:io.acryl.privacy.retentionTime.
Unlike other entities that may require multiple pieces of information (platform, name, environment), structured properties have a simple, flat identity model based solely on their qualified name.
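Because the identity is only the qualified name, converting between a qualified name and its URN amounts to adding or stripping a fixed prefix. The sketch below shows this with two hypothetical helpers, to_structured_property_urn and to_qualified_name. They are not part of the DataHub SDK, and the whitespace check is an assumption rather than a documented rule.

```python
STRUCTURED_PROPERTY_URN_PREFIX = "urn:li:structuredProperty:"


def to_structured_property_urn(qualified_name: str) -> str:
    """Build a structured property URN from a qualified name (hypothetical helper)."""
    # Assumption: qualified names are non-empty and contain no whitespace.
    if not qualified_name or any(ch.isspace() for ch in qualified_name):
        raise ValueError("qualified name must be non-empty and contain no whitespace")
    return STRUCTURED_PROPERTY_URN_PREFIX + qualified_name


def to_qualified_name(urn: str) -> str:
    """Recover the qualified name from a structured property URN."""
    if not urn.startswith(STRUCTURED_PROPERTY_URN_PREFIX):
        raise ValueError(f"not a structured property URN: {urn}")
    return urn[len(STRUCTURED_PROPERTY_URN_PREFIX):]


urn = to_structured_property_urn("io.acryl.privacy.retentionTime")
print(urn)
print(to_qualified_name(urn))
```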
Important Capabilities
Property Definition
The core metadata for a structured property is captured in the propertyDefinition aspect, which defines:
Type System
Structured properties support strongly-typed values through the valueType field, which must reference a valid dataType entity. The supported types are:
- string: Text values (plain text)
- rich_text: Rich text with formatting
- number: Numeric values (integers or decimals)
- date: Date values
- urn: References to other DataHub entities
The type is specified as a URN: urn:li:dataType:datahub.string, urn:li:dataType:datahub.number, etc.
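As a rough illustration of the naming pattern above, the following sketch maps a short type name to its dataType URN. The helper to_data_type_urn is hypothetical, and the pattern for rich_text, date, and urn is assumed to match the two URNs shown explicitly.

```python
# Short type names listed in this section.
SUPPORTED_TYPES = ("string", "rich_text", "number", "date", "urn")


def to_data_type_urn(short_name: str) -> str:
    """Map a short type name to a dataType URN (hypothetical helper)."""
    if short_name not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported type {short_name!r}; expected one of {SUPPORTED_TYPES}")
    # Assumption: every supported type follows the urn:li:dataType:datahub.<name> pattern.
    return f"urn:li:dataType:datahub.{short_name}"


for name in SUPPORTED_TYPES:
    print(name, "->", to_data_type_urn(name))
```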
Type Qualifiers
For URN-type properties, you can restrict which entity types are allowed using the typeQualifier field. This enables creating properties that reference only specific entity types:
{
  "valueType": "urn:li:dataType:datahub.urn",
  "typeQualifier": {
    "allowedTypes": [
      "urn:li:entityType:datahub.corpuser",
      "urn:li:entityType:datahub.corpGroup"
    ]
  }
}
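To make the effect of allowedTypes concrete, here is a minimal sketch of the kind of check a type qualifier implies. The functions entity_type_urn_of and is_allowed_reference are hypothetical, and the sketch assumes that the third segment of an entity URN is the same name that follows datahub. in the entityType URN.

```python
from typing import List


def entity_type_urn_of(value_urn: str) -> str:
    """Derive the entityType URN from an entity URN (assumed naming correspondence)."""
    parts = value_urn.split(":", 3)
    if len(parts) < 4 or parts[0] != "urn" or parts[1] != "li":
        raise ValueError(f"not a DataHub URN: {value_urn}")
    return f"urn:li:entityType:datahub.{parts[2]}"


def is_allowed_reference(value_urn: str, allowed_types: List[str]) -> bool:
    """Return True when the referenced entity's type appears in allowed_types."""
    return entity_type_urn_of(value_urn) in allowed_types


allowed = [
    "urn:li:entityType:datahub.corpuser",
    "urn:li:entityType:datahub.corpGroup",
]
print(is_allowed_reference("urn:li:corpuser:jdoe", allowed))
print(is_allowed_reference(
    "urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_created,PROD)", allowed
))
```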
Cardinality
Structured properties can accept either single or multiple values via the cardinality field:
- SINGLE: Only one value can be assigned (default)
- MULTIPLE: Multiple values can be assigned as an array
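The cardinality rule can be sketched as a simple count check. The function fits_cardinality below is a hypothetical illustration of the rule as described here, not DataHub's own validation code.

```python
from typing import Sequence, Union

Value = Union[str, float]


def fits_cardinality(values: Sequence[Value], cardinality: str = "SINGLE") -> bool:
    """Check a list of values against SINGLE or MULTIPLE cardinality."""
    if cardinality == "SINGLE":
        # SINGLE (the default) permits at most one value.
        return len(values) <= 1
    if cardinality == "MULTIPLE":
        # MULTIPLE accepts any number of values as an array.
        return True
    raise ValueError(f"unknown cardinality: {cardinality}")


print(fits_cardinality([90.0]))
print(fits_cardinality([30.0, 90.0]))
print(fits_cardinality([30.0, 90.0], "MULTIPLE"))
```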
Entity Type Restrictions
The entityTypes array specifies which entity types this property can be applied to. For example, a property might only be applicable to:
- Datasets: urn:li:entityType:datahub.dataset
- Schema Fields (columns): urn:li:entityType:datahub.schemaField
- Dashboards: urn:li:entityType:datahub.dashboard
- Any combination of entity types
Validation Through Allowed Values
To enforce data quality and consistency, you can restrict a property to a fixed set of acceptable values using the allowedValues array. Each allowed value can include:
- value: The actual value (string or number)
- description: An explanation of what this value represents
When allowed values are defined, the system validates that any property value assignment matches one of the allowed values.
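The validation described above can be pictured as a membership test. The following sketch uses a hypothetical AllowedValueSketch dataclass and is_allowed function to illustrate the idea; it treats a property with no allowed values as unrestricted, which is an assumption.

```python
from dataclasses import dataclass
from typing import Sequence, Union

Value = Union[str, float]


@dataclass(frozen=True)
class AllowedValueSketch:
    """Illustrative stand-in for an allowed value: the value plus its description."""
    value: Value
    description: str = ""


def is_allowed(value: Value, allowed_values: Sequence[AllowedValueSketch]) -> bool:
    """Return True when value matches one of the allowed values."""
    if not allowed_values:
        # Assumption: an empty list means the property is unrestricted.
        return True
    return any(value == allowed.value for allowed in allowed_values)


RETENTION_DAYS = [
    AllowedValueSketch(30.0, "30 days - for ephemeral data containing PII"),
    AllowedValueSketch(90.0, "90 days - for monthly reporting data with PII"),
    AllowedValueSketch(365.0, "365 days - for non-sensitive data"),
]
print(is_allowed(90.0, RETENTION_DAYS))
print(is_allowed(45.0, RETENTION_DAYS))
```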
Immutability
The immutable field (default: false) determines whether a property value can be changed once set. When true, the property value becomes permanent and cannot be modified or removed, ensuring data integrity for critical metadata.
Versioning
The version field enables breaking schema changes by allowing you to update the property definition in backwards-incompatible ways. Versions must follow the format yyyyMMddhhmmss (e.g., 20240614080000). When a new version is applied:
- Old values with the previous schema become inaccessible
- New values must conform to the updated schema
- The old values are subject to asynchronous deletion
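A version string in this format can be produced from a timestamp. The sketch below uses only the Python standard library; make_version and is_valid_version are hypothetical helpers, and the sketch assumes the hour component uses a 24-hour clock.

```python
import re
from datetime import datetime

VERSION_FORMAT = "%Y%m%d%H%M%S"  # assumption: 24-hour clock for the hour field
VERSION_PATTERN = re.compile(r"^\d{14}$")


def make_version(moment: datetime) -> str:
    """Format a timestamp as a 14-digit version string."""
    return moment.strftime(VERSION_FORMAT)


def is_valid_version(version: str) -> bool:
    """Check that a version is 14 digits and parses as a real date and time."""
    if not VERSION_PATTERN.match(version):
        return False
    try:
        datetime.strptime(version, VERSION_FORMAT)
    except ValueError:
        return False
    return True


print(make_version(datetime(2024, 6, 14, 8, 0, 0)))
print(is_valid_version("2024-06-14"))
```

Because the versions are fixed-width digit strings, comparing them as strings orders them chronologically.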
Display Settings
The structuredPropertySettings aspect controls how properties appear in the DataHub UI:
- isHidden: When true, the property is not visible in the UI (useful for internal metadata)
- showInSearchFilters: When true, users can filter search results by this property's values
- showInAssetSummary: When true, displays the property in the asset's sidebar
- hideInAssetSummaryWhenEmpty: When true and showInAssetSummary is enabled, hides the property from the sidebar when it has no value
- showAsAssetBadge: When true, displays the property value as a badge on the asset card (only available for string/number types with allowed values)
- showInColumnsTable: When true, displays the property as a column in dataset schema tables (useful for column-level properties)
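Two of the settings above depend on other configuration. The sketch below encodes just those two dependencies as a hypothetical lint function, display_setting_problems. It reflects the descriptions in this section and is not DataHub's actual validation logic.

```python
from typing import Dict, List


def display_setting_problems(
    settings: Dict[str, bool], value_type: str, has_allowed_values: bool
) -> List[str]:
    """Report display settings that conflict with the rules described above."""
    problems: List[str] = []
    # hideInAssetSummaryWhenEmpty only applies when showInAssetSummary is enabled.
    if settings.get("hideInAssetSummaryWhenEmpty") and not settings.get("showInAssetSummary"):
        problems.append("hideInAssetSummaryWhenEmpty requires showInAssetSummary")
    # Badges are limited to string/number properties that define allowed values.
    badge_ok = value_type in ("string", "number") and has_allowed_values
    if settings.get("showAsAssetBadge") and not badge_ok:
        problems.append("showAsAssetBadge requires a string/number type with allowed values")
    return problems


print(display_setting_problems({"showAsAssetBadge": True}, "date", True))
print(display_setting_problems(
    {"showInAssetSummary": True, "hideInAssetSummaryWhenEmpty": True}, "string", False
))
```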
Common Aspects
Structured properties also support standard DataHub aspects:
- status: Enables soft deletion by setting removed: true, which hides the property without deleting underlying data
- institutionalMemory: Allows attaching documentation links explaining the property's purpose and usage
- deprecation: Can mark properties as deprecated while preserving historical data
Code Examples
Create a Structured Property
This example creates a structured property for tracking data retention time with numeric values and validation:
Python SDK: Create a structured property with allowed values
import os
from datahub.api.entities.structuredproperties.structuredproperties import (
    AllowedValue,
    StructuredProperties,
)
from datahub.ingestion.graph.client import DataHubGraph, DataHubGraphConfig
# Create a DataHub client
client = DataHubGraph(
    DataHubGraphConfig(
        server=os.getenv("DATAHUB_GMS_URL", "http://localhost:8080"),
        token=os.getenv("DATAHUB_GMS_TOKEN"),
    )
)
# Define a structured property for data retention time
retention_property = StructuredProperties(
    id="io.acryl.privacy.retentionTime",
    qualified_name="io.acryl.privacy.retentionTime",
    display_name="Retention Time",
    type="number",
    description="Number of days to retain data based on privacy and compliance requirements",
    entity_types=["dataset", "dataFlow"],
    cardinality="SINGLE",
    allowed_values=[
        AllowedValue(
            value=30.0,
            description="30 days - for ephemeral data containing PII",
        ),
        AllowedValue(
            value=90.0,
            description="90 days - for monthly reporting data with PII",
        ),
        AllowedValue(
            value=365.0,
            description="365 days - for non-sensitive data",
        ),
    ],
)
# Emit the structured property to DataHub
for mcp in retention_property.generate_mcps():
    client.emit(mcp)
print(f"Created structured property: {retention_property.urn}")
Create a Structured Property with Entity Type Restrictions
This example creates a property that accepts URN values but only allows references to users and groups:
Python SDK: Create a structured property with type qualifiers
from datahub.api.entities.structuredproperties.structuredproperties import (
    StructuredProperties,
    TypeQualifierAllowedTypes,
)
from datahub.ingestion.graph.client import DatahubClientConfig, DataHubGraph
# Create a DataHub client
client = DataHubGraph(DatahubClientConfig(server="http://localhost:8080"))
# Define a structured property that references DataHub entities
# This property can only reference CorpUser or CorpGroup entities
data_steward_property = StructuredProperties(
    id="io.acryl.governance.dataSteward",
    qualified_name="io.acryl.governance.dataSteward",
    display_name="Data Steward",
    type="urn",
    description="The designated data steward responsible for this asset's governance and quality",
    entity_types=["dataset", "dashboard", "chart", "dataJob"],
    cardinality="SINGLE",
    type_qualifier=TypeQualifierAllowedTypes(
        allowed_types=[
            "urn:li:entityType:datahub.corpuser",
            "urn:li:entityType:datahub.corpGroup",
        ]
    ),
)
# Emit the structured property to DataHub
for mcp in data_steward_property.generate_mcps():
    client.emit_mcp(mcp)
print(f"Created structured property: {data_steward_property.urn}")
# Example: Create a multi-value property for related datasets
related_datasets_property = StructuredProperties(
    id="io.acryl.lineage.relatedDatasets",
    qualified_name="io.acryl.lineage.relatedDatasets",
    display_name="Related Datasets",
    type="urn",
    description="Other datasets that are semantically or functionally related to this asset",
    entity_types=["dataset"],
    cardinality="MULTIPLE",
    type_qualifier=TypeQualifierAllowedTypes(
        allowed_types=["urn:li:entityType:datahub.dataset"]
    ),
)
# Emit the second structured property
for mcp in related_datasets_property.generate_mcps():
    client.emit_mcp(mcp)
print(f"Created structured property: {related_datasets_property.urn}")
Apply Structured Properties to Entities
Once structured properties are defined, you can assign them to entities:
Python SDK: Apply structured properties to a dataset
import logging
from datahub.emitter.mce_builder import make_dataset_urn
from datahub.emitter.rest_emitter import DataHubRestEmitter
from datahub.specific.dataset import DatasetPatchBuilder
log = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)
# Create rest emitter
rest_emitter = DataHubRestEmitter(gms_server="http://localhost:8080")
dataset_urn = make_dataset_urn(platform="hive", name="fct_users_created", env="PROD")
# Build a patch that sets the replicationSLA structured property, then emit it
for patch_mcp in (
    DatasetPatchBuilder(dataset_urn)
    .set_structured_property("io.acryl.dataManagement.replicationSLA", 120)
    .build()
):
    rest_emitter.emit(patch_mcp)
log.info(f"Added replicationSLA structured property to dataset {dataset_urn}")
Patch Operations on Structured Properties
For more granular control, use patch operations to add or remove individual properties:
Python SDK: Add and remove structured properties using patches
from datahub.emitter.mce_builder import make_dataset_urn
from datahub.ingestion.graph.client import DataHubGraph, DataHubGraphConfig
from datahub.specific.dataset import DatasetPatchBuilder
# Create DataHub Client
datahub_client = DataHubGraph(DataHubGraphConfig(server="http://localhost:8080"))
# Create Dataset URN
dataset_urn = make_dataset_urn(platform="hive", name="fct_users_created", env="PROD")
# Create Dataset Patch to Add and Remove Structured Properties
patch_builder = DatasetPatchBuilder(dataset_urn)
patch_builder.add_structured_property(
    "urn:li:structuredProperty:retentionTimeInDays", 12
)
patch_builder.remove_structured_property(
    "urn:li:structuredProperty:customClassification"
)
patch_mcps = patch_builder.build()
# Emit Dataset Patch
for patch_mcp in patch_mcps:
    datahub_client.emit(patch_mcp)
Query Structured Property Definitions
Retrieve structured property definitions to understand their configuration:
Python SDK: Query a structured property
from datahub.api.entities.structuredproperties.structuredproperties import (
    StructuredProperties,
)
from datahub.ingestion.graph.client import DatahubClientConfig, DataHubGraph
# Create a DataHub client
client = DataHubGraph(DatahubClientConfig(server="http://localhost:8080"))
# List all structured properties in the system
print("Listing all structured properties:")
print("-" * 80)
for structured_property in StructuredProperties.list(client):
    print(f"\nURN: {structured_property.urn}")
    print(f"Display Name: {structured_property.display_name}")
    print(f"Type: {structured_property.type}")
    print(f"Cardinality: {structured_property.cardinality}")
    print(f"Entity Types: {', '.join(structured_property.entity_types or [])}")
    if structured_property.allowed_values:
        print("Allowed Values:")
        for av in structured_property.allowed_values:
            print(f" - {av.value}: {av.description}")
# Retrieve a specific structured property by URN
print("\n" + "=" * 80)
print("Retrieving specific structured property:")
print("-" * 80)
property_urn = "urn:li:structuredProperty:io.acryl.privacy.retentionTime"
try:
    specific_property = StructuredProperties.from_datahub(client, property_urn)
    print(f"\nURN: {specific_property.urn}")
    print(f"Qualified Name: {specific_property.qualified_name}")
    print(f"Display Name: {specific_property.display_name}")
    print(f"Description: {specific_property.description}")
    print(f"Type: {specific_property.type}")
    print(f"Cardinality: {specific_property.cardinality}")
    print(f"Immutable: {specific_property.immutable}")
    print(f"Entity Types: {', '.join(specific_property.entity_types or [])}")
    if specific_property.allowed_values:
        print("\nAllowed Values:")
        for av in specific_property.allowed_values:
            print(f" - {av.value}: {av.description}")
    if specific_property.type_qualifier:
        print("\nType Qualifier - Allowed Entity Types:")
        for entity_type in specific_property.type_qualifier.allowed_types:
            print(f" - {entity_type}")
except Exception as e:
    print(f"Error retrieving structured property: {e}")
# Example: List just the URNs (for scripting)
print("\n" + "=" * 80)
print("All structured property URNs:")
print("-" * 80)
for urn in StructuredProperties.list_urns(client):
    print(urn)
Query via REST API
You can also retrieve structured property definitions using the REST API:
REST API: Get structured property definition
curl -X 'GET' \
'http://localhost:8080/openapi/v3/entity/structuredProperty/urn%3Ali%3AstructuredProperty%3Aio.acryl.privacy.retentionTime/propertyDefinition' \
-H 'accept: application/json'
Example Response:
{
  "urn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime",
  "propertyDefinition": {
    "value": {
      "qualifiedName": "io.acryl.privacy.retentionTime",
      "displayName": "Retention Time",
      "valueType": "urn:li:dataType:datahub.number",
      "description": "Number of days to retain data",
      "entityTypes": ["urn:li:entityType:datahub.dataset"],
      "cardinality": "SINGLE",
      "allowedValues": [
        {
          "value": { "double": 30 },
          "description": "30 days for ephemeral data"
        },
        {
          "value": { "double": 90 },
          "description": "90 days for regular data"
        }
      ]
    }
  }
}
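The request URL above contains the property URN in percent-encoded form, and the response wraps each allowed value in a typed object. The following sketch, using only the standard library, shows one way to build that URL and to unwrap the allowed values from a response shaped like the example. The helpers property_definition_url and allowed_values_from_response are hypothetical, and the sketch assumes every value wrapper holds exactly one key, as the "double" wrappers do here.

```python
from typing import Any, Dict
from urllib.parse import quote


def property_definition_url(gms_server: str, property_urn: str) -> str:
    """Build the propertyDefinition endpoint URL with a percent-encoded URN."""
    encoded = quote(property_urn, safe="")  # encodes ':' as %3A
    return f"{gms_server}/openapi/v3/entity/structuredProperty/{encoded}/propertyDefinition"


def allowed_values_from_response(body: Dict[str, Any]) -> Dict[Any, str]:
    """Map each allowed value to its description."""
    definition = body["propertyDefinition"]["value"]
    result: Dict[Any, str] = {}
    for entry in definition.get("allowedValues", []):
        # Assumption: the wrapper holds a single key such as "double".
        raw_value = next(iter(entry["value"].values()))
        result[raw_value] = entry.get("description", "")
    return result


# Trimmed copy of the example response above.
SAMPLE_RESPONSE = {
    "urn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime",
    "propertyDefinition": {
        "value": {
            "qualifiedName": "io.acryl.privacy.retentionTime",
            "allowedValues": [
                {"value": {"double": 30}, "description": "30 days for ephemeral data"},
                {"value": {"double": 90}, "description": "90 days for regular data"},
            ],
        }
    },
}

print(property_definition_url("http://localhost:8080", SAMPLE_RESPONSE["urn"]))
print(allowed_values_from_response(SAMPLE_RESPONSE))
```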
Integration Points
How Structured Properties Relate to Other Entities
Structured properties enable flexible metadata extension across the entire DataHub ecosystem:
Applied to Core Data Assets
Structured properties can be attached to any core data asset:
- Datasets: Track data quality metrics, retention policies, or compliance classifications
- Schema Fields: Add business glossary terms, sensitivity labels, or validation rules at the column level
- Charts and Dashboards: Capture certification status, review dates, or usage guidelines
- Data Jobs and Pipelines: Record SLA requirements, criticality levels, or maintenance schedules
Applied to Organizational Entities
Properties also extend to organizational and governance entities:
- Domains: Track ownership model, maturity level, or strategic importance
- Glossary Terms: Add approval status, version numbers, or related regulations
- Data Products: Capture product lifecycle stage, target audience, or support tier
- Users and Groups: Store employee IDs, cost centers, or access review dates
Relationship with Forms
Structured properties work in conjunction with DataHub Forms to enable:
- Compliance workflows: Forms can collect structured property values to ensure policy compliance
- Bulk updates: Forms allow updating structured properties across multiple assets simultaneously
- Validation: Forms enforce property constraints during data collection
Search and Discovery
Structured properties enhance search capabilities:
- Properties marked with showInSearchFilters: true become available as faceted search filters
- Users can filter results by property values to find assets meeting specific criteria
- Aggregation queries can summarize property value distributions across your data landscape
GraphQL Resolvers
The GraphQL layer exposes structured properties through several resolvers:
- CreateStructuredPropertyResolver: Creates new structured property definitions
- UpdateStructuredPropertyResolver: Modifies existing property definitions (within allowed constraints)
- DeleteStructuredPropertyResolver: Removes structured property definitions
- UpsertStructuredPropertiesResolver: Assigns property values to entities
- RemoveStructuredPropertiesResolver: Removes property values from entities
These resolvers are located in /datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/resolvers/structuredproperties/.
Notable Exceptions
Property Definition Constraints
Once a structured property is created and in use, certain fields cannot be modified to prevent data inconsistency:
Immutable Fields:
- valueType: Cannot change the data type (e.g., from string to number)
- allowedValues: Cannot remove or modify existing allowed values (can only add new ones)
- cardinality: Cannot change between SINGLE and MULTIPLE without a version bump
Modifiable Fields:
- displayName: Can be updated for better readability
- description: Can be enhanced or clarified
- entityTypes: Can add new entity types (but not remove existing ones)
- allowedValues: Can append additional allowed values
- Display settings: All structuredPropertySettings can be freely modified
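The constraints above can be summarized as a comparison between the current definition and a proposed one. The sketch below is a simplified, hypothetical check named breaking_changes. It represents allowed values as plain values, so it detects removals but not edits to an existing value's description, and it is not DataHub's own validator.

```python
from typing import Any, Dict, List


def breaking_changes(old: Dict[str, Any], new: Dict[str, Any]) -> List[str]:
    """List the changes that would need a version bump under the rules above."""
    problems: List[str] = []
    if old["valueType"] != new["valueType"]:
        problems.append("valueType changed")
    if old.get("cardinality", "SINGLE") != new.get("cardinality", "SINGLE"):
        problems.append("cardinality changed")
    # Adding allowed values or entity types is fine; removing them is not.
    if set(old.get("allowedValues", [])) - set(new.get("allowedValues", [])):
        problems.append("allowed values removed")
    if set(old.get("entityTypes", [])) - set(new.get("entityTypes", [])):
        problems.append("entity types removed")
    return problems


current = {
    "valueType": "urn:li:dataType:datahub.number",
    "cardinality": "SINGLE",
    "allowedValues": [30, 90],
    "entityTypes": ["urn:li:entityType:datahub.dataset"],
}
additive = dict(current, allowedValues=[30, 90, 365])
narrowing = dict(current, allowedValues=[30], cardinality="MULTIPLE")

print(breaking_changes(current, additive))
print(breaking_changes(current, narrowing))
```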
Breaking Changes with Versioning
To make backwards-incompatible changes (like changing cardinality or removing allowed values), you must:
- Set a new version value in format yyyyMMddhhmmss
- Apply the updated definition
- Accept that old values will become inaccessible
- Re-apply property values with the new schema
This is a destructive operation and should be carefully planned.
Soft Delete vs Hard Delete
Structured properties support two deletion modes:
Soft Delete (via status aspect with removed: true):
- Property definition remains in the system
- Property values are hidden but not deleted
- Assignment of new values is blocked
- Search filters using the property are disabled
- Fully reversible by setting removed: false
Hard Delete (via entity deletion):
- Removes the property definition entity completely
- Triggers asynchronous removal of all property values across all entities
- Not reversible
- Elasticsearch mappings persist until reindexing occurs
Column-Level Properties
When creating properties for schema fields (columns), be aware:
- Setting showInColumnsTable: true displays the property in all dataset schema views
- Column-level properties automatically appear in the column sidebar
- Too many visible properties can clutter the UI, so use display settings judiciously
Search Indexing
Structured property values are indexed in Elasticsearch using a special naming convention:
- Unversioned: structuredProperties.io_acryl_privacy_retentionTime
- Versioned: structuredProperties._versioned.io_acryl_privacy_retentionTime02.20240614080000.string
Understanding this convention is important for:
- Advanced search queries using the OpenAPI
- Debugging search issues
- Creating custom aggregations
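For debugging, it can help to derive the expected field name from a qualified name. The sketch below infers the convention only from the two examples above (dots replaced by underscores, then version and type appended for versioned properties). The helpers unversioned_field and versioned_field are hypothetical, and the real mapping may handle cases these examples do not show.

```python
def unversioned_field(qualified_name: str) -> str:
    """Field name for an unversioned property, inferred from the example above."""
    return "structuredProperties." + qualified_name.replace(".", "_")


def versioned_field(qualified_name: str, version: str, value_type: str) -> str:
    """Field name for a versioned property, inferred from the example above."""
    sanitized = qualified_name.replace(".", "_")
    return f"structuredProperties._versioned.{sanitized}.{version}.{value_type}"


print(unversioned_field("io.acryl.privacy.retentionTime"))
print(versioned_field("io.acryl.privacy.retentionTime02", "20240614080000", "string"))
```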
Technical Reference
For technical details about fields, searchability, and relationships, view the Columns tab in DataHub.