Role
The role entity represents external access management roles from source systems (e.g., Snowflake, BigQuery) that control access to data assets. This entity enables DataHub to model and display which roles provide access to datasets, helping data consumers understand what permissions they need to access specific data resources.
Identity
Roles are identified by a single piece of information:
- A unique identifier for the role: This is typically derived from the external IAM system where the role is defined. The identifier should be stable and unique across your organization's data platforms.
An example of a role identifier is urn:li:role:snowflake_reader_role.
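As a minimal sketch, a role URN can be assembled from a role ID using the urn:li:role:{role_id} pattern, and percent-encoded when it has to appear in a URL path. The helper names make_role_urn and encode_urn_for_path are hypothetical and are not part of the DataHub SDK.

```python
from urllib.parse import quote

ROLE_URN_PREFIX = "urn:li:role:"


def make_role_urn(role_id: str) -> str:
    """Build a role URN from a role ID (hypothetical helper, not a DataHub SDK call)."""
    if not role_id:
        raise ValueError("role_id must be non-empty")
    return f"{ROLE_URN_PREFIX}{role_id}"


def encode_urn_for_path(urn: str) -> str:
    """Percent-encode a URN for use in a URL path; safe='' also encodes ':'."""
    return quote(urn, safe="")


role_urn = make_role_urn("snowflake_reader_role")
print(role_urn)
print(encode_urn_for_path(role_urn))
```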
Important Capabilities
Role Properties
Role properties are stored in the roleProperties aspect and contain key information about the external access management role:
- Name: The display name of the role in the external system (e.g., "Snowflake Reader Role")
- Description: A human-readable description explaining the purpose and scope of the role
- Type: The access level this role provides (e.g., "READ", "WRITE", "ADMIN")
- Request URL: An optional link to the external system where users can request access to this role
The following code snippet shows how to create a role with properties:
Python SDK: Create a role with properties
import os
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import RolePropertiesClass
# Create the role URN
# Role URNs follow the pattern: urn:li:role:{role_id}
role_urn = "urn:li:role:snowflake_reader_role"
# Define the role properties
role_properties = RolePropertiesClass(
    name="Snowflake Reader Role",
    description="Provides read-only access to analytics datasets in Snowflake",
    type="READ",
    requestUrl="https://mycompany.okta.com/access/request/snowflake-reader",
)
# Create a metadata change proposal
mcp = MetadataChangeProposalWrapper(
    entityUrn=role_urn,
    aspect=role_properties,
)
# Emit the metadata change
gms_server = os.getenv("DATAHUB_GMS_URL", "http://localhost:8080")
token = os.getenv("DATAHUB_GMS_TOKEN")
emitter = DatahubRestEmitter(gms_server=gms_server, token=token)
emitter.emit(mcp)
print(f"Created role: {role_urn}")
Role Membership (Actors)
Roles can be assigned to users and groups through the actors aspect. This tracks which users (corpuser entities) and groups (corpGroup entities) have been provisioned with the role in the external system.
The actors aspect contains:
- Users: A list of corp users who have been granted this role
- Groups: A list of corp groups that have been assigned this role
The following code snippet shows how to assign users and groups to a role:
Python SDK: Assign users and groups to a role
from datahub.emitter.mce_builder import make_group_urn, make_user_urn
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import (
    ActorsClass,
    RoleGroupClass,
    RoleUserClass,
)
# Create the role URN
# Role URNs follow the pattern: urn:li:role:{role_id}
role_urn = "urn:li:role:snowflake_reader_role"
# Define the users and groups assigned to this role
actors = ActorsClass(
    users=[
        RoleUserClass(user=make_user_urn("john.doe")),
        RoleUserClass(user=make_user_urn("jane.smith")),
    ],
    groups=[
        RoleGroupClass(group=make_group_urn("data-analysts")),
        RoleGroupClass(group=make_group_urn("business-intelligence")),
    ],
)
# Create a metadata change proposal
mcp = MetadataChangeProposalWrapper(
    entityUrn=role_urn,
    aspect=actors,
)
# Emit the metadata change
emitter = DatahubRestEmitter(gms_server="http://localhost:8080")
emitter.emit(mcp)
print(f"Assigned users and groups to role: {role_urn}")
Dataset Access Integration
Roles are connected to datasets through the access aspect on dataset entities. This aspect lists which roles provide access to a specific dataset, creating a clear view of the access control landscape.
The following code snippet shows how to associate roles with a dataset:
Python SDK: Associate roles with a dataset
from datahub.emitter.mce_builder import make_dataset_urn
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import AccessClass, RoleAssociationClass
# Create the dataset URN
dataset_urn = make_dataset_urn(
    platform="snowflake", name="analytics_db.public.user_events", env="PROD"
)
# Define the roles that provide access to this dataset
# Role URNs follow the pattern: urn:li:role:{role_id}
access_aspect = AccessClass(
    roles=[
        RoleAssociationClass(urn="urn:li:role:snowflake_reader_role"),
        RoleAssociationClass(urn="urn:li:role:snowflake_writer_role"),
        RoleAssociationClass(urn="urn:li:role:snowflake_admin_role"),
    ]
)
# Create a metadata change proposal
mcp = MetadataChangeProposalWrapper(
    entityUrn=dataset_urn,
    aspect=access_aspect,
)
# Emit the metadata change
emitter = DatahubRestEmitter(gms_server="http://localhost:8080")
emitter.emit(mcp)
print(f"Associated roles with dataset: {dataset_urn}")
Querying Role Information
You can retrieve role information using the standard REST API endpoints. The response includes all aspects of the role entity.
Query a role entity via REST API
curl 'http://localhost:8080/entities/urn%3Ali%3Arole%3Asnowflake_reader_role'
This will return the complete role entity including:
- roleKey: The identity aspect
- roleProperties: Name, description, type, and request URL
- actors: Users and groups assigned to the role
Python SDK: Query a role entity
import os
from datahub.emitter.rest_emitter import DatahubRestEmitter
# Create a DataHub REST emitter
gms_server = os.getenv("DATAHUB_GMS_URL", "http://localhost:8080")
token = os.getenv("DATAHUB_GMS_TOKEN")
emitter = DatahubRestEmitter(gms_server=gms_server, token=token)
# Query a role entity by URN
role_urn = "urn:li:role:snowflake_reader_role"
# Get the role entity
role_entity = emitter._session.get(
    f"{emitter._gms_server}/entities/{role_urn.replace(':', '%3A').replace('(', '%28').replace(')', '%29')}"
)
if role_entity.status_code == 200:
    role_data = role_entity.json()
    print(f"Role URN: {role_data.get('urn')}")
    # Extract role properties
    if "aspects" in role_data:
        aspects = role_data["aspects"]
        # Role properties
        if "roleProperties" in aspects:
            props = aspects["roleProperties"]["value"]
            print(f"Name: {props.get('name')}")
            print(f"Description: {props.get('description')}")
            print(f"Type: {props.get('type')}")
            print(f"Request URL: {props.get('requestUrl')}")
        # Actors (users and groups)
        if "actors" in aspects:
            actors = aspects["actors"]["value"]
            if "users" in actors:
                print(f"Users: {[u['user'] for u in actors['users']]}")
            if "groups" in actors:
                print(f"Groups: {[g['group'] for g in actors['groups']]}")
else:
    print(f"Failed to retrieve role: {role_entity.status_code}")
Search for roles via REST API
curl -X POST 'http://localhost:8080/entities?action=search' \
  -H 'Content-Type: application/json' \
  -d '{
    "entity": "role",
    "input": "reader",
    "start": 0,
    "count": 10
  }'
This searches across role names and returns matching role URNs.
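For readers who prefer Python over curl, the same search request can be sketched with only the standard library. This example builds the request without sending it; the function name build_role_search_request is hypothetical, and the endpoint and payload fields are copied from the curl call above.

```python
import json
import urllib.request


def build_role_search_request(gms_server: str, query: str, start: int = 0, count: int = 10):
    """Build (but do not send) a POST request mirroring the curl search call."""
    payload = {"entity": "role", "input": query, "start": start, "count": count}
    return urllib.request.Request(
        url=f"{gms_server}/entities?action=search",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_role_search_request("http://localhost:8080", "reader")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# To execute the search against a running GMS instance:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```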
Integration Points
Relationship with CorpUser and CorpGroup
Roles have direct relationships with user and group entities:
- RoleUser: Links a role to a corpuser entity via the "Has" relationship
- RoleGroup: Links a role to a corpGroup entity via the "Has" relationship
These relationships enable:
- Viewing which users/groups have specific roles
- Discovering all roles assigned to a particular user or group
- Auditing access patterns across your data ecosystem
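One way to discover all roles assigned to a particular user or group is to invert the actors aspect of each role into an actor-to-roles index. The sketch below works on plain dictionaries and assumes each actors value has the shape read in the query example (a users list with user keys and a groups list with group keys). The function name roles_by_actor and the sample data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def roles_by_actor(actors_by_role: Dict[str, dict]) -> Dict[str, List[str]]:
    """Invert role URN -> actors aspect into actor URN -> sorted list of role URNs."""
    index = defaultdict(set)
    for role_urn, actors in actors_by_role.items():
        for entry in actors.get("users") or []:
            index[entry["user"]].add(role_urn)
        for entry in actors.get("groups") or []:
            index[entry["group"]].add(role_urn)
    return {actor: sorted(roles) for actor, roles in index.items()}


# Hypothetical sample: actors aspect values keyed by role URN.
sample_actors = {
    "urn:li:role:snowflake_reader_role": {
        "users": [{"user": "urn:li:corpuser:john.doe"}],
        "groups": [{"group": "urn:li:corpGroup:data-analysts"}],
    },
    "urn:li:role:snowflake_writer_role": {
        "users": [{"user": "urn:li:corpuser:john.doe"}],
    },
}

for actor, roles in roles_by_actor(sample_actors).items():
    print(actor, roles)
```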
Distinction from DataHub Roles
It's important to distinguish between the role entity and the dataHubRole entity:
- role (this entity): Represents external access management roles from source systems (e.g., Snowflake roles, BigQuery roles) that control access to data assets in those platforms
- dataHubRole: Represents roles within DataHub itself that control permissions for DataHub features (e.g., admin role, editor role)
The roleMembership aspect on corpuser and corpGroup entities refers to dataHubRole entities, not the external role entities documented here.
Usage Patterns
Common usage patterns for the role entity include:
- Access Discovery: Data consumers can view which roles provide access to datasets they need
- Self-Service Access Requests: Users can identify the appropriate role and request access via the provided request URL
- Access Auditing: Compliance teams can track which roles provide access to sensitive datasets
- Unified Access View: Platform teams can create a centralized view of access control across multiple data platforms
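The access discovery and self-service patterns can be sketched by joining a dataset's access aspect with the properties of each referenced role. The example below works on plain dictionaries rather than live API responses; the function name access_options, the dictionary shapes (a roles list of urn entries, and role properties keyed by role URN), and the sample data are assumptions made for illustration.

```python
from typing import Dict, List


def access_options(access_aspect: dict, role_properties: Dict[str, dict]) -> List[dict]:
    """List the roles attached to a dataset along with how to request each one."""
    options = []
    for association in access_aspect.get("roles") or []:
        role_urn = association["urn"]
        # Roles without known properties fall back to their URN as display name.
        props = role_properties.get(role_urn, {})
        options.append(
            {
                "urn": role_urn,
                "name": props.get("name", role_urn),
                "type": props.get("type"),
                "requestUrl": props.get("requestUrl"),
            }
        )
    return options


# Hypothetical sample data mirroring the aspects shown earlier.
dataset_access = {
    "roles": [
        {"urn": "urn:li:role:snowflake_reader_role"},
        {"urn": "urn:li:role:snowflake_admin_role"},
    ]
}
known_role_properties = {
    "urn:li:role:snowflake_reader_role": {
        "name": "Snowflake Reader Role",
        "type": "READ",
        "requestUrl": "https://mycompany.okta.com/access/request/snowflake-reader",
    }
}

for option in access_options(dataset_access, known_role_properties):
    print(option["name"], option["type"], option["requestUrl"])
```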
GraphQL API
The role entity is exposed through DataHub's GraphQL API with the Role type. Key resolvers include:
- RoleType: Provides search and batch load capabilities for role entities
- ListRolesResolver: Queries all roles in the system
- BatchAssignRoleResolver: Bulk assignment of roles to users/groups
- AcceptRoleResolver: Workflow for accepting role assignments
Notable Exceptions
Limited Entity Support
Currently, the role entity and access management features support only dataset entities. While roles could conceptually apply to other data assets (dashboards, charts, etc.), the access aspect is defined only for datasets.
Future enhancements may extend role-based access management to additional entity types.
External System Integration
The role entity is designed to represent roles that exist in external systems. DataHub does not create or manage these roles directly; it only models them for discovery and documentation purposes. The actual provisioning and de-provisioning of role memberships must be performed in the source systems.
Configuration Required
The Access Management UI features are disabled by default in self-hosted deployments. To enable role visualization in the UI, set the SHOW_ACCESS_MANAGEMENT environment variable to true for the datahub-gms service.
Active Development
The role entity and access management features are under active development and subject to change. Planned enhancements include:
- Modeling external policies in addition to roles
- Automatic extraction of roles from sources like BigQuery, Snowflake, etc.
- Extended support for more entity types beyond datasets
- Advanced access request workflows with approval processes
Technical Reference
For technical details about fields, searchability, and relationships, view the Columns tab in DataHub.