CorpGroup
The corpGroup entity represents organizational groups, teams, or departments within an enterprise. These groups can be synchronized from external identity providers like LDAP, Active Directory, or SAML/SSO systems, or created natively within DataHub. CorpGroups are essential for managing access control, ownership assignments, and organizational metadata in DataHub.
Identity
CorpGroups are uniquely identified by a single string field: the group name.
The URN structure for a corpGroup is:
urn:li:corpGroup:<encoded-group-name>
The <encoded-group-name> is a URL-encoded version of the group name that serves as a globally unique identifier within DataHub. The encoding is handled automatically by the SDK.
Examples
Here are some typical URN patterns for different group naming conventions:
urn:li:corpGroup:eng-team
urn:li:corpGroup:data-platform
urn:li:corpGroup:cn%3Dadmins%2Cou%3Dgroups%2Cdc%3Dexample%2Cdc%3Dcom # LDAP DN
urn:li:corpGroup:S-1-5-21-123456789-123456789-123456789-1234 # Active Directory SID
urn:li:corpGroup:marketing-team
The name field is searchable and supports autocomplete, making it easy to find groups across DataHub.
Important Capabilities
Group Information
Group information is stored in two aspects:
- corpGroupInfo: Contains metadata typically synchronized from external identity providers (LDAP, AD, SSO)
- corpGroupEditableInfo: Contains metadata that can be edited through the DataHub UI
CorpGroupInfo
This aspect stores the source-of-truth information from external systems:
- displayName: The human-readable name of the group
- email: Contact email for the group
- description: A description of the group's purpose and scope
- slack: Slack channel associated with the group
- created: Timestamp of when the group was created
Note: The admins, members, and groups fields in corpGroupInfo are deprecated and maintained only for backwards compatibility. Group membership is now managed through the GroupMembership aspect.
CorpGroupEditableInfo
This aspect stores information that can be edited in the DataHub UI:
- description: An editable description of the group
- pictureLink: URL to a profile picture for the group
- slack: Slack channel for the group
- email: Contact email for the group
When both aspects contain the same field (like description), the UI typically prioritizes the editable version for display.
Group Membership
Group membership is managed through the groupMembership aspect, which is attached to corpUser entities (not the group itself). This design allows for efficient queries of which groups a user belongs to.
To add a user to a group, you update the groupMembership aspect on the user entity to include the group's URN.
Origin Tracking
The origin aspect tracks where a group originated from:
- NATIVE: The group was created directly in DataHub
- EXTERNAL: The group was synchronized from an external identity provider
For external groups, the externalType field can specify the source system (e.g., "LDAP", "AzureAD", "Okta").
Ownership
Groups can have owners assigned through the standard ownership aspect. Owners are typically administrators or managers responsible for the group. Ownership types include TECHNICAL_OWNER, BUSINESS_OWNER, and others.
Tags and Properties
Like other entities in DataHub, groups support:
- globalTags: Tagging groups for organization and discovery
- structuredProperties: Custom metadata properties defined by your organization
- forms: Metadata forms for structured data collection
Code Examples
Create a CorpGroup
Python SDK: Create a group and emit to DataHub
# metadata-ingestion/examples/library/corpgroup_create.py
import os
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import (
CorpGroupInfoClass,
OriginClass,
OriginTypeClass,
StatusClass,
)
from datahub.utilities.urns.corp_group_urn import CorpGroupUrn
gms_server = os.getenv("DATAHUB_GMS_URL", "http://localhost:8080")
token = os.getenv("DATAHUB_GMS_TOKEN")
emitter = DatahubRestEmitter(gms_server=gms_server, token=token)
group_urn = CorpGroupUrn("data-engineering")
group_info = CorpGroupInfoClass(
displayName="Data Engineering",
description="The data engineering team builds and maintains data pipelines and infrastructure",
email="data-eng@example.com",
slack="data-engineering",
admins=[],
members=[],
groups=[],
)
metadata_event = MetadataChangeProposalWrapper(
entityUrn=str(group_urn),
aspect=group_info,
)
emitter.emit(metadata_event)
status_aspect = StatusClass(removed=False)
metadata_event = MetadataChangeProposalWrapper(
entityUrn=str(group_urn),
aspect=status_aspect,
)
emitter.emit(metadata_event)
origin_aspect = OriginClass(type=OriginTypeClass.NATIVE)
metadata_event = MetadataChangeProposalWrapper(
entityUrn=str(group_urn),
aspect=origin_aspect,
)
emitter.emit(metadata_event)
print(f"Created group: {group_urn}")
Add Members to a Group
Python SDK: Add members to an existing group
# metadata-ingestion/examples/library/corpgroup_add_members.py
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.ingestion.graph.client import DataHubGraph, DataHubGraphConfig
from datahub.metadata.schema_classes import GroupMembershipClass
from datahub.metadata.urns import CorpGroupUrn, CorpUserUrn
graph = DataHubGraph(DataHubGraphConfig(server="http://localhost:8080"))
emitter = DatahubRestEmitter(gms_server="http://localhost:8080")
group_urn = str(CorpGroupUrn("data-engineering"))
users_to_add = [
CorpUserUrn("jdoe"),
CorpUserUrn("asmith"),
CorpUserUrn("bwilliams"),
]
for user_urn in users_to_add:
user_urn_str = str(user_urn)
current_membership = graph.get_aspect(user_urn_str, GroupMembershipClass)
if current_membership is None:
current_membership = GroupMembershipClass(groups=[])
if group_urn not in current_membership.groups:
current_membership.groups.append(group_urn)
metadata_event = MetadataChangeProposalWrapper(
entityUrn=user_urn_str,
aspect=current_membership,
)
emitter.emit(metadata_event)
print(f"Added {user_urn_str} to group {group_urn}")
else:
print(f"{user_urn_str} is already a member of {group_urn}")
Update Group Information
Python SDK: Update group description and metadata
# metadata-ingestion/examples/library/corpgroup_update_info.py
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.ingestion.graph.client import DataHubGraph, DataHubGraphConfig
from datahub.metadata.schema_classes import (
CorpGroupEditableInfoClass,
OwnerClass,
OwnershipClass,
OwnershipTypeClass,
)
from datahub.metadata.urns import CorpGroupUrn, CorpUserUrn
graph = DataHubGraph(DataHubGraphConfig(server="http://localhost:8080"))
emitter = DatahubRestEmitter(gms_server="http://localhost:8080")
group_urn = str(CorpGroupUrn("data-engineering"))
editable_info = CorpGroupEditableInfoClass(
description="Updated description: The data engineering team builds and maintains data pipelines, infrastructure, and ensures data quality across the organization",
pictureLink="https://example.com/images/data-engineering-logo.png",
slack="data-engineering",
email="data-eng@example.com",
)
metadata_event = MetadataChangeProposalWrapper(
entityUrn=group_urn,
aspect=editable_info,
)
emitter.emit(metadata_event)
print(f"Updated editable info for group: {group_urn}")
admin_urn = str(CorpUserUrn("jdoe"))
ownership = OwnershipClass(
owners=[
OwnerClass(
owner=admin_urn,
type=OwnershipTypeClass.TECHNICAL_OWNER,
)
]
)
metadata_event = MetadataChangeProposalWrapper(
entityUrn=group_urn,
aspect=ownership,
)
emitter.emit(metadata_event)
print(f"Added {admin_urn} as owner of group: {group_urn}")
Query Groups via REST API
Fetch a group entity and its membership
To retrieve a group entity with all its aspects:
curl 'http://localhost:8080/entities/urn%3Ali%3AcorpGroup%3Aeng-team'
To find all users who are members of a specific group:
curl 'http://localhost:8080/relationships?direction=INCOMING&urn=urn%3Ali%3AcorpGroup%3Aeng-team&types=IsMemberOfGroup'
The response will include all corpUser entities that have the group in their groupMembership aspect:
{
"start": 0,
"count": 3,
"relationships": [
{
"type": "IsMemberOfGroup",
"entity": "urn:li:corpuser:jdoe"
},
{
"type": "IsMemberOfGroup",
"entity": "urn:li:corpuser:asmith"
},
{
"type": "IsMemberOfGroup",
"entity": "urn:li:corpuser:bwilliams"
}
],
"total": 3
}
Integration Points
User Management
CorpGroups are tightly integrated with corpUser entities through the groupMembership aspect. When a user is added to a group, their groupMembership aspect is updated to include the group's URN, establishing a bidirectional relationship.
Ownership Relationships
Groups can be assigned as owners of any DataHub entity (datasets, dashboards, charts, etc.) through the ownership aspect. This allows team-based ownership where all group members are considered owners.
Example ownership assignment:
# A dataset can have a group as an owner
dataset.add_owner(CorpGroupUrn("data-engineering"))
Access Control
While not directly stored in the corpGroup aspects, groups are a fundamental component of DataHub's RBAC (Role-Based Access Control) system. Groups can be:
- Assigned roles through the roleMembership aspect
- Referenced in DataHub policies for fine-grained access control
- Used to manage permissions for metadata operations
External Identity Provider Integration
DataHub provides ingestion connectors for syncing groups from external systems:
LDAP Integration
The LDAP source connector can extract groups and their memberships:
source:
type: ldap
config:
ldap_server: "ldap://ldap.example.com"
ldap_user: "cn=admin,dc=example,dc=com"
ldap_password: "${LDAP_PASSWORD}"
base_dn: "ou=groups,dc=example,dc=com"
filter: "(objectClass=groupOfNames)"
Groups extracted from LDAP will have:
- The
originaspect set to EXTERNAL with externalType="LDAP" - Display names and descriptions from LDAP attributes
- Group membership automatically synchronized
Azure AD Integration
The Azure AD source connector syncs groups from Microsoft Azure Active Directory:
source:
type: azure-ad
config:
client_id: "${AZURE_CLIENT_ID}"
tenant_id: "${AZURE_TENANT_ID}"
client_secret: "${AZURE_CLIENT_SECRET}"
ingest_users: true
ingest_groups: true
Azure AD groups will have:
- The
originaspect set to EXTERNAL with externalType="AzureAD" - Microsoft 365 group information (if applicable)
- Nested group memberships flattened
GraphQL API
The corpGroup entity is fully supported in DataHub's GraphQL API. Common queries include:
query GetGroup {
corpGroup(urn: "urn:li:corpGroup:eng-team") {
urn
name
properties {
displayName
description
email
}
ownership {
owners {
owner {
... on CorpUser {
urn
username
}
}
}
}
}
}
Notable Exceptions
Deprecated Membership Fields
The members, admins, and groups fields in the corpGroupInfo aspect are deprecated. These fields were originally used to store group membership directly on the group entity, but this approach had scalability and consistency issues.
Current best practice is to:
- Use the groupMembership aspect on corpUser entities to track which groups a user belongs to
- Use the ownership aspect to designate group administrators
- Query relationships via the REST API to find group members
Native vs External Groups
Groups can be created in two ways:
- Native Groups: Created directly in DataHub through the UI or API, with origin type NATIVE
- External Groups: Synchronized from identity providers, with origin type EXTERNAL
External groups are typically treated as read-only in DataHub to prevent conflicts with the source system. Updates should be made in the source system (LDAP, Azure AD, etc.) and re-synchronized to DataHub.
Group Name Encoding
Group names are URL-encoded in URNs to handle special characters commonly found in LDAP DNs and Active Directory paths. When using the SDK, encoding is handled automatically. However, when constructing URNs manually or in API requests, ensure proper URL encoding:
# Correct - SDK handles encoding
CorpGroupUrn("cn=admins,ou=groups,dc=example,dc=com")
# Result: urn:li:corpGroup:cn%3Dadmins%2Cou%3Dgroups%2Cdc%3Dexample%2Cdc%3Dcom
# Incorrect - manual construction without encoding
"urn:li:corpGroup:cn=admins,ou=groups,dc=example,dc=com" # Will fail
Group Hierarchy
While the corpGroupInfo aspect includes a deprecated groups field for nested groups, DataHub does not currently have first-class support for group hierarchies. Group membership is flat - a user is either a member of a group or not. If you need hierarchical group structures, consider:
- Flattening the hierarchy during ingestion (e.g., if a user is in a child group, add them to all parent groups)
- Using naming conventions to indicate hierarchy (e.g., "engineering", "engineering-platform", "engineering-platform-data")
- Using domains or tags to represent organizational structure
Technical Reference
For technical details about fields, searchability, and relationships, view the Columns tab in DataHub.