Version: Next

CorpGroup

The corpGroup entity represents organizational groups, teams, or departments within an enterprise. These groups can be synchronized from external identity providers like LDAP, Active Directory, or SAML/SSO systems, or created natively within DataHub. CorpGroups are essential for managing access control, ownership assignments, and organizational metadata in DataHub.

Identity

CorpGroups are uniquely identified by a single string field: the group name.

The URN structure for a corpGroup is:

urn:li:corpGroup:<encoded-group-name>

The <encoded-group-name> is a URL-encoded version of the group name that serves as a globally unique identifier within DataHub. The encoding is handled automatically by the SDK.

Examples

Here are some typical URN patterns for different group naming conventions:

urn:li:corpGroup:eng-team
urn:li:corpGroup:data-platform
urn:li:corpGroup:cn%3Dadmins%2Cou%3Dgroups%2Cdc%3Dexample%2Cdc%3Dcom # LDAP DN
urn:li:corpGroup:S-1-5-21-123456789-123456789-123456789-1234 # Active Directory SID
urn:li:corpGroup:marketing-team
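
To make the encoding concrete, here is a minimal sketch that builds and parses corpGroup URNs using only Python's standard library. The helper names `encode_group_urn` and `decode_group_urn` are hypothetical and not part of the DataHub SDK. The sketch assumes plain percent-encoding as shown in the LDAP example above; the SDK's own rules for which characters it escapes may differ.

```python
from urllib.parse import quote, unquote

CORP_GROUP_PREFIX = "urn:li:corpGroup:"


def encode_group_urn(group_name: str) -> str:
    """Build a corpGroup URN by percent-encoding the raw group name (hypothetical helper)."""
    # safe="" makes quote() encode every reserved character, including "/", "=" and ",".
    return CORP_GROUP_PREFIX + quote(group_name, safe="")


def decode_group_urn(urn: str) -> str:
    """Recover the raw group name from a corpGroup URN (hypothetical helper)."""
    if not urn.startswith(CORP_GROUP_PREFIX):
        raise ValueError(f"not a corpGroup URN: {urn!r}")
    # Everything after the prefix is the percent-encoded group name.
    return unquote(urn[len(CORP_GROUP_PREFIX):])


ldap_dn = "cn=admins,ou=groups,dc=example,dc=com"
urn = encode_group_urn(ldap_dn)
print(urn)
print(decode_group_urn(urn) == ldap_dn)
```

Names that contain only letters, digits, and hyphens (such as eng-team) pass through unchanged, which is why most of the examples above look unencoded.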

The name field is searchable and supports autocomplete, making it easy to find groups across DataHub.

Important Capabilities

Group Information

Group information is stored in two aspects:

  • corpGroupInfo: Contains metadata typically synchronized from external identity providers (LDAP, AD, SSO)
  • corpGroupEditableInfo: Contains metadata that can be edited through the DataHub UI

CorpGroupInfo

This aspect stores the source-of-truth information from external systems:

  • displayName: The human-readable name of the group
  • email: Contact email for the group
  • description: A description of the group's purpose and scope
  • slack: Slack channel associated with the group
  • created: Timestamp of when the group was created

Note: The admins, members, and groups fields in corpGroupInfo are deprecated and maintained only for backwards compatibility. Group membership is now managed through the GroupMembership aspect.

CorpGroupEditableInfo

This aspect stores information that can be edited in the DataHub UI:

  • description: An editable description of the group
  • pictureLink: URL to a profile picture for the group
  • slack: Slack channel for the group
  • email: Contact email for the group

When both aspects contain the same field (like description), the UI typically prioritizes the editable version for display.
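
The precedence described above can be sketched as a simple fallback. This is an illustration only: `resolve_group_field` is a hypothetical helper, the aspects are modelled as plain dictionaries, and the UI's actual resolution logic may differ.

```python
from typing import Any, Mapping, Optional


def resolve_group_field(
    field: str,
    editable_info: Optional[Mapping[str, Any]],
    info: Optional[Mapping[str, Any]],
) -> Optional[Any]:
    """Prefer the value from corpGroupEditableInfo, falling back to corpGroupInfo."""
    for aspect in (editable_info, info):
        # Skip aspects that are missing or that hold an empty value for the field.
        if aspect and aspect.get(field):
            return aspect[field]
    return None


info = {"displayName": "Data Engineering", "description": "Synced from LDAP"}
editable = {"description": "Edited in the UI"}

print(resolve_group_field("description", editable, info))  # Edited in the UI
print(resolve_group_field("displayName", editable, info))  # Data Engineering
```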

Group Membership

Group membership is managed through the groupMembership aspect, which is attached to corpUser entities (not the group itself). This design allows for efficient queries of which groups a user belongs to.

To add a user to a group, you update the groupMembership aspect on the user entity to include the group's URN.
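
As a rough model of this design, the sketch below stores memberships per user (as the groupMembership aspect does) and derives the group-to-members view by inverting that mapping. The function names and the in-memory dictionary are assumptions made for illustration; DataHub itself answers the reverse question through its relationship graph.

```python
from collections import defaultdict
from typing import Dict, List


def add_user_to_group(
    memberships: Dict[str, List[str]], user_urn: str, group_urn: str
) -> bool:
    """Append the group URN to the user's membership list; return False if already present."""
    groups = memberships.setdefault(user_urn, [])
    if group_urn in groups:
        return False
    groups.append(group_urn)
    return True


def members_by_group(memberships: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Invert user -> groups into group -> sorted list of users."""
    index: Dict[str, List[str]] = defaultdict(list)
    for user_urn, group_urns in memberships.items():
        for group_urn in group_urns:
            index[group_urn].append(user_urn)
    return {group: sorted(users) for group, users in index.items()}


memberships: Dict[str, List[str]] = {}
add_user_to_group(memberships, "urn:li:corpuser:jdoe", "urn:li:corpGroup:eng-team")
add_user_to_group(memberships, "urn:li:corpuser:asmith", "urn:li:corpGroup:eng-team")
print(members_by_group(memberships))
```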

Origin Tracking

The origin aspect tracks where a group originated from:

  • NATIVE: The group was created directly in DataHub
  • EXTERNAL: The group was synchronized from an external identity provider

For external groups, the externalType field can specify the source system (e.g., "LDAP", "AzureAD", "Okta").
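
The rule that externalType only applies to external groups can be sketched as a small validation step. The `make_origin` helper and the dictionary representation are hypothetical; with the SDK you would construct an OriginClass instead, as in the code examples later on this page.

```python
from typing import Dict, Optional

VALID_ORIGIN_TYPES = {"NATIVE", "EXTERNAL"}


def make_origin(origin_type: str, external_type: Optional[str] = None) -> Dict[str, str]:
    """Build a dictionary mirroring the origin aspect's two fields (hypothetical helper)."""
    if origin_type not in VALID_ORIGIN_TYPES:
        raise ValueError(f"unknown origin type: {origin_type!r}")
    # externalType names the source system, so it is meaningless for NATIVE groups.
    if origin_type == "NATIVE" and external_type is not None:
        raise ValueError("externalType only applies to EXTERNAL groups")
    origin = {"type": origin_type}
    if external_type is not None:
        origin["externalType"] = external_type
    return origin


print(make_origin("NATIVE"))
print(make_origin("EXTERNAL", "LDAP"))
```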

Ownership

Groups can have owners assigned through the standard ownership aspect. Owners are typically administrators or managers responsible for the group. Ownership types include TECHNICAL_OWNER, BUSINESS_OWNER, and others.

Tags and Properties

Like other entities in DataHub, groups support:

  • globalTags: Tagging groups for organization and discovery
  • structuredProperties: Custom metadata properties defined by your organization
  • forms: Metadata forms for structured data collection

Code Examples

Create a CorpGroup

Python SDK: Create a group and emit to DataHub
# metadata-ingestion/examples/library/corpgroup_create.py
import os

from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import (
    CorpGroupInfoClass,
    OriginClass,
    OriginTypeClass,
    StatusClass,
)
from datahub.utilities.urns.corp_group_urn import CorpGroupUrn

# Connection settings come from the environment, with a local default for the server.
gms_server = os.getenv("DATAHUB_GMS_URL", "http://localhost:8080")
token = os.getenv("DATAHUB_GMS_TOKEN")
emitter = DatahubRestEmitter(gms_server=gms_server, token=token)

group_urn = CorpGroupUrn("data-engineering")

# Core group metadata. The deprecated admins, members, and groups fields are
# passed as empty lists; membership is managed through groupMembership instead.
group_info = CorpGroupInfoClass(
    displayName="Data Engineering",
    description="The data engineering team builds and maintains data pipelines and infrastructure",
    email="data-eng@example.com",
    slack="data-engineering",
    admins=[],
    members=[],
    groups=[],
)

metadata_event = MetadataChangeProposalWrapper(
    entityUrn=str(group_urn),
    aspect=group_info,
)
emitter.emit(metadata_event)

# Mark the group as active (not soft-deleted).
status_aspect = StatusClass(removed=False)
metadata_event = MetadataChangeProposalWrapper(
    entityUrn=str(group_urn),
    aspect=status_aspect,
)
emitter.emit(metadata_event)

# Record that the group was created natively in DataHub.
origin_aspect = OriginClass(type=OriginTypeClass.NATIVE)
metadata_event = MetadataChangeProposalWrapper(
    entityUrn=str(group_urn),
    aspect=origin_aspect,
)
emitter.emit(metadata_event)

print(f"Created group: {group_urn}")

Add Members to a Group

Python SDK: Add members to an existing group
# metadata-ingestion/examples/library/corpgroup_add_members.py
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.ingestion.graph.client import DataHubGraph, DataHubGraphConfig
from datahub.metadata.schema_classes import GroupMembershipClass
from datahub.metadata.urns import CorpGroupUrn, CorpUserUrn

# The graph client reads existing aspects; the emitter writes updated ones.
graph = DataHubGraph(DataHubGraphConfig(server="http://localhost:8080"))
emitter = DatahubRestEmitter(gms_server="http://localhost:8080")

group_urn = str(CorpGroupUrn("data-engineering"))

users_to_add = [
    CorpUserUrn("jdoe"),
    CorpUserUrn("asmith"),
    CorpUserUrn("bwilliams"),
]

for user_urn in users_to_add:
    user_urn_str = str(user_urn)

    # Read the user's current memberships so existing groups are preserved.
    current_membership = graph.get_aspect(user_urn_str, GroupMembershipClass)

    if current_membership is None:
        current_membership = GroupMembershipClass(groups=[])

    if group_urn not in current_membership.groups:
        current_membership.groups.append(group_urn)

        # Membership lives on the user entity, so the aspect is emitted against the user URN.
        metadata_event = MetadataChangeProposalWrapper(
            entityUrn=user_urn_str,
            aspect=current_membership,
        )
        emitter.emit(metadata_event)

        print(f"Added {user_urn_str} to group {group_urn}")
    else:
        print(f"{user_urn_str} is already a member of {group_urn}")

Update Group Information

Python SDK: Update group description and metadata
# metadata-ingestion/examples/library/corpgroup_update_info.py
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.ingestion.graph.client import DataHubGraph, DataHubGraphConfig
from datahub.metadata.schema_classes import (
    CorpGroupEditableInfoClass,
    OwnerClass,
    OwnershipClass,
    OwnershipTypeClass,
)
from datahub.metadata.urns import CorpGroupUrn, CorpUserUrn

graph = DataHubGraph(DataHubGraphConfig(server="http://localhost:8080"))
emitter = DatahubRestEmitter(gms_server="http://localhost:8080")

group_urn = str(CorpGroupUrn("data-engineering"))

# Editable metadata: the same fields a user could change in the DataHub UI.
editable_info = CorpGroupEditableInfoClass(
    description="Updated description: The data engineering team builds and maintains data pipelines, infrastructure, and ensures data quality across the organization",
    pictureLink="https://example.com/images/data-engineering-logo.png",
    slack="data-engineering",
    email="data-eng@example.com",
)

metadata_event = MetadataChangeProposalWrapper(
    entityUrn=group_urn,
    aspect=editable_info,
)
emitter.emit(metadata_event)

print(f"Updated editable info for group: {group_urn}")

admin_urn = str(CorpUserUrn("jdoe"))

# Note: emitting a full ownership aspect overwrites any owners already recorded for the group.
ownership = OwnershipClass(
    owners=[
        OwnerClass(
            owner=admin_urn,
            type=OwnershipTypeClass.TECHNICAL_OWNER,
        )
    ]
)

metadata_event = MetadataChangeProposalWrapper(
    entityUrn=group_urn,
    aspect=ownership,
)
emitter.emit(metadata_event)

print(f"Added {admin_urn} as owner of group: {group_urn}")

Query Groups via REST API

Fetch a group entity and its membership

To retrieve a group entity with all its aspects:

curl 'http://localhost:8080/entities/urn%3Ali%3AcorpGroup%3Aeng-team'

To find all users who are members of a specific group:

curl 'http://localhost:8080/relationships?direction=INCOMING&urn=urn%3Ali%3AcorpGroup%3Aeng-team&types=IsMemberOfGroup'

The response will include all corpUser entities that have the group in their groupMembership aspect:

{
  "start": 0,
  "count": 3,
  "relationships": [
    {
      "type": "IsMemberOfGroup",
      "entity": "urn:li:corpuser:jdoe"
    },
    {
      "type": "IsMemberOfGroup",
      "entity": "urn:li:corpuser:asmith"
    },
    {
      "type": "IsMemberOfGroup",
      "entity": "urn:li:corpuser:bwilliams"
    }
  ],
  "total": 3
}
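
For scripted use, the request URL and the response handling can be sketched with the standard library alone. The helper names `membership_query_url` and `member_urns` are hypothetical, and the sample body simply mirrors the response shape shown above rather than real output from a server.

```python
import json
from typing import List
from urllib.parse import urlencode


def membership_query_url(gms_server: str, group_urn: str) -> str:
    """Build the relationships URL for the members of a group."""
    params = {
        "direction": "INCOMING",
        "urn": group_urn,
        "types": "IsMemberOfGroup",
    }
    # urlencode percent-encodes the colons in the URN, matching the curl example.
    return f"{gms_server}/relationships?{urlencode(params)}"


def member_urns(response_body: str) -> List[str]:
    """Extract the user URNs from a relationships response body."""
    payload = json.loads(response_body)
    return [
        rel["entity"]
        for rel in payload.get("relationships", [])
        if rel.get("type") == "IsMemberOfGroup"
    ]


sample_body = json.dumps({
    "start": 0,
    "count": 2,
    "relationships": [
        {"type": "IsMemberOfGroup", "entity": "urn:li:corpuser:jdoe"},
        {"type": "IsMemberOfGroup", "entity": "urn:li:corpuser:asmith"},
    ],
    "total": 2,
})

print(membership_query_url("http://localhost:8080", "urn:li:corpGroup:eng-team"))
print(member_urns(sample_body))
```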

Integration Points

User Management

CorpGroups are tightly integrated with corpUser entities through the groupMembership aspect. When a user is added to a group, their groupMembership aspect is updated to include the group's URN. This creates an IsMemberOfGroup relationship from the user to the group, which can be queried from either side.

Ownership Relationships

Groups can be assigned as owners of any DataHub entity (datasets, dashboards, charts, etc.) through the ownership aspect. This allows team-based ownership where all group members are considered owners.

Example ownership assignment:

# A dataset can have a group as an owner
dataset.add_owner(CorpGroupUrn("data-engineering"))

Access Control

Although permissions are not stored in the corpGroup aspects themselves, groups are a fundamental component of DataHub's RBAC (Role-Based Access Control) system. Groups can be:

  • Assigned roles through the roleMembership aspect
  • Referenced in DataHub policies for fine-grained access control
  • Used to manage permissions for metadata operations

External Identity Provider Integration

DataHub provides ingestion connectors for syncing groups from external systems:

LDAP Integration

The LDAP source connector can extract groups and their memberships:

source:
  type: ldap
  config:
    ldap_server: "ldap://ldap.example.com"
    ldap_user: "cn=admin,dc=example,dc=com"
    ldap_password: "${LDAP_PASSWORD}"
    base_dn: "ou=groups,dc=example,dc=com"
    filter: "(objectClass=groupOfNames)"

Groups extracted from LDAP will have:

  • The origin aspect set to EXTERNAL with externalType="LDAP"
  • Display names and descriptions from LDAP attributes
  • Group membership automatically synchronized

Azure AD Integration

The Azure AD source connector syncs groups from Microsoft Azure Active Directory:

source:
  type: azure-ad
  config:
    client_id: "${AZURE_CLIENT_ID}"
    tenant_id: "${AZURE_TENANT_ID}"
    client_secret: "${AZURE_CLIENT_SECRET}"
    ingest_users: true
    ingest_groups: true

Azure AD groups will have:

  • The origin aspect set to EXTERNAL with externalType="AzureAD"
  • Microsoft 365 group information (if applicable)
  • Nested group memberships flattened

GraphQL API

The corpGroup entity is fully supported in DataHub's GraphQL API. Common queries include:

query GetGroup {
  corpGroup(urn: "urn:li:corpGroup:eng-team") {
    urn
    name
    properties {
      displayName
      description
      email
    }
    ownership {
      owners {
        owner {
          ... on CorpUser {
            urn
            username
          }
        }
      }
    }
  }
}
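
A query like this can be sent from a script as an ordinary JSON POST. The sketch below only builds the request object and does not send it. The `/api/graphql` path, the bearer-token header, and the `build_graphql_request` helper are assumptions to check against your deployment, and the query is a shortened variant of the one above that takes the URN as a variable.

```python
import json
import urllib.request
from typing import Optional

# Shortened variant of the GetGroup query, parameterized by URN.
GET_GROUP_QUERY = """
query GetGroup($urn: String!) {
  corpGroup(urn: $urn) {
    urn
    name
    properties {
      displayName
      description
      email
    }
  }
}
"""


def build_graphql_request(
    server: str, group_urn: str, token: Optional[str] = None
) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the GraphQL query."""
    body = json.dumps(
        {"query": GET_GROUP_QUERY, "variables": {"urn": group_urn}}
    ).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if token:
        # Assumed auth scheme: a personal access token passed as a bearer token.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        f"{server}/api/graphql", data=body, headers=headers, method="POST"
    )


request = build_graphql_request("http://localhost:8080", "urn:li:corpGroup:eng-team")
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would execute it, which requires a running DataHub instance.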

Notable Exceptions

Deprecated Membership Fields

The members, admins, and groups fields in the corpGroupInfo aspect are deprecated. These fields were originally used to store group membership directly on the group entity, but this approach had scalability and consistency issues.

Current best practice is to:

  1. Use the groupMembership aspect on corpUser entities to track which groups a user belongs to
  2. Use the ownership aspect to designate group administrators
  3. Query relationships via the REST API to find group members

Native vs External Groups

Groups can be created in two ways:

  1. Native Groups: Created directly in DataHub through the UI or API, with origin type NATIVE
  2. External Groups: Synchronized from identity providers, with origin type EXTERNAL

External groups are typically treated as read-only in DataHub to prevent conflicts with the source system. Updates should be made in the source system (LDAP, Azure AD, etc.) and re-synchronized to DataHub.

Group Name Encoding

Group names are URL-encoded in URNs to handle special characters commonly found in LDAP DNs and Active Directory paths. When using the SDK, encoding is handled automatically. However, when constructing URNs manually or in API requests, ensure proper URL encoding:

# Correct - SDK handles encoding
CorpGroupUrn("cn=admins,ou=groups,dc=example,dc=com")
# Result: urn:li:corpGroup:cn%3Dadmins%2Cou%3Dgroups%2Cdc%3Dexample%2Cdc%3Dcom

# Incorrect - manual construction without encoding
"urn:li:corpGroup:cn=admins,ou=groups,dc=example,dc=com" # Will fail

Group Hierarchy

While the corpGroupInfo aspect includes a deprecated groups field for nested groups, DataHub does not currently have first-class support for group hierarchies. Group membership is flat: a user is either a member of a group or not. If you need hierarchical group structures, consider:

  1. Flattening the hierarchy during ingestion (e.g., if a user is in a child group, add them to all parent groups)
  2. Using naming conventions to indicate hierarchy (e.g., "engineering", "engineering-platform", "engineering-platform-data")
  3. Using domains or tags to represent organizational structure
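
The first option, flattening during ingestion, can be sketched as a walk up a child-to-parent mapping. The `expand_groups` helper and the `parent_of` dictionary are hypothetical; how you obtain the hierarchy depends on your identity provider.

```python
from typing import Dict, Iterable, Optional, Set


def expand_groups(direct_groups: Iterable[str], parent_of: Dict[str, str]) -> Set[str]:
    """Return the direct groups plus every ancestor group."""
    expanded: Set[str] = set()
    for group in direct_groups:
        current: Optional[str] = group
        # Walk upward until the root; the membership check also stops on cycles.
        while current is not None and current not in expanded:
            expanded.add(current)
            current = parent_of.get(current)
    return expanded


parent_of = {
    "engineering-platform-data": "engineering-platform",
    "engineering-platform": "engineering",
}

print(sorted(expand_groups(["engineering-platform-data"], parent_of)))
```

Each group in the returned set would then be written to the user's groupMembership aspect, so that membership stays flat in DataHub while still reflecting the source hierarchy.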

Technical Reference Guide

The sections above provide an overview of how to use this entity. The following sections provide detailed technical information about how metadata is stored and represented in DataHub.

Aspects are the individual pieces of metadata that can be attached to an entity. Each aspect contains specific information (like ownership, tags, or properties) and is stored as a separate record, allowing for flexible and incremental metadata updates.

Relationships show how this entity connects to other entities in the metadata graph. These connections are derived from the fields within each aspect and form the foundation of DataHub's knowledge graph.

Reading the Field Tables

Each aspect's field table includes an Annotations column that provides additional metadata about how fields are used:

  • ⚠️ Deprecated: This field is deprecated and may be removed in a future version. Check the description for the recommended alternative
  • Searchable: This field is indexed and can be searched in DataHub's search interface
  • Searchable (fieldname): When the field name in parentheses is shown, it indicates the field is indexed under a different name in the search index. For example, dashboardTool is indexed as tool
  • → RelationshipName: This field creates a relationship to another entity. The arrow indicates this field contains a reference (URN) to another entity, and the name indicates the type of relationship (e.g., → Contains, → OwnedBy)

Fields with complex types (like Edge, AuditStamp) link to their definitions in the Common Types section below.

Aspects

corpGroupKey

Key for a CorpGroup

| Field | Type | Required | Description | Annotations |
|---|---|---|---|---|
| name | string | | The URL-encoded name of the AD/LDAP group. Serves as a globally unique identifier within DataHub. | Searchable |

corpGroupInfo

Information about a Corp Group ingested from a third party source

| Field | Type | Required | Description | Annotations |
|---|---|---|---|---|
| displayName | string | | The name of the group. | Searchable |
| email | string | | email of this group | |
| admins | string[] | | owners of this group. Deprecated! Replaced by Ownership aspect. | ⚠️ Deprecated, → OwnedBy |
| members | string[] | | List of ldap urn in this group. Deprecated! Replaced by GroupMembership aspect. | ⚠️ Deprecated, → IsPartOf |
| groups | string[] | | List of groups in this group. Deprecated! This field is unused. | ⚠️ Deprecated, → IsPartOf |
| description | string | | A description of the group. | Searchable |
| slack | string | | Slack channel for the group | |
| created | AuditStamp | | Created Audit stamp | Searchable |

globalTags

Tag aspect used for applying tags to an entity

| Field | Type | Required | Description | Annotations |
|---|---|---|---|---|
| tags | TagAssociation[] | | Tags associated with a given entity | Searchable, → TaggedWith |

status

The lifecycle status metadata of an entity, e.g. dataset, metric, feature, etc. By convention, this aspect is used to represent soft deletes.

| Field | Type | Required | Description | Annotations |
|---|---|---|---|---|
| removed | boolean | | Whether the entity has been removed (soft-deleted). | Searchable |

corpGroupEditableInfo

Group information that can be edited from UI

| Field | Type | Required | Description | Annotations |
|---|---|---|---|---|
| description | string | | A description of the group | Searchable (editedDescription) |
| pictureLink | string | | A URL which points to a picture which user wants to set as the photo for the group | |
| slack | string | | Slack channel for the group | |
| email | string | | Email address to contact the group | |

ownership

Ownership information of an entity.

| Field | Type | Required | Description | Annotations |
|---|---|---|---|---|
| owners | Owner[] | | List of owners of the entity. | |
| ownerTypes | map | | Ownership type to Owners map, populated via mutation hook. | Searchable |
| lastModified | AuditStamp | | Audit stamp containing who last modified the record and when. A value of 0 in the time field indi... | |

origin

Carries information about where an entity originated from.

| Field | Type | Required | Description | Annotations |
|---|---|---|---|---|
| type | OriginType | | Where an entity originated from. Either NATIVE or EXTERNAL. | |
| externalType | string | | Only populated if type is EXTERNAL. The externalType of the entity, such as the name of the ident... | |

roleMembership

Carries information about which roles a user or group is assigned to.

| Field | Type | Required | Description | Annotations |
|---|---|---|---|---|
| roles | string[] | | | → IsMemberOfRole |

structuredProperties

Properties about an entity governed by StructuredPropertyDefinition

| Field | Type | Required | Description | Annotations |
|---|---|---|---|---|
| properties | StructuredPropertyValueAssignment[] | | Custom property bag. | |

forms

Forms that are assigned to this entity to be filled out

| Field | Type | Required | Description | Annotations |
|---|---|---|---|---|
| incompleteForms | FormAssociation[] | | All incomplete forms assigned to the entity. | Searchable |
| completedForms | FormAssociation[] | | All complete forms assigned to the entity. | Searchable |
| verifications | FormVerificationAssociation[] | | Verifications that have been applied to the entity via completed forms. | Searchable |

testResults

Information about a Test Result

| Field | Type | Required | Description | Annotations |
|---|---|---|---|---|
| failing | TestResult[] | | Results that are failing | Searchable, → IsFailing |
| passing | TestResult[] | | Results that are passing | Searchable, → IsPassing |

subTypes

Sub Types. Use this aspect to specialize a generic Entity e.g. Making a Dataset also be a View or also be a LookerExplore

| Field | Type | Required | Description | Annotations |
|---|---|---|---|---|
| typeNames | string[] | | The names of the specific types. | Searchable |

Common Types

These types are used across multiple aspects in this entity.

AuditStamp

Data captured on a resource/association/sub-resource level giving insight into when that resource/association/sub-resource moved into a particular lifecycle stage, and who acted to move it into that specific lifecycle stage.

Fields:

  • time (long): When did the resource/association/sub-resource move into the specific lifecyc...
  • actor (string): The entity (e.g. a member URN) which will be credited for moving the resource...
  • impersonator (string?): The entity (e.g. a service URN) which performs the change on behalf of the Ac...
  • message (string?): Additional context around how DataHub was informed of the particular change. ...

FormAssociation

Properties of an applied form.

Fields:

  • urn (string): Urn of the applied form
  • incompletePrompts (FormPromptAssociation[]): A list of prompts that are not yet complete for this form.
  • completedPrompts (FormPromptAssociation[]): A list of prompts that have been completed for this form.

TestResult

Information about a Test Result

Fields:

  • test (string): The urn of the test
  • type (TestResultType): The type of the result
  • testDefinitionMd5 (string?): The md5 of the test definition that was used to compute this result. See Test...
  • lastComputed (AuditStamp?): The audit stamp of when the result was computed, including the actor who comp...

Relationships

Self

These are the relationships to itself, stored in this entity's aspects

  • IsPartOf (via corpGroupInfo.groups)
  • OwnedBy (via ownership.owners.owner)

Outgoing

These are the relationships stored in this entity's aspects

  • OwnedBy

    • Corpuser via corpGroupInfo.admins
    • Corpuser via ownership.owners.owner
  • IsPartOf

    • Corpuser via corpGroupInfo.members
  • TaggedWith

    • Tag via globalTags.tags
  • ownershipType

    • OwnershipType via ownership.owners.typeUrn
  • IsMemberOfRole

    • DataHubRole via roleMembership.roles
  • IsFailing

    • Test via testResults.failing
  • IsPassing

    • Test via testResults.passing

Incoming

These are the relationships stored in other entities' aspects

  • Has

    • Role via actors.groups.group
  • OwnedBy

    • Dataset via ownership.owners.owner
    • DataJob via ownership.owners.owner
    • DataFlow via ownership.owners.owner
    • DataProcess via ownership.owners.owner
    • Chart via ownership.owners.owner
    • Dashboard via ownership.owners.owner
    • Notebook via ownership.owners.owner
  • IsMemberOfGroup

    • Corpuser via groupMembership.groups
  • IsMemberOfNativeGroup

    • Corpuser via nativeGroupMembership.nativeGroups
