CorpUser
CorpUser represents an individual user (or account) in the enterprise. These entities serve as the identity layer within DataHub, representing people who interact with data assets, own resources, belong to groups, and have roles and permissions within the organization. CorpUsers can represent LDAP users, Active Directory accounts, SSO identities, or native DataHub users.
Identity
CorpUsers are uniquely identified by a single piece of information:
- Username: A unique identifier for the user within the organization. This is typically sourced from the corporate identity provider (LDAP, Active Directory, etc.) or can be an email address for native DataHub users.
The URN structure for CorpUser is:
urn:li:corpuser:<username>
Examples
urn:li:corpuser:jdoe
urn:li:corpuser:john.doe@company.com
urn:li:corpuser:jdoe@company.com
The username is stored in the corpUserKey aspect, which is the identity aspect for this entity. The username field is marked as searchable and enables autocomplete functionality in the DataHub UI.
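The URN structure above can be sketched with plain string handling. The helper names here (make_corpuser_urn, parse_corpuser_urn) are hypothetical and only illustrate the format; the DataHub Python SDK provides its own make_user_urn helper in datahub.emitter.mce_builder.

```python
CORPUSER_PREFIX = "urn:li:corpuser:"


def make_corpuser_urn(username: str) -> str:
    """Build a CorpUser URN from a username (hypothetical helper)."""
    if not username:
        raise ValueError("username must be non-empty")
    return f"{CORPUSER_PREFIX}{username}"


def parse_corpuser_urn(urn: str) -> str:
    """Return the username part of a CorpUser URN (hypothetical helper)."""
    if not urn.startswith(CORPUSER_PREFIX) or len(urn) == len(CORPUSER_PREFIX):
        raise ValueError(f"not a CorpUser URN: {urn!r}")
    return urn[len(CORPUSER_PREFIX):]


print(make_corpuser_urn("jdoe"))
print(parse_corpuser_urn("urn:li:corpuser:john.doe@company.com"))
```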
Username Conventions
The username can follow various conventions depending on your organization's identity provider:
- LDAP/Active Directory usernames: jdoe, john.doe, john_doe
- Email addresses: jdoe@company.com, john.doe@company.com
- SSO identities: Depends on your SSO provider's username format
- Native DataHub users: Typically email addresses
It's important to maintain consistency in username formats across your DataHub deployment to ensure proper identity resolution and relationship tracking.
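One way to keep username formats consistent is to normalize raw identifiers before building URNs. The sketch below assumes a house policy of trimming whitespace and lowercasing, with an optional step that drops the email domain. Both the policy and the normalize_username helper are illustrative assumptions, not DataHub behavior.

```python
def normalize_username(raw: str, strip_domain: bool = False) -> str:
    """Normalize a username under an assumed policy:
    trim whitespace, lowercase, and optionally drop the email domain."""
    username = raw.strip().lower()
    if strip_domain and "@" in username:
        username = username.split("@", 1)[0]
    if not username:
        raise ValueError("username is empty after normalization")
    return username


for raw in ["JDoe", " john.doe@Company.com "]:
    print(f"urn:li:corpuser:{normalize_username(raw)}")
```

Applying the same function in every ingestion path is what prevents the same person from ending up with two distinct CorpUser entities.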
Important Capabilities
Profile Information
The core profile information about a user is stored in the corpUserInfo aspect. This is typically populated automatically by ingestion connectors from identity providers like LDAP, Active Directory, Azure AD, Okta, or other SSO systems.
Key Fields:
- displayName: The user's name as it should appear in the UI
- email: Email address for contacting the user
- title: Job title (e.g., "Senior Data Engineer")
- firstName and lastName: Components of the user's name
- fullName: Full name typically formatted as "firstName lastName"
- managerUrn: URN reference to the user's direct manager (another CorpUser)
- departmentId and departmentName: Organizational department information
- countryCode: Two-letter country code (e.g., "US", "UK")
- active: Whether the user account is active (deprecated in favor of corpUserStatus)
- system: Whether this is a system/service account rather than a human user
The managerUrn field creates a relationship between users, enabling organizational hierarchy visualization in DataHub.
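To illustrate how managerUrn links form a hierarchy, the sketch below walks a reporting chain over an in-memory mapping of user URN to manager URN. The mapping and the reporting_chain helper are hypothetical; in practice the values would come from each user's corpUserInfo aspect.

```python
from typing import Dict, List, Optional


def reporting_chain(user_urn: str, manager_of: Dict[str, Optional[str]]) -> List[str]:
    """Follow managerUrn links upward, stopping at the top or on a cycle."""
    chain: List[str] = []
    seen = {user_urn}
    current = manager_of.get(user_urn)
    while current is not None and current not in seen:
        chain.append(current)
        seen.add(current)
        current = manager_of.get(current)
    return chain


# Hypothetical data: user URN -> managerUrn taken from corpUserInfo
manager_of = {
    "urn:li:corpuser:jdoe": "urn:li:corpuser:jsmith",
    "urn:li:corpuser:jsmith": "urn:li:corpuser:ceo",
    "urn:li:corpuser:ceo": None,
}
print(reporting_chain("urn:li:corpuser:jdoe", manager_of))
```

The cycle guard matters because identity-provider data can contain users who are, through bad data, their own indirect managers.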
Editable User Information
The corpUserEditableInfo aspect contains information that users can modify through the DataHub UI, allowing users to enrich their profiles beyond what's provided by the identity provider.
Key Fields:
- aboutMe: A personal description or bio
- displayName: User-specified display name (overrides the one from corpUserInfo)
- title: User-specified title (overrides the one from corpUserInfo)
- teams: List of team names the user belongs to (e.g., ["Data Platform", "Analytics"])
- skills: List of skills the user possesses (e.g., ["Python", "SQL", "Machine Learning"])
- pictureLink: URL to a profile picture
- slack: Slack handle for communication
- phone: Contact phone number
- email: Contact email (can differ from the system email)
- platforms: URNs of data platforms the user commonly works with
- persona: URN of the user's DataHub persona (role-based persona like "Data Analyst")
User Status
The corpUserStatus aspect tracks the current status of the user account, replacing the deprecated active field in corpUserInfo.
Key Fields:
- status: Current status of the user (e.g., "ACTIVE", "SUSPENDED", "PROVISIONED")
- lastModified: Audit stamp with information about who last modified the status and when
This aspect provides more granular control over user account states compared to the simple boolean active field.
Group Membership
Users can be members of groups through two different aspects:
groupMembership: Represents membership in CorpGroups that may be managed within DataHub or synchronized from external systems. This creates IsMemberOfGroup relationships.
nativeGroupMembership: Represents membership in groups that are native to an external identity provider (like Active Directory groups). This creates IsMemberOfNativeGroup relationships.
Both aspects store arrays of group URNs, allowing users to belong to multiple groups simultaneously.
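Because membership is split across two aspects, a consumer usually needs to merge them. The following is a minimal sketch that works on aspect payloads represented as plain dictionaries. The field names groups and nativeGroups, and the all_group_urns helper, are assumptions made for illustration.

```python
from typing import Any, Dict, List, Optional


def all_group_urns(
    group_membership: Optional[Dict[str, Any]],
    native_group_membership: Optional[Dict[str, Any]],
) -> List[str]:
    """Merge group URNs from both aspects, de-duplicated, order preserved."""
    groups = (group_membership or {}).get("groups", [])
    native = (native_group_membership or {}).get("nativeGroups", [])
    # dict.fromkeys keeps first-seen order while dropping duplicates
    return list(dict.fromkeys([*groups, *native]))


merged = all_group_urns(
    {"groups": ["urn:li:corpGroup:data-engineering"]},
    {"nativeGroups": ["urn:li:corpGroup:analytics-team", "urn:li:corpGroup:data-engineering"]},
)
print(merged)
```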
Role Membership
The roleMembership aspect associates users with DataHub roles, which define their permissions and access within the platform.
Key Fields:
- roles: Array of DataHubRole URNs that the user is assigned to
This creates IsMemberOfRole relationships and is fundamental to DataHub's role-based access control (RBAC) system.
Authentication Credentials
The corpUserCredentials aspect stores authentication information for native DataHub users (users created directly in DataHub rather than synchronized from an external identity provider).
Key Fields:
- salt: Salt used for password hashing
- hashedPassword: The hashed password
- passwordResetToken: Optional token for password reset operations
- passwordResetTokenExpirationTimeMillis: When the reset token expires
This aspect is only used for native authentication and is not populated for users authenticated through SSO or LDAP.
User Settings
The corpUserSettings aspect stores user-specific preferences for the DataHub UI and features.
Key Fields:
- appearance: Settings controlling the look and feel of the DataHub UI
  - showSimplifiedHomepage: Whether to show a simplified homepage with only datasets, charts, and dashboards
  - showThemeV2: Whether to use the V2 theme
- views: Settings for the Views feature
  - defaultView: The user's default DataHub view
- notificationSettings: Preferences for notifications
- homePage: Settings for the home page experience
  - pageTemplate: The user's default page template
  - dismissedAnnouncementUrns: List of announcements the user has dismissed
Origin
The origin aspect tracks where the user entity originated from, distinguishing between native DataHub users and those synchronized from external systems.
Key Fields:
- type: Either "NATIVE" or "EXTERNAL"
- externalType: Name of the external identity provider (e.g., "AzureAD", "Okta", "LDAP")
This information is useful for understanding the source of truth for user data and managing synchronization processes.
Slack Integration
The slackUserInfo aspect contains detailed information about a user's Slack identity, enabling rich Slack integration features within DataHub.
Key Fields:
- slackInstance: URN of the Slack workspace
- id: Unique Slack member ID
- name: Slack username
- realName: Real name in Slack
- displayName: Display name in Slack
- email: Email associated with the Slack account
- teamId: Slack team/workspace ID
- isDeleted, isAdmin, isOwner, isPrimaryOwner, isBot: Account status flags
- timezone and timezoneOffset: User's timezone information
- title: Job title from Slack
- phone: Phone number from Slack
- profilePictureUrl: URL to Slack profile picture
- statusText and statusEmoji: Current Slack status
Tags, Structured Properties, and Forms
Like other DataHub entities, CorpUsers support:
- globalTags: Tags attached to the user entity for categorization
- structuredProperties: Custom properties defined by your organization's data model
- forms: Forms that can be attached to users for collecting structured information
- status: Generic status aspect for soft-deletion
These common aspects enable flexible metadata management and integration with DataHub's broader metadata framework.
Code Examples
Creating a CorpUser
The simplest way to create a CorpUser is using the high-level Python SDK:
Python SDK: Create a basic user
# metadata-ingestion/examples/library/corpuser_create_basic.py
import logging
import os

from datahub.api.entities.corpuser.corpuser import CorpUser
from datahub.ingestion.graph.client import DataHubGraph, DataHubGraphConfig

log = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)

# Create a basic user with essential information
user = CorpUser(
    id="jdoe",
    display_name="John Doe",
    email="jdoe@company.com",
    title="Senior Data Engineer",
    first_name="John",
    last_name="Doe",
    full_name="John Doe",
    department_name="Data Engineering",
    country_code="US",
)

# Create graph client
gms_server = os.getenv("DATAHUB_GMS_URL", "http://localhost:8080")
token = os.getenv("DATAHUB_GMS_TOKEN")
datahub_graph = DataHubGraph(DataHubGraphConfig(server=gms_server, token=token))

# Emit the user entity
for event in user.generate_mcp():
    datahub_graph.emit(event)

log.info(f"Created user {user.urn}")
Creating a CorpUser with Group Memberships
Users are often members of groups. Here's how to create a user and assign them to groups:
Python SDK: Create user with group memberships
# metadata-ingestion/examples/library/corpuser_create_with_groups.py
import logging

from datahub.api.entities.corpuser.corpuser import CorpUser
from datahub.ingestion.graph.client import DataHubGraph, DataHubGraphConfig

log = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)

# Create a user with group memberships
user = CorpUser(
    id="jsmith",
    display_name="Jane Smith",
    email="jsmith@company.com",
    title="Data Analyst",
    first_name="Jane",
    last_name="Smith",
    full_name="Jane Smith",
    department_name="Analytics",
    country_code="US",
    groups=["data-engineering", "analytics-team"],
)

# Create graph client
datahub_graph = DataHubGraph(DataHubGraphConfig(server="http://localhost:8080"))

# Emit the user entity with group memberships
for event in user.generate_mcp():
    datahub_graph.emit(event)

log.info(f"Created user {user.urn} with group memberships")
Updating User Profile Information
To update editable profile information for an existing user:
Python SDK: Update user profile
# metadata-ingestion/examples/library/corpuser_update_profile.py
import logging

from datahub.api.entities.corpuser.corpuser import CorpUser, CorpUserGenerationConfig
from datahub.ingestion.graph.client import DataHubGraph, DataHubGraphConfig

log = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)

# Update a user's editable profile information
user = CorpUser(
    id="jdoe",
    email="jdoe@company.com",
    description="Passionate about data quality and building reliable data pipelines. "
    "10+ years of experience in data engineering.",
    slack="@jdoe",
    phone="+1-555-0123",
    picture_link="https://company.com/photos/jdoe.jpg",
)

# Create graph client
datahub_graph = DataHubGraph(DataHubGraphConfig(server="http://localhost:8080"))

# Emit with override_editable=True to update editable fields
for event in user.generate_mcp(
    generation_config=CorpUserGenerationConfig(override_editable=True)
):
    datahub_graph.emit(event)

log.info(f"Updated profile for user {user.urn}")
Adding Tags to a User
Users can be tagged for categorization and discovery:
Python SDK: Add tags to a user
# metadata-ingestion/examples/library/corpuser_add_tag.py
import logging

from datahub.emitter.mce_builder import make_tag_urn, make_user_urn
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.ingestion.graph.client import DataHubGraph, DataHubGraphConfig
from datahub.metadata.schema_classes import GlobalTagsClass, TagAssociationClass

log = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)

# User to add tag to
user_urn = make_user_urn("jdoe")

# Tag to add
tag_urn = make_tag_urn("DataEngineering")

# Create graph client
datahub_graph = DataHubGraph(DataHubGraphConfig(server="http://localhost:8080"))

# Read current tags
current_tags = datahub_graph.get_aspect(
    entity_urn=user_urn, aspect_type=GlobalTagsClass
)

# Initialize tags if they don't exist
if current_tags is None:
    current_tags = GlobalTagsClass(tags=[])

# Check if tag already exists
tag_exists = any(tag.tag == tag_urn for tag in current_tags.tags)

if not tag_exists:
    # Add the new tag
    new_tag = TagAssociationClass(tag=tag_urn)
    current_tags.tags.append(new_tag)

    # Create MCP to update the tags
    mcp = MetadataChangeProposalWrapper(
        entityUrn=user_urn,
        aspect=current_tags,
    )

    # Emit the change
    datahub_graph.emit(mcp)
    log.info(f"Added tag {tag_urn} to user {user_urn}")
else:
    log.info(f"Tag {tag_urn} already exists on user {user_urn}")
Querying Users via REST API
You can fetch user information using the REST API:
REST API: Get user information
# Get a user by URN
curl -X GET "http://localhost:8080/entities/urn%3Ali%3Acorpuser%3Ajdoe" \
  -H "Authorization: Bearer <your-access-token>"

# Get specific aspects of a user
curl -X GET "http://localhost:8080/aspects/urn%3Ali%3Acorpuser%3Ajdoe?aspect=corpUserInfo&aspect=corpUserEditableInfo&aspect=groupMembership" \
  -H "Authorization: Bearer <your-access-token>"
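The URNs in the paths above are percent-encoded (":" becomes "%3A"). A small sketch of building the same URLs from Python with the standard library follows. The entity_url and aspects_url helpers are hypothetical names, and the URL shapes are simply copied from the curl commands above.

```python
from typing import List
from urllib.parse import quote, urlencode


def entity_url(base_url: str, urn: str) -> str:
    """URL for fetching an entity; the URN is fully percent-encoded."""
    return f"{base_url.rstrip('/')}/entities/{quote(urn, safe='')}"


def aspects_url(base_url: str, urn: str, aspects: List[str]) -> str:
    """URL for fetching specific aspects, one 'aspect' parameter per name."""
    query = urlencode([("aspect", name) for name in aspects])
    return f"{base_url.rstrip('/')}/aspects/{quote(urn, safe='')}?{query}"


print(entity_url("http://localhost:8080", "urn:li:corpuser:jdoe"))
print(aspects_url("http://localhost:8080", "urn:li:corpuser:jdoe", ["corpUserInfo", "groupMembership"]))
```

Passing safe='' to quote is what forces ":" and "@" to be encoded, which matters for email-style usernames.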
Searching for Users
You can search for users using the GraphQL API or search API:
GraphQL: Search for users
query searchUsers {
  search(input: { type: CORP_USER, query: "john", start: 0, count: 10 }) {
    start
    count
    total
    searchResults {
      entity {
        ... on CorpUser {
          urn
          username
          properties {
            displayName
            email
            title
            fullName
          }
          editableProperties {
            aboutMe
            teams
            skills
            slack
          }
        }
      }
    }
  }
}
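A query like the one above can be sent from a script as a JSON POST. The sketch below only builds the request object and does not send it. It rests on several assumptions: that the GraphQL endpoint is served at /api/graphql, that the input type is named SearchInput, and that a bearer token is accepted in the Authorization header. The build_search_request helper is a hypothetical name.

```python
import json
from typing import Optional
from urllib.request import Request

# Parameterized variant of the search query; SearchInput is an assumed type name
SEARCH_USERS_QUERY = """
query searchUsers($input: SearchInput!) {
  search(input: $input) {
    total
    searchResults {
      entity {
        ... on CorpUser {
          urn
          username
        }
      }
    }
  }
}
"""


def build_search_request(
    base_url: str, text: str, token: Optional[str] = None, count: int = 10
) -> Request:
    """Build (but do not send) a GraphQL POST request that searches for users."""
    payload = {
        "query": SEARCH_USERS_QUERY,
        "variables": {
            "input": {"type": "CORP_USER", "query": text, "start": 0, "count": count}
        },
    }
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return Request(
        f"{base_url.rstrip('/')}/api/graphql",  # assumed endpoint path
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


request = build_search_request("http://localhost:8080", "john")
print(request.full_url)
```

Passing the search text as a variable rather than interpolating it into the query string avoids quoting problems when the text contains special characters. The request can then be sent with urllib.request.urlopen(request).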
Integration Points
Relationships with Other Entities
CorpUsers have several important relationships with other DataHub entities:
Ownership Relationships:
- CorpUsers can be owners of datasets, dashboards, charts, data flows, and virtually any other entity in DataHub
- The ownership relationship includes the owner type (e.g., DATAOWNER, TECHNICAL_OWNER, BUSINESS_OWNER)
Group Relationships:
- Users belong to CorpGroups through IsMemberOfGroup relationships
- Groups can also be owners of assets, providing inherited ownership
Role Relationships:
- Users are assigned to DataHub roles through IsMemberOfRole relationships
- Roles define permissions and access levels within DataHub
Organizational Hierarchy:
- The managerUrn field in corpUserInfo creates ReportsTo relationships
- This enables visualization of organizational structure and reporting chains
Platform Usage:
- The platforms field in corpUserEditableInfo creates IsUserOf relationships
- This helps identify which platforms users commonly work with
Persona Assignment:
- Users can be assigned to DataHub personas through the persona field
- This helps categorize users by their role and customize their experience
Identity Provider Integration
CorpUsers are typically synchronized from external identity providers:
LDAP/Active Directory:
- Most organizations use LDAP connectors to automatically synchronize user information
- The username typically corresponds to the LDAP uid or sAMAccountName
- Profile information is populated from LDAP attributes
SSO Providers (Okta, Azure AD, etc.):
- SSO integrations can provision users automatically on first login
- User attributes from the SSO provider populate the corpUserInfo aspect
- The origin aspect tracks the SSO provider as the source
Native DataHub Users:
- Users can be created directly in DataHub for testing or small deployments
- These users have credentials stored in the corpUserCredentials aspect
- They are marked with origin.type = NATIVE
Authentication and Authorization
CorpUsers are central to DataHub's security model:
Authentication:
- Native users authenticate with username/password
- SSO users authenticate through their identity provider
- API tokens can be associated with users for programmatic access
Authorization (RBAC):
- Users are assigned to roles through the roleMembership aspect
- Roles define what actions users can perform
- Policies can reference users or groups to grant/restrict access
Metadata Access:
- Users can only see metadata they have permission to view
- Ownership and group membership influence what users can edit
- Policies can be user-specific or group-based
Notable Exceptions
System Users
CorpUsers can represent both human users and system/service accounts. The system field in corpUserInfo distinguishes between these:
- Human Users (system: false): Actual people who interact with DataHub
- System Accounts (system: true): Service accounts, automated processes, or system-level operations
System users should be marked appropriately to distinguish them in reports, ownership lists, and access reviews.
Deprecated Active Field
The active field in corpUserInfo is deprecated. Use the corpUserStatus aspect instead, which provides:
- More granular status options beyond just active/inactive
- Audit information about status changes
- Better support for provisioning workflows
When working with users, prefer checking corpUserStatus.status over corpUserInfo.active.
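The preference described above can be expressed as a small fallback check. This sketch treats both aspects as plain dictionaries and uses a hypothetical is_user_active helper. Treating a user with neither field as inactive is an assumption, chosen here as the conservative default.

```python
from typing import Any, Dict, Optional


def is_user_active(
    corp_user_status: Optional[Dict[str, Any]],
    corp_user_info: Optional[Dict[str, Any]],
) -> bool:
    """Prefer corpUserStatus.status; fall back to the deprecated active flag."""
    if corp_user_status and "status" in corp_user_status:
        return corp_user_status["status"] == "ACTIVE"
    if corp_user_info and "active" in corp_user_info:
        return bool(corp_user_info["active"])
    return False  # assumed default when neither aspect carries a value


print(is_user_active({"status": "SUSPENDED"}, {"active": True}))
print(is_user_active(None, {"active": True}))
```

In the first call the newer status wins even though the deprecated flag still says the account is active.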
Username Immutability
The username (in corpUserKey) is immutable once a user is created. If a user's username changes in the source system:
- A new CorpUser entity must be created with the new username
- Ownership and other relationships need to be migrated to the new entity
- The old user can be soft-deleted using the status aspect
Plan your username strategy carefully to avoid frequent username changes.
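The migration step can be pictured as rewriting owner lists. The sketch below shows only that rewrite logic on in-memory data. The asset URN, the owners_by_asset mapping, and the migrate_owner helper are all hypothetical, and a real migration would also need to read and write each asset's ownership aspect through the DataHub API.

```python
from typing import Dict, List


def migrate_owner(
    owners_by_asset: Dict[str, List[str]], old_urn: str, new_urn: str
) -> Dict[str, List[str]]:
    """Return a copy of the mapping with old_urn replaced by new_urn."""
    migrated: Dict[str, List[str]] = {}
    for asset_urn, owners in owners_by_asset.items():
        replaced = [new_urn if owner == old_urn else owner for owner in owners]
        # Drop duplicates in case the new user already owned the asset
        migrated[asset_urn] = list(dict.fromkeys(replaced))
    return migrated


owners_by_asset = {
    "urn:li:dataset:(urn:li:dataPlatform:hive,db.table,PROD)": [
        "urn:li:corpuser:jdoe",
        "urn:li:corpuser:john.doe",
    ],
}
result = migrate_owner(owners_by_asset, "urn:li:corpuser:jdoe", "urn:li:corpuser:john.doe")
print(result)
```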
Display Name Precedence
Display names can appear in multiple aspects with this precedence:
1. corpUserEditableInfo.displayName (user-specified, highest priority)
2. corpUserInfo.displayName (from identity provider)
3. corpUserInfo.fullName (fallback if no display name is set)
The DataHub UI resolves these in order, showing the most specific value available.
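The precedence can be sketched as a first-non-empty lookup over the two aspects, represented here as plain dictionaries. The resolve_display_name helper is hypothetical, and the final fallback to the username is an assumption added for completeness rather than documented UI behavior.

```python
from typing import Any, Dict, Optional


def resolve_display_name(
    editable_info: Optional[Dict[str, Any]],
    info: Optional[Dict[str, Any]],
    username: str,
) -> str:
    """Return the first non-empty name in the documented precedence order."""
    candidates = [
        (editable_info or {}).get("displayName"),  # user-specified
        (info or {}).get("displayName"),  # from identity provider
        (info or {}).get("fullName"),  # fallback
    ]
    for candidate in candidates:
        if candidate:
            return candidate
    return username  # assumed last resort, not part of the documented order


print(resolve_display_name({"displayName": "Johnny"}, {"displayName": "John Doe"}, "jdoe"))
print(resolve_display_name({}, {"fullName": "John Doe"}, "jdoe"))
```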
Technical Reference
For technical details about fields, searchability, and relationships, view the Columns tab in DataHub.