Skip to main content

Entities Overview

SDK V2 provides high-level entity classes that represent DataHub metadata entities. This guide covers common patterns across all entity types.

Supported Entities

SDK V2 supports all major DataHub entity types including Dataset, Chart, Dashboard, Container, DataFlow, DataJob, MLModel, and MLModelGroup. Each entity type has a dedicated guide with comprehensive examples and API documentation.

See the Entity Guides section in the sidebar for the complete list of available entities and their documentation.

Common Patterns

Entity Lifecycle

All entities follow a consistent lifecycle:

  1. Construction - Build entity using fluent builder
  2. Metadata Addition - Add tags, owners, properties via methods
  3. Persistence - Upsert or update to DataHub
  4. Loading - Fetch existing entity from server
// 1. Construction
Dataset dataset = Dataset.builder()
.platform("snowflake")
.name("my_table")
.build();

// 2. Metadata Addition
dataset.addTag("pii")
.addOwner("urn:li:corpuser:john", OwnershipType.TECHNICAL_OWNER);

// 3. Persistence
client.entities().upsert(dataset);

// 4. Loading
Dataset loaded = client.entities().get(datasetUrn);

URN Management

Each entity type has a strongly-typed URN class:

// Dataset
DatasetUrn datasetUrn = dataset.getDatasetUrn();

// Chart
ChartUrn chartUrn = chart.getChartUrn();

// Dashboard
DashboardUrn dashboardUrn = dashboard.getDashboardUrn();

// DataJob
DataJobUrn dataJobUrn = dataJob.getDataJobUrn();

// Generic access
Urn genericUrn = entity.getUrn();

URN Construction:

// Built automatically from builder
Dataset dataset = Dataset.builder()
.platform("snowflake")
.name("db.schema.table")
.env("PROD")
.build();
// URN: urn:li:dataset:(urn:li:dataPlatform:snowflake,db.schema.table,PROD)

Fluent Method Chaining

All mutation methods return this for method chaining:

dataset.addTag("pii")
.addTag("analytics")
.addOwner("urn:li:corpuser:owner1", OwnershipType.TECHNICAL_OWNER)
.addCustomProperty("team", "data-eng")
.setDomain("urn:li:domain:Finance");

Three Creation Modes

Mode 1: From Scratch (Builder)

Dataset dataset = Dataset.builder()
.platform("snowflake")
.name("my_table")
.description("My description")
.build();
// aspectCache populated with builder aspects
// pendingPatches empty

Mode 2: Loaded from Server

DatasetUrn urn = new DatasetUrn("snowflake", "my_table", "PROD");
Dataset dataset = client.entities().get(urn);
// aspectCache populated with server aspects
// Aspects have timestamps for freshness tracking

Mode 3: Reference with Lazy Loading

// Entity bound to client enables lazy loading
Dataset dataset = client.entities().reference(urn);
dataset.bindToClient(client, mode);
String desc = dataset.getDescription(); // Fetches on first access

Aspect Management

Aspect Caching

Entities cache aspects locally with TTL-based freshness:

// Cached aspects with 60-second default TTL
dataset.setCacheTtlMs(120000); // 2 minutes

// Check cache status
Map<String, RecordTemplate> aspects = dataset.getAllAspects();

Lazy Loading

When bound to a client, aspects are fetched on-demand:

Dataset dataset = client.entities().get(urn);
// aspectCache may not have all aspects

String description = dataset.getDescription();
// Triggers lazy fetch if not cached or expired

Patch-Based Operations

Pending Patches

Mutations accumulate as patches until save:

dataset.addTag("tag1");     // Creates patch
dataset.addTag("tag2"); // Creates patch
dataset.addOwner("user", OwnershipType.TECHNICAL_OWNER); // Creates patch

// Check pending patches
boolean hasPending = dataset.hasPendingPatches();
List<MetadataChangeProposal> patches = dataset.getPendingPatches();

// Emit all patches
client.entities().update(dataset);

// Patches cleared after emission
dataset.clearPendingPatches();

Understanding What Gets Sent

The upsert() method emits everything accumulated on the entity:

  1. Cached aspects (from builder)
  2. Full aspect replacements (from set*() methods)
  3. Patches (from add*/remove* methods)

What gets sent depends on how the entity was created and what operations you performed:

Builder with patches:

Dataset dataset = Dataset.builder()
.platform("snowflake")
.name("my_table")
.description("Description")
.build();
dataset.addTag("pii"); // Creates patch
client.entities().upsert(dataset); // Sends: cached aspects + tag patch

Patches only (loaded entity):

Dataset dataset = client.entities().get(urn);
dataset.addTag("pii"); // Creates patch
client.entities().upsert(dataset); // Sends: tag patch only

Full aspect replacement:

Dataset dataset = client.entities().get(urn);
dataset.setDescription("New description"); // Creates full aspect MCP
client.entities().upsert(dataset); // Sends: complete description aspect

Combined operations:

Dataset dataset = Dataset.builder()
.platform("snowflake")
.name("my_table")
.build();
dataset.setDescription("Description"); // Full aspect MCP
dataset.addTag("pii"); // Patch
dataset.addOwner("user", OwnershipType.TECHNICAL_OWNER); // Patch
client.entities().upsert(dataset); // Sends: cached aspects + description aspect + 2 patches

Mode-Aware Operations

Entities respect the client's operation mode:

// SDK mode client
DataHubClientV2 sdkClient = DataHubClientV2.builder()
.server("http://localhost:8080")
.operationMode(OperationMode.SDK) // Default
.build();

dataset.setDescription("User description");
sdkClient.entities().upsert(dataset);
// Writes to editableDatasetProperties

// INGESTION mode client
DataHubClientV2 ingestionClient = DataHubClientV2.builder()
.server("http://localhost:8080")
.operationMode(OperationMode.INGESTION)
.build();

dataset.setDescription("Ingested description");
ingestionClient.entities().upsert(dataset);
// Writes to datasetProperties

Common Metadata Operations

Tags

// Add tags
entity.addTag("pii");
entity.addTag("urn:li:tag:analytics");

// Remove tags
entity.removeTag("pii");

Owners

import com.linkedin.common.OwnershipType;

// Add owners
entity.addOwner("urn:li:corpuser:john", OwnershipType.TECHNICAL_OWNER);
entity.addOwner("urn:li:corpuser:jane", OwnershipType.DATA_STEWARD);

// Remove owners
entity.removeOwner("urn:li:corpuser:john");

Glossary Terms

// Add terms
entity.addTerm("urn:li:glossaryTerm:CustomerData");
entity.addTerm("urn:li:glossaryTerm:PII");

// Remove terms
entity.removeTerm("urn:li:glossaryTerm:CustomerData");

Domain

// Set domain
entity.setDomain("urn:li:domain:Marketing");

// Remove domain
entity.removeDomain();

Custom Properties

// Add custom properties
entity.addCustomProperty("team", "data-engineering");
entity.addCustomProperty("retention", "90_days");

// Remove custom properties
entity.removeCustomProperty("retention");

Error Handling

Handle exceptions gracefully:

try {
client.entities().upsert(dataset);
} catch (IOException e) {
// Network or serialization errors
log.error("I/O error: {}", e.getMessage());
} catch (ExecutionException e) {
// Server-side errors
log.error("Server error: {}", e.getCause().getMessage());
} catch (InterruptedException e) {
// Operation cancelled
Thread.currentThread().interrupt();
log.error("Operation interrupted");
}

Entity-Specific Guides

Each entity type has a dedicated guide with detailed information including:

  • URN structure and construction
  • Entity-specific properties and operations
  • Comprehensive code examples
  • Best practices and common patterns
  • Lineage and relationship management

See the Entity Guides section in the sidebar for the complete list of available entity documentation.

Example guides include Dataset Entity, Chart Entity, Dashboard Entity, and DataJob Entity.