
Java SDK V2

The DataHub Java SDK V2 provides a modern, type-safe interface for interacting with DataHub's metadata platform. Built on top of the existing DataHub infrastructure, SDK V2 offers an intuitive, fluent API for creating and managing metadata entities.

Why SDK V2?

SDK V2 represents a significant evolution from the V1 emitter-based approach, offering:

Type-Safe Entity Builders

Fluent builders use Java's type system to guide you toward valid entities, so you no longer construct URNs or wire up aspects by hand.

Dataset dataset = Dataset.builder()
    .platform("snowflake")
    .name("my_database.my_schema.my_table")
    .env("PROD")
    .description("User profile dataset")
    .build();
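For context on what "manual URN construction" involves, here is a minimal sketch of the standard DataHub dataset URN layout that the builder fields (platform, name, env) map onto. DatasetUrnSketch and datasetUrn are hypothetical names used only for illustration; they are not part of the SDK, and the exact URN the builder emits is assumed to follow this layout.

```java
// Illustrative only: DatasetUrnSketch is a hypothetical helper, not an SDK class.
final class DatasetUrnSketch {

    // Formats a dataset URN using DataHub's standard layout:
    // urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>)
    static String datasetUrn(String platform, String name, String env) {
        return String.format(
            "urn:li:dataset:(urn:li:dataPlatform:%s,%s,%s)", platform, name, env);
    }

    public static void main(String[] args) {
        // Same field values as the builder example above.
        System.out.println(
            datasetUrn("snowflake", "my_database.my_schema.my_table", "PROD"));
    }
}
```

With a builder, a typo in this string (a missing parenthesis, a misplaced comma) cannot happen, which is the point of the "automatic from builder" claim.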

Simplified CRUD Operations

Perform create, read, update, and delete operations with a clean, intuitive API:

client.entities().upsert(dataset);           // Create or update
client.entities().update(dataset);           // Update with patches
Dataset loaded = client.entities().get(urn); // Read from server

Efficient Patch-Based Updates

Make incremental metadata changes without fetching or replacing entire aspects:

dataset.addTag("pii")
    .addOwner("urn:li:corpuser:john", OwnershipType.TECHNICAL_OWNER)
    .addCustomProperty("team", "data-engineering");

client.entities().update(dataset); // Applies only the changes

Lazy Loading & Caching

Entity aspects are fetched on demand and held in a built-in TTL-based cache, which reduces unnecessary network calls.
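To make the idea concrete, the following is a minimal sketch of lazy loading with a TTL cache. It is not the SDK's implementation: TtlCacheSketch, its constructor parameters, and the loader callback are all hypothetical, and the clock is injected only so that expiry can be demonstrated without sleeping.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical illustration of a TTL cache; not thread-safe and not the SDK's cache.
final class TtlCacheSketch<K, V> {

    private static final class Entry<T> {
        final T value;
        final long expiresAtMillis;

        Entry(T value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<K, Entry<V>> entries = new HashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // e.g. System::currentTimeMillis

    TtlCacheSketch(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    // Returns the cached value while it is fresh; otherwise invokes the loader
    // (standing in for a network fetch) and caches the result for ttlMillis.
    V get(K key, Function<K, V> loader) {
        long now = clock.getAsLong();
        Entry<V> cached = entries.get(key);
        if (cached != null && now < cached.expiresAtMillis) {
            return cached.value;
        }
        V loaded = loader.apply(key);
        entries.put(key, new Entry<>(loaded, now + ttlMillis));
        return loaded;
    }
}
```

In this sketch, two reads of the same key within the TTL trigger one load; a read after the TTL has elapsed triggers a second load.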

Mode-Aware Design

Supports both interactive SDK mode and high-throughput ingestion mode to fit your use case.

Installation

Add the DataHub client library to your project:

Gradle

dependencies {
    implementation 'io.acryl:datahub-client:__version__'
}

Maven

<dependency>
    <groupId>io.acryl</groupId>
    <artifactId>datahub-client</artifactId>
    <version>__version__</version>
</dependency>

Note: Replace __version__ with an actual release number; check the Maven repository for the latest version.

Quick Start

Here's a complete example of creating a dataset with metadata using SDK V2:

import datahub.client.v2.DataHubClientV2;
import datahub.client.v2.entity.Dataset;
import com.linkedin.common.OwnershipType;

// Create the client
DataHubClientV2 client = DataHubClientV2.builder()
    .server("http://localhost:8080")
    .token("your-access-token") // Optional for authentication
    .build();

// Build a dataset with metadata
Dataset dataset = Dataset.builder()
    .platform("snowflake")
    .name("analytics.public.user_events")
    .env("PROD")
    .description("User interaction events")
    .displayName("User Events")
    .build();

// Add tags and owners
dataset.addTag("pii")
    .addTag("analytics")
    .addOwner("urn:li:corpuser:datateam", OwnershipType.TECHNICAL_OWNER)
    .addCustomProperty("retention", "90_days");

// Upsert to DataHub
client.entities().upsert(dataset);

System.out.println("Created dataset: " + dataset.getUrn());

// Close the client when done
client.close();

Core Concepts

Entities

SDK V2 provides entity classes for major DataHub entities:

  • Dataset - Tables, views, and other data containers
  • Chart - Visualizations and reports
  • Dashboard - Collections of charts (coming soon)

Each entity offers a fluent builder and methods for managing metadata.

Client Operations

DataHubClientV2 is the single entry point for client operations:

  • entities() - CRUD operations for entities
  • testConnection() - Verify connectivity to DataHub server

Patch-Based Updates

Instead of replacing entire aspects, SDK V2 uses patch operations to make surgical updates to specific metadata fields. This is more efficient and reduces the risk of overwriting concurrent changes.
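The difference matters most when two writers touch the same aspect. The sketch below models this with an in-memory stand-in for a server-side tags aspect; PatchVsReplaceSketch and its methods are hypothetical names for illustration and do not reflect how the SDK or server is actually implemented.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical in-memory model of a tags aspect; not SDK code.
final class PatchVsReplaceSketch {

    // Stand-in for the server-side copy of the aspect.
    private final Set<String> serverTags = new LinkedHashSet<>();

    // Full replacement: whatever the writer sends becomes the entire aspect.
    void replaceTags(Set<String> newTags) {
        serverTags.clear();
        serverTags.addAll(newTags);
    }

    // Patch: only the named change is applied; other tags are left untouched.
    void patchAddTag(String tag) {
        serverTags.add(tag);
    }

    // Returns a snapshot copy, as a client would see after a read.
    Set<String> tags() {
        return new LinkedHashSet<>(serverTags);
    }
}
```

If writers A and B both read the (empty) aspect, add "pii" and "analytics" to their local copies, and then each call replaceTags, the second write wins and "pii" is lost. If each calls patchAddTag instead, both tags survive regardless of ordering.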

Documentation

Explore detailed guides for working with SDK V2:

Example Code

Find complete, runnable examples in the examples directory:

Comparison with V1

Feature | V1 (RestEmitter) | V2 (DataHubClientV2)
Entity Creation | Manual MCP construction | Fluent entity builders
Type Safety | Low - manual aspect wiring | High - compile-time validation
URN Management | Manual string construction | Automatic from builder
Updates | Replace entire aspects | Patch-based incremental updates
API Style | Low-level emitter | High-level CRUD operations
Learning Curve | Steep - requires MCP knowledge | Gentle - intuitive builders

See the detailed migration guide for help transitioning from V1 to V2.

Support

For questions, issues, or contributions:

What's Next?