Java SDK V2
The DataHub Java SDK V2 provides a modern, type-safe interface for interacting with DataHub's metadata platform. Built on top of the existing DataHub infrastructure, SDK V2 offers an intuitive, fluent API for creating and managing metadata entities.
Why SDK V2?
SDK V2 represents a significant evolution from the V1 emitter-based approach, offering:
Type-Safe Entity Builders
Leverage Java's type system with fluent builders that guide you to create valid entities. No more manual URN construction or aspect wiring.
Dataset dataset = Dataset.builder()
.platform("snowflake")
.name("my_database.my_schema.my_table")
.env("PROD")
.description("User profile dataset")
.build();
Simplified CRUD Operations
Perform create, read, update, and delete operations with a clean, intuitive API:
client.entities().upsert(dataset); // Create or update
client.entities().update(dataset); // Update with patches
Dataset loaded = client.entities().get(urn); // Read from server
Efficient Patch-Based Updates
Make incremental metadata changes without fetching or replacing entire aspects:
dataset.addTag("pii")
.addOwner("urn:li:corpuser:john", OwnershipType.TECHNICAL_OWNER)
.addCustomProperty("team", "data-engineering");
client.entities().update(dataset); // Applies only the changes
Lazy Loading & Caching
Efficiently fetch entity aspects on-demand with built-in TTL-based caching, reducing unnecessary network calls.
Mode-Aware Design
Supports both interactive SDK mode and high-throughput ingestion mode to fit your use case.
Installation
Add the DataHub client library to your project:
Gradle
dependencies {
implementation 'io.acryl:datahub-client:__version__'
}
Maven
<dependency>
<groupId>io.acryl</groupId>
<artifactId>datahub-client</artifactId>
<version>__version__</version>
</dependency>
Note: Check the Maven repository for the latest version.
Quick Start
Here's a complete example of creating a dataset with metadata using SDK V2:
import datahub.client.v2.DataHubClientV2;
import datahub.client.v2.entity.Dataset;
import com.linkedin.common.OwnershipType;
// Create the client
DataHubClientV2 client = DataHubClientV2.builder()
.server("http://localhost:8080")
.token("your-access-token") // Optional for authentication
.build();
// Build a dataset with metadata
Dataset dataset = Dataset.builder()
.platform("snowflake")
.name("analytics.public.user_events")
.env("PROD")
.description("User interaction events")
.displayName("User Events")
.build();
// Add tags and owners
dataset.addTag("pii")
.addTag("analytics")
.addOwner("urn:li:corpuser:datateam", OwnershipType.TECHNICAL_OWNER)
.addCustomProperty("retention", "90_days");
// Upsert to DataHub
client.entities().upsert(dataset);
System.out.println("Created dataset: " + dataset.getUrn());
// Close the client when done
client.close();
Core Concepts
Entities
SDK V2 provides entity classes for major DataHub entities:
- Dataset - Tables, views, and other data containers
- Chart - Visualizations and reports
- Dashboard - Collections of charts (coming soon)
Each entity offers a fluent builder and methods for managing metadata.
Client Operations
The DataHubClientV2 provides centralized access to operations:
entities()- CRUD operations for entitiestestConnection()- Verify connectivity to DataHub server
Patch-Based Updates
Instead of replacing entire aspects, SDK V2 uses patch operations to make surgical updates to specific metadata fields. This is more efficient and reduces the risk of overwriting concurrent changes.
Documentation
Explore detailed guides for working with SDK V2:
- Getting Started Guide - Comprehensive tutorial with authentication, configuration, and basic operations
- Design Principles - Architectural overview and engineering principles behind SDK V2
- DataHubClientV2 - Client configuration, connection management, and operation modes
- Entities Overview - Common patterns across all entity types
- Dataset Entity - Complete guide to working with datasets
- Chart Entity - Creating and managing chart entities
- Patch Operations - Deep dive into efficient incremental updates
- Migration from V1 - Step-by-step guide for upgrading from SDK V1
Example Code
Find complete, runnable examples in the examples directory:
- DatasetCreateExample.java - Basic dataset creation
- DatasetPatchExample.java - Adding tags, owners, and custom properties
- DatasetFullExample.java - Comprehensive metadata management
- ChartCreateExample.java - Chart entity creation
Comparison with V1
| Feature | V1 (RestEmitter) | V2 (DataHubClientV2) |
|---|---|---|
| Entity Creation | Manual MCP construction | Fluent entity builders |
| Type Safety | Low - manual aspect wiring | High - compile-time validation |
| URN Management | Manual string construction | Automatic from builder |
| Updates | Replace entire aspects | Patch-based incremental updates |
| API Style | Low-level emitter | High-level CRUD operations |
| Learning Curve | Steep - requires MCP knowledge | Gentle - intuitive builders |
See the detailed migration guide for help transitioning from V1 to V2.
Support
For questions, issues, or contributions:
- DataHub GitHub
- DataHub Slack
- API Tutorials - Language-agnostic guides with Java examples
What's Next?
- Getting Started: Follow the comprehensive tutorial to build your first application
- Learn the Architecture: Read about SDK V2's design principles
- Explore Entities: Dive into Dataset or Chart guides
- Migrate from V1: Use the migration guide to upgrade existing code