Lineage Client
The DataHub Lineage Client adds lineage information to DataHub and retrieves it.
If you’re looking for a higher-level introduction to adding and getting lineage using the SDK, see the lineage guide.
LineageClient
Bases: object
- Parameters:
  - client (DataHubClient)
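A minimal sketch of constructing the client, assuming the DataHub Python SDK is installed. The import paths (`datahub.sdk` and `datahub.sdk.lineage_client`), the `server` and `token` arguments, and the helper name `make_lineage_client` are assumptions that this page does not state; only the `client` parameter comes from the signature above.

```python
def make_lineage_client(server: str, token: str):
    """Build a LineageClient bound to one DataHub instance (hypothetical helper)."""
    # Imports are deferred so this module still loads when the SDK is absent.
    # Both import paths are assumptions; adjust them to your installed version.
    from datahub.sdk import DataHubClient
    from datahub.sdk.lineage_client import LineageClient

    # Assumed constructor arguments for DataHubClient.
    client = DataHubClient(server=server, token=token)
    # `client` is the one constructor parameter documented on this page.
    return LineageClient(client=client)
```

The later sketches on this page take a `lineage` argument, which can be any object exposing the documented methods, such as the value returned here.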
add_datajob_lineage(*, datajob, upstreams=None, downstreams=None)
Add lineage between a datajob and datasets/datajobs.
- Parameters:
  - datajob (Union[str, DataJobUrn]) – The datajob URN to connect lineage with
  - upstreams (Optional[List[Union[str, DatasetUrn, DataJobUrn]]]) – List of upstream datasets or datajobs that serve as inputs to the datajob
  - downstreams (Optional[List[Union[str, DatasetUrn]]]) – List of downstream datasets that are outputs of the datajob
- Return type: None
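As an illustration of the call shape, the sketch below formats URN strings and forwards them to `add_datajob_lineage` using the keyword-only arguments listed above. The helpers `dataset_urn`, `datajob_urn`, and `connect_datajob` are hypothetical names, and `lineage` is assumed to be a `LineageClient` or any object with the same method.

```python
from typing import Iterable


def dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Format a dataset URN string (hypothetical helper)."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


def datajob_urn(orchestrator: str, flow_id: str, job_id: str, cluster: str = "PROD") -> str:
    """Format a datajob URN string nested inside its dataflow URN (hypothetical helper)."""
    return f"urn:li:dataJob:(urn:li:dataFlow:({orchestrator},{flow_id},{cluster}),{job_id})"


def connect_datajob(lineage, job: str, inputs: Iterable[str] = (), outputs: Iterable[str] = ()) -> None:
    """Register the inputs and outputs of one datajob through a LineageClient-like object."""
    upstreams = list(inputs)
    downstreams = list(outputs)
    lineage.add_datajob_lineage(
        datajob=job,
        # Pass None rather than an empty list, matching the documented defaults.
        upstreams=upstreams or None,
        downstreams=downstreams or None,
    )
```

For example, `connect_datajob(lineage, datajob_urn("airflow", "daily_etl", "load_orders"), inputs=[dataset_urn("snowflake", "raw.orders")])` would send one upstream dataset and no downstreams.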
add_dataset_copy_lineage(*, upstream, downstream, column_lineage='auto_fuzzy')
- Parameters:
  - upstream (Union[str, DatasetUrn])
  - downstream (Union[str, DatasetUrn])
  - column_lineage (Union[None, Dict[str, List[str]], Literal['auto_fuzzy', 'auto_strict']])
- Return type: None
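The `column_lineage` argument of `add_dataset_copy_lineage` accepts three shapes: `None`, one of the two literal modes, or a mapping from a column name to a list of column names. The sketch below is a local pre-check of those shapes only; `check_copy_column_lineage` is a hypothetical helper, not part of the SDK, and this page does not say which side of the mapping is upstream, so the check stays agnostic about that.

```python
from typing import Dict, List, Union

AUTO_MODES = ("auto_fuzzy", "auto_strict")
ColumnLineage = Union[None, str, Dict[str, List[str]]]


def check_copy_column_lineage(value: ColumnLineage) -> ColumnLineage:
    """Return value unchanged if it has a documented shape, else raise ValueError."""
    if value is None:
        return value
    if isinstance(value, str):
        if value not in AUTO_MODES:
            raise ValueError(f"unknown mode {value!r}; expected one of {AUTO_MODES}")
        return value
    if isinstance(value, dict):
        for column, others in value.items():
            names_ok = isinstance(others, list) and all(isinstance(o, str) for o in others)
            if not isinstance(column, str) or not names_ok:
                raise ValueError(f"bad mapping entry for {column!r}")
        return value
    # Booleans and other types are not listed for this method.
    raise ValueError(f"unsupported column_lineage value: {value!r}")
```

A caller could run this check before passing the value on as `column_lineage=...`, which surfaces a malformed mapping before any request is made.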
add_dataset_transform_lineage(*, upstream, downstream, column_lineage=None, transformation_text=None)
- Parameters:
  - upstream (Union[str, DatasetUrn])
  - downstream (Union[str, DatasetUrn])
  - column_lineage (Optional[Dict[str, List[str]]])
  - transformation_text (Optional[str])
- Return type: None
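A short sketch of calling `add_dataset_transform_lineage` with the keyword-only arguments listed above. The wrapper `record_transform` and its whitespace normalization are illustrative assumptions; `lineage` stands for a `LineageClient` or any object with the same method.

```python
from typing import Dict, List, Optional


def record_transform(
    lineage,
    upstream: str,
    downstream: str,
    sql: Optional[str] = None,
    columns: Optional[Dict[str, List[str]]] = None,
) -> None:
    """Forward one dataset-to-dataset transformation to a LineageClient-like object."""
    # Treat blank or whitespace-only SQL as "no transformation text".
    text = sql.strip() if sql else None
    lineage.add_dataset_transform_lineage(
        upstream=upstream,
        downstream=downstream,
        column_lineage=columns,
        transformation_text=text or None,
    )
```

Normalizing blank text to `None` is a choice made for this sketch so that both optional arguments fall back to their documented defaults.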
add_lineage(*, upstream, downstream, column_lineage=False, transformation_text=None)
Add lineage between two entities.
This flexible method handles different combinations of entity types:
- dataset to dataset
- dataset to datajob
- datajob to dataset
- datajob to datajob
- dashboard to dataset
- dashboard to chart
- dashboard to dashboard
- dataset to chart
- Parameters:
  - upstream (Union[str, DatasetUrn, DataJobUrn, DashboardUrn, ChartUrn]) – URN of the upstream entity (dataset, datajob, dashboard, or chart)
  - downstream (Union[str, DatasetUrn, DataJobUrn, DashboardUrn, ChartUrn]) – URN of the downstream entity (dataset, datajob, dashboard, or chart)
  - column_lineage (Union[bool, Dict[str, List[str]], Literal['auto_fuzzy', 'auto_strict']]) – Either a boolean indicating whether column-level lineage should be added, or a lineage mapping type ('auto_fuzzy', 'auto_strict', or an explicit mapping of column-level lineage)
  - transformation_text (Optional[str]) – Optional SQL query text that defines the transformation (only applicable for dataset-to-dataset lineage)
- Raises:
  - InvalidUrnError – If the URNs provided are invalid
  - SdkUsageError – If certain parameter combinations are not supported
- Return type: None
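The entity-type combinations listed for `add_lineage` can be checked locally before a call. The sketch below is a hypothetical pre-check, not the SDK's own validation. It reads the entity type from the third segment of a URN string and, because this page does not say which side of each listed pair is upstream, treats the pairs as unordered.

```python
# Entity-type pairs listed for add_lineage, stored as unordered pairs (an assumption).
DOCUMENTED_PAIRS = {
    frozenset(pair)
    for pair in [
        ("dataset", "dataset"),
        ("dataset", "dataJob"),
        ("dataJob", "dataJob"),
        ("dashboard", "dataset"),
        ("dashboard", "chart"),
        ("dashboard", "dashboard"),
        ("dataset", "chart"),
    ]
}


def entity_type(urn: str) -> str:
    """Return the entity type segment of a 'urn:li:<type>:<id>' string."""
    parts = urn.split(":", 3)
    if len(parts) < 4 or parts[:2] != ["urn", "li"] or not parts[2] or not parts[3]:
        raise ValueError(f"not a DataHub URN string: {urn!r}")
    return parts[2]


def is_documented_pair(upstream: str, downstream: str) -> bool:
    """True when the two URNs form one of the listed entity-type combinations."""
    return frozenset((entity_type(upstream), entity_type(downstream))) in DOCUMENTED_PAIRS
```

A caller might use `is_documented_pair` to skip pairs such as chart to chart, which are not in the list, instead of relying on `SdkUsageError` at call time.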
get_lineage(*, source_urn, source_column=None, direction='upstream', max_hops=1, filter=None, count=500)
Retrieve lineage entities connected to a source entity.
- Parameters:
  - source_urn (Union[str, Urn]) – Source URN for the lineage search
  - source_column (Optional[str]) – Source column for the lineage search
  - direction (Literal['upstream', 'downstream']) – Direction of lineage traversal
  - max_hops (int) – Maximum number of hops to traverse
  - filter (Union[_And, _Or, _Not, _EntityTypeFilter, _EntitySubtypeFilter, _StatusFilter, _PlatformFilter, _DomainFilter, _EnvFilter, _CustomCondition, None]) – Filters to apply to the lineage search
  - count (int) – Maximum number of results to return
- Returns: List of lineage results
- Return type: List[LineageResult]
- Raises:
  - SdkUsageError – For invalid filter values
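One way to consume `get_lineage` output is to group the returned entities by distance from the source. The sketch below assumes only what this page documents: the keyword arguments of `get_lineage` and the `urn` and `hops` fields of each result. The function name `upstream_urns_by_hop` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def upstream_urns_by_hop(lineage, source_urn: str, max_hops: int = 2, count: int = 500) -> Dict[int, List[str]]:
    """Fetch upstream lineage and group the result URNs by hop distance."""
    results = lineage.get_lineage(
        source_urn=source_urn,
        direction="upstream",
        max_hops=max_hops,
        count=count,
    )
    grouped = defaultdict(list)
    for result in results:
        grouped[result.hops].append(result.urn)
    # Sort hop levels and URNs so the output order is stable.
    return {hops: sorted(urns) for hops, urns in sorted(grouped.items())}
```

Since `count` caps the number of results, a wide graph may come back truncated; raising `count` or lowering `max_hops` are the two levers the signature offers.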
infer_lineage_from_sql(*, query_text, platform, platform_instance=None, env='PROD', default_db=None, default_schema=None)
Add lineage by parsing a SQL query.
- Parameters:
  - query_text (str)
  - platform (str)
  - platform_instance (Optional[str])
  - env (str)
  - default_db (Optional[str])
  - default_schema (Optional[str])
- Return type: None
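A sketch of feeding several SQL statements to `infer_lineage_from_sql`, one call per statement. The wrapper `infer_from_statements` is a hypothetical helper that assumes the statements have already been split; `lineage` stands for a `LineageClient` or any object with the same method.

```python
from typing import Iterable, Optional


def infer_from_statements(
    lineage,
    statements: Iterable[str],
    platform: str,
    default_db: Optional[str] = None,
    default_schema: Optional[str] = None,
    env: str = "PROD",
) -> int:
    """Call infer_lineage_from_sql once per non-blank statement; return the call count."""
    sent = 0
    for statement in statements:
        text = statement.strip()
        if not text:
            continue  # skip blank entries left over from splitting a script
        lineage.infer_lineage_from_sql(
            query_text=text,
            platform=platform,
            env=env,
            default_db=default_db,
            default_schema=default_schema,
        )
        sent += 1
    return sent
```

Supplying `default_db` and `default_schema` is presumably useful when the SQL references unqualified table names, although this page does not describe how they are resolved.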
LineagePath
Bases: object
- Parameters:
  - urn (str)
  - entity_name (str)
  - column_name (Optional[str])
column_name : Optional[str] = None
entity_name : str
urn : str
LineageResult
Bases: object
- Parameters:
  - urn (str)
  - type (str)
  - hops (int)
  - direction (Literal['upstream', 'downstream'])
  - platform (Optional[str])
  - name (Optional[str])
  - description (Optional[str])
  - paths (Optional[List[LineagePath]])
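To show how the fields of `LineageResult` and `LineagePath` might be read together, the sketch below defines two stand-in dataclasses that mirror the documented field names and formats a result as text. The stand-ins (`PathStandIn`, `ResultStandIn`) are not the SDK classes, and the defaults on the optional result fields are assumptions. The sketch also treats `paths` as an ordered sequence of steps, which this page does not confirm.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PathStandIn:
    """Stand-in with the same fields as LineagePath."""
    urn: str
    entity_name: str
    column_name: Optional[str] = None


@dataclass
class ResultStandIn:
    """Stand-in with the same fields as LineageResult (defaults are assumed)."""
    urn: str
    type: str
    hops: int
    direction: str
    platform: Optional[str] = None
    name: Optional[str] = None
    description: Optional[str] = None
    paths: Optional[List[PathStandIn]] = None


def describe(result) -> str:
    """Render one result as 'direction type (N hops): step -> step'."""
    labels = []
    for step in result.paths or []:  # paths may be None
        label = step.entity_name
        if step.column_name is not None:
            label = f"{label}.{step.column_name}"
        labels.append(label)
    head = f"{result.direction} {result.type} ({result.hops} hops)"
    return f"{head}: {' -> '.join(labels)}" if labels else head
```

Because `describe` only reads attributes, it should work equally on objects returned by `get_lineage`, provided they expose the fields listed above.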