RealTime
RealTime Matching
- tamr_toolbox.realtime.matching.update_realtime_match_data(*, project, do_update_clusters=True, do_use_manual_clustering=False, **options)
Updates data for RealTime match queries if needed, based on latest published clusters.
- Parameters
project (MasteringProject) – project to be updated
do_update_clusters (bool) – whether to update clusters, default True
do_use_manual_clustering (bool) – whether to use externally managed clustering, default False
options – Options passed to underlying Operation
- Return type
Operation
- Returns
an operation object describing the update operation
- Raises
RuntimeError – if update API call fails
- tamr_toolbox.realtime.matching.poll_realtime_match_status(*, project, match_client, num_tries=10, wait_sec=1)
Checks whether the match service is queryable. Polls up to num_tries times, waiting wait_sec seconds (default 1) between tries.
- Parameters
project (MasteringProject) – the mastering project whose status to check
match_client (Client) – a Tamr client set to use the port of the Match API
num_tries (int) – max number of times to poll endpoint, default 10
wait_sec (int) – number of seconds to wait between tries, default 1
- Return type
bool
- Returns
bool indicating whether project is queryable
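The polling behavior described for poll_realtime_match_status can be pictured with a small stand-alone loop. This is only a sketch of the general pattern, not the library's implementation: the helper name poll_until_ready, the is_queryable callable, and the injectable sleep argument are all hypothetical and exist only for illustration. Whether the real function waits after its final try is not stated above; this sketch does not.

```python
import time
from typing import Callable


def poll_until_ready(
    is_queryable: Callable[[], bool],
    *,
    num_tries: int = 10,
    wait_sec: float = 1,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call is_queryable up to num_tries times, pausing wait_sec between tries.

    Hypothetical helper: is_queryable stands in for whatever status check
    the match service exposes.
    """
    for attempt in range(num_tries):
        if is_queryable():
            return True
        # Skip the pause after the last attempt; nothing follows it.
        if attempt < num_tries - 1:
            sleep(wait_sec)
    return False
```

With the defaults, a service that never becomes queryable makes the caller wait roughly (num_tries - 1) * wait_sec seconds before False is returned.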
- tamr_toolbox.realtime.matching.match_query(*, project, match_client, records, type, primary_key=None, batch_size=None, min_match_prob=None, max_num_matches=None)
Find the best matching clusters or records for each supplied record. Returns a dictionary where each key corresponds to an input record and the value is a list of the RealTime match results for that record. An empty result list indicates a null response from matching (or no responses above min_match_prob, if that parameter was supplied).
- Parameters
project (MasteringProject) – the mastering project to query for matches
match_client (Client) – a Tamr client set to use the port of the Match API
type (str) – one of “records” or “clusters” – whether to pull record or cluster matches
primary_key (Optional[str]) – a primary key for the data; if supplied, this must be a field in input records
batch_size (Optional[int]) – split input into this batch size for match query calls (e.g. to prevent network timeouts), default None sends a single query with all records
min_match_prob (Optional[float]) – if set, only matches with probability above minimum will be returned, default None
max_num_matches (Optional[int]) – if set, at most max_num_matches will be returned for each input record in records, default None
- Return type
Dict
- Returns
Dict keyed by integers (indices of inputs), or by primary_key if primary_key is supplied, with value a list containing matched data
- Raises
ValueError – if match type is not “records” or “clusters”, or if batch_size is invalid
RuntimeError – if query fails
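The interaction between batch_size, primary_key, and the keys of the returned dictionary can be illustrated without a running Tamr instance. The following is a sketch under stated assumptions, not the library's code: query_in_batches, iter_batches, and the send_batch callable are hypothetical names, and send_batch is assumed to return one list of matches per input record, in order.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional, Tuple

Record = Dict[str, Any]


def iter_batches(
    records: List[Record], batch_size: Optional[int]
) -> Iterator[Tuple[int, List[Record]]]:
    """Yield (offset, batch) pairs; None means one batch with all records."""
    if batch_size is None:
        yield 0, list(records)
        return
    if batch_size <= 0:
        raise ValueError(f"batch_size must be positive, got {batch_size}")
    for start in range(0, len(records), batch_size):
        yield start, records[start:start + batch_size]


def query_in_batches(
    records: List[Record],
    send_batch: Callable[[List[Record]], List[List[Any]]],
    *,
    primary_key: Optional[str] = None,
    batch_size: Optional[int] = None,
) -> Dict[Any, List[Any]]:
    """Merge per-batch results into one dict keyed by index or primary key."""
    results: Dict[Any, List[Any]] = {}
    for offset, batch in iter_batches(records, batch_size):
        # send_batch is a stand-in for a single match query call.
        batch_results = send_batch(batch)
        for i, (record, matches) in enumerate(zip(batch, batch_results)):
            # Index keys are positions in the full input, not within the batch.
            key = record[primary_key] if primary_key is not None else offset + i
            results[key] = matches
    return results
```

Keying by the global index keeps results from different batches from overwriting each other, which is why the offset of each batch is carried along.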
- tamr_toolbox.realtime.matching.transform_and_match_query(*, project, match_client, records, type, primary_key=None, batch_size=None, min_match_prob=None, max_num_matches=None, default_source_name=None)
Find the best matching clusters or records for each supplied record. Returns a dictionary where each key corresponds to an input record and the value is a list of the RealTime match results for that record. An empty result list indicates a null response from matching (or no responses above min_match_prob, if that parameter was supplied). Runs schema mapping and transformations prior to the RealTime match. If LLT is not enabled, this just runs the default LLM with no transformation or schema mapping.
- Parameters
project (MasteringProject) – the mastering project to query for matches
match_client (Client) – a Tamr client set to use the port of the Match API
type (str) – one of “records” or “clusters” – whether to pull record or cluster matches
primary_key (Optional[str]) – a primary key for the data; if supplied, this must be a field in input records
batch_size (Optional[int]) – split input into this batch size for match query calls (e.g. to prevent network timeouts), default None sends a single query with all records
min_match_prob (Optional[float]) – if set, only matches with probability above minimum will be returned, default None
max_num_matches (Optional[int]) – if set, at most max_num_matches will be returned for each input record in records, default None
default_source_name (Optional[str]) – the default source name used for schema mapping in LLT, default None
- Return type
Dict
- Returns
Dict keyed by integers (indices of inputs), or by primary_key if primary_key is supplied, with value a list containing matched data
- Raises
ValueError – if match type is not “records” or “clusters”, or if batch_size is invalid
RuntimeError – if query fails
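One plausible way to combine the functions documented in this section is to refresh the match data, wait for the service to become queryable, and then query. The sketch below uses only the keyword arguments listed above; the wrapper name refresh_then_match and its match_type parameter are hypothetical, and the matching argument is assumed to be the tamr_toolbox.realtime.matching module, passed in so the wrapper can be exercised with a stand-in.

```python
from typing import Any, Dict, List


def refresh_then_match(
    matching: Any,
    project: Any,
    match_client: Any,
    records: List[Dict[str, Any]],
    *,
    match_type: str = "clusters",
) -> Dict[Any, List[Any]]:
    """Update match data, poll for readiness, then run a match query.

    matching is assumed to expose the functions documented in this section.
    """
    # Per the docs above, this raises RuntimeError if the update call fails.
    matching.update_realtime_match_data(project=project)

    ready = matching.poll_realtime_match_status(
        project=project, match_client=match_client, num_tries=10, wait_sec=1
    )
    if not ready:
        # Raising here is this sketch's choice, not documented behavior.
        raise RuntimeError("match service did not become queryable")

    return matching.match_query(
        project=project,
        match_client=match_client,
        records=records,
        type=match_type,
    )
```

In real use, the first argument would come from `from tamr_toolbox.realtime import matching`, with project and match_client created as described for the parameters above.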