Mastering
Jobs
Tasks related to running jobs for Tamr Mastering projects
- tamr_toolbox.project.mastering.jobs.run(project, *, run_estimate_pair_counts=False, run_apply_feedback=False, run_update_realtime_match=False, process_asynchronously=False)
Run the existing pipeline without training
- Parameters
project (MasteringProject) – Target mastering project
run_estimate_pair_counts (bool) – Whether an estimate pairs job should be run
run_apply_feedback (bool) – Whether train should be called on the pair matching model
run_update_realtime_match (bool) – Whether to update RealTime match data after publishing clusters
process_asynchronously (bool) – Whether to return immediately instead of waiting for the job to finish; must be set to True for concurrent workflows
- Return type
- Returns
The operations that were run
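The bare `*` in the signature makes every option after `project` keyword-only. The sketch below illustrates that calling convention with a local stand-in that copies the documented signature. The stand-in body is hypothetical (it only echoes its options) and is not the tamr_toolbox implementation, so the snippet runs without a Tamr instance.

```python
def run(
    project,
    *,
    run_estimate_pair_counts=False,
    run_apply_feedback=False,
    run_update_realtime_match=False,
    process_asynchronously=False,
):
    """Hypothetical stand-in: documented signature, body only echoes the options."""
    return {
        "run_estimate_pair_counts": run_estimate_pair_counts,
        "run_apply_feedback": run_apply_feedback,
        "run_update_realtime_match": run_update_realtime_match,
        "process_asynchronously": process_asynchronously,
    }


# Options are passed by name; anything omitted keeps its default of False.
options = run("my_project", run_apply_feedback=True)

# Passing an option positionally is rejected by Python itself.
try:
    run("my_project", True)
    positional_rejected = False
except TypeError:
    positional_rejected = True
```

With the real function, the first argument would be a MasteringProject rather than the placeholder string used here.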
- tamr_toolbox.project.mastering.jobs.update_unified_dataset(project, *, process_asynchronously=False)
Updates the unified dataset for a mastering project
- Parameters
project (MasteringProject) – Target mastering project
process_asynchronously (bool) – Whether to return immediately instead of waiting for the job to finish; must be set to True for concurrent workflows
- Return type
- Returns
The operations that were run
- tamr_toolbox.project.mastering.jobs.estimate_pair_counts(project, *, process_asynchronously=False)
Estimates the number of pairs for a mastering project
- Parameters
project (MasteringProject) – Target mastering project
process_asynchronously (bool) – Whether to return immediately instead of waiting for the job to finish; must be set to True for concurrent workflows
- Return type
- Returns
The operations that were run
- tamr_toolbox.project.mastering.jobs.generate_pairs(project, *, process_asynchronously=False)
Generates the pairs for a mastering project
- Parameters
project (MasteringProject) – Target mastering project
process_asynchronously (bool) – Whether to return immediately instead of waiting for the job to finish; must be set to True for concurrent workflows
- Return type
- Returns
The operations that were run
- tamr_toolbox.project.mastering.jobs.apply_feedback(project, *, process_asynchronously=False)
Applies feedback to update the model for a mastering project
- Parameters
project (MasteringProject) – Target mastering project
process_asynchronously (bool) – Whether to return immediately instead of waiting for the job to finish; must be set to True for concurrent workflows
- Return type
- Returns
The operations that were run
- tamr_toolbox.project.mastering.jobs.update_pair_predictions(project, *, process_asynchronously=False)
Updates pair predictions only.
- Parameters
project (MasteringProject) – Target mastering project
process_asynchronously (bool) – Whether to return immediately instead of waiting for the job to finish; must be set to True for concurrent workflows
- Return type
- Returns
The operations that were run
- tamr_toolbox.project.mastering.jobs.update_clusters(project, *, process_asynchronously=False)
Re-runs clustering only.
- Parameters
project (MasteringProject) – Target mastering project
process_asynchronously (bool) – Whether to return immediately instead of waiting for the job to finish; must be set to True for concurrent workflows
- Return type
- Returns
The operations that were run
- tamr_toolbox.project.mastering.jobs.apply_feedback_and_update_results(project, *, process_asynchronously=False)
Trains the model, predicts the pair labels, and updates the draft clusters of a mastering project
- Parameters
project (MasteringProject) – Target mastering project
process_asynchronously (bool) – Whether to return immediately instead of waiting for the job to finish; must be set to True for concurrent workflows
- Return type
- Returns
The operations that were run
- tamr_toolbox.project.mastering.jobs.update_results_only(project, *, process_asynchronously=False)
Predicts the pair labels based on the existing pair model and updates the draft clusters of a mastering project
- Parameters
project (MasteringProject) – Target mastering project
process_asynchronously (bool) – Whether to return immediately instead of waiting for the job to finish; must be set to True for concurrent workflows
- Return type
- Returns
The operations that were run
- tamr_toolbox.project.mastering.jobs.publish_clusters(project, *, run_update_realtime_match=False, process_asynchronously=False)
Publishes the clusters of a mastering project
- Parameters
project (MasteringProject) – Target mastering project
run_update_realtime_match (bool) – Whether to update RealTime match data after publishing clusters
process_asynchronously (bool) – Whether to return immediately instead of waiting for the job to finish; must be set to True for concurrent workflows
- Return type
- Returns
The operations that were run
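The individual step functions above can be chained by hand when only part of the pipeline needs to run. A minimal sketch, assuming each call returns a list of the operations it ran (the return type is not shown above): `refresh_and_publish` is a hypothetical helper, and its `jobs` argument stands for the `tamr_toolbox.project.mastering.jobs` module, replaced here by a mock so the snippet runs without a Tamr instance.

```python
from unittest.mock import MagicMock


def refresh_and_publish(jobs, project):
    """Hypothetical helper: update results with the existing model, then publish."""
    operations = []
    # Keep the default synchronous mode so each step finishes before the next starts.
    operations.extend(jobs.update_results_only(project, process_asynchronously=False))
    operations.extend(jobs.publish_clusters(project, process_asynchronously=False))
    return operations


# Mock standing in for the jobs module; the return values are placeholder labels,
# not real operation objects.
fake_jobs = MagicMock()
fake_jobs.update_results_only.return_value = ["update results"]
fake_jobs.publish_clusters.return_value = ["publish clusters"]

ran = refresh_and_publish(fake_jobs, "my_project")
```

Passing the module in as an argument is only a convenience for trying the pattern offline; in a real script the functions would be called on the imported module directly.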
Schema
- tamr_toolbox.project.mastering.schema.map_attribute(project, *, source_attribute_name, source_dataset_name, unified_attribute_name)
Maps source_attribute in source_dataset to unified_attribute in unified_dataset. If the mapping already exists it will log a warning and return the existing AttributeMapping from the project’s collection.
- Parameters
- Return type
- Returns
The created AttributeMapping
- Raises
ValueError – if source_attribute_name, source_dataset_name, or unified_attribute_name is set to an empty string; or if the dataset source_dataset_name is not found on Tamr; or if source_attribute_name is missing from the attributes of source_dataset_name
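Mapping several attributes from one source dataset is a loop over `map_attribute` with the documented keyword arguments. A minimal sketch: `map_many` is a hypothetical helper, the dataset and attribute names are made up, and the `schema` argument stands for the `tamr_toolbox.project.mastering.schema` module, replaced by a mock so the snippet runs offline.

```python
from unittest.mock import MagicMock


def map_many(schema, project, source_dataset_name, attribute_pairs):
    """Hypothetical helper: map each (source, unified) attribute pair in turn."""
    mappings = []
    for source_attribute_name, unified_attribute_name in attribute_pairs:
        mappings.append(
            schema.map_attribute(
                project,
                source_attribute_name=source_attribute_name,
                source_dataset_name=source_dataset_name,
                unified_attribute_name=unified_attribute_name,
            )
        )
    return mappings


# Mock standing in for the schema module; all names below are illustrative.
fake_schema = MagicMock()
pairs = [("first", "first_name"), ("last", "last_name")]
result = map_many(fake_schema, "my_project", "customers.csv", pairs)
```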
- tamr_toolbox.project.mastering.schema.unmap_attribute(project, *, source_attribute_name, source_dataset_name, unified_attribute_name)
Unmaps a source attribute.
- Parameters
source_attribute_name (str) – the name of the source attribute to unmap
source_dataset_name (str) – the name of the source dataset containing that source attribute
unified_attribute_name (str) – the unified attribute from which to unmap
project (Project) – the project in which to unmap the attribute
- Return type
- Returns
None
- tamr_toolbox.project.mastering.schema.bootstrap_dataset(project, *, source_dataset, force_add_dataset_to_project=False)
Bootstraps a dataset (i.e. maps all source columns to themselves)
- Parameters
- Return type
- Returns
List of the AttributeMappings generated
- Raises
RuntimeError – if source_dataset is not part of the given project; set the force_add_dataset_to_project flag to True to add it automatically
- tamr_toolbox.project.mastering.schema.unmap_dataset(project, *, source_dataset, remove_dataset_from_project=False, skip_if_missing=False)
Wholly unmaps a dataset and optionally removes it from a project.
- Parameters
source_dataset (Dataset) – the source dataset (a Dataset object, not a string) to unmap
project (Project) – the project in which to unmap the dataset
remove_dataset_from_project (bool) – whether to also remove the dataset from the project
skip_if_missing (bool) – whether to skip the dataset if it is not in the project. If False and the dataset is not in the project, a RuntimeError is raised
- Return type
- Returns
None
- Raises
RuntimeError – if source_dataset is not in project and skip_if_missing is not set to True
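Leaving `skip_if_missing` at False lets the caller find out that the dataset was absent by catching the documented RuntimeError. A minimal sketch: `unmap_if_present` is a hypothetical helper, and it assumes no other failure raises RuntimeError, which the text above does not guarantee. The `schema` argument stands for the `tamr_toolbox.project.mastering.schema` module and is mocked here.

```python
from unittest.mock import MagicMock


def unmap_if_present(schema, project, source_dataset):
    """Hypothetical helper: True if unmapped, False if a RuntimeError was raised."""
    try:
        schema.unmap_dataset(
            project,
            source_dataset=source_dataset,
            remove_dataset_from_project=True,
            skip_if_missing=False,
        )
    except RuntimeError:
        # Assumed to mean "dataset not in project", per the Raises note above.
        return False
    return True


# Mock standing in for the schema module.
fake_schema = MagicMock()

# First call: simulate a dataset that is not part of the project.
fake_schema.unmap_dataset.side_effect = RuntimeError("dataset not in project")
missing_result = unmap_if_present(fake_schema, "my_project", "my_dataset")

# Second call: simulate a successful unmap.
fake_schema.unmap_dataset.side_effect = None
present_result = unmap_if_present(fake_schema, "my_project", "my_dataset")
```

When the distinction does not matter, passing `skip_if_missing=True` avoids the try/except entirely.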
Transformations
- class tamr_toolbox.project.mastering.transformations.InputTransformation(transformation, datasets=<factory>)
A transformation scoped to input datasets
- Version:
Requires Tamr 2020.009.0 or later
- class tamr_toolbox.project.mastering.transformations.TransformationGroup(input_scope=<factory>, unified_scope=<factory>)
A group of input transformations and unified transformations
- Version:
Requires Tamr 2020.009.0 or later
- Parameters
input_scope (List[InputTransformation]) – A list of transformations to apply to input datasets
unified_scope (List[str]) – A list of transformation scripts to apply to the unified dataset
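The `<factory>` defaults in both signatures suggest dataclass fields built with `default_factory`, so each instance gets its own empty list. The sketch below uses local stand-in dataclasses that mirror the documented constructors, not the tamr_toolbox classes themselves. The field types on `InputTransformation` are assumptions (they are not listed above), and the script string is a placeholder rather than validated transformation syntax.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class InputTransformation:
    """Stand-in for InputTransformation(transformation, datasets=<factory>)."""

    transformation: str  # assumed type: the transformation script text
    datasets: list = field(default_factory=list)  # assumed: datasets in scope


@dataclass
class TransformationGroup:
    """Stand-in for TransformationGroup(input_scope=<factory>, unified_scope=<factory>)."""

    input_scope: List[InputTransformation] = field(default_factory=list)
    unified_scope: List[str] = field(default_factory=list)


# Two groups built with defaults do not share their lists.
first = TransformationGroup()
second = TransformationGroup()
first.unified_scope.append("placeholder unified script")
first.input_scope.append(InputTransformation("placeholder input script"))
```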
- tamr_toolbox.project.mastering.transformations.get_all(project)
Get the transformations of a Project
- Version:
Requires Tamr 2020.009.0 or later
- Parameters
project (Project) – Project containing transformations
- Return type
- Returns
All input transformations and unified transformations of a project
- tamr_toolbox.project.mastering.transformations.set_all(project, tx, *, allow_overwrite=True)
Set the transformations of a Project
- Version:
Requires Tamr 2020.009.0 or later
- Parameters
project (Project) – Project to place transformations within
tx (TransformationGroup) – Transformations to put into project
allow_overwrite – Whether existing transformations can be overwritten
- Return type
- Returns
Response object created when transformations of a project are replaced
- Raises
RuntimeError – if allow_overwrite is set to False but transformations already exist in project
ValueError – if the provided tx is invalid
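Since `set_all` replaces a project's transformations, adding one script means reading the current group with `get_all`, editing it, and writing it back. A minimal sketch of that round trip: `append_unified_script` is a hypothetical helper, the script strings are placeholders, and the `transformations` argument stands for the `tamr_toolbox.project.mastering.transformations` module, mocked here so the snippet runs offline.

```python
from types import SimpleNamespace
from unittest.mock import MagicMock


def append_unified_script(transformations, project, script):
    """Hypothetical helper: fetch the current group, append a script, write it back."""
    tx = transformations.get_all(project)
    tx.unified_scope.append(script)
    # Overwriting is required here because the project already has transformations.
    return transformations.set_all(project, tx, allow_overwrite=True)


# Mock standing in for the transformations module; SimpleNamespace plays the
# role of the TransformationGroup returned by get_all.
fake_transformations = MagicMock()
fake_transformations.get_all.return_value = SimpleNamespace(
    input_scope=[], unified_scope=["existing script"]
)

append_unified_script(fake_transformations, "my_project", "new script")
sent_group = fake_transformations.set_all.call_args.args[1]
```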
- tamr_toolbox.project.mastering.transformations.get_all_unified(project)
Get the unified transformations of a Project
- Version:
Requires Tamr 2020.009.0 or later
- tamr_toolbox.project.mastering.transformations.set_all_unified(project, tx, *, allow_overwrite=True)
Set the unified transformations of a Project. Any input transformations will not be altered
- Version:
Requires Tamr 2020.009.0 or later
- Parameters
- Return type
- Returns
Response object created when transformations of a project are replaced
- Raises
RuntimeError – if allow_overwrite is set to False but transformations already exist in project