Categorization
Jobs
Tasks related to running jobs for Tamr Categorization projects
-
tamr_toolbox.project.categorization.jobs.run(project, *, run_apply_feedback=False, process_asynchronously=False)
Run the project
- Parameters
project (CategorizationProject) – The target categorization project
run_apply_feedback (bool) – Whether to train the categorization model
process_asynchronously (bool) – Whether to return without waiting for the job to finish; must be set to True for a concurrent workflow
- Return type
- Returns
The operations that were run
-
tamr_toolbox.project.categorization.jobs.update_unified_dataset(project, *, process_asynchronously=False)
Updates the unified dataset for a categorization project
- Parameters
project (CategorizationProject) – Target categorization project
process_asynchronously (bool) – Whether to return without waiting for the job to finish; must be set to True for a concurrent workflow
- Return type
- Returns
The operations that were run
-
tamr_toolbox.project.categorization.jobs.apply_feedback(project, *, process_asynchronously=False)
Trains the model only.
- Parameters
project (CategorizationProject) – Target categorization project
process_asynchronously (bool) – Whether to return without waiting for the job to finish; must be set to True for a concurrent workflow
- Return type
- Returns
The operations that were run
-
tamr_toolbox.project.categorization.jobs.apply_feedback_and_update_results(project, *, process_asynchronously=False)
Trains the model and updates the categorization predictions of a categorization project
- Parameters
project (CategorizationProject) – Target categorization project
process_asynchronously (bool) – Whether to return without waiting for the job to finish; must be set to True for a concurrent workflow
- Return type
- Returns
The operations that were run
-
tamr_toolbox.project.categorization.jobs.update_results_only(project, *, process_asynchronously=False)
Updates the categorization predictions of a categorization project based on its existing model
- Parameters
project (CategorizationProject) – Target categorization project
process_asynchronously (bool) – Whether to return without waiting for the job to finish; must be set to True for a concurrent workflow
- Return type
- Returns
The operations that were run
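The job functions above can be chained into a simple refresh routine. The following is a minimal sketch, not the toolbox's own implementation: `refresh_project` and its `jobs` argument are hypothetical names, with `jobs` standing in for the `tamr_toolbox.project.categorization.jobs` module so the steps can be read without a running Tamr instance. It assumes each function returns a list of operations, as the Returns entries suggest.

```python
from typing import Any, List


def refresh_project(project: Any, jobs: Any, *, retrain: bool = False) -> List[Any]:
    """Update the unified dataset, then refresh categorization predictions.

    `jobs` is assumed to expose the functions documented above, for example
    the tamr_toolbox.project.categorization.jobs module. Hypothetical helper.
    """
    operations: List[Any] = []
    # Step 1: bring the unified dataset up to date.
    operations.extend(jobs.update_unified_dataset(project))
    # Step 2: retrain and re-predict, or re-predict with the existing model.
    if retrain:
        operations.extend(jobs.apply_feedback_and_update_results(project))
    else:
        operations.extend(jobs.update_results_only(project))
    return operations
```

Passing the module as an argument is only a convenience for illustration; in a real script the functions would typically be imported and called directly.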
Metrics
Tasks related to metrics for Tamr Categorization projects
-
tamr_toolbox.project.categorization.metrics.get_tier_confidence(project, *, tier=-1, allow_dataset_refresh=False)
Extracts tier-specific average confidence from the Tamr internal dataset <unified dataset name>_classifications_average_confidences and returns it as a dictionary
- Parameters
- Return type
- Returns
Dictionary whose keys are category paths (joined by ‘|’ for a multi-level taxonomy) and whose values are the average confidences of the corresponding categories
- Raises
RuntimeError – if the dataset is not streamable and allow_dataset_refresh is False
TypeError – if tier is not of type int, or if the project type is not classification
ValueError – if tier is less than -1 or equal to 0
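Because the return value is a plain dictionary keyed by ‘|’-joined category paths, it can be post-processed with ordinary Python. A minimal sketch, assuming the documented shape: `low_confidence_categories` is a hypothetical helper and the sample values are invented for illustration, not real output.

```python
from typing import Dict, List, Tuple


def low_confidence_categories(
    tier_confidence: Dict[str, float], threshold: float
) -> List[Tuple[List[str], float]]:
    """Return (category path, confidence) pairs below `threshold`, lowest first."""
    flagged = [
        # Split the '|'-joined key back into its path components.
        (key.split("|"), confidence)
        for key, confidence in tier_confidence.items()
        if confidence < threshold
    ]
    return sorted(flagged, key=lambda item: item[1])


# Invented values shaped like the documented return value.
example = {
    "Hardware|Fasteners": 0.42,
    "Hardware|Tools": 0.91,
    "Software|Licenses": 0.67,
}
needs_review = low_confidence_categories(example, threshold=0.7)
```

Splitting on ‘|’ assumes that category names do not themselves contain that character.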
Taxonomy Management
Tasks related to editing the taxonomy for a Tamr categorization project
-
tamr_toolbox.project.categorization.taxonomy.delete_node(client, project_id, path, force_delete=False)
Deletes a node from a taxonomy.
- Parameters
client (Client) – Tamr client connected to target instance
project_id (str) – ID of the categorization project
path (list) – Full path of the node to be deleted
force_delete (bool) – Optional flag. Default is False. If True, deletes the node even if there are still records assigned to that category. If False, the operation fails with an error in that case.
Returns: None
-
tamr_toolbox.project.categorization.taxonomy.rename_node(client, project_id, new_name, path)
Renames an existing node in the taxonomy.
- Parameters
Returns: None
-
tamr_toolbox.project.categorization.taxonomy.create_node(client, project_id, path)
Creates a category with the specified path in the project taxonomy.
- Parameters
Returns: None
-
tamr_toolbox.project.categorization.taxonomy.get_taxonomy_as_dataframe(client, project_id)
Returns the taxonomy for a project given the project ID.
- Parameters
client (Client) – Tamr client connected to target instance
project_id (str) – ID of the categorization project
- Return type
- Returns
Current taxonomy categories as a dataframe
- Raises
RuntimeError – if project is not a categorization project or if the taxonomy does not exist
-
tamr_toolbox.project.categorization.taxonomy.move_node(client, project_id, old_node_path, new_node_path, move_verifications=True)
Moves a node in a taxonomy to a new path. By default, the function also moves any verified categorizations under the old node to the new paths.
- Parameters
client (Client) – Tamr client connected to the target instance.
project_id (str) – Project ID of categorization project.
old_node_path (list) – List of the full path for the node to be moved.
new_node_path (list) – List of the full path for where the node is to be moved to.
move_verifications (bool) – Optional boolean argument to move verifications to the new path. Setting to False may result in loss of work.
Returns: None
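Node paths are passed as lists holding the full path from the top of the taxonomy. A minimal sketch of building the new path for a move: `reparent_node` and its `taxonomy` argument are hypothetical, with `taxonomy` standing in for the `tamr_toolbox.project.categorization.taxonomy` module. Whether the target parent must already exist is not stated above, so the sketch does not check for it.

```python
from typing import Any, List


def reparent_node(
    client: Any,
    project_id: str,
    taxonomy: Any,
    node_path: List[str],
    new_parent_path: List[str],
    *,
    move_verifications: bool = True,
) -> List[str]:
    """Move a node under a different parent, keeping its name. Hypothetical helper."""
    if not node_path:
        raise ValueError("node_path must contain at least one element")
    # The new full path is the new parent's path plus the node's own name.
    new_node_path = list(new_parent_path) + [node_path[-1]]
    taxonomy.move_node(
        client,
        project_id,
        node_path,
        new_node_path,
        move_verifications=move_verifications,
    )
    return new_node_path
```

Leaving `move_verifications` at its default keeps verified categorizations attached to the moved node, in line with the warning about loss of work.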
Schema
-
tamr_toolbox.project.categorization.schema.map_attribute(project, *, source_attribute_name, source_dataset_name, unified_attribute_name)
Maps source_attribute in source_dataset to unified_attribute in unified_dataset. If the mapping already exists, it logs a warning and returns the existing AttributeMapping from the project’s collection.
- Parameters
- Return type
- Returns
The created AttributeMapping
- Raises
ValueError – if any of source_attribute_name, source_dataset_name, or unified_attribute_name is an empty string; or if the dataset source_dataset_name is not found on Tamr; or if source_attribute_name is missing from the attributes of source_dataset_name
-
tamr_toolbox.project.categorization.schema.unmap_attribute(project, *, source_attribute_name, source_dataset_name, unified_attribute_name)
Unmaps a source attribute.
- Parameters
source_attribute_name (str) – the name of the source attribute to unmap
source_dataset_name (str) – the name of the source dataset containing that source attribute
unified_attribute_name (str) – the unified attribute from which to unmap
project (Project) – the project in which to unmap the attribute
- Return type
- Returns
None
-
tamr_toolbox.project.categorization.schema.bootstrap_dataset(project, *, source_dataset, force_add_dataset_to_project=False)
Bootstraps a dataset (i.e. maps all source columns to themselves)
- Parameters
- Return type
- Returns
List of the AttributeMappings generated
- Raises
RuntimeError – if source_dataset is not part of the given project; set the ‘force_add_dataset_to_project’ flag to True to add it automatically
-
tamr_toolbox.project.categorization.schema.unmap_dataset(project, *, source_dataset, remove_dataset_from_project=False, skip_if_missing=False)
Wholly unmaps a dataset and optionally removes it from a project.
- Parameters
source_dataset (Dataset) – the source dataset to unmap (a Dataset object, not a string)
project (Project) – the project in which to unmap the dataset
remove_dataset_from_project (bool) – whether to also remove the dataset from the project
skip_if_missing (bool) – whether to skip the dataset if it is not in the project. If False and the dataset is not in the project, a RuntimeError is raised
- Return type
- Returns
None
- Raises
RuntimeError – if source_dataset is not in the project and skip_if_missing is not set to True
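The mapping functions take keyword-only names, so mapping several attributes from one source dataset is a short loop. A minimal sketch under stated assumptions: `map_attributes` and its `schema` argument are hypothetical, with `schema` standing in for the `tamr_toolbox.project.categorization.schema` module, and the attribute names in the usage note are invented.

```python
from typing import Any, Dict, List


def map_attributes(
    project: Any,
    schema: Any,
    *,
    source_dataset_name: str,
    attribute_map: Dict[str, str],
) -> List[Any]:
    """Map each source attribute to its unified attribute. Hypothetical helper.

    `attribute_map` maps source attribute names to unified attribute names.
    """
    mappings: List[Any] = []
    for source_attribute_name, unified_attribute_name in attribute_map.items():
        # map_attribute is documented to return the existing mapping
        # (with a logged warning) if the mapping is already present.
        mappings.append(
            schema.map_attribute(
                project,
                source_attribute_name=source_attribute_name,
                source_dataset_name=source_dataset_name,
                unified_attribute_name=unified_attribute_name,
            )
        )
    return mappings
```

A call might look like `map_attributes(project, schema, source_dataset_name="my_source.csv", attribute_map={"desc": "description"})`, where the dataset and attribute names are placeholders.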
Transformations
-
class tamr_toolbox.project.categorization.transformations.InputTransformation(transformation, datasets=<factory>)
A transformation scoped to input datasets
- Version:
Requires Tamr 2020.009.0 or later
-
class tamr_toolbox.project.categorization.transformations.TransformationGroup(input_scope=<factory>, unified_scope=<factory>)
A group of input transformations and unified transformations
- Version:
Requires Tamr 2020.009.0 or later
- Parameters
input_scope (List[InputTransformation]) – A list of transformations to apply to input datasets
unified_scope (List[str]) – A list of transformation scripts to apply to the unified dataset
-
tamr_toolbox.project.categorization.transformations.get_all(project)
Get the transformations of a Project
- Version:
Requires Tamr 2020.009.0 or later
- Parameters
project (Project) – Project containing transformations
- Return type
- Returns
All input transformations and unified transformations of a project
-
tamr_toolbox.project.categorization.transformations.set_all(project, tx, *, allow_overwrite=True)
Set the transformations of a Project
- Version:
Requires Tamr 2020.009.0 or later
- Parameters
project (Project) – Project to place transformations within
tx (TransformationGroup) – Transformations to put into project
allow_overwrite – Whether existing transformations can be overwritten
- Return type
- Returns
Response object created when transformations of a project are replaced
- Raises
RuntimeError – if allow_overwrite is set to False but transformations already exist in the project
ValueError – if provided tx are invalid
-
tamr_toolbox.project.categorization.transformations.get_all_unified(project)
Get the unified transformations of a Project
- Version:
Requires Tamr 2020.009.0 or later
-
tamr_toolbox.project.categorization.transformations.set_all_unified(project, tx, *, allow_overwrite=True)
Set the unified transformations of a Project. Input transformations are not altered.
- Version:
Requires Tamr 2020.009.0 or later
- Parameters
- Return type
- Returns
Response object created when transformations of a project are replaced
- Raises
RuntimeError – if allow_overwrite is set to False but transformations already exist in the project
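A common pattern with `get_all` and `set_all` is to read the current transformations, change them, and write them back. A minimal sketch, assuming `get_all` returns a `TransformationGroup` whose `unified_scope` is a mutable list of scripts: `append_unified_transformation` and its `transformations` argument are hypothetical, with `transformations` standing in for the `tamr_toolbox.project.categorization.transformations` module.

```python
from typing import Any


def append_unified_transformation(project: Any, transformations: Any, script: str) -> Any:
    """Add one script to the end of a project's unified transformations.

    Hypothetical helper; returns whatever set_all returns.
    """
    # Read the current input and unified transformations.
    group = transformations.get_all(project)
    # Unified transformations are documented as a list of script strings.
    group.unified_scope.append(script)
    # The project already has transformations, so overwriting must be allowed.
    return transformations.set_all(project, group, allow_overwrite=True)
```

Writing the whole group back leaves the input transformations as they were read; `set_all_unified` is the documented alternative when only the unified scripts need to change.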