Jobs

Tasks related to running jobs for groups of Tamr projects

tamr_toolbox.workflow.jobs.run(project_list, *, run_apply_feedback=False, run_estimate_pair_counts=False, run_profile_unified_datasets=False, sleep_interval=0)

Run the jobs for multiple Tamr projects sequentially, in the order they appear in project_list

Parameters
  • project_list (List[Project]) – A list of Tamr projects

  • run_apply_feedback (bool) – Whether train should be called on the pair matching model or categorization model (based on project type)

  • run_estimate_pair_counts (bool) – Whether to run an estimate pair counts job

  • run_profile_unified_datasets (bool) – Whether unified datasets should be re-profiled

  • sleep_interval (int) – Number of seconds to sleep between job submissions. Useful in some pipeline situations

Return type

List[Operation]

Returns

The operations that were run

Raises

NotImplementedError – Raised if run() is called on an unsupported project type
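The ordering and dispatch rules above can be illustrated with a small, self-contained sketch that never contacts Tamr. `Project`, `Operation`, and `plan_steps` are hypothetical stand-ins, and the job names in each step list are illustrative assumptions, not the exact jobs tamr_toolbox submits. It shows projects processed strictly in list order, the boolean flags adding optional steps, `sleep_interval` pausing between consecutive submissions, and an unsupported project type raising NotImplementedError.

```python
import time
from dataclasses import dataclass
from typing import List


@dataclass
class Project:
    """Hypothetical stand-in for a Tamr project (not the tamr_unify_client class)."""
    name: str
    type: str  # e.g. "SCHEMA_MAPPING_RECOMMENDATIONS", "DEDUP", "CATEGORIZATION"


@dataclass
class Operation:
    """Hypothetical record of one submitted job."""
    project_name: str
    step: str


def plan_steps(project_type: str, apply_feedback: bool, estimate_pairs: bool,
               profile: bool) -> List[str]:
    """Return an illustrative (assumed) list of job names for one project type."""
    if project_type == "GOLDEN_RECORDS":
        return ["update_golden_records", "publish_golden_records"]
    if project_type not in ("SCHEMA_MAPPING_RECOMMENDATIONS", "DEDUP", "CATEGORIZATION"):
        raise NotImplementedError(f"Unsupported project type: {project_type}")
    steps = ["update_unified_dataset"]
    if profile:
        steps.append("profile_unified_dataset")
    if project_type == "DEDUP":
        if estimate_pairs:
            steps.append("estimate_pair_counts")
        steps.append("generate_pairs")
        if apply_feedback:
            steps.append("train")  # retrain the pair-matching model
        steps += ["predict", "publish_clusters"]
    elif project_type == "CATEGORIZATION":
        if apply_feedback:
            steps.append("train")  # retrain the categorization model
        steps.append("predict")
    return steps


def run(project_list: List[Project], *, run_apply_feedback: bool = False,
        run_estimate_pair_counts: bool = False,
        run_profile_unified_datasets: bool = False,
        sleep_interval: int = 0) -> List[Operation]:
    """Submit each project's jobs in list order and return the operations."""
    operations: List[Operation] = []
    for project in project_list:
        steps = plan_steps(project.type, run_apply_feedback,
                           run_estimate_pair_counts, run_profile_unified_datasets)
        for step in steps:
            # Pause between consecutive submissions, but not before the first one
            if operations and sleep_interval > 0:
                time.sleep(sleep_interval)
            # Appending stands in for submitting a real job and waiting on it
            operations.append(Operation(project.name, step))
    return operations


ops = run([Project("sm", "SCHEMA_MAPPING_RECOMMENDATIONS"), Project("dedup", "DEDUP")],
          run_estimate_pair_counts=True)
print([(op.project_name, op.step) for op in ops])
```

Placing the pause inside the per-step loop means `sleep_interval` separates every pair of consecutive submissions, including the boundary between two projects.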

tamr_toolbox.workflow.jobs.get_upstream_projects(project)

Retrieve the Tamr projects upstream of a specified project, i.e. the projects whose outputs feed into it

Parameters

project (Project) – the Tamr project for which associated upstream projects are retrieved

Return type

List[Project]

Returns

The list of upstream Tamr projects
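To make "upstream" concrete, here is a hedged sketch that walks a toy lineage graph held in two plain dictionaries instead of querying a Tamr instance. `find_upstream_projects`, `INPUTS`, and `PRODUCED_BY` are hypothetical; following the graph transitively and returning dependencies first are assumptions made for illustration, not documented behavior of get_upstream_projects.

```python
from typing import Dict, List, Set


def find_upstream_projects(project: str,
                           inputs: Dict[str, List[str]],
                           produced_by: Dict[str, str]) -> List[str]:
    """Return projects upstream of `project`, dependencies first (hypothetical helper)."""
    ordered: List[str] = []
    seen: Set[str] = {project}  # never report the starting project itself

    def visit(current: str) -> None:
        for dataset in inputs.get(current, []):
            parent = produced_by.get(dataset)  # None for source datasets
            if parent is not None and parent not in seen:
                seen.add(parent)
                visit(parent)           # record the parent's own upstream first
                ordered.append(parent)  # post-order: dependencies precede dependents

    visit(project)
    return ordered


# Toy lineage (hypothetical names): the datasets each project reads...
INPUTS = {
    "schema_mapping": ["source_a", "source_b"],
    "mastering": ["schema_mapping_unified"],
    "golden_records": ["mastering_published_clusters"],
}
# ...and the project that produces each derived dataset.
PRODUCED_BY = {
    "schema_mapping_unified": "schema_mapping",
    "mastering_published_clusters": "mastering",
}

chain = find_upstream_projects("golden_records", INPUTS, PRODUCED_BY) + ["golden_records"]
print(chain)
```

Appending the target project to its upstream list yields an order in which each project runs after the projects it depends on. In this sketch the items are plain names; the documented function returns Project objects.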

tamr_toolbox.workflow.jobs.get_project_output_datasets(project)

Retrieves datasets produced by a given Tamr project

Parameters

project (Project) – the Tamr project for which associated output datasets are retrieved

Return type

List[Dataset]

Returns

The list of Tamr datasets output from the project
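The idea of a project's output datasets can be sketched offline by inverting a catalog that records, for each dataset, which projects write to it. `DatasetRecord`, `index_outputs`, and the catalog entries are hypothetical, and the sketch is not a description of how get_project_output_datasets obtains this information from Tamr.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DatasetRecord:
    """Hypothetical dataset metadata (not the tamr_unify_client Dataset class)."""
    name: str
    produced_by: List[str] = field(default_factory=list)  # projects that write it


def index_outputs(datasets: List[DatasetRecord]) -> Dict[str, List[str]]:
    """Invert dataset -> producing projects into project -> output dataset names."""
    outputs: Dict[str, List[str]] = defaultdict(list)
    for dataset in datasets:
        for project_name in dataset.produced_by:
            outputs[project_name].append(dataset.name)
    return dict(outputs)  # plain dict, so unknown projects don't create empty entries


CATALOG = [
    DatasetRecord("source_a"),  # source dataset: no producing project
    DatasetRecord("schema_mapping_unified", ["schema_mapping"]),
    DatasetRecord("mastering_unified", ["mastering"]),
    DatasetRecord("mastering_published_clusters", ["mastering"]),
]

outputs_by_project = index_outputs(CATALOG)
print(outputs_by_project.get("mastering", []))
```

Building the inverted index once lets many projects be looked up after a single pass over the catalog, instead of rescanning every dataset for each project.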