Utilities¶

Client¶

Tasks related to connecting to a Tamr instance

tamr_toolbox.utils.client.health_check(client)[source]¶

Query the health check API and check if each service is healthy (returns True)

Parameters: client (Client) – the tamr client
Return type: bool
Returns: True if all services are healthy, False if unhealthy

tamr_toolbox.utils.client.create(*, username, password, host, port=9100, protocol='http', base_path='/api/versioned/v1/', session=None, store_auth_cookie=False, enforce_healthy=False)[source]¶

Creates a Tamr client from the provided configuration values

Parameters

username (str) – The username to log in to Tamr as
password (str) – the password for the user
host (str) – The IP address of Tamr
port (Union[str, int, None]) – The port of the Tamr UI. Pass a value of None to specify an address with no port
protocol (str) – https or http
base_path (str) – Optional argument to specify a different base path
session (Optional[Session]) – Optional argument to pass an existing requests Session
store_auth_cookie (bool) – If True, allows the Tamr authentication cookie to be stored and reused
enforce_healthy (bool) – If True, enforces a healthy state upon creation

Return type

Client

Returns

Tamr client
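
The sketch below illustrates how the host, port, protocol, and base_path arguments could combine into an address, including the port=None case. It is an assumption-based illustration, not the toolbox's implementation; build_base_url is a hypothetical helper and tamr.example.com is a placeholder host.

```python
from typing import Optional, Union


def build_base_url(
    host: str,
    port: Optional[Union[str, int]] = 9100,
    protocol: str = "http",
    base_path: str = "/api/versioned/v1/",
) -> str:
    """Hypothetical helper: compose an address from the documented arguments."""
    # When port is None, the address is written without a port component.
    if port is None:
        origin = f"{protocol}://{host}"
    else:
        origin = f"{protocol}://{host}:{port}"
    return origin + base_path


default_url = build_base_url("localhost")
no_port_url = build_base_url("tamr.example.com", port=None, protocol="https")
print(default_url)
print(no_port_url)
```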

tamr_toolbox.utils.client.create_with_jwt(*, token, host, port=9100, protocol='http', base_path='/api/versioned/v1/', session=None, store_auth_cookie=False, enforce_healthy=False)[source]¶

Creates a Tamr client from the provided configuration values using a JWT token instead of a username and password. Note that this feature is only available on v2022.010.0 or later.

Parameters

token (str) – A JWT token to authenticate the client
host (str) – The IP address of Tamr
port (Union[str, int, None]) – The port of the Tamr UI. Pass a value of None to specify an address with no port
protocol (str) – https or http
base_path (str) – Optional argument to specify a different base path
session (Optional[Session]) – Optional argument to pass an existing requests Session
store_auth_cookie (bool) – If True, allows the Tamr authentication cookie to be stored and reused
enforce_healthy (bool) – If True, enforces a healthy state upon creation

Return type

Client

Returns

Tamr client

tamr_toolbox.utils.client.get_with_connection_retry(client, api_endpoint, *, timeout_seconds=600, sleep_seconds=20)[source]¶

Handles exceptions raised when attempting to connect to the Tamr API. This is used to handle connection issues when Tamr restarts due to a restore.

Parameters

client (Client) – A Tamr client object
api_endpoint (str) – Tamr API endpoint
timeout_seconds (int) – Amount of time before a timeout error is thrown. Default is 600 seconds
sleep_seconds (int) – Amount of time in between attempts to connect to Tamr.

Return type

Response

Returns

A response object from API request.
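
The retry behaviour described above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the toolbox's code: call_with_connection_retry and flaky_request are hypothetical names, and the built-in ConnectionError stands in for whatever connection exception the HTTP library raises.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_connection_retry(
    request: Callable[[], T],
    *,
    timeout_seconds: float = 600,
    sleep_seconds: float = 20,
) -> T:
    """Call request() until it stops raising ConnectionError or time runs out."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        try:
            return request()
        except ConnectionError as error:
            # Give up if sleeping again would take us past the deadline.
            if time.monotonic() + sleep_seconds > deadline:
                raise TimeoutError(
                    f"No connection after {timeout_seconds} seconds"
                ) from error
            time.sleep(sleep_seconds)


# Hypothetical request that fails twice (as if Tamr were restarting), then succeeds.
attempts = {"count": 0}


def flaky_request() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("server is restarting")
    return "ok"


result = call_with_connection_retry(
    flaky_request, timeout_seconds=5, sleep_seconds=0.01
)
print(result, attempts["count"])
```

Using a monotonic clock for the deadline keeps the timeout unaffected by system clock adjustments while the server is down.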

tamr_toolbox.utils.client.poll_endpoint(client, api_endpoint, *, poll_interval_seconds=3, polling_timeout_seconds=None, connection_retry_timeout_seconds=600)[source]¶

Waits until job has a state of Canceled, Succeeded, or Failed.

Parameters

client (Client) – A Tamr client object
api_endpoint (str) – Tamr API endpoint
poll_interval_seconds (int) – Amount of time in between polls of job state.
polling_timeout_seconds (Optional[int]) – Amount of time before a timeout error is thrown.
connection_retry_timeout_seconds (int) – Amount of time before timeout error is thrown during connection retry.

Return type

Response

Returns

A response object from API request.
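
A polling loop of the kind described above might look like the following sketch. It is illustrative only: poll_until_resolved and get_state are hypothetical names, get_state stands in for an API request that reads the job state, and the upper-case state spellings are an assumption about how the states are reported.

```python
import time
from typing import Callable, Optional

# Assumed spellings of the terminal job states named in the description.
TERMINAL_STATES = {"CANCELED", "SUCCEEDED", "FAILED"}


def poll_until_resolved(
    get_state: Callable[[], str],
    *,
    poll_interval_seconds: float = 3,
    polling_timeout_seconds: Optional[float] = None,
) -> str:
    """Poll get_state() until it returns a terminal state, then return that state."""
    started = time.monotonic()
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        elapsed = time.monotonic() - started
        # A timeout of None means poll indefinitely.
        if polling_timeout_seconds is not None and elapsed >= polling_timeout_seconds:
            raise TimeoutError(f"Job still {state} after {elapsed:.1f} seconds")
        time.sleep(poll_interval_seconds)


# Hypothetical sequence of states reported by successive polls.
states = iter(["PENDING", "RUNNING", "RUNNING", "SUCCEEDED"])
final_state = poll_until_resolved(lambda: next(states), poll_interval_seconds=0.01)
print(final_state)
```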

Configuration¶

Tasks related to loading and using configuration files

tamr_toolbox.utils.config.from_yaml(path_to_file, *, default_path_to_file=None)[source]¶

Reads a yaml file and creates a dictionary. Input values can be retrieved from environment variables

Parameters

path_to_file (Union[str, Path, None]) – Path to config yaml file
default_path_to_file (Union[str, Path, None]) – Path to use if path_to_file is null or empty

Return type

Dict[str, Any]

Returns

All configuration variables in a dictionary

Custom Button¶

Helper functions related to creating & managing custom UI buttons as yaml files.

Due to how Tamr custom buttons are configured, these functions will need to be run on the actual server on which Tamr is installed to work as expected.

Important: Custom buttons are only available to versions 2022.008.0 and later

tamr_toolbox.utils.custom_button.create_redirect_button(*, extension_name, button_id, button_text, page_names, redirect_url, open_in_new_tab, output_dir, button_name)[source]¶

Create yaml file with all required attributes for a ‘REDIRECT’ UI button. Yaml file is saved locally.

Button features are only available to versions 2022.008.0 and later.

Parameters

extension_name (str) – Name of button extension
button_id (str) – A short identifier for the button to use in the body of a POST call or a redirect URL path substitution.
button_text (str) – The button label to display in the UI.
page_names (List[str]) – The pages of the UI on which to display the button.
redirect_url (str) – The URL that the browser should load
open_in_new_tab (bool) – If true, the specified URL opens in a new browser tab.
output_dir (str) – Directory to save yaml file (absolute path)
button_name (str) – Name of yaml file

Return type

str

Returns

Path to yaml file created

tamr_toolbox.utils.custom_button.create_post_button(*, extension_name, button_id, button_text, page_names, post_url, post_body_keys, success_message, fail_message, display_response, output_dir, button_name)[source]¶

Create yaml file with all required attributes for a ‘POST’ UI button. Yaml file is saved locally.

Button features are only available to versions 2022.008.0 and later.

Parameters

extension_name (str) – Name of button extension
button_id (str) – A short identifier for the button to use in the body of a POST call or a redirect URL path substitution.
button_text (str) – The button label to display in the UI.
page_names (List[str]) – The pages of the UI on which to display the button.
post_url (str) – The target URL for a POST API call
post_body_keys (List[str]) – Specifies the keys to request in the body of the POST call
success_message (str) – The message that displays to the user when the POST call succeeds.
fail_message (str) – The message that displays to the user when the POST call fails.
display_response (bool) – Whether the contents of the API response body should display to the user.
output_dir (str) – Directory to save yaml file (absolute path)
button_name (str) – Name of yaml file

Return type

str

Returns

Path to yaml file created

tamr_toolbox.utils.custom_button.create_button_extension(*, extension_name, buttons, output_dir)[source]¶

Given a list of button yaml files, save them as a grouped extension yaml file. Yaml file is saved locally. Button features are only available to versions 2022.008.0 and later.

Parameters

extension_name (str) – Name of button extension to save
buttons (List[str]) – List of button yaml files (absolute paths)
output_dir (str) – directory in which to save yaml extension file (absolute path)

Return type

str

Returns

Path to yaml file created

tamr_toolbox.utils.custom_button.create_button_extension_from_list(*, extension_name, output_dir, buttons)[source]¶

Given a list of button dictionaries, save them as a grouped extension yaml file. Yaml file is saved locally.

Button features are only available to versions 2022.008.0 and later.

Parameters

extension_name (str) – Name of button extension to save
output_dir (str) – directory in which to save yaml extension file (absolute path)
buttons (List[dict]) – List of button dictionaries, each either a redirect button or a post button, in the following formats:
redirect – {"buttonType": "redirectButton", "buttonId": button_id, "buttonText": button_text, "pageNames": page_names, "redirectUrl": redirect_url, "openInNewTab": open_in_new_tab}
post – {"buttonType": "postButton", "buttonId": button_id, "buttonText": button_text, "pageNames": page_names, "postUrl": post_url, "postBodyKeys": post_body_keys, "successMessage": success_message, "failMessage": fail_message, "displayResponse": display_response}

Return type

str

Returns

Path to yaml file created
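
As a sketch of the two dictionary formats listed for the buttons parameter, the helpers below build one dictionary of each kind. The helper names redirect_button and post_button are hypothetical, and every value (ids, labels, page names, URLs, body keys) is a placeholder rather than a value known to be valid in Tamr.

```python
from typing import Any, Dict, List


def redirect_button(
    button_id: str,
    button_text: str,
    page_names: List[str],
    redirect_url: str,
    open_in_new_tab: bool,
) -> Dict[str, Any]:
    """Hypothetical helper: build a redirect button dictionary."""
    return {
        "buttonType": "redirectButton",
        "buttonId": button_id,
        "buttonText": button_text,
        "pageNames": page_names,
        "redirectUrl": redirect_url,
        "openInNewTab": open_in_new_tab,
    }


def post_button(
    button_id: str,
    button_text: str,
    page_names: List[str],
    post_url: str,
    post_body_keys: List[str],
    success_message: str,
    fail_message: str,
    display_response: bool,
) -> Dict[str, Any]:
    """Hypothetical helper: build a post button dictionary."""
    return {
        "buttonType": "postButton",
        "buttonId": button_id,
        "buttonText": button_text,
        "pageNames": page_names,
        "postUrl": post_url,
        "postBodyKeys": post_body_keys,
        "successMessage": success_message,
        "failMessage": fail_message,
        "displayResponse": display_response,
    }


# Placeholder values only; substitute real page names and URLs.
buttons = [
    redirect_button("docs", "Open docs", ["Example Page"], "https://example.com/docs", True),
    post_button(
        "refresh", "Refresh", ["Example Page"], "https://example.com/hook",
        ["exampleKey"], "Refresh started", "Refresh failed", False,
    ),
]
print([button["buttonType"] for button in buttons])
```

A list shaped like buttons is what the buttons argument of create_button_extension_from_list is documented to accept.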

tamr_toolbox.utils.custom_button.register_buttons(*, tamr_client, buttons, tamr_install_dir, remote_client=None, impersonation_username=None, impersonation_password=None)[source]¶

Registers a list of button(s) in a Tamr instance. Requires Tamr restart to display buttons in UI.

Important: If NOT running this function using a remote client, this function must be run on the server on which Tamr is installed.

Runs in a remote environment if an ssh client is specified; otherwise, runs in the local shell. If an impersonation_username is provided, the command is run as the provided user. If an impersonation_password is provided, password authentication is used for impersonation; otherwise, sudo is used.

Version: Requires Tamr 2022.008.0 or later

Parameters

tamr_client (Client) – Tamr Client object
buttons (Union[str, List[str]]) – An individual string or a list of yaml files (absolute paths) with button configs
tamr_install_dir (str) – Full path to directory where Tamr is installed
remote_client (Optional[SSHClient]) – An ssh client providing a remote connection
impersonation_username (Optional[str]) – A bash user to run the command as, this should be the tamr install user
impersonation_password (Optional[str]) – The password for the impersonation_username

Returns:

tamr_toolbox.utils.custom_button.delete_buttons(*, button_files, tamr_install_dir)[source]¶

Deletes the given button yaml files, removing the corresponding buttons from the UI.

NB: Registered buttons are located in $TAMR_HOME/tamr/auxiliary-services/conf. A restart of Tamr is required to register the deletion. Button features are only available to versions 2022.008.0 and later.

Parameters

button_files (Union[str, List[str]]) – Individual string or list of button yaml files (absolute paths)
tamr_install_dir (str) – Full path to directory where Tamr is installed (absolute path)

Returns:

Logging¶

Tasks related to logging within scripts

tamr_toolbox.utils.logger.create(name, *, log_to_terminal=True, log_directory=None, log_prefix='', date_format='%Y-%m-%d')[source]¶

Return logger object with pre-defined format. Log file will be located under log_directory with file name <log_prefix>_<date>.log, quashing extra separating underscores. Defaults to <date>.log.

For use in scripts only. To log in module files, use the standard library logging module with a module-level logger and enable package logging. See https://docs.python.org/3/howto/logging.html#advanced-logging-tutorial

>>> log = logging.getLogger(__name__)

Parameters

name (str) – This sets the name of your logger instance. It does not affect the file name. To change the filename use log_prefix
log_to_terminal (bool) – Boolean indicating whether or not to log messages to the terminal.
log_directory (Optional[str]) – The directory to place log files inside
log_prefix (str) – The string to prepend to the date in the log file name.
date_format (str) – format string for date suffix on log file name

Return type

Logger

Returns

Logger object
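
The file naming rule described above (<log_prefix>_<date>.log, falling back to <date>.log when no prefix is given) can be sketched as follows. This is an illustration of the rule as documented, not the toolbox's code; log_file_name is a hypothetical helper, and trimming trailing underscores from the prefix is an assumed reading of how extra separators are removed.

```python
from datetime import date
from typing import Optional


def log_file_name(
    log_prefix: str = "",
    date_format: str = "%Y-%m-%d",
    today: Optional[date] = None,
) -> str:
    """Hypothetical helper: build '<log_prefix>_<date>.log' without doubled underscores."""
    if today is None:
        today = date.today()
    # Assumption: a trailing underscore on the prefix should not produce '__'.
    parts = [log_prefix.rstrip("_"), today.strftime(date_format)]
    # Empty parts are skipped, so an empty prefix yields '<date>.log'.
    return "_".join(part for part in parts if part) + ".log"


fixed_day = date(2024, 1, 15)
print(log_file_name(today=fixed_day))
print(log_file_name("my_script", today=fixed_day))
print(log_file_name("my_script_", date_format="%Y%m%d", today=fixed_day))
```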

tamr_toolbox.utils.logger.set_logging_level(logger_name, level)[source]¶

Sets the logging level for a given logger and all of its handlers.

Parameters

logger_name (str) – the name of the logger for which to set the level
level (str) – log level to use. The levels available from the core logging package are ‘debug’, ‘info’, ‘warning’, ‘error’

Return type

None
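
In terms of the standard library logging module, setting a level on a logger and on each of its handlers could look like the sketch below. It is an assumption-based illustration rather than the toolbox's implementation; set_level_on_logger_and_handlers and the logger name example_script are hypothetical.

```python
import logging

# The level names listed in the parameter description, mapped to logging constants.
LEVELS = {
    "debug": logging.DEBUG,
    "info": logging.INFO,
    "warning": logging.WARNING,
    "error": logging.ERROR,
}


def set_level_on_logger_and_handlers(logger_name: str, level: str) -> None:
    """Hypothetical helper: apply one level to a logger and each of its handlers."""
    try:
        numeric_level = LEVELS[level.lower()]
    except KeyError:
        raise ValueError(f"Unknown log level: {level}") from None
    logger = logging.getLogger(logger_name)
    logger.setLevel(numeric_level)
    for handler in logger.handlers:
        handler.setLevel(numeric_level)


# Attach a handler to a demonstration logger, then change both levels at once.
example_logger = logging.getLogger("example_script")
example_handler = logging.StreamHandler()
example_logger.addHandler(example_handler)
set_level_on_logger_and_handlers("example_script", "debug")
print(example_logger.level, example_handler.level)
```

Setting the handler levels as well matters because a handler with a stricter level would otherwise keep filtering out messages that the logger now lets through.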

tamr_toolbox.utils.logger.enable_package_logging(package_name, *, log_to_terminal=True, log_directory=None, level=None, log_prefix='', date_format='%Y-%m-%d')[source]¶

A helper function to enable package logging for any package following python best practices for logging names (i.e. logger name == package.module.submodule).

Parameters

package_name (str) – the name of the package for which to enable logging
log_to_terminal (bool) – Boolean indicating whether or not to log messages to the terminal
log_directory (Optional[str]) – optional log directory to which the package will write logs
level (Optional[str]) – optional level to specify, default is WARNING (inherited from base logging package)
log_prefix (str) – Optional prefix for log files, if None will be blank string
date_format (str) – Optional date format for log file

Return type

None

tamr_toolbox.utils.logger.enable_toolbox_logging(*, log_to_terminal=True, log_directory=None, level=None, log_prefix='', date_format='%Y-%m-%d')[source]¶

A simple wrapper around enable_package_logging that gives users a convenient call for enabling tamr_toolbox logging.

Parameters

log_to_terminal (bool) – Boolean indicating whether or not to log messages to the terminal
log_directory (Optional[str]) – optional directory to which to write tamr_toolbox logs
level (Optional[str]) – Optional logging level to specify, default is WARNING (inherited from base logging package)
log_prefix (str) – Optional prefix for log files, if None will be blank string
date_format (str) – Optional date format for log file

Return type

None

Operation¶

Tasks related to Tamr operations (or jobs)

tamr_toolbox.utils.operation.enforce_success(operation)[source]¶

Raises an error if an operation fails

Parameters: operation (Operation) – A Tamr operation
Return type: None

tamr_toolbox.utils.operation.from_resource_id(tamr, *, job_id)[source]¶

Create an operation from a job id

Parameters

tamr (Client) – A Tamr client
job_id (Union[int, str]) – A job ID

Return type

Operation

Returns

A Tamr operation

tamr_toolbox.utils.operation.get_latest(tamr)[source]¶

Get the latest operation

Parameters: tamr (Client) – A Tamr client
Return type: Operation
Returns: The latest job

tamr_toolbox.utils.operation.get_details(*, operation)[source]¶

Returns text describing a job

Parameters: operation (Operation) – A Tamr operation
Return type: str
Returns: Text describing the job

tamr_toolbox.utils.operation.get_all(tamr)[source]¶

Get a list of all jobs or operations.

Parameters: tamr (Client) – A Tamr client
Return type: List[Operation]
Returns: A list of Operation objects.

tamr_toolbox.utils.operation.get_active(tamr)[source]¶

Get a list of pending and running jobs.

Parameters: tamr (Client) – A Tamr client
Return type: List[Operation]
Returns: A list of Operation objects

tamr_toolbox.utils.operation.wait(operation, *, poll_interval_seconds=3, timeout_seconds=None)[source]¶

Continuously polls for this operation’s server-side state.

Parameters

operation (Operation) – Operation to be polled.
poll_interval_seconds (int) – Time interval (in seconds) between subsequent polls.
timeout_seconds (Optional[int]) – Time (in seconds) to wait for operation to resolve.

Raises

TimeoutError – If operation takes longer than timeout_seconds to resolve.

Return type

Operation

tamr_toolbox.utils.operation.monitor(operation, *, poll_interval_seconds=1, timeout_seconds=300)[source]¶

Continuously polls for this operation’s server-side state and returns operation when there is a state change

Parameters

operation (Operation) – Operation to be monitored.
poll_interval_seconds (float) – Time interval (in seconds) between subsequent polls.
timeout_seconds (float) – Time (in seconds) to wait for operation to resolve.

Raises

TimeoutError – If operation takes longer than timeout_seconds to resolve.

Return type

Operation

tamr_toolbox.utils.operation.safe_estimate_counts(project)[source]¶

Run the estimate counts job of a project that works if it is the first

Parameters: project – A Tamr project object

Return type: Operation
Returns: An operation object for the estimate pairs job

Testing¶

Tasks related to testing code

tamr_toolbox.utils.testing.mock_api(*, response_logs_dir=None, enforce_online_test=False, asynchronous=False)[source]¶

Decorator for pytest tests that mocks API requests by reading a file of pre-generated responses. Will generate responses file based on a real connection if pre-generated responses are not found.

Parameters

response_logs_dir (Union[str, Path, None]) – Directory to read/write response logs
enforce_online_test (bool) – Whether an online test should be run, even if a response log already exists
asynchronous (bool) – Whether or not to wait for Operations called during the running of tests

Return type

Callable

Returns

Decorated function

Downstream¶

tamr_toolbox.utils.downstream.datasets(dataset, *, include_dependencies_by_name=False)[source]¶

Returns a dataset’s downstream datasets.

Parameters

dataset (Dataset) – The target dataset.
include_dependencies_by_name (bool) – Whether to include datasets based on name similarity. No dependencies will be found by name if the dataset is not a unified dataset, as determined either by the backend pipeline (if the project still exists) or by regex (the dataset name has the suffix ‘unified_dataset’).

Return type

List[Dataset]

Returns

List of Dataset objects ordered by number of downstream dependencies. Note that dependencies can be bidirectional, so datasets with the same number of dependencies can depend on each other.

tamr_toolbox.utils.downstream.projects(dataset, *, include_dependencies_by_name=False)[source]¶

Returns a list of downstream projects for a dataset.

Parameters

dataset (Dataset) – The target dataset.
include_dependencies_by_name (bool) – Whether to include datasets based on name similarity. No dependencies will be found by name if the dataset is not a unified dataset, as determined either by the backend pipeline (if the project still exists) or by regex (the dataset name has the suffix ‘unified_dataset’).

Return type

List[Project]

Returns

List of downstream projects in order, including the project the target dataset is part of.

Upstream¶

Functions related to projects upstream of a specified project

tamr_toolbox.utils.upstream.datasets(dataset)[source]¶

Check for upstream datasets associated with a specified dataset

Parameters: dataset (Dataset) – the Tamr dataset for which associated upstream datasets are retrieved
Return type: List[Dataset]
Returns: List of Tamr datasets upstream of the target dataset

tamr_toolbox.utils.upstream.projects(project)[source]¶

Check for upstream projects associated with a specified project

Parameters: project (Project) – the tamr project for which associated upstream projects are retrieved
Return type: List[Project]
Returns: List of tamr projects upstream of the target project

Version¶

Tasks related to the version of Tamr instances

tamr_toolbox.utils.version.current(client)[source]¶

Gets the version of Tamr for provided client

Parameters: client (Client) – Tamr client
Return type: str
Returns: String representation of Tamr version

tamr_toolbox.utils.version.is_version_condition_met(*, tamr_version, min_version, max_version=None, exact_version=False, raise_error=False)[source]¶

Checks whether a Tamr version satisfies the given version condition.

Parameters

tamr_version (str) – The version of Tamr being considered
min_version (str) – The earliest version of Tamr
max_version (Optional[str]) – The latest version of Tamr. Default None, in which case no max version is tested for.
exact_version (bool) – Compare against only one release of Tamr. Default is False
raise_error (bool) – If True, raise an error if the version condition is not met. Default is False.

Raises

ValueError – if min_version is greater than max_version
EnvironmentError – if raise_error is True, and the condition is not met

Notes

Patch versions (major.minor.patch) are excluded from the comparison. If exact_version is True, max_version will be ignored.
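
The comparison rules in the notes above can be sketched as a tuple comparison on the major and minor components. This is a simplified illustration, not the toolbox's implementation: major_minor and version_condition_met are hypothetical names, the bounds are assumed to be inclusive, exact_version is assumed to mean equality with min_version, and the raise_error option is left out.

```python
from typing import Optional, Tuple


def major_minor(version: str) -> Tuple[int, int]:
    """Parse 'major.minor.patch' and drop the patch component."""
    major, minor = version.lstrip("v").split(".")[:2]
    return int(major), int(minor)


def version_condition_met(
    tamr_version: str,
    min_version: str,
    max_version: Optional[str] = None,
    exact_version: bool = False,
) -> bool:
    """Hypothetical helper: compare versions on (major, minor) only."""
    current = major_minor(tamr_version)
    minimum = major_minor(min_version)
    if exact_version:
        # max_version is ignored when an exact match is requested.
        return current == minimum
    if max_version is None:
        return current >= minimum
    maximum = major_minor(max_version)
    if minimum > maximum:
        raise ValueError("min_version is greater than max_version")
    return minimum <= current <= maximum


print(version_condition_met("2022.010.3", "2022.008.0"))
print(version_condition_met("2022.005.0", "2022.008.0"))
print(version_condition_met("2022.008.9", "2022.008.0", exact_version=True))
```

Comparing integer tuples avoids the pitfalls of comparing version strings directly, for example when the components have different numbers of digits.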