Utilities

Client

Tasks related to connecting to a Tamr instance

tamr_toolbox.utils.client.health_check(client)[source]

Queries the health check API and reports whether every service is healthy.

Parameters

client (Client) – the tamr client

Return type

bool

Returns

True if all services are healthy, False if unhealthy
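The "all services healthy" rule above can be sketched without a live instance. This is a minimal illustration, not the toolbox's implementation: the payload shape (a mapping of service name to a dictionary with a `healthy` flag) and the helper name `all_services_healthy` are assumptions made for the example.

```python
from typing import Any, Dict


def all_services_healthy(health_payload: Dict[str, Dict[str, Any]]) -> bool:
    """Return True only if every service entry reports healthy == True."""
    # Treating an empty payload as unhealthy is a choice made for this
    # sketch, not documented toolbox behavior.
    if not health_payload:
        return False
    return all(service.get("healthy") is True for service in health_payload.values())


# Hypothetical payloads, shaped as assumed above.
healthy = {"auth": {"healthy": True}, "dataset": {"healthy": True}}
degraded = {"auth": {"healthy": True}, "dataset": {"healthy": False}}

print(all_services_healthy(healthy))   # True
print(all_services_healthy(degraded))  # False
```

A single unhealthy service is enough to make the overall result False, which matches the return value described above.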

tamr_toolbox.utils.client.create(*, username, password, host, port='9100', protocol='http', enforce_healthy=False)[source]

Creates a Tamr client from the provided configuration values

Parameters
  • username (str) – The username to log in to Tamr as

  • password (str) – the password for the user

  • host (str) – The IP address of Tamr

  • port (str) – The port of the Tamr UI

  • protocol (str) – https or http

  • enforce_healthy (bool) – If True, enforces a healthy state upon creation

Return type

Client

Returns

Tamr client

tamr_toolbox.utils.client.get_with_connection_retry(client, api_endpoint, *, timeout_seconds=600, sleep_seconds=20)[source]

Handles exceptions raised when attempting to connect to the Tamr API.

This is used to handle connection issues when Tamr restarts due to a restore.

Parameters
  • client (Client) – A Tamr client object

  • api_endpoint (str) – Tamr API endpoint

  • timeout_seconds (int) – Amount of time before a timeout error is thrown. Default is 600 seconds

  • sleep_seconds (int) – Amount of time in between attempts to connect to Tamr.

Return type

Response

Returns

A response object from API request.
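The retry behavior described above (keep trying every `sleep_seconds` until `timeout_seconds` have passed) can be sketched generically. This is an illustration under stated assumptions, not the toolbox's code: `call_with_connection_retry` and `flaky_request` are hypothetical names, and the built-in `ConnectionError` stands in for whatever exception the real HTTP client raises.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_connection_retry(
    request: Callable[[], T],
    *,
    timeout_seconds: float = 600,
    sleep_seconds: float = 20,
) -> T:
    """Call request() until it succeeds or timeout_seconds have elapsed."""
    started = time.monotonic()
    while True:
        try:
            return request()
        except ConnectionError as error:
            # Give up once the overall time budget is spent.
            if time.monotonic() - started >= timeout_seconds:
                raise TimeoutError(
                    f"No connection after {timeout_seconds} seconds"
                ) from error
            time.sleep(sleep_seconds)


# Hypothetical request that fails twice (as during a restart), then succeeds.
attempts = {"count": 0}


def flaky_request() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("service is restarting")
    return "ok"


result = call_with_connection_retry(flaky_request, timeout_seconds=5, sleep_seconds=0.01)
print(result, attempts["count"])  # ok 3
```

Only connection failures are retried in this sketch; any other exception propagates immediately.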

tamr_toolbox.utils.client.poll_endpoint(client, api_endpoint, *, poll_interval_seconds=3, polling_timeout_seconds=None, connection_retry_timeout_seconds=600)[source]

Waits until job has a state of Canceled, Succeeded, or Failed.

Parameters
  • client (Client) – A Tamr client object

  • api_endpoint (str) – Tamr API endpoint

  • poll_interval_seconds (int) – Amount of time in between polls of job state.

  • polling_timeout_seconds (Optional[int]) – Amount of time before a timeout error is thrown.

  • connection_retry_timeout_seconds (int) – Amount of time before timeout error is thrown during connection retry.

Return type

Response

Returns

A response object from API request.
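The polling loop described above can be sketched as follows. It is a simplified illustration, not the toolbox's implementation: `poll_until_terminal` and `fake_fetch` are hypothetical, the job is modeled as a dictionary with a `state` key, and the upper-case state names are an assumption about how the three terminal states are spelled in a response.

```python
import time
from typing import Callable, Dict, Optional

# Assumed spellings of the three terminal states named above.
TERMINAL_STATES = {"CANCELED", "SUCCEEDED", "FAILED"}


def poll_until_terminal(
    fetch_job: Callable[[], Dict[str, str]],
    *,
    poll_interval_seconds: float = 3,
    polling_timeout_seconds: Optional[float] = None,
) -> Dict[str, str]:
    """Fetch the job repeatedly until it reaches a terminal state."""
    started = time.monotonic()
    while True:
        job = fetch_job()
        if job["state"] in TERMINAL_STATES:
            return job
        # With no timeout configured, polling continues indefinitely.
        if (
            polling_timeout_seconds is not None
            and time.monotonic() - started >= polling_timeout_seconds
        ):
            raise TimeoutError(f"Job still {job['state']} after timeout")
        time.sleep(poll_interval_seconds)


# Hypothetical job that moves through three states.
states = iter(["PENDING", "RUNNING", "SUCCEEDED"])


def fake_fetch() -> Dict[str, str]:
    return {"state": next(states)}


final = poll_until_terminal(fake_fetch, poll_interval_seconds=0.01)
print(final["state"])  # SUCCEEDED
```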

Configuration

Tasks related to loading and using configuration files

tamr_toolbox.utils.config.from_yaml(path_to_file, *, default_path_to_file=None)[source]

Reads a YAML file and creates a dictionary. Input values can be retrieved from environment variables.

Parameters
  • path_to_file (Union[str, Path, None]) – Path to config yaml file

  • default_path_to_file (Union[str, Path, None]) – Path to use if path_to_file is null or empty

Return type

Dict[str, Any]

Returns

All configuration variables in a dictionary
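Two ideas from this entry, the fallback to a default path and the lookup of values in environment variables, can be sketched with the standard library alone. Everything here is an assumption made for illustration: the helper names `choose_path` and `resolve_env`, the convention that a value starting with `$` names an environment variable, and the variable `EXAMPLE_TAMR_PASSWORD`. The YAML parsing step itself is omitted; the sketch starts from an already-parsed dictionary.

```python
import os
from pathlib import Path
from typing import Any, Dict, Optional, Union

PathLike = Union[str, Path]


def choose_path(path_to_file: Optional[PathLike], default_path_to_file: Optional[PathLike]) -> str:
    """Use path_to_file unless it is None or empty, else fall back to the default."""
    if path_to_file is not None and str(path_to_file) != "":
        return str(path_to_file)
    if default_path_to_file is not None and str(default_path_to_file) != "":
        return str(default_path_to_file)
    raise ValueError("No configuration path was provided")


def resolve_env(config: Dict[str, Any]) -> Dict[str, Any]:
    """Replace string values of the form $NAME with os.environ['NAME'] (assumed syntax)."""
    resolved: Dict[str, Any] = {}
    for key, value in config.items():
        if isinstance(value, dict):
            resolved[key] = resolve_env(value)  # recurse into nested sections
        elif isinstance(value, str) and value.startswith("$"):
            resolved[key] = os.environ.get(value[1:])
        else:
            resolved[key] = value
    return resolved


os.environ["EXAMPLE_TAMR_PASSWORD"] = "s3cret"  # hypothetical variable
parsed = {"tamr": {"host": "localhost", "password": "$EXAMPLE_TAMR_PASSWORD"}}

print(choose_path("", "conf/default.yaml"))     # conf/default.yaml
print(resolve_env(parsed)["tamr"]["password"])  # s3cret
```

Keeping secrets in environment variables rather than in the file means the configuration file can be committed without exposing credentials.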

Logging

Tasks related to logging within scripts

tamr_toolbox.utils.logger.create(name, *, log_to_terminal=True, log_directory=None, log_prefix='', date_format='%Y-%m-%d')[source]

Returns a logger object with a pre-defined format. The log file is located under log_directory with the file name <log_prefix>_<date>.log, quashing extra separating underscores. With the default empty prefix, the file name is <date>.log.

Parameters
  • name (str) – This sets the name of your logger instance. It does not affect the file name. To change the filename use log_prefix

  • log_to_terminal (bool) – Boolean indicating whether or not to log messages to the terminal.

  • log_directory (Optional[str]) – The directory to place log files inside

  • log_prefix (str) – The string to prepend to the date in the log file name.

  • date_format (str) – format string for date suffix on log file name

Return type

Logger

Returns

Logger object
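The file naming rule described above can be sketched as a small helper. This is one reading of the rule, not the toolbox's code: `build_log_file_name` is a hypothetical name, the `today` parameter exists only to make the example deterministic, and "quashing extra separating underscores" is interpreted here as dropping trailing underscores from the prefix so that only one separator remains.

```python
from datetime import date
from typing import Optional


def build_log_file_name(
    log_prefix: str = "",
    date_format: str = "%Y-%m-%d",
    today: Optional[date] = None,
) -> str:
    """Build <log_prefix>_<date>.log, or <date>.log when the prefix is empty."""
    stamp = (today or date.today()).strftime(date_format)
    # Assumed interpretation: avoid names such as "etl__2024-01-15.log".
    prefix = log_prefix.rstrip("_")
    return f"{prefix}_{stamp}.log" if prefix else f"{stamp}.log"


fixed_day = date(2024, 1, 15)
print(build_log_file_name("etl", today=fixed_day))   # etl_2024-01-15.log
print(build_log_file_name("etl_", today=fixed_day))  # etl_2024-01-15.log
print(build_log_file_name(today=fixed_day))          # 2024-01-15.log
```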

tamr_toolbox.utils.logger.set_logging_level(logger_name, level)[source]

A useful method for setting the logging level for a given logger and all of its handlers.

Parameters
  • logger_name (str) – the name of the logger for which to set the level

  • level (str) – log level to use. The levels available from the core logging package are ‘debug’, ‘info’, ‘warning’, ‘error’

Return type

None
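Setting a level on a logger and on each of its handlers can be sketched with the standard logging package alone. This is an illustration of the idea rather than the toolbox's implementation; `set_level_everywhere` and the logger name `example.pipeline` are hypothetical.

```python
import logging

ALLOWED_LEVELS = {"debug", "info", "warning", "error"}


def set_level_everywhere(logger_name: str, level: str) -> None:
    """Apply the same level to the named logger and to every handler attached to it."""
    if level.lower() not in ALLOWED_LEVELS:
        raise ValueError(f"Unsupported level: {level!r}")
    numeric_level = getattr(logging, level.upper())  # e.g. logging.DEBUG
    logger = logging.getLogger(logger_name)
    logger.setLevel(numeric_level)
    for handler in logger.handlers:
        handler.setLevel(numeric_level)


# Hypothetical logger with one handler that starts at ERROR.
example_logger = logging.getLogger("example.pipeline")
example_handler = logging.NullHandler()
example_handler.setLevel(logging.ERROR)
example_logger.addHandler(example_handler)

set_level_everywhere("example.pipeline", "debug")
print(example_logger.level == logging.DEBUG, example_handler.level == logging.DEBUG)  # True True
```

Updating the handlers matters because a handler with a stricter level would otherwise keep filtering out messages the logger now accepts.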

tamr_toolbox.utils.logger.enable_package_logging(package_name, *, log_directory=None, level=None, log_prefix='', date_format='%Y-%m-%d')[source]

A helper function to enable package logging for any package following python best practices for logging names (i.e. logger name == package.module.submodule).

Parameters
  • package_name (str) – the name of the package for which to enable logging

  • log_directory (Optional[str]) – optional log directory to which the package will write logs

  • level (Optional[str]) – optional level to specify, default is WARNING (inherited from base logging package)

  • log_prefix (str) – Optional prefix for log files; if None, a blank string is used

  • date_format (str) – Optional date format for log file

Return type

None

tamr_toolbox.utils.logger.enable_toolbox_logging(*, log_directory=None, level=None, log_prefix='', date_format='%Y-%m-%d')[source]

A simple wrapper around enable_package_logging that gives users a friendlier call for tamr_toolbox logs.

Parameters
  • log_directory (Optional[str]) – optional directory to which to write tamr_toolbox logs

  • level (Optional[str]) – Optional logging level to specify, default is WARNING (inherited from base logging package)

  • log_prefix (str) – Optional prefix for log files; if None, a blank string is used

  • date_format (str) – Optional date format for log file

Return type

None

Operation

Tasks related to Tamr operations (or jobs)

tamr_toolbox.utils.operation.enforce_success(operation)[source]

Raises an error if an operation fails

Parameters

operation (Operation) – A Tamr operation

Return type

None
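The contract above (return nothing on success, raise on failure) can be sketched with a stand-in object. `FakeOperation`, `enforce_success_sketch`, the `SUCCEEDED` state name, and the choice of `RuntimeError` are all assumptions made for this illustration; the source does not state which exception the real function raises.

```python
from dataclasses import dataclass


@dataclass
class FakeOperation:
    """Hypothetical stand-in for a Tamr operation."""

    resource_id: str
    state: str

    def succeeded(self) -> bool:
        return self.state == "SUCCEEDED"


def enforce_success_sketch(operation: FakeOperation) -> None:
    """Raise if the operation did not succeed; otherwise return None."""
    if not operation.succeeded():
        raise RuntimeError(
            f"Operation {operation.resource_id} ended in state {operation.state}"
        )


enforce_success_sketch(FakeOperation("1", "SUCCEEDED"))  # passes silently

try:
    enforce_success_sketch(FakeOperation("2", "FAILED"))
except RuntimeError as error:
    print(error)  # Operation 2 ended in state FAILED
```

Raising instead of returning a flag lets a script stop at the first failed job without checking a return value after every step.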

tamr_toolbox.utils.operation.from_resource_id(tamr, *, job_id)[source]

Create an operation from a job id

Parameters
  • tamr (Client) – A Tamr client

  • job_id (Union[int, str]) – A job ID

Return type

Operation

Returns

A Tamr operation

tamr_toolbox.utils.operation.get_latest(tamr)[source]

Get the latest operation

Parameters

tamr (Client) – A Tamr client

Return type

Operation

Returns

The latest job

tamr_toolbox.utils.operation.get_details(*, operation)[source]

Returns text describing a job

Parameters

operation (Operation) – A Tamr operation

Return type

str

Returns

Text describing the job

tamr_toolbox.utils.operation.get_all(tamr)[source]

Get a list of all jobs or operations.

Parameters

tamr (Client) – A Tamr client

Return type

List[Operation]

Returns

A list of Operation objects.

tamr_toolbox.utils.operation.get_active(tamr)[source]

Get a list of pending and running jobs.

Parameters

tamr (Client) – A Tamr client

Return type

List[Operation]

Returns

A list of Operation objects
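Selecting pending and running jobs from a full job list amounts to a filter on state. The sketch below models jobs as plain dictionaries; the helper `filter_active`, the `state` key, and the upper-case state names are assumptions for illustration, not the toolbox's data model.

```python
from typing import Dict, List

# Assumed spellings of the two active states.
ACTIVE_STATES = {"PENDING", "RUNNING"}


def filter_active(jobs: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only jobs that are pending or running, preserving order."""
    return [job for job in jobs if job["state"] in ACTIVE_STATES]


# Hypothetical job list.
jobs = [
    {"id": "1", "state": "SUCCEEDED"},
    {"id": "2", "state": "RUNNING"},
    {"id": "3", "state": "PENDING"},
    {"id": "4", "state": "FAILED"},
]

print([job["id"] for job in filter_active(jobs)])  # ['2', '3']
```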

Testing

Tasks related to testing code

tamr_toolbox.utils.testing.mock_api(*, response_logs_dir=None, enforce_online_test=False)[source]

Decorator for pytest tests that mocks API requests by reading a file of pre-generated responses. Will generate responses file based on a real connection if pre-generated responses are not found.

Parameters
  • response_logs_dir (Union[str, Path, None]) – Directory to read/write response logs

  • enforce_online_test (bool) – Whether an online test should be run, even if a response log already exists

Return type

Callable

Returns

Decorated function
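The record-then-replay idea behind this decorator can be sketched in a few lines. This is a simplified illustration and not how mock_api is implemented: `record_and_replay` is a hypothetical decorator that wraps a single fetch function rather than patching HTTP requests, and the JSON log layout (endpoint mapped to response) is an assumption.

```python
import functools
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict


def record_and_replay(response_log: Path) -> Callable:
    """Replay responses from response_log when present, otherwise call through and record."""

    def decorator(fetch: Callable[[str], Dict]) -> Callable[[str], Dict]:
        @functools.wraps(fetch)
        def wrapper(endpoint: str) -> Dict:
            recorded = json.loads(response_log.read_text()) if response_log.exists() else {}
            if endpoint in recorded:
                return recorded[endpoint]  # offline: reuse the stored response
            response = fetch(endpoint)  # online: make the real call
            recorded[endpoint] = response
            response_log.write_text(json.dumps(recorded))
            return response

        return wrapper

    return decorator


live_calls = []

with tempfile.TemporaryDirectory() as directory:

    @record_and_replay(Path(directory) / "responses.json")
    def fetch(endpoint: str) -> Dict:
        live_calls.append(endpoint)  # counts how often the "real" call runs
        return {"endpoint": endpoint, "status": 200}

    first = fetch("versions")   # recorded from the live call
    second = fetch("versions")  # replayed from the log file

print(first == second, len(live_calls))  # True 1
```

Replaying stored responses lets a test suite run without a live instance, while the first run against a real connection keeps the stored responses realistic.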