Utilities
Client
Tasks related to connecting to a Tamr instance
-
tamr_toolbox.utils.client.health_check(client)
Queries the health check API and checks whether each service is healthy.
- Parameters
client (Client) – the Tamr client
- Return type
bool
- Returns
True if all services are healthy, False if unhealthy
-
tamr_toolbox.utils.client.create(*, username, password, host, port=9100, protocol='http', store_auth_cookie=False, enforce_healthy=False)
Creates a Tamr client from the provided configuration values.
- Parameters
username (str) – The username to log in to Tamr as
password (str) – The password for the user
host (str) – The IP address of Tamr
port (Union[str, int, None]) – The port of the Tamr UI. Pass None to specify an address with no port
protocol (str) – https or http
store_auth_cookie (bool) – If true, allows the Tamr authentication cookie to be stored and reused
enforce_healthy (bool) – If true, enforces a healthy state upon creation
- Return type
Client
- Returns
Tamr client
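As a rough illustration of how protocol, host, and port might combine into an address, the sketch below omits the port when it is None. The helper build_origin is hypothetical and is not part of tamr_toolbox; the library's actual address construction may differ.

```python
from typing import Optional, Union


def build_origin(protocol: str, host: str, port: Optional[Union[str, int]] = 9100) -> str:
    """Hypothetical helper: join protocol, host, and an optional port."""
    if port is None:
        # Assumption: a port of None means the address carries no port at all
        return f"{protocol}://{host}"
    return f"{protocol}://{host}:{port}"


default_origin = build_origin("http", "localhost")
portless_origin = build_origin("https", "tamr.example.com", None)
print(default_origin)
print(portless_origin)
```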
-
tamr_toolbox.utils.client.get_with_connection_retry(client, api_endpoint, *, timeout_seconds=600, sleep_seconds=20)
Handles exceptions raised when attempting to connect to the Tamr API. This is used to handle connection issues when Tamr restarts due to a restore.
- Parameters
- Return type
Response
- Returns
A response object from API request.
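The retry behaviour described for get_with_connection_retry can be pictured as a loop that sleeps between failed attempts until a deadline passes. The following is a minimal sketch under stated assumptions, not the toolbox's implementation: call_with_connection_retry and flaky_request are hypothetical names, and catching the built-in ConnectionError is an assumption about which errors count as connection failures.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def call_with_connection_retry(
    request: Callable[[], T], *, timeout_seconds: float = 600, sleep_seconds: float = 20
) -> T:
    """Hypothetical stand-in: retry `request` on ConnectionError until a deadline."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        try:
            return request()
        except ConnectionError as error:
            # Give up if the next sleep would carry us past the deadline
            if time.monotonic() + sleep_seconds > deadline:
                raise TimeoutError(
                    f"No connection within {timeout_seconds} seconds"
                ) from error
            time.sleep(sleep_seconds)


attempts: List[int] = []


def flaky_request() -> str:
    """Simulates a server that refuses the first two connections."""
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("server is restarting")
    return "ok"


result = call_with_connection_retry(flaky_request, timeout_seconds=5, sleep_seconds=0)
print(result, len(attempts))
```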
-
tamr_toolbox.utils.client.poll_endpoint(client, api_endpoint, *, poll_interval_seconds=3, polling_timeout_seconds=None, connection_retry_timeout_seconds=600)
Waits until the job has a state of Canceled, Succeeded, or Failed.
- Parameters
client (Client) – A Tamr client object
api_endpoint (str) – Tamr API endpoint
poll_interval_seconds (int) – Amount of time between polls of the job state
polling_timeout_seconds (Optional[int]) – Amount of time before a timeout error is thrown
connection_retry_timeout_seconds (int) – Amount of time before a timeout error is thrown during connection retry
- Return type
Response
- Returns
A response object from API request.
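Polling until a terminal state can be sketched without a Tamr instance. In the example below, poll_until_terminal is a hypothetical helper, the uppercase state strings are an assumed encoding of the Canceled, Succeeded, and Failed states named above, and get_state stands in for whatever reads the job state from the endpoint.

```python
import time
from typing import Callable, Optional

# Assumed spelling of the three terminal states
TERMINAL_STATES = {"CANCELED", "SUCCEEDED", "FAILED"}


def poll_until_terminal(
    get_state: Callable[[], str],
    *,
    poll_interval_seconds: float = 3,
    polling_timeout_seconds: Optional[float] = None,
) -> str:
    """Hypothetical helper: poll `get_state` until it reports a terminal state."""
    started = time.monotonic()
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        elapsed = time.monotonic() - started
        # A timeout of None means poll indefinitely
        if polling_timeout_seconds is not None and elapsed >= polling_timeout_seconds:
            raise TimeoutError(f"Job still {state} after {elapsed:.1f} seconds")
        time.sleep(poll_interval_seconds)


states = iter(["PENDING", "RUNNING", "SUCCEEDED"])
final_state = poll_until_terminal(lambda: next(states), poll_interval_seconds=0)
print(final_state)
```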
Configuration
Tasks related to loading and using configuration files
Logging
Tasks related to logging within scripts
-
tamr_toolbox.utils.logger.create(name, *, log_to_terminal=True, log_directory=None, log_prefix='', date_format='%Y-%m-%d')
Returns a logger object with a pre-defined format. The log file is located under log_directory with the file name <log_prefix>_<date>.log, quashing extra separating underscores. The file name defaults to <date>.log.
For use in scripts only. To log in module files, use the standard library logging module with a module-level logger and enable package logging. See https://docs.python.org/3/howto/logging.html#advanced-logging-tutorial
>>> log = logging.getLogger(__name__)
- Parameters
name (str) – The name of your logger instance. It does not affect the file name; to change the file name, use log_prefix
log_to_terminal (bool) – Whether to log messages to the terminal
log_directory (Optional[str]) – The directory in which to place log files
log_prefix (str) – The string to prepend to the date in the log file name
date_format (str) – Format string for the date suffix of the log file name
- Return type
- Returns
Logger object
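The file naming rule can be made concrete with a small sketch. The helper log_file_name is hypothetical, and reading "quashing extra separating underscores" as "drop trailing underscores from the prefix and omit the separator when the prefix is empty" is an assumption rather than the toolbox's confirmed behaviour.

```python
from datetime import date
from typing import Optional


def log_file_name(
    log_prefix: str = "", date_format: str = "%Y-%m-%d", today: Optional[date] = None
) -> str:
    """Hypothetical helper: build <log_prefix>_<date>.log without doubled underscores."""
    stamp = (today or date.today()).strftime(date_format)
    # Assumption: trailing underscores on the prefix are redundant with the separator
    parts = [log_prefix.rstrip("_"), stamp]
    return "_".join(part for part in parts if part) + ".log"


fixed_day = date(2021, 3, 4)
print(log_file_name(today=fixed_day))
print(log_file_name("etl", today=fixed_day))
print(log_file_name("etl_", today=fixed_day))
```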
-
tamr_toolbox.utils.logger.set_logging_level(logger_name, level)
Sets the logging level for a given logger and all of its handlers.
-
tamr_toolbox.utils.logger.enable_package_logging(package_name, *, log_to_terminal=True, log_directory=None, level=None, log_prefix='', date_format='%Y-%m-%d')
A helper function to enable package logging for any package that follows Python best practices for logger names (i.e. logger name == package.module.submodule).
- Parameters
package_name (str) – The name of the package for which to enable logging
log_to_terminal (bool) – Whether to log messages to the terminal
log_directory (Optional[str]) – Optional directory to which the package will write logs
level (Optional[str]) – Optional level to specify; the default is WARNING (inherited from the base logging package)
log_prefix (str) – Optional prefix for log files; if None, a blank string is used
date_format (str) – Optional date format for the log file
- Return type
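The naming convention matters because standard library loggers form a hierarchy: a handler attached to the package-level logger also receives records from every module-level logger beneath it. The sketch below shows that mechanism with the logging module alone; my_package and ListHandler are hypothetical names, and this is not the toolbox's own code.

```python
import logging
from typing import List


class ListHandler(logging.Handler):
    """Collects formatted records in memory so the example needs no log files."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: List[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(self.format(record))


# Configure only the package-level logger ("my_package" is a made-up name)
package_logger = logging.getLogger("my_package")
package_logger.setLevel(logging.INFO)
handler = ListHandler()
handler.setFormatter(logging.Formatter("%(name)s: %(message)s"))
package_logger.addHandler(handler)

# Inside my_package/module/submodule.py this would be logging.getLogger(__name__)
module_logger = logging.getLogger("my_package.module.submodule")
module_logger.info("loaded")  # propagates up to the package handler
module_logger.debug("skipped")  # below the inherited INFO level

print(handler.messages)
```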
-
tamr_toolbox.utils.logger.enable_toolbox_logging(*, log_to_terminal=True, log_directory=None, level=None, log_prefix='', date_format='%Y-%m-%d')
A simple wrapper around enable_package_logging that gives users a friendlier call.
- Parameters
log_to_terminal (bool) – Whether to log messages to the terminal
log_directory (Optional[str]) – Optional directory to which to write tamr_toolbox logs
level (Optional[str]) – Optional logging level to specify; the default is WARNING (inherited from the base logging package)
log_prefix (str) – Optional prefix for log files; if None, a blank string is used
date_format (str) – Optional date format for the log file
- Return type
Operation
Tasks related to Tamr operations (or jobs)
-
tamr_toolbox.utils.operation.enforce_success(operation)
Raises an error if an operation fails.
- Parameters
operation (Operation) – A Tamr operation
- Return type
-
tamr_toolbox.utils.operation.from_resource_id(tamr, *, job_id)
Creates an operation from a job ID.
-
tamr_toolbox.utils.operation.get_latest(tamr)
Gets the latest operation.
- Parameters
tamr (Client) – A Tamr client
- Return type
Operation
- Returns
The latest job
-
tamr_toolbox.utils.operation.get_details(*, operation)
Returns text describing a job.
- Parameters
operation (Operation) – A Tamr operation
- Return type
- Returns
Text describing the job
-
tamr_toolbox.utils.operation.get_all(tamr)
Gets a list of all jobs or operations.
- Parameters
tamr (Client) – A Tamr client
- Return type
List[Operation]
- Returns
A list of Operation objects.
-
tamr_toolbox.utils.operation.get_active(tamr)
Gets a list of pending and running jobs.
- Parameters
tamr (Client) – A Tamr client
- Return type
List[Operation]
- Returns
A list of Operation objects.
-
tamr_toolbox.utils.operation.wait(operation, *, poll_interval_seconds=3, timeout_seconds=None)
Continuously polls for this operation’s server-side state.
- Parameters
- Raises
TimeoutError – If operation takes longer than timeout_seconds to resolve.
- Return type
Operation
-
tamr_toolbox.utils.operation.monitor(operation, *, poll_interval_seconds=1, timeout_seconds=300)
Continuously polls for this operation’s server-side state and returns the operation when there is a state change.
- Parameters
- Raises
TimeoutError – If operation takes longer than timeout_seconds to resolve.
- Return type
Operation
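Where wait polls until the operation resolves, monitor is described as returning on any state change. A minimal sketch of that difference follows, assuming a hypothetical monitor_state helper and made-up state strings; a callable stands in for the server-side operation, so this is not the toolbox's implementation.

```python
import time
from typing import Callable


def monitor_state(
    get_state: Callable[[], str],
    *,
    poll_interval_seconds: float = 1,
    timeout_seconds: float = 300,
) -> str:
    """Hypothetical helper: return the first state that differs from the initial one."""
    initial_state = get_state()
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        time.sleep(poll_interval_seconds)
        current_state = get_state()
        if current_state != initial_state:
            return current_state
    raise TimeoutError(f"State stayed {initial_state} for {timeout_seconds} seconds")


states = iter(["PENDING", "PENDING", "RUNNING", "SUCCEEDED"])
new_state = monitor_state(lambda: next(states), poll_interval_seconds=0)
print(new_state)
```

Note that the sketch returns at the first change (here from pending to running) rather than continuing to a terminal state.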
Testing
Tasks related to testing code
Downstream
-
tamr_toolbox.utils.downstream.datasets(dataset, *, include_dependencies_by_name=False)
Returns a dataset’s downstream datasets.
- Parameters
dataset (Dataset) – The target dataset
include_dependencies_by_name (bool) – Whether to include datasets based on name similarity. No dependencies are found by name unless the dataset is a unified dataset, as determined either by the backend pipeline (if the project still exists) or by regex (the dataset name has the suffix ‘unified_dataset’)
- Return type
List[Dataset]
- Returns
List of Dataset objects ordered by number of downstream dependencies. Note that dependencies can be bidirectional, so datasets with the same number of dependencies can depend on each other.
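Because dependencies can point in both directions, any traversal of downstream datasets has to guard against revisiting a dataset. The sketch below shows one way to do that with a breadth-first walk over a hypothetical mapping from dataset name to direct dependents. The names downstream_names and graph are illustrative, and the result is in discovery order, which is not necessarily the ordering the toolbox returns.

```python
from collections import deque
from typing import Dict, List


def downstream_names(dependents: Dict[str, List[str]], start: str) -> List[str]:
    """Hypothetical helper: collect every dataset reachable downstream of `start`."""
    seen = {start}
    queue = deque([start])
    found: List[str] = []
    while queue:
        current = queue.popleft()
        for child in dependents.get(current, []):
            if child not in seen:  # guards against bidirectional dependencies
                seen.add(child)
                found.append(child)
                queue.append(child)
    return found


# Made-up dependency map; "clusters" and "published" depend on each other
graph = {
    "source": ["unified"],
    "unified": ["clusters", "published"],
    "clusters": ["published"],
    "published": ["clusters"],
}
result = downstream_names(graph, "source")
print(result)
```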
-
tamr_toolbox.utils.downstream.projects(dataset, *, include_dependencies_by_name=False)
Returns a list of downstream projects for a dataset.
- Parameters
dataset (Dataset) – The target dataset
include_dependencies_by_name (bool) – Whether to include datasets based on name similarity. No dependencies are found by name unless the dataset is a unified dataset, as determined either by the backend pipeline (if the project still exists) or by regex (the dataset name has the suffix ‘unified_dataset’)
- Return type
List[Project]
- Returns
List of downstream projects in order, including the project the target dataset is part of.