Backup
Tasks related to backup and restore of Tamr instances.
- tamr_toolbox.workflow.backup.list_backups(client)
Lists all backups available to the Tamr client, including both succeeded and failed backups.
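list_backups returns failed backups alongside successful ones, so callers often need to separate them. The sketch below uses a hypothetical helper that is not part of tamr_toolbox. It assumes each backup dict has a "state" key equal to "SUCCEEDED" for a successful backup. That key and value are assumptions about the Tamr backup JSON, not something documented here.

```python
from typing import Any, Dict, Iterable, List, Tuple


def split_backups_by_state(
    backups: Iterable[Dict[str, Any]],
) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Hypothetical helper: split backup dicts into (succeeded, everything else).

    ASSUMPTION: each dict carries a "state" key whose value is "SUCCEEDED"
    for a successful backup.
    """
    succeeded: List[Dict[str, Any]] = []
    other: List[Dict[str, Any]] = []
    for backup in backups:
        # Anything not explicitly marked as succeeded (failed, canceled,
        # missing state) goes into the second list.
        if backup.get("state") == "SUCCEEDED":
            succeeded.append(backup)
        else:
            other.append(backup)
    return succeeded, other
```

In practice, the input would be the iterable returned by list_backups(client).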
- tamr_toolbox.workflow.backup.get_backup_by_id(client, backup_id)
Fetches the JSON object for a given backup ID.
- Parameters
client (Client) – A Tamr client object.
backup_id (str) – The relativeID corresponding to the desired backup.
- Returns
JSON dict corresponding to the desired backup.
- Raises
ValueError – Raised if the GET request to Tamr fails.
- tamr_toolbox.workflow.backup.initiate_backup(client, *, poll_interval_seconds=30, polling_timeout_seconds=None, connection_retry_timeout_seconds=600)
Runs a backup of the Tamr instance connected to client and waits until it finishes.
- Parameters
client (Client) – A Tamr client object.
poll_interval_seconds (int) – Number of seconds between polls of the job state.
polling_timeout_seconds (Optional[int]) – Number of seconds before a timeout error is thrown.
connection_retry_timeout_seconds (int) – Number of seconds before a timeout error is thrown while retrying the connection.
- Returns
JSON dict of the response from the API request.
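initiate_backup and initiate_restore both start a job and then poll it. The following minimal sketch shows how poll_interval_seconds and polling_timeout_seconds interact in such a loop. It assumes a hypothetical get_state callable and assumed terminal state names. It raises Python's built-in TimeoutError, which may not be the exception type tamr_toolbox uses, and it does not model connection_retry_timeout_seconds.

```python
import time
from typing import Callable, Optional

# ASSUMPTION: terminal job state names; not taken from Tamr documentation.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELED"}


def wait_for_job(
    get_state: Callable[[], str],
    *,
    poll_interval_seconds: float = 30,
    polling_timeout_seconds: Optional[float] = None,
) -> str:
    """Poll the hypothetical get_state callable until it returns a terminal state.

    In this sketch, polling_timeout_seconds=None means wait indefinitely.
    """
    start = time.monotonic()
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        elapsed = time.monotonic() - start
        # Give up once the polling budget has been spent.
        if polling_timeout_seconds is not None and elapsed >= polling_timeout_seconds:
            raise TimeoutError(f"Job still in state {state!r} after {elapsed:.1f}s")
        time.sleep(poll_interval_seconds)
```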
- tamr_toolbox.workflow.backup.initiate_restore(client, backup_id, *, polling_timeout_seconds=None, poll_interval_seconds=30, connection_retry_timeout_seconds=600)
Restores the Tamr instance connected to client to the state of the supplied backup.
- Parameters
client (Client) – A Tamr client object.
backup_id (str) – The ID of the desired backup.
polling_timeout_seconds (Optional[int]) – Number of seconds before a timeout error is thrown.
poll_interval_seconds (int) – Number of seconds between polls of the job state.
connection_retry_timeout_seconds (int) – Number of seconds before a timeout error is thrown while retrying the connection.
- Returns
JSON dict of the response from the API request.
- Raises
ValueError – Raised if the target backup contains errors.
RuntimeError – Raised if the restore fails to start.
- tamr_toolbox.workflow.backup.validate_backup(directory, *, backup_datetime_format='%Y-%m-%d_%H-%M-%S-%f')
Validates that a directory is a valid backup. A valid backup contains a manifest file and a completion file (_SUCCEEDED, _FAILED, or _CANCELED), its folder name is a date in the format given by backup_datetime_format, and that date is earlier than the current time.
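The validity rules above can be expressed as a short filesystem check. This is a sketch under stated assumptions. is_valid_backup is a hypothetical function, and the manifest file name manifest.json is an assumption, since the text only says "a manifest file". The sketch also returns a bool, whereas the return behavior of validate_backup is not described here.

```python
from datetime import datetime
from pathlib import Path
from typing import Union

COMPLETION_FILES = ("_SUCCEEDED", "_FAILED", "_CANCELED")


def is_valid_backup(
    directory: Union[str, Path],
    *,
    backup_datetime_format: str = "%Y-%m-%d_%H-%M-%S-%f",
    manifest_name: str = "manifest.json",  # ASSUMPTION: hypothetical manifest file name
) -> bool:
    directory = Path(directory)
    # Rule 1: a manifest file must be present.
    if not (directory / manifest_name).is_file():
        return False
    # Rule 2: at least one completion marker file must be present.
    if not any((directory / name).exists() for name in COMPLETION_FILES):
        return False
    # Rule 3: the folder name must parse with the expected datetime format.
    try:
        backup_time = datetime.strptime(directory.name, backup_datetime_format)
    except ValueError:
        return False
    # Rule 4: the parsed date must be earlier than the current time.
    return backup_time < datetime.now()
```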
- tamr_toolbox.workflow.backup.delete_backups(*, backups, backup_directory, backup_datetime_format='%Y-%m-%d_%H-%M-%S-%f')
Deletes backup folders recursively.
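This is a minimal sketch of recursive folder deletion using the standard library's shutil.rmtree. delete_backup_folders is hypothetical. Unlike delete_backups, it takes no backup_datetime_format. The check that each name stays inside backup_directory is a safety choice of this sketch, not documented behavior.

```python
import shutil
from pathlib import Path
from typing import Iterable, List


def delete_backup_folders(backups: Iterable[str], backup_directory: str) -> List[str]:
    """Hypothetical helper: recursively delete each named folder under backup_directory.

    Returns the names that were deleted. Names that are not directories
    directly inside backup_directory are skipped.
    """
    root = Path(backup_directory).resolve()
    deleted: List[str] = []
    for name in backups:
        target = (root / name).resolve()
        # Guard against names such as "../x" that would escape the backup directory.
        if target.parent != root or not target.is_dir():
            continue
        shutil.rmtree(target)  # removes the folder and everything inside it
        deleted.append(name)
    return deleted
```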
- tamr_toolbox.workflow.backup.classify_backups(backup_directory, *, backup_datetime_format='%Y-%m-%d_%H-%M-%S-%f')
Classifies the valid backups in the backup directory as successful or not successful.
- Returns
JSON dict with the keys “successful” (list of successful backups) and “not_successful” (list of failed or canceled backups).
- Raises
ValueError – Raised if the target backup file contains an error message.
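This is a simplified sketch of the classification. It assumes a folder's status depends only on which completion file it contains. classify_backup_folders is hypothetical. It skips the other validity checks that validate_backup describes, and it does not reproduce the ValueError that classify_backups raises for backup files containing an error message.

```python
from pathlib import Path
from typing import Dict, List


def classify_backup_folders(backup_directory: str) -> Dict[str, List[str]]:
    """Hypothetical helper: sort backup folders by their completion marker file."""
    result: Dict[str, List[str]] = {"successful": [], "not_successful": []}
    for folder in sorted(Path(backup_directory).iterdir()):
        if not folder.is_dir():
            continue
        if (folder / "_SUCCEEDED").exists():
            result["successful"].append(folder.name)
        elif (folder / "_FAILED").exists() or (folder / "_CANCELED").exists():
            result["not_successful"].append(folder.name)
        # Folders with no completion marker are treated as invalid and skipped.
    return result
```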
- tamr_toolbox.workflow.backup.delete_old_backups(backup_directory, *, num_successful_backups_to_keep, num_failed_backups_to_keep, backup_datetime_format='%Y-%m-%d_%H-%M-%S-%f')
Deletes old backups, keeping the most recent num_successful_backups_to_keep successful backups and the most recent num_failed_backups_to_keep failed backups.
- Returns
A list of deleted backups. Returns None if no backups are deleted.
- Raises
ValueError – Raised if the number of backups to keep is less than 0.
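The retention rule comes down to sorting each group newest first and discarding everything after the first N entries. The hypothetical backups_to_delete only computes which folders would be removed and deletes nothing. It assumes folder names are timestamps in backup_datetime_format. It mirrors the documented ValueError for negative counts and the None return when nothing is deleted.

```python
from datetime import datetime
from typing import List, Optional


def backups_to_delete(
    successful: List[str],
    not_successful: List[str],
    *,
    num_successful_backups_to_keep: int,
    num_failed_backups_to_keep: int,
    backup_datetime_format: str = "%Y-%m-%d_%H-%M-%S-%f",
) -> Optional[List[str]]:
    """Hypothetical helper: return folder names outside the retention window, or None."""
    if num_successful_backups_to_keep < 0 or num_failed_backups_to_keep < 0:
        raise ValueError("Number of backups to keep must be at least 0")

    def newest_first(names: List[str]) -> List[str]:
        # ASSUMPTION: folder names are timestamps, so sort by the parsed datetime.
        return sorted(
            names,
            key=lambda name: datetime.strptime(name, backup_datetime_format),
            reverse=True,
        )

    # Everything after the N most recent entries in each group is deleted.
    to_delete = (
        newest_first(successful)[num_successful_backups_to_keep:]
        + newest_first(not_successful)[num_failed_backups_to_keep:]
    )
    return to_delete or None
```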
- tamr_toolbox.workflow.backup.delete_old_spark_event_logs(tamr_home_directory, *, num_days_to_keep=14)
Deletes sparkEventLogs files older than num_days_to_keep days. This assumes that Spark is running locally on the same VM as Tamr and that the logs are on the local filesystem.
- Returns
A list of deleted sparkEventLogs files.
- Raises
ValueError – Raised if num_days_to_keep is less than 0.
FileNotFoundError – Raised if the sparkEventLogs directory doesn’t exist.
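This is a minimal sketch of age-based log cleanup. It assumes age is measured by file modification time. It takes the log directory itself as a hypothetical log_directory parameter, because the location of sparkEventLogs under tamr_home_directory is not stated here. The ValueError and FileNotFoundError conditions mirror the Raises list above.

```python
import time
from pathlib import Path
from typing import List


def delete_old_event_logs(log_directory: str, *, num_days_to_keep: int = 14) -> List[str]:
    """Hypothetical helper: delete files last modified more than num_days_to_keep days ago."""
    if num_days_to_keep < 0:
        raise ValueError("num_days_to_keep must be at least 0")
    directory = Path(log_directory)
    if not directory.is_dir():
        raise FileNotFoundError(f"Log directory not found: {directory}")
    # ASSUMPTION: a file's age is measured by its modification time.
    cutoff = time.time() - num_days_to_keep * 24 * 60 * 60
    deleted: List[str] = []
    for path in sorted(directory.iterdir()):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            deleted.append(path.name)
    return deleted
```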