Backup

Tasks related to backing up and restoring Tamr instances.

tamr_toolbox.workflow.backup.list_backups(client)

Lists all backups available to the Tamr client, including both succeeded and failed backups.

Parameters

client (Client) – A Tamr client object

Return type

Generator[Dict[str, Any], None, None]

Returns

A generator of JSON dicts, one for each backup available to the client.
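Because list_backups yields failed backups alongside successful ones, callers usually filter the results. The sketch below runs on plain dicts so it works without a Tamr instance. The helper name `succeeded_backups`, the `state` key, and the `"SUCCEEDED"` value are assumptions about the backup JSON, not something this page confirms.

```python
from typing import Any, Dict, Iterable, List


def succeeded_backups(backups: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Hypothetical helper: keep only backups whose assumed "state" is "SUCCEEDED"."""
    # ASSUMPTION: each backup dict has a "state" key; adjust to the real schema.
    return [backup for backup in backups if backup.get("state") == "SUCCEEDED"]


# Made-up stand-in records shaped like the assumed output of list_backups(client).
sample = [
    {"relativeId": "backup-1", "state": "SUCCEEDED"},
    {"relativeId": "backup-2", "state": "FAILED"},
]
print([b["relativeId"] for b in succeeded_backups(sample)])  # ['backup-1']
```

Against a live instance this would be called as `succeeded_backups(list_backups(client))`, provided the assumed field names hold.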

tamr_toolbox.workflow.backup.get_backup_by_id(client, backup_id)

Fetches the JSON object for a given backup ID.

Parameters
  • client (Client) – A Tamr client object.

  • backup_id (str) – The relativeID corresponding to the desired backup.

Return type

Dict[str, Any]

Returns

JSON dict corresponding to the desired backup.

Raises

ValueError – Raised if the GET request to Tamr fails.

tamr_toolbox.workflow.backup.initiate_backup(client, *, poll_interval_seconds=30, polling_timeout_seconds=None, connection_retry_timeout_seconds=600)

Runs a backup of the Tamr instance behind the client and waits for it to finish.

Parameters
  • client (Client) – A Tamr client object

  • poll_interval_seconds (int) – Number of seconds between polls of the job state.

  • polling_timeout_seconds (Optional[int]) – Number of seconds to poll before a timeout error is raised.

  • connection_retry_timeout_seconds (int) – Number of seconds to retry the connection before a timeout error is raised.

Return type

Response

Returns

The response from the API request.
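initiate_backup blocks until the backup finishes, checking the job state every poll_interval_seconds and giving up once polling_timeout_seconds has elapsed. A minimal sketch of that kind of wait loop follows. It is not the toolbox's implementation: `wait_for_completion` is hypothetical, the caller-supplied `get_state` callable and the terminal state names are assumptions, treating `None` as "no timeout" is a choice of this sketch, and connection retries (connection_retry_timeout_seconds) are not modeled.

```python
import time
from typing import Callable, Optional

# ASSUMPTION: names of the terminal job states.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELED"}


def wait_for_completion(
    get_state: Callable[[], str],
    *,
    poll_interval_seconds: float = 30,
    polling_timeout_seconds: Optional[float] = None,
) -> str:
    """Poll get_state() until it returns a terminal state, then return that state.

    Raises TimeoutError once polling_timeout_seconds have elapsed.
    In this sketch, polling_timeout_seconds=None means wait indefinitely.
    """
    start = time.monotonic()
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        elapsed = time.monotonic() - start
        if polling_timeout_seconds is not None and elapsed >= polling_timeout_seconds:
            raise TimeoutError(f"Job still {state!r} after {elapsed:.0f} seconds")
        time.sleep(poll_interval_seconds)


# Simulated job that reports RUNNING twice, then SUCCEEDED.
states = iter(["RUNNING", "RUNNING", "SUCCEEDED"])
print(wait_for_completion(lambda: next(states), poll_interval_seconds=0))  # SUCCEEDED
```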

tamr_toolbox.workflow.backup.initiate_restore(client, backup_id, *, polling_timeout_seconds=None, poll_interval_seconds=30, connection_retry_timeout_seconds=600)

Restores the Tamr instance behind the client to the state captured in the supplied backup.

Parameters
  • client (Client) – A Tamr client object

  • backup_id (str) – The ID of the backup to restore.

  • polling_timeout_seconds (Optional[int]) – Number of seconds to poll before a timeout error is raised.

  • poll_interval_seconds (int) – Number of seconds between polls of the job state.

  • connection_retry_timeout_seconds (int) – Number of seconds to retry the connection before a timeout error is raised.

Return type

Response

Returns

The response from the API request.

Raises
  • ValueError – Raised if the target backup contains errors

  • RuntimeError – Raised if the restore fails to start

tamr_toolbox.workflow.backup.validate_backup(directory, *, backup_datetime_format='%Y-%m-%d_%H-%M-%S-%f')

Checks whether a directory is a valid backup. A valid backup directory contains a manifest file and a completion file (_SUCCEEDED, _FAILED, or _CANCELED), and its folder name is a date in the expected format that is earlier than the current time.

Parameters
  • directory (Union[Path, str]) – Path to the backup directory

  • backup_datetime_format (str) – Datetime format string used in backup folder names

Return type

bool

Returns

True if directory is a valid backup, otherwise False.
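The four validity rules above can be sketched with pathlib and datetime. This is a hypothetical re-implementation (`is_valid_backup_dir`), not the toolbox source; in particular, the manifest file name `manifest.json` is an assumption, since the page only says "a manifest file".

```python
import datetime
import tempfile
from pathlib import Path
from typing import Union

# ASSUMPTION: the manifest file name is not given on this page.
MANIFEST_NAME = "manifest.json"
COMPLETION_FILES = ("_SUCCEEDED", "_FAILED", "_CANCELED")


def is_valid_backup_dir(
    directory: Union[Path, str],
    *,
    backup_datetime_format: str = "%Y-%m-%d_%H-%M-%S-%f",
) -> bool:
    path = Path(directory)
    if not path.is_dir():
        return False
    # Rule 1: a manifest file must be present.
    if not (path / MANIFEST_NAME).is_file():
        return False
    # Rule 2: at least one completion file must be present.
    if not any((path / name).is_file() for name in COMPLETION_FILES):
        return False
    # Rule 3: the folder name must parse with the expected datetime format.
    try:
        backup_time = datetime.datetime.strptime(path.name, backup_datetime_format)
    except ValueError:
        return False
    # Rule 4: the backup date must be earlier than the current time.
    return backup_time < datetime.datetime.now()


with tempfile.TemporaryDirectory() as root:
    backup = Path(root) / "2021-03-04_05-06-07-123456"
    backup.mkdir()
    (backup / MANIFEST_NAME).touch()
    (backup / "_SUCCEEDED").touch()
    print(is_valid_backup_dir(backup))  # True
```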

tamr_toolbox.workflow.backup.delete_backups(*, backups, backup_directory, backup_datetime_format='%Y-%m-%d_%H-%M-%S-%f')

Deletes backup folders recursively.

Parameters
  • backups (List[str]) – List of names of the backups to delete

  • backup_directory (Union[Path, str]) – Path to the backup directory

  • backup_datetime_format (str) – Datetime format string used in backup folder names

Return type

List[str]

Returns

List of the names of the deleted backups.
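Recursive deletion of backup folders can be sketched with shutil.rmtree. The helper `remove_backup_folders` is hypothetical. Using backup_datetime_format as a safety guard (only folders whose names parse as backup dates are deleted) is an assumption about why delete_backups accepts that parameter; names that fail the check or do not exist are skipped.

```python
import datetime
import shutil
import tempfile
from pathlib import Path
from typing import List, Union


def remove_backup_folders(
    backups: List[str],
    backup_directory: Union[Path, str],
    backup_datetime_format: str = "%Y-%m-%d_%H-%M-%S-%f",
) -> List[str]:
    """Recursively delete the named backup folders and return the names deleted."""
    deleted = []
    for name in backups:
        try:
            # ASSUMPTION: only delete folders whose names look like backup dates.
            datetime.datetime.strptime(name, backup_datetime_format)
        except ValueError:
            continue
        folder = Path(backup_directory) / name
        if folder.is_dir():
            shutil.rmtree(folder)  # removes the folder and everything inside it
            deleted.append(name)
    return deleted


with tempfile.TemporaryDirectory() as root:
    (Path(root) / "2021-01-01_00-00-00-000000" / "data").mkdir(parents=True)
    (Path(root) / "keep-me").mkdir()
    print(remove_backup_folders(["2021-01-01_00-00-00-000000", "keep-me"], root))
    # ['2021-01-01_00-00-00-000000']
```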

tamr_toolbox.workflow.backup.classify_backups(backup_directory, *, backup_datetime_format='%Y-%m-%d_%H-%M-%S-%f')

Sorts the valid backups in the backup directory into successful and unsuccessful (failed or canceled) backups.

Parameters
  • backup_directory (Union[Path, str]) – Path to the backup directory

  • backup_datetime_format (str) – Datetime format string used in backup folder names

Return type

Dict[str, Any]

Returns

JSON dict with the keys “successful” (list of successful backups) and “not_successful” (list of failed or canceled backups).

Raises

ValueError – Raised if a target backup file contains an error message.
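Classification hinges on which completion file a backup folder contains. The hypothetical sketch `classify_backup_dirs` below fills the two documented keys from those files. It assumes the folders have already been validated, so it does not check folder names against backup_datetime_format, and it does not reproduce the ValueError check on error messages; folders without a completion file are simply ignored.

```python
import tempfile
from pathlib import Path
from typing import Dict, List, Union


def classify_backup_dirs(backup_directory: Union[Path, str]) -> Dict[str, List[str]]:
    result: Dict[str, List[str]] = {"successful": [], "not_successful": []}
    for folder in sorted(Path(backup_directory).iterdir()):
        if not folder.is_dir():
            continue
        if (folder / "_SUCCEEDED").is_file():
            result["successful"].append(folder.name)
        elif (folder / "_FAILED").is_file() or (folder / "_CANCELED").is_file():
            result["not_successful"].append(folder.name)
        # Folders with no completion file (e.g. still running) are skipped.
    return result


with tempfile.TemporaryDirectory() as root:
    for name, marker in [("b1", "_SUCCEEDED"), ("b2", "_FAILED"), ("b3", "_CANCELED")]:
        (Path(root) / name).mkdir()
        (Path(root) / name / marker).touch()
    print(classify_backup_dirs(root))
    # {'successful': ['b1'], 'not_successful': ['b2', 'b3']}
```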

tamr_toolbox.workflow.backup.delete_old_backups(backup_directory, *, num_successful_backups_to_keep, num_failed_backups_to_keep, backup_datetime_format='%Y-%m-%d_%H-%M-%S-%f')

Deletes old backups, keeping the most recent num_successful_backups_to_keep successful backups and the most recent num_failed_backups_to_keep failed backups.

Parameters
  • backup_directory (Union[Path, str]) – Path to the backup directory

  • num_successful_backups_to_keep (int) – Number of successful backups to keep

  • num_failed_backups_to_keep (int) – Number of failed or canceled backups to keep

  • backup_datetime_format (str) – Datetime format string used in backup folder names

Return type

Optional[List[Dict[str, Any]]]

Returns

A list of deleted backups. Returns None if no backups are deleted.

Raises

ValueError – Raised if either number of backups to keep is less than 0.
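The retention rule amounts to sorting each group newest first and deleting everything past the first N entries. The selection step can be sketched as a pure function over folder names, assuming recency is determined by parsing each name with backup_datetime_format. `select_backups_to_delete` is hypothetical: it touches no files, and it returns an empty list where the documented function returns None.

```python
import datetime
from typing import List


def select_backups_to_delete(
    successful: List[str],
    not_successful: List[str],
    *,
    num_successful_backups_to_keep: int,
    num_failed_backups_to_keep: int,
    backup_datetime_format: str = "%Y-%m-%d_%H-%M-%S-%f",
) -> List[str]:
    if num_successful_backups_to_keep < 0 or num_failed_backups_to_keep < 0:
        raise ValueError("Number of backups to keep must be at least 0")

    def newest_first(names: List[str]) -> List[str]:
        # ASSUMPTION: recency comes from the date encoded in each folder name.
        return sorted(
            names,
            key=lambda name: datetime.datetime.strptime(name, backup_datetime_format),
            reverse=True,
        )

    # Keep the newest N of each group; everything after them is slated for deletion.
    return (
        newest_first(successful)[num_successful_backups_to_keep:]
        + newest_first(not_successful)[num_failed_backups_to_keep:]
    )


ok = ["2021-01-01_00-00-00-000000", "2021-01-03_00-00-00-000000", "2021-01-02_00-00-00-000000"]
bad = ["2021-01-04_00-00-00-000000"]
print(select_backups_to_delete(ok, bad, num_successful_backups_to_keep=2, num_failed_backups_to_keep=0))
# ['2021-01-01_00-00-00-000000', '2021-01-04_00-00-00-000000']
```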

tamr_toolbox.workflow.backup.delete_old_spark_event_logs(tamr_home_directory, *, num_days_to_keep=14)

Deletes Spark event logs (sparkEventLogs) older than the specified number of days. This assumes that Spark runs locally on the same VM as Tamr and that the logs are stored on the local filesystem.

Parameters
  • tamr_home_directory (Union[Path, str]) – Path to the Tamr home directory

  • num_days_to_keep (int) – Number of days for which to keep logs

Return type

List[str]

Returns

A list of deleted sparkEventLogs files

Raises