Address Validation

This module requires an optional dependency. See Installation for details.

Validate

Tasks related to validation and refresh of address data using Google Maps API

tamr_toolbox.enrichment.address_validation.get_addr_to_validate(input_addresses, addr_mapping, expiration_date_buffer=datetime.timedelta(days=1))

Find addresses not previously validated or validated too long ago.

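Parameters
  • input_addresses – addresses to check against the mapping

  • addr_mapping – a toolbox address validation mapping of previously validated addresses

  • expiration_date_buffer – addresses whose validation expires within this period are also returned

Return type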

List[str]

Returns

List of standardized addresses that are not present as keys of the mapping dictionary, or whose validation has expired or expires within expiration_date_buffer

Raises

ValueError – if a negative expiration_date_buffer is supplied
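
The selection rule described above can be sketched in a few lines. This is a minimal illustration, not the toolbox implementation: the function name addresses_needing_validation is hypothetical, and it assumes expirations are stored as timezone-naive ISO 8601 strings keyed by address.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional


def addresses_needing_validation(
    addresses: List[str],
    expirations: Dict[str, str],
    buffer: timedelta = timedelta(days=1),
    now: Optional[datetime] = None,
) -> List[str]:
    """Return addresses that are missing from `expirations` or expire within `buffer`.

    Hypothetical helper: `expirations` maps an address to an ISO 8601 timestamp.
    """
    if buffer < timedelta(0):
        raise ValueError("expiration_date_buffer must not be negative")
    now = now or datetime.now()
    cutoff = now + buffer
    stale = []
    for address in addresses:
        expiration = expirations.get(address)
        # Re-validate when never seen, or when the stored result expires by the cutoff
        if expiration is None or datetime.fromisoformat(expiration) <= cutoff:
            stale.append(address)
    return stale
```

A larger buffer trades extra API calls for a lower chance of serving a result that expires shortly after it is read.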

tamr_toolbox.enrichment.address_validation.from_list(all_addresses, client, dictionary, *, region_code, enable_usps_cass=False, intermediate_save_every_n=None, intermediate_save_to_disk=False, intermediate_folder='/tmp', expiration_date_buffer=datetime.timedelta(days=1))

Validate a list of addresses.

The validation is saved in a dictionary on your local file system before updating the main dictionary.

Parameters
  • all_addresses (List[Tuple[Optional[str], …]]) – List of addresses to validate

  • client (Client) – a googlemaps API client

  • dictionary (Dict[str, AddressValidationMapping]) – a toolbox validation dictionary

  • region_code (Optional[str]) – optional region code, e.g. ‘US’ or ‘FR’, to pass to the maps API

  • enable_usps_cass (bool) – whether to use USPS validation; only for ‘US’/‘PR’ regions

  • intermediate_save_every_n (Optional[int]) – save the dictionary every n validated addresses; if not set, save only at the end of processing

  • intermediate_save_to_disk (bool) – whether to periodically save the dictionary to disk to avoid losing validation data if the code fails

  • intermediate_folder (str) – path to the folder where the dictionary will be saved periodically to avoid losing validation data

  • expiration_date_buffer (timedelta) – re-validate addresses if they are within this period of expiring

Return type

Dict[str, AddressValidationMapping]

Returns

The updated validation dictionary
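
The intermediate-save behaviour described by the parameters above can be pictured with a small sketch. It is an illustration under stated assumptions, not the toolbox code: validate_with_checkpoints, validate_one, and the checkpoint filename are hypothetical, and validate_one stands in for the call to the validation API.

```python
import json
import os
import tempfile
from typing import Callable, Dict, List, Optional


def validate_with_checkpoints(
    addresses: List[str],
    validate_one: Callable[[str], dict],
    results: Dict[str, dict],
    save_every_n: Optional[int] = None,
    folder: str = tempfile.gettempdir(),
) -> Dict[str, dict]:
    """Validate addresses, writing a checkpoint file every `save_every_n` results."""
    checkpoint = os.path.join(folder, "validation_checkpoint.json")  # hypothetical name
    pending: Dict[str, dict] = {}
    for count, address in enumerate(addresses, start=1):
        pending[address] = validate_one(address)
        # Persist partial results so a crash does not lose completed API calls
        if save_every_n and count % save_every_n == 0:
            with open(checkpoint, "w") as f:
                json.dump(pending, f)
    # Merge into the main dictionary only after the whole batch has been processed
    results.update(pending)
    return results
```

Keeping the new results in a separate dictionary until the end means the main dictionary is never left half-updated if a call fails midway.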

Address Validation Mappings

Tasks related to creating, updating, saving, and moving address validation data from Tamr

class tamr_toolbox.enrichment.address_mapping.AddressValidationMapping(input_address, validated_formatted_address, expiration, region_code, postal_code, admin_area, locality, address_lines, usps_first_address_line, usps_city_state_zip_line, usps_city, usps_state, usps_zip_code, latitude, longitude, place_id, input_granularity, validation_granularity, geocode_granularity, has_inferred, has_unconfirmed, has_replaced, address_complete)

DataClass for address validation data.

Parameters
  • input_address (str) – input address

  • validated_formatted_address (Optional[str]) – the “formattedAddress” returned by the validation API, if any

  • expiration (str) – the expiration timestamp of the data, 30 days from API call

  • region_code (Optional[str]) – region code returned by the validation API

  • postal_code (Optional[str]) – postal code returned by the validation API

  • admin_area (Optional[str]) – administrative area returned by the validation API (state for US addresses)

  • locality (Optional[str]) – locality returned by the validation API (city/town for US addresses)

  • address_lines (List[str]) – address lines returned by the validation API (e.g. [‘66 Church St’])

  • usps_first_address_line – first address line in validated USPS format, if available

  • usps_city_state_zip_line – second address line in validated USPS format, if available

  • usps_city (Optional[str]) – city in validated USPS format, if available

  • usps_state (Optional[str]) – state in validated USPS format, if available

  • usps_zip_code – ZIP code in validated USPS format, if available

  • latitude (Optional[float]) – latitude associated with validated address, if any

  • longitude (Optional[float]) – longitude associated with validated address, if any

  • place_id (Optional[str]) – the Google placeId, the only result field not subject to the expiration

  • input_granularity (Literal[‘GRANULARITY_UNSPECIFIED’, ‘SUB_PREMISE’, ‘PREMISE’, ‘PREMISE_PROMXIMITY’, ‘BLOCK’, ‘ROUTE’, ‘OTHER’]) – granularity of input given by validation API

  • validation_granularity (Literal[‘GRANULARITY_UNSPECIFIED’, ‘SUB_PREMISE’, ‘PREMISE’, ‘PREMISE_PROMXIMITY’, ‘BLOCK’, ‘ROUTE’, ‘OTHER’]) – granularity of validation given by validation API

  • geocode_granularity (Literal[‘GRANULARITY_UNSPECIFIED’, ‘SUB_PREMISE’, ‘PREMISE’, ‘PREMISE_PROMXIMITY’, ‘BLOCK’, ‘ROUTE’, ‘OTHER’]) – granularity of geocode given by validation API

  • has_inferred (bool) – whether the result has inferred components

  • has_unconfirmed (bool) – whether the result has unconfirmed components

  • has_replaced (bool) – whether the result has replaced components

  • address_complete (bool) – whether the input was complete
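
To make the shape of an entry concrete, here is a trimmed-down stand-in with only a handful of the fields listed above. MiniAddressMapping and expiration_from are hypothetical names used for illustration; the sketch assumes the expiration is stored as an ISO 8601 string computed 30 days after the API call, as the expiration field describes.

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class MiniAddressMapping:
    """Hypothetical, reduced stand-in for AddressValidationMapping."""

    input_address: str
    validated_formatted_address: Optional[str]
    expiration: str
    address_lines: List[str]
    address_complete: bool = False


def expiration_from(api_call_time: datetime, days: int = 30) -> str:
    """Return the expiration timestamp, `days` after the API call, as ISO 8601."""
    return (api_call_time + timedelta(days=days)).isoformat()


# Illustrative values only; a real entry is built from the API response
entry = MiniAddressMapping(
    input_address="66 church st",
    validated_formatted_address="66 Church St",
    expiration=expiration_from(datetime(2024, 1, 1)),
    address_lines=["66 Church St"],
    address_complete=True,
)
entry_as_dict = asdict(entry)  # plain dict, convenient for serialization
```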

tamr_toolbox.enrichment.address_mapping.to_dict(dictionary)

Convert toolbox address validation mapping entries to list-of-dictionary format.

Parameters

dictionary (Dict[str, AddressValidationMapping]) – a toolbox address validation mapping

Return type

List[Dict[str, Union[str, List[str], float, None]]]

Returns

A list of toolbox address validation mapping entries in dictionary format

tamr_toolbox.enrichment.address_mapping.update(main_dictionary, tmp_dictionary)

Update a toolbox address validation mapping with another temporary address validation mapping

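Parameters
  • main_dictionary – the toolbox address validation mapping to update

  • tmp_dictionary – the temporary address validation mapping whose entries are added to main_dictionary

Return type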

None

tamr_toolbox.enrichment.address_mapping.from_dataset(dataset)

Stream an address validation mapping dataset from Tamr.

Parameters

dataset (Dataset) – Tamr Dataset object

Return type

Dict[str, AddressValidationMapping]

Returns

A toolbox address validation mapping

Raises
  • ValueError – if the provided dataset is not a toolbox address validation mapping dataset

  • NameError – if the provided dataset does not contain all the attributes of a toolbox address validation mapping

  • RuntimeError – if there is any other problem while reading the dataset as a toolbox address validation mapping

tamr_toolbox.enrichment.address_mapping.to_dataset(addr_mapping, *, dataset=None, datasets_collection=None, create_dataset=False, dataset_name='address_validation_mapping')

Ingest a toolbox address validation mapping into Tamr, creating the source dataset if needed.

Parameters
  • addr_mapping (Dict[str, AddressValidationMapping]) – a toolbox address validation mapping

  • dataset (Optional[Dataset]) – a Tamr client dataset

  • datasets_collection (Optional[DatasetCollection]) – a Tamr client datasets collection

  • create_dataset (bool) – whether to create a new address validation mapping source dataset (True) or upsert into an existing one (False)

  • dataset_name (str) – name to use if creating new dataset

Return type

str

Returns

The name of the created or updated Tamr Dataset

Raises
  • ValueError – if create_dataset is False and dataset is not provided or is not a toolbox address validation mapping dataset, or if create_dataset is True but datasets_collection is missing or the Tamr dataset already exists

  • RuntimeError – if there is an error during the creation of the Tamr dataset attributes

tamr_toolbox.enrichment.address_mapping.to_json(dictionary)

Convert toolbox address validation mapping entries to JSON format, with set objects converted to lists

Parameters

dictionary (Dict[str, AddressValidationMapping]) – a toolbox address validation mapping

Return type

List[str]

Returns

A list of toolbox address validation mapping entries in JSON format

tamr_toolbox.enrichment.address_mapping.save(addr_mapping, addr_folder, filename='address_validation_mapping.json')

Save a toolbox address validation mapping to disk

Parameters
  • addr_mapping (Dict[str, AddressValidationMapping]) – dictionary object to be saved to disk

  • addr_folder (str) – base directory where mapping is saved

  • filename (str) – filename to use to save

Return type

None

tamr_toolbox.enrichment.address_mapping.load(addr_folder, filename='address_validation_mapping.json')

Load a toolbox address validation mapping from disk to memory

Parameters
  • addr_folder (str) – base directory where mapping is saved

  • filename (str) – filename where mapping is saved

Return type

Dict[str, AddressValidationMapping]

Returns

A toolbox address validation mapping

Raises

RuntimeError – if the file was found on disk but is not of a valid toolbox address validation mapping type
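
A save and load round trip, including the RuntimeError on a malformed file, might look like the following sketch. The on-disk layout (one JSON object per line, keyed on input_address when reloaded) is an assumption made for illustration, and save_mapping and load_mapping are hypothetical names rather than the toolbox functions.

```python
import json
import os
import tempfile
from typing import Dict


def save_mapping(mapping: Dict[str, dict], folder: str, filename: str = "mapping.json") -> None:
    """Write one JSON object per line (assumed layout)."""
    with open(os.path.join(folder, filename), "w") as f:
        f.write("\n".join(json.dumps(entry) for entry in mapping.values()))


def load_mapping(folder: str, filename: str = "mapping.json") -> Dict[str, dict]:
    """Read the file back, raising RuntimeError if its content is not a valid mapping."""
    path = os.path.join(folder, filename)
    mapping: Dict[str, dict] = {}
    try:
        with open(path) as f:
            for line in f:
                if line.strip():
                    entry = json.loads(line)
                    mapping[entry["input_address"]] = entry
    except (json.JSONDecodeError, KeyError, TypeError) as exc:
        raise RuntimeError(f"{path} is not a valid mapping file") from exc
    return mapping


# Round trip in a throwaway directory
with tempfile.TemporaryDirectory() as folder:
    original = {"66 church st": {"input_address": "66 church st", "address_lines": ["66 Church St"]}}
    save_mapping(original, folder)
    restored = load_mapping(folder)
```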