Address Validation
This module requires an optional dependency. See Installation for details.
Validate
Tasks related to validating and refreshing address data using the Google Maps API.
- tamr_toolbox.enrichment.address_validation.get_addr_to_validate(input_addresses, addr_mapping, expiration_date_buffer=datetime.timedelta(days=1))
Find addresses not previously validated or validated too long ago.
- Parameters
- Return type
- Returns
List of standardized addresses that are either not present as keys of the mapping dictionary or were validated too long ago
- Raises
ValueError – if a negative expiration_date_buffer is supplied
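The selection rule described above can be sketched with the standard library alone. This is a hypothetical stand-in, not the toolbox implementation: the helper name `addresses_needing_validation`, the plain dictionary from address to expiration `datetime`, and the explicit `now` argument are assumptions made for illustration, and address standardization is skipped.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def addresses_needing_validation(
    addresses: List[str],
    expirations: Dict[str, datetime],
    now: datetime,
    buffer: timedelta = timedelta(days=1),
) -> List[str]:
    """Return addresses that are unknown or expire within `buffer` of `now`.

    Hypothetical helper: the real function works on AddressValidationMapping
    entries and standardizes addresses first, which is not done here.
    """
    if buffer < timedelta(0):
        raise ValueError("expiration buffer must not be negative")
    cutoff = now + buffer
    selected = []
    for address in addresses:
        expiration = expirations.get(address)
        # Unknown addresses and entries expiring by the cutoff are selected.
        if expiration is None or expiration <= cutoff:
            selected.append(address)
    return selected


now = datetime(2024, 1, 10)
known = {
    "66 church st": datetime(2024, 2, 1),    # well before expiry
    "1 main st": datetime(2024, 1, 10, 12),  # expires within the buffer
}
to_validate = addresses_needing_validation(
    ["66 church st", "1 main st", "5 elm st"], known, now
)
print(to_validate)
```

With a one-day buffer, the entry expiring at noon on the same day is selected together with the address that has no entry at all.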
- tamr_toolbox.enrichment.address_validation.from_list(all_addresses, client, dictionary, *, region_code, enable_usps_cass=False, intermediate_save_every_n=None, intermediate_save_to_disk=False, intermediate_folder='/tmp', expiration_date_buffer=datetime.timedelta(days=1))
Validate a list of addresses.
The validation is saved in a dictionary on your local file system before updating the main dictionary.
- Parameters
all_addresses (List[Tuple[Optional[str], …]]) – list of addresses to validate
client (Client) – a googlemaps API client
dictionary (Dict[str, AddressValidationMapping]) – a toolbox validation dictionary
region_code (Optional[str]) – optional region code, e.g. ‘US’ or ‘FR’, to pass to the maps API
enable_usps_cass (bool) – whether to use USPS validation; only for ‘US’/‘PR’ regions
intermediate_save_every_n (Optional[int]) – periodically save the dictionary every n validated addresses; if not set, save only at the end of processing
intermediate_save_to_disk (bool) – whether to periodically save the dictionary to disk to avoid losing validation data if the code fails
intermediate_folder (str) – path to the folder where the dictionary will be saved periodically to avoid losing validation data
expiration_date_buffer (timedelta) – re-validate addresses if they are within this period of expiring
- Return type
- Returns
The updated validation dictionary
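The intermediate-save parameters describe a checkpointing pattern: results accumulate in a temporary dictionary, are flushed every n addresses, and are then merged into the main dictionary. The sketch below illustrates that pattern only. The names `validate_with_checkpoints` and `fake_validate`, the `checkpoint.json` file name, and the plain-dict entries are all hypothetical, and no Google Maps call is made.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict, List


def validate_with_checkpoints(
    addresses: List[str],
    validate: Callable[[str], dict],
    main: Dict[str, dict],
    save_every_n: int,
    folder: Path,
) -> Dict[str, dict]:
    """Validate addresses, flushing results to disk and to `main` every n items."""
    pending: Dict[str, dict] = {}
    checkpoint = folder / "checkpoint.json"  # hypothetical file name
    for count, address in enumerate(addresses, start=1):
        pending[address] = validate(address)
        if count % save_every_n == 0 or count == len(addresses):
            # Persist the batch first so a later failure does not lose it.
            checkpoint.write_text(json.dumps(pending))
            main.update(pending)
            pending = {}
    return main


def fake_validate(address: str) -> dict:
    # Hypothetical stand-in for the API call; no network access.
    return {"input_address": address, "address_complete": True}


with tempfile.TemporaryDirectory() as tmp:
    result = validate_with_checkpoints(
        ["a st", "b st", "c st"], fake_validate, {}, save_every_n=2, folder=Path(tmp)
    )
print(sorted(result))
```

Flushing in batches bounds how many paid API results can be lost to an interruption, at the cost of more frequent disk writes.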
Address Validation Mappings
Tasks related to creating, updating, saving, and moving address validation data from Tamr
- class tamr_toolbox.enrichment.address_mapping.AddressValidationMapping(input_address, validated_formatted_address, expiration, region_code, postal_code, admin_area, locality, address_lines, usps_first_address_line, usps_city_state_zip_line, usps_city, usps_state, usps_zip_code, latitude, longitude, place_id, input_granularity, validation_granularity, geocode_granularity, has_inferred, has_unconfirmed, has_replaced, address_complete)
DataClass for address validation data.
- Parameters
input_address (str) – input address
validated_formatted_address (Optional[str]) – the “formattedAddress” returned by the validation API, if any
expiration (str) – the expiration timestamp of the data, 30 days from the API call
region_code (Optional[str]) – region code returned by the validation API
postal_code (Optional[str]) – postal code returned by the validation API
admin_area (Optional[str]) – administrative area returned by the validation API (state for US addresses)
locality (Optional[str]) – locality returned by the validation API (city/town for US addresses)
address_lines (List[str]) – address lines returned by the validation API (e.g. [‘66 Church St’])
usps_first_address_line – first address line in validated USPS format, if available
usps_city_state_zip_line – second address line in validated USPS format, if available
usps_city (Optional[str]) – city in validated USPS format, if available
usps_state (Optional[str]) – state in validated USPS format, if available
usps_zip_code – zip code in validated USPS format, if available
latitude (Optional[float]) – latitude associated with the validated address, if any
longitude (Optional[float]) – longitude associated with the validated address, if any
place_id (Optional[str]) – the Google placeId, the only result field not subject to expiration
input_granularity (Literal[‘GRANULARITY_UNSPECIFIED’, ‘SUB_PREMISE’, ‘PREMISE’, ‘PREMISE_PROMXIMITY’, ‘BLOCK’, ‘ROUTE’, ‘OTHER’]) – granularity of the input, as given by the validation API
validation_granularity (Literal[‘GRANULARITY_UNSPECIFIED’, ‘SUB_PREMISE’, ‘PREMISE’, ‘PREMISE_PROMXIMITY’, ‘BLOCK’, ‘ROUTE’, ‘OTHER’]) – granularity of the validation, as given by the validation API
geocode_granularity (Literal[‘GRANULARITY_UNSPECIFIED’, ‘SUB_PREMISE’, ‘PREMISE’, ‘PREMISE_PROMXIMITY’, ‘BLOCK’, ‘ROUTE’, ‘OTHER’]) – granularity of the geocode, as given by the validation API
has_inferred (bool) – whether the result has inferred components
has_unconfirmed (bool) – whether the result has unconfirmed components
has_replaced (bool) – whether the result has replaced components
address_complete (bool) – whether the input was complete
- tamr_toolbox.enrichment.address_mapping.to_dict(dictionary)
Convert toolbox address validation mapping entries to list-of-dictionary format.
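A list-of-dictionary conversion of dataclass entries can be sketched with `dataclasses.asdict`. The three-field `MiniMapping` class and the `mapping_to_records` helper are hypothetical stand-ins for `AddressValidationMapping` and `to_dict`, and the output shape shown (one plain dict per entry, keys dropped) is an assumption rather than the documented behaviour.

```python
from dataclasses import asdict, dataclass
from typing import Dict, List, Optional


@dataclass
class MiniMapping:
    """Hypothetical reduced stand-in for AddressValidationMapping."""

    input_address: str
    validated_formatted_address: Optional[str]
    locality: Optional[str]


def mapping_to_records(dictionary: Dict[str, MiniMapping]) -> List[dict]:
    """Return one plain dict per mapping entry (assumed output shape)."""
    return [asdict(entry) for entry in dictionary.values()]


mapping = {
    # Illustrative values only, not real API output.
    "66 church st": MiniMapping("66 church st", "66 Church St", "Example City"),
}
records = mapping_to_records(mapping)
print(records[0]["locality"])
```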
- tamr_toolbox.enrichment.address_mapping.update(main_dictionary, tmp_dictionary)
Update a toolbox address validation mapping with another temporary address validation mapping.
- Parameters
main_dictionary (Dict[str, AddressValidationMapping]) – the main toolbox address validation mapping containing prior results
tmp_dictionary (Dict[str, AddressValidationMapping]) – a temporary toolbox address validation mapping containing new data
- Return type
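As a rough illustration of merging new results into prior ones, the sketch below updates a main dictionary in place. It assumes that entries from the temporary mapping replace entries with the same key, which this page does not state. The `merge_mappings` name and the plain-dict entries are hypothetical.

```python
from typing import Dict


def merge_mappings(main: Dict[str, dict], tmp: Dict[str, dict]) -> None:
    """Merge `tmp` into `main` in place; `tmp` wins on key clashes (assumption)."""
    for address, entry in tmp.items():
        main[address] = entry


main = {"a st": {"expiration": "old"}, "b st": {"expiration": "old"}}
tmp = {"b st": {"expiration": "new"}, "c st": {"expiration": "new"}}
merge_mappings(main, tmp)
print(main)
```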
- tamr_toolbox.enrichment.address_mapping.from_dataset(dataset)
Stream an address validation mapping dataset from Tamr.
- Parameters
dataset (Dataset) – Tamr Dataset object
- Return type
- Returns
A toolbox address validation mapping
- Raises
ValueError – if the provided dataset is not a toolbox address validation mapping dataset
NameError – if the provided dataset does not contain all the attributes of a toolbox address validation mapping
RuntimeError – if there is any other problem while reading the dataset as a toolbox address validation mapping
- tamr_toolbox.enrichment.address_mapping.to_dataset(addr_mapping, *, dataset=None, datasets_collection=None, create_dataset=False, dataset_name='address_validation_mapping')
Ingest a toolbox address validation mapping in Tamr, creating the source dataset if needed.
- Parameters
addr_mapping (Dict[str, AddressValidationMapping]) – a toolbox address validation mapping
datasets_collection (Optional[DatasetCollection]) – a Tamr client datasets collection
create_dataset (bool) – whether to create a new address validation mapping source dataset (True) or upsert into an existing one (False)
dataset_name (str) – name to use if creating a new dataset
- Return type
- Returns
The name of the created or updated Tamr Dataset
- Raises
ValueError – if create_dataset is False and dataset is not provided or is not a toolbox address validation mapping dataset, or if create_dataset is True but datasets_collection is missing or the Tamr dataset already exists
RuntimeError – if there is an error during the creation of the Tamr dataset attributes
- tamr_toolbox.enrichment.address_mapping.to_json(dictionary)
Convert toolbox address validation mapping entries to JSON format, with set objects converted to lists.
- Parameters
dictionary (Dict[str, AddressValidationMapping]) – a toolbox address validation mapping
- Return type
- Returns
A list of toolbox address validation mapping entries in JSON format
- tamr_toolbox.enrichment.address_mapping.save(addr_mapping, addr_folder, filename='address_validation_mapping.json')
Save a toolbox address validation mapping to disk.
- tamr_toolbox.enrichment.address_mapping.load(addr_folder, filename='address_validation_mapping.json')
Load a toolbox address validation mapping from disk to memory.
- Parameters
- Return type
- Returns
A toolbox address validation mapping
- Raises
RuntimeError – if the file was found on disk but is not of a valid toolbox address validation mapping type
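A save and load round trip of this kind can be sketched with `json` and a reduced dataclass. Everything here is a hypothetical stand-in: `MiniMapping`, `save_mapping`, and `load_mapping` are not toolbox names, the one-JSON-object-per-line file layout is an assumption, and returning an empty mapping for a missing file is a guess based on the Raises entry above, which only covers files that exist but are invalid.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Dict, List


@dataclass
class MiniMapping:
    """Hypothetical reduced stand-in for AddressValidationMapping."""

    input_address: str
    address_lines: List[str]
    address_complete: bool


def save_mapping(mapping: Dict[str, MiniMapping], folder: str, filename: str) -> Path:
    """Write one JSON object per line (assumed layout)."""
    path = Path(folder) / filename
    path.write_text("\n".join(json.dumps(asdict(e)) for e in mapping.values()))
    return path


def load_mapping(folder: str, filename: str) -> Dict[str, MiniMapping]:
    """Read the file back; a missing file yields an empty mapping (assumption)."""
    path = Path(folder) / filename
    if not path.exists():
        return {}
    try:
        lines = path.read_text().splitlines()
        entries = [MiniMapping(**json.loads(line)) for line in lines]
    except (TypeError, ValueError) as exc:
        # Mirrors the documented RuntimeError for files of the wrong type.
        raise RuntimeError(f"{path} is not a valid mapping file") from exc
    return {entry.input_address: entry for entry in entries}


original = {"66 church st": MiniMapping("66 church st", ["66 Church St"], True)}
with tempfile.TemporaryDirectory() as folder:
    save_mapping(original, folder, "mapping.json")
    restored = load_mapping(folder, "mapping.json")
    missing = load_mapping(folder, "absent.json")
print(restored == original, missing)
```

Keying the restored mapping by `input_address` is also an assumption; the toolbox may key entries differently.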