Dataset

Manage

tamr_toolbox.dataset.manage.exists(*, client, dataset_name)

Check if a dataset exists in a Tamr instance

Parameters
  • client (Client) – Tamr Python client object for the target instance

  • dataset_name (str) – The dataset name

Return type

bool

Returns

True if the dataset exists in the target instance, otherwise False

tamr_toolbox.dataset.manage.create(*, client, dataset_name, dataset=None, primary_keys=None, attributes=None, attribute_types=None, attribute_descriptions=None, description=None, external_id=None, tags=None)

Flexibly create a source dataset in Tamr

A template dataset object can be passed in to create a duplicate dataset with a new name. If the template dataset is not provided, the primary_keys must be defined for the dataset to be created. Additional attributes can be added in the attributes argument. The default attribute type will be ARRAY STRING. Non-default attribute types can be specified in the attribute_types dictionary. Any attribute descriptions can be specified in the attribute_descriptions dictionary.
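The attribute-typing rule above can be illustrated with a minimal plain-Python sketch. It is not the toolbox's implementation and never contacts Tamr: the helper `plan_new_dataset`, the `template_given` flag, and the plain strings such as `"ARRAY STRING"` standing in for real attribute type objects are all hypothetical, and raising `ValueError` when neither a template nor primary keys are given is an assumption.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical stand-in for the toolbox's default attribute type (array of strings)
DEFAULT_TYPE = "ARRAY STRING"


def plan_new_dataset(
    *,
    primary_keys: Optional[List[str]] = None,
    attributes: Optional[Iterable[str]] = None,
    attribute_types: Optional[Dict[str, str]] = None,
    template_given: bool = False,
) -> Dict[str, str]:
    """Map each non-key attribute of a new dataset to the type it would receive."""
    if not template_given and not primary_keys:
        # Assumption: the exact exception raised by the toolbox is not documented here
        raise ValueError("primary_keys must be defined when no template dataset is given")
    attribute_types = attribute_types or {}
    # Attributes missing from attribute_types fall back to the default type
    return {name: attribute_types.get(name, DEFAULT_TYPE) for name in attributes or []}
```

Primary-key attributes are deliberately left out of the returned mapping, since their type is not described here.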

Parameters
  • client (Client) – TUC client object

  • dataset_name (str) – name for the new dataset being created

  • dataset (Optional[Dataset]) – optional dataset TUC object to use as a template for the new dataset

  • primary_keys (Optional[List[str]]) – one or more attributes for primary key(s) of the new dataset

  • attributes (Optional[Iterable[str]]) – a list of attribute names to create in the new dataset

  • attribute_types (Optional[Dict[str, Union[PrimitiveType, Array, Map, Record]]]) – dictionary for non-default types, attribute name is the key and AttributeType is the value

  • attribute_descriptions (Optional[Dict[str, str]]) – dictionary for attribute descriptions, attribute name is the key and the attribute description is the value

  • description (Optional[str]) – description of the new dataset

  • external_id (Optional[str]) – external ID for the dataset; if None, Tamr will create one

  • tags (Optional[List[str]]) – the list of tags for the new dataset

Return type

Dataset

Returns

Dataset created in Tamr

Raises

Example

>>> import tamr_toolbox as tbox
>>> tamr_client = tbox.utils.client.create(**instance_connection_info)
>>> tbox.dataset.manage.create(
...     client=tamr_client,
...     dataset_name="my_new_dataset",
...     primary_keys=["unique_id"],
...     attributes=["name", "address"],
...     description="My new dataset",
... )

tamr_toolbox.dataset.manage.update(dataset, *, attributes=None, attribute_types=None, attribute_descriptions=None, description=None, tags=None, override_existing_types=False)

Flexibly update a source dataset in Tamr

All attributes that should exist in the dataset must be listed in the attributes argument. This function adds or removes attributes until the dataset's attributes match the set of attributes passed in. The default attribute type will be ARRAY STRING. To set non-default attribute types, define them in the attribute_types dictionary. Attribute descriptions can be specified in the attribute_descriptions dictionary. By default, existing attribute types are not changed unless override_existing_types is set to True; when it is False, the type differences are only logged.
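The reconciliation that `update` performs can be sketched as a pure function that compares the current attributes with the requested ones. This is a hypothetical illustration, not the toolbox's code: `plan_update`, its `primary_keys` parameter, and the string type names are assumptions, as is the choice to leave primary-key attributes untouched.

```python
import logging
from typing import Dict, Iterable, List, Optional, Tuple

LOGGER = logging.getLogger(__name__)
# Hypothetical stand-in for the toolbox's default attribute type (array of strings)
DEFAULT_TYPE = "ARRAY STRING"


def plan_update(
    current: Dict[str, str],
    attributes: Iterable[str],
    *,
    primary_keys: Iterable[str] = (),
    attribute_types: Optional[Dict[str, str]] = None,
    override_existing_types: bool = False,
) -> Tuple[List[str], List[str], Dict[str, str]]:
    """Return (to_add, to_remove, type_changes) that make `current` match `attributes`."""
    attribute_types = attribute_types or {}
    keys = set(primary_keys)
    target = list(attributes)
    # Attributes requested but not yet present are added
    to_add = [name for name in target if name not in current]
    # Attributes present but not requested are removed (assumption: primary keys are kept)
    to_remove = [name for name in current if name not in target and name not in keys]
    type_changes: Dict[str, str] = {}
    for name in target:
        if name in keys or name not in current:
            continue  # skip primary keys and attributes that are about to be created
        wanted = attribute_types.get(name, DEFAULT_TYPE)
        if wanted == current[name]:
            continue
        if override_existing_types:
            type_changes[name] = wanted
        else:
            # Without the override flag the difference is only logged
            LOGGER.info("Type of %s differs: %s -> %s (not applied)", name, current[name], wanted)
    return to_add, to_remove, type_changes
```

Returning a plan instead of mutating a dataset keeps the add, remove, and retype decisions easy to inspect; with `override_existing_types=False`, type differences only reach the log, mirroring the documented default.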

Parameters
  • dataset (Dataset) – An existing TUC dataset

  • attributes (Optional[Iterable[str]]) – Complete list of attribute names that should exist in the updated dataset

  • attribute_types (Optional[Dict[str, Union[PrimitiveType, Array, Map, Record]]]) – dictionary for non-default types, attribute name is the key and AttributeType is the value

  • attribute_descriptions (Optional[Dict[str, str]]) – dictionary for attribute descriptions, attribute name is the key and the attribute description is the value

  • description (Optional[str]) – updated description of the dataset; if None, the description is not updated

  • tags (Optional[List[str]]) – updated tags for the dataset; if None, the tags are not updated

  • override_existing_types (bool) – boolean flag; when True, the types of existing attributes will be altered

Return type

Dataset

Returns

Updated Dataset

Raises

Example

>>> import tamr_toolbox as tbox
>>> from tamr_toolbox.models import attribute_type
>>> tamr_client = tbox.utils.client.create(**instance_connection_info)
>>> dataset = tamr_client.datasets.by_name("my_dataset_name")
>>> tbox.dataset.manage.update(
...     dataset=dataset,
...     attributes=["unique_id", "name", "address", "total_sales"],
...     attribute_types={"total_sales": attribute_type.ARRAY(attribute_type.DOUBLE)},
...     override_existing_types=True,
... )

tamr_toolbox.dataset.manage.create_attributes(*, dataset, attributes, attribute_types=None, attribute_descriptions=None)

Create new attributes in a dataset

The default attribute type will be ARRAY STRING. To set non-default attribute types, they must be defined in the attribute_types dictionary. Any attribute descriptions can be specified in the attribute_descriptions dictionary.

Parameters
  • dataset (Dataset) – An existing TUC dataset

  • attributes (Iterable[str]) – list of attribute names to be added to the dataset

  • attribute_types (Optional[Dict[str, Union[PrimitiveType, Array, Map, Record]]]) – dictionary for non-default types, attribute name is the key and AttributeType is the value

  • attribute_descriptions (Optional[Dict[str, str]]) – dictionary for attribute descriptions, attribute name is the key and the attribute description is the value

Return type

Dataset

Returns

Updated Dataset

Raises
  • requests.HTTPError – If any HTTP error is encountered

  • TypeError – If the attributes argument is not an Iterable

  • ValueError – If the dataset is a unified dataset

  • ValueError – If an attribute passed in already exists in the dataset
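The error cases listed above suggest a validate-then-create flow, sketched here in plain Python. `plan_new_attributes`, its `is_unified` flag, and the string type names are hypothetical; the real function works on a TUC dataset and raises `requests.HTTPError` for server errors, which this sketch does not model.

```python
import collections.abc
from typing import Dict, Iterable, Optional, Set

# Hypothetical stand-in for the toolbox's default attribute type (array of strings)
DEFAULT_TYPE = "ARRAY STRING"


def plan_new_attributes(
    existing: Set[str],
    attributes: Iterable[str],
    attribute_types: Optional[Dict[str, str]] = None,
    *,
    is_unified: bool = False,
) -> Dict[str, str]:
    """Validate `attributes` against the documented error cases and resolve their types."""
    if not isinstance(attributes, collections.abc.Iterable):
        raise TypeError("attributes must be an Iterable of attribute names")
    if is_unified:
        raise ValueError("attributes cannot be created this way on a unified dataset")
    attribute_types = attribute_types or {}
    planned: Dict[str, str] = {}
    for name in attributes:
        if name in existing:
            raise ValueError(f"attribute {name!r} already exists in the dataset")
        planned[name] = attribute_types.get(name, DEFAULT_TYPE)
    # Every name is validated before anything would be created
    return planned
```

The sketch does not special-case a bare string, which is itself an `Iterable` and would be split into single characters.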

tamr_toolbox.dataset.manage.edit_attributes(*, dataset, attribute_types=None, attribute_descriptions=None, override_existing_types=True)

Edit existing attributes in a dataset

The attribute types and/or descriptions can be updated to new values. Attributes to be updated must appear in the attribute_types dictionary, the attribute_descriptions dictionary, or both. The default attribute type will be ARRAY STRING. To set non-default attribute types, define them in the attribute_types dictionary. Attribute descriptions can be specified in the attribute_descriptions dictionary. If only the attribute_descriptions dictionary is defined, attribute types will not be updated.

Parameters
  • dataset (Dataset) – An existing TUC dataset

  • attribute_types (Optional[Dict[str, Union[PrimitiveType, Array, Map, Record]]]) – dictionary for non-default types, attribute name is the key and AttributeType is the value

  • attribute_descriptions (Optional[Dict[str, str]]) – dictionary for attribute descriptions, attribute name is the key and the attribute description is the value

  • override_existing_types (bool) – boolean flag; when True, the types of existing attributes will be altered

Return type

Dataset

Returns

Updated Dataset

Raises
  • requests.HTTPError – If any HTTP error is encountered

  • ValueError – If the dataset is not a source dataset

  • ValueError – If a passed attribute does not exist in the dataset

  • ValueError – If a passed attribute is a primary key and can’t be removed

  • ValueError – If there are no updates to attributes in attribute_types or attribute_descriptions arguments
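How `edit_attributes` decides which attributes change can be sketched as follows. This is an illustration under stated assumptions, not the toolbox's implementation: `plan_edits`, its `primary_keys` and `is_source` parameters, and the string type names are hypothetical, and the sketch assumes that only a type change (not a description change) is rejected for a primary-key attribute.

```python
from typing import Dict, Iterable, Optional


def plan_edits(
    existing: Dict[str, str],
    *,
    primary_keys: Iterable[str] = (),
    attribute_types: Optional[Dict[str, str]] = None,
    attribute_descriptions: Optional[Dict[str, str]] = None,
    override_existing_types: bool = True,
    is_source: bool = True,
) -> Dict[str, Dict[str, str]]:
    """Return {attribute: {"type"/"description": new value}} for the requested edits."""
    if not is_source:
        raise ValueError("only source datasets can be edited")
    attribute_types = attribute_types or {}
    attribute_descriptions = attribute_descriptions or {}
    names = set(attribute_types) | set(attribute_descriptions)
    if not names:
        raise ValueError("no attribute updates were requested")
    keys = set(primary_keys)
    edits: Dict[str, Dict[str, str]] = {}
    for name in sorted(names):
        if name not in existing:
            raise ValueError(f"attribute {name!r} does not exist in the dataset")
        edit: Dict[str, str] = {}
        # Only attributes listed in attribute_types are candidates for a type change
        if name in attribute_types and attribute_types[name] != existing[name]:
            if name in keys:
                # Assumption: type changes on primary keys are rejected
                raise ValueError(f"attribute {name!r} is a primary key")
            if override_existing_types:
                edit["type"] = attribute_types[name]
        if name in attribute_descriptions:
            edit["description"] = attribute_descriptions[name]
        if edit:
            edits[name] = edit
    return edits
```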

tamr_toolbox.dataset.manage.delete_attributes(*, dataset, attributes=None)

Remove attributes from a dataset by attribute name

Parameters
  • dataset (Dataset) – An existing TUC dataset

  • attributes (Optional[Iterable[str]]) – list of attribute names to delete from the dataset

Return type

Dataset

Returns

Updated Dataset

Raises
  • ValueError – If the dataset is not a source dataset

  • ValueError – If a passed attribute does not exist in the dataset

  • ValueError – If a passed attribute is a primary key and can’t be removed

  • TypeError – If the attributes argument is not an Iterable
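The documented checks for `delete_attributes` can be expressed as a small validation sketch. `plan_deletion` and its `primary_keys` and `is_source` parameters are hypothetical, and the function only computes which attribute names would remain; it does not call Tamr.

```python
import collections.abc
from typing import Iterable, List


def plan_deletion(
    existing: List[str],
    attributes: Iterable[str],
    *,
    primary_keys: Iterable[str] = (),
    is_source: bool = True,
) -> List[str]:
    """Return the attribute names that would remain after deleting `attributes`."""
    if not is_source:
        raise ValueError("attributes can only be deleted from a source dataset")
    if not isinstance(attributes, collections.abc.Iterable):
        raise TypeError("attributes must be an Iterable of attribute names")
    to_delete = set(attributes)
    keys = set(primary_keys)
    for name in sorted(to_delete):
        if name not in existing:
            raise ValueError(f"attribute {name!r} does not exist in the dataset")
        if name in keys:
            raise ValueError(f"attribute {name!r} is a primary key and can't be removed")
    # Preserve the original attribute order for the remaining attributes
    return [name for name in existing if name not in to_delete]
```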