DF-Connect
Client
Tasks related to interacting with the Tamr auxiliary service DF-connect
- class tamr_toolbox.data_io.df_connect.client.Client(host, port, protocol, base_path, tamr_username, tamr_password, jdbc_info, cert)
A data class for interacting with df_connect via jdbc.
- Parameters
host (str) – the host where df_connect is running
port (str) – the port on which df_connect is listening
protocol (str) – http or https
base_path (str) – if using nginx-like proxy this is the redirect path
tamr_username (str) – the tamr account to use
tamr_password (str) – the password for the tamr account to use
jdbc_info – configuration information for the jdbc connection
cert (Optional[str]) – optional path to a certfile for authentication
- tamr_toolbox.data_io.df_connect.client.from_config(config, config_key='df_connect', jdbc_key='ingest')
Constructs a Client object from a json dictionary.
- Parameters
config (Dict[str, Any]) – A json dictionary of configuration values
config_key (str) – block of the config to parse for values. Defaults to ‘df_connect’
jdbc_key (str) – the key that selects which block under df_connect -> jdbc in the configuration supplies the database connection information. Defaults to ‘ingest’
- Return type
Client
- Returns
A Client object
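As a rough illustration of the layout that from_config appears to expect, the sketch below builds a configuration dictionary with a df_connect block and a nested jdbc block keyed by ingest. The field names inside each block are assumptions borrowed from the Client and JdbcInfo signatures on this page, the values are placeholders, and select_jdbc_block is a hypothetical helper that is not part of tamr_toolbox.

```python
from typing import Any, Dict

# Hypothetical configuration. Key names mirror the Client and JdbcInfo
# signatures documented on this page; they are assumptions, not a verified schema.
config: Dict[str, Any] = {
    "df_connect": {
        "host": "localhost",
        "port": "9030",
        "protocol": "http",
        "base_path": "",
        "tamr_username": "admin",
        "tamr_password": "example-password",
        "jdbc": {
            # One sub-block per use; jdbc_key chooses between them.
            "ingest": {
                "jdbc_url": "jdbc:postgresql://db.example.com:5432/source",
                "db_user": "reader",
                "db_password": "example-db-password",
                "fetch_size": 10000,
            },
            "export": {
                "jdbc_url": "jdbc:postgresql://db.example.com:5432/target",
                "db_user": "writer",
                "db_password": "example-db-password",
                "fetch_size": 10000,
            },
        },
    }
}


def select_jdbc_block(
    config: Dict[str, Any],
    config_key: str = "df_connect",
    jdbc_key: str = "ingest",
) -> Dict[str, Any]:
    """Return the <config_key> -> jdbc -> <jdbc_key> block of a config dictionary."""
    return config[config_key]["jdbc"][jdbc_key]
```

With a layout like this, switching jdbc_key from ‘ingest’ to another sub-block name would point the same top-level connection settings at a different database.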
- tamr_toolbox.data_io.df_connect.client.create(*, host, port='', protocol, base_path='', tamr_username, tamr_password, jdbc_dict, cert=None)
Simple wrapper for creating an instance of Client dataclass object.
- Parameters
host (str) – the host where df_connect is running
port (str) – the port on which df_connect is listening
protocol (str) – http or https
base_path – if using nginx-like proxy this is the redirect path
tamr_username (str) – the tamr account to use
tamr_password (str) – the password for the tamr account to use
jdbc_dict (Dict[str, Any]) – configuration information for the jdbc connection
cert (Optional[str]) – optional path to a certfile for authentication
- Return type
Client
- Returns
An instance of tamr_toolbox.data_io.df_connect.Client
- tamr_toolbox.data_io.df_connect.client.get_connect_session(connect_info)
Returns an authenticated session using Tamr credentials from configuration. Raises an exception if df_connect is not installed or running correctly.
- Parameters
connect_info (Client) – An instance of a Client object
- Return type
- Returns
An authenticated session
- Raises
RuntimeError – if a connection to df_connect cannot be established
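Because get_connect_session raises RuntimeError when df_connect cannot be reached, it can double as a simple availability check before a longer pipeline starts. The following is a minimal sketch under that assumption; df_connect_is_reachable is a hypothetical helper, and its client_module parameter exists only so the sketch can be exercised without a running service.

```python
try:
    from tamr_toolbox.data_io.df_connect import client as df_connect_client
except ImportError:  # tamr_toolbox is not installed in this environment
    df_connect_client = None


def df_connect_is_reachable(connect_info, client_module=df_connect_client) -> bool:
    """Return True if an authenticated session can be opened, False otherwise.

    Hypothetical helper: it relies only on the documented behaviour that
    get_connect_session raises RuntimeError when no connection can be made.
    """
    try:
        client_module.get_connect_session(connect_info)
    except RuntimeError:
        return False
    return True
```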
- tamr_toolbox.data_io.df_connect.client.ingest_dataset(connect_info, *, dataset_name, query, primary_key=None)
Ingest a dataset into Tamr via df_connect, given a dataset name, a query string, and an optional list of columns to use as the primary key.
- Parameters
dataset_name (str) – Name of dataset
query (str) – jdbc query to execute in the database, the results of which will be loaded into Tamr
connect_info (Client) – A Client object for establishing session and loading jdbc parameters
primary_key – list of columns to use as primary key. If None then df_connect will generate its own primary key
- Return type
- Returns
JSON response from API call
- Raises
HTTPError – if the call to ingest the dataset was unsuccessful
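A call following the signature above might look like the sketch below. The dataset name, query, and key column are placeholders, ingest_customers is a hypothetical wrapper, and the client_module parameter is there only so the sketch can be run without tamr_toolbox or a live df_connect service.

```python
try:
    from tamr_toolbox.data_io.df_connect import client as df_connect_client
except ImportError:  # tamr_toolbox is not installed in this environment
    df_connect_client = None


def ingest_customers(connect_info, client_module=df_connect_client):
    """Ingest a hypothetical customers table, using its id column as primary key.

    Everything after connect_info is keyword-only in the documented signature.
    An HTTPError raised by the underlying call is left to propagate.
    """
    return client_module.ingest_dataset(
        connect_info,
        dataset_name="customers_raw",  # placeholder dataset name
        query="SELECT id, name, email FROM customers",  # placeholder query
        primary_key=["id"],  # omit to let df_connect generate its own key
    )
```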
- tamr_toolbox.data_io.df_connect.client.export_dataset(connect_info, *, dataset_name, target_table_name, truncate_before_load=False, **kwargs)
Export a dataset via jdbc to a target database.
- Parameters
dataset_name (str) – the name of the dataset to export
target_table_name (str) – the table in the database to update
truncate_before_load (bool) – whether or not to truncate the database table before load
connect_info (Client) – A Client object for establishing session and loading jdbc parameters
jdbc_key – the key for picking up relevant block for export from config file. See examples directory for usage
- Return type
- Returns
JSON response from API call
- Raises
HTTPError – if the call to export the dataset was unsuccessful
- tamr_toolbox.data_io.df_connect.client.execute_statement(connect_info, statement)
Calls the execute statement endpoint of df_connect.
- tamr_toolbox.data_io.df_connect.client.profile_query_results(connect_info, *, dataset_name, queries)
Profile the contents of JDBC queries via df_connect and write the results to a Tamr dataset. For example, the query “select * from table A” means that all rows from table A will be profiled, while “select * from table A where name=="my_name"” will only profile rows meeting the name=="my_name" condition. The same Tamr dataset can be used for profile results from multiple queries.
- Parameters
- Return type
- Returns
JSON response from API call
- Raises
HTTPError – if the call to profile the dataset was unsuccessful
- tamr_toolbox.data_io.df_connect.client.export_dataset_avro_schema(connect_info, *, url, dataset_name, fs_type)
Takes a connect info object and writes the avro schema of the specified dataset to the specified url. By default assumes HDFS, but if local_fs is set to true, writes to the server file system.
- Parameters
- Return type
- Returns
JSON returned by df-connect's /urlExport/<hdfs/serverfs>/avroSchema endpoint
- Raises
HTTPError – if the call to export the schema was unsuccessful
- tamr_toolbox.data_io.df_connect.client.export_dataset_as_avro(connect_info, *, url, dataset_name, fs_type)
Takes a connect info object and writes the avro file for the specified dataset to the specified url. By default assumes HDFS, but if local_fs is set to true, writes to the server file system.
- Parameters
- Return type
- Returns
JSON returned by df-connect's /urlExport/<hdfs/serverfs>/avroSchema endpoint
- Raises
ValueError – if using an unsupported type of file system
HTTPError – if the call to export the dataset was unsuccessful
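The endpoint pattern /urlExport/<hdfs/serverfs>/... together with the documented ValueError suggests that fs_type is checked against a small set of supported values before any request is made. The sketch below shows one way such a check could look. It assumes the supported values are exactly "hdfs" and "serverfs" (inferred from the endpoint pattern, not confirmed), and url_export_prefix is a hypothetical helper rather than the library's implementation.

```python
# Assumed set of supported file system types, inferred from the
# /urlExport/<hdfs/serverfs>/... endpoint pattern in the documentation.
SUPPORTED_FS_TYPES = ("hdfs", "serverfs")


def url_export_prefix(fs_type: str) -> str:
    """Return the urlExport path prefix for a file system type.

    Raises ValueError for anything outside SUPPORTED_FS_TYPES, mirroring the
    documented behaviour for unsupported file systems.
    """
    if fs_type not in SUPPORTED_FS_TYPES:
        raise ValueError(
            f"Unsupported file system type {fs_type!r}; "
            f"expected one of {SUPPORTED_FS_TYPES}"
        )
    return f"/urlExport/{fs_type}"
```

Validating before the request means a typo in fs_type fails fast with a ValueError instead of surfacing later as an HTTPError from the service.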
JdbcInfo
Tasks related to handling jdbc information for the Tamr auxiliary service DF-connect
- class tamr_toolbox.data_io.df_connect.jdbc_info.JdbcInfo(jdbc_url, db_user, db_password, fetch_size)
A dataclass that ties together the information needed to ingest data via df_connect.
- tamr_toolbox.data_io.df_connect.jdbc_info.from_config(config, *, config_key='df_connect', jdbc_key='ingest')
Create an instance of JdbcInfo from a json object.
- Parameters
config (Dict[str, Any]) – A json dictionary containing configuration values
config_key (str) – the top-level key of the config to use.
jdbc_key (str) – the key to use for the jdbc block. Needs to be within config_key block. Defaults to ‘ingest’, but can be used to specify any sub-block of a config object or yaml file. See example configs and exports for more context.
- Return type