DF-Connect
Client
Tasks related to interacting with the Tamr auxiliary service DF-connect
- 
class tamr_toolbox.data_io.df_connect.client.Client(host, port, protocol, base_path, tamr_username, tamr_password, jdbc_info)
- A data class for interacting with df_connect via jdbc.
- Parameters
- host (str) – the host where df_connect is running
- port (str) – the port on which df_connect is listening
- protocol (str) – http or https
- base_path (str) – if using an nginx-like proxy, this is the redirect path
- tamr_username (str) – the Tamr account to use
- tamr_password (str) – the password for the Tamr account
- jdbc_info – configuration information for the jdbc connection
 
 
- 
tamr_toolbox.data_io.df_connect.client.from_config(config, config_key='df_connect', jdbc_key='ingest')
- Constructs a Client object from a json dictionary.
- Parameters
- config (Dict[str, Any]) – a json dictionary of configuration values
- config_key (str) – the block of the config to parse for values. Defaults to 'df_connect'
- jdbc_key (str) – the key that selects which block under df_connect -> jdbc in the configuration supplies the database connection information. Defaults to 'ingest'
 
- Return type
- Client
- Returns
- A Client object 
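As a rough illustration of the layout the parameters above describe, the sketch below builds a configuration dictionary with a df_connect block that holds a jdbc sub-block keyed by ingest. The key names inside the blocks (host, jdbc_url, and so on) are assumptions modelled on the Client and JdbcInfo constructor arguments, all values are placeholders, and the helper select_jdbc_block is hypothetical: it only mimics the lookup that config_key and jdbc_key imply and is not the library's implementation.

```python
from typing import Any, Dict

# Hypothetical configuration: key names are assumed, values are placeholders.
config: Dict[str, Any] = {
    "df_connect": {
        "host": "localhost",
        "port": "9030",
        "protocol": "http",
        "base_path": "",
        "tamr_username": "admin",
        "tamr_password": "example-password",
        "jdbc": {
            "ingest": {
                "jdbc_url": "jdbc:postgresql://db.example.com:5432/source_db",
                "db_user": "reader",
                "db_password": "example-db-password",
                "fetch_size": 10000,
            },
        },
    }
}


def select_jdbc_block(
    config: Dict[str, Any], config_key: str = "df_connect", jdbc_key: str = "ingest"
) -> Dict[str, Any]:
    """Return the <config_key> -> jdbc -> <jdbc_key> block (illustrative helper only)."""
    return config[config_key]["jdbc"][jdbc_key]


# Pick the database connection settings that the default keys would point at.
ingest_block = select_jdbc_block(config)
```

With tamr_toolbox installed, a dictionary of this general shape would presumably be passed as client.from_config(config), with jdbc_key set to another sub-block name when a different database connection is needed.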
 
- 
tamr_toolbox.data_io.df_connect.client.create(*, host, port='', protocol, base_path='', tamr_username, tamr_password, jdbc_dict)
- Simple wrapper for creating an instance of the Client dataclass.
- Parameters
- host (str) – the host where df_connect is running
- port (str) – the port on which df_connect is listening
- protocol (str) – http or https
- base_path – if using an nginx-like proxy, this is the redirect path
- tamr_username (str) – the Tamr account to use
- tamr_password (str) – the password for the Tamr account
- jdbc_dict (Dict[str, Any]) – configuration information for the jdbc connection
 
- Return type
- Client
- Returns
- An instance of tamr_toolbox.data_io.df_connect.Client 
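Because the signature starts with a bare *, every argument to create must be passed by keyword, and port and base_path may be omitted since they default to empty strings. The sketch below shows one way to prepare such a call. The jdbc_dict keys are an assumption (they mirror the JdbcInfo constructor arguments), the values are placeholders, and build_client is a hypothetical wrapper that defers the import so the snippet loads even where tamr_toolbox is not installed.

```python
from typing import Any, Dict

# Placeholder values; the jdbc_dict keys are assumed to mirror JdbcInfo's arguments.
# port and base_path are left out on purpose: both default to ''.
create_kwargs: Dict[str, Any] = {
    "host": "localhost",
    "protocol": "https",
    "tamr_username": "admin",
    "tamr_password": "example-password",
    "jdbc_dict": {
        "jdbc_url": "jdbc:postgresql://db.example.com:5432/source_db",
        "db_user": "reader",
        "db_password": "example-db-password",
        "fetch_size": 10000,
    },
}


def build_client(kwargs: Dict[str, Any]) -> Any:
    """Call create() with keyword arguments only.

    The import is deferred so this sketch loads without tamr_toolbox installed.
    """
    from tamr_toolbox.data_io.df_connect import client

    return client.create(**kwargs)
```

Calling build_client(create_kwargs) on a machine with tamr_toolbox available should yield a Client instance. Positional use such as client.create("localhost", ...) would fail with a TypeError because of the keyword-only signature.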
 
- 
tamr_toolbox.data_io.df_connect.client.get_connect_session(connect_info)
- Returns an authenticated session using Tamr credentials from configuration. Raises an exception if df_connect is not installed or running correctly.
- Parameters
- connect_info (Client) – an instance of a Client object
- Return type
- Session
- Returns
- An authenticated session 
- Raises
- RuntimeError – if a connection to df_connect cannot be established
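Since get_connect_session raises RuntimeError when df_connect cannot be reached, a caller may want to check connectivity before starting a longer job. The following is a minimal sketch of that pattern. The helper try_connect and the _UnreachableStub class are hypothetical. The stub stands in for the client module so the example runs without a df_connect server, and in real use the tamr_toolbox.data_io.df_connect.client module would be passed instead.

```python
from typing import Any, Optional


def try_connect(client_module: Any, connect_info: Any) -> Optional[Any]:
    """Return an authenticated session, or None when df_connect is unreachable."""
    try:
        return client_module.get_connect_session(connect_info)
    except RuntimeError as error:
        # The documented failure mode: df_connect is not installed or not running.
        print(f"df_connect is not reachable: {error}")
        return None


class _UnreachableStub:
    """Stand-in for the client module that always fails to connect."""

    @staticmethod
    def get_connect_session(connect_info: Any) -> Any:
        raise RuntimeError("connection refused")


# With the failing stub, the helper reports the problem and returns None.
session = try_connect(_UnreachableStub, connect_info=None)
```

Returning None keeps the decision about what to do next (retry, skip, abort) with the caller. Code that cannot proceed without df_connect may prefer to let the RuntimeError propagate.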
 
- 
tamr_toolbox.data_io.df_connect.client.ingest_dataset(connect_info, *, dataset_name, query, primary_key=None)
- Ingests a dataset into Tamr via df_connect, given a dataset name, a query string, and an optional list of columns for the primary key.
- Parameters
- dataset_name (str) – name of the dataset
- query (str) – the jdbc query to execute in the database; its results are loaded into Tamr
- connect_info (Client) – a Client object for establishing the session and loading jdbc parameters
- primary_key – list of columns to use as the primary key. If None, df_connect generates its own primary key
 
- Return type
- Returns
- JSON response from API call 
- Raises
- HTTPError – if the call to ingest the dataset was unsuccessful 
 
- 
tamr_toolbox.data_io.df_connect.client.export_dataset(connect_info, *, dataset_name, target_table_name, truncate_before_load=False, **kwargs)
- Exports a dataset via jdbc to a target database.
- Parameters
- dataset_name (str) – the name of the dataset to export
- target_table_name (str) – the table in the database to update
- truncate_before_load (bool) – whether to truncate the database table before loading
- connect_info (Client) – a Client object for establishing the session and loading jdbc parameters
- jdbc_key – the key that selects the relevant block of the config file for the export. See the examples directory for usage
 
- Return type
- Returns
- JSON response from API call 
- Raises
- HTTPError – if the call to export the dataset was unsuccessful 
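The jdbc_key option is not a named parameter of export_dataset; it travels through **kwargs, so it only needs to be supplied when a non-default block is wanted. The sketch below assembles the keyword arguments for two export styles. The helper export_kwargs, the dataset and table names, and the block name "export" are hypothetical and chosen purely for illustration.

```python
from typing import Any, Dict, Optional


def export_kwargs(
    dataset_name: str,
    target_table_name: str,
    *,
    truncate_before_load: bool = False,
    jdbc_key: Optional[str] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for export_dataset, adding jdbc_key only when given."""
    kwargs: Dict[str, Any] = {
        "dataset_name": dataset_name,
        "target_table_name": target_table_name,
        "truncate_before_load": truncate_before_load,
    }
    if jdbc_key is not None:
        kwargs["jdbc_key"] = jdbc_key
    return kwargs


# Full refresh: empty the target table first and use an assumed "export" jdbc block.
full_refresh = export_kwargs(
    "unified_customers", "customers_export", truncate_before_load=True, jdbc_key="export"
)

# Append-only load that relies on the defaults.
append_only = export_kwargs("unified_customers", "customers_export")
```

Either dictionary could then be unpacked into the call, for example client.export_dataset(connect_info, **full_refresh), where client is the tamr_toolbox.data_io.df_connect.client module and connect_info is a Client instance.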
 
- 
tamr_toolbox.data_io.df_connect.client.execute_statement(connect_info, statement)
- Calls the execute statement endpoint of df_connect.
- 
tamr_toolbox.data_io.df_connect.client.profile_query_results(connect_info, *, dataset_name, queries)
- Profiles the contents of JDBC queries via df_connect and writes the results to a Tamr dataset. For example, the query "select * from table A" profiles all rows from table A, while "select * from table A where name=='my_name'" profiles only the rows that meet the name=='my_name' condition. The same Tamr dataset can hold profile results from multiple queries.
- Parameters
- Return type
- Returns
- JSON response from API call 
- Raises
- HTTPError – if the call to profile the dataset was unsuccessful 
 
- 
tamr_toolbox.data_io.df_connect.client.export_dataset_avro_schema(connect_info, *, url, dataset_name, fs_type)
- Takes a connect info object and writes the avro schema of the specified dataset to the specified url. The target file system, HDFS or the server file system, is selected with fs_type.
- Parameters
- Return type
- Returns
- json returned by df_connect's /urlExport/<hdfs/serverfs>/avroSchema endpoint
- Raises
- HTTPError – if the call to export the schema was unsuccessful 
 
- 
tamr_toolbox.data_io.df_connect.client.export_dataset_as_avro(connect_info, *, url, dataset_name, fs_type)
- Takes a connect info object and writes the avro file for the specified dataset to the specified url. The target file system, HDFS or the server file system, is selected with fs_type.
- Parameters
- Return type
- Returns
- json returned by df_connect's /urlExport/<hdfs/serverfs>/avroSchema endpoint
- Raises
- ValueError – if using an unsupported type of file system 
- HTTPError – if the call to export the dataset was unsuccessful 
 
 
JdbcInfo
Tasks related to handling jdbc information for the Tamr auxiliary service DF-connect
- 
class tamr_toolbox.data_io.df_connect.jdbc_info.JdbcInfo(jdbc_url, db_user, db_password, fetch_size)
- A dataclass to tie together relevant data to ingest data into df_connect. 
- 
tamr_toolbox.data_io.df_connect.jdbc_info.from_config(config, *, config_key='df_connect', jdbc_key='ingest')
- Creates an instance of JdbcInfo from a json object.
- Parameters
- config (Dict[str, Any]) – a json dictionary containing configuration values
- config_key (str) – the top-level key of the config to use
- jdbc_key (str) – the key to use for the jdbc block, which must sit within the config_key block. Defaults to 'ingest', but can name any sub-block of a config object or yaml file. See the example configs and exports for more context.
 
- Return type