DF-Connect
Client
Tasks related to interacting with the Tamr auxiliary service DF-connect
-
class tamr_toolbox.data_io.df_connect.client.Client(host, port, protocol, base_path, tamr_username, tamr_password, jdbc_info, cert)
A data class for interacting with df_connect via jdbc.
- Parameters
host (str) – the host where df_connect is running
port (str) – the port on which df_connect is listening
protocol (str) – http or https
base_path (str) – if using an nginx-like proxy, this is the redirect path
tamr_username (str) – the Tamr account to use
tamr_password (str) – the password for the Tamr account
jdbc_info – configuration information for the jdbc connection
cert (Optional[str]) – optional path to a certfile for authentication
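As an illustration of how the connection parameters above relate to one another, here is a minimal sketch that joins them into a base URL. The helper `build_base_url` is hypothetical and not part of tamr_toolbox; the host names and port are example values, and the real client may assemble its URLs differently.

```python
def build_base_url(protocol: str, host: str, port: str = "", base_path: str = "") -> str:
    """Hypothetical helper: join the connection fields into a single base URL."""
    # Only add ":port" when a port was given (a proxy setup may not need one).
    netloc = f"{host}:{port}" if port else host
    url = f"{protocol}://{netloc}"
    # Drop leading/trailing slashes so the join never produces "//".
    path = base_path.strip("/")
    return f"{url}/{path}" if path else url


# Example values only; substitute the details of your own deployment.
direct_url = build_base_url("http", "localhost", "9050")
proxied_url = build_base_url("https", "tamr.example.com", base_path="/connect/")
print(direct_url)
print(proxied_url)
```

With a proxy in front of df_connect, the port is typically left empty and `base_path` carries the redirect path instead.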
-
tamr_toolbox.data_io.df_connect.client.from_config(config, config_key='df_connect', jdbc_key='ingest')
Constructs a Client object from a json dictionary.
- Parameters
config (Dict[str, Any]) – A json dictionary of configuration values
config_key (str) – block of the config to parse for values. Defaults to 'df_connect'
jdbc_key (str) – the key that selects which block under df_connect -> jdbc in the configuration supplies the database connection information. Defaults to 'ingest'
- Return type
Client
- Returns
A Client object
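The lookup described by `config_key` and `jdbc_key` can be pictured with plain dictionaries. The sketch below is an assumption about the layout (a `jdbc` block nested under the `df_connect` block, with one sub-block per `jdbc_key`); the helper `select_jdbc_block`, the `export` block name, and all values are hypothetical and only illustrate the selection step, not the library's own parsing.

```python
from typing import Any, Dict

# Illustrative configuration; every value here is an example, not a default.
config: Dict[str, Any] = {
    "df_connect": {
        "host": "localhost",
        "protocol": "http",
        "port": "9050",
        "jdbc": {
            "ingest": {"jdbc_url": "jdbc:postgresql://db-host:5432/source_db", "db_user": "reader"},
            "export": {"jdbc_url": "jdbc:postgresql://db-host:5432/target_db", "db_user": "writer"},
        },
    }
}


def select_jdbc_block(
    config: Dict[str, Any], config_key: str = "df_connect", jdbc_key: str = "ingest"
) -> Dict[str, Any]:
    """Hypothetical stand-in for the lookup: config[config_key]['jdbc'][jdbc_key]."""
    return config[config_key]["jdbc"][jdbc_key]


ingest_block = select_jdbc_block(config)
export_block = select_jdbc_block(config, jdbc_key="export")
print(ingest_block["db_user"], export_block["db_user"])
```

Keeping several jdbc blocks in one configuration lets the same file describe both the source database used for ingest and a separate target used for export.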
-
tamr_toolbox.data_io.df_connect.client.create(*, host, port='', protocol, base_path='', tamr_username, tamr_password, jdbc_dict, cert=None)
Simple wrapper for creating an instance of the Client dataclass.
- Parameters
host (str) – the host where df_connect is running
port (str) – the port on which df_connect is listening
protocol (str) – http or https
base_path – if using an nginx-like proxy, this is the redirect path
tamr_username (str) – the Tamr account to use
tamr_password (str) – the password for the Tamr account
jdbc_dict (Dict[str, Any]) – configuration information for the jdbc connection
cert (Optional[str]) – optional path to a certfile for authentication
- Return type
Client
- Returns
An instance of tamr_toolbox.data_io.df_connect.Client
-
tamr_toolbox.data_io.df_connect.client.get_connect_session(connect_info)
Returns an authenticated session using Tamr credentials from configuration. Raises an exception if df_connect is not installed or not running correctly.
- Parameters
connect_info (Client) – An instance of a Client object
- Return type
- Returns
An authenticated session
- Raises
RuntimeError – if a connection to df_connect cannot be established
-
tamr_toolbox.data_io.df_connect.client.ingest_dataset(connect_info, *, dataset_name, query, primary_key=None)
Ingest a dataset into Tamr via df_connect, given a dataset name, a query string, and an optional list of columns to use as the primary key.
- Parameters
dataset_name (str) – Name of the dataset
query (str) – jdbc query to execute in the database; its results are loaded into Tamr
connect_info (Client) – A Client object for establishing the session and loading jdbc parameters
primary_key – list of columns to use as the primary key. If None, df_connect generates its own primary key
- Return type
- Returns
JSON response from API call
- Raises
HTTPError – if the call to ingest the dataset was unsuccessful
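When several tables need to be loaded, the keyword-only signature above lends itself to a small loop. The sketch below is one possible pattern, not part of tamr_toolbox: `ingest_all` is a hypothetical helper that accepts the ingest function as an argument, and `fake_ingest` is a stand-in with the same signature so the example runs without a df_connect service. In real use, `ingest_dataset` and a real Client object would presumably be passed instead.

```python
from typing import Any, Callable, Dict, List, Optional


def ingest_all(
    ingest_fn: Callable[..., Any],
    connect_info: Any,
    queries: Dict[str, str],
    primary_key: Optional[List[str]] = None,
) -> List[Any]:
    """Hypothetical helper: call ingest_fn once per (dataset_name, query) pair."""
    responses = []
    for dataset_name, query in queries.items():
        # Arguments after connect_info are passed by keyword, matching the documented signature.
        response = ingest_fn(
            connect_info, dataset_name=dataset_name, query=query, primary_key=primary_key
        )
        responses.append(response)
    return responses


def fake_ingest(connect_info, *, dataset_name, query, primary_key=None):
    """Stand-in for the real call; it only echoes its arguments."""
    return {"dataset": dataset_name, "query": query, "primary_key": primary_key}


results = ingest_all(
    fake_ingest,
    connect_info=None,  # a real Client object would go here
    queries={"customers": "select * from customers", "orders": "select * from orders"},
    primary_key=["id"],
)
print([r["dataset"] for r in results])
```

Because the documented call raises HTTPError on failure, a loop like this stops at the first failing dataset unless the caller catches the error per iteration.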
-
tamr_toolbox.data_io.df_connect.client.export_dataset(connect_info, *, dataset_name, target_table_name, truncate_before_load=False, **kwargs)
Export a dataset via jdbc to a target database.
- Parameters
dataset_name (str) – the name of the dataset to export
target_table_name (str) – the table in the database to update
truncate_before_load (bool) – whether or not to truncate the database table before loading
connect_info (Client) – A Client object for establishing the session and loading jdbc parameters
jdbc_key – the key for picking up the relevant block for export from the config file. See the examples directory for usage
- Return type
- Returns
JSON response from API call
- Raises
HTTPError – if the call to export the dataset was unsuccessful
-
tamr_toolbox.data_io.df_connect.client.execute_statement(connect_info, statement)
Calls the execute statement endpoint of df_connect.
-
tamr_toolbox.data_io.df_connect.client.profile_query_results(connect_info, *, dataset_name, queries)
Profile the contents of JDBC queries via df_connect and write the results to a Tamr dataset. For example, the query "select * from table A" means that all rows from table A will be profiled, while "select * from table A where name=="my_name"" will profile only the rows that meet the name=="my_name" condition. The same Tamr dataset can hold profile results from multiple queries.
- Parameters
- Return type
- Returns
JSON response from API call
- Raises
HTTPError – if the call to profile the dataset was unsuccessful
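To make the effect of the query filter concrete, the sketch below runs both kinds of query against an in-memory SQLite table and compares how many rows each would hand to a profiler. It uses only the standard-library `sqlite3` module and does not call df_connect; the table `table_a` and its rows are made up, and the filter is written in standard SQL (`=` with single quotes) rather than the `==` form quoted above.

```python
import sqlite3

# Build a throwaway table with three rows, two of which match the filter.
conn = sqlite3.connect(":memory:")
conn.execute("create table table_a (id integer, name text)")
conn.executemany(
    "insert into table_a values (?, ?)",
    [(1, "my_name"), (2, "other_name"), (3, "my_name")],
)

# Unfiltered query: every row would be profiled.
all_rows = conn.execute("select * from table_a").fetchall()

# Filtered query: only rows meeting the condition would be profiled.
filtered_rows = conn.execute("select * from table_a where name = 'my_name'").fetchall()

print(len(all_rows), len(filtered_rows))
conn.close()
```

The exact SQL dialect accepted depends on the database behind the JDBC connection, so the quoting and comparison operator may need adjusting.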
-
tamr_toolbox.data_io.df_connect.client.export_dataset_avro_schema(connect_info, *, url, dataset_name, fs_type)
Takes a connect_info object and writes the avro schema for the specified dataset to the specified url. By default assumes HDFS, but if local_fs is set to true, writes to the server file system.
- Parameters
- Return type
- Returns
json returned by df_connect's /urlExport/<hdfs/serverfs>/avroSchema endpoint
- Raises
HTTPError – if the call to export the schema was unsuccessful
-
tamr_toolbox.data_io.df_connect.client.export_dataset_as_avro(connect_info, *, url, dataset_name, fs_type)
Takes a connect_info object and writes the corresponding avro file for the specified dataset to the specified url. By default assumes HDFS, but if local_fs is set to true, writes to the server file system.
- Parameters
- Return type
- Returns
json returned by df_connect's /urlExport/<hdfs/serverfs>/avroSchema endpoint
- Raises
ValueError – if using an unsupported type of file system
HTTPError – if the call to export the dataset was unsuccessful
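The ValueError case above suggests that `fs_type` is checked against a fixed set of file systems before any request is made. The sketch below shows one way such a check could look. It assumes the accepted values are `hdfs` and `serverfs`, taken from the `<hdfs/serverfs>` segment of the endpoint path in the Returns entries; the helper `avro_schema_endpoint` and the constant `SUPPORTED_FS_TYPES` are hypothetical, and the library's actual validation may differ.

```python
# Assumed from the "<hdfs/serverfs>" segment of the documented endpoint path.
SUPPORTED_FS_TYPES = ("hdfs", "serverfs")


def avro_schema_endpoint(fs_type: str) -> str:
    """Hypothetical helper: validate fs_type and build the endpoint path."""
    if fs_type not in SUPPORTED_FS_TYPES:
        raise ValueError(
            f"Unsupported file system type {fs_type!r}; expected one of {SUPPORTED_FS_TYPES}"
        )
    return f"/urlExport/{fs_type}/avroSchema"


print(avro_schema_endpoint("hdfs"))

# An unsupported value fails before any network call would be attempted.
try:
    avro_schema_endpoint("s3")
except ValueError as err:
    print(f"rejected: {err}")
```

Validating up front keeps a bad argument distinct from a failed request, which matches the separate ValueError and HTTPError entries listed above.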
JdbcInfo
Tasks related to handling jdbc information for the Tamr auxiliary service DF-connect
-
class tamr_toolbox.data_io.df_connect.jdbc_info.JdbcInfo(jdbc_url, db_user, db_password, fetch_size)
A dataclass that ties together the information needed to ingest data via df_connect.
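For readers unfamiliar with dataclasses, the shape of such an object can be sketched with a local stand-in. The class `JdbcInfoSketch` below is not the library's class: it reuses the four field names from the signature above, while the field types and all example values are assumptions.

```python
from dataclasses import asdict, dataclass


@dataclass
class JdbcInfoSketch:
    """Local stand-in mirroring the documented field names; the types are assumed."""

    jdbc_url: str
    db_user: str
    db_password: str
    fetch_size: int


# Example values only; never hard-code real credentials.
info = JdbcInfoSketch(
    jdbc_url="jdbc:postgresql://db-host:5432/source_db",
    db_user="reader",
    db_password="example-password",
    fetch_size=10000,
)

# asdict() turns the dataclass back into a plain dictionary, e.g. for building a request body.
payload = asdict(info)
print(sorted(payload))
```

Grouping the four values in one object means they travel together, so a function that needs database access can take a single argument.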
-
tamr_toolbox.data_io.df_connect.jdbc_info.from_config(config, *, config_key='df_connect', jdbc_key='ingest')
Create an instance of JdbcInfo from a json object.
- Parameters
config (Dict[str, Any]) – A json dictionary containing configuration values
config_key (str) – the top-level key of the config to use
jdbc_key (str) – the key to use for the jdbc block. It must be within the config_key block. Defaults to 'ingest', but can specify any sub-block of a config object or yaml file. See the example configs and exports for more context.
- Return type