Connecting to databases

ETL Helper functions take DB API 2.0 Connection objects as an argument.

Users are free to create their own connections directly or to supply them from connection pools.

Alternatively, ETL Helper’s DbParams class can be used.

DbParams

DbParams classes store connection parameters and provide validation and utility methods to help with creating connections. Connections can be made via the connect function.

The examples below show how to create DbParams objects for different databases. The instantiation checks that the correct attributes have been provided for the specified dbtype. See passwords section for how to provide passwords.

"""ETL Helper script to demonstrate DbParams."""
import etlhelper as etl

ORACLEDB = etl.DbParams(
    dbtype="ORACLE",
    host="localhost",
    port=1521,
    dbname="mydata",
    user="oracle_user",
)

POSTGRESDB = etl.DbParams(
    dbtype="PG",
    host="localhost",
    port=5432,
    dbname="mydata",
    user="postgres_user",
)

SQLITEDB = etl.DbParams(
    dbtype="SQLITE",
    filename="/path/to/file.db",
)

MSSQLDB = etl.DbParams(
    dbtype="MSSQL",
    host="localhost",
    port=1433,
    dbname="mydata",
    user="mssql_user",
    odbc_driver="ODBC Driver 17 for SQL Server",
)

DbParams objects can also be created from environment variables, using the from_environment() function.

## Export environment variables in the shell before running script
# export ETLHelper_dbtype="PG"
# export ETLHelper_host="localhost"
# export ETLHelper_port=5432
# export ETLHelper_dbname="mydata"
# export ETLHelper_user="postgres_user"

POSTGRESDB = etl.DbParams.from_environment(prefix="ETLHelper")

The is_reachable() method checks if a given database can be reached over the network. This does not require a username or password.

if not ORACLEDB.is_reachable():
    raise ETLHelperError("Network problems")

connect function

The DbParams.connect() function returns a DBAPI2 connection as provided by the underlying driver. Using context-manager syntax as below ensures that the connection is closed after use.

with SQLITEDB.connect() as src_conn:
    with POSTGRESDB.connect('PGPASSWORD') as dest_conn:
        do_something()

A standalone etlhelper.connect() function provides backwards-compatibility with previous releases of etlhelper:

import etlhelper as etl
from my_databases import ORACLEDB
oracle_conn = etl.connect(ORACLEDB, "ORACLE_PASSWORD")

Both versions accept additional keyword arguments that are passed to the DB API 2.0-compatible connect function of the underlying driver. For example, the following sets a timeout used by sqlite3 to limit how long it waits to get a lock on a table.

conn = MY_SQLITE_DB.connect(timeout=20)

Passwords

Database passwords must be specified via an environment variable. This reduces the temptation to store them within scripts.

All connection methods take a password_variable argument with the name of the environment variable from which the password should be read.

Environment variables can be set on the command line via:

  • export ORACLE_PASSWORD=some-secret-password on Linux

  • set ORACLE_PASSWORD=some-secret-password on Windows

Or in a Python terminal via:

import os
os.environ["ORACLE_PASSWORD"] = "some-secret-password"

No password is required for SQLite databases.

Database-specific connection options

Handling of LOBs for Oracle connections

Oracle databases have special column types for Character Large Object (CLOB) and Binary Large Object (BLOB). In ETL Helper, the oracledb driver has been configured to return these as native Python str and bytes objects respectively. This is comparable to the behaviour of other database drivers e.g. SQLite, PostgreSQL and avoids the user having to take the extra step of reading the LOB and results in faster data transfer. However, it is not suitable for LOBs larger than 1 Gb.

To return CLOB and BLOB columns as LOBs, configure the driver as follows:

"""ETL Helper script to demonstrate oracle handling Large Objects (LOBs)."""
import etlhelper as etl
import oracledb
from my_databases import ORACLEDB

select_sql = "SELECT my_clob, my_blob FROM my_table"

with ORACLEDB.connect("ORA_PASSWORD") as conn:
    # By default, ETL Helper returns native types
    result_as_native = etl.fetchall(select_sql, conn)

    # Update oracledb settings to return LOBs
    oracledb.defaults.fetch_lobs = True
    result_as_lobs = etl.fetchall(select_sql, conn)

See the oracledb docs for more information.

Disabling fast_executemany for SQL Server and other pyODBC connections

By default an etlhelper pyODBC connection uses a cursor with its fast_executemany attribute set to True. This setting improves the performance of the DB API 2.0 executemany() function when performing bulk inserts to a SQL Server database. However, this overides the default behaviour of pyODBC and there are some limitations in doing this. Importantly, it is only recommended for applications that use Microsoft’s ODBC Driver for SQL Server. See pyODBC fast_executemany.

Using fast_executemany may raise a MemoryError if query involves columns of types TEXT and NTEXT, which are now deprecated. Under these circumstances, etlhelper falls back on fast_executemany being set to False and produces a warning output. See Inserting into SQL server with fast_executemany results in MemoryError.

If required, the fast_executemany attribute can be set to False via a keyword argument to the connect function:

conn = MSSQLDB.connect('MSSQL_PASSWORD', fast_executemany=False)

This keyword argument is used by etlhelper, any further keyword arguments are passed to the connect function of the underlying driver.

Connecting to servers with self-signed certificates with SQL Server

Since the ODBC Driver 18 for SQL Server, the default setting has been to fail certificate validation for servers with self-signed certificates. It is possible to override this setting within the connection string.

ETL Helper provides an optional argument to the connect function to apply the override and trust the server’s self-signed certificate.

conn = MSSQLDB.connect('MSSQL_PASSWORD', trust_server_certificate=True)