Error Handling ^^^^^^^^^^^^^^ This section describes Exception classes, ``on_error`` functions and error handling via SQL. ETLHelperError -------------- ETL Helper has a :ref:`variety of Exception classes `, all of which are subclasses of the :class:`ETLHelperError ` base class. To aid debugging, the :class:`ETLHelperQueryError `, :class:`ETLHelperExtractError ` and :class:`ETLHelperInsertError ` classes print the SQL query and the required paramstyle as well as the error message returned by the database. .. literalinclude:: ../code_demos/error_handling/demo_extract_error.py :language: python The output is: .. code:: bash etlhelper.exceptions.ETLHelperExtractError: SQL query raised an error. SELECT * FROM bad_table Required paramstyle: qmark no such table: bad_table also handling errors in SQL e.g. ON CONFLICT .. _on_error: on_error -------- The default behaviour of ``etlhelper`` is to raise an exception on the first error and abort the transfer. Sometimes it is desirable to ignore the errors and to do something else with the failed rows. The ``on_error`` parameter allows a function to be passed that is applied to the failed rows of each chunk. The input is a list of (row, exception) tuples. Different examples are given here. The simplest approach is to collect all the errors into a list to process at the end. .. code:: python errors = [] etl.executemany(sql, conn, rows, on_error=errors.extend) if errors: do_something() Errors can be logged to the ``etlhelper`` logger. .. literalinclude:: ../code_demos/error_handling/demo_log_error.py :language: python The IDs of failed rows can be written to a file. .. code:: python def write_bad_ids(failed_rows): with open('bad_ids.txt', 'at') as out_file: for row, exception in failed_rows: out_file.write(f"{row.id}\n") etl.executemany(sql, conn, rows, on_error=write_bad_ids) ``executemany``, ``load``, ``copy_rows`` and ``copy_table_rows`` can all take an ``on_error`` parameter. They each return a tuple containing the number of rows processed and the number of rows that failed. Error handling via SQL ---------------------- The ``on_error`` functions allow individual failed rows to be processed, however this flexibility can come at the expense of speed. Each chunk of data that contains a bad row will be retried on a row-by-row basis. Databases also have methods for handling errors e.g. duplicate primary keys using SQL. By customising an INSERT query (which can be programmatically generated with :func:`generate_insert_query() `) the database can be instructed how to process such rows. The following example for SQLite will ignore duplicate rows. Different databases have different syntax and capabilities, including ``upsert`` and ``merge``. .. literalinclude:: ../code_demos/error_handling/demo_sql_conflict_error.py :language: python The output is: .. code:: bash {'id': 1, 'name': 'basalt', 'grain_size': 'fine'}