ETL: A Simple Package to Load Data from Views
A common, native way to load data into tables in Oracle is to create a view from which to load the data.
Depending on how the view is built, you can either refresh (i.e. overwrite) the data in a table or append fresh data to the table.
Here, I present a simple package, ETL, that only requires you to maintain a configuration table and, of course, the source views (or tables) and target tables.
ETL is for single-source/single-target loads.
That means that for each target there is a source view (or table), and each source view can only be used to load data into one target table at a time.
Fancier logic, where a single source object is used to feed multiple tables in one go, is not supported by
ETL because it’s designed to be simple.
At its heart is the configuration table
ETL_CONF, which allows you to specify a (remote) table or view in any schema the user has access to.
The before-insert trigger ETL_CONF_BEF_INS checks before each insert whether the source and target tables exist.
It does a few other sanity checks too, just to make sure you're not inserting garbage.
ETL_CONF_AFT_MOD is fired after modifications on ETL_CONF.
These modifications are then logged in ETL_CONF_LOG, so that you have an audit trail of who changed what and when.
Configurations can be grouped into so-called categories, which are often useful when you want to execute them in batches, or simply to organize them.
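To give an idea of what a configuration row might look like, here is a sketch of an insert into ETL_CONF. Only TARGET_OWN, TARGET_OBJ, LOAD_CATEGORY and LOAD_ORDER appear in the procedure signatures; the SOURCE_OWN, SOURCE_OBJ and LOAD_MODE column names are assumptions on my part, so check the table definition in your installation for the actual ones.

```sql
-- Hypothetical example: register a refresh ('REF') load for a master-data table.
-- SOURCE_OWN, SOURCE_OBJ and LOAD_MODE are assumed column names.
INSERT INTO etl_conf
  ( source_own, source_obj
  , target_own, target_obj
  , load_mode,  load_category, load_order )
VALUES
  ( 'STAGING', 'CUSTOMERS_V'
  , 'CORE',    'CUSTOMERS'
  , 'REF',     'MASTER',      10 );
COMMIT;
```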
You can run the ETL package with LOAD_TAB_FROM_VIEW or LOAD_ALL_TABS, the signatures of which are shown below:
PROCEDURE load_tab_from_view
( target_own_in adm_tab_load_conf.target_own%TYPE
, target_obj_in adm_tab_load_conf.target_obj%TYPE
);

PROCEDURE load_all_tabs
( category_in       adm_tab_load_conf.load_category%TYPE := NULL
, resume_load_at_in adm_tab_load_conf.load_order%TYPE    := NULL
);
LOAD_ALL_TABS can execute all view-to-table loads in sequence when no parameters are supplied, or it can load a specific category either in its entirety or from a certain sequence number (using the resume_load_at_in parameter).
This is for instance useful when there was an exception that cancelled the remainder of a category, and you’ve now fixed the bug but do not want the loaders to run from scratch again.
Why does ETL stop on exceptions?
Well, the sequence of the load is specified in
LOAD_ORDER and the assumption is that there may be dependencies among configurations in the sense that data has to be inserted into a particular table before another load can be executed.
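Based on the signatures above, the two typical calls might look as follows; the category name 'MASTER' and the sequence number 30 are made up for illustration:

```sql
-- Run every configuration in the 'MASTER' category from the top:
BEGIN
  etl.load_all_tabs(category_in => 'MASTER');
END;
/

-- Resume the same category at LOAD_ORDER 30, e.g. after fixing the
-- configuration that caused an exception there:
BEGIN
  etl.load_all_tabs(category_in => 'MASTER', resume_load_at_in => 30);
END;
/
```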
You’ll see entries in the execution log for clean-up procedures. By default, log entries are purged after 100 days, so that the logs do not become unwieldy.
By the way, the code is in my repository, where you'll find a deployment script.
It installs all packages in the repository, but then again most are needed anyway because
ETL builds on the infrastructure that’s already there.
ETL-related tables and views have comments, so that the data dictionary is up to date.
Refreshing versus Appending
When a configuration is set to refresh (‘REF’), all data in the target table is removed and everything from the source table or view is inserted into the emptied table.
Appending data ('APD') requires the source table/view to supply only fresh data.
In the case of a view, this means having a
WHERE clause that looks at data that’s not already in the table.
For large tables, indexes are obviously required to make that work well.
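Such an append view might be sketched as follows; all table and column names here are invented for illustration, and the NVL fallback handles the very first load into an empty target:

```sql
-- Sketch of an append ('APD') source view: it only exposes rows that are
-- not yet in the target table. An index on core.app_log(logged_at) keeps
-- the MAX subquery cheap for large tables.
CREATE OR REPLACE VIEW staging.app_log_apd_v AS
SELECT s.log_id
,      s.logged_at
,      s.message
FROM   staging.app_log s
WHERE  s.logged_at > ( SELECT NVL(MAX(t.logged_at), DATE '1900-01-01')
                       FROM   core.app_log t );
```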
For append configurations it is also possible to specify a single-column
WHERE clause to be used to purge ‘stale’ data.
The idea is that append configurations, like logs, can grow indefinitely, hence the need to sweep some data.
The columns ARCHIVE_COL_NAME, ARCHIVE_COL_OPER, and ARCHIVE_COL_VALUE can be used to define a single-column predicate to remove old data.
For example, suppose you have a table with a
DIAGNOSED_AT column to indicate when a patient was diagnosed, that is, it's a DATE column.
Now, you're tasked with automatically purging data older than 10 days.
You simply set ARCHIVE_COL_NAME equal to DIAGNOSED_AT, ARCHIVE_COL_OPER to '<', and ARCHIVE_COL_VALUE to SYSDATE - 10.
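In terms of the configuration table, that example might be set up like this; the target owner and object names are hypothetical:

```sql
-- Hypothetical example: purge diagnoses older than 10 days on every load.
-- The resulting predicate amounts to: DIAGNOSED_AT < SYSDATE - 10.
UPDATE etl_conf
SET    archive_col_name  = 'DIAGNOSED_AT'
,      archive_col_oper  = '<'
,      archive_col_value = 'SYSDATE - 10'
WHERE  target_own = 'CORE'
AND    target_obj = 'DIAGNOSES';
COMMIT;
```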
The reason you are not allowed to specify the
WHERE clause yourself, which would have been slightly easier to implement, is that it may lead to improper use of the feature.
Append configurations ought to have proper indexes on the target table, and a free-style purge predicate would make the built-in feature too easy to abuse.
You can of course modify that easily if you’re not happy with this restriction.
The columns are called
ARCHIVE_COL_* because at some point I might want to add logic to move the data to an archive table rather than the rubbish bin; the option between archive and purge would obviously be configurable.
There are also three views that have been created for reasons of convenience.
The first one, ETL_STATS, collects some statistics, so that you have an idea about the run-time performance of each configuration.
ETL_HIST combines the information from
ETL_STATS and the administration table
ETL_CONF, so it’s your go-to object for both the load order and statistics.
ETL_RECENT is the last view that comes with
ETL, and it gives you the numbers from the last five successful executions, which is especially helpful when changes to the source views have been made and the current execution time is markedly different from the historical one.
I’ve also added a simple view
ERROR_LOG_RECENT that shows the errors from the last hour.
This is mainly useful when developing.
Since ETL provides two procedures, you can easily combine it with Oracle's built-in scheduler.
That way you can periodically load data with minimal effort.
After all, you do not need to create scheduler jobs for every single object.
For instance, suppose you have a category for master data and you wish to re-load master data every hour. You can then do the following:
CREATE OR REPLACE PROCEDURE load_master
AS
BEGIN
  etl.load_all_tabs(category_in => 'MASTER');
END load_master;
/

BEGIN
  dbms_scheduler.create_job(
    job_name        => 'load_master_tables',
    job_type        => 'STORED_PROCEDURE',
    job_action      => 'LOAD_MASTER',
    repeat_interval => 'FREQ=HOURLY',
    enabled         => TRUE,
    comments        => 'Job to load all master data');
END;
/
When scheduling jobs, you obviously have to make sure that the previous run had sufficient time to complete before attempting another one.
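One way to check that is to query Oracle's scheduler dictionary views before kicking off or rescheduling a run; the job name below matches the example above:

```sql
-- Returns a row while the hourly job is still executing; an empty result
-- means the previous run has finished and a new one can safely start.
SELECT job_name
,      elapsed_time
FROM   user_scheduler_running_jobs
WHERE  job_name = 'LOAD_MASTER_TABLES';
```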
That’s all, folks. Now go forth and tinker!