ETL (Oracle Warehouse Builder 11gR2)
The
process of extracting, transforming,
and loading data can appear rather complicated. We do have a special term to
describe it, ETL, which contains the three steps mentioned. We’re dealing with
source data on different database systems from our target and a database from a
vendor other than Oracle, Microsoft SQL Server in this case. Let’s look from a
high level at what is involved in getting that data from a source system to our
target, and then take a look at whether to stage the data or not. We will then
see how to automate that process in Warehouse Builder, which will relieve us of
much of the work.
Manual ETL processes
First
of all, we need to be able to get data out of that
source system and move it over to the target system. We can’t begin to do
anything until that is accomplished, but what means can we use to do so? We
know that the Oracle Database provides various methods to load data into it.
There is an application that Oracle provides called SQL*Loader, which is a
utility to load data from flat files. This could be one way to get data from
our source system. Every database vendor provides some means of extracting data
from their tables and saving it to flat files. We could copy the file over and
then use the SQL*Loader utility to load the file. Reading the documentation
that describes how to use that utility, we see that we have to define a control
file to describe the loading process and definitions of the fields to be
loaded. This seems like a lot of work, so let’s see what other options we might
have.
The Oracle
Database allows us to create database links as we saw back in topic 2. When we
define our sources, we can link them to other vendor’s database systems via the
heterogeneous services, which is exactly what we set up in topic 2. This looks
like a better way to go. We could define a database link to point to our source
database, and then we could directly copy the data into our database.
However, our
target database structure doesn’t look anything like the source database
structure. The POS Transactional database is a relational database that is
highly normalized, and our target consists of cubes and dimensions implemented
relationally in the database. How are we going to get the data copied into that
structure? Clearly, there will be some manipulation of the data to get it
reformatted and restructured from source to target. We cannot just take all the
rows from one table in the source structure and copy them into a table in the
target structure for each source table. The data will have to be manipulated
when it is copied. This means we need to develop code that can perform this
rather complex task, depending on the manipulations that need to be done.
In a
nutshell, this is the process of extract, transform, and load. We have to:
1. Extract the
data from the source system by some method.
2. Load flat
files using SQL*Loader or via a direct database link. Then we have to transform
that data with SQL or PL/SQL code in the database to match and fit it into the
target structure.
3. Finally, we
have to load it into the target structure.
The good news
here is that the Warehouse Builder provides us the means to design this process
graphically, and then generate all the code we need automatically so that we
don’t have to build all that code manually.
No comments:
Post a Comment