Tuesday, October 9, 2012

[OWB] ETL Oracle Warehouse Builder 11gR2


ETL (Oracle Warehouse Builder 11gR2)

 

The process of extracting, transforming, and loading data can appear rather complicated. We do have a special term to describe it, ETL, which contains the three steps mentioned. We’re dealing with source data on different database systems from our target and a database from a vendor other than Oracle, Microsoft SQL Server in this case. Let’s look from a high level at what is involved in getting that data from a source system to our target, and then take a look at whether to stage the data or not. We will then see how to automate that process in Warehouse Builder, which will relieve us of much of the work.

Manual ETL processes

First of all, we need to be able to get data out of that source system and move it over to the target system. We can’t begin to do anything until that is accomplished, but what means can we use to do so? We know that the Oracle Database provides various methods to load data into it. There is an application that Oracle provides called SQL*Loader, which is a utility to load data from flat files. This could be one way to get data from our source system. Every database vendor provides some means of extracting data from their tables and saving it to flat files. We could copy the file over and then use the SQL*Loader utility to load the file. Reading the documentation that describes how to use that utility, we see that we have to define a control file to describe the loading process and definitions of the fields to be loaded. This seems like a lot of work, so let’s see what other options we might have.

The Oracle Database allows us to create database links as we saw back in topic 2. When we define our sources, we can link them to other vendor’s database systems via the heterogeneous services, which is exactly what we set up in topic 2. This looks like a better way to go. We could define a database link to point to our source database, and then we could directly copy the data into our database.

However, our target database structure doesn’t look anything like the source database structure. The POS Transactional database is a relational database that is highly normalized, and our target consists of cubes and dimensions implemented relationally in the database. How are we going to get the data copied into that structure? Clearly, there will be some manipulation of the data to get it reformatted and restructured from source to target. We cannot just take all the rows from one table in the source structure and copy them into a table in the target structure for each source table. The data will have to be manipulated when it is copied. This means we need to develop code that can perform this rather complex task, depending on the manipulations that need to be done.

In a nutshell, this is the process of extract, transform, and load. We have to:

1. Extract the data from the source system by some method.
2. Load flat files using SQL*Loader or via a direct database link. Then we have to transform that data with SQL or PL/SQL code in the database to match and fit it into the target structure.
3. Finally, we have to load it into the target structure.
The good news here is that the Warehouse Builder provides us the means to design this process graphically, and then generate all the code we need automatically so that we don’t have to build all that code manually.



No comments:

Post a Comment