Tuesday, May 21, 2013

[OBIEE 11g] Exalytics Loading Data


 






There are three primary use cases for Exalytics/OBIEE/TimesTen:



1.    Normal BI use case



2.    DataMart use case



3.   Operational Data Store (ODS) use case




In terms of how TimesTen is used, (1) is the most common case (perhaps 70-80% of customers), while (2) and (3) are very similar to each other.





For (1), only aggregates are stored in TimesTen. These aggregates are recommended by the BI software based on the various reports and dashboards that have been configured there, the queries that BI generates as a result of them and the way the data is organised in the backend database. BI generates scripts to compute these aggregates and to load them into TimesTen. 

This is all completely controlled and managed by the BI software. We should not be getting involved in any recommendations relating to data loading for this aspect (since it is all under the control of BI and is closely coupled with how BI is set up and used). Any 'advice' for customers in this area is the job of BI SCs, BI consulting and maybe BI support. 





For (2) and (3) a subset of the backend database DataMart / ODS is typically loaded into TimesTen. This usually consists of one or more very large 'fact' tables plus a larger number of significantly smaller (but still often large in absolute terms) 'dimension' tables. The decision as to which tables need to be moved into TimesTen and what subset of the data is needed is still really driven by BI considerations and has to be carefully evaluated on a case-by-case basis. We should not get involved in that aspect; that is firmly the province of BI SCs / consultants to assist the customer with. 

Once the customer already knows the exact set of tables etc. that they want to move into TimesTen, we can potentially provide some advice, though again it should be in conjunction with the BI folk to ensure we are all aligned on the best option(s) for a given use case.





For initial data loading (initial population):



If the backend DB is non-Oracle then potential options are:



1.    Manual: create compatible schemas and tables in TimesTen, unload data from the backend DB in a format that ttBulkCp can read, then load the data into TimesTen using ttBulkCp.



2.    Use OBIEE. This has some limited data loading capabilities.



3.    Use Oracle Data Integrator (if the backend DB is supported). This requires a license so it is not a free option.



4.    Use Oracle Data Warehouse Administration Console (DAC). This also requires a license so is not a free option.
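As a concrete illustration of option (1), here is a minimal sketch of a manual initial load. The DSN, schema, table and file names are all assumptions, and the default ttBulkCp file format (comma-separated fields) is assumed; adjust the delimiter and null handling to match your unload format. The ttBulkCp invocation is shown as an echoed command string, to be run against a live instance.

```shell
#!/bin/sh
# Sketch of a manual initial load (option 1). All names are illustrative.
DSN="sampledb"                 # TimesTen data source name (assumption)
TABLE="SALES.FACT_SALES"       # target table, already created in TimesTen
DATAFILE="/tmp/fact_sales.dump"

# Step 1: unload from the backend DB into a delimited file that ttBulkCp
# can read (two sample rows stand in for the real unload step here).
printf '%s\n' \
  '1,2013-05-21,100.00' \
  '2,2013-05-21,250.50' > "$DATAFILE"

# Step 2: load the file into TimesTen with ttBulkCp in insert (-i) mode.
LOAD_CMD="ttBulkCp -i -connStr \"DSN=$DSN\" $TABLE $DATAFILE"
echo "$LOAD_CMD"
```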



If the backend database is Oracle 10gR2 or later then there is an additional option:



5.    Use ttImportFromOracle to perform type mapping, data compression etc. and to generate TimesTen-compatible DDL files to create the tables in TimesTen. Use the (new) parallel data load feature of ttImportFromOracle (as of TT 11.2.2.5.0) to load data into TT tables quickly and with high throughput.
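A minimal sketch of option (5) follows. The Oracle connection string and table names are assumptions, and the exact flag spellings vary by release, so check the utility's -help output before relying on them; the invocation is composed as a string and echoed rather than executed.

```shell
#!/bin/sh
# Sketch of option (5): ttImportFromOracle inspects an Oracle schema and
# generates TimesTen-compatible DDL plus load scripts. Connection details,
# table names and flags below are illustrative assumptions.
ORA_CONN="bi_user/bi_pass@ORADB"   # Oracle backend credentials (assumption)
IMPORT_CMD="ttImportFromOracle -oraConn $ORA_CONN \
 -tables FACT_SALES DIM_TIME DIM_PRODUCT \
 -parallel 4 -prefix load1"
echo "$IMPORT_CMD"
# The utility writes out scripts (table creation DDL, LoadData.sql, etc.)
# which are then run with ttIsql against the target TimesTen DSN.
```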




For Oracle DB backends in general (5) would be the recommendation (since it is free and flexible) but customers may of course choose any of these options.





For incremental refresh loads:



Typically, once the initial population has been done then from time to time (might be hourly but more often daily or less frequently) there is a need to add new data to the TT datamart / ODS. Usually this is simply adding new data but sometimes it may also require updating existing data. Given TimesTen's limited memory capacity, there will also need to be a mechanism to periodically purge 'old' data.
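The purge can be sketched as a batched delete driven from ttIsql. The table, date column and 90-day retention window are assumptions; the batching uses TimesTen's DELETE FIRST extension to keep lock duration and transaction log volume bounded, repeating until no rows match. The invocation is echoed rather than executed here.

```shell
#!/bin/sh
# Sketch of a periodic purge of 'old' rows. Table name, column name and
# retention window are illustrative assumptions.
DSN="sampledb"
PURGE_SQL='DELETE FIRST 10000 FROM sales.fact_sales
  WHERE sale_date < SYSDATE - 90;'
# Run repeatedly until the delete reports zero rows affected.
echo "ttIsql -connStr \"DSN=$DSN\" -e \"$PURGE_SQL\""
```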



This is a more complex area and as of today we do not have any really good solutions (but they are being worked on). Post 11.2.2.5.0 we hope to have some better options (but that is internal information, not for external consumption at present). The two main challenges here are:




1.    All queries that originate from BI use table-level locking for performance reasons. Thus they are incompatible with any concurrent DML against the tables being queried.




2.    What does it mean anyway, in business terms, to run reports and queries against data that is changing (given that TimesTen does not support the same query consistency models as e.g. Oracle DB)? 



So today, options for incremental refresh require either:



a)    Some BI application downtime to allow the TT tables to be refreshed. This can perhaps be done selectively if there are different sets of tables that can be refreshed together.



b)   Some form of double buffering whereby two copies of all relevant tables are maintained in TT. The app runs against copy 1 while copy 2 is refreshed, then the app switches to copy 2. Later, copy 1 can be refreshed and the app switched back to copy 1 afterwards. BI can support this but the memory requirements in TimesTen often make it impractical.
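The bookkeeping behind (b) can be sketched in a few lines. The _A/_B table suffixes and the table name are assumptions purely for illustration; the point is the switch logic that a refresh driver script would use to decide which copy is live and which is safe to reload.

```shell
#!/bin/sh
# Minimal sketch of the double-buffering switch in (b). Suffixes and the
# FACT_SALES table name are illustrative assumptions.
ACTIVE="A"                     # copy the app currently reads

refresh_target() {             # the copy that is safe to reload
  if [ "$ACTIVE" = "A" ]; then echo "FACT_SALES_B"; else echo "FACT_SALES_A"; fi
}

swap_copies() {                # switch the app to the freshly loaded copy
  if [ "$ACTIVE" = "A" ]; then ACTIVE="B"; else ACTIVE="A"; fi
}

echo "App reads FACT_SALES_$ACTIVE; refresh $(refresh_target)"
swap_copies
echo "After switch: app reads FACT_SALES_$ACTIVE"
```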



The 'tools' that can be used for this are pretty much the same as for initial loading…





For non-Oracle backend DBs:



1.    Manual: unload data from the backend DB in a format that ttBulkCp can read, then load the data into TimesTen using ttBulkCp. If anything other than INSERT is needed then you may need to use staging tables and PL/SQL processing in TimesTen etc.



2.    Use Oracle Data Integrator (if the backend DB is supported). This requires a license so it is not a free option.


3.    Use Oracle Data Warehouse Administration Console (DAC). DAC is free and supported for customers with a license for any of the BI Applications.
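The staging-table variant of option (1) can be sketched as a ttBulkCp load into a staging table followed by a MERGE applied via ttIsql (TimesTen 11.2.2 supports MERGE). Table and column names are assumptions, and both invocations are echoed as command strings rather than executed.

```shell
#!/bin/sh
# Sketch of incremental refresh via a staging table. Names are
# illustrative assumptions.
DSN="sampledb"
STAGE_CMD="ttBulkCp -i -connStr \"DSN=$DSN\" SALES.FACT_STAGE /tmp/delta.dump"
MERGE_SQL='MERGE INTO sales.fact_sales t
  USING sales.fact_stage s ON (t.sale_id = s.sale_id)
  WHEN MATCHED THEN UPDATE SET t.amount = s.amount
  WHEN NOT MATCHED THEN INSERT (sale_id, sale_date, amount)
    VALUES (s.sale_id, s.sale_date, s.amount);'
echo "$STAGE_CMD"
echo "ttIsql -connStr \"DSN=$DSN\" -e \"$MERGE_SQL\""
```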



If the backend database is Oracle 10gR2 or later then there is an additional option:



4.    Use the ttLoadFromOracle built-in procedure to load data directly into the TT tables, or into staging tables for further processing using e.g. PL/SQL. The LoadData.sql script generated by ttImportFromOracle could be used as a basis for constructing suitable scripts.
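A minimal sketch of option (4): ttLoadFromOracle pulls the result of a SELECT run on the Oracle backend directly into a TimesTen table, and requires connecting with Oracle credentials (e.g. the OraclePWD connection attribute). The schema, table, query and thread count below are illustrative assumptions, and the ttIsql invocation is echoed rather than executed.

```shell
#!/bin/sh
# Sketch of option (4): the ttLoadFromOracle built-in procedure. All
# names, the query and the parallelism value are assumptions.
DSN="sampledb"
LOAD_SQL="CALL ttLoadFromOracle('SALES', 'FACT_SALES',
  'SELECT sale_id, sale_date, amount FROM sales.fact_sales
   WHERE sale_date >= TRUNC(SYSDATE) - 1', 4);"
echo "ttIsql -connStr \"DSN=$DSN;OraclePWD=secret\" -e \"$LOAD_SQL\""
```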





