Thursday, October 18, 2012

[OBIEE 10g And 11g] Cache Management


 OBIEE Cache Management


Introduction to Cache

Decision support queries sometimes require large amounts of database processing. If you reduce the amount of database querying then you can speed up the time to produce reports. In an Analytics system the way to reduce database accesses is to create a ‘Cache’ of data on the same machine as the Analytics engine. The Siebel Analytics Server can save the results of a query in cache files and then reuse those results later when a similar query is requested. We refer to this as ‘query cache’. Using query cache, the cost of database processing only needs to be paid once for a query, not every time the query is run.

The query cache allows the Analytics Server to satisfy many subsequent query requests without having to access the back-end databases. This reduction in communication costs can dramatically decrease query response time.

However, as updates occur on the back-end databases (due to an ETL run), the query cache entries do not reflect the latest data in the warehouse; they become ‘stale’. Therefore, Siebel Analytics administrators need either to enable a method of updating the cash on a rolling basis, or purge it of old data and refresh it to a set schedule.

Caching may be considered to have small cost in terms of disk space to store the cache, and a small number of I/O transactions on the server but the this should be easily outweighed by the improvement in query response times. Caching does require some on going management in terms of limiting the cache size and ensuring the data is refreshed, but both these tasks can be controlled automatically.

Not all queries are suitable for cache use. A simple list of all accounts would not be a good dataset to cache as the cache file would be too large.

The Plan

1. Start with Cache enabled
Caching is proven to be generally beneficial so there is nothing to be gained from not enabling some form of caching from the outset. (exception noted above)
2. Develop a cache updating method.
Analytics has various methods to manage the cache generation and deletion.
3. Monitor query code to spot potential for improvement to the cache.
Check for slow running queries on an ongoing basis.


Cache Testing 

It is prudent to test your reports to ensure that they do benefit from the cached data.

1. Test the response times on queries using cache
2. Test the response times on queries not using cache



Cache Methods

This section describes the available methods of ensuring the data in the cache is kept current.
For the cache to be effective as a means or of performance enhancement it needs to be populated with relevant data quickly, relying on this data being built up as a result of users queries is not practical where the data will be updated frequently. To speed population a process known as seeding can be implemented, this involves running carefully designed queries to populate the cache with the required data. This can be done in a number of ways but the most efficient is to build this functionality into the ETL process.
One of the costs, or disadvantages of caching is the potential for data latency, or a ‘stale’ cache, this occurs when the data in the cache is not purged after the data warehouse has been updated, there are several options available to deal with this.

Possible Methods

a. No Cache
b. Polling table
c. Caching enabled at table level
d. Manual Cache Management


a. No Cache - If no cache is used then every request for data will generate a new SQL query that will be applied to the SRMW, this increase network traffic, hugely increases the demands on the server and affects productivity. The speed at which the results are returned is governed by the speed of the database and the network, and the ability of the Analytics server to compute the parameters of the query.

b. Polling table - Data can be refreshed using an Event Table. This is a table in the database which is populated with an entry recording the details of a table when that table is updated by the ETL process, the Analytics Server polls the table and purges data from the cache if a table has been updated. This is useful where incremental ETL processes are run during the day, for instance to update sales data, the frequency with which the Analytics Server checks the polling table can be set to coincide with that of the ETL so data from the more frequently updated tables is purged from the cache more often to avoid out of date results to queries. Where incremental ETL is run once a day or overnight for instance, this approach is arguably less beneficial. The frequency with which the Analytics Server polls the event table is set in the Analytics Administration tool, Tools>Utilities>Siebel Event Tables. N.B The parameters for the Event table contain table names only and cannot contain an alias, when the data for a table is purged data from an alias of that table is not, this can lead to misleading results and an alterative purging strategy must be found for the alias.

c. Table Level - In it’s simplest form, caching can be enabled table by table within the data warehouse, by default all tables have caching enabled and the persistence time, the time the data is left in the cache, is infinite. To achieve the best combination of performance improvement whilst limiting disk space used by the cache, tables that are rarely queried can be deselected from this process, performance can be further enhanced by altering the persistence time to coincide with incremental ETL processes as with method b. The two methods can also be used in conjunction with each other.
It is still important to purge the cache otherwise you can be querying yesterday’s data, even though an ETL was run overnight.

d. Manual Management – A cache manager is available from within the repository when you are connected in online mode. To access the Cache Manager select Tools\ Manage\ Cache. Note that this option is greyed out unless you have enabled caching in the NQSConfig.ini file. Manual cache management involves the purging of cache physically by a user. It is not suitable for day to day operations, but can be handy during testing.


Purging Options 

Invoking ODBC Extension Functions
The following ODBC functions affect cache entries associated with the repository specified by the ODBC connection. You can call these functions using the nqcmd.exe command-line executable.

The syntax of the call will be as follows:
nqcmd -d "Analytics Web" –u administrator –p sadmin –s purge.txt
Where purge.txt contains the call (for example, call SAPurgeAllCache()).

SAPurgeCacheByQuery. Purges a cache entry that exactly matches a specified query.
The following call programmatically purges the cache entry associated with this query:
Call SAPurgeCacheByQuery(‘select lastname, firstname from employee where salary >
100000’ );

SAPurgeCacheByTable. Purges all cache entries associated with a specified physical table
name (fully qualified) for the repository to which the client has connected.
This function takes up to four parameters representing the four components (database, catalog,
schema and table name proper) of a fully qualified physical table name. For example, you might have a table with the fully qualified name of DBName.CatName.SchName.TabName. To purge the cache entries associated with this table in the physical layer of the Siebel Analytics repository, execute the following call in a script:
Call SAPurgeCacheByTable( ‘DBName’, ‘CatName’, ‘SchName’, ‘TabName’ );
Wild cards are not supported by the Siebel Analytics Server for this function. Additionally,
DBName and TabName cannot be null. If either one is null, you will receive an error message.

SAPurgeAllCache. Purges all cache entries. The following is an example of this call:
Call SAPurgeAllCache();

SAPurgeCacheByDatabase. Purges all cache entries associated with a specific physical
database name. A record is returned as a result of calling any of the ODBC procedures to purge
the cache. This function takes one parameter that represents the physical database name and
the parameter cannot be null. The following is an example of this call:
Call SAPurgeCacheByDatabase( ‘DBName’ );

Siebel Analytics Scheduler
The SA Scheduler can be used for running general purpose scripts that extend the functionality of Siebel Analytics.
The script purgeSASCache is used to periodically purge all of the cache from the Siebel Analytics Server:
/////////////////////////////////////////////////////////
// purgeSASCache.js
//
// Purges the cache on SAS.
// Parameter(0) - The user name to pass in to NQCMD.
// Parameter(1) - The password for the aforementioned user.
/////////////////////////////////////////////////////////
// The full path to nqcmd.exe
var nqCmd = "D:\\SiebelAnalytics\\Bin\\nqcmd.exe";
// The data source name
var dsn = "Analytics Web";
// The user to execute the queries
var user = Parameter(0);
// The password of the aforementioned user
var pswd = Parameter(1);
// The ODBC procedure call for purging the cache
var sqlStatement = "{call SAPurgeAllCache()};";
//////////////////////////////////////////////////////////
// Returns a string from the file name
//////////////////////////////////////////////////////////
function GetOutput(fso, fileName)
{
var outStream = fso.OpenTextFile(fileName, 1);
var output = outStream.ReadAll();

outStream.Close();
return output;
}
//////////////////////////////////////////////////////////
// Get WshShell object and run nqCmd. Capture the output
// so that we can handle erroneous conditions.
var wshShell = new ActiveXObject("WScript.Shell");
// Create a temp file to input the SQL statement.
var fso = new ActiveXObject("Scripting.FileSystemObject");
var tempFolder = fso.GetSpecialFolder(2);
var tempInFileName = fso.GetTempName();
var tempOutFileName = fso.GetTempName();
tempInFileName = tempFolder + "\\" + tempInFileName;
tempOutFileName = tempFolder + "\\" + tempOutFileName;
var tempInFile = fso.CreateTextFile(tempInFileName, true);
tempInFile.WriteLine(sqlStatement);
tempInFile.Close();
try
{
// execute
var dosCmd = nqCmd + " -d \"" + dsn + "\" -u \"" + user
+ "\" -p \"" + pswd + "\" -s \"" + tempInFileName + "\"" +
" -o \"" + tempOutFileName + "\"";
wshShell.Run(dosCmd, 0, true);
var output = GetOutput(fso, tempOutFileName);
// Remove the temp files
fso.DeleteFile(tempInFileName);
if (fso.FileExists(tempOutFileName)) {
fso.DeleteFile(tempOutFileName);
}
// Check the output for any errors
if (output.indexOf("Processed: 1 queries") == -1) {
ExitCode = -1;
Message = output;
}
else if (output.indexOf("Encountered") != -1) {
ExitCode = -2;
Message = output;
}
else {
ExitCode = 0;
}
} catch (e) {

if (fso.FileExists(tempInFileName)) {
fso.DeleteFile(tempInFileName);
}
if (fso.FileExists(tempOutFileName)) {
fso.DeleteFile(tempOutFileName);
}
throw e;
}


Options for Seeding the Cache

Seeding the cache can be automated via an ODBC call in much the same way as purging.

The syntax of the call will be as follows:
nqcmd -d "Analytics Web" –u administrator –p sadmin – file.sql

Where file sql is a script containing sql select statements to populate the cache, ideally these will be a super set of the queries issued by the requests in Siebel Answers and dashboards deemed most likely to derive maximum performance gains from using the cache.

Siebel Delivers can also be used to seed the cache automatically, the advantage of this is that iBots can trigger the specific requests that are required for caching, this should carry further performance benefits.


NQSConfig.ini 


The NQSConfig.ini file located in the $SiebelAnalytics\Config folder needs to be modified in order to activate and parameterise Analytics Caching. Below are settings for caching with cache tables of an expected size of approximately 250 000 records.

Parameter Value Comments

ENABLE 
YES
Turns overall Caching on or off

DATA_STORAGE_PATHS 
“C:\SiebelAnalyticsData\Temp\Cache 500 MB”
For optimal performance, the directories specified should be on high performance storage systems.

METADATA_FILE 
"C:\SiebelAnalytics\metadata_file.dat"
The filename needs to be a fully qualified pathname with a .dat extension.

REPLACE_ALGORITHM 
LRU
Currently, the only supported algorithm is LRU (Least Recently Used).

BUFFER_POOL_SIZE 
20 MB
Specifies the amount of physical memory Analytics can use to store Cache files, in. More cache in memory means better response time.

MAX_ROWS_PER_CACHE_ENTRY 
0
When set to 0, there is no limit to the number of rows per cache entry.

MAX_CACHE_ENTRY_SIZE 
10 MB
Maximum size of a specific cache entry on the Physical drive.

POPULATE_AGGREGATE_ROLLUP_HITS 
YES
Specifies whether aggregate cache files should be created. Setting this parameter to TRUE may result in better performance, but results in more entries being added to the cache.

METADATA_BACKUP_FREQUENCY_MINUTES 
1440
This creates a backup file with a .bak extension. There will only ever be one of these.

A : Configuring cache parameters in NQSConfig.ini
Go to [ CACHE ] section of the config file
1 ENABLE  should be YES for cache to work.
2 DATA_STORAGE_PATHS : Need path where cache file to be stored and max allowed size
3 MAX_ROWS_PER_CACHE_ENTRY: Max row to be in any cache file. And hence his parameter helps to avoid run away query results being cache
4 MAX_CACHE_ENTRY_SIZE : Max size of each cache file
5 MAX_CACHE_ENTRIES: Max cache file in the directory specified.
Note:  When limit specified in MAX_CACHE_ENTRIES reached server will delete the Least Recently Used (i.e file which has not been used since long) to accommodate the new entry
6 POPULATE_AGGREGATE_ROLLUP_HITS: default is NO. It will store the agg result even if that request is being served from cache
7 MAX_SUBEXPR_SEARCH_DEPTH: it seaches the express till  the level specified for the cache hit.

B : Controlling cache at physical layer in Administrator tool
 Physical Property Of table :Cache

Please note the cacheable checkbox. By default all the physical tables are cacheable.
rest two options determines cache is valid till what time ?
Cache never expires will put the entry in cache permanently unless its cleaned by paramer 5 above
Cache persistence time This will determine validity time of cache
C :Using Cache manager
Open repository in online mode
Go to Manage–>Cache  You will see the screen as below. OBIEE administrator can control the cache from here as well.
 
Cache Manager Cache Tab 

                                                    Cache Manager Physical table

 
                                                               Cache Operations

You can see all the cache entries as shown in the screen above. You can purge the cache from here , see the sql being generated , Copy -save sql , and see the info which basically shows you the config parameters for the cache.
So this will be helpful in purging the cache entries manually.
I would suggest you to explore all the options.

D: Automatically purging cache entries
Cache entry will automatically gets purged when MAX_CACHE_ENTRIES limit reaches or when on physical layer it reaches that time.
Another Way to Automatically purge cache is to use even pulling table.
Event pulling table reads the data ( or status of the ETL ) and based on that it purges the cache. The problem with this is: it is not purely even base. i.e we can do like purge the cache and seed cache when ETL is done. However it can read the pulling table at specific interval and decides its action.
It a separate post in itself. I will address this in my future post.
E: Seeding cache
In normal English Seeding cache means running report automatically or manually (mostly in non business hours) before user actually needs it.
So when users needs that data it reads from the cache and hence performance is really amazing.
1 you can run report maually to generate cache( this is not possible for non business hours)
2 setting and iBot to run the report at specific time so that it generates cache.
Please refer the screen print below for better understanding.

                                                                                            
                                                                     Seeding Cache

 create an iBot for the report for which you want to seed the cache. In Destinaton tab select Oracle BI Server Cache check box to seed the cache as shown above.
 


2 comments:

  1. Hi Kalyani,
    Your blogspot infos are really good & helping lot.
    But it is difficult to read your blog due to the theme and background colors of the blog. Kindly change it pls

    ReplyDelete
  2. I cant read ur page ... colors too rich.. :-(

    ReplyDelete