Tuesday, March 12, 2013

[OBIEE 10g] Use Of OBIEE STITCH JOINS:








The general approach to developing with OBI is to push as much processing work to the database as possible.  The goal is to let the DB engine handle all the work within itself, and to minimize any performance or latency hits introduced by the network.  In general this is correct.


However, there may be a point where that philosophy becomes overkill.


When OBI generates SQL, there are a number of options it has available to use as defined in the Features tab of your database:






The specific scenario I’m going to discuss is with a conformed query.  A conformed query is simply the merging together of two separate queries along a shared dimension.  There are a few options in the Features tab that have a large degree of influence over how OBI handles a conformed query.



The Scenario

Consider the following simple conformed data model:






LTS stands for Logical Table Source; don't worry about the Logical Tables themselves, as they don't matter here.


Then consider the following simple query from Answers:


SELECT Person.Name, LTS1.Metric1, LTS2.Metric2;


OBI will need to go after 2 result sets and then merge them together:


#1:


SELECT Person.Name, SUM(LTS1.Metric1)

FROM Driver, LTS1

GROUP BY Person.Name;



#2:


SELECT Person.Name, SUM(LTS2.Metric2)

FROM Driver, LTS2

GROUP BY Person.Name;



Then the results of each query will have to be merged together.  It’s not quite as simple as you might first expect, especially if there are more conformed fields than just Person.Name.  The tricky part deals with null values and result sets that differ, which OBI assumes will always be the case:



Example Query Fragment Result Sets:


Query #1        Query #2
Ann, 100        Bob, 1000
Bob, 130        Chuck, 1010
Chuck, 150      Dave, 1200



As you can see, the result sets are different, so simply joining on the common field of Person.Name is inadequate – some additional wrapper logic is needed to produce the following results:


Person.Name   Metric1   Metric2
Ann           100
Bob           130       1000
Chuck         150       1010
Dave                    1200



What needs to occur is a FULL OUTER JOIN to stitch the two together.  And OUTER JOINs, let alone FULL OUTER JOINs, are CPU-intensive operations.  If there is more than one conformed field, the merging logic becomes even more complicated, as combinations of values and nulls need to be considered.  More on that later.
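The merge itself can be sketched in a few lines of Python.  This is only an illustration of the full outer join semantics over the two sample result sets above, not OBI's actual implementation:

```python
# Minimal sketch of the stitch: a full outer join of two result sets
# keyed on the shared conformed column (Person.Name).
q1 = {"Ann": 100, "Bob": 130, "Chuck": 150}      # Query #1: Name -> Metric1
q2 = {"Bob": 1000, "Chuck": 1010, "Dave": 1200}  # Query #2: Name -> Metric2

# The union of the key sets is the full outer join on Person.Name; names
# missing from one side get None -- which is exactly why a plain inner join
# on the common field is inadequate.
stitched = {name: (q1.get(name), q2.get(name))
            for name in sorted(q1.keys() | q2.keys())}

print(stitched)
# -> {'Ann': (100, None), 'Bob': (130, 1000), 'Chuck': (150, 1010), 'Dave': (None, 1200)}
```

With more than one conformed field, the key becomes a tuple of columns and the same union-of-keys logic applies, which is where the null-combination checking starts to multiply.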



OBI’s SQL Generation Choices


The two base inner queries will happen no matter how you configure OBI; they are the data retrieval part of the problem, and you have to get the data out of the raw tables somehow.  Sure, there are some variations and oddities (like the WITH syntax), but basically those 2 queries have to be fired no matter what.



The merge/stitch/join portion, however, is flexible, and is ultimately what this article is about.  OBI has 2 choices:


a)      The Database can do the merge

b)      OBI can do the merge


There are 2 database features which control where the merge will occur: ROWNUM_SUPPORTED and PERF_PREFER_INTERNAL_STITCH_JOIN.  On many databases, OBI defaults the ROWNUM_SUPPORTED feature to True and PERF_PREFER_INTERNAL_STITCH_JOIN to False.  Let’s take a look at the effect of these 2 options:




    ROWNUM_SUPPORTED   PERF_PREFER_INTERNAL_STITCH_JOIN   Join Location
1   Yes                No                                 Database
2   Yes                Yes                                OBI
3   No                 No                                 OBI
4   No                 Yes                                OBI



When OBI does the stitching, it simply fires off multiple, but simple, queries to the database, waits for each to come back, and then does the merging on its own:

  • Database Load Reduced
  • Network Load Increased
  • OBI Load Increased


When the database does the stitching, the two base queries are wrapped together in a monster SQL statement with a FULL OUTER JOIN linking them and a ton of additional value checking SQL in the format of:


case  when D1.c4 is not null then D1.c4 when D2.c2 is not null then D2.c2 end  as c1,

  • Database CPU Load Increased
  • Network Load Reduced
  • OBI Load Reduced
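To make the database-side option concrete before looking at the real generated SQL, here is a minimal self-contained sketch using SQLite (the table and column names are invented for illustration).  It emulates the FULL OUTER JOIN portably with a LEFT JOIN in each direction plus UNION; the COALESCE on the shared key plays the same role as the generated null-checking CASE wrappers:

```python
import sqlite3

# Two in-memory tables standing in for the two aggregated inner queries.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE q1 (name TEXT, metric1 INT);
    CREATE TABLE q2 (name TEXT, metric2 INT);
    INSERT INTO q1 VALUES ('Ann', 100), ('Bob', 130), ('Chuck', 150);
    INSERT INTO q2 VALUES ('Bob', 1000), ('Chuck', 1010), ('Dave', 1200);
""")

# FULL OUTER JOIN emulated as a LEFT JOIN in each direction, deduplicated
# by UNION.  COALESCE picks the non-null copy of the shared key, just like
# OBI's "case when D1.c4 is not null then D1.c4 when D2.c2 ..." wrappers.
rows = con.execute("""
    SELECT COALESCE(q1.name, q2.name) AS name, q1.metric1, q2.metric2
    FROM q1 LEFT JOIN q2 ON q1.name = q2.name
    UNION
    SELECT COALESCE(q1.name, q2.name), q1.metric1, q2.metric2
    FROM q2 LEFT JOIN q1 ON q1.name = q2.name
    ORDER BY name
""").fetchall()

print(rows)
# -> [('Ann', 100, None), ('Bob', 130, 1000), ('Chuck', 150, 1010), ('Dave', None, 1200)]
```

The database pays the CPU cost of the join, but only the final merged rows travel over the network.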



Let’s look at some examples to show more specifically what I’m talking about.


Here is the first option, where the database does the join.  These are real queries from my current system, but don’t focus on the internals of the two inner data-access blocks – it’s the outer stitching logic (the FULL OUTER JOIN and the case-when null checks) that is important.  The inner SQL is the core data access code, the stuff that is basically the same regardless of how you glue the pieces together.  Note this is a 3-column select with a few filters on it and a strange piece that selects from dual (ignore it):



select D1.c1 as c1,

D1.c2 as c2,

D1.c3 as c3


from


(select D1.c1 as c1,

D1.c2 as c2,

D1.c3 as c3,

D1.c4 as c4,

D1.c5 as c5


from


(select case  when D1.c4 is not null then D1.c4 when D2.c2 is not null then D2.c2 end  as c1,

D2.c1 as c2,

case  when case  when D1.c3 is not null then D1.c3 when D2.c3 is not null then D2.c3 end  = 'Driver' then D1.c1 else D1.c2 end  as c3,

case  when D1.c3 is not null then D1.c3 when D2.c3 is not null then D2.c3 end  as c4,

case  when D1.c5 is not null then D1.c5 when D2.c4 is not null then D2.c4 end  as c5,

ROW_NUMBER() OVER (PARTITION BY case  when D1.c3 is not null then D1.c3 when D2.c3 is not null then D2.c3 end , case  when D1.c4 is not null then D1.c4 when D2.c2 is not null then D2.c2 end , case  when D1.c5 is not null then D1.c5 when D2.c4 is not null then D2.c4 end  ORDER BY case  when D1.c3 is not null then D1.c3 when D2.c3 is not null then D2.c3 end  ASC, case  when D1.c4 is not null then D1.c4 when D2.c2 is not null then D2.c2 end  ASC, case  when D1.c5 is not null then D1.c5 when D2.c4 is not null then D2.c4 end  ASC) as c6


from


(select D1.c1 as c1,

D1.c2 as c2,

D1.c3 as c3,

D1.c4 as c4,

D1.c5 as c5


from


(select count(distinct T10979.DRIVER_WID) as c1,

count(distinct 0) as c2,

T13395.ENTITY_TYPE as c3,

TRUNC(T804.DAY_DATE) as c4,

T804.DAY_WID as c5,

ROW_NUMBER() OVER (PARTITION BY T804.DAY_WID, T13395.ENTITY_TYPE ORDER BY T804.DAY_WID ASC, T13395.ENTITY_TYPE ASC) as c6


from


(select 'Vehicle' as ENTITY_TYPE  from dual union select 'Driver' from dual) T13395,

W_DAY_DIM T804 /* Date (W_DAY_DIM) */ ,

W_CUSTOMER_DIM T806 /* Company (W_CUSTOMER_DIM) */ ,

W_MEASURE_TYPE_DIM T867 /* Metric (W_MEASURE_TYPE_DIM) */ ,

W_DRIVER_DAY_AGG T10979 /* Facts: Measure DayDrv (W_DRIVER_DAY_AGG) */

where  ( T804.DAY_WID = T10979.DATE_WID and T806.CUSTOMER_WID = T10979.CUSTOMER_WID and T806.CUSTOMER_WID = 188111.0 and T867.MEASURE_TYPE_WID = T10979.MEASURE_TYPE_WID and T10979.CUSTOMER_WID = 188111.0 and T13395.ENTITY_TYPE = 'Driver' and T804.DAY_DATE between TO_DATE('2010-07-11 00:00:00' , 'YYYY-MM-DD HH24:MI:SS') and TO_DATE('2010-07-25 00:00:00' , 'YYYY-MM-DD HH24:MI:SS') )

group by T804.DAY_WID, T13395.ENTITY_TYPE,  TRUNC(T804.DAY_DATE)

) D1

where  ( D1.c6 = 1 )

) D1 full outer join (select D1.c1 as c1,

D1.c2 as c2,

D1.c3 as c3,

D1.c4 as c4


from


(select sum(T1021.MEASURE_COUNT) as c1,

TRUNC(T804.DAY_DATE) as c2,

T13395.ENTITY_TYPE as c3,

T804.DAY_WID as c4,

ROW_NUMBER() OVER (PARTITION BY T804.DAY_WID, T13395.ENTITY_TYPE ORDER BY T804.DAY_WID ASC, T13395.ENTITY_TYPE ASC) as c5


from


(select 'Vehicle' as ENTITY_TYPE  from dual union select 'Driver' from dual) T13395,

W_DAY_DIM T804 /* Date (W_DAY_DIM) */ ,

W_CUSTOMER_DIM T806 /* Company (W_CUSTOMER_DIM) */ ,

W_MEASURE_TYPE_DIM T867 /* Metric (W_MEASURE_TYPE_DIM) */ ,

W_FLEET_AGG T1021 /* Facts: Measure Fleet (W_FLEET_AGG) */

where  ( T804.DAY_WID = T1021.DATE_WID and T806.CUSTOMER_WID = T1021.CUSTOMER_WID and T806.CUSTOMER_WID = 188111.0 and T867.MEASURE_TYPE_WID = T1021.MEASURE_TYPE_WID and T1021.CUSTOMER_WID = 188111.0 and T13395.ENTITY_TYPE = 'Driver' and T804.DAY_DATE between TO_DATE('2010-07-11 00:00:00' , 'YYYY-MM-DD HH24:MI:SS') and TO_DATE('2010-07-25 00:00:00' , 'YYYY-MM-DD HH24:MI:SS') )

group by T804.DAY_WID, T13395.ENTITY_TYPE,  TRUNC(T804.DAY_DATE)

) D1

where  ( D1.c5 = 1 )

) D2 on D1.c3 = D2.c3 and D1.c5 = D2.c4

) D1

where  ( D1.c6 = 1 )

) D1

order by c1

Now with PERF_PREFER_INTERNAL_STITCH_JOIN = TRUE:

select D1.c4 as c1,

D1.c2 as c2,

D1.c3 as c3,

D1.c1 as c4,

D1.c5 as c5


from


(select D1.c1 as c1,

D1.c2 as c2,

D1.c3 as c3,

D1.c4 as c4,

D1.c5 as c5


from


(select count(distinct T10979.DRIVER_WID) as c1,

count(distinct 0) as c2,

T13395.ENTITY_TYPE as c3,

TRUNC(T804.DAY_DATE) as c4,

T804.DAY_WID as c5,

ROW_NUMBER() OVER (PARTITION BY T804.DAY_WID, T13395.ENTITY_TYPE ORDER BY T804.DAY_WID ASC, T13395.ENTITY_TYPE ASC) as c6


from


(select 'Vehicle' as ENTITY_TYPE  from dual union select 'Driver' from dual) T13395,

W_DAY_DIM T804 /* Date (W_DAY_DIM) */ ,

W_CUSTOMER_DIM T806 /* Company (W_CUSTOMER_DIM) */ ,

W_MEASURE_TYPE_DIM T867 /* Metric (W_MEASURE_TYPE_DIM) */ ,

W_DRIVER_DAY_AGG T10979 /* Facts: Measure DayDrv (W_DRIVER_DAY_AGG) */

where  ( T804.DAY_WID = T10979.DATE_WID and T806.CUSTOMER_WID = T10979.CUSTOMER_WID and T806.CUSTOMER_WID = 188111.0 and T867.MEASURE_TYPE_WID = T10979.MEASURE_TYPE_WID and T10979.CUSTOMER_WID = 188111.0 and T13395.ENTITY_TYPE = 'Driver' and T804.DAY_DATE between TO_DATE('2010-07-11 00:00:00' , 'YYYY-MM-DD HH24:MI:SS') and TO_DATE('2010-07-25 00:00:00' , 'YYYY-MM-DD HH24:MI:SS') )

group by T804.DAY_WID, T13395.ENTITY_TYPE,  TRUNC(T804.DAY_DATE)

) D1

where  ( D1.c6 = 1 )

) D1

order by c3, c5;

-------------------------------------------------------------------------------



select D2.c2 as c1,

D2.c1 as c2,

D2.c3 as c3,

D2.c4 as c4


from


(select D1.c1 as c1,

D1.c2 as c2,

D1.c3 as c3,

D1.c4 as c4


from


(select sum(T1021.MEASURE_COUNT) as c1,

TRUNC(T804.DAY_DATE) as c2,

T13395.ENTITY_TYPE as c3,

T804.DAY_WID as c4,

ROW_NUMBER() OVER (PARTITION BY T804.DAY_WID, T13395.ENTITY_TYPE ORDER BY T804.DAY_WID ASC, T13395.ENTITY_TYPE ASC) as c5


from


(select 'Vehicle' as ENTITY_TYPE  from dual union select 'Driver' from dual) T13395,

W_DAY_DIM T804 /* Date (W_DAY_DIM) */ ,

W_CUSTOMER_DIM T806 /* Company (W_CUSTOMER_DIM) */ ,

W_MEASURE_TYPE_DIM T867 /* Metric (W_MEASURE_TYPE_DIM) */ ,

W_FLEET_AGG T1021 /* Facts: Measure Fleet (W_FLEET_AGG) */

where  ( T804.DAY_WID = T1021.DATE_WID and T806.CUSTOMER_WID = T1021.CUSTOMER_WID and T806.CUSTOMER_WID = 188111.0 and T867.MEASURE_TYPE_WID = T1021.MEASURE_TYPE_WID and T1021.CUSTOMER_WID = 188111.0 and T13395.ENTITY_TYPE = 'Driver' and T804.DAY_DATE between TO_DATE('2010-07-11 00:00:00' , 'YYYY-MM-DD HH24:MI:SS') and TO_DATE('2010-07-25 00:00:00' , 'YYYY-MM-DD HH24:MI:SS') )

group by T804.DAY_WID, T13395.ENTITY_TYPE,  TRUNC(T804.DAY_DATE)

) D1

where  ( D1.c5 = 1 )

) D2

order by c3, c4;



As you can see, the heart of the queries is identical; the difference is only in where the merging occurs.


One side note: some databases do not handle the ROW_NUMBER() function well, so check how yours behaves.  In my experience, MPP databases such as Teradata perform this operation very slowly.  I’ve also seen conflicting claims from authors on whether ROW_NUMBER is good or bad for performance.


With ROWNUM_SUPPORTED set to No, the queries are even simpler:


select count(distinct T10979.DRIVER_WID) as c1,

count(distinct 0) as c2,

T13395.ENTITY_TYPE as c3,

TRUNC(T804.DAY_DATE) as c4,

T804.DAY_WID as c5


from


(select 'Vehicle' as ENTITY_TYPE  from dual union select 'Driver' from dual) T13395,

W_DAY_DIM T804 /* Date (W_DAY_DIM) */ ,

W_CUSTOMER_DIM T806 /* Company (W_CUSTOMER_DIM) */ ,

W_MEASURE_TYPE_DIM T867 /* Metric (W_MEASURE_TYPE_DIM) */ ,

W_DRIVER_DAY_AGG T10979 /* Facts: Measure DayDrv (W_DRIVER_DAY_AGG) */

where  ( T804.DAY_WID = T10979.DATE_WID and T806.CUSTOMER_WID = T10979.CUSTOMER_WID and T806.CUSTOMER_WID = 188111.0 and T867.MEASURE_TYPE_WID = T10979.MEASURE_TYPE_WID and T10979.CUSTOMER_WID = 188111.0 and T13395.ENTITY_TYPE = 'Driver' and T804.DAY_DATE between TO_DATE('2010-07-11 00:00:00' , 'YYYY-MM-DD HH24:MI:SS') and TO_DATE('2010-07-25 00:00:00' , 'YYYY-MM-DD HH24:MI:SS') )

group by T804.DAY_WID, T13395.ENTITY_TYPE,  TRUNC(T804.DAY_DATE)

order by c5, c3

+++dev2:2a0000:2a0001:----2010/11/03 06:08:29

-------------------- Sending query to database named AM Data Mart (id: <<49984>>):

select sum(T1021.MEASURE_COUNT) as c1,

TRUNC(T804.DAY_DATE) as c2,

T13395.ENTITY_TYPE as c3,

T804.DAY_WID as c4


from


(select 'Vehicle' as ENTITY_TYPE  from dual union select 'Driver' from dual) T13395,

W_DAY_DIM T804 /* Date (W_DAY_DIM) */ ,

W_CUSTOMER_DIM T806 /* Company (W_CUSTOMER_DIM) */ ,

W_MEASURE_TYPE_DIM T867 /* Metric (W_MEASURE_TYPE_DIM) */ ,

W_FLEET_AGG T1021 /* Facts: Measure Fleet (W_FLEET_AGG) */

where  ( T804.DAY_WID = T1021.DATE_WID and T806.CUSTOMER_WID = T1021.CUSTOMER_WID and T806.CUSTOMER_WID = 188111.0 and T867.MEASURE_TYPE_WID = T1021.MEASURE_TYPE_WID and T1021.CUSTOMER_WID = 188111.0 and T13395.ENTITY_TYPE = 'Driver' and T804.DAY_DATE between TO_DATE('2010-07-11 00:00:00' , 'YYYY-MM-DD HH24:MI:SS') and TO_DATE('2010-07-25 00:00:00' , 'YYYY-MM-DD HH24:MI:SS') )

group by T804.DAY_WID, T13395.ENTITY_TYPE,  TRUNC(T804.DAY_DATE)

order by c4, c3



To Internally Stitch or Not?


You’re going to hate me for this, but there is no single answer.  It depends very much on a variety of items.  Personally I prefer the OBI stitch join syntax; it is simply easier to read during development.  

QA’ing the monster queries is akin to how the crew read the black-and-green screens in the Matrix and made perfect sense of them – not for the inexperienced.  I am a firm believer that if the SQL is not perfect, it’s wrong, so being able to debug and tune the SQL easily is the way to go.  But of course what a person prefers is not always what the technology prefers.


When I say perfect, I mean that the proper function shipping and grouping is being pushed to the database.  If these are incorrect, you may end up with a situation where the query is returning too many records to OBI across the network.  The difference here can be huge; in some cases your inner query may return a million rows or more across the network for OBI to then write to disk and do its MAX() computation.  

With a large recordset, this is a deal breaker.  With a small recordset, you will be ok.  This is why it is imperative to know what your queries are doing and how many rows they return from the OBI log file:



-------------------- Rows 15, bytes 1680 retrieved from database query id: <<49984>>          -- Good

-------------------- Rows 150000, bytes 16800000 retrieved from database query id: <<49984>>  -- Bad
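One way to keep an eye on this is to scan the log for those retrieved-rows lines.  Here is a rough sketch; the regex and the 10,000-row threshold are my own assumptions, to be tuned for your system:

```python
import re

# Matches the BI server's "Rows N, bytes M retrieved" log lines.
ROWS_RE = re.compile(r"Rows (\d+), bytes (\d+) retrieved from database query id: <<(\d+)>>")

def flag_large_queries(log_lines, max_rows=10000):
    """Return (query_id, rows) for every physical query that returned more
    rows than we are comfortable shipping across the network to OBI."""
    flagged = []
    for line in log_lines:
        m = ROWS_RE.search(line)
        if m and int(m.group(1)) > max_rows:
            flagged.append((m.group(3), int(m.group(1))))
    return flagged

log = [
    "-------------------- Rows 15, bytes 1680 retrieved from database query id: <<49984>>",
    "-------------------- Rows 150000, bytes 16800000 retrieved from database query id: <<49984>>",
]
print(flag_large_queries(log))  # only the 150000-row query is flagged
```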



Ok, so now I’m assuming you did your RPD work and you have the correct core SQL being generated.  Now it’s on to determining where the stitch should occur.

Recently I had the opportunity to do some load testing on our pre-production system.



We made 3 realizations that are specific to our system (you may or may not have the same ones):

  1. The database CPU was the bottleneck doing nested loops for processing the stitching
  2. The inner SQL all returned only a few rows per query, usually under 10
  3. We had a tremendous amount of unused OBI Server CPU power



With these inputs, we switched over to letting OBI do the stitching.  The results were dramatic:

  1. At 100 concurrent users, the DB load factor went from 75 to 9 (16 CPU cores)
  2. Avg. report response time improved 9x.  Remember, this is at load, not a single query on an idle box
  3. OBI Server load maxed out at 1% across 3 nodes even with stitching



These results speak for themselves.  A substantial improvement by changing one single checkbox!



Be Careful Of


There are a few things to be careful of when you go from one technique to another – the direction doesn’t matter much:


  1. Make sure your inner SQL is perfect – that is a prerequisite for this entire topic.
  2. Regression test everything when you make any SQL generation switches.  I have seen differences appear going from OBI to DB stitch and also the reverse.  I cannot stress this enough.
  3. When resolving these differences, be aware that OBI’s internal stitching is a little more strict than the DB stitching.  Example: after letting OBI do the stitching, we tracked a discrepancy down to an incorrect mid-level hierarchy key.  We had defined the level key as State, and the Oracle DB was fine with it.  But OBI didn’t like it, and instead wanted the level key to be Country & State.  Expect to find some difficult RPD configuration issues to resolve.
  4. Nested aggregations force your hand somewhat in that they frequently require ROW_NUMBER support in order to generate multi-level SQL (i.e., MAX(SUM())).  Without it, the database will bring the results of the inner aggregation across the network to OBI – this could be millions of records.  If this is the case and you want OBI to do the joins, make sure that your PERF_PREFER_INTERNAL_STITCH_JOIN is set to Yes.
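To see why point 4 matters, here is a hypothetical sketch of a MAX(SUM(metric)) nested aggregation: without multi-level SQL, the database can only do the inner SUM grouped by person and day, so every one of those rows crosses the network and the BI server computes the outer MAX itself (the data and sizes are invented):

```python
from collections import defaultdict

# Invented fact rows: (person, day, metric) for two people over 30 days.
fact_rows = [("Ann", d, 10 + d) for d in range(1, 31)] + \
            [("Bob", d, 20 + d) for d in range(1, 31)]

# Step 1 (pushed to the database): SUM(metric) GROUP BY person, day.
inner = defaultdict(int)
for person, day, metric in fact_rows:
    inner[(person, day)] += metric
# Every (person, day) row -- 60 here, potentially millions in real life --
# now has to travel across the network to the BI server.

# Step 2 (done by the BI server): MAX of those sums per person.
outer = {}
for (person, day), s in inner.items():
    outer[person] = max(outer.get(person, s), s)

print(len(inner), outer)
```

With ROW_NUMBER (or equivalent) support, both levels collapse into one database query and only the final per-person rows come back.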



Conclusion


For many systems, where the stitching occurs will not be a problem.  It depends on the complexity of your queries, the type of hardware you have available, where your bottlenecks are, the planned load, and even the metrics themselves.

Furthermore, you may not see the benefit of going one way or the other until the system is under full user load, when resource limits are hit and bottlenecks appear.  I strongly recommend not only running a load test, but comparing both options at load to identify which is best.

Of course before you do this it is imperative that you regression test to make sure you get the same results – do not make the mistake of assuming such a small configuration change can’t have large impacts.





