Monday, March 18, 2013

[OBIEE 10g] WareHouse Table Naming Conventions

 






Warehouse Table Naming Conventions



These conventions are based upon Oracle’s BI Applications 7.9.6.3 standards.


The Analytic Data Warehouse (ADW) is an arbitrary name for the performance layer of your data-warehouse. The diagram below shows a typical data warehouse reference architecture with the Performance Layer circled in red.






Table Names


The ADW schema will predominantly hold tables built by ETL and queried by OBIEE
  • Fact tables contain columns which can be summed, averaged, etc.
  • Dimensional tables contain descriptive columns
  • ADW tables to be loosely named in the format business name/grain/type
  • Business name is a short business description of the table which is not tied to Northgate or e-Business Suite names
  • Grain is optional and describes table granularity (useful for aggregate tables)
    • For the time dimension, the level names DAY, WEEK, MONTH, YEAR should be used
  • Type is one of:
    • _A = aggregated fact table
    • _D = dimension table
    • _DH = dimension hierarchy table
    • _F = fact table
    • Staging tables (used during ETL but not by OBIEE) will have S appended to the
      • _DS = dimensional staging table
      • _FS = fact staging table
    • Materialized views should have MV appended to them
      • _DMV = dimensional materialized view
      • _FMV = fact materialized view
  • By default use singular names (facts maybe plural?)
  • Some examples:
    • TENANCY_D = Tenancy dimension
    • DEBT_DETAIL_F = Debt fact (lowest level detail)
    • DEBT_REGION_A = Debt fact aggregated to a regional level
    • DAY_D = Calendar dimension at day level (lowest level)
    • WEEK_D = Calendar dimension at week level
    • MONTH_D = Calendar dimension at month level
    • PROPERTY_UTILISATION_FMV = Property utilisation fact materialized view




Columns

  • Column names should be business relevant, non-technical, and not tied to Northgate or e-Business Suite names
  • Column suffixes:
    • WID - Identifier generated by ETL to link dimension and fact tables (except ROW_ID). For example TENANCY_WID would reference TENANCY_D.ROW_ID.
    • _ID - Source system identifier
    • _FLG - Flags Y/N
    • _IND - Indicator e.g. D,W,M
    • _DT - Date
    • _TS – Timestamp
    • _AMT - Amount
    • _CNT - Count
    • _PCT - Percentage
    • _CODE - A coded alphanumeric value that defines a possible value for a data element
    • _NAME - A short name that describes the code
    • _DESC - A long description for the code
  • Other conventions:
    • _NO - For descriptive number columns, i.e. TELEPHONE_NO
    • _TXT - For text fields which contain concatenated values etc.
  • System columns on each table:
    • ROW_WID - surrogate key to uniquely identify record (generated during ETL), first column in table
    • DATASOURCE_NUM_ID - Unique identifier of source system from which data was extracted. TBD numbers to use, 2nd column in table
    • INTEGRATION_ID - Unique identifier of fact or dimension in source system, 3rd column in table
    • W_INSERT_DT - Date inserted to Warehouse, at the end of table column definitions
    • W_UPDATE_DT - Date updated to Warehouse, at the end of table column definitions
    • ETL_RUN_ID - Identifier of ETL process that loaded, integrated, checked the row
  • Optional on each table:
    • CREATED_ON_DT - Date created in source system
    • CHANGED_ON_DT - Date changed in source system
    • DELETED_FLG - For logical deletes of warehouse records
  • Type 2 Slowly Changing Dimension columns:
    • EFFECTIVE_FROM_DT – Date that the dimensional record is effective from
    • EFFECTIVE_TO_DT – Date that the dimensional record is effective to
    • CURRENT_FLG – Is record current? Y/N
  • All ADW surrogate keys should be integers and generated from a sequence
  • Foreign keys should always use warehouse surrogate keys
  • The W_ prefix denotes a translated to warehouse code/name/description combination. (the concept of domain values)



Notes on Time Dimensions

  • Reporting, financial, and other calendars are all associated (de-normalized) onto each day of the year
  • Dimensional aggregates are needed by OBIEE for aggregate navigation. So WEEK_D, MONTH_D, and YEAR_D should also be created.
  • ETL updates dimension with reporting flags as necessary
  • The surrogate key ROW_WID is an integer representation of the period, of the format YYYYMMDD or YYYYMM or YYYY


Example Physical Table Column Conventions



Domain
Datatype
Notes
AMT
Number (12,2)
Amount
CNT
Integer
Count
CODE
Varchar2 (20)
Code e.g. TUS, TUD
DD
Number (2)
Day of month e.g. 1-31
DDD
Number (3)
Day of year e.g. 1-366
DESC
Varchar2 (200)
Code description
DS
Number (2)
Datasource e.g. 1=EBS
DT
Date
Date and time
FLG
Char (1)
Y/N flag
ID
Varchar2 (12)
Identifier
IND
Varchar2 (3)
Indicator e.g. D,W,M,Q,A
MM
Number (2)
Month of year e.g. 1-12
NAME
Varchar2 (250)
Name
NO
Number (12)
Number
PCT
Number (3,3)
Percentage
QQ
Number (2)
Quarter of year e.g. 1-4
TS
Timestamp
Timestamp
TXT
Varchar2 (200)
Text string e.g. +64 4 1234567
WW
Number (2)
Week of year e.g. 1-53
WID
Number (12)
Warehouse Identifier i.e. rowid
YYYY
Number (4)
Year e.g. 2012




Refresher on Slowly Changing Dimensions Types









1 comment:


  1. This blog is looking good and very well written and keep write it.
    click here now

    ReplyDelete