Warehouse
Table Naming Conventions
These conventions are based upon
Oracle’s BI Applications 7.9.6.3 standards.
The Analytic Data Warehouse (ADW) is
an arbitrary name for the performance
layer of your data-warehouse. The diagram below shows a typical data
warehouse reference architecture with the Performance Layer circled in red.
Table Names
The ADW schema will predominantly
hold tables built by ETL and queried by OBIEE
- Fact tables contain columns
which can be summed, averaged, etc.
- Dimensional tables contain
descriptive columns
- ADW tables to be loosely named
in the format business name/grain/type
- Business name is a short
business description of the table which is not tied to Northgate or
e-Business Suite names
- Grain is optional and describes
table granularity (useful for aggregate tables)
- For the time dimension, the
level names DAY, WEEK, MONTH, YEAR should be used
- Type is one of:
- _A = aggregated fact table
- _D = dimension table
- _DH = dimension hierarchy
table
- _F = fact table
- Staging tables (used during
ETL but not by OBIEE) will have S appended to the
- _DS = dimensional staging
table
- _FS = fact staging table
- Materialized views should have
MV appended to them
- _DMV = dimensional
materialized view
- _FMV = fact materialized view
- By default use singular names
(facts maybe plural?)
- Some examples:
- TENANCY_D = Tenancy dimension
- DEBT_DETAIL_F = Debt fact (lowest
level detail)
- DEBT_REGION_A = Debt fact
aggregated to a regional level
- DAY_D = Calendar dimension at
day level (lowest level)
- WEEK_D = Calendar dimension at
week level
- MONTH_D = Calendar dimension
at month level
- PROPERTY_UTILISATION_FMV =
Property utilisation fact materialized view
Columns
- Column names should be business relevant, non-technical, and not tied to Northgate or e-Business Suite names
- Column suffixes:
- WID - Identifier generated by
ETL to link dimension and fact tables (except ROW_ID). For example
TENANCY_WID would reference TENANCY_D.ROW_ID.
- _ID - Source system identifier
- _FLG - Flags Y/N
- _IND - Indicator e.g. D,W,M
- _DT - Date
- _TS – Timestamp
- _AMT - Amount
- _CNT - Count
- _PCT - Percentage
- _CODE - A coded alphanumeric
value that defines a possible value for a data element
- _NAME - A short name that
describes the code
- _DESC - A long description for
the code
- Other conventions:
- _NO - For descriptive number
columns, i.e. TELEPHONE_NO
- _TXT - For text fields which
contain concatenated values etc.
- System columns on each table:
- ROW_WID - surrogate key to
uniquely identify record (generated during ETL), first column in table
- DATASOURCE_NUM_ID - Unique identifier
of source system from which data was extracted. TBD numbers to use, 2nd
column in table
- INTEGRATION_ID - Unique
identifier of fact or dimension in source system, 3rd column in table
- W_INSERT_DT - Date inserted to
Warehouse, at the end of table column definitions
- W_UPDATE_DT - Date updated to
Warehouse, at the end of table column definitions
- ETL_RUN_ID - Identifier of ETL
process that loaded, integrated, checked the row
- Optional on each table:
- CREATED_ON_DT - Date created
in source system
- CHANGED_ON_DT - Date changed
in source system
- DELETED_FLG - For logical
deletes of warehouse records
- Type 2 Slowly Changing
Dimension columns:
- EFFECTIVE_FROM_DT – Date that
the dimensional record is effective from
- EFFECTIVE_TO_DT – Date that
the dimensional record is effective to
- CURRENT_FLG – Is record
current? Y/N
- All ADW surrogate keys should
be integers and generated from a sequence
- Foreign keys should always use
warehouse surrogate keys
- The W_ prefix denotes a
translated to warehouse code/name/description combination. (the concept of
domain values)
Notes on Time Dimensions
- Reporting, financial, and other
calendars are all associated (de-normalized) onto each day of the year
- Dimensional aggregates are
needed by OBIEE for aggregate navigation. So WEEK_D, MONTH_D, and
YEAR_D should also be created.
- ETL updates dimension with
reporting flags as necessary
- The surrogate key ROW_WID is an
integer representation of the period, of the format YYYYMMDD or YYYYMM or
YYYY
Example Physical Table Column
Conventions
Domain
|
Datatype
|
Notes
|
AMT
|
Number
(12,2)
|
Amount
|
CNT
|
Integer
|
Count
|
CODE
|
Varchar2
(20)
|
Code
e.g. TUS, TUD
|
DD
|
Number
(2)
|
Day
of month e.g. 1-31
|
DDD
|
Number
(3)
|
Day
of year e.g. 1-366
|
DESC
|
Varchar2
(200)
|
Code
description
|
DS
|
Number
(2)
|
Datasource
e.g. 1=EBS
|
DT
|
Date
|
Date
and time
|
FLG
|
Char
(1)
|
Y/N
flag
|
ID
|
Varchar2
(12)
|
Identifier
|
IND
|
Varchar2
(3)
|
Indicator
e.g. D,W,M,Q,A
|
MM
|
Number
(2)
|
Month
of year e.g. 1-12
|
NAME
|
Varchar2
(250)
|
Name
|
NO
|
Number
(12)
|
Number
|
PCT
|
Number
(3,3)
|
Percentage
|
QQ
|
Number
(2)
|
Quarter
of year e.g. 1-4
|
TS
|
Timestamp
|
Timestamp
|
TXT
|
Varchar2
(200)
|
Text
string e.g. +64 4 1234567
|
WW
|
Number
(2)
|
Week
of year e.g. 1-53
|
WID
|
Number
(12)
|
Warehouse
Identifier i.e. rowid
|
YYYY
|
Number
(4)
|
Year
e.g. 2012
|
Refresher
on Slowly Changing Dimensions Types
ReplyDeleteThis blog is looking good and very well written and keep write it.
click here now