In the previous blog I explained the OBIA customization types and their impact on the various components of the solution.
http://bidirect.blogspot.co.uk/2012/05/oracle-business-intelligence.html
In successive blogs of this series I would like to cover the naming conventions and development guidelines that should be followed when developing new attributes, tables, dimensions and facts in the Business Analytics Warehouse (BAW) and new/changed repository objects in the OBIEE repository.
In this blog my focus is on standard practices and guidelines for modifying database objects such as tables, columns and indexes.
Why Best Practices? – Standards and guidelines for the database schema, tables, columns and indexes give better manageability and simplify future upgrades. In the long term they help reduce the day-to-day cost of the application and of successive upgrades.
Database Schema and Objects
The Business Analytics Warehouse (BAW) is the pre-built data warehouse that ships with BI Apps. The database schema is decided by the installation team; the default is “BAW”. It contains both the staging tables and the DW target tables. All customized tables, columns, indexes and other database objects MUST be added to the BAW schema. (Best Practice DB 1)
Why add to the BAW schema? – For better manageability and to avoid cross-schema joins at the OBIEE or Informatica level; cross-schema joins within the repository and Informatica are more complicated than keeping everything in one schema. The mantra is: keep it simple.
Changes made to database objects should use the versioning and comments features of the underlying database for better traceability. (Best Practice DB 2)
Why Versions and Comments? Comments on tables and columns recording the project phase, the creating user and the date provide enough traceability of the customizations made.
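As a sketch of this practice, Oracle's COMMENT statement can record the traceability directly in the data dictionary (the table, column, user and dates below are illustrative, not part of the standard BAW schema):

```sql
-- Record project phase, creating user and date against each customized object
COMMENT ON TABLE WC_LOYALTY_TXN_F IS
  'Custom fact: loyalty transactions. Phase 2, created by jsmith, 2012-06-01';

COMMENT ON COLUMN W_ORG_D.X_REGION_CD IS
  'Custom column: sales region code. Phase 2, created by jsmith, 2012-06-01';
```

The comments can later be queried from ALL_TAB_COMMENTS and ALL_COL_COMMENTS to audit what was customized, by whom and when.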
In addition, make sure that whatever customization is done to database objects is imported into the DAC container. (Best Practice DB 3)
Why Import? DAC should be your central repository, used to migrate database object customizations from the development environment to test and then to production. Avoid using Word or text files as a storage option for database objects.
Tables
Naming of Tables
The naming conventions used for tables in the Oracle BI Data Warehouse are as follows, and should be followed whenever custom tables are created in the warehouse. (Best Practice DB 4)
Why Standard Naming Conventions? They help to identify the meaning of each table and its significance and usage in the overall solution.
The names of the Oracle Data Warehouse tables consist of three parts:
Prefix
Name
Suffix
The three components are capitalized and separated by an underscore (_), e.g. W_GL_OTHER_F. Each component has a meaning, as described in the table below:
Component | Value | Meaning
Prefix | W_ | Out-of-the-box warehouse table
Prefix | WC_ | Custom warehouse table
Suffix | _F | Fact table (base)
Suffix | _FS | Staging fact table (base)
Suffix | _A | Aggregate fact table (base)
Suffix | _D | Dimension table (base)
Suffix | _DS | Staging dimension table (base)
Suffix | _DH | Dimension hierarchy table (base)
Suffix | _SCD | Slowly changing dimension table (base)
Suffix | _MD | Mini-dimension table
Suffix | _H | Helper table
Suffix | _HS | Helper staging table
Suffix | _TMP | Pre-staging temporary table
Suffix | _G, _GS, _S | Internal table
Suffix | _UD | Unbounded dimension
Suffix | _DHLS | Staging for dimension helper
Suffix | _DHL | Dimension helper table
Suffix | _M | Map dimension
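Putting the conventions together, a custom fact table and its staging counterpart might be created as follows (the table and column names here are hypothetical, chosen only to illustrate the prefix and suffix rules):

```sql
-- WC_ marks a custom table; _FS is the staging table, _F the target fact
CREATE TABLE WC_LOYALTY_TXN_FS (
  INTEGRATION_ID     VARCHAR2(80 CHAR),
  DATASOURCE_NUM_ID  NUMBER(10),
  TXN_AMT            NUMBER
);

CREATE TABLE WC_LOYALTY_TXN_F (
  ROW_WID            NUMBER,
  INTEGRATION_ID     VARCHAR2(80 CHAR),
  DATASOURCE_NUM_ID  NUMBER(10),
  TXN_AMT            NUMBER
);
```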
The types of tables used by the Oracle Business Analytics Warehouse are listed below, and the developer should stick to these types. I believe the list covers almost all table types possible in a DW solution; any new addition should be treated as a new entrant to the family and welcomed. (Best Practice DB 5)
Table Type | Description
Aggregate tables (_A) | Contain summed (aggregated) data
Dimension tables (_D) | Star analysis dimensions
Staging tables for dimensions (_DS) | Hold dimension information that has not been through the final ETL transformations
Staging tables for Usage Accelerator (WS_) | Contain the necessary columns for the ETL transformations
Dimension hierarchy tables (_DH) | Store the dimension's hierarchical structure
Dimension helper tables (_DHL) | Store M:M relationships between two joining dimension tables
Staging tables for dimension helpers (_DHLS) | Staging tables for storing M:M relationships between two joining dimension tables
Fact tables (_F) | Contain the metrics being analyzed by dimensions
Fact staging tables (_FS) | Hold the metrics being analyzed by dimensions that have not been through the final ETL transformations
Internal tables (_G, _GS, _S) | Used primarily by ETL mappings for data transformation and for controlling ETL runs
Helper tables (_H) | Inserted between the fact and dimension tables to support a many-to-many (M:M) relationship between fact and dimension records
Map dimension tables (_M) | Store cross-referencing mappings between the external data ID and the internal ID
Mini dimension tables (_MD) | Include combinations of the most queried attributes of their parent dimensions; the database joins these small tables to the fact tables
Primary extract tables (_PE) | Support the soft delete feature. The table includes all the primary key columns (integration ID column) from the source system. When a delete event happens, the full extract from the source is compared against the data previously extracted into the primary extract table to determine whether a physical deletion occurred in the transactional database. The soft delete feature is disabled by default, so the primary extract tables are not populated until you enable it
Unbounded dimension tables (_UD) | Contain information that is not bounded in the transactional database data but should be treated as bounded data in the Oracle Business Analytics Warehouse
Columns
Naming of Columns
This section describes the naming conventions used for columns in the Oracle BI Data Warehouse; they should be followed whenever custom columns are created in the warehouse. (Best Practice DB 6)
Why Naming Conventions for Columns? They help to identify the meaning of each column and its significance and usage within the table.
The columns of the Oracle Data Warehouse tables consist of three parts:
Prefix
Name
Suffix
Component | Value | Meaning
Prefix | X_ | Custom warehouse column
Suffix | _AMT | The column contains a currency amount
Suffix | _CD | The column value is based on the contents of a List of Values (LOV)
Suffix | _DT | The column contains a date or datetime value
Suffix | _FLG | The column contains a Boolean value where Y indicates Yes/True and N indicates No/False
Suffix | _I | Language-independent code
Suffix | _ID | Used in staging tables; corresponds to the source identifier
Suffix | _WID | Identifier generated by Oracle BI linking dimension and fact tables, except for ROW_WID
Suffix | _NAME | Name corresponding to the code column (columns ending with _CODE)
Suffix | _DESC | Long description corresponding to the code column (columns ending with _CODE)
Suffix | _NUM | The column contains a number or an identifying alphanumeric value
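For example, custom columns added to an existing dimension would combine the X_ prefix with the appropriate suffix (the column names below are illustrative, not part of the delivered W_ORG_D definition):

```sql
ALTER TABLE W_ORG_D ADD (
  X_REGION_CD    VARCHAR2(30 CHAR),  -- custom code column based on an LOV
  X_REGION_NAME  VARCHAR2(80 CHAR),  -- name corresponding to the code column
  X_ONBOARD_DT   DATE,               -- custom date column
  X_PARTNER_FLG  CHAR(1)             -- custom Y/N flag
);
```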
All new tables, whether fact or dimension, MUST contain the following important fields: (Best Practice DB 7)
Why System Fields? They provide metadata about the data and the DML operations. Metadata such as the source identifier, source type, created by and creation date is useful in many ways, for example when setting up changed data capture or DML operations such as insert, update and upsert.
Column Name | Column Type | Required in Area | Description
INTEGRATION_ID | VARCHAR2(80 CHAR) | Staging / DW | Stores the primary key or unique identifier of a record in the source table
DATASOURCE_NUM_ID | NUMBER(10) | Staging / DW | Stores the data source from which the data is extracted; this is set up in DAC. E-Business Suite = 4; new external sources should have their own unique number
ETL_PROC_WID | NUMBER(10) | DW | Stores the ID of the ETL process; details of the ETL process are stored in W_ETL_RUN in the warehouse
ROW_WID | NUMBER | DW | Sequence number generated during the ETL process; the unique identifier of the table
CREATED_ON_DT | DATE | Staging / DW | Stores the creation date of a record in the primary source table
CHANGED_ON_DT, AUX1_CHANGED_ON_DT, AUX2_CHANGED_ON_DT, AUX3_CHANGED_ON_DT, AUX4_CHANGED_ON_DT | DATE | Staging / DW | Stores the last update date of a record in the primary source table; additional columns prefixed AUX# can be added to store the last update dates of auxiliary source tables
CREATED_BY_ID | VARCHAR2(80) | Staging | Stores the ID of the user who created the record in the primary source table
CREATED_BY_WID | NUMBER(10,0) | DW | Surrogate key of the user who created the record in the primary source table
CHANGED_BY_ID | VARCHAR2(80) | Staging | Stores the ID of the user who last updated the record in the primary source table
CHANGED_BY_WID | NUMBER(10,0) | DW | Surrogate key of the user who last updated the record in the primary source table
TENANT_ID | VARCHAR2(80) | Staging / DW | Distinguishes distinct source instances that share a common data source number identifier; the default value is ‘DEFAULT’
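A minimal sketch of a custom dimension carrying the mandatory system columns (the business columns REGION_CODE and REGION_NAME are hypothetical):

```sql
CREATE TABLE WC_REGION_D (
  ROW_WID            NUMBER,                         -- ETL-generated surrogate key
  INTEGRATION_ID     VARCHAR2(80 CHAR),              -- source primary key
  DATASOURCE_NUM_ID  NUMBER(10),                     -- source identifier set up in DAC
  ETL_PROC_WID       NUMBER(10),                     -- links to W_ETL_RUN
  CREATED_ON_DT      DATE,
  CHANGED_ON_DT      DATE,
  CREATED_BY_WID     NUMBER(10,0),
  CHANGED_BY_WID     NUMBER(10,0),
  TENANT_ID          VARCHAR2(80) DEFAULT 'DEFAULT',
  REGION_CODE        VARCHAR2(30 CHAR),              -- hypothetical business column
  REGION_NAME        VARCHAR2(80 CHAR)               -- hypothetical business column
);
```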
Miscellaneous Technical columns for new tables
All new tables MAY contain the following fields: (Best Practice DB 8)
Why These Columns? They provide additional metadata about the data and the DML operations, useful for example when setting up changed data capture, soft deletes or slowly changing dimensions.
Column Name | Column Type | Required in Area | Description
CURRENT_FLG | CHAR(1) | DW | For slowly changing dimension purposes
DELETE_FLG | CHAR(1) | Staging / DW | For soft/hard delete purposes
W_INSERT_DT | DATE | DW | Optional insertion date for ETL audit purposes
W_UPDATE_DT | DATE | DW | Optional update date for ETL audit purposes
SRC_EFF_FROM_DATE | DATE | Staging / DW | For Type 2, 3 or 6 slowly changing dimension purposes
SRC_EFF_TO_DATE | DATE | Staging / DW | For Type 2, 3 or 6 slowly changing dimension purposes
EFFECTIVE_FROM_DATE | DATE | Staging / DW | For Type 2, 3 or 6 slowly changing dimension purposes
EFFECTIVE_TO_DATE | DATE | Staging / DW | For Type 2, 3 or 6 slowly changing dimension purposes
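As a sketch of how these columns work together in a Type 2 slowly changing dimension: when a tracked attribute changes, the ETL closes the current version of the record and inserts a new one (the table name, key values and sequence below are hypothetical):

```sql
-- Close the current version of the record
UPDATE WC_REGION_D
   SET CURRENT_FLG       = 'N',
       EFFECTIVE_TO_DATE = SYSDATE
 WHERE INTEGRATION_ID    = '300100001'
   AND DATASOURCE_NUM_ID = 4
   AND CURRENT_FLG       = 'Y';

-- Insert the changed record as the new current version, open-ended
INSERT INTO WC_REGION_D
  (ROW_WID, INTEGRATION_ID, DATASOURCE_NUM_ID,
   CURRENT_FLG, EFFECTIVE_FROM_DATE, EFFECTIVE_TO_DATE)
VALUES
  (WC_REGION_D_SEQ.NEXTVAL, '300100001', 4,
   'Y', SYSDATE, TO_DATE('3714-01-01', 'YYYY-MM-DD'));
```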
Currency columns for new tables
The currency codes for related system columns are as follows:( Best Practice DB 9)
Why Currency Codes? Support for reporting in multiple currencies is one of the prime features of BI Apps implementations.
System Column | Description
DOC_CURR_CODE | Code for the currency in which the document was created in the source system
LOC_CURR_CODE | Usually the reporting currency code for the financial company in which the document was created
GRP_CURR_CODE | The primary group reporting currency code for the group of companies or organizations in which the document was created
LOC_EXCHANGE_RATE | Currency conversion rate from the document currency code to the local currency code
GLOBAL1_EXCHANGE_RATE | Currency conversion rate from the document currency code to the primary group (GLOBAL1) currency code
GLOBAL2_EXCHANGE_RATE | Currency conversion rate from the document currency code to the GLOBAL2 currency code
GLOBAL3_EXCHANGE_RATE | Currency conversion rate from the document currency code to the GLOBAL3 currency code
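These columns let a query convert a stored document amount into any of the configured currencies on the fly; for instance (the fact table and amount column are illustrative):

```sql
SELECT DOC_CURR_CODE,
       TXN_AMT                         AS DOC_AMT,     -- document currency
       TXN_AMT * LOC_EXCHANGE_RATE     AS LOC_AMT,     -- local currency
       TXN_AMT * GLOBAL1_EXCHANGE_RATE AS GLOBAL1_AMT  -- primary group currency
  FROM WC_LOYALTY_TXN_F;
```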
Indexes
Naming of Indexes
The naming conventions used for indexes in the Oracle BI Data Warehouse are as follows, and should be followed whenever custom indexes are created in the warehouse. (Best Practice DB 10)
Why Naming Conventions for Indexes? They help to identify custom indexes and their significance and usage in the overall solution.
The index names of the Oracle Data Warehouse tables consist of three parts:
Prefix
Name
Suffix
The three components are capitalized and separated by an underscore (_), e.g. W_GL_BALANCE_F_U1. Each component has a meaning, as described in the table below.
Component | Value | Description
Prefix | W_ | Standard data warehouse table
Prefix | WC_ | Custom table
Name | XXX | Table name (e.g. W_GL_BALANCE_F)
Suffix | _U1 | Unique index (e.g. a composite index on INTEGRATION_ID and DATASOURCE_NUM_ID)
Suffix | _F1 | Non-unique dimensional reference key index (e.g. a bitmap index such as BALANCE_DT_WID)
Suffix | _M1 | Non-unique index (e.g. a bitmap index, a normal index or a composite normal index)
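Applying the convention, indexes on a custom fact table might look like this (the table and column names are hypothetical):

```sql
-- _U1: unique index on the natural key
CREATE UNIQUE INDEX WC_LOYALTY_TXN_F_U1
  ON WC_LOYALTY_TXN_F (INTEGRATION_ID, DATASOURCE_NUM_ID);

-- _F1: bitmap index on a dimensional reference key
CREATE BITMAP INDEX WC_LOYALTY_TXN_F_F1
  ON WC_LOYALTY_TXN_F (TXN_DT_WID);
```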
Best Practices for Indexes
The best practices for the indexes are as below.
· The custom indexes added to the source system (e.g. Oracle EBS) for incremental load performance can be categorized into the following groups, and action needs to be taken based on the facts:
o Tables that do not have indexes on LAST_UPDATE_DATE in the latest EBS releases, and for which no performance implications have been reported from adding such indexes.
o Tables that have indexes on LAST_UPDATE_DATE columns, introduced in Oracle EBS Release 12.
o Tables that cannot have indexes on LAST_UPDATE_DATE because of serious performance degradation in the source EBS environments.
Indexes on the source system should have the prefix OBIEE_ and DO NOT follow the standard source system index naming conventions (e.g. OBIEE_OE_ORDER_HEADERS_ALL). (Best Practice DB 11)
Why? This is to make sure that these custom indexes are managed separately from the rest of the standard indexes.
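A source-side incremental-load index following this convention might be created as below (a sketch against a standard EBS order headers table):

```sql
-- The OBIEE_ prefix keeps the custom index separate from standard EBS indexes
CREATE INDEX OBIEE_OE_ORDER_HEADERS_ALL
  ON OE_ORDER_HEADERS_ALL (LAST_UPDATE_DATE);
```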
· Staging tables typically do not require indexes, but this is not a strict rule.
Don't hesitate to add indexes on a staging table if necessary. (Best Practice DB12)
Why? Let common sense prevail; I hate strict rules. If you can show that an index helps data load performance, go ahead and just do it.
· Create indices on all columns that the ETL uses for dimensions and facts. For example:
o ROW_WIDs of dimensions and facts
o INTEGRATION_ID
o DATASOURCE_NUM_ID
o Flags
o Dimensional Reference Keys of fact tables
Create indices on all system columns such as ROW_WID, INTEGRATION_ID etc. used within ETL processes. This is an absolute MUST for ETL performance. (Best Practice DB13)
Why? ETL performance; without these indices I can tell you that your ETL won't run efficiently.
· Create bitmap indexes for dimensional reference keys and flags; treat this as a MUST requirement. (Best Practice DB14)
Why? Bitmap indexes on dimensional reference keys and flags allow the optimizer to use STAR TRANSFORMATION, which I believe is absolutely essential to achieve the best query performance.
Make sure that the database parameter for STAR TRANSFORMATION and the repository database parameter settings are in place.
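A sketch of checking and enabling the parameter (requires DBA privileges; it can also be set at session level for testing):

```sql
-- Check the current value (SQL*Plus syntax)
SHOW PARAMETER star_transformation_enabled

-- Enable star transformation for the instance
ALTER SYSTEM SET star_transformation_enabled = TRUE SCOPE = BOTH;
```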
· Carefully consider which columns to put filter conditions on. (Best Practice DB15)
Why? An index on a column used in a report filter condition improves performance, for obvious reasons.
· Define indices to improve query performance. (Best Practice DB16)
o Inspect standard repository objects for guidance
Why? The index creation process should be driven by business requirements rather than physical structures and conventional wisdom. I believe the reporting system is mainly a business support system and should be driven by business requirements.
· Register new indices in the DAC; treat this as a MUST requirement. (Best Practice DB17)
Options within DAC are
o Local or Partitioned Index
o Query Index or ETL Index
o The DAC server drops and re-creates indices during a full load and /or incremental load
Why? As I said earlier, DAC should be your central repository for all database objects, including indexes. This is helpful during migration of code from the development instance to test and finally to production. DAC can also drop and re-create indices during full and/or incremental loads.
· Review/revisit the index strategy on a regular basis. (Best Practice DB18)
Why? Reviewing and revisiting the indices should be done on a regular basis, particularly with every new release, which might bring different usage patterns and query performance.
· The best practices for handling BI Applications indexes in Exadata Warehouse (Best Practice DB19):
Why? Exadata and Exalytics are specially designed engineered systems and require a different approach. Based on my experience in this area, the success mantra is: don't rush to any conclusion, and make sure proper comprehensive benchmarking is done before dropping or disabling any indices.
o Turn on Index usage monitoring to identify any unused indexes and drop / disable them in your environment.
o Consider pinning the critical target tables in smart flash cache
o Consider building custom aggregates to pre-aggregate more data and simplify queries.
o Drop selected query indexes and disable them in DAC to use Exadata Storage Indexes / Full Table Scans only after running comprehensive benchmarks and ensuring no impact on any other queries performance.
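The index usage monitoring mentioned above can be sketched as follows (the index name is illustrative):

```sql
-- Turn on usage monitoring for a candidate index
ALTER INDEX W_GL_BALANCE_F_F1 MONITORING USAGE;

-- After a representative query workload, check whether it was ever used
SELECT index_name, used, monitoring
  FROM V$OBJECT_USAGE
 WHERE index_name = 'W_GL_BALANCE_F_F1';
```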
I hope you find the second part of this series informative and useful. My plan is to write the next couple of blogs on guidelines for Informatica, DAC and repository customisation.