Pages

Sunday 13 January 2013

Oracle BI Apps 7.9.6.3 Tips and Best Practice Guidelines for Customization ( Database Objects)– Part B

In previous blog I have tried to explain the OBIA customization types and its impacts on various components of the solution.

http://bidirect.blogspot.co.uk/2012/05/oracle-business-intelligence.html

In successive blogs of this series I would like to cover the naming conventions and development guidelines that should be followed when developing new attributes, tables, dimensions and facts in the Business Analytics Warehouse (BAW) and new/changed repository objects in the OBIEE repository.

In this blog my focus would be more on database objects modification standard practices and guidelines such as tables, columns and indexes.

Why Best Practices? – The standard and guidelines for database schema, tables, columns and indexes is for better manageability and simplify the future upgrades. In long term it would help to reduce the application day to day cost and successive upgrades.
 
Database Schema and Objects

The Business Analytics Warehouse (BAW) is the pre-built Data Warehouse as part of BI-Apps.  The database schema will be decided by the installation team.  A default is “BAW”.  It contains both the staging tables and DW target tables. All customized tables, columns, indexes and other database objects MUST be added to BAW schema. (Best Practice DB 1)

Why to add in BAW Schema? – For better manageability and avoid any cross schema joins at OBIEE or Informatica Level as I understand cross schema joins within repository and Informatica would more complicated than keeping it simple. The mantra is keep it simple.

The database objects changes which is been done within the database schema and objects should use best practices for creating and managing versions and comments option of underline databases. The database objects customization should use versions and comments features of database for better traceability. (Best Practice DB 2)

Why Versions and Comments? Putting comments on tables and columns based on phase of projects, user created and date would be useful and provide enough traceability of customization done.

In addition make sure that what so ever customization done to database objects should be imported into the DAC container. The database objects customization should be imported into the DAC container. (Best Practice DB 3)

Why Import? DAC should be your central repository which would be used to migrate the database objects customization from development environment to test and then production. We should avoid using word/text file as a storage option for database objects.


Tables

Naming of Tables

The naming conventions used for tables in the Oracle BI Data Warehouse are as follows.  The naming conventions for tables defined in BI Data Warehouse should be followed if custom tables are created in the warehouse. (Best Practice DB 4)

Why Standard Naming Conentions? It would help to identify meaning of each of the table and its significance and usability in overall solution.

The names of the Oracle Data Warehouse tables consist of three parts:

Prefix
Name
Suffix

The three components are capitalized and separated by an underscore (_), e.g. W_GL_OTHER_F. Each component has a meaning, as described in the table below:

Component
Value
Meaning
Prefix
W_
Out-of-the-box warehouse table
Prefix
WC_
Custom warehouse table
Suffix
_F
Fact table (base)
Suffix
_FS
Staging Fact table (base)
Suffix
_A
Aggregate fact table (base)
Suffix
_D
Dimension table (base)
Suffix
_DS
Staging Dimension table (base)
Suffix
_DH
Dimension hierarchy table (base)
Suffix
_SCD
Slowly changing dimension table (base)
Suffix
_MD
Mini-dimension table
Suffix
_H
Helper table
Suffix
_HS
Helper Staging Table
Suffix
_TMP
Pre Staging Temporary table
Suffix
_G, _GS, _S
Internal Table
Prefix
_UD
Unbounded dimension
Prefix
_DHLS
Staging for Dimension Helper
Prefix
_DHL
Dimension Helper Table
Prefix
_M
Map Dimension


The types of tables used by the Oracle Business Analytics Warehouse are as follows. The developer should use following types of tables; I believe the list of the tables used has covered almost all types of tables possible for DW solution. Any new addition would be a treated as a new entrant in the family and should be welcomed. (Best Practice DB 5)

Why different types? The type of the table and its description would be helpful to understand its significance in overall solution

Table Type
Description
Aggregate tables (_A)
Contain summed (aggregated) data
Dimension tables (_D)
Star analysis dimensions
Staging tables for Dimension (_DS)
Tables used to hold dimension information that have not been through the final ETL transformations.
Staging tables for Usage Accelerator
(WS_)
Tables containing the necessary columns for the ETL transformations
Dimension Hierarchy tables (_DH)
Tables that store the dimension's hierarchical structure
Dimension Helper tables (_DHL)
Tables that store M:M relationships between two joining dimension tables
Staging tables for Dimension Helper
(_DHLS)
Staging tables for storing M:M relationships between
two joining dimension tables
Fact tables (_F)
Contain the metrics being analyzed by dimensions.
Fact Staging tables (_FS)
Staging tables used to hold the metrics being analyzed by dimensions that have not been through the final ETL transformations
Internal tables (_G, _GS, _S)
Internal tables are used primarily by ETL mappings for data transformation and controlling ETL runs
Helper tables (_H)
Helper tables are inserted between the fact and dimension tables to support a many-to-many (M:M) relationship between fact and dimension records
Map dimension tables (_M)
Tables that store cross-referencing mappings between the external data ID and the internal ID
Mini dimension tables (_MD)
Include combinations of the most queried attributes of their parent dimensions. The database joins these small tables to the fact tables
Primary extract tables (_PE)
Tables that are used to support the soft delete feature. The table includes all the primary key columns (integration ID column) from the source system. When a delete event happens, the full extract from the source compares the data previously extracted in the primary extract table to determine if a physical deletion was done in the transactional database. The soft delete feature is disabled by default. Therefore, the primary extract tables are not populated until you enable the soft delete feature
Unbounded dimension tables(_UD)
Tables containing information that is not bounded in transactional database data but should be treated as bounded data in the Oracle Business Analytics Warehouse

Columns

Naming of Columns

This section describes the naming conventions used for columns in the Oracle BI Data Warehouse.  The naming conventions for columns defined in BI Data Warehouse should be followed if custom columns are created in the warehouse. (Best Practice DB 6)

Why Naming Convention for Columns? It would help to identify meaning of each of the column and its significance and usability in overall table usage.

The columns of the Oracle Data Warehouse tables consist of three parts:

Prefix
Name
Suffix

The three components are capitalized and separated by an underscore (_), e.g. X_BUDGET_AMT. Each component has a meaning, as described in the table below:

Component
Value
Meaning
Prefix
X_
Custom warehouse column
Suffix
_AMT
This column contains a currency amount
Suffix
_CD
The column value is based on the contents of the List of Values (LOV)
Suffix
_DT
This column contains a date or date time value.
Suffix
_FLG
This column contains a Boolean value where Y indicates Yes or True; N indicates No or False
Suffix
_I
Language Independent Code
Suffix
_ID
Columns are used in staging tables, they corresponds to the source identifier
Suffix
_WID
Identifier generated by Oracle BI linking dimension and fact tables, except for ROW_WID
Suffix
_NAME
Name corresponding to the code column (columns ending with _CODE)
Suffix
_DESC
Long Description corresponding to the code column (columns ending with _CODE)
Suffix
_NUM
This column contains a number or an identifying alphanumeric value

All new tables, whether Fact or Dimension MUST contain the following important fields :( Best Practice DB 7)

Why System Fields? It would provide metadata information about the data and DML operations. Metadata of data such as source identifier, source type, created by, creation date etc is useful in many ways such as setting up changed data capture, setting up DML operations such as insert, update and up-sert.


Column Name
Column Type
Required in Area
Description
INTEGRATION_ID
VARCHAR2(80 CHAR)
Staging / DW
Stores the primary key or unique identifier of a record in the source table.
DATASOURCE_NUM
NUMBER(10)
Staging / DW
Stores the data source from which the data is extracted – this is setup in DAC.  E-Business Suite = 4.  New external sources should have their own unique number.
ETL_PROC_WID
NUMBER(10)
DW
Stores the ID of the ETL process information; details of ETL process are stored in the W_ETL_RUN in the warehouse
ROW_WID
NUMBER
DW
Is the sequence number generated during the ETL process; unique identifier (sequence number) for the tables.
CREATED_ON_DT
DATE
Staging / DW
Stores the creation date of a record in the primary source table
CHANGED_ON_DT
AUX1_CHANGED_ON_DT
AUX2_CHANGED_ON_DT
AUX3_CHANGED_ON_DT
AUX4_CHANGED_ON_DT
DATE
Staging / DW
Stores the last update date of a record in the primary source table
Additional column prefixed by AUX# can be added to store the last update date related to auxiliary source tables
CREATED_BY_ID
VARCHAR2(80)
Staging
Stores the user id who created a record in the primary source table
CREATED_BY_WID
NUMBER(10,0)
DW
Surrogate key of the user id who created a record in the primary source table
CHANGED_BY_ID
VARCHAR2(80)
Staging
Stores the user id who last updated a record in the primary source table
CHANGED_BY_WID
NUMBER(10,0)
DW
Surrogate key of the user id who last updated a record in the primary source table
TENANT_ID
VARCHAR2(80)
Staging / DW
This permits to distinguish distinct source instances having a common data source num identifier
Default value is ‘DEFAULT’



Miscellaneous Technical columns for new tables


All new tables, MAY contain the following important fields:( Best Practice DB 8)

Why Flg Columns ? It would provide metadata information about the data and DML operations. Metadata of data such as source identifier, source type, created by, creation date etc is useful in many ways such as setting up changed data capture, setting up DML operations such as insert, update and upsert.     


Column Name
Column type
Required in Area
Description
CURRENT_FLG
CHAR(1)
DW
For Slowly changing dimension purpose
DELETE_FLG
CHAR(1)
Staging / DW
For Soft/Hard delete purposes
W_INSERT_DT
DATE
DW
optional insertion date for ETL audit purpose
W_UPDATE_DT
DATE
DW
optional update date for ETL audit purpose
SRC_EFF_FROM_DATE
DATE
Staging / DW
For Type 2,3 or 6 slowly changing dimension purpose
SRC_EFF_TO_DATE
DATE
Staging / DW
For Type 2,3 or 6 slowly changing dimension purpose
EFFECTIVE_FROM_DATE
DATE
Staging / DW
For Type 2,3 or 6 slowly changing dimension purpose
EFFECTIVE_TO_DATE
DATE
Staging / DW
For Type 2,3 or 6 slowly changing dimension purpose



Miscellaneous Technical columns for new tables

The currency codes for related system columns are as follows:( Best Practice DB 9)

Why Currency Codes? To support reporting in multiple currencies is one of the prime features of BI Apps implementations.       


System Column
Description
DOC_CURR_CODE
Code for the currency in which the document was created in the source system.
LOC_CURR_CODE
Usually the reporting currency code for the financial company in which the document was created.
GRP_CURR_CODE
The primary group reporting currency code for the group of companies or organizations in which the document was created.
LOC_EXCHANGE_RATE
Currency conversion rate from the document currency code to the
local currency code.
GLOBAL1_EXCHANGE_RATE
Currency conversion rate from the document currency code to the primary group currency code.
GLOBAL2_EXCHANGE_RATE
Currency conversion rate from the document currency code to the GLOBAL1 currency code.
GLOBAL3_EXCHANGE_RATE
Currency conversion rate from document currency code to the GLOBAL2 currency code.

























Indexes

Naming of Indexes

The naming conventions used for Indexes in the Oracle BI Data Warehouse are as follows. The naming conventions for indexes defined in BI Data Warehouse should be followed if custom indexes are created in the warehouse. (Best Practice DB 10)

Why Naming Conventions for Indexes ? It would help to identify custom indexes and its significance and usability in overall solution.

The index name of the Oracle Data Warehouse tables consists of two parts

Prefix
Name
Suffix

The three components are capitalized and separated by an underscore (_), e.g. W_GL_BALANCE_F_U1, each component has a meaning as described in the table below.

Component
Value
Description
Prefix
W_
Standard Data Warehouse Table
Prefix
WC_
Custom Table
Name
XXX
Table Name (e.g. W_GL_BALANCE_F)
Suffix
_U1
Unique Index ( e.g. composite index on INTEGRATION_ID and DATASOURCE_NUM_ID
Suffix
_F1
Non Unique Dimensional Reference Key Index ( e.g.  bitmap index such as BALANCE_DT_WID )
Suffix
_M1
Non Unique Index ( e.g. bitmap or normal index or composite normal index )
















Best Practices for Indexes
The best practices for the indexes are as below.

·         The custom indexes in Source System ( e.g. Oracle EBS) for incremental Loads performance can be categorized into following groups and action need to taken based on facts
o   Tables that do not have indexes on LAST_UPDATE_DATE in the latest EBS releases, but there are no performance implications reported with indexes on LAST_UPDATE_DATE column.
o   Tables that have indexes on LAST_UPDATE_DATE columns, introduced in Oracle EBS Release 12.
o   Tables that cannot have indexes on LAST_UPDATE_DATE because of serious performance degradations in the source EBS environments.
The indexes on source system should have prefix OBIEE_ and DO NOT follow to the standard source system index naming conventions (e.g. OBIEE_OE_ORDER_HEADERS_ALL)(Best Practice DB 11)

Why? This is to make sure that these custom indexes are managed separately from the rest of the standard indexes.
·         Staging tables typically do not require indices, not a strict rule
Don’t hesitate to include indexes on staging table if necessary(Best Practice DB12).  

Why? Let common sense prevails, I hate strict rules; if you prove the index would help you in data load performance don’t worry go ahead and just do it.

·         Create indices on all columns that the ETL uses for dimensions and facts. For example:
o   ROW_WIDs of dimensions and facts
o   INTEGRATION_ID
o   DATASOURCE_NUM_ID
o   Flags
o   Dimensional Reference Keys  of fact tables
Create indices on all system columns such as ROW_WID, INTEGRATION_ID etc used within ETL processes.This is absolutely MUST for ETL performance. (Best Practice DB13).    

Why? ETL Performance, without these indices I can tell you that your ETL won’t run efficiently.

·         Create Bitmap Indexes for Dimensional reference keys and flags, treat this a MUST requirement. (Best Practice DB14).   
Why? With Bitmap Indexes for Dimensional reference keys and flags allows us to use STAR TRANSFORMATION.I believe it is absolutely essential to achieve best query performances.

Make sure that the database parameter for STAR TRANSFORMATION & Repository Database Parameters setting has been in place.  
·         Carefully consider on which columns to put filter conditions (Best Practice DB15).
Why? Setting up index on column which we used to put condition in report would improve performance for obvious reason.
·         Define indices to improve query performance(Best Practice DB16)
o   Inspect standard repository objects for guidance
Why? The index creation processes should be driven by business requirements rather than physical structures and conventional wisdom. I believe the reporting system is a mainly a business support system and should be driven by business requirements.

·         Register new indices in the DAC, treat this as a MUST requirement ( Best Practice DB17)
Options within DAC are
o   Local or Partitioned Index
o   Query Index or ETL Index
o   The DAC server drops and re-creates indices during a full load and /or incremental load

Why?  As I said earlier DAC should be your central repository for all database objects which includes indexes. This is helpful during migration of code from development instance to test and finally to production. DAC would be useful to maintain drop and re-recreate indices during full load and/or incremental load.

·         Review/Revisit Index Strategy on regular basis ( Best Practice DB18)

Why?  Review and Revisit the Indices should be done on regular basis, particularly with every new release which might end up with different usage pattern and query performances.

·         The best practices for handling BI Applications indexes in Exadata Warehouse (Best Practice DB19):

Why? Exadata, Exalytics are specially designed engineering system and requires different approach. Based on my experience in this area the success mantra is don’t rush to any conclusion, make sure that proper comprehensive benchmarking is done before dropping or disabling any indices.
o   Turn on Index usage monitoring to identify any unused indexes and drop / disable them in your environment.
o   Consider pinning the critical target tables in smart flash cache
o   Consider building custom aggregates to pre-aggregate more data and simplify queries performance.
o   Drop selected query indexes and disable them in DAC to use Exadata Storage Indexes / Full Table Scans only after running comprehensive benchmarks and ensuring no impact on any other queries performance.
I hope you find the second part of this series informative and useful. My plan is to write next couple of blogs on guideline on Informatica, DAC and Repository customisation.

2 comments:

  1. Excellent post! Very helpful. Looking forward to the next few posts on Informatica, DAC and Repository customisation.

    ReplyDelete
  2. With the advent of cloud computing, businesses of all sizes are looking to improve their data warehouse architectures.Azure Synapse is a tool that helps to integrate on-premises data sources with a cloud-based data warehouse. It transparently ingests data from on-premises data sources, and then transfers the data to Azure, where it is stored in Azure data warehouse. Azure Synapse also pushes data between Azure and on-premises data warehouses.

    ReplyDelete