Saturday, 22 February 2014

OBIA 7964 - Need To Tune Lengthy ETL - Analyze Table ETL step is performing very slowly

Performance of daily incremental load is vital with increasing demand for more and more data from structural and non-structural data sources. There is always a desire to limit the daily incremental load window to accommodate more data acquisition routines to enhance analytical capabilities of business users. A reporting solution spanning across different time zones make it more difficult.

There is no substitute for a good code. This blog is an attempt to explain the overall incremental/full load process, different phases of incremental/full load, performance bottlenecks across these phases in general, and how one can improve performance when Analyze Table ETL step is performing very slowly.  This blog is based on an oracle tech note published around analyze table configuration.

OBIA Load Phases

OBIA Load Process has different phases such as dimension load, fact load, post load process etc. I would like to categories them into following areas based on loading phase and activities performed by tasks.

  • Change Capture Phase
  • SDE – Source Dependent Extract
  • SIL – Source Independent Loading
  • PLP – Post Load Process
  • Analyze Tables
  • Drop and Create Indexes 
To investigate issues around performance it is very important to look into current extraction and loading timings around these categories. The performance issue might be anywhere in the ETL spectrum drawn.

Analyze Table ETL step runs after SIL or PLP data load, the purpose of this step is to gather statistics of reporting table which offers benefits in performance while executing reports against the reporting tables. This blog is an attempt to discuss the necessity of analyze tables, its cost VS benefits analysis and also discuss number of options to improve overall performance of the analyze table ETL step.

Why Analyze Tables? Costs Vs Benefits 

Whether or not execute "analyze table" is a decision taken based on costs and benefits of performance. The recommendation is to analyze because, the database is gathering statistics about the data. It uses the statistics to access the data more efficiently, both during ETL and during BI reporting i.e. when your end-users execute BI reports or dashboards.

If you do less or no analysis of db statistics, then ETL and reporting performance will be degraded. How much is projecting dependent, but the recommendation is to do it and that is therefore the default setting.

Exclude Table from Analyze Table?

Is it essential to do analyze all reporting and aggregate tables? Is there any option by which one can exclude table from analyze table step. One can turn off Analyze for all tables or turn it on for truncated tables only, in DAC UI.

How frequently should it happen?

In short it is beneficial to analyze table for ETL and reporting performance then next question is how frequently analyze table should be executed. Is there any way to perform the analyze step less frequently, e.g. not every day. The wisdom says if there is no significant change in data volumes of table, analyze table won’t help at all.

DAC support configuration to perform the analyze step less frequently, the steps to achieve these are as follows 

Setup View, in Physical Data Sources Tab: Analyze Frequencies Subtab Analyze frequencies in days by table type.

You can see the possible table types by looking in the Design View at the Tables Tab; each table has a type like Aggregate, Dimension...etc

You will need to rebuild your execution plan when you make changes, because this rebuild is where DAC decides which Tasks and sub-Tasks to execute

Scope of Analyze Table?

The default syntax DAC uses for analyzing tables is as follows. The custom.xml file, location in the <DAC_Config_Location>\CustomSQLs directory, contains the default syntax DAC

<SqlQuery name = "ORACLE_ANALYZE_TABLE" STORED_PROCEDURE = "TRUE"> DBMS_STATS.GATHER_TABLE_STATS(ownname => '@TABLEOWNER', tabname => '%1', estimate_percent => 30, method_opt => 'FOR ALL COLUMNS SIZE AUTO',cascade => true ) </SqlQuery>

The estimate percent, method opt and cascade are the parameters which decide the scope of analyze table.  It is important to choose right values for the mentioned parameters as it determine the execution time for analyze table step. For estimation percent 100% it is bound to take much longer than if you set it to 15% or 20%. While method opt may have option to choose all columns or indexed columns only.

Global - Changing Behavior of Analyze Table ETL Step

For changing behavior of analyze table ETL step, one has to change the customsql.xml file. Note that modifying the cusomsql.xml affect the syntax on a global level means it is applicable to all tables chosen for analyze and for all execution plan.

To edit the Analyze Table syntax, the steps are as follows  

Open the customsql.xml file located in the <DAC_Config_Location>\CustomSQLs directory.
Locate the Analyze Table syntax for the appropriate database type.

For example, to gather statistics for only the indexes column, one can edit the syntax as follows

<SqlQuery name = "ORACLE_ANALYZE_TABLE" STORED_PROCEDURE = "TRUE"> DBMS_STATS.GATHER_TABLE_STATS(ownname => '@TABLEOWNER', tabname => '%1', estimate_percent => 30, method_opt => 'FOR ALL INDEXED COLUMNS',cascade => true ) </SqlQuery>

The Action functionality can be used to change behavior of the analyze table ETL step for a selected table.

Analyze but Less Frequently

Is there any way by which one can set the analyze table but less frequently rather than executing it every day. In DAC and above, in Setup View, in Physical Data Sources Tab: Analyze Frequencies Subtab Analyze frequencies in days by table type. It is done for the specific table type e.g. Aggregate Dimension etc.

Once done, the execution plan is to rebuild to take this change in to effect. 

Careful about Configurations

It is very essential to make sure that you set configuration right, the wrong configuration parameters might add in your performance vows.

Check CreateQueryIndexesAtTheEnd parameter if when this property is set to True, these tables will be analyzed twice.

If any indices are marked as Query type indices and are used by ETL processes it can adversely affect the performance of the ETL process.

The hope you find this information useful. 

No comments:

Post a Comment