Performance of the daily incremental load is vital as the demand for more and more data from structured and unstructured data sources keeps increasing. There is always a desire to limit the daily incremental load window to accommodate more data acquisition routines and enhance the analytical capabilities of business users. A reporting solution spanning different time zones makes this even more difficult.
There is no substitute for good code. This blog is an attempt to explain the overall incremental/full load process, the different phases of the incremental/full load, the performance bottlenecks across these phases in general, and how one can improve performance when the Analyze Table ETL step is performing very slowly. The blog is based on an Oracle tech note published around analyze table configuration.
OBIA Load Phases
The OBIA load process has different phases such as dimension load, fact load, post load process etc. I would like to categorize them into the following areas based on the loading phase and the activities performed by the tasks.
- Change Capture Phase
- SDE – Source Dependent Extract
- SIL – Source Independent Loading
- PLP – Post Load Process
- Analyze Tables
- Drop and Create Indexes
To investigate performance issues it is very important to look at the current extraction and loading timings across these categories. The performance issue might be anywhere in the ETL spectrum outlined above.
The Analyze Table ETL step runs after the SIL or PLP data load. Its purpose is to gather statistics on the reporting tables, which improves performance when reports are executed against them. This blog discusses the necessity of analyzing tables, its cost vs. benefits, and a number of options to improve the overall performance of the Analyze Table ETL step.
Why Analyze Tables? Costs vs. Benefits
Whether or not to execute "analyze table" is a decision based on the costs and benefits to performance. The recommendation is to analyze, because the database gathers statistics about the data and uses those statistics to access the data more efficiently, both during ETL and during BI reporting, i.e. when your end users execute BI reports or dashboards.
If you do less or no gathering of database statistics, ETL and reporting performance will degrade. How much is project dependent, but the recommendation is to do it, and that is therefore the default setting.
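If you want to see how fresh the statistics on your warehouse tables currently are, a minimal sketch is to query the data dictionary; this assumes you are connected as the warehouse table owner and that the tables follow the usual W_% naming convention.
-- Show when each warehouse table was last analyzed and whether the
-- optimizer already regards its statistics as stale.
SELECT table_name, last_analyzed, stale_stats
FROM   user_tab_statistics
WHERE  table_name LIKE 'W\_%' ESCAPE '\'
ORDER  BY last_analyzed NULLS FIRST;
Tables with a NULL last_analyzed or stale_stats = 'YES' are the ones most likely to hurt ETL and report performance.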
Exclude Table from Analyze Table?
Is it essential to analyze all reporting and aggregate tables? Is there any option by which one can exclude a table from the Analyze Table step? In the DAC UI, one can turn off Analyze for all tables or turn it on for truncated tables only.
How Frequently Should It Happen?
In short, analyzing tables is beneficial for ETL and reporting performance. The next question is how frequently the analyze step should be executed. Is there any way to perform it less frequently, e.g. not every day? The conventional wisdom is that if there is no significant change in a table's data volume, re-analyzing it won't help at all.
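One rough way to check whether a table's data volume has actually changed since statistics were last gathered (assuming table monitoring is enabled, which it is by default from Oracle 10g onwards; you may need to call DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO first to see very recent DML) is:
-- Approximate DML recorded against warehouse tables since their
-- statistics were last gathered.
SELECT table_name, inserts, updates, deletes, timestamp
FROM   user_tab_modifications
WHERE  table_name LIKE 'W\_%' ESCAPE '\';
If these counts are a small fraction of a table's row count, re-analyzing that table every day is unlikely to change its execution plans.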
DAC supports a configuration to perform the analyze step less frequently. The steps to achieve this are as follows:
- In the Setup View, on the Physical Data Sources tab, use the Analyze Frequencies subtab to set analyze frequencies in days by table type.
- You can see the possible table types by looking at the Tables tab in the Design View; each table has a type such as Aggregate, Dimension, etc.
- You will need to rebuild your execution plan when you make changes, because this rebuild is where DAC decides which tasks and sub-tasks to execute.
Scope of Analyze Table?
The default syntax DAC uses for analyzing tables is as follows. The customsql.xml file, located in the <DAC_Config_Location>\CustomSQLs directory, contains the default syntax DAC uses.
<SqlQuery name = "ORACLE_ANALYZE_TABLE" STORED_PROCEDURE = "TRUE">
DBMS_STATS.GATHER_TABLE_STATS(ownname => '@TABLEOWNER', tabname => '%1', estimate_percent => 30, method_opt => 'FOR ALL COLUMNS SIZE AUTO', cascade => true)
</SqlQuery>
The estimate_percent, method_opt and cascade parameters decide the scope of the analyze table step. It is important to choose the right values for these parameters, as they determine the execution time of the step. An estimate_percent of 100% is bound to take much longer than 15% or 20%, while method_opt offers the option to gather statistics for all columns or for indexed columns only.
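As an illustration only, and not the syntax shipped with DAC, a lighter-weight variant of the same call could let Oracle choose the sample size and restrict histogram collection to indexed columns; W_SAMPLE_F below is a placeholder table name.
BEGIN
  -- Hypothetical example: automatic sample size plus indexed-column
  -- histograms only, which is usually cheaper than a fixed 30% sample
  -- over all columns.
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => USER,
    tabname          => 'W_SAMPLE_F',
    estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,
    method_opt       => 'FOR ALL INDEXED COLUMNS SIZE AUTO',
    cascade          => TRUE);
END;
/
Whether this is faster or slower than the default depends on the Oracle version and data volumes, so it is worth timing both on a representative table before changing customsql.xml.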
Global - Changing Behavior of Analyze Table ETL Step
To change the behavior of the Analyze Table ETL step globally, one has to change the customsql.xml file. Note that modifying customsql.xml affects the syntax at a global level, meaning it applies to all tables chosen for analyze and to all execution plans.
To edit the Analyze Table syntax, the steps are as follows:
- Open the customsql.xml file located in the <DAC_Config_Location>\CustomSQLs directory.
- Locate the Analyze Table syntax for the appropriate database type.
For example, to gather statistics for only the indexed columns, one can edit the syntax as follows:
<SqlQuery name = "ORACLE_ANALYZE_TABLE" STORED_PROCEDURE = "TRUE">
DBMS_STATS.GATHER_TABLE_STATS(ownname => '@TABLEOWNER', tabname => '%1', estimate_percent => 30, method_opt => 'FOR ALL INDEXED COLUMNS', cascade => true)
</SqlQuery>
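To sanity-check the effect of such a change after the next load, one could compare the columns that ended up with statistics against the columns that are actually indexed; this is only a rough illustration, and W_SAMPLE_F is again a placeholder table name.
-- Columns that received statistics on the table.
SELECT column_name, num_distinct, last_analyzed
FROM   user_tab_col_statistics
WHERE  table_name = 'W_SAMPLE_F';
-- Columns that are actually indexed on the same table.
SELECT DISTINCT column_name
FROM   user_ind_columns
WHERE  table_name = 'W_SAMPLE_F';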
The Action functionality in DAC can be used to change the behavior of the Analyze Table ETL step for a selected table rather than globally.
Analyze but Less Frequently
Is there any way to run the analyze step less frequently rather than every day? In DAC 10.1.3.4.1 and above, go to the Setup View, Physical Data Sources tab, Analyze Frequencies subtab, and set the analyze frequencies in days by table type, e.g. Aggregate, Dimension, etc.
Once done, the execution plan has to be rebuilt for this change to take effect.
Careful about Configurations
It is essential to make sure the configuration is set right; the wrong configuration parameters might add to your performance woes.
Check the CreateQueryIndexesAtTheEnd parameter; when this property is set to True, the affected tables will be analyzed twice.
If any indexes are marked as Query type indexes but are actually used by ETL processes, this can adversely affect the performance of the ETL process.
I hope you find this information useful.