Statistical Functions included in Oracle 11g Database is a compelling array of statistical functions accessible from through SQL. These include descriptive statistics, hypothesis testing, correlations analysis, test for distribution fit, cross tabs with Chi-square statistics, and analysis of variance (ANOVA).
The purpose of this blog is to discuss how oracle statistical functions can add a value to BI Offerings to the customer and end users. The idea is to demonstrate how oracle statistical functions can be used to deliver something extra via OBIEE front end.
What is Statistics?
Statistics is a mathematical science pertaining to the collection, analysis, interpretation or explanation, and presentation of data. It is applicable to a wide variety of academic disciplines, from the physical and social sciences to the humanities. Statistics are also used for making informed decisions – and misused for other reasons – in all areas of business and government.
Statistical methods can be used to summarize or describe a collection of data; this is called descriptive statistics.
In addition, patterns in the data may be modeled in a way that accounts for randomness and uncertainty in the observations, and then used to draw inferences about the process or population being studied; this is called inferential statistics.
Both descriptive and inferential statistics comprise applied statistics.
In-database Statistical Function
Unlike statistical software vendors, which charge separate license fees for these statistical capabilities, Oracle 11g Database includes this wide feature set with the database license. The statistical function within database can also eliminate the movement and staging of data to external systems to perform statistical analysis.
Eliminating data movement and staging increases security since copies of the data need not reside in insecure environment.
Eliminating data movement and staging decreases time to results since the overhead associated with moving the data is eliminated.
Because the statistical results are immediately available through SQL and integrated in Oracle Database, statistical results can be immediately used across many applications or reporting solutions.
Statistical Functions Usage
The statistical functions in the database can be used in a variety of ways, for example, users can call Oracle's DBMS_STAT_FUNCS to obtain basic count, mean, max, min and standard deviation information for their dataset; or users can determine the strength of relationships using hypothesis testing statistics such as a t-test, f-test or ANOVA. Users are able to not only complete a wide range of statistics, but also include these results in more advanced SQL queries and analytical pipelines.
Oracle Statistics & SQL Analytics
Oracle supports following in-database statistics.
Average, standard deviation, variance, min, max, median (via percentile_count), mode, group-by & roll-up
DBMS_STAT_FUNCS: summarizes numerical columns of a table and returns count, min, max, range, mean, stats_mode, variance, standard deviation, median, quantile values, +/- n sigma values, top/bottom 5 values
Correlations Analysis (parametric and nonparametric)
Pearson’s correlation coefficients, Spearman's and Kendall's (both nonparametric).
Enhanced with % statistics: chi squared, phi coefficient, Cramer's V, contingency coefficient, Cohen's kappa
Student t-test , F-test, Binomial test, Wilcoxon Signed Ranks test, Chi-square, Mann Whitney test, Kolmogorov-Smirnov test, One-way ANOVA
Kolmogorov-Smirnov Test, Anderson-Darling Test, Chi-Squared Test, Normal, Uniform, Weibull, Exponential
Pareto Analysis (documented)
80:20 rule, cumulative results table
Rank, dense_rank, cume_dist, percent_rank, ntile
Window Aggregate functions (moving and cumulative)
Avg, sum, min, max, count, variance, stddev, first_value, last_value
Direct inter-row reference using offsets
Reporting Aggregate Functions
Sum, avg, min, max, variance, stddev, count, ratio_to_report
Correlation, linear regression family, covariance
Fitting of an ordinary-least-squares regression line to a set of number pairs.
Frequently combined with the COVAR_POP, COVAR_SAMP, and CORR functions.
OBIEE Integration Opportunities
Computing statistics is just beginning. Integration of result with the business data is vital. As the statistical results are immediately available through SQL and integrated in Oracle Database, statistical results can be immediately used across the Oracle stack - unleashing many more opportunities to leverage your results in spontaneous and unexpected ways.
There are number of options by which we can integrate the statistical function into the business. I would like to focus on its integration with Oracle Business Intelligence. Oracle Business Intelligence can leverage results, combining the statistical results with other data.The following are four possibilities by which Oracle Statistical Functions can be integrated with OBIEE.
The OBIEE Direct Database Request option allows executing SQL statement which uses the oracle statistical function directly against a data source. The Oracle BI Server sends unprocessed, user-entered, physical SQL directly to chosen database. The returned results can be rendered in Oracle Presentation Services, which can be treated as an Oracle BI Analyze.
The EVALUATE function within the report definition allows calling the database analytic functions by passing the necessary parameters. The statistical function in Oracle database can be invoked from OBIEE using the EVALUATE function. The function would be executed as part of the report generation SQL against the chosen database.
Integrate in Common Enterprise Information Model
This is the standard metadata design method by which all required database objects and statistical functions and output columns are exposed to physical layer, then modeled within logical layer of OBIEE and then exposed to presentation layer.
Direct Interactions with Opaque Views by Using Session Variables
This is an option where one can define Opaque View (view within the OBIEE physical layer) which uses the statistical function within the SQL definition of opaque view. The session variables are used as substitute variables within the SQL definition. The value of session variables can be set by user selected prompt to drive the opaque view output. This allows direct interactions with the statistical function and open opportunities to carry out what if analysis.
Statistical Function – OBIEE Integration ExampleThis section attempts to demonstrate using analytical/statistical function within the report definition by using EVALUATE function.
This function passes the specified database function that is not supported by Oracle BI Server with optional referenced columns as parameters to the back-end data source for evaluation.
The embedded database function may require one or more columns. These columns are referenced by %1 ... %N within the function. The actual columns must be listed after the function.
EVALUATE('db_function(%1...%N)' [AS data_type] [, column1, columnN])
SELECT EVALUATE ('instr(%1, %2)', address, 'Foster City') FROM employees
Note – The ability to use EVALUATE is disabled by default. To enable support for this function, change the EVALUATE_SUPPORT_LEVEL parameter in NQSConfig.INI.
This example uses the database function LAG to compute % Difference from previous row. The database analytic function PERCENT_RANK computes the percent rank value and CUME_DIST computes the cumulative distribution.
An analysis is designed with following columns
The column definitions are as follows.
Organization: - "Offices"."D3 Organization"
Revenue: - "Base Facts"."1- Revenue"
Product: - "Products"."P1 Product"
% Diff - Previous row :- ("Base Facts"."1- Revenue"- CAST(EVALUATE ('LAG (%1,1,0) OVER
(PARTITION BY %2 ORDER BY %1) ' ,"Base Facts"."1- Revenue" , "Offices"."D3 Organization") AS DOUBLE))/ CAST(EVALUATE ('LAG (%1,1,0) OVER (PARTITION BY %2 ORDER BY %1) ' ,"Base Facts"."1- Revenue" , "Offices"."D3 Organization") AS DOUBLE) * 100
Cumulative Dist :- EVALUATE('ROUND(CUME_DIST () OVER (PARTITION BY %1 ORDER BY %2),3)', "Offices"."D3 Organization", "Base Facts"."1- Revenue")
Percent Rank: - CAST(EVALUATE (' PERCENT_RANK() OVER (PARTITION BY %1 ORDER BY %2)', "Offices"."D3 Organization","Base Facts"."1- Revenue") AS DOUBLE) * 100
The analysis output for the selected columns and calculated columns with EVALUATE function for invoking the statistical function LAG (), CUME_DIST () and PERCENT_RANK () is shown below.
The statistical results can be immediately used within Oracle Business Intelligence which unleashes number of opportunities to combine these outputs in business data. This can add a real value in your BI Offerings.
The Oracle Advanced Analytics Option, a combination of Oracle Data Mining and Oracle R Enterprise, delivers predictive analytics, data mining, text mining, statistical analysis, advanced numerical computations and interactive graphics inside the database. This brings powerful computations to the database resulting in dramatic improvements in information discovery, scalability, security, and savings.