Pages

Friday, 29 June 2012

Oracle Business Intelligence Enterprise Edition (OBIEE) 11g Integration Possibilities with Oracle Data Mining (ODM) - Part A - Direct Database Request (Direct SQL)

Oracle Data Mining (ODM), is a component of the Oracle Advanced Analytics Option for Oracle Database 11g Enterprise Edition. ODM provides a collection of in-database data mining algorithms that solve a wide range of business problems. It provides the necessary mining model building, testing and scoring capabilities. Because of data, models, and results remain in Oracle Database, data movement is eliminated, security is maximized, and information latency is minimized.
The purpose of this blog is to introduce oracle data miner, list down the different data mining algorithm supported by oracle data miner and possibilities of combining OBIEE and ODM. The focus of the first part is to explain the integration of ODM by using Direct Database Request option of OBIEE.
Oracle Data Mining (ODM) Architecture 
Oracle Data Mining (ODM) has two components a server and one or more clients.
·         Oracle Database 11g (11.2) Enterprise Edition Server
·         Oracle Data Miner, the client, is an integrated feature of Oracle SQL Developer

In addition to the database, Data Miner requires the installation of a Data Miner Repository account. The repository is a separate account in the database named ODMRSYS. This repository is shared by all user accounts in the database that have been granted the appropriate privileges to use the Data Miner Repository.
Full Set of Mining Algorithms
Oracle Data Mining provides support for a wide range of data mining functionality.


OBIEE Integration Opportunities with ODM
Executing the algorithm/data mining model is just beginning. Integration of data mining model execution result with the business data is vital.

As the data mining model execution results are immediately available through SQL and integrated in Oracle Database, data mining model execution results can be immediately used across the Oracle stack - unleashing many more opportunities to leverage your results in spontaneous and unexpected ways.

There are number of options by which we can integrate the data mining model execution results into the business. I would like to focus on its integration with Oracle Business Intelligence in general and Direct SQL in particular within this blog entry. Oracle Business Intelligence can leverage data results, combining the data mining model execution results with other data.
The following are four possibilities by which Oracle Statistical Functions can be integrated with OBIEE.

Direct Database Request ( Direct SQL)  
The OBIEE Direct Database Request option allows executing SQL statement which uses the oracle data mining algorithm directly against a data source. The Oracle BI Server sends unprocessed, user-entered, physical SQL directly to chosen database. The returned results can be rendered in Oracle Presentation Services, which can be treated as an Oracle BI Analyze.

Use Evaluate Function
The EVALUATE function within the report definition allows calling the database analytic/data mining functions, result tables by passing the necessary parameters. The data mining PL/SQL’s and functions in Oracle database can be invoked from OBIEE using the EVALUATE function. The function would be executed as part of the report generation SQL against the chosen database.

Integrate in Common Enterprise Information Model
This is the standard metadata design method by which all required database objects and data mining result tables and output columns are exposed to physical layer, then modeled within logical layer of OBIEE and then exposed to presentation layer.

Direct Interactions with Opaque Views by Using Session Variables
This is an option where one can define Opaque View (view within the OBIEE physical layer) which uses the data mining function and result tables within the SQL definition of opaque view. The session variables are used as substitute variables within the SQL definition. The value of session variables can be set by user selected prompt to drive the opaque view output. This allows direct interactions with the statistical function and open opportunities to carry out what if analysis.

ODM- OBIEE Integration Example – Database Direct Request ( Direct SQL)
This section attempts to demonstrate oracle data mining integration with OBIEE which uses the direct database request feature of OBIEE 10g/11g.

SQL with Prediction Verbs

Oracle Data mining supports verbs via SQL such as PREDICT, DETECT, CLUSTER, CLASSIFY, REGRESS, PROFILE, IDENTIFY FACTORS and ASSOCIATE.

SQL which uses the prediction verb is as follows


The SQL output comprises of the customer id, country name, customer marital status and prediction based on sample regression model.

Direct Database Request

Choose New > Analysis > Choose Create Direct Database Request.


Enter Connection Pool – Sample Relational Connection, this is connection pool defined within repository. Enter the SQL statement with prediction clause as shown below.


Analysis Output

The analysis output for the entered SQL statement with column outputs as customer id, country name, customer marital status and prediction is shown below.


Conclusion
The data mining results can be immediately used by database direct request option of OBIEE which can combine these outputs in business data with ease.

The returned results can be rendered in Oracle Presentation Services, which can be treated as an Oracle BI Analyze. We can leverage on OBIEE visuals such as graph, geo, OSSM, Google etc to render the data mining output.

The Oracle Advanced Analytics Option, a combination of Oracle Data Mining and Oracle R Enterprise, delivers predictive analytics, data mining, text mining, statistical analysis, advanced numerical computations and interactive graphics inside the database. This brings powerful computations to the database resulting in dramatic improvements in information discovery, scalability, security, and savings.

Setting up Oracle Data Miner 11g Release 2 by Using SQL Developer 3.1

Oracle Data Mining (ODM), is a component of the Oracle Advanced Analytics Option for Oracle Database 11g Enterprise Edition. ODM provides a collection of in-database data mining algorithms that solve a wide range of business problems. It provides the necessary mining model building, testing and scoring capabilities. Because of data, models, and results remain in Oracle Database, data movement is eliminated, security is maximized, and information latency is minimized.
The purpose of this blog is to explain various steps involved in setting up Oracle Data Miner 11g Release 2 by using SQL Developer 3.1.  
Pre-requisites for Setting up Oracle Data Miner
The following configuration elements and information is required for setting up Oracle Data Miner.
Oracle Database 11g Enterprise Edition, Release 2 (11.2.0.1) with following services which are critical in support of Oracle Data Miner
  • Oracle Data Mining- ODM provides the necessary mining model building, testing and scoring capabilities
  • Oracle XML DB- Provides services to manage the Data Miner Repository metadata, such as the details of the workflow specifications
  • Oracle Scheduler- Provides the engine of mining the Data Miner workflows
  • Oracle Text - Provides services necessary to support text mining
Oracle SQL Developer Version 3.1 (or Later)
Database Information for SYS user to perform the configuration work for the setup – such as password, database port number, SID value or Service Name
Steps to setup the Oracle Data Miner
For setting up the Oracle Data Miner, user needs to perform following steps
  • SQL Developer 3.1 ( or Later) Install
  • Set up Data Miner by using SQL Developer
Installing SQL Developer
Oracle SQL Developer can be downloaded from Oracle Technology Network (OTN), it is freeware. To download Oracle SQL Developer 3.1 (or later), click on following link, after accepting agreement, choose SQL Developer download for appropriate operating system.



After downloading file, unzip the archive in the empty folder. There is no formal installation routine and there is no in-pace upgrade available for SQL Developer.
Oracle SQL Developer can be downloaded from Oracle Technology Network (OTN), it is freeware. To download Oracle SQL Developer 3.1 (or later), click on following link, after accepting agreement, choose SQL Developer download for chosen operating system.
Setting up Data Miner
Create SQL Developer Connection for SYS

This connection will be used to create the Data Miner user and install the Oracle Data Miner repository.

Create a database user account for data mining

Create a database account for data mining user, the user information as follows

Name of User – dmuser
Password – XXX
Default Tablespace –USERS
Temporary Tablespace –TEMP


Grant CONNECT Role to dmuser and then set Tablespace Quota for USERS Tablespace to Unlimited as shown, then click on APPLY to create dmuser with all selected parameters.


Creation of user confirmed.


Create a SQL Developer connection for the data mining user

Create connection for Data Miner user (dmuser)

 
Enable the Data Miner GUI and user

Display data miner tab similar to the normal connection tab by selecting Tool > Data Miner > Make Visible

On data miner tab, click on Add Connection and then select the dmuser (data mining user) from the list of connections available.

The result of execution of these steps a data miner user should appear under the Connection node in the Data Miner Tab.

 

Install the Oracle Data Miner Repository

The following automated process is used to create the data miner repository

Double-click the data minder user on the Data Miner tab

A message tells you that the Data Miner Repository is not installed in the database and asks whether you want to install the repository, click ‘Yes’ to launch the installation.

Enter the password for SYS user and click ‘OK’ to continue

Select the Default and Temporary table spaces for the ODMSYS user, this user is created as part of the repository installation process. Click ‘OK’ to continue.


Start Install Data Mining Repository 


Once repository installation completion confirms that the Oracle Data Miner setup is finished. The data miner interface is ready for use.


Tuesday, 26 June 2012

Oracle Business Intelligence 11.1.1.6.0 and Oracle Statistical Functions – A Perfect Recipe to Add Value to BI Offering

Statistical Functions included in Oracle 11g Database is a compelling array of statistical functions accessible from through SQL. These include descriptive statistics, hypothesis testing, correlations analysis, test for distribution fit, cross tabs with Chi-square statistics, and analysis of variance (ANOVA).

The purpose of this blog is to discuss how oracle statistical functions can add a value to BI Offerings to the customer and end users. The idea is to demonstrate how oracle statistical functions can be used to deliver something extra via OBIEE front end.

What is Statistics?
Statistics is a mathematical science pertaining to the collection, analysis, interpretation or explanation, and presentation of data. It is applicable to a wide variety of academic disciplines, from the physical and social sciences to the humanities. Statistics are also used for making informed decisions – and misused for other reasons – in all areas of business and government.
Statistical methods can be used to summarize or describe a collection of data; this is called descriptive statistics.
In addition, patterns in the data may be modeled in a way that accounts for randomness and uncertainty in the observations, and then used to draw inferences about the process or population being studied; this is called inferential statistics.
Both descriptive and inferential statistics comprise applied statistics.
In-database Statistical Function
Unlike statistical software vendors, which charge separate license fees for these statistical capabilities, Oracle 11g Database includes this wide feature set with the database license. The statistical function within database can also eliminate the movement and staging of data to external systems to perform statistical analysis.

Eliminating data movement and staging increases security since copies of the data need not reside in insecure environment.

Eliminating data movement and staging decreases time to results since the overhead associated with moving the data is eliminated.

Because the statistical results are immediately available through SQL and integrated in Oracle Database, statistical results can be immediately used across many applications or reporting solutions.

Statistical Functions Usage
The statistical functions in the database can be used in a variety of ways, for example, users can call Oracle's DBMS_STAT_FUNCS to obtain basic count, mean, max, min and standard deviation information for their dataset; or users can determine the strength of relationships using hypothesis testing statistics such as a t-test, f-test or ANOVA. Users are able to not only complete a wide range of statistics, but also include these results in more advanced SQL queries and analytical pipelines.

 Oracle Statistics & SQL Analytics
Oracle supports following in-database statistics.
Descriptive Statistics
Average, standard deviation, variance, min, max, median (via percentile_count), mode, group-by & roll-up

DBMS_STAT_FUNCS:  summarizes numerical columns of a table and returns count, min, max, range, mean, stats_mode, variance, standard deviation, median, quantile values, +/- n sigma values, top/bottom 5 values

Correlations Analysis (parametric and nonparametric)
Pearson’s correlation coefficients, Spearman's and Kendall's (both nonparametric).

Cross Tabulation
Enhanced with % statistics: chi squared, phi coefficient, Cramer's V, contingency coefficient, Cohen's kappa

Hypothesis Testing
Student t-test , F-test, Binomial test, Wilcoxon Signed Ranks test, Chi-square, Mann Whitney test, Kolmogorov-Smirnov test, One-way ANOVA

Distribution Fitting
Kolmogorov-Smirnov Test, Anderson-Darling Test, Chi-Squared Test, Normal, Uniform, Weibull, Exponential

Pareto Analysis (documented)
80:20 rule, cumulative results table

Ranking Functions
Rank, dense_rank, cume_dist, percent_rank, ntile

Window Aggregate functions (moving and cumulative)
Avg, sum, min, max, count, variance, stddev, first_value, last_value

LAG/LEAD Functions
Direct inter-row reference using offsets

Reporting Aggregate Functions
Sum, avg, min, max, variance, stddev, count, ratio_to_report

Statistical Aggregates
Correlation, linear regression family, covariance

Linear Regression
Fitting of an ordinary-least-squares regression line to a set of number pairs.
Frequently combined with the COVAR_POP, COVAR_SAMP, and CORR functions.

OBIEE Integration Opportunities
Computing statistics is just beginning. Integration of result with the business data is vital.  As the statistical results are immediately available through SQL and integrated in Oracle Database, statistical results can be immediately used across the Oracle stack - unleashing many more opportunities to leverage your results in spontaneous and unexpected ways.

There are number of options by which we can integrate the statistical function into the business. I would like to focus on its integration with Oracle Business Intelligence. Oracle Business Intelligence can leverage results, combining the statistical results with other data.
The following are four possibilities by which Oracle Statistical Functions can be integrated with OBIEE.

Direct SQL

The OBIEE Direct Database Request option allows executing SQL statement which uses the oracle statistical function directly against a data source. The Oracle BI Server sends unprocessed, user-entered, physical SQL directly to chosen database. The returned results can be rendered in Oracle Presentation Services, which can be treated as an Oracle BI Analyze.

Evaluate Function

The EVALUATE function within the report definition allows calling the database analytic functions by passing the necessary parameters. The statistical function in Oracle database can be invoked from OBIEE using the EVALUATE function. The function would be executed as part of the report generation SQL against the chosen database.

Integrate in Common Enterprise Information Model

This is the standard metadata design method by which all required database objects and statistical functions and output columns are exposed to physical layer, then modeled within logical layer of OBIEE and then exposed to presentation layer.

Direct Interactions with Opaque Views by Using Session Variables

This is an option where one can define Opaque View (view within the OBIEE physical layer) which uses the statistical function within the SQL definition of opaque view. The session variables are used as substitute variables within the SQL definition. The value of session variables can be set by user selected prompt to drive the opaque view output. This allows direct interactions with the statistical function and open opportunities to carry out what if analysis.
   
Statistical Function – OBIEE Integration Example
This section attempts to demonstrate using analytical/statistical function within the report definition by using EVALUATE function.

EVALUATE Function

This function passes the specified database function that is not supported by Oracle BI Server with optional referenced columns as parameters to the back-end data source for evaluation.
The embedded database function may require one or more columns. These columns are referenced by %1 ... %N within the function. The actual columns must be listed after the function.

Syntax 
EVALUATE('db_function(%1...%N)' [AS data_type] [, column1, columnN])

Example
SELECT EVALUATE ('instr(%1, %2)', address, 'Foster City') FROM employees

Note – The ability to use EVALUATE is disabled by default. To enable support for this function, change the EVALUATE_SUPPORT_LEVEL parameter in NQSConfig.INI.

Example OBIEE Analysis

This example uses the database function LAG to compute % Difference from previous row. The database analytic function PERCENT_RANK computes the percent rank value and CUME_DIST computes the cumulative distribution.

An analysis is designed with following columns



The column definitions are as follows.

Organization: -  "Offices"."D3  Organization"
Revenue: - "Base Facts"."1- Revenue"
Product: - "Products"."P1 Product"

% Diff - Previous row :- ("Base Facts"."1- Revenue"- CAST(EVALUATE ('LAG (%1,1,0) OVER
(PARTITION BY %2 ORDER BY %1) '  ,"Base Facts"."1- Revenue" , "Offices"."D3  Organization") AS DOUBLE))/ CAST(EVALUATE ('LAG (%1,1,0) OVER (PARTITION BY %2 ORDER BY %1) '  ,"Base Facts"."1- Revenue" , "Offices"."D3  Organization") AS DOUBLE) * 100

Cumulative Dist :- EVALUATE('ROUND(CUME_DIST () OVER (PARTITION BY %1 ORDER BY %2),3)', "Offices"."D3  Organization", "Base Facts"."1- Revenue")

Percent Rank: - CAST(EVALUATE (' PERCENT_RANK() OVER (PARTITION BY %1 ORDER BY %2)', "Offices"."D3  Organization","Base Facts"."1- Revenue") AS DOUBLE) * 100

Analysis Output

The analysis output for the selected columns and calculated columns with EVALUATE function for invoking the statistical function LAG (), CUME_DIST () and PERCENT_RANK () is shown below.



Conclusion
The statistical results can be immediately used within Oracle Business Intelligence which unleashes number of opportunities to combine these outputs in business data. This can add a real value in your BI Offerings.

The Oracle Advanced Analytics Option, a combination of Oracle Data Mining and Oracle R Enterprise, delivers predictive analytics, data mining, text mining, statistical analysis, advanced numerical computations and interactive graphics inside the database. This brings powerful computations to the database resulting in dramatic improvements in information discovery, scalability, security, and savings.

Monday, 25 June 2012

Oracle Data Mining (ODM) Architecture

Oracle Data Mining (ODM), is a component of the Oracle Advanced Analytics Option for Oracle Database 11g Enterprise Edition. ODM provides a collection of in-database data mining algorithms that solve a wide range of business problems. It provides the necessary mining model building, testing and scoring capabilities. Because of data, models, and results remain in Oracle Database, data movement is eliminated, security is maximized, and information latency is minimized.
The purpose of this blog is to discuss architecture of Oracle Data Mining in-database and its individual components.  
Oracle Data Mining Architecture
Oracle Data Mining (ODM) has two components a server and one or more clients.
·         Oracle Database 11g (11.2) Enterprise Edition Server
·         Oracle Data Miner, the client, is an integrated feature of Oracle SQL



In addition to the database, Data Miner requires the installation of a Data Miner Repository account. The repository is a separate account in the database named ODMRSYS. This repository is shared by all user accounts in the database that have been granted the appropriate privileges to use the Data Miner Repository.
Oracle Database Enterprise Edition includes following services which are critical in support of Oracle Data Miner:
Oracle Data Mining :-Provides the necessary mining model building, testing and scoring capabilities
Oracle XML DB :-Provides services to manage the Data Miner Repository metadata,m such as the detais of the workflow specifications
Oracle Scheduler :-Provides the engine of mining the Data Miner workflows
Oracle Text :-Provides services necessary to support text mining
Oracle Data Miner Client
Oracle Data Miner client embedded in SQL Developer 3.1 + screensnapshot is as below.

Oracle Applications and ODM
Following applications are powered by Oracle Data Mining