Monday, 25 June 2012

Oracle Data Mining (ODM) - In-database Data Mining Advantages over Traditional Data Mining

Oracle Data Mining (ODM) is a component of the Oracle Advanced Analytics Option for Oracle Database 11g Enterprise Edition. ODM provides a collection of in-database data mining algorithms that solve a wide range of business problems, along with the necessary capabilities for building, testing and scoring mining models. Because data, models and results remain in Oracle Database, data movement is eliminated, security is maximized and information latency is minimized.
The purpose of this post is to discuss Oracle's in-database data mining option in general and to list the advantages of in-database data mining over the traditional data mining algorithms and tools on the market.
Data Mining Phases

A typical data mining project has the following six phases.

Business Understanding
The first phase focuses on understanding the project objectives and requirements from a business perspective, then converting this knowledge into a data mining problem definition and a preliminary plan designed to achieve the objectives.
Data Understanding
The second phase starts with an initial data collection and proceeds with activities to become familiar with the data, identify data quality problems, discover first insights into the data, or detect interesting subsets that suggest hypotheses about hidden information.
Data Preparation
The third phase, which usually takes the largest share of the project time, covers all activities needed to construct the final dataset from the initial raw data. Data preparation tasks are likely to be performed multiple times and not in any prescribed order. They include data collection, assessment, consolidation and cleaning; table, record and attribute selection; and transformation of the data for the modeling tools.
Data Modeling
During this phase, various modeling techniques are selected and applied, and their parameters are calibrated to optimal values. Some techniques have specific requirements on the form of the data, so stepping back to the data preparation phase is often necessary.
Evaluation
This phase thoroughly evaluates the model and reviews the steps executed to construct it, to be certain that it properly achieves the business objectives. A key objective is to determine whether some important business issue has not been sufficiently considered. At the end of this phase, a decision on the use of the data mining results should be reached.
Deployment
The knowledge gained needs to be organized and presented in a way that the customer can use. Depending on the requirements, the deployment phase can be as simple as generating a report or as complex as implementing a repeatable data mining process across the enterprise.
In-Database Mining vs. Traditional Data Mining
The data preparation step involves collection, assessment, consolidation, cleaning, data selection and transformation activities. Most data must be cleansed, filtered, normalized, sampled and transformed in various ways before it can be mined, and a significant part of the effort in a data mining project (up to 80%) is often devoted to data preparation.
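To make the contrast concrete, here is a minimal Python sketch of the kind of hand-written preparation step a traditional, out-of-database workflow needs once the data has been exported. The records, field names and the `prepare` function are hypothetical, and only a few of the steps named above are shown: duplicate removal, filtering out records with no target, cleansing, casting, and reproducible sampling.

```python
import random

# Hypothetical raw export: duplicates, stray whitespace, mixed case, missing values.
raw = [
    {"id": 1, "region": " North ", "income": "52000", "bought": "Y"},
    {"id": 1, "region": " North ", "income": "52000", "bought": "Y"},  # duplicate record
    {"id": 2, "region": "south", "income": "", "bought": "N"},        # missing income
    {"id": 3, "region": "SOUTH", "income": "31000", "bought": "N"},
    {"id": 4, "region": "north", "income": "87000", "bought": None},  # missing target
    {"id": 5, "region": "East", "income": "45000", "bought": "Y"},
]


def prepare(records, sample_fraction=1.0, seed=0):
    seen, prepared = set(), []
    for rec in records:
        if rec["id"] in seen:      # consolidate: keep the first record per case id
            continue
        seen.add(rec["id"])
        if rec["bought"] is None:  # filter: records without a target cannot be used to train
            continue
        prepared.append({
            "id": rec["id"],
            "region": rec["region"].strip().lower(),                    # cleanse text
            "income": float(rec["income"]) if rec["income"] else None,  # cast; keep missing as None
            "bought": rec["bought"] == "Y",                             # transform target to boolean
        })
    rng = random.Random(seed)      # seeded generator so the sample is reproducible
    return [r for r in prepared if rng.random() < sample_fraction]


cleaned = prepare(raw)
print(len(cleaned))  # 4
```

Every one of these steps has to be written, maintained and re-run by hand each time a fresh extract is taken, which is the overhead the in-database approach aims to reduce.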
In contrast with traditional data mining tools, with Oracle Data Mining everything happens inside Oracle Database, a single, secure, scalable platform for advanced business intelligence. Because mining runs in the database, ODM eliminates data movement and duplication and minimizes the latency from raw data to valuable information.

Oracle Data Mining also supports automatic and embedded data transformation, which can significantly reduce the time and effort involved in developing a data mining model. With Automatic Data Preparation (ADP), the model itself transforms the build data according to the requirements of the algorithm, embeds the transformation instructions in the model, and reuses those instructions whenever the model is applied. ADP treatments include binning, normalization, outlier treatment and missing value handling.
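The core ADP idea (learn the transformations at build time, store them inside the model, replay them at apply time) can be sketched in a few lines of Python. This is a toy illustration, not ODM's implementation: ODM chooses algorithm-specific treatments and is configured through model build settings such as `PREP_AUTO`, whereas here a single fixed recipe (mean imputation, clipping to the build range, min-max normalization; binning omitted) and a simple nearest-centroid classifier are assumptions. `ColumnPrep` and `NearestCentroidModel` are hypothetical names.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class ColumnPrep:
    """Transformation instructions learned from the build data for one numeric column."""
    fill: float  # missing-value replacement: the build-data mean
    lo: float    # smallest value seen in the build data
    hi: float    # largest value seen in the build data

    def apply(self, value):
        if value is None:                          # missing value handling
            value = self.fill
        value = min(max(value, self.lo), self.hi)  # outlier treatment: clip to the build range
        span = self.hi - self.lo
        return 0.0 if span == 0 else (value - self.lo) / span  # min-max normalization


class NearestCentroidModel:
    """Toy classifier that embeds its own data preparation (hypothetical, ADP-like)."""

    def __init__(self, rows, labels):
        # Build time: learn one ColumnPrep per column and keep it inside the model.
        self.prep = []
        for column in zip(*rows):
            present = [v for v in column if v is not None]
            self.prep.append(ColumnPrep(fill=mean(present), lo=min(present), hi=max(present)))
        # Train on the transformed build data: one centroid per class label.
        grouped = {}
        for row, label in zip(rows, labels):
            grouped.setdefault(label, []).append(self._prepare(row))
        self.centroids = {
            label: [mean(values) for values in zip(*members)]
            for label, members in grouped.items()
        }

    def _prepare(self, row):
        # Apply time reuses exactly the instructions stored at build time.
        return [prep.apply(value) for prep, value in zip(self.prep, row)]

    def predict(self, row):
        x = self._prepare(row)

        def sq_dist(centroid):
            return sum((a - b) ** 2 for a, b in zip(x, centroid))

        return min(self.centroids, key=lambda label: sq_dist(self.centroids[label]))


build_rows = [[25, 20000], [30, None], [45, 80000], [50, 90000]]  # [age, income]
build_labels = ["no", "no", "yes", "yes"]
model = NearestCentroidModel(build_rows, build_labels)
print(model.predict([48, None]))   # yes: missing income filled with the stored build mean
print(model.predict([20, 15000]))  # no: out-of-range values clipped before scoring
```

The design point is that the caller never transforms scoring data by hand: because the preparation travels with the model, build-time and apply-time data are guaranteed to be treated identically.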
In short, in-database mining has a number of advantages over traditional data mining. Some of these advantages are listed below with details.
Advantages of In-Database Mining 
No data movement – Some data mining products require that the data be exported from the corporate database and converted to a specialized format for mining. With ODM, no data movement or conversion is needed. This makes the entire mining process less complex, less time-consuming and less error-prone.
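Security – Data is protected by the extensive security mechanisms of Oracle Database. Moreover, specific database privileges are required for different data mining activities; for example, only users with the appropriate privileges can score (apply) mining models.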
Data preparation and administration – Most data must be cleansed, filtered, normalized, sampled and transformed in various ways before it can be mined. Up to 80% of the effort in a data mining project is often devoted to data preparation. ODM can automatically manage key steps in the data preparation process, and Oracle Database provides extensive administrative tools for preparing and managing data.
Ease of data refresh – Mining processes within Oracle Database have ready access to refreshed data, so ODM can easily deliver mining results based on current data, maximizing their timeliness and relevance.
Oracle Database analytics – Oracle Database offers many features for advanced analytics and business intelligence. ODM can easily be integrated with other analytical features of the database, such as statistical analysis and OLAP.
Domain environment – Data mining models have to be built, tested, validated, managed and deployed in their appropriate application domain environments. Data mining results may need to be post-processed as part of domain-specific computations (for example, calculating estimated risks and response probabilities) and then stored in permanent repositories or data warehouses. With ODM, the pre- and post-mining activities can all be accomplished within the same environment.
Application programming interfaces – The PL/SQL API and SQL language operators provide direct access to ODM functionality in Oracle Database.
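The practical effect of SQL-level access is that a model can be applied from inside a query instead of exporting rows to an external tool. ODM does this with its own SQL operators (such as `PREDICTION`), which need an Oracle instance. The sketch below only imitates the idea with Python's built-in `sqlite3` module: SQLite stands in for Oracle Database, the rule-based `churn_score` function is a hypothetical stand-in for a mined model, and `Connection.create_function` registers it so that scoring and filtering are expressed in SQL. Unlike ODM, SQLite calls back into Python in the same process, so this shows the interface, not ODM's execution model.

```python
import sqlite3


def churn_score(age, income):
    """Hypothetical scoring rule standing in for a trained model (not an ODM model)."""
    if age is None or income is None:
        return None  # SQL NULL in, SQL NULL out
    return 1.0 if age < 30 and income < 30000 else 0.0


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, age INTEGER, income REAL)")
conn.executemany(
    "INSERT INTO customers (age, income) VALUES (?, ?)",
    [(25, 20000.0), (45, 80000.0), (28, 25000.0), (33, None)],
)

# Register the scoring function with the database engine so the "model" can be
# applied inside SQL statements, much as ODM exposes models through SQL operators.
conn.create_function("churn_score", 2, churn_score)

scores = conn.execute(
    "SELECT id, churn_score(age, income) FROM customers ORDER BY id"
).fetchall()
likely_churners = conn.execute(
    "SELECT id FROM customers WHERE churn_score(age, income) = 1 ORDER BY id"
).fetchall()
print(scores)           # [(1, 1.0), (2, 0.0), (3, 1.0), (4, None)]
print(likely_churners)  # [(1,), (3,)]
```

Because the score is just another SQL expression, it can be combined with joins, filters and aggregates like any built-in function, which is what makes the SQL interface convenient for application developers.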
The Oracle Data Mining (ODM) in-database option for Oracle Database EE helps users mine large volumes of data and define, save and share advanced analytical methodologies. The SQL interface and the Java API for the data mining algorithms are a plus, making it easier for developers to build applications that automate knowledge discovery. Having been an extensive user of data mining algorithms in the past, I am excited about Oracle's in-database mining options and believe they will prove more beneficial than traditional data mining tools.
I hope this is the first of many articles on data mining; in future I would like to write more on data mining in general and on ODM-OBIEE integration in particular.
