Monday, 4 November 2013

Big Data and In-database Analytics



The term Big Data refers to data of high variety, high volume, high complexity and high velocity. Big Data is typically unstructured. It may come in the form of web logs, social media logs, email, equipment sensor readings, machine-generated data such as GPS or smart meter readings, videos, photographs, etc.
One of the biggest challenges is integrating this unstructured data with the enterprise-wide structured data stored in current operational and data warehouse IT systems. The idea is to extract big data (unstructured or semi-structured) and load it into the database alongside the rest of the enterprise's structured data. The next step is to find a way to analyze the data in order to understand it better and to make important business decisions, such as growing into new areas. There are a number of analytical applications available in the marketplace. Most of them require transforming the data and moving it back and forth between the database and the application.

In-database analytics allows the data processing to be conducted within the database by building analytic logic into the database itself. This blog post attempts to list the in-database analytical offerings of DBMSs in general and Oracle in particular.

In-database Analytics

In-database analytics is a technology that allows data processing to be conducted within the database by building analytic logic into the database itself. Doing so eliminates the time and effort required to transform data and move it back and forth between a database and a separate analytics application.
An in-database analytics system consists of an enterprise data warehouse and acquired big data built on an analytic database platform. Such platforms provide parallel processing, partitioning, scalability and optimization features geared toward analytic functionality. 
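To make the difference concrete, here is a minimal sketch that computes the same per-account totals twice: once by moving every row into the application, and once by letting the database do the aggregation. It assumes SQLite (from the Python standard library) as a stand-in for an analytic database platform; the `transactions` table and its values are hypothetical.

```python
import sqlite3

# Hypothetical transactions table; names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (account TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [("A", 10.0), ("A", 30.0), ("B", 5.0), ("B", 15.0), ("B", 40.0)],
)


def totals_in_application(conn):
    """Move every row out of the database, then aggregate in the application."""
    totals = {}
    for account, amount in conn.execute("SELECT account, amount FROM transactions"):
        totals[account] = totals.get(account, 0.0) + amount
    return totals


def totals_in_database(conn):
    """Let the database aggregate; only one row per account leaves it."""
    query = "SELECT account, SUM(amount) FROM transactions GROUP BY account"
    return dict(conn.execute(query))


in_app = totals_in_application(conn)
in_db = totals_in_database(conn)
```

Both functions return the same totals, but the second one transfers one row per account instead of one row per transaction, which is the saving that in-database analytics aims for on large tables.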

In-database analytics allows analytical data marts to be consolidated in the enterprise data warehouse. Data retrieval and analysis are much faster and corporate information is more secure because it doesn’t leave the database.

This approach helps companies make better predictions about future business risks and opportunities, identify trends, and spot anomalies, so that they can make informed decisions more efficiently and affordably.

Companies use in-database analytics for applications requiring intensive processing – for example, fraud detection, credit scoring, risk management, trend and pattern recognition, and balanced scorecard analysis. In-database analytics also facilitates ad hoc analysis, allowing business users to create reports that do not already exist or drill deeper into a static report to get details about accounts, transactions, or records.

Oracle In-database Analytics

Once big data and enterprise data are loaded into Oracle Database, end users can use a number of easy-to-use tools for advanced in-database analytics. The following are some of the advanced in-database analytics supported by Oracle:

Oracle R Enterprise - Oracle’s version of the widely used R statistical environment (the R Project) enables statisticians to use R on very large data sets without any modifications to the end user experience. Examples of R usage include predicting airline delays at a particular airport and the submission of clinical trial analysis and results.

In-Database Data Mining - the ability to create complex models and deploy these on very large data volumes to drive predictive analytics. End-users can leverage the results of these predictive models in their BI tools without the need to know how to build the models. For example, regression models can be used to predict customer age based on purchasing behavior and demographic data.
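The regression example can be sketched with ordinary least squares on a single predictor. This only illustrates the arithmetic behind such a model and is not Oracle Data Mining's API; the `spend` and `age` values and the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data: monthly spend in one product category
# for customers whose age is already known.
spend = [10.0, 20.0, 30.0, 40.0]
age = [25.0, 30.0, 35.0, 40.0]

slope, intercept = fit_line(spend, age)


def predict_age(monthly_spend):
    """Apply the fitted line to a customer whose age is unknown."""
    return intercept + slope * monthly_spend
```

A BI tool would only need the output of something like `predict_age`; the end user never has to see how the model was fitted.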

In-Database Text Mining - the ability to mine text from microblogs, CRM system comment fields and review sites by combining Oracle Text and Oracle Data Mining. An example of text mining is sentiment analysis based on comments. Sentiment analysis tries to show how customers feel about certain companies, products or activities.

In-Database Graph Analysis - the ability to create graphs and connections between various data points and data sets. Graph analysis creates, for example, networks of relationships that determine the value of a customer’s circle of friends. When looking at customer churn, a customer’s value is then based on the value of their network, rather than on the value of the customer alone.

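
The idea of valuing a customer by their network can be sketched in a few lines. The definition used here (own value plus the value of direct connections) is an assumption chosen for illustration, as are the customer names and numbers; it is not a formula taken from Oracle's graph features.

```python
def network_value(customer, own_value, friends):
    """Own value plus the values of the customer's direct connections."""
    return own_value[customer] + sum(
        own_value[friend] for friend in friends.get(customer, ())
    )


# Hypothetical customers, individual values and friendship links.
own_value = {"ann": 100, "bob": 500, "cat": 50, "dan": 300}
friends = {
    "ann": {"bob", "dan"},
    "bob": {"ann"},
    "cat": set(),
    "dan": {"ann"},
}

by_own_value = sorted(own_value, key=own_value.get, reverse=True)
by_network_value = sorted(
    own_value,
    key=lambda c: network_value(c, own_value, friends),
    reverse=True,
)
```

In this toy data "ann" ranks third on her own value but first once her network is counted, which is exactly the kind of customer a churn analysis based on individual value alone would undervalue.
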
In-Database Spatial - the ability to add a spatial dimension to data and show data plotted on a map. This ability enables end users to understand geospatial relationships and trends much more efficiently. For example, spatial data can visualize a network of people and their geographical proximity. Customers who are in close proximity can readily influence each other’s purchasing behavior, an opportunity which can be easily missed if spatial visualization is left out.

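
Geographical proximity between customers can be approximated with the haversine great-circle distance. The sketch below is plain Python rather than a spatial database function; the customer names, coordinates and the 5 km radius are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def neighbours_within(customers, name, radius_km):
    """Names of other customers located within radius_km of the given one."""
    lat, lon = customers[name]
    return sorted(
        other
        for other, (olat, olon) in customers.items()
        if other != name and haversine_km(lat, lon, olat, olon) <= radius_km
    )


# Hypothetical customer locations as (latitude, longitude) in degrees.
customers = {
    "ann": (51.5074, -0.1278),
    "bob": (51.5155, -0.0922),
    "cat": (48.8566, 2.3522),
}

nearby = neighbours_within(customers, "ann", 5.0)
```

Here `nearby` contains only "bob", who lives a few kilometres from "ann"; "cat" is in another city and falls outside the radius.
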
In-Database MapReduce - the ability to write procedural logic and seamlessly leverage Oracle Database parallel execution. In-database MapReduce allows data scientists to create high-performance routines with complex logic, and these routines can be exposed via SQL. Examples of leveraging in-database MapReduce are sessionization of web logs and organization of Call Detail Records (CDRs).
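
Sessionization of web logs fits the map/reduce pattern well: the map step emits a (user, timestamp) pair per log line, and the reduce step splits each user's timestamps into sessions. The sketch below shows only that logic in plain Python and does not use Oracle's parallel execution; the `user,timestamp` log format and the 30-minute inactivity gap are assumptions.

```python
from collections import defaultdict

SESSION_GAP_SECONDS = 30 * 60  # assumed inactivity gap that ends a session


def map_phase(log_lines):
    """Map: emit a (user, timestamp) pair for each 'user,timestamp' line."""
    for line in log_lines:
        user, timestamp = line.split(",")
        yield user, int(timestamp)


def shuffle(pairs):
    """Group the mapped values by key, as a MapReduce framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(timestamps):
    """Reduce: split one user's timestamps into sessions on large gaps."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= SESSION_GAP_SECONDS:
            sessions[-1].append(ts)
        else:
            sessions.append([ts])
    return sessions


def sessionize(log_lines):
    grouped = shuffle(map_phase(log_lines))
    return {user: reduce_phase(stamps) for user, stamps in grouped.items()}


# Hypothetical web log: user id and timestamp in seconds.
weblog = ["u1,0", "u2,100", "u1,600", "u1,4000", "u2,200"]
sessions = sessionize(weblog)
```

Because each user's timestamps are reduced independently, the reduce step can run in parallel per user, which is the property a database's parallel execution can exploit.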

Conclusion

Every one of the analytical components in Oracle Database is valuable. Combining these components creates even more value to the business. Leveraging SQL or a BI Tool to expose the results of these analytics to end users gives an organization an edge over others who do not leverage the full potential of analytics in Oracle Database.
