Monday, 4 November 2013

Big Data and In-database Analytics

The term Big Data refers to data of high volume, high velocity, high variety and high complexity. Big Data is typically unstructured or semi-structured, and it may come in the form of web logs, social media logs, email, equipment sensor readings, machine-generated data such as GPS or smart meter readings, videos, photographs, etc.
One of the biggest challenges is integrating this unstructured data with the enterprise-wide structured data stored in current operational and data warehouse IT systems. The idea, therefore, is to extract and load big data (unstructured and semi-structured data) into the database alongside the rest of the enterprise's structured data. The next step is to find a way to analyze that data, both to understand the acquired information better and to make important business decisions about growing into new areas. There are a number of analytical applications available in the marketplace, but most of them require transforming data and moving it back and forth between the database and the application.

In-database analytics allows data processing to be conducted within the database by building analytic logic into the database itself. This blog post is an attempt to list the in-database analytics offerings of DBMSs in general and Oracle in particular.

In-database Analytics

In-database analytics is a technology that allows data processing to be conducted within the database by building analytic logic into the database itself. Doing so eliminates the time and effort required to transform data and move it back and forth between a database and a separate analytics application.
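To make the contrast concrete, here is a minimal Python sketch that uses the standard-library sqlite3 module as a stand-in for an analytic database platform (an assumption; none of Oracle's features are involved). The `sales` table and its columns are hypothetical. The first function moves every row into the application and aggregates there; the second pushes the same aggregation into the database as a SQL `GROUP BY`, so only one row per region leaves the database.

```python
import sqlite3


def build_sample_db():
    """Create an in-memory database with a small, hypothetical 'sales' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales (region, amount) VALUES (?, ?)",
        [("north", 120.0), ("north", 80.0), ("south", 200.0),
         ("south", 50.0), ("west", 30.0)],
    )
    return conn


def totals_in_application(conn):
    """Move every row out of the database and aggregate in application code."""
    totals = {}
    for region, amount in conn.execute("SELECT region, amount FROM sales"):
        totals[region] = totals.get(region, 0.0) + amount
    return totals


def totals_in_database(conn):
    """Push the aggregation into the database; one row per region comes back."""
    rows = conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
    return dict(rows.fetchall())


if __name__ == "__main__":
    conn = build_sample_db()
    print(totals_in_application(conn))
    print(totals_in_database(conn))
```

Both functions return the same totals. On a five-row table the difference is invisible; on billions of rows, shipping only the aggregated result instead of the raw rows is what removes the data-movement cost.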
An in-database analytics system consists of an enterprise data warehouse, together with the acquired big data, built on an analytic database platform. Such platforms provide parallel processing, partitioning, scalability and optimization features geared toward analytic workloads.

In-database analytics also allows analytical data marts to be consolidated in the enterprise data warehouse. Because data does not have to be exported to a separate application, retrieval and analysis are faster, and corporate information is more secure because it never leaves the database.

This approach helps companies make better predictions about future business risks and opportunities, identify trends, and spot anomalies, so that they can make informed decisions more efficiently and affordably.

Companies use in-database analytics for applications requiring intensive processing – for example, fraud detection, credit scoring, risk management, trend and pattern recognition, and balanced scorecard analysis. In-database analytics also facilitates ad hoc analysis, allowing business users to create reports that do not already exist or drill deeper into a static report to get details about accounts, transactions, or records.

Oracle In-database Analytics

Once big data and enterprise data have been loaded into Oracle Database, end users can use a number of easy-to-use tools for in-database advanced analytics. The following are some of the in-database advanced analytics capabilities supported by Oracle:

Oracle R Enterprise - Oracle's version of the widely used R statistical environment (the R Project) enables statisticians to use R on very large data sets without any changes to the end-user experience. Examples of R usage include predicting airline delays at particular airports and the submission of clinical trial analyses and results.

In-Database Data Mining - the ability to create complex models and deploy these on very large data volumes to drive predictive analytics. End-users can leverage the results of these predictive models in their BI tools without the need to know how to build the models. For example, regression models can be used to predict customer age based on purchasing behavior and demographic data.
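The regression example can be sketched in a few lines of Python, again with sqlite3 standing in for the database. This is an assumption-laden illustration, not Oracle Data Mining: it fits ordinary least squares on a single hypothetical feature (`monthly_spend`) using only SQL aggregates, so the training rows never leave the database and only the two fitted coefficients do. The `customers` table and the sample values are made up for illustration.

```python
import sqlite3

# Hypothetical training rows: (monthly_spend, age). Illustrative values only.
SAMPLE = [(10.0, 25.0), (20.0, 30.0), (40.0, 40.0), (60.0, 50.0)]

# Ordinary least squares for age = intercept + slope * monthly_spend,
# computed entirely from SQL aggregates:
#   slope     = (n*Sxy - Sx*Sy) / (n*Sxx - Sx^2)
#   intercept = (Sy - slope*Sx) / n
# SQLite returns NULL (not an error) if the denominator is zero.
FIT_SQL = """
WITH s AS (
    SELECT COUNT(*)                           AS n,
           SUM(monthly_spend)                 AS sx,
           SUM(age)                           AS sy,
           SUM(monthly_spend * age)           AS sxy,
           SUM(monthly_spend * monthly_spend) AS sxx
    FROM customers
),
fit AS (
    SELECT (n * sxy - sx * sy) * 1.0 / (n * sxx - sx * sx) AS slope, n, sx, sy
    FROM s
)
SELECT slope, (sy - slope * sx) / n AS intercept FROM fit
"""


def make_db(rows):
    """Load (monthly_spend, age) rows into an in-memory 'customers' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (monthly_spend REAL, age REAL)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)", rows)
    return conn


def fit_age_model(conn):
    """Run the regression inside the database and return (slope, intercept)."""
    slope, intercept = conn.execute(FIT_SQL).fetchone()
    return slope, intercept


def predict_age(slope, intercept, monthly_spend):
    """Apply the fitted line to a new customer's spend."""
    return intercept + slope * monthly_spend


if __name__ == "__main__":
    slope, intercept = fit_age_model(make_db(SAMPLE))
    print(f"age = {intercept:.1f} + {slope:.2f} * monthly_spend")
    print(predict_age(slope, intercept, 30.0))
```

A BI tool would only need the `predict_age` step (or an equivalent SQL expression), which matches the point that end users can consume a model without knowing how it was built.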

In-Database Text Mining - the ability to mine text from microblogs, CRM system comment fields and review sites by combining Oracle Text and Oracle Data Mining. An example of text mining is sentiment analysis based on comments, which tries to show how customers feel about certain companies, products or activities.
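A deliberately naive sketch of sentiment scoring, assuming a hand-written word list instead of the Oracle Text plus Oracle Data Mining combination (which learns from data). The lexicon, the `reviews` table and the `sentiment` SQL function name are all hypothetical. Registering the Python scorer with sqlite3's `create_function` lets the scoring run inside a SQL query, which mirrors the in-database idea on a small scale.

```python
import re
import sqlite3

# Hypothetical word weights; a real system would learn these from labelled comments.
LEXICON = {"great": 1, "love": 1, "good": 1, "fast": 1,
           "bad": -1, "slow": -1, "hate": -1, "broken": -1}


def sentiment_score(text):
    """Sum the lexicon weights of the lower-cased words in text (0 for NULL)."""
    words = re.findall(r"[a-z']+", (text or "").lower())
    return sum(LEXICON.get(word, 0) for word in words)


def label(text):
    """Map the numeric score to a coarse sentiment label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def score_reviews(comments):
    """Score comments with a SQL query that calls the Python scorer as a UDF."""
    conn = sqlite3.connect(":memory:")
    conn.create_function("sentiment", 1, sentiment_score)
    conn.execute("CREATE TABLE reviews (comment TEXT)")
    conn.executemany("INSERT INTO reviews VALUES (?)", [(c,) for c in comments])
    return conn.execute(
        "SELECT comment, sentiment(comment) FROM reviews ORDER BY rowid"
    ).fetchall()


if __name__ == "__main__":
    for comment, score in score_reviews([
        "Great product, I love it",
        "Delivery was slow and the box arrived broken",
        "It arrived on Tuesday",
    ]):
        print(score, label(comment), comment)
```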

In-Database Graph Analysis – the ability to create graphs and connections between various data points and data sets. For example, graph analysis can build networks of relationships that determine the value of a customer's circle of friends. When looking at customer churn, a customer's value is then based on the value of his or her network rather than on the customer's own value alone.
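One simple way to express "a customer's value is the value of his or her network" is a one-hop sum: the customer's own value plus the values of direct friends. The sketch below does this with a SQL self-join in sqlite3. The one-hop rule, the `customers` and `friendships` tables, and the sample values are assumptions for illustration, and Oracle's own graph features are not used.

```python
import sqlite3

NETWORK_VALUE_SQL = """
SELECT c.id,
       c.value + COALESCE(SUM(f.value), 0) AS network_value
FROM customers AS c
LEFT JOIN (
    -- friendships are undirected, so list each edge in both directions
    SELECT a AS src, b AS dst FROM friendships
    UNION ALL
    SELECT b AS src, a AS dst FROM friendships
) AS e ON e.src = c.id
LEFT JOIN customers AS f ON f.id = e.dst
GROUP BY c.id, c.value
ORDER BY c.id
"""


def network_values(customers, friendships):
    """customers: {id: value}; friendships: iterable of (id_a, id_b) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, value INTEGER)")
    conn.execute("CREATE TABLE friendships (a INTEGER, b INTEGER)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)", customers.items())
    conn.executemany("INSERT INTO friendships VALUES (?, ?)", friendships)
    return dict(conn.execute(NETWORK_VALUE_SQL).fetchall())


if __name__ == "__main__":
    # Hypothetical data: customer 2 is worth little alone but links 1 and 3.
    values = {1: 100, 2: 5, 3: 80, 4: 20}
    print(network_values(values, [(1, 2), (2, 3)]))
```

With this data, customer 2 has the lowest individual value but the highest network value, which is exactly the kind of customer a churn analysis based only on individual value would underrate.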
In-Database Spatial – the ability to add a spatial dimension to data and show data plotted on a map. This ability enables end users to understand geospatial relationships and trends much more efficiently. For example, spatial data can visualize a network of people and their geographical proximity. Customers who are in close proximity can readily influence each other’s purchasing behavior, an opportunity which can be easily missed if spatial visualization is left out.
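The proximity idea can be sketched as a distance filter. This Python example uses the haversine great-circle formula, which assumes a spherical Earth with a mean radius of 6371 km, and compares every pair of customers; a spatial index in the database would avoid that all-pairs scan on large tables. The customer IDs and coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical-Earth assumption


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearby_pairs(customers, radius_km):
    """Return (id1, id2) pairs of customers within radius_km of each other.

    customers: {customer_id: (lat, lon)}. Brute force, O(n^2) comparisons.
    """
    ids = sorted(customers)
    pairs = []
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if haversine_km(*customers[a], *customers[b]) <= radius_km:
                pairs.append((a, b))
    return pairs


if __name__ == "__main__":
    customers = {
        "c1": (51.5074, -0.1278),
        "c2": (51.5155, -0.1420),
        "c3": (48.8566, 2.3522),
    }
    print(nearby_pairs(customers, radius_km=5.0))
```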
In-Database MapReduce – the ability to write procedural logic and seamlessly leverage Oracle Database parallel execution. In-database MapReduce allows data scientists to create high-performance routines with complex logic, and it can be exposed via SQL. Examples of leveraging in-database MapReduce are sessionization of web logs and organization of Call Detail Records (CDRs).
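To show what sessionization means, here is a single-process Python sketch of the map, shuffle and reduce steps. It does not use Oracle's parallel execution; the (user, timestamp_seconds, url) record layout and the 30-minute inactivity timeout are assumptions. The map step keys each click by user, the shuffle groups clicks per user, and the reduce step sorts each user's clicks by time and starts a new session whenever the gap exceeds the timeout.

```python
from collections import defaultdict

SESSION_TIMEOUT = 30 * 60  # seconds of inactivity that end a session (assumed value)


def map_phase(records):
    """Emit (key, value) pairs keyed by user."""
    for user, ts, url in records:
        yield user, (ts, url)


def shuffle(pairs):
    """Group mapped values by key, as the framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(user, events, timeout=SESSION_TIMEOUT):
    """Assign a per-user session number to each click, ordered by time."""
    session = 0
    last_ts = None
    for ts, url in sorted(events):
        if last_ts is not None and ts - last_ts > timeout:
            session += 1
        last_ts = ts
        yield user, session, ts, url


def sessionize(records, timeout=SESSION_TIMEOUT):
    """Run map, shuffle and reduce; return (user, session, ts, url) rows."""
    results = []
    for user, events in shuffle(map_phase(records)).items():
        results.extend(reduce_phase(user, events, timeout))
    return results


if __name__ == "__main__":
    clicks = [
        ("alice", 0, "/home"),
        ("bob", 100, "/search"),
        ("alice", 600, "/cart"),
        ("alice", 5000, "/home"),
    ]
    for row in sessionize(clicks):
        print(row)
```

Because each user's clicks are reduced independently, the reduce step is the part that parallelizes naturally across users.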


Each of the analytical components in Oracle Database is valuable on its own, and combining them creates even more value for the business. Using SQL or a BI tool to expose the results of these analytics to end users gives an organization an edge over competitors that do not use the full potential of analytics in Oracle Database.

Oracle Exalytics In-Memory Machine X3-4 Specifications

The specifications for the X3-4 are as follows:

Dimensions and Weight
Height: 129.9 mm (5.1 in.)
Width: 436.5 mm (17.2 in.)
Depth: 732.0 mm (28.8 in.)
Weight: 44 kg (97 lb.)

Processors
Four Intel Xeon E7-4800 series processors
Main Memory
Sixty-four 32 GB DDR3 ECC registered DIMMs
Total memory capacity of 2 TB

Network and Expansion
Four 10/100/1000Base-T on-board Ethernet ports
Two QDR (40 Gb/s) InfiniBand ports
Two 10 Gb/s Ethernet ports based on the Intel 82599 10GbE controller
8 Gb/s dual-port Fibre Channel PCI Express card

Standard I/O
One TIA/EIA-232-F asynchronous RJ-45 serial port
Five USB 2.0 ports (two front, two rear, one internal)
Two VGA 8 MB Ports; 1024 x 768 @ 60Hz (front and rear)

Internal Storage
Six 2.5-inch 900 GB 10,000 RPM SAS-2 HDDs
Disk controller HBA with 512 MB battery-backed write cache
Six F40 400 GB eMLC flash PCIe cards

Remote Management
Oracle Integrated Lights Out Manager (ILOM)
One dedicated 10/100Base-T Ethernet management port
One RJ-45 serial management port
Support for access via SSH 2.0, HTTPS, RADIUS, LDAP, and Microsoft Active Directory
Ability to monitor and report system and component status on all FRUs

Operating Systems
Oracle Enterprise Linux

Analytics Software
Oracle Business Intelligence Foundation Suite
Oracle TimesTen In-Memory Database for Exalytics

Environment
Operating temperature: 5°C to 35°C (41°F to 95°F) at sea level; 5°C to 31°C (41°F to 88°F) at altitude
Nonoperating temperature: -40°C to 68°C (-40°F to 154°F)
Operating relative humidity: 10%-90% relative humidity, noncondensing
Nonoperating relative humidity: 93% relative humidity, noncondensing
Acoustic noise: LwAd: 8.9 B (idle and operating, room temp.), 8.9 B (max. ambient); LpAm: 76.9 dBA (bystander position, max. ambient)

Power
Rated line voltage: 200-240 VAC
Rated input current: 12 A at 200 VAC
Maximum power usage: 1467 W

Regulations
Safety: IEC 60950, UL/CSA 60950, EN60950, CB Scheme with all country differences
RFI/EMI: FCC CFR 47 Part 15 Class A, EN 55022 Class A, EN 61000-3-2, EN 61000-3-3, EN 300 386
Immunity: EN 55024, EN 300 386
Other: Complies with WEEE Directive (2002/96/EC) and RoHS Directive (2002/95/EC)

Mounting Accessories and Cables
Tool-less rack-mounting slide rail kit
Cable management arm
Two 5m InfiniBand cables
I hope you find this copy-pasted information useful.