Wednesday, 30 October 2013

Oracle Big Data Connectors – HDFS to Oracle RDBMS Data Connectors

One of the key issues facing organizations in this big data era is integrating data stored in current operational and data warehouse IT systems (such as structured data in relational data warehouses) with data collected in Hadoop, Oracle NoSQL Database and other non-relational systems.

Oracle Big Data Connectors is a software suite that integrates big data across the enterprise – from OLTP systems to analytic data warehouses. The suite leverages emerging standards in Apache Hadoop combined with intuitive extensions to existing infrastructure to greatly simplify the development of big data applications.

This blog discusses the different Oracle Big Data Connector offerings, which bridge unstructured and semi-structured data with the structured data in the rest of the Oracle ecosystem.

Oracle Big Data Connectors

The following Oracle Big Data Connectors facilitate rapid integration of big data (unstructured and semi-structured) with Oracle Database (structured):

  • Oracle SQL Connector for Hadoop Distributed File System  
  • Oracle Loader for Hadoop
  • Oracle Data Integrator Application Adapter for Hadoop
  • Oracle R Connector for Hadoop
  • Oracle XQuery Connector for Hadoop

These connectors ship with the Oracle Big Data Appliance but can also be licensed for Hadoop running on non-Oracle hardware.


Oracle SQL Connector for Hadoop Distributed File System

Oracle SQL Connector for Hadoop Distributed File System (OSCH), formerly known as Oracle Direct Connector for HDFS, provides access to data in HDFS directly from Oracle Database. It enables SQL access to Hive tables, delimited text files, and Oracle Data Pump files in HDFS. Applications can query or import data from HDFS as needed.

A schematic representation of the OSCH connector with HDFS is shown below.



Some of the important features are as follows:

Automatic Table Creation
It automatically creates an OSCH-enabled Oracle Database external table that references files stored in HDFS.
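As a rough illustration, OSCH's table creation is driven by the `ExternalTable` command-line tool documented by Oracle. The sketch below is hypothetical: the table name, HDFS paths, directory object, and connection details are illustrative assumptions, not values from this post.

```shell
# Hypothetical sketch: publish delimited text files in HDFS as an
# OSCH-enabled external table. All names, paths, and connection
# details are illustrative.
hadoop jar $OSCH_HOME/jlib/orahdfs.jar \
    oracle.hadoop.exttab.ExternalTable \
    -D oracle.hadoop.exttab.tableName=SALES_HDFS_EXT \
    -D oracle.hadoop.exttab.sourceType=text \
    -D oracle.hadoop.exttab.dataPaths=hdfs:/user/scott/sales/* \
    -D oracle.hadoop.exttab.defaultDirectory=SALES_EXT_DIR \
    -D oracle.hadoop.connection.url=jdbc:oracle:thin:@//dbhost:1521/orcl \
    -D oracle.hadoop.connection.user=SCOTT \
    -createTable
```

The tool generates the external table definition in the database along with location files that point at the data in HDFS.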

Direct Query Access
Users simply query the external table as they would any other table in Oracle Database. Full SQL access allows users to join data in HDFS with data stored in Oracle Database, apply analytic functions, load data into tables, and more.
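For example, assuming a hypothetical OSCH external table `SALES_HDFS_EXT` over files in HDFS and an ordinary `CUSTOMERS` table in the database (both names are illustrative), a join could look like this:

```sql
-- Hypothetical example: join HDFS-resident data (exposed through an
-- OSCH external table) with a regular database table.
-- All object names are illustrative.
SELECT c.cust_name,
       SUM(s.amount) AS total_amount
FROM   sales_hdfs_ext s
JOIN   customers      c ON c.cust_id = s.cust_id
GROUP  BY c.cust_name;
```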

Parallel Query
Parallel queries with automatic load balancing enable fast data access; the degree of parallelism is controlled easily by setting a parameter.


Oracle Loader for Hadoop

Oracle Loader for Hadoop is a MapReduce utility used to optimize data loading from Hadoop into Oracle Database. Oracle Loader for Hadoop sorts, partitions, and converts data into Oracle Database formats in Hadoop, then loads the converted data into the database. It minimizes data skew by balancing data processing across reducers. Preprocessing data on the Hadoop cluster dramatically reduces CPU and I/O utilization on the database. Presorting the data allows faster index creation and compression after the data is loaded into the database.



Some of the important features are as follows:

Online and Offline Load Option
It supports an online load option, in which reducer nodes connect to the database and load data using JDBC or the direct path load option, and an offline load option, in which reducers write Oracle Data Pump binary files or delimited text files for later loading into the database.
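The choice between online and offline load is made in the loader's job configuration file. As a rough sketch (the property and output format class names follow the Oracle Loader for Hadoop documentation of that era, but the fragment itself is illustrative):

```xml
<!-- Illustrative fragment of an OLH job configuration file. -->

<!-- Online load: reducers connect to the database via JDBC. -->
<property>
  <name>mapreduce.outputformat.class</name>
  <value>oracle.hadoop.loader.lib.output.JDBCOutputFormat</value>
</property>

<!-- Offline load alternative: reducers write Oracle Data Pump files
     for later loading into the database. -->
<!--
<property>
  <name>mapreduce.outputformat.class</name>
  <value>oracle.hadoop.loader.lib.output.DataPumpOutputFormat</value>
</property>
-->
```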

Supports Multiple Input Formats
It supports multiple input formats such as delimited text files, regular expressions, Oracle NoSQL Database, Avro, Hive tables, or custom input formats.

Load Balancing
It offloads data pre-processing from the database server to Hadoop and distributes work evenly across all reducers.
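A typical invocation submits the loader as a MapReduce job together with a configuration file. The sketch below is illustrative; `$OLH_HOME` and the configuration file name are assumptions, not values from this post.

```shell
# Hypothetical sketch: run Oracle Loader for Hadoop as a MapReduce job.
# The loader reads its input/output formats, target table, and
# connection details from the configuration file.
hadoop jar $OLH_HOME/jlib/oraloader.jar \
    oracle.hadoop.loader.OraLoader \
    -conf myLoaderConf.xml
```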

Oracle Data Integrator Application Adapter for Hadoop

Oracle Data Integrator (ODI) Application Adapter for Hadoop provides native Hadoop integration within ODI. Specific ODI Knowledge Modules can be used to build Hadoop metadata within ODI, load data into Hadoop, transform data within Hadoop, and load data into Oracle Database using Oracle Loader for Hadoop and Oracle SQL Connector for HDFS.

ODI offers a graphical user interface for creating complex Java MapReduce programs. With the ODI Application Adapter for Hadoop, ODI generates HiveQL, which in turn generates native MapReduce programs that are executed on the Hadoop cluster. Once the data is processed and organized on the Hadoop cluster, ODI loads it directly into Oracle Database using Oracle Loader for Hadoop or Oracle SQL Connector for HDFS.




Some of the important features are as follows:

Optimized for Developer Productivity
It simplifies the creation of Hadoop and MapReduce code, boosting developer productivity.

Native Integration
It integrates heterogeneous big data via industry standards such as Hadoop, MapReduce, Hive, NoSQL, and HDFS.

Unification of Integration Tooling
It unifies integration tooling across unstructured/semi-structured and structured data.


Oracle R Connector for Hadoop

Oracle R Connector for Hadoop (ORCH) is an R package that provides transparent access to Hadoop and data stored in HDFS. It gives users of the open-source statistical environment R the ability to analyze data stored in HDFS and to run R models efficiently against large volumes of data, leveraging MapReduce processing, without requiring R users to learn yet another API or language. End users can leverage Oracle-provided, highly scalable analytic techniques that are native to Hadoop, in addition to over 3,500 open-source R packages, to analyze data stored in HDFS.

ORCH enables R scripts to run on data in Hive tables and files in HDFS, seamlessly leveraging the MapReduce framework. Hadoop-based R programs can be deployed on a Hadoop cluster without needing to know Hadoop internals, command-line interfaces, or IT infrastructure. ORCH can optionally be used with the Oracle Advanced Analytics Option for Oracle Database, which enables R users to work transparently with database-resident data without having to learn SQL or database concepts; R computations execute directly in Oracle Database.



Some of the important features are as follows:

Interactive R Access to HDFS
It allows users to manipulate, explore, and move data to and from HDFS using R functions.

R Integration with Hadoop
It leverages the MapReduce programming paradigm in the familiar context of R without requiring users to learn Hadoop concepts, and it offers native support for Hive tables.
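A small R session sketches the flavor of the API. The `hdfs.*`, `hadoop.run`, and `orch.keyval` function names follow the ORCH package documentation, but the data and logic below are illustrative assumptions, not examples from this post.

```r
# Hypothetical ORCH sketch: data and logic are illustrative.
library(ORCH)

hdfs.ls()                        # explore directories in HDFS
cars.dfs <- hdfs.put(cars)       # push an in-memory R data frame to HDFS

# Run a MapReduce job from R: mean stopping distance per speed.
res <- hadoop.run(cars.dfs,
  mapper  = function(key, val) orch.keyval(val$speed, val$dist),
  reducer = function(key, vals) orch.keyval(key, mean(unlist(vals))))

hdfs.get(res)                    # pull the result back into the R session
```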

Oracle XQuery Connector for Hadoop

Oracle XQuery for Hadoop (OXH) is a transformation engine for big data: XQuery transformations are executed as MapReduce jobs on the Hadoop cluster. OXH integrates with Oracle Loader for Hadoop and Oracle NoSQL Database, and it operates on data without requiring an (XSD) schema.


Conclusion

Oracle Big Data Connectors enable productive analysis of all data in the enterprise, harnessing massive volumes of both structured and unstructured data to deliver critical insights. I believe this is one of the best ways to integrate unstructured and structured data.

I hope you find this information useful.
