One of the key issues facing organizations in the this big
data era is integrating data stored in current operational and data warehouse IT
systems (such as structured data in relational data warehouses) with data
collected in Hadoop, Oracle NoSQL Database and other non-relational systems.
Oracle Big Data Connectors is a
software suite that integrates big data across the enterprise – from OLTP
systems to analytic data warehouses. The suite leverages emerging standards in
Apache Hadoop combined with intuitive extensions to existing infrastructure to
greatly simplify the development of big data applications.
This blog is an attempt to discuss
different Oracle big data connector’s offerings which bridges the
non-structured/semi-structural data with the rest of structural data of the
Oracle ecosystem.
Oracle Big Data Connectors
The list of the Oracle Big Data Connector which facilitate rapid
integration of big data (non-structured and semi-structured) with Oracle
Database (structured)
- Oracle SQL Connector for Hadoop Distributed File System
- Oracle Loader for Hadoop
- Oracle Data Integrator Application Adapter for Hadoop
- Oracle R Connector for Hadoop
- Oracle XQuery Connector for Hadoop
These connectors come with Big Data Appliance but can be
licensed for Hadoop running on non-Oracle hardware.
Oracle SQL Connector for Hadoop Distributed File
System
Oracle SQL Connector for Hadoop
Distributed File System (OSCH) is Oracle Direct Connector for HDFS for
accessing data in HDFS directly from Oracle Database. This enables SQL data
access on the Hive tables, delimited text files and Oracle Loader data pump
files in HDFS. One can query or import data from HDFS as needed by the
application.
A schematic representation of
OSCH connector with HDFS is shown below.
Some of the important features
are as follows
Automatic Table Creation
It automatically creates an
OSCH-enabled Oracle Database external table that references files stored in
HDFS.
Direct Query Access
Users simply query the external
table as they would any other table in Oracle Database. Full SQL access allows
users to join data in HDFS with data stored in Oracle Database, apply analytic
functions, load data into tables, and more.
Parallel Query
Parallel queries with automatic
load balancing enable fast data access; the degree of parallelism is controlled
easily by setting a parameter.
Oracle Loader for Hadoop
Oracle Loader for Hadoop is a
MapReduce utility used to optimize data loading from Hadoop into Oracle
Database. Oracle Loader for Hadoop sorts, partitions, and converts data into
Oracle Database formats in Hadoop, then loads the converted data into the
database. Oracle Loader for Hadoop
minimizes data skew managing data processing across reducers. Preprocessing
data on the Hadoop cluster dramatically reduces CPU and IO utilization on the
database. The presorting data allows a faster index creation and compression
after data is loaded into the database.
Some of the important features
are as follows
Online and Offline Load Option
It supports online load as
reducer nodes connect to the database for load using JDBC or direct path load
options, while for offline load option reducer write Oracle Data Pump binary
files or delimited text files for loading into the database
Supports Multiple Input
Formats
It supports multiple input
formats such as delimited text files, regular expressions, Oracle NoSQL
Database, Avro, Hive tables or custom inputs
Load Balancing
It offloads data pre-processing
from the database server to Hadoop and also distributes work evenly to all
reducers
Oracle Data Integrator Application Adapter for Hadoop
Oracle Data Integrator (ODI)
Application Adapter for Hadoop provides native Hadoop integration within ODI. Specific
ODI Knowledge Modules can be used to build Hadoop metadata within ODI, load
data into Hadoop, transform data within Hadoop, and load data into Oracle
Database using Oracle Loader for Hadoop and Oracle SQL Connector for HDFS.
ODI offers a graphical user
interface to create complex java Map Reduce code programs. The ODI Application
Adapter for Hadoop, ODI generates HivsQL which in turn generates native Map Reduce
programs that are executed on the Hadoop cluster. Once the data is processed
and organized on the Hadoop cluster, ODI loads the data directly into Oracle
Database using Oracle Loader for Hadoop or Oracle SQL Connector for HDFS.
Some of the important features
are as follows
Optimized for Developer
Productivity
It simplifies creation of Hadoop
and MapReduce code to boost productivity by
Native Integration
It integrates big data
heterogeneously via industry standards such as Hadoop, MapReduce, Hive, NoSQL, and
HDFS
Unification of
Integration Tooling
It unifies integration
tooling across unstructured/semi-structured and structured data
Oracle R Connector for Hadoop
Oracle R Connector for Hadoop
(ORCH) is an R package that provides transparent access to Hadoop and data
stored in HDFS. It offers users of the open-source statistical environment R
the ability to analyze data stored in HDFS, and to run R models efficiently
against large volumes of data leveraging Map Reduce processing without
requiring R users to learn yet another API or language. End users can leverage
Oracle-provided, highly scalable analytic techniques that are native to Hadoop
– in addition to over 3500 open source R packages to analyze data stored in
HDFS.
ORCH enables R scripts to run on
data in Hive tables and files in HDFS – seamlessly leveraging the Map Reduce
framework. Hadoop-based R programs can be deployed on a Hadoop cluster without
needing to know Hadoop internals, command line interfaces, or IT
infrastructure. ORCH can optionally be used with the Oracle Advanced Analytics
Option for Oracle Database. Oracle Advanced Analytics Option enables R users to
transparently work with database resident data without having to learn SQL or
database concepts; R computations execute directly in Oracle Database.
Some of the important features
are as follows
Interactive R Access to HDFS
It allows manipulate and explore and
move data in/from HDFS using R Function
R Integration with Hadoop
It leverage map-reduce
programming paradigm in the familiar context of R without having to learn
Hadoop concepts, it offers native support for Hive tables
Oracle XQuery Connector for Hadoop
OXH
is transformation engine for Big Data. Oracle XQuery is executed on the Map
Reduce framework. Oracle XQuery for Hadoop (OHX) has integration with Oracle
Loader for Hadoop, Oracle noSQL DB and no (XSD) schema.
Oracle Big Data Connectors enables productive analysis of
all data in the enterprise, harnessing massive volumes of both structured and
unstructured data to deliver critical insights. I believe this a best way to integrate the non-structural and structural data.
I hope you find this information useful.