Friday, 24 May 2013

Data Integration Functional Capabilities and Oracle Data Integration Tool Sets

Having worked in data warehousing and data integration for many years, I have always found it difficult to find a complete set of data integration tools that work hand in hand as a single solution. There has always been demand for a data integration platform that supports service-oriented architecture, moves data across heterogeneous sources and targets, provides complete data cleansing, data profiling and data quality capabilities, and also offers real-time change data capture and replication to support high availability. Is that too much to expect? Or is it a market requirement that is driving excellence in data integration solutions across the board?

This blog post discusses the project drivers for data integration, the Oracle Data Integration offerings, and Oracle's market position against the market leaders on various evaluation factors. The information here is loosely based on a report from a leading market research company.

Data Integration Scenarios

The following are the data integration scenarios that data integration platforms need to support:

Data acquisition for business intelligence (BI) and data warehousing: Extracting data from one or more operational systems, transforming and merging that data, and delivering it to integrated data structures for analytics purposes. BI and data warehousing remain the primary source of demand for data integration tools.

Consolidation and delivery of master data in support of master data management (MDM): Enabling the consolidation and rationalization of the data representing critical business entities, such as customers, products and employees. Data integration tools can be used to build the data consolidation and synchronization processes.

Data migrations/conversions: Data integration tools are increasingly addressing the data movement and transformation challenges inherent in the replacement of legacy applications and consolidation efforts during mergers and acquisitions.

Synchronization of data between operational applications: Data integration tools provide the ability to ensure database-level consistency across applications, both internally and between enterprises (for example, with software-as-a-service (SaaS) or cloud-resident data sources), in either a bidirectional or unidirectional manner.

Interenterprise data sharing: Organizations are increasingly required to provide data to, and receive data from, external trading partners (customers, suppliers, business partners and others). Data integration tools are relevant in addressing these challenges, which often consist of the same types of data access, transformation and movement components found in other common use cases.

Delivery of data services in an SOA context: An architectural technique, rather than a use of data integration itself, data services represent an emerging trend for the role and implementation of data integration capabilities within service-oriented architectures (SOAs). Data integration tools will increasingly enable the delivery of many types of data services.

Functional Capabilities Wish List from Data Integration Platforms

Data Source & Target Support

The data integration platform should interact with structured, semi-structured and unstructured data, including:

·         Relational databases, legacy and non-relational databases, file formats (.txt, .csv) and XML
·         Packaged applications, such as CRM and supply chain management
·         SaaS and cloud-based applications and sources
·         Industry-standard message formats, such as electronic data interchange (EDI), SWIFT and Health Level Seven International (HL7)
·         HDFS and NoSQL repositories
·         Message queues, including middleware and standards-based products such as JMS
·         Emergent data types of a less structured nature, such as email, websites, office productivity tools and content repositories

Mode of Interactions

The data integration platform should support the following modes of interaction:

·         Bulk acquisition and delivery
·         Granular trickle-feed acquisition and delivery
·         Changed data capture (CDC) — the ability to identify and extract modified data
·         Event-based acquisition (time-based or data-value-based)
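
To make changed data capture a little more concrete, here is a minimal sketch in Python. It assumes the simplest possible approach: comparing two snapshots of a table keyed by primary key. The function name `capture_changes` and the sample rows are hypothetical, and commercial CDC tools often read the database transaction log rather than comparing snapshots, so treat this only as an illustration of the idea.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def capture_changes(
    previous: Dict[int, Row], current: Dict[int, Row]
) -> List[Tuple[str, int, Row]]:
    """Compare two snapshots keyed by primary key and return change events."""
    changes: List[Tuple[str, int, Row]] = []
    # Rows that are new or whose content differs from the previous snapshot.
    for key, row in current.items():
        if key not in previous:
            changes.append(("INSERT", key, row))
        elif previous[key] != row:
            changes.append(("UPDATE", key, row))
    # Rows that existed before but are missing now.
    for key, row in previous.items():
        if key not in current:
            changes.append(("DELETE", key, row))
    return changes


# Hypothetical sample data: customer rows keyed by customer id.
previous = {1: {"name": "Ann", "city": "Leeds"}, 2: {"name": "Bob", "city": "York"}}
current = {1: {"name": "Ann", "city": "Bath"}, 3: {"name": "Cy", "city": "Hull"}}
events = capture_changes(previous, current)
```

Only the three changed rows end up in `events`, which is what makes trickle-feed delivery cheaper than reloading the whole table.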

Data Delivery

The data integration platform should provide data to consuming applications, processes and databases in a variety of modes, including:

·         Physical bulk data movement between data repositories
·         Federated views formulated in memory
·         Message-oriented movement via encapsulation
·         Replication of data between homogeneous or heterogeneous database management systems (DBMSs) and schemas

In addition, support for the delivery of data across the range of latency requirements is important, including:

·         Scheduled batch delivery
·         Streaming/near-real-time delivery
·         Event-driven delivery of data based on identification of a relevant event

Data Transformation

The data integration platform should have built-in capabilities for data transformation operations of varying complexity, including:

·         Basic transformations, such as data type conversions, string manipulations and simple calculations

·         Intermediate-complexity transformations, such as lookup and replace operations, aggregations, summarizations, deterministic matching, and the management of slowly changing dimensions

·         Complex transformations, such as sophisticated parsing operations on free-form text and rich media

·         In addition, the tools must provide facilities for developing custom transformations and extending packaged transformations.
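
As an example of an intermediate-complexity transformation, the sketch below shows one common way of managing a slowly changing dimension: the "Type 2" approach, where a changed row is not overwritten but closed off and a new version is added. The function `apply_scd2`, the `valid_from`/`valid_to` column names and the sample customer are all assumptions made for illustration; real tools usually generate equivalent logic from a mapping.

```python
from datetime import date
from typing import List


def apply_scd2(dimension: List[dict], key: str, incoming: dict, as_of: date) -> None:
    """Apply one incoming source row to a Type 2 dimension held as a list of dicts."""
    tracked = {k: v for k, v in incoming.items() if k != key}
    for row in dimension:
        # Look for the current (open-ended) version of this business key.
        if row[key] == incoming[key] and row["valid_to"] is None:
            if all(row[k] == v for k, v in tracked.items()):
                return  # No attribute changed, so keep the current version.
            row["valid_to"] = as_of  # Close the old version.
            break
    # Add the new current version (also covers brand-new keys).
    dimension.append({**incoming, "valid_from": as_of, "valid_to": None})


dim: List[dict] = []
apply_scd2(dim, "customer_id", {"customer_id": 7, "city": "Leeds"}, date(2013, 1, 1))
apply_scd2(dim, "customer_id", {"customer_id": 7, "city": "Bath"}, date(2013, 5, 24))
apply_scd2(dim, "customer_id", {"customer_id": 7, "city": "Bath"}, date(2013, 6, 1))
```

After the three calls the dimension holds two versions of customer 7: the Leeds row closed on 24 May 2013 and the Bath row still open. The third call changes nothing because the attributes are identical.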

Metadata Management and Data Modelling

The data integration platform should support the following metadata management and data modelling requirements:

·         Automated discovery and acquisition of metadata from data sources, applications and other tools
·         Data model creation and maintenance
·         Physical to logical model mapping and rationalization
·         Defining model-to-model relationships via graphical attribute-level mapping
·         Lineage and impact analysis reporting, via graphical and tabular format
·         An open metadata repository, with the ability to share metadata bi-directionally with other tools
·         Automated synchronization of metadata across multiple instances of the tools
·         Ability to extend the metadata repository with customer-defined metadata attributes and relationships
·         Documentation of project/program delivery definitions and design principles in support of requirements definition activities
·         Business analyst/end-user interface to view and work with metadata
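
Lineage and impact analysis boils down to walking a graph of "feeds into" relationships held in the metadata repository. The following is a rough sketch, assuming lineage is stored as a simple mapping from each column to the columns it feeds; the function `downstream_impact` and all object names are made up for the example.

```python
from collections import deque
from typing import Dict, List, Set


def downstream_impact(lineage: Dict[str, List[str]], start: str) -> Set[str]:
    """Return every object reachable from `start` by following lineage edges."""
    impacted: Set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for target in lineage.get(node, []):
            if target not in impacted:  # Visit each object only once.
                impacted.add(target)
                queue.append(target)
    return impacted


# Hypothetical lineage: source column -> columns populated from it.
lineage = {
    "crm.customer.email": ["stg.customer.email"],
    "stg.customer.email": ["dw.dim_customer.email"],
    "dw.dim_customer.email": ["report.campaign_list"],
    "erp.order.amount": ["dw.fact_sales.amount"],
}
impacted = downstream_impact(lineage, "crm.customer.email")
```

Here `impacted` contains the staging column, the warehouse dimension column and the report, but nothing on the unrelated sales path. A tabular impact report is essentially this set, and a graphical one draws the same edges.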

Design and Development Environment

The data integration platform should provide facilities for specifying and constructing data integration processes, such as:

·         Graphical representation of repository objects, data models and data flows
·         Workflow management for the development process, addressing requirements such as approvals and promotions
·         Granular, role-based and developer-based security
·         Team-based development capabilities, such as version control and collaboration
·         Functionality to support reuse across developers and projects, and to facilitate the identification of redundancies
·         Support for testing and debugging

Data Governance

The data integration platform should support mechanisms that help in understanding and assuring data quality over time, including interoperability with:

·         Data profiling tools
·         Data mining tools
·         Data quality tools
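
For readers new to data profiling, the sketch below shows the kind of basic column statistics a profiling tool typically collects. It is only an illustration under simple assumptions: the function `profile_column` is hypothetical, values are strings, and both `None` and empty strings are counted as missing.

```python
from typing import Dict, List, Optional


def profile_column(values: List[Optional[str]]) -> Dict[str, int]:
    """Collect simple profile statistics for one column of string values."""
    # Treat None and empty strings as missing values.
    non_null = [v for v in values if v is not None and v != ""]
    return {
        "row_count": len(values),
        "null_count": len(values) - len(non_null),
        "distinct_count": len(set(non_null)),
        "min_length": min((len(v) for v in non_null), default=0),
        "max_length": max((len(v) for v in non_null), default=0),
    }


# Hypothetical postcode column with one missing value, one blank and one duplicate.
profile = profile_column(["LS1 4AP", "YO1 7HH", None, "", "LS1 4AP"])
```

Statistics like these, tracked over time, are what data quality rules are usually built on.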

Deployment Options and Runtime Platform

The data integration platform should offer broad support for the hardware and operating systems on which data integration processes may be deployed, and for the choice of delivery model, specifically:

·         Mainframe environments, such as IBM z/OS and z/Linux
·         Midrange environments, such as IBM System i (formerly AS/400) or HP Tandem
·         Unix-based environments
·         Windows environments
·         Linux environments
·         Traditional on-premises (at the customer site) installation and deployment of software
·         Hosted off-premises software deployment (SaaS model)
·         Server virtualization (support for shared, virtualized implementations)
·         Parallel distributed processing (such as Hadoop, MapReduce)

Operations and Administration

The data integration toolset should provide facilities for adequate ongoing support, management, monitoring and control of the processes implemented via the tools, such as:

·         Error-handling functionality, both predefined and customizable
·         The monitoring and control of runtime processes, both via functionality in the tools and interoperability with other IT operations technologies
·         The collection of runtime statistics to determine use and efficiency, as well as an application style interface for visualization and evaluation
·         Security controls, for both data "in flight" and administrator processes
·         A runtime architecture that ensures performance and scalability

Architecture and Integration

The data integration components should have a degree of commonality, consistency and interoperability between them, such as:

·         A minimal number of products (ideally one) supporting all data delivery modes
·         Common metadata (a single repository) and/or the ability to share metadata across all components and data delivery modes
·         A common design environment to support all data delivery modes
·         The ability to switch seamlessly and transparently between delivery modes (bulk/batch vs. granular real-time vs. federation) with minimal rework
·         Interoperability with other integration tools and applications, via certified interfaces and robust APIs
·         Efficient support for all data delivery modes, regardless of runtime architecture type (centralized server engine versus distributed runtime)

Service Enablement (SOA Enabled)

The data integration platform or toolset must exhibit service-oriented characteristics and provide support for SOA deployments, such as:

·         The ability to deploy all aspects of runtime functionality as data services
·         Management of publication and testing of data services
·         Interaction with service repositories and registries
·         Service enablement of development and administration environments, so that external tools and applications can dynamically modify and control the runtime behaviour of the tools

Oracle Data Integration Platform

The Oracle Data Integration Platform schematic is shown below.

Does it satisfy the functional capabilities wish list? We will discuss that in the next blog post. I hope you find this information useful.
