ODI is an ELT product, so no middle-tier server is required. Everything runs
in the databases, and all operations can be orchestrated by a very lightweight
agent. In short, there is no need for a dedicated ETL server. The question then
is where to install the ODI standalone agent. Should it be on the target?
Should it be on the source system? Should it be on a file server? What are the
implications of firewalls? Is the JEE version on WebLogic a bit of overkill?
I came across a tech note on Oracle Support that discusses ODI standalone
agent installation. This blog post is an attempt to share the content of that
tech note, with some commentary based on my own experience.
General Rule of Thumb
In a data integration environment, source systems are not ideal hosts for the
agent, as they may be dispersed throughout the information system. Dedicated
systems could work, but if they are independent of your ETL jobs, the agent
then depends on physical resources that may not be tightly coupled with the
processes. In short, installing the agent on the target system makes sense.
This is particularly true in a data warehousing environment, where most of the
staging of data already occurs on the target system. So in general, installing
the ODI agent on the target seems the better option. But in the end, “target”
is a convenience, not a be-all and end-all. Rather than accepting this as an
absolute truth, we will look at how the agent works and from there provide a
more detailed answer to this question.
Agent Connectivity and Impact on Location of Agent
The agent performs up to three tasks for a process to run:
Connect to the repository (always)
Because the agent uses JDBC to connect to the repository, it does not have to
be on the same machine as the repository. The amount of data exchanged with the
repository is limited to log generation and updates. In short, the agent can be
installed on pretty much any system that can reach the database ports needed to
access the repository.
The only recommendation is that the agent should be on the same LAN as the
repository.
Connect to the sources and targets (always)
As long as the agent is only sending DDL and DML statements to the databases,
it does not have to be physically installed on any of the systems that host
them. However, the location of the agent must be chosen so that it can connect
to all databases, both sources and targets.
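To make the connectivity requirement concrete, here is a minimal sketch of a check you could run from a candidate agent machine before installing anything. It only tests that a TCP connection can be opened to each endpoint; it does not test JDBC logins. The host names and ports in the usage section are hypothetical placeholders, not values from the tech note.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False


def unreachable(endpoints: dict, timeout: float = 3.0) -> list:
    """Return the names of the endpoints this machine cannot reach."""
    return [
        name
        for name, (host, port) in endpoints.items()
        if not can_reach(host, port, timeout)
    ]


if __name__ == "__main__":
    # Hypothetical endpoints; replace with your repository, sources and targets.
    endpoints = {
        "repository": ("odi-repo.example.com", 1521),
        "source": ("src-db.example.com", 1521),
        "target": ("dwh-db.example.com", 1521),
    }
    print("Unreachable from this machine:", unreachable(endpoints))
```

A machine for which this list comes back empty is at least a possible home for the agent from a network point of view.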
Provide JDBC access to the data (if needed)
ODI processes can use multiple techniques to extract data from sources and
load it into targets; JDBC is one of these techniques. If the processes
executed by the agent use JDBC to move data from source to target, then the
agent itself establishes this connection. As a result, the data will physically
flow through the agent.
This is the case where we have to pay more attention to the agent location,
because the data flows through the agent. Placing the agent on either the
source server or the target server will in effect limit the number of network
hops required for the data.
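The hop argument can be illustrated with a small sketch. It assumes a simplified model in which data moved over JDBC travels source to agent to target, and each leg counts as one network hop unless both ends are on the same host. The function and host names are hypothetical.

```python
def jdbc_network_hops(agent_host: str, source_host: str, target_host: str) -> int:
    """Count network transfers for data moved source -> agent -> target over JDBC."""
    read_hop = 0 if agent_host == source_host else 1   # source -> agent
    write_hop = 0 if agent_host == target_host else 1  # agent -> target
    return read_hop + write_hop


# Hypothetical host names, for illustration only.
for agent_host in ["src-db", "dwh-db", "etl-box"]:
    print(agent_host, jdbc_network_hops(agent_host, "src-db", "dwh-db"))
# prints:
# src-db 1
# dwh-db 1
# etl-box 2
```

In this model, an agent on a third machine makes the data cross the network twice, while an agent on either the source or the target makes it cross only once.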
In addition, the following points are important to consider when deciding on
the location of the ODI standalone agent.
Accessing files, scripts, utilities
It is actually quite common to have the ODI
agent installed on a file server (along with the database loading utilities) so
that it can have local access to the files. This is easier than trying to share
directories across the network (and more efficient), in particular if you are
dealing with disparate operating systems.
Big Data
In a Hadoop environment, execution requests are submitted to a NameNode. The
NameNode is then in charge of distributing the execution across all DataNodes
that are deployed and operational. It would be totally counter-productive for
the ODI agent to try to bypass the NameNode. From that perspective, the agent
would have to be installed on the NameNode.
Firewall Configuration
One point that seems pretty obvious is that no matter where you place your
agents, you have to make sure that your corporate firewalls will let them
access the necessary resources. More challenging are the timeouts that some
firewalls (or even servers, in the case of iSeries) enforce. It is very
important to adjust the firewall configuration so that ODI agent connections
are not disconnected.
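As a rough planning aid for the timeout problem, the sketch below flags steps that are likely to be cut off. It assumes that a connection carries no traffic while a long statement runs on the database, so a step whose quiet period reaches the firewall's idle timeout is at risk. The step names, durations and timeout value are hypothetical.

```python
def steps_at_risk(step_idle_seconds: dict, firewall_idle_timeout: int) -> list:
    """Return the steps whose quiet period reaches the firewall idle timeout."""
    return sorted(
        name
        for name, idle in step_idle_seconds.items()
        if idle >= firewall_idle_timeout
    )


# Hypothetical figures: seconds during which each step sends nothing over the wire.
idle_by_step = {
    "load_staging": 120,
    "build_fact_table": 5400,
    "refresh_aggregates": 2000,
}

# Assumed firewall idle timeout of 30 minutes.
print(steps_at_risk(idle_by_step, firewall_idle_timeout=1800))
# prints: ['build_fact_table', 'refresh_aggregates']
```

Steps that show up in this list are the ones to discuss with the network team when the firewall configuration is reviewed.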
In short, different data integration scenarios call for different ODI agent
locations. I believe a good combination of JEE and standalone ODI agents is the
way to achieve the data integration objective. It helps that the agent is a
very light piece of software: no dedicated server is required to carry out the
data extraction and load process.