Thursday, 9 January 2014

ODI Standalone Agent – Where to Install? Is it on Target Database? On Source Systems? Or on Dedicated Server?

ODI is an ELT product; hence no middle-tier server is required. Everything runs in the databases, and all operations can be orchestrated by a very lightweight agent. So there is no need for a dedicated ETL server, and the question becomes where to install the ODI standalone agent. Should it be on the target? On the source system? On a file server? What are the implications of firewalls? Is the JEE version on WebLogic a little bit overkill?

I came across a tech note on Oracle Support that discusses ODI standalone agent installation. This post is an attempt to share the content of that tech note, with some commentary based on my own experience.

General Rule of Thumb

In a data integration environment, source systems are not ideal hosts for the agent because they can be dispersed throughout the information system. Dedicated systems could work, but since they are independent of your ETL jobs, the agent would then depend on physical resources that are not tightly coupled with your processes. So, in short, installing the agent on the target systems makes sense.

This is particularly true in a data warehousing environment, where most of the staging of data already occurs on the target system. So in general, installing the ODI agent on the target seems the better option. But in the end, “target” is a convenience, not a be-all and end-all. Rather than accepting this as an absolute truth, we will look into how the agent works and from there provide a more detailed answer to this question.

Agent Connectivity and Impact on Location of Agent

The agent performs up to the following three tasks for a process to run:

Connect to the repository (always)
Because the agent uses JDBC to connect to the repository, it does not have to be on the same machine as the repository. The amount of data exchanged with the repository is limited to log generation and updates. In short, the agent can be installed on pretty much any system that can physically connect to the proper database ports to access the repository.

The only recommendation is that the agent should be on the same LAN as the repository.

Connect to the sources and targets (always)
As long as the agent is sending DDLs and DMLs to the databases, it does not have to be physically installed on any of the systems that host the databases. However, the location of the agent must be strategically selected so that it can connect to all databases, sources and targets.
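
Before settling on a machine for the agent, it can help to verify from that machine that the repository, sources and targets are all reachable on their database ports. Below is a minimal Python sketch of such a check. It is not part of ODI; the function names are my own, and the host names and ports in the usage note are hypothetical placeholders.

```python
import socket


def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False


def unreachable(endpoints, probe=can_reach):
    """Return the names of the endpoints this machine cannot connect to.

    endpoints maps a label to a (host, port) tuple. The probe argument is
    replaceable so the logic can be exercised without a real network.
    """
    return [
        name
        for name, (host, port) in endpoints.items()
        if not probe(host, port)
    ]
```

Run on a candidate agent host, a call such as `unreachable({"repository": ("odi-repo.example.com", 1521), "source": ("crm-db.example.com", 1521), "target": ("dwh-db.example.com", 1521)})` returns the labels that are blocked from there. An empty list means the machine can at least open connections to every system the agent needs. A successful TCP connection only proves the port is open, not that JDBC logins will work.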

Provide JDBC access to the data (if needed)
ODI processes can use multiple techniques to extract data from sources and load it into targets; JDBC is one of these techniques. If the processes executed by the agent use JDBC to move data from source to target, then the agent itself establishes this connection; as a result, the data will physically flow through the agent.

This is the case where we have to pay more attention to the agent location, because the data flows through the agent. Placing the agent on either the source server or the target server will in effect limit the number of network hops required for the data.
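
To make the hop argument concrete, here is a small Python sketch under a deliberately simplified model: data moved over JDBC travels source → agent → target, and each leg counts as one network hop unless the agent sits on the same machine as that end. The function and the host names are hypothetical and only serve the illustration.

```python
def network_hops(agent_host, source_host, target_host):
    """Count network transfers for data moved source -> agent -> target.

    Simplified model: a leg costs one hop unless the agent is installed
    on the same machine as the system at the other end of that leg.
    """
    read_hop = 0 if agent_host == source_host else 1
    write_hop = 0 if agent_host == target_host else 1
    return read_hop + write_hop


# Hypothetical machines: one source database, one target warehouse,
# and a separate dedicated server.
placements = {
    "dedicated server": network_hops("etl-box", "crm-db", "dwh-db"),
    "on source": network_hops("crm-db", "crm-db", "dwh-db"),
    "on target": network_hops("dwh-db", "crm-db", "dwh-db"),
}
print(placements)
```

With these placeholder hosts, the dedicated server costs two hops while either co-located placement costs one, which is the saving described above. The model ignores bandwidth and latency differences between links, so treat it as a way to reason about placement, not as a measurement.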

In addition, the following points are important to consider when deciding on the location of the ODI standalone agent.

Accessing files, scripts, utilities
It is actually quite common to have the ODI agent installed on a file server (along with the database loading utilities) so that it can have local access to the files. This is easier than trying to share directories across the network (and more efficient), in particular if you are dealing with disparate operating systems.

Big Data
In a Hadoop environment, execution requests are submitted to a NameNode. The NameNode is then in charge of distributing the execution across all DataNodes that are deployed and operational. It would be totally counter-productive for the ODI agent to try to bypass the NameNode. From that perspective, the agent would have to be installed on the NameNode.

Firewall Configuration
One element that seems pretty obvious is that no matter where you place your agents, you have to make sure that the firewalls in your corporation will let you access the necessary resources. More challenging can be the timeouts that some firewalls (or even servers, in the case of iSeries) impose. It is very important to adjust the firewall configuration so that these timeouts do not disconnect the ODI agents.
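
As a general illustration of the timeout problem (this is plain TCP, not an ODI setting), the Python sketch below turns on TCP keepalive for a socket so that an otherwise idle connection still sends periodic probes. The assumption is that the first probe is sent before the firewall's idle timeout expires; the function name and default values are hypothetical, and the fine-tuning options are only applied on platforms that expose them.

```python
import socket


def enable_keepalive(sock, idle_s=60, interval_s=10, probes=5):
    """Enable TCP keepalive on sock so idle connections keep sending probes.

    idle_s should be shorter than the firewall's idle timeout.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # These tuning options are platform dependent, so set them only if
    # the socket module exposes them on this system.
    tuning = (
        ("TCP_KEEPIDLE", idle_s),       # idle seconds before first probe
        ("TCP_KEEPINTVL", interval_s),  # seconds between probes
        ("TCP_KEEPCNT", probes),        # failed probes before giving up
    )
    for name, value in tuning:
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
    return sock
```

Whether the connections used by your agent can be tuned this way depends on the JDBC driver and the operating system, so adjusting the firewall timeout itself, as recommended above, remains the first thing to look at.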

In short, there are different data integration scenarios, and the preferred ODI agent location differs from one to another. I believe a good combination of JEE and standalone ODI agents is the way to achieve data integration objectives. It helps that the agent is a very light piece of software; no dedicated server is required to carry out the data extraction and load process.
