In previous blog I have tried to explain the OBIA customization types and its impacts on various components of the solution, I have also covered database objects database objects modification standard practices and guidelines such as tables, columns and indexes in my second blog of the series.
You can find out the previous blogs at: link to blogs
Oracle Business Intelligence Application 7.9.6.3 Customization Types, Tips and Best Practice Guidelines – Part A
Oracle BI Apps 7.9.6.3 Tips and Best Practice Guidelines for Customization ( Database Objects)– Part B
In this blog my focus would be more on Informatica modification standard practices and guidelines such as mappings, map-lets and sessions. I would include ODI may be in future blog, the focus area of this blog is Informatica only.
Why Standards and Guidelines? –The standard and guidelines for Informatica folders, mappings, mapplets and sessions is for better manageability and simplify the future upgrades. In long term it would help to reduce the application day to day cost and successive upgrades.
Informatica Folder Structures
The Informatica repository is designed into mapping folders. There is a different folder for each source adapter which is prefixed with ‘SDE’. These represent the Source Dependant Extracts from the ERP system of your source system to the Business Analytics Warehouse staging tables.
There are 2 folders, ‘SIL_Vert’ and ‘SILOS’, that represent the Source Independent Loads ETL mappings from the staging to the Data Warehouse schema.
There is another folder ‘PLP’, which represents the Post Load Process ETL mappings which do aggregations, snapshots, cross subject area relationship updates and update flags etc after completion of Source Independent Loads ETL mappings.
OOTB folders related to E-Business Suite source
The following folders are used for the typical E-Business implementation project:
SDE_ORAR11510_Adaptor - Source Dependant Extracts from the E-Business Source System 11.5.1.0
SILOS - Source Independent Loads from staging to Data Warehouse schema.
PLP - Post Load Process for aggregations, snapshots etc
No change should be allowed to be done in the OOTB Folders. Treat this as a MUST requirement, any change should be carefully recorded (Best Practice ETL1)
Why? To maintain OOTB intact is very important which would help with better ETL content manageability and future upgrades. Folder may be overwritten in future upgrades. Careful recording of customization is useful for re-application if required.
Custom folders related to E-Business Suite source
So the best option is to create custom folders. The following custom folders are to be created for the typical E-Business implementation project:
Custom_SDE_ORAR11510_Adaptor - Source Dependant Extracts from the E-Business Source System 11.5.1.0 for custom process (modified from the existing one or newly designed)
Custom_SILOS - Source Independent Loads from staging to Data Warehouse schema for custom processes (modified from the existing one or newly designed)
Custom_PLP - Post Load Process for custom aggregations, snapshots etc (modified from the existing one or newly designed)
The naming conventions for folders defined in BI Data Warehouse should be followed if custom folders are created in the warehouse. (Best Practice ETL 2)
Informatica Mapping, Maplet, Session and Workflow
Customization of OOTB Components
The setting up session parameters such as commit interval etc is not in the scope of the customization best practices for abivous reasons (Best Practice ETL 3)
Mapping
Prefix the mapping with a custom identifier (e.g. customer identifier is ‘XYZ’), then folder type (“SDE_” or “SIL_”), and include a descriptive name.
For example, an ETL mapping to extract Invoice distribution fact from E-Business ERP, a name may be XYZ_SDE_ORA_APInvoiceDistributionFact
Don’t make any change in the mapping in OOTB folder, take a copy of the mapping and copy it to equivalent custom folder, and follow the naming convention as stated above (Best Practice ETL4)
Why? To protect mapping changes from future upgrade of BI Apps. Adding a custom identifier in the mapping name in the custom folder provides better manageability for migration of informatica metadata and upgrade.
Maplet
Prefix the maplet with a custom identifier (“XYZ”), the object type (“mplt_” = maplet), and include a descriptive name.
For example, a re-usable part of a mapping to load Party Organisation may be: XYZ_mplt_PartyOrgDimension
The maplet are copied when mapping is been copied to custom folder, no need to rename the existing maplet, but any change or customization in maplet should be done by following the naming convention suggested. (Best Practice ETL5)
Why? Any customization in maplet would be easily identified, as I believe most of the data load logic is within the maplet and there is a danger that any change in existing maplet would affect the functioning of other mapping as maplet is most reusable component.
Sessions
Prefix the session with a custom identifier (“XYZ”), the folder type (“SDE_” or “SIL_”) and include a descriptive name.
For example, running a session to extract Invoice distribution fact from E-Business ERP may be: XYZ_SDE_ORA_APInvoiceDistributionFact
Create session based on the modified mapping in Workflow Manager. I would prefer to create multiple sessions from the same mapping if required e.g. incremental load and full load (Best Practice ETL6)
Why? Session allows overriding the SQL Override if I would like to change the data load extraction logic from incremental load to full load or vice versa
Workflows
Prefix the Workflow with a custom identifier (“XYZ”), the folder type (“SDE_” or “SIL_”) and include a descriptive name.
For example, running a workflow to extract Invoice distribution fact from E-Business ERP may be: XYZ_SDE_ORA_APInvoiceDistributionFact
Create workflow which uses the session within based on the modified mapping in Workflow Manager. I would prefer to create multiple workflows from the same mapping if required e.g. incremental load and full load (Best Practice ETL7)
Why? Workflow is a fantastic option to manage dependency of data load, can access various operations such as sql, shell script and obvious Informatica sessions Design once and reuse often is the Mantra.
New Custom Components
Note: Based on my implementation experiences I believe the most of the customer really don’t follow the customization practices for OOTB mappings/sessions. Copying the OOTB mapping into the Custom Folder and modifying it within the custom folder is very essential and I would go easy if someone is not using the naming convention suggested for OOTB mapping customization.
The naming convention for new custom components is a must requirement. The best practices mentioned for OOTB custom components are also applicable for new custom components.
Mapping
Prefix the mapping with a custom identifier (e.g. customer identifier is ‘XYZ’), then folder type (“SDE_” or “SIL_”), and include a descriptive name.
For example, an ETL mapping to create a post load process to create invoice on hold snapshot be: XYZ_PLP_InvoiceOnHoldSnapshot
Maplet
Prefix the maplet with a custom identifier (e.g. customer identifier is ‘XYZ’), the object type (“mplt_” = maplet), and include a descriptive name.
For example, a re-usable part of a mapping to load Meeting List may be:
XYZ_mplt_MeetingListDimension
Sessions
Prefix the Workflow with a custom identifier (e.g. customer identifier is ‘XYZ’), the folder type (“SDE_” or “SIL_”) and include a descriptive name.
For example, an workflow to create a post load process to create invoice on hold snapshot be: XYZ_PLP_InvoiceOnHoldSnapshot
Workflows
Prefix the Workflow with a custom identifier (e.g. customer identifier is ‘XYZ’), the folder type (“SDE_” or “SIL_”) and include a descriptive name.
For example, a workflow to create a post load process to create invoice on hold snapshot be: XYZ_PLP_InvoiceOnHoldSnapshot
Transformation
Use Change Capture functionality for incremental updates it is different for different applications, the rule is to load delta rather than full load everytime.(Best Practice ETL8)
Why? Incremental data load make a perfect sense, I would recommend implementing incremental load for PLP/Aggregate data load also. I know almost 5/6 BI Apps implementations where full load is done every night, it is faster than incremental load. In reality the partial full load for certain areas e.g. Assignment/Positions etc is a reality at many places.
After creating a dimension in Informatica, try to create a reusable expression to handel surrogate key resolution (Best Practice ETL9)
Why? Reusability is the basic principal, treat Transformation, Maplet etc as a function in PL/SQL terms. Design once and reuse whenever possible.
Create a reusable lookup transformation LKP_W_dimension_D. It should be defined as the following:
Moreover, the mandatory conditions are:
o INTEGRATION_ID=IN_INTEGRATION_ID
o DATASOURCE_NUM_ID=IN_ DATASOURCE_NUM_ID
And the following conditions are only added if the dimension is specified as a SCD Type 2, 3 or 6:
o EFFECTIVE_FROM_DT<=IN_DATE
o EFFECTIVE_TO_DT>IN_DATE
Lookup SQL Override of Lookup should be commented to avoid order by clause in query and make sure it is cacheable if referenced multiple times within the different mappings (Best Practice ETL10)
Why? Removal of Order by Clause will improve the performance of lookup query and making it cacheable will improve performance of overall load process.
Setting up the cache for lookup queries is a double edge sword, make sure appropriate analysis of storage space allocated to ETL Server is must before setting up the caching in place. Also make sure the purging of caching is in place to avoid data refresh issues.
Use lookup transformation based on ETL_PROC_WID is essential for restar ability (Best Practice ETL11)
Why? To make sure that data will not be reloaded after session failure
Ensure that all INNER/OUTER joins performed in Source Qualifiers are correctly defined; resolve one-to-many relationship by using lookup instead of using SQL override (Best Practice ETL12)
Why? The inner joins always limit the dataset loaded into the DW database while outer join may end up with unnecessary data in DW, it is essential that the required and right data is loaded, use RIGHT join
E.g. For multiple assignment records use primary assignment condition within lookup rather than using it within SQL Override.
Note- I believe BI Apps Informatica data extraction logic is mostly defined within SQL Override followed by few transformation, lookup and update strategy, the OOTB is not really using any of the functionality of Informatica as Integration Platform can offer. I believe it is under utilization of ETL tool.
Non Indexes Join Paths should be replaced with Informatica lookups and convert non-index-able filters to index-able ones. Review all joins in SQL Override (Best Practice ETL13)
Why? A flag column filter is better than using name not like ‘%SP%’ where clause, data extraction (SDE) time should be minimum to avoid any adverse impacts on source system.
Miscellaneous Informatica for BI Apps Best Practices
As a best practice, you should comment custom code you have introduced. Comments should include at least the developer's name and the date the code was added.(Best Practice ETL14)
Why? It is useful for traceability and accountability. Developer name, code and reason (for example requirement/bug) would be useful for continuous support to reporting solution.
Any modification to a component must be checked-in and commented once the fix is validated (Best Practice ETL15).
Why? Be a good developer.
Add documentation to identify an integration_id column (MD070, Database comments…(Best Practice ETL16)
Why? The documentation to support development is useful.
The flag 'Fail parent if this task fails' should be checked for all the sessions within a workflow(Best Practice ETL17).
The flag 'Fail parent if this task does not run' should be checked for all the sessions within a workflow(Best Practice ETL18).
Why? It is a standard Informatica development practice.
The Stop on Errors parameter should be set to 1 for all sessions within a workflow. This parameter is located in the Error handling area of the Config Object tab in Informatica PowerCenter Designer(Best Practice ETL19).
Why? It is helpful to stop data processing on occurrence of first error.
For all entities, such as dimension and fact tables, create two workflows, one to be used for a full load and the other to be used for an incremental load. Both workflows are based on the same mapping. The same mapping gets executed during both full and incremental loads. This provides an opportunity to tune each of these load scenarios(Best Practice ETL20).
Why? A separate handler for tuning each of the load scenarios is a absolutely must.
Custom change capture img suffix : The image suffix should be three characters long. The recommended naming convention is to start with C. For example, use C01, C02 and so on (Best Practice ETL21).
Input fields of a mapplet are prefixed by ‘INP_’. Output fields of a mapplet are prefixed by ‘EXT_’ (Best Practice ETL22)
Why? This is for Siebel Application based BI Apps ( BI CRM).
The surrogate keys related to day date calendar (non-fiscal) are specified as NUMBER. The mask applied is YYYYMMDD where YYYY = Year, MM = Month #, DD = Day #. The following example for a meeting start date MEETING_START_DT_WID would be the result of the transformation TO_NUMBER(TO_CHAR(MEETING_START_DT,’YYYYMMDD’)) (Best Practice ETL23).
Why? A special treatment for date to date WID conversion as Date Dimension is referenced by WID from Fact tables.
I hope this blog Oracle BI Apps 7.9.6.3 Tips and Best Practice Guidelines for Customization (Informatica) is good enough to answer few of your questions/queries. I would appreciate your feeback/comments if any.
I hope this blog Oracle BI Apps 7.9.6.3 Tips and Best Practice Guidelines for Customization (Informatica) is good enough to answer few of your questions/queries. I would appreciate your feeback/comments if any.
No comments:
Post a Comment