Informatica Reference

Lookup transformations are used to look up a set of values in another table. Lookups slow down performance.

1. To improve performance, cache the lookup tables. Informatica can cache all the lookup and reference tables; this makes operations run very fast. (The meaning of cache is explained just after point 2 of this section, and the procedure for determining the optimum cache size is given at the end of this document.)

2. Even after caching, performance can be further improved by minimizing the size of the lookup cache. Reduce the number of cached rows by using a SQL override with a restriction.

Cache: A cache stores data in memory so that Informatica does not have to read the table each time it is referenced, which greatly reduces processing time. The cache is generated automatically by Informatica based on the marked lookup ports, or from a user-defined SQL query.

Example of caching by a user-defined query:

Suppose we need to look up records where employee_id = eno. 'employee_id' is from the lookup table, EMPLOYEE_TABLE, and 'eno' is the input that comes from the source table, SUPPORT_TABLE.

We put the following SQL query override in the Lookup transformation:

select employee_id from EMPLOYEE_TABLE

If there are 50,000 employee_id values, the size of the lookup cache will be 50,000 rows.

Instead of the above query, we put the following:

select e.employee_id from EMPLOYEE_TABLE e, SUPPORT_TABLE s
where e.employee_id = s.eno

If there are only 1,000 eno values, the size of the lookup cache will be only 1,000 rows. But the performance gain will happen only if the number of records in SUPPORT_TABLE is not huge, since the join in the override has its own cost. Our concern is to make the size of the cache as small as possible.

3. In lookup tables, delete all unused columns and keep only the fields that are used in the mapping.

4. If possible, replace lookups with a Joiner transformation or a single source qualifier. A Joiner transformation takes more time than a source qualifier transformation.

5. If the lookup transformation specifies several conditions, place the conditions that use the equality operator '=' first in the list on the Condition tab.

6. In the SQL override query of the lookup table, there will be an ORDER BY clause. Remove it if it is not needed, or put fewer column names in the ORDER BY list.
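
For example, a lookup SQL override that sorts only on the condition column might look like the following sketch (reusing EMPLOYEE_TABLE from the example above; the employee_name column is illustrative). In PowerCenter, ending the override with the comment marker '--' suppresses the ORDER BY clause that Informatica would otherwise generate itself:

select employee_id, employee_name from EMPLOYEE_TABLE order by employee_id --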

7. Do not use caching in the following cases:

- The source is small and the lookup table is large.

- The lookup is done on the primary key of the lookup table.

8. Definitely cache the lookup table columns in the following case:

- The lookup table is small and the source is large.

9. If the lookup data is static, use a persistent cache. Persistent caches help to save and reuse cache files. If several sessions in the same job use the same lookup table, a persistent cache lets those sessions reuse the cache files. For static lookups, the memory cache is then built from the saved cache files instead of from the database, which improves performance.

10. If the source is huge and the lookup table is also huge, use a persistent cache as well.

11. If the target table is the lookup table, use a dynamic cache. The Informatica server then updates the lookup cache as it passes rows to the target.

12. Use only the lookups you need in the mapping. Too many lookups in a mapping will slow down the session.

13. If the lookup table has a lot of data, it will take too long to cache, or may not fit in memory. In that case, move those fields into the source qualifier and join with the main table there.
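
As a sketch of this idea, reusing the EMPLOYEE_TABLE/SUPPORT_TABLE example from above (the support_data and employee_name columns are illustrative), the source qualifier query joins the lookup table directly instead of a lookup transformation doing it row by row:

select s.eno, s.support_data, e.employee_name
from SUPPORT_TABLE s, EMPLOYEE_TABLE e
where s.eno = e.employee_id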

14. If there are several lookups with the same data set, then share the caches.

15. If we are going to return only one row, use an unconnected lookup.

16. All data is read into the cache in the order the fields are listed in the lookup ports. If there is an index that is even partially in this order, loading the lookup cache can be sped up.

17. If the lookup table has an index (or if we have the privilege to add an index to the table in the database, do so), performance increases for both cached and uncached lookups.
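
For example, on the EMPLOYEE_TABLE used earlier, an index on the lookup condition column could be created as follows (the index name is illustrative, and exact syntax varies by database):

create index idx_employee_id on EMPLOYEE_TABLE (employee_id)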

Refer : http://informaticatutorials-naveen.blogspot.com/2011/04/performance-tuning-of-lookup.html

PERFORMANCE TUNING OF SOURCES

  1. If the source is a flat file, ensure that the flat file is local to the Informatica server. If source is a relational table, then try not to use synonyms or aliases.
  2. If the source is a flat file and its records are shorter than the default of 1024 bytes per line, decrease the Line Sequential Buffer Length setting in the session properties so that Informatica reads fewer bytes per line.
  3. If possible, put a conditional query in the source qualifier so that records are filtered out as early as possible in the process (see the sketch after this list).
  4. In the source qualifier, if the query has an ORDER BY or GROUP BY, create an index on the source table and order by the indexed field of the source table.
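
A minimal sketch of points 3 and 4 above, using the SUPPORT_TABLE from the lookup section (the support_data and status columns and the filter value are illustrative): the restriction is pushed into the source qualifier query so rows are filtered in the database, and the ORDER BY uses the indexed column:

select s.eno, s.support_data
from SUPPORT_TABLE s
where s.status = 'ACTIVE'
order by s.eno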

PERFORMANCE TUNING OF TARGETS

If the target is a flat file, ensure that the flat file is local to the Informatica server. If the target is a relational table, try not to use synonyms or aliases.

  1. Use bulk load whenever possible.
  2. Increase the commit level.
  3. Drop constraints and indexes of the table before loading, and rebuild them afterwards (a sketch of the pre- and post-session SQL follows this list).
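
For point 3, the drop and rebuild can be placed in the pre-session and post-session SQL of the session. A sketch, with illustrative index and table names and database-specific syntax:

Pre-session SQL:

drop index idx_tgt_employee_id

Post-session SQL:

create index idx_tgt_employee_id on TARGET_EMPLOYEE (employee_id)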

PERFORMANCE TUNING OF MAPPINGS

A mapping channels the flow of data from source to target through all the transformations in between. The mapping is the skeleton of the Informatica loading process.

  1. Avoid executing major SQL queries from mapplets or mappings.
  2. When SQL queries are used, make sure they are optimized.
  3. Reduce the number of transformations in the mapping. Active transformations like Rank, Joiner, Filter, and Aggregator should be used as little as possible.
  4. Remove all unnecessary links between transformations in the mapping.
  5. If a single mapping contains many targets, dividing them into separate mappings can improve performance.
  6. If we need to use a single source more than once in a mapping, keep only one source and source qualifier in the mapping, then create different data flows as required into different targets or the same target.
  7. If a session joins many source tables in one source qualifier, an optimized query will improve performance.
  8. The SQL query that Informatica generates will contain an ORDER BY clause. Remove it if not needed, or at least reduce the number of column names in that list. For better performance, it is best to order by the indexed field of the table.
  9. Combine mappings that use the same set of source data.
  10. Within a mapping, fields carrying the same information should be given the same type and length throughout. Otherwise time will be spent on field conversions.
  11. Instead of doing complex calculations in the query, use an Expression transformation and do the calculation in the mapping (see the sketch after this list).
  12. If data passes through multiple staging areas, removing a staging area will increase performance.
  13. Stored procedures reduce performance. Try to keep the stored procedures simple in the mappings.
  14. Avoid unnecessary data type conversions, since they impact performance.
  15. Transformation errors cause performance degradation. Try running the mapping after removing all transformations; if it takes significantly less time than with the transformations, the transformations need to be fine-tuned.
  16. Keep database interactions to a minimum.
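
A sketch of point 11 (column names are illustrative): instead of computing a derived value in the source query, for example

select employee_id, (sal * 12) + nvl(bonus, 0) as annual_comp from EMPLOYEE_TABLE

select only the raw columns

select employee_id, sal, bonus from EMPLOYEE_TABLE

and compute the value in an output port of an Expression transformation, e.g.

ANNUAL_COMP = (SAL * 12) + IIF(ISNULL(BONUS), 0, BONUS)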

PERFORMANCE TUNING OF SESSIONS

A session specifies the location from where the data is to be taken, where the transformations are done and where the data is to be loaded. It has various properties that help us to schedule and run the job in the way we want.

  1. Partition the session: This creates many connections to the source and target, and loads data in parallel pipelines. Each pipeline will be independent of the other. But the performance of the session will not improve if the number of records is less. Also the performance will not improve if it does updates and deletes. So session partitioning should be used only if the volume of data is huge and the job is mainly insertion of data.
  2. Run the sessions in parallel rather than serial to gain time, if they are independent of each other.
  3. Drop constraints and indexes before running the session and rebuild them after the session run completes. Dropping can be done in a pre-session script and rebuilding in a post-session script. But if there is too much data, dropping the indexes and rebuilding them may not be feasible. In such cases, stage all data, pre-create the index, use a transportable tablespace, and then load into the database.
  4. Use bulk loading, external loading, etc. Bulk loading can be used only if the table does not have an index.
  5. In a session we have the option to treat source rows as Data Driven, Insert, Update, or Delete. If update strategies are used, it has to be kept as Data Driven. But when the session only inserts rows into the target table, keep it as Insert to improve performance.
  6. Increase the database commit interval (the point at which the Informatica server commits data to the target table; e.g. the commit interval can be set at every 50,000 records).
  7. Avoid built-in functions as much as possible to improve performance. E.g. for concatenation, the operator '||' is faster than the function CONCAT(). So use operators instead of functions, where possible. Functions like IS_SPACES(), IS_NUMBER(), IIF(), and DECODE() reduce performance to a big extent, in this order; preference should be in the opposite order (see the sketch after this list).
  8. String functions like SUBSTR, LTRIM, and RTRIM reduce performance. If the source is a flat file, use delimited strings, or use the varchar data type.
  9. Manipulating high-precision data types slows down the Informatica server, so disable 'high precision'.
  10. Localize all source and target tables, stored procedures, views, sequences, etc. Try not to connect across synonyms. Synonyms and aliases slow down performance.
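
To illustrate point 7 with Informatica expression snippets (port names are hypothetical): for concatenating three pieces, the operator form

FIRST_NAME || ' ' || LAST_NAME

is preferable to the nested function form

CONCAT(CONCAT(FIRST_NAME, ' '), LAST_NAME)

Both produce the same string; the operator version simply avoids the function-call overhead.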

DATABASE OPTIMISATION

To gain the best Informatica performance, the database tables, stored procedures and queries used in Informatica should be tuned well.

  1. If the source and target are flat files, then they should be present in the system in which the Informatica server is present.
  2. Increase the network packet size.
  3. The performance of the Informatica server is related to network connections. Data generally moves across a network at less than 1 MB per second, whereas a local disk moves data five to twenty times faster. Network connections therefore often affect session performance, so avoid unnecessary network connections.
  4. Optimize target databases.

Refer : http://informaticatutorials-naveen.blogspot.com/2011/04/optimizing-bottlenecks.html

The performance of Informatica depends on the performance of its several components: database, network, transformations, mappings, sessions, etc. To tune the performance of Informatica, we first have to identify the bottleneck.

The bottleneck may be in the source, target, transformations, mapping, session, database, or network. It is best to look for performance issues in the components in the order source, target, transformations, mapping, and session. After identifying the bottleneck, apply the tuning mechanisms in whichever way they are applicable to the project.

Identify bottleneck in Source

If the source is a relational table, put a Filter transformation in the mapping just after the source qualifier and set the filter condition to FALSE, so that all records are filtered off and none proceed to the other parts of the mapping. In the original case, without the test filter, the total time taken is as follows:

Total Time = time taken by (source + transformations + target load)

Now, because of the filter, Total Time = time taken by the source.

So if the source is fine, the session should take much less time in the latter case. If it still takes nearly as much time as the former case, there is a source bottleneck.

Identify bottleneck in Target

If the target is a relational table, substitute it with a flat file and run the session. If the time taken now is much less than the time taken for the session to load to the table, the target table is the bottleneck.

Identify bottleneck in Transformation

Remove the transformation from the mapping and run it, and note the time taken. Then put the transformation back and run the mapping again. If the time taken now is significantly more than the previous time, the transformation is the bottleneck.

But removing a transformation for testing can be a pain for the developer, since it might require further changes for the session to get back into working mode.

So instead, we can put a filter with a FALSE condition just after the transformation and run the session. If the session takes the same time with and without this test filter, the transformation is the bottleneck.

Identify bottleneck in sessions

We can use the session log to identify whether the source, target, or transformations are the performance bottleneck. Session logs contain thread summary records like the following:

MASTER> PETL_24018 Thread [READER_1_1_1] created for the read stage of partition point [SQ_test_all_text_data] has completed: Total Run Time = [11.703201] secs, Total Idle Time = [9.560945] secs, Busy Percentage = [18.304876].

MASTER> PETL_24019 Thread [TRANSF_1_1_1_1] created for the transformation stage of partition point [SQ_test_all_text_data] has completed: Total Run Time = [11.764368] secs, Total Idle Time = [0.000000] secs, Busy Percentage = [100.000000].

If a thread's Busy Percentage is 100, that stage is the bottleneck; in the example above, it is the transformation stage.

Basically we have to rely on these thread statistics to identify the cause of performance issues. Once the 'Collect Performance Data' option (in the session 'Properties' tab) is enabled, all the performance-related information appears in the log created by the session.

Refer : http://informaticatutorials-naveen.blogspot.com/2011/04/identification-of-bottlenecks.html

The goal of performance tuning is to optimize session performance by eliminating performance bottlenecks. To tune session performance, first identify a performance bottleneck, eliminate it, and then identify the next performance bottleneck until you are satisfied with the session performance. You can use the test load option to run sessions when you tune session performance.

If you tune all the bottlenecks, you can further optimize session performance by increasing the number of pipeline partitions in the session. Adding partitions can improve performance by utilizing more of the system hardware while processing the session.

Because determining the best way to improve performance can be complex, change one variable at a time, and time the session both before and after the change. If session performance does not improve, you might want to return to the original configuration.

Complete the following tasks to improve session performance:

  1. Optimize the target. Enables the Integration Service to write to the targets efficiently.
  2. Optimize the source. Enables the Integration Service to read source data efficiently.
  3. Optimize the mapping. Enables the Integration Service to transform and move data efficiently.
  4. Optimize the transformation. Enables the Integration Service to process transformations in a mapping efficiently.
  5. Optimize the session. Enables the Integration Service to run the session more quickly.
  6. Optimize the grid deployments. Enables the Integration Service to run on a grid with optimal performance.
  7. Optimize the PowerCenter components. Enables the Integration Service and Repository Service to function optimally.
  8. Optimize the system. Enables PowerCenter service processes to run more quickly.

Refer : http://informaticatutorials-naveen.blogspot.com/2011/04/performance-tuning-overview.html