Informatica Reference


You can push transformation logic to the source or target database using pushdown optimization. When you run a session configured for pushdown optimization, the Integration Service translates the transformation logic into SQL queries and sends the SQL queries to the database. The source or target database executes the SQL queries to process the transformations.

The amount of transformation logic you can push to the database depends on the database, transformation logic, and mapping and session configuration. The Integration Service processes all transformation logic that it cannot push to a database.

Use the Pushdown Optimization Viewer to preview the SQL statements and mapping logic that the Integration Service can push to the source or target database. You can also use the Pushdown Optimization Viewer to view the messages related to pushdown optimization.

The following figure shows a mapping containing transformation logic that can be pushed to the source database:

[Figure: mapping containing transformation logic that can be pushed to the source database]

This mapping contains an Expression transformation that creates an item ID based on the store number 5419 and the item ID from the source. To push the transformation logic to the database, the Integration Service generates the following SQL statement:

INSERT INTO T_ITEMS(ITEM_ID, ITEM_NAME, ITEM_DESC)
SELECT CAST((CASE WHEN 5419 IS NULL THEN '' ELSE 5419 END) + '_' + (CASE WHEN ITEMS.ITEM_ID IS NULL THEN '' ELSE ITEMS.ITEM_ID END) AS INTEGER),
       ITEMS.ITEM_NAME,
       ITEMS.ITEM_DESC
FROM ITEMS2 ITEMS

The Integration Service generates an INSERT SELECT statement to retrieve the ID, name, and description values from the source table, create new item IDs, and insert the values into the ITEM_ID, ITEM_NAME, and ITEM_DESC columns in the target table. It concatenates the store number 5419, an underscore, and the original item ID to get the new item ID.

Pushdown Optimization Types

You can configure the following types of pushdown optimization:

  • Source-side pushdown optimization. The Integration Service pushes as much transformation logic as possible to the source database.
  • Target-side pushdown optimization. The Integration Service pushes as much transformation logic as possible to the target database.
  • Full pushdown optimization. The Integration Service attempts to push all transformation logic to the target database. If the Integration Service cannot push all transformation logic to the database, it performs both source-side and target-side pushdown optimization.

Running Source-Side Pushdown Optimization Sessions

When you run a session configured for source-side pushdown optimization, the Integration Service analyzes the mapping from the source to the target or until it reaches a downstream transformation it cannot push to the source database.

The Integration Service generates and executes a SELECT statement based on the transformation logic for each transformation it can push to the database. Then, it reads the results of this SQL query and processes the remaining transformations.
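
For example, here is a minimal sketch of a source-side statement, assuming a hypothetical ORDERS source table feeding a Filter transformation (AMOUNT > 100) and an Expression transformation (UPPER(CUST_NAME)) that can both be pushed to the source database:

SELECT ORDERS.ORDER_ID,
       UPPER(ORDERS.CUST_NAME),  -- Expression transformation pushed as a column expression
       ORDERS.AMOUNT
FROM ORDERS
WHERE ORDERS.AMOUNT > 100        -- Filter transformation pushed as a WHERE clause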

Running Target-Side Pushdown Optimization Sessions

When you run a session configured for target-side pushdown optimization, the Integration Service analyzes the mapping from the target to the source or until it reaches an upstream transformation it cannot push to the target database. It generates an INSERT, DELETE, or UPDATE statement based on the transformation logic for each transformation it can push to the target database. The Integration Service processes the transformation logic up to the point that it can push the transformation logic to the database. Then, it executes the generated SQL on the target database.
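
As a rough sketch (the table, columns, and exact statement shape here are hypothetical), an Expression transformation that upper-cases a customer name just before the target might be pushed into the INSERT itself, with parameter markers bound to the rows the Integration Service has already processed upstream:

INSERT INTO T_CUSTOMERS (CUST_ID, CUST_NAME)
VALUES (?, UPPER(?))   -- the ? placeholders are bound row by row by the Integration Service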

Running Full Pushdown Optimization Sessions

To use full pushdown optimization, the source and target databases must be in the same relational database management system. When you run a session configured for full pushdown optimization, the Integration Service analyzes the mapping from the source to the target or until it reaches a downstream transformation it cannot push to the target database. It generates and executes SQL statements against the source or target based on the transformation logic it can push to the database.

When you run a session with large quantities of data and full pushdown optimization, the database server must run a long transaction. Consider the following database performance issues when you generate a long transaction:

  • A long transaction uses more database resources.
  • A long transaction locks the database for longer periods of time. This reduces database concurrency and increases the likelihood of deadlock.
  • A long transaction increases the likelihood of an unexpected event.

To minimize database performance issues for long transactions, consider using source-side or target-side pushdown optimization.

Working with Databases

You can configure pushdown optimization for the following databases:

  • IBM DB2
  • Microsoft SQL Server
  • Netezza
  • Oracle
  • Sybase ASE
  • Teradata
  • Databases that use ODBC drivers

When you push transformation logic to a database, the database may produce different output than the Integration Service. In addition, the Integration Service can usually push more transformation logic to a database if you use a native driver, instead of an ODBC driver.

Comparing the Output of the Integration Service and Databases

The Integration Service and databases can produce different results when processing the same transformation logic. The Integration Service sometimes converts data to a different format when it reads data. The Integration Service and database may also handle null values, case sensitivity, and sort order differently.

The database and Integration Service produce different output when the following settings and conversions are different:

  • Nulls treated as the highest or lowest value. The Integration Service and a database can treat null values differently. For example, you want to push a Sorter transformation to an Oracle database. In the session, you configure nulls as the lowest value in the sort order. Oracle treats null values as the highest value in the sort order.
  • Sort order. The Integration Service and a database can use different sort orders. For example, you want to push the transformations in a session to a Microsoft SQL Server database, which is configured to use a sort order that is not case sensitive. You configure the session properties to use the binary sort order, which is case sensitive. The results differ based on whether the Integration Service or Microsoft SQL Server database process the transformation logic.
  • Case sensitivity. The Integration Service and a database can treat case sensitivity differently. For example, the Integration Service uses case-sensitive queries and the database does not. A Filter transformation uses the following filter condition: IIF(col_varchar2 = 'CA', TRUE, FALSE). You need the database to return rows that match 'CA'. However, if you push this transformation logic to a Microsoft SQL Server database that is not case sensitive, it returns rows that match the values 'Ca', 'ca', 'cA', and 'CA' (see the sketch after this list).
  • Numeric values converted to character values. The Integration Service and a database can convert the same numeric value to a character value in different formats. The database can convert numeric values to an unacceptable character format. For example, a table contains the number 1234567890. When the Integration Service converts the number to a character value, it inserts the characters '1234567890'. However, a database might convert the number to '1.2E9'. The two sets of characters represent the same value. However, if you require the characters in the format '1234567890', you can disable pushdown optimization.
  • Precision. The Integration Service and a database can have different precision for particular datatypes. Transformation datatypes use a default numeric precision that can vary from the native datatypes. For example, a transformation Decimal datatype has a precision of 1-28. The corresponding Teradata Decimal datatype has a precision of 1-18. The results can vary if the database uses a different precision than the Integration Service.
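
To illustrate the case-sensitivity point above, here is a hedged Microsoft SQL Server sketch; the CUSTOMERS table and its columns are hypothetical:

-- With a case-insensitive collation, all four variants satisfy the pushed-down filter:
SELECT CUST_ID, COL_VARCHAR2
FROM CUSTOMERS
WHERE COL_VARCHAR2 = 'CA';   -- returns 'CA', 'Ca', 'cA', and 'ca'

-- Forcing a case-sensitive collation reproduces the Integration Service result:
SELECT CUST_ID, COL_VARCHAR2
FROM CUSTOMERS
WHERE COL_VARCHAR2 = 'CA' COLLATE Latin1_General_CS_AS;   -- returns only 'CA'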

Rules and Guidelines for Functions in Pushdown Optimization

Use the following rules and guidelines when pushing functions to a database:

  • If you use ADD_TO_DATE in transformation logic to change days, hours, minutes, or seconds, you cannot push the function to a Teradata database.
  • When you push LAST_DAY() to Oracle, Oracle returns the date up to the second. If the input date contains subseconds, Oracle trims the date to the second.
  • When you push LTRIM, RTRIM, or SOUNDEX to a database, the database treats the argument (' ') as NULL, but the Integration Service treats the argument (' ') as spaces.
  • An IBM DB2 database and the Integration Service produce different results for STDDEV and VARIANCE. IBM DB2 uses a different algorithm than other databases to calculate STDDEV and VARIANCE.
  • When you push SYSDATE or SYSTIMESTAMP to the database, the database server returns the timestamp in the time zone of the database server, not the Integration Service.
  • If you push SYSTIMESTAMP to an IBM DB2 or a Sybase database, and you specify the format for SYSTIMESTAMP, the database ignores the format and returns the complete time stamp.
  • You can push SYSTIMESTAMP('SS') to a Netezza database, but not SYSTIMESTAMP('MS') or SYSTIMESTAMP('US').
  • When you push TO_CHAR(DATE) or TO_DATE() to Netezza, dates with subsecond precision must be in the YYYY-MM-DD HH24:MI:SS.US format. If the format is different, the Integration Service does not push the function to Netezza.

Refer : http://informaticatutorials-naveen.blogspot.com/2011/04/push-down-optimisation.html

You can debug a valid mapping to gain troubleshooting information about data and error conditions. To debug a mapping, you configure and run the Debugger from within the Mapping Designer. The Debugger uses a session to run the mapping on the Integration Service. When you run the Debugger, it pauses at breakpoints and you can view and edit transformation output data.

You might want to run the Debugger in the following situations:

  • Before you run a session. After you save a mapping, you can run some initial tests with a debug session before you create and configure a session in the Workflow Manager.
  • After you run a session. If a session fails or if you receive unexpected results in the target, you can run the Debugger against the session. You might also want to run the Debugger against a session if you want to debug the mapping using the configured session properties.

Debugger Session Types:

You can select three different debugger session types when you configure the Debugger. The Debugger runs a workflow for each session type. You can choose from the following Debugger session types when you configure the Debugger:

  • Use an existing non-reusable session. The Debugger uses existing source, target, and session configuration properties. When you run the Debugger, the Integration Service runs the non-reusable session and the existing workflow. The Debugger does not suspend on error.
  • Use an existing reusable session. The Debugger uses existing source, target, and session configuration properties. When you run the Debugger, the Integration Service runs a debug instance of the reusable session and creates and runs a debug workflow for the session.
  • Create a debug session instance. You can configure source, target, and session configuration properties through the Debugger Wizard. When you run the Debugger, the Integration Service creates and runs a debug workflow for the debug session.


Debug Process

To debug a mapping, complete the following steps:

1. Create breakpoints. Create breakpoints in a mapping where you want the Integration Service to evaluate data and error conditions.

2. Configure the Debugger. Use the Debugger Wizard to configure the Debugger for the mapping. Select the session type the Integration Service uses when it runs the Debugger. When you create a debug session, you configure a subset of session properties within the Debugger Wizard, such as source and target location. You can also choose to load or discard target data.

3. Run the Debugger. Run the Debugger from within the Mapping Designer. When you run the Debugger, the Designer connects to the Integration Service. The Integration Service initializes the Debugger and runs the debugging session and workflow. The Integration Service reads the breakpoints and pauses the Debugger when the breakpoints evaluate to true.

4. Monitor the Debugger. While you run the Debugger, you can monitor the target data, transformation and mapplet output data, the debug log, and the session log. When you run the Debugger, the Designer displays the following windows:

  • Debug log. View messages from the Debugger.
  • Target window. View target data.
  • Instance window. View transformation data.


5. Modify data and breakpoints. When the Debugger pauses, you can modify data and see the effect on transformations, mapplets, and targets as the data moves through the pipeline. You can also modify breakpoint information.

The Designer saves mapping breakpoint and Debugger information in the workspace files. You can copy breakpoint information and the Debugger configuration to another mapping. If you want to run the Debugger from another PowerCenter Client machine, you can copy the breakpoint information and the Debugger configuration to the other PowerCenter Client machine.

Running the Debugger:

When you complete the Debugger Wizard, the Integration Service starts the session and initializes the Debugger. After initialization, the Debugger moves in and out of running and paused states based on breakpoints and commands that you issue from the Mapping Designer. The Debugger can be in one of the following states:

  •  Initializing. The Designer connects to the Integration Service.
  •  Running. The Integration Service processes the data.
  •  Paused. The Integration Service encounters a break and pauses the Debugger.

Note: To enable multiple users to debug the same mapping at the same time, each user must configure different port numbers in the Tools > Options > Debug tab.

The Debugger does not use the high availability functionality.

Monitoring the Debugger:

When you run the Debugger, you can monitor the following information:

  •  Session status. Monitor the status of the session.
  •  Data movement. Monitor data as it moves through transformations.
  •  Breakpoints. Monitor data that meets breakpoint conditions.
  •  Target data. Monitor target data on a row-by-row basis.

The Mapping Designer displays windows and debug indicators that help you monitor the session:

  • Debug indicators. Debug indicators on transformations help you follow breakpoints and data flow.
  • Instance window. When the Debugger pauses, you can view transformation data and row information in the Instance window.
  • Target window. View target data for each target in the mapping.
  • Output window. The Integration Service writes messages to the following tabs in the Output window:
      • Debugger tab. The debug log displays in the Debugger tab.
      • Session Log tab. The session log displays in the Session Log tab.
      • Notifications tab. Displays messages from the Repository Service.

While you monitor the Debugger, you might want to change the transformation output data to see the effect on subsequent transformations or targets in the data flow. You might also want to edit or add more breakpoint information to monitor the session more closely.

Restrictions

You cannot change data for the following output ports:

  • Normalizer transformation. Generated Keys and Generated Column ID ports.
  •  Rank transformation. RANKINDEX port.
  •  Router transformation. All output ports.
  •  Sequence Generator transformation. CURRVAL and NEXTVAL ports.
  •  Lookup transformation. NewLookupRow port for a Lookup transformation configured to use a dynamic cache.
  •  Custom transformation. Ports in output groups other than the current output group.
  •  Java transformation. Ports in output groups other than the current output group.

Additionally, you cannot change data associated with the following:

  • Mapplets that are not selected for debugging
  • Input or input/output ports
  • Output ports when the Debugger pauses on an error breakpoint

In the Workflow Manager, you can specify constraint-based loading for a session. When you select this option, the Integration Service orders the target load on a row-by-row basis. For every row generated by an active source, the Integration Service loads the corresponding transformed row first to the primary key table, then to any foreign key tables. Constraint-based loading depends on the following requirements:

  • Active source. Related target tables must have the same active source.
  • Key relationships. Target tables must have key relationships.
  • Target connection groups. Targets must be in one target connection group.
  • Treat rows as insert. Use this option when you insert into the target. You cannot use updates with constraint-based loading.

Active Source:

When target tables receive rows from different active sources, the Integration Service reverts to normal loading for those tables, but loads all other targets in the session using constraint-based loading when possible. For example, a mapping contains three distinct pipelines. The first two contain a source, source qualifier, and target. Since these two targets receive data from different active sources, the Integration Service reverts to normal loading for both targets. The third pipeline contains a source, Normalizer, and two targets. Since these two targets share a single active source (the Normalizer), the Integration Service performs constraint-based loading: loading the primary key table first, then the foreign key table.

Key Relationships:

When target tables have no key relationships, the Integration Service does not perform constraint-based loading.

Similarly, when target tables have circular key relationships, the Integration Service reverts to a normal load. For example, you have one target containing a primary key and a foreign key related to the primary key in a second target. The second target also contains a foreign key that references the primary key in the first target. The Integration Service cannot enforce constraint-based loading for these tables. It reverts to a normal load.
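
A minimal DDL sketch of such a circular relationship, using hypothetical tables T_A and T_B:

CREATE TABLE T_A (
    A_ID INTEGER PRIMARY KEY,
    B_ID INTEGER                          -- foreign key to T_B, added below
);

CREATE TABLE T_B (
    B_ID INTEGER PRIMARY KEY,
    A_ID INTEGER REFERENCES T_A (A_ID)    -- foreign key back to T_A
);

ALTER TABLE T_A ADD FOREIGN KEY (B_ID) REFERENCES T_B (B_ID);

-- Neither table can be ordered strictly before the other, so the
-- Integration Service reverts to a normal load for both targets.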

Target Connection Groups:

The Integration Service enforces constraint-based loading for targets in the same target connection group. If you want to specify constraint-based loading for multiple targets that receive data from the same active source, you must verify the tables are in the same target connection group. If the tables with the primary key-foreign key relationship are in different target connection groups, the Integration Service cannot enforce constraint-based loading when you run the workflow. To verify that all targets are in the same target connection group, complete the following tasks:

  • Verify all targets are in the same target load order group and receive data from the same active source.
  • Use the default partition properties and do not add partitions or partition points.
  • Define the same target type for all targets in the session properties.
  • Define the same database connection name for all targets in the session properties.
  • Choose normal mode for the target load type for all targets in the session properties.

Treat Rows as Insert:

Use constraint-based loading when the session option Treat Source Rows As is set to insert. You might get inconsistent data if you select a different Treat Source Rows As option and you configure the session for constraint-based loading.

When the mapping contains Update Strategy transformations and you need to load data to a primary key table first, split the mapping using one of the following options:

  • Load primary key table in one mapping and dependent tables in another mapping. Use constraint-based loading to load the primary table.
  • Perform inserts in one mapping and updates in another mapping.

Constraint-based loading does not affect the target load ordering of the mapping. Target load ordering defines the order in which the Integration Service reads the sources in each target load order group in the mapping. A target load order group is a collection of source qualifiers, transformations, and targets linked together in a mapping. Constraint-based loading establishes the order in which the Integration Service loads individual targets within a set of targets receiving data from a single source qualifier.

Example

The following mapping is configured to perform constraint-based loading:

[Figure: mapping configured for constraint-based loading, with targets T_1 through T_4 in the first pipeline and T_5 and T_6 in the second]

In the first pipeline, target T_1 has a primary key, and T_2 and T_3 contain foreign keys referencing the T_1 primary key. T_3 also has a primary key that T_4 references as a foreign key.
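
The key relationships that drive the load order can be pictured with the following DDL sketch; the column names are hypothetical and only the constraints matter:

CREATE TABLE T_1 (
    ID_1 INTEGER PRIMARY KEY
);

CREATE TABLE T_2 (
    ID_2 INTEGER PRIMARY KEY,
    ID_1 INTEGER REFERENCES T_1 (ID_1)    -- T_1 must be loaded before T_2
);

CREATE TABLE T_3 (
    ID_3 INTEGER PRIMARY KEY,
    ID_1 INTEGER REFERENCES T_1 (ID_1)    -- T_1 must be loaded before T_3
);

CREATE TABLE T_4 (
    ID_4 INTEGER PRIMARY KEY,
    ID_3 INTEGER REFERENCES T_3 (ID_3)    -- T_3 must be loaded before T_4
);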

Since these tables receive records from a single active source, SQ_A, the Integration Service loads rows to the target in the following order:

1. T_1

2. T_2 and T_3 (in no particular order)

3. T_4

The Integration Service loads T_1 first because it has no foreign key dependencies and contains a primary key referenced by T_2 and T_3. The Integration Service then loads T_2 and T_3, but since T_2 and T_3 have no dependencies on each other, they are not loaded in any particular order. The Integration Service loads T_4 last, because it has a foreign key that references a primary key in T_3. After loading the first set of targets, the Integration Service begins reading source B. If there are no key relationships between T_5 and T_6, the Integration Service reverts to a normal load for both targets.

If T_6 has a foreign key that references a primary key in T_5, since T_5 and T_6 receive data from a single active source, the Aggregator AGGTRANS, the Integration Service loads rows to the tables in the following order:

  • T_5
  • T_6

T_1, T_2, T_3, and T_4 are in one target connection group if you use the same database connection for each target, and you use the default partition properties. T_5 and T_6 are in another target connection group together if you use the same database connection for each target and you use the default partition properties. The Integration Service includes T_5 and T_6 in a different target connection group because they are in a different target load order group from the first four targets.

Enabling Constraint-Based Loading:

When you enable constraint-based loading, the Integration Service orders the target load on a row-by-row basis. To enable constraint-based loading:

  1. In the General Options settings of the Properties tab, choose Insert for the Treat Source Rows As property.
  2. Click the Config Object tab. In the Advanced settings, select Constraint Based Load Ordering.
  3. Click OK.

Refer : http://informaticatutorials-naveen.blogspot.com/2011/04/constraint-based-loading-in-workflow.html

When you use a mapplet in a mapping, the Mapping Designer lets you set the target load plan for sources within the mapplet.

Setting the Target Load Order

You can configure the target load order for a mapping containing any type of target definition. In the Designer, you can set the order in which the Integration Service sends rows to targets in different target load order groups in a mapping. A target load order group is the collection of source qualifiers, transformations, and targets linked together in a mapping. You can set the target load order if you want to maintain referential integrity when inserting, deleting, or updating tables that have the primary key and foreign key constraints.

The Integration Service reads sources in a target load order group concurrently, and it processes target load order groups sequentially.

To specify the order in which the Integration Service sends data to targets, create one source qualifier for each target within a mapping. To set the target load order, you then determine in which order the Integration Service reads each source in the mapping.

The following figure shows two target load order groups in one mapping:

[Figure: two target load order groups in one mapping]

In this mapping, the first target load order group includes ITEMS, SQ_ITEMS, and T_ITEMS. The second target load order group includes all other objects in the mapping, including the TOTAL_ORDERS target. The Integration Service processes the first target load order group, and then the second target load order group.

When it processes the second target load order group, it reads data from both sources at the same time.

To set the target load order:

  1. Create a mapping that contains multiple target load order groups.
  2. Click Mappings > Target Load Plan. The Target Load Plan dialog box lists all Source Qualifier transformations in the mapping and the targets that receive data from each source qualifier.
  3. Select a source qualifier from the list.
  4. Click the Up and Down buttons to move the source qualifier within the load order.
  5. Repeat steps 3 and 4 for other source qualifiers you want to reorder, then click OK.

Refer : http://informaticatutorials-naveen.blogspot.com/2011/04/target-load-order.html

When a PowerCenter domain contains multiple nodes, you can configure workflows and sessions to run on a grid. When you run a workflow on a grid, the Integration Service runs a service process on each available node of the grid to increase performance and scalability. When you run a session on a grid, the Integration Service distributes session threads to multiple DTM processes on nodes in the grid to increase performance and scalability.

You create the grid and configure the Integration Service in the Administration Console. To run a workflow on a grid, you configure the workflow to run on the Integration Service associated with the grid. To run a session on a grid, configure the session to run on the grid.

The Integration Service distributes workflow tasks and session threads based on how you configure the workflow or session to run:

  • Running workflows on a grid. The Integration Service distributes workflows across the nodes in a grid. It also distributes the Session, Command, and predefined Event-Wait tasks within workflows across the nodes in a grid.
  • Running sessions on a grid. The Integration Service distributes session threads across nodes in a grid.

Note: To run workflows on a grid, you must have the Server grid option. To run sessions on a grid, you must have the Session on Grid option.

Running Workflows on a Grid:

When you run a workflow on a grid, the master service process runs the workflow and all tasks except Session, Command, and predefined Event-Wait tasks, which it may distribute to other nodes. The master service process is the Integration Service process that runs the workflow, monitors service processes running on other nodes, and runs the Load Balancer. The Scheduler runs on the master service process node, so it uses the date and time for the master service process node to start scheduled workflows.

The Load Balancer is the component of the Integration Service that dispatches Session, Command, and predefined Event-Wait tasks to the nodes in the grid. The Load Balancer distributes tasks based on node availability. If the Integration Service is configured to check resources, the Load Balancer also distributes tasks based on resource availability.

For example, a workflow contains a Session task, a Decision task, and a Command task. You specify a resource requirement for the Session task. The grid contains four nodes, and Node 4 is unavailable. The master service process runs the Start and Decision tasks. The Load Balancer distributes the Session and Command tasks to nodes on the grid based on resource availability and node availability.

Running Sessions on a Grid:

When you run a session on a grid, the master service process runs the workflow and all tasks except Session, Command, and predefined Event-Wait tasks as it does when you run a workflow on a grid. The Scheduler runs on the master service process node, so it uses the date and time for the master service process node to start scheduled workflows. In addition, the Load Balancer distributes session threads to DTM processes running on different nodes.

When you run a session on a grid, the Load Balancer distributes session threads based on the following factors:

  •  Node availability. The Load Balancer verifies which nodes are currently running, enabled, and available for task dispatch.
  •  Resource availability. If the Integration Service is configured to check resources, it identifies nodes that have resources required by mapping objects in the session.
  •  Partitioning configuration. The Load Balancer dispatches groups of session threads to separate nodes based on the partitioning configuration.

You might want to configure a session to run on a grid when the workflow contains a session that takes a long time to run.

Grid Connectivity and Recovery

When you run a workflow or session on a grid, service processes and DTM processes run on different nodes. Network failures can cause connectivity loss between processes running on separate nodes. Services may shut down unexpectedly, or you may disable the Integration Service or service processes while a workflow or session is running. The Integration Service failover and recovery behavior in these situations depends on the service process that is disabled, shuts down, or loses connectivity. Recovery behavior also depends on the following factors:

  • High availability option. When you have high availability, workflows fail over to another node if the node or service shuts down. If you do not have high availability, you can manually restart a workflow on another node to recover it.
  • Recovery strategy. You can configure a workflow to suspend on error. You configure a recovery strategy for tasks within the workflow. When a workflow suspends, the recovery behavior depends on the recovery strategy you configure for each task in the workflow.
  • Shutdown mode. When you disable an Integration Service or service process, you can specify that the service completes, aborts, or stops processes running on the service. Behavior differs when you disable the Integration Service or you disable a service process. Behavior also differs when you disable a master service process or a worker service process. The Integration Service or service process may also shut down unexpectedly. In this case, the failover and recovery behavior depend on which service process shuts down and the configured recovery strategy.
  • Running mode. If the workflow runs on a grid, the Integration Service can recover workflows and tasks on another node. If a session runs on a grid, you cannot configure a resume recovery strategy.
  • Operating mode. If the Integration Service runs in safe mode, recovery is disabled for sessions and workflows.

Note: You cannot configure an Integration Service to fail over in safe mode if it runs on a grid.

When you enable the Incremental Aggregation session option, the Integration Service performs incremental aggregation: it passes source data through the mapping and uses historical cache data to perform aggregation calculations incrementally.

When using incremental aggregation, you apply captured changes in the source to aggregate calculations in a session. If the source changes incrementally and you can capture changes, you can configure the session to process those changes. This allows the Integration Service to update the target incrementally, rather than forcing it to process the entire source and recalculate the same data each time you run the session.

For example, you might have a session using a source that receives new data every day. You can capture those incremental changes because you have added a filter condition to the mapping that removes pre-existing data from the flow of data. You then enable incremental aggregation.
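
A hedged sketch of such a filter, written here as a SQL override on the Source Qualifier against a hypothetical ORDERS table with a LOAD_DATE column (Oracle-style date functions assumed); an equivalent Filter transformation condition in the mapping works just as well:

SELECT ORDER_ID, STORE_ID, AMOUNT, LOAD_DATE
FROM ORDERS
WHERE LOAD_DATE >= TRUNC(SYSDATE)   -- keep only rows time-stamped with the current run date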

When the session runs with incremental aggregation enabled for the first time on March 1, you use the entire source. This allows the Integration Service to read and store the necessary aggregate data. On March 2, when you run the session again, you filter out all the records except those time-stamped March 2. The Integration Service then processes the new data and updates the target accordingly.

Consider using incremental aggregation in the following circumstances:

  • You can capture new source data. Use incremental aggregation when you can capture new source data each time you run the session. Use a Stored Procedure or Filter transformation to process new data.
  • Incremental changes do not significantly change the target. Use incremental aggregation when the changes do not significantly change the target. If processing the incrementally changed source alters more than half the existing target, the session may not benefit from using incremental aggregation. In this case, drop the table and recreate the target with complete source data.

Note: Do not use incremental aggregation if the mapping contains percentile or median functions. The Integration Service uses system memory to process these functions in addition to the cache memory you configure in the session properties. As a result, the Integration Service does not store incremental aggregation values for percentile and median functions in disk caches.

Integration Service Processing for Incremental Aggregation

The first time you run an incremental aggregation session, the Integration Service processes the entire source. At the end of the session, the Integration Service stores aggregate data from that session run in two files, the index file and the data file. The Integration Service creates the files in the cache directory specified in the Aggregator transformation properties.

Each subsequent time you run the session with incremental aggregation, you use the incremental source changes in the session. For each input record, the Integration Service checks historical information in the index file for a corresponding group. If it finds a corresponding group, the Integration Service performs the aggregate operation incrementally, using the aggregate data for that group, and saves the incremental change. If it does not find a corresponding group, the Integration Service creates a new group and saves the record data.

When writing to the target, the Integration Service applies the changes to the existing target. It saves modified aggregate data in the index and data files to be used as historical data the next time you run the session.

If the source changes significantly and you want the Integration Service to continue saving aggregate data for future incremental changes, configure the Integration Service to overwrite existing aggregate data with new aggregate data.

Each subsequent time you run a session with incremental aggregation, the Integration Service creates a backup of the incremental aggregation files. The cache directory for the Aggregator transformation must contain enough disk space for two sets of the files.

When you partition a session that uses incremental aggregation, the Integration Service creates one set of cache files for each partition.

The Integration Service creates new aggregate data, instead of using historical data, when you perform one of the following tasks:

  • Save a new version of the mapping.
  • Configure the session to reinitialize the aggregate cache.
  • Move the aggregate files without correcting the configured path or directory for the files in the session properties.
  • Change the configured path or directory for the aggregate files without moving the files to the new location.
  • Delete cache files.
  • Decrease the number of partitions.

When the Integration Service rebuilds incremental aggregation files, the data in the previous files is lost.

Note: To protect the incremental aggregation files from file corruption or disk failure, periodically back up the files.

Preparing for Incremental Aggregation:

When you use incremental aggregation, you need to configure both mapping and session properties:

  • Implement mapping logic or filter to remove pre-existing data.
  • Configure the session for incremental aggregation and verify that the file directory has enough disk space for the aggregate files.

Configuring the Mapping

Before enabling incremental aggregation, you must capture changes in source data. You can use a Filter or Stored Procedure transformation in the mapping to remove pre-existing source data during a session.

Configuring the Session

Use the following guidelines when you configure the session for incremental aggregation:

(i) Verify the location where you want to store the aggregate files.

  • The index and data files grow in proportion to the source data. Be sure the cache directory has enough disk space to store historical data for the session.
  • When you run multiple sessions with incremental aggregation, decide where you want the files stored. Then, enter the appropriate directory for the process variable, $PMCacheDir, in the Workflow Manager. You can enter session-specific directories for the index and data files. However, by using the process variable for all sessions using incremental aggregation, you can easily change the cache directory when necessary by changing $PMCacheDir.
  • Changing the cache directory without moving the files causes the Integration Service to reinitialize the aggregate cache and gather new aggregate data.
  • In a grid, Integration Services rebuild incremental aggregation files they cannot find. When an Integration Service rebuilds incremental aggregation files, it loses aggregate history.

(ii) Verify the incremental aggregation settings in the session properties.

  • You can configure the session for incremental aggregation in the Performance settings on the Properties tab.
  • You can also configure the session to reinitialize the aggregate cache. If you choose to reinitialize the cache, the Workflow Manager displays a warning indicating the Integration Service overwrites the existing cache and a reminder to clear this option after running the session.

Refer : http://informaticatutorials-naveen.blogspot.com/2011/04/incremental-aggregation_20.html

Suppose you have 10 flat files with the same structure. All the flat files have the same number of columns and data types. Now we need to load all 10 files into the same target.

The files are named EMP1, EMP2, and so on.

Solution 1:

1. Import one flat file definition and build the mapping as needed.

2. In the session, give the Source Filename and Source File Directory of one file.

3. Create the workflow and run it.

4. After the workflow completes, open the session again. Change the filename and directory to point to the second file. Run the workflow again.

5. Do the above for all 10 files.

Solution 2:

1. Import one flat file definition and build the mapping as needed.

2. In the session, give the Source File Directory of the files.

3. In the Source Filename field, use $InputFileName. This is a session parameter.

4. Now make a parameter file and give the value of $InputFileName (a fuller sketch of the parameter file appears after step 7):

$InputFileName=EMP1.txt

5. Run the workflow.

6. Now edit the parameter file and give the value of the second file. Run the workflow again.

7. Do the same for the remaining files.
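
A fuller sketch of the parameter file from step 4; the folder, workflow, and session names in the header are hypothetical:

[MyFolder.WF:wf_load_emp.ST:s_m_load_emp]
$InputFileName=EMP1.txt

Before the second run, change the value to EMP2.txt, and so on for the remaining files.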

Solution 3:

1. Import one flat file definition and build the mapping as needed.

2. Now make a text file (for example, in Notepad) that contains the location and name of each of the 10 flat files.

Sample:

D:\EMP1.txt

E:\EMP2.txt

E:\FILES\DWH\EMP3.txt and so on

3. Now make a session and, in the Source Filename and Source File Directory fields, give the name and location of the file created above.

4. In the Source Filetype field, select Indirect.

5. Click Apply.

6. Validate the session.

7. Make the workflow, save it to the repository, and run it.

Refer : http://informaticatutorials-naveen.blogspot.com/2011/04/indirect-loading-for-flat-files.html