Informatica Reference

Archive for the ‘DataWareHousing Basics’ Category

There are many tasks available , which are used to create workflows and worklets.

Types of Tasks 

Task Name
Reusable
Description
Session
Yes
Set of instructions to run a mapping
Command
 Yes
Specifies shell commands to run during the workflow. You can choose to run the Command task only if the previous task in the workflow completes.
Email
Yes
Sends email during the workflow.
Control
No
Stops or aborts the workflow.
Decision
No
Specifies a condition to evaluate in the workflow. Use the Decision task to create branches in a workflow.
Event-Raise
No
Represents the location of a user-defined event. The Event-Raise task triggers the user-defined event when the PowerCenter Server runs the Event-Raise task.
Event-Wait
No
Waits for a user-defined or a pre-defined event to occur. Once the event occurs, the PowerCenter Server completes the rest of the workflow.
Timer
No
Waits for a specified period of time to run the next task.
  • In star schema, If the dimension tables are spitted into one or more dimension tables
  • The de-normalized dimension tables are spitted into a normalized dimension table
  1. It increases the number of joins and poor performance in retrieval of data.
  2. In few organizations, they try to normalize the dimension tables to save space.
  3. Since dimension tables hold less space snow flake schema approach may be avoided.
  4. Bit map indexes can not be effectively utilized

Important aspects of Star Schema & Snow Flake Schema

  1. In a star schema every dimension will have a primary key.
  2. In a star schema, a dimension table will not have any parent table.
  3. Whereas in a snow flake schema, a dimension table will have one or more parent tables.
  4. Hierarchies for the dimensions are stored in the dimensional table itself in star schema.
  5. Whereas hierarchies are broken into separate tables in snow flake schema. These hierarchies help to drill down the data from topmost hierarchies to the lowermost hierarchies.
  1. Conceptual Data Modeling
  2. Logical Data Modeling
  3. Physical Data Modeling
  4. Dimensional Data Modeling

Conceptual Data Modeling

  • Conceptual data model includes all major entities and relationships and does not contain much detailed level of information about attributes and is often used in the INITIAL PLANNING PHASE
  • Conceptual data model is created by gathering business requirements from various sources like business documents, discussion with functional teams, business analysts, smart management experts and end users who do the reporting on the database. Data modelers create conceptual data model and forward that model to functional team for their review.
  • Conceptual data modeling gives an idea to the functional and technical team about how business requirements would be projected in the logical data model.

Logical Data Modeling

  • This is the actual implementation and extension of a conceptual data model. Logical data model includes all required entities, attributes, key groups, and relationships that represent business information and define business rules.

Physical Data Modeling

  • Physical data model includes all required tables, columns, relationships, database properties for the physical implementation of databases. Database performance, indexing strategy, physical storage and demoralization are important parameters of a physical model.

Dimensional Data Modeling

  • Dimension model consists of fact and dimension tables
  • It is an approach to develop the schema DB designs

Normalization is the process of efficiently organizing data in a database. There are two goals of the normalizaton process:

Eliminating redundant data.
Ensuring data dependencies.

First Normal Form

First Normal form(1 NF) sets the very basic rules for an organized database.

  • Eliminate duplicative columns from the same table
  • Create seperate tables for each group of related data and identify each row with a unique column or set of columns(the promary key)

Second Normal Form

Second Normal form(2 NF) further addresses the concept of removing duplicative data.

  • Meet all the requirements of teh first normal form.
  • Remove subsets of data that apply to multiple rows of a table and place them in seperate tables.
  • Create relationships between these new tables and their predecessors through the use of foreign keys.

Third Normal Form

Third Normal form(3 NF) remove columns which are not dependent upon the primary key.

  • Meet all the requirements of the second normal form.
  • Remove columns that are not dependent upon the primary key.
  1. Confirmed Dimensions are nothing but Reusable Dimensions.
  2. The dimensions which u r using multiple times or in multiple data marts.
  3. Those are common in different data marts.

Conformed dimensions are either identical or strict mathematical subsets of the most granular, detailed dimension. Dimension tables are not conformed if the attributes are labeled differently or contain different values. Conformed dimensions come in several different flavors. At the most basic level, conformed dimensions mean the exact same thing with every possible fact table to which they are joined. The date dimension table connected to the sales facts is identical to the date dimension connected to the inventory facts.

A “junk” dimension is a collection of random transactional codes, flags and/or text attributes that are unrelated to any particular dimension. The junk dimension is simply a structure that provides a convenient place to store the junk attributes. where as A degenerate dimension is data that is dimensional in nature but stored in a fact table.

Degenerative Dimension is something dimensional in nature but exist fact table.(Invoice No)

A column of the key section of the fact table that does not have the associated dimension table but used for reporting and analysis, such column is called Degenerate dimension or Line item dimension.

For ex, we have a fact table with customer_id, product_id, branch_id, employee_id, bill_no, and date in key section and price, quantity, amount in measure section. In this fact table, bill_no from key section is a single value; it has no associated dimension table. Instead of creating a Separate dimension table for that single value, we can Include it in fact table to improve performance. SO here the column, bill_no is a degenerate dimension or line item dimension.