JOINER TRANSFORMATION
- Connected and Active Transformation
- Used to join source data from two related heterogeneous sources residing in Different locations or file systems. Or, we can join data from the same source.
- If we need to join 3 tables, then we need 2 Joiner Transformations.
- The Joiner transformation joins two sources with at least one matching port. The Joiner transformation uses a condition that matches one or more pairs of Ports between the two sources.
Example: To join EMP and
DEPT tables.
- EMP and DEPT will be source table.
- Create a target table JOINER_EXAMPLE in target designer. Table should Contain all ports of EMP table plus DNAME and LOC as shown below.
- Create the shortcuts in your folder.
Creating Mapping:
1. Open folder where we
want to create the mapping.
2. Click Tools ->
Mapping Designer.
3.Click Mapping->
Create-> Give mapping name. Ex: m_joiner_example
4. Drag EMP, DEPT, and
Target. Create Joiner Transformation. Link as shown below.
5. Specify the join
condition in Condition tab. See steps on next page.
6. Set Master in Ports
tab. See steps on next page.
7. Mapping ->
Validate
8. Repository ->
Save.
- Create Session and Workflow as described earlier. Run the Work flow and see the data in target table.
- Make sure to give connection information for all tables.
JOIN CONDITION:
The join condition
contains ports from both input sources that must match for the Power Center
Server to join two rows.
Example: DEPTNO=DEPTNO1 in
above.
1. Edit Joiner
Transformation -> Condition Tab
2. Add condition
- We can add as many conditions as needed.
- Only = operator is allowed.
If we join Char and
Varchar data types, the Power Center Server counts any spaces that pad Char
values as part of the string. So if you try to join the following:
Char (40) = “abcd” and
Varchar (40) = “abcd”
Then the Char value is
“abcd” padded with 36 blank spaces, and the Power Center Server does not join
the two fields because the Char field contains trailing spaces.
Note: The Joiner
transformation does not match null values.
MASTER and DETAIL TABLES
In Joiner, one table is
called as MASTER and other as DETAIL.
- MASTER table is always cached. We can make any table as MASTER.
- Edit Joiner Transformation -> Ports Tab -> Select M for Master table.
Table with less number
of rows should be made MASTER to improve Performance.
Reason:
- When the Power Center Server processes a Joiner transformation, it reads rows from both sources concurrently and builds the index and data cache based on the master rows. So table with fewer rows will be read fast and cache can be made as table with more rows is still being read.
- The fewer unique rows in the master, the fewer iterations of the join comparison occur, which speeds the join process.
JOINER TRANSFORMATION
PROPERTIES TAB
- Case-Sensitive String Comparison: If selected, the Power Center Server uses case-sensitive string comparisons when performing joins on string columns.
- Cache Directory: Specifies the directory used to cache master or detail rows and the index to these rows.
- Join Type: Specifies the type of join: Normal, Master Outer, Detail Outer, or Full Outer.
Tracing Level
Joiner Data Cache
Size
Joiner Index Cache
Size
Sorted Input
JOIN TYPES
In SQL, a join is a
relational operator that combines data from multiple tables into a single
result set. The Joiner transformation acts in much the same manner, except that
tables can originate from different databases or flat files.
Types of Joins:
- Normal
- Master Outer
- Detail Outer
- Full Outer
Note: A normal or master
outer join performs faster than a full outer or detail outer join.
Example: In EMP, we have
employees with DEPTNO 10, 20, 30 and 50. In DEPT, we have DEPTNO 10, 20, 30 and
40. DEPT will be MASTER table as it has less rows.
Normal Join:
With a normal join, the
Power Center Server discards all rows of data from the master and detail source
that do not match, based on the condition.
- All employees of 10, 20 and 30 will be there as only they are matching.
Master Outer Join:
This join keeps all rows
of data from the detail source and the matching rows from the master source. It
discards the unmatched rows from the master source.
- All data of employees of 10, 20 and 30 will be there.
- There will be employees of DEPTNO 50 and corresponding DNAME and LOC Columns will be NULL.
Detail Outer Join:
This join keeps all rows
of data from the master source and the matching rows from the detail source. It
discards the unmatched rows from the detail source.
- All employees of 10, 20 and 30 will be there.
- There will be one record for DEPTNO 40 and corresponding data of EMP columns will be NULL.
Full Outer Join:
A full outer join keeps
all rows of data from both the master and detail sources.
- All data of employees of 10, 20 and 30 will be there.
- There will be employees of DEPTNO 50 and corresponding DNAME and LOC Columns will be NULL.
- There will be one record for DEPTNO 40 and corresponding data of EMP Columns will be NULL.
USING SORTED INPUT
- Use to improve session performance.
- to use sorted input, we must pass data to the Joiner transformation sorted by the ports that are used in Join Condition.
- We check the Sorted Input Option in Properties Tab of the transformation.
- If the option is checked but we are not passing sorted data to the Transformation, then the session fails.
- We can use SORTER to sort data or Source Qualifier in case of Relational tables.
JOINER CACHES
Joiner always caches the
MASTER table. We cannot disable caching. It builds Index cache and Data Cache
based on MASTER table.
1) Joiner Index Cache:
- All Columns of MASTER table used in Join condition are in JOINER INDEX CACHE.
· Example: DEPTNO in our mapping.
2) Joiner Data Cache:
- Master column not in join condition and used for output to other transformation or target table are in Data Cache.
· Example: DNAME and LOC in our
mapping example.
Performance Tuning:
- Perform joins in a database when possible.
- Join sorted data when possible.
- For a sorted Joiner transformation, designate as the master source the source with fewer duplicate key values.
- Joiner can't be used in following conditions:
1. Either input pipeline
contains an Update Strategy transformation.
2. We connect a Sequence
Generator transformation directly before the Joiner transformation
The tutorial is very helpful for a first timer like me, by the way if you could have given a
ReplyDeletetutorial on creating a free blog, it would have been very helpful for newbies like me.
click for info