Joining

<< Click to Display Table of Contents >>

ETL > 14.0 > Implementation Guide > Tutorial and implementation guide > Steps 

Joining

Joining is an operation which takes two or more tables and combines them to a single on. Combination of rows is performed using a special column or set of columns, called joining keys. They should be locally unique to make actual sense.

Since two tables may but do not have to have all the same rows and keys, conflicts may arise:

Left table may have some unmatched rows in the right table

The right table may have some unmatched rows in the left table

clip0024

There are three basic strategies which dictate what to do in this case:

Outer joining
Will combine all matching rows, and for all unmatched it will take them as they-are, using NULL values are default for unmatched values. Typical use case: list of devices scanned by system A and system B may have a lot of entries in common, but some devices may be present only in system A and some devices only in system B. Outer join will ensure that all devices are present in the target table, even unmatched devices from system A and unmatched devices from system B.

Left joining
Will treat the left column as a master table and write all rows which were matched between left and right + any unmatched rows from the left table. Unmatched rows from the right column will be discarded. A typical example is lookup, in which the left table is the list of customers and the right table is the list of addresses. In the result table, we are interested in all results from the left table (customers) with as much extra information as possible from the address table, but we do not want to include unmatched addresses.

Inner joining
Will write only rows which exist in both tables. Unmatched rows from the left table and the right table will be ignored. Typical example is where the left table contains names and the right tables surnames. The output table should have name and the surname, we do not want partial matches with only name or only surname.

Join step requires a bit of extra information about the key used to join (ID). This key must be present in all joined tables, otherwise an exception will be thrown.

You may also join more tables at once (within the same strategy).

Joined tables are by default optional, meaning that the tables do not have to be existing. If any table is missing it will not be joined. If only a single table exists, then it is taken as-is and returned as the output. If all tables are missing then the step will not be executed and its target table will not be written. There is a way to define required tables, in which case in case of a missing source table the step will fail and report an error. More information about setting up required steps can be found in the following chapter: Optional and required tables.