Chaining steps


Frequently, a step is just an intermediate operation that feeds another step. For example, given two raw tables, before we can join them we need to make sure that:

Both tables have normalized columns, including the join keys, which must have the same names

The values in these columns are normalized and comparable

Only the columns that we need are kept

 

In this typical scenario, the user first introduces a mapping step for each table (to prepare its data), and then the actual join step, which uses the prepared tables. In the workflow editor, this is represented by the following schema:

[Image clip0041: workflow editor schema of the chained steps]

 

You can define as many steps as required and tell the ETL engine how to link (chain) them together. Steps are not executed in the order in which they appear in the JSON file, but in the order determined by their dependencies and links.
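The dependency-driven ordering described above can be sketched in Python. This is only an illustration of the idea, not the ETL engine's actual implementation: the functions `step_dependencies` and `execution_order` and the table name `inventory` are hypothetical, and the sketch assumes that numeric sources refer to step IDs while strings refer to source tables. It relies on `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter


def step_dependencies(step):
    """Return the IDs of the steps that this step reads from."""
    # A step may declare a single "source" or a list of "sources".
    if "sources" in step:
        refs = step["sources"]
    elif "source" in step:
        refs = [step["source"]]
    else:
        refs = []
    # Assumption: integers refer to other steps, strings to source tables.
    return {ref for ref in refs if isinstance(ref, int)}


def execution_order(steps):
    """Order step IDs so that every step runs after the steps it depends on."""
    graph = {step["id"]: step_dependencies(step) for step in steps}
    return list(TopologicalSorter(graph).static_order())


# The join step is listed first on purpose: file order does not matter.
steps = [
    {"id": 3, "type": "join", "sources": [1, 2]},
    {"id": 2, "type": "map", "source": "sample"},
    {"id": 1, "type": "map", "source": "inventory"},  # hypothetical table name
]
order = execution_order(steps)
print(order)
```

Whatever the order of the steps in the list, step 3 is placed after steps 1 and 2 because it depends on both. In this sketch, a circular chain (for example, step 1 reading from step 2 and step 2 reading from step 1) would make `static_order()` raise `graphlib.CycleError`.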

 

Previously, we defined the source of a step as the name of a source table, for example:

    "steps": [
        {
            "id": 2,
            "name": "Normalize Sample",
            "type": "map",
            "source": "sample",
            [...]

 

To use another step as the source, write that step's ID instead:

    "steps": [
        {
            "id": 2,
            "name": "Normalize Sample",
            "type": "map",
            "source": 1,
            [...]

 

In this example, the output of step 1 is used as the input for step 2.

The same principle applies to join steps:

{
    "id": 3,
    "type": "join",
    "name": "join prepared tables",
    "sources": [1, 2],
    "on": ["Computer"],
    "strategy": "outer",
    "target": "joined_os_tables"
}
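To make the effect of such a join step concrete, here is a minimal Python sketch of an outer join on the `Computer` column. It assumes that `"strategy": "outer"` means a full outer join, which this guide does not spell out. The function `outer_join` and the sample rows are hypothetical, and the sketch says nothing about how the ETL engine actually performs the join.

```python
def outer_join(left, right, key):
    """Full outer join of two lists of row dicts on a single key column."""
    # Index the right-hand rows by their key value.
    right_by_key = {}
    for row in right:
        right_by_key.setdefault(row[key], []).append(row)

    joined = []
    matched = set()
    for left_row in left:
        matches = right_by_key.get(left_row[key])
        if matches:
            matched.add(left_row[key])
            for right_row in matches:
                # Merge the columns of both rows into one output row.
                joined.append({**left_row, **right_row})
        else:
            # No partner on the right: keep the left row as it is.
            joined.append(dict(left_row))

    # Keep right-hand rows that never matched a left-hand row.
    for value, rows in right_by_key.items():
        if value not in matched:
            joined.extend(dict(row) for row in rows)
    return joined


# Hypothetical outputs of steps 1 and 2, already normalized to share "Computer".
step1_rows = [
    {"Computer": "PC-01", "OS": "Windows"},
    {"Computer": "PC-02", "OS": "Linux"},
]
step2_rows = [
    {"Computer": "PC-02", "Owner": "alice"},
    {"Computer": "PC-03", "Owner": "bob"},
]
rows = outer_join(step1_rows, step2_rows, "Computer")
```

The result has three rows: `PC-02` carries columns from both inputs, while `PC-01` and `PC-03` keep only the columns of the side they came from. The sketch does not fill missing columns with empty values. It also shows why the preparatory mapping steps matter: the join only works because both inputs use the same key name and comparable key values.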

 

You can also combine steps and source tables in a single join. Because the sources array is expected to contain either only strings (table names) or only numbers (step IDs), mixing the two requires a special object syntax. To combine the tables abc and def with the results of steps 1 and 2, use the following:

{
    "id": 3,
    "type": "join",
    "name": "join prepared tables",
    "sources": [
        {
            "table": "abc"
        },
        {
            "table": "def"
        },
        {
            "step": 1
        },
        {
            "step": 2
        }
    ],
    "on": ["Computer"],
    "strategy": "outer",
    "target": "joined_os_tables"
}
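The three ways of writing a source (a table name, a step ID, or an object with a `table` or `step` key) can be reduced to one uniform form. The Python sketch below shows one possible way to do that. The function `normalize_source` is hypothetical and is not part of the ETL engine; it only mirrors the syntax shown in this section.

```python
def normalize_source(entry):
    """Classify one source entry as a ("table", name) or ("step", id) pair."""
    if isinstance(entry, str):
        # Short form: a string names a source table.
        return ("table", entry)
    if isinstance(entry, int) and not isinstance(entry, bool):
        # Short form: a number is the ID of another step.
        return ("step", entry)
    if isinstance(entry, dict) and len(entry) == 1:
        # Object form, used when tables and steps are mixed in one array.
        if "table" in entry:
            return ("table", entry["table"])
        if "step" in entry:
            return ("step", entry["step"])
    raise ValueError(f"unsupported source entry: {entry!r}")


# The mixed "sources" array from the join step above.
sources = [{"table": "abc"}, {"table": "def"}, {"step": 1}, {"step": 2}]
resolved = [normalize_source(entry) for entry in sources]
print(resolved)
```

With this kind of normalization, the short forms `"sample"` and `1` and the object forms `{"table": "sample"}` and `{"step": 1}` resolve to the same pairs, so the rest of a program only has to handle one representation.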