Multiple columns

<< Click to Display Table of Contents >>

ETL > 14.0 u1 > Implementation Guide > Tutorial and implementation guide > Steps > Mapping 

Multiple columns

It is possible to calculate the value of a column based on the two or more other columns.

The full syntax for the multiple column mapping is

    "target-column-name": {

      "columns": ["required-source-column-name-1", "required-source-column-name-2"...],

      "aggregation": string|object

    }

 

or

    "target-column-name": {

      "columns": ["required-source-column-name-1", "required-source-column-name-2"...],

      "aggregation": string|object

      "type""sql-type"

    }

 
When the property aggregation is omitted, the default concatenation with pipe (|) as a a separator will be used.

Aggregation may be defined either as a string or as an object. For more information about the syntax and supported aggregation methods, see chapter Defining aggregations.

papercliper

Note:

Multiple column mapping support basic aggregations, like coalescing, selecting average/minimum/maximum values or concatenating the strings. You can use more complex values and cover more complicated cases by using custom mapping via SQL.

 

In this example, value from column2, column3 and column4 will be concatenated using | as the separator. This is the default behavior should no other options be specified. You can instruct ETL layer to use specific algorithm by using object notation:

{

    "steps": [

        {

            "id": 2,

            "name": "Normalize sample",

            "type": "map",

            "source": "sample2",

            "target": "sample2_Normalized",

            "columns": {

                "column1": "column1",

                "column2_together":{
                    "columns": ["column2", "column3", "column4"],
                    "aggregation": "coalesce"

                }                

            }

        }
    ]

}

 

This definition will select first non-null value from the list, starting from the left to right. To concatenate strings, replace coalesce with concat. If you need a custom separator, use again an object notation for aggregation and specify it there.

papercliper

Note:

Bear in mind that null values are omitted from the result of CONCAT. This is a handy feature for table join, where you want to track the source table. Join aggregation rules are the same as column rules for mapping.

Example

Given the following table SourceValues:

col1

col2

col3

col4

Row1

1

<null>

<null>

Row2

2

2

<null>

Row3

3

3

4

Row4

<null>

4

5

Row5

<null>

<null>

6

Row6

4

<null>

7

 
And the following JSON step definition:

    {

      "id": 2,

      "name""Example of column aggregation",

      "type""map",

      "source""SourceValues",

      "target""SourceValues_Aggregated",

      "columns": {

        "col1""col1",

        "coalesce": {

          "columns": [ "col2""col3""col4" ],

          "aggregation""coalesce"

        },

        "max": {

          "columns": [ "col2""col3""col4" ],

          "aggregation""max"

        },

        "concat_default": {

          "columns": [ "col2""col3""col4" ]

        },

        "concat_custom": {

          "columns": [ "col2""col4""col3" ],

          "aggregation": {

            "type""concat",

            "separator"","

          }

        }

      }

    }

 

The following result table SourceValues_Aggregated is expected:

col1

coalesce

max

concat_default

concat_custom

Row1

1

1

1

1

Row2

2

2

2 | 2

2,2

Row3

3

4

3 | 3 | 4

3,4,3

Row4

4

5

4 | 5

5,4

Row5

6

6

6

6

Row6

4

7

4 | 7

4,7

Arbitrary types, casting and converting

Using the type property lets you define a custom cast or custom type length. This common property is available for all column types, see more information in chapter Arbitrary column types.