Regular expression match

<< Click to Display Table of Contents >>

ETL > 14.0 > Implementation Guide > Tutorial and implementation guide > Steps > Mapping 

Regular expression match

papercliper

Note:

This column mapping type is available in ETL module version 1.1.271 and newer.

 

The value can be checked against a Regular Expression mask. This mapping returns 1 (true) if the given column matches the expression, or 0 (false) if it does not match the expression.

To define a Regular Expression match, an object notation in the following format is required:

{

    "columns": {

        "target_column_name": 

        {

            "column": "<name-of-source-column>",

            "regex": "<regular-expression>"

        },
        [...]            

    }

}

 
Where <regex> is a valid Regular Expression pattern. There are no more properties required or defined by regex-columns.

 

 

papercliper

Note:

More information about Regular Expressions and valid patterns can be found in internet. The following website is a great resource for both basic and advanced concepts:
https://www.regular-expressions.info/

 

Special characters and escaping sequences in Regular Expression patterns

Regular expressions use \ (backslash) as an escape sequence. Bear in mind the same escape sequence is used by JSON schema. This means, that when escaping a regular expression inside a JSON file, it is necessary to perform "double-escaping".

For example, to test whether the name consists of letters, followed by a dot, followed by numbers, the following Regular Expression can be used:

^[a-zA-Z]+\.[0-9]+$

Explanation:

^ means this is the beginning of the string

$ means the string ends here

[a-zA-Z]+ - one or more instances of characters from range a-z or A-Z.

[0-9]+ - one or more digit

\. - literal dot. Dot has a special meaning in Regular Expressions (any character), and to use it literally it must be escaped with \ (backslash).

However, the backslash in the regular expression must be escaped once more when used in JSON file, so that it is not interpreted as escape JSON sequence. The following JSON would be valid:

"regex": "^[a-zA-Z]+\\.[0-9]+$"

Example

Given the following table SourceTable:

Name

Mail

Marcin

marcin@raynet.de

John

john@raynet.ch

Kate

kate@contoso.com

 
And the following JSON step definition:

    {

      "id": 1,

      "type""map",

      "name""Example of auto-columns",

      "source""SourceTable",

      "columns": {

        "Name""Name",

        "WorksInRaynet": { 

          "column""Mail",

          "regex""@raynet\\.(de|ch|com)$// this will check if the Mail column ends with @raynet.de, or @raynet.ch or @raynet.com.

        }

      },

      "target""TargetTable_Regex"

    }

 

The following result table TargetTable_Regex is expected:

Name

WorksInRaynet

Marcin

1

John

1

Kate

0

 

Fallback values

It is possible to define a custom value returned, if the source column does not exist in the table. For more information about the usage of fallback values, see the Fallback columns chapter.