Deduplicate




Deduplication takes a table as input, groups similar records by one or more columns, and then applies a strategy to each group to decide how to handle the records in it.

 

The deduplication step has the following syntax:

 

{

   "id": 11,
   "name": "Deduplicate table Duplicates, take random",
   "type": "deduplicate",
   "source": "Duplicates",
   "target": "Normalized_Duplicates_Random",
   "by": [ "cn", "dn" ],
   "strategy": "random"

}
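The grouping behavior described above can be illustrated outside the product with a minimal Python sketch. It is not Raynet One Data Hub's implementation: the helper names `group_rows` and `deduplicate_random`, the sample rows, and the `mail` column are hypothetical, and the sketch assumes that the "random" strategy keeps one arbitrarily chosen row per group.

```python
import random
from collections import defaultdict


def group_rows(rows, by):
    """Group rows (dicts) by the tuple of values found in the `by` columns."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[col] for col in by)
        groups[key].append(row)
    return dict(groups)


def deduplicate_random(rows, by, seed=None):
    """Keep one randomly chosen row per group (assumed meaning of "random")."""
    rng = random.Random(seed)
    return [rng.choice(group) for group in group_rows(rows, by).values()]


# Hypothetical sample data; only the column names "cn" and "dn"
# are taken from the step definition above.
duplicates = [
    {"cn": "alice", "dn": "ou=hr", "mail": "alice@example.com"},
    {"cn": "alice", "dn": "ou=hr", "mail": "a.smith@example.com"},
    {"cn": "bob", "dn": "ou=it", "mail": "bob@example.com"},
]

normalized = deduplicate_random(duplicates, by=["cn", "dn"], seed=0)
print(len(normalized))  # one row per distinct (cn, dn) pair
```

Passing a `seed` makes the choice repeatable in tests; whether the product's random strategy is reproducible is not stated in this guide.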

Configure Deduplicate in Raynet One Data Hub


 

SOURCE TABLE: Enter the table to use as the data source. The field supports auto-completion and offers all available tables that match the current input.

GROUP COLUMNS: Enter a comma-separated list of the columns used to group the rows.

STRATEGY: Select the strategy to apply to each group. The following strategies are available:

o None

o Random

o Any

o All

o Max value

o Min value

 


 

When Max value or Min value is selected as STRATEGY, an additional COLUMN field becomes available. In the COLUMN field, select the column whose value the strategy uses.
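As an illustration of how a value-based strategy might behave, the following Python sketch keeps, for each group, the row with the largest or smallest value in the selected column. This is an assumption about the semantics, not documented product behavior: the function `deduplicate_by_value`, the strategy labels `"max"` and `"min"`, and the `last_seen` column are hypothetical names chosen for the example.

```python
from collections import defaultdict


def deduplicate_by_value(rows, by, column, strategy):
    """Keep, per group, the row with the largest or smallest value in `column`.

    `strategy` is a hypothetical label: "max" or "min".
    """
    if strategy not in ("max", "min"):
        raise ValueError(f"unsupported strategy: {strategy!r}")
    pick = max if strategy == "max" else min

    # Group rows by the tuple of values in the `by` columns.
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[col] for col in by)].append(row)

    # Select one row per group by comparing the values in `column`.
    return [pick(group, key=lambda r: r[column]) for group in groups.values()]


# Hypothetical sample data; ISO dates compare correctly as strings.
rows = [
    {"cn": "alice", "dn": "ou=hr", "last_seen": "2024-01-10"},
    {"cn": "alice", "dn": "ou=hr", "last_seen": "2024-03-02"},
    {"cn": "bob", "dn": "ou=it", "last_seen": "2023-12-24"},
]

newest = deduplicate_by_value(rows, by=["cn", "dn"], column="last_seen", strategy="max")
oldest = deduplicate_by_value(rows, by=["cn", "dn"], column="last_seen", strategy="min")
```

With this data, `newest` keeps the 2024-03-02 row for alice and `oldest` keeps the 2024-01-10 row, while bob's single row is kept in both cases.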