Advanced Topics > Setting Column Attributes

In DataHub, you can define specific attributes for columns in result tables to improve database performance and enforce data integrity. Column attributes such as primary keys, indexes, and unique constraints can be set using a special syntax within column definitions.

Benefits of Setting Column Attributes

Defining column attributes provides several advantages:

•Performance Improvement: Indexes and primary keys can dramatically increase query performance, particularly for large datasets.

•Data Integrity: Primary keys and unique constraints ensure data consistency and prevent duplicate entries.

•Automated Identification: Auto-increment attributes automatically generate unique values for new records.

Supported Column Attributes

DataHub supports the following column attributes:

•k or p - Sets the column as a primary key for the table

•u - Marks column values as unique per row

•i - Adds an index to the column

•a - Marks the column to use auto-increment

Attributes can be combined. For example, |ki sets both a primary key and an index on the column.

Syntax for Column Attributes

Column attributes are specified by appending them to the column type definition using the following format:

columnName [dataType|attributes]

Where:

•columnName is the name of the column

•dataType is the SQL data type (such as nvarchar(200), int, datetime)

•attributes is one or more attribute characters (k, u, i, or a)

DataHub automatically parses these attributes and applies the appropriate column properties.

Setting Column Attributes in Different Contexts

Column attributes can be defined in multiple DataHub components. The following sections demonstrate how to set column attributes in transformations, SQL tasks, and CSV imports.

Setting Attributes in Transformations

In transformation steps, you can define column attributes by adding them after the column type specification. This ensures that the output table is created with the correct column properties.

Example:

To set a primary key on the device_key column, append |k after the column type in the transformation configuration. You can also combine multiple attributes, such as |ki, which sets both a primary key and an index on the column.

Setting Attributes in SQL Tasks

When using SQL tasks, column attributes are defined within the column alias. Use the AS keyword to specify the column name, data type, and attributes.

Microsoft SQL Server Example:

SELECT TOP (10) [bios]

,[cpu_chip_count]

,[cpu_core_count]

,[device_key] AS "device_key [nvarchar(1024)|k]"

FROM [DataHub_Default].[dbo].[DataTransformation-result_devices]

In this example, the device_key column is defined with the data type nvarchar(1024) and the k attribute to mark it as the primary key.

Setting Attributes in CSV Imports

When importing data from CSV files, you can specify column attributes directly in the CSV header row. Modify the column definition to include the data type and attributes.

Example CSV Header:

"id[nvarchar(200)|k]","userId","deviceName","managedDeviceOwnerType"

In this example, the id column is defined with the data type nvarchar(200) and the k attribute to mark it as the primary key. All other columns without attributes will use default settings.

Setting Attributes in PowerShell and Python Scripts

PowerShell and Python scripts can also define column attributes by including the type and attributes within the result column names. DataHub automatically parses these attributes when processing the results.

Example Column Names in Script Results:

"id[nvarchar(200)|k]", "userId", "deviceName"

Ensure that the result column names follow the format columnName[dataType|attributes] so that DataHub correctly interprets and applies the column properties.

Best Practises

•Always Define Primary Keys: Setting a primary key on your result tables ensures data integrity and optimises query performance.

•Use Indexes Strategically: Add indexes to columns frequently used in WHERE clauses, JOIN conditions, or ORDER BY statements to improve query speed.

•Combine Attributes When Necessary: You can combine multiple attributes on a single column, such as |ki, to set both a primary key and an index.

•Be Consistent: Apply the same column attribute conventions across all tasks, transformations, and imports for maintainability and clarity.