Configuration

The following subsections describe all available settings of a drift detection job. Some parameters overlap with the configuration of other jobs, e.g. TIM Detect's KPI-driven anomaly detection.

The settings are divided into two main parts:

configuration - mathematical configuration used to build the model
data - data selection and preprocessing configuration

Mathematical configuration

Configuration parameter	build-model	detect	default
Threshold	☑	☐	0.01

☑ available in a given method
☐ not available in a given method

Threshold

This parameter sets the threshold that indicates the presence of data drift. If the Jensen-Shannon divergence between the reference and test data exceeds the threshold, data drift is considered to have occurred. The default value is 0.01; adjust it to your use case and the sensitivity required to detect drift. A higher threshold makes the detection less sensitive.

"configuration": {
    "threshold": 0.1
}
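
To illustrate how a threshold on the Jensen-Shannon divergence behaves, here is a minimal Python sketch. It is not TIM's implementation: the binning into a shared histogram, the base-2 logarithm, and the helper names `histogram`, `js_divergence` and `has_drift` are all assumptions made for this example.

```python
import math


def histogram(values, edges):
    """Relative frequencies of `values` over shared bin edges (last bin is closed)."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    total = sum(counts)
    return [c / total for c in counts]


def js_divergence(p, q):
    """Jensen-Shannon divergence of two discrete distributions, using base-2 logs."""
    def kl(a, b):
        # Terms with a zero probability in `a` contribute nothing.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def has_drift(reference, test, edges, threshold=0.01):
    """Hypothetical helper: flag drift when the divergence exceeds the threshold."""
    return js_divergence(histogram(reference, edges), histogram(test, edges)) > threshold


# Illustrative data only: the test values are shifted upwards.
reference = [1.0, 1.5, 2.0, 2.5, 3.0]
test = [3.0, 3.5, 4.0, 4.5, 5.0]
print(has_drift(reference, test, edges=[0, 2, 4, 6], threshold=0.1))
```

With base-2 logarithms the divergence lies between 0 (identical distributions) and 1 (no overlap), which gives a sense of scale for values such as 0.01 or 0.1.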

Data configuration

Configuration parameter	build-model	detect	default
Version	☑	☑	Last version of the dataset
Reference rows	☑	☐	All rows which are not test rows / reference rows from model
Test rows	☑	☐	Required - no default value
Rows	☐	☑	All rows
Columns	☑	☐	All columns / columns from model
Time scale	☑	☐	Originally estimated from dataset

☑ available in a given method
☐ not available in a given method

Version

This setting specifies the id of the dataset version that should be used for drift detection. If not specified, the last valid (successfully updated) version of the dataset will be used.

"version": {
    "id": "afdbb647-22cf-4576-8b82-4b71d4a10e5f"
}

Reference rows

This setting defines which samples should be used as reference rows in build-model jobs. The rows can be specified as an array of timestamp ranges. If not set, all timestamps except those defined in the test rows will be used.

"referenceRows": [
    {
        "from": "2009-06-01 00:00:00",
        "to": "2009-06-10 23:00:00"
    },
    {
        "from": "2009-05-01 00:00:00",
        "to": "2009-05-10 23:00:00"  
    }
]

Alternatively, a relative notation can be used, expressed as an integer n with its base unit (one of Day, Hour, Minute, Second and Sample). This defines the length of the time range. The type of the relative range defines the starting point and the direction in which the range is calculated: Last starts from the last non-missing observation (the newest observation) in the dataset and goes backwards, while First starts from the first non-missing observation (the oldest observation) and goes forward. If no type is specified, the default is Last.

"referenceRows": {
    "type": "Last",
    "baseUnit": "Day",
    "value": 2
}
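
The relative notation can be sketched in Python as follows. This is only an illustration of the idea: the function name `resolve_relative_range` is hypothetical, and the half-open treatment of the range boundary is an assumption, since the exact boundary handling is not specified here.

```python
from datetime import datetime, timedelta

# Seconds per time-based base unit; "Sample" is handled separately below.
UNIT_SECONDS = {"Day": 86400, "Hour": 3600, "Minute": 60, "Second": 1}


def resolve_relative_range(timestamps, value, base_unit, range_type="Last"):
    """Select timestamps covered by a relative range.

    `timestamps` must be sorted ascending and contain non-missing observations only.
    """
    if value <= 0:
        raise ValueError("value must be a positive integer")
    if range_type not in ("First", "Last"):
        raise ValueError("type must be 'First' or 'Last'")
    if base_unit == "Sample":
        # Count observations instead of measuring elapsed time.
        return timestamps[-value:] if range_type == "Last" else timestamps[:value]
    if base_unit not in UNIT_SECONDS:
        raise ValueError(f"unsupported baseUnit: {base_unit}")

    span = timedelta(seconds=value * UNIT_SECONDS[base_unit])
    if range_type == "Last":
        start = timestamps[-1] - span  # assumption: start boundary is exclusive
        return [t for t in timestamps if t > start]
    end = timestamps[0] + span  # assumption: end boundary is exclusive
    return [t for t in timestamps if t < end]


# Three days of hourly observations (illustrative data only).
hourly = [datetime(2009, 6, 1) + timedelta(hours=h) for h in range(72)]
last_two_days = resolve_relative_range(hourly, 2, "Day")
print(last_two_days[0], last_two_days[-1], len(last_two_days))
```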

If there is an intersection of the referenceRows with the testRows, observations in the intersection will be considered as follows:

by default, observations in the intersection are treated as test rows,
when testRows are defined as a relative range starting from the first timestamp (type First), observations in the intersection are treated as reference rows; the reasoning is that, for testing, data towards the end of the dataset is more relevant.
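
The overlap rules above can be sketched with plain Python sets. The function name `split_rows` and the `test_type` argument are hypothetical; they only mirror the rule as described, not TIM's internal logic.

```python
def split_rows(reference, test, test_type=None):
    """Resolve overlapping reference and test rows.

    `reference` and `test` are sets of timestamps (any hashable values).
    `test_type` is "First" or "Last" for relative test ranges, or None
    when the test rows are given as absolute timestamp ranges.
    """
    overlap = reference & test
    if test_type == "First":
        # Relative test range anchored at the start: overlap stays in reference.
        return reference, test - overlap
    # Default: overlap is treated as test rows.
    return reference - overlap, test


# Rows 4 and 5 are claimed by both selections (illustrative data only).
ref, tst = split_rows({1, 2, 3, 4, 5}, {4, 5, 6, 7})
print(sorted(ref), sorted(tst))
```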

This setting does not exist for detect jobs. In that case, reference rows are taken from the model.

Test rows

This setting defines which samples should be tested for drift in build-model jobs.

There are two ways to configure the test rows:

as an array of timestamp ranges:

"testRows": [
    {
        "from": "2020-06-01 00:00:00",
        "to": "2020-06-10 23:00:00"
    },
    {
        "from": "2020-05-01 00:00:00",
        "to": "2020-05-10 23:00:00"  
    }
]

as a relative range, given by an integer n with a base unit (one of Day, Hour, Minute, Second and Sample) that defines the length of the range, and a type (First or Last) that defines its starting point and direction: the range is calculated from the first or last non-missing target observation, respectively. The default type is Last.

"testRows": {
    "type": "Last",
    "baseUnit": "Day",
    "value": 2
}

If there is an intersection of the referenceRows with the testRows, observations in the intersection will be considered as follows:

by default, observations in the intersection are treated as testRows,
when testRows are defined as a relative range starting from the first timestamp (type First), observations in the intersection are treated as referenceRows; the reasoning is that, for out-of-sample validation, data towards the end of the dataset is more relevant.

Rows

This setting defines which samples should be tested for drift in detect jobs.

There are two ways to configure the rows:

as an array of timestamp ranges:

"rows": [
    {
        "from": "2020-06-01 00:00:00",
        "to": "2020-06-10 23:00:00"
    },
    {
        "from": "2020-05-01 00:00:00",
        "to": "2020-05-10 23:00:00"  
    }
]

as a relative range, given by an integer n with a base unit (one of Day, Hour, Minute, Second and Sample) that defines the length of the range, and a type (First or Last) that defines its starting point and direction: the range is calculated from the first or last non-missing target observation, respectively. The default type is Last.

"rows": {
    "type": "Last",
    "baseUnit": "Day",
    "value": 2
}

Columns

This setting lists all columns (given either by their names or numbers) that should be used for model building. If not provided, TIM will use all available columns. For detect jobs, columns are determined from the model.

"columns": [5, "y"]

Time scale

This setting determines the rescaling of the original dataset to another sampling period. The baseUnit of the rescaling is limited to one of Day, Hour, Minute or Second. If not set, the originally estimated sampling period will be used. Time scaling only works from shorter sampling periods to longer ones (for example, from hourly to daily data), and does not work for data sampled monthly.

"timeScale": {
  "baseUnit": "Day",
  "value": 2
}
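
As a rough illustration of how the settings on this page fit together, the sketch below assembles a build-model configuration in Python. The helper `build_model_payload` is hypothetical, and nesting the data settings under a top-level "data" key is an assumption based on the two-part split described at the top of this page; check the API reference for the exact request shape.

```python
import json

# Base units accepted by timeScale according to this page.
TIME_SCALE_UNITS = {"Day", "Hour", "Minute", "Second"}


def build_model_payload(test_rows, threshold=0.01, reference_rows=None,
                        columns=None, time_scale=None, version_id=None):
    """Assemble a build-model configuration; optional settings are left out if unset."""
    if test_rows is None:
        raise ValueError("testRows is required for build-model jobs")
    if time_scale is not None and time_scale.get("baseUnit") not in TIME_SCALE_UNITS:
        raise ValueError("timeScale baseUnit must be Day, Hour, Minute or Second")

    data = {"testRows": test_rows}
    if version_id is not None:
        data["version"] = {"id": version_id}
    if reference_rows is not None:
        data["referenceRows"] = reference_rows
    if columns is not None:
        data["columns"] = columns
    if time_scale is not None:
        data["timeScale"] = time_scale

    # Assumption: data settings sit under a "data" key next to "configuration".
    return {"configuration": {"threshold": threshold}, "data": data}


payload = build_model_payload(
    test_rows={"type": "Last", "baseUnit": "Day", "value": 2},
    threshold=0.1,
    columns=[5, "y"],
    time_scale={"baseUnit": "Day", "value": 2},
)
print(json.dumps(payload, indent=4))
```

Leaving unset options out of the payload lets the documented defaults apply, such as using all non-test rows as reference rows.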
