Configuration

This part of the documentation covers all the mathematical and data-related configuration options of TIM Detect's anomaly detection with a system-driven approach. Each subsection consists of a description accompanied by its TIM API notation. Note that some configuration parameters overlap with those of TIM Detect's KPI-driven anomaly detection and of TIM Forecasting.

The tables below show which configuration parameters are available in each method.

Domain-specific configuration

Configuration parameter	build-model	rebuild-model*	detect
Sensitivity	☑	☑	☐
Minimum sensitivity	☑	☑	☐
Maximum sensitivity	☑	☑	☐
Anomaly indicator window	☑	☑	☐

☑ available in a given method
☐ not available in a given method
* method is not implemented yet

Sensitivity

The sensitivity is a percentage defining the underlying model's sensitivity to anomalies. In general, the higher the sensitivity, the more anomalies are detected. If the parameter is not specified, TIM finds the sensitivity automatically. Read more about sensitivity.

"domainSpecifics": {
  "sensitivity": 0.5
}

Instead of setting an exact sensitivity, TIM Detect also supports setting a range in the form of a minimum sensitivity and a maximum sensitivity.

Minimum sensitivity

The minimum sensitivity setting is a percentage that defines the lower limit for the sensitivity. This setting is only meaningful when the sensitivity is determined automatically. By default, it is equal to 0%.

"domainSpecifics": {
  "minSensitivity": 0.1
}

Maximum sensitivity

The maximum sensitivity setting is a percentage that defines the upper limit for the sensitivity. This setting is only meaningful when the sensitivity is determined automatically. By default, it is equal to 5%.

"domainSpecifics": {
  "maxSensitivity": 0.8
}
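The three sensitivity settings above can be assembled and sanity-checked before a request is sent. The following is a minimal client-side sketch, not part of the TIM API: the helper name `domain_specifics` is hypothetical, and the rule that an exact sensitivity and a range are not combined is an assumption drawn from the description above rather than documented API behaviour.

```python
def domain_specifics(sensitivity=None, min_sensitivity=None, max_sensitivity=None):
    """Build a "domainSpecifics" fragment from optional sensitivity settings.

    Hypothetical helper: the checks below are client-side conventions,
    not rules confirmed by the TIM API.
    """
    has_range = min_sensitivity is not None or max_sensitivity is not None
    # Assumption: an exact sensitivity replaces the range, so the two are kept apart.
    if sensitivity is not None and has_range:
        raise ValueError("set either an exact sensitivity or a range, not both")
    if (min_sensitivity is not None and max_sensitivity is not None
            and min_sensitivity > max_sensitivity):
        raise ValueError("minSensitivity must not exceed maxSensitivity")
    fields = {
        "sensitivity": sensitivity,
        "minSensitivity": min_sensitivity,
        "maxSensitivity": max_sensitivity,
    }
    # Omit unset parameters so that TIM falls back to its defaults.
    return {"domainSpecifics": {k: v for k, v in fields.items() if v is not None}}
```

Calling `domain_specifics(min_sensitivity=0.1, max_sensitivity=0.8)` yields a fragment containing only the two range keys.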

Anomaly indicator window

The anomaly indicator window serves to smooth the original anomaly indicator by averaging over a window of its most recent successive values. This is useful when an anomaly is understood as a prolonged period of elevated anomaly scores rather than a single high value.

Possible base units for the window are Day, Hour, Minute, Second and Sample.

If not specified, the window is set to 1 sample, which leaves the original anomaly indicator unchanged (no averaging).

"domainSpecifics": {
  "anomalyIndicatorWindow": {
    "baseUnit": "Hour",
    "value": 24
  }
}
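To illustrate the smoothing described above, here is a minimal sketch of a trailing moving average over a window given in samples. It is an illustration only, not TIM's implementation: the function name is hypothetical, and averaging over fewer values at the start of the series (before a full window is available) is an assumption.

```python
def smooth_indicator(values, window):
    """Trailing moving average: each output is the mean of the last `window` inputs.

    Assumption: at the start of the series, where fewer than `window`
    values exist, the mean is taken over the values available so far.
    """
    if window < 1:
        raise ValueError("window must be at least 1 sample")
    smoothed = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            # Drop the value that just left the window.
            running -= values[i - window]
        smoothed.append(running / min(i + 1, window))
    return smoothed
```

With a window of 1 sample the output equals the input, which matches the default of no averaging. A window given in time units (for example 24 hours) would first have to be converted to a number of samples using the sampling period of the data.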

Model configuration

Configuration parameter	build-model	rebuild-model*	detect
Number of trees	☑	☐	☐
Subsample size	☑	☐	☐
Max tree depth	☑	☐	☐
Extension level	☑	☐	☐
Normalization	☑	☐	☐

☑ available in a given method
☐ not available in a given method
* method is not implemented yet

Number of trees

The number of trees determines how many trees the model contains. The more trees the model has, the longer the training time, but the less prone the model is to overfitting.

If the parameter is not provided, TIM's default is 150.

"model": {
  "numberOfTrees": 80
}

Subsample size

The subsample size determines the number of samples used to build each tree. When the subsample size is larger than the number of samples provided, each tree uses all samples (no sampling).

If the parameter is not defined, TIM's default is 256.

"model": {
  "subSampleSize": 120
}

Max tree depth

The max tree depth defines the maximal depth of the branching process of each tree. If not set, TIM will automatically calculate the max tree depth based on the subsample size.

"model": {
  "maxTreeDepth": 7
}
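The exact rule TIM uses to derive the max tree depth from the subsample size is not stated here. As a point of reference, the original Isolation Forest algorithm limits the tree height to the ceiling of the base-2 logarithm of the subsample size; the sketch below computes that value, under the assumption (not confirmed by this documentation) that a similar rule applies.

```python
import math


def reference_max_tree_depth(subsample_size):
    """Height limit from the original Isolation Forest algorithm: ceil(log2(size)).

    Shown for orientation only; TIM's own rule may differ.
    """
    if subsample_size < 1:
        raise ValueError("subsample size must be at least 1")
    return math.ceil(math.log2(subsample_size))
```

Under this rule, a subsample size of 256 gives a depth of 8, and a subsample size of 120 gives a depth of 7.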

Extension level

The extension level is a number determining the process of branching. The value is in the range [0, V-1], where V is the number of variables.

A value of 0 means that the hyperplanes splitting the V-dimensional space are parallel to V-1 axes, which corresponds to the Isolation Forest (not extended) model.

As the extension level increases, the number of axes that the hyperplane is parallel to decreases. A value of V-1 corresponds to the full extension: the hyperplane does not have to be parallel to any axis.

If not provided, TIM defaults to the full extension level. If the provided extension level is larger than the number of provided columns minus 1, the default is used.

"model": {
  "extensionLevel": 2
}
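The effect of the extension level can be sketched by looking at how a random splitting hyperplane is drawn. The following follows the scheme commonly used in Extended Isolation Forest implementations (draw a Gaussian normal vector, then zero out V-1 minus the extension level of its coordinates); it is an illustration with a hypothetical function name, not TIM's actual code.

```python
import random


def random_split_normal(num_variables, extension_level, rng=random):
    """Draw the normal vector of a random splitting hyperplane.

    A zero coordinate in the normal vector means the hyperplane is
    parallel to that axis. Level 0 leaves one non-zero coordinate
    (axis-parallel split); level V-1 leaves all coordinates free.
    """
    if not 0 <= extension_level <= num_variables - 1:
        raise ValueError("extension level must be in the range [0, V-1]")
    normal = [rng.gauss(0.0, 1.0) for _ in range(num_variables)]
    # Choose the axes the hyperplane stays parallel to and zero them out.
    parallel_axes = rng.sample(range(num_variables), num_variables - 1 - extension_level)
    for axis in parallel_axes:
        normal[axis] = 0.0
    return normal
```

For five variables, `random_split_normal(5, 0)` returns a vector with four zero coordinates, while `random_split_normal(5, 4)` zeroes none of them.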

Normalization

The normalization setting is a boolean determining whether time series (variables) should be normalized (scaled by their mean and standard deviation).

By default, it is set to true.

"model": {
  "normalization": true
}
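As an illustration of what normalization by mean and standard deviation typically looks like, the sketch below standardizes a single variable. It assumes the common z-score form (subtract the mean, divide by the standard deviation) and the population standard deviation; neither detail is confirmed by this documentation, and the handling of a constant series is likewise an assumption.

```python
import statistics


def normalize(series):
    """Standardize one variable: (x - mean) / standard deviation.

    Assumptions: z-score form, population standard deviation, and a
    constant series mapped to zeros to avoid division by zero.
    """
    mean = statistics.fmean(series)
    std = statistics.pstdev(series)
    if std == 0:
        return [0.0 for _ in series]
    return [(x - mean) / std for x in series]
```

Normalization of this kind puts variables measured in different units on a comparable scale, so that no single variable dominates the random splits.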

Data configuration

Configuration parameter	build-model	rebuild-model*	detect
Rows	☑	☐	☑
Columns	☑	☐	☐
Label column	☑	☐	☐
Imputation	☑	☐	☑
Timescale	☑	☐	☐

☑ available in a given method
☐ not available in a given method
* method is not implemented yet

Rows

The rows setting defines which samples are used for model building, rebuilding or detecting (depending on the method used); if not set, all timestamps are used.

Rows can be specified in two ways.

First, they can be specified as an array of timestamp ranges:

"rows": [
  {
    "from": "2009-06-01 00:00:00",
    "to": "2009-06-10 23:00:00"
  },
  {
    "from": "2009-05-01 00:00:00",
    "to": "2009-05-10 23:00:00"
  }
]

Alternatively, a relative notation can be used: an integer n together with a base unit (one of Month, Day, Hour, Minute, Second and Sample) defines the length of the time range. The type of the relative range defines where the range starts and in which direction it extends. Type Last starts at the last non-missing variable observation (the most recent observation of the variables) and goes backwards; type First starts at the first non-missing variable observation (the oldest observation) and goes forward. If no type is specified, the default type is Last.

"rows": {
  "type": "Last",
  "baseUnit": "Day",
  "value": 14
}
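The relative notation above can be pictured as selecting a slice of the available timestamps. The sketch below resolves such a setting against a sorted list of timestamps. It is a simplified illustration with a hypothetical function name: the Month unit is left out because its length depends on the calendar, and the treatment of the range boundary (exclusive at the far end) is an assumption.

```python
from datetime import datetime, timedelta

# Fixed-length base units; Month is intentionally not covered in this sketch.
UNIT_LENGTH = {
    "Day": timedelta(days=1),
    "Hour": timedelta(hours=1),
    "Minute": timedelta(minutes=1),
    "Second": timedelta(seconds=1),
}


def resolve_relative_rows(timestamps, rows):
    """Select timestamps covered by a relative rows setting.

    `timestamps` must be sorted ascending and contain only non-missing
    observations. Assumption: the far boundary of the range is exclusive.
    """
    kind = rows.get("type", "Last")  # Last is the documented default
    unit, n = rows["baseUnit"], rows["value"]
    if n < 1:
        raise ValueError("value must be a positive integer")
    if unit == "Sample":
        return timestamps[-n:] if kind == "Last" else timestamps[:n]
    span = UNIT_LENGTH[unit] * n
    if kind == "Last":
        start = timestamps[-1] - span
        return [t for t in timestamps if t > start]
    end = timestamps[0] + span
    return [t for t in timestamps if t < end]
```

For hourly data, `{"type": "Last", "baseUnit": "Day", "value": 1}` selects the 24 most recent samples under these assumptions.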

NOTE: Currently, the maximum supported number of rows is 100000.

Columns

The columns setting lists all columns (identified by names or indices) that should be used for model building. If not provided, TIM will use all available columns.

"columns": [5, "variable1", "variable2", 10]

NOTE: Currently, the maximum supported number of columns is 100.

Label column

The label column determines which column (identified by name or index) represents the anomaly label. A binary variable (0 - no anomaly, 1 - anomaly) allows TIM to calculate error measurements.

If not set, no column is defined as a label column.

"labelColumn": ["label"]

Imputation

The imputation setting applies if there are missing values in the dataset. Using this setting, TIM will impute all gaps in the data that are not longer than the maxGapLength parameter (measured in samples). There are two available imputation types: Linear (linear interpolation) and LOCF (Last Observation Carried Forward, that is, imputation with the last non-missing observation). The type None turns off imputation. The default setting is None.

"imputation": {
  "type": "Linear",
  "maxGapLength": 2
}
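The two imputation types and the gap-length limit can be sketched as follows. This is an illustration only, with a hypothetical function name and missing values represented as `None`; how TIM treats gaps at the very start or end of the data is not stated here, so leaving them untouched where no neighbouring observation exists is an assumption.

```python
def impute(values, method="None", max_gap_length=0):
    """Fill runs of None no longer than `max_gap_length` samples.

    Assumptions: gaps without a preceding observation are left as is,
    and Linear also needs a following observation to interpolate to.
    """
    if method not in ("None", "Linear", "LOCF"):
        raise ValueError("method must be one of None, Linear, LOCF")
    result = list(values)
    if method == "None":
        return result
    i, n = 0, len(result)
    while i < n:
        if result[i] is not None:
            i += 1
            continue
        start = i
        while i < n and result[i] is None:
            i += 1
        gap = i - start  # the gap covers result[start:i]
        if gap > max_gap_length or start == 0:
            continue
        left = result[start - 1]
        if method == "LOCF":
            for k in range(start, i):
                result[k] = left
        elif i < n:  # Linear: interpolate between the two neighbours
            step = (result[i] - left) / (gap + 1)
            for k in range(start, i):
                result[k] = left + step * (k - start + 1)
    return result
```

With `maxGapLength` set to 2, a gap of two samples between 1.0 and 4.0 is filled with 2.0 and 3.0 under Linear, while a gap of three samples would be left missing.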

Timescale

This setting specifies the rescaling of the original dataset to another sampling period by aggregating the data. The aggregation function is mean for numerical variables and maximum for boolean variables. The baseUnit of the rescaling is limited to one of Day, Hour, Minute or Second.

If not set, the sampling period estimated from the original dataset is used.

"timeScale": {
  "baseUnit": "Day",
  "value": 1
}

NOTE: Time scaling only works from more frequent sampling periods to less frequent sampling periods, and does not work for data sampled monthly.
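The aggregation rule described in this section (mean for numerical variables, maximum for boolean variables) can be sketched for the case of rescaling to a daily sampling period. The function name is hypothetical and the sketch covers a single variable only; it is meant to show the rule, not TIM's implementation.

```python
from collections import defaultdict
from datetime import datetime


def rescale_to_days(samples):
    """Aggregate (timestamp, value) pairs of one variable into daily values.

    Boolean variables are aggregated with the maximum (True if any
    sample of the day is True), numerical variables with the mean.
    """
    buckets = defaultdict(list)
    for timestamp, value in samples:
        buckets[timestamp.date()].append(value)
    daily = {}
    for day, values in sorted(buckets.items()):
        if all(isinstance(v, bool) for v in values):
            daily[day] = max(values)
        else:
            daily[day] = sum(values) / len(values)
    return daily
```

For example, hourly readings of 1.0 and 3.0 on the same day aggregate to a daily value of 2.0, and a boolean flag that is True at any hour of a day stays True for that day.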
