# Configuration

This part of the documentation covers all the mathematical and data-related configuration options available for TIM Detect's system-driven anomaly detection. Each subsection consists of a description and the corresponding **TIM API** notation. Some configuration parameters are shared with **the configuration of TIM Detect's KPI-driven anomaly detection** and with **the TIM Forecasting configuration**.

The tables below show which parameters are available in each API method.

## Domain-specific configuration

Configuration parameter | build-model | rebuild-model\* | detect |
---|---|---|---|
Sensitivity | ☑ | ☑ | ☐ |
Minimum sensitivity | ☑ | ☑ | ☐ |
Maximum sensitivity | ☑ | ☑ | ☐ |
Anomaly indicator window | ☑ | ☑ | ☐ |

☑ available in a given method

☐ not available in a given method

\* method is not implemented yet

### Sensitivity

The sensitivity is a percentage that defines how sensitive the underlying model is to anomalies. In general, the higher the sensitivity, the more anomalies are detected. If the parameter is not specified, TIM determines the sensitivity automatically. Read more about **sensitivity**.

```json
"domainSpecifics": {
  "sensitivity": 0.5
}
```

Instead of setting an exact sensitivity, **TIM Detect** also supports setting a range in the form of a **minimum sensitivity** and a **maximum sensitivity**.

### Minimum sensitivity

The minimum sensitivity setting is a percentage that defines the lower limit for the sensitivity. It is only relevant when the **sensitivity parameter** is determined automatically. By default, it is 0%.

```json
"domainSpecifics": {
  "minSensitivity": 0.1
}
```

### Maximum sensitivity

The maximum sensitivity setting is a percentage that defines the upper limit for the sensitivity. It is only relevant when the **sensitivity parameter** is determined automatically. By default, it is 5%.

```json
"domainSpecifics": {
  "maxSensitivity": 0.8
}
```

### Anomaly indicator window

The anomaly indicator window smooths the original anomaly indicator by averaging it over a window of its most recent consecutive values. This is useful when an anomaly is expected to appear as a prolonged period of elevated anomaly scores rather than as a single spike.

The base unit of the window can be *Day*, *Hour*, *Minute*, *Second* or *Sample*.

If not specified, the window is 1 sample, so the original anomaly indicator is used without averaging.

```json
"domainSpecifics": {
  "anomalyIndicatorWindow": {
    "baseUnit": "Hour",
    "value": 24
  }
}
```
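To illustrate the smoothing, here is a minimal sketch of a trailing moving average. It assumes that a time-based window is first converted into a whole number of samples using the data's sampling period, and that positions without a full window yet are reported as `None`. Both are assumptions, not documented TIM behaviour. The function names `window_in_samples` and `smooth_indicator` are hypothetical.

```python
from typing import Optional, Sequence

# Seconds per time-based base unit; "Sample" is handled separately.
SECONDS_PER_UNIT = {"Day": 86_400, "Hour": 3_600, "Minute": 60, "Second": 1}


def window_in_samples(base_unit: str, value: int, sampling_period_s: float) -> int:
    """Convert an anomalyIndicatorWindow setting into a number of samples.

    ASSUMPTION: time-based windows are rounded to whole samples (at least 1).
    """
    if base_unit == "Sample":
        return value
    return max(1, round(value * SECONDS_PER_UNIT[base_unit] / sampling_period_s))


def smooth_indicator(indicator: Sequence[float], window: int) -> list[Optional[float]]:
    """Average each value with the preceding window-1 values.

    ASSUMPTION: positions that do not have a full window yet return None.
    """
    if window < 1:
        raise ValueError("window must be at least 1 sample")
    smoothed: list[Optional[float]] = []
    for i in range(len(indicator)):
        if i < window - 1:
            smoothed.append(None)
        else:
            smoothed.append(sum(indicator[i - window + 1:i + 1]) / window)
    return smoothed
```

For the JSON example above (24 hours) applied to hourly data, `window_in_samples("Hour", 24, 3600)` gives a 24-sample window. A window of 1 sample returns the indicator unchanged.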

## Model configuration

Configuration parameter | build-model | rebuild-model\* | detect |
---|---|---|---|
Number of trees | ☑ | ☐ | ☐ |
Subsample size | ☑ | ☐ | ☐ |
Max tree depth | ☑ | ☐ | ☐ |
Extension level | ☑ | ☐ | ☐ |
Normalization | ☑ | ☐ | ☐ |

☑ available in a given method

☐ not available in a given method

\* method is not implemented yet

### Number of trees

This parameter sets the number of trees in the model. More trees increase the training time but make the model less prone to overfitting.

If the parameter is not provided, TIM's default is 150.

```json
"model": {
  "numberOfTrees": 80
}
```

### Subsample size

The subsample size determines the number of samples used to build each tree. If the subsample size is larger than the number of available samples, each tree uses all samples (no sampling).

If the parameter is not defined, TIM's default is 256.

```json
"model": {
  "subSampleSize": 120
}
```

### Max tree depth

The max tree depth defines the maximum depth of the branching process in each tree. If not set, TIM automatically calculates the max tree depth based on the subsample size.

```json
"model": {
  "maxTreeDepth": 7
}
```
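The documentation does not give TIM's formula for the automatic max tree depth. The sketch below assumes the height limit from the original Isolation Forest method, ceil(log2(ψ)), where ψ is the effective subsample size. It combines this with the documented rule that a subsample size larger than the data means all samples are used. The function names are hypothetical, and TIM's actual rule may differ.

```python
import math


def effective_subsample_size(subsample_size: int, n_samples: int) -> int:
    """Documented rule: if the subsample size exceeds the number of samples,
    each tree uses all samples (no sampling)."""
    return min(subsample_size, n_samples)


def default_max_tree_depth(subsample_size: int, n_samples: int) -> int:
    """ASSUMPTION: Isolation Forest height limit ceil(log2(psi));
    TIM's exact rule is not documented here."""
    psi = effective_subsample_size(subsample_size, n_samples)
    if psi < 1:
        raise ValueError("at least one sample is required")
    return max(1, math.ceil(math.log2(psi)))
```

Under this assumption, the default subsample size of 256 on a large dataset would give a depth of 8, and a subsample size of 120 would give a depth of 7.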

### Extension level

The extension level is an integer that controls the orientation of the hyperplanes used for branching. Its value lies in the range [0, V-1], where V is the number of variables.

A value of 0 means that the hyperplanes splitting the V-dimensional space are parallel to V-1 axes, which corresponds to the standard (non-extended) Isolation Forest model.

As the extension level increases, the hyperplane is parallel to fewer axes. A value of V-1 corresponds to full extension, where the hyperplane does not have to be parallel to any axis.

If not provided, TIM defaults to the full extension level. If the provided extension level is larger than V-1 (the number of columns minus one), the default is used.

```json
"model": {
  "extensionLevel": 2
}
```
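The relationship between the extension level and hyperplane orientation can be sketched as follows. This follows the general Extended Isolation Forest idea and is not TIM's implementation. A hyperplane is parallel to every axis whose component in the hyperplane's normal vector is zero, so keeping `extension_level + 1` non-zero components makes it parallel to V-1-extension_level axes. The function names and the Gaussian draw are illustrative assumptions. Only the fallback to full extension comes from the text above.

```python
import random
from typing import Optional


def resolve_extension_level(requested: Optional[int], n_variables: int) -> int:
    """Documented default: full extension (V-1) when the value is missing
    or larger than V-1."""
    full_extension = n_variables - 1
    if requested is None or requested > full_extension:
        return full_extension
    if requested < 0:
        raise ValueError("extension level must be in [0, V-1]")
    return requested


def random_split_normal(n_variables: int, extension_level: int,
                        rng: random.Random) -> list[float]:
    """Draw the normal vector of one branching hyperplane.

    Only extension_level + 1 components stay non-zero, so the hyperplane is
    parallel to the remaining V - 1 - extension_level axes.
    """
    normal = [rng.gauss(0.0, 1.0) for _ in range(n_variables)]
    n_zeroed = n_variables - 1 - extension_level
    for axis in rng.sample(range(n_variables), n_zeroed):
        normal[axis] = 0.0
    return normal
```

With `extension_level = 0`, exactly one component is non-zero, which gives the axis-parallel split of the standard Isolation Forest.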

### Normalization

The normalization setting is a boolean that determines whether the time series (variables) are normalized, that is, centered by their mean and scaled by their standard deviation.

By default, it is set to true.

```json
"model": {
  "normalization": true
}
```
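Here is a minimal sketch of this kind of per-variable standardization. It assumes the population standard deviation is used and that a constant variable is only centered, which avoids division by zero. Both choices are assumptions. The function name `normalize` is hypothetical.

```python
import statistics
from typing import Sequence


def normalize(column: Sequence[float]) -> list[float]:
    """Subtract the variable's mean and divide by its standard deviation.

    ASSUMPTIONS: population standard deviation; a constant column is only
    centred to avoid dividing by zero.
    """
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    if std == 0:
        return [x - mean for x in column]
    return [(x - mean) / std for x in column]
```

After this transformation, each variable has mean 0 and standard deviation 1. This keeps variables measured on very different scales from dominating the branching.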

## Data configuration

Configuration parameter | build-model | rebuild-model\* | detect |
---|---|---|---|
Rows | ☑ | ☐ | ☑ |
Columns | ☑ | ☐ | ☐ |
Label column | ☑ | ☐ | ☐ |
Imputation | ☑ | ☐ | ☑ |
Timescale | ☑ | ☐ | ☐ |

☑ available in a given method

☐ not available in a given method

\* method is not implemented yet

### Rows

The rows setting defines which samples are used for model building, rebuilding or detection, depending on the method. If not set, all timestamps are used.

There are two ways to specify rows. The first is an array of timestamp ranges:

```json
"rows": [
  {
    "from": "2009-06-01 00:00:00",
    "to": "2009-06-10 23:00:00"
  },
  {
    "from": "2009-05-01 00:00:00",
    "to": "2009-05-10 23:00:00"
  }
]
```

Alternatively, rows can be specified in relative notation. An integer *value* together with a *baseUnit* (one of *Month*, *Day*, *Hour*, *Minute*, *Second* or *Sample*) defines the length of the time range. The *type* of the relative range determines where the range starts and in which direction it extends. Type *Last* starts at the most recent non-missing observation of the variables and extends backwards. Type *First* starts at the oldest non-missing observation and extends forwards. If no type is specified, *Last* is used.

```json
"rows": {
  "type": "Last",
  "baseUnit": "Day",
  "value": 14
}
```
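To show how a relative range could be resolved against the data, here is a sketch. It assumes that the input timestamps are sorted and contain only non-missing observations, and that time-based ranges are half-open, so that "Last 1 Day" on hourly data covers exactly 24 samples. The *Month* unit is left out because its length varies. These boundary rules are assumptions, not documented TIM behaviour. The function name `relative_rows` is hypothetical.

```python
from datetime import datetime, timedelta
from typing import Sequence

UNIT_DELTAS = {
    "Day": timedelta(days=1),
    "Hour": timedelta(hours=1),
    "Minute": timedelta(minutes=1),
    "Second": timedelta(seconds=1),
}


def relative_rows(timestamps: Sequence[datetime], value: int, base_unit: str,
                  range_type: str = "Last") -> list[datetime]:
    """Return the timestamps selected by a relative "rows" setting.

    ASSUMPTIONS: `timestamps` are sorted non-missing observations; time-based
    ranges are half-open, e.g. "Last 1 Day" covers (end - 1 day, end];
    the "Month" unit is not handled in this sketch.
    """
    if range_type not in ("Last", "First"):
        raise ValueError("range_type must be 'Last' or 'First'")
    if value < 1 or not timestamps:
        return []
    if base_unit == "Sample":
        return list(timestamps[-value:] if range_type == "Last" else timestamps[:value])
    if base_unit not in UNIT_DELTAS:
        raise NotImplementedError(f"base unit {base_unit!r} is not covered by this sketch")
    span = value * UNIT_DELTAS[base_unit]
    if range_type == "Last":
        end = timestamps[-1]  # most recent observation, counting backwards
        return [t for t in timestamps if end - span < t <= end]
    start = timestamps[0]  # oldest observation, counting forwards
    return [t for t in timestamps if start <= t < start + span]
```

If the requested range is longer than the available data, the sketch simply returns all observations.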

*NOTE*: Currently, the maximum supported number of rows is 100,000.

### Columns

The columns setting lists all columns (identified by names or indices) that should be used for model building. If not provided, TIM will use all available columns.

`"columns": [5, "variable1", "variable2", 10]`

*NOTE*: Currently, the maximum supported number of columns is 100.

### Label column

The label column specifies which column (identified by name or index) contains the anomaly label. The label is a binary variable (0 = no anomaly, 1 = anomaly) that allows TIM to calculate error measures.

If not set, no column is defined as a label column.

`"labelColumn": ["label"]`

### Imputation

The *imputation* setting applies when the dataset contains missing values. With this setting, TIM imputes every gap in the data that is no longer than *maxGapLength* samples. Two imputation methods, or *types*, are available: *Linear* (linear interpolation) and *LOCF* (*Last Observation Carried Forward*, i.e. filling with the last non-missing observation). The type *None* turns imputation off and is the default.

```json
"imputation": {
  "type": "Linear",
  "maxGapLength": 2
}
```
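The two imputation types can be sketched on a single series, with missing values represented as `None`. The sketch makes several assumptions about edge cases that the text does not cover. A gap longer than *maxGapLength* is left untouched entirely. A gap at the very start cannot be filled. *Linear* leaves a trailing gap unfilled because no observation follows it. The function name `impute` is hypothetical.

```python
from typing import Optional, Sequence


def impute(values: Sequence[Optional[float]], method: str,
           max_gap_length: int) -> list[Optional[float]]:
    """Fill gaps (runs of None) of at most `max_gap_length` samples.

    ASSUMPTIONS: longer gaps are left untouched entirely; leading gaps are
    not filled; Linear leaves trailing gaps unfilled.
    """
    if method not in ("None", "Linear", "LOCF"):
        raise ValueError(f"unknown imputation type {method!r}")
    result = list(values)
    if method == "None":
        return result
    n, i = len(result), 0
    while i < n:
        if result[i] is not None:
            i += 1
            continue
        start = i  # first missing sample of this gap
        while i < n and result[i] is None:
            i += 1
        end = i  # first index after the gap
        before = start - 1  # last observation before the gap
        if end - start > max_gap_length or before < 0:
            continue
        if method == "LOCF":
            for k in range(start, end):
                result[k] = result[before]
        elif end < n:  # Linear needs an observation on both sides
            left, right = result[before], result[end]
            for k in range(start, end):
                result[k] = left + (right - left) * (k - before) / (end - before)
    return result
```

With the JSON example above (`Linear`, `maxGapLength` 2), `[0.0, None, None, 3.0]` becomes `[0.0, 1.0, 2.0, 3.0]`, while a gap of three samples would stay missing.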

### Timescale

This setting rescales the original dataset to a different sampling period by aggregating the data. Numerical variables are aggregated with the *mean* and boolean variables with the *maximum*. The *baseUnit* of the rescaling must be one of *Day*, *Hour*, *Minute* or *Second*.

If not set, the sampling period estimated from the original data is used.

```json
"timeScale": {
  "baseUnit": "Day",
  "value": 1
}
```

*NOTE*: Time scaling only works from a more frequent to a less frequent sampling period (for example, from hourly to daily data), and it does not work for data sampled monthly.
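The aggregation rule can be sketched as follows. Numeric values are averaged and boolean values take their maximum within each new period. The sketch assumes that buckets are aligned to the Unix epoch, that missing values are skipped, and that the original sampling period can be read from the first two timestamps. None of these details are documented for TIM. The function name `rescale` is hypothetical.

```python
import statistics
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Optional, Sequence, Union

SECONDS_PER_UNIT = {"Day": 86_400, "Hour": 3_600, "Minute": 60, "Second": 1}
EPOCH = datetime(1970, 1, 1)  # ASSUMPTION: buckets are aligned to the Unix epoch

Value = Union[float, bool]


def rescale(timestamps: Sequence[datetime], values: Sequence[Optional[Value]],
            base_unit: str, value: int) -> list[tuple[datetime, Value]]:
    """Aggregate a series onto a coarser period: mean for numbers, max for booleans."""
    step = timedelta(seconds=SECONDS_PER_UNIT[base_unit] * value)
    # Only aggregation to a less frequent sampling period is supported.
    if len(timestamps) > 1 and step < timestamps[1] - timestamps[0]:
        raise ValueError("only rescaling to a less frequent sampling period is supported")
    buckets: dict[int, list[Value]] = defaultdict(list)
    for ts, v in zip(timestamps, values):
        if v is not None:  # ASSUMPTION: missing values are skipped
            buckets[(ts - EPOCH) // step].append(v)
    out = []
    for key in sorted(buckets):
        xs = buckets[key]
        is_bool = all(isinstance(x, bool) for x in xs)
        out.append((EPOCH + key * step, max(xs) if is_bool else statistics.fmean(xs)))
    return out
```

Using the `maximum` for booleans means that a daily flag is set if it was set in any hour of that day.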