Outputs

This page goes through every part of TIMModelBuildingStatusResponse.json in detail, because the file contains additional useful information.

Progress

The whole training process reports its progress as a percentage. Data intake and validation cover 0 to 10 percent, model building itself covers 10 to 90 percent, and in-sample prediction covers 90 to 100 percent.

{
      "dateTime": "2019-07-04T21:07:03.654Z",
      "eventType": "Progress",
      "message": "61.7"
}
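
A minimal sketch of how a client might turn a progress event into one of the phases listed above. The function name `training_phase` is hypothetical, and assigning the boundary values 10 and 90 to the later phase is an assumption, since the text does not say which phase they belong to.

```python
import json


def training_phase(progress: float) -> str:
    """Map a progress percentage to the phase described in the text.

    Assumption: a value exactly on a boundary (10 or 90) counts as the
    later phase; the documentation does not specify this.
    """
    if not 0.0 <= progress <= 100.0:
        raise ValueError(f"progress out of range: {progress}")
    if progress < 10.0:
        return "data intake and validation"
    if progress < 90.0:
        return "model building"
    return "in-sample prediction"


# The progress event shown above.
event = json.loads("""
{
  "dateTime": "2019-07-04T21:07:03.654Z",
  "eventType": "Progress",
  "message": "61.7"
}
""")

# The percentage arrives as a string in "message", so convert it first.
phase = training_phase(float(event["message"]))
print(phase)
```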

Model

The model is returned as an encrypted string. When predicting, TIM first decrypts it. We also call such a model a modelZoo, because it actually contains several simpler models suited to different data availability situations. Every modelZoo also includes a set of dummy models that can be used even when data are not provided or are missing. However, the quality of their predictions is usually lower.

"model": "uedS9mPDQ6LOak0ZZWM2o2IpSLYxN+Ym0UAtsy44OCcR+daz1iT1azHvehsr6HSh3FcoTzcryEvbVW/1FJD4lTBDGSksV2H06akxQDw/Or9EIMheXYy0XwDEuG6iWE4nBDugNzwZndW0owF4Kc+6lt2D5YGnLaf63K+SjT/qNBAJJHZwTiXATQRxMVckAO"

Difficulty

This is a simple measure of how difficult the given data should be to model. It ranges from 0 to 100 percent and is calculated as 1 minus the ratio of explained variance to original variance when using a simple regression model, expressed as a percentage. Completely random data will have a difficulty close to 100, while data that a simple model explains well will have a difficulty close to 0.

"dataDifficulty": 37.5

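The formula above can be illustrated with a small sketch. The text does not specify which "simple regression model" TIM uses, so this example assumes an ordinary least-squares fit of the target on a single predictor; the function name `data_difficulty` is hypothetical and the result will not necessarily match the value TIM reports.

```python
from statistics import fmean, pvariance


def data_difficulty(x: list[float], y: list[float]) -> float:
    """Return 100 * (1 - explained variance / original variance).

    Assumption: the "simple regression model" is a one-predictor
    least-squares line; TIM's actual model is not documented here.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = fmean(x), fmean(y)
    sxx = sum((a - mean_x) ** 2 for a in x)
    original = pvariance(y)
    if sxx == 0 or original == 0:
        raise ValueError("x and y must not be constant")
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Variance of the fitted values is the variance the model explains.
    fitted = [intercept + slope * a for a in x]
    explained = pvariance(fitted)
    return 100.0 * (1.0 - explained / original)


# A target that is an exact linear function of the predictor is easy.
easy = data_difficulty([1, 2, 3, 4], [2, 4, 6, 8])
# A target uncorrelated with the predictor is as hard as it gets.
hard = data_difficulty([1, 2, 3, 4], [1, -1, -1, 1])
```
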
Importance

TIM provides two different measures of variable importance.

Simple importance

This measure relates to the original predictors. It is useful for evaluating how much each predictor contributes to the model and whether it is worth collecting or buying. The importances sum to 100 percent.

"simpleImportances": [
  {
    "predictorName": "Water_pressure",
    "importance": 100.0
  }
]

This is only returned when requested in the task file:

modelBuilding:
  configuration:
    extendedOutputConfiguration:
      returnSimpleImportances: true

Extended importance

This is a more detailed measure that relates not to the original predictors but to their transformations created by TIM. It is calculated separately for every specific time of the day if the model was built this way. On top of that, if the modelZoo contains several models that can model one specific time of the day, the importance is aggregated across them. The importances sum to 100 for every part of the day. Before the aggregation across several models happens, the measure represents the portion of variance explained by the transformed predictor. Extended importance can be used to create treemaps like here.

{
  "termName": "Load(t-31) & #MA(t-13, w:2)",
  "importance": 10.35,
  "time": "00:00:00",
  "type": "Interaction"
}

This is only returned when requested in the task file:

modelBuilding:
  configuration:
    extendedOutputConfiguration:
      returnExtendedImportances: true

Data offsets

To avoid sending all data to TIM every time you make predictions, data offsets tell you how much history is really needed. For each of the original predictors you get an integer value that tells you how far back in history the model potentially looks (in number of samples) when predicting a specific timestamp. This way you know what your model needs and do not have to transfer unnecessary data when predicting with TIM.

"dataOffsets": [
    {
      "uniqueName": "STEAM",
      "from": {
        "baseUnit": "Sample",
        "offset": -430
      }
    }
  ]
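
A sketch of how the offsets could be used to trim history before a prediction call. It assumes that a negative offset of -430 means the last 430 samples of that predictor are enough; whether one more or one fewer sample is required at the boundary is not stated in the text, so check this against your own results. Both helper names are hypothetical, and only the "Sample" base unit shown above is handled.

```python
def lookback_samples(data_offsets: list[dict]) -> dict[str, int]:
    """Return, per predictor, how many samples back the model may look."""
    lookback = {}
    for entry in data_offsets:
        start = entry["from"]
        if start["baseUnit"] != "Sample":
            # Other base units are not covered by this sketch.
            raise ValueError(f"unsupported baseUnit: {start['baseUnit']}")
        # Offsets are negative (they point into the past); flip the sign.
        lookback[entry["uniqueName"]] = max(0, -start["offset"])
    return lookback


def trim_history(series: list, lookback: int) -> list:
    """Keep only the last `lookback` samples (assumed to be sufficient)."""
    return series[max(0, len(series) - lookback):]


data_offsets = [
    {"uniqueName": "STEAM", "from": {"baseUnit": "Sample", "offset": -430}}
]

needed = lookback_samples(data_offsets)
steam_history = list(range(1000))          # placeholder for real data
steam_to_send = trim_history(steam_history, needed["STEAM"])
```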

Backtesting (Forecasting Only)

Since API 4.2

To properly evaluate how the forecasting model would perform in real production, TIM creates so-called Raw and Aggregated predictions. You can learn more about them here. If you do not want to follow the typical backtesting scheme of "build a model on one set of data, predict on another" but wish to use RTInstantML right away, you can set the "backtestLength" parameter. Model building then ignores the last "backtestLength" samples, and predictions are calculated on those samples instead.

configuration:
  usage:
    backtestLength: 300
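
The effect of "backtestLength" on the data can be pictured as a simple split at the end of the series. The sketch below only illustrates which samples fall on which side; it is not how TIM implements the split, and `split_for_backtest` is a hypothetical name.

```python
def split_for_backtest(samples: list, backtest_length: int) -> tuple[list, list]:
    """Split samples into (model building part, backtest part).

    The last `backtest_length` samples are held out of model building
    and used for predictions, as described in the text.
    """
    if not 0 <= backtest_length <= len(samples):
        raise ValueError("backtest_length must be between 0 and len(samples)")
    cut = len(samples) - backtest_length
    return samples[:cut], samples[cut:]


samples = list(range(1000))                # placeholder for real data
building, backtest = split_for_backtest(samples, 300)
```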

Low Quality Predictions (Forecasting Only)

Sometimes the modelZoo does not contain models that can give a reasonable prediction. This might be caused by insufficient data or by missing values. In this case the set of dummy models is used, and the quality of their predictions tends to be lower. The JSON has a corresponding field that records which timestamps were forecast this way.

"lowPredictionQualityDateTimes": [
      "2017-01-01T00:00:00.000Z",
      "2017-01-01T01:00:00.000Z",
      "2017-01-01T02:00:00.000Z",
      "2017-01-01T03:00:00.000Z"
]
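
One way to use this field is to flag the affected predictions before showing or storing them. The sketch below assumes the forecast is available as a mapping from timestamp string to value; that shape and the forecast values themselves are made up for illustration, and only the "lowPredictionQualityDateTimes" list comes from the example above.

```python
from datetime import datetime, timezone


def parse_utc(timestamp: str) -> datetime:
    """Parse timestamps of the form 2017-01-01T00:00:00.000Z as UTC."""
    parsed = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


low_quality_datetimes = [
    "2017-01-01T00:00:00.000Z",
    "2017-01-01T01:00:00.000Z",
    "2017-01-01T02:00:00.000Z",
    "2017-01-01T03:00:00.000Z",
]
low_quality = {parse_utc(ts) for ts in low_quality_datetimes}

# Hypothetical forecast values, keyed by timestamp.
forecast = {
    "2017-01-01T03:00:00.000Z": 12.0,
    "2017-01-01T04:00:00.000Z": 12.5,
}

# Pair every forecast value with a flag: True means a dummy model was used.
flagged = {
    ts: (value, parse_utc(ts) in low_quality)
    for ts, value in forecast.items()
}
```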

Anomaly Indicator (Anomaly Detection Only)

When you want to detect anomalies in data, it is not sufficient to just distinguish whether the data is anomalous or not at a certain point in time. The anomaly indicator is designed so that it can also answer the question "how anomalous is this data point?".

It is a number in the interval (0, infinity) returned for each data point in the model building and anomaly detection datetime ranges (except for a small number of data points at the beginning of each range, where detection cannot be done because of model offsets). The threshold is 1: if the indicator is below or equal to 1, the data point is considered not anomalous; if it is above 1, the data point is considered anomalous. The higher the number, the more anomalous that particular data point is.

The anomaly indicator is closely related to Sensitivity. By selecting sensitivity 'x' you are basically saying that you expect 'x'% of the data points in the model building datetime ranges to be anomalies, which causes the anomaly indicator to exceed the threshold on exactly 'x'% of these ranges. The anomaly indicator with this sensitivity is then used for detecting anomalies on the anomalyDetection ranges. There, higher sensitivities will in general result in the anomaly indicator exceeding the threshold more often than sensitivities closer to 0.

"anomalyIndicator": {
    "values": {
      "2015-03-02T12:00:00.000Z": 0.860717,
      "2015-03-02T12:10:00.000Z": 0.855001,
      "2015-03-02T12:20:00.000Z": 0.850813,
      "2015-03-02T12:30:00.000Z": 0.853215,
      "2015-03-02T12:40:00.000Z": 0.856282,
      "2015-03-02T12:50:00.000Z": 0.850461
    }
}
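
The threshold rule described above can be applied to the "values" object with a few lines of code. This is a minimal sketch with hypothetical helper names; the first six values are the ones from the example, and the last entry is a made-up value above the threshold, added only to show a positive detection.

```python
THRESHOLD = 1.0  # values above 1 are anomalous, values <= 1 are not


def anomalous_timestamps(values: dict[str, float]) -> list[str]:
    """Return the timestamps whose indicator exceeds the threshold."""
    return sorted(ts for ts, value in values.items() if value > THRESHOLD)


def share_above_threshold(values: dict[str, float]) -> float:
    """Return the percentage of data points flagged as anomalous."""
    if not values:
        return 0.0
    return 100.0 * len(anomalous_timestamps(values)) / len(values)


values = {
    "2015-03-02T12:00:00.000Z": 0.860717,
    "2015-03-02T12:10:00.000Z": 0.855001,
    "2015-03-02T12:20:00.000Z": 0.850813,
    "2015-03-02T12:30:00.000Z": 0.853215,
    "2015-03-02T12:40:00.000Z": 0.856282,
    "2015-03-02T12:50:00.000Z": 0.850461,
}

# None of the example values exceeds the threshold.
detected = anomalous_timestamps(values)

# Hypothetical extra data point above the threshold.
extended = {**values, "2015-03-02T13:00:00.000Z": 1.4}
detected_extended = anomalous_timestamps(extended)
```

On model building ranges, `share_above_threshold` should come out close to the chosen sensitivity, according to the description above.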