An experiment is the workspace in the TIM platform where users find the core of the analytics. Each experiment focuses on a single type of analytics: either forecasting or anomaly detection.
The experiments overview
Because experiments sit a single level deep inside use cases, the experiments overview page is the same page as the use case detail page. More information on this page can be found in the section on the use case in detail.
The experiment in detail
An experiment's detail page (the jobs overview page) contains all of the information regarding the experiment. That includes its name, description and type (forecasting or anomaly detection), the dataset and dataset version it is based on, and information on all of the iterations (jobs) it contains.
The Iterations table provides an overview of all the jobs contained in the experiment. For each job (each row in the table), the table displays the job's status and its most important configuration settings: the in-sample and out-of-sample ranges, the forecasting horizon for forecasting jobs, and the job type for anomaly detection jobs. If the job has been executed successfully (status Finished or Finished With Warning), its aggregated MAPE (Mean Absolute Percentage Error) is shown too. Clicking a successfully executed job opens that job's results.
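As a point of reference, MAPE averages the absolute errors, each expressed as a percentage of the corresponding actual value. The sketch below computes it in plain Python. It shows the standard textbook formula only; the function name `mape` is hypothetical, and the sketch does not describe how TIM aggregates MAPE over a job's results.

```python
from collections.abc import Sequence


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean Absolute Percentage Error, expressed in percent.

    MAPE is undefined when an actual value is zero, so that case raises.
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and equally long")
    total = 0.0
    for a, p in zip(actual, predicted):
        if a == 0:
            raise ValueError("MAPE is undefined when an actual value is zero")
        # Absolute error relative to the actual value.
        total += abs((a - p) / a)
    return 100.0 * total / len(actual)


# Errors of 10 %, 5 % and 0 % average to a MAPE of (approximately) 5 %.
print(mape([100.0, 200.0, 50.0], [110.0, 190.0, 50.0]))
```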
The sections below describe the specifics of this page in more detail.
From this page, a user can edit or delete the experiment. From the Iterations table, a user can delete a job (iteration).
Editing: Editing an experiment lets the user update its name and description.
Deleting: Be careful when deleting an experiment, as this also permanently deletes any data contained in it, including the ML jobs.
Deleting a job/an iteration: Deleting a job/an iteration will delete its
Opening an experiment: a clean slate
Within an experiment, users can explore the data again and browse the configuration options. An overview of these options can be found in the dedicated sections on the configuration for forecasting and the configuration for anomaly detection. Users can also inspect the availability of each variable relative to the target or KPI, similar to how this is done in the data availability component on a dataset's detail page.
Once the desired settings are selected, an ML request can be triggered, and TIM Studio shows how the calculation progresses. Each such request creates a new job, which is added to the Iterations table.
The component showing the dataset version used in the experiment, along with some of its details, can be expanded into a larger card that also displays the availability of all variables in the dataset relative to that of the target or KPI. In this card, the user can select the dataset version and the target or KPI variable. TIM Studio then automatically determines an appropriate scale for viewing the availabilities (in the example image below, the scale is set to Days even though the dataset is sampled hourly); this scale can, however, be adjusted manually to suit a user's specific needs.
Below this scale, each variable in the dataset is displayed together with a time axis indicating its relative availability. The end of the target or KPI variable's availability (a relative offset that is always 0) is indicated by the blue vertical mark. This makes each variable's relative availability easy to read: for example, the variable Windspeed is available until one day before the end of the target or KPI variable, while the variable Hum_p is available for two days after it. The exact relative availability of each variable is also displayed, which makes it easy to check that Windspeed's availability ends exactly 24 hours before that of the target or KPI, and that Holiday, which seemingly continues into the future on the time axis, is available for 264 samples, or hours (11 days), past the end of the target (or KPI) variable Cnt.
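The relative availability shown for each variable can be read as the difference between its last timestamp and that of the target, counted in samples. A minimal sketch of that arithmetic follows, assuming a regular hourly sampling period. The timestamps are hypothetical, chosen only to reproduce the offsets described above, and `relative_availability` is an illustrative helper, not part of TIM.

```python
from datetime import datetime, timedelta


def relative_availability(target_end: datetime, variable_end: datetime,
                          sampling_period: timedelta) -> int:
    """Offset of a variable's last timestamp from the target's last timestamp,
    counted in samples. Negative values mean the variable ends before the target."""
    # timedelta // timedelta yields an integer number of sampling periods.
    return (variable_end - target_end) // sampling_period


hour = timedelta(hours=1)
cnt_end = datetime(2024, 3, 10, 23)  # hypothetical end of the target Cnt
print(relative_availability(cnt_end, datetime(2024, 3, 9, 23), hour))   # Windspeed: -24
print(relative_availability(cnt_end, datetime(2024, 3, 12, 23), hour))  # Hum_p: 48
print(relative_availability(cnt_end, datetime(2024, 3, 21, 23), hour))  # Holiday: 264
```

With an hourly sampling period, 264 samples correspond to 264 / 24 = 11 days, matching the Holiday example.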
Timescale and aggregation
In the header section of the line chart card, there is a collapsed button labeled Timescale. This button summarizes the sampling period (in the example below, 3600 seconds or 1 hour) and the aggregation (in the example below, mean) of the data.
TIM makes it possible to adjust these settings, scaling and aggregating the data to the specific needs of the challenge a user is focusing on. Sales data is a typical use case: sales may be measured hourly while a daily forecast is desired. Scaling the data to a daily frequency and aggregating by sum before starting the forecast achieves this.
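To make the sales example concrete, the sketch below rescales hourly values to a daily frequency by summation using only the Python standard library. It illustrates the arithmetic of the rescaling, not how TIM performs it internally; the helper `to_daily_sum` and the sales figures are hypothetical.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta


def to_daily_sum(hourly: list[tuple[datetime, float]]) -> dict[date, float]:
    """Rescale hourly (timestamp, value) pairs to one value per day by summation."""
    daily: defaultdict[date, float] = defaultdict(float)
    for timestamp, value in hourly:
        daily[timestamp.date()] += value
    return dict(sorted(daily.items()))


# Hypothetical data: two days of hourly sales of 5 units each,
# so each day should total 24 * 5 = 120 units.
start = datetime(2024, 1, 1)
sales = [(start + timedelta(hours=h), 5.0) for h in range(48)]
print(to_daily_sum(sales))
```

With pandas, the equivalent operation on a series with a DatetimeIndex is `series.resample("D").sum()`.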
The timescale is set as a number of base units, where the available base units are day, hour, minute and second. Aggregation is available by mean, sum, minimum and maximum. The aggregation that is set relates to the target variable. By default, numerical variables are aggregated by mean and boolean variables by maximum.
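The sketch below restates these rules in Python: a timescale is a whole number of base units, and the default aggregation depends on whether a variable is boolean or numerical. The names `BASE_UNIT_SECONDS`, `timescale_seconds`, `default_aggregation` and `aggregate` are hypothetical; this illustrates the rules as described here and is not TIM's own code.

```python
from statistics import mean
from typing import Optional

# Seconds per base unit; these are the four base units listed above.
BASE_UNIT_SECONDS = {"day": 86_400, "hour": 3_600, "minute": 60, "second": 1}

# The four aggregation functions listed above.
AGGREGATIONS = {"mean": mean, "sum": sum, "minimum": min, "maximum": max}


def timescale_seconds(amount: int, base_unit: str) -> int:
    """Express a timescale of `amount` base units as a sampling period in seconds."""
    if amount < 1:
        raise ValueError("amount must be a positive whole number")
    return amount * BASE_UNIT_SECONDS[base_unit]


def default_aggregation(values: list) -> str:
    """Default rule: boolean variables by maximum, numerical variables by mean."""
    if not values:
        raise ValueError("cannot choose an aggregation for an empty variable")
    # bool is checked explicitly because Python treats bool as a subtype of int.
    if all(isinstance(v, bool) for v in values):
        return "maximum"
    return "mean"


def aggregate(values: list, how: Optional[str] = None):
    """Aggregate one window of values, falling back to the default rule."""
    return AGGREGATIONS[how or default_aggregation(values)](values)


print(timescale_seconds(1, "hour"))     # 3600
print(aggregate([False, True, False]))  # True (maximum)
print(aggregate([2.0, 4.0]))            # 3.0 (mean)
```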
Any configured timescale and aggregation also applies to model building and model application, and therefore affects the results of a forecasting or anomaly detection job.
The results of a job
After the model has been built and applied (or when browsing to a job that has already been executed), it’s time to examine the results. Users can gain insight into the models that were used and review their performance.
By default, the predictor importances are shown, visualized in a treemap. This visualization represents the extent to which each predictor (each input variable) contributes to the models, and thus to the forecast or detection. The color scheme of the treemap matches the colors of the predictors in the line chart; hovering over the treemap gives the user more information about the relevant predictors and their importance.
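This page does not state how TIM computes or scales importances. Assuming only that the treemap sizes each tile in proportion to a non-negative importance score, the sketch below shows how raw scores translate into relative shares of the treemap. The helper `importance_shares` and the scores are hypothetical.

```python
def importance_shares(importances: dict[str, float]) -> dict[str, float]:
    """Scale non-negative importance scores to percentages summing to 100,
    ordered from largest to smallest, as treemap tile sizes would be."""
    if any(score < 0 for score in importances.values()):
        raise ValueError("importance scores must be non-negative")
    total = sum(importances.values())
    if total == 0:
        raise ValueError("at least one importance score must be positive")
    shares = {name: 100.0 * score / total for name, score in importances.items()}
    # sorted() is stable, so predictors with equal shares keep their input order.
    return dict(sorted(shares.items(), key=lambda item: item[1], reverse=True))


# Hypothetical raw scores for three predictors from the example dataset.
print(importance_shares({"Windspeed": 1.0, "Hum_p": 3.0, "Holiday": 1.0}))
# {'Hum_p': 60.0, 'Windspeed': 20.0, 'Holiday': 20.0}
```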
Flipping the switch above the treemap switches the view to the feature importances. This visualization represents the extent to which each feature (a generated transformation of input variables and other data) contributes to the model, and thus to the forecast or detection. If applicable, a slider bar appears that lets users browse through the different models used in calculating the requested forecast or detection. Again, the color scheme of the treemap matches the colors of the predictors in the line chart; this time a single feature may have multiple colors, meaning it represents an interaction between multiple (input or artificial) variables. Hovering over one of the treemaps again gives the user more information about the relevant features and their importance.
The job's configuration
Users adjust the configuration where needed by iterating over a job, so it is useful to be able to review the configuration of a job that has already been executed successfully. The same configuration cards used to set a job's configuration when creating a job are used to review an existing job's configuration. This time, however, the settings are disabled, to avoid confusion as to what a user is looking at (the configuration used for an existing job versus the configuration to be used for a new job).
The process of iterating
After examining the results of a job, a user may want to continue experimenting and iterate over that job to improve the results. The Iterate button allows the user to do just that. When clicked, it brings the user back to the clean slate state of the experiment, with one exception: the configuration remains as it was in the job that was just viewed. The user therefore does not have to start from the default configuration; they can adjust the previous configuration as desired.
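The iterate pattern, a read-only configuration for executed jobs plus a new job that starts from a copy of it, can be sketched with a frozen Python dataclass. The `JobConfig` class and its fields are hypothetical and cover only a few of the settings mentioned on this page; they do not reflect TIM's actual configuration format.

```python
from dataclasses import FrozenInstanceError, dataclass, replace
from datetime import date


@dataclass(frozen=True)
class JobConfig:
    """Hypothetical subset of a forecasting job's configuration.

    frozen=True mirrors the disabled configuration cards of an executed job.
    """
    in_sample_from: date
    in_sample_to: date
    forecasting_horizon_samples: int
    timescale: str = "1 hour"
    aggregation: str = "mean"


previous_job = JobConfig(date(2024, 1, 1), date(2024, 3, 31), forecasting_horizon_samples=24)

# An executed job's configuration cannot be edited in place...
try:
    previous_job.forecasting_horizon_samples = 48  # type: ignore[misc]
except FrozenInstanceError:
    print("configuration of an executed job is read-only")

# ...but iterating starts from a copy of it, changing only what is needed.
next_job = replace(previous_job, forecasting_horizon_samples=48)
print(next_job)
```

Copying with `replace` keeps every unchanged setting from the previous job, which is the behavior the Iterate button is described as providing.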