
Glossary

Term: Definition
Anomaly: In data mining, anomaly detection is the identification of rare items, events, or observations that raise suspicion by differing significantly from the majority of the data. Typically, anomalous items indicate some kind of problem, such as bank fraud, a structural defect, a medical problem, or errors in a text. Anomalies are also referred to as outliers, novelties, noise, deviations, and exceptions.
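One common way to flag such observations is a z-score rule, which marks values lying more than a chosen number of standard deviations from the mean. The minimal sketch below illustrates that generic statistical technique; it is not necessarily what TIM does. The function name `zscore_anomalies`, the default threshold, and the sample readings are assumptions for illustration.

```python
from statistics import mean, pstdev


def zscore_anomalies(values, threshold=3.0):
    """Return the indices of values whose |z-score| exceeds `threshold`.

    z = (value - mean) / population standard deviation.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values are identical, so nothing differs from the majority.
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


readings = [10.1, 9.8, 10.0, 10.2, 9.9, 10.1, 10.0, 9.7, 10.3, 10.0, 25.0]
print(zscore_anomalies(readings, threshold=2.5))  # → [10]
```

With only n points, a single outlier's z-score (using the population standard deviation) cannot exceed sqrt(n - 1). That is why the 11-point example uses a lower threshold. Because the outlier itself inflates the mean and standard deviation, median-based variants are often preferred in practice.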
Automated Machine Learning: Automated machine learning (AutoML) is the process of automating the end-to-end workflow of applying machine learning to real-world problems. In a typical machine learning application, practitioners must apply appropriate data pre-processing, feature engineering, feature extraction, and feature selection methods to make the dataset amenable to machine learning. AutoML aims to automate these steps.
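AutoML covers much more than this, including pre-processing and feature engineering. The sketch below shows only one small ingredient: automated model selection. Several candidate models are scored on out-of-sample data, and the one with the lowest error is kept. The three candidate models and the names `CANDIDATES` and `select_model` are hypothetical and are not TIM's internals.

```python
from statistics import mean

# Hypothetical candidate models. Each maps in-sample history (at least two
# values) to a forecast of the next h values.
CANDIDATES = {
    "persistence": lambda hist, h: [hist[-1]] * h,
    "mean": lambda hist, h: [mean(hist)] * h,
    # Drift: extend the average step observed over the history.
    "drift": lambda hist, h: [
        hist[-1] + (hist[-1] - hist[0]) / (len(hist) - 1) * k for k in range(1, h + 1)
    ],
}


def mae(actual, predicted):
    """Mean absolute error."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def select_model(series, split):
    """Score every candidate on the out-of-sample part and return the best one."""
    in_sample, out_of_sample = series[:split], series[split:]
    horizon = len(out_of_sample)
    scores = {
        name: mae(out_of_sample, model(in_sample, horizon))
        for name, model in CANDIDATES.items()
    }
    best = min(scores, key=scores.get)
    return best, scores


best, scores = select_model([10, 12, 14, 16, 18, 20], split=4)
print(best)  # → drift
```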
Backtesting: The act of building a model on an In-Sample set of historical data and evaluating it on Out-of-Sample historical data, to estimate how the model would perform in real production.
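A minimal backtest sketch is shown below. It assumes a least-squares linear trend as the model (TIM builds far richer models) and mean absolute error (MAE) as the evaluation metric. It returns the metric for both intervals, which corresponds to what the glossary calls training results. The function names and the toy series are illustrative.

```python
from statistics import mean


def fit_linear_trend(ys):
    """Least-squares fit of y = a + b * t for t = 0..len(ys)-1 (needs >= 2 points)."""
    ts = range(len(ys))
    t_mean, y_mean = mean(ts), mean(ys)
    num = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
    den = sum((t - t_mean) ** 2 for t in ts)
    b = num / den
    return y_mean - b * t_mean, b


def mae(actual, predicted):
    """Mean absolute error."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def backtest(series, split):
    """Build the model on series[:split] (In-Sample), evaluate on series[split:] (Out-of-Sample)."""
    in_sample, out_of_sample = series[:split], series[split:]
    a, b = fit_linear_trend(in_sample)
    fitted = [a + b * t for t in range(split)]
    forecast = [a + b * t for t in range(split, len(series))]
    return {
        "in_sample_mae": mae(in_sample, fitted),
        "out_of_sample_mae": mae(out_of_sample, forecast),
    }


print(backtest([1, 3, 5, 7, 9, 11, 13, 16], split=6))
# → {'in_sample_mae': 0.0, 'out_of_sample_mae': 0.5}
```

The in-sample error is zero because the first six points lie exactly on a line. The out-of-sample error reveals the deviation at the last point. This is why performance is reported on data the model has not seen.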
Business case: Business cases are created to help decision-makers ensure that a proposed initiative will deliver value compared to alternative initiatives, based on the objectives and expected benefits laid out in the business case. Performance indicators are defined for evaluating the desired outcome.
Dataset: A tabular structure containing Target and Predictor time series in columns. Each row contains the data for one timestamp.
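To make the tabular layout concrete, the sketch below reads a small CSV dataset. It splits the data into the timestamp column, the target column, and the predictor columns. The column names (`timestamp`, `load`, `temperature`, `is_holiday`), the values, and the CSV format are hypothetical. The empty cell in the last row stands for a target value that is not yet available.

```python
import csv
import io

# Hypothetical dataset: a Timestamp column, a Target column ("load") and two
# Predictor columns. The empty cell is a target value not yet available.
RAW = """timestamp,load,temperature,is_holiday
2024-01-01T00:00:00,512.3,4.1,1
2024-01-01T01:00:00,498.7,3.8,1
2024-01-01T02:00:00,,3.5,1
"""


def to_float(cell):
    """Convert a CSV cell to float, mapping empty cells to None (missing)."""
    return float(cell) if cell else None


def read_dataset(text, target):
    """Split a CSV dataset into timestamps, the target column and predictor columns."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)  # one row = one sample (one timestamp)
    columns = {name: [row[name] for row in rows] for name in reader.fieldnames}
    timestamps = columns.pop("timestamp")
    target_values = [to_float(c) for c in columns.pop(target)]
    predictors = {name: [to_float(c) for c in col] for name, col in columns.items()}
    return timestamps, target_values, predictors


timestamps, load, predictors = read_dataset(RAW, target="load")
print(load)  # → [512.3, 498.7, None]
```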
Data availability: In real life, not all values in a dataset are available for all timestamps. Typically, actual values of the target column arrive with a delay; some predictors contain forecast values up to the end of the prediction horizon (e.g., a weather forecast or information about public holidays), while others lag behind.
Dictionary: A set of transformations of the same type used for feature generation. During expansion, TIM creates many new features from the original variables (predictors and/or target) to improve the final model's performance.
Equidistant time series: A time series sampled at a constant rate.
Experiment: The process in which a data scientist changes individual settings of a model-building definition to improve the performance of the models built under it.
Feature: During expansion, TIM creates many new features from the original variables to improve the final model's performance. This is done by applying sets of common transformations (dictionaries) to the variables. After model building, the features used in the model can be inspected in the model's tree map.
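The sketch below illustrates the expansion idea: every transformation in every dictionary is applied to every variable, and each result becomes a named feature. The dictionaries here (lags 1 and 2, trailing moving averages over 2 and 3 samples) and the feature naming scheme are assumptions chosen for illustration. They are not TIM's actual dictionaries.

```python
def lag(series, k):
    """Shift a series k samples back; None where no history exists."""
    return ([None] * k + list(series))[: len(series)]


def moving_average(series, window):
    """Trailing mean over `window` samples; None until enough history exists."""
    return [
        sum(series[i + 1 - window : i + 1]) / window if i + 1 >= window else None
        for i in range(len(series))
    ]


# Hypothetical dictionaries: each groups transformations of the same type.
DICTIONARIES = {
    "lag": {f"lag_{k}": (lambda s, k=k: lag(s, k)) for k in (1, 2)},
    "moving_average": {
        f"ma_{w}": (lambda s, w=w: moving_average(s, w)) for w in (2, 3)
    },
}


def expand(variables):
    """Apply every transformation of every dictionary to every variable."""
    features = {}
    for var_name, series in variables.items():
        for transformations in DICTIONARIES.values():
            for feat_name, transform in transformations.items():
                features[f"{var_name}_{feat_name}"] = transform(series)
    return features


features = expand({"temperature": [1.0, 2.0, 3.0, 4.0]})
print(features["temperature_ma_2"])  # → [None, 1.5, 2.5, 3.5]
```

One variable and two dictionaries with two transformations each already give four features. This multiplication is why a model-building engine must then select which features to keep.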
Feature engineering: The process of extracting features from data to improve the performance of a machine learning model.
Forecast: Values calculated for future timestamps, based on the dataset and the model.
Forecasting: The process of making predictions about the future based on past and present data, most commonly by analyzing trends.
Forecasting scenario: See Forecasting situation.
Forecasting situation: Closely related to data availability, a forecasting situation is described by the following parameters: the timestamp at which you make the forecast, which is the timestamp of the last available target value; the availability of data for each predictor with respect to the last target timestamp; and the prediction horizon, i.e., how many steps ahead of the last target timestamp you are predicting.
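The sketch below derives the three parameters from a dataset in which unavailable values are `None`. It assumes an equidistant dataset and measures predictor availability in samples relative to the last target timestamp. In this sketch's own sign convention, positive means the predictor extends beyond that timestamp (for example, a weather forecast), and negative means it lags. The function names and example values are hypothetical.

```python
from datetime import datetime, timedelta


def last_available(series):
    """Index of the last non-missing (non-None) value, or None if there is none."""
    for i in range(len(series) - 1, -1, -1):
        if series[i] is not None:
            return i
    return None


def forecasting_situation(timestamps, target, predictors, horizon):
    """Describe a forecasting situation for an equidistant dataset.

    Predictor availability is expressed in samples relative to the last target
    timestamp: positive = values extend beyond it, negative = predictor lags.
    """
    t_last = last_available(target)
    if t_last is None:
        raise ValueError("no target value is available")
    availability = {}
    for name, column in predictors.items():
        idx = last_available(column)
        availability[name] = None if idx is None else idx - t_last
    period = timestamps[1] - timestamps[0]  # assumes a constant sampling period
    origin = timestamps[t_last]
    return {
        "forecast_made_at": origin,
        "predictor_availability": availability,
        "forecast_timestamps": [origin + k * period for k in range(1, horizon + 1)],
    }


hours = [datetime(2024, 1, 1, h) for h in range(6)]
situation = forecasting_situation(
    hours,
    target=[5.0, 6.0, 7.0, 8.0, None, None],
    predictors={
        "temperature_forecast": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
        "price": [9.0, 9.5, 9.7, None, None, None],
    },
    horizon=2,
)
print(situation["predictor_availability"])  # → {'temperature_forecast': 2, 'price': -1}
```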
In-Sample: The interval of the dataset used for model building.
Key Performance Indicator (KPI): In anomaly detection, the variable that represents the main indicator of the system.
Math settings: Parameters passed to TIM Engine that influence which transformations are used and how they are used.
Model: A representation of what a machine learning system has learned from data. It is a structure that can solve tasks such as prediction, classification, and anomaly detection.
Model ZOO: A set of different models, each built for a different availability of data.
Out-of-Sample: The interval of the dataset that was not used for model building. A model's performance is typically measured on out-of-sample data.
Predictor: A variable in the dataset that helps explain the variance of the Target variable. Model features are derived from predictors.
Predictor candidate: A variable in the dataset that may or may not help explain the variance of the Target variable. During model building, TIM Engine decides whether it will be used to derive model features.
Prediction horizon: A parameter that specifies how many samples ahead to forecast. Depending on your data and use case, this ranges from seconds ahead to months or years ahead.
Prediction start: The first point of a forecast (prediction).
Predictive analytics: The branch of advanced analytics used to make predictions about unknown future events. It typically uses techniques from data mining, statistics, modeling, machine learning, and AI to analyze current data and create forecasts of the future.
Predictor availability: The difference between the prediction start and the most recent value of a predictor.
Prescriptive analytics: The area of business analytics dedicated to finding the best course of action for a given situation. Prescriptive analytics is related to both descriptive and predictive analytics: while descriptive analytics aims to provide insight into what has happened and predictive analytics helps model and forecast what might happen, prescriptive analytics seeks to determine the best solution or outcome among various choices, given the known parameters.
REST API: REST (Representational State Transfer), or RESTful API design, is designed to take advantage of existing protocols. While REST can be used over nearly any protocol, Web APIs usually use it over HTTP(S).
Sample: One record (row) in the dataset.
Sampling rate: The number of samples of an equidistant time series per unit of time.
Sampling period: The time difference between two consecutive samples of an equidistant time series.
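The three related definitions above (equidistant time series, sampling rate, sampling period) can be checked directly on a list of timestamps. In this minimal sketch, a series counts as equidistant when all consecutive gaps are equal, and the rate is computed as the inverse of the period. The choice of one hour as the default time unit is an assumption.

```python
from datetime import datetime, timedelta


def sampling_period(timestamps):
    """Return the constant sampling period, or None if the series is not equidistant.

    Fewer than two timestamps also give None (no period can be measured).
    """
    gaps = {later - earlier for earlier, later in zip(timestamps, timestamps[1:])}
    return gaps.pop() if len(gaps) == 1 else None


def sampling_rate(period, unit=timedelta(hours=1)):
    """Number of samples per `unit` of time; the inverse of the sampling period."""
    return unit / period


quarter_hourly = [datetime(2024, 1, 1, 0, m) for m in (0, 15, 30, 45)]
period = sampling_period(quarter_hourly)
print(period, sampling_rate(period))  # → 0:15:00 4.0
```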
Target: In forecasting, the variable to be predicted; in classification, the variable to be classified. The equivalent term used in anomaly detection tasks is KPI.
Target availability: The difference between the prediction start and the most recent value of the target variable.
Timestamp: A sequence of characters or encoded information identifying when a certain event occurred, usually giving the date and time of day, sometimes accurate to a small fraction of a second.
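The same instant can be written as a character string or encoded as a number. The sketch below uses Python's standard `datetime` module to show both. It assumes an ISO 8601 string with millisecond precision and a UTC offset. That format is an illustrative choice, not a statement about which formats TIM accepts.

```python
from datetime import datetime, timezone

# An ISO 8601 timestamp with millisecond precision and a UTC offset
# (the format is an assumption chosen for illustration).
stamp = datetime.fromisoformat("2024-01-01T12:30:00.250+01:00")

utc = stamp.astimezone(timezone.utc)
print(utc.isoformat())    # → 2024-01-01T11:30:00.250000+00:00
print(stamp.timestamp())  # → 1704108600.25 (seconds since the Unix epoch)
```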
TIM Engine API: A REST API that provides an interface to the TIM model-building engine.
TIM Studio: A web application that offers an intuitive interface to TIM Engine. It allows users to organize and explore datasets, experiment iteratively, inspect models, and organize work with Use Cases.
Time series: A sequence of observations ordered in time. Examples from a few domains: meteorology (weather variables such as temperature, pressure, and wind); economy and finance (GDP, stock prices, exchange-rate spreads); industry (electric load, power consumption, voltage, sensor readings); biomedicine (physiological signals such as EEG, heart rate, and patient temperature).
Training region: The interval of the dataset used for model building; synonymous with the In-Sample interval.
Training results: Typically, the values of evaluation metrics for the In-Sample and Out-of-Sample intervals.
Transformation: A mathematical operation applied to the original variables to create new features and extract more information from the data.
Variable: Data in a dataset are organized in tabular form, with one Timestamp column, one Target column, and/or Predictor columns. In mathematical terminology, a column can also be called a vector. A variable is either the target vector or a predictor vector, without any transformations applied to it.