As explained in the section on date and time formatting, the TIM Platform complies with the ISO 8601 standard. This includes the option to send time zone-specific timestamps in datasets and as parameters in requests to TIM.
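As a minimal sketch of what time zone-specific ISO 8601 timestamps look like, the Python snippet below parses two timestamps with explicit UTC offsets using only the standard library. It does not call TIM; the timestamp values and variable names are chosen purely for illustration.

```python
from datetime import datetime

# Two ISO 8601 timestamps with explicit offsets (illustrative values).
utc_stamp = datetime.fromisoformat("2007-03-22T23:00:00+00:00")
cet_stamp = datetime.fromisoformat("2007-03-23T00:00:00+01:00")

# Both strings describe the same instant, written in different time zones.
same_instant = utc_stamp == cet_stamp
print(same_instant)
```

Because both timestamps carry an offset, they can be compared unambiguously. A timestamp without an offset leaves the time zone open to interpretation.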
In time series analysis, handling time zones correctly is important, as it can affect results and thus conclusions. The examples in the rest of this section illustrate the implications of working with time zones incorrectly and serve as a guide to how TIM works with time zones.
Timescale with time zones
Using TIM's transformation functionalities such as timescale and aggregation increases the importance of correctly handling time zones, specifically when working with sampling periods shorter than one day.
The data shown below is uploaded as if it were collected in the UTC time zone.
Looking at the end of the dataset in the line chart below, the target variable clearly ends on March 23rd, 2007 at 00:00. Since this line chart displays CET local time, this corresponds to March 22nd, 23:00 in UTC.
The table below displays the end of this dataset's target variable in a tabular format, indeed showing March 22nd, 23:00.
The TIM Platform uses the ISO 8601 standard and allows users to specify the time zone in which the data they upload is expressed. When no time zone is specified, TIM assumes the data is in the UTC time zone. Imagine the above data was actually collected in the CET time zone, but the user forgot to configure this during data upload. That means the data is off by one hour and should actually end on March 22nd, 2007 at 22:00 in UTC. Uploading the same data while correctly indicating that it belongs to the CET time zone indeed shifts the data by one hour, as visualized in the graph below. This graph is again expressed in CET local time, so the March 22nd, 2007, 23:00 it shows translates to 22:00 in UTC.
Again, the table below displays the end of this dataset's target variable in a tabular format, indeed showing March 22nd, 22:00.
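The one-hour shift described above can be reproduced with a small Python sketch. It assumes a fixed UTC+1 offset for CET, which holds for March 22nd, 2007 but ignores summer time, and it uses only the standard library rather than TIM itself. The variable names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc
CET = timezone(timedelta(hours=1), "CET")  # assumption: fixed offset, no summer time

# Last timestamp in the uploaded file, without any time zone information.
naive_end = datetime(2007, 3, 22, 23, 0)

# Interpretation 1: no time zone was specified, so the value is read as UTC.
end_as_utc = naive_end.replace(tzinfo=UTC)

# Interpretation 2: the value is declared as CET and then expressed in UTC.
end_as_cet_in_utc = naive_end.replace(tzinfo=CET).astimezone(UTC)

print(end_as_utc.isoformat())                  # 2007-03-22T23:00:00+00:00
print(end_as_cet_in_utc.isoformat())           # 2007-03-22T22:00:00+00:00
print(end_as_utc.astimezone(CET).isoformat())  # 2007-03-23T00:00:00+01:00
```

The last line matches what the first line chart shows: data read as UTC and ending at 23:00 appears as 00:00 on the next day when displayed in CET local time.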
This might seem like a minor issue, but it can have more implications than one might expect at first. When timescaling data (for example from hourly to daily, as described above), TIM takes these time zones into account and effectively groups data that belong to a single day in their time zone.
Timescaling the first dataset (the one that assumes the UTC time zone) with default aggregation groups the data as shown below. The last full day of data, March 22nd, 2007, is represented by a consumption value of 54 992.45.
Applying the same configuration (timescaling from hourly to daily with default aggregation) to the second dataset, the one that assumes the CET time zone, groups the data as shown below. In this case, the last full day of data, again March 22nd, 2007, is represented by a consumption value of 55 607.86.
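To illustrate why a one-hour shift changes daily totals, the Python sketch below sums a synthetic hourly series per calendar day under both interpretations of the timestamps. Several things here are assumptions: the values are made up (not the consumption data discussed above), CET is modeled as a fixed UTC+1 offset, aggregation is a plain sum, and day boundaries are taken in UTC as a common reference. This is not TIM's implementation, and `daily_sums` is a hypothetical helper.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

UTC = timezone.utc
CET = timezone(timedelta(hours=1), "CET")  # assumption: fixed offset, no summer time


def daily_sums(values, first_timestamp, day_tz):
    """Sum hourly values per calendar day as seen in time zone `day_tz`."""
    totals = defaultdict(float)
    for i, value in enumerate(values):
        local = (first_timestamp + timedelta(hours=i)).astimezone(day_tz)
        totals[local.date()] += value
    return dict(totals)


# Synthetic hourly series: 48 readings with values 0.0 to 47.0.
values = [float(i) for i in range(48)]
naive_start = datetime(2007, 3, 21, 0, 0)

# Same readings, two interpretations of the timestamps, grouped by UTC day.
as_utc = daily_sums(values, naive_start.replace(tzinfo=UTC), UTC)
as_cet = daily_sums(values, naive_start.replace(tzinfo=CET), UTC)

for day in sorted(set(as_utc) | set(as_cet)):
    print(day, as_utc.get(day), as_cet.get(day))
```

In this sketch, one reading moves across each day boundary, so the totals for March 21st and March 22nd differ between the two interpretations even though the underlying values are identical.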
These differences can have a large impact on model building and results, especially when working with larger time zone differences and including predictors like public holidays.
Daily cycle models with time zones
This example starts from the UTC data from the example above. The results below show a forecast for the next 168 samples (thus 168 hours or 7 days) with the default configuration, on a dataset that was uploaded as if it were collected in the UTC time zone. As displayed in the iterations table below the line chart, the overall in-sample MAPE of this result is 15.636.
Imagine again that the above data was actually collected in the CET time zone, but the user forgot to configure this during data upload. That means the data is off by one hour. After uploading this same data while correctly indicating it belongs to the CET time zone, executing the same forecasting job results in an overall in-sample MAPE of 13.810.
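For readers unfamiliar with the metric, the sketch below computes MAPE (mean absolute percentage error) using its common textbook definition. Whether TIM computes its in-sample MAPE in exactly this way, for instance in how it treats zero actual values, is an assumption. The function name and the sample numbers are hypothetical.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Pairs with an actual value of zero are skipped (an assumption),
    because the percentage error is undefined for them.
    """
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when all actual values are zero")
    return 100.0 * sum(abs((a - p) / a) for a, p in pairs) / len(pairs)


# Made-up values, unrelated to the forecasting jobs discussed above.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]
print(round(mape(actual, predicted), 3))  # 6.667
```

A lower MAPE means the fitted values lie closer to the actual values on average, relative to the size of the actual values.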
Where does this difference in MAPE come from? It's hard to say exactly, but several of the features TIM has at its disposal when building forecasting models are based on days in some way, and can thus be affected by this shift.
This holds especially true for daily cycle models, which build a separate model for each hour of the day that is to be forecasted. Additionally, misinterpreting the time zone could also mean looking at the wrong forecast (for the wrong time) when interpreting results. This is even more problematic, as it can mean that a model is being used for a specific hour of the day, while it is meant for an entirely different hour of the day.
The treemap below displays the predictor importances from the entire Model Zoo for the UTC forecasting job above. This shows for example that the features based on the target variable Consumption sum up to an importance of 1173.33.
The same treemap for the CET forecasting job above is shown below and again displays the predictor importances from the entire Model Zoo. In this case, the importance of the features based on the target variable Consumption sums up to 1093.18. This means that the importance of the target variable Consumption in the Model Zoo is higher when the data is treated as belonging to the UTC time zone.