Portfolio Forecasting And Adapting To Changes
Many tasks across different industries involve forecasting the behavior of a portfolio that is composed of different individual assets. An example could be a portfolio of photovoltaic farms at different locations, or a portfolio of gas stations spread around the country. Datasets like these pose two difficulties: different parts of the portfolio might behave completely differently (e.g. one solar farm is on top of a hill where snow plays a huge role), and the composition of the portfolio itself may change (e.g. new gas stations are contracted). This section explains how to use TIM to solve these challenges.
Modelling each component separately
Imagine you own 3 different gas stations across the country. We will call them a, b and c. Your aggregated daily profit can then be expressed as a+b+c. Then, at the start of the new year, you decide to sell gas station c and buy a new one called d. Now your profit time series changes to a+b+d. However, if you previously used TIM to model this aggregated signal, the model may now be inaccurate, because it learned on a different portfolio signal than the one it is supposed to forecast now. That is why it makes sense to use TIM to model the behavior of each gas station separately. In this situation you would have two reliable models for a and b, one for c that you no longer need, and one for d that would learn incrementally from the new data coming in every day (any historical data available for d are obviously an advantage). The new situation is modelled using the a, b and d signals, with c being dropped.
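The idea can be illustrated with a minimal Python sketch. It does not use TIM: `naive_forecast` is a hypothetical placeholder for whatever model is built per component, and the profit figures are invented for illustration. The only point shown is that the portfolio forecast is the sum of the forecasts of the components that are currently in the portfolio.

```python
from statistics import mean


def naive_forecast(history, horizon):
    """Placeholder model (not TIM): repeat the mean of the last 7 observations."""
    level = mean(history[-7:])
    return [level] * horizon


def forecast_portfolio(histories, active, horizon):
    """Sum the per-component forecasts of the components currently in the portfolio."""
    total = [0.0] * horizon
    for name in active:
        component = naive_forecast(histories[name], horizon)
        total = [t + c for t, c in zip(total, component)]
    return total


# Invented daily profits; station d has only three days of history so far.
histories = {
    "a": [100.0] * 30,
    "b": [50.0] * 30,
    "c": [80.0] * 30,
    "d": [20.0] * 3,
}

before = forecast_portfolio(histories, ["a", "b", "c"], horizon=2)
after = forecast_portfolio(histories, ["a", "b", "d"], horizon=2)
```

Changing the portfolio only means changing the list of active components; the models for a and b are reused as they are.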
It is worth mentioning that this approach may not be suitable if historical data for a new component are not available at the time it joins the portfolio and its behavior is expected to be highly seasonal, or if the data have a low sampling rate (weekly, monthly, etc.). Any modelling effort is then based only on the new, incrementally growing data, which are unlikely to be sufficient for capturing seasonality or other complex patterns.
There are also cases where information about the separate components of the portfolio is missing entirely. The next sections describe how to approach such situations.
Retraining often while using a portfolio size predictor
One of TIM's features is its ability to incorporate even small amounts of new data into a new model when rebuilding (e.g. 1 year of hourly data is augmented with one additional day). The structure of the new model reorganizes so that it incorporates the new information while keeping the desired numerical robustness and stability. With portfolios, even if the separate components are not available for modelling, there is often a predictor called portfolio size which in some way indicates changes in the portfolio. An example might be the sum of the maximum capacities of solar farms or the sum of the average attendances of different cinemas. In this example we will work with a portfolio of electricity-consuming households.
The dataset used in this example has an hourly sampling rate and contains data from 2017-02-16 05:00:00 to 2019-02-17 01:00:00. The prepared dataset can be downloaded here.
The target variable is the aggregated consumption of different electricity-consuming households.
The predictors are meteorological data, public holidays and a variable tracking the portfolio size.
We simulate a two-days-ahead scenario: each day at midnight we want to forecast the target 48 samples (hours) into the future. We assume all predictors are known for this horizon.
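As a rough sketch of this scenario, the following Python snippet enumerates the daily forecast origins and the 48 hourly timestamps covered by each forecast. The helper names are hypothetical, and it is an assumption of this sketch that the first forecasted sample is the midnight sample itself.

```python
from datetime import datetime, timedelta

HORIZON = 48  # hours, as in the scenario described above


def forecast_window(origin, horizon=HORIZON):
    """Hourly timestamps covered by a forecast issued at `origin` (assumed inclusive)."""
    return [origin + timedelta(hours=h) for h in range(horizon)]


def daily_origins(first, last):
    """All midnights in [first, last] at which a forecast would be issued."""
    origin = first.replace(hour=0, minute=0, second=0, microsecond=0)
    if origin < first:
        origin += timedelta(days=1)
    origins = []
    while origin <= last:
        origins.append(origin)
        origin += timedelta(days=1)
    return origins


# A short illustrative period rather than the full dataset range.
origins = daily_origins(datetime(2019, 1, 1, 5), datetime(2019, 1, 4, 1))
window = forecast_window(origins[0])
```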
Model building and validation
The figure above shows the portfolio signal in blue and the portfolio size predictor in brown (both in unit normal scaling so that the signals can be compared visually). It is easy to see that the portfolio has changed rapidly since 2017, with the biggest change happening at the beginning of 2019. Training our model on all data until the end of 2018 and forecasting the following two days produces inaccurate results (forecast in orange):
TIM also returns a warning message (e.g. that the portfolio size predictor and the target jump out of their in-sample bounds).
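The kind of condition behind such a warning can be sketched in a few lines of Python. This is not TIM's implementation, only a hypothetical check that flags values falling outside the range seen in the training data; the portfolio size values are invented.

```python
def out_of_bounds(train_values, new_values):
    """Return the new values that lie outside the [min, max] range seen in training."""
    low, high = min(train_values), max(train_values)
    return [v for v in new_values if v < low or v > high]


# Invented portfolio sizes: the portfolio grows sharply after the training period.
train_size = [100, 120, 150, 160]
future_size = [158, 240, 245]

flagged = out_of_bounds(train_size, future_size)
```

A non-empty result means the model is asked to extrapolate beyond anything it has seen, which is a good reason to retrain with more recent data.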
A major improvement is observed as soon as just the data from 1 January 2019 are included in the in-sample (training) data.
After a couple of days, the accuracy is high once again. This kind of retraining is even more convenient when using RTInstantML. When dealing with signals like these, it is therefore important not to judge TIM's performance by the accuracy obtained from a typical training/testing split, but rather by trying a real-time retraining pipeline.
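The difference between a single training/testing split and a retraining pipeline can be sketched as a walk-forward evaluation. The Python code below is an illustration under simplifying assumptions, not a TIM workflow: `fit_level` is a toy stand-in for a model build, and the series is synthetic, with the portfolio doubling after 10 days.

```python
def fit_level(history, window=24):
    """Toy stand-in for a model build: the mean of the most recent `window` samples."""
    recent = history[-window:]
    return sum(recent) / len(recent)


def walk_forward(series, first_origin, step, horizon, retrain):
    """Mean absolute error per forecast origin, with or without retraining."""
    model = fit_level(series[:first_origin])
    errors = []
    for origin in range(first_origin, len(series) - horizon + 1, step):
        if retrain:
            # Rebuild the model on all data known at the forecast origin.
            model = fit_level(series[:origin])
        actual = series[origin:origin + horizon]
        errors.append(sum(abs(a - model) for a in actual) / horizon)
    return errors


# Synthetic hourly series: 10 days at one level, then the portfolio doubles.
series = [100.0] * 240 + [200.0] * 120

static = walk_forward(series, first_origin=240, step=24, horizon=48, retrain=False)
rolling = walk_forward(series, first_origin=240, step=24, horizon=48, retrain=True)
```

In this toy setting the model that is never retrained keeps the same error at every origin, while the retrained one recovers once a single day of post-change data has entered its training set.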
What to do when the first two options are not feasible
In cases where neither a decomposition of the portfolio signal nor a portfolio size predictor is available, retraining still significantly improves performance. It is also worth experimenting with the training length and including only the most recent data that still cover all important seasonality (e.g. only the last year of data).
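Restricting the training data to the most recent period can be sketched as follows. The helper `most_recent` is hypothetical, the rows are assumed to be sorted by timestamp, and daily rows are used instead of hourly ones purely to keep the example short.

```python
from datetime import datetime, timedelta


def most_recent(rows, length=timedelta(days=365)):
    """Keep only (timestamp, value) rows within `length` of the latest timestamp."""
    cutoff = rows[-1][0] - length
    return [(ts, value) for ts, value in rows if ts > cutoff]


# Two years of synthetic daily rows, sorted by timestamp.
start = datetime(2017, 2, 16, 5)
rows = [(start + timedelta(days=d), float(d)) for d in range(730)]

training = most_recent(rows)  # only the last year is used for training
```

The `length` argument is the knob to experiment with: it should stay long enough to cover the longest seasonality that matters for the signal.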