Portfolio forecasting and adapting to changes

Problem description

Many tasks across different industries involve forecasting the behavior of a portfolio composed of individual assets. Examples include a portfolio of photovoltaic farms spread across different locations or a portfolio of gas stations all around the country. The challenge with such datasets is not only that different parts of the portfolio may behave completely differently (e.g. one solar farm sits on top of a hill where snow plays a huge role), but also that the composition of the portfolio itself changes (e.g. contracting new gas stations). This section explains how to use TIM to solve these challenges.

Modelling each component separately

Imagine you own 3 different gas stations across the country; we will call them a, b and c. Your aggregated daily profit can then be expressed as a+b+c. Then, at the start of the new year, you decide to sell gas station c and buy a new one called d. Your profit time series changes to a+b+d. However, if you previously used TIM to model the aggregated signal, the model may now be inaccurate, because it learned from a different portfolio signal than the one it is now supposed to forecast. That is why it makes sense to use TIM to model the behavior of each gas station separately. In this situation you would have two reliable models for a and b, one model for c that you no longer need, and one model for d that incrementally learns from the new data coming in every day (any available historical data for d is obviously advantageous). The new situation is modelled using the a, b and d signals, with c being dropped, as sketched below.
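
The helper forecast_with_tim in the following sketch is hypothetical - it stands in for whatever TIM build-model-and-forecast call you use - and the file names are illustrative only.

import pandas as pd

def forecast_with_tim(component_data: pd.DataFrame) -> pd.Series:
    """Hypothetical placeholder for a TIM build-model-and-forecast call."""
    ...

# one dataset per gas station currently in the portfolio;
# station c was sold, so it is simply no longer forecasted
active_components = {
    'a': pd.read_csv('station_a.csv', sep=';'),
    'b': pd.read_csv('station_b.csv', sep=';'),
    'd': pd.read_csv('station_d.csv', sep=';'),    # the new station; its history grows every day
}

# forecast each component separately, then sum into the portfolio forecast
component_forecasts = {name: forecast_with_tim(df) for name, df in active_components.items()}
portfolio_forecast = sum(component_forecasts.values())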

It is worth mentioning that this approach may not be suitable if historical data of new individual components are not available at the time they join the portfolio and their behavior is expected to be highly seasonal, or if the data have a low sampling rate (weeks, months, etc.). Any modelling effort is then based only on the new, incrementally growing data, which is unlikely to be sufficient for capturing seasonality or other complex patterns.

There are also cases when information about the separate components of the portfolio is missing entirely. The next sections describe how to approach such situations.

Retraining often while using a portfolio size predictor

One of TIM's features is its ability to incorporate even small amounts of new data into a new model when rebuilding (e.g. 1 year of hourly data augmented with an additional day). The new model structure re-organizes so that it incorporates the new information while keeping the desired numerical robustness and stability. With portfolios, even if separate components are not available for modelling, there is often a predictor called portfolio size that in some way indicates changes in the portfolio. Examples are the sum of the maximum capacities of solar farms or the sum of average attendances across different cinemas. In this example we will work with a portfolio of different electricity-consuming households. A sketch of how such a predictor could be derived from basic component metadata follows.
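
The component metadata below (column names, capacities and dates) is hypothetical and only illustrates the construction: sum the capacities of all components that are active at each timestamp.

import pandas as pd

# hypothetical component metadata: capacity plus the period of portfolio membership
components = pd.DataFrame({
    'capacity_kw': [1200.0, 150.6, 3165.0],
    'joined': pd.to_datetime(['2017-01-01', '2017-01-01', '2018-06-01']),
    'left': pd.to_datetime([None, None, None]),            # NaT means still in the portfolio
})

index = pd.date_range('2017-02-20 00:00:00', '2019-01-11 23:00:00', freq='H')
portfolio_size = pd.Series(0.0, index=index, name='PortfolioSize')

for _, component in components.iterrows():
    end = component['left'] if pd.notna(component['left']) else index[-1]
    active = (index >= component['joined']) & (index <= end)
    portfolio_size[active] += component['capacity_kw']     # capacity counts only while the component is active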

What to do when the first two options are not feasible

In cases where neither a decomposition of the portfolio signal nor a portfolio size predictor is available, frequent retraining still significantly improves performance. It is also worth experimenting with the training length, i.e. including only the most recent data that still cover all important seasonalities (e.g. only the last year of data), as sketched below.
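
A minimal sketch of such training-length trimming, assuming the column names of the demo dataset used later in this template:

import pandas as pd

data = pd.read_csv('data.csv', sep=';', parse_dates=['Date'])

# keep only the last year of data before the end of the known target,
# which still covers daily, weekly and yearly seasonalities
last_known = data.loc[data['Consumption'].notna(), 'Date'].max()
recent_data = data[data['Date'] >= last_known - pd.DateOffset(years=1)]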

Data Recommendation Template

In this solution template we describe the approach of retraining often while using a portfolio size predictor. The target used in this example is therefore an aggregation of different electricity-consuming households, and the dataset contains a portfolio size tracking variable as a predictor. There are also other predictors, such as meteorological data and public holidays.

TIM Setup

TIM requires no setup of its mathematical internals and works well in the business-user mode. All that is required from the user is to let TIM know the desired prediction horizon. Since we would like to observe the effects of retraining, we will be changing the backtest length.

Demo using Python API Client

Set up Python Libraries

In [1]:
import logging
import pandas as pd
import plotly as plt                   # plotly aliased as plt; used below via plt.subplots.make_subplots
import plotly.express as px
import plotly.graph_objects as go
import numpy as np
import json

import tim_client

Credentials and logging

(Do not forget to fill in your credentials in the credentials.json file)
In [2]:
with open('credentials.json') as f:
    credentials_json = json.load(f)                     # loading the credentials from credentials.json

TIM_URL = 'https://timws.tangent.works/v4/api'          # URL to which the requests are sent

SAVE_JSON = False                                       # if True - JSON requests and responses are saved to JSON_SAVING_FOLDER
JSON_SAVING_FOLDER = 'logs/'                            # folder where the requests and responses are stored

LOGGING_LEVEL = 'INFO'
In [3]:
level = logging.getLevelName(LOGGING_LEVEL)
logging.basicConfig(level=level, format='[%(levelname)s] %(asctime)s - %(name)s:%(funcName)s:%(lineno)s - %(message)s')
logger = logging.getLogger(__name__)
In [4]:
credentials = tim_client.Credentials(credentials_json['license_key'], credentials_json['email'], credentials_json['password'], tim_url=TIM_URL)
api_client = tim_client.ApiClient(credentials)

api_client.save_json = SAVE_JSON
api_client.json_saving_folder_path = JSON_SAVING_FOLDER
[INFO] 2020-11-06 12:57:02,102 - tim_client.api_client:save_json:74 - Saving JSONs functionality has been disabled
[INFO] 2020-11-06 12:57:02,105 - tim_client.api_client:json_saving_folder_path:89 - JSON destination folder changed to logs

Specify configuration

We would like to forecast one day ahead, therefore we set "predictionTo" to 1 day. At first, the model will be built using the range between 2017-02-20 00:00:00 and 2018-12-31 23:00:00. Out-of-sample forecasts will be made on the rest - from 2019-01-01 00:00:00 to 2019-01-10 23:00:00 (the last 240 samples). To get better insights from our model we also request extended importances.

In [5]:
configuration_backtest = {
    'usage': {                                 
        'predictionTo': { 
            'baseUnit': 'Day',                 # units that are used for specifying the prediction horizon length (one of 'Day', 'Hour', 'QuarterHour', 'Sample')
            'offset': 1                        # number of units we want to predict into the future (24 hours in this case)
        },
        'backtestLength': 10*24                 # number of samples that are used for backtesting (note that these samples are excluded from model building period)
    },
    'extendedOutputConfiguration': {
        'returnExtendedImportances': True      # flag that specifies if the importances of features are returned in the response
    }
}

Then we move 2 days further; these two days are added to the training data, so the model will be built using the range between 2017-02-20 00:00:00 and 2019-01-02 23:00:00. Out-of-sample forecasts will be made on the rest - from 2019-01-03 00:00:00 to 2019-01-10 23:00:00 (the last 192 samples). Again, we request extended importances.

In [6]:
configuration_backtest_2 = {
    'usage': {                                 
        'predictionTo': { 
            'baseUnit': 'Day',                 # units that are used for specifying the prediction horizon length (one of 'Day', 'Hour', 'QuarterHour', 'Sample')
            'offset': 1                        # number of units we want to predict into the future (24 hours in this case)
        },
        'backtestLength': 8*24                 # number of samples that are used for backtesting (note that these samples are excluded from model building period)
    },
    'extendedOutputConfiguration': {
        'returnExtendedImportances': True      # flag that specifies if the importances of features are returned in the response
    }
}

Data description

The dataset used in this example has an hourly sampling rate and contains data from 2017-02-20 00:00:00 to 2019-01-11 23:00:00. The target ends at 2019-01-10 23:00:00; the extra day of predictor values is what allows forecasting one day ahead.

Target

The target variable is the aggregated consumption of different electricity-consuming households.

Predictor candidates

Meteorological data, public holidays and a portfolio size tracking variable.

Forecasting scenario

We simulate a 1-day-ahead scenario - each day at midnight we want to forecast the target 24 samples (hours) into the future. We assume all predictors are known over this horizon.

In [7]:
data = tim_client.load_dataset_from_csv_file('data.csv', sep=';')                                  # loading data from data.csv
data                                                                                               # quick look at the data
Out[7]:
Date Consumption PublicHolidays Temperature Clouds Windspeed PortfolioSize
0 2017-02-20 00:00:00 32644.0 0 7.4 35 5.60 1350.561736
1 2017-02-20 01:00:00 34186.0 0 7.2 40 5.60 1350.561736
2 2017-02-20 02:00:00 35834.0 0 7.6 40 8.40 1350.561736
3 2017-02-20 03:00:00 40296.0 0 7.6 40 7.00 1350.561736
4 2017-02-20 04:00:00 55788.0 0 7.5 40 7.00 1350.561736
... ... ... ... ... ... ... ...
16579 2019-01-11 19:00:00 NaN 0 6.1 40 5.60 4515.571918
16580 2019-01-11 20:00:00 NaN 0 6.0 40 5.74 4515.571918
16581 2019-01-11 21:00:00 NaN 0 5.8 40 4.20 4515.571918
16582 2019-01-11 22:00:00 NaN 0 5.6 40 4.20 4515.571918
16583 2019-01-11 23:00:00 NaN 0 6.0 40 5.60 4515.571918

16584 rows × 7 columns

In [8]:
fig = plt.subplots.make_subplots(rows=1, cols=1, shared_xaxes=True, vertical_spacing=0.02)      # plot initialization

fig.add_trace(go.Scatter(x = data.loc[:, "Date"], y=data.loc[:, "Consumption"],
                         name = "target", line=dict(color='black')), row=1, col=1)              # plotting the target variable

fig.add_trace(go.Scatter(x = data.loc[:, "Date"],
                         y=100*(data.loc[:, "PortfolioSize"] - min(data.loc[:, "PortfolioSize"])) + min(data.loc[:, "Consumption"]),
                         name = "PortfolioSize", line=dict(color='blue')), row=1, col=1)        # portfolio size variable, rescaled for visual comparison only

fig.show()                                                                                      # rendering the plot
[INFO] 2020-11-06 12:57:03,306 - numexpr.utils:_init_num_threads:141 - NumExpr defaulting to 8 threads.
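
With both configurations defined, the effect of the two extra training days can be observed by backtesting each and comparing the errors on the common out-of-sample window (2019-01-03 00:00:00 to 2019-01-10 23:00:00). The sketch below assumes the prediction_build_model_predict call and the aggregated_predictions result structure seen in other tim_client demos; treat the exact keys as assumptions.

backtest_1 = api_client.prediction_build_model_predict(data, configuration_backtest)      # trained up to 2018-12-31
backtest_2 = api_client.prediction_build_model_predict(data, configuration_backtest_2)    # trained up to 2019-01-02

actuals = data.copy()
actuals['Date'] = pd.to_datetime(actuals['Date'])
actuals = actuals.set_index('Date')['Consumption']

def out_of_sample_mae(backtest):
    for aggregation in backtest.aggregated_predictions:                    # assumed structure: list of dicts
        if aggregation['type'] == 'OutOfSample':
            predictions = aggregation['values']['Prediction']
            common = predictions.loc['2019-01-03 00:00:00':'2019-01-10 23:00:00']   # window covered by both backtests
            return (common - actuals.loc[common.index]).abs().mean()

print('MAE without the 2 extra training days:', out_of_sample_mae(backtest_1))
print('MAE with the 2 extra training days:   ', out_of_sample_mae(backtest_2))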