Electricity load

Problem Description

Forecasting electricity load is critical to many companies. For instance, it is a fundamental input to the operations of transmission system operators (TSOs), and it is also important for industrial producers when balancing their decisions on electricity procurement.

Data Recommendation Template

Load can be influenced by many factors, depending on the prediction horizon, the level of granularity, etc. Let's say we want to predict load on an hourly basis for a particular region, or even a whole country, for the next day.

In such a case, past consumption, socio-economic factors, and weather parameters that influence human and industrial activity are all valid factors to consider. Among meteorological predictors, temperature, irradiation and cloudiness are recommended.

TIM can derive various calendar information from the timestamps in the dataset – day of the week, whether it is a weekend, month of the year. It cannot, however, tell whether a given day was a public holiday, so it is useful to provide this information as a column in the dataset.
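A holiday column like the one used later in this example can be built from a list of holiday dates with pandas. This is a minimal sketch; the two dates below are hypothetical placeholders – in practice you would use the official holiday calendar for your region:

```python
import pandas as pd

# Hypothetical holiday list -- substitute the official calendar for your region
holidays = pd.to_datetime(['2012-01-01', '2012-12-25'])

idx = pd.date_range('2012-01-01 00:00', '2012-01-02 23:00', freq='h')
calendar = pd.DataFrame({'Date': idx})

# 1 for every hour of a holiday, 0 otherwise
calendar['PublicHolidays'] = calendar['Date'].dt.normalize().isin(holidays).astype(int)
```

The `.dt.normalize()` call truncates each hourly timestamp to midnight, so the whole holiday gets flagged, not just its first hour.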

TIM Setup

TIM requires no setup of its mathematical internals and works well with the default values. Time series often contain seasonal patterns – usually daily, monthly or yearly – and these are detected by the TIM engine automatically.

Demo using Python API Client

Set up Python Libraries

In [2]:
import logging
import pandas as pd
import plotly as plt
import plotly.express as px
import plotly.graph_objects as go
import numpy as np
import json

import tim_client

Credentials and logging

(Do not forget to fill in your credentials in the credentials.json file)
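Based on the keys read from the file in the cells below, credentials.json is expected to look like this (values are placeholders):

```json
{
    "license_key": "YOUR_LICENSE_KEY",
    "email": "user@example.com",
    "password": "YOUR_PASSWORD"
}
```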
In [3]:
with open('credentials.json') as f:
    credentials_json = json.load(f)                     # loading the credentials from credentials.json

TIM_URL = 'https://timws.tangent.works/v4/api'          # URL to which the requests are sent

SAVE_JSON = False                                       # if True - JSON requests and responses are saved to JSON_SAVING_FOLDER
JSON_SAVING_FOLDER = 'logs/'                            # folder where the requests and responses are stored

LOGGING_LEVEL = 'INFO'
In [4]:
level = logging.getLevelName(LOGGING_LEVEL)
logging.basicConfig(level=level, format='[%(levelname)s] %(asctime)s - %(name)s:%(funcName)s:%(lineno)s - %(message)s')
logger = logging.getLogger(__name__)
In [5]:
credentials = tim_client.Credentials(credentials_json['license_key'], credentials_json['email'], credentials_json['password'], tim_url=TIM_URL)
api_client = tim_client.ApiClient(credentials)

api_client.save_json = SAVE_JSON
api_client.json_saving_folder_path = JSON_SAVING_FOLDER
[INFO] 2020-10-28 16:41:07,436 - tim_client.api_client:save_json:74 - Saving JSONs functionality has been disabled
[INFO] 2020-10-28 16:41:07,440 - tim_client.api_client:json_saving_folder_path:89 - JSON destination folder changed to logs

Specify configuration

To let TIM know that we want to forecast one day ahead, we set the "prediction to" to 1 day (equivalently, 24 hours or 24 samples). We want our model to be built using the data range 2012-01-01 00:00:00 – 2013-12-31 23:00:00. The rest (2014-01-01 00:00:00 – 2014-12-31 23:00:00) is left out for validation; excluding the final day, whose target is unknown and forms the prediction horizon, this amounts to 8736 samples, so we set the "backtest length" to 8736. To get better insights from our model, we also want extended importances and prediction intervals to be returned.
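The backtest length can be double-checked from the date ranges themselves – total samples, minus the model-building period, minus the 24-sample prediction horizon:

```python
import pandas as pd

full_range  = pd.date_range('2012-01-01 00:00', '2014-12-31 23:00', freq='h')  # whole dataset
build_range = pd.date_range('2012-01-01 00:00', '2013-12-31 23:00', freq='h')  # model-building period
horizon = 24                                                                   # one day ahead

backtest_length = len(full_range) - len(build_range) - horizon
print(len(full_range), backtest_length)   # 26304 8736
```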

In [6]:
configuration_backtest = {
    'usage': {                                 
        'predictionTo': { 
            'baseUnit': 'Day',                # units that are used for specifying the prediction horizon length (one of 'Day', 'Hour', 'QuarterHour', 'Sample')
            'offset': 1                       # number of units we want to predict into the future (24 hours in this case)
        },
        'backtestLength': 8736                 # number of samples that are used for backtesting (note that these samples are excluded from model building period)
    },
    "predictionIntervals": {
        "confidenceLevel": 90                  # confidence level of the prediction intervals (in %)
    },
    'extendedOutputConfiguration': {
        'returnExtendedImportances': True      # flag that specifies if the importances of features are returned in the response
    }
}

Data description

Dataset used in this example has hourly sampling rate and contains data from 2012-01-01 to 2014-12-31.

Target

Target variable represents hourly electricity load.

Predictor candidates

The following predictor candidates are used: temperature, cloudiness, irradiation, and public holidays (as a binary value).

Forecasting scenario

We simulate a day-ahead scenario – each day at 23:00 we want to forecast the target for the whole following day. We assume that values of all predictors are available until the end of the prediction horizon (the end of the next day). This means that the predictor columns are a combination of actual and forecast values. The last available value of the target is from the day before the forecast day. To let TIM know that this is how the model will be used in production, we simply provide the dataset in a form that represents the real situation (as can be seen in the view below – notice the NaN values representing the missing target for the day we wish to forecast).
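The shape of such a dataset can be sketched as follows – historical rows with known target and predictor actuals, followed by horizon rows where the predictors carry forecast values and the target is NaN. All numeric values below are hypothetical placeholders:

```python
import numpy as np
import pandas as pd

# History: target and predictor actuals known up to "now" (hypothetical values)
history = pd.DataFrame({
    'Date': pd.date_range('2014-12-29 00:00', '2014-12-30 23:00', freq='h'),
    'Load': 14000.0,
    'Temperature': 0.5,
})

# Next day: predictors filled with (hypothetical) forecast values, target left as NaN
future = pd.DataFrame({
    'Date': pd.date_range('2014-12-31 00:00', '2014-12-31 23:00', freq='h'),
    'Load': np.nan,          # unknown -- this is what TIM will forecast
    'Temperature': -0.3,     # weather forecast for the horizon
})

dataset = pd.concat([history, future], ignore_index=True)
```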

In [7]:
data = tim_client.load_dataset_from_csv_file('data.csv', sep=',')                                  # loading data from data.csv
data                                                                                               # quick look at the data
Out[7]:
Date Load Temperature Cloudiness Irradiation PublicHolidays
0 2012-01-01 00:00:00 14873.3 -1.3 4.1 0.2 1
1 2012-01-01 01:00:00 14253.3 -1.4 5.0 0.2 1
2 2012-01-01 02:00:00 13510.0 -1.4 5.0 0.2 1
3 2012-01-01 03:00:00 13053.5 -1.4 5.0 0.2 1
4 2012-01-01 04:00:00 12759.5 -0.6 5.9 0.2 1
... ... ... ... ... ... ...
26299 2014-12-31 19:00:00 NaN -0.6 5.3 0.1 0
26300 2014-12-31 20:00:00 NaN -0.6 5.3 0.1 0
26301 2014-12-31 21:00:00 NaN -0.6 5.3 0.1 0
26302 2014-12-31 22:00:00 NaN -0.3 7.5 0.1 0
26303 2014-12-31 23:00:00 NaN -0.3 7.5 0.1 0

26304 rows × 6 columns

Run TIM

In [8]:
backtest = api_client.prediction_build_model_predict(data, configuration_backtest)                 # running the RTInstantML forecasting using data and defined configuration
backtest.status                                                                                    # status of the job
Out[8]:
'Finished'

Visualize backtesting

In [9]:
fig = plt.subplots.make_subplots(rows=2, cols=1, shared_xaxes=True, vertical_spacing=0.02)      # plot initialization

fig.add_trace(go.Scatter(x = data.loc[:, "Date"], y=data.loc[:, "Load"],
                         name = "target", line=dict(color='black')), row=1, col=1)              # plotting the target variable

fig.add_trace(go.Scatter(x = backtest.prediction.index, 
                         y = backtest.prediction.loc[:, 'Prediction'],
                         name = "production forecast", 
                         line = dict(color='purple')), row=1, col=1)                            # plotting production prediction

fig.add_trace(go.Scatter(x = backtest.prediction_intervals_upper_values.index,
                         y = backtest.prediction_intervals_upper_values.loc[:, 'UpperValues'],
                         marker = dict(color="#444"),
                         line = dict(width=0),
                         showlegend = False), row=1, col=1)                           
fig.add_trace(go.Scatter(x = backtest.prediction_intervals_lower_values.index,
                         y = backtest.prediction_intervals_lower_values.loc[:, 'LowerValues'],
                         fill = 'tonexty',
                         line = dict(width=0),
                         showlegend = False), row=1, col=1)                                     # plotting confidence intervals

fig.add_trace(go.Scatter(x = backtest.aggregated_predictions[0]['values'].index, 
                         y = backtest.aggregated_predictions[0]['values'].loc[:, 'Prediction'],
                         name = "in-sample MAPE: " + str(round(backtest.aggregated_predictions[0]['accuracyMetrics']['MAPE'], 2)), 
                         line=dict(color='goldenrod')), row=1, col=1)                           # plotting in-sample prediction

fig.add_trace(go.Scatter(x = backtest.aggregated_predictions[1]['values'].index, 
                         y = backtest.aggregated_predictions[1]['values'].loc[:, 'Prediction'],
                         name = "out-of-sample MAPE: " + str(round(backtest.aggregated_predictions[1]['accuracyMetrics']['MAPE'], 2)), 
                         line = dict(color='red')), row=1, col=1)                               # plotting out-of-sample prediction

fig.add_trace(go.Scatter(x = data.loc[:, "Date"], y=data.loc[:, "Temperature"],
                         name = "Temperature", line=dict(color='forestgreen')), row=2, col=1)   # plotting the predictor Temperature

fig.update_layout(height=600, width=1000, 
                  title_text="Backtesting, modelling difficulty: " 
                  + str(round(backtest.data_difficulty, 2)) + "%" )                             # update size and title of the plot

fig.show()

Visualize predictor and feature importances

In [10]:
simple_importances = backtest.predictors_importances['simpleImportances']                                                                # get predictor importances
simple_importances = sorted(simple_importances, key = lambda i: i['importance'], reverse=True)                                           # sort by importance
extended_importances = backtest.predictors_importances['extendedImportances']                                                            # get feature importances
extended_importances = sorted(extended_importances, key = lambda i: i['importance'], reverse=True)                                       # sort by importance

si_df = pd.DataFrame(index=np.arange(len(simple_importances)), columns = ['predictor name', 'predictor importance (%)'])                 # initialize predictor importances dataframe
ei_df = pd.DataFrame(index=np.arange(len(extended_importances)), columns = ['feature name', 'feature importance (%)', 'time', 'type'])   # initialize feature importances dataframe
In [11]:
for (i, si) in enumerate(simple_importances):
    si_df.loc[i, 'predictor name'] = si['predictorName']                   # get predictor name
    si_df.loc[i, 'predictor importance (%)'] = si['importance']            # get importance of the predictor
    
for (i, ei) in enumerate(extended_importances):
    ei_df.loc[i, 'feature name'] = ei['termName']                          # get feature name
    ei_df.loc[i, 'feature importance (%)'] = ei['importance']              # get importance of the feature
    ei_df.loc[i, 'time'] = ei['time']                                      # get time of the day to which the feature corresponds
    ei_df.loc[i, 'type'] = ei['type']                                      # get type of the feature
In [12]:
si_df.head()                                                               # predictor importances data frame
Out[12]:
predictor name predictor importance (%)
0 Load 63.73
1 PublicHolidays 20.65
2 Temperature 9.91
3 Irradiation 4.77
4 Cloudiness 0.95
In [13]:
fig = go.Figure(go.Bar(x=si_df['predictor name'], y=si_df['predictor importance (%)']))      # plot the bar chart
fig.update_layout(height=400,                                                                # update size, title and axis titles of the chart
                  width=600, 
                  title_text="Importances of predictors",
                  xaxis_title="Predictor name",
                  yaxis_title="Predictor importance (%)")
fig.show()
In [14]:
ei_df.head()                                                               # first few of the feature importances
Out[14]:
feature name feature importance (%) time type
0 Load(t-1) 22.56 00:00:00 TargetAndTargetTransformation
1 Load(t-2) & Load(t-22) 22.13 01:00:00 Interaction
2 Load(t-23) & EMA_Load(t-3, w = 3) 19.6 02:00:00 Interaction
3 Load(t-24) & Load(t-16) 18.92 03:00:00 Interaction
4 Load(t-24) 18.21 23:00:00 TargetAndTargetTransformation
In [15]:
time = '12:00:00'                                                                            # time for which the feature importances are visualized
fig = go.Figure(go.Bar(x=ei_df[ei_df['time'] == time]['feature name'],                       # plot the bar chart
                       y=ei_df[ei_df['time'] == time]['feature importance (%)']))
fig.update_layout(height=700,                                                                # update size, title and axis titles of the chart
                  width=1000,
                  title_text="Importances of features (for {})".format(time),
                  xaxis_title="Feature name",
                  yaxis_title="Feature importance (%)")
fig.show()