Portfolio asset solar production¶

Problem description¶

Energy traders in particular are sometimes interested only in the production of a whole portfolio of PV plants. When forecasting the production of a portfolio of PV plants, we should consider two scenarios:

Production of each PV plant is available, and the production of the whole portfolio is the sum of the individual productions: In this case we recommend building a model for each power plant in the portfolio, using the recommended data discussed in the section ‘Single asset solar production forecasting’, and summing the forecasts of these models to obtain the forecast of the portfolio. Our experiments showed that this approach gives the best accuracy with TIM. It also makes it easy to account for scheduled maintenance of specific PV plants in the total portfolio production (simply leave out the forecasted production of the plant that is being maintained).
Only production of the whole portfolio is available: In this case the key is to find GPS coordinates that best represent the portfolio. This task does not have an exact solution, so knowledge of the PV plants in the portfolio is required. We recommend taking into account the distances between individual PV plants and their installed capacities, and trying out different sets of GPS coordinates. Our best practice for finding GPS coordinates for meteo data is to cluster the locations of the PV plants, weighted by their installed capacity, and to use the centroids of these clusters as the GPS coordinates. However, the number of clusters that gives the best results remains an open question.
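
The first scenario can be sketched in a few lines of pandas. This is only an illustration, not part of the TIM client: the helper `portfolio_forecast`, the plant names, the forecast values and the maintenance window are all hypothetical. Each per-plant forecast is assumed to be a series indexed by timestamp, and a plant under maintenance is set to zero for the affected period before summing.

```python
import pandas as pd

def portfolio_forecast(plant_forecasts, maintenance=None):
    """Sum per-plant forecasts; plants under maintenance contribute nothing.

    plant_forecasts: dict mapping plant name -> pd.Series indexed by timestamp
    maintenance:     dict mapping plant name -> (start, end) timestamps, both inclusive
    """
    frame = pd.DataFrame(plant_forecasts)            # one column per plant, aligned on timestamp
    for plant, (start, end) in (maintenance or {}).items():
        frame.loc[start:end, plant] = 0.0            # label-based slice includes both ends
    return frame.sum(axis=1)


# Hypothetical forecasts for two plants (values are made up for illustration)
index = pd.date_range('2016-06-30 10:00', periods=4, freq='h')
forecasts = {
    'plant_a': pd.Series([1.0, 2.0, 3.0, 2.0], index=index),
    'plant_b': pd.Series([0.5, 1.0, 1.5, 1.0], index=index),
}
# plant_b is assumed to be under maintenance from 11:00 to 12:00
maintenance = {
    'plant_b': (pd.Timestamp('2016-06-30 11:00'), pd.Timestamp('2016-06-30 12:00')),
}

total = portfolio_forecast(forecasts, maintenance)
print(total)
```

Keeping the per-plant forecasts as separate columns until the last step means a maintenance schedule can be changed without rebuilding any model.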

Data Recommendation Template¶

For each chosen GPS coordinate, obtain the meteorological predictors Global Horizontal Irradiance (GHI), Direct Normal Irradiance (DNI), Diffuse Irradiance (DIF), Temperature (TEMP), Sun Elevation (SE) and Sun Azimuth (SA), as described previously in the section ‘Single asset solar production forecasting’.
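
One possible way to choose these GPS coordinates is capacity-weighted clustering of the plant locations, as described in the problem description. The sketch below is a plain NumPy version of weighted k-means (Lloyd's algorithm), not the exact procedure used by the authors. The function `weighted_kmeans`, the plant locations and the capacities are hypothetical, and latitude/longitude are treated as planar coordinates, which is only a reasonable approximation for a geographically small portfolio.

```python
import numpy as np

def nearest_centroid(points, centroids):
    """Index of the closest centroid for every point (Euclidean distance)."""
    dist = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return dist.argmin(axis=1)

def weighted_kmeans(points, weights, k, n_iter=100, seed=0):
    """Weighted k-means: centroids are weighted means of their members."""
    points = np.asarray(points, dtype=float)
    weights = np.asarray(weights, dtype=float)
    rng = np.random.default_rng(seed)
    # start from k distinct plants, drawn with probability proportional to capacity
    start = rng.choice(len(points), size=k, replace=False, p=weights / weights.sum())
    centroids = points[start].copy()
    for _ in range(n_iter):
        labels = nearest_centroid(points, centroids)
        new_centroids = centroids.copy()
        for j in range(k):
            members = labels == j
            if members.any():                        # keep the old centroid if a cluster is empty
                new_centroids[j] = np.average(points[members], axis=0, weights=weights[members])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, nearest_centroid(points, centroids)


# Hypothetical portfolio: (latitude, longitude) and installed capacity in MW
locations = [(48.1, 17.1), (48.2, 17.2), (49.0, 21.2), (49.1, 21.3)]
capacities = [1.0, 3.0, 2.0, 2.0]

centroids, labels = weighted_kmeans(locations, capacities, k=2)
print(centroids)   # candidate GPS coordinates for the meteo data
print(labels)      # cluster index of each plant
```

Since the best number of clusters is not known in advance, a practical approach is to repeat this for several values of `k` and compare the resulting backtests.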

TIM Setup¶

TIM requires no setup of its mathematical internals and works well in business user mode. All that is required from the user is to let TIM know the forecasting routine and the desired prediction horizon. TIM can learn automatically that there is no weekly pattern; in some cases, however (e.g. short datasets), this can be difficult to learn, and we therefore recommend switching off the WeekRest dictionary.

Demo using Python API Client¶

Set up Python Libraries¶

import logging
import pandas as pd
import plotly as plt
import plotly.express as px
import plotly.graph_objects as go
import numpy as np
import json

import tim_client

Credentials and logging¶

(Do not forget to fill in your credentials in the credentials.json file)¶

with open('credentials.json') as f:
    credentials_json = json.load(f)                     # loading the credentials from credentials.json

TIM_URL = 'https://timws.tangent.works/v4/api'          # URL to which the requests are sent

SAVE_JSON = False                                       # if True - JSON requests and responses are saved to JSON_SAVING_FOLDER
JSON_SAVING_FOLDER = 'logs/'                            # folder where the requests and responses are stored

LOGGING_LEVEL = 'INFO'

level = logging.getLevelName(LOGGING_LEVEL)
logging.basicConfig(level=level, format='[%(levelname)s] %(asctime)s - %(name)s:%(funcName)s:%(lineno)s - %(message)s')
logger = logging.getLogger(__name__)

credentials = tim_client.Credentials(credentials_json['license_key'], credentials_json['email'], credentials_json['password'], tim_url=TIM_URL)
api_client = tim_client.ApiClient(credentials)

api_client.save_json = SAVE_JSON
api_client.json_saving_folder_path = JSON_SAVING_FOLDER

[INFO] 2020-10-29 23:13:04,063 - tim_client.api_client:save_json:74 - Saving JSONs functionality has been disabled
[INFO] 2020-10-29 23:13:04,066 - tim_client.api_client:json_saving_folder_path:89 - JSON destination folder changed to logs

Specify configuration¶

In this example we will simulate a day-ahead scenario. Each day at 10:15 we wish to have a forecast for each hour up until the end of the next day, so we set "predictionTo" to 38 samples. The model is built using the range 2015-01-01 00:00:00 to 2015-12-31 23:00:00. Out-of-sample forecasts are made in the range 2016-01-01 00:00:00 to 2016-06-29 09:00:00 (the last 4330 samples with a known target value). To get better insights from our model, we also request extended importances and prediction intervals.
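
The two sample counts used in the configuration can be checked with a little date arithmetic. The sketch below assumes hourly data and that the last known target value is the one stamped 2016-06-29 09:00 (the dataset ends at 2016-06-30 23:00 and the last target value is from 09:00 of the day before). The variable names are chosen for illustration only.

```python
import pandas as pd

HOUR = pd.Timedelta(hours=1)

last_target = pd.Timestamp('2016-06-29 09:00')     # assumed: last timestamp with a known target value
first_forecast = last_target + HOUR                # 10:00, the first sample to be forecasted
last_forecast = first_forecast.normalize() + pd.Timedelta(days=1, hours=23)   # 23:00 of the next day

# number of hourly samples from the first to the last forecasted timestamp, both included
prediction_to = (last_forecast - first_forecast) // HOUR + 1

# number of hourly samples from the start of 2016 up to the last known target value
backtest_start = pd.Timestamp('2016-01-01 00:00')
backtest_length = (last_target - backtest_start) // HOUR + 1

print(prediction_to, backtest_length)              # 38 4330
```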

configuration_backtest = {
    'usage': {
        'predictionTo': {
            'baseUnit': 'Sample',                # units that are used for specifying the prediction horizon length (one of 'Day', 'Hour', 'QuarterHour', 'Sample')
            'offset': 38                       # number of units we want to predict into the future (38 hourly samples in this case)
        },
        'backtestLength': 4330                 # number of samples that are used for backtesting (note that these samples are excluded from model building period)
    },
    "predictionIntervals": {
        "confidenceLevel": 90                  # confidence level of the prediction intervals (in %)
    },
    'extendedOutputConfiguration': {
        'returnExtendedImportances': True      # flag that specifies if the importances of features are returned in the response
    }
}

Data description¶

The dataset used in this example is assembled from an individual PV plant in Central Europe. It has an hourly sampling rate and contains data from 2015-01-01 to 2016-06-30.

Target¶

The target variable represents the production of the PV plant. It is the second column in the CSV file, right after the column with timestamps. In this case the name of the target is 'PV_obs'.

Predictor candidates¶

As meteo predictors we use GHI, DNI, DIF, TEMP, SA and SE, as discussed in the section 'Data Recommendation Template'. In the dataset each predictor appears with the suffixes _1 to _5 (e.g. GHI_1 to GHI_5). In this demo we use historical actuals for both model building and out-of-sample forecasting.

Timestamp¶

Timestamps are in the UTC+01:00 timezone, and each timestamp marks the beginning of the period it corresponds to, i.e. 'PV_obs' in the row with timestamp 2015-01-01 00:00:00 corresponds to the production of the PV plant during the period between 2015-01-01 00:00:00 and 2015-01-01 01:00:00.

Forecasting scenario¶

We simulate a day-ahead scenario: each day at 10:00 we want to forecast the target up to the end of the next day. We assume that the values of all predictors are available until the end of the next day (the end of the prediction horizon). This means that the predictors' data columns are a combination of actual values and forecast values. The last value of the target is from 09:00. To let TIM know that this is how the model would be used in production, we can simply provide the dataset in a form that represents the real situation (as can be seen in the view below; notice the NaN values representing the missing target data for the period we wish to forecast). In this demo dataset, out-of-sample validation is performed using historical actuals of the meteorological data. A more representative validation may be obtained by using historical forecasts of the meteorological data instead.
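
To make the expected shape of the data concrete, the following sketch builds a small two-day frame and blanks out the target after the last known value, leaving the predictor untouched. The column names follow the demo dataset, but the values are made up and the cut-off timestamp is an assumption based on the scenario described above.

```python
import numpy as np
import pandas as pd

# Hypothetical two-day excerpt with hourly sampling; all values are placeholders
timestamps = pd.date_range('2016-06-29 00:00', '2016-06-30 23:00', freq='h')
df = pd.DataFrame({
    'timestamp': timestamps,
    'PV_obs': np.ones(len(timestamps)),     # target
    'GHI_1': np.ones(len(timestamps)),      # predictor, available over the whole horizon
})

# The target is known only up to 09:00; later rows are left empty for TIM to forecast
last_known = pd.Timestamp('2016-06-29 09:00')
df.loc[df['timestamp'] > last_known, 'PV_obs'] = np.nan

missing = int(df['PV_obs'].isna().sum())
print(missing)                              # 38 samples to forecast
print(df.tail())
```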

data = tim_client.load_dataset_from_csv_file('data.csv', sep=',')                                  # loading data from data.csv
data                                                                                               # quick look at the data

Run TIM¶

backtest = api_client.prediction_build_model_predict(data, configuration_backtest)                 # running the RTInstantML forecasting using data and defined configuration
backtest.status                                                                                    # status of the job

'Finished'

Visualize backtesting¶

fig = plt.subplots.make_subplots(rows=2, cols=1, shared_xaxes=True, vertical_spacing=0.02)      # plot initialization

fig.add_trace(go.Scatter(x = data.loc[:, "timestamp"], y=data.loc[:, "PV_obs"],
                         name = "target", line=dict(color='black')), row=1, col=1)              # plotting the target variable

fig.add_trace(go.Scatter(x = backtest.prediction.index,
                         y = backtest.prediction.loc[:, 'Prediction'],
                         name = "production forecast",
                         line = dict(color='purple')), row=1, col=1)                            # plotting production prediction

fig.add_trace(go.Scatter(x = backtest.prediction_intervals_upper_values.index,
                         y = backtest.prediction_intervals_upper_values.loc[:, 'UpperValues'],
                         marker = dict(color="#444"),
                         line = dict(width=0),
                         showlegend = False), row=1, col=1)
fig.add_trace(go.Scatter(x = backtest.prediction_intervals_lower_values.index,
                         y = backtest.prediction_intervals_lower_values.loc[:, 'LowerValues'],
                         fill = 'tonexty',
                         line = dict(width=0),
                         showlegend = False), row=1, col=1)                                     # plotting prediction intervals

fig.add_trace(go.Scatter(x = backtest.aggregated_predictions[1]['values'].index,
                         y = backtest.aggregated_predictions[1]['values'].loc[:, 'Prediction'],
                         name = "in-sample MAE: " + str(round(backtest.aggregated_predictions[1]['accuracyMetrics']['MAE'], 2)),
                         line=dict(color='goldenrod')), row=1, col=1)                           # plotting in-sample prediction

fig.add_trace(go.Scatter(x = backtest.aggregated_predictions[3]['values'].index,
                         y = backtest.aggregated_predictions[3]['values'].loc[:, 'Prediction'],
                         name = "out-of-sample MAE: " + str(round(backtest.aggregated_predictions[3]['accuracyMetrics']['MAE'], 2)),
                         line = dict(color='red')), row=1, col=1)                               # plotting out-of-sample prediction

fig.add_trace(go.Scatter(x = data.loc[:, "timestamp"], y=data.loc[:, "GHI_1"],
                         name = "GHI_1", line=dict(color='forestgreen')), row=2, col=1)   # plotting the predictor GHI_1

fig.update_layout(height=600, width=1000,
                  title_text="Backtesting, modelling difficulty: "
                  + str(round(backtest.data_difficulty, 2)) + "%" )                             # update size and title of the plot

fig.show()
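
The MAE values shown in the legend are taken from the accuracy metrics returned by TIM. As a reference for what this metric measures, here is a minimal sketch that computes a mean absolute error from two aligned series. The helper `mean_absolute_error` and the sample values are hypothetical and are not part of the TIM client.

```python
import pandas as pd

def mean_absolute_error(actual, predicted):
    """Mean absolute difference over timestamps present (and non-missing) in both series."""
    joined = pd.concat(
        [actual.rename('actual'), predicted.rename('predicted')],
        axis=1, join='inner',
    ).dropna()
    return (joined['actual'] - joined['predicted']).abs().mean()


# Made-up actuals and forecasts for four hourly samples
index = pd.date_range('2016-06-29 10:00', periods=4, freq='h')
actual = pd.Series([2.0, 3.0, 4.0, 3.0], index=index)
predicted = pd.Series([2.5, 2.0, 4.0, 3.5], index=index)

mae = mean_absolute_error(actual, predicted)
print(round(mae, 2))   # 0.5
```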

Visualize predictor and feature importances¶

simple_importances = backtest.predictors_importances['simpleImportances']                                                                # get predictor importances
simple_importances = sorted(simple_importances, key = lambda i: i['importance'], reverse=True)                                           # sort by importance
extended_importances = backtest.predictors_importances['extendedImportances']                                                            # get feature importances
extended_importances = sorted(extended_importances, key = lambda i: i['importance'], reverse=True)                                       # sort by importance

si_df = pd.DataFrame(index=np.arange(len(simple_importances)), columns = ['predictor name', 'predictor importance (%)'])                 # initialize predictor importances dataframe
ei_df = pd.DataFrame(index=np.arange(len(extended_importances)), columns = ['feature name', 'feature importance (%)', 'time', 'type'])   # initialize feature importances dataframe

for (i, si) in enumerate(simple_importances):
    si_df.loc[i, 'predictor name'] = si['predictorName']                   # get predictor name
    si_df.loc[i, 'predictor importance (%)'] = si['importance']            # get importance of the predictor

for (i, ei) in enumerate(extended_importances):
    ei_df.loc[i, 'feature name'] = ei['termName']                          # get feature name
    ei_df.loc[i, 'feature importance (%)'] = ei['importance']              # get importance of the feature
    ei_df.loc[i, 'time'] = ei['time']                                      # get time of the day to which the feature corresponds
    ei_df.loc[i, 'type'] = ei['type']                                      # get type of the feature

si_df.head()                                                               # predictor importances data frame
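
As an alternative to filling the data frame row by row, pandas can build it directly from a list of dictionaries. The sketch below assumes records with the same keys as used above ('predictorName', 'importance'). The three sample records reuse values from the printed predictor importances, and the name `si_df_alt` is chosen so that the frame built above is not overwritten.

```python
import pandas as pd

# Sample records in the same shape as the 'simpleImportances' entries used above
sample_importances = [
    {'predictorName': 'GHI_2', 'importance': 14.28},
    {'predictorName': 'PV_obs', 'importance': 18.62},
    {'predictorName': 'GHI_3', 'importance': 9.58},
]

si_df_alt = (
    pd.DataFrame(sample_importances)                       # one row per record, one column per key
    .sort_values('importance', ascending=False)            # most important predictor first
    .reset_index(drop=True)
    .rename(columns={'predictorName': 'predictor name',
                     'importance': 'predictor importance (%)'})
)
print(si_df_alt)
```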

fig = go.Figure(go.Bar(x=si_df['predictor name'], y=si_df['predictor importance (%)']))      # plot the bar chart
fig.update_layout(height=400,                                                                # update size, title and axis titles of the chart
                  width=600,
                  title_text="Importances of predictors",
                  xaxis_title="Predictor name",
                  yaxis_title="Predictor importance (%)")
fig.show()

ei_df.head()                                                               # first few of the feature importances

time = '12:00:00'                                                                            # time for which the feature importances are visualized
fig = go.Figure(go.Bar(x=ei_df[ei_df['time'] == time]['feature name'],                       # plot the bar chart
                       y=ei_df[ei_df['time'] == time]['feature importance (%)']))
fig.update_layout(height=700,                                                                # update size, title and axis titles of the chart
                  width=1000,
                  title_text="Importances of features (for {})".format(time),
                  xaxis_title="Feature name",
                  yaxis_title="Feature importance (%)")
fig.show()

Output of the quick look at the data:

	timestamp	PV_obs	GHI_1	DNI_1	DIF_1	TEMP_1	SA_1	SE_1	GHI_2	DNI_2	...	DIF_4	TEMP_4	SA_4	SE_4	GHI_5	DNI_5	DIF_5	TEMP_5	SA_5	SE_5
0	2015-01-01 00:00:00	0.0	0	0	0	-7.6	-158.45	-63.08	0	0	...	0	-8.2	-159.46	-63.05	0	0	0	-7.3	-157.99	-62.99
1	2015-01-01 01:00:00	0.0	0	0	0	-7.5	-133.05	-57.47	0	0	...	0	-8.1	-133.91	-57.59	0	0	0	-7.1	-132.73	-57.33
2	2015-01-01 02:00:00	0.0	0	0	0	-7.3	-114.29	-49.23	0	0	...	0	-7.7	-114.95	-49.44	0	0	0	-6.9	-114.06	-49.06
3	2015-01-01 03:00:00	0.0	0	0	0	-7.3	-100.07	-39.77	0	0	...	0	-7.7	-100.59	-40.04	0	0	0	-6.9	-99.89	-39.60
4	2015-01-01 04:00:00	0.0	0	0	0	-7.3	-88.26	-29.90	0	0	...	0	-7.6	-88.70	-30.21	0	0	0	-6.9	-88.10	-29.74
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
13123	2016-06-30 19:00:00	NaN	18	51	14	23.1	123.73	2.41	7	0	...	11	21.4	123.38	2.71	18	54	14	22.4	123.91	2.31
13124	2016-06-30 20:00:00	NaN	0	0	0	18.7	135.21	-5.17	0	0	...	0	17.2	134.85	-4.79	0	0	0	18.3	135.41	-5.28
13125	2016-06-30 21:00:00	NaN	0	0	0	17.1	147.69	-11.87	0	0	...	0	16.0	147.30	-11.58	0	0	0	16.8	147.90	-11.93
13126	2016-06-30 22:00:00	NaN	0	0	0	15.9	161.19	-16.15	0	0	...	0	15.6	160.77	-15.91	0	0	0	15.7	161.42	-16.17
13127	2016-06-30 23:00:00	NaN	0	0	0	14.9	175.45	-18.16	0	0	...	0	15.7	175.00	-17.99	0	0	0	14.8	175.68	-18.14

Output of si_df.head() (predictor importances):

	predictor name	predictor importance (%)
0	PV_obs	18.62
1	GHI_2	14.28
2	GHI_3	9.58
3	SE_4	7.7
4	GHI_4	7.39

Output of ei_df.head() (feature importances):

	feature name	feature importance (%)	time	type
0	(SE_3(t) - 1.00)⁺	36.63	02:00:00	Predictor
1	(-DIF_2(t) + 218.50)⁺ & (-DIF_5(t) + 210.83)⁺	30.27	03:00:00	Interaction
2	(GHI_3(t) - 0.00)⁺	29.66	19:00:00	Predictor
3	EMA_PV_obs(t-16, w = 15)	27.86	01:00:00	TargetAndTargetTransformation
4	(GHI_3(t) - 0.00)⁺	26.57	18:00:00	Predictor