Solar production forecasting - panel data

Title: Solar production forecasting
Author: Maria Starovska, Tangent Works
Industry: Utilities
Area: Solar production
Type: Forecasting

Description

Owners of PV plants, electricity traders, and system regulators need accurate production forecasts to optimize their maintenance, trading, and regulation strategies. However, newly opened solar farms may have little or no history, which makes it challenging to train individual models. In that case, the user may train one more complex model on other solar farms from the portfolio that have a long history and use that model to predict solar farms with short or no history. An important predictor in such an approach is the installed capacity of the solar farm: it is a time-invariant, farm-specific predictor that helps the model learn the correct scale for each farm.

This solution template demonstrates how to use TIM with a panel dataset. We will build one general model for farms in the same region. If you are more interested in modeling each farm individually or modeling the whole portfolio, check the solution templates single asset solar production and portfolio solar production.

Setup

Credentials and logging

(Do not forget to fill in your credentials in the credentials.json file)
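A minimal sketch of this setup step is shown below. It assumes a credentials.json file containing email, password and optionally the platform endpoint (the field names, the default URL and the login endpoint are assumptions; check your file and the TIM API documentation), and it configures standard Python logging.

```python
import json
import logging

import requests

# Configure basic logging for the notebook.
logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger("solar-panel-data")

# Load TIM credentials; the field names below are an assumption,
# adjust them to whatever your credentials.json contains.
with open("credentials.json") as f:
    credentials = json.load(f)

TIM_URL = credentials.get("endpoint", "https://tim-platform.tangent.works/api/v5")  # assumed default

# Obtain a bearer token for direct API calls (the login endpoint and the
# response field name are assumptions; consult the TIM API documentation).
auth_response = requests.post(
    f"{TIM_URL}/auth/login",
    json={"email": credentials["email"], "password": credentials["password"]},
)
auth_response.raise_for_status()
auth_token = auth_response.json()["token"]
logger.info("Authenticated as %s", credentials["email"])
```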

Dataset

Description

The dataset contains data for 17 solar farms, covering the years 2019 and 2020. The target variable is solar production. The dataset also contains forecasts of the predictors PVOUT, TEMP, GHI and GTI issued from day zero, and we added the installed capacity of each farm.

Sampling period

Data are sampled hourly.

Data

Column name | Description | Type | Availability
Profile | ID of each farm | Group key column |
Date | Timestamp | Timestamp column |
y | Solar production | Target | D+0
PVOUT | Power generated per unit of installed photovoltaic capacity | Predictor | D+1
TEMP | Temperature | Predictor | D+1
GHI | Global horizontal irradiance | Predictor | D+1
GTI | Global tilted irradiance | Predictor | D+1
Capacity | Installed capacity | Predictor | D+1

Forecasting situations

Our goal is to predict solar production for the next day. Data alignment for each farm indicates that all predictors are available throughout the prediction horizon.

Source

The dataset contains anonymized solar production data from farms in one region.

The CSV file used in this experiment can be downloaded here.

Read and preview dataset
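A possible sketch of this step with pandas; the file name solar_panel_data.csv is an assumption, use the CSV downloaded above.

```python
import pandas as pd

# File name is an assumption; use the CSV downloaded above.
data = pd.read_csv("solar_panel_data.csv")

print(data.shape)                           # rows and columns
print(data["Profile"].nunique(), "farms")   # expect 17 farms
data.head()
```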

Upload dataset to the TIM DB

To work with panel data, we have to specify group keys when uploading the data; in our case, it is only the column Profile. We also specify the format of the timestamp in the CSV file and which column is the timestamp column, since in our dataset it is the second column, while by default the first column is assumed.
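A hedged sketch of the upload request is shown below; the endpoint path and the configuration field names are assumptions made for illustration, so consult the TIM API reference for the exact schema.

```python
# The endpoint path and configuration field names below are assumptions.
upload_configuration = {
    "name": "Solar panel data",
    "timestampFormat": "yyyy-mm-dd HH:MM:SS",  # format of the Date column (assumed)
    "timestampColumn": "Date",                 # second column in the CSV, not the first
    "groupKeys": ["Profile"],                  # panel data: one group key per farm
}

with open("solar_panel_data.csv", "rb") as csv_file:
    response = requests.post(
        f"{TIM_URL}/datasets/csv",             # hypothetical endpoint path
        headers={"Authorization": f"Bearer {auth_token}"},
        files={"file": csv_file},
        data={"configuration": json.dumps(upload_configuration)},
    )
response.raise_for_status()
dataset_id = response.json()["id"]             # response field name is an assumption
```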

Data Visualization

Experiment

Description

We will omit four farms (profiles 4, 7, 8 and 12) from the training data, for two reasons. First, we would like to demonstrate zero-history models: if data have short or no history, a model trained on similar datasets can be used for prediction. Second, a model does not have to be trained on all groups to be accurate. If the amount of data and generated features exceeded the amount of RAM available in the worker, TIM would discard the oldest observations and might switch off polynomial features. Training only on a subset shortens the model-building process and may improve accuracy.

We will set predictionTo to one day ahead. The year 2019 will be used as the in-sample period and the year 2020 as the out-of-sample period. We will simulate no-history models; therefore, we will not use offsets of the target at all. We will set modelQuality to medium and offsetLimit to -48 to make sure offsets of the predictors do not reach too far back. When building the model, we will not specify outOfSampleRows, so less data will be downloaded to the worker. Finally, we will use only a subset of farms, set under preprocessors in the CategoryFilter.

Build-model job configuration
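A sketch of a build-model configuration reflecting the settings described above. The key names and nesting follow the general shape of the TIM forecasting API, but treat them as assumptions and check the API reference for the exact schema; use_case_id is a hypothetical reference to the use case linked to the uploaded dataset.

```python
# Profiles present in the dataset; four farms are left out of training.
all_profiles = sorted(data["Profile"].unique())
training_profiles = [p for p in all_profiles if p not in {4, 7, 8, 12}]

build_model_configuration = {
    "name": "Solar panel data - general model",
    "useCase": {"id": use_case_id},                       # hypothetical reference to the use case
    "configuration": {
        "predictionTo": {"baseUnit": "Day", "value": 1},  # forecast one day ahead
        "modelQuality": "Medium",
        "targetOffsets": "None",                          # simulate no-history models (key name is an assumption)
        "offsetLimit": -48,                               # do not build offsets deeper than 48 samples
    },
    "data": {
        "inSampleRows": [{"from": "2019-01-01 00:00:00", "to": "2019-12-31 23:00:00"}],
        # outOfSampleRows is intentionally not set, so less data is sent to the worker
        "preprocessors": [
            {
                "type": "CategoryFilter",
                "value": {"column": "Profile", "categories": training_profiles},
            }
        ],
    },
}
```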

Build-model job execution
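A hedged sketch of registering and executing the build-model job over the REST API; the endpoint paths and the status field are assumptions.

```python
import time

# Register the build-model job (endpoint path is an assumption).
response = requests.post(
    f"{TIM_URL}/forecast-jobs/build-model",
    headers={"Authorization": f"Bearer {auth_token}"},
    json=build_model_configuration,
)
response.raise_for_status()
build_model_job_id = response.json()["id"]

# Execute the registered job and poll its status until it finishes.
requests.post(
    f"{TIM_URL}/forecast-jobs/{build_model_job_id}/execute",
    headers={"Authorization": f"Bearer {auth_token}"},
).raise_for_status()

while True:
    status = requests.get(
        f"{TIM_URL}/forecast-jobs/{build_model_job_id}/status",
        headers={"Authorization": f"Bearer {auth_token}"},
    ).json()
    if status.get("state") in ("Finished", "FinishedWithWarning", "Failed"):
        break
    time.sleep(10)
```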

Predict job configuration and registration

At the time of writing this notebook, the endpoint to register a predict job was not yet implemented in the Python client. Therefore, we will first authorize the user and call the API directly: we register the job through an API call and then execute the predict job with the Python client. The model used for prediction will be the already trained one; when registering a predict job, we specify the ID of the already executed build-model job in the URL path. Since the amount of data is larger, we will split the farms into three batches.
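A sketch of the direct API call described above, reusing the bearer token obtained in the setup sketch. The predict-registration endpoint shape and the batch filter field names are assumptions; the essential point is that the ID of the executed build-model job goes into the URL path.

```python
# Split all farms into three batches; the omitted profiles are included so the
# model is also evaluated on farms it has never seen.
batches = [all_profiles[i::3] for i in range(3)]

predict_job_ids = []
for batch in batches:
    predict_configuration = {
        "configuration": {"predictionTo": {"baseUnit": "Day", "value": 1}},
        "data": {
            "outOfSampleRows": [{"from": "2020-01-01 00:00:00", "to": "2020-12-31 23:00:00"}],
            "preprocessors": [
                {"type": "CategoryFilter",
                 "value": {"column": "Profile", "categories": batch}},
            ],
        },
    }
    # The build-model job ID is part of the URL path (endpoint shape is an assumption).
    response = requests.post(
        f"{TIM_URL}/forecast-jobs/{build_model_job_id}/predict",
        headers={"Authorization": f"Bearer {auth_token}"},
        json=predict_configuration,
    )
    response.raise_for_status()
    predict_job_ids.append(response.json()["id"])
```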

Visualize Results

Forecasts Visualization

Visualization of farms on which model was not trained

Visualization of farms on which model was trained

Importances

Variable Importances

Feature Importances

Accuracies

Accuracies per farm

We will compute the MAE of each farm weighted (normalized) by its installed capacity and display it as a percentage.
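One way to compute this metric is sketched below. It assumes a results DataFrame with the actuals, forecasts, profile and capacity joined together (the column names are assumptions); the MAE of each farm is divided by its installed capacity and expressed as a percentage.

```python
import pandas as pd

def capacity_weighted_mae(results: pd.DataFrame) -> pd.Series:
    """MAE of each farm normalized by its installed capacity, in percent.

    Expects columns 'Profile', 'y' (actual), 'forecast' and 'Capacity';
    the column names are assumptions, adjust them to your results table.
    """
    results = results.copy()
    results["abs_error"] = (results["y"] - results["forecast"]).abs()
    per_farm = results.groupby("Profile").agg(
        mae=("abs_error", "mean"),
        capacity=("Capacity", "first"),
    )
    return 100 * per_farm["mae"] / per_farm["capacity"]

# Example usage (out_of_sample_results is the joined 2020 results table):
# accuracies = capacity_weighted_mae(out_of_sample_results)
# accuracies.round(2)
```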