Single asset wind production

Problem Description

This section considers a single-asset problem: the production of an individual wind farm or even an individual wind turbine. Either can be described by a single pair of GPS coordinates and can therefore be treated as a single asset. Typical forecasting scenarios range from a few hours ahead to a few days ahead (usually 1h – 36h or 1h – 48h ahead). The data sampling rate in these scenarios tends to be 15 minutes, 30 minutes or 1 hour. Typical metrics used to evaluate forecast quality are the MAE and RMSE. To express the error as a percentage, nMAE, nRMSE, rMAE or rRMSE are used.
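
As a minimal sketch of these metrics, the snippet below computes MAE and RMSE with NumPy, together with normalized variants. Normalizing by installed capacity is an assumption made for illustration only; other conventions (for example dividing by the mean production) are also in use. The function name `forecast_errors` and the sample values are hypothetical.

```python
import numpy as np

def forecast_errors(actual, forecast, capacity):
    """Return MAE, RMSE and their capacity-normalized variants (in %)."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    error = forecast - actual
    mae = np.mean(np.abs(error))           # mean absolute error
    rmse = np.sqrt(np.mean(error ** 2))    # root mean squared error
    return {
        "MAE": mae,
        "RMSE": rmse,
        "nMAE": 100 * mae / capacity,      # assumption: normalized by installed capacity
        "nRMSE": 100 * rmse / capacity,
    }

# Hypothetical hourly production values and forecasts (same unit as capacity)
actual = [1000.0, 1200.0, 800.0, 500.0]
forecast = [1100.0, 1100.0, 900.0, 700.0]
metrics = forecast_errors(actual, forecast, capacity=2000.0)
print(metrics)
```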

Data Recommendation Template

Good wind speed forecasts are essential for wind production forecasting. For single-asset modelling, it is recommended to have wind speed and wind direction forecasts at the hub height of the individual wind turbines. TIM can generate good production forecasts from a combination of wind speeds at different heights and the wind direction at at least one of those heights. The best practice is to use historical actuals of meteorological predictors for model building and meteorological forecasts for out-of-sample validation. This ensures that the highest-quality data available is used at every stage. Other meteorological predictors such as wind gusts, temperature, irradiation and pressure can also improve the quality of the forecast. It is recommended to include those predictors only when further fine-tuning the model(s).

TIM Setup

TIM requires no setup of its mathematical internals and works well in business user mode. All that is required from a user is to let TIM know a forecasting routine and a desired prediction horizon. TIM can automatically recognize appropriate values for these mathematical internals, for example by recognizing that there is no weekly pattern. In some cases, however (e.g. short datasets), this can be difficult to recognize, so it is recommended to switch off the weekdays dictionary: common sense already suggests that it will not contribute to the quality of the models in this scenario.

Demo using Python API Client

Set up Python Libraries

In [1]:
import logging
import pandas as pd
import plotly as plt
import plotly.subplots                                  # explicit import so that plt.subplots.make_subplots is available
import plotly.express as px
import plotly.graph_objects as go
import numpy as np
import json

import tim_client

Credentials and logging

(Do not forget to fill in your credentials in the credentials.json file)
In [2]:
with open('credentials.json') as f:
    credentials_json = json.load(f)                     # loading the credentials from credentials.json

TIM_URL = 'https://timws.tangent.works/v4/api'          # URL to which the requests are sent

SAVE_JSON = False                                       # if True - JSON requests and responses are saved to JSON_SAVING_FOLDER
JSON_SAVING_FOLDER = 'logs/'                            # folder where the requests and responses are stored

LOGGING_LEVEL = 'INFO'
In [3]:
level = logging.getLevelName(LOGGING_LEVEL)
logging.basicConfig(level=level, format='[%(levelname)s] %(asctime)s - %(name)s:%(funcName)s:%(lineno)s - %(message)s')
logger = logging.getLogger(__name__)
In [4]:
credentials = tim_client.Credentials(credentials_json['license_key'], credentials_json['email'], credentials_json['password'], tim_url=TIM_URL)
api_client = tim_client.ApiClient(credentials)

api_client.save_json = SAVE_JSON
api_client.json_saving_folder_path = JSON_SAVING_FOLDER
[INFO] 2020-10-29 23:10:04,222 - tim_client.api_client:save_json:74 - Saving JSONs functionality has been disabled
[INFO] 2020-10-29 23:10:04,225 - tim_client.api_client:json_saving_folder_path:89 - JSON destination folder changed to logs
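
For reference, the credentials file read above is expected to contain the three keys accessed in the code (`license_key`, `email`, `password`). The following sketch prints a template with placeholder values; the variable names `credentials_template` and `template_text` are hypothetical, and the placeholders must be replaced with real values before saving the result as credentials.json.

```python
import json

# Keys match those read from credentials_json above; values are placeholders only
credentials_template = {
    "license_key": "<your license key>",
    "email": "<your email>",
    "password": "<your password>",
}

# Serialize the template so it can be copied into credentials.json
template_text = json.dumps(credentials_template, indent=4)
print(template_text)
```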

Specify configuration

In this example we will simulate a day-ahead scenario. Each day at 10:15 we wish to have a forecast for each hour up until the end of the next day, so we set "predictionTo" to 38 samples. The model is built using the range from 2018-04-16 00:00:00 to 2018-12-14 23:00:00. Out-of-sample forecasts are made in the range from 2018-12-15 00:00:00 to 2019-04-13 09:00:00 (the last 2842 samples). To get better insights from our model we also request extended importances and prediction intervals.

In [5]:
configuration_backtest = {
    'usage': {
        'predictionTo': {
            'baseUnit': 'Sample',                # units that are used for specifying the prediction horizon length (one of 'Day', 'Hour', 'QuarterHour', 'Sample')
            'offset': 38                       # number of units we want to predict into the future (38 hourly samples in this case)
        },
        'backtestLength': 2842                 # number of samples that are used for backtesting (note that these samples are excluded from model building period)
    },
    "predictionIntervals": {
        "confidenceLevel": 90                  # confidence level of the prediction intervals (in %)
    },
    'extendedOutputConfiguration': {
        'returnExtendedImportances': True      # flag that specifies if the importances of features are returned in the response
    }
}
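
The horizon of 38 samples can be checked with a short calculation. The sketch below assumes, as described in the forecasting scenario of this example, that the last known target value is from 09:00 and that the forecast should run until 23:00 of the following day at hourly resolution. The variable names are hypothetical and the code is independent of the TIM client.

```python
import pandas as pd

last_target = pd.Timestamp("2019-04-13 09:00:00")                       # last timestamp with a known target value
first_forecast = last_target + pd.Timedelta(hours=1)                    # first hour to be forecast
horizon_end = last_target.normalize() + pd.Timedelta(days=1, hours=23)  # 23:00 of the next day

# Hourly timestamps that have to be covered by the forecast
forecast_index = pd.date_range(first_forecast, horizon_end, freq=pd.Timedelta(hours=1))
offset = len(forecast_index)
print(offset)
```

The resulting count is the value to use for 'offset' when 'baseUnit' is 'Sample'.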

Data description

The dataset used in this example has an hourly sampling rate and contains data from 2018-04-16 to 2019-04-14.

Target

The data used in this example comes from a wind farm in Spain. The GPS coordinates of this wind farm are 43.3544, -7.8811. Production data is available and can be downloaded from the following web page: http://www.sotaventogalicia.com/en/real-time-data/historical. The production of this wind farm is the target variable. It corresponds to the second column in the CSV file, right after the column with timestamps. In this case, the name of the target is Energy. This dataset contains hourly data.

Predictor candidates

The meteorological predictors used in this scenario are the wind speeds at heights of 10m, 80m, 100m and 120m and the wind direction at a height of 100m. In this demo, historical actuals are used for both model building and out-of-sample forecasting. The data used in this example ranges from 2018-04-16 to 2019-04-14.

Forecasting scenario

We simulate a day-ahead scenario: each day at 10:00 we want to forecast the target one whole day into the future. We assume that values of all predictors are available until the end of the next day (the end of the prediction horizon). This means that the predictors' data columns are a combination of actual values and forecast values. The last value of the target is from 09:00. To let TIM know that this is how it would be used in production, we can simply provide the dataset in a form that represents the real situation (as can be seen in the view below; notice the NaN values representing the missing target data for the period we wish to forecast). In this demo dataset, out-of-sample validation is performed using historical actuals of meteorological data. A more representative validation may be obtained by using historical forecasts of meteorological data instead.
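
A minimal sketch of how such a dataset shape could be produced with pandas is given below. The data here is randomly generated and purely hypothetical; only the column names and the cut-off time follow the demo dataset. The target is set to NaN after the last available timestamp, while the predictor column stays filled until the end of the horizon.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly data covering three days
index = pd.date_range("2019-04-12 00:00", "2019-04-14 23:00", freq=pd.Timedelta(hours=1))
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Date": index,
    "Energy": rng.uniform(0, 2000, len(index)),                    # target (random placeholder values)
    "wind_speed_mean_100m_1h.ms": rng.uniform(0, 15, len(index)),  # predictor (random placeholder values)
})

# The last known target value is from 09:00; everything after it is to be forecast
last_target = pd.Timestamp("2019-04-13 09:00")
df.loc[df["Date"] > last_target, "Energy"] = np.nan

print(df.tail())
```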

In [6]:
data = tim_client.load_dataset_from_csv_file('data.csv', sep=',')                                  # loading data from data.csv
data                                                                                               # quick look at the data
Out[6]:
                     Date   Energy  wind_speed_mean_10m_1h.ms  wind_speed_mean_80m_1h.ms  wind_speed_mean_100m_1h.ms  wind_speed_mean_120m_1h.ms  wind_dir_mean_100m_1h.d
   0  2018-04-16 00:00:00  1897.75                        1.9                        4.6                         5.0                         5.6                    231.2
   1  2018-04-16 01:00:00  1695.95                        2.0                        5.0                         5.3                         6.2                    217.1
   2  2018-04-16 02:00:00  1648.54                        2.3                        5.4                         5.4                         6.7                    210.8
   3  2018-04-16 03:00:00  1309.68                        2.3                        5.7                         5.4                         7.1                    203.9
   4  2018-04-16 04:00:00   774.02                        2.3                        5.9                         5.5                         7.4                    199.2
 ...                  ...      ...                        ...                        ...                         ...                         ...                      ...
8731  2019-04-14 19:00:00      NaN                        1.5                        4.7                         3.2                         5.2                    222.0
8732  2019-04-14 20:00:00      NaN                        1.3                        4.9                         2.9                         5.7                    213.1
8733  2019-04-14 21:00:00      NaN                        1.2                        5.0                         2.7                         6.0                    196.4
8734  2019-04-14 22:00:00      NaN                        1.3                        5.0                         2.4                         6.1                    176.9
8735  2019-04-14 23:00:00      NaN                        1.0                        5.0                         2.1                         6.2                    174.8

8736 rows × 7 columns

Run TIM

In [7]:
backtest = api_client.prediction_build_model_predict(data, configuration_backtest)                 # running the RTInstantML forecasting using data and defined configuration
backtest.status                                                                                    # status of the job
Out[7]:
'Finished'

Visualize backtesting

In [8]:
fig = plt.subplots.make_subplots(rows=2, cols=1, shared_xaxes=True, vertical_spacing=0.02)      # plot initialization

fig.add_trace(go.Scatter(x = data.loc[:, "Date"], y=data.loc[:, "Energy"],
                         name = "target", line=dict(color='black')), row=1, col=1)              # plotting the target variable

fig.add_trace(go.Scatter(x = backtest.prediction.index,
                         y = backtest.prediction.loc[:, 'Prediction'],
                         name = "production forecast",
                         line = dict(color='purple')), row=1, col=1)                            # plotting production prediction

fig.add_trace(go.Scatter(x = backtest.prediction_intervals_upper_values.index,
                         y = backtest.prediction_intervals_upper_values.loc[:, 'UpperValues'],
                         marker = dict(color="#444"),
                         line = dict(width=0),
                         showlegend = False), row=1, col=1)
fig.add_trace(go.Scatter(x = backtest.prediction_intervals_lower_values.index,
                         y = backtest.prediction_intervals_lower_values.loc[:, 'LowerValues'],
                         fill = 'tonexty',
                         line = dict(width=0),
                         showlegend = False), row=1, col=1)                                     # plotting prediction intervals

fig.add_trace(go.Scatter(x = backtest.aggregated_predictions[0]['values'].index,
                         y = backtest.aggregated_predictions[0]['values'].loc[:, 'Prediction'],
                         name = "in-sample MAE: " + str(round(backtest.aggregated_predictions[0]['accuracyMetrics']['MAE'], 2)),
                         line=dict(color='goldenrod')), row=1, col=1)                           # plotting in-sample prediction

fig.add_trace(go.Scatter(x = backtest.aggregated_predictions[1]['values'].index,
                         y = backtest.aggregated_predictions[1]['values'].loc[:, 'Prediction'],
                         name = "out-of-sample MAE: " + str(round(backtest.aggregated_predictions[1]['accuracyMetrics']['MAE'], 2)),
                         line = dict(color='red')), row=1, col=1)                               # plotting out-of-sample prediction

fig.add_trace(go.Scatter(x = data.loc[:, "Date"], y=data.loc[:, "wind_speed_mean_10m_1h.ms"],
                         name = "wind_speed_mean_10m_1h.ms", line=dict(color='forestgreen')), row=2, col=1)   # plotting the predictor wind_speed_mean_10m_1h.ms

fig.update_layout(height=600, width=1000,
                  title_text="Backtesting, modelling difficulty: "
                  + str(round(backtest.data_difficulty, 2)) + "%" )                             # update size and title of the plot

fig.show()
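
Besides inspecting the prediction intervals visually, one may want to check how often the actual values fall inside them and compare that share with the configured confidence level. The following is a rough sketch on hypothetical numbers: the three series stand in for the target and for the 'LowerValues' and 'UpperValues' columns used in the plot above, and the helper `interval_coverage` is not part of the TIM client.

```python
import pandas as pd

def interval_coverage(actual, lower, upper):
    """Share of actual values (in %) that lie within [lower, upper]."""
    inside = (actual >= lower) & (actual <= upper)
    return 100 * inside.mean()

# Hypothetical values on a shared hourly index
idx = pd.date_range("2019-01-01", periods=5, freq=pd.Timedelta(hours=1))
actual = pd.Series([100.0, 150.0, 90.0, 200.0, 120.0], index=idx)
lower = pd.Series([80.0, 120.0, 95.0, 150.0, 100.0], index=idx)
upper = pd.Series([130.0, 170.0, 140.0, 220.0, 160.0], index=idx)

coverage = interval_coverage(actual, lower, upper)
print(coverage)
```

On real backtest results, the series would first need to be aligned on common timestamps before being compared.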

Visualize predictor and feature importances

In [9]:
simple_importances = backtest.predictors_importances['simpleImportances']                                                                # get predictor importances
simple_importances = sorted(simple_importances, key = lambda i: i['importance'], reverse=True)                                           # sort by importance
extended_importances = backtest.predictors_importances['extendedImportances']                                                            # get feature importances
extended_importances = sorted(extended_importances, key = lambda i: i['importance'], reverse=True)                                       # sort by importance

si_df = pd.DataFrame(index=np.arange(len(simple_importances)), columns = ['predictor name', 'predictor importance (%)'])                 # initialize predictor importances dataframe
ei_df = pd.DataFrame(index=np.arange(len(extended_importances)), columns = ['feature name', 'feature importance (%)', 'time', 'type'])   # initialize feature importances dataframe
In [10]:
for (i, si) in enumerate(simple_importances):
    si_df.loc[i, 'predictor name'] = si['predictorName']                   # get predictor name
    si_df.loc[i, 'predictor importance (%)'] = si['importance']            # get importance of the predictor

for (i, ei) in enumerate(extended_importances):
    ei_df.loc[i, 'feature name'] = ei['termName']                          # get feature name
    ei_df.loc[i, 'feature importance (%)'] = ei['importance']              # get importance of the feature
    ei_df.loc[i, 'time'] = ei['time']                                      # get time of the day to which the feature corresponds
    ei_df.loc[i, 'type'] = ei['type']                                      # get type of the feature
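
As an alternative to filling the data frames row by row, a list of dictionaries can be passed to pandas directly. The sketch below uses a small hypothetical list in place of backtest.predictors_importances['simpleImportances']; the keys 'predictorName' and 'importance' are those accessed in the loop above, and the values are illustrative only.

```python
import pandas as pd

# Hypothetical stand-in for backtest.predictors_importances['simpleImportances']
simple_importances = [
    {"predictorName": "wind_speed_mean_80m_1h.ms", "importance": 9.2},
    {"predictorName": "wind_speed_mean_100m_1h.ms", "importance": 27.12},
    {"predictorName": "Energy", "importance": 22.92},
]

si_df = (
    pd.DataFrame(simple_importances)                                # one row per dictionary
    .rename(columns={"predictorName": "predictor name",
                     "importance": "predictor importance (%)"})     # same column names as above
    .sort_values("predictor importance (%)", ascending=False)       # most important first
    .reset_index(drop=True)
)
print(si_df)
```

This keeps the importance column numeric, which can be convenient for later sorting or plotting.
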
In [11]:
si_df.head()                                                               # predictor importances data frame
Out[11]:
               predictor name  predictor importance (%)
0  wind_speed_mean_100m_1h.ms                     27.12
1                      Energy                     22.92
2  wind_speed_mean_120m_1h.ms                     21.68
3     wind_dir_mean_100m_1h.d                     13.27
4   wind_speed_mean_80m_1h.ms                       9.2
In [12]:
fig = go.Figure(go.Bar(x=si_df['predictor name'], y=si_df['predictor importance (%)']))      # plot the bar chart
fig.update_layout(height=400,                                                                # update size, title and axis titles of the chart
                  width=600,
                  title_text="Importances of predictors",
                  xaxis_title="Predictor name",
                  yaxis_title="Predictor importance (%)")
fig.show()
In [13]:
ei_df.head()                                                               # first few of the feature importances
Out[13]:
                                        feature name  feature importance (%)  time         type
0  wind_speed_mean_100m_1h.ms & wind_speed_mean_1...                   30.98  [14]  Interaction
1  wind_speed_mean_100m_1h.ms & wind_speed_mean_8...                   30.82  [31]  Interaction
2  wind_speed_mean_100m_1h.ms & wind_speed_mean_8...                   30.81  [30]  Interaction
3  wind_speed_mean_100m_1h.ms & wind_speed_mean_1...                    30.8  [13]  Interaction
4  wind_speed_mean_100m_1h.ms & wind_speed_mean_8...                    30.8  [32]  Interaction
In [14]:
time = '[1]'                                                                            # time for which the feature importances are visualized
fig = go.Figure(go.Bar(x=ei_df[ei_df['time'] == time]['feature name'],                       # plot the bar chart
                       y=ei_df[ei_df['time'] == time]['feature importance (%)']))
fig.update_layout(height=700,                                                                # update size, title and axis titles of the chart
                  width=1000,
                  title_text="Importances of features (for {}-sample ahead model)".format(time),
                  xaxis_title="Feature name",
                  yaxis_title="Feature importance (%)")
fig.show()