Portfolio Wind Production
- Problem Description
- Data Recommendation Template
- TIM Setup
- Demo example
- Data description
- Predictor candidates
- Forecasting scenario
- Model building and validation
In this solution template we describe forecasting of the aggregated production of multiple wind farms. There are two possible cases:

1. The production of each individual wind farm is available, and the production of the whole portfolio is the sum of the individual productions. In this case an individual model and forecast can be generated for each wind farm (see Single Asset Wind Forecasting) and then summed up to obtain the production forecast of the whole portfolio. With this approach it is easy to account for scheduled maintenance of individual wind farms or to adapt to changes in portfolio size, because only the active wind farms are summed up.
2. Only the production of the whole portfolio is available. In this case a single model is generated for the whole portfolio, taking into account the locations of the individual wind farms and the meteorological situation there. This solution template describes this second scenario.

Typical scenarios in wind production forecasting range from a few hours ahead to a few days ahead (usually up to 36 or 48 hours ahead). The data sampling rate is mostly 15 minutes, 30 minutes, or hourly. Typical metrics used to evaluate the quality of a prediction are MAE and RMSE; to express the error as a percentage, nMAE, nRMSE, rMAE, or rRMSE are used.
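As a quick reference, the error metrics mentioned above can be computed as follows. Note that normalization conventions vary; this sketch normalizes nMAE/nRMSE by installed capacity and rMAE/rRMSE by the mean actual production, which is an assumption rather than a TIM definition:

```python
import numpy as np

def forecast_metrics(actual, forecast, capacity):
    """Common wind-forecasting error metrics.

    nMAE/nRMSE are normalized by installed capacity, rMAE/rRMSE by
    mean actual production (one of several conventions in use).
    """
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    err = forecast - actual
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))
    return {
        "MAE": mae,
        "RMSE": rmse,
        "nMAE [%]": 100 * mae / capacity,
        "nRMSE [%]": 100 * rmse / capacity,
        "rMAE [%]": 100 * mae / np.mean(actual),
        "rRMSE [%]": 100 * rmse / np.mean(actual),
    }
```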
Data Recommendation Template¶
Essential for wind production forecasting is a good wind speed forecast. The most important predictors are wind speed and wind direction forecasts at the hub height of the individual wind turbines.
The key is to find appropriate GPS coordinates that represent the portfolio best. This task does not have an exact solution. We recommend taking into account the locations of the individual wind farms and their installed capacities.
Our best practice in finding GPS coordinates for meteo data is to cluster the locations of the wind farms, weighted by their installed capacity. We use the centroids of these clusters as our GPS coordinates for meteo data. However, the number of clusters that gives the best results remains an open question.
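The clustering described above can be sketched with a simple capacity-weighted Lloyd's k-means; the coordinates and capacities are illustrative, and this is an illustration of the approach, not TIM functionality:

```python
import numpy as np

def weighted_kmeans(points, weights, k, iters=50, seed=0):
    """Lloyd's k-means on wind-farm coordinates, weighted by
    installed capacity. The resulting centroids can serve as the
    GPS coordinates at which meteo data is requested.

    `points` is an (n, 2) array of (lat, lon); `weights` holds the
    installed capacities, e.g. in MW.
    """
    points = np.asarray(points, dtype=float)
    weights = np.asarray(weights, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # Assign each farm to its nearest centroid.
        d = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Move each centroid to the capacity-weighted mean of its farms.
        for j in range(k):
            mask = labels == j
            if mask.any():
                w = weights[mask]
                centroids[j] = (points[mask] * w[:, None]).sum(axis=0) / w.sum()
    return centroids, labels
```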
Best practice is to use historical actuals of meteo predictors for model building and meteorological forecasts for out-of-sample validation.
Other meteo predictors such as wind gusts, temperature, irradiation and pressure may improve models. They are recommended only for further fine-tuning. In general, the typical situation is that the portfolio changes over time. Our current best practice is to train on the last stable part of the data.
TIM requires no setup of its mathematical internals and works well in business-user mode. All that is required from a user is to let TIM know the forecasting routine and the desired prediction horizon. TIM can automatically learn that there is no weekly pattern; in some cases, however (e.g. short datasets), this can be difficult to learn, and therefore we recommend switching off the weekdays dictionary. If target availability is higher than 6 hours, target offsets do not improve the model and may be switched off. Model building will be faster and accuracy will stay the same.
A prepared dataset for this problem, along with a prepared YAML configuration file, can be downloaded here. The data used in this example range from 2018-01-01 to 2019-08-15.
The data used in this example are from the UK. Production data are available and can be downloaded from the web page https://www2.bmreports.com/bmrs/?q=generation/windforcast/out-turn. The sum of the production of all wind farms in the UK is our target. It is the second column in the CSV file, right after the column with timestamps. In this case the target is named Quantity. The data are in half-hourly granularity.
We will use 10 GPS coordinates across the UK. Wind speeds at heights of 100 m and 120 m and wind direction at a height of 100 m for each of the 10 GPS coordinates will be used as predictors. In this demo we use historical actuals for model building and meteo forecasts for out-of-sample forecasting. In this example, we will use two CSV files, one for TIM Studio and the other for the connector.
1.) The CSV file for TIM Studio contains historical actuals merged with forecasts of the meteo predictors. Predictors used for model building are historical actuals (up to timestamp 2019-06-30 23:30:00) and those used for out-of-sample validation are meteo forecasts (from timestamp 2019-07-01 00:00:00). (File location: TIM_Studio_data/data.csv)
2.) The connector allows you to specify columns with forecasts. For the connector, actuals and forecasts are in separate columns of the CSV file; forecast columns start with the prefix "F_" followed by the name of the corresponding actual, e.g. if a historical actual is named "WindSpeed", then the historical forecasts of the same predictor are named "F_WindSpeed". Since forecasts before 2019-06-20 are not available, we filled the forecast columns before 2019-06-20 with 0. Columns with actuals are at the beginning of the file and columns with forecasts are at the end. (File location: Wind_portfolio/data.csv)
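The connector file layout described above can be prepared with a few lines of pandas; the column names, dates and values below are illustrative assumptions, not the actual UK dataset:

```python
import pandas as pd

# Toy actuals and meteo forecasts at half-hourly granularity
# (names and values are illustrative).
actuals = pd.DataFrame({
    "Timestamp": pd.date_range("2019-06-19", periods=4, freq="30min"),
    "Quantity": [410.0, 432.5, 401.0, 395.5],
    "WindSpeed": [7.1, 7.4, 6.9, 6.8],
})
forecasts = pd.DataFrame({
    "Timestamp": pd.date_range("2019-06-19", periods=4, freq="30min"),
    "WindSpeed": [7.0, 7.3, 7.1, 6.7],
})

# Forecast columns get the "F_" prefix and go after the actuals.
forecasts = forecasts.rename(
    columns={c: f"F_{c}" for c in forecasts.columns if c != "Timestamp"}
)
merged = actuals.merge(forecasts, on="Timestamp", how="left")

# Forecasts unavailable before a cut-off date are filled with 0.
cutoff = pd.Timestamp("2019-06-20")
f_cols = [c for c in merged.columns if c.startswith("F_")]
merged.loc[merged["Timestamp"] < cutoff, f_cols] = 0.0
merged.to_csv("data.csv", index=False)
```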
We simulate a day-ahead scenario. Each day at 21:35 we wish to have a forecast for the whole of the next day. The last target value is the energy produced from 21:00 to 21:30, because the data are already updated at 21:35. The TIM convention is to denote the interval 21:00 - 21:30 as 21:00. That means the last available timestamp is 21:00 (covering 21:00 - 21:30), we are in the half-hour 21:30 (21:30 - 22:00), and we want a forecast for the next day (D+1). Therefore the target availability is S-1 (sample - 1) and the forecast horizon runs from the start of D+1 to the end of D+1. Meteo predictors will be available for every timestamp of our prediction.
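The timing logic above can be sketched in a few lines; the function below enumerates the half-hourly forecast timestamps of D+1 for a given situation time (an illustration of the scenario, not TIM functionality):

```python
from datetime import datetime, timedelta

def day_ahead_horizon(now, step_minutes=30):
    """Enumerate the forecast timestamps for the day-ahead scenario:
    every half-hour of D+1, given the situation time `now`
    (21:35 in this solution template)."""
    start = (now + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    end = start + timedelta(days=1)
    ts = []
    t = start
    while t < end:
        ts.append(t)
        t += timedelta(minutes=step_minutes)
    return ts

# Situation time 2019-07-01 21:35 -> 48 half-hourly timestamps
# from 2019-07-02 00:00 to 2019-07-02 23:30.
horizon = day_ahead_horizon(datetime(2019, 7, 1, 21, 35))
```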
Model building and validation¶
The model is built using the range 2018-01-01 00:00:00 - 2019-06-30 23:30:00.
Out-of-sample forecasts are made in the range 2019-07-01 00:00:00 - 2019-08-15 23:30:00. In this demo dataset, out-of-sample validation is performed using historical forecasts of the meteorological data. These forecasts are stored in separate columns whose names start with "F_" followed by the name of the corresponding actual predictor.
This section covers the use of TIM Studio to solve the challenge described above. Additional information regarding TIM Studio can be found here.
In the Workspaces screen, select the workspace in which the dataset should be added. If there is no available workspace, create one by clicking "Add Workspace". In this solution template, the workspace called "TIM Solution Templates" is used.
In the Datasets screen, click "Add New Dataset". Stay in the "CSV-File" tab and insert a name for the dataset. In this example, the dataset is called Wind_portfolio. Click "Browse" and select the dataset file from the computer. Click "Add Dataset" to confirm.
Model building definition¶
Go to the Model Building Definition screen in the panel on the left. Click "Add New Definition" and fill in the desired definition name. In this demo, the model building definition (MBD) is called Wind_portfolio day-ahead. In the next screen, select the dataset that was previously uploaded (Wind_portfolio).
In step 2, define the desired forecasting scenario. In this example, the model is used each day at 21:35. Therefore, leave all "Weekdays ranges" checked on. Then, set "Hour ranges" to 21, "Minutes ranges" to 35 and leave "Seconds ranges" at 00. Look into the section about the Cron notation for more details. Since forecasts are to be made for each hour of the next day, leave default settings for "Forecast from" and "Forecast to", i.e. Day with offset 1. Look into the section about the relative time notation to learn more about this.
Click "Next" to advance to the next step. It is also possible to finalize all settings already at this point, in which case everything else would be set up automatically. In this example, some more changes will be made to the data updates in the third step. The target variable, Quantity, is updated at the fifth minute of each half-hour. Click on the small arrow next to Quantity and change the settings of this variable. Leave all "Weekdays ranges" checked on. Select "All" for the "Hour ranges", since this variable is updated every half-hour, and set the "Minutes ranges" to 5,35, since the data is updated at the 5th and 35th minute of every hour. Leave the "Seconds ranges" at 0. Then, set "Update until" to Sample with offset -1, since the target variable is updated with a delay of one sample (in the half-hour 21:30 - 22:00 only until 21:00 - 21:30).
Leave the default settings for all other predictors, i.e. they are set to update at 8:20 until Day+1. Since forecasts are made at 21:35 and forecasts of the predictor values will be available for the next day, these default settings are fine; the exact time at which they update does not matter in this case.
Click "Next" to advance to step 4. Here, training regions can be selected. Since the goal is to move on to back-testing, this screen will be left in its default settings (i.e. Use All Data). Click "Next" to advance to the next screen.
In this fifth step, the mathematical settings can be changed. Weekdays will be switched off in the TIM Transformations, since this dictionary is not relevant for wind forecasting. Then, click "Finalize" to complete the model building definition.
Click "Experiments" in the panel on the left to move on to backtesting. Then, click "Make experiment" next to the correct model building definition (Wind_portfolio day-ahead).
Click "Build Model" and select the appropriate training range, i.e. 2018-01-01 00:00:00 - 2019-06-30 23:30:00.
The in-sample prediction as well as the Model Tree Map become visible.
Click "Validate model" and select the correct Out-of-sample period for backtesting, i.e. 2019-07-01 00:00:00 - 2019-08-15 23:30:00.
This generates the aggregated forecasts for the day D+1.
This section covers the use of TIM Connector to solve the challenge described above. Additional information regarding TIM Connector can be found in the respective section.
1. Download TIM Connector¶
You can find download links in TIM Connector's section.
2. Create folder with dataset¶
Create a folder, e.g. TIM_Datasets, which contains the dataset folder (Wind_portfolio) with the dataset file (data.csv) and the configuration file (conf.yaml):

TIM_Datasets/
    Wind_portfolio/
        data.csv
        conf.yaml
The YAML configuration file defines the forecasting scenario described above.
Model building: Model building considers the range 2018-01-01 00:00:00 - 2019-06-30 23:30:00. Columns 1-32 (timestamp, target and meteorological actuals) are used for model building. The target variable, Quantity, is updated with a delay of one sample (in the half-hour 21:30 - 22:00 only until 21:00 - 21:30). It is updated at the fifth minute of each half-hour, i.e. at minutes 5 and 35 of every hour.
Configuration: The model is used repeatedly each day at 21:35. Forecasts are to be made for each hour of the next day. All features except calendar variables (WeekRest) are used; for more information about feature dictionaries take a look at the appropriate section.
Forecasting: Out-of-sample forecasts are made on columns 1, 2 and 33-62, i.e. timestamp, target and historical forecasts. Forecasts are made for the range 2019-07-01 00:00:00 - 2019-08-15 23:30:00.
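The column selections above use TIM's range notation (e.g. 1:2,33:62). As an illustration of how such a specification expands, here is a small helper (a convenience sketch; the connector parses this notation itself):

```python
def parse_columns(spec):
    """Expand TIM's column-range notation, e.g. "1:2,33:62",
    into an explicit list of 1-based column indices."""
    cols = []
    for part in spec.split(","):
        if ":" in part:
            lo, hi = map(int, part.split(":"))
            cols.extend(range(lo, hi + 1))
        else:
            cols.append(int(part))
    return cols

cols = parse_columns("1:2,33:62")  # 32 indices: 1, 2, 33, ..., 62
```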
version: "1.0"
type: Forecasting
modelBuilding:
  data:
    rows:
      - from: 2018-01-01 00:00:00
        to: 2019-06-30 23:30:00
    columns: 1:32
    updates:
      - uniqueName: Quantity
        updateUntil:
          baseUnit: Sample
          offset: -1
        updateTime:
          - type: Hour
            value: "*"
          - type: Minute
            value: "05,35"
  configuration:
    usage:
      usageType: Repeating
      usageTime:
        - type: Day
          value: "*"
        - type: Hour
          value: "21"
        - type: Minute
          value: "35"
      predictionFrom:
        baseUnit: Day
        offset: 1
      predictionTo:
        baseUnit: Day
        offset: 1
    features: [MovingAverage, Intercept, PiecewiseLinear, TimeOffsets, Polynomial, PeriodicComponents]
forecasting:
  configuration:
    data:
      columns: 1:2,33:62
    predictionScope:
      type: Ranges
      ranges:
        - from: 2019-07-01 00:00:00
          to: 2019-08-15 23:30:00
3. Call connector from the command line¶
Using a terminal, change the current directory to TIM Connector's builddir with the command:
> cd pathToConnector\builddir
Then, call the connector with the following command:
> pathToConnector\timconnect.exe path\to\TIM_Datasets
4. Fill in user credentials¶
Following the previous command, the user will be prompted to fill in their user credentials. Fill in the correct information and click "OK" to continue.
Output in console:
Output in folder:
The following accuracies were reported by TIM: Model building stage: RMSE = 749.27, MAE = 554.4
Validation stage: RMSE = 858.17, MAE = 678.86