Contact center volumes forecasting (week-ahead)

Title: Contact center volumes forecasting (week-ahead)
Author: Tangent Works
Industry: Contact centers, Shared services centers
Area: Workforce management
Type: Forecasting

Description

Contact centers rely on a pool of resources ready to help customers when they reach out via call, email, chat, or another channel. For contact centers, predicting the volume of incoming requests at specific times is a critical input to resource scheduling (very short- and short-term horizons) and resource management (mid- to long-term horizons). A typical short-term task is predicting volumes for the next 7 days, hour by hour. A high-quality forecast brings confidence that the FTEs (full-time equivalents, a measure of employee workload) planned for the next week are just right for delivering on SLAs. It also brings other benefits, such as higher confidence when planning absences (vacation, education, etc.) and better employee morale, since staff are not overloaded by "sudden" volume peaks.

To build a high-quality forecast, it is necessary to gather relevant, valid data with predictive power. With such data in place, ML technology like TIM RTInstantML can build models for time-series data in a fraction of the usual time.

In this sample use case we will showcase how TIM can predict request volumes for the next 7 days on an hourly basis.

Business parameters

Business objective | Business value | KPI
Reduce risk of resource shortage | Optimal resource planning | -
Reduce risk of not meeting SLAs | Better customer relations, lower/no penalties | -
Reduce effort spent on forecasting | Free up capacity of highly skilled personnel | -
In [1]:
import logging
import pandas as pd
import plotly as plt
import plotly.express as px
import plotly.graph_objects as go
import numpy as np
import json

import tim_client
In [2]:
with open('credentials.json') as f:
    credentials_json = json.load(f)                     # loading the credentials from credentials.json

TIM_URL = 'https://timws.tangent.works/v4/api'          # URL to which the requests are sent

SAVE_JSON = False                                       # if True - JSON requests and responses are saved to JSON_SAVING_FOLDER
JSON_SAVING_FOLDER = 'logs/'                            # folder where the requests and responses are stored

LOGGING_LEVEL = 'INFO'
In [3]:
level = logging.getLevelName(LOGGING_LEVEL)
logging.basicConfig(level=level, format='[%(levelname)s] %(asctime)s - %(name)s:%(funcName)s:%(lineno)s - %(message)s')
logger = logging.getLogger(__name__)
In [4]:
credentials = tim_client.Credentials(credentials_json['license_key'], credentials_json['email'], credentials_json['password'], tim_url=TIM_URL)
api_client = tim_client.ApiClient(credentials)

api_client.save_json = SAVE_JSON
api_client.json_saving_folder_path = JSON_SAVING_FOLDER
[INFO] 2021-08-17 09:48:33,191 - tim_client.api_client:save_json:66 - Saving JSONs functionality has been disabled
[INFO] 2021-08-17 09:48:33,192 - tim_client.api_client:json_saving_folder_path:75 - JSON destination folder changed to logs

Dataset

The dataset contains request volumes, temperature, a public-holiday flag, the number of regular customers, a marketing-campaign flag, the number of customers whose contracts will expire within the next 30 or 60 days, the number of invoices sent, a flag indicating whether invoices are sent at the given timestamp, and a flag indicating whether the contact center is open at the given timestamp.

Sampling

Hourly.

Data

Structure of CSV file:

Column name | Description | Type | Availability
Date | Timestamp | Timestamp column | -
Volumes | No. of requests | Target | t+0
Temperature | Temperature in Celsius | Predictor | t+168
PublicHolidays | Binary flag for holidays | Predictor | t+168
IsOpen | Binary flag to show if contact center is open at given timestamp | Predictor | t+168
IsMktingCampaign | Binary flag to show if product team is running marketing campaign at given timestamp | Predictor | t+168
ContractsToExpireIn30days | No. of regular contracts that will expire within 30 days | Predictor | t+168
ContractsToExpireIn60days | No. of regular contracts that will expire within 60 days | Predictor | t+168
RegularCustomers | No. of active contracts for regular customers | Predictor | t+168
InvoiceDay | Binary flag to show if invoices are sent at given timestamp | Predictor | t+168
InvoicesSent | No. of invoices sent at given timestamp | Predictor | t+168

Data situation

We want to predict the volume for each hour of the next 7 days, with the forecast made at 23:00 every day. This situation is reflected in the values present in the CSV file: the target is missing beyond the time of prediction, while the predictors are filled in for the whole horizon. TIM will simulate this situation throughout the whole out-of-sample interval to calculate accuracy metrics.
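This can be verified directly on the file (a minimal sanity-check sketch, assuming the data2B.csv file introduced below is available locally): the trailing rows, one full prediction horizon, should have the target missing while every predictor column is still populated.

import pandas as pd

df = pd.read_csv('data2B.csv')

horizon = 7 * 24                 # 168 hourly samples ahead
tail = df.tail(horizon)          # the forecast horizon at the end of the file

print(tail['Volumes'].isna().all())                          # True - target unknown ahead
print(tail.drop(columns=['Volumes']).notna().all().all())    # True - predictors known up to t+168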

The CSV file used in the experiments can be downloaded here.

Source

This is a synthetic dataset generated by simulating the outcome of events relevant to the operations of a contact center.

In [5]:
data = tim_client.load_dataset_from_csv_file('data2B.csv', sep=',')          # load the dataset from CSV

data
Out[5]:
Date Volumes Temperature PublicHolidays IsOpen IsMktingCampaign ContractsToExpireIn30days ContractsToExpireIn60days RegularCustomers InvoiceDay InvoicesSent
0 2012-05-13 08:00:00 0.0 7.5 0 0 0 23663 42428 56976 0 0
1 2012-05-13 09:00:00 0.0 7.5 0 0 0 23665 42428 56976 0 0
2 2012-05-13 10:00:00 0.0 7.5 0 0 0 23667 42428 56976 0 0
3 2012-05-13 11:00:00 0.0 8.6 0 0 0 23669 42428 56976 0 0
4 2012-05-13 12:00:00 0.0 8.6 0 0 0 23670 42428 56976 0 0
... ... ... ... ... ... ... ... ... ... ... ...
23099 2014-12-31 19:00:00 NaN -0.6 0 0 0 57389 58832 79186 0 0
23100 2014-12-31 20:00:00 NaN -0.6 0 0 0 57390 58833 79187 0 0
23101 2014-12-31 21:00:00 NaN -0.6 0 0 0 57390 58834 79188 0 0
23102 2014-12-31 22:00:00 NaN -0.3 0 0 0 57391 58834 79189 0 0
23103 2014-12-31 23:00:00 NaN -0.3 0 0 0 57392 58835 79190 0 0

23104 rows × 11 columns

In [6]:
target_column = 'Volumes'

timestamp_column = 'Date'

Visualization

In [7]:
fig = go.Figure()

fig.add_trace( go.Scatter( x=data.iloc[:]['Date'], y=data.iloc[:][ target_column ] ) )     

fig.update_layout( width=1300, height=700, title='Volumes' )

fig.show()

Engine settings

Parameters that need to be set:

  • predictionTo defines the prediction horizon; it is set to 7*24 samples as we want to predict volumes for the next 7 days.
  • backtestLength defines the length of the out-of-sample interval.

We also ask the engine for additional output to see details of sub-models, so we define the extendedOutputConfiguration parameter as well.

In [8]:
back_test_length = int( data.shape[0] * .33 )    # out-of-sample interval: the last 33% of the data

prediction_horizon_samples = 7*24                # 168 hourly samples, i.e. 7 days ahead
In [9]:
configuration_backtest = {
    'usage': {
        'predictionTo': {
            'baseUnit': 'Sample',                      # horizon defined in samples
            'offset': prediction_horizon_samples       # predict 168 samples (7 days) ahead
        },
        'backtestLength': back_test_length             # length of the out-of-sample interval
    },
    'extendedOutputConfiguration': {
        'returnExtendedImportances': True              # return details of sub-models
    }
}

Experiment iteration(s)

In [10]:
backtest = api_client.prediction_build_model_predict(data, configuration_backtest)
              
backtest.status                                                                                 
Out[10]:
'Finished'
In [11]:
backtest.result_explanations
Out[11]:
[]

Insights - inspecting ML models

Simple and extended importances show to what extent each predictor contributes to explaining the variance of the target variable.

In [12]:
simple_importances = backtest.predictors_importances['simpleImportances']
simple_importances = sorted(simple_importances, key = lambda i: i['importance'], reverse=True) 

simple_importances = pd.DataFrame.from_dict( simple_importances )
In [13]:
fig = go.Figure()

fig.add_trace( go.Bar( x = simple_importances['predictorName'],
                      y = simple_importances['importance'] ) )

fig.update_layout(
        title='Simple importances',
        width = 1200,
        height = 700
)

fig.show()
In [14]:
extended_importances = backtest.predictors_importances['extendedImportances']
extended_importances = sorted(extended_importances, key = lambda i: i['importance'], reverse=True) 

extended_importances = pd.DataFrame.from_dict( extended_importances )
In [15]:
extended_importances[ extended_importances['time']=='11:00:00' ]
Out[15]:
time type termName importance
20 11:00:00 Interaction IsOpen & RegularCustomers 16.62
31 11:00:00 Interaction IsOpen & Temperature 13.18
50 11:00:00 Interaction IsOpen & RegularCustomers(t-32) 10.95
57 11:00:00 Interaction IsOpen & Temperature(t-7) 9.73
80 11:00:00 Interaction cos(2πt / 12.0 hours) & cos(2πt / 2.0 hours) 7.12
87 11:00:00 Interaction IsOpen & Temperature(t-21) 6.47
88 11:00:00 Interaction IsOpen & cos(2πt / 2.0 hours) 6.44
93 11:00:00 Interaction IsOpen & ContractsToExpireIn30days(t-42) 5.98
100 11:00:00 Interaction IsOpen & sin(2πt / 3.0 hours) 5.49
151 11:00:00 Interaction IsOpen & IsMktingCampaign 2.88
174 11:00:00 Interaction IsOpen & IsOpen(t-1) 2.24
181 11:00:00 Interaction IsOpen(t-1) & cos(2πt / 3.0 hours) 2.08
202 11:00:00 Interaction IsOpen & IsMktingCampaign(t-142) 1.72
208 11:00:00 Interaction cos(2πt / 2.0 hours) & sin(2πt / 12.0 hours) 1.64
217 11:00:00 Predictor IsOpen 1.53
233 11:00:00 Interaction Volumes(t-168) & IsMktingCampaign 1.37
238 11:00:00 Interaction IsOpen & cos(2πt / 3.0 hours) 1.33
242 11:00:00 Interaction IsMktingCampaign(t-142) & IsMktingCampaign 1.22
252 11:00:00 Interaction IsOpen & cos(2πt / 8.0 hours) 1.04
260 11:00:00 Interaction cos(2πt / 3.0 hours) & IsOpen(t-1) 0.99
In [16]:
fig = go.Figure()

fig.add_trace( go.Bar( x = extended_importances[ extended_importances['time'] == '11:00:00' ]['termName'],
                      y = extended_importances[ extended_importances['time'] == '11:00:00' ]['importance'] ) )

fig.update_layout(
        title='Features generated from predictors used by model for 11:00',
        width = 1200,
        height = 700
)

fig.show()

Evaluation of results

Results are evaluated for both the in-sample and out-of-sample intervals.

In [17]:
# Helper function, merges actual and predicted values together
def create_eval_df( predictions ):
    data2 = data.copy()
    data2[ timestamp_column ] = pd.to_datetime( data2[ timestamp_column ]).dt.tz_localize('UTC')
    data2.rename( columns={ timestamp_column: 'Timestamp' }, inplace=True)
    data2.set_index( 'Timestamp', inplace=True)

    eval_data = data2[ [ target_column ] ].join( predictions, how='inner' )

    return eval_data
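The merged frame also makes it easy to cross-check the engine's accuracy metrics (a minimal sketch, reusing target_column from above and assuming, as in the plots below, that predictions come back in a column named 'Prediction'):

def rmse( eval_data ):
    errors = eval_data['Prediction'] - eval_data[ target_column ]    # per-sample prediction errors
    return np.sqrt( ( errors ** 2 ).mean() )                         # root mean squared error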

In-sample

In [22]:
for i in range(0,7):
    print('Day:', i+1, 'RMSE:', backtest.aggregated_predictions[i]['accuracyMetrics']['RMSE'] )
Day: 1 RMSE: 1514.396105849587
Day: 2 RMSE: 1598.8268294503896
Day: 3 RMSE: 1607.5194335875638
Day: 4 RMSE: 1628.1118796911414
Day: 5 RMSE: 1611.4158701397384
Day: 6 RMSE: 1602.467316085566
Day: 7 RMSE: 1613.7498960976457
In [23]:
# backtest.aggregated_predictions[0]['type'], backtest.aggregated_predictions[6]['type']
In [24]:
edf = create_eval_df( backtest.aggregated_predictions[0]['values'] )
In [25]:
fig = go.Figure()

fig.add_trace( go.Scatter( x = edf.index, y=edf['Prediction'], name='In-Sample') )     
fig.add_trace( go.Scatter( x = edf.index, y=edf[ target_column ], name='Actual') )    

fig.update_layout( width=1200, height=700, title='Actual vs. predicted'  )

fig.show()

Out-of-sample

In [26]:
for i in range(7,14):
    print('Day:',i-6,'RMSE:',backtest.aggregated_predictions[i]['accuracyMetrics']['RMSE'] )
Day: 1 RMSE: 2060.2152459019217
Day: 2 RMSE: 2177.948830850947
Day: 3 RMSE: 2229.520122467636
Day: 4 RMSE: 2262.223084425699
Day: 5 RMSE: 2211.5367573571143
Day: 6 RMSE: 2201.0999567584945
Day: 7 RMSE: 2229.889504434829
In [28]:
#backtest.aggregated_predictions[7]['type'], backtest.aggregated_predictions[13]['type']
In [29]:
edf = create_eval_df( backtest.aggregated_predictions[7]['values'] )
In [30]:
fig = go.Figure()

fig.add_trace( go.Scatter( x = edf.index, y=edf['Prediction'], name='Out-of-Sample') )     
fig.add_trace( go.Scatter( x = edf.index, y=edf[ target_column ], name='Actual') )   

fig.update_layout( width=1200, height=700, title='Actual vs. predicted' )

fig.show()

Summary

We demonstrated how TIM can automate the forecasting of a key input to resource planning and scheduling: the volume of incoming requests.

The predictors used, and their quality, play a vital role in building such a forecasting system; it is therefore assumed that cooperation is established with the LoBs (lines of business) that possess the relevant information, preferably including forecasted values of the predictors.

Contact centers that support multiple channels through which customers can submit queries may benefit from forecasts from various perspectives. With TIM RTInstantML it is possible to build a new model and make predictions for each perspective, e.g. volume per channel (incoming calls, social media messages, emails, etc.), volume per region, consolidated volumes, and others. Equally, the need for various prediction horizons does not place any additional burden on TIM; depending on the sampling of your data, you can predict from minutes to years ahead.
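As a sketch of what per-channel forecasting could look like, the same configuration can simply be reused with a different dataset (the per-channel CSV files and their names below are hypothetical, not part of this use case):

# Hypothetical per-channel datasets, each with the same structure as data2B.csv
# but with that channel's volume as the target column.
channels = [ 'calls', 'emails', 'social_media' ]

channel_backtests = {}
for channel in channels:
    channel_data = tim_client.load_dataset_from_csv_file( f'data_{channel}.csv', sep=',' )
    channel_backtests[ channel ] = api_client.prediction_build_model_predict( channel_data, configuration_backtest )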