Contact center volumes forecasting (quarter-ahead)

Title: Contact center volumes forecasting (quarter-ahead)
Author: Tangent Works
Industry: Contact centers, Shared services centers
Area: Workforce management
Type: Forecasting


Contact centers rely on a pool of resources ready to help customers when they reach out via call, email, chat, or another channel. For contact centers, predicting the volume of incoming requests at specific times is critical for resource scheduling (very short- and short-term horizons) and resource management (mid- to long-term horizons). It takes time before an action taken within the workforce management framework becomes effective (and is eventually reflected in financial reports): moving people around, hiring, upskilling, or downsizing the pool of resources takes weeks, if not longer. Because of this, forecasts for longer horizons, from one month to several months ahead, are needed.

To build a high-quality forecast, it is necessary to gather relevant and valid data with predictive power. With such data, it is possible to employ ML technology like TIM RTInstantML, which can build models for time-series data in a fraction of the time.

In our sample use case, we show how TIM can predict request volumes for each week of the next quarter.

Business parameters

Business objective: Reduce the risk of resource shortages
Business value: Optimally planned resources
KPI: -

Business parameters

Business objective: Reduce the effort spent on forecasting
Business value: Freed-up capacity of highly skilled personnel
KPI: -
In [25]:
import logging
import pandas as pd
import plotly as plt
import plotly.express as px
import plotly.graph_objects as go
import numpy as np
import json

import tim_client
import os
In [2]:
with open('credentials.json') as f:
    credentials_json = json.load(f)                     # loading the credentials from credentials.json

TIM_URL = ''          # URL to which the requests are sent

SAVE_JSON = False                                       # if True - JSON requests and responses are saved to JSON_SAVING_FOLDER
JSON_SAVING_FOLDER = 'logs/'                            # folder where the requests and responses are stored

LOGGING_LEVEL = 'INFO'                                  # level passed to logging.basicConfig in the next cell

In [3]:
level = logging.getLevelName(LOGGING_LEVEL)
logging.basicConfig(level=level, format='[%(levelname)s] %(asctime)s - %(name)s:%(funcName)s:%(lineno)s - %(message)s')
logger = logging.getLogger(__name__)
In [4]:
credentials = tim_client.Credentials(credentials_json['license_key'], credentials_json['email'], credentials_json['password'], tim_url=TIM_URL)
api_client = tim_client.ApiClient(credentials)

api_client.save_json = SAVE_JSON
api_client.json_saving_folder_path = JSON_SAVING_FOLDER
[INFO] 2021-08-17 10:28:15,725 - tim_client.api_client:save_json:66 - Saving JSONs functionality has been disabled
[INFO] 2021-08-17 10:28:15,726 - tim_client.api_client:json_saving_folder_path:75 - JSON destination folder changed to logs


The dataset contains weekly aggregated information about request volumes, temperature, public holidays, the number of regular customers, marketing campaigns, the number of customers whose contracts expire within the next 30 or 60 days, the number of invoices sent, invoicing days, and opening hours.




Structure of the CSV file:

Column name | Description | Type | Availability
Date | Timestamp | Timestamp column | -
Sum of Volumes | Sum of all requests in the given week | Target | t+0
Avg temperature | Mean temperature | Predictor | t+13
Hours of public holidays | Number of public holiday days in the given week × 24 | Predictor | t+13
Hours open | Total hours the center was/will be open to requests | Predictor | t+13
Hours of mkting campaign | Number of hours the marketing campaign ran/will run | Predictor | t+13
Avg contracts to expire in 30 days | Average number of regular contracts that will expire within 30 days | Predictor | t+13
Avg contracts to expire in 60 days | Average number of regular contracts that will expire within 60 days | Predictor | t+13
Avg no. of regular customers | Average number of active contracts of regular customers | Predictor | t+13
No. of invoicing hours | Total hours during which invoices were/will be sent | Predictor | t+13
No. of invoices | Number of invoices sent | Predictor | t+13

Data situation

We want to predict the total volume of requests for each week of the next quarter (13 weeks). We assume that forecasted values of the predictors are available for the whole horizon; this is reflected in the values present in the CSV file. To simulate the out-of-sample period thoroughly (i.e., to always use the latest model for each forecast), each forecasting situation has its own CSV file reflecting the data available at the time of that forecast.
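A minimal sketch of how one such forecasting situation could be derived from a complete history, assuming the full dataset (actual volumes plus predictor values) is available as a pandas DataFrame. The helper `make_forecasting_situation` and its parameters are hypothetical and not part of the TIM client: it keeps actual target values up to the forecast origin, blanks the target for the following 13 weeks while keeping the predictor values, and drops any later rows.

```python
import numpy as np
import pandas as pd


def make_forecasting_situation(full: pd.DataFrame, target: str,
                               origin: int, horizon: int = 13) -> pd.DataFrame:
    """Return the data as it would look at forecast origin `origin` (row position).

    Hypothetical helper: rows up to and including `origin` keep their actual
    target values; the next `horizon` rows keep only predictor values
    (assumed to be forecasts) and get a missing target; later rows are dropped.
    """
    situation = full.iloc[: origin + 1 + horizon].copy()
    # Cast to float first so the target column can hold NaN values
    situation[target] = situation[target].astype(float)
    situation.loc[situation.index[origin + 1:], target] = np.nan
    return situation
```

Calling it for consecutive origins and writing each result with `DataFrame.to_csv` would give a set of files similar in spirit to the ones described above.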

The CSV files used in the experiments can be downloaded here as a ZIP package.


This is a synthetic dataset generated by simulating the outcome of events relevant to the operations of a contact center.

In [5]:
# Sample from the first CSV file
data = tim_client.load_dataset_from_csv_file('dataL/data2LB1.csv', sep=',')
data
index | Date | Sum of Volumes | Avg temperature | Avg contracts to expire in 30 days | Avg contracts to expire in 60 days | Avg no. of regular customers | Hours of public holidays | Hours open | Hours of mkting campaign | No. of invoicing hours | No. of invoices
0 | 2012-05-20 | 1009189.0 | 11.698214 | 23887.994048 | 42446.601190 | 56967.791667 | 0 | 66 | 0 | 24 | 56940
1 | 2012-05-27 | 781528.0 | 17.617857 | 24220.125000 | 42385.767857 | 56932.523810 | 24 | 66 | 0 | 0 | 0
2 | 2012-06-03 | 961166.0 | 14.111905 | 32031.994048 | 42457.255952 | 56918.892857 | 0 | 66 | 0 | 0 | 0
3 | 2012-06-10 | 968952.0 | 15.504762 | 42451.392857 | 42429.035714 | 56732.327381 | 24 | 55 | 0 | 0 | 0
4 | 2012-06-17 | 1192165.0 | 16.595833 | 42491.714286 | 42412.315476 | 56597.630952 | 0 | 66 | 0 | 24 | 56652
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ...
132 | 2014-11-30 | NaN | 0.479762 | 56540.351190 | 57953.000000 | 78312.869048 | 0 | 66 | 0 | 0 | 0
133 | 2014-12-07 | NaN | -0.514881 | 56777.446429 | 58175.309524 | 78446.595238 | 0 | 66 | 0 | 0 | 0
134 | 2014-12-14 | NaN | 2.689881 | 56885.446429 | 58218.291667 | 78650.113095 | 0 | 66 | 0 | 0 | 0
135 | 2014-12-21 | NaN | 5.559524 | 57093.029762 | 58483.880952 | 78788.196429 | 0 | 66 | 0 | 24 | 78674
136 | 2014-12-28 | NaN | 2.175595 | 57308.446429 | 58685.148810 | 78906.744048 | 72 | 33 | 0 | 0 | 0

137 rows × 11 columns

In [6]:
target_column = 'Sum of Volumes'  # sum of requests per given week

timestamp_column = 'Date'
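Before sending a file to the engine, it can help to check that it reflects the expected data situation. The following is a hypothetical sanity check, not a TIM feature: it assumes the target is missing exactly in the last 13 rows (the forecasted weeks) and that every other column is fully populated, as in the sample above.

```python
import pandas as pd


def check_data_situation(df: pd.DataFrame, target: str, horizon: int = 13) -> bool:
    """Hypothetical sanity check of one forecasting situation.

    Expects the target to be missing exactly in the last `horizon` rows
    and every other column to be fully populated.
    """
    missing = df[target].isna()
    tail_missing = missing.iloc[-horizon:].all()        # forecasted weeks have no actuals
    head_complete = not missing.iloc[:-horizon].any()   # history has no gaps in the target
    predictors_complete = not df.drop(columns=[target]).isna().any().any()
    return bool(tail_missing and head_complete and predictors_complete)
```

In the notebook it could be called as `check_data_situation(data, target_column, 13)`.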


In [7]:
fig = go.Figure()

fig.add_trace(go.Scatter(x=data[timestamp_column], y=data[target_column]))

fig.update_layout(width=1300, height=700, title='Sum of Volumes')

Engine settings

Parameters that need to be set:

  • predictionTo - defines the prediction horizon; it is set to 13 (weekly samples) as we want to predict volumes for the next quarter.
  • backtestLength - defines the length of the out-of-sample interval; in our case it is 0, as we want to evaluate out-of-sample results by simulating production forecasting with the respective datasets.

We also ask the engine for additional output with details of the sub-models, so we define the extendedOutputConfiguration parameter as well.
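With weekly data, an offset of 13 samples in predictionTo corresponds to 13 weeks. A small sketch of the timestamps such a forecast covers, assuming weekly sampling anchored on Sundays as in the sample data; `horizon_timestamps` is a hypothetical helper and the date used with it is only illustrative.

```python
import pandas as pd


def horizon_timestamps(last_actual: str, horizon: int = 13) -> pd.DatetimeIndex:
    """Weekly timestamps covered by a forecast of `horizon` samples.

    Assumes weekly sampling anchored on Sundays (pandas alias 'W-SUN').
    """
    start = pd.Timestamp(last_actual) + pd.Timedelta(weeks=1)
    return pd.date_range(start=start, periods=horizon, freq="W-SUN")
```

For a last known week of 2014-09-28, `horizon_timestamps('2014-09-28')` spans 2014-10-05 to 2014-12-28, i.e. one quarter.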

In [8]:
back_test_length = 0

prediction_horizon = 13
In [9]:
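configuration_backtest = {
    'usage': {
        'predictionTo': {
            'baseUnit': 'Sample',               # horizon expressed in samples (weeks here)
            'offset': prediction_horizon        # 13 samples = next quarter
        },
        'backtestLength': back_test_length      # no in-engine out-of-sample interval
    },
    'extendedOutputConfiguration': {
        'returnExtendedImportances': True       # return importances of generated features
    }
}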

Experiment iteration(s)

First, we run the experiment for the first CSV file; in the next section, we simulate 40 production forecasts.

In [10]:
backtest = api_client.prediction_build_model_predict( data, configuration_backtest )
In [11]:
[{'index': 1,
  'message': 'Predictor Avg contracts to expire in 60 days contains an outlier or a structural change in its most recent records.'},
 {'index': 2,
  'message': 'Predictor Avg no. of regular customers contains an outlier or a structural change in its most recent records.'}]

Insights - inspecting ML models

Simple and extended importances show to what extent each predictor contributes to explaining the variance of the target variable.
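To read the simple importances as shares of the total, they can be normalized so that they sum to one. A sketch assuming the `simpleImportances` structure used in the next cell (a list of dictionaries with `predictorName` and `importance` keys); the helper name is hypothetical.

```python
import pandas as pd


def importance_shares(importances: list) -> pd.DataFrame:
    """Sort predictor importances and add share and cumulative share columns.

    Assumes each item has 'predictorName' and 'importance' keys.
    """
    df = pd.DataFrame(importances).sort_values("importance", ascending=False)
    df["share"] = df["importance"] / df["importance"].sum()
    df["cumulative_share"] = df["share"].cumsum()
    return df.reset_index(drop=True)
```

The cumulative share makes it easy to see how many predictors account for most of the explained variance.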

In [12]:
simple_importances = backtest.predictors_importances['simpleImportances']
simple_importances = sorted(simple_importances, key = lambda i: i['importance'], reverse=True) 

simple_importances = pd.DataFrame.from_dict( simple_importances )

# simple_importances
In [13]:
fig = go.Figure()

fig.add_trace(go.Bar(x=simple_importances['predictorName'],
                     y=simple_importances['importance']))

fig.update_layout(
        title='Simple importances',
        width=1200,
        height=700
)
In [14]:
extended_importances = backtest.predictors_importances['extendedImportances']
extended_importances = sorted(extended_importances, key = lambda i: i['importance'], reverse=True) 

extended_importances = pd.DataFrame.from_dict( extended_importances )
In [15]:
fig = go.Figure()

# Terms (features) the model uses for the 11th week of the prediction horizon
week_terms = extended_importances[extended_importances['time'] == '[11]']

fig.add_trace(go.Bar(x=week_terms['termName'],
                     y=week_terms['importance']))

fig.update_layout(
        title='Features generated from predictors used by the model for the 11th week of the prediction horizon',
        width=1200,
        height=700
)
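The same filter can be wrapped in a helper to inspect any week of the horizon. This is a sketch that assumes the `time` column holds labels such as '[11]', as in the plotting cell; the helper name is hypothetical.

```python
import pandas as pd


def terms_for_week(extended: pd.DataFrame, week: int) -> pd.DataFrame:
    """Select extended-importance terms for one step of the horizon.

    Assumes 'time' labels of the form '[k]' and columns 'termName' and 'importance'.
    """
    subset = extended[extended["time"] == f"[{week}]"]
    return subset.sort_values("importance", ascending=False)[["termName", "importance"]]
```

For example, `terms_for_week(extended_importances, 1)` would list the terms used for the first week ahead.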
In [32]:
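# Helper function: merges actual and predicted values together
def create_eval_df(predictions, prediction_only=False):
    if prediction_only:
        # out-of-sample evaluation: load data2L.csv, assumed to contain actual values for the forecasted weeks
        data2 = tim_client.load_dataset_from_csv_file('data2L.csv', sep=',')
    else:
        # in-sample evaluation: reuse the currently loaded dataset
        data2 = data.copy()

    # localize timestamps to UTC so they are comparable with the prediction index
    data2[timestamp_column] = pd.to_datetime(data2[timestamp_column]).dt.tz_localize('UTC')
    data2.rename(columns={timestamp_column: 'Timestamp'}, inplace=True)
    data2.set_index('Timestamp', inplace=True)

    # keep only timestamps present in both actuals and predictions
    eval_data = data2[[target_column]].join(predictions, how='inner')

    return eval_data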
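The `how='inner'` join is what restricts the evaluation to weeks that have both an actual value and a prediction. A small illustration with hypothetical values, assuming both frames are indexed by UTC timestamps:

```python
import pandas as pd

# Hypothetical actuals and predictions indexed by UTC timestamps
actuals = pd.DataFrame(
    {"Sum of Volumes": [100.0, 110.0, 120.0]},
    index=pd.DatetimeIndex(["2014-09-14", "2014-09-21", "2014-09-28"], tz="UTC", name="Timestamp"),
)
predictions = pd.DataFrame(
    {"Prediction": [112.0, 118.0, 125.0]},
    index=pd.DatetimeIndex(["2014-09-21", "2014-09-28", "2014-10-05"], tz="UTC", name="Timestamp"),
)

# how='inner' keeps only timestamps present in both frames
eval_data = actuals.join(predictions, how="inner")
```

Only 2014-09-21 and 2014-09-28 remain, since 2014-09-14 has no prediction and 2014-10-05 has no actual value.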

Evaluation of results


In [33]:
edf = create_eval_df( backtest.aggregated_predictions[0]['values'] )
In [35]:
fig = go.Figure()

fig.add_trace( go.Scatter( x = edf.index, y=edf['Prediction'], name='In-Sample') )     
fig.add_trace( go.Scatter( x = edf.index, y=edf[ target_column ], name='Actual') )    

fig.update_layout( width=1200, height=700,  title='Actual vs. predicted (in-sample)'  )

Out-of-sample result for one forecasting situation

In [36]:
edf = create_eval_df( backtest.prediction, True )
In [37]:
fig = go.Figure()

fig.add_trace( go.Scatter( x = edf.index, y=edf['Prediction'], name='Prediction') )     
fig.add_trace( go.Scatter( x = edf.index, y=edf[ target_column ], name='Actual') )    

fig.update_layout( width=1200, height=700, title='Actual vs. predicted' )

Simulation of 40 production forecasts

In [38]:
results = list()
mapes = list()
In [39]:
{'usage': {'predictionTo': {'baseUnit': 'Sample', 'offset': 13},
  'backtestLength': 0},
 'extendedOutputConfiguration': {'returnExtendedImportances': True}}
In [40]:
datadir = 'dataL'

# Process the files in a reproducible (lexicographic) order, skipping non-CSV files
for fname in sorted(os.listdir(datadir)):
    if not fname.endswith('.csv'):
        continue
    fpath_ = os.path.join(datadir, fname)

    # Load the data situation valid for this forecast
    data_ = tim_client.load_dataset_from_csv_file(fpath_, sep=',')

    # Build a new model and forecast the next 13 weeks
    backtest_ = api_client.prediction_build_model_predict(data_, configuration_backtest)

    # Compare the out-of-sample prediction with actual values
    edf_ = create_eval_df(backtest_.prediction, True)
    edf_['err_pct'] = (edf_[target_column] - edf_['Prediction']).abs() / edf_[target_column]

    results.append(edf_)
    mapes.append(edf_['err_pct'].mean())
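The loop computes, for each forecast, the mean absolute percentage error (MAPE) expressed as a fraction. The same metric as a standalone function, a sketch using NumPy instead of the notebook's pandas columns; the function name is hypothetical. Weeks with zero actual volume would make the ratio undefined, so MAPE is only meaningful when actual values stay well above zero.

```python
import numpy as np


def mape(actual, predicted) -> float:
    """Mean absolute percentage error as a fraction (0.075 means 7.5 %).

    Mirrors the err_pct column: |actual - predicted| / actual, averaged.
    NaN entries are skipped, as pandas' mean() does by default.
    """
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ape = np.abs(actual - predicted) / actual
    return float(np.nanmean(ape))
```

For example, `mape([100, 200], [110, 180])` gives 0.1, because both weeks are off by 10 %.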

Mean MAPE value

In [ ]:
Out[ ]:
count 40.000000
mean 0.075285
std 0.022402
min 0.037157
25% 0.053423
50% 0.081945
75% 0.094086
max 0.109226
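The input cell for the statistics above is not shown; the fields it lists (count, mean, std, min, quartiles, max) correspond to those returned by pandas' `Series.describe`. A sketch with hypothetical MAPE values; in the notebook, the `mapes` list collected in the loop would be used instead.

```python
import pandas as pd

# Hypothetical MAPE values standing in for the `mapes` list
example_mapes = [0.05, 0.08, 0.10, 0.07]
summary = pd.Series(example_mapes).describe()
```

Reporting the median (50 %) next to the mean is useful here, since the two differ noticeably in the output above (0.082 vs. 0.075).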
In [ ]:
fig = go.Figure()

fig.add_trace( go.Bar( x = list(range(len(mapes))), y= mapes, name='MAPE') )     

fig.update_layout( width=1200, height=700, title='MAPE per forecast' )


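We demonstrated how TIM can be used for mid-term forecasting of request volumes with weekly data. Across the 40 simulated production forecasts, the mean MAPE was approximately 7.5 %, ranging from about 3.7 % to 10.9 %.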

Having relevant data with predictive power available at the time of forecasting is a prerequisite for any ML/AI solution; however, not every ML solution can build a new model in a fraction of the time, adapting to the most recent reality reflected in the data.

Contact centers that support multiple channels through which customers can submit queries may benefit from forecasts from various perspectives. With TIM RTInstantML it is possible to build a new model and make predictions for each perspective, e.g. volumes per channel (incoming calls, messages from social media, emails, etc.), volumes per region, consolidated volumes, and others. Equally, the need for various prediction horizons does not mean any additional burden for TIM: depending on the sampling of your data, you can predict from minutes to years ahead.