
How to effectively size your infrastructure for a TIM InstantML implementation

Introduction

This paper describes the capacity planning methodologies available for TIM InstantML and explains the calculations used to obtain metrics for estimating and sizing a TIM Engine environment.

The TIM InstantML technology can be deployed in various ways. It can be consumed as a SaaS service called from your IT environment.

Alternatively, TIM InstantML can be deployed in an On-Premise environment, where you provide the server infrastructure, or under a Bring Your Own License (BYOL) model in your own cloud (Azure, AWS, ...).

This document describes sizing considerations for an On-Premise or BYOL environment.

In the SaaS scenario, scaling of the service is handled automatically. The BYOL/On-Premise solution also provides scaling, but you need to provision sufficient resources.

This paper applies to the following versions of the TIM software:

  • TIM Engine 4.X
  • TIM Studio 2.0

Architectural Components

TIM InstantML runs mainly on a Kubernetes cluster.

As an example, we provide an Azure Deployment Scheme:

  • Scalability Fabric TIM Engine with queuing - AKS Cluster with D3 v2 VM – This is the Kubernetes Cluster Service implementation by Azure.
  • Database - Azure Database for PostgreSQL – This is an Azure Database Service.
  • TIM Workers - ACI for TIM worker instances – This is the fast scaling Azure Container Instances service (or Kubernetes).

Typically, at least two instances of these services are set up for redundancy.

The number of TIM Workers is scaled up depending on the number of requests you send to the TIM Engine and, therefore, on the number of requests in the queue.

In an On-Premise environment, local Kubernetes and PostgreSQL installations are used.

On other cloud environments, the corresponding services are used. As an example, on AWS:

  • Scalability Fabric TIM Engine and TIM Workers - EKS Cluster with m5.xlarge VM – This is the Kubernetes Cluster Service implementation by AWS.
  • Database - PostgreSQL + TimescaleDB – This is a database service managed by Timescale Cloud on AWS (https://www.timescale.com/cloud).
  • Queuing of TIM Engine tasks - Amazon MQ for RabbitMQ

Difficulties in Sizing the Environment

Consider the following elements that determine the CPU time and memory required to build a model or to produce a forecast, classification, or anomaly detection:

  • The size of the data structure
  • The number of predictors
  • The number of timestamps
  • The predictor feature importance
  • Correlation between the predictor candidates and the target

This makes it difficult to devise an algorithm that predicts exact memory and CPU consumption. Instead, this document provides benchmark figures that let you size the architecture from measured data rather than theoretical calculations.

Capacity Planning And Performance Overview

Data Input Size Considerations

The lightning-fast speed of TIM InstantML is the result of efficient in-memory processing and parallelization of computation. The default maximum dataset size is 100 MB; see the Data Properties documentation for details.
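
Before submitting data, it can be worth verifying the file size up front. Below is a minimal pre-flight check in Python, assuming the dataset is a local CSV file (the file name is hypothetical); the 100 MB figure is the default limit mentioned above.

```python
import os

MAX_DATASET_MB = 100  # default TIM InstantML dataset size limit (see above)

def check_dataset_size(path: str, limit_mb: int = MAX_DATASET_MB) -> float:
    """Raise if the dataset exceeds the default size limit; return size in MB."""
    size_mb = os.path.getsize(path) / (1024 * 1024)
    if size_mb > limit_mb:
        raise ValueError(
            f"{path} is {size_mb:.1f} MB, above the {limit_mb} MB default limit; "
            "consider reducing rows or predictors."
        )
    return size_mb

print(f"{check_dataset_size('dataset.csv'):.1f} MB - OK")  # hypothetical file
```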

Benchmark Data and Scenarios

In most cases the sizing calculation is straightforward.

A typical TIM Worker runs on the following configuration:

  • CPU
    • 4 virtual CPU cores
  • Memory
    • 12 GB of RAM

In this benchmark, we provide performance data for a single TIM Worker instance across:

  • Different dataset sizes
  • Slim (few predictors), Medium, and Wide (many predictors) datasets
  • Different use cases

For the benchmark, reference datasets are used with different characteristics:

| Dataset    | Cells   | Slim rows (10 predictors) | Medium rows (50 predictors) | Wide rows (100 predictors) |
|------------|---------|---------------------------|-----------------------------|----------------------------|
| Test Set 1 | 1,000   | 100                       | 20                          | 10                         |
| Test Set 2 | 10,000  | 1,000                     | 200                         | 100                        |
| Test Set 3 | 20,000  | 2,000                     | 400                         | 200                        |
| Test Set 4 | 50,000  | 5,000                     | 1,000                       | 500                        |
| Test Set 5 | 100,000 | 10,000                    | 2,000                       | 1,000                      |

The Slim, Medium, and Wide columns give the number of rows in each dataset, so that cells = rows × predictors.

The approximate sizes of the corresponding CSV and JSON files are given in this table:

| Type   | Cells   | Predictors | Rows   | CSV size (KB) | JSON size (KB) |
|--------|---------|------------|--------|---------------|----------------|
| Slim   | 1,000   | 10         | 100    | 12            | 40             |
| Medium | 1,000   | 50         | 20     | 8             | 40             |
| Wide   | 1,000   | 100        | 10     | 8             | 40             |
| Slim   | 10,000  | 10         | 1,000  | 104           | 384            |
| Medium | 10,000  | 50         | 200    | 72            | 352            |
| Wide   | 10,000  | 100        | 100    | 68            | 352            |
| Slim   | 20,000  | 10         | 2,000  | 256           | 768            |
| Medium | 20,000  | 50         | 400    | 144           | 700            |
| Wide   | 20,000  | 100        | 200    | 136           | 696            |
| Slim   | 50,000  | 10         | 5,000  | 512           | 1,916          |
| Medium | 50,000  | 50         | 1,000  | 384           | 1,744          |
| Wide   | 50,000  | 100        | 500    | 384           | 1,724          |
| Slim   | 100,000 | 10         | 10,000 | 1,024         | 3,828          |
| Medium | 100,000 | 50         | 2,000  | 704           | 3,488          |
| Wide   | 100,000 | 100        | 1,000  | 704           | 3,444          |
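
To reproduce datasets of comparable shape, a synthetic generator like the sketch below can be used (Python with pandas; the column naming and random values are assumptions for illustration, not the actual benchmark data):

```python
import os

import numpy as np
import pandas as pd

def make_dataset(rows: int, predictors: int, path: str) -> None:
    """Generate a synthetic dataset with the same shape as a benchmark
    test set (cells = rows x predictors) and report its CSV size."""
    rng = np.random.default_rng(seed=0)
    frame = pd.DataFrame(
        rng.random((rows, predictors)),
        columns=[f"predictor_{i + 1}" for i in range(predictors)],
    )
    # Hourly timestamp column, as typical for time-series input
    frame.insert(0, "timestamp", pd.date_range("2021-01-01", periods=rows, freq="h"))
    frame.to_csv(path, index=False)
    size_kb = os.path.getsize(path) / 1024
    print(f"{path}: {rows} rows x {predictors} predictors = "
          f"{rows * predictors} cells, ~{size_kb:.0f} KB")

# Same shape as Test Set 2, Slim: 10,000 cells
make_dataset(rows=1000, predictors=10, path="slim_10000.csv")
```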

Benchmark Results

The benchmark is based on the various scenarios specified in the table below:

| Case | Request type                    |
|------|---------------------------------|
| 1    | /prediction/build-model         |
| 2    | /prediction/predict             |
| 3    | /prediction/build-model-predict |
| 4    | /detection/build-model          |
| 5    | /detection/rebuild-model        |
| 6    | /detection/detect               |
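
These endpoints are plain REST calls, so a measurement like those below can be reproduced by timing a request. The sketch that follows is hedged: the engine URL, authentication scheme, and payload schema are assumptions, so consult the TIM API documentation for the actual request format.

```python
import time

import requests  # third-party: pip install requests

TIM_URL = "https://tim.example.com"  # hypothetical TIM Engine endpoint
API_KEY = "..."                      # credentials from your deployment

def timed_request(endpoint: str, payload: dict) -> float:
    """POST one ML request and return its wall-clock response time in seconds."""
    start = time.perf_counter()
    response = requests.post(
        f"{TIM_URL}{endpoint}",
        json=payload,
        headers={"Authorization": f"Bearer {API_KEY}"},  # auth scheme assumed
        timeout=300,
    )
    response.raise_for_status()
    return time.perf_counter() - start

# Payload schema is deployment-specific; see the TIM API reference.
elapsed = timed_request("/prediction/build-model", {"data": "..."})
print(f"Response time: {elapsed:.1f} s")
```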

In the following tables, you find the processing response time and the CPU load created by the request, based on 1 TIM Worker (running on one 4-core CPU), for each case. Each table cell shows the response time followed by the maximum CPU load.

1 /prediction/build-model

| Cells  | 1,000       | 10,000       | 20,000       | 50,000        | 100,000       |
|--------|-------------|--------------|--------------|---------------|---------------|
| Slim   | 1.7 s / 59% | 4.7 s / 363% | 9.5 s / 385% | 20.3 s / 393% | 37.1 s / 392% |
| Medium | 1.7 s / 46% | 2.8 s / 287% | 3.5 s / 335% | 5.7 s / 366%  | 11.5 s / 385% |
| Wide   | 2.1 s / 50% | 2.3 s / 91%  | 3.3 s / 318% | 4.8 s / 355%  | 7.4 s / 357%  |

2 /prediction/predict

| Cells  | 1,000       | 10,000      | 20,000      | 50,000      | 100,000      |
|--------|-------------|-------------|-------------|-------------|--------------|
| Slim   | 0.8 s / 30% | 0.9 s / 32% | 1.1 s / 48% | 1.9 s / 75% | 3.2 s / 100% |
| Medium | 1.5 s / 37% | 1.4 s / 36% | 1.6 s / 42% | 1.9 s / 74% | 2.7 s / 100% |
| Wide   | 2.1 s / 51% | 2.2 s / 38% | 2.3 s / 40% | 2.6 s / 49% | 3.3 s / 95%  |

3 /prediction/build-model-predict

| Cells  | 1,000       | 10,000       | 20,000       | 50,000        | 100,000       |
|--------|-------------|--------------|--------------|---------------|---------------|
| Slim   | 1.5 s / 40% | 5.2 s / 373% | 9.4 s / 386% | 15.6 s / 389% | 33.7 s / 391% |
| Medium | 1.4 s / 42% | 2.4 s / 260% | 3.2 s / 288% | 5.3 s / 356%  | 9.0 s / 376%  |
| Wide   | 1.5 s / 56% | 2.0 s / 102% | 2.8 s / 275% | 4.3 s / 347%  | 6.7 s / 361%  |

4 /detection/build-model

| Cells  | 1,000          | 10,000         | 20,000         | 50,000         | 100,000        |
|--------|----------------|----------------|----------------|----------------|----------------|
| Slim   | 14.4 s / 100%  | 16.7 s / 142%  | 19.3 s / 147%  | 26.0 s / 154%  | 38.7 s / 367%  |
| Medium | 61.0 s / 100%  | 62.7 s / 156%  | 62.7 s / 154%  | 64.7 s / 146%  | 68.3 s / 163%  |
| Wide   | 120.4 s / 101% | 121.9 s / 157% | 122.0 s / 164% | 122.0 s / 142% | 124.8 s / 148% |

5 /detection/rebuild-model

| Cells  | 1,000         | 10,000        | 20,000         | 50,000        | 100,000       |
|--------|---------------|---------------|----------------|---------------|---------------|
| Slim   | 6.9 s / 101%  | 7.4 s / 100%  | 7.9 s / 100.4% | 9.9 s / 100%  | 13.3 s / 123% |
| Medium | 30.9 s / 100% | 31.6 s / 100% | 31.5 s / 100%  | 32.3 s / 100% | 33.2 s / 100% |
| Wide   | 62.0 s / 100% | 62.0 s / 100% | 61.8 s / 100%  | 62.8 s / 100% | 62.2 s / 148% |

6 /detection/detect

| Cells  | 1,000         | 10,000        | 20,000         | 50,000        | 100,000       |
|--------|---------------|---------------|----------------|---------------|---------------|
| Slim   | 7.3 s / 100%  | 7.6 s / 100%  | 8.0 s / 100.4% | 9.8 s / 100%  | 13.2 s / 100% |
| Medium | 30.9 s / 100% | 31.7 s / 100% | 31.7 s / 100%  | 32.3 s / 100% | 34.0 s / 100% |
| Wide   | 61.4 s / 100% | 62.5 s / 100% | 62.6 s / 100%  | 63.1 s / 100% | 62.3 s / 100% |

Notes:

  • The CPU load is expressed per core; e.g., 400% means 4 cores at 100% each.
  • The performance figures are for sequential execution of the ML requests, without scaling out and spinning up more TIM Workers.

Scaling The Workers

What do you do if you need more transactions per hour?

The TIM Engine provides queueing and automatically spins up new TIM Workers to cater for the volume of requests being handled.
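
The scaling policy itself is internal to the TIM Engine. Purely as an illustration of the kind of queue-depth heuristic involved, here is a sketch (the parameter values are assumptions, not engine settings):

```python
import math

def workers_needed(queue_depth: int, avg_response_s: float,
                   target_drain_s: float = 60.0) -> int:
    """Illustrative heuristic: the worker count that would drain the
    current queue within the target window, given that each worker
    processes one request at a time at the measured response time."""
    requests_per_worker = target_drain_s / avg_response_s
    return max(1, math.ceil(queue_depth / requests_per_worker))

# 120 queued requests at 1.5 s each, to be drained within a minute:
print(workers_needed(queue_depth=120, avg_response_s=1.5))  # -> 3
```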

How To Calculate The Size And Pricing Of Your Infrastructure

The benchmark figures give you an indication of the performance you can expect in your use case. Determine the profile of ML requests you need, then calculate the number of TIM Workers required, using the first-order estimate below.
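
As a simple estimate, assuming sequential execution on each worker:

\[
\text{workers} = \left\lceil \frac{\text{required transactions per hour}}{3600 \,/\, \text{response time in seconds}} \right\rceil
\]

For example, at 34 seconds per request a single worker handles about 3600 / 34 ≈ 106 transactions per hour, so two workers provide roughly 212 transactions per hour (see the throughput examples further below).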

In this table, we give an example of a calculation:

| Component | Sizing Consideration | Costing Example |
|-----------|----------------------|-----------------|
| TIM Engine Fabric | This Kubernetes-installed component ensures a REST endpoint is available. We recommend 2 VMs with 4-core CPUs and 32 GB of memory. | 140 Euro/month for 2 D3 servers to support the cluster |
| Queueing Service | RabbitMQ is available as a Kubernetes cluster deployment; alternatively, you can use a platform service (RabbitMQ services are available on AWS and Azure). | Optional |
| Database | This is the PostgreSQL database service. | Azure Database for PostgreSQL: 130 Euro/month |
| TIM Workers | The TIM Workers are the scalable component. The benchmark tables give the CPU load and response time, from which you can calculate the number of 4-core/12 GB servers you need. | 2 ACI containers for TIM Workers: 240 Euro/month |
| Total | | 510 Euro/month |

This is a two-TIM-Worker configuration. Some example throughputs:

  • RTInstantML scenario (/prediction/build-model-predict), 1,000 cells, Medium: 1.5 s response time → 2,400 transactions/hour/worker = 4,800 transactions per hour for this configuration
  • RTInstantML scenario (/prediction/build-model-predict), 100,000 cells, Medium: 34 s response time → 106 transactions/hour/worker = 212 transactions per hour for this configuration
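
The same arithmetic can be wrapped in a small sizing helper, sketched below; the per-worker cost is an assumption derived from the costing example above (240 Euro/month for two ACI workers):

```python
import math

WORKER_COST_EUR_MONTH = 120  # assumed: 240 Euro/month for 2 ACI workers (table above)

def size_workers(response_time_s: float, required_tph: float) -> int:
    """First-order worker count for a target throughput (transactions
    per hour), assuming sequential execution on each worker."""
    tph_per_worker = 3600 / response_time_s
    return max(1, math.ceil(required_tph / tph_per_worker))

workers = size_workers(response_time_s=34, required_tph=200)
print(f"{workers} workers, ~{workers * WORKER_COST_EUR_MONTH} Euro/month")
# -> 2 workers, ~240 Euro/month
```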

Notes:

  • Do not forget to cater for data collection; the measurements in the tables above cover only the processing time (response time) of the TIM Worker.
  • The prices are indicative and depend on your plan with Azure.
  • The Azure prices are based on a 3-year upfront commitment.
  • Similar pricing is possible for AWS or on-premise.
  • You might want to consider servers with fewer cores if your cases do not benefit from parallelization across multiple cores.

Sizing And Estimation Methodology

Estimating anything can be a complex and error-prone process. That’s why it's called an 'estimation', rather than a 'calculation'. There are three primary approaches to sizing a TIM InstantML implementation:

  • Algorithm, Or Calculation Based
  • Size-By-Example Based
  • Proof Of Concept Based

Typical implementations of TIM InstantML do not require complex sizing and estimation processes. An algorithm-based approach, taking into account the data size and the number of ML transactions per hour per worker, allows you to determine the number of parallel workers and design your architecture.

In more complex cases, a Proof of Concept might be useful. This is typically the case with more complicated peak-time ML consumption requirements.

Algorithm, Or Calculation Based

An algorithm or process that accepts data input is probably the most commonly accepted tool for delivering sizing estimations. Unfortunately, this approach is generally the most inaccurate.

When considering a multiple-model, multiple-use-case implementation, the number of input variables involved in a calculation that even approaches a realistic sizing answer runs in excess of one hundred, and the calculations are so complex and sensitive that an input value off by just 1% can produce wildly inaccurate results.

The other approach to calculation-based solutions is to simplify the calculation to the point where it is simple to understand and simple to use. This paper shows how this kind of simplification can provide us with a sizing calculator.

Size-By-Example Based

A size-by-example (SBE) approach requires a set of known samples to use as data points along the thermometer of system size. The more examples available for SBE, the more accurate the sizing of the intended implementation will be.

By using these real-world examples, both customers and Tangent Works can be assured that the proposed configurations have been implemented before and will provide the performance and functionality required by the proposed implementation. Tangent Works Engineering can help here.

Proof Of Concept Based

A proof of concept (POC), or pilot-based approach, offers the most accurate sizing data of the three approaches.

A POC allows you to do the following:

  • Test your InstantML implementation design
  • Test your chosen hardware or cloud platform
  • Simulate projected load
  • Validate design assumptions
  • Validate Usage
  • Provide iterative feedback for your implementation team
  • Adjust or validate the implementation decisions made prior to the POC

There are, however, two downsides to a POC-based approach: time and money. Running a POC requires the customer to have the manpower, hardware, and time available to implement the solution, validate it, iterate changes, re-test, and finally analyze the POC findings.

A POC is always the best and recommended approach for any sizing exercise. It delivers results that are accurate for the specific customer's unique implementation and as close to deploying the real live solution as possible, without the capital outlay on hardware and project resources.