
Root cause analysis


Root cause analysis (RCA) output is appended to the end of the tabular output described in the anomaly detection outputs overview. In the table below, the final columns RCA_Influencer 1, RCA_Influencer 2, ..., RCA_Influencer N hold the root cause analysis output:

datetime               model_index  ...  normal_behavior  ...  RCA_Influencer 1  RCA_Influencer 2    ...  RCA_Influencer N
2020-10-12T03:00:00.0  4            ...  71277.24         ...  6371.5959080431   5250.57478589984    ...  727.376704085072
2020-10-12T04:00:00.0  5            ...  83687.43         ...  7563.0550795337   3450.52478589984    ...  825.976580193047
2020-10-12T05:00:00.0  6            ...  92960.32         ...  83898.8481559912  -2250.174728489984  ...  1029.63298605852
2020-10-12T06:00:00.0  7            ...  90857.38         ...  4671.2022090489   3240.87478589984    ...  1362.98733337759
2020-10-12T07:00:00.0  8            ...  91852.39         ...  8997.1516298713   -1231.14178543984   ...  -1345.38208161086
2020-10-12T08:00:00.0  9            ...  93413.58         ...  9128.3621537876   3210.17578289984    ...  -400.37972214374


RCA gives customers additional information about anomalies. Without RCA, you can see the actual and normal behavior values, the anomalies, the influencers and the anomaly indicator, as shown in the following picture:


Something unusual is clearly happening on 23 May: the anomaly indicator rises above the threshold, the gap between normal behavior and actual values widens, and the point is marked with a red dot signaling an anomaly. What remained unclear, however, was what lay behind this increase in normal behavior. The primary motivation for RCA was therefore to expose what drives normal behavior.

RCA is meant to give customers:

  • transparency, explainability, trust in the results and confidence in critical decisions
  • a deeper understanding of what drives normal behavior
  • the ability to explore possible reasons behind an anomaly
  • a basis for making the final decision about an anomaly candidate

What information can I obtain from the root cause analysis?

What drives normal behavior

The RCA output shows how much each influencer contributes to the normal behavior at a given data point, which makes it a straightforward way to see what drives normal behavior.

For a given timestamp t, the sum of the influencer contributions equals the normal behavior value.
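As a minimal sketch of this property, the Python snippet below assumes that each output row has been loaded into a dictionary keyed by the column names used in the table above (normal_behavior, RCA_Influencer 1, ..., RCA_Influencer N). It checks, within a floating-point tolerance, that normal_behavior(t) = RCA_Influencer 1(t) + ... + RCA_Influencer N(t), and ranks influencers by the size of their contribution. The helper names, the tolerances and the sample numbers are hypothetical and chosen only for illustration.

```python
import math

# Prefix of the RCA columns in the output table ("RCA_Influencer 1" ... "RCA_Influencer N").
RCA_PREFIX = "RCA_Influencer "


def influencer_columns(row):
    """Return the RCA influencer column names present in one output row."""
    return [name for name in row if name.startswith(RCA_PREFIX)]


def check_decomposition(row, rel_tol=1e-9, abs_tol=1e-6):
    """True if the influencer contributions sum to normal_behavior (within tolerance)."""
    total = math.fsum(row[name] for name in influencer_columns(row))
    return math.isclose(total, row["normal_behavior"], rel_tol=rel_tol, abs_tol=abs_tol)


def top_drivers(row):
    """Rank influencers by the absolute size of their contribution, largest first."""
    contributions = [(name, row[name]) for name in influencer_columns(row)]
    return sorted(contributions, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical row with made-up numbers, shaped like one line of the RCA output.
row = {
    "datetime": "2020-10-12T03:00:00.0",
    "model_index": 4,
    "normal_behavior": 100.0,
    "RCA_Influencer 1": 60.0,
    "RCA_Influencer 2": 30.5,
    "RCA_Influencer 3": 9.5,
}

print(check_decomposition(row))  # True: 60.0 + 30.5 + 9.5 == 100.0
print(top_drivers(row))
```

Ranking by absolute value treats large negative contributions (which pull normal behavior down) as being just as important as large positive ones.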

What drives normal behavior change

Using the RCA output, you can also compute differences between the normal behavior at an anomalous point and the most recent preceding normal behaviors. Apply reasonable filtering so that differences are computed only between points produced by the same model (model_index) and are not affected by any anomaly other than the anomalous point being analyzed. It is especially useful to calculate the difference between the anomalous point and the most recent non-anomalous point: this shows how much each influencer contributed to the change in normal behavior.

This gives the following relation:

For a given timestamp t, the sum of the changes in the influencer contributions equals the change in normal behavior.
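The procedure can be sketched in Python as follows. This is one possible interpretation of the filtering, not a reference implementation. It assumes that the rows are sorted by datetime and carry a hypothetical boolean is_anomaly field (derive it from your own anomaly output), and it treats the reference row as the most recent earlier row with the same model_index that is not itself anomalous. Because the normal behavior at each timestamp is the sum of its influencer contributions, the per-influencer changes add up to the change in normal behavior, which the last line checks. All numbers are made up.

```python
import math

RCA_PREFIX = "RCA_Influencer "


def change_decomposition(rows, anomaly_pos):
    """Per-influencer change between an anomalous row and its reference row.

    The reference row is the most recent earlier row with the same model_index
    that is not flagged as anomalous ("is_anomaly" is a hypothetical field).
    Returns (changes, normal_behavior_change), or None if no reference exists.
    """
    target = rows[anomaly_pos]
    for ref in reversed(rows[:anomaly_pos]):
        if ref["model_index"] == target["model_index"] and not ref["is_anomaly"]:
            break
    else:
        return None  # no earlier non-anomalous row from the same model

    names = [name for name in target if name.startswith(RCA_PREFIX)]
    changes = {name: target[name] - ref[name] for name in names}
    nb_change = target["normal_behavior"] - ref["normal_behavior"]
    return changes, nb_change


# Hypothetical rows sorted by datetime; all numbers are made up for illustration.
rows = [
    {"datetime": "2021-05-21T10:00:00.0", "model_index": 1, "is_anomaly": False,
     "normal_behavior": 100.0, "RCA_Influencer 1": 70.0, "RCA_Influencer 2": 30.0},
    {"datetime": "2021-05-21T11:00:00.0", "model_index": 2, "is_anomaly": False,
     "normal_behavior": 50.0, "RCA_Influencer 1": 20.0, "RCA_Influencer 2": 30.0},
    {"datetime": "2021-05-22T10:00:00.0", "model_index": 1, "is_anomaly": True,
     "normal_behavior": 90.0, "RCA_Influencer 1": 65.0, "RCA_Influencer 2": 25.0},
    {"datetime": "2021-05-23T10:00:00.0", "model_index": 1, "is_anomaly": True,
     "normal_behavior": 130.0, "RCA_Influencer 1": 75.5, "RCA_Influencer 2": 54.5},
]

changes, nb_change = change_decomposition(rows, anomaly_pos=3)
print(changes)    # {'RCA_Influencer 1': 5.5, 'RCA_Influencer 2': 24.5}
print(nb_change)  # 30.0
print(math.isclose(sum(changes.values()), nb_change))  # True
```

In this example the row from 22 May is skipped because it is itself anomalous, and the row at 11:00 is skipped because it comes from a different model, so the anomaly on 23 May is compared with 21 May at 10:00.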