Friday, March 4, 2022

Perfomalist #ChangeDetection API was used against the #MongoDB #performanceTesting dataset

We are participating in the data challenge of the ICPE 2022 conference (icpe2022.spec.org).

The challenge dataset is provided by MongoDB.

Initially, a small part of the data was used to prove that the Perfomalist CPD API can be applied to it.

The data looks like a big data cube with numerous dimension variables and two fact variables (datetime and value). I took one particular slice of this cube and processed its (datetime, value) series by calling the Perfomalist API. I plotted the result in Excel; it can be seen in the following picture.




IDEA: Potentially, a program could be developed to call the CPD API (i.e., Perfomalist) for every data cube slice and to collect the change points in a separate table, like in the 2nd picture below:


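The idea above could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: the dimension names ("project", "test") are made up, and `mean_shift_detector` is a simple local stand-in that I use so the example runs by itself. It is not the Perfomalist algorithm and not its API; in a real run the `detector` argument would be replaced by a function that sends the series to the CPD API and returns the indices of the detected change points.

```python
from collections import defaultdict
from statistics import mean


def mean_shift_detector(values, window=5, threshold=0.2):
    """Stand-in detector (an assumption, NOT Perfomalist): report index i when
    the relative shift between the mean of the `window` points before i and
    the mean of the `window` points from i on is a local peak above `threshold`."""
    scores = {}
    for i in range(window, len(values) - window + 1):
        before = mean(values[i - window:i])
        after = mean(values[i:i + window])
        if before != 0:
            scores[i] = abs(after - before) / abs(before)
    return [
        i for i, s in scores.items()
        if s > threshold
        and s >= scores.get(i - 1, 0.0)
        and s > scores.get(i + 1, 0.0)
    ]


def collect_change_points(rows, dimension_keys, detector=mean_shift_detector):
    """Group rows into cube slices, run the detector on each slice, and
    return one table row per detected change point."""
    slices = defaultdict(list)
    for row in rows:
        key = tuple(row[k] for k in dimension_keys)
        slices[key].append((row["datetime"], row["value"]))

    table = []
    for key, series in sorted(slices.items()):
        series.sort(key=lambda point: point[0])  # order each slice by datetime
        values = [v for _, v in series]
        for idx in detector(values):
            record = dict(zip(dimension_keys, key))
            record["change_datetime"] = series[idx][0]
            record["value_before"] = values[idx - 1]
            record["value_after"] = values[idx]
            table.append(record)
    return table


# Tiny made-up example: one slice with a step change, one flat slice.
days = [f"2022-01-{d:02d}" for d in range(1, 21)]
rows = [
    {"project": "A", "test": "insert", "datetime": day, "value": 10 if i < 10 else 20}
    for i, day in enumerate(days)
] + [
    {"project": "A", "test": "query", "datetime": day, "value": 5}
    for day in days
]
change_table = collect_change_points(rows, ["project", "test"])
for record in change_table:
    print(record)
```

Passing the detector as a parameter keeps the slicing and table-building logic independent of which change point detection service is actually called.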
That (meta-)data should then be correlated with events happening (or not happening) around any detected change dates, e.g., a feature flag turned on/off (that data is hidden from us so far). The result should help to explain each change. Additionally, to measure the magnitude of a change, I would suggest calculating the entropy-based imbalance of the data between changes (see my last paper for how to do that). For example, that could tell how stable or unstable performance had become after a particular change.
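As a rough illustration of measuring stability between changes, here is a minimal Python sketch. It assumes one common reading of an entropy-based imbalance: 1 minus the Shannon entropy of a histogram of the values, normalized by the maximum possible entropy. This may differ from the exact definition in the paper mentioned above. The function name, the equal-width binning, and the `bins` parameter are all assumptions made for the example.

```python
import math
from collections import Counter


def segment_imbalances(values, change_indices, bins=10):
    """For each segment between change points, return 1 - H / log(bins),
    where H is the Shannon entropy of the segment's histogram.
    Bins are equal-width and shared across the whole series, so segments
    are comparable: 1.0 means all values fall in one bin (very stable),
    0.0 means values are spread evenly over all bins."""
    if bins < 2:
        raise ValueError("bins must be at least 2")
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins

    def bin_of(v):
        if width == 0:  # the whole series is constant
            return 0
        return min(int((v - lo) / width), bins - 1)

    bounds = [0, *sorted(change_indices), len(values)]
    result = []
    for start, end in zip(bounds, bounds[1:]):
        segment = values[start:end]
        if not segment:
            continue
        counts = Counter(bin_of(v) for v in segment)
        n = len(segment)
        entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
        result.append(1.0 - entropy / math.log(bins))
    return result


# Made-up series: flat before index 8, widely scattered after it.
series = [10.0] * 8 + [0.0, 10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0]
imbalance_before, imbalance_after = segment_imbalances(series, [8], bins=4)
print(imbalance_before, imbalance_after)
```

With this reading, a drop in the imbalance value after a change point suggests that performance became less stable after that change.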

After my initial Perfomalist usage, a more rigorous analysis was done against the MongoDB dataset, based on which the following paper was written and accepted for the data challenge track of the conference:

"Change Point Detection for MongoDB Time Series Performance Regression" paper for ACM/SPEC ICPE 2022 Data Challenge Track


"Detecting Past and Future Change Points in Performance Data" - research paper preprint

Our research paper was accepted for ORAL PRESENTATION at ICTDsC 2024 in India. We were not able to go there and plan to publish that late...