Friday, March 4, 2022

Perfomalist #ChangeDetection API was used against #MongoDB #performanceTesting dataset

We are participating in the Data Challenge track of the ACM/SPEC ICPE 2022 conference.

The challenge dataset is provided by MongoDB.

Initially, a small part of the data was used to prove that the Perfomalist Change Point Detection (CPD) API can be applied to it.

The data looks like a big data cube with numerous dimension variables and two fact variables (datetime and value). I took one particular slice of this cube and processed its (datetime, value) series by calling the Perfomalist API. I plotted the result in Excel; it can be seen in the following picture.

IDEA: A program could be developed to call the CPD API (i.e., Perfomalist) for every data cube slice and collect the change points in a separate table, as in the second picture below:
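A minimal sketch of that idea in Python, under stated assumptions: the real Perfomalist CPD API call is not shown here, so the `detect` parameter is a placeholder for it, and `largest_mean_shift` is only a simple stand-in detector, not the Perfomalist algorithm. The column names (`test`, `variant`) and the sample rows are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def largest_mean_shift(values, min_rel_shift=0.2):
    """Stand-in detector, NOT the Perfomalist algorithm.

    Returns [i] for the split point with the largest relative mean shift
    (if it exceeds min_rel_shift), otherwise [].
    """
    best_i, best_shift = None, min_rel_shift
    # Require at least two points on each side of the split.
    for i in range(2, len(values) - 1):
        before, after = mean(values[:i]), mean(values[i:])
        if before == 0:
            continue
        shift = abs(after - before) / abs(before)
        if shift > best_shift:
            best_i, best_shift = i, shift
    return [] if best_i is None else [best_i]


def collect_change_points(rows, dimensions, detect=largest_mean_shift):
    """Run `detect` on every slice of the cube; return one table row per change."""
    slices = defaultdict(list)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        slices[key].append((row["datetime"], row["value"]))

    table = []
    for key, series in slices.items():
        series.sort()  # order each slice by datetime
        values = [v for _, v in series]
        for i in detect(values):
            table.append({
                **dict(zip(dimensions, key)),
                "change_datetime": series[i][0],
                "value_before": values[i - 1],
                "value_after": values[i],
            })
    return table


# Hypothetical cube rows: two dimensions plus the two fact columns.
rows = [
    {"test": t, "variant": "A", "datetime": f"2022-01-{d:02d}", "value": v}
    for t, vals in {"insert": [10, 10, 10, 10, 20, 20, 20, 20],
                    "query": [5, 5, 5, 5, 5, 5, 5, 5]}.items()
    for d, v in enumerate(vals, start=1)
]
change_table = collect_change_points(rows, dimensions=["test", "variant"])
```

To use the actual service, `detect` would be replaced by a function that sends the slice to the CPD API and returns the indices of the reported change points.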

That (meta-)data should then be correlated with events happening (or not happening) around each detected change date, e.g., a feature flag turned on/off (that data is hidden from us so far). The result should help to explain each change. Additionally, to measure the magnitude of a change, I would suggest calculating the entropy-based imbalance of the data between changes (see my last paper for how to do that). For example, that could tell how stable or unstable performance became after a particular change.

After my initial use of Perfomalist, a more rigorous analysis was done against the MongoDB dataset, based on which the following paper was written and accepted for the Data Challenge track of the conference:
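As a rough illustration of measuring imbalance between changes, the Python sketch below uses one common formulation: imbalance = 1 - H / H_max, where H is the Shannon entropy of the values normalized to shares of their sum. This formula is an assumption made for illustration; the paper referred to above may define the measure differently. The function names are hypothetical.

```python
import math


def entropy_imbalance(values):
    """Assumed formulation: 1 - H / ln(n), with p_i = v_i / sum(values).

    0.0 means the values are perfectly even; 1.0 means everything is
    concentrated in a single point. Values must be non-negative.
    """
    n, total = len(values), sum(values)
    if n < 2 or total <= 0 or any(v < 0 for v in values):
        raise ValueError("need at least 2 non-negative values with a positive sum")
    entropy = -sum((v / total) * math.log(v / total) for v in values if v > 0)
    return 1.0 - entropy / math.log(n)


def imbalance_between_changes(values, change_indices):
    """Return (start, end, imbalance) for each segment between change points.

    Segments with fewer than two points are skipped.
    """
    bounds = [0, *sorted(change_indices), len(values)]
    return [
        (a, b, entropy_imbalance(values[a:b]))
        for a, b in zip(bounds, bounds[1:])
        if b - a >= 2
    ]


# Hypothetical series: stable before index 4, oscillating after it.
segments = imbalance_between_changes([5, 5, 5, 5, 1, 9, 1, 9], [4])
```

In this example the second segment gets a higher imbalance than the first, which matches the intuition that performance became less stable after the change.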

"Change Point Detection for MongoDB Time Series Performance Regression" paper for ACM/SPEC ICPE 2022 Data Challenge Track

Monday, January 10, 2022

Perfomalist Release Notes

- Perfomalist 1.1 now has the Change Point Detection API, as described in the previous post:

The beta version of the Perfomalist Change Point Detection API is released.

Contributors: Arvid Trubin, Filipp Trubin


- Perfomalist 1.2 has two additional columns in the table view of the weekly profile to highlight the two types of anomalies the tool detects:

    High Anomaly - Unusually high data value for a particular hour, calculated as Actual - UCL95 (only positive results of the subtraction are populated; they represent EV+, the Exception Value, i.e., the significance of the anomaly)

    Low Anomaly - Unusually low data value for a particular hour, calculated as UCL5 - Actual (only positive results of the subtraction are populated; they represent EV-, the Exception Value, i.e., the significance of the anomaly)

A value of "0" in the Low and/or High Anomaly column means that the particular hour has no anomaly of that type.
The number of anomalous week hours is also counted and printed in parentheses in the header of each column.
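The arithmetic behind the two anomaly columns can be sketched in Python as follows. This is only an illustration of the calculation described above, not the tool's code; the function name and the sample numbers are hypothetical, and UCL95 and UCL5 are taken as given per-hour limits.

```python
def anomaly_columns(actual, ucl95, ucl5):
    """Compute the High/Low Anomaly columns and their header counts.

    High Anomaly (EV+) = Actual - UCL95, kept only when positive, else 0.
    Low Anomaly  (EV-) = UCL5 - Actual,  kept only when positive, else 0.
    """
    high = [max(a - u, 0) for a, u in zip(actual, ucl95)]
    low = [max(l - a, 0) for a, l in zip(actual, ucl5)]
    high_count = sum(1 for h in high if h > 0)  # printed in the column header
    low_count = sum(1 for l in low if l > 0)
    return high, low, high_count, low_count


# Hypothetical four hours of a weekly profile.
high, low, high_count, low_count = anomaly_columns(
    actual=[50, 90, 10, 40],
    ucl95=[80, 80, 80, 80],
    ucl5=[20, 20, 20, 20],
)
```

Here the second hour exceeds UCL95 by 10 and the third falls below UCL5 by 10, so each column has one anomalous hour and the other hours show 0.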

Contributor: Michael Berdichevsky

The Perfomalist team is presenting at an international conference in Orlando.

PRODUCT: LinkedIn Post ABSTRACT: The MASF/SETDS method of detecting changes and anomalies in performa...