Friday, March 4, 2022

Perfomalist #ChangeDetection API was used against the #MongoDB #performanceTesting dataset

We are participating in the data challenge of the ICPE 2022 conference (icpe2022.spec.org).

The challenge dataset is provided by MongoDB.

Initially, a small part of the data was used to prove that the Perfomalist CPD API can be applied to it.

The data looks like a big data cube with numerous dimension variables and two fact variables (datetime and value). I took one particular slice of this cube and processed its (datetime, value) series by calling the Perfomalist API. I plotted the result in Excel; it can be seen in the following picture.




IDEA: A program could be developed to call the CPD API (i.e., Perfomalist) for every data cube slice and to collect the change points in a separate table, as in the 2nd picture below:


That (meta-)data should then be correlated with events happening (or not happening) around each detected change date, e.g., a feature flag turned on or off (that data is hidden from us so far). The result should help to explain each change. Additionally, to measure the magnitude of a change, I would suggest calculating the entropy-based imbalance of the data between changes (see my last paper for how to do that). For example, that could tell how stable or unstable performance became after a particular change.
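The per-slice loop from the IDEA above could be sketched roughly as follows. This is only an illustration: the field names `datetime` and `value`, the function names, and the `detect` callback (which would wrap the actual CPD API call) are all hypothetical, and timestamps are assumed to be ISO 8601 strings so that they sort correctly as text.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

Row = Dict[str, str]
Series = List[Tuple[str, float]]  # (datetime, value) pairs


def slice_cube(rows: Iterable[Row], dims: List[str]) -> Dict[Tuple[str, ...], Series]:
    """Group rows into one (datetime, value) series per combination of dimensions."""
    slices: Dict[Tuple[str, ...], Series] = defaultdict(list)
    for row in rows:
        key = tuple(row[d] for d in dims)
        slices[key].append((row["datetime"], float(row["value"])))
    for series in slices.values():
        series.sort()  # assumes ISO 8601 timestamps (hypothetical format)
    return dict(slices)


def collect_change_points(
    rows: Iterable[Row],
    dims: List[str],
    detect: Callable[[Series], Iterable[str]],
) -> List[Dict[str, str]]:
    """Run `detect` on every slice and return one table row per change point.

    `detect` is a placeholder for whatever calls the CPD service and
    returns the datetimes of the change points it found.
    """
    table = []
    for key, series in slice_cube(rows, dims).items():
        for when in detect(series):
            table.append(dict(zip(dims, key), change_point=when))
    return table
```

The resulting table has one row per (slice, change point) pair, which is the shape needed to join it later against event data such as feature flag changes.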

After this initial use of Perfomalist, a more rigorous analysis was done against the MongoDB dataset, based on which the following paper was written and accepted for the data challenge track of the conference:

"Change Point Detection for MongoDB Time Series Performance Regression" paper for ACM/SPEC ICPE 2022 Data Challenge Track


Monday, January 10, 2022

Perfomalist Release Notes

- Perfomalist 1.1 now has the Change Point Detection API, as described in the previous post:


The Change Points Detection Perfomalist API beta version is released. 


Contributors: Arvid Trubin, Filipp Trubin

- Perfomalist 1.2 has two additional columns in the table view of the weekly profile to highlight the two types of anomalies the tool detects:


    High Anomaly - Unusually high data value for a particular hour, calculated as Actual - UCL95 (only positive results of the subtraction are populated; they represent EV+, the Exception Value, i.e., the significance of the anomaly)

    Low Anomaly - Unusually low data value for a particular hour, calculated as UCL5 - Actual (only positive results of the subtraction are populated; they represent EV-, the Exception Value, i.e., the significance of the anomaly)

If the value of the Low and/or High Anomaly is "0", that hour does not have the corresponding anomaly.
The number of anomalous hours in the week is also counted and printed in parentheses in the column headers.
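The arithmetic behind the two columns can be illustrated with a short sketch. It is not the tool's code: the function and parameter names are made up for the example, `upper_limit` stands for what the notes call UCL95, and `lower_limit` stands for the limit the notes call UCL5.

```python
from typing import List, Sequence, Tuple


def anomaly_columns(
    actual: Sequence[float],
    upper_limit: Sequence[float],
    lower_limit: Sequence[float],
) -> Tuple[List[float], List[float], int, int]:
    """Return (high, low, high_count, low_count) for one weekly profile.

    high[i] is Actual - upper limit when positive (EV+), otherwise 0.
    low[i] is lower limit - Actual when positive (EV-), otherwise 0.
    The counts are the numbers printed in "()" in the column headers.
    """
    high = [max(a - u, 0.0) for a, u in zip(actual, upper_limit)]
    low = [max(l - a, 0.0) for a, l in zip(actual, lower_limit)]
    high_count = sum(1 for v in high if v > 0)
    low_count = sum(1 for v in low if v > 0)
    return high, low, high_count, low_count
```

For instance, an hour with an actual value of 10 against an upper limit of 8 gets a High Anomaly of 2, while an hour inside both limits gets 0 in both columns.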

Contributor: Michael Berdichevsky



Wednesday, December 22, 2021

Perfomalist

Perfomalist (www.Perfomalist.com) is a web-based anomaly and change point detection tool. The method used by the tool is SETDS (Statistical Exception and Trend Detection System), a variation of the Statistical Process Control method applied to time series data. The key idea of the method is the EV (Exception Value), which indicates the severity of an anomaly and is calculated as the difference between the control limits and the actual anomalous data points. Any change first appears as an anomaly and then may become the normality (a new norm), so collecting and analyzing the severity of all anomalies over time makes it possible to find phases in the data history with different patterns. To detect the change points between phases, one just needs to find all the roots of the equation EV(t)=0, where t is time [1]. Using this method, the Perfomalist API call returns all change points found in the input CSV data.

[1] Igor Trubin, "Exception Based Modeling and Forecasting", Proceedings of the 34th International Computer Measurement Group Conference, December 7-12, 2008, Las Vegas, Nevada, USA
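One plain numerical reading of "find all the roots of EV(t)=0" on sampled data is to look for places where the EV series reaches or crosses zero. The sketch below does that with linear interpolation between samples. It is an assumption made for illustration, not the SETDS or Perfomalist implementation, and the function name is hypothetical.

```python
from typing import List, Sequence


def zero_crossings(times: Sequence[float], ev: Sequence[float]) -> List[float]:
    """Approximate the roots of EV(t) = 0 for a sampled EV series.

    A root is reported where a nonzero sample is followed by an exact
    zero, or where two consecutive samples have opposite signs (the
    crossing time is then linearly interpolated).
    """
    roots: List[float] = []
    for i in range(1, len(ev)):
        prev, curr = ev[i - 1], ev[i]
        if prev == 0:
            continue  # this zero was already reported when it was reached
        if curr == 0:
            roots.append(float(times[i]))
        elif prev * curr < 0:
            frac = prev / (prev - curr)  # fraction of the step where EV hits 0
            roots.append(times[i - 1] + frac * (times[i] - times[i - 1]))
    return roots
```

With `times = [0, 1, 2, 3]` and `ev = [2, -2, -1, 0]`, the series crosses zero halfway between the first two samples and reaches zero at the last one, so the function returns `[0.5, 3.0]`.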





Sunday, November 21, 2021

The Change Points Detection Perfomalist API beta version is released. Everybody is welcome to test!

Link to tool: www.Perfomalist.com

Control Points API

POST https://api.perfomalist.com/api/controlpoints.py

'Accept: text/plain'
'Content-Type: text/csv'

Input

The POST body should be the input data in CSV format. The first three lines are parameters, also in CSV format.

  • sValue - Statistical band in % (100 means UCL=MAX; 0 means UCL=LCL=mean).
  • eValue - Exception Value (EV) threshold in % of the actual historical average.
  • BaseLineLength - The baseline time period that the current value is compared against.
For example:
sValue, 99
eValue, 5
BaseLineLength, 7

These may be omitted, in which case default values will be used.

The parameters are followed by the data, as shown in the example input, which can be downloaded from www.Perfomalist.com


Date, Hour, Value 

7/2/2011,0,236274 
7/2/2011,1,215359 
7/2/2011,2,170011
....

The input data should be provided as the body of the API call.
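As a rough illustration, the request described above could be assembled with Python's standard library as follows. The URL and headers are the ones listed in this post; the helper names are hypothetical, and the exact spacing and line endings the service accepts are an assumption based on the example input.

```python
import urllib.request
from typing import Iterable, Optional, Tuple

API_URL = "https://api.perfomalist.com/api/controlpoints.py"


def build_body(
    rows: Iterable[Tuple[str, int, float]],
    params: Optional[Tuple[int, int, int]] = None,
) -> str:
    """Build the CSV body: optional parameter lines, a header, then data rows.

    `params` is (sValue, eValue, BaseLineLength); pass None to omit all
    three lines and rely on the service defaults.
    """
    lines = []
    if params is not None:
        s_value, e_value, baseline = params
        lines += [
            f"sValue, {s_value}",
            f"eValue, {e_value}",
            f"BaseLineLength, {baseline}",
        ]
    lines.append("Date, Hour, Value")
    lines += [f"{date},{hour},{value}" for date, hour, value in rows]
    return "\n".join(lines) + "\n"


def build_request(body: str) -> urllib.request.Request:
    """Wrap the CSV body in a POST request with the documented headers."""
    return urllib.request.Request(
        API_URL,
        data=body.encode("utf-8"),
        headers={"Accept": "text/plain", "Content-Type": "text/csv"},
        method="POST",
    )
```

The request can then be sent with `urllib.request.urlopen(build_request(body))`; the sketch stops before that step so it can be tried without network access.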

Output

Output is JSON style data:

{
    "Change Point": {            #full list of values for respective dates, populated by zeroes if no change point detected to aid with graphing
         "Date": value
    },
    "Change Points Only": {      #only dates of change points with respective values
        "Change Point": {
            "Date": value
        }
    },
    "Ev": {                      #exception values for respective dates
        "Date": value
    },
    "LCL": {                     #lower control limit value for respective dates
        "Date": value
    },
    "Moving Average": {          #moving average value for respective dates
        "Date": value
    },
    "UCL": {                     #upper control limit value for respective dates
        "Date": value
    },
    "Value": {                   #user input value for respective dates
        "Date": value
    }
}
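A response with this structure could be post-processed along these lines. The sketch assumes the service returns valid JSON (without the "#" annotations shown above) and that every series is keyed by the same date strings; the function names are hypothetical.

```python
import json
from typing import Any, Dict, List, Sequence, Tuple


def parse_response(text: str) -> Dict[str, Any]:
    """Parse the response body, assuming it is valid JSON."""
    return json.loads(text)


def change_points(payload: Dict[str, Any]) -> List[Tuple[str, Any]]:
    """Return (date, value) pairs from the "Change Points Only" section."""
    points = payload.get("Change Points Only", {}).get("Change Point", {})
    return list(points.items())


def chart_rows(
    payload: Dict[str, Any],
    columns: Sequence[str] = ("Value", "Moving Average", "UCL", "LCL", "Ev"),
) -> List[List[Any]]:
    """Flatten the per-date series into rows suitable for a spreadsheet."""
    dates = list(payload["Value"])
    return [[d] + [payload[c].get(d) for c in columns] for d in dates]
```

`chart_rows` produces one row per date with the value, moving average, control limits, and EV side by side, which is convenient for pasting into a spreadsheet and plotting.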
EXAMPLE 1: The API applied against the sample data from www.Perfomalist.com using the Postman tool:


After copying the data to a spreadsheet, the control points can be validated visually:



EXAMPLE 2: With a step jump event to detect:

The original Change Point Detection method is explained here:
http://www.trub.in/2020/08/cpd-change-points-detection-is-planed.html

The next step is to build the Perfomalist CPD UI.

The paper about using #Perfomalist "Change Point Detection for #MongoDB Time Series Performance Regression" was cited...
