Our research paper was accepted for ORAL PRESENTATION at ICTDsC 2024 in India.
We were unable to attend and plan to publish the paper elsewhere at a later date. The abstract is below.
TruTech Development, LLC
ABSTRACT: This paper presents the MASF/SETDS method of detecting changes and anomalies in performance data, its recent implementation, and how to use and interpret its results, with real examples based on MongoDB testing data.
Since 1995, when the CMG conference published a very influential paper about the MASF method of anomaly detection, this topic has become increasingly popular in the area of capacity and performance management. A modification of MASF, the SETDS method, was introduced in the 2002 CMG best paper and was then implemented in several companies and applications, including the CMG online class “Perfomalies (Performance Anomaly) Detection”. Most recently the method has been turned into a cloud-based serverless API microservice that is available for free via the “Perfomalist” web app. The Perfomalist change point detection API was applied to MongoDB’s testing data and produced acceptable results, which were published at the SPEC.org 2022 conference. That paper included an additional post-processing algorithm (XGBoost) to eliminate false positives, which is planned to be added to the Perfomalist service.
The paper about using #Perfomalist "Change Point Detection for #MongoDB Time Series Performance Regression" was cited in the following paper: "Estimating Breakpoints in Piecewise Linear Regression Using #MachineLearning Methods", where our method was mentioned as " … offer a hybrid change point detection system..."
We are participating in the data challenge for the icpe2022.spec.org conference.
The challenge dataset is provided by MongoDB.
Initially, a small part of the data was used to show that the Perfomalist CPD API can be applied.
The data looks like a big data cube with numerous dimensional variables and two factual ones (datetime and value). I took one particular slice of this cube and processed it (datetime-value) by calling the Perfomalist API. I plotted the result in Excel; it can be seen in the following picture.
That (meta-)data should then be correlated with events happening (or not happening) around any detected change dates, e.g., a feature flag turned on/off (that data is hidden from us so far). The result should help to explain each change. Additionally, to measure the magnitude of a change, I would suggest calculating the entropy-based imbalance of the data between changes (see my last paper for how to do that). For example, that could tell how stable or unstable performance became after a particular change.
After my initial Perfomalist usage, a more rigorous analysis was done against the MongoDB dataset, based on which the following paper was written and accepted for the data challenge track of the conference:
LINK to paper: https://www.trub.in/2022/01/performance-anomaly-and-change-point.html
Intelligent Sustainable Systems, pp. 403-407
Perfomalist (www.Perfomalist.com) is a web-based anomaly and change point detection tool. The method used by the tool is SETDS (Statistical Exception and Trend Detection System), which is a variation of the Statistical Process Control method applied to time series data. The key idea of the method is the EV (Exception Value), which indicates the severity of an anomaly and is calculated as the difference between the control limits and the actual anomalous data points. Any change first appears as an anomaly and may then become a normality (a new norm), so collecting and analyzing the severity of all anomalies over time opens the possibility of finding phases in the data history with different patterns. To detect change points between phases, one just needs to find all the roots of the following equation: EV(t)=0, where t is time [1]. Using this method, the Perfomalist API call returns all change points found in the input CSV data.
[1] - Igor Trubin, "Exception Based Modeling and Forecasting", 34th International Computer Measurement Group Conference, December 7-12, 2008, Las Vegas, Nevada, USA, Proceedings
Link to tool: www.Perfomalist.com
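To make the EV idea more concrete, here is a toy sketch in Python. It is not the Perfomalist implementation: the control limits here are simply the mean plus or minus three standard deviations of a trailing baseline window, and all names (`exception_values`, `zero_transitions`, `sigmas`) are my own illustrative assumptions. It computes EV as the signed distance of a point beyond the nearest control limit and then reports the places where EV leaves zero or flips sign, as rough candidates for the roots of EV(t)=0.

```python
from statistics import mean, stdev


def exception_values(values, baseline_len=7, sigmas=3.0):
    """Return (lcl, ucl, ev) lists aligned with `values`.

    ASSUMPTION: limits are mean +/- sigmas * stdev of the previous
    `baseline_len` points. The real tool may compute limits differently.
    """
    lcl, ucl, ev = [], [], []
    for i, v in enumerate(values):
        if i < baseline_len:
            # Not enough history yet: no limits, no exception.
            lcl.append(None)
            ucl.append(None)
            ev.append(0.0)
            continue
        window = values[i - baseline_len:i]
        m, s = mean(window), stdev(window)
        lo, hi = m - sigmas * s, m + sigmas * s
        lcl.append(lo)
        ucl.append(hi)
        if v > hi:
            ev.append(v - hi)   # positive EV: above the upper limit
        elif v < lo:
            ev.append(v - lo)   # negative EV: below the lower limit
        else:
            ev.append(0.0)      # inside the limits: no exception
    return lcl, ucl, ev


def zero_transitions(ev):
    """Indices where EV leaves zero or flips sign (candidate change points)."""
    def sign(x):
        return (x > 0) - (x < 0)

    return [
        i for i in range(1, len(ev))
        if sign(ev[i]) != 0 and sign(ev[i]) != sign(ev[i - 1])
    ]


# Made-up series: stable around 10, then a level shift to around 20.
series = [10, 10.5, 9.5, 10, 10.2, 9.8, 10, 10.1,
          20, 20.5, 19.5, 20, 20.2, 19.8, 20, 20.1]
_, _, ev = exception_values(series)
candidates = zero_transitions(ev)
```

In this made-up series the shift is flagged once, at the first point of the new level; after that the trailing baseline absorbs the new values and they stop being exceptions, which is the "anomaly becomes the new norm" behavior described above.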
Control Points API
POST
https://api.perfomalist.com/
'Accept: text/plain'
'Content-Type: text/csv'
Input
The POST body should be the input data in CSV format. The first three lines are parameters, also in CSV format.
For example:
sValue, 99
eValue, 5
BaseLineLength , 7
These may be omitted, in which case default values will be used.
Parameters are followed by the data, as shown in the example input, which can be downloaded from www.Perfomalist.com.
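As a rough illustration of the call described above, the following Python sketch builds the CSV body and the POST request using only the standard library. The endpoint, the two headers and the three parameter names come from the description above. The helper names and the `date,value` layout of the data rows are my assumptions; the actual layout should be taken from the example input on the site, and whether the service expects the spaces shown around the commas in the parameter example is not confirmed here.

```python
import urllib.request

API_URL = "https://api.perfomalist.com/"


def build_body(rows, s_value=99, e_value=5, baseline_length=7):
    """Build the CSV body: three parameter lines followed by the data.

    ASSUMPTION: each data row is written as "date,value". Check the
    sample input from the site for the real column layout.
    """
    lines = [
        f"sValue,{s_value}",
        f"eValue,{e_value}",
        f"BaseLineLength,{baseline_length}",
    ]
    lines += [f"{date},{value}" for date, value in rows]
    return "\n".join(lines) + "\n"


def build_request(body):
    """Wrap the CSV body in a POST request with the documented headers."""
    return urllib.request.Request(
        API_URL,
        data=body.encode("utf-8"),
        headers={"Accept": "text/plain", "Content-Type": "text/csv"},
        method="POST",
    )


# Placeholder rows, purely illustrative.
rows = [("2021-03-01", 10.2), ("2021-03-02", 10.4)]
body = build_body(rows)
request = build_request(body)

# To actually send it (not executed here):
# with urllib.request.urlopen(request) as response:
#     text = response.read().decode("utf-8")
```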
Output
Output is JSON style data:
{
"Change Point": { #full list of values for respective dates, populated by zeroes if no change point detected to aid with graphing
"Date": value
},
"Change Points Only": { #only dates of change points with respective values
"Change Point": {
"Date": value
}
},
"Ev": { #exception values for respective dates
"Date": value
},
"LCL": { #lower control limit value for respective dates
"Date": value
},
"Moving Average": { #moving average value for respective dates
"Date": value
},
"UCL": { #upper control limit value for respective dates
"Date": value
},
"Value": { #user input value for respective dates
"Date": value
}
}
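A small Python sketch of how the output above could be read, assuming the response body parses as regular JSON (the inline # comments above are only annotations, not part of the data). The key names follow the structure shown above; the function name, the sample dates and the sample values are made-up placeholders, and sorting by date string assumes an ISO-like date format.

```python
import json


def change_points(response_text):
    """Return (date, value) pairs from the "Change Points Only" section."""
    data = json.loads(response_text)
    only = data.get("Change Points Only", {}).get("Change Point", {})
    # ASSUMPTION: dates sort correctly as strings (e.g. YYYY-MM-DD).
    return sorted(only.items())


# Hypothetical response with placeholder dates and values.
sample = json.dumps({
    "Change Point": {"2021-03-01": 0, "2021-03-02": 12.5, "2021-03-03": 0},
    "Change Points Only": {"Change Point": {"2021-03-02": 12.5}},
    "Ev": {"2021-03-01": 0, "2021-03-02": 3.1, "2021-03-03": 0},
})

points = change_points(sample)
```

The full "Change Point" section, padded with zeroes, is the one to use for graphing next to "Value", "UCL" and "LCL"; the "Change Points Only" section is handier when only the dates matter.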
EXAMPLE 1 applies the original change point detection method to the sample data from www.Perfomalist.com.