Abstract
The ability to accurately predict the amount and type of resources needed to sustain a
desired level of service is an important enabler of goal-oriented application performance
management. While the use of systematic measurements for building performance
prediction models is a well-studied topic, little attention has been paid so far to the
application space of data-intensive systems using NoSQL databases. In this thesis we
introduce and evaluate a measurement-based approach to performance prediction of
data-intensive applications over NoSQL systems. Measurement-based performance
prediction approaches are often limited by a relatively narrow range of hardware
characteristics available within each organization's private infrastructure. The emergence
of federated, large-scale, service-oriented research infrastructures featuring a variety of
heterogeneous hardware offers an opportunity to overcome this limitation. This thesis demonstrates
accurate measurement-based prediction of Yahoo! Cloud Serving Benchmark (YCSB)
performance over NoSQL systems in such infrastructures. We consider three regression
techniques: multivariate adaptive regression splines (MARS), support vector regression
(SVR), and artificial neural network (ANN) regression. We find that all three techniques
achieve performance prediction with an average accuracy of over 90%, with MARS yielding
the best results. We then extend these results, focusing on MARS, to a virtualized private cloud
environment with two NoSQL data stores, MongoDB and RethinkDB. Despite their
differences, we find that MARS achieves accurate performance prediction on both data
stores with an average accuracy of 95% across the cases considered. This result points to the
potential of applying our methodology to a broader set of NoSQL systems and
deployment environments.