
There is a growing body of work that reports positive results from applying ML-based performance prediction to a particular application or use-case (e.g., server configuration, capacity planning). Yet, a critical question remains unanswered: does ML make prediction simpler (i.e., allowing us to treat systems as blackboxes) and general (i.e., applicable across a range of applications and use-cases)? After all, the potential for simplicity and generality is a key part of what makes ML-based prediction so attractive compared to the traditional approach of relying on handcrafted and specialized performance models. In this paper, we attempt to answer this broader question. We develop a methodology for systematically diagnosing whether, when, and why ML does (not) work for performance prediction, and identify steps to improve predictability. We apply our methodology to test 6 ML models in predicting the performance of 13 real-world applications.

We find that 12 out of our 13 applications exhibit inherent variability in performance that fundamentally limits prediction accuracy. Our findings motivate the need for system-level modifications and/or ML-level extensions that can improve predictability, showing how ML fails to be an easy-to-use predictor. On implementing and evaluating these changes, we find that while they do improve the overall prediction accuracy, prediction error remains high for multiple realistic scenarios, showing how ML fails as a general predictor.
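To make the blackbox-prediction setting concrete, below is a minimal sketch, assuming a scikit-learn random forest and a synthetic "application" whose latency depends on a few configuration features plus run-to-run noise; the model choice, feature names, and constants are illustrative assumptions and do not correspond to the 6 models or 13 applications evaluated in the paper.

```python
# Illustrative sketch only: a blackbox performance predictor trained on
# synthetic data. The feature names, latency function, and model choice
# are assumptions for demonstration, not the paper's setup.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

def measure_latency(cores, mem_gb, input_mb):
    """Pretend 'application': latency driven by its configuration plus
    run-to-run noise that no input feature explains."""
    base = 50.0 * input_mb / (cores * np.sqrt(mem_gb))
    return base + rng.normal(scale=0.15 * base)  # inherent variability

# Sample random configurations (cores, memory in GB, input size in MB)
# and measure each one once, as a blackbox user would.
X = rng.uniform(low=[1, 2, 10], high=[16, 64, 1000], size=(500, 3))
y = np.array([measure_latency(*row) for row in X])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)

# Even a well-fit model cannot beat the ~15% noise floor baked into
# measure_latency(); that floor plays the role of inherent variability.
rel_err = np.abs(model.predict(X_te) - y_te) / np.abs(y_te)
print(f"median relative prediction error: {np.median(rel_err):.1%}")
```

Because the noise is independent of the features, no model trained on these features can drive the error below the noise floor; this is the toy analogue of the inherent variability that limits prediction accuracy in the applications studied.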
