How to interpret percentiles in Google monitoring dashboards?
Are you in charge of designing a dashboard for measuring performance and don't know how to use the metrics provided by the platform? Don't worry; I've got you covered.
Here is an example of how you could use them in your favor.
Imagine that you are building a dashboard for measuring cloud function performance. One metric you will want is execution time: the time taken to execute a function from start to finish. It can be summarized as an average, minimum, maximum, and percentiles (e.g., the 95th percentile) to understand your function's performance better.
Interpreting execution time by percentiles can provide valuable insights into your function's performance. Percentiles help you understand the distribution of execution times and identify any outliers or bottlenecks. Here's an example of how you might interpret these metrics for a hypothetical function:
Minimum execution time: 50 ms
Maximum execution time: 2000 ms
Average execution time: 300 ms
50th percentile (median) execution time: 250 ms
95th percentile execution time: 1500 ms
99th percentile execution time: 1800 ms
In this example, the minimum execution time is 50 ms, indicating that the function can sometimes execute very quickly. The maximum execution time is 2000 ms, 40 times the minimum, suggesting that there might be occasional performance issues or slow external dependencies.
The average execution time of 300 ms gives a general sense of the function's performance but may not provide the whole picture, especially when the distribution contains extreme values.
The median (50th percentile) execution time of 250 ms shows that half of the invocations took 250 ms or less, and the other half took longer. Since the median is lower than the average, the distribution is likely skewed to the right: a small number of very slow invocations pull the average up.
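To see how a few slow invocations pull the average above the median, here is a minimal sketch in Python. The execution times are made up for illustration and do not come from a real function.

```python
import statistics

# Hypothetical execution times in milliseconds (made up for illustration).
durations_ms = [50, 200, 220, 240, 250, 250, 260, 280, 300, 2000]

mean_ms = statistics.mean(durations_ms)
median_ms = statistics.median(durations_ms)

print(f"mean:   {mean_ms:.0f} ms")
print(f"median: {median_ms:.0f} ms")

# A mean well above the median points to a long right tail,
# i.e. a few slow runs among mostly fast ones.
skewed_by_slow_runs = mean_ms > median_ms
print(f"skewed by slow runs: {skewed_by_slow_runs}")
```

Here a single 2000 ms invocation is enough to push the mean far above the median, while the median barely notices it.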
The 95th percentile execution time of 1500 ms indicates that 95% of the invocations took 1500 ms or less to execute. The remaining 5% took longer than 1500 ms and can be considered slow outliers. Identifying and investigating these slow cases can help optimize your function's performance.
The 99th percentile execution time of 1800 ms shows that only 1% of the invocations took longer than 1800 ms. These extreme cases might indicate rare performance issues or edge cases that are worth investigating.
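If you want to check these readings by hand, the sketch below computes percentiles with the simple nearest-rank method and counts how many invocations fall above each one. The `percentile` and `share_slower_than` helpers and the sample data are hypothetical. The data was built only to reproduce the minimum, maximum, and percentile values from the example, not the 300 ms average. A dashboard may compute percentiles differently (for example, by estimating them from histogram buckets), so expect small differences.

```python
import math

def percentile(values, p):
    """Nearest-rank percentile: the smallest value with at least p% of samples at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]

def share_slower_than(values, threshold_ms):
    """Fraction of invocations strictly slower than threshold_ms."""
    return sum(1 for v in values if v > threshold_ms) / len(values)

# Hypothetical sample of 100 invocations, in milliseconds.
durations_ms = (
    [50] + [200] * 39 + [250] * 10 + [300] * 40
    + [1500] * 5 + [1800] * 4 + [2000]
)

p50 = percentile(durations_ms, 50)
p95 = percentile(durations_ms, 95)
p99 = percentile(durations_ms, 99)

print(f"p50={p50} ms, p95={p95} ms, p99={p99} ms")
print(f"slower than p95: {share_slower_than(durations_ms, p95):.0%}")
print(f"slower than p99: {share_slower_than(durations_ms, p99):.0%}")
```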
By looking at the execution time broken down by percentiles, you can better understand the distribution of your function's performance and identify areas for optimization. Monitoring these metrics over time is essential to detect trends, regressions, or improvements in your function's implementation.
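One simple way to watch for regressions over time is to compare the 95th percentile of the current period against a baseline period. The sketch below is only an illustration: the `has_regressed` helper, the 20% tolerance, and the two weekly samples are all assumptions, and the right tolerance depends on your function.

```python
import statistics

def p95(durations_ms):
    # statistics.quantiles with n=100 returns 99 cut points;
    # index 94 is the 95th percentile.
    return statistics.quantiles(durations_ms, n=100, method="inclusive")[94]

def has_regressed(baseline_ms, current_ms, tolerance=0.20):
    """True if the current p95 exceeds the baseline p95 by more than `tolerance`.

    The 20% default is an arbitrary, hypothetical choice.
    """
    return p95(current_ms) > p95(baseline_ms) * (1 + tolerance)

# Hypothetical execution times in milliseconds for two periods.
last_week = [200, 220, 250, 260, 300, 320, 400, 900, 1200, 1500]
this_week = [210, 230, 250, 270, 310, 350, 600, 1400, 1900, 2400]

print(f"last week p95: {p95(last_week):.0f} ms")
print(f"this week p95: {p95(this_week):.0f} ms")
print(f"regressed: {has_regressed(last_week, this_week)}")
```

Comparing a high percentile rather than the average makes the check sensitive to changes in the slow tail, which is where regressions tend to show up first.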
Happy monitoring! May the bits be ever in your favor.