This is part 1 of the three-part series “The path to your first SLO”.
When building an observability practice, many customers we talked to struggled to decide what to observe and were frustrated by alarm storms and false alarms. ITOps teams are concerned with centralized monitoring and with gathering metrics from different systems for proactive monitoring. App owners are interested in fast root cause analysis and end-to-end tracing. Usually ITOps takes the role of first-tier monitoring, watching the vital health signals of different systems and alerting the right app teams for in-depth diagnostics.
The requirements are clear: applications need to supply the right metrics to ITOps. These may range from simple up/down availability metrics and K8s metrics to CPU/memory utilization and disk consumption. The challenge usually comes from gathering metrics from the application layer. No matter what monitoring tools you use, the following four golden signals are usually what you need:
Latency – the request response time
Traffic – the number of requests per second
Errors – the number of errors or error rate
Saturation – how close the system's resources are to their limits under load
There are in-depth explanations of each of these all over the web, so we do not repeat the details here. The important thing is to observe all four signals for each “user journey” or “service endpoint”. For an ecommerce application, for example, the user journeys would be “Login”, “Browse Catalog”, “Add to Cart” and “Checkout”.
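To make the idea of observing the signals per user journey concrete, here is a minimal Python sketch. It is an illustration only, not how any particular monitoring tool works: the `Request` record, its field names, the choice of the 95th percentile for latency, and the use of peak CPU utilization as the saturation signal are all assumptions made for this example.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical record of one handled request (fields are assumptions)."""
    journey: str            # user journey, e.g. "Login" or "Checkout"
    duration_ms: float      # request response time
    is_error: bool          # True if the request failed
    cpu_utilization: float  # host CPU fraction (0..1) sampled during the request


def golden_signals(requests, window_seconds):
    """Summarize the four golden signals for each user journey."""
    by_journey = defaultdict(list)
    for req in requests:
        by_journey[req.journey].append(req)

    summary = {}
    for journey, reqs in by_journey.items():
        durations = sorted(r.duration_ms for r in reqs)
        # Nearest-rank 95th percentile of the response time.
        rank = max(1, math.ceil(0.95 * len(durations)))
        summary[journey] = {
            "latency_p95_ms": durations[rank - 1],
            "traffic_rps": len(reqs) / window_seconds,
            "error_rate": sum(r.is_error for r in reqs) / len(reqs),
            # Peak CPU stands in for saturation in this sketch.
            "saturation": max(r.cpu_utilization for r in reqs),
        }
    return summary


sample = [
    Request("Login", 100, False, 0.5),
    Request("Login", 200, False, 0.6),
    Request("Login", 300, True, 0.7),
    Request("Login", 400, False, 0.4),
    Request("Checkout", 250, False, 0.9),
]
signals = golden_signals(sample, window_seconds=10)
for journey, values in signals.items():
    print(journey, values)
```

In practice these numbers would come from your monitoring tool rather than from an in-memory list; the point is only that each journey gets its own set of four values.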
These high-level metrics are what we call “work metrics”. Combined with the lower-level system metrics, or “resource metrics”, they give organizations the basis to define their important SLOs (Service Level Objectives) and to decide how to monitor and meet those SLOs with selected SLIs (Service Level Indicators). These SLIs are the metrics to set alerts on, so that you observe what matters most to your organization. In the next article we will talk about common practices for gathering these metrics from leading monitoring tools.
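As a rough illustration of how an SLI relates to an SLO, the Python sketch below treats the SLI as the fraction of good requests and the SLO as a target for that fraction. The `SLO` class, the `evaluate` function, the 99% target and the request counts are all hypothetical values chosen for the example, and the "error budget" calculation is one common way of framing it rather than the only one.

```python
from dataclasses import dataclass


@dataclass
class SLO:
    """Hypothetical objective: a target fraction of good requests."""
    name: str
    target: float  # e.g. 0.99 means 99% of requests should be good


def evaluate(slo, good, total):
    """Compare a measured SLI (good / total) against an SLO target."""
    if total == 0:
        # No traffic in the window: nothing has violated the objective.
        return {"sli": 1.0, "met": True, "budget_remaining": 1.0}

    sli = good / total
    bad = total - good
    allowed_bad = (1 - slo.target) * total  # the error budget, in requests
    if allowed_bad == 0:
        # A 100% target leaves no budget at all.
        remaining = 1.0 if bad == 0 else 0.0
    else:
        remaining = 1 - bad / allowed_bad   # negative once the budget is overspent
    return {"sli": sli, "met": sli >= slo.target, "budget_remaining": remaining}


checkout_slo = SLO(name="Checkout availability", target=0.99)
report = evaluate(checkout_slo, good=9950, total=10000)
print(checkout_slo.name, report)
```

An alert would then typically be tied to the SLI falling below the target, or to the remaining budget shrinking quickly, instead of to every individual low-level metric.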
New to SLO?
#SLOconf is a free, virtual event focused on #SLOs! 🔥
Whether you work on SRE, SLOs, DevOps, or Ops, or are a developer, SLOconf is the perfect platform to share insights and ideas on the latest trends and developments in SRE/SLO.
Vsceptre is a sponsor at SLOconf 2023, hosted by Nobl9! 📢
For more details & speaker lineup, register here: 👇
www.sloconf.com