How to identify the golden metrics for SRE - Vsceptre


29 Apr 2023

News, Observability


This is part 1 of the three-part series “The path to your first SLO”.

When we talked with customers about building an observability practice, many of them struggled with what to observe and were often frustrated by alarm storms or false alarms. ITOps teams are concerned with centralized monitoring, gathering metrics from different systems so they can monitor proactively. App owners are interested in fast root cause analysis and end-to-end tracing. Usually ITOps take the role of first-tier monitoring of the vital health signals of different systems, and alert the right app teams for in-depth diagnostics.

The requirements are clear: applications need to supply the right metrics to ITOps. These range from simple up/down availability metrics and K8s metrics to CPU/memory utilization and disk consumption. The challenge usually lies in gathering metrics from the application layer itself. Whatever monitoring tools you use, the following golden signals of observability are usually what you need:

Latency – the request response time

Traffic – the number of requests per second

Errors – the number of errors or error rate

Saturation – how close your resources are to their limits as load increases
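To make the four signals concrete, here is a minimal sketch that summarizes one time window of request records. The Request record shape and the function names are hypothetical. It also assumes that an "error" is any HTTP 5xx response, that latency is reported as a nearest-rank p95, and that saturation is approximated by the peak CPU utilization sampled during the window. Your monitoring tool will have its own definitions.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Request:
    """One completed request (hypothetical record shape for this sketch)."""
    latency_ms: float  # time taken to serve the request
    status: int        # HTTP status code returned


def percentile(values: List[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest value with at least pct% of values at or below it."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def golden_signals(requests: List[Request], window_s: float,
                   cpu_samples: List[float]) -> Dict[str, Optional[float]]:
    """Summarize one time window of requests into the four golden signals.

    Assumptions: an error is any HTTP 5xx response, and saturation is
    approximated by the peak CPU utilization (0.0 to 1.0) sampled in the window.
    """
    total = len(requests)
    errors = sum(1 for r in requests if r.status >= 500)
    return {
        # Latency: p95 response time; None when there was no traffic at all.
        "latency_p95_ms": percentile([r.latency_ms for r in requests], 95) if total else None,
        # Traffic: requests per second over the window.
        "traffic_rps": total / window_s,
        # Errors: fraction of requests that failed.
        "error_rate": errors / total if total else None,
        # Saturation: how close the busiest resource sample came to its limit.
        "saturation": max(cpu_samples, default=None),
    }
```

For example, ten requests with latencies from 100 ms to 1000 ms in a 60-second window, one of which returned 503, give a p95 latency of 1000 ms, roughly 0.17 requests per second and an error rate of 0.1.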

In-depth explanations of each of these signals are available all over the web, so we do not repeat the details here. The important thing is to observe all four signals for each “user journey” or “service endpoint”. For an ecommerce application, for example, these would be the user journeys “Login”, “Browse Catalog”, “Add to Cart” and “Checkout”.
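Why per journey? A failing journey can disappear inside an application-wide average. The sketch below groups hypothetical (journey, latency_ms, is_error) records by journey name, using the ecommerce journeys from the example. It leaves saturation out, on the assumption that saturation is measured on the resources behind each journey (hosts, pods, queues) rather than on the journey itself.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Each record is (journey, latency_ms, is_error); this shape is hypothetical.
Record = Tuple[str, float, bool]


def per_journey_signals(records: Iterable[Record], journeys: List[str],
                        window_s: float) -> Dict[str, Dict[str, Optional[float]]]:
    """Compute traffic, error rate and mean latency separately for each user journey."""
    # Pre-seed every expected journey so one with zero traffic still shows up.
    grouped: Dict[str, List[Tuple[float, bool]]] = {j: [] for j in journeys}
    for journey, latency_ms, is_error in records:
        grouped.setdefault(journey, []).append((latency_ms, is_error))

    summary = {}
    for journey, rows in grouped.items():
        count = len(rows)
        errors = sum(1 for _, is_error in rows if is_error)
        summary[journey] = {
            "traffic_rps": count / window_s,
            # None rather than 0.0, so "no traffic" is not mistaken for "healthy".
            "error_rate": errors / count if count else None,
            "mean_latency_ms": sum(lat for lat, _ in rows) / count if count else None,
        }
    return summary
```

Suppose a window contains 48 successful “Browse Catalog” requests and 2 “Checkout” requests, one of which failed. The application-wide error rate is only 1 in 50 (2%), but the per-journey view shows “Checkout” failing half of the time. It also shows that “Login” and “Add to Cart” received no traffic at all, which may itself be worth an alert.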

These high-level metrics are what we call “work metrics”. Combined with the lower-level system metrics, the “resource metrics”, they give organizations the basis to define their important SLOs (Service Level Objectives) and to decide how to monitor and meet those SLOs with selected SLIs (Service Level Indicators). The SLIs are the metrics to set alerts on, so that you observe what matters most to your organization. In the next article we will talk about common practices for gathering these metrics from leading monitoring tools.
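Here is one way an SLI connects to an SLO. The sketch below treats the SLI as the ratio of good events to total events, a common availability-style definition. It compares that ratio with a target and reports how much of the error budget (1 minus the target) is left. The function name, the 99.5% target and the event counts are illustrative assumptions only.

```python
from dataclasses import dataclass


@dataclass
class SloStatus:
    sli: float               # fraction of good events, 0.0 to 1.0
    meets_slo: bool          # True when the SLI is at or above the target
    budget_remaining: float  # fraction of the error budget left; negative when overspent


def evaluate_slo(good_events: int, total_events: int, target: float) -> SloStatus:
    """Compare an availability-style SLI (good / total) with an SLO target such as 0.995."""
    if total_events <= 0:
        raise ValueError("no events in the SLO window")
    if not 0 < target < 1:
        raise ValueError("target must be strictly between 0 and 1")
    sli = good_events / total_events
    allowed_bad = (1 - target) * total_events  # the error budget, expressed in events
    actual_bad = total_events - good_events
    return SloStatus(
        sli=sli,
        meets_slo=sli >= target,
        budget_remaining=1 - actual_bad / allowed_bad,
    )
```

With a 99.5% target over 100,000 requests, the error budget is 500 failed requests. If 200 requests failed, the SLI is 99.8%, the SLO is met, and about 60% of the budget remains. An alert on budget consumption tends to be less noisy than one that fires on every individual error.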

New to SLO?
#SLOconf is a free, virtual event focused on #SLOs! 🔥
Whether you do SRE, SLOs or DevOps, or you work in Ops or Dev, SLOconf is the perfect platform to share insights and ideas on the latest trends and developments in SRE/SLO.
Vsceptre is a sponsor at SLOconf 2023, hosted by Nobl9! 📢
For more details & speaker lineup, register here: 👇
www.sloconf.com
