
Monitoring temperature of my DietPi Homelab cluster with Grafana Cloud

20 Aug 2023

Observability


Problem Statement

Around the end of March, I wanted to get my hands on my old Raspberry Pi cluster again, as I needed a testbed for K8S, ChatOps, CI/CD, and so on. The DevOps ecosystem in 2023 is much more ARM-ready than it was in 2020, which makes building a usable K8S stack on the Pi realistic. I upgraded from a 4-node cluster to 7 Pi 4 nodes with PoE capability, SSDs, and USB, all sitting inside a nice 1U rack. I then spent the next two months testing various OSes, reinstalling the whole stack multiple times, and wrestling with the home router. In the end the cluster is up with all the platform engineering tools deployed.

From the software perspective it all works fine. However, I quickly realized that the ventilation inside the rack is not as good as I expected. Although each Pi's PoE HAT comes with a fan and an added heatsink, the CPU temperature can easily climb above 60 degrees after an hour or so. The Pi rack does come with some nice LED displays showing each Pi's temperature, but they are a bit too small to read. I needed a way to monitor the temperature of the whole stack easily, ideally on a nice dashboard with alerting capabilities. I decided to work on a solution to address the problem while keeping coding to a minimum.

Solution

I want to share how I arrived at a solution without getting into too much (but some unavoidable) technical detail. The goal is to build something fast and easily accessible, while avoiding toil in deployment.

Step 1: Write a shell script to expose the temperature as a metric in a Prometheus scrape-able format.

#!/bin/sh
while true; do
  ncat -l -p 7027 --sh-exec '\
    C="rpitemp $(cat /sys/class/thermal/thermal_zone0/temp | sed "s/\([0-9]\{2\}\)/\1./")"; \
    printf "HTTP/1.0 200 OK\nContent-Length: ${#C}\n\n$C"'
done
I am using DietPi OS, and these few lines of code give me a simple one-line response like “rpitemp 39.5” on port 7027, ready for scraping by Prometheus.
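The sysfs file reports the temperature in millidegrees, and the sed expression simply inserts a decimal point after the first two digits. A quick sketch of just that transformation (the 39500 reading below is an illustrative value, not one from my cluster):

```shell
# /sys/class/thermal/thermal_zone0/temp reports millidegrees, e.g. 39500.
# The sed expression inserts a "." after the first two digits,
# turning 39500 into 39.500 (which Prometheus parses as the float 39.5).
echo 39500 | sed "s/\([0-9]\{2\}\)/\1./"
```

Note this assumes the reading stays in the two-digit-degrees range, which is a safe bet for a running Pi.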
Step 2: Use Ansible to deploy the script to all nodes.
I turned the above into a script, used Ansible to copy the file to all nodes, and set the script to run at boot. Below is the Ansible playbook.yml.
- name: Set cron job for getting temperature from all worker nodes
  hosts: workers
  tasks:
    - name: Copy temp.sh file to all worker nodes
      become: true
      copy:
        src: /home/dietpi/temp.sh
        dest: /home/dietpi/temp.sh
        owner: root
        group: root
        mode: 0700
    - name: Create an entry in crontab for getting the pi temp on every restart
      ansible.builtin.cron:
        name: "a job for reboot"
        special_time: reboot
        job: "/home/dietpi/temp.sh &"
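The playbook assumes a workers group in the Ansible inventory. A minimal sketch of what that could look like (the host IPs here are my assumption, matching the scrape targets used later):

```ini
; inventory.ini -- hypothetical inventory for the playbook above
[workers]
10.0.0.100
10.0.0.101
10.0.0.102
10.0.0.103
10.0.0.104
10.0.0.105
10.0.0.106
```

With that in place, the deployment is a single run of something like ansible-playbook -i inventory.ini playbook.yml.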
Step 3: Start a Prometheus container on the control node and scrape the temperature metric from all worker nodes.
Below is the prometheus.yml file, using remote_write to send the metrics from all worker nodes to Grafana Cloud.
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: "piet_temp_control01"
    metrics_path: '/'
    static_configs:
      - targets: ['10.0.0.100:7027']
      - targets: ['10.0.0.101:7027']
      - targets: ['10.0.0.102:7027']
      - targets: ['10.0.0.103:7027']
      - targets: ['10.0.0.104:7027']
      - targets: ['10.0.0.105:7027']
      - targets: ['10.0.0.106:7027']

# remote write location
remote_write:
  - url: [Grafana Cloud URL]
    basic_auth:
      username: [username]
      password: [password]
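For running Prometheus on the control node, a docker-compose sketch along these lines would work (the config path is an assumption based on the playbook above, not the exact setup from this post):

```yaml
# docker-compose.yml -- hypothetical; mounts the prometheus.yml shown above
services:
  prometheus:
    image: prom/prometheus
    ports:
      - "9090:9090"
    volumes:
      - /home/dietpi/prometheus.yml:/etc/prometheus/prometheus.yml
    restart: unless-stopped
```

The restart policy keeps the scraper alive across reboots of the control node, in the same spirit as the cron @reboot entry on the workers.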
Step 4: Sign up for a Grafana Cloud free tier account.
The free tier account comes with a quota of 10 free dashboards, which is enough for my simple use case. I chose Grafana Cloud for building the dashboard to avoid the trouble of getting into the Pi cluster to access a local Grafana instance. It also makes alert integration easier.
Step 5: Build the dashboard.
This is probably the easiest part. Grafana comes with nice dashboarding capabilities for all kinds of use cases. I haven't finished setting up the alert yet, as I am still deciding how I want to receive it. Looking at the figures on the dashboard, I probably need to open the case for better ventilation, or add additional 40mm fans to the bottom or back of the rack for better cooling. Hopefully I can run this cluster 24×7 after solving the temperature issue.
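Once the alerting decision is made, the condition itself is a one-line PromQL expression on the rpitemp metric. A sketch of what it could look like as a Prometheus alerting rule (the 70-degree threshold, rule name, and durations are my own placeholders, not values from this setup; Grafana Cloud managed alerts can evaluate the same expression):

```yaml
# rules.yml -- hypothetical alerting rule; threshold is an assumption
groups:
  - name: pi_temperature
    rules:
      - alert: PiTempHigh
        expr: rpitemp > 70
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "{{ $labels.instance }} CPU temperature above 70C for 5 minutes"
```

The for: clause avoids paging on a brief spike, which matters in a rack where temperatures fluctuate with load.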

Conclusion

This was an interesting side project alongside the K8S cluster on DietPi OS. Raspberry Pis are common today as SBCs for edge computing and remote IoT use cases such as digital displays and edge gateways. Remember that temperature is another important metric to monitor on your edge computing devices to ensure system availability and reliability.

Lindsay Chung

Head of Solutions Engineering

