A tour of Riemann : check disk, by, throttle, email
How tu use the (by) stream in Riemann ?
The problem
I now want to monitor disk usage. If a filesystem is 80 % full, fire an email. But i don’t want to be spammed , so i want at most 2 mails every hours for each distinct full filesystem.
I will receive events in Riemann like this one :
{:host "debian-mathieu.corbin"
:service "df-root/percent_bytes-used"
:state nil
:description nil
:metric 73.04872131347656
:tags []
:time 1495380355
:ttl 20.0}
Here, the root
fs is 73 % full for host debian-mathieu.corbin
.
You can send email using Riemann. Let’s define a stream to send email.
Create a file mycorp/output/email.clj
:
(ns mycorp.output.email
"send email"
(:require [riemann.config :refer :all]
[riemann.streams :refer :all]
[riemann.test :refer :all]
;; we should import riemann.email
[riemann.email :refer :all]
[clojure.tools.logging :refer :all]))
;; this stream can be used to send email
(def email (mailer {:from "me@mcorbin.fr"
:host "mail.foo.com"
:user "foo"
:password "bar"}))
Here, we use def
to define a new stream named email.
We will use it to send emails.
Throttle
We will use throttle
to limit the number of email.
Take a look at the Riemann howto for more informations about throttle
Solution
Tests
First, let’s create a file mycorp/system/disk.clj
and write the tests for our use case:
(ns mycorp.system.disk
"Check disk"
(:require [riemann.config :refer :all]
[riemann.streams :refer :all]
[riemann.test :refer :all]
[mycorp.output.email :as email]
[clojure.tools.logging :refer :all]))
(def disk-stream)
(tests
(deftest disk-stream-test
;; i inject test events only in disk-stream
(let [result (inject! [mycorp.system.disk/disk-stream]
[;; ok
{:host "debian-mathieu.corbin"
:service "df-root/percent_bytes-used"
:state nil
:description nil
:metric 73
:tags []
:time 1
:ttl 20.0}
;; random event
{:host "debian-mathieu.corbin"
:service "random_service"
:state nil
:description nil
:metric 100
:tags []
:time 1
:ttl 20.0}
;; debian-mathieu.corbin/root full
{:host "debian-mathieu.corbin"
:service "df-root/percent_bytes-used"
:state nil
:description nil
:metric 90
:tags []
:time 3
:ttl 20.0}
;; debian-mathieu.corbin/var-log full
{:host "debian-mathieu.corbin"
:service "df-var-log/percent_bytes-used"
:state nil
:description nil
:metric 90
:tags []
:time 4
:ttl 20.0}
;; debian-mathieu.corbin/root full
{:host "debian-mathieu.corbin"
:service "df-root/percent_bytes-used"
:state nil
:description nil
:metric 90
:tags []
:time 4
:ttl 20.0}
;; debian-mathieu.corbin/root full
{:host "debian-mathieu.corbin"
:service "df-root/percent_bytes-used"
:state nil
:description nil
:metric 91
:tags []
:time 4
:ttl 20.0}
;; guixsd-mathieu.corbin/root full
{:host "guixsd-mathieu.corbin"
:service "df-root/percent_bytes-used"
:state nil
:description nil
:metric 90
:tags []
:time 4
:ttl 20.0}
;; debian-mathieu.corbin/root full
{:host "debian-mathieu.corbin"
:service "df-root/percent_bytes-used"
:state nil
:description nil
:metric 93
:tags []
:time 3605
:ttl 20.0}])]
;; :disk-stream-tap-1 should contains all events indicating a full fs
(is (= (:disk-stream-tap-1 result)
[{:host "debian-mathieu.corbin"
:service "df-root/percent_bytes-used"
:state nil
:description nil
:metric 90
:tags []
:time 3
:ttl 20.0}
{:host "debian-mathieu.corbin"
:service "df-var-log/percent_bytes-used"
:state nil
:description nil
:metric 90
:tags []
:time 4
:ttl 20.0}
{:host "debian-mathieu.corbin"
:service "df-root/percent_bytes-used"
:state nil
:description nil
:metric 90
:tags []
:time 4
:ttl 20.0}
{:host "debian-mathieu.corbin"
:service "df-root/percent_bytes-used"
:state nil
:description nil
:metric 91
:tags []
:time 4
:ttl 20.0}
{:host "guixsd-mathieu.corbin"
:service "df-root/percent_bytes-used"
:state nil
:description nil
:metric 90
:tags []
:time 4
:ttl 20.0}
{:host "debian-mathieu.corbin"
:service "df-root/percent_bytes-used"
:state nil
:description nil
:metric 93
:tags []
:time 3605
:ttl 20.0}]))
;; :disk-stream-tap-2 should contains all events passed to the email stream.
;; for each host/service, we want maximum 2 mails every 3600 seconds
(is (= (:disk-stream-tap-2 result)
[ ;; first debian-mathieu/root
{:host "debian-mathieu.corbin"
:service "df-root/percent_bytes-used"
:state nil
:description nil
:metric 90
:tags []
:time 3
:ttl 20.0}
;; first debian-mathieu/var-log
{:host "debian-mathieu.corbin"
:service "df-var-log/percent_bytes-used"
:state nil
:description nil
:metric 90
:tags []
:time 4
:ttl 20.0}
;; second debian-mathieu/root
{:host "debian-mathieu.corbin"
:service "df-root/percent_bytes-used"
:state nil
:description nil
:metric 90
:tags []
:time 4
:ttl 20.0}
;; first debian-mathieu/guixsd
{:host "guixsd-mathieu.corbin"
:service "df-root/percent_bytes-used"
:state nil
:description nil
:metric 90
:tags []
:time 4
:ttl 20.0}
;; next window (time = 3605), first debian-mathieu/root
{:host "debian-mathieu.corbin"
:service "df-root/percent_bytes-used"
:state nil
:description nil
:metric 93
:tags []
:time 3605
:ttl 20.0}])))))
In this test suite, we have 2 :tap
.
The first one, :disk-stream-tap-1
, will contain all events representing a fs > to 80 %.
The second, :disk-stream-tap-2
, all events actually send by email.
The distinction is important.
Remember, we only want 2 email per hour for each distinct full filesystem to not be spammed.
Look at the :disk-stream-tap-2
tests.
We injected 3 events commented debian-mathieu.corbin/root full
, but in :disk-stream-tap-2
we only had 3, because of throttle
(2 in the first 3600 seconds, 1 after).
Don’t forget to add in riemann.config the new files :
(include "mycorp/output/email.clj")
(include "mycorp/system/ram.clj")
First (incorrect) solution
We saw in a previous article how to perform a simple check. Why not reuse it with throttle
and email
?
(def disk-stream
"Check if disk if > to 80 %, email if it is. Send only 2 email for each alert type."
;; #"percent_bytes-used$" is a regex, we only want events where :service match the regex
(where (and (service #"percent_bytes-used$")
;; Test if disk is 80 % full
(> (:metric event) 80))
(tap :disk-stream-tap-1)
;; 2 events max every 3600 secondes using throttle
(throttle 2 3600
(tap :disk-stream-tap-2)
;; send email using the email stream defined in mycorp.output.email
(io (email/email "foo@mcorbin.fr")))))
Launch riemann test riemann.config
. It fails in the second test (:disk-stream-tap-2
).
Why ? because in this solution, we only send 2 email regardless the host/service fields.
If we have 10 alerts for 10 differents filesystem, with this solution we will send only 2 emails for the 2 first alerts.
We want to have independant throttle
for each host/filesystem.
And for this, we will use the by stream.
Final solution
We just need to add (by)
:
(def disk-stream
"Check if disk if > to 80 %, email if it is. Send only 2 email for each alert type."
;; #"percent_bytes-used$" is a regex, we only want events where :service match the regex
(where (and (service #"percent_bytes-used$")
;; Test if disk is 80 % full
(> (:metric event) 80))
(tap :disk-stream-tap-1)
;; use (by) to have independant streams for each host/service couple
(by [:host :service]
;; 2 events max every 3600 secondes using throttle
(throttle 2 3600
(tap :disk-stream-tap-2)
;; send email using the email stream defined in mycorp.output.email
(io (email/email "foo@mcorbin.fr"))))))
riemann test riemann.config
is now passing !
Conclusion
You now know how to send email, and how to use by
and throttle
.
Code here.
Add a comment
If you have a bug/issue with the commenting system, please send me an email (my email is in the "About" section).