
The best configuration manager for Nagios: Google Docs, of course!

Now, I'm not a fan of Nagios and I always recommend OpenNMS over it, but when a client is fixated on Nagios I take a deep breath and get the work done with it anyway.

It just so happened that I couldn't convince a customer to use OpenNMS, so I decided that if I really had to use Nagios I would do it in the most innovative way possible.

The first phase in this kind of project is usually gathering requirements, that is, the hosts/appliances to be monitored. So I opened up a Google Docs spreadsheet and started typing. At a certain point it hit me: what if I could make this doc the source of all the configuration and just be done with it?

My spider sense was tingling and I knew I had just found a way to turn a dull project into an interesting, blog-worthy one.

I created a spreadsheet like the one in the picture with only 4 columns: ip, name/description, location, groups.
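
Just to give an idea of the format, a few made-up sample rows (IPs, names and groups here are purely illustrative) would look like this once exported as CSV:

    ip,name/description,location,groups
    192.0.2.10,core-switch-1,HQ,network-devices
    192.0.2.21,mail-gateway,branch-office,linux-servers
    192.0.2.22,web-frontend,HQ,linux-servers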

After that I shared the spreadsheet as CSV and grabbed the URL.
From the shell I could now fetch the CSV file, as shown in the terminal screenshot. Even better, changes are automatically published after every modification. This way the customer can easily update the list of nodes to be monitored from a friendly spreadsheet, and I don't have to mess with complicated Nagios configuration add-ons.
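
For instance, fetching the published sheet can be as simple as the following (the URL below is just a placeholder; use the CSV link Google Docs gives you for your own spreadsheet):

    # placeholder URL: substitute the published-as-CSV link of your own spreadsheet
    wget -O nodes.csv "https://docs.google.com/spreadsheet/...your-published-csv-link..."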

Now for the hard part: how do I get from a CSV file to a full-blown Nagios config?

That bit is easily taken care of by a Python script. Python (besides being already installed on every Linux distro) has great support for CSV files and a sophisticated templating engine called Jinja (which I had learned about while playing around with SaltStack).
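
To give a rough idea of the approach (this is a minimal sketch with my own file and template names, not the exact contents of the gist linked below), the core of such a script boils down to reading the CSV with csv.DictReader and feeding the rows to a Jinja template:

    #!/usr/bin/env python
    # Minimal sketch: read the exported CSV and render a Jinja template with the host list.
    # File and template names here are illustrative assumptions, not the gist's actual ones.
    import csv
    from jinja2 import Environment, FileSystemLoader

    CSV_FILE = "nodes.csv"        # the file previously fetched with wget
    TEMPLATE_DIR = "templates"    # subdirectory holding the Jinja templates
    OUTPUT_FILE = "hosts.cfg"     # generated Nagios object file

    # one dict per row, keyed by the header row (ip, name/description, location, groups)
    with open(CSV_FILE) as f:
        hosts = list(csv.DictReader(f))

    # render the host template with the full host list
    env = Environment(loader=FileSystemLoader(TEMPLATE_DIR))
    template = env.get_template("hosts.cfg.j2")
    with open(OUTPUT_FILE, "w") as out:
        out.write(template.render(hosts=hosts))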

Let code speak

The Python script (yes, all of it) is embedded in the following gist (link to the full gist, templates included):

The templates are stored in a subdirectory called templates, and the CSV file is fetched from Google Docs with wget instead of Python's own urllib to ease debugging (the last copy of the file is left in the directory for post-mortem inspection).
The following is an example template that generates the configuration for all hosts (through a loop over the host list). Note that the configuration is intentionally kept to a minimum; all parameters are inherited through the hostgroup and the host template, because only in this way can we keep the configuration consistent across all hosts and groups and avoid repetition.
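
In case the embedded snippet doesn't render, a host template in this spirit might look roughly like the following (a sketch built on the column names above; the template file name and directive values are illustrative):

    {# templates/hosts.cfg.j2 -- illustrative name; loops over the rows of the CSV #}
    {% for host in hosts %}
    define host {
        use         generic-host                    ; everything else comes from the host template
        host_name   {{ host['name/description'] }}
        alias       {{ host['name/description'] }}
        address     {{ host['ip'] }}
        hostgroups  {{ host['groups'] }}
        notes       {{ host['location'] }}
    }
    {% endfor %}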


What's still missing?

The service definitions, and the assignment of services to hostgroups/templates; but since that varies greatly depending on each customer's requirements, I intentionally left it out of this post. Rest assured, my customer is seeing all those fancy graphs in her browser now.
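
For reference only (since this part really is customer-specific), attaching a service to a whole hostgroup typically looks something like this in Nagios:

    define service {
        use                  generic-service       ; inherit the usual defaults from a service template
        hostgroup_name       linux-servers         ; applied to every host in the group, no per-host repetition
        service_description  PING
        check_command        check_ping!100.0,20%!500.0,60%
    }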

I'm going to add a shell script to wrap up the wget, Python, Nagios restart (and perhaps even versioning) phases in one easy piece, so keep an eye on the gist or this blog for updates.
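
Something along these lines, to be clear (the CSV URL, script name, paths and reload command below are placeholders to adapt to your own setup):

    #!/bin/sh
    # Hypothetical wrapper: fetch the spreadsheet, regenerate the config, verify and reload.
    set -e
    wget -O nodes.csv "https://docs.google.com/spreadsheet/...your-published-csv-link..."
    python generate_config.py            # whatever you called the Python script described above
    nagios -v /etc/nagios/nagios.cfg     # verify the generated configuration before applying it
    service nagios reload                # or /etc/init.d/nagios reload, depending on the distro
    # optionally: commit nodes.csv and the generated .cfg files to git for versioning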
