
The best configuration manager for Nagios: Google Docs, of course!

Now, I'm not a fan of Nagios and I always recommend OpenNMS over it, but when a client is fixated on Nagios I take a deep breath and get the work done with it anyway.

It just so happened that I couldn't convince a customer to use OpenNMS, so I decided that if I really had to use Nagios I would do it in as innovative a way as possible.

The first phase in this kind of project is usually gathering requirements, that is, the list of hosts/appliances to be monitored. So I opened up a Google Docs spreadsheet and started typing. At a certain point it hit me: what if I could make this document the single source for the whole configuration and be done with it?

My spider sense was tingling, and I knew I had just found a way to turn a dull project into an interesting, blog-worthy one.

I created a spreadsheet like the one in the picture with only 4 columns: ip, name/description, location, groups.
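Once exported as CSV, a sheet like this maps naturally onto a list of host records. The following is a minimal sketch, not the script from the gist: it assumes the header row literally reads `ip,name,location,groups` and that a host belonging to several groups lists them comma-separated in the groups cell (both hypothetical conventions).

```python
import csv
import io


def parse_hosts(csv_text):
    """Turn the exported spreadsheet CSV into a list of host dicts.

    Assumes (hypothetically) a header row "ip,name,location,groups" and
    comma-separated group names inside the "groups" cell.
    """
    hosts = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        ip = (row.get("ip") or "").strip()
        if not ip:
            # Skip blank or half-filled rows left over in the spreadsheet.
            continue
        groups = [g.strip() for g in (row.get("groups") or "").split(",") if g.strip()]
        hosts.append({
            "ip": ip,
            "name": (row.get("name") or "").strip(),
            "location": (row.get("location") or "").strip(),
            "groups": groups,
        })
    return hosts
```

Skipping rows without an IP is a deliberate choice: a spreadsheet edited by hand tends to accumulate empty lines, and they should not turn into broken host definitions.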

After that I published the spreadsheet as CSV and grabbed the URL.
From the shell I could now fetch the CSV file, as shown in the terminal screenshot. Even better, changes are published automatically after every modification. This way the customer can easily update the list of nodes to be monitored from a friendly spreadsheet, and I don't have to mess with complicated Nagios configuration add-ons.

Now the hard part: how do I get from a CSV file to a full-blown Nagios config?

That bit is easily taken care of by a Python script. Python (besides being installed by default on practically every Linux distro) has great support for CSV files, and there is a sophisticated templating engine for it called Jinja (I had learned about Jinja while playing around with SaltStack).

Let code speak

The Python script (yes, all of it) is embedded in the following gist (link to the full gist, templates included):

The templates are stored in a subdirectory called templates, and the CSV file is fetched from Google Docs with wget instead of Python's own urllib to ease debugging (the last copy of the file is left in the directory for post-mortem inspection).
The following is an example template that generates the configuration for all hosts (through a loop on the host list). Note that the configuration is intentionally kept to a minimum: all parameters are inherited through the hostgroup and the host template, because only this way can we keep the configuration consistent across all hosts and groups and avoid repetition.
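To give an idea of what such a minimal host definition looks like, here is a dependency-free sketch that uses plain Python string formatting instead of Jinja; it is an illustration, not the template from the gist. It assumes each host is a dict with `ip`, `name`, `location` and a list of `groups` (a hypothetical shape mirroring the four spreadsheet columns), and that a host template, here called `base-host` (hypothetical), already supplies every other directive Nagios requires.

```python
def render_host(host, template):
    """Render one minimal Nagios host definition.

    Only identity fields are emitted; check periods, contacts and the like
    are expected to come from the host template named by `template`.
    """
    lines = [
        "define host {",
        f"    use         {template}",
        f"    host_name   {host['name']}",
        f"    alias       {host['name']} ({host['location']})",
        f"    address     {host['ip']}",
    ]
    if host["groups"]:
        # Membership is declared on the host, so hostgroups need no member list.
        lines.append(f"    hostgroups  {','.join(host['groups'])}")
    lines.append("}")
    return "\n".join(lines)


def render_hostgroup(group):
    """Render a hostgroup definition with both hostgroup_name and alias set."""
    return "\n".join([
        "define hostgroup {",
        f"    hostgroup_name  {group}",
        f"    alias           {group}",
        "}",
    ])


def render_config(hosts, template="base-host"):
    """Render every host, then one hostgroup per distinct group name.

    "base-host" is a hypothetical template name, assumed to be defined
    elsewhere with all the directives the hosts do not set themselves.
    """
    blocks = [render_host(host, template) for host in hosts]
    for group in sorted({g for host in hosts for g in host["groups"]}):
        blocks.append(render_hostgroup(group))
    return "\n\n".join(blocks) + "\n"
```

Declaring membership through each host's `hostgroups` directive keeps the hostgroup definitions trivial: adding a group in the spreadsheet is enough to make it appear in the generated config.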

What's still missing?

All the services, and the assignment of services to hostgroups/templates. Since that varies greatly with each customer's requirements, I intentionally left it out of this post. Rest assured, my customer is now seeing all those fancy graphs in her browser.

I'm going to add a shell script that wraps up the wget, Python and Nagios restart (and perhaps even versioning) steps in one easy piece, so keep an eye on the gist or this blog for updates.
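Until then, such a wrapper could look roughly like the following sketch, written in Python rather than shell only to stay in one language. Every path, the generator script name (`generate.py`), the Nagios config path and the systemd service name are hypothetical placeholders, since distributions package Nagios differently; versioning is left out.

```python
import subprocess
import sys


def build_steps(csv_url, csv_path="hosts.csv", generator="generate.py",
                nagios_cfg="/etc/nagios/nagios.cfg", service="nagios"):
    """Return the commands the wrapper runs, in order (all names are placeholders)."""
    return [
        # Fetch the published sheet, keeping the last copy on disk for inspection.
        ["wget", "-q", "-O", csv_path, csv_url],
        # Regenerate the Nagios object files from the CSV (hypothetical script).
        [sys.executable, generator, csv_path],
        # Let Nagios verify the new configuration before touching the daemon.
        ["nagios", "-v", nagios_cfg],
        # Restart the service (systemd assumed).
        ["systemctl", "restart", service],
    ]


def run(steps, dry_run=False):
    """Run each step; check=True stops the chain at the first failure."""
    for cmd in steps:
        if dry_run:
            print(" ".join(cmd))
            continue
        subprocess.run(cmd, check=True)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python refresh.py PUBLISHED_CSV_URL [--dry-run]
    run(build_steps(sys.argv[1]), dry_run="--dry-run" in sys.argv[2:])
```

Running `nagios -v` before the restart means a bad edit in the spreadsheet aborts the chain instead of taking monitoring down.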

