Wednesday, June 27, 2012

The best configuration manager for Nagios: Google Docs, of course!

Now, I'm not a fan of Nagios and I always recommend OpenNMS over Nagios, but when a client is fixated on Nagios I take a deep breath and get the work done with it too.

Now it just happened that I couldn't convince a customer to use OpenNMS, so I decided that if I really had to use Nagios I would do it in as innovative a way as possible.

The first phase in this kind of project is usually gathering requirements, that is, the hosts and appliances to be monitored. So I opened up a Google Docs spreadsheet and started typing. At a certain point it hit me: what if I could make this doc the source for all configuration and just be done with it?

My spider sense was tingling and I knew I had just found a way to turn a dull project into an interesting and blog-worthy one.

I created a spreadsheet like the one in the picture with only 4 columns: ip, name/description, location, groups.

After that I shared the spreadsheet as CSV and grabbed the URL.
From the shell I could now fetch the CSV file as shown in the terminal screenshot. Even better, changes are automatically published after a modification. This way the customer can easily update the list of nodes to be monitored from a friendly spreadsheet and I don't have to mess with complicated Nagios configuration add-ons.

Now the hard part: how do I get from a CSV file to a full-blown Nagios config?

That bit is easily taken care of by a Python script. Python (besides being already installed in virtually every Linux distro) has great support for CSV files, and a sophisticated templating engine called Jinja is available for it (I had learned about Jinja while playing around with SaltStack).
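To make the CSV half concrete, here is a minimal sketch (not the script from the gist) of how the published sheet could be turned into a list of host records with Python's standard csv module. The column order follows the four columns described above; the function name load_hosts, the header row, and the idea that several groups share one cell separated by spaces or commas are my assumptions for illustration.

```python
import csv
import io

# Column order assumed from the spreadsheet described in the post.
FIELDS = ["ip", "name", "location", "groups"]


def load_hosts(csv_text, has_header=True):
    """Parse the CSV text into a list of host dicts (hypothetical helper)."""
    reader = csv.reader(io.StringIO(csv_text))
    # Skip rows that are completely empty, which spreadsheets often export.
    rows = [row for row in reader if any(cell.strip() for cell in row)]
    if has_header and rows:
        rows = rows[1:]
    hosts = []
    for row in rows:
        # Pad or trim each row so it always has exactly four cells.
        row = (row + [""] * len(FIELDS))[:len(FIELDS)]
        host = dict(zip(FIELDS, (cell.strip() for cell in row)))
        # Assumption: multiple groups are separated by spaces or commas.
        host["groups"] = host["groups"].replace(",", " ").split()
        hosts.append(host)
    return hosts


# Made-up sample data using documentation-only IP addresses.
sample = """ip,name,location,groups
192.0.2.10,web01,Rome,linux web
192.0.2.20,sw-core,Milan,switches
"""
hosts = load_hosts(sample)
for host in hosts:
    print(host)
```

Each host ends up as a plain dict, which is exactly the kind of structure a template engine can loop over.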

Let the code speak

The python script (yes, all of it) is embedded in the following gist (link to the full gist, templates included):

The templates are stored in a subdirectory called templates, and the CSV file is fetched from Google Docs with wget instead of Python's own urllib to ease debugging (the last copy of the file is left in the directory for post-mortem inspection).
The following is an example template that generates the configuration for all hosts (through a loop over the host list). Note that the configuration is intentionally kept to a minimum: all parameters are inherited through the hostgroup and the template, because only this way can we keep the configuration consistent across all hosts and groups and avoid repetition.
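The embedded template is not reproduced here, so the following is only a rough sketch of what a host-loop template along those lines might look like, rendered with the third-party jinja2 package. The inline template string, the sample hosts, and the generic-host parent template are assumptions for illustration; the real templates live as files in the templates subdirectory.

```python
from jinja2 import Template

# Hypothetical template: one minimal "define host" block per host.
# Everything else is expected to come from the parent template
# (assumed here to be called generic-host) and from the hostgroups.
HOST_TEMPLATE = Template("""\
{% for host in hosts %}
define host {
    use         generic-host
    host_name   {{ host.name }}
    address     {{ host.ip }}
    hostgroups  {{ host.groups | join(",") }}
}
{% endfor %}
""")

# Made-up host records with the four spreadsheet columns.
hosts = [
    {"ip": "192.0.2.10", "name": "web01", "location": "Rome",
     "groups": ["linux", "web"]},
    {"ip": "192.0.2.20", "name": "sw-core", "location": "Milan",
     "groups": ["switches"]},
]

config = HOST_TEMPLATE.render(hosts=hosts)
print(config)
```

Keeping each host block down to a name, an address, and its groups is what makes the spreadsheet a workable single source: any check interval or notification setting is changed once, in the parent template or hostgroup, rather than per host.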


What's still missing?

The whole services part, and the assignment of services to hostgroups/templates, but since that varies greatly with each project's requirements I intentionally left it out of this post. Rest assured my customer is seeing all those fancy graphs in her browser now.

I'm going to add a shell script to wrap up the wget, python, and Nagios restart (and perhaps even versioning) phases in one easy piece, so keep an eye on the gist or this blog for updates.
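Until that shell script exists, here is a sketch of the same sequence written in Python rather than shell, just to show the order of the phases. Everything specific in it is an assumption: the placeholder URL, the file names hosts.csv and generate.py, the path to nagios.cfg, and the use of the service command for the restart. It defaults to a dry run that only prints the commands.

```python
import subprocess
import sys


def build_steps(csv_url, csv_path="hosts.csv", generator="generate.py",
                nagios_cfg="/etc/nagios/nagios.cfg"):
    """Return the commands to run, in order (all paths are assumptions)."""
    return [
        # 1. Fetch the published sheet; the copy stays on disk for debugging.
        ["wget", "-q", "-O", csv_path, csv_url],
        # 2. Run the (hypothetical) generator script on the downloaded file.
        [sys.executable, generator, csv_path],
        # 3. Let Nagios verify the configuration before touching the service.
        ["nagios", "-v", nagios_cfg],
        # 4. Restart only if every previous step succeeded.
        ["service", "nagios", "restart"],
    ]


def run_steps(steps, dry_run=True):
    """Print each command and, unless dry_run is set, execute it."""
    for cmd in steps:
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises on failure, so later steps are skipped.
            subprocess.run(cmd, check=True)


steps = build_steps("https://example.invalid/hosts.csv")
run_steps(steps)  # dry run: prints the commands without executing them
```

Verifying with nagios -v before the restart means a typo in the spreadsheet stops the pipeline instead of taking monitoring down.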
