Showing posts with label monitoring. Show all posts

Thursday, July 11, 2013

Book Review: Instant OpenNMS Starter

Disclaimer: Packt kindly sent me a free copy for review.

TL;DR: Rating 4/5. Recommended for beginners and intermediate users.

The book itself is short, but packed with information. A fast reader with some experience with OpenNMS should be able to finish it in 4 to 6 hours. Beginners will probably want to follow the pointers to the online documentation, check the configuration files and possibly experiment, so they should allocate more time.

Before publication the book was reviewed by Jeff Gehlbach. Anyone who has been involved with OpenNMS for some time knows him, as he is one of the many brilliant minds working for the OpenNMS company, the commercial entity which develops and supports OpenNMS. Surely his involvement serves as a kind of seal of quality for the book. I for one was surprised by the clarity with which even the most complex aspects of OpenNMS were presented in such a short text.

Instant OpenNMS Starter is divided into three main parts: installation, quick start and an advanced section that the book calls ‘the top 5 features’. The final section is a reference of sites and people with more information on OpenNMS.

The author has been careful to link to the relevant sections of the online wiki when he felt that the wiki content was adequate, without depriving the book of additional practical information. For instance, in the installation section he actually describes a more secure way of installing OpenNMS than the one described in the online user guide, and he does so by simply citing the extra steps while leaving the rest to the online documentation.

The quick start section is useful for those in a hurry to just have something monitored with OpenNMS and needing a pointer on what all those links in the web UI do.

The advanced section is where you will probably spend most of your time, as it describes the most interesting features of OpenNMS, which are:
  1. service assurance through polling
  2. data collection through collectors
  3. thresholds and notifications
  4. events, alarms and automations
  5. reports

IMHO one glaring omission in this list is the Provisioning system, which was introduced with OpenNMS 1.8 and is a key feature because it covers a critical aspect: how nodes are added into OpenNMS for monitoring. I reread the book twice hoping that perhaps it was a mistake on my part, but I could not find a single reference to it.

The book covers each of the five areas with enough depth to give a dedicated beginner useful pointers and background on how to implement the most advanced features of OpenNMS. The author again intelligently uses links to the online wiki to extend the text.
Only the section on reports felt a little thin. In defense of the author one could say that the reports area is so complex that it would have quickly grown out of hand for this kind of book. Perhaps in a second edition he should consider expanding it to at least mention the possibility of creating Jasper reports from collected data.

Instant OpenNMS Starter is clearly aimed at, and I recommend it for, people starting with OpenNMS, evaluating it or who might have inherited a working installation and now have to maintain it. Users seeking to master one of the 5 areas listed above should certainly consider buying it when the online bits and pieces feel insufficient or too sparse.
Given the title, it should come as no surprise that advanced users are not likely to find any new or useful information at all, but, again given the price and the short text, it could still be used as a kind of self-check.

Update Oct/2/2013

There was a brief exchange of emails with the author on the opennms-discuss mailing list. I think it gives useful context to some of the items in my review, so I reproduce it below in full (link):

Regarding Provisiond, I agree that it needs to be there. When I wrote the book I had to follow very specific guidelines from the publisher. In the top 5 features section I had to decide how to organize it. It was either going to be Capsd or the new and improved Provisiond as one of the 5. When I wrote it, Capsd was still enabled by default and I thought it was easier to get started with. If I would redo it now, I would change the section to Provisiond with a simple mention of how it evolved. In fact, I am preparing this very section now and will make it available on my site. Would be nice to have a revised edition though, I'll check with the publisher. Regarding reports I think it would be nice to have a similar short book of its own on the subject, going through OpenNMS' default reporting capabilities, more advanced custom JasperReports and maybe some modern ajax report dashboards built on top of the nice RESTful API (something I've been wanting to explore). Just thoughts...

Monday, May 20, 2013

Monitoring Oracle tablespace quota with OpenNMS

Going beyond the normal application availability check

One interesting use of the OpenNMS JDBC poller is for extracting data from the Oracle administrative database tables, for example tracking tablespace quota usage to detect quota exhaustion, spot sudden usage peaks and graph usage over time.

Graph of quota usage for user [redacted] on tablespace DAT.
Notice the cleaning operation running at 3:30 AM

Tablespace quotas are a feature of the Oracle database that allows the DBA to set a limit on the amount of storage that any given user can consume on a specific tablespace. This allows the DBA to share tablespaces across users yet still confine users to predefined usage boundaries. When a user consumes all of its quota it can no longer store data, but it can still delete data, thus allowing self-recovery.
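
The excerpt stops before showing the query actually used, so what follows is only a rough sketch of the arithmetic behind such a graph, not the configuration from the post. It assumes the figures come from Oracle's DBA_TS_QUOTAS view, where BYTES is the space currently used and MAX_BYTES is the quota (-1 meaning unlimited). The function name and the SQL string are illustrative.

```python
# Illustrative query against Oracle's DBA_TS_QUOTAS view (assumed data source).
# BYTES is the space used by the user, MAX_BYTES the quota (-1 = unlimited).
QUOTA_SQL = (
    "SELECT username, tablespace_name, bytes, max_bytes "
    "FROM dba_ts_quotas"
)


def quota_used_percent(used_bytes, max_bytes):
    """Return quota usage as a percentage, or None when no finite quota applies."""
    # -1 means unlimited; zero or a missing value cannot be expressed as a ratio.
    if max_bytes is None or max_bytes <= 0:
        return None
    return 100.0 * used_bytes / max_bytes


if __name__ == "__main__":
    # Hypothetical rows shaped like the query result above.
    rows = [
        ("APPUSER", "DAT", 750 * 1024**2, 1024**3),
        ("BATCH", "DAT", 10 * 1024**2, -1),
    ]
    for user, tablespace, used, quota in rows:
        print(user, tablespace, quota_used_percent(used, quota))
```

A value approaching 100 is the quota exhaustion case described above, and None marks users with no finite quota, which are probably best left out of a threshold.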

Friday, February 01, 2013

3 new features that I wish were in OpenNMS 2.0

As a long-time OpenNMS user I've often been impressed with its extensibility and the completeness of its feature set. There is support for lots of data collection techniques: from the old-school SNMP exec extensions to the HTTP poller, from the JDBC poller to the XML poller and many others that I probably forgot to mention.

Supporting new probes is therefore just a matter of how, not if, it can be done. And with new monitoring tools popping up every day this is clearly good, as it allows OpenNMS to keep up with the competition.
So the present looks bright, but what about the future? With OpenNMS 2.0 not yet on the radar I thought I could put together a list of features I would love to have. What do you think of them?

Sunday, January 20, 2013

Triggering OpenNMS notifications when patterns occur in a log file

A common problem with OpenNMS is how to monitor a log file and trigger alerts when certain conditions are met. Let me clarify with an example: you have this mission critical app that sometimes experiences internal errors. The application keeps running and still responds to requests, but the error will slow down the system and/or delay further processing. Monitoring the process and/or network polling will obviously not be able to detect the issue and the only way is to tail the application log file and look for certain messages.

The problem can usually be solved simply by forwarding the log file to OpenNMS through syslog, but what about logs generated by applications that don't speak syslog, or cases where you don't want to configure syslog forwarding?
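
The excerpt ends before the actual solution, so the following is only a minimal sketch of the "look for certain messages" step, not the approach used in the post. The function name and the sample patterns are hypothetical, and how a match would then be turned into an OpenNMS event is deliberately left out.

```python
import re


def find_matches(lines, patterns):
    """Return (line_number, pattern, text) for each line matching any pattern."""
    compiled = [re.compile(p) for p in patterns]
    hits = []
    for number, line in enumerate(lines, start=1):
        for rx in compiled:
            if rx.search(line):
                hits.append((number, rx.pattern, line.rstrip("\n")))
                break  # report each line once, for the first pattern that matches
    return hits


if __name__ == "__main__":
    # Hypothetical log excerpt and patterns, for illustration only.
    sample_log = [
        "2013-01-20 10:00:01 INFO request served\n",
        "2013-01-20 10:00:02 ERROR internal error: queue stalled\n",
        "2013-01-20 10:00:03 INFO request served\n",
    ]
    for hit in find_matches(sample_log, [r"internal error", r"OutOfMemory"]):
        print(hit)
```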

Wednesday, June 27, 2012

Monitoring QNAP devices with OpenNMS

QNAP devices have SNMP support out of the box; unfortunately the agent they ship with is almost unusable. At first it seems to support lots of cool features (fans, temperature, SMART, etc.), but if you take a little time to dig deeper you will notice that almost all key entries are, what?!, octetStrings.
Capacity reported as a String, QNAP what were you thinking?

So good luck estimating disk usage when it is reported as a string: '1.8TB' (I quote exactly as it is shown by mibbrowser).
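
To show what a consumer of that value would be stuck doing, here is a hedged sketch that turns such a string back into a number. It assumes binary multiples (1 TB = 1024^4 bytes), which the agent may or may not use, and the function name is made up for illustration. Note that the original precision is already lost by the time the string is produced.

```python
import re

# Assumed binary multiples; the QNAP agent might use decimal ones instead.
_UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}

_CAPACITY_RE = re.compile(r"\s*([0-9]+(?:\.[0-9]+)?)\s*([KMGT]?B)\s*", re.IGNORECASE)


def capacity_to_bytes(text):
    """Parse a string such as '1.8TB' into an approximate byte count."""
    match = _CAPACITY_RE.fullmatch(text)
    if match is None:
        raise ValueError("unrecognised capacity string: %r" % (text,))
    value, unit = match.groups()
    return int(float(value) * _UNITS[unit.upper()])


if __name__ == "__main__":
    print(capacity_to_bytes("1.8TB"))
```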

Without wasting any more of your (and my) time, let's fix that by installing the Optware QPKG and then installing net-snmp from the ipkg web console. The whole process is straightforward, just make sure to have a recent firmware:
  1. log in to the QNAP admin web interface
  2. open Applications servers and then select QPKG Center
  3. from the Available tab install Optware and then enable it from the Installed tab
  4. now access the Optware web interface and search for net-snmp, then click the install button on the net-snmp package
At this point net-snmp is running; we only need to configure it, and to do that we will ssh into the device.
Once logged in, edit the configuration file /opt/etc/snmpd.conf and then have the snmpd daemon reload its configuration by sending it a SIGHUP:
killall -HUP snmpd

Key configuration directives to edit:
  • sysLocation and sysContact, ça va sans dire
  • comment out or delete all disk entries and add an includeAllDisks 5% directive instead
  • change communities as per your organization policies
After that, rescan the node from the OpenNMS web UI and enjoy the new graphs.

The best configuration manager for Nagios: Google Docs, of course!

Now, I'm not a fan of Nagios and I always recommend OpenNMS over Nagios, but when a client is fixated with Nagios I take a deep breath and get work done with it too.

It just so happened that I couldn't convince a customer to use OpenNMS, so I decided that if I really had to use Nagios I would do it in as innovative a way as possible.

The first phase in this kind of project is usually gathering requirements, that is, the hosts/appliances to be monitored. So I opened up a Google Docs spreadsheet and started typing. At a certain point it hit me: what if I could make this doc the source for all configuration and just be done with it?

My spider sense was tingling and I knew I had just found a way to make a dull project an interesting and blog-worthy one.

I created a spreadsheet like the one in the picture with only 4 columns: ip, name/description, location, groups.

After that I shared the spreadsheet as CSV and grabbed the URL.
From the shell I could now fetch the CSV file as shown in the terminal screenshot. Even better, changes are automatically published after a modification. This way the customer can easily update the list of nodes to be monitored from a friendly spreadsheet and I don't have to mess with complicated Nagios configuration add-ons.

Now the hard part: how do I get from a CSV file to a full-blown Nagios config?

That bit is easily taken care of by a Python script. Python (besides being already installed in every Linux distro) has great support for CSV files, and a sophisticated templating engine called Jinja is available for it (I had learned about Jinja while playing around with SaltStack).
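
As a rough illustration of the CSV half, and not the script from the gist, here is a minimal sketch using only the standard csv module. The column order follows the four columns described above; the field names, the presence of a header row and the idea that the groups cell is passed through untouched are all assumptions.

```python
import csv
import io

# Assumed column order, matching the spreadsheet described in the post.
FIELDS = ["ip", "name", "location", "groups"]


def parse_hosts(csv_text, has_header=True):
    """Return one dict per spreadsheet row, skipping blank rows."""
    reader = csv.reader(io.StringIO(csv_text))
    rows = [row for row in reader if any(cell.strip() for cell in row)]
    if has_header and rows:
        rows = rows[1:]  # drop the header row (its presence is an assumption)
    hosts = []
    for row in rows:
        # Pad short rows so missing trailing cells become empty strings.
        padded = (row + [""] * len(FIELDS))[:len(FIELDS)]
        hosts.append({key: value.strip() for key, value in zip(FIELDS, padded)})
    return hosts


if __name__ == "__main__":
    # Hypothetical sample data using documentation-range addresses.
    sample = (
        "ip,name,location,groups\n"
        '192.0.2.10,web01,rack-a,"web,linux"\n'
        "192.0.2.11,db01,rack-b,db\n"
    )
    for host in parse_hosts(sample):
        print(host)
```

The csv module handles the quoting that a spreadsheet export adds around cells containing commas, which is the main reason to prefer it over splitting lines by hand.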

Let code speak

The python script (yes, all of it) is embedded in the following gist (link to the full gist, templates included):

The templates are stored in a subdirectory called templates, and the CSV file is fetched from Google Docs with wget instead of Python's own urllib to ease debugging (the last copy of the file is left in the directory for post-mortem inspection).
The following is an example template to generate the configuration for all hosts (through a loop on the host list). Note that the configuration is intentionally kept to a minimum; all parameters are inherited through the hostgroup and the template, because only in this way can we keep the configuration consistent across all hosts and groups and avoid repetition.
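
The Jinja template itself lives in the gist. As a stand-in, here is a sketch of the same idea that swaps Jinja for the standard library's string.Template so it runs without extra packages. The parent template name generic-host and the choice of directives are assumptions; only host_name, alias, address, hostgroups and use (all standard Nagios host directives) appear.

```python
from string import Template

# Stand-in for the Jinja template; "generic-host" is an assumed parent template.
HOST_TEMPLATE = Template(
    "define host {\n"
    "    use         generic-host\n"
    "    host_name   $name\n"
    "    alias       $name ($location)\n"
    "    address     $ip\n"
    "    hostgroups  $groups\n"
    "}\n"
)


def render_hosts(hosts):
    """Render one minimal host definition per entry in the host list."""
    return "\n".join(HOST_TEMPLATE.substitute(host) for host in hosts)


if __name__ == "__main__":
    # Hypothetical host entry, shaped like one spreadsheet row.
    example = [
        {"ip": "192.0.2.10", "name": "web01", "location": "rack-a", "groups": "web"},
    ]
    print(render_hosts(example))
```

Everything else (check intervals, contacts, notification settings) would come from the inherited template and the hostgroup, which is what keeps each generated definition this short.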

What's still missing?

The whole services part, including the assignment of services to hostgroups/templates, but since that varies greatly depending on each project's requirements I intentionally left it out of this post. Rest assured my customer is seeing all those fancy graphs in her browser now.

I'm going to add a shell script to wrap up the wget, python, nagios restart (and perhaps even versioning) phases in one easy piece, so keep an eye on the gist or this blog for updates.