
3 new features that I wish were in OpenNMS 2.0

As a long-time OpenNMS user I have often been impressed by its extensibility and the completeness of its feature set. It supports many data collection techniques: from old-school SNMP exec extensions to the HTTP poller, from the JDBC poller to the XML poller, and many others that I have probably forgotten to mention.

Supporting new probes is therefore just a matter of how, not if, it can be done. And with new monitoring tools popping up every day this is clearly a good thing, as it allows OpenNMS to keep up with the competition.
So the present looks bright, but what about the future? With OpenNMS 2.0 not yet on the radar I thought I could put together a list of features I would love to have. What do you think of them?


1. Receiving metrics over the Graphite/Carbon protocol

Graphite is the new kid on the monitoring block and its primary concern is with receiving, storing and graphing data. It does not rely on any existing protocol for data collection, but instead invented its own.

The protocol is dead simple: text lines sent over a TCP socket, one metric per line, in the following format:
metric-path value timestamp
A client can be as simple as this shell one-liner:
echo -e "local.meaning.of.life 42 `date +%s`\n\n" | nc graphitehost 2003
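The same client can be sketched in a few lines of Python. This is only an illustration of the line format described above; the function names are made up for this post, and the host and port are the ones used in the shell one-liner.

```python
import socket
import time


def format_metric(path, value, timestamp=None):
    """Build one plaintext line: 'metric-path value timestamp' plus a newline."""
    if timestamp is None:
        timestamp = int(time.time())  # seconds since the epoch, like `date +%s`
    return f"{path} {value} {int(timestamp)}\n"


def send_metric(host, port, path, value, timestamp=None):
    """Open a TCP connection, send a single metric line, then close it."""
    line = format_metric(path, value, timestamp)
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))


# Hypothetical usage, mirroring the shell one-liner:
#   send_metric("graphitehost", 2003, "local.meaning.of.life", 42)
```

Keeping the formatting separate from the socket code makes it easy to batch several lines into one connection later.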
Collecting metrics over this protocol would allow OpenNMS to:
  1. use any of the numerous collectd (not OpenNMS collectd) plugins (like tailing a log file and counting instances of certain text patterns)
  2. let applications push custom metrics into OpenNMS through libraries available for most programming languages
  3. reduce XML configuration overhead
  4. leverage it as an extensible platform for integration with other systems
  5. position itself as a Graphite (partial?) replacement (in conjunction with item 2) because Graphite does not do thresholding or alerting
Note: OpenNMS would need a way to correlate each metric with a node, for example by matching the first component of the metric-path against the node name or the node ID. It should also handle the creation of new metrics on the fly, without any configuration.
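To make the correlation idea concrete, here is a rough sketch of how a receiver might parse a line and resolve the node from the first path component. Everything here is hypothetical: `nodes_by_label` stands in for whatever node lookup OpenNMS would actually use, and the `Sample` type is invented for the example.

```python
from typing import NamedTuple, Optional


class Sample(NamedTuple):
    node_id: Optional[int]  # None when no node matches the first component
    metric: str             # remainder of the metric-path
    value: float
    timestamp: int


def parse_line(line, nodes_by_label):
    """Parse 'metric-path value timestamp' and resolve the owning node.

    nodes_by_label is an assumed mapping of node name -> node id.
    Raises ValueError if the line does not have exactly three fields.
    """
    path, value, timestamp = line.split()
    first, _, rest = path.partition(".")
    if first in nodes_by_label:
        # First component matches a node name.
        node_id = nodes_by_label[first]
    elif first.isdigit() and int(first) in nodes_by_label.values():
        # First component is a known numeric node id.
        node_id = int(first)
    else:
        node_id = None
    return Sample(node_id, rest, float(value), int(timestamp))
```

A line whose first component matches nothing comes back with `node_id` set to `None`, so the caller can decide whether to drop it or file it under some catch-all node.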

2. Expose collected metrics over JSON

RRD PNG graphs are fine, but to be cool and hang out with the new kids you have to render stuff in the browser and do it like this:

Graphs like that require that metrics be accessible as JSON, and recent versions of RRDtool already support exporting to this format.
Checking this item off would also open up a new world of possibilities, as people could write graphs and dashboards in JavaScript/HTML, which nowadays seem to be far more popular than properties files (and I can see why).
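As a rough idea of what a JavaScript charting library would want to fetch, here is a small sketch that serializes a series of samples as JSON. The field names (`metric`, `datapoints`, `t`, `v`) are invented for this example and are not an existing OpenNMS or RRDtool format.

```python
import json


def series_to_json(metric, samples):
    """Serialize (timestamp, value) pairs as a JSON document.

    A value of None (a gap in the data) becomes JSON null, so the
    chart can show a hole instead of a fake zero.
    """
    return json.dumps({
        "metric": metric,
        "datapoints": [{"t": ts, "v": value} for ts, value in samples],
    })
```

Keeping gaps as explicit nulls matters for round-robin data, where missing samples are common.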

3. Provide a pub/sub event bus over an open message protocol (AMQP, OpenWire, JMS, etc.)

OpenNMS already uses an event bus internally and it would be super cool if events could also be broadcast (and received) over an open protocol like AMQP.
Broadcasting would be especially useful for:
  1. handling of notifications outside of OpenNMS with systems like PagerDuty, or simply with scripts that will send notifications to different recipients/endpoints depending on the affected service/node (this has always been a pain point with the current OpenNMS notification system)
  2. complex event processing with tools like Esper
  3. relaying events into other systems for trouble ticketing, performance analysis, correlation
  4. automated event handling: restarting hung services, killing runaway processes, relocating instances, etc.
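To illustrate the kind of routing this would enable, here is an in-memory sketch of topic-based pub/sub in the style of AMQP topic exchanges, where `*` matches exactly one word and `#` matches zero or more. It does not talk to a real broker, and the routing-key layout (`event.node.service`) is an assumption made up for the example.

```python
from collections import defaultdict


def topic_matches(pattern, key):
    """AMQP-style topic match: '*' is exactly one word, '#' is zero or more."""
    def match(p, k):
        if not p:
            return not k
        if p[0] == "#":
            # '#' may absorb zero words (skip it) or one more word (keep it).
            return match(p[1:], k) or (bool(k) and match(p, k[1:]))
        if not k:
            return False
        return p[0] in ("*", k[0]) and match(p[1:], k[1:])
    return match(pattern.split("."), key.split("."))


class EventBus:
    """Toy in-process bus: subscribers register a pattern and a handler."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, pattern, handler):
        self._subscribers[pattern].append(handler)

    def publish(self, routing_key, event):
        """Deliver the event to every matching handler; return the count."""
        delivered = 0
        for pattern, handlers in self._subscribers.items():
            if topic_matches(pattern, routing_key):
                for handler in handlers:
                    handler(routing_key, event)
                    delivered += 1
        return delivered
```

With something like this, a paging script could subscribe to `nodeDown.#` while a ticketing integration subscribes to `*.db01.#`, and each would receive only the events it cares about.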
As for receiving, I can't see any big driver for implementation yet, as OpenNMS already has send-event.pl and I guess we could live with it if someone just wrote a Java client. Of course, adopting AMQP or a similar protocol would remove the need for this client entirely.
