
OpenNMS: PostgreSQL 9.1 tuning

I have just completed an upgrade from OpenNMS 1.8.11 to the latest and greatest 1.10. The upgrade itself is easy, and the guides on the OpenNMS wiki will serve you well. In this post I'll instead describe a couple of other changes I made that greatly improved the overall performance and responsiveness of the system.

One is upgrading PostgreSQL from 8.4 (the version that ships with CentOS) to 9.1, plus some tuning.
The other is switching from Apache to nginx.

Upgrading postgres is mostly a matter of taking a backup, pulling in the right repo, running yum install and finally importing the database.
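
One step worth adding after the import, offered as a general PostgreSQL suggestion rather than something from the OpenNMS upgrade guide: a logical backup (pg_dump or pg_dumpall) carries the data but not the planner statistics, so the restored tables have none until autovacuum gets around to analyzing them. A minimal sketch, run as the postgres user in the opennms database once the import has finished:

```sql
-- Confirm the session is talking to the new server (should report 9.1.x)
SELECT version();

-- Rebuild planner statistics for every table in the current database,
-- since the dump did not carry them over
ANALYZE VERBOSE;
```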

Tuning postgres
I left OpenNMS running on PostgreSQL 9.1 for a while and then went to check how well postgres was doing. Postgres 9 already performs significantly better than its 8.x predecessors, but I wanted to do better than the out-of-the-box configuration.

As the postgres user I logged into the opennms database and installed an extension that helps estimate how much of the database is being cached. If the server had enough memory, I could then configure postgres to use more of it for caching, which means less disk I/O and better overall performance (unless the server starts thrashing, that is).

CREATE EXTENSION pg_buffercache;  -- ships with contrib (postgresql91-contrib on the PGDG RPMs)
CREATE VIEW v_database_cache AS
SELECT c.relname, pg_size_pretty(count(*) * 8192) AS buffered,
       round(100.0 * count(*) / (SELECT setting FROM pg_settings WHERE name = 'shared_buffers')::integer, 1) AS buffers_percent,
       round(100.0 * count(*) * 8192 / pg_relation_size(c.oid), 1) AS percent_of_relation
FROM pg_class c
INNER JOIN pg_buffercache b ON b.relfilenode = c.relfilenode
INNER JOIN pg_database d ON (b.reldatabase = d.oid AND d.datname = current_database())
GROUP BY c.oid, c.relname
ORDER BY 3 DESC LIMIT 10;

The second command wraps the rather complex query in a view, so it can be rerun later with a simple SELECT. The query comes from the (highly recommended) book PostgreSQL 9.0 High Performance.
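
The view relies on two assumptions worth checking on your own server: that the page size is 8192 bytes (the hard-coded constant) and that pg_settings reports shared_buffers as a number of pages rather than bytes, which is why dividing the buffer count by it yields a percentage of the cache. A quick sketch to confirm both:

```sql
-- Page size the server was compiled with; the view hard-codes 8192
SHOW block_size;

-- shared_buffers as the view reads it: setting is a count of pages (unit '8kB')
SELECT name, setting, unit,
       pg_size_pretty(setting::bigint * current_setting('block_size')::bigint) AS total
FROM pg_settings
WHERE name = 'shared_buffers';
```
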
I then queried the database for its size as follows:

opennms=# select pg_size_pretty(pg_database_size('opennms')) as db_size;
 db_size 
---------
 691 MB
(1 row)
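
The database size alone doesn't say where the space goes. A sketch that breaks it down per user table (indexes and TOAST data included), which hints in advance at which relations will compete for shared_buffers:

```sql
-- Ten largest user tables, counting their indexes and TOAST data
SELECT c.relname,
       pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'                                   -- ordinary tables only
  AND n.nspname NOT IN ('pg_catalog', 'information_schema')
ORDER BY pg_total_relation_size(c.oid) DESC
LIMIT 10;
```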

Since the whole database is a little under 700MB and the server has 4GB of RAM, I figured I could raise the shared_buffers parameter from the measly default of 32MB to something more appropriate, such as 512MB. I then restarted both opennms and postgres and waited a while to make sure the server wasn't swapping. Two days later I queried the view I created above to see how much of the database was in the cache (that is, in RAM):
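
For reference, the change is a single line in postgresql.conf. Assuming the default data directory of the PGDG packages on CentOS (/var/lib/pgsql/9.1/data; check yours), here is a sketch of the setting plus a check that it took effect. shared_buffers can only change at server start, which is why a full restart is needed rather than a reload:

```sql
-- In postgresql.conf (assumed location: /var/lib/pgsql/9.1/data/postgresql.conf):
--   shared_buffers = 512MB
-- A reload is not enough for this parameter; restart PostgreSQL, then verify:
SHOW shared_buffers;

-- 'postmaster' in the context column means "changeable only at server start"
SELECT name, setting, unit, context
FROM pg_settings
WHERE name = 'shared_buffers';
```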

opennms=# select * from v_database_cache;
            relname            | buffered | buffers_percent | percent_of_relation 
-------------------------------+----------+-----------------+---------------------
 events                        | 266 MB   |            52.0 |                64.5
 event_archives                | 22 MB    |             4.3 |               100.0
 notifications                 | 20 MB    |             3.9 |               100.1
 events_nodeid_display_ackuser | 5616 kB  |             1.1 |                33.2
 outages                       | 5000 kB  |             1.0 |               100.3
 events_ipaddr_idx             | 4496 kB  |             0.9 |                27.5
 events_nodeid_idx             | 4456 kB  |             0.8 |                36.4
 events_uei_idx                | 3648 kB  |             0.7 |                10.5
 iprouteinterface              | 2096 kB  |             0.4 |               101.6
 events_time_idx               | 2256 kB  |             0.4 |                20.0
(10 rows)
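
A note on the last column: a few rows report slightly more than 100% (notifications, outages, iprouteinterface). The join matches every cached page that belongs to a relation, including its free space map and visibility map forks, while pg_relation_size(c.oid) measures only the main data fork. A sketch of a variant that counts main-fork pages only, created as a separate view under the hypothetical name v_database_cache_main so the original is left alone:

```sql
CREATE VIEW v_database_cache_main AS  -- hypothetical name
SELECT c.relname, pg_size_pretty(count(*) * 8192) AS buffered,
       round(100.0 * count(*) / (SELECT setting FROM pg_settings WHERE name = 'shared_buffers')::integer, 1) AS buffers_percent,
       -- NULLIF avoids a division-by-zero error for zero-length relations
       round(100.0 * count(*) * 8192 / NULLIF(pg_relation_size(c.oid), 0), 1) AS percent_of_relation
FROM pg_class c
INNER JOIN pg_buffercache b ON b.relfilenode = c.relfilenode
INNER JOIN pg_database d ON (b.reldatabase = d.oid AND d.datname = current_database())
WHERE b.relforknumber = 0  -- 0 is the main fork; FSM (1) and VM (2) pages are skipped
GROUP BY c.oid, c.relname
ORDER BY 3 DESC LIMIT 10;
```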

Looks good: the events table (by far the largest) is about two thirds cached in RAM, and a good share of its indexes is cached as well!
I could try raising shared_buffers above the database size, but since more than half of the most expensive relation is already cached, and postgres already uses plenty of memory beyond shared_buffers, I left it as it is.
Performance and responsiveness of the UI improved a lot. I/O wait went down too, by 2-4%.
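
To put numbers behind the last two observations, here are two read-only queries, sketched under the assumption that the database is still called opennms. The first lists memory-related settings besides shared_buffers (effective_cache_size is only a planner hint, not memory PostgreSQL allocates). The second computes the share of block requests served from shared_buffers since the statistics were last reset. Reads that miss shared_buffers but hit the operating system's page cache still count as blks_read, so this ratio understates how much is actually served from RAM.

```sql
-- Memory-related settings besides shared_buffers
SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('shared_buffers', 'work_mem', 'maintenance_work_mem',
               'effective_cache_size', 'wal_buffers', 'temp_buffers');

-- Percentage of block requests served from shared_buffers
SELECT datname,
       blks_hit,
       blks_read,
       round(100.0 * blks_hit / NULLIF(blks_hit + blks_read, 0), 1) AS hit_percent
FROM pg_stat_database
WHERE datname = 'opennms';
```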

Swapping NGINX in for Apache
I did it mostly to save CPU and RAM, because the amount of resources even a lightly loaded Apache can consume is astounding. Don't take my word for it: google it!
This one's easy too: head to the nginx download page and grab the RPM for your distro.

The net result of the above changes is that the UI now loads faster and page transitions are smooth, without excessive waiting, even on the node detail page for a system with lots of events.

Hope this helps!

