Saturday, March 10, 2012

OpenNMS: PostgreSQL 9.1 tuning

I have just completed an upgrade from OpenNMS 1.8.11 to the latest and greatest 1.10. The upgrade itself is easy, and the guides on the OpenNMS wiki will serve you well. In this post I'll instead describe a couple of other changes I made that greatly improved the overall performance and responsiveness of the system.

One is the upgrade from PostgreSQL 8.4 (which came with CentOS) to 9.1, plus tuning.
The other is switching from Apache to nginx.

Upgrading postgres is mostly a matter of taking a backup, pulling in the right repo, running yum install and finally importing the database.


Tuning postgres
I left OpenNMS running on PostgreSQL 9.1 for a while and then checked how well postgres was doing. Postgres 9 already performs significantly better than its 8.x predecessors, but I wanted to do better than out-of-the-box.

As the postgres user I logged into the opennms database to install a utility that helps estimate how much of the database is being cached. If the server had enough memory, I could then configure postgres to use more of it for caching, which means less disk I/O and better overall performance (unless your server starts thrashing, that is).

CREATE EXTENSION pg_buffercache;
CREATE VIEW v_database_cache AS
SELECT c.relname,
       pg_size_pretty(count(*) * 8192) AS buffered,
       round(100.0 * count(*) / (SELECT setting FROM pg_settings WHERE name = 'shared_buffers')::integer, 1) AS buffers_percent,
       round(100.0 * count(*) * 8192 / pg_relation_size(c.oid), 1) AS percent_of_relation
FROM pg_class c
INNER JOIN pg_buffercache b ON b.relfilenode = c.relfilenode
INNER JOIN pg_database d ON (b.reldatabase = d.oid AND d.datname = current_database())
GROUP BY c.oid, c.relname ORDER BY 3 DESC LIMIT 10;

The second command creates a view that wraps the complex query behind it. The query is documented in the (highly recommended) book PostgreSQL 9.0 High Performance.
I then queried the database for its size as follows:

opennms=# select pg_size_pretty(pg_database_size('opennms')) as db_size;
 db_size 
---------
 691 MB
(1 row)

Since the whole database is a little less than 700MB and the server has 4GB of RAM, I decided to raise the shared_buffers parameter from its measly 32MB default to something more appropriate, like 512MB. After that I restarted both OpenNMS and postgres and waited a while to make sure the server wasn't swapping. Two days later I queried the view I created above to see how much of the database was in the cache (that is, in RAM):

opennms=# select * from v_database_cache;
            relname            | buffered | buffers_percent | percent_of_relation 
-------------------------------+----------+-----------------+---------------------
 events                        | 266 MB   |            52.0 |                64.5
 event_archives                | 22 MB    |             4.3 |               100.0
 notifications                 | 20 MB    |             3.9 |               100.1
 events_nodeid_display_ackuser | 5616 kB  |             1.1 |                33.2
 outages                       | 5000 kB  |             1.0 |               100.3
 events_ipaddr_idx             | 4496 kB  |             0.9 |                27.5
 events_nodeid_idx             | 4456 kB  |             0.8 |                36.4
 events_uei_idx                | 3648 kB  |             0.7 |                10.5
 iprouteinterface              | 2096 kB  |             0.4 |               101.6
 events_time_idx               | 2256 kB  |             0.4 |                20.0
(10 rows)
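
As a sanity check on the buffers_percent column, the arithmetic behind it can be replayed by hand. The sketch below is only an illustration: the function names are mine, and the 266 MB figure is the rounded value printed by pg_size_pretty, so the result is approximate. It assumes shared_buffers is 512MB and each buffer is 8192 bytes, as in the view.

```javascript
// Bytes per buffer, the same constant used in the view above.
var PG_BLOCK_SIZE = 8192;

// Convert a size in megabytes to a number of 8 kB buffers.
function mbToBuffers(mb) {
    return mb * 1024 * 1024 / PG_BLOCK_SIZE;
}

// Share of shared_buffers taken by a relation, rounded to one decimal
// like round(..., 1) does in the view.
function buffersPercent(bufferCount, sharedBuffers) {
    return Math.round(1000 * bufferCount / sharedBuffers) / 10;
}

// events: about 266 MB buffered out of 512 MB of shared_buffers
var eventsPercent = buffersPercent(mbToBuffers(266), mbToBuffers(512));
console.log(eventsPercent); // 52, matching the 52.0 reported for events
```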

Looks good: the events table (the largest by far) is about two-thirds cached in RAM, and its indexes are partly cached too!
I could try raising shared_buffers above the database size, but since the most expensive relation is already more than half cached, and postgres uses plenty of other memory besides shared_buffers, I left it as it is.
Performance and responsiveness of the UI improved a lot. I/O wait went down too, by 2-4%.

Swapping NGINX in for Apache
I did it mostly to save CPU and RAM, because the amount of resources that even a lightly loaded Apache can consume is astounding. Don't take my word for it: google it!
This one's easy too: head to the download page and grab the rpm for your distro.

The net result of the above changes is that the UI now loads faster and page transitions are smooth, without excessive waiting, even on the node detail page for a system with lots of events.

Hope this helps!

Friday, March 09, 2012

A case for manipulating the DOM outside the Sproutcore RunLoop

KVO is one of Sproutcore's coolest features. At its core is the RunLoop, in which events and notifications are efficiently processed and dispatched to their destination(s), and views modify the DOM to reflect the new application state.
Sproutcore developers are consequently told not to manipulate the DOM directly. When out-of-band events happen, like an ajax call returning from a third-party library or a WebSocket receiving a new message, it is possible to trigger the RunLoop manually by calling SC.RunLoop.begin() ... SC.RunLoop.end().
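
A minimal sketch of that manual trigger, assuming nothing beyond the two SC.RunLoop calls named above. The helper name runInsideRunLoop is mine, and SC is passed in as an argument only so that the snippet does not depend on a global; the socket and controller in the usage comment are hypothetical.

```javascript
// Wrap out-of-band work in a RunLoop so that bindings and observers
// fire when the work is done. The finally block closes the loop even
// if the callback throws.
function runInsideRunLoop(SC, callback) {
    SC.RunLoop.begin();
    try {
        return callback();
    } finally {
        SC.RunLoop.end();
    }
}

// Hypothetical usage from a WebSocket handler:
//
// socket.onmessage = function (message) {
//     runInsideRunLoop(SC, function () {
//         App.messagesController.set("content", message.data);
//     });
// };
```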

Sometimes, though, it is not only necessary but recommended to bypass the RunLoop and manipulate the DOM directly, either to provide UI refreshes as fast as the browser allows or to avoid the expensive computations involved in the RunLoop. These concepts happened to be discussed on IRC just when I needed to implement a progress bar giving feedback on the loading state of a particularly slow datasource, and I am writing them down here so that others might benefit from them.

WFM disclaimer: I don't know whether the implementation I am going to document in this post is completely sound, so take it with a grain of salt and/or discuss it on IRC before adopting it.

Run Loop links:
http://guides.sproutcore.com/core_concepts.html#the-run-loop (a bit short, but gives the idea)
http://frozencanuck.wordpress.com/2010/12/21/why-does-sproutcore-have-a-run-loop-and-when-does-it-execute/ (a must read, the post and the whole blog)

The problem
The application loads a GetCapabilities XML document from a WMS server. The document describes, among other things, the projection used by each layer. To correctly display a layer the projection must be loaded in the web gis and this operation might require a remote call to a projection registry.
Until this second remote call has completed the datastore cannot continue loading the layer list from the GetCapabilities document.

This process is, in most cases, immediate because the projections used are so common that they ship with the web gis and therefore do not require the remote call. But for those cases where the remote call is needed, it is important to let the user know what is going on and how long it is going to take.

The first approach
My first approach was to implement a SC.ProgressView in a modal pane. Unfortunately this does not work, because datastore operations are already executed in a RunLoop and all notifications are therefore delayed until the loop ends.
The result is that the SC.ProgressView is not updated until the loading process has completed, and just jumps from 0 to 100% without any intermediate step. The user gets no better feedback than if the application had no progress bar at all.
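
The effect can be pictured with a toy model. This is emphatically not Sproutcore's implementation, just a made-up queue (makeToyRunLoop and everything in it are my own names) showing why work that stays inside one loop renders only its final value.

```javascript
// Toy model only: value changes are recorded, but the "view" is
// rendered once, when the loop ends.
function makeToyRunLoop(render) {
    var pending = null;
    var dirty = false;
    return {
        // Remember the latest value; nothing is rendered yet.
        set: function (value) {
            pending = value;
            dirty = true;
        },
        // Flush: render only the last value that was set.
        end: function () {
            if (dirty) {
                render(pending);
                dirty = false;
            }
        }
    };
}

var toyRendered = [];
var toyLoop = makeToyRunLoop(function (value) { toyRendered.push(value); });

// The whole loading process happens inside a single loop...
for (var toyStep = 25; toyStep <= 100; toyStep += 25) {
    toyLoop.set(toyStep);
}
toyLoop.end();

// ...so the progress bar is painted once, at 100.
console.log(toyRendered); // [ 100 ]
```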

The solution
To provide better and faster (in fact as fast as the browser allows) feedback to the user the application needs to be able to modify the DOM directly, bypassing the facilities provided by Sproutcore.
To do it we need the following:
  1. a view representing a progress bar of some sort which can be directly updated, bypassing KVO
  2. a counter tracking progress
  3. a way to update the view with the current progress status
For demonstration purposes we will create an anonymous view embedded in a modal pane like the following:

App.progressPane = SC.PanelPane.create({
    layout:{ width:400, height:60, centerX:0, centerY:0 },
    contentView:SC.View.extend({
        childViews:"labl bar".w(),
        labl:SC.LabelView.design({
            layout:{top:10, centerX:0, width: 100, height:30},
            value:"_loading".loc()
        }),
        bar:SC.View.design({
            layout:{top:30, centerX:0, width:350, height:20},
            render:function (ctx, firstTime) {
                if (firstTime) {
                    ctx.push('<progress style="width: 100%;"></progress>');
                }
                return ctx;
            },
            updateProgress:function (progress) {
                var bar = this.$("progress")[0];
                if(bar) {
                    bar.max=100;
                    bar.value = progress;
                }
            }
        })
    })
});

Note that to keep things simple I went with an HTML5 progress element. In browsers that do not support it (notably Safari) the user sees nothing but the loading label. Implementation of a fallback strategy is left as an exercise to the reader ;-).
I'd also like you to note the updateProgress function, which uses a jQuery selector to grab the progress element and update its value. This function is not part of any SC specification and expressly violates the principle of not manipulating the DOM directly.

The counter is very much implementation dependent: one quick and dirty solution could be to hook it up to SC.Request.manager and count inflight+pending down to 0, but it might not work because it also depends on the RunLoop which, remember, we're trying to do without. In the specific case that sparked this post, requests were fired from a third-party library and could be counted down by using a store-local variable decremented by a callback.
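
A minimal sketch of such a callback-driven counter, assuming the number of outstanding requests is known up front. The names createCountdown and requestDone are hypothetical, and the onProgress argument stands in for whatever ends up calling updateProgress on the view.

```javascript
// Returns a callback to invoke each time one request completes.
// onProgress receives the completed share as an integer from 0 to 100.
function createCountdown(total, onProgress) {
    var remaining = total;
    return function requestDone() {
        if (remaining > 0) {
            remaining -= 1;
        }
        // With nothing to wait for, report the work as complete.
        var percent = total === 0 ? 100 : Math.round(100 * (total - remaining) / total);
        onProgress(percent);
        return remaining;
    };
}

// Example: four projection lookups, each reporting back when done.
var countdownSeen = [];
var countdownDone = createCountdown(4, function (percent) {
    countdownSeen.push(percent);
});
countdownDone();
countdownDone();
countdownDone();
countdownDone();
console.log(countdownSeen); // [ 25, 50, 75, 100 ]
```

Because the callback is invoked directly by the third-party library, it runs outside the RunLoop and each step can reach the DOM immediately.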

Whenever the counter is increased or decreased (depending on whether it's counting up or down) the callback must also update the view. Again we cannot rely on KVO and must explicitly invoke the updateProgress function which we added to our custom view just for this purpose.
The code at controller, or statechart level, could look something like this:

updateProgress: function(progress) {
   App.progressPane.contentView.bar.updateProgress(progress);
},

Final touch
In the last snippet you might have noticed the ugly path hardcoded into the controller: it smells like bad code.
Looks like a perfect case for using SC.outlet. The PanelPane gets a new property:

App.progressPane = SC.PanelPane.create({
    progressbar: SC.outlet("contentView.bar"),
    layout:{ width:400, height:60, centerX:0, centerY:0 },
    contentView:SC.View.extend({

And the function then becomes:

updateProgress: function(progress) {
   App.progressPane.get("progressbar").updateProgress(progress);
},

Thursday, March 01, 2012

Tweak OpenLayers to get GetCapabilities parsing working in IE

I recently needed OpenLayers (version 2.11) to parse a GetCapabilities response from Geoserver in order to present the user with a list of layers to pick from.

The capabilities request is made through a Sproutcore request, which is basically a jQuery ajax object in disguise.
This of course works beautifully in every browser with the notable exception of ... IE (IE9 included).

The cause is that IE will helpfully xml-parse a response whose content-type is text/xml, but will refuse to parse a document whose content-type is application/vnd.ogc.request+xml. To add comedy to the drama, the responseXML attribute of the response is not null, as one would expect, but is instead set to reference an empty DOM.
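
That symptom can be tested for directly. The sketch below is an assumption on my part rather than what the application does: it takes "empty DOM" to mean a document whose documentElement is null, and the helper name needsManualXmlParse is made up.

```javascript
// True when the response carries no usable parsed XML, i.e. when
// responseXML is missing or is a document without a root element.
function needsManualXmlParse(response) {
    var doc = response && response.responseXML;
    return !(doc && doc.documentElement);
}

// Plain objects standing in for real responses:
console.log(needsManualXmlParse({ responseXML: null }));                      // true
console.log(needsManualXmlParse({ responseXML: { documentElement: null } })); // true
console.log(needsManualXmlParse({ responseXML: { documentElement: {} } }));   // false
```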

The workaround is to put a giant browser-sniffing if in your javascript to handle IE differently.
The code looks like the following. Please note that it is Sproutcore code, so browser sniffing and other amenities are peculiar to Sproutcore:

// God mess IE
if(SC.$.browser.msie) {
    content=response.responseText;
    var xmlDoc = new ActiveXObject("Microsoft.XMLDOM");
    // required or IE will attempt to validate against the DTD, which could fail
    // because the DTD is not accessible and we don't really care
    // we will notice later if any layer was loaded from the response anyway
    xmlDoc.async = false;
    xmlDoc.validateOnParse = false;
    xmlDoc.resolveExternals = false;
    var parsed=xmlDoc.loadXML(content);
    if(!parsed) {
        var myErr = xmlDoc.parseError;
        alert(myErr.reason);
    } else {
        content=xmlDoc;
    }
}

var wmsCapabilities = new OpenLayers.Format.WMSCapabilities();
var capabilities = wmsCapabilities.read(content);
// use the capabilities as you wish


IMPORTANT: remember to set the Proxy Base Url in the Geoserver Global configuration page to the correct value if using a reverse proxy.