Wednesday, June 27, 2012

Monitoring QNAP devices with OpenNMS

QNAP devices have SNMP support out of the box; unfortunately the agent they ship with is almost unusable. At first it seems to support lots of cool features (fans, temperature, SMART, etc.), but if you take a little time to dig deeper you will notice that almost all key entries are, what?!, OctetStrings.
Capacity reported as a String: QNAP, what were you thinking?

So good luck estimating disk usage when it is reported as the string '1.8TB' (quoted exactly as shown by the MIB browser).

Without wasting any more of your (and my) time, let's fix that by installing the Optware QPKG and then installing net-snmp from the ipkg web console. The whole process is straightforward; just make sure you have a recent firmware:
  1. log in to the QNAP admin web interface
  2. open Application Servers and then select QPKG Center
  3. from the Available tab install Optware, then enable it from the Installed tab
  4. now access the Optware web interface and search for net-snmp, then click the install button on the net-snmp package
At this point net-snmp is running; we only need to configure it, and to do that we will SSH into the device.
Once logged in, edit the configuration file /opt/etc/snmpd.conf, then have the snmpd daemon reload its configuration by sending it a SIGHUP:
killall -HUP snmpd

Key configuration directives to edit (a minimal example follows the list):
  • sysLocation and sysContact, it goes without saying
  • comment out or delete all the individual disk entries and add a single includeAllDisks 5% directive instead
  • change the communities as per your organization's policies
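
A minimal snmpd.conf along these lines should be enough to get useful graphs (all values here are placeholders to adapt):

syslocation Server room, first floor
syscontact admin@example.com
rocommunity public
includeAllDisks 5%
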
After that, rescan the node from the OpenNMS web UI and enjoy the new graphs.

The best configuration manager for Nagios: Google Docs, of course!

Now, I'm not a fan of Nagios and I always recommend OpenNMS over it, but when a client is fixated on Nagios I take a deep breath and get the work done with it too.

It just so happened that I couldn't convince a customer to use OpenNMS, so I decided that if I really had to use Nagios I would do it in the most innovative way possible.

The first phase in this kind of project is usually gathering requirements, that is, the list of hosts/appliances to be monitored. So I opened up a Google Docs spreadsheet and started typing. At a certain point it hit me: what if I could make this document the source of all configuration and just be done with it?

My spider sense was tingling and I knew I had just found a way to turn a dull project into an interesting and blog-worthy one.

I created a spreadsheet like the one in the picture with only 4 columns: ip, name/description, location, groups.

After that I shared the spreadsheet as CSV and grabbed the URL.
From the shell I could now fetch the CSV file as shown in the terminal screenshot. Even better, changes are automatically published after a modification. This way the customer can easily update the list of nodes to be monitored from a friendly spreadsheet, and I don't have to mess with complicated Nagios configuration add-ons.
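
In practice it boils down to something like this (the publish URL is a placeholder and the rows are made-up samples):

wget -O hosts.csv 'https://docs.google.com/spreadsheet/pub?key=YOUR_KEY&output=csv'
cat hosts.csv
ip,name,location,groups
192.168.0.10,core-switch,HQ,switches
192.168.0.21,mailserver,HQ,linux-servers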

Now the hard part: how do I get from a CSV file to a full-blown Nagios config?

That bit is easily taken care of by a Python script. Python (besides being preinstalled on almost every Linux distro) has great support for CSV files, and there is Jinja, a sophisticated templating engine I had learned about while playing around with SaltStack.

Let the code speak

The Python script (yes, all of it) is embedded in the following gist (link to the full gist, templates included):

The templates are stored in a subdirectory called templates, and the CSV file is fetched from Google Docs with wget instead of Python's own urllib to ease debugging (the last copy of the file is left in the directory for post-mortem inspection).
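The core of it is something like this sketch (hosts.csv, hosts.tpl and hosts.cfg are names I'm assuming here):

import csv
from jinja2 import Environment, FileSystemLoader

# hosts.csv is the copy previously fetched with wget
with open('hosts.csv') as f:
    hosts = list(csv.DictReader(f))

# templates live in the templates/ subdirectory
env = Environment(loader=FileSystemLoader('templates'))
template = env.get_template('hosts.tpl')

# write the generated Nagios host definitions
with open('hosts.cfg', 'w') as out:
    out.write(template.render(hosts=hosts))
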
The following is an example template that generates the configuration for all hosts (through a loop over the host list). Note that the configuration is intentionally kept to a minimum: all parameters are inherited through the hostgroup and the template, because that is the only way to keep the configuration consistent across all hosts and groups and avoid repetition.
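A minimal version might look like this (the generic-host parent template and the field names are assumptions on my part):

{% for host in hosts %}
define host {
    use        generic-host
    host_name  {{ host.name }}
    address    {{ host.ip }}
    hostgroups {{ host.groups }}
}
{% endfor %}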


What's still missing?

The whole services part, and the assignment of services to hostgroups/templates; but since that varies greatly with each customer's requirements, I intentionally left it out of this post. Rest assured my customer is seeing all those fancy graphs in her browser right now.

I'm going to add a shell script to wrap up the wget, python and nagios-restart (and perhaps even versioning) phases in one easy piece, so keep an eye on the gist or this blog for updates.
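
It will probably end up being little more than this sketch (the URL, paths and the generate.py script name are placeholders):

#!/bin/sh
set -e
wget -O hosts.csv 'https://docs.google.com/spreadsheet/pub?key=YOUR_KEY&output=csv'
python generate.py
nagios -v /etc/nagios/nagios.cfg   # sanity-check the generated config first
service nagios restart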

Saturday, June 09, 2012

An alternative auditing strategy for Grails apps

In some applications it is useful, and in some cases required, to audit some or all database operations: for instance, to track when a user updates, deletes or inserts a record.

In Grails this is often handled by a plugin, like the Audit Logging plugin, which hooks into the Hibernate events and then logs the details to an audit table, including a timestamp, the user (if available) who initiated the operation and the type of operation.

Unfortunately the Audit Logging plugin suffers from some issues, and if you, like me, have to ship today, then you can only roll your own solution.

In this post I will lay out a solution which is simple, efficient and should work in most scenarios. The general requirements are:
  1. only one datasource
  2. the underlying DBMS must have triggers and some kind of session variables (most do, but I'll focus on Oracle)
  3. quartz jobs, background operations and other activities not necessarily initiated by an HTTP request must properly initialize the connection (as the filter does) or the auditing of the user will not work. On second thought, these activities are not always 'owned' by a user, or are they? I guess this last item depends mostly on your requirements
The astute reader will note that requirement 1 is not a hard one: the proposed solution can easily be generalized to the case where multiple datasources are present. If you have such a solution, I'll be glad to post/link to it.

Overview

The solution requires the creation of a Grails filter which sets a session variable holding the details of the logged-in user owning the current request. The session variable can then be accessed by standard DBMS triggers/procedures and logged together with the operation details, without the application even knowing about it.
Besides being faster (no round trips to the database, except for the session setup at the beginning of each request), the auditing stays active even for changes made by external SQL tools.

I will be using Oracle as the underlying database for this post, together with Grails 2.0.x. Again, I will gladly link to solutions for other DBMSs.

The filter

The filter itself is pretty simple and I think there is actually not much that needs explaining:

import groovy.sql.Sql

class OracleAuditFilters {
    def filters = {
        alterSession(uri: '/**') {
            before = {
                // reuse the JDBC connection of the current Hibernate session
                def session = grailsApplication.mainContext.sessionFactory.currentSession
                if (session != null && session.connection() != null) {
                    def sql = new Sql(session.connection())
                    // tell the database which user owns this request
                    sql.execute("select audit_user.au_set(?) from dual",
                            [request.remoteUser ?: "ANONYMOUS"])
                }
            }
            after = { Map model -> }
            afterView = { Exception e -> }
        }
    }
}

On every request it calls a package function which sets the username of the currently logged-in user on the database side. Note that the username is retrieved from the request, so the filter is authentication-framework agnostic as long as the framework complies with the J2EE standard (Spring Security does, in case you're wondering).

Note that this is a Grails filter, not a J2EE filter.

DBMS session variable code

Oracle has support for session variables through user-created contexts, but frankly it looks like more pain than gain. I will therefore resort to an alternative solution that exploits a particular feature of package variables. Package variables in fact behave much like session variables, in the sense that each session has its own instance of a package variable that can be set/get independently of other sessions.

The code for the package is shown below; please note the use of the package variable login:

CREATE OR REPLACE PACKAGE audit_user
AS
    FUNCTION au_get RETURN varchar2;
    FUNCTION au_set(username IN varchar2) RETURN varchar2;
END audit_user;
/

CREATE OR REPLACE PACKAGE BODY audit_user
AS
    -- package variables are per-session: each connection sees its own copy
    login varchar2(20);

    FUNCTION au_get
    RETURN varchar2
    IS
    BEGIN
        RETURN login;
    END au_get;

    FUNCTION au_set(username IN varchar2)
    RETURN varchar2
    IS
    BEGIN
        login := username;
        RETURN login;
    END au_set;

END audit_user;
/

I think you can now see how this goes: the filter calls the audit_user.au_set() function, which informs the database of the user who will be performing the next operations.
Any database code can now access the currently connected user by invoking:

audit_user.au_get()
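
A quick smoke test from SQL*Plus (the username is just an example):

select audit_user.au_set('alice') from dual;
select audit_user.au_get from dual;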

Don't be tempted to use it in column defaults, though: Oracle does not allow user-defined functions in DEFAULT clauses, so there you would need a trigger anyway.

Audit trigger example

The following SQL is the code for an Oracle auditing trigger. The actual auditing code is omitted as it is not relevant to the post and is likely to change depending on preferences and requirements. Note the use of the audit_user.au_get function again, and the fallback to the currently connected user when it is null (i.e. when using an SQL tool like SQL*Plus).

CREATE OR REPLACE TRIGGER MYTABLE_AUD AFTER
    INSERT OR
    DELETE OR
    UPDATE ON MYTABLE FOR EACH ROW
DECLARE
    v_operation CHAR(1);
    v_username  VARCHAR2(30);
BEGIN
    IF INSERTING THEN
        v_operation := 'I';
    ELSIF DELETING THEN
        v_operation := 'D';
    ELSE
        v_operation := 'U';
    END IF;

    -- returns the user connected to the web application
    SELECT audit_user.au_get INTO v_username FROM dual;

    IF v_username IS NULL THEN
        v_username := user; -- returns the user currently connected to Oracle
    END IF;

    -- AUDITING CODE GOES HERE

END;

Wednesday, June 06, 2012

TileDrawer: change bright style background color

Bright style after changing the background color.
More from my adventures in TileDrawer: the bright style comes with the wrong color for the map background (blue?), so I decided to replace it with a more sober shade of pale brown, like the one most slippy maps seem to use.

The change is a one-liner: go to /usr/local/tiledrawer/gunicorn, edit the second line of style.xml and change the bgcolor attribute to whatever you like. I used rgb(241,238,232).

Diff:
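
Something along these lines; the original bgcolor value shown here is just a placeholder, yours may differ:

--- style.xml
+++ style.xml
@@ -2 +2 @@
-<Map bgcolor="#6699cc">
+<Map bgcolor="rgb(241,238,232)">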

Other TileDrawer posts.

Friday, June 01, 2012

Improving TileDrawer rendering speed

I have been playing with TileDrawer recently. After a first rapid successful test on a throw-away EC2 instance I decided to deploy one on a vm for intranet use.

The installation process is pretty straightforward: just run the script copied from the TileDrawer page as root. FYI, make sure you have curl and python-cssutils installed before launching the script.

After the script completed I started browsing the map and noticed that tiles took a looong time to render.
A look at top from the server console showed that postgres was hogging the CPU. Memory was fine, with no signs of swapping (it is a 1GB instance, which I promptly upgraded to 2GB, running on server-class hardware with a Xeon CPU). Even after the memory upgrade, tile rendering was so slow that the browser would sometimes give up and show a white tile.

I decided to look into it a little further, starting with the technique I have already described in another article. The database cache looked fine though: most relations were completely cached, and the database size is a measly 300MB, which is no problem at all since the postgres parameter shared_buffers was already set to 512MB.

I then decided to enable slow-query logging (scroll down to 18.8.2) in postgresql.conf:

log_min_duration_statement = 100
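
A reload is enough for the setting to take effect (no restart needed):

psql -d planet_osm -c "SELECT pg_reload_conf();"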

I started with a threshold of 100ms, which means pg will log every query that takes more than 100ms to complete. The log quickly filled up with lots of queries. Armed with patience, I started indexing all the obvious fields like landuse, z_order, amenity, etc.

Eventually I ran EXPLAIN on a frequent query that was well above 500ms and, to my surprise, found out that the planet_osm_polygon table was missing the spatial index on the way geometry column. After the index was created no long queries were logged at all, and tile rendering was snappy enough not to require cache seeding.
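
For reference, the rendering queries look more or less like this, and running one under EXPLAIN made the sequential scan obvious (the bounding box is an arbitrary example; 900913 is the osm2pgsql default SRID):

EXPLAIN ANALYZE
SELECT way FROM planet_osm_polygon
WHERE way && ST_SetSRID('BOX3D(1020000 5690000,1030000 5700000)'::box3d, 900913);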

The complete list of all the indexes I have created on the planet_osm database (copy and paste):

create index planet_osm_polygon_landuse_idx on planet_osm_polygon(landuse);
create index planet_osm_polygon_x_idx on planet_osm_polygon(z_order);
create index planet_osm_line_highway_idx on planet_osm_line(highway);
create index planet_osm_polygon_amenity_idx on planet_osm_polygon(amenity);
create index planet_osm_line_waterway_idx on planet_osm_line(waterway);
create index planet_osm_poly_building_idx on planet_osm_polygon(building);
create index planet_osm_poly_wayarea_idx on planet_osm_polygon(way_area);
create index planet_osm_poly_natural_idx on planet_osm_polygon("natural");
create index planet_osm_poly_buildingnn_idx on planet_osm_polygon(building) where (building is not null);
create index planet_osm_poly_way_idx on planet_osm_polygon using gist(way);
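
Finally, it may be worth refreshing the planner statistics after creating the indexes:

ANALYZE;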