
Postgres maintenance: vacuum hell, cluster to the rescue

I have an OpenNMS server (really just a PC) sitting under a desk at a company I don't hear from often, but I still get weekly backup mail recaps.
This week the recap brought bad news: PostgreSQL had stopped accepting connections until the database was vacuumed, to prevent transaction ID wraparound.
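
For context, a standard catalog query shows how close each database is getting to wraparound (this is the usual check, not something from the recap mail):

    -- age() counts transactions since each database's oldest frozen XID;
    -- PostgreSQL starts warning, and eventually refuses new transactions,
    -- as this approaches the ~2 billion limit.
    SELECT datname, age(datfrozenxid) AS xid_age
    FROM pg_database
    ORDER BY xid_age DESC;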

I have known about this PostgreSQL issue for some time and had planned accordingly: a monthly cron job would run a full vacuum. Unfortunately, one month the database grew so much that the vacuum never completed, because the customer simply rebooted the server thinking it was 'stuck'.
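
The monthly job essentially boiled down to one statement against the OpenNMS database (a sketch, not my exact crontab):

    -- VACUUM FULL takes an exclusive lock on each table while it compacts
    -- it, so on a badly bloated database it can grind away for days with
    -- no visible progress, which is exactly what made the reboots tempting.
    VACUUM FULL;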

This vacuum-stuck-reboot cycle had gone on for months, until today. I knew from prior experience that this was going to be painful: a quick du on the database directory reported 60GB of data. A vacuum would probably take days.

Luckily, I remembered from the excellent PostgreSQL 9.0 High Performance book that there is another tool in the toolbox: CLUSTER.
So this time I decided to execute the following plan (sketched in SQL after the list):

  1. delete old events, alarms and notifications (by far the largest tables)
  2. cluster each table (including the 5 largest)
  3. vacuum full
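
In SQL terms the plan looked roughly like this. Treat it as a sketch: the 90-day cutoff, the timestamp columns and the index name are illustrative rather than copied from my session, and on a live OpenNMS schema you would want to check foreign keys before deleting:

    -- 1. Prune the largest tables.
    DELETE FROM events        WHERE eventtime     < now() - interval '90 days';
    DELETE FROM alarms        WHERE lasteventtime < now() - interval '90 days';
    DELETE FROM notifications WHERE pagetime      < now() - interval '90 days';

    -- 2. CLUSTER rewrites each table, ordered by an index, into a brand
    --    new file, so the dead space is discarded immediately.
    CLUSTER events USING events_pkey;  -- repeat per table, picking a real index
    -- (older PostgreSQL versions use the form: CLUSTER index_name ON table_name)

    -- 3. A final full vacuum is now quick, since the tables are small.
    VACUUM FULL;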
Well, believe it or not, the whole process took no more than a couple of hours, most of which was spent deleting records. I am not kidding when I say that on other occasions I have spent days waiting for a vacuum to finish.
And best of all, the database shrank from 60GB to 7.9GB! Most of the shrinking had already happened after the clustering step, since CLUSTER rewrites each table into a fresh file and leaves the accumulated bloat behind.

