Saturday, January 14, 2012

Using SQLite to keep state in shell scripts

Bash shell scripting is one the things that I miss more on the times I work on Windows. Even as good as shell scripting is, sometimes I wish it was easy to keep track of state across script executions, for instance when a script executes a rsync at short intervals. In that case I don't care if one particula rsync fails, but I definitely care if it fails, say, ten times in a row or for more than a day.

To do this I need some kind of way to keep state and record each run exit status. The simplest approach that I could think of is to use SQLite. Copied straight from the SQLite site: SQLite is a software library that implements a self-contained, serverless, zero-configuration, transactional SQL database engine. SQLite is the most widely deployed SQL database engine in the world. The source code for SQLite is in the public domain.

As I said, in this example I'll show how to track the status of a series of rsync operations, all run from a central node and pushed to remote nodes. The script is scheduled to run every 30m.

First thing to do in the shell script is to create and init the database unless it does not exist. This is pretty easy (notice the unique constraint, which I'll explain below):

function createdb {
 echo "create table stat(host varchar(20) unique, failcount smallint, lastsuccess datetime);" | sqlite3 db/stats.db

mkdir db > /dev/null 2>&1
test -e db/stats.db || createdb

I put this code at the beginning of the script. If the database gets corrupted I just delete the file and it gets recreated with the next run.

The table structure is easy enough: for each host I want to know when the last successful replication occurred and how many failures since then.

The second important step is to make sure there is a row for each host, because in SQL data must first be inserted and then updated. This is also easy since SQLite support the OR statement: basically we add a unique constraint on the host column and then at every run attempt to insert an initial row for each host. On a traditional db (like Oracle or Postgres) the second insert would trigger an error because the duplicate value violates the unique constraint. Instead on a db that supports OR we can tell the db to ignore the duplicate and keep the existing row. This way I don't have to test if the row exists and insert it if it doesn't with the bonus that the script stays small and simpler:

echo "insert or ignore into stat(host,failcount,lastsuccess) values ('$ip', 0, null);" | sqlite3 db/stats.db  

SQL statements can be fed on stdin or as the second argument of the sqlite3 program. I'll show you how in the last fragment. In this case $ip holds the ip address or hostname used to identify the host in the script.

After the rsync command has run I capture the exit code and store it into the db:

if [ "$status" == "0" ]; then
   echo "update stat set failcount=0, lastsuccess=datetime('now') where host='$ip';" | sqlite3 db/stats.db  
   echo "update stat set failcount=failcount+1 where host='$ip';" | sqlite3 db/stats.db  

At the end of the script I write out failures stats in a file that is later piped into a mail notification:

echo "The following hosts NEVER completed a sync successfully:" > stats.txt
sqlite3 db/stats.db "select host from stat where lastsuccess is null" >> stats.txt
echo "" >> stats.txt
echo "The following hosts did not sync in the last 24 hours:" >> stats.txt
sqlite3 db/stats.db "select host from stat where lastsuccess < (datetime('now','-1 day')) and failcount>0" >> stats.txt

No comments: