Thursday, 27 March 2014

unix or gnu: sh n awk. finding the next lun.

Had a query from a mate about writing a sh script for a specific purpose. He had one or more files containing a natural number on each line (luns) and wanted to find out where the next whole was.

Apart from being a bit ... i dunno, surprised that a "unix sysadmin" of 25 years (i.e. a lifer) wasn't a total Bourne Shell fiend with lashings of awk and perl for dessert ... it seemed a simple lunch-break 'challenge' so I came up with a couple of solutions.

First I just used bash. It's a bit clumsy though.

#!/bin/sh

# usage: nextlun file ...

n=0
cat $* | sort -n | while read c; do
    if [ $c -ne $n ] ; then
        echo $n
        exit 1;
    fi
    n=`expr $n + 1`
done
# above runs in sub-shell so doesn't update n
if [ $? -eq 0 ]; then
    last=`cat $* | sort -n | tail -1`
    echo `expr $last + 1`
fi

I probably should've known because I use it quite often but I didn't realise (or forgot) "while read" runs in a sub-shell.

However, awk makes this much easier and is the sort of thing it's really good at. The algorithm is identical though.

#!/bin/sh

cat $* | sort -n \
 | awk -e 'BEGIN { n=0; } { if (n != $0) { exit 0; } else { n=n+1; } } END { print n; }' 

Neither handles blank lines properly, so an easy fix should that be necessary in the awk:

#!/bin/sh

cat $* | sort -n \
 | awk -e 'BEGIN { n=0; } /[0-9]+/ { if (n != $0) { exit 0; } else { n=n+1; } } END { print n; }' 

A grep could perform the same duty in the bash version as well.

This is the core of what makes "unix" actually something worth using. The whole system itself is the "integrated development environment". And all that power is available to any user who wants it without having to buy some overpriced application to do it.

Imagine how many people would use a spreadsheet for such a simple task - and then have to do it manually every time to rub salt into the wound. So not only do you have to pay real cash for the privilege, you have to keep on paying with your own time - which is something you can never buy back for any price.

Update: As a further emphasis on the last point I was relaying this anecdote to a mate of mine, one who is also in an area where scripts should be a comfortable notion (dba on unix systems). He actually suggested using "excel" to do the same task. But then again he did earn his wings as a dba being paid by the hour ... so perhaps can be forgiven for not minding a bit of menial busy work ;-)

No comments: