Monday, October 20, 2008

Unix Tools: Playing with sorting

I do a lot of Unix scripting, and every now and then I find a new way of using some basic Unix tools that reminds me how powerful Unix/Linux can be. Today I've been playing with sorting.

I have a script that processes storm locations and forecast tracks. When it works with a single storm there is no problem: the Python program I wrote outputs the tracks and forecasts already sorted the way I need them. The problem comes when I want to concatenate the tracks for several storms. The column order of the CSV files makes sense for a database and for human reading, but not for simple sorting. Using:
sort *.csv > sorted.csv
simply doesn't work, because by default sort compares whole lines as text, starting from the first character. The nice part is that sort has options that let you control the comparison. First off, since I'm dealing with CSV files, I want to tell sort how to separate the columns:

sort -t , *.csv > sorted.csv
On its own this doesn't help, because without a key sort still compares whole lines. But the key options I use from here on won't line up with the CSV columns unless you include the '-t [char]' flag: without it, sort splits fields at blanks (spaces and tabs), not at commas. ('-t ,' and '-t,' are equivalent.)
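To see what -t changes, here is a minimal sketch with a hypothetical two-line file, sample.csv (made up for illustration, not one of the storm files). It assumes a shell with mktemp -d available, and sets LC_ALL=C so the text comparisons come out the same regardless of locale.

```shell
#!/bin/sh
# Hypothetical sample data in a scratch directory.
export LC_ALL=C               # plain byte comparisons, so results don't depend on locale
cd "$(mktemp -d)" || exit 1
printf 'x,9\nx,10\n' > sample.csv

# Whole-line text comparison: '1' sorts before '9', so x,10 comes first.
sort sample.csv

# No -t: fields are split at blanks, so each CSV line is a single field,
# key 2 is empty on every line, and the tie falls back to the whole line.
sort -k2,2n sample.csv        # still x,10 then x,9

# With -t, field 2 is the number after the comma.
sort -t, -k2,2n sample.csv    # x,9 then x,10
```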

So at this point sort knows how to separate the fields, and now the beauty of sort comes in. The '-k pos1,pos2' option tells sort that a sort key starts at field pos1 and ends at field pos2. You can repeat -k to add more keys, and sort compares them in the order you give them.
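A small sketch of key order, using a made-up file keys.csv: each -k defines one key, and later keys only break ties left by earlier ones. Leaving off pos2 (for example -k2) makes the key run to the end of the line, so spelling out the end field keeps each key to a single column.

```shell
#!/bin/sh
# Hypothetical two-column file in a scratch directory.
export LC_ALL=C
cd "$(mktemp -d)" || exit 1
printf 'b,1\na,2\n' > keys.csv

# Field 1 is the primary key; field 2 only breaks ties.
sort -t, -k1,1 -k2,2n keys.csv    # a,2 then b,1

# Same keys, reversed priority: field 2 decides first.
sort -t, -k2,2n -k1,1 keys.csv    # b,1 then a,2
```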

sort -t , -k4,8 *.csv > sorted.csv
This makes columns 4 through 8, the only columns that matter in my case, a single sort key. That still doesn't fully solve my problem: the key is compared as one string from left to right, so column 4 always decides first and the numbers are compared as text, while part of my problem is the order in which the columns are looked at. Here are the options I came up with:
sort -t, -k6n,6 -k7n,7 -k4d,4 -k8n,8  *.csv > sorted.csv
'-k#n,#' tells sort to compare that column as a number, and '-k#d,#' tells sort to use dictionary order (only blanks, letters and digits are considered). So my solution sorts first by column 6 as a number, then by column 7 (also as a number), then by column 4 as a string, and finally by column 8 as a number. This makes no sense when you look at it this way, but if you knew the format of the CSV file it would make perfect sense... anyway, I just wanted to point out that I'm always amazed at how powerful simple Unix tools are.
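Here is a sketch of the final command run on made-up data. The files track1.csv and track2.csv and their 8-column rows are hypothetical placeholders (x marks columns the sort ignores); the real column meanings aren't given in this post. LC_ALL=C is an added assumption so the dictionary comparison of column 4 is locale-independent.

```shell
#!/bin/sh
# Hypothetical files in a scratch directory.
export LC_ALL=C
cd "$(mktemp -d)" || exit 1

# Placeholder 8-column rows; only columns 4, 6, 7 and 8 matter here.
printf '%s\n' 'x,x,x,BETA,x,10,5,12' 'x,x,x,ALPHA,x,10,20,3' > track1.csv
printf '%s\n' 'x,x,x,ALPHA,x,9,5,6' 'x,x,x,ALPHA,x,10,5,24' 'x,x,x,ALPHA,x,10,5,0' > track2.csv

# Keys in priority order: field 6 (numeric), field 7 (numeric),
# field 4 (dictionary order), field 8 (numeric).
sort -t, -k6n,6 -k7n,7 -k4d,4 -k8n,8 *.csv > sorted.csv
cat sorted.csv
```

Column 4 only matters when columns 6 and 7 tie, which is why the BETA row can land between ALPHA rows: it sorts after the ALPHA rows with 10,5 but before the ALPHA row with 10,20.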

Anyway, for more examples and a much better write-up on sort than I can give:
http://www.softpanorama.org/Tools/sort.shtml
http://www.softpanorama.org/Tools/Sort/unix_sort_examples_collection.shtml
And don't forget the man page:
http://unixhelp.ed.ac.uk/CGI/man-cgi?sort
