Sunday, January 14, 2007

Getting the transpose of a CSV

Today my Boss/P.I. approached me with an application problem he was having. He had several large comma separated value files that needed to be transposed (i.e. switching data that are in a row to a column) to work with an application known as Jqtl. Now typically this would be no problem for him as he would simply have to just pop the file into Excel or Datadesk but he was dealing with files that had about 45,000 rows and 30 columns. Now if any of you have worked with Excel and large datasets you would know that Excel used to have a row limit of 256 columns (until Excel 12 according to this blog) so using that as a method was definitely not a solution.

So I simply wrote a quick Perl script for this as I didn't see any available in my 10 minute search online. I'm sure there is probably a module for it, but I thought it would be easy enough.

It took around three seconds to transpose the 45,000 by 30 dataset without any fancy code optimization. Here's the script.

If you're running in a Unix/Linux environment make sure you chmod to make the file executable. To run the script on lets say a file called foo simply run the following form a terminal


$ ./transpose_csv.pl foo

You'll end up with a file with "tr_" appended to the original file name such as tr_foo.

No comments: