The Linux Rain Linux General/Gaming News, Reviews and Tutorials

Transposing rows and columns: 3 methods

By Bob Mesibov, published 13/05/2016 in Tutorials


Swapping the rows and columns in a table on the command line would seem to be a fairly easy thing to do. For example, with this tab-separated table (called 'tabbed') as input:

get this tab-separated table ('swapped') as output:

Datamash

Transposing rows and columns is indeed easy if you have the GNU datamash utility on your system:

The default separator for columns in datamash is the tab character, so if your table is comma-separated ('commas' table in the following example), you need to specify that separator with a -t option:

Note that datamash needs special instructions if there are blanks anywhere in the table — more on this later.

AWK

The mighty text-processing program AWK can also do transposing, but the only commands I've seen are far from simple. Here's a command used by 'fedorqui' in this Stack Overflow post. It puts the whole table in an array before printing the columns as rows:

AWK's default column separator is whitespace (including the tab character), so if the separator is a comma you need to specify that:

DIY

Let's build something from simple shell tools to do this transposing job. We can use the cut command to cut out the first column in the table:

and the paste command with its -s option ('single line') to make that column into a row:

We can loop through all 6 columns in the table with a 'for' loop, doing that cut-and-paste job on each:

The cut and paste commands both use the tab character as default separator, so if the table is comma-separated we need to specify that separator for each command with the -d option:

Scripting

That 'for' loop is fine and good, but notice that it specifies the number of columns, 6, in the table. What if we don't know the number of columns, or can't be bothered counting them?

We can find the number of columns ('numc') by adding 1 to the number of separators with BASH arithmetic. The number can be found with grep and *wc** from the first line of the table, like this for 'commas':

The variable 'numc' can then be fed into the 'for' loop:

These two steps would go nicely in a shell script, here called 'trans'. I'll feed the name of the table to be transposed into the script as the the first argument ($1) for the 'trans' command:

#!/bin/bash

numc=$(($(head -n 1 "$1" | grep -o , | wc -l)+1))
for ((i=1; i<="$numc"; i++))
do cut -d,  -f"$i" "$1" | paste -s -d,
done

What if the separator isn't a comma? No problem. We can feed the separator to a modified 'trans' as a second argument, $2:

#!/bin/bash

numc=$(($(head -n 1 "$1" | grep -o "$2" | wc -l)+1))
for ((i=1; i<="$numc"; i++))
do cut -d "$2" -f"$i" "$1" | paste -s -d "$2"
done

Note that in the improved 'trans', a tab character would be fed in as a quoted variable ($'\t'):

Time trials

Three methods for transposing rows and columns, but which is fastest? To do a test, I'll use a tab-separated data table called 'table1' with 15 columns and 10004 rows. Not a huge table, but big enough to detect speed differences in the methods. Because there's whitespace within data items in 'table1', I'll need to specify a tab character as separator for AWK.

And here's the time test:

Looks like our shell script 'trans' is the winner. But is its output correct? Yes:

Missing data

If there are blank entries in the table, as there are in 'table2', datamash complains:

The solution in datamash version 1.0.6 (the one I got from the Debian 8 repository) is to use the --no-strict option. This is supposed to replace each blank entry with a string, and the default string is 'N/A'. Unfortunately, this doesn't work:

This was a known bug in version 1.0.6 and is said to have been fixed in the next datamash version. Anyway, both AWK and 'trans' will handle blank entries without problems, so let's do a time test again:

And once again, AWK and 'trans' give the same output:

The shell script

I wrote 'trans' as throwaway code for transposing my own tables. It could be extended so that argument 2 specifies the input column separator, and argument 3 a different column separator for the output.

There are other ways to swap rows and columns in tables, for example using Perl, but for the time being I'm happy with the script!



About the Author

Bob Mesibov is Tasmanian, retired and a keen Linux tinkerer.

Tags: tutorials scripting bash transposing awk regular-expressions tables cli
blog comments powered by Disqus