The Linux Rain Linux General/Gaming News, Reviews and Tutorials

DMS to DD to KML with AWK and sed

By Bob Mesibov, published 06/02/2015 in Tutorials


In a 2014 Linux Rain article I described a fast 'points plotter' for Google Earth. First I copy a list of latitude/longitudes (lat/lons) from a spreadsheet to the clipboard, then I launch a shell script with a keyboard shortcut. The script builds a KML file from the copied list and opens the KML for viewing in Google Earth.

Fine and good, but that script needs to have its lat/lons in decimal degree (DD) format, like -41.4431 103.9522, which is the correct format for KML. The script explained in this article does fast KML-building with lat/lons given in degree-minute-second (DMS) format, like 41°26'35"S 103°57'08"E. Not so simple a script, but AWK and sed were my trusty friends for this job.

Converting to decimal degrees

It's easy to code the conversion from DMS to DD. A latitude in degrees, minutes and seconds converts to decimal degrees like this:

degrees, minutes, seconds -> degrees + minutes/60 + seconds/3600

For example, 41°26'35" becomes 41.4431 decimal degrees, to 4 decimal places:

41°26'35" = 41 + 26/60 + 35/3600 = 41 + 0.433333 + 0.009722 = 41.443056

If the degrees, minutes and seconds in a DMS latitude are in separate fields, AWK can do this calculation and use printf to do the rounding-off. The number resulting from the calculation '$1 + $2/60 + $3/3600' is rounded off to 4 places with the printf format %.4f.
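A minimal sketch of such a command, assuming the degrees, minutes and seconds arrive as three space-separated fields on one line (the echo input and the variable name dd are only illustrations):

```shell
# Fields 1-3 are degrees, minutes and seconds; print decimal degrees to 4 places
dd=$(echo "41 26 35" | awk '{printf("%.4f\n", $1 + $2/60 + $3/3600)}')
echo "$dd"
```

For the sample latitude this prints 41.4431, matching the worked arithmetic above.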

Unfortunately, something's missing. DD lat/lons are signed.

DD's conventions

In DMS notation, latitudes north of the Equator are designated N, and south of the Equator S. Longitudes west of the Prime Meridian, through Greenwich in the UK, are designated W, and longitudes east of Greenwich are E. There are no letters in DD notation, however, only positive and negative numbers. By convention, latitudes north of the Equator and longitudes east of Greenwich are positive, while latitudes south of the Equator and longitudes west of Greenwich are negative.

So converting DMS to DD takes 4 slightly differing AWK commands, one for each combination of hemispheres (NE, NW, SE and SW). In each command, printf separates the latitude and longitude results with a tab.
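The four variants can be sketched as follows, assuming six space-separated fields (latitude degrees, minutes, seconds, then longitude degrees, minutes, seconds) with the direction letters already stripped. The sample values and the variable names (dms, ne, nw, se, sw) are illustrative only:

```shell
dms="41 26 35 103 57 08"

# North latitude, east longitude: both positive
ne=$(echo "$dms" | awk '{printf("%.4f\t%.4f\n", $1+$2/60+$3/3600, $4+$5/60+$6/3600)}')
# North latitude, west longitude: longitude negative
nw=$(echo "$dms" | awk '{printf("%.4f\t%.4f\n", $1+$2/60+$3/3600, -($4+$5/60+$6/3600))}')
# South latitude, east longitude: latitude negative
se=$(echo "$dms" | awk '{printf("%.4f\t%.4f\n", -($1+$2/60+$3/3600), $4+$5/60+$6/3600)}')
# South latitude, west longitude: both negative
sw=$(echo "$dms" | awk '{printf("%.4f\t%.4f\n", -($1+$2/60+$3/3600), -($4+$5/60+$6/3600))}')

printf '%s\n' "$ne" "$nw" "$se" "$sw"
```

Only the minus signs differ between the four commands.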

AWK is flexible

To tell AWK which of the 4 possible commands to use, I could pull out the direction letters (N,S,W,E) as a pair at the end of the lat/lon string (I'll explain how in a moment), then use an if/else construction:

awk '{if ($9=="NE") {printf ("%.4f\t%.4f\n",$1+$2/60+$3/3600,$5+$6/60+$7/3600)} \
else if ($9=="NW") {printf ("%.4f\t%.4f\n",$1+$2/60+$3/3600,-($5+$6/60+$7/3600))} \
else if ($9=="SE") {printf ("%.4f\t%.4f\n",-($1+$2/60+$3/3600),$5+$6/60+$7/3600)} \
else if ($9=="SW") {printf ("%.4f\t%.4f\n",-($1+$2/60+$3/3600),-($5+$6/60+$7/3600))}}'

Complicated to look at, but the logic is simple — and it works.
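The same sign logic can also be written more compactly by reading the direction letters in fields 4 and 8 directly. This is an alternative sketch, not the command used in this article's script; the function name to_dd and the sample line are hypothetical:

```shell
# S and W make the value negative; N and E leave it positive
to_dd() {
  awk '{
    lat = $1 + $2/60 + $3/3600
    lon = $5 + $6/60 + $7/3600
    if ($4 == "S") lat = -lat
    if ($8 == "W") lon = -lon
    printf("%.4f\t%.4f\n", lat, lon)
  }'
}

echo "41 26 35 S 103 57 08 E SE" | to_dd
```

With this form the trailing letter pair in field 9 is not needed, at the cost of departing from the if/else layout shown above.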

Sed is helpful

My problem at this point is that DMS lat/lons don't come neatly laid out in 8 space-separated fields like 41 26 35 S 103 57 08 E. In the lat/lon data I work with, there are typically degree, minute and second characters in the strings: 41°26'35"S 103°57'08"E.

You might think that converting those characters to spaces with sed would be simple, but it isn't, because single and double quotes have special meanings in the BASH shell and the degree character isn't on my keyboard. Worse yet, many lat/lons in my sources (like those in Wikipedia) don't actually use single and double quotes as minutes and seconds designators. Instead they use characters like 'prime' (′) and 'double prime' (″). The practical workaround is to replace anything in the lat/lon string that isn't a digit or N, E, W or S with a space, using an appropriate regular expression:

sed 's/[^0-9NEWS]/ /g'
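As an illustration, here is that command applied to a sample string containing prime and double-prime marks. The awk '{$1=$1; print}' step is added here only to squeeze runs of spaces so the result is easy to read; it is not part of the script, and the variable name clean is hypothetical:

```shell
# Replace everything except digits and N, E, W, S with spaces
clean=$(printf '%s\n' "41°26′35″S 103°57′08″E" \
  | sed 's/[^0-9NEWS]/ /g' \
  | awk '{$1=$1; print}')   # squeeze spaces for display only
echo "$clean"
```

The result has the 8 space-separated fields that the AWK calculations expect.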

Sed is really helpful

There's another wrinkle. DMS lat/lons sometimes lack the minutes and seconds figures needed by the AWK calculations above. Sometimes lat/lons are in the degrees-minutes form 41°26'S 103°57'E or the degrees-only form 41°S 103°E. The AWK calculations would work fine, though, if I could transform these into 41 26 0 S 103 57 0 E and 41 0 0 S 103 0 0 E.

sed again comes to the rescue. For lat/lons without seconds, I can space the string out with zeroes using:

sed 's/\xc2\xb0\([0-9]\{1,2\}\).\([NEWS]\)/ \1 0 \2/g'

where '\xc2\xb0' is the degree symbol escaped. For lat/lons without either minutes or seconds I can use:

sed 's/\xc2\xb0\([NEWS]\)/ 0 0 \1/g'

Putting the 3 sed commands together, I can space out and convert the string to the form ready for AWK:

sed 's/\xc2\xb0\([0-9]\{1,2\}\).\([NEWS]\)/ \1 0 \2/g;s/\xc2\xb0\([NEWS]\)/ 0 0 \1/g;s/[^0-9NEWS]/ /g'
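A quick way to check the combined command is to feed it the two incomplete forms. This sketch assumes GNU sed, since the \xHH byte escapes are a GNU extension. The function name pad is hypothetical, and the trailing awk step only squeezes spaces for display:

```shell
# Pad missing minutes/seconds with zeroes, then strip all non-digit, non-NEWS characters
pad() {
  sed 's/\xc2\xb0\([0-9]\{1,2\}\).\([NEWS]\)/ \1 0 \2/g;s/\xc2\xb0\([NEWS]\)/ 0 0 \1/g;s/[^0-9NEWS]/ /g' \
    | awk '{$1=$1; print}'   # squeeze spaces for display only
}

printf '%s\n' "41°26'S 103°57'E" | pad   # degrees and minutes only
printf '%s\n' "41°S 103°E" | pad         # degrees only
```

Both sample lines come out with the full 8 fields, with zeroes standing in for the missing figures.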

An AWK touch

Now to get back to adding a pair of direction letters at the end of the lat/lon, so that the AWK if/else command knows which calculation to perform. To add the letters I use

awk -v FS="[ \t]" '{print $0,substr($1,length($1),1)substr($2,length($2),1)}'

on the starting string.

The command tells AWK that there are two fields in the string, which might be separated with either a space or a tab (-v FS="[ \t]"). AWK first prints the full string, then extracts a substring (the last character) from each of the two fields and concatenates the substrings as a last field.
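Here is a small sketch of that command on a sample line. The function name add_letters is hypothetical, and the sample string is the one used earlier in this article:

```shell
# Print the line, then the last character of field 1 joined to the last character of field 2
add_letters() {
  awk -v FS="[ \t]" '{print $0,substr($1,length($1),1)substr($2,length($2),1)}'
}

printf '%s\n' "41°26'35\"S 103°57'08\"E" | add_letters
```

The original string comes back unchanged, followed by a space and the pair SE.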

Note that the sed spacing and cleaning commands (described above, but following this step in the final script) have no effect on the [space][2 letters] added to the end of the lat/lon string.

Putting the bits together

My DMS-to-KML script looks like this in outline:

  • Paste a list of lat/lons from clipboard to temp file with the xclip utility
  • Use AWK on the temp file to add a direction-letter pair at the end of each line of lat/lons
  • Pipe the result to sed to space out incomplete lat/lons with zeroes, and to replace degrees, single quotes, double quotes etc with single spaces
  • Pipe the result to AWK to do the DMS to DD conversions, getting the correct signs for the DDs from the direction-letter pair, and save the result to another temp file
  • Use AWK to build a simple KML file from the list of DDs in the second temp file, and save the KML file to the desktop
  • Open the KML in Google Earth
  • Delete the two temp files

and like this as a shell script:

#!/bin/bash

# Write the copied lat/lon list to a temp file
xclip -o > /tmp/list

# 1. Append the direction-letter pair (NE, NW, SE or SW) to each line
# 2. Pad missing minutes/seconds with zeroes, then replace everything except digits and N, E, W, S with spaces
# 3. Convert DMS to signed DD, choosing the signs from the letter pair in field 9
awk -v FS="[ \t]" '{print $0,substr($1,length($1),1)substr($2,length($2),1)}' /tmp/list \
| sed 's/\xc2\xb0\([0-9]\{1,2\}\).\([NEWS]\)/ \1 0 \2/g;s/\xc2\xb0\([NEWS]\)/ 0 0 \1/g;s/[^0-9NEWS]/ /g' \
| awk '{if ($9=="NE") {printf ("%.4f\t%.4f\n",$1+$2/60+$3/3600,$5+$6/60+$7/3600)} \
else if ($9=="NW") {printf ("%.4f\t%.4f\n",$1+$2/60+$3/3600,-($5+$6/60+$7/3600))} \
else if ($9=="SE") {printf ("%.4f\t%.4f\n",-($1+$2/60+$3/3600),$5+$6/60+$7/3600)} \
else if ($9=="SW") {printf ("%.4f\t%.4f\n",-($1+$2/60+$3/3600),-($5+$6/60+$7/3600))}}' > /tmp/final

# Wrap each DD pair in a Placemark (KML wants longitude first) and save the KML to the desktop
awk 'BEGIN {print "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<kml xmlns=\"http://www.opengis.net/kml/2.2\"> \
\n<Document>\n<Style id=\"marker\">\n<IconStyle> \
\n<Icon><href>http://maps.google.com/mapfiles/kml/shapes/placemark_circle.png</href></Icon> \
\n</IconStyle>\n</Style>"} \
{print "<Placemark><styleUrl>#marker</styleUrl><Point><coordinates>"$2","$1",0</coordinates></Point></Placemark>"} \
END {print "</Document>\n</kml>"}' /tmp/final > ~/Desktop/temp.kml

# Open the KML in Google Earth (via the default handler for KML files)
xdg-open ~/Desktop/temp.kml

# Delete the two temp files
rm /tmp/list /tmp/final

exit
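To try the conversion stages without the clipboard or Google Earth, the three middle commands of the script can be wrapped in a function and fed sample lines. This is only a testing sketch: the function name dms_to_dd and the sample inputs are hypothetical, and GNU sed is assumed for the \xHH escapes:

```shell
# Same three stages as the script: add letter pair, pad and clean, convert to signed DD
dms_to_dd() {
  awk -v FS="[ \t]" '{print $0,substr($1,length($1),1)substr($2,length($2),1)}' \
  | sed 's/\xc2\xb0\([0-9]\{1,2\}\).\([NEWS]\)/ \1 0 \2/g;s/\xc2\xb0\([NEWS]\)/ 0 0 \1/g;s/[^0-9NEWS]/ /g' \
  | awk '{if ($9=="NE") {printf ("%.4f\t%.4f\n",$1+$2/60+$3/3600,$5+$6/60+$7/3600)} \
else if ($9=="NW") {printf ("%.4f\t%.4f\n",$1+$2/60+$3/3600,-($5+$6/60+$7/3600))} \
else if ($9=="SE") {printf ("%.4f\t%.4f\n",-($1+$2/60+$3/3600),$5+$6/60+$7/3600)} \
else if ($9=="SW") {printf ("%.4f\t%.4f\n",-($1+$2/60+$3/3600),-($5+$6/60+$7/3600))}}'
}

printf '%s\n' "41°26'35\"S 103°57'08\"E" | dms_to_dd   # full DMS
printf '%s\n' "41°26'S 103°57'E" | dms_to_dd           # degrees and minutes
printf '%s\n' "41°N 103°W" | dms_to_dd                 # degrees only
```

Each sample line yields one tab-separated latitude/longitude pair in signed decimal degrees.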

A test drive

Here's the script at work on a demonstration list of 18 Indonesian cities, north and south of the Equator and with lat/lons given only with degrees and minutes. I copy the lat/lon list from a spreadsheet to the clipboard and launch the script. Google Earth opens with the cities located as points, and with the KML correctly built.

Because the latitudes and longitudes for these cities were in separate fields in the spreadsheet, the pasted list in the first temp file is tab-separated. For more on that point, see this Linux Rain article.
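For reference, this is the Placemark line the script's last AWK command produces for a single tab-separated DD pair. The sample pair is the illustrative one used earlier in this article, not one of the cities in the list, and the variable name placemark is hypothetical:

```shell
# KML coordinates are longitude,latitude,altitude, so field 2 is printed before field 1
placemark=$(printf '%s\t%s\n' -41.4431 103.9522 \
  | awk '{print "<Placemark><styleUrl>#marker</styleUrl><Point><coordinates>"$2","$1",0</coordinates></Point></Placemark>"}')
echo "$placemark"
```

The coordinates element comes out as 103.9522,-41.4431,0, with longitude first and an altitude of 0.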

User beware

I wrote this script as throwaway code for clean, consistent lat/lons. Please note that it won't work for lat/lons with decimal minutes or decimal seconds, or with longitude first instead of latitude, or with spaces within latitude or longitude strings, or with impossible numbers (like more than 60 minutes or seconds). All those things could be catered for in a script, but at the cost of making the script more complicated than I like. For my uses, lat/lons get made clean and consistent in tables before I turn them into KMLs.
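As one example of the extra checking mentioned above, impossible numbers could be flagged before conversion. This is a hypothetical sketch, not part of my script. It assumes the spaced-out form with the degrees, minutes and seconds in fields 1-3 and 5-7, and the function name check_dms is made up for illustration:

```shell
# Report lines with minutes or seconds of 60 or more, latitude over 90 or longitude over 180
check_dms() {
  awk '$2 >= 60 || $3 >= 60 || $6 >= 60 || $7 >= 60 || $1 > 90 || $5 > 180 {
         print "line " NR ": suspect value: " $0
         bad = 1
       }
       END { exit bad + 0 }'   # non-zero exit status if any line was flagged
}

printf '%s\n' "41 26 35 S 103 57 08 E SE" | check_dms   # passes silently
printf '%s\n' "41 75 35 S 103 57 08 E SE" | check_dms   # flagged: 75 minutes
```

A check like this only catches out-of-range figures; decimal minutes or seconds and longitude-first ordering would need their own handling.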



About the Author

Bob Mesibov is Tasmanian, retired and a keen Linux tinkerer.
