sed one-liners

SED and AWK are my favorites when it comes to (pre)processing data (…before I go and write python script). While my primary source of reference is book titled Sed & Awk, 2nd ed. (O’Reilly), I also have some prints of various sed one-liners hanging on the wall in my office. I never bothered to write down url on which we can find many “popular” sed one-liners, so here it is:

http://sed.sourceforge.net/sed1line.txt

Just an example of how handy this list of one-liners can be. It helped me the other day when someone on the dba-village forum asked about how to load data from comma separated file. There are five fields in the file, but the problem is that the third field may contain any number of commas. (This is a common type of a question that I receive from time to time at my workplace as well — and I almost always resolve for a solution to the sed.)

For example, let’s say that we have file (test.txt) with comma separated data such as this:


COL1,COL2,COL3-with, some, data, with, commas,COL4,COL5

We should enclose field 3 within quotes and then use OPTIONALLY ECNLOSED BY clause in SQL*Loader control file to load the data.

Basically we need to replace second comma with ,’ and next to last comma with ‘, .

The first part is easy:

cmd> cat test.txt | sed "s/,/,'/2" ....will replace second comma with ,'

The second part, replacing next to the last comma with ‘, is more complex and was finally derived from an example written in the one-liner list mentioned above:

cmd> cat test.txt | sed "s/\(.*\)\,\(.*\,\)/\1',\2/"

We're using grouping ( ) and then addressing each of the two groups with 1 and 2. Backslashes are there to escape characters that are part of the syntax, but admittedly making command line string harder for humans to parse. But consider this, lighter example where we want to split days with two consecutive semicolons) :

cmd> echo MondayTuesday | sed "s/\(Monday\)\(Tuesday\)/\1::\2/"
Monday::Tuesday

So that in the end we can get desired result:

cmd> cat test.txt | sed "s/,/,'/2" | sed "s/\(.*\)\,\(.*\,\)/\1',\2/" > test_out.txt

cmd> cat test_out.txt

COL1,COL2,'COL3-with, some, data, with, commas',COL4,COL5

Posted on 08.08.2008, in Scripting and tagged . Bookmark the permalink. Comments Off on sed one-liners.

Comments are closed.