SED and AWK are my favorites when it comes to (pre)processing data (…before I go and write python script). While my primary source of reference is book titled Sed & Awk, 2nd ed. (O’Reilly), I also have some prints of various sed one-liners hanging on the wall in my office. I never bothered to write down url on which we can find many “popular” sed one-liners, so here it is:
Just an example of how handy this list of one-liners can be. It helped me the other day when someone on the dba-village forum asked about how to load data from comma separated file. There are five fields in the file, but the problem is that the third field may contain any number of commas. (This is a common type of a question that I receive from time to time at my workplace as well — and I almost always resolve for a solution to the sed.)
For example, let’s say that we have file (test.txt) with comma separated data such as this:
COL1,COL2,COL3-with, some, data, with, commas,COL4,COL5
We should enclose field 3 within quotes and then use OPTIONALLY ECNLOSED BY clause in SQL*Loader control file to load the data.
Basically we need to replace second comma with ,’ and next to last comma with ‘, .
The first part is easy:
cmd> cat test.txt | sed "s/,/,'/2" ....will replace second comma with ,'
The second part, replacing next to the last comma with ‘, is more complex and was finally derived from an example written in the one-liner list mentioned above:
cmd> cat test.txt | sed "s/\(.*\)\,\(.*\,\)/\1',\2/"
We're using grouping ( ) and then addressing each of the two groups with 1 and 2. Backslashes are there to escape characters that are part of the syntax, but admittedly making command line string harder for humans to parse. But consider this, lighter example where we want to split days with two consecutive semicolons) :
cmd> echo MondayTuesday | sed "s/\(Monday\)\(Tuesday\)/\1::\2/"
So that in the end we can get desired result:
cmd> cat test.txt | sed "s/,/,'/2" | sed "s/\(.*\)\,\(.*\,\)/\1',\2/" > test_out.txt
cmd> cat test_out.txt
COL1,COL2,'COL3-with, some, data, with, commas',COL4,COL5