Count dates with AWK
Someone on the public forum asked this question:
“Say I want to find out how many dates are there in 1000 lines of characters.Also date format is given (…in my case date format is of ‘DD-MON-YYYY’).”
When I read the question I thought, what a nice exercise for regex with awk. Here is part of the answer I gave:
####################### # Countdates.awk script ####################### BEGIN {i=0} { $0 = toupper($0) i += gsub(/(0[1-9]|[1-2][0-9]|3[01])-(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)-((19[0-9][0-9])|(20[0-9][0-9]))/,"") } END {print "Number of dates (DD-MON-YYYY) in the file: ", i} ####################### # End Countdates.awk #######################
Sample file with three lines (dates.txt):
xxxxx01-Jan-200002-Feb-2003 xxxx13-APR-1990xxxxx xx32-OCT-2007---xxx00-dEC-1900 xxx 1-MAY-2003xxxxx xxx2004-AUG-07 xxxxx-08-Aug-2007- 12-Jun-1890
Now, if you run the script:
G:\>gawk -f countdates.awk dates.txt Number of dates (DD-MON-YYYY) in the file: 4</code>
It’ll catch three dates in the first line (01-Jan-2000, 02-Feb-2003,13-APR-1990), none in second line
and one in the third line (08-Aug-2007).
Short explanation of the script:
$0 = toupper($0) ... it's needed to catch Apr, APR, aPR
gsub() functions returns number of substitutions which is added to variable i
/ ...search pattern ... /
(0[1-9]|[1-2][0-9]|3[01]) ... trying to catch only valid days 01-31 (assuming leading zero)
(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC) .... month names abbreviation in uppercase to compare against input line ($0) which we uppercased at he beggining
((19[0-9][0-9])|(20[0-9][0-9])) .... assuming only years that starts with 19xx and 20xx are valid (that's why we didn't count 12-Jun-1890)
You can take this example and fine “tune” it for the data you actually have. This script lacks true date validation (e.g. it’ll count 29-FEB-2007 as a date despite the fact that it’s really wrong), for this, perl/python etc. are better suited (first catching the date candidates with the power of regex pattern search, then validating those strings as a true dates with some date/time module.).
Posted on 21.08.2007, in Scripting and tagged Scripting. Bookmark the permalink. Comments Off on Count dates with AWK.