Thursday, June 13, 2019

Awk: Why Does This Not Work?

Tab-delimited lines.  One or two columns per line.  Only print lines that have just the first column:
awk -F'\t' ' 'NF==1' filename
Lines in input file:
 sources URL
"Roth and Dayton, "Homicide among Adults: Massachusetts homicides, 1751-1760," 89." "Roth and Dayton, "Homicide among Adults: Massachusetts homicides, 1751-1760," 53." "Trial of Abel Clements, Petersburg Intelligencer, July 15, 1806." Horrid Murder! At an early hour on Wednesday morning last, the inhabitants of this town were alarmed with the dreadful information , (Augusta, Me., 1806), 1" ""Horrid Barbarity," Hillsborough [N.C.] Recorder, Apr. 28, 1824, 3."

No lines passed through.  Curiously, when I use print NF, it claims two fields per line.

awk -F'\t' '{$2!=""}' filename returns nothing.
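
A small sketch of what seems to be going on, assuming the lines without a URL still end in a tab (sample.tsv and its contents are made up for illustration, not taken from the real file):

```shell
# Hypothetical sample: the first line has a URL, the second ends in a bare tab.
printf 'title one\thttp://example.com\ntitle two\t\n' > sample.tsv

# A trailing tab still delimits an (empty) second field, so NF is 2 on both lines.
awk -F'\t' '{ print NF }' sample.tsv

# Inside braces the comparison is an action whose result is thrown away: prints nothing.
awk -F'\t' '{ $2 != "" }' sample.tsv

# Outside braces it is a pattern, and the default action prints the matching line.
awk -F'\t' '$2 == ""' sample.tsv
```

If the real file has no trailing tab on the one-column lines, NF==1 should match them instead, so it is worth checking the raw bytes (for example with cat -A under GNU coreutils) before picking either test.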


Since the second field in lines that I want to exclude always has http:// or https:// in it, I tried:
grep -v "https://|http://" filename
And this passed through every line, including those containing http:// or https://.  Why?
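
A likely explanation, offered as a sketch rather than a diagnosis: without -E, grep reads the pattern as a basic regular expression, where | is an ordinary character. The pattern then only matches the literal text "https://|http://", which never occurs, so -v keeps everything. The file name and contents below are made up for illustration:

```shell
# Hypothetical sample: one line with a URL in the second column, one without.
printf 'title one\thttps://example.com\ntitle two\t\n' > sample.tsv

# Basic regex: | is a literal character, so nothing matches and -v keeps every line.
grep -v 'https://|http://' sample.tsv

# Extended regex: | means "or", so lines containing either scheme are dropped.
grep -Ev 'https://|http://' sample.tsv
```

The same filter can also be written without alternation as grep -v -e 'https://' -e 'http://' filename.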

sed '/https:\/\//d' filename | sed '/http:\/\//d'
works.

Thanks for all the suggestions.  I found a fix in the meantime.  Beat your head against awk and sed long enough, and they will do what you want.

2 comments:

  1. Works for me (after I removed that third apostrophe from your example). I'm using GNU Awk 4.2.1.

  2. The blog mangled your sample file, but your lines are title-tab-url, and sometimes there's no URL, yes? If you have a title and a tab, but no URL, you still have 2 columns, based on my testing. You would need to remove lines with a trailing tab, or, I guess, look for lines where field 2 is null.

    Based on a quick search and test,
    awk -F'\t' '$2==""' myfile
    looks like it should do what you want, although YMMV if you're not using GNU awk.
