awk -F'\t' ' 'NF==1' filename
Lines in input file:
sources URL
"Roth and Dayton, "Homicide among Adults: Massachusetts homicides, 1751-1760," 89." "Roth and Dayton, "Homicide among Adults: Massachusetts homicides, 1751-1760," 53." "Trial of Abel Clements, Petersburg Intelligencer, July 15, 1806." Horrid Murder! At an early hour on Wednesday morning last, the inhabitants of this town were alarmed with the dreadful information , (Augusta, Me., 1806), 1" ""Horrid Barbarity," Hillsborough [N.C.] Recorder, Apr. 28, 1824, 3."
awk -F'\t' '{$2!=""}' filename returns nothing.
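A likely reason for the empty output: inside braces, the comparison is an action, so awk evaluates it and throws the result away without printing anything. Outside braces, the same comparison is a pattern, and the default action prints every matching line. Here is a minimal sketch of the difference; the `make_sample` helper and its two lines are made up for illustration and are not the real file.

```shell
# Hypothetical sample: one title with a URL, one title followed by a bare tab.
make_sample() {
    printf 'Title A\thttp://example.com/a\n'
    printf 'Title B\t\n'
}

# Inside braces the comparison is an action: evaluated, then discarded. Nothing prints.
in_braces=$(make_sample | awk -F'\t' '{$2!=""}')

# Outside braces it is a pattern; the default action prints each matching line.
as_pattern=$(make_sample | awk -F'\t' '$2!=""')

printf 'in braces:  [%s]\n' "$in_braces"
printf 'as pattern: [%s]\n' "$as_pattern"
```

Note that `$2!=""` selects the lines that do have a second field, so to keep only the lines without a URL the test would need to be flipped.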
Since the second field in lines that I want to exclude always has http:// or https:// in it, I tried:
grep -v "https://|http://" filename
And this passed through all lines, including the ones with http:// or https://. Why?
sed '/https:\/\//d' file | sed '/http:\/\//d'
works.
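As for the grep question, the probable cause is that grep uses basic regular expressions by default, where `|` is an ordinary character. The pattern then only matches lines containing the literal text `https://|http://`, so `-v` lets everything through. With `-E` the bar means "or". A small sketch, again using a made-up `make_sample` helper rather than the real file:

```shell
# Hypothetical sample: two titles with URLs, one title followed by a bare tab.
make_sample() {
    printf 'Title A\thttp://example.com/a\n'
    printf 'Title B\thttps://example.com/b\n'
    printf 'Title C\t\n'
}

# Basic regex: "|" is a literal character, so no line matches and -v keeps them all.
basic=$(make_sample | grep -v "https://|http://")

# Extended regex (-E): "|" is alternation, so both URL forms are excluded.
extended=$(make_sample | grep -Ev "https://|http://")

printf 'basic:\n%s\n' "$basic"
printf 'extended:\n%s\n' "$extended"
```

Passing each pattern separately, as in `grep -v -e 'https://' -e 'http://' filename`, should give the same result as the `-E` form.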
Thanks for all the suggestions. I found a fix in the meantime. Beat your head against awk and sed long enough, and they will do what you want.
Works for me (after I removed that third apostrophe from your example). I'm using GNU Awk 4.2.1
The blog mangled your sample file, but your lines are title-tab-url, and sometimes there's no URL, yes? If you have a title and a tab, but no URL, you still have 2 columns, based on my testing. You would need to remove lines with a trailing tab, or, I guess, look for lines where field 2 is null.
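To see the field counts for yourself, something like the following prints NF next to each title. The three sample lines are invented for the test (URL present, trailing tab only, no tab at all) and are not taken from the original file.

```shell
# Hypothetical sample covering the three cases: URL, trailing tab only, no tab.
counts=$(printf 'Title A\thttp://example.com/a\nTitle B\t\nTitle C\n' |
    awk -F'\t' '{ print NF ": " $1 }')

printf '%s\n' "$counts"
# 2: Title A
# 2: Title B
# 1: Title C
```

The line with a trailing tab still counts as two fields, which would explain why a test on NF==1 misses it.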
Based on a quick search and test,
awk -F'\t' '$2==""' myfile
looks like it should do what you want, although YMMV if you're not using GNU awk.