Friday, May 24, 2019

Another Plea for a Research Assistant

This is a simple task.  When I started compiling the spreadsheet of horror, I only included the author (if identified), article title, newspaper title, date and page number:
“Slain with Ax and Pistol,” San Francisco Call, May 28, 1896, 1.
Partway through the process, I realized that the URL to the article should be included, and I started doing that.  But there are easily 100-150 incidents where I did not record the URL.  It would be useful to fill in those URLs.  All that is required is to search the Chronicling America website for the article title and limit the year range to the year of the article.  The URL appears at the bottom of the page.  One of you clever sorts with a still-functioning brain can write a program to grab each citation (like the one above) from a CSV file, produce a search string that finds the article in Chronicling America, and then copy the URL.  (There should be few duplicate article titles in the same year.  If need be, add the month to the search limitation.)  I know that this can be done.  I have written C# before that loaded webpages and followed the links in them.
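As a starting point, here is a minimal Python sketch of the first two steps: splitting a citation into its parts and building a search URL limited to the article's year.  Everything in it is an assumption for illustration.  The names parse_citation and build_search_url are hypothetical, the regular expression is fitted only to the one sample citation above, and the endpoint and query parameters (andtext, date1, date2, dateFilterType, format) are what I understand the Chronicling America page search to accept; they should be checked against the live site before relying on them.  Fetching the results and copying the article URL out of them is left out.

```python
import re
from urllib.parse import urlencode

# ASSUMPTION: page-search endpoint of Chronicling America; verify before use.
SEARCH_BASE = "https://chroniclingamerica.loc.gov/search/pages/results/"

# Fitted to the sample form:  “Title,” Newspaper, Month Day, Year, Page.
# An optional author may precede the opening quotation mark.
CITATION_RE = re.compile(
    r'^(?P<author>.*?)[“"](?P<title>.+?),?[”"]\s*'
    r'(?P<paper>.+?),\s*'
    r'(?P<month>[A-Z][a-z]+)\.?\s+(?P<day>\d{1,2}),\s*'
    r'(?P<year>\d{4}),\s*(?P<page>[^.]+)\.?\s*$'
)


def parse_citation(citation):
    """Split one citation string into a dict of fields, or return None."""
    match = CITATION_RE.match(citation.strip())
    if match is None:
        return None  # does not look like the sample format; handle by hand
    fields = match.groupdict()
    # The author group keeps its trailing ", " so trim that off.
    fields["author"] = fields["author"].strip(" ,")
    return fields


def build_search_url(title, year):
    """Build a search URL for the title, limited to a single year."""
    # ASSUMPTION: these parameter names match the site's search form.
    params = {
        "andtext": title,
        "date1": year,
        "date2": year,
        "dateFilterType": "yearRange",
        "format": "json",
    }
    return SEARCH_BASE + "?" + urlencode(params)
```

To try it on one row, pass the citation text to parse_citation and, if the result is not None, hand its "title" and "year" values to build_search_url.  Rows that do not parse are probably best collected and fixed by hand rather than guessed at.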

2 comments:

David aka True Blue Sam said...

Nettie murdered with axe by husband; early twentieth century; Gladstone, IL. https://truebluesam.blogspot.com/2013/08/de-escalate-situation.html

metapundit.net said...

See the Python code at https://gist.github.com/simeonf/b89c9a1a021973f7266d0107514f1638 which processes a .csv file and prepends the URL of the newspaper article. I only tested it on the one sample data row you provided, so it may be fragile...