Conservative. Idaho. Software engineer. Historian. Trying to prevent Idiocracy from becoming a documentary.
Email complaints/requests about copyright infringement to clayton @ claytoncramer.com. Reminder: the last copyright troll that bothered me went bankrupt.
Wednesday, October 12, 2016
Would You Like to be One of My Research Minions?
I have hundreds of pages of letters, most scrawled in Sam Colt's horrible 19th-century cursive and often poorly spelled. I need to crowd-source converting these to text; trying to OCR them would set fire to my PC. Would you be interested in helping? The images are large and high-resolution (usually 12 MP). Contact: clayton ------at------- claytoncramer.com.
I'll try one. Post the images somewhere, and I'll pull one down and transcribe it. You might want to create some kind of log or tag so people know who's working on what and don't duplicate effort. Perhaps a structure with an editable text page next to the source image.
Mechanical Turk is a great example of how to do such a project. A set number of tasks get handed out, each with a time to completion attached, and you work on one. If you don't hit the metrics, the task goes right back into the undone pile. Tasks can be done multiple times to improve the certainty that the answer is correct.
If you've got a budget for this, submit the pages to Mechanical Turk. If you're looking for enthusiasts to do it for free, replicate the workflow without incurring the overhead costs.
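For the do-it-yourself route, here is a minimal sketch of that workflow in Python: each page is leased to a volunteer with a deadline, an expired lease puts the page back in the undone pile, and a page is finished once two different volunteers have submitted it. The class and method names (`TaskQueue`, `checkout`, `submit`) and the defaults (one-hour lease, two copies per page) are hypothetical; this is not Mechanical Turk's API.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Page:
    name: str
    leases: dict = field(default_factory=dict)          # worker -> deadline (epoch seconds)
    transcriptions: dict = field(default_factory=dict)  # worker -> submitted text


class TaskQueue:
    """Hypothetical Mechanical Turk-style work pool (not MTurk's API).

    A lease is valid while its deadline is strictly in the future.
    """

    def __init__(self, lease_seconds: float = 3600.0, copies_needed: int = 2):
        self.lease_seconds = lease_seconds
        self.copies_needed = copies_needed  # independent copies per page (assumption)
        self.pages: dict[str, Page] = {}

    def add_page(self, name: str) -> None:
        self.pages[name] = Page(name)

    def checkout(self, worker: str, now: Optional[float] = None) -> Optional[str]:
        now = time.time() if now is None else now
        for page in self.pages.values():
            # Drop expired leases so the page goes back into the undone pile.
            page.leases = {w: d for w, d in page.leases.items() if d > now}
            if worker in page.leases or worker in page.transcriptions:
                continue  # keep each volunteer's copy independent
            if len(page.leases) + len(page.transcriptions) >= self.copies_needed:
                continue  # enough copies already done or in progress
            page.leases[worker] = now + self.lease_seconds
            return page.name
        return None  # nothing available for this volunteer right now

    def submit(self, worker: str, name: str, text: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        page = self.pages[name]
        deadline = page.leases.pop(worker, None)
        if deadline is None or deadline <= now:
            return False  # no lease, or it expired: the work is not accepted
        page.transcriptions[worker] = text
        return True

    def finished(self) -> list[str]:
        return [p.name for p in self.pages.values()
                if len(p.transcriptions) >= self.copies_needed]
```

Passing `now` explicitly makes the timeout behavior easy to test; in a real site it would default to the clock.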
How about this? A suite of pages, one for each document, with an editable text box next to each image. I would also recommend a history function, so that every revision of the text box is saved. If nothing else, this would protect against vandals. I could see anti-gun trolls trying to sabotage the project.
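A minimal sketch of that history function, assuming a hypothetical `TranscriptionPage` model with one source image and an append-only revision list. Saving never overwrites, and reverting vandalism is itself recorded as a new revision, so nothing is ever lost.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Revision:
    author: str
    text: str
    saved_at: datetime


@dataclass
class TranscriptionPage:
    """Hypothetical page: one source image plus every saved version of its text box."""
    image_url: str
    history: list[Revision] = field(default_factory=list)

    @property
    def current_text(self) -> str:
        # The newest revision is what the text box shows.
        return self.history[-1].text if self.history else ""

    def save(self, author: str, text: str) -> None:
        # Never overwrite: every edit becomes a new revision.
        self.history.append(Revision(author, text, datetime.now(timezone.utc)))

    def revert_to(self, index: int, author: str) -> None:
        # Undoing vandalism re-saves an older text as a new revision.
        self.save(author, self.history[index].text)
```

Because each revision records its author, a vandal's edits can also be found and reverted in bulk.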
TM: Mechanical Turk is the way to do it. But how do I specify the HTML for a sequence of URLs?
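One way to do it is with Mechanical Turk's batch feature. A project's HTML layout contains a placeholder such as `${image_url}`, and you upload a CSV whose header row names that variable; each row becomes one task. The sketch below generates that CSV. The base URL and file names are hypothetical, and the layout string is only a rough illustration of where the placeholder goes.

```python
import csv
import io

# Hypothetical location of the scanned letters; substitute the real host.
BASE_URL = "https://example.com/colt-letters/"

# Rough illustration of a task layout: MTurk fills ${image_url} from each CSV row.
LAYOUT = """
<img src="${image_url}" width="100%">
<textarea name="transcription" rows="20" cols="80"></textarea>
"""


def build_batch_csv(filenames: list[str]) -> str:
    """Return CSV text with one row per image. The header `image_url` must
    match the placeholder name used in the layout."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["image_url"])
    for name in filenames:
        writer.writerow([BASE_URL + name])
    return buf.getvalue()
```

Adding more columns (for example a letter date) works the same way: one header per `${...}` variable in the layout.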
I have genealogy skills, including deciphering handwriting. Transcribing can be great, but you need quality control as well. Most genealogical and historical transcription projects use two transcribers per page (each working independently) and an arbitrator who compares the two transcriptions and selects the more accurate reading.
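The comparison step can be partly automated so the arbitrator only looks at the spots where the two transcribers disagree. A minimal sketch using Python's standard `difflib` to compare word by word; this is one possible technique, not necessarily what genealogical projects use, and the function name `disagreements` is hypothetical.

```python
import difflib


def disagreements(first: str, second: str) -> list[tuple[str, str]]:
    """Compare two independent transcriptions word by word and return each
    span where they differ as (first's reading, second's reading)."""
    a, b = first.split(), second.split()
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [(" ".join(a[i1:i2]), " ".join(b[j1:j2]))
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]
```

For example, `disagreements("I recd your letter of the 5th", "I recd your letter of the 8th")` yields `[("5th", "8th")]`, and an empty list means the two copies agree and the page may not need arbitration.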
Your best bet might be to approach a historical society or genealogical society to see if they might want to take up the task. I'd love to be a part of it if I could.