Comments on Clayton Cramer.: A Long Read With Something Bizarre
Feed updated 2024-03-27T08:40:31.785-06:00

Posted 2020-10-23T09:54:57.860-06:00

Only pages 1, 2, and 478 are text. Pages 3 through 477 are scanned images. If you zoom in on any one of those pages, you can clearly see the artifacts from the scanning process.

The PDF was created with <a href="https://www.adobe.com/support/pdfs/CapturePlugInHelp.pdf" rel="nofollow">Adobe Paper Capture Plug-in</a>.

"You can use Adobe Acrobat with a scanner to create Adobe PDF files from paper documents. The resulting file is a PDF Image Only file—that is, a bitmap picture of the pages that can be viewed in Acrobat but not searched.

"If you want to be able to search, correct, and copy the text from an Adobe PDF Image Only file, you can “capture” the pages in three file formats. Adobe PDF Formatted Text and Graphics, PDF Searchable Image (Exact) and Searchable Image (Compact) all apply optical character recognition (OCR) and font and page recognition to the text images and convert them to normal text. The Searchable Image file types have a bitmap image of the pages in the foreground, and the captured text on an invisible layer beneath it."

Hal Duston

Posted 2020-10-22T19:25:58.664-06:00

Clayton, your humble informant here.... In the Dec. 11th version (first version was Dec. 9th, which I did not see) <b>every single instance</b> of Comey's name was spelled "Corney," with kerned r + n.
If memory serves, they were spread across 65 pages of the pdf. Before I wrote you about this the other day I went through my copy of the Dec. 11th version just to verify that that was the case. The next released version had all but two instances corrected to "Comey" with an m.

And I agree with you that it's highly unlikely that any OCR process was involved in the preparation of this document, not here in 2019.

brian

Posted 2020-10-22T18:48:51.951-06:00

The number of documents which should have been printed directly to PDF but instead are printed then scanned is truly astonishing, and it's not just government that does it. At least with mostly-text documents, the OCR keeps the file size somewhat reasonable. Construction plan sets are horrible - sheets that are a quarter MB printed as PDF are 3 MB when scanned. And even small projects will have hundreds of sheets.

There's a term for when the spacing between letters isn't right and makes it difficult to read: keming.

Anthony

Posted 2020-10-22T15:39:57.270-06:00

Hal & RevGreg: For a 1970s document, they might have scanned it.
All current Word documents would be exported as a PDF.

J Melcher: That would be readily done with a keyboard macro.

Clayton Cramer

Posted 2020-10-21T21:36:27.803-06:00

I've seen things like that occur with optical character recognition from scanned documents, but why there would be scanning involved these days is anyone's guess. My low opinion of government's idea of efficiency wouldn't rule it out though.

RevGreg

Posted 2020-10-21T20:37:46.194-06:00

I've seen this sort of thing in many, many PDF files. It is caused by the original source being a scanned image which is then run through OCR without correcting any of the inevitable errors.

Hal Duston

Posted 2020-10-21T15:15:24.971-06:00

Would the r + n kerning job be possible with a "search and replace"?

J Melcher
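
The "rn" versus "m" confusion discussed in this thread can be illustrated with a minimal Python sketch. It is not taken from any commenter and says nothing about how the PDF in question was actually produced. It only shows one way to flag words that become an expected name once "rn" is read as "m". The name list `KNOWN_NAMES` and both function names are hypothetical, chosen for illustration.

```python
import re

# Hypothetical list of names expected in the text (an assumption for
# illustration, not extracted from the document discussed in the thread).
KNOWN_NAMES = {"Comey"}


def find_rn_confusions(text, known_names=KNOWN_NAMES):
    """Return (found_word, likely_intended) pairs where reading "rn" as "m"
    turns a word in the text into one of the known names."""
    hits = []
    for word in re.findall(r"[A-Za-z]+", text):
        candidate = word.replace("rn", "m")
        if candidate != word and candidate in known_names:
            hits.append((word, candidate))
    return hits


def fix_rn_confusions(text, known_names=KNOWN_NAMES):
    """Rewrite only whole words that map to a known name, so ordinary
    words containing "rn" (such as "corner") are left untouched."""
    def repl(match):
        word = match.group(0)
        candidate = word.replace("rn", "m")
        return candidate if candidate in known_names else word

    return re.sub(r"[A-Za-z]+", repl, text)


sample = "Director Corney met the attorney at the corner."
print(find_rn_confusions(sample))
print(fix_rn_confusions(sample))
```

Checking candidates against a list of expected names, rather than replacing every "rn" blindly, is the design choice here: a plain search and replace would also mangle words like "attorney" and "corner".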