fairly secret: the tower of approach: OCR

Friday, October 01, 2010

OCR

I've said something to this effect elsewhere before, but one of the pleasures of being a late technological adopter is taking delight in some technical sleight of hand that is utterly old hat to the world in general. I ran optical character recognition for the first time this morning and Great Scott, my computer can read hard copy!

This shows up here because lately I have been digging into my files in search of all the songs I've written outside of the daily projects, so that I'll have something to post during the interim: the current project I'm blogging ended on January 5, 2001, and the second daily writing project started on May 4, 2005. I've become attached to posting songs on the same date they were written, hence there's going to be a four-month gap. I'm pretty sure I don't have anything like enough extant songs to post daily, but it will at least keep the thing alive. For all its millions of avid readers.

Anyway.

So I've gotten through the easy stuff, organizing the songs I'd written on the computer in the first place or transcribed earlier for some reason, and then through the less easy stuff, cleaning up the songs I'd transcribed in the distant past that existed in file formats mangled by subsequent technological drift. That left a substantial file of handwritten hard copy and a small stack of printed songs for which the originating files have apparently been lost.

There's nothing for the former but to get them transcribed, same as the Songs of Days books; humans can barely decipher my handwriting, let alone computers. But the latter is a source of irritation. I typed the damned things into a computer at some point, and furthermore I have gone to a certain amount of trouble and expense over the years to drag all my files kicking and screaming through the various shifts in computer technology. The hoops I jumped through when Apple abolished the floppy disk (and my next most recent computer had neither an optical disk writer nor a USB port) were ridiculous. And yet here are files that clearly once existed, their physical manifestation undeniable before me, yet gone. I have to retype the things.

But wait, I think, what about this wonderful Optical Character Recognition technology I've heard so much about? Now, commercial OCR has been around for decades, and consumer-level OCR has been around for, well, decades... I just never bothered with it. My current plight inspired me, though: I fired up the scanner, started translating the pages into TIFF files, found a free service online (albeit with a Captcha requirement and a limit of ten uploads per hour), and literally a minute later there it was, a paper document rendered into copyable, pastable digital text. Magic.

Easy as it was (the source material was ideal for OCR: clean, first-generation printouts on single pages), I ironically suspect that, between setting it up, figuring it out, and (now) choosing to go online and write about it, I'd have been done with this whole thing half an hour ago if I'd just sat down and typed the things. But that wouldn't have been cool (even if it's only cool to me at this point).
