The Mudcat Café TM
Thread #20301   Message #1712802
Posted By: JohnInKansas
07-Apr-06 - 06:07 PM
Thread Name: Tech: Question - Scanners
Subject: RE: Tech: Question - Scanners
Joe - (I think I'm looking only at your recent questions)

For text, TextBridge OCR can import any .jpg image file for conversion, so you can, at least in principle, use your digital camera and download the images to the computer. If you're saving in some other format, use any decent image editing program to convert to .jpg.

I'm assuming that a different OCR program should have a similar ability to open files from your computer, recognize them, and paste the text into your word processor, and will likely use .jpg as its "native" format. You may need to adjust for your OCR program's capabilities.

You'll likely get the "flattest" image by setting the camera up square to the page a fair distance away. You get more distortion, and uneven focus across the width of the page, with a closeup.

From a sufficient distance to wipe out the closeup distortion and get even focus across the page, you'll probably get the page and a fair amount of background. Most digital cameras now have more than enough pixels that you can take the shot at maximum (or "high" with a better camera than mine) resolution from a fair distance and crop off any "background." What's left in the middle (and fairly flat) will still have enough resolution for OCR when resized back to the original page size.
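A rough way to check whether a shot from a distance leaves enough pixels after cropping. This is only a sketch of the arithmetic; the function name and the sample numbers (a 3000-pixel-wide frame, a page filling 60% of it, an 11 inch page) are made up for illustration, not taken from any particular camera.

```python
def effective_dpi(frame_px_long, page_fraction, page_long_in):
    """Pixels per inch left on the page after cropping off the background.

    frame_px_long: pixels along the long side of the full photo
    page_fraction: share of that side the page actually fills (0 to 1)
    page_long_in:  real length of the page's long side, in inches
    """
    if not 0 < page_fraction <= 1:
        raise ValueError("page_fraction must be between 0 and 1")
    return frame_px_long * page_fraction / page_long_in


# Hypothetical shot: 3000 px long side, page fills 60% of it, 11 inch page.
print(round(effective_dpi(3000, 0.6, 11.0)))  # about 164 pixels per inch
```

If the number comes out comfortably above what your OCR program wants, you can afford to back the camera off further.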

With most bindings, the page you're taking the shot of will lie flattest with the opposing page raised a bit, so you may want to set up a way to support the "other side of the book" in a slightly elevated, and adjustable, position. If you use a tripod to support the camera, you may want the whole work surface tilted so it faces "square to the camera."

Sharp focus on the page you want will help, and since lots of digital cameras "autofocus" using visible or near-visible light, the page you're shooting should be fairly brightly lit, but with care to eliminate any glare. (You may have other means of focus control on your camera, but a fairly bright scene is needed for mine to home in accurately.) I'd suggest a "stand" or at least a tripod to set up your camera, and in any "fixed setup" you should use a cable release to avoid jarring the camera when you click, if your camera is equipped for one (mine isn't).

OCR usually works better with low to moderate resolutions in the images being interpreted, so your photos, after cropping to page size, should be resized so the image has about 150 dpi (or even a bit less) at the original page size. It also helps, sometimes, to be sure that the final .jpg is B/W rather than color or grayscale.
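The resize target is just page size times dpi, and "B/W" means every pixel ends up either black or white. Here is a minimal sketch of both steps in plain Python, treating the image as rows of grayscale values (0 = black, 255 = white). The function names and the cutoff of 128 are my own assumptions for illustration; a real photo editor will have its own resize and threshold controls.

```python
def target_size(page_w_in, page_h_in, dpi=150):
    """Pixel dimensions for a page of the given size (inches) at the given dpi."""
    return round(page_w_in * dpi), round(page_h_in * dpi)


def to_black_and_white(gray_rows, threshold=128):
    """Force each grayscale pixel to pure black (0) or pure white (255).

    The threshold of 128 is an assumed midpoint; faint print may need a
    different cutoff.
    """
    return [[255 if p >= threshold else 0 for p in row] for row in gray_rows]


# A letter-size page (8.5 x 11 inches) at 150 dpi:
print(target_size(8.5, 11))  # (1275, 1650)

# A tiny 2 x 3 "image" of gray values:
print(to_black_and_white([[10, 130, 250], [127, 128, 200]]))
# [[0, 255, 255], [0, 255, 255]]
```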

A camera is usually a bit less sensitive than a scanner to bleed-through of what's on the back side of the page, but it still may help to slip a sheet of colored paper under the page being copied. The color should be similar to the color of any text/markings on the back side of the page, so usually black is the color. You may be able to use your camera's flash, but since "facing the page squarely" is fairly critical for best results, it may produce too much reflected glare - in which case you'll want other fairly bright lighting.

It's the filetype that matters, so far as your OCR program should be concerned, so you can scan a newspaper page and save it as .jpg to get an image you can use to figure out how to get your OCR program to import "files from disk" and put them in your text editor/wp program.

Note - I've had some success with photo-to-OCR, but my camera demands brighter lighting for sharp focus than I've had for my jury-rig experiments. You may need to build yourself a good setup with a flat place to lay the book, support for the "other side" of the book, some added light, and a stand/tripod to get the camera in a consistent position, in order to get really good results. And learn to push the shutter button slowly enough to let the camera's autofocus home in.

Since you will probably need to do some processing on your camera images, your setup can be "one-sided." A 180 degree rotation of an image is lossless, so if you take a picture "upside down" your photo editor can turn it around with no effect on the image quality. (Rotation by anything other than 90 degree increments does (theoretically) induce a minor amount of "blur," but you'd have difficulty finding it in most photo images.)
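To see why a 180 degree turn costs nothing: it only moves pixels to new positions, and no pixel value has to be recalculated. A small sketch in plain Python, with the image as a list of rows; the function name is just for illustration, and this shows the pixel shuffle itself, not what any particular photo editor does when it re-saves the file.

```python
def rotate_180(rows):
    """Turn an image upside down by reversing the row order and each row."""
    return [row[::-1] for row in rows[::-1]]


image = [[1, 2, 3],
         [4, 5, 6]]

flipped = rotate_180(image)
print(flipped)                       # [[6, 5, 4], [3, 2, 1]]
print(rotate_180(flipped) == image)  # True: turning it back restores every pixel
```

An arbitrary angle (say 3 degrees) has no such one-to-one mapping, so the new pixels have to be estimated from their neighbors, which is where the slight "blur" comes from.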

Hand-scanners have all but disappeared from the market. The only ones who may use them are people who "live out of their laptop," and most of them rely on "the company they're visiting" to provide scanning. My experience with one back when Win95 was "modern" was not impressive, although current ones should be more usable - but are rare and much more expensive.

The library/archive scanner setups I've seen are very expensive - starting at around $3,000 for a minimal setup. They are - in essence - just a digital camera with an enormous focal plane shutter, and expensive dedicated software. Your camera, with a careful setup, can simulate, but probably not equal, what they can do; but you probably can get results that will satisfy your needs.

You'll need to verify that your OCR can import files from disk for reading. Verify that .jpg files work or find what other format can be used. Take careful pictures, and do a bit of processing. You'll soon (wry grin) have everything archived digitally.

And do make your "capture" setup "user friendly." Leaning over an awkwardly placed scanner - or camera or book setup - can do horrrrible things to the backsides of ol' farts like us, especially if it's repetitive for long sessions.

John