The Mudcat Café TM
Thread #20301   Message #860820
Posted By: JohnInKansas
07-Jan-03 - 01:33 PM
Thread Name: Tech: Question - Scanners
Subject: RE: Tech, Question - Scanners
Also at Joe - re your very old post from May 00:

I've found that taking a color scan - even if the source is black and white, then putting the color image into Photoshop Elements and converting to black/white will take most of the yellow out of old pages, and usually leaves a pretty clean image. If there's too much "gray" left where the yellow was, you can do an image "adjust threshold" on the black and white to take the gray out - and also get rid of small specks that can interfere with the OCR.
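For anyone who wants to see roughly what those two steps amount to, here is a minimal sketch in plain Python. It is not what Photoshop Elements does internally. The function names, the luminance weights (the common Rec. 601 ones) and the cutoff of 128 are all my own assumptions for illustration.

```python
def to_gray(rgb_rows):
    """Convert rows of (r, g, b) pixels to 0-255 gray values.

    Assumption: Rec. 601 luminance weights. A yellowed page has high red
    and green values, so it comes out as a fairly light gray.
    """
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_rows
    ]


def threshold(gray_rows, cutoff=128):
    """Force every pixel to pure white (255) or pure black (0).

    Assumption: a fixed cutoff. Anything at or above it becomes white,
    which removes leftover gray and faint specks lighter than the cutoff.
    """
    return [[255 if v >= cutoff else 0 for v in row] for row in gray_rows]


# Hypothetical 2x2 "scan": yellowed paper, dark ink, a faint speck, paper.
page = [
    [(230, 215, 160), (40, 35, 30)],
    [(200, 190, 150), (230, 215, 160)],
]
clean = threshold(to_gray(page))
```

The cutoff is the thing to play with: set it too high and thin strokes in the type start to break up, too low and the gray stays in.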

I've found that one of the most critical factors in how well OCR works is that the scan has to be "square to the page." Unfortunately, some of the old books weren't printed that way - wavy lines and slopes across the page, etc. TextBridge (usually) does a "straighten" operation before starting the OCR conversion, but it can only handle small amounts of skew automagically.
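I don't know how TextBridge does its "straighten" internally, but one common way to estimate skew is a projection-profile search, sketched below in plain Python. The function name, the 0/1 image layout (1 = ink), the 5 degree search range and the 0.5 degree step are all assumptions for illustration. The idea is to try a range of angles and keep the one that packs the ink into the fewest, fullest rows.

```python
import math


def estimate_skew(ink_rows, max_deg=5.0, step_deg=0.5):
    """Estimate page skew in degrees from a binary image (1 = ink).

    For each candidate angle, every ink pixel is shifted vertically as if
    the page were sheared back by that angle, and the pixels landing in
    each row are counted. Straight text lines give a few very full rows,
    so the angle with the largest sum of squared row counts wins.
    Positive angles mean lines that run downhill to the right, with y
    counted downward from the top of the image.
    """
    points = [
        (x, y)
        for y, row in enumerate(ink_rows)
        for x, value in enumerate(row)
        if value
    ]
    steps = int(round(max_deg / step_deg))
    best_angle, best_score = 0.0, -1
    for i in range(-steps, steps + 1):
        angle = i * step_deg
        slope = math.tan(math.radians(angle))
        counts = {}
        for x, y in points:
            row = round(y - x * slope)
            counts[row] = counts.get(row, 0) + 1
        score = sum(c * c for c in counts.values())
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle
```

This only finds one overall angle for the page, so it won't help with the wavy lines in some old books, and it says nothing about how to rotate the image once the angle is known.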

Another thing that sometimes confuses the OCR is "bleed through" from what's on the other side of the page. It may be there - enough for the program to see - even if it's not obvious to the eye. You solve that one by using a sheet the same color as what might bleed placed on top of the page being scanned - e.g. black paper as a backing for newspapers - so that the scanner sees a uniform color (any uniform color works) reflected off the back of the sheet. Helps a lot sometimes.

Of course, since you asked two years ago, you've probably learned more tricks than I have by now ...

John