The Mudcat Café TM
Thread #139005   Message #3420025
Posted By: JohnInKansas
15-Oct-12 - 02:40 AM
Thread Name: Tech: Hand Held Scanners. Anyone Use Them?
Subject: RE: Tech: Hand Held Scanners. Anyone Use Them?
Joe -

I'm not familiar with "Microsoft Office Document Imaging" and nothing like that appears in my program list - at least not anywhere I can find it. All my scanners have had their own dedicated control panels, most of which include OCR conversion if you choose to use it.

The couple of OCR programs I've used make it pretty clear what's happening - or at least what's happening in cases that sound like the one you describe.

I usually save and correct in Word, so the following is what you see there - if you've turned on the options to show all the layout stuff:

Plain text that is fairly "orderly" gets converted as plain text and is mostly easy to deal with. It's a little annoying that the OCR almost always uses a paragraph break at the end of every line, so stuff isn't in logical paragraphs, but that's pretty easy to handle.
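If you'd rather not fix the line-end paragraph breaks by hand, the cleanup can be roughed out in a few lines of Python. This is only a sketch under my own assumptions, not anything an OCR program does for you: it guesses that a line which ends a sentence and is noticeably shorter than the widest line is the real end of a paragraph. The function name rejoin_lines and the short_ratio cutoff are made up for illustration.

```python
def rejoin_lines(text, short_ratio=0.7):
    """Merge OCR output that has a hard break after every line.

    Heuristic (an assumption, not a rule): a line that ends with
    sentence punctuation AND is shorter than short_ratio times the
    widest line is treated as the last line of a paragraph.
    Blank lines always end a paragraph.
    """
    lines = [ln.strip() for ln in text.splitlines()]
    nonblank = [ln for ln in lines if ln]
    if not nonblank:
        return ""
    width = max(len(ln) for ln in nonblank)

    paragraphs, current = [], []
    for ln in lines:
        if not ln:
            # A blank line closes whatever paragraph is in progress.
            if current:
                paragraphs.append(" ".join(current))
                current = []
            continue
        current.append(ln)
        ends_sentence = ln.endswith((".", "!", "?"))
        if ends_sentence and len(ln) < short_ratio * width:
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return "\n\n".join(paragraphs)
```

A last line that happens to run the full width will get glued to the next paragraph, so you still have to proofread the result.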

Especially for stuff converted and saved back into .doc or most other text-based formats, anything that's a little "out of line" in the scan frequently gets split into columns even if it was all one column in the original. The left column holds the first parts of the lines and the right column holds the second parts of the same lines, so if you reformat to a single column you get a stack of "half sentences" (the starting halves of the lines), followed by the "other ends of the lines" all stacked up at the bottom.

Sometimes you can convert to a table and "reassemble" the lines in the right order by cut-n-paste, but it's sort of a crap shoot. You can "merge columns" so that when you convert the table to text everything is lined up (although I generally do it a little differently).
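The same "reassemble" idea can be sketched outside of Word, for text that has already been flattened into one column. It assumes the best case: every original line was split exactly once, and the starting halves and ending halves are still in their original order. Real converter output often breaks those assumptions, and the names reassemble and unscramble are just mine.

```python
from itertools import zip_longest


def reassemble(left_halves, right_halves):
    """Pair the Nth starting half with the Nth ending half."""
    pairs = zip_longest(left_halves, right_halves, fillvalue="")
    return [
        " ".join(part for part in (left.strip(), right.strip()) if part)
        for left, right in pairs
    ]


def unscramble(flat_lines):
    """Handle a flattened single column: starting halves stacked
    first, ending halves stacked after them.

    Assumes the stack divides evenly in two; if it doesn't, the
    pairing will be off and needs checking by eye.
    """
    lines = [ln for ln in flat_lines if ln.strip()]
    half = len(lines) // 2
    return reassemble(lines[:half], lines[half:])
```

For example, unscramble(["The left column is", "but the right column", "the first parts", "is the second parts"]) pairs the first line with the third and the second with the fourth.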

Things that are even more "out of position" (in the opinion of the converter) are frequently put into either frames or text boxes, and the location in the converted document is wherever the converter puts the box/frame. If you "remove frame" the contents of the frame remain in the document, but can go just about anywhere in the document, so you have to chase them and cut/paste back where you want them.

If you just "remove/delete" a text box, all the contents are simply deleted. Theoretically you can copy what's in the box, paste it outside the box, and then delete the empty box, but it's often hard to get all of the content. The safer method is to "find the spot" on the text box border to right click and get a "format text box" window, which includes an option to "convert text box to frame." After that you find the spot on the frame border to right click to get a "format frame" box, which includes a button to "remove frame." That should always keep everything, but it may "jump it" to strange places.
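Before deleting anything, it can help to list what the converter tucked into boxes and frames. The sketch below only applies if the file is saved as .docx (not the older .doc), since a .docx is a zip archive with the body in word/document.xml. It assumes text box contents sit inside w:txbxContent elements and framed paragraphs carry a w:framePr property, which is how WordprocessingML marks them as far as I know. The function names are made up, and newer versions of Word may store each text box twice (a drawing plus a fallback copy), so duplicates in the list are possible.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, in ElementTree's {uri}tag form.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def read_document_xml(docx_path):
    """Pull the main body XML out of a .docx file."""
    with zipfile.ZipFile(docx_path) as archive:
        return archive.read("word/document.xml")


def find_boxed_text(document_xml):
    """Return (boxes, framed).

    boxes  - one list of paragraph strings per text box found
    framed - text of each paragraph that has frame properties
    """
    root = ET.fromstring(document_xml)

    def para_text(paragraph):
        # Join all the text runs inside one paragraph.
        return "".join(t.text or "" for t in paragraph.iter(W + "t"))

    boxes = [
        [para_text(p) for p in box.iter(W + "p")]
        for box in root.iter(W + "txbxContent")
    ]
    framed = [
        para_text(p)
        for p in root.iter(W + "p")
        if p.find(W + "pPr/" + W + "framePr") is not None
    ]
    return boxes, framed
```

Typical use would be boxes, framed = find_boxed_text(read_document_xml("scan.docx")), where scan.docx is a placeholder file name. This only reports the text; it doesn't move or remove anything in the document.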

Excessive use of columns, frames, and text boxes by converters is the most frequent reason for everything being scrambled when you try to copy text and paste it into a "just text" document, or when you convert directly to text. In a program (like Word) where you can turn on the option to see all the non-printing stuff on screen, it's fairly easy to see what happened. Fixing it is just a P.I.A.

Your mileage may vary, of course; but nobody really gets good mileage out of this stuff. All of the bragging about "easy format conversions" is pretty much a bunch of fish stories, and conversion from scans (images) to textish formats just doesn't work really well with any programs I've seen. You can get "useful" results, with a little bit of cleanup; but if you're fussy about it, it can take a lot of work to get "satisfying" ones.

John