The Mudcat Café TM
Thread #118075   Message #2549823
Posted By: Joe Offer
26-Jan-09 - 07:47 PM
Thread Name: Tech: OCR Tips - Optical Character Recognition
Subject: Tech: OCR Tips - Optical Character Recognition
Somebody has a nice songbook in PDF format, and wanted to post a number of songs from the book here at Mudcat. Since the book was a PDF, the member said that it couldn't be copy-pasted here.
Well, it can't be posted here as an image, but it's relatively easy to OCR just about anything, and post it here.

The tools I like best for OCR are Microsoft Office Document Imaging and MS Office Document Scanning. They come with Word and Office, but they are not installed in a normal installation. Go to Control Panel/Programs and select "uninstall a program." A list of the programs installed on your computer will appear on your screen. Highlight Microsoft Office, and an option to "change" the program will appear. Select "Office Tools" and choose to run MS Office Document Imaging from your computer. Office will then install the program, and you'll find it under "Office Tools" on your list of programs.

MS Office Document Scanning operates your scanner and takes an image of the document. Then you can highlight the portion you want to copy, right-click, and select "copy" to get OCR text that you can paste into a Mudcat box or word processing document.

Here's a way to OCR PDF files: Adobe Reader allows you to highlight and copy an image from a PDF document. You can take that copied image and paste it into Microsoft Office Document Imaging (mspview.exe) by going to page/paste page. That program does a pretty good job of OCR. Highlight and copy the recognized text, and then you can just paste it into a Mudcat message box (or Word document) and edit it. Sometimes, it comes out with no OCR errors at all (but that's rare).
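
If you end up doing a lot of that editing by hand, part of the cleanup can be scripted. What follows is only a rough sketch in Python, not anything that comes with the Office tools: the function name tidy_ocr_text is made up for illustration, and it assumes the usual layout debris in pasted OCR text (doubled spaces, words split by a hyphen at the end of a line, stacks of blank lines). It keeps the line breaks, since those matter for song lyrics, and it won't catch misread letters, so you still need to proofread.

```python
import re


def tidy_ocr_text(raw: str) -> str:
    """Tidy layout debris in pasted OCR text (hypothetical helper)."""
    # Normalize Windows and old Mac line endings to plain newlines.
    text = raw.replace("\r\n", "\n").replace("\r", "\n")
    # Rejoin words split by a hyphen at a line break: "can-\nnot" -> "cannot".
    # Assumption: a hyphen at the end of a line is a split word, so a real
    # compound such as "well-\nknown" will lose its hyphen.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Collapse runs of spaces and tabs into a single space.
    text = re.sub(r"[ \t]+", " ", text)
    # Trim stray spaces at the start and end of every line.
    text = "\n".join(line.strip() for line in text.split("\n"))
    # Squeeze three or more newlines down to one blank line between verses.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()


if __name__ == "__main__":
    sample = "The  water is   wide,\r\nI can-\nnot get o'er \n\n\n\nAnd neither"
    print(tidy_ocr_text(sample))
```

Paste the OCR text into the sample string (or read it from a file), run the script, and then do your proofreading on the tidied result.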

-Joe-