The Mudcat Café TM
https://mudcat.org/thread.cfm?threadid=60313
22 messages

Tech- Scan multiple pages to one file?

09 Jun 03 - 05:41 PM (#964728)
Subject: BS: Tech- Scan multiple pages to one file?
From: katlaughing

Anyone know how to do this? I want to scan a document of about 100 pages into ONE file in Word or some other wp program. If I scan and tell it to save to the same file, each time it asks if I want to save it by dumping what I've saved before.

Is there some way to do this? Do I need to open a file with multiple pages already set up, so that it sees there is room or what? I cannot imagine the stupidity of having to scan in 100 pages, then copy and paste each one into a single file doc!:-)

Any help appreciated!

kat


09 Jun 03 - 06:39 PM (#964773)
Subject: RE: BS: Tech- Scan multiple pages to one file?
From: NicoleC

You need two things:

OCR software (may have come with your scanner) that supports your preferred MS Word-like software. AND plenty of time to go through and correct all the OCR errors, of which there may be many.

Either a scanner with a document feeder that automatically creates one file, or your scanner software has to support stitching pages.

Some software -- like HP's -- will scan directly to Word and will ask you if you wish to add a page. However, doing 100 pages one page at a time would be a pain. Your local copy shop probably can do this for you on a high speed machine in about 30 seconds, but they might have to do it to PDF.


09 Jun 03 - 06:46 PM (#964775)
Subject: RE: BS: Tech- Scan multiple pages to one file?
From: katlaughing

Hey! That's a great idea Nicole, thanks! I will call them. Also, thanks for the other info. I have HP, but it doesn't prompt me with any question. You're right about it being tedious, but I am used to it. I had to do the same thing with hundreds of vintage photos, although of course I wanted them each in their own file.

Thanks a bunch,

kat


09 Jun 03 - 07:05 PM (#964787)
Subject: RE: BS: Tech- Scan multiple pages to one file?
From: JohnInKansas

Kat -

One of my scanner programs prefers to put multipage scans all into a single file. The only problem with that is that it is the only program I have that can read the multi-page TIF it creates.

I generally use TextBridge for most text scanning on small jobs where I want OCR to convert to a real "text" file; and scan direct from Word. It gives you an "add page" button, so you can scan as many pages as you want before you turn it loose on the OCR conversion. (TextBridge has the best OCR I've found, but for larger jobs I usually scan to individual images and convert them one or two pages at a time - because of the heavy editing needed to get clean text.)

For things that I intend to leave as graphics, I usually do an "import TWAIN" from Photoshop Elements, and if you do a "preview" before you scan each page, it will keep all the previous pages so that you can save them each as separate files after you close the scanning program. If you save them in a format compatible with Word, you can then paste the "pictures" into Word to make a single document file.

If you're using a "generic" scanner program, you might try doing a "preview" before each "scan" and see if it will retain the previous scan(s). This is a common "this is a new page" signal in many scanner programs. Otherwise, your obvious option is to save each one before you scan the next - and paste them into Word (insert - picture - from file) to get them as a single multipage file. For most "inserts" into Word, jpg files work well and make somewhat smaller documents than tif or bmp pictures. (Theoretically, the latest Word cannot handle a single file over 32MB, but I've got several at >105MB or so that haven't crashed yet.)

John


09 Jun 03 - 07:24 PM (#964800)
Subject: RE: BS: Tech- Scan multiple pages to one file?
From: JohnInKansas

Kat -

The simple answer to your original question is to do a "Save As" instead of just save. This gives you the chance to put a new filename in the box. When you put the first filename in - something like "docPage001," you can highlight the new filename right there in the box and do a Ctl-C to copy it.

Then when you Save-As the next scan, you just Ctl-V to paste the name in the box, and "roll" the last digit to "docPage002," etc.
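For anyone who would rather generate the page names than type them, the numbering scheme described above can be sketched in a few lines of Python. This is only an illustration: the "docPage" base name comes from the post, while the `.tif` extension, the three-digit width, and the function name `page_filename` are assumptions.

```python
def page_filename(base: str, page: int, ext: str = "tif", width: int = 3) -> str:
    """Return a zero-padded name such as 'docPage001.tif'."""
    return f"{base}{page:0{width}d}.{ext}"


# One name per page for a 100-page document.
names = [page_filename("docPage", n) for n in range(1, 101)]
print(names[0], names[1], names[-1])
```

The zero padding matters: with three digits, an alphabetical file listing keeps page 2 ahead of page 10, so the pages stay in order when they are later inserted into one document.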

If you have to place the pages on a scanner one at a time, the extra 3 or 4 seconds to do a preview before each scan isn't going to add that much to the time it takes to get your hundred pages scanned. If you don't have to have "real text" it doesn't really take long to paste 100 pages of images into Word, if you've set up your document margins to "take" the page size you're pasting.

If you really need OCR converted "real editable text," the time you'll need for all the corrections is probably going to be your biggest problem, whether you do single page or multiple page images - and I'd suggest doing it a very few pages at a time, simply because the corrections are a lot simpler that way.

Your print shop may scan your book for you, but they probably won't do the editing needed after conversion.

John


09 Jun 03 - 07:52 PM (#964816)
Subject: RE: BS: Tech- Scan multiple pages to one file?
From: Gareth

John has some interesting points.

Now in the past, and no doubt in the future, I have had to scan electoral registers (UK) - I've used TextBridge v9 for this into Access. Unfortunately there is no alternative to manual corrections.

On text, one of the problems with T'Bridge 9 is that line heights and fonts are not consistent. As most of my bulk scanning for text ends up as HTML I find that it is easiest to paste the text into a table produced with "Hotmetal"; this eases manipulation of font, size, structure etc, and I format and edit from this.

Judge for yer selves, the following Click 'Ere was the result of that process on a very long legal document, in this case faxed then photocopied.

Just my thoughts,

Gareth


10 Jun 03 - 01:01 AM (#964889)
Subject: RE: BS: Tech- Scan multiple pages to one file?
From: JohnInKansas

I've found that the best OCR conversion comes from rather carefully selecting simple blocks of text, so as to eliminate ALL pictures, lines, squiggles, changes in font size, and strange characters. It's fairly easy to scan a whole page to tif or jpg and remove the strange stuff in an image editor, or you can select what to scan - although it often means several separate scans per "page."

The few OCR programs I've used seem to belch on headlines, bars, funny "dotted list" symbols, pictures, and often throw anything they don't "recognize" to the end of the "document." Strange indents - or even misaligned scans often result in frames around stuff that make it very difficult to tell what comes next. And if you're editing the finished product in Word, all of those frames pretty much have to be removed before you can get to much of the text to fix it.

If you clean up the scans before sending them through your OCR, most of the frames can be avoided - and the conversion will be much cleaner.

For really difficult text, I've resorted to scanning (or cutting from the page scans) each paragraph to a separate file. Headlines should have their own file. All pictures should of course be separated out.

Once everything is reduced to plain "pictures of block text" TextBridge (or most any other) OCR lets you input any number of files, and will "recognize" them and put them in the order you've entered them. And once you've eliminated the "funny stuff," (largely by removing the "formatting" and "graphics") the conversions can be amazingly accurate. Your spell checker should get most of them.

John


10 Jun 03 - 01:09 AM (#964892)
Subject: RE: BS: Tech- Scan multiple pages to one file?
From: JohnInKansas

If you "recognize" to Word, the procedure I've found most effective is:

(1) Remove all frames. You do this by getting your cursor into a frame and selecting "Format" "Text Box" "Remove Frame." After you remove the first frame, a "right arrow" will often move you directly to the next frame - but it's not guaranteed.

(2) Get rid of the soft hyphens that most OCRs use in place of line-wrapping. These are usually an ASCII 030 character, and you can do an "Edit" "Find" "Find ^030" Replace with "nothing - a blank box" and "Replace All."

(3) Select the whole document (Ctl A) and change everything to the same font and the same point size. (Some OCRs often read blemishes on the paper as very small letters.)

(4) Run your spell checker - or visually find the errors and fix them.

(5) Reformat however you want.
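
If the OCR output has been saved as plain text rather than opened in Word, step (2) can be roughed out in Python. This is a sketch under assumptions, not a description of what any particular OCR program emits: it assumes the soft hyphen arrives as the Unicode character U+00AD, and the function name `clean_ocr_text` is made up for the example.

```python
import re

# Assumption: the OCR marks line-wrap breaks with the Unicode soft hyphen.
# Your OCR program may use a different character.
SOFT_HYPHEN = "\u00ad"


def clean_ocr_text(text: str) -> str:
    # Drop soft hyphens (and a line break right after one) so split words rejoin.
    text = re.sub(SOFT_HYPHEN + r"\n?", "", text)
    # Rejoin words split by a plain hyphen at a line end, e.g. "para-\ngraph".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    return text


sample = "The scan\u00adner reads each para-\ngraph as a block."
print(clean_ocr_text(sample))
```

The second substitution is deliberately blunt: it also closes up genuinely hyphenated words that happen to break at a line end, so a spell-check pass as in step (4) is still needed afterwards.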

John


10 Jun 03 - 12:02 PM (#965133)
Subject: SORRY - Lost my Cookie!
From: GUEST

Hmmm...I don't know if I have an OCR, or would it be a given with my scanner? At any rate, thanks for ALL of the info. At the moment, all I am dealing with are typewritten pages, done back in the 60's by my aunt. I'll try a couple of the things listed here, but I am beginning to think it would be easiest and no more time consuming to just type them in myself. That way I can do some intended editing along the way and NOT have to deal with the idiotic scanning errors which show up in text so often.:-)

I always learn so much from you folks. Thanks a bunch!

kat


10 Jun 03 - 12:27 PM (#965142)
Subject: RE: BS: Tech- Scan multiple pages to one file?
From: NicoleC

Kat -- Do you need to EDIT the text, and scan to an actual word processing file? Or do you just want the images all in one document for easier storage and viewing, like a PDF file or a multi-page TIFF? You don't need to muck about with OCR if the latter is the case.


10 Jun 03 - 12:41 PM (#965155)
Subject: RE: BS: Tech- Scan multiple pages to one file?
From: Stilly River Sage

Kat,

In my HP scanners the choice of type of file comes once the initial scan has taken place and I choose Output Type (and I save it to a text file). The computer then asks if I want to save new pages to the same file.

If you can save it to Notepad it might be easier, because Notepad is text only, not WYSIWYG like Word, so it doesn't try to turn what was scanned into a huge variety of symbols, of which the letters of the alphabet are only 26!

SRS


10 Jun 03 - 01:14 PM (#965188)
Subject: RE: BS: Tech- Scan multiple pages to one file?
From: katlaughing

Well, if I look at Save As in my scanner, I do see an OCR folder, but there is nothing in it.

SRS, when I choose to scan in my HP scanner, it asks me, beforehand, what type of file I want to scan to, i.e. Corel, text, WordPad, etc. I have to choose one of those before scanning. I don't see any Output Type prompt. Where would that be? Thanks!:-)

This scanner is about 5-6 years old, so maybe out-of-date compared to some you all may be using?

kat


10 Jun 03 - 11:35 PM (#965560)
Subject: RE: Tech- Scan multiple pages to one file?
From: Joe Offer

Hi, Kat - if you have Word 2000 or newer, it may have installed a folder on your "start" button called "Microsoft Office Tools." The folder has nifty "Document Imaging" and "Document Scanning" utilities. Both utilities can save multi-page image files, and "Scanning" does a very nice job of OCR.
I looked all over for utilities that would do what I wanted - and I found they were already installed on my computer. Made me feel dumb that I missed them, but they sure do great work now that I've found them.
Your scanner should be fine - it's the scanner software that does the tricks.
-Joe Offer-


10 Jun 03 - 11:39 PM (#965562)
Subject: RE: Tech- Scan multiple pages to one file?
From: katlaughing

Thanks, Joe, hanging my head to say I am way behind on Word; mine's about the same age as the scanner! We are looking at upgrading the hard drive and software this summer, though.:-)


10 Jun 03 - 11:45 PM (#965565)
Subject: RE: Tech- Scan multiple pages to one file?
From: GUEST,.gargoyle

~Laugh it up Kat...Laugh it up~!

~Sincerely,
Gargoyle

The "gods" must be crazy to have granted a LaughKat - "clone-status!"


15 Mar 12 - 12:24 PM (#3323204)
Subject: RE: Tech- Scan multiple pages to one file?
From: JohnInKansas

I was thinking of posting a note about a NEW FIND in scanners, and this really really old thread is about the only one that came up when I searched mudcat. I'm sure there are lots of other better ones, but I found this one interesting because of the changes in equipment available now, which border on what we'd have thought fantastic when the thread was new.

I've been running close to 1,000 pages per day through my newest scanner (less than 6 months old), bought specifically so that I could "digitally archive" bunches of old books that I occasionally use for reference, but that take up more shelf space than we've got. It's a "business class" scanner and doesn't do a particularly great job on pictures. Color is fair but gray-scale stuff often "solarizes." It only runs up to 8.5" wide, but I can flop 50 sheets on it, and it feeds and scans both sides in a single pass, without even needing to turn the sheet over. It can save an individual .jpg of each page or a .pdf of the whole bunch. And it came with a pdf maker program (that I upgraded to a little more complete one) that automatically does the OCR to make editable text when you save as pdf. The scanner can save directly as .jpg, .pdf, .tif, multi-tif, or .bmp. The (upgraded) pdf program lets me merge mixed bunches of any of those into a single .pdf, and once a "document" is created as a .pdf the program can save-as any of 17 different formats. On reasonably "clean text" the whole setup makes very few OCR errors - compared to what was available not very long ago.

For the pages I want better pictures on, I've been using a 12(?) year old flatbed scanner/printer that's about to blow its lamp, so I've been looking around for a replacement. I thought I found a good one, but "she" absconded with it; and we've found some unpleasantness with scanning using the "built-in" WIA interface. I think I can rig it for TWAIN, which would solve the little problems, but that's still "on the menu for later."

The other need for a flat bed is for the stuff that's too wide to go through the document feed, and I'd almost given up on finding any scanner that will scan more than 8.5" wide, for less than about $3,000 (US). It takes lots of scans to capture the whole of something that's 10 x 12 or so from an 8.5 x 11 bed, and good as it is, photomerge to put a "big picture" back together isn't always perfect, so the more pieces you have to stitch together the scabbier it gets.

The news I originally thought I wanted to post is that when I went back to check specs on a scanner I thought might be good enough, I found that Epson now has a couple of "wide format" machines in their lineup, that while not "cheap" are not unreasonable for what they claim to do. The one I ordered has a flatbed that can scan 11 x 17 (US B size), and it can print 13 x 19 (called Super B by some here) - A4 to A6 for the Euro folk. Somewhere in the notes it said it can print 13" x 40" but they didn't say how the $#@% you feed the paper in it to do that. Since I was pondering $200 for a scanner and another $150 for a decent printer, I figure the $299.99 for this beast isn't too bad a hit.

Since it's a "business class" machine, I'm not expecting the prints to be "photo quality" but samples they showed are definitely "magazine quality." The scan specs are good enough that scans may print photo grade on "her" (sniff) photo printer.

And they claim it does hold 500 sheets of letter size in the magazine, and it has a 30+ sheet(?) automatic document feed (ADF) for the scanner if you don't want to use it as a flat-bed.

I won't get the thing for about a week, so I can't actually recommend it until I get some trial runs, but anyone interested will have some idea where to look.

Almost any scanner or multi-purpose machine you can buy now does come with "scan to pdf" and OCR software, and they (mostly) do a pretty good job of both. It's the other "details" that you have to watch.

So if we ignore what Microsoft has been doing to us recently, the world is getting a little better.

John


15 Mar 12 - 01:38 PM (#3323220)
Subject: RE: Tech- Scan multiple pages to one file?
From: Nigel Parsons

JohninKansas;
Rarely do I get a chance to improve on one of your technical posts, but: ...
It takes lots of scans to capture the whole of something that's 10 x 12 or so from an 8.5 x 11 bed, and good as it is, photomerge to put a "big picture" back together isn't always perfect, so the more pieces you have to stitch together the scabbier it gets.

Surely only two scans are needed at 6.5*11 giving you a little margin for error & only two pieces to marry up.

Cheers

Nigel


15 Mar 12 - 02:09 PM (#3323238)
Subject: RE: Tech- Scan multiple pages to one file?
From: Newport Boy

John - A small correction to your "Euro" sizes (actually ISO sizes).

The one I ordered has a flatbed that can scan 11 x 17 (US B size), and it can print 13 x 19 (called Super B by some here) - A4 to A6 for the Euro folk. Somewhere in the notes it said it can print 13" x 40"

11 x 17 approximates to ISO A3 (11.7 x 16.5), which is double the size of A4 (8.3 x 11.7 - roughly equivalent to US Letter). A6 would be about 4.1 x 5.8.

The whole system starts with A0 at 1 square metre. All sizes are the same proportions, so you just halve the long side to get the next size down.
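
The halving rule is easy to check with a short Python sketch. It starts from the standard A0 sheet of 841 x 1189 mm (about one square metre) and rounds down to whole millimetres at each step; the function name `iso_a_sizes` is just a label for this example.

```python
def iso_a_sizes(max_n: int = 6) -> dict:
    """Return {'A0': (841, 1189), ...} with (short, long) sides in millimetres."""
    short, long_ = 841, 1189  # A0: about one square metre
    sizes = {}
    for n in range(max_n + 1):
        sizes[f"A{n}"] = (short, long_)
        # Halve the long side (rounding down); the old short side
        # becomes the long side of the next size down.
        short, long_ = long_ // 2, short
    return sizes


for name, (short, long_) in iso_a_sizes().items():
    print(f"{name}: {short} x {long_} mm")
```

Dividing any of the millimetre figures by 25.4 gives the size in inches for comparison with the US formats.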

My 'A3' printer is actually a 13 x 19 printer, and the input tray is also marked for 11 x 17, Letter & Legal.

Phil


15 Mar 12 - 02:24 PM (#3323244)
Subject: RE: Tech- Scan multiple pages to one file?
From: JohnInKansas

Eagle-eye Nigel caught my typo. I intended to say 20 x 12.

Anything over 8.5 wide takes two scans each end on most scanners, and if it's more than 11 long it's at least another two. Since most flatbed scanners have a "lip" at the edges of the glass, if the original goes past an edge there's a "curl" that sort of warps the scan, and you have to crop that off, so the maximum single scan if you're cutting out sections is actually about an inch less than the glass size. For a decent photomerge you need some overlap at all the joints, so take another inch off for the "usable chunk."

On an 8.5 x 11 scanner, it takes a minimum 6 cuts to get all of a Playboy centerfold - just as a randomly selected example.
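
The arithmetic in the paragraph above can be sketched in Python. Everything here is a simplified model rather than a rule for any particular scanner: it assumes one inch of glass is lost to the lip in each direction, a one-inch overlap at every joint, and a plain grid of scans; the function names are invented for the example.

```python
import math


def strips_needed(length: float, usable: float, overlap: float) -> int:
    """Overlapping scans of `usable` inches needed to span `length` inches."""
    if length <= usable:
        return 1
    # n strips cover n * usable - (n - 1) * overlap inches.
    return math.ceil((length - overlap) / (usable - overlap))


def scans_needed(orig_w, orig_h, glass_w=8.5, glass_h=11.0, lip=1.0, overlap=1.0):
    """Minimum grid of scans for an original, trying both orientations."""
    usable_w, usable_h = glass_w - lip, glass_h - lip
    a = strips_needed(orig_w, usable_w, overlap) * strips_needed(orig_h, usable_h, overlap)
    b = strips_needed(orig_h, usable_w, overlap) * strips_needed(orig_w, usable_h, overlap)
    return min(a, b)


print(scans_needed(12, 20))  # 6
print(scans_needed(10, 12))  # 2
```

Under these assumptions a 20 x 12 original needs six scans on an 8.5 x 11 bed, while a 10 x 12 original needs only two.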

Incidentally, based on the one Playboy I scanned (for a couple of articles [1], of course) I note that it's the ONLY thing I've run into for which I absolutely had to turn on the "DESCREEN" filter to get anything but stripes and mud where the pictures were. Screened images used to be the rule, but other books and magazines I've scanned recently apparently use something closer to inkjet (blended colors?) than to the old style color separations. Turning on that filter, on my old flatbed, makes each scan take about 8x as long as without that filter, which adds significantly to the annoyance factor.

[1] One of the articles was an interview with Rush Limbaugh (1994) and I thought I might want to be able to reference it here when we were talking about "how long has he been an a*hole," but I think that thread died.

John


15 Mar 12 - 02:38 PM (#3323251)
Subject: RE: Tech- Scan multiple pages to one file?
From: JohnInKansas

Newport Boy - I copied the ISO sizes off of a fact sheet. Those might have been what fits in the magazine and the printer does handle other sizes. (13" x 40" is mentioned). Sorry I didn't check more closely.

The US convention, as used for engineering drawings, is similar, with an A size being 8.5 x 11. For each increment larger the short dimension is doubled, so:

A = 8.5 x 11
B = 11 x 17
C = 17 x 22
D = 22 x 34
E = 34 x 44 (not used much)
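
The doubling rule behind that table can be reproduced with a short Python sketch; the function name `us_drawing_sizes` is only a label for this example.

```python
def us_drawing_sizes() -> dict:
    """Return {'A': (8.5, 11.0), ...} with (short, long) sides in inches."""
    short, long_ = 8.5, 11.0  # A size
    sizes = {}
    for letter in "ABCDE":
        sizes[letter] = (short, long_)
        # Doubling the short side gives the next long side;
        # the old long side becomes the next short side.
        short, long_ = long_, short * 2
    return sizes


for letter, (short, long_) in us_drawing_sizes().items():
    print(f"{letter} = {short:g} x {long_:g}")
```

Unlike the ISO series, the proportions alternate from one size to the next (11/8.5 is about 1.29, 17/11 about 1.55), so an enlargement from A to B does not scale exactly.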

For anything larger you'd go to an "R" with a constant 44 inch top to bottom and multiples of 11 inches for the length. Common lengths are 77" and 132." For the R sizes it's customary to add about 2" - 3" for "wear strips" all the way around, and some smaller sizes often have some added margin; but that part of it isn't really too standard.

We also see "legal size" at 8.5 x 14, but legal size is illegal for anything submitted to the US Supreme Court, so it's used only by cheap lawyers and real estate agents now.

John


16 Mar 12 - 01:14 AM (#3323451)
Subject: RE: Tech- Scan multiple pages to one file?
From: Stilly River Sage

We have librarians who are expert at taking apart books (cutting off the binding and cutting the signatures apart), feeding them through scanners to get the entire book in just a few minutes, then compiling it into a single PDF file. I'm sure the librarians here aren't alone in this skill. I would check with your librarians in the area if you want to do this. Our machine isn't that high end - it's an office model, yes, but I don't think it cost thousands. Probably hundreds.

Once the book is scanned, it is sent out to a binder and rebound (perfect binding this time) and it's ready for the stacks.

SRS


16 Mar 12 - 06:40 AM (#3323535)
Subject: RE: Tech- Scan multiple pages to one file?
From: JohnInKansas

Stilly -

The taking apart and scanning is exactly what I've been doing a lot lately, but since my whole purpose is to get the bound books off the shelf I don't need to worry about re-binding them. Getting the pages out "clean" is still pretty critical to getting them to feed through the ADF scanner, though; and a lot of the older ones would be far too brittle to be rebound anyway.

I am finding numerous different bindings, some of which one of my references says were "not used after XXXX" with dates somewhere in the 1700s. That's for books published ca. 1950????!

I think those deviant ones were mostly printed in "Britain" shortly before it was the "UK," but I'd have to look at the shards to be sure, and they've mostly gone to the recyclers.

Incidentally, Epson is apparently pretty hungry. I just received a notice that my new order has been shipped - sent within 2 hours after I hung up from placing the order. (But it apparently took AT&T another 4 hours to send the email on to me.(?))

And I did manage to get Lin's (formerly my - sob/sigh) new Canon printer/scanner hooked up to run in a TWAIN interface into her Photoshop Elements 9. The hookup was quite simple - just requiring one file to be copied into the right place. Finding the instructions took about 3 weeks, since they're not on the Adobe or Canon sites as far as I could find, or in any Canon instructions that came with the beast. (And the Canon User booklet recommends using only WIA, which in this case at least is a piece of CRAP.) The TWAIN looks good, although I haven't tried loading it down.

John