NISUS Archives

October 2010

NISUS@LISTSERV.DARTMOUTH.EDU

From: Philip Spaelti <[log in to unmask]>
Date: Wed, 20 Oct 2010 02:01:23 +0900
On 19 Oct 2010, at 19:38, THDW wrote:

> 
> On 19 Oct 2010, at 06:20, Philip Spaelti wrote:
> 
>> My first question is why do you need to OCR them? You wrote them (with Nisus?). Don't you have the files?
> 
> Thanks for the input.
> 
> No, I can't spare any copies so I will probably be doing everything by hand.
> 
> The first three books were written on an Apple IIe and I have lost the ms. Not that they would be much use, as the books were edited by the publisher into something fairly different.
> 

Bummer.

Well, if you are serious about scanning/OCR, here is some more detailed info.

First of all, OCR software allows direct scan input, but I would suggest forgetting about that. Instead, I would scan everything to image files first and save the image files. I have tried both 300 dpi and 600 dpi, and generally the higher resolution does help. For saving the images I'd recommend .png as the file type. TIFFs are just way too heavy, but you'll have to check which image file types the scan software will accept.
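To put the 300 vs 600 dpi choice in numbers: doubling the resolution quadruples the pixel count. Here is a rough sketch of the arithmetic, assuming a US-letter-sized page (8.5 x 11 in) and 8-bit grayscale (1 byte per pixel). The function name `scan_size` is made up for illustration, and the figures are raw, uncompressed sizes; real PNG files will be smaller because PNG compresses losslessly.

```python
def scan_size(width_in: float, height_in: float, dpi: int, bytes_per_pixel: int = 1):
    """Return (width_px, height_px, raw_bytes) for a page scanned at `dpi`.

    raw_bytes is the uncompressed size; bytes_per_pixel=1 means 8-bit grayscale.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px, height_px, width_px * height_px * bytes_per_pixel


if __name__ == "__main__":
    # Assumed page size for illustration: US letter, 8.5 x 11 inches.
    for dpi in (300, 600):
        w, h, raw = scan_size(8.5, 11, dpi)
        print(f"{dpi} dpi: {w} x {h} px, about {raw / 1_000_000:.1f} MB uncompressed")
```

At 300 dpi that works out to 2550 x 3300 pixels (8,415,000 bytes raw); at 600 dpi it is 5100 x 6600 pixels, four times as much, so the better OCR results come at a real cost in disk space.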


> I have Omnipage SE on my machine. Is this sufficient ? I have rarely used it.
> 

Omnipage? Is this a Classic Mac OS program? I used to use Omnipage back in the OS 9 days, but I think nowadays it is only available for Windows. Omnipage is actually very good, and frankly I don't think there has been any real improvement in this area. Since I switched to OS X I have stopped using Omnipage and now use ReadIris Pro. But either way I try to spend as little time with the OCR software as possible. OCR software has lots of features, but most of those features seem to have zero effect on the actual output. In terms of results, your time is probably better spent getting the best-quality, cleanest scans you can in a reasonable amount of time.

The OCR software will usually try to straighten the page before working on it. When you scan from a book you get images which are double pages, and it is practically impossible to get images with both pages straight. For this reason I split the scanned images in two using Graphic Converter, before doing OCR on the images. That way the OCR can straighten the pages individually. Graphic Converter allows you to batch process images, so this can be done automatically. If you don't split the pages, the OCR software will need to treat the scan images as multicolumn text.
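Graphic Converter does the splitting with its own batch tools. Just to illustrate the idea, here is a rough sketch in Python that treats a scan as a grid of grayscale pixel values and cuts it at the gutter. `split_spread` and `split_batch` are made-up names, and the gutter is assumed to sit at the horizontal midpoint unless you pass a column offset. Reading and writing actual PNG files would need an image library, which is left out here.

```python
from typing import Dict, List, Optional, Tuple

Image = List[List[int]]  # rows of grayscale pixel values


def split_spread(pixels: Image, gutter: Optional[int] = None) -> Tuple[Image, Image]:
    """Cut a double-page scan into left and right pages at the gutter column.

    If no gutter column is given, the horizontal midpoint is assumed.
    """
    width = len(pixels[0]) if pixels else 0
    cut = width // 2 if gutter is None else gutter
    if not 0 <= cut <= width:
        raise ValueError(f"gutter {cut} outside image width {width}")
    left = [row[:cut] for row in pixels]
    right = [row[cut:] for row in pixels]
    return left, right


def split_batch(scans: Dict[str, Image]) -> Dict[str, Image]:
    """Split every spread in a batch, naming the halves <name>-L and <name>-R."""
    pages: Dict[str, Image] = {}
    for name in sorted(scans):
        left, right = split_spread(scans[name])
        pages[f"{name}-L"] = left
        pages[f"{name}-R"] = right
    return pages


if __name__ == "__main__":
    spread = [[1, 2, 3, 4],
              [5, 6, 7, 8]]
    print(split_spread(spread))  # ([[1, 2], [5, 6]], [[3, 4], [7, 8]])
```

Once each half is its own image, the OCR software can straighten each page on its own instead of having to handle the spread as multicolumn text.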

Hope this helps.

Philip

> Best
> 
> T
> 
> THDW
> [log in to unmask]
> 
> 
> 

Philip Spaelti
[log in to unmask]