LISTSERV - NISUS Archives - LISTSERV.DARTMOUTH.EDU

NISUS Archives

October 2010

NISUS@LISTSERV.DARTMOUTH.EDU

Subject:	Re: OT: A writer writes
From:	Kino <[log in to unmask]>
Reply To:	[log in to unmask]
Date:	Wed, 20 Oct 2010 03:40:06 +0900
Content-Type:	text/plain
Parts/Attachments:	text/plain (18 lines)

On Oct 19, 2010, at 7:38 PM, THDW wrote:

> so I will probably be doing everything by hand.

Although I have never processed a scanned image with OCR software myself, I have some experience proofreading and correcting document files that someone else created from such images -- not one file, in fact, but 280 (?!) separate files, oh well. So, from my own limited but disastrous experience, I'd like to recommend...

1. If you can, exclude the headers and footers when scanning your books. It is rather tedious to remove that text from the OCRed file(s), even if you are familiar with regular expressions.

2. The first thing you should do with the OCRed file(s) is to apply a colour or some other very visible attribute to all numerals ("Find All AnyDigit" in PowerFind). It seems that some OCR software often recognizes "I" and "l" as "1" (one) and "O" (uppercase o) as "0" (zero), and vice versa.

3. OCR software tends to fail to identify the case of isolated characters, such as the "p" in "p. 135", which is often recognized as "P. 135".
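
For anyone who prefers to do these checks outside Nisus, the three points above can be roughly sketched in Python. This is only an illustration under my own assumptions: the function names are hypothetical, each page is assumed to be a plain-text string, the running header is assumed to be known in advance, and the patterns are simple heuristics that will miss cases and raise false alarms.

```python
import re

# All names below are hypothetical helpers for illustration only.
PAGE_NUMBER = re.compile(r"^\s*[0-9]+\s*$")   # a line holding nothing but a number
DIGIT = re.compile(r"[0-9]")
# A 0 or 1 touching a Latin letter is a likely O/0 or I/l/1 confusion.
MIXED = re.compile(r"(?<=[A-Za-z])[01]|[01](?=[A-Za-z])")
# An isolated uppercase "P." directly before a number (assumed page reference).
UPPER_P = re.compile(r"\bP\. (?=[0-9])")


def strip_running_lines(page_text, running_header):
    """Drop lines equal to the known running header and bare page-number lines."""
    kept = [
        line for line in page_text.splitlines()
        if line.strip() != running_header and not PAGE_NUMBER.match(line)
    ]
    return "\n".join(kept)


def mark_digits(text):
    """Wrap every digit in [[ ]] so it stands out, like colouring it in an editor."""
    return DIGIT.sub(lambda m: "[[" + m.group() + "]]", text)


def suspicious_digits(text):
    """Return the positions of 0/1 characters that sit next to a Latin letter."""
    return [m.start() for m in MIXED.finditer(text)]


def lowercase_page_abbrev(text):
    """Turn 'P. 135' into 'p. 135'; assumes 'P.' before a number means 'page'."""
    return UPPER_P.sub("p. ", text)


if __name__ == "__main__":
    print(strip_running_lines("MY BOOK\nSome text.\n42", "MY BOOK"))
    print(mark_digits("p. 135"))
    print(suspicious_digits("he1lo w0rld 10"))
    print(lowercase_page_abbrev("see P. 135"))
```

The last function changes text blindly, so its result still needs a human eye: a "P." at the start of a sentence may be correct as it stands.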


Kino
--

This is not directly related to your problem, but a while ago I was asked to check French quotations in the third proof of a Japanese book that was to be reissued, and I was astonished to find that quite a few instances of "he" in Hiragana ("be" and "pe" as well) had been set as "he" in Katakana, and vice versa. Indeed, they look very similar in many fonts.
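
A mix-up of this kind can be hunted for mechanically, at least roughly. The sketch below is my own assumption about how one might do it in Python, not something used for that proof: the function name is hypothetical, and the rule is a crude one that flags a "he", "be" or "pe" whose immediate kana neighbours all belong to the other script. Legitimate cases, such as the Hiragana particle "he" between two Katakana words, will be flagged too, so every hit needs to be checked by eye.

```python
# Code points for he, be, pe in each script (written as escapes so that
# the look-alike glyphs cannot be confused in the source itself).
HIRA_HE = "\u3078\u3079\u307a"   # Hiragana he, be, pe
KATA_HE = "\u30d8\u30d9\u30da"   # Katakana he, be, pe


def is_hiragana(ch):
    return "\u3041" <= ch <= "\u3096"


def is_katakana(ch):
    # Katakana letters plus the prolonged sound mark (U+30FC).
    return "\u30a1" <= ch <= "\u30fa" or ch == "\u30fc"


def suspicious_he(text):
    """Return positions of he/be/pe whose adjacent characters are all in the other script."""
    hits = []
    for i, ch in enumerate(text):
        # The characters immediately before and after position i, if any.
        neighbours = text[max(i - 1, 0):i] + text[i + 1:i + 2]
        if not neighbours:
            continue
        if ch in KATA_HE and all(is_hiragana(n) for n in neighbours):
            hits.append(i)
        elif ch in HIRA_HE and all(is_katakana(n) for n in neighbours):
            hits.append(i)
    return hits


if __name__ == "__main__":
    # Katakana he between two Hiragana letters: flagged.
    print(suspicious_he("\u3042\u30d8\u3044"))
    # Hiragana he between two Hiragana letters: not flagged.
    print(suspicious_he("\u3042\u3078\u3044"))
```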
