NISUS Archives

November 2010

NISUS@LISTSERV.DARTMOUTH.EDU

From: "Robert B. Waltz" <[log in to unmask]>
Date: Fri, 5 Nov 2010 15:54:01 -0600

On 11/5/10, Andrus wrote:

>It would seem that a space followed by a return would then still remain, and be a problem.

Not if it's from an OCR document, or at least it shouldn't. If the OCR program thinks it's the end of a paragraph, it won't put a space there. But if it turns out that it does, it's still easy.

Change

((A-Z a-z)) (space) (1+) (return)

to

(Found1) (return)
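
For anyone who would rather see that in a conventional regex dialect, here is a rough Python sketch of the same find/replace. The function name is made up for illustration, and accepting any of the three common line endings is my assumption; the expression above just says "return".

```python
import re

# A letter, then one or more spaces, then a line ending.
# Group 1 plays the role of (Found1); group 2 keeps whatever
# line ending was there (an assumption: \r\n, \r, or \n).
TRAILING_SPACES = re.compile(r"([A-Za-z]) +(\r\n|\r|\n)")

def strip_trailing_spaces(text: str) -> str:
    """Drop spaces that sit between a letter and a line ending."""
    return TRAILING_SPACES.sub(r"\1\2", text)

if __name__ == "__main__":
    sample = "a hanging line  \nand the next one\n"
    print(repr(strip_trailing_spaces(sample)))
```

Like the original expression, this only touches lines that end in a letter, so a line ending in a period or colon plus a space is left alone.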

>Separate issue, but I would not want ANY changes made without specific individual approval, at least not changes with any vagueness to them, such as this.  There's always something we forgot, like an occasional colon, or such.

Odds are that *any* rule which can be stated in a single one-line expression will have a few problems. That goes for Kino's detailed way as much as for my simple way.

But there is nothing to prevent the person from doing the changes individually instead of a global change. In general, I agree with you -- unless the number of hanging lines is extremely high. If it is, the person will probably start clicking the replace button so quickly that he'll start making mistakes. In that case, it might be better to start with the global change and then re-insert line breaks. It really all depends on what the OCR program did.
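
The choice between one-at-a-time approval and a global change can be sketched outside Nisus as well. The Python below is only an illustration under assumptions: the function name and the `approve` callback are hypothetical, and the pattern is a plain-regex reading of the expression given earlier (letter, one or more spaces, line ending).

```python
import re
from typing import Callable

# Letter, one or more spaces, then a line ending (assumed dialect).
PATTERN = re.compile(r"([A-Za-z]) +(\r\n|\r|\n)")

def replace_with_approval(text: str,
                          approve: Callable[[re.Match], bool]) -> str:
    """Apply the fix only where approve(match) returns True."""
    def decide(match: re.Match) -> str:
        if approve(match):
            # Keep the letter and the line ending, drop the spaces.
            return match.group(1) + match.group(2)
        # Rejected: leave the matched text exactly as it was.
        return match.group(0)
    return PATTERN.sub(decide, text)

if __name__ == "__main__":
    sample = "first line \nsecond line \n"
    # Approving everything is the global change.
    print(repr(replace_with_approval(sample, lambda m: True)))
    # An interactive approver could prompt per match instead.
```

Passing an approver that always returns True gives the global change; one that prompts the user per match gives the individual-approval workflow.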

>One thing about PowerFind Pro, is the actual criteria can be sent in email.  So it has that advantage even if it can be done otherwise.

True, but then people don't UNDERSTAND it. :-) What I put above cannot be cut and pasted (for one thing, it uses parentheses in two different ways), but it's clear to anyone who has even a casual acquaintance with the ideas of regular expressions. What I supplied could, e.g., be easily converted to BBEdit notation, even though BBEdit uses a somewhat different regex syntax. And, based on the description given, it would be adequate to solve the problem. You could also do it Kino's way (if you tweak the find set properly), but it's a lot easier to make a mistake in a complicated regular expression than in an easy one. :-)

-- 
Bob Waltz
[log in to unmask]

"The one thing we learn from history --
   is that no one ever learns from history."
