NISUS Archives

April 2017

NISUS@LISTSERV.DARTMOUTH.EDU

Date: Wed, 19 Apr 2017 04:18:32 +0900
A list member kindly pointed out to me that my "Ignore Diacriticals Arabic" macro fails on some zero-width Arabic characters.

Because he gave me examples and a detailed description of the bug, I was able to fix the problem fairly quickly. I have uploaded the macro file to <http://www2.odn.ne.jp/alt-quinon/files/NWPro/arabic/IgnoreDiacriticalsArabic_v2_20170418_nwm.zip>.

This version may still contain some shortcomings and newly introduced bugs. If you notice anything irregular, please send a bug report to "[log in to unmask]" (not "[log in to unmask]", which does not work) or to my email address.

Thanks,

Kino

--

### Ignore Diacriticals Arabic (ver. 2) ###

Require Pro Version '2.1'

/*
This macro reads Arabic text from the Find panel’s find field and transforms it so that you can find it with or without diacritical marks and other zero-width characters, regardless of letter form variants.
The macro switches the Find mode to PowerFind Pro, which the transformed Arabic text requires.
The macro removes attributes from the text in the Find field. If necessary, reapply the attributes after running the macro.
*/

$find = Read Find Expression
# Get Find expression from the Find panel
if $find == @undefined
	exit 'Find field is empty! Please type Arabic text before running the macro, exiting...'
end

$find = Cast to String $find # Make $find plain text

$d = @S|(?<d>[\x{640}\x{64B}-\x{65F}\x{670}\x{6D6}-\x{6DC}\x{6DF}-\x{6E8}\x{6EA}-\x{6ED}\x{8E3}-\x{8F6}\x{200C}\x{200D}])|
# Character set consisting of U+0640 (TATWEEL), diacritical marks and other zero-width characters, U+200C (ZERO WIDTH NON-JOINER), and U+200D (ZERO WIDTH JOINER)

$find.replaceAll $d, '', 'E-i' # Remove all $d from $find

$findOrig = $find # Make a copy of $find as $findOrig
$find.replaceAll '(?<=[\x{621}-\x{63A}\x{641}-\x{64A}\x{671}-\x{6D3}\x{6D5}\x{6EE}\x{6EF}\x{6FA}-\x{6FC}\x{750}-\x{77F}\x{8A0}-\x{8BD}])', @S{\\g<d>*}, 'E-i'
# Put \g<d>* just after each Arabic letter

if $find == $findOrig # i.e. "if $find has not been changed"
	exit 'No Arabic letter in the Find expression, exiting...'
end

$find = $d & @S<{0}> & $find # Construct $find

$letterForms = Hash.new # For letter form variants

$letterForms{0x0622} = $letterForms{0x0623} = $letterForms{0x0625} = $letterForms{0x0627} = $letterForms{0x0670} = $letterForms{0x0671} = $letterForms{0x0672} = $letterForms{0x0673} = $letterForms{0x0675} = @S<[\x{622}\x{623}\x{625}\x{627}\x{670}-\x{673}\x{675}]>  # alif

$letterForms{0x0624} = $letterForms{0x0648} = $letterForms{0x0676} = $letterForms{0x0677} = $letterForms{0x06C4} = $letterForms{0x06C5} = $letterForms{0x06C6} = $letterForms{0x06C7} = $letterForms{0x06C8} = $letterForms{0x06C9} = $letterForms{0x06CA} = $letterForms{0x06CB} = $letterForms{0x06CF} = @S<[\x{624}\x{648}\x{676}\x{677}\x{6C4}-\x{6CB}\x{6CF}]>  # waw

$letterForms{0x0626} = $letterForms{0x0649} = $letterForms{0x064A} = $letterForms{0x0678} = $letterForms{0x06CC} = $letterForms{0x06CD} = $letterForms{0x06CE} = $letterForms{0x06D0} = $letterForms{0x06D1} = $letterForms{0x06D2} = $letterForms{0x06D3} = @S<[\x{626}\x{649}\x{64A}\x{678}\x{6CC}-\x{6CE}\x{6D0}-\x{6D3}]>  # ya

$letterForms{0x0629} = $letterForms{0x0647} = $letterForms{0x06C0} = $letterForms{0x06C1} = $letterForms{0x06C2} = $letterForms{0x06C3} = @S<[\x{629}\x{647}\x{6C0}-\x{6C3}]>  # ha

$letterForms{0x0643} = $letterForms{0x06A9} = $letterForms{0x06AA} = @S<[\x{643}\x{6A9}\x{6AA}]>  # kaf

$letterForms{0x0660} = $letterForms{0x06F0} = @S<[\x{660}\x{6F0}]>  # 0
$letterForms{0x0661} = $letterForms{0x06F1} = @S<[\x{661}\x{6F1}]>  # 1
$letterForms{0x0662} = $letterForms{0x06F2} = @S<[\x{662}\x{6F2}]>  # 2
$letterForms{0x0663} = $letterForms{0x06F3} = @S<[\x{663}\x{6F3}]>  # 3
$letterForms{0x0664} = $letterForms{0x06F4} = @S<[\x{664}\x{6F4}]>  # 4
$letterForms{0x0665} = $letterForms{0x06F5} = @S<[\x{665}\x{6F5}]>  # 5
$letterForms{0x0666} = $letterForms{0x06F6} = @S<[\x{666}\x{6F6}]>  # 6
$letterForms{0x0667} = $letterForms{0x06F7} = @S<[\x{667}\x{6F7}]>  # 7
$letterForms{0x0668} = $letterForms{0x06F8} = @S<[\x{668}\x{6F8}]>  # 8
$letterForms{0x0669} = $letterForms{0x06F9} = @S<[\x{669}\x{6F9}]>  # 9

$range = Range.new 0, $find.length
$find.transliterateInRange $range, $letterForms
# Transform $find so that it also matches the letter form variants defined above
$replace = Read Replace Expression
# Get Replace expression from the Find panel
Find and Replace $find, $replace, '*E-u!'  # PowerFind Pro
# Copy $find and $replace to the Find panel and change Find mode to PowerFind Pro

### end of macro
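
For readers who want to try the idea outside Nisus Writer Pro, the same transformation can be sketched in Python. This is only an illustration under several assumptions, not a port of the macro: the names `IGNORABLE`, `LETTER`, `GROUPS`, `VARIANTS` and `build_pattern` are made up for this example, the query is treated as literal text rather than as a PowerFind Pro expression, and only the alif and kaf groups plus the digits are listed. Python's `re` module has no `\g<d>` subexpression call inside a pattern, so the ignorable class is written out after each letter instead of being defined once with `{0}`.

```python
import re

# Ignorable characters, copied from the macro's $d class:
# tatweel, diacritical marks, ZWNJ and ZWJ.
IGNORABLE = (
    "[\u0640\u064B-\u065F\u0670\u06D6-\u06DC\u06DF-\u06E8"
    "\u06EA-\u06ED\u08E3-\u08F6\u200C\u200D]"
)

# Arabic letters, copied from the macro's lookbehind class.
LETTER = re.compile(
    "[\u0621-\u063A\u0641-\u064A\u0671-\u06D3\u06D5\u06EE\u06EF"
    "\u06FA-\u06FC\u0750-\u077F\u08A0-\u08BD]"
)

# Letter form groups. Assumption: only alif and kaf are listed here,
# followed by the Arabic-Indic / Extended Arabic-Indic digit pairs.
GROUPS = [
    [0x0622, 0x0623, 0x0625, 0x0627, 0x0670, 0x0671, 0x0672, 0x0673, 0x0675],
    [0x0643, 0x06A9, 0x06AA],
] + [[0x0660 + n, 0x06F0 + n] for n in range(10)]

# Map every member of a group to a character class matching the whole group.
VARIANTS = {
    chr(cp): "[" + "".join(chr(c) for c in group) + "]"
    for group in GROUPS
    for cp in group
}


def build_pattern(query: str) -> str:
    """Return a regex that matches `query` with or without ignorable characters."""
    stripped = re.sub(IGNORABLE, "", query)  # remove marks already in the query
    parts = []
    found_letter = False
    for ch in stripped:
        piece = VARIANTS.get(ch, re.escape(ch))  # widen to letter form variants
        if LETTER.fullmatch(ch):
            found_letter = True
            piece += IGNORABLE + "*"  # allow ignorable characters after a letter
        parts.append(piece)
    if not found_letter:
        raise ValueError("no Arabic letter in the query")
    return "".join(parts)


pattern = build_pattern("\u0643\u062A\u0628")        # kaf, ta, ba without marks
vocalized = "\u0643\u064E\u062A\u064E\u0628\u064E"   # the same word with fatha marks
print(bool(re.fullmatch(pattern, vocalized)))         # True
```

As in the macro, marks typed in the query are stripped first, so a vocalized query and an unvocalized one produce the same pattern.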
