MACSCRPT Archives

July 2009

MACSCRPT@LISTSERV.DARTMOUTH.EDU

From: "Mark J. Reed" <[log in to unmask]>
Reply To: Macintosh Scripting Systems <[log in to unmask]>
Date: Thu, 30 Jul 2009 10:22:30 -0400
Content-Type: text/plain
Parts/Attachments: text/plain (86 lines)

Sounds like a bug, if the ASLG (AppleScript Language Guide) says it's
supposed to count grapheme clusters regardless of normalization form.
But that spec also doesn't match the behavior of things like "id of",
which returns either 233 or {101, 769} for "é" depending on the
normalization form used.
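The "id of" difference comes from the two normalization forms encoding "é" as different code-point sequences. A minimal sketch in Python (used here only as a stand-in; it does not call AppleScript), where code_points is a hypothetical helper roughly analogous to "id of":

```python
import unicodedata

def code_points(text):
    """Hypothetical helper: list the Unicode code points of text,
    roughly what AppleScript's 'id of' reports."""
    return [ord(ch) for ch in text]

e_acute = "\u00e9"  # precomposed LATIN SMALL LETTER E WITH ACUTE
nfc = unicodedata.normalize("NFC", e_acute)  # composed form
nfd = unicodedata.normalize("NFD", e_acute)  # decomposed: "e" + combining acute

print(code_points(nfc))  # [233]
print(code_points(nfd))  # [101, 769]
```

Both strings display as the same "é", but one is a single code point and the other is two.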

On 7/30/09, Emmanuel LEVY <[log in to unmask]> wrote:
> Peter, did you try Satimage.osax's equivalent of offset, namely "find
> text"?
>
> Also, you may be interested in the following excerpt from
> Satimage.osax' dictionary:
>
>> normalize unicode v : normalize Unicode text (canonical composition
>> or decomposition)
>>
>> normalize unicode string
>> [decomposition boolean] : want canonical decomposition. default:
>> false. For example, HFS Plus converts all file names to decomposed
>> Unicode, while Macintosh keyboards generally produce precomposed
>> Unicode.
>
>
> Best,
> Emmanuel
>
> On Jul 30, 2009, at 9:34 AM, Peter J. Hartmann wrote:
>
>> Somewhere in the more recent OS updates, "offset of" seems to have
>> broken (I'm running 10.5.7, PPC). Some of my scripts have recently
>> started to exhibit the following problem.
>>
>> Try this:
>> - In Script Editor, create the following new script:
>>
>> offset of "_" in ""
>>
>> - In the Finder, create a new empty folder or file whose name
>> contains a character with a diacritical mark and ends with an underscore.
>> - Copy this file name and paste it between the empty quotation marks
>> in your script.
>> - Check the return value: each diacritical mark is counted as an extra
>> character, so the offset is off by one for each of them.
>> - To prove it, replace the characters that carry diacriticals with
>> their standard forms. Now the result is correct.
>>
>> The AppleScript Language Guide states on p. 144:
>>
>> offset compares text as the equals operator does, including
>> considering and ignoring conditions. The values returned are
>> counted the same way character elements of text are counted — for
>> example, offset of "c" in "école" is always 2, regardless of
>> whether "école" is in Normalization Form C or D.
>>
>> This obviously does not describe reality.
>>
>> You won't see this if you type a string with diacriticals etc.
>> directly into Script Editor. The problem is that the file system saves
>>
>> Sjögren
>>
>> as
>>
>> Sjo¨gren
>>
>> internally, and that there is currently no way to normalize these
>> strings coming from the file system (FS) in AppleScript (AS). I know
>> I can do it in Perl and in Ruby 1.9. Ignoring diacriticals does not
>> help either.
>>
>> "count characters", however, yields correct results.
>>
>> Or am I missing something here?
>>
>> ___ Peter Hartmann ________
>>
>> mailto:[log in to unmask]
>
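Peter's reproduction can be approximated outside AppleScript. This is a sketch under assumptions, in Python rather than AppleScript: offset_of is a hypothetical helper that mimics the 1-based result of "offset of" but counts code points, which matches what Peter observed, not what the guide documents. Normalizing to NFC first, roughly what Satimage's "normalize unicode" or a Perl/Ruby normalizer would do, restores the expected offset for names like these (NFC only helps where a precomposed character exists).

```python
import unicodedata

def offset_of(needle, haystack):
    """Hypothetical stand-in for AppleScript's 'offset of': 1-based index of
    the first match, 0 if absent. Counts code points, not grapheme clusters."""
    return haystack.find(needle) + 1

# As stored by HFS Plus: decomposed (NFD), so the o-umlaut is "o" + U+0308.
fs_name = unicodedata.normalize("NFD", "Sj\u00f6gren_")
# As typed on a Mac keyboard: precomposed (NFC).
typed_name = unicodedata.normalize("NFC", "Sj\u00f6gren_")

print(offset_of("_", typed_name))  # 8
print(offset_of("_", fs_name))     # 9: the combining mark counts as a character

# Composing the file-system name before searching gives the expected offset.
print(offset_of("_", unicodedata.normalize("NFC", fs_name)))  # 8

# The guide's example: offset of "c" in "école" should be 2 in either form.
print(offset_of("c", unicodedata.normalize("NFC", "\u00e9cole")))  # 2
print(offset_of("c", unicodedata.normalize("NFD", "\u00e9cole")))  # 3
```

A code-point count is off by one per combining mark, which is exactly the symptom described; a grapheme-cluster count, as the guide promises, would return 2 for both forms of "école".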

-- 
Sent from my mobile device

Mark J. Reed <[log in to unmask]>
