NISUS Archives

May 2010

NISUS@LISTSERV.DARTMOUTH.EDU

Date: Sun, 2 May 2010 01:03:19 +0900

On May 1, 2010, at 11:07 PM, Hamid Haji wrote:

> On Sat, 1 May 2010 22:50:26 +0900, Kino <[log in to unmask]> wrote:
> [...]
>> curiouser and curiouser... What makes the difference? It is not the first time the Perl module treated Arabic files as encoded in MacCyrillic. I got the same result some years ago when I tried the module for the first time on my old PPC Mac. The same result with SubEthaEdit too, which uses an encoding detector based on the same Mozilla code. Then, perhaps SubEthaEdit detects Arabic encodings properly for you.

> SubEthaEdit 3.5.2 fails to detect Arabic encodings:

Same results here. Perhaps there is some misunderstanding. Having Encode::Detect installed improves my older macro "Open Text Files" <http://www2.odn.ne.jp/alt-quinon/files/NWPro/file/OpenTextFiles_nwm.zip> (e.g. it adds support for CJK encodings), but it does not change the behaviour of the newer macro "Open Arabic Text Files" in any way, because that macro does not call the Perl module.

Also, I could not find any code for Arabic encodings in the source files of Encode::Detect. Did you perhaps run the "Open Arabic Text Files" macro instead of the "Open Text Files" macro by accident when you opened the Arabic file? Their names are very similar.

Anyway, I have updated the "Open Arabic Text Files" macro so that it uses Apple's converter to decode from MacArabic. That converter is not affected by the Perl bug mentioned in my previous posting.
<http://www2.odn.ne.jp/alt-quinon/files/NWPro/arabic/OpenArabicTextFiles_nwm.zip>


Kino

--

-	$data = decode("MacArabic", $str);  # decode from MacArabic (buggy)
-	$data =~ tr/\x{FFFD}/\x20/;  # try to work around the bug

+	$data = `textutil -convert txt -inputencoding x-mac-arabic -stdout "$path" 2>/dev/null`;  # decode from MacArabic
+	$data = decode_utf8($data);  # decode textutil's UTF-8 output into Perl character strings
+	$data =~ s/[\x{200E}\x{200F}\x{202A}-\x{202E}]+//g;  # remove directional markers inserted by textutil
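
For readers who want to try the same steps outside the macro, here is a minimal sketch in Python rather than the macro's Perl. It mirrors the three added lines of the diff: run textutil, decode its output as UTF-8, and strip the directional markers. The function names (textutil_command, strip_directional_marks, read_mac_arabic) are hypothetical and are not part of the macro. The textutil arguments are copied from the diff, and the sketch assumes, as the macro does, that textutil writes UTF-8 to stdout.

```python
import re
import subprocess

# Same character class as the macro: LRM (U+200E), RLM (U+200F), and the
# embedding/override controls U+202A through U+202E.
DIRECTIONAL_MARKS = re.compile("[\u200e\u200f\u202a-\u202e]+")


def textutil_command(path):
    """Build the argument list for the textutil call shown in the diff."""
    return [
        "textutil", "-convert", "txt",
        "-inputencoding", "x-mac-arabic",
        "-stdout", path,
    ]


def strip_directional_marks(text):
    """Remove the directional markers that textutil inserts."""
    return DIRECTIONAL_MARKS.sub("", text)


def read_mac_arabic(path):
    """Decode a MacArabic file via textutil (macOS only).

    Assumes textutil emits UTF-8 on stdout, as the macro does.
    """
    result = subprocess.run(
        textutil_command(path),
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,  # same effect as 2>/dev/null
        check=True,
    )
    return strip_directional_marks(result.stdout.decode("utf-8"))
```

Passing the arguments as a list means no shell is involved, so a path containing spaces or quotation marks needs no escaping. Only read_mac_arabic requires macOS; the other two functions run anywhere.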
