NISUS Archives

May 2010

NISUS@LISTSERV.DARTMOUTH.EDU

Date: Sat, 1 May 2010 20:38:56 +0900
On May 1, 2010, at 6:00 PM, Hamid Haji wrote:

> On 29.04.2010, at 15:37, Kino wrote:
> 
>> <http://www2.odn.ne.jp/alt-quinon/files/NWPro/file/OpenTextFiles_nwm.zip>

> Where does one install the perl module Encode::Detect?

As far as Arabic encodings are concerned, installing Encode::Detect will not improve the macro. The module is based on the Mozilla Universal Charset Detector, which does not support Arabic encodings.
<http://www.mozilla.org/projects/intl/chardet.html>

Anyway, if you have Xcode Tools installed, you can install the module as follows:

1. Download Encode-Detect-1.01.tar.gz from
   <http://search.cpan.org/~jgmyers/Encode-Detect-1.01/>;

2. Unpack it (it is a .tar.gz archive, e.g. run "tar xzf Encode-Detect-1.01.tar.gz");

3. Open Terminal and change the current directory to "Encode-Detect-1.01";

4. Run the following commands one by one:

perl Makefile.PL
make
make test
sudo make install

> Unlike in other Cocoa apps, there is no possibility in NWP to customise Plain Text Encoding in the Open dialog. Also, Arabic (Windows) encoding is missing in the current list of NWP Plain Text Encoding.

I just don't understand why they stick to their own encoding menu. Apple's Encoding panel is simply better.

> Hence, Arabic (Windows) text files appear garbled when opened in NWP. I have to open such text files in another app, save them in UTF-8 and then open them in NWP. Installing the perl module Encode::Detect will help to overcome this shortcoming.

As I wrote above, Encode::Detect does not help. Try the macro below instead.
<http://www2.odn.ne.jp/alt-quinon/files/NWPro/arabic/OpenArabicTextFiles_nwm.zip>
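The manual workaround described in the quoted message (open the file elsewhere, save it as UTF-8, then open it in NWP) can also be scripted. The following is only a minimal sketch in Python, not part of the macro; the function name reencode_to_utf8 and the file names are hypothetical, and it assumes you already know the file is in Windows Arabic (cp1256).

```python
from pathlib import Path


def reencode_to_utf8(src, dst, source_encoding="cp1256"):
    """Read src as source_encoding and write the same text to dst as UTF-8.

    Raises UnicodeDecodeError if a byte is not defined in source_encoding,
    which is a hint that the assumed encoding is wrong.
    """
    text = Path(src).read_bytes().decode(source_encoding)
    Path(dst).write_bytes(text.encode("utf-8"))
    return text


# Hypothetical usage: convert one Windows Arabic file for opening in NWP.
# reencode_to_utf8("arabic-windows.txt", "arabic-utf8.txt")
```

The resulting UTF-8 file should then open correctly with the existing NWP encoding list.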

The MacArabic filter of the Encode module is buggy and does not translate some Latin characters properly, but I don't think you will use the macro to open MacArabic files. Seven or eight years ago, I reported the bug to the author. IIRC, he could not fix it completely because of structural limitations of the module, but I don't remember the details well.


Kino

--
### Open Arabic Text Files ###

# Open Arabic plain text files with encoding detection.

# The most probable encoding is chosen from among UTF-8, ISO-8859-6, Windows CP-1256 and MacArabic.

$paths = Choose Files '~/Documents', 'Select Arabic Text Files'
if $paths == undefined
	exit  # cancelled
end

Set Exported Perl Variables 'data', 'path'

foreach $path in $paths
	$data = ''
	begin Perl
		use Encode;
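		local $/;  # slurp mode: read the whole file in one go
		open my $fh, "<:raw", $path or die "Cannot open $path: $!";
		$str = <$fh>;  # raw bytes, not yet decoded
		close $fh;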
		eval { $data = decode("utf8", $str, Encode::FB_CROAK) };
		if ( $@ ) {  # if failed to decode from UTF-8
			eval { $data = decode("iso-8859-6", $str, Encode::FB_CROAK) };
			if ( $@ ) {  # if failed to decode from ISO-8859-6
				$data = decode("cp1256", $str);  # decode from Windows Arabic
				if ($data =~ /\p{Arabic}[\xE2\xE8\xEA\xEB]\p{Arabic}/ ) {  # if odd character found between Arabic letters
					$data = decode("MacArabic", $str);  # decode from MacArabic (buggy)
					$data =~ tr/\x{FFFD}/\x20/;  # try to work around the bug
				}
			}
		}
		$data = decode_utf8($data);
	end
	$doc = Document.new false
	$doc.clearAndDisableUndoHistory
	$range = Range.new 0, $doc.text.length
	$doc.text.replaceInRange $range, $data
end

### end of macro ###
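For readers who want to try the detection order outside NWP, here is a rough sketch of the same cascade in Python. It is an approximation, not a port: the function name decode_arabic is hypothetical, Python's strict "utf-8" and its "iso8859_6", "cp1256" and "mac_arabic" codec tables are assumed to be close enough to Perl's Encode tables, and the character class below covers only U+0621 to U+064A rather than everything Perl's \p{Arabic} matches.

```python
import re

# Simplified stand-in for Perl's \p{Arabic}: basic Arabic letters only.
ARABIC = "[\u0621-\u064A]"
# Latin letters that cp1256 produces for bytes 0xE2, 0xE8, 0xEA and 0xEB.
# One of them between two Arabic letters suggests the file is really MacArabic.
ODD_BETWEEN_ARABIC = re.compile(ARABIC + "[\u00E2\u00E8\u00EA\u00EB]" + ARABIC)


def decode_arabic(raw):
    """Return (text, encoding_name) for raw bytes, trying encodings in order."""
    # 1. and 2. Strict decoders: any invalid byte raises and we fall through.
    for enc in ("utf-8", "iso8859_6"):
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            pass
    # 3. Windows Arabic, decoded leniently.
    text = raw.decode("cp1256", errors="replace")
    # 4. If the result looks wrong, assume MacArabic instead and turn
    #    replacement characters into spaces, as the macro does.
    if ODD_BETWEEN_ARABIC.search(text):
        mac = raw.decode("mac_arabic", errors="replace")
        return mac.replace("\ufffd", " "), "mac_arabic"
    return text, "cp1256"


# Hypothetical usage with a file name of your own:
# with open("arabic.txt", "rb") as f:
#     text, enc = decode_arabic(f.read())
```

Note that the order matters: short Windows Arabic texts that happen to use only bytes defined in ISO-8859-6 will be accepted at the second step, so the result is the most probable encoding, not a guaranteed one.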
