LISTSERV - MACSCRPT Archives - LISTSERV.DARTMOUTH.EDU

Subject:	Re: Is it possible to change AppleScript's default text encoding?
From:	David Livesay <[log in to unmask]>
Reply To:	Macintosh Scripting Systems <[log in to unmask]>
Date:	Sat, 18 Aug 2007 09:44:19 -0400
Content-Type:	text/plain
Parts/Attachments:	text/plain (31 lines)

On Aug 18, 2007, at 4:11 AM, Emmanuel wrote:

> The unfortunate thing is that no default behavior produces UTF-8 -  
> you *have* to specify "as «class utf8»" - which is sad because 1/  
> UTF-8 is basically a superset of ISO 8859, so most often you can  
> safely read an ISO file pretending it is UTF-8, 2/ as has been said  
> here most UNIX tools use UTF-8 as their output encoding.

I think what was happening in my case is that the script was coercing
the UTF-8 output from runpsynch to UTF-16. Personally, I don't think this
is a good default behavior, especially since AppleScript doesn't
attach any metadata to the files it creates with the file read/write
commands. Yet, for some reason, when I opened them in TextEdit, it
correctly guessed their text encoding, while BBEdit, which is usually
pretty good at guessing a file's text encoding, opened them in its
default encoding.

This was another surprise. You'd think UTF-16 would be the easiest
encoding to detect from a file, especially if it has a BOM. These
files have no BOM, but still, the first byte is 00, and every other
byte thereafter is 00. Wouldn't you think that was a pretty good
indication of 16-bitness as well as big-endianness? I can't fathom how
BBEdit could look at that file in hex and go, "hmm... I wonder what
character encoding THIS is." While I'm sure you could construct a
case in which this would lead to a bad guess, I don't see why they
don't use it.
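
The zero-byte pattern described above can be turned into a rough
heuristic. The following is only a sketch in Python, not what BBEdit or
TextEdit actually do; the function name guess_utf16 and the 0.9
threshold are made up for illustration, and it only works for text that
is mostly ASCII.

```python
import codecs

def guess_utf16(data: bytes, threshold: float = 0.9):
    """Guess a UTF-16 codec name for mostly-ASCII text, or return None.

    Hypothetical helper: the name and threshold are illustrative only.
    """
    # A BOM, when present, settles the question; Python's generic
    # "utf-16" codec reads the BOM and picks the byte order itself.
    if data.startswith((codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)):
        return "utf-16"

    # UTF-16 without surrogates is a whole number of 2-byte units.
    if len(data) == 0 or len(data) % 2:
        return None

    pairs = len(data) // 2
    even_zeros = data[0::2].count(0)  # zero bytes at offsets 0, 2, 4, ...
    odd_zeros = data[1::2].count(0)   # zero bytes at offsets 1, 3, 5, ...

    # ASCII text in UTF-16BE has 00 as the first byte of nearly every pair.
    if even_zeros >= threshold * pairs and odd_zeros <= (1 - threshold) * pairs:
        return "utf-16-be"
    # In UTF-16LE the 00 is the second byte of each pair instead.
    if odd_zeros >= threshold * pairs and even_zeros <= (1 - threshold) * pairs:
        return "utf-16-le"
    return None
```

For example, guess_utf16("hello".encode("utf-16-be")) returns
"utf-16-be", while the same text encoded as UTF-8 returns None. Text
dominated by non-Latin scripts has few zero bytes, which is the kind of
case where a guess like this fails.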

I also don't see why Apple doesn't say they support UTF-16BE instead
of UTF-16. If you don't prepend a BOM, you're not just assuming
big-endianness, you're assuming everybody else does too, and that's
what UTF-16BE means.
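
To make the distinction concrete, here is a small Python illustration
(Python's codec names stand in for the encoding labels; this says
nothing about how AppleScript itself writes files). The variable names
are arbitrary.

```python
import codecs

text = "Hi"

# "utf-16-be" fixes the byte order and writes no BOM.
be_bytes = text.encode("utf-16-be")  # b'\x00H\x00i'

# Python's generic "utf-16" encoder prepends a BOM in the machine's
# own byte order, so the reader does not have to assume anything.
generic_bytes = text.encode("utf-16")
has_bom = generic_bytes[:2] in (codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)

# With a BOM in front, a generic UTF-16 decoder works out the order.
round_trip = (codecs.BOM_UTF16_BE + be_bytes).decode("utf-16")

# Without a BOM, the reader has to be told the order explicitly.
explicit = be_bytes.decode("utf-16-be")
```

Both round_trip and explicit come back as "Hi"; the difference is only
in who carries the byte-order information, the file or the reader.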
