MACSCRPT Archives

February 2006

MACSCRPT@LISTSERV.DARTMOUTH.EDU

Subject:
From: Sander Tekelenburg <[log in to unmask]>
Reply To: Macintosh Scripting Systems <[log in to unmask]>
Date: Thu, 16 Feb 2006 18:08:43 +0100
Content-Type: text/plain

At 09:37 -0500 UTC, on 2006-02-16, Bill Steele wrote:

[...]

> It's the text. OS 10.3.9.  After the HTML conversion all the
> manipulations--mostly concatenation--are done inside Applescript.
> Finally the result is written to a file.  I open the file in
> Text-Edit and get (what seems to be) Unicode.  When I open another
> file -- supposedly created the same  way -- plain text..

TextEdit can be told specifically which encoding to use when opening files. It
also has an "Automatic" setting, which I assume uses some logic to
determine which encoding to apply. AFAIK plain text files do not themselves
contain meta information that authoritatively states which encoding applies
(although a BOM can help). Thus you must know which encoding you are using
when you manipulate, store and read text.
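
To illustrate what "a BOM can help" means in practice, here is a minimal
sketch in Python (not AppleScript, and not a description of what TextEdit
actually does). The function name sniff_bom is made up for this example. It
only checks the first bytes of a file for a known byte order mark; when there
is none, it cannot tell you the encoding, which is the point made above.

```python
import codecs

# Known BOM signatures. UTF-32 is checked before UTF-16 because the
# UTF-32 LE BOM (FF FE 00 00) starts with the UTF-16 LE BOM (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return the encoding suggested by a leading BOM, or None.

    Hypothetical helper for illustration. A None result does not mean
    "plain ASCII"; it only means the bytes carry no BOM hint.
    """
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None
```

A file without a BOM gives None here, so the reader still has to know (or
guess) the encoding by some other means.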

It sounds to me like you have TextEdit set to "Automatic", so that it decides
which encoding to apply when opening files, and only one of your files
contains non-ASCII characters, leading TextEdit to treat it as UTF-16 (or
perhaps UTF-8).

In itself, I don't see anything wrong or problematic with that, but perhaps I
misunderstand the problem.


Btw, if you're serving HTML containing Unicode, make sure to explicitly write
your HTML files as UTF-8 (and of course serve them as such), not UTF-16,
which is the default when writing Unicode to file from within AS.
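
As a rough illustration of why the choice between UTF-8 and UTF-16 matters
for HTML, the following Python sketch (again not AppleScript; the function
name and the sample string are made up) encodes the same markup both ways.
In UTF-8 the ASCII markup stays one byte per character, while UTF-16 adds a
BOM and uses two bytes for every character in this sample, including NUL
bytes that a reader expecting UTF-8 or ASCII will not handle well.

```python
def encode_both(html: str):
    """Return the UTF-8 and UTF-16 encodings of the same text.

    Hypothetical helper for illustration only.
    """
    utf8 = html.encode("utf-8")    # no BOM; ASCII characters stay 1 byte each
    utf16 = html.encode("utf-16")  # Python prepends a BOM; 2 bytes per character here
    return utf8, utf16


sample = "<p>caf\u00e9</p>"        # 11 characters, one of them non-ASCII
utf8_bytes, utf16_bytes = encode_both(sample)

# The UTF-16 bytes contain NUL bytes for every ASCII character;
# the UTF-8 bytes contain none.
has_nul_utf8 = b"\x00" in utf8_bytes
has_nul_utf16 = b"\x00" in utf16_bytes
```

Both byte strings decode back to the same text, but only if the reader
applies the encoding that was used to write them.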

[...]

> Yes, Tex-Edit's "strip high ASCII characters" works.  But later on
> the script reads the file and uses parts of it for something else,
> and the problem shows up there. I'd rather avoid the extra step of
> opening in Tex-Edit (or BBEdit) and processing it.

There are two useful tools for working with character encoding (and escaping)
through AS:
- UnicodeChecker: <http://www.earthlingsoft.net/UnicodeChecker/index.html>
- has' TextCommands:
<http://freespace.virgin.net/hamish.sanderson/#textcommands>


HTH


-- 
Sander Tekelenburg, <http://www.euronet.nl/~tekelenb/>
