MACSCRPT Archives

February 2006

MACSCRPT@LISTSERV.DARTMOUTH.EDU

Subject:
From:
Dave McGary <[log in to unmask]>
Reply To:
Macintosh Scripting Systems <[log in to unmask]>
Date:
Thu, 16 Feb 2006 02:17:07 -0800
Content-Type:
text/plain
Hi Bill,

Your message is not entirely clear.

Is the 'Unicode' problem in the name of the files or the text inside
the resulting file?

If it is in the text inside the file, which application are you using
(telling) to manipulate (and save) the file?  And which MacOS are you
using, OSX?  10.4.4?

From your short description, I think you are using AppleScript (and an
unknown application) to process your e-mail files one at a time, and
one of the resulting files is in some non-ASCII encoding.

I tend to think Shane Stanley is right: the text of that one e-mail
is already in Unicode (or some other encoding). You cannot tell,
because the app you read the e-mail with before the AppleScript
processing displays the text using the proper character mapping,
while the app you use to look at the file after processing does not
display the text as you expect.

Try opening your e-mail program and copying the text of that one
e-mail that is giving you trouble, then paste it into a document in
one of the applications that is unable to display the text properly. 
If you see similar results, then you know that the source text is
already in some other encoding, and is not being 'translated' by your
AppleScript.

In that case, your problem is to translate the text from Unicode (or
whatever) to ASCII.  Unicode to ASCII is easy to do in many ways.  My
fav is to grab the free (lite) version of BBEdit, and "Zap Gremlins".
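
For anyone who would rather script the conversion, here is a minimal sketch in Python (not AppleScript). It is not how BBEdit implements "Zap Gremlins"; the function name to_ascii and the sample text are made up for illustration, and it assumes the input really is UTF-16 with a byte-order mark.

```python
import unicodedata

def to_ascii(raw: bytes, source_encoding: str = "utf-16") -> str:
    """Decode raw bytes, then reduce the text to plain ASCII."""
    text = raw.decode(source_encoding)
    # Split accented letters into a base letter plus a combining mark.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop every character that still has no ASCII equivalent.
    return decomposed.encode("ascii", errors="ignore").decode("ascii")

# Hypothetical input: UTF-16 bytes with a byte-order mark.
sample = "Café menu".encode("utf-16")
print(to_ascii(sample))   # Cafe menu
```

Note that this drops characters such as curly quotes outright rather than replacing them with straight ones, so it is lossy by design.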

It is also possible that you are looking at another issue, a font
problem or some other (non-ASCII, non-Unicode) encoding.  I say this
because Unicode text should look just like ASCII in most situations
(in big-endian UTF-16, the first byte of every ASCII character is a
zero, which is a null character in ASCII and is normally invisible).
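
The zero-byte point can be made concrete with a minimal sketch in Python (not AppleScript). The sample string is made up, and the last step is only a guess at how a program that stops at the first zero byte might behave, not a diagnosis of Bill's file.

```python
# Hypothetical sample text; any plain ASCII string behaves the same way.
text = "Hello"

ascii_bytes = text.encode("ascii")
utf16_bytes = text.encode("utf-16-be")   # big-endian, no byte-order mark

print(ascii_bytes.hex(" "))   # 48 65 6c 6c 6f
print(utf16_bytes.hex(" "))   # 00 48 00 65 00 6c 00 6c 00 6f

# Every second byte is the original ASCII byte; the others are zero.
assert utf16_bytes[1::2] == ascii_bytes
assert set(utf16_bytes[0::2]) == {0}

# Little-endian order puts the letter first and the zero second, so a
# reader that stops at the first zero byte keeps a single character.
utf16_le = text.encode("utf-16-le")
c_string_view = utf16_le.split(b"\x00", 1)[0]
print(c_string_view)          # b'H'
```

Two bytes per character would also be consistent with needing two arrow-key presses per visible character in an editor that treats the file as one byte per character.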

One of the exceptions is Japanese text, which might be in Unicode or
any of a number of other encodings.  Trying to display S-JIS Japanese
in Unicode (or ASCII) would give you Mojibake
(http://en.wikipedia.org/wiki/Mojibake).  Greek, Russian, and other
Eastern European encodings could also cause problems.  Is the problem
file in English?  If it is English, there could still be a character
encoding problem.
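
Mojibake is easy to reproduce. The sketch below is Python (not AppleScript) and assumes, purely for illustration, that Shift-JIS bytes get read as MacRoman; the sample word is made up and is not taken from Bill's data.

```python
original = "文字化け"                       # "mojibake" written in Japanese
sjis_bytes = original.encode("shift_jis")

# Wrong guess: read the Shift-JIS bytes as if they were MacRoman.
garbled = sjis_bytes.decode("mac_roman")
print(garbled)

# Right guess: the same bytes decode cleanly.
recovered = sjis_bytes.decode("shift_jis")
print(recovered)

# Nothing was destroyed, only misread: undoing the wrong guess
# gives back the original bytes.
assert garbled.encode("mac_roman") == sjis_bytes
```

The bytes are identical in both cases; only the assumed encoding differs, which is why the fix is usually to tell the viewing application the right encoding, not to change the file.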

If the e-mail is in HTML, it may contain an "encoding" tag (a meta
tag that declares the character set), which allows the e-mail program
to display the text properly.  When you remove the text you want and
put it into a separate file, that HTML tag will be lost, and the text
will no longer be displayed properly.
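
Here is a minimal sketch of that effect, in Python (not AppleScript). The helper decode_html and the sample message are hypothetical, the regular expression only handles simple charset declarations, and an unrecognized charset name would raise a LookupError.

```python
import re

def decode_html(raw: bytes, fallback: str = "ascii") -> str:
    """Decode HTML bytes using the charset the document declares, if any."""
    # Finds charset=... in <meta charset="..."> or in the older
    # <meta http-equiv="Content-Type" content="text/html; charset=...">.
    match = re.search(rb"charset\s*=\s*[\"']?([\w.:-]+)", raw, re.IGNORECASE)
    encoding = match.group(1).decode("ascii") if match else fallback
    return raw.decode(encoding, errors="replace")

# Hypothetical message body: the declaration travels with the text.
html = ('<meta http-equiv="Content-Type" '
        'content="text/html; charset=iso-8859-1">'
        '<p>Señor</p>').encode("iso-8859-1")
print(decode_html(html))        # the accented letter survives

# Cut out just the paragraph and the declaration is gone.
fragment = html[html.index(b"<p>"):]
print(decode_html(fragment))    # the accented letter becomes U+FFFD
```

The bytes of the paragraph are the same in both calls; what changes is whether the reader still knows which encoding to apply.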

It is also possible that the application you are using to do the text
manipulation work is losing its mind, and trashing the text.  We need
more details in order to help you.  Next time, include some of the
junk text (paste it into your reply).  I might be able to learn
something from looking at the data.

Hope this helps...

--Dave


> Subject: Unicode madness
> From: Bill Steele <[log in to unmask]>
> Date: Wed, Feb 15, 2006 at 2:09 PM
> Reply-To: Macintosh Scripting Systems <[log in to unmask]>
> To: [log in to unmask]
>
> I have a script that massages some text, then puts it into a number
> of different files with various headers and footers.
> Simplified example:
> set page1 to theName & return & theAddress & theText & page1Footer
> set page2 to otherName & myEmail & theText & page2Footer
> So most of these work fine, but one out of the whole script creates a
> file apparently written in Unicode.  Scrolling through it with an
> arrow key requires two jumps per character, pasting into Word
> produces organized gibberish, and pasting onto a web page results in
> only the first character appearing.
> What is it about that one page...?  Or more to the point, when does
> Applescript automatically turn text into Unicode?
