LISTSERV - MACSCRPT Archives - LISTSERV.DARTMOUTH.EDU

MACSCRPT Archives

January 2011

MACSCRPT@LISTSERV.DARTMOUTH.EDU


Subject:	Re: Unicode
From:	John Delacour <[log in to unmask]>
Reply To:	Macintosh Scripting Systems <[log in to unmask]>
Date:	Tue, 11 Jan 2011 17:34:24 +0000

At 01:38 -0800 11/01/2011, Walter Ian Kaye wrote:

>Hey JD, you would probably know this procedure: I see a Unicode
>character on a Web page <http://www.ytn.co.kr/>, in this case a black
>triangle (like "[>"), and want to find out what code point it is. So
>I copied to the clipboard and wrote a script to try and convert it
>using something like this:
>
>item 1 of ((((theU as «class ut16») as string) as record) as list)
>
>and then convert the string to hex, but it didn't match the code
>point in the Character Palette (25B6, or possibly 25BA).
>
>Any ideas?


The best idea is almost certainly this:

<http://earthlingsoft.net/UnicodeChecker/>

UnicodeChecker is scriptable.  An invaluable tool.

Perl will do it easily as well.


As to getting the individual bytes, this is as far as I have time to 
go at the moment, and it's a lot of typing for very little result:

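-- Take the copied character from the clipboard
set _clip to the clipboard
-- Scratch file in the user's temporary items folder
set f to (path to temporary items from user domain as string) & "tmp.txt"
-- Close the file in case an earlier run left it open
try
   close access file f
end try
-- Empty the file, then write the clipboard text as UTF-16
open for access file f with write permission
set eof file f to 0
write _clip as «class ut16» to file f
close access file f
-- Read the file back as a plain string so each character stands for one byte
set _s to read file f as string
set {_a, _b} to characters 3 through 4 of _s
-- (the first 2 chars are the BOM)
{ASCII number _a, ASCII number _b}
--=> {186, 37}
-- i.e. the bytes 0xBA and 0x25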
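
The pair {186, 37} can be turned back into a code point by hand or with a few lines of code. What follows is only a sketch in Python, not part of the script above, and the function names are made up for illustration. It assumes the file was written little-endian, which the result suggests (186 is 0xBA and 37 is 0x25), and that the character sits in the Basic Multilingual Plane, so that one 16-bit unit is the whole code point.

```python
import unicodedata


def utf16_unit_from_bytes(first: int, second: int, byteorder: str = "little") -> int:
    """Combine two byte values into one 16-bit UTF-16 code unit.

    byteorder is "little" or "big"; the default is an assumption based on
    the {186, 37} result, and the BOM in the file would settle it.
    """
    unit = int.from_bytes(bytes([first, second]), byteorder)
    # A surrogate is only half of a character outside the BMP,
    # so it cannot be reported as a code point on its own.
    if 0xD800 <= unit <= 0xDFFF:
        raise ValueError("surrogate code unit; the next two bytes are needed too")
    return unit


def describe(code_point: int) -> str:
    """Format a code point as U+XXXX followed by its Unicode name."""
    name = unicodedata.name(chr(code_point), "<unnamed>")
    return f"U+{code_point:04X} {name}"


if __name__ == "__main__":
    # The two byte values reported by the AppleScript above
    print(describe(utf16_unit_from_bytes(186, 37)))
    # U+25BA BLACK RIGHT-POINTING POINTER
```

Read this way the bytes give U+25BA, the second of the two candidates from the Character Palette. Read big-endian, the same two bytes would give 0xBA25 instead, so the BOM in the first two characters of the file is worth checking before trusting the answer.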

JD
