MACSCRPT Archives

January 2008

MACSCRPT@LISTSERV.DARTMOUTH.EDU

Subject:
From:
Reply To:
Macintosh Scripting Systems <[log in to unmask]>
Date:
Mon, 21 Jan 2008 16:04:40 +1100
Hi all,

Short question:

I want to parse an arbitrary string using a known structure, but  
respecting (not parsing within) quotes and comments. What's a good way  
to do this?

Short example:

Say I have this string:

fn1('quoted string containing a bracket)', fn2(3), "something, else") + 5

and I want to parse it according to opening and closing brackets, so  
what I want is to get 5 parts back:

fn1
(
'quoted string containing a bracket)', fn2(3), "something, else"
)
+ 5
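
For illustration only, the five-way split above can be sketched as a single left-to-right scan. Python is an assumption here (the post itself concerns AppleScript and perl), the name split_outer_brackets is hypothetical, and the sketch assumes quotes are never escaped or nested inside a quoted string.

```python
def split_outer_brackets(text):
    """Split text at its first top-level bracket pair, ignoring quoted text.

    Returns [head, "(", inner, ")", tail], or [text] if no balanced
    top-level pair is found. Assumes no escaped quotes (an assumption).
    """
    quote = None   # the quote character we are currently inside, if any
    depth = 0      # current bracket nesting depth
    start = None   # index of the first top-level opening bracket
    for i, ch in enumerate(text):
        if quote:
            # Inside quotes: only the matching quote character matters.
            if ch == quote:
                quote = None
        elif ch in "'\"":
            quote = ch
        elif ch == "(":
            if depth == 0:
                start = i
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
            if depth == 0:
                head = text[:start].strip()
                inner = text[start + 1:i].strip()
                tail = text[i + 1:].strip()
                return [head, "(", inner, ")", tail]
    return [text]


example = "fn1('quoted string containing a bracket)', fn2(3), \"something, else\") + 5"
for part in split_outer_brackets(example):
    print(part)
```

Run on the string above, this prints the same five parts, one per line. The bracket inside the single-quoted string is skipped because the scan is in its "inside quotes" state at that point.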

Long example:

Say I have some structured expression like:

max( 1, 2, fn1('quoted string containing a bracket)', fn2(3), "something, else"))

which I want to parse by nested brackets and commas until nothing is
left to parse. In practice I have other structures to parse, but this
will do for the example.

So I'd:

1. first parse out the brackets to get:

max
(
	1, 2, fn1('quoted string containing a bracket)', fn2(3), "something, else")
)

2. Then I want to parse by commas to get:

max
(
		1
	,	2
	,	fn1('quoted string containing a bracket)', fn2(3), "something, else")
)

3. and then parse by brackets again:

max
(
		1
	,	2
	,	fn1
		(
			'quoted string containing a bracket)', fn2(3), "something, else"
		)
)

4. then commas:

max
(
		1
	,	2
	,	fn1
		(
				'quoted string containing a bracket)'
			,	fn2(3)
			,	"something, else"
		)
)

5. Then brackets again:

max
(
		1
	,	2
	,	fn1
		(
				'quoted string containing a bracket)'
			,	fn2
				(
					3
				)
			,	"something, else"
		)
)

and that's it. Note that the process respects text within quotes (" or  
') as indivisible and so doesn't parse the commas or brackets out of it.
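
The comma steps (2 and 4 above) could be sketched along the following lines. This is a minimal Python sketch under assumptions, not the code the post refers to: the function name split_top_level_commas is hypothetical, and it assumes unescaped quotes and balanced brackets.

```python
def split_top_level_commas(text):
    """Split text on commas that are outside quotes and outside brackets."""
    parts = []
    current = []   # characters of the part being collected
    quote = None   # the quote character we are currently inside, if any
    depth = 0      # bracket nesting depth
    for ch in text:
        if quote:
            if ch == quote:
                quote = None
        elif ch in "'\"":
            quote = ch
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            # A separator at the top level: close off the current part.
            parts.append("".join(current).strip())
            current = []
            continue
        current.append(ch)
    parts.append("".join(current).strip())
    return parts


args = "1, 2, fn1('quoted string containing a bracket)', fn2(3), \"something, else\")"
for part in split_top_level_commas(args):
    print(part)
```

For the argument text of max(...), this yields three parts: 1, 2, and the whole fn1(...) call, with the commas inside fn1's brackets and inside the double-quoted string left alone.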

As an AppleScript nested list it looks like this:

{	"max"
,	"("
,	{	"",	1}
,	{	",",	2}
,	{	",",
		{	"fn1"
		,	"("
		,	{	"'quoted string containing a bracket)'"
			,	"\"something, else\""
			,	{	"fn2"
				,	"("
				,	{	"",	3}
				,	")"
				}
			}
		,	")"
		}
	}
,	")"
}

or more compactly written:

{"max", "(", {"", 1}, {",", 2}, {",", {"fn1", "(",
{"'quoted string containing a bracket)'", "\"something, else\"",
{"fn2", "(", {"", 3}, ")"}}, ")"}}, ")"}
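
As a rough illustration of how such a nested list might be built in one recursive pass, here is a Python sketch. Several things are assumptions rather than anything stated in the post: the names unquoted and parse are hypothetical, Python lists stand in for AppleScript lists, every argument is stored as a [separator, value] pair (the convention used for the arguments of max), digit-only text becomes an integer, and quotes are never escaped.

```python
def unquoted(text):
    """Yield (index, char, depth) for every character outside quotes.

    Brackets are reported at the depth of the text that contains them.
    """
    quote, depth = None, 0
    for i, ch in enumerate(text):
        if quote:
            if ch == quote:
                quote = None
            continue
        if ch in "'\"":
            quote = ch
            continue
        if ch == ")":
            depth -= 1
        yield i, ch, depth
        if ch == "(":
            depth += 1


def parse(text):
    """Parse 'name(arg, arg, ...)' into nested lists, recursing into args."""
    text = text.strip()
    open_at = close_at = None
    for i, ch, depth in unquoted(text):
        if depth == 0 and ch == "(" and open_at is None:
            open_at = i
        elif depth == 0 and ch == ")" and open_at is not None:
            close_at = i
            break
    if close_at is None:
        # Nothing left to split: a number, a quoted string, or plain text.
        return int(text) if text.isdigit() else text
    inner = text[open_at + 1:close_at]
    cuts = [i for i, ch, depth in unquoted(inner) if ch == "," and depth == 0]
    bounds = [-1] + cuts + [len(inner)]
    node = [text[:open_at].strip(), "("]
    for n in range(len(bounds) - 1):
        piece = inner[bounds[n] + 1:bounds[n + 1]]
        node.append(["" if n == 0 else ",", parse(piece)])
    node.append(")")
    tail = text[close_at + 1:].strip()
    if tail:
        node.append(tail)
    return node


print(parse("max( 1, 2, fn1('quoted string containing a bracket)', fn2(3), \"something, else\"))"))
```

Each call scans only its own slice of the text, so the total work grows with nesting depth times length rather than with repeated whole-string passes.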

I've already achieved this using AppleScript code, and it works nearly
perfectly, but that section of my code is too slow. It grew over time
as a prototype and I always intended to replace it.

So, how can I achieve the above with something faster, such as some
clever regex, that I can call from an AppleScript script? I have been
using do shell script to call perl for regex, but that's likely to
change to a scripting addition. Neither seems to support recursion,
e.g. (?R).
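
One possible direction that avoids recursive patterns altogether: use a plain, non-recursive regex only to cut the string into tokens (keeping each quoted string whole), then track nesting with an explicit stack. The sketch below uses Python's re module purely as an illustration. The names TOKEN, tokenize and nest are hypothetical, and it assumes unescaped quotes; an unterminated quote character is silently dropped by this pattern.

```python
import re

# Alternatives are tried left to right, so a quoted string is always
# consumed whole before its contents can be seen as brackets or commas.
TOKEN = re.compile(r"""
    '[^']*'       # single-quoted string, kept as one token
  | "[^"]*"       # double-quoted string, kept as one token
  | [(),]         # structural characters
  | [^'"(),]+     # any other run of text
""", re.VERBOSE)


def tokenize(text):
    """Return the non-blank tokens of text, each stripped of whitespace."""
    return [t.strip() for t in TOKEN.findall(text) if t.strip()]


def nest(tokens):
    """Group tokens into nested lists, one list per bracket pair."""
    root = []
    stack = [root]
    for tok in tokens:
        if tok == "(":
            child = []
            stack[-1].append(child)
            stack.append(child)
        elif tok == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced closing bracket")
            stack.pop()
        else:
            stack[-1].append(tok)
    if len(stack) != 1:
        raise ValueError("unbalanced opening bracket")
    return root


print(nest(tokenize("fn1('quoted string containing a bracket)', fn2(3), \"something, else\") + 5")))
```

The regex handles only the flat, lexical level, so no (?R)-style support is needed; the same split between a simple token pattern and a stack loop should carry over to other regex engines that support alternation.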

Any ideas?

Thanks,
Tom
