On Nov 17, 2006, at 15:54, Paul Berkowitz wrote:
> Perhaps some of you can help out here, or direct me to somewhere I might
> find an answer, which will probably be in the shell script area.
>
Paul,
I do not think you need the shell.
I believe what I propose is a variation of Emmanuel's method, which I
could not fully remember.
Here is the script, and below it are the timings for a list of 2000 items.
(NOTE: the script uses Smile; "chrono" will not compile without Smile.)
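-- this is texto: 2000 words from 1 to 2000.
set texto to "1"
repeat with j from 2 to 2000
    set texto to texto & " " & j
end repeat
-- timed section (uses Smile's chrono)
chrono
set l to every word of texto
ASTID(return)
set l to l as text -- one list item per paragraph
ASTID("1234") -- the item whose index we want
-- text item 1 is everything before the first "1234";
-- its paragraph count is the index of that item.
-- caveat: the delimiter also matches inside longer items (e.g. "11234")
set ind to (count of paragraphs in text item 1 of l)
set ch to chrono
ASTID("")
{ind, ch}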
-- {858, 0.173509}
-- {1, 0.127912}
-- {1234, 0.12394}
-- {1234, 0.124827}
-- {1234, 0.123216}
-- {1234, 0.123345}
--- here texto is from 1 to 6000
-- {1234, 0.378821}
-- {4234, 0.451657}
-- {4234, 0.413229}
-- {4234, 0.384776}
on ASTID(k)
    set AppleScript's text item delimiters to k
end ASTID
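
For readers who want to see the idea outside AppleScript, here is a minimal sketch of the same delimiter trick in Python. It is only an illustration, not the script timed above: the function name index_via_split is made up for this example, and it assumes every item is a string with no line breaks in it. Unlike the AppleScript version, it wraps the search key in separators so that only whole items can match.

```python
def index_via_split(items, target):
    """Return the 1-based index of the first item equal to target, or 0.

    Hypothetical helper: joins the list into one text, cuts that text at
    the target, and counts the separators that come before the cut.
    Assumes items are strings that contain no newline characters.
    """
    sep = "\n"
    # Separators on both ends let the first and last items match as well.
    joined = sep + sep.join(items) + sep
    head, found, _ = joined.partition(sep + target + sep)
    if not found:
        return 0
    # head holds one separator per item that precedes the target.
    return head.count(sep) + 1


if __name__ == "__main__":
    words = [str(n) for n in range(1, 2001)]
    print(index_via_split(words, "1234"))
```

The count of separators plays the role of "count of paragraphs in text item 1" in the AppleScript above.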
> I'm looking for a fast (non-repeat loop) way of returning the index of a
> list item given the item. (Yes, I have a very useful repeat loop for doing
> this myself, but it's far too slow for the purposes for which I need it
> here.) LNS's List & Record Tools osax has a great command 'difference' that
> will get me (as list3) the items of a longer list2 that are not in a shorter
> list1, but it can't get me the indices in list2 of these (list3) items.
> (Akua back in OS 8 could do this.)
>
> Once upon a time I was given an excellent perl script that could do the same
> thing, absurdly fast, using perl's grep which can work with entire lists of
> search items at once. Here it is. (It's in a form that can also be used to
> look for an intersection of the lists rather than the difference between
> them):
>
>
> set list3 to my perlScript(list1, list2, "!")
>
> on perlScript(list1, list2, bool) -- find which items of list1 are not in list2 if bool = "!", which items of list 1 _are_ in list2 if bool = ""
>
> set lf to (ASCII character 10)
>
> set AppleScript's text item delimiters to {", "}
> set list1 to list1 as string
> set list2 to list2 as string
> set AppleScript's text item delimiters to {lf}
> set scpt to paragraphs of ("perl -e '
> @list1 = (" & list1 & ");
> @list2 = (" & list2 & ");
>
> @check{@list2} = ();
> @list3= grep " & bool & "exists $check{$_}, @list1;
> print \"@list3\\n\";
> '") as string
>
> set list3 to do shell script scpt
>
> set AppleScript's text item delimiters to {" "}
> set list3 to text items of list3
> set AppleScript's text item delimiters to {""}
>
> return list3
>
> end perlScript
>
>
>
> Now I was hoping to be able to use a variation of this to get the line
> numbers (i.e. indices) within list2. Regular grep(1) has an -n option that
> prefixes the line number to the line of found items. That would do fine. (Or
> a method that returned just the indices.) But it doesn't work with perl's
> grep.
>
> Maybe there's a way to do it? Or a way to use regular grep -n in a some sort
> of 'for' loop using each item of list1 in sequence? That sounds most likely
> - how would I do that? (Regular grep seems to be a single-line command.) I
> do not want to do this in an AppleScript repeat loop, but maybe some other
> shell script can work on each line of the textified list1 (where each list
> item is now its own line in a text string), piping each to grep -n? How
> would I do that?
>
> I also have a Python version of the same routine, using .split() in a 'for I
> in list1' loop, which is [almost] quick enough I suppose, but once again I
> don't know how to retrieve the index of the found item in list2 (I've
> checked Python books to no avail) although I'd guess that's possible too.
>
> Thanks for any help. Maybe someone even knows another list osax that does
> it? (I've checked has's TextCommands too.)
>
> --
> Paul Berkowitz
You did not give the timings for your script, but I believe under half a
second for 6000 items should count as fast enough, unless we are talking
about supercomputing.
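
On the Python question in the quoted message (how to get the index in list2 of each found item), one possible approach is to walk list2 with enumerate and test membership against a set built from list1. This is only a sketch under assumptions: the function name missing_with_indices is made up here, the items are taken to be hashable values such as strings, and it is not a reconstruction of the Python routine Paul mentions.

```python
def missing_with_indices(list1, list2):
    """Return (index, item) pairs for the items of list2 not found in list1.

    Hypothetical helper. Indices are 1-based to match AppleScript lists.
    Building a set from list1 makes each membership test cheap, so the
    whole pass stays roughly linear in the combined length of the lists.
    """
    seen = set(list1)
    return [(i, item)
            for i, item in enumerate(list2, start=1)
            if item not in seen]


if __name__ == "__main__":
    shorter = ["b", "d"]
    longer = ["a", "b", "c", "d", "e"]
    for index, item in missing_with_indices(shorter, longer):
        print(index, item)
```

Dropping the "not" in the condition would give the intersection with its indices instead of the difference.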
Deivy