MACSCRPT Archives

November 2009

MACSCRPT@LISTSERV.DARTMOUTH.EDU

Subject:
From:
"Mark J. Reed" <[log in to unmask]>
Reply To:
Macintosh Scripting Systems <[log in to unmask]>
Date:
Tue, 3 Nov 2009 20:03:53 -0500
On the shell side, that's more an awk task than a grep task.  This works:

paragraphs of (do shell script "awk -F'|' '{c6=(c6 ? c6\"|\" : \"\")$6; c7=(c7 ? c7\"|\" : \"\")$7} END {print c6; print c7}' " & (quoted form of POSIX path of myFile))

Grep is good for finding matching lines (or parts of lines), but not
so much for putting the matches together in a different shape.

Admittedly, awk can look like line noise.  Brief explanation of the
bits used above:

awk '{ some code }' runs 'some code' on each input line.  Within the
block, the variable $0 is set to the text of the current line, and $1,
$2, etc. are set to the whitespace-separated words ("fields") of the
line.

awk -F'|' '{ some code }' uses pipes instead of whitespace to split
each line into fields.

awk '{some code} END {other code}' runs 'some code' for each line and
'other code' once at the end of the file.
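
To see those three pieces working together, here is a minimal sketch
you can paste into Terminal.  The sample data is made up (borrowed
from the example rows quoted further down), and the variable names
"sample" and "result" are just placeholders of mine:

```shell
# Hypothetical two-line sample in the same pipe-delimited shape.
sample='a|b|c|d|e|row1col6|row1col7|rest
z|y|x|w|v|row2col6|row2col7|rest'

# -F'|' splits each line on pipes.
# The first block runs once per line and prints field 6.
# The END block runs once after the last line; NR is awk's
# built-in count of lines read.
result=$(printf '%s\n' "$sample" |
  awk -F'|' '{ print $6 } END { print NR " lines" }')

printf '%s\n' "$result"
```

That should print the sixth field of each line, one per line, followed
by the line count.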

String concatenation is done by just sticking the strings together, no
& or similar operator. So

c6 = c6"|"$6

means "append a literal pipe and the contents of field 6 to the end of
the variable c6".  The version in my working code above is a little
fancier, to avoid an extra pipe at the beginning of the output: it
uses the C-style ternary expression (a ? b : c), which has value b if
a is true and c if a is false.  So (c6 ? c6"|" : "") means "if c6 is
not empty, return it plus a pipe; otherwise return the empty string".

And of course print outputs its argument, followed by a newline.
(There's also printf() for finer control over what is printed.)
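
If it helps to see the difference, this sketch runs the plain append
and the ternary-guarded append side by side.  Again the sample data
and the shell variable names ("naive", "guarded") are invented for
illustration only:

```shell
# Hypothetical two-line sample in the same pipe-delimited shape.
sample='a|b|c|d|e|row1col6|row1col7|rest
z|y|x|w|v|row2col6|row2col7|rest'

# Plain append: always adds a pipe first, so the result
# starts with a stray separator.
naive=$(printf '%s\n' "$sample" |
  awk -F'|' '{ c6 = c6"|"$6 } END { print c6 }')

# Guarded append: the ternary only adds the pipe once c6
# already holds something.
guarded=$(printf '%s\n' "$sample" |
  awk -F'|' '{ c6 = (c6 ? c6"|" : "")$6 } END { print c6 }')

printf 'naive:   %s\n' "$naive"
printf 'guarded: %s\n' "$guarded"
```

One caveat with the guarded form: if field 6 is empty on the first
line(s), c6 stays empty and that leading empty entry is silently
dropped from the list.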

You could also use Perl in "awk mode" (technically "autosplit mode").
It's a little trickier since its separator is a regular expression
rather than a literal string, so you have to supply an extra
backslash to match a literal pipe, which should be familiar from your
earlier grep/egrep outings.  The fields land in an array named @F,
numbered from 0 instead of 1, and the whole line is in $_ rather than
awk's $0.  But Perl does have a convenient join() function to put the
field lists back together with pipes without the conditional
appending.  It looks something like this:

paragraphs of (do shell script "perl -F'\\|' -lane 'push(@f5,$F[5]);
push(@f6,$F[6]); END { print join(\"|\",@f5); print join(\"|\",@f6);
}' " & quoted form of posix path of myFile)

And of course at this point you could reach for any of the other
scripting languages as well, but you'd have to code the looping and
splitting yourself for most of them.
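
For a sense of what "coding the looping and splitting yourself" looks
like, here is a rough sketch in plain sh with no awk or Perl at all.
It is only an illustration under my own assumptions: the sample data
is made up, and the names f1..f7, rest, c6 and c7 are arbitrary:

```shell
# Hypothetical two-line sample in the same pipe-delimited shape.
sample='a|b|c|d|e|row1col6|row1col7|rest
z|y|x|w|v|row2col6|row2col7|rest'

c6=''
c7=''

# IFS='|' makes read split each line on pipes; anything past
# the seventh field ends up in "rest".
while IFS='|' read -r f1 f2 f3 f4 f5 f6 f7 rest; do
  # ${c6:+$c6|} expands to "$c6|" only when c6 is non-empty,
  # playing the same role as the ternary in the awk version.
  c6="${c6:+$c6|}$f6"
  c7="${c7:+$c7|}$f7"
done <<EOF
$sample
EOF

printf '%s\n' "$c6"
printf '%s\n' "$c7"
```

The here-document feeds the loop without a pipeline, so c6 and c7 are
still set after the loop finishes.  For a real file you would
redirect from the file instead, and for anything large awk will be
much faster.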

On Tue, Nov 3, 2009 at 5:22 PM, [log in to unmask] <[log in to unmask]> wrote:
> Yes, that would be perfect.
>
> On Nov 3, 2009, at 1:29pm, Mark J. Reed wrote:
>
>> I'm not clear what you're asking.  Given these lines:
>>
>> a|b|c|d|e|row1col6|row1col7|rest
>> z|y|x|w|v|row2col6|row2col7|rest
>> ...
>>
>> What do you want as output?  Something like this?
>>
>> row1col6|row2col6|row3col6|...
>> row1col7|row2col7|row3col7|...
>>
>>
>> On Tue, Nov 3, 2009 at 4:08 PM, [log in to unmask] <[log in to unmask]>
>> wrote:
>>>
>>> I need some quick GREP advice.
>>>
>>> I'm working with a pipe delimited data file in this format:
>>>
>>>
>>> 114|20090826|00:00|N|1800|8005082|6719954|TVPG|L||CC|Stereo|N|Color||N|N|N|4:3
>>> Fullscreen|N||||Y
>>>
>>> The 6th and 7th items in the each row of data (|8005082|6719954|, from
>>> the
>>> example) are references to data in other files.
>>>
>>> So what I need is a grep command that will give me 2 lists, one
>>> containing
>>> the 6th number from each row and one containing the 7th.
>>>
>>> The lists could be pipe, comma, tab or return delimited
>>>
>>> I'm fairly sure this can be done via GREP, but I don't know where to
>>> start.
>>>
>>> Any suggestions?
>>>
>>
>>
>>
>> --
>> Mark J. Reed <[log in to unmask]>
>



-- 
Mark J. Reed <[log in to unmask]>
