Scripting question

Thu Sep 13 12:33:13 PDT 2007

> On 9/13/07, Jerry McAllister <jerrymc at msu.edu> wrote:
>> > The only space is the one separating the SMTP address from the OK or NO.
>>
>> Then you should be able to tell it to sort on the first token in
>> the string with white space as a separator and to eliminate
>> duplicates.   It has been a long time since I had need of sort. I
>> don't remember the arguments/flags but am sure that type of thing can be
>> done.
>>
>> ////jerry
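
For what it's worth, that suggestion maps onto sort's -u and -k options. A
minimal sketch, assuming one "address OK|NO" pair per line; the function name
dedupe_first_col is made up for illustration, and which of the duplicate
lines survives is up to the sort implementation:

```shell
#!/bin/sh
# Keep one line per distinct first field.
# dedupe_first_col is a hypothetical helper name.
dedupe_first_col() {
	# -k1,1 restricts the sort key to the first whitespace-separated field;
	# -u suppresses all but one line in each group with an equal key.
	sort -u -k1,1 "$1"
}
```

Usage would be something like: dedupe_first_col list.txt > new_list.txt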
>
> Ya know, it's really easy to get wrapped around the axle on this stuff.
>
> I think I may have a better solution. The file I'm trying to massage
> has a predecessor - the non-unique lines are the result of a
> concatenation of two files.
>
> Silly me, it's better to 'grep -v' with the one file vs. the second
> rather than trying to merge, sort and further massage the result. The
> fix will be to use sed against the first file to remove the ' NO',
> thus providing a clean argument for grepping the other file.
>
> Sigh.
>
> Kurt
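
If I'm reading that right, the sed-then-grep idea could look roughly like the
sketch below. The argument roles and the name drop_listed are my assumptions,
not something from your setup. Note that -F matches substrings, so an address
that is contained in a longer one would also knock out the longer one.

```shell
#!/bin/sh
# Sketch of the sed + grep -v approach. Assumed argument roles:
#   $1: file whose lines look like "address NO"
#   $2: the other file, from which those addresses should be dropped
# drop_listed is a hypothetical helper name.
drop_listed() {
	addrs=$(mktemp) || return 1
	# Strip the trailing " NO" so only the bare addresses remain.
	sed 's/ NO$//' "$1" > "$addrs"
	# -F: treat each address as a fixed string, -f: read patterns from a file,
	# -v: print only the lines that match none of them.
	grep -v -F -f "$addrs" "$2"
	rm -f "$addrs"
}
```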

It sounds like you've found your solution, but how about the shell script
below?  It's probably woefully inefficient, but it should work.

- Craig

########### begin script ##############
#!/bin/sh
# Read in an input list of 2 column data pairs and output the pairs where
# the first columns are unique.

INPUT_FILE="list.txt"
OUTPUT_FILE="new_list.txt"
NON_UNIQ_LIST=""

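# Collect every first-column value that occurs more than once. The trailing
# space in '^ *1 ' keeps counts such as 10 or 12 from being mistaken for 1.
for NON_UNIQ in `awk '{print $1}' "$INPUT_FILE" | sort | uniq -c |
	grep -vE '^ *1 ' | awk '{print $2}'`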
do
	NON_UNIQ_LIST=$NON_UNIQ_LIST"|"$NON_UNIQ
done

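NON_UNIQ_LIST=`echo "$NON_UNIQ_LIST" | sed 's/^.//'`

# With no duplicates the pattern is empty and grep would misbehave, so just
# copy the file. Otherwise match the alternatives against the first column.
if [ -z "$NON_UNIQ_LIST" ]; then
	cat "$INPUT_FILE" > "$OUTPUT_FILE"
else
	grep -vE "^($NON_UNIQ_LIST)[[:space:]]" "$INPUT_FILE" > "$OUTPUT_FILE"
fi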
########### end script ##############
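
A possible alternative, offered only as a sketch: awk can read the file twice
and compare first columns as plain strings, which avoids building a long regex
(and avoids the dots in the addresses acting as wildcards). The function name
unique_first_col is made up for illustration.

```shell
#!/bin/sh
# Print only the lines whose first column occurs exactly once in the file.
# unique_first_col is a hypothetical helper name.
unique_first_col() {
	# The same file is passed twice. While NR == FNR (first pass) each
	# first-column value is counted; on the second pass a line is printed
	# only if its first column was counted exactly once.
	awk 'NR == FNR { count[$1]++; next } count[$1] == 1' "$1" "$1"
}
```

Usage would be something like: unique_first_col list.txt > new_list.txt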