awk help

Tue Apr 18 00:19:37 UTC 2017

On Mon, 17 Apr 2017 18:17:14 -0400, Ernie Luzar wrote:
> In general I am experimenting with ipfilter ippools {IE; in-core 
> I have written a csh "process hits" script that takes 5+ minutes to 
> process that report. I have seen awk used in some public scripts but I 
> have never used it before. I wanted to learn awk and thought it would be 
> a good idea to rewrite my csh "process hits" script in awk. To have a 
> fair comparison I needed the awk version to do the rm & touch on the 
> files that the csh version does.

Allow me a short side note:

What you've written (and presented to the list) is not a csh script.
It's a sh script. FreeBSD's default interactive shell is csh, the C shell,
but the default scripting shell is sh, a "kind of" Bourne shell.

Using sh for scripting is something like an "industry standard".
Nobody likes to write scripts for the C shell. :-)

> It's obvious that awk is far superior in performance over native csh 
> programming.

It is. The awk scripting language is intended for text processing,
pattern matching, output manipulation and "text-related" programming,
while sh (not csh!) is much better for "general" programming, and of
course as "all purpose programming glue". :-)

> I have another csh script to expire records from the master file that 
> runs a long time. The csh script follows;
> 
>    # The following logic removes expired records
>    for line in `cat $temp_master_db`; do
>      ip=`echo -n $line | cut -w -f 2`
>      date=`echo -n $line | cut -w -f 1`
> 
>      if [ "$on_one" = "YES" ]; then
>        on_one="NO"
>        previous_ip="$ip"
>        previous_date="$date"
>        continue
>      fi
> 
>      if [ "$ip" != "$previous_ip" ]; then
> 
>        if [ $previous_date -le $expire_date ]; then
>          # Drop the record from the master db file as expired.
>          previous_ip="$ip"
>          previous_date="$date"
>          continue
>        else
>          db_rec="$previous_date   $previous_ip"
>          echo "${db_rec}" >> $master_db_new
>          previous_ip="$ip"
>          previous_date="$date"
>        fi
>      else
>        # Here current ip and previous_ip are the same.
>        # Check if expired.
>        if [ $previous_date -le $expire_date ]; then
>          # Drop the record from the master db file as expired.
>          previous_ip="$ip"
>          previous_date="$date"
>          continue
>        fi
>        if [ $previous_date -le $date ]; then
>          # Drop the record from the master db file as expired.
>          previous_ip="$ip"
>          previous_date="$date"
>          continue
>        fi
> 
>        db_rec="$previous_date  $previous_ip"
>        echo "${db_rec}" >> $master_db_new
>        previous_ip="$ip"
>        previous_date="$date"
> 
>      fi
>    done
> 
>    # At EOF, must still process previous record.
>    if [ $previous_date -le $expire_date ]; then
>      db_rec="$previous_date  $previous_ip"
>      echo "${db_rec}" >> $master_db_new
>    fi
> 
> 
> Is there some standard awk model to achieve this previous-save logic?

From quickly reading that code, it should be possible to re-implement
this with awk. I'm currently not aware of a "pattern name" for what
you're trying to accomplish, but you should be able to "translate" the
sh code into awk code.
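
As a rough sketch of what such a translation could look like (not a
tested drop-in replacement): awk can hold the previous record in
ordinary variables, decide about it when the next record arrives, and
use an END block for the last one. The names expire, prev_date and
prev_ip, the hard-coded expiry date and the sample data are made up
for illustration. I also assume that the last record should be kept
only if it is not expired.

```shell
#!/bin/sh
# Sketch: "hold the previous record" logic in awk.
# Input: one "date ip" record per line, sorted by ip (sample data, made up).
result=$(printf '%s\n' \
    '20170301 10.0.0.1' \
    '20170410 10.0.0.1' \
    '20170315 10.0.0.2' \
    '20170405 10.0.0.3' |
awk '
BEGIN { expire = 20170401 }       # hypothetical expiry date

# From the second record on, decide what happens to the held record.
NR > 1 {
    expired    = (prev_date + 0 <= expire)
    superseded = ($2 == prev_ip && prev_date + 0 <= $1 + 0)
    if (!expired && !superseded)
        print prev_date, prev_ip
}

# Every record becomes the new "previous" record.
{ prev_date = $1; prev_ip = $2 }

# At EOF the last held record still needs a decision.
END {
    if (NR > 0 && prev_date + 0 > expire)
        print prev_date, prev_ip
}
')
echo "$result"
```

With this sample input, only the newest unexpired record of 10.0.0.1
and the record of 10.0.0.3 should survive.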

> Also can a csh $variable be used inside of an awk program?

Not directly. A sh (not csh!) variable is prefixed by $, but the
awk program is typically enclosed in single quotes which prohibit
the normal function of $FOO or ${FOO}; awk uses $ itself, for
example as field identifiers like $0, $1, $2 and so on.

If you had _no_ $ in your awk code, you could probably do
something like this:

	#!/bin/sh
	FOO=100
	awk "BEGIN { print $FOO }"

But of course, now you'll get problems using double quotes in awk.
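
To illustrate that quoting problem, here is a small sketch (the input
"5 apples" is made up): as soon as the awk program itself needs a $
or a ", each of them has to be escaped with a backslash so that the
shell passes it on unchanged.

```shell
#!/bin/sh
FOO=100
# Inside double quotes the shell expands $FOO, so every awk $ and "
# has to be escaped with a backslash to reach awk unchanged.
result=$(echo "5 apples" | awk "{ print \$1 + $FOO, \"total\" }")
echo "$result"
```

This works, but it becomes hard to read quickly for anything longer
than a one-liner.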

However, there is (at least) one way to deal with this problem: Prefix
the data you're going to process with "special lines", let's say
they start with #, a name (the "variable name"), a =, and the "value".
You can easily generate this as a temporary file from your "glue"
script.

Example:

#!/bin/sh

# variables and values
FOO="100"
BAR="123.456.789.0"

# file names
CONFIGFILE="/tmp/config.tmp"
DATA_IN="ip_in.txt"
DATA_OUT="ip_out.txt"

echo "#FOO=${FOO}" >  ${CONFIGFILE}
echo "#BAR=${BAR}" >> ${CONFIGFILE}

cat ${CONFIGFILE} ${DATA_IN} | awk -F "=" '

/^#[A-Z]/ {
        if ($1 == "#FOO")
                foo = $2
        if ($1 == "#BAR")
                bar = $2
}

/Address/ {
	...
	# something that uses foo
}

/Hits/ {
	...
	# something else that uses bar
}
' > ${DATA_OUT}

rm ${CONFIGFILE}

In case you want to "filter out" those "special lines", you can for
example use | grep -v "^#" | in your processing pipeline.
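
A small self-contained sketch of the same idea, with made-up data and
a hypothetical variable LIMIT: here the "special lines" are stored in
an awk array (conf is just a name I picked) and then skipped with
next, so they never reach the output and no extra grep is needed.

```shell
#!/bin/sh
LIMIT="100"     # hypothetical shell variable to hand over to awk

result=$( { echo "#LIMIT=${LIMIT}"; printf '%s\n' 'a 50' 'b 150' 'c 200'; } |
awk '
# Special line: store NAME=value in the conf array, then skip the line.
/^#[A-Z]/ {
    split(substr($0, 2), kv, "=")
    conf[kv[1]] = kv[2]
    next
}

# Ordinary data line: print names whose value exceeds the limit.
$2 + 0 > conf["LIMIT"] + 0 { print $1 }
')
echo "$result"
```

Because the field separator is left at its default here, the data
lines are still split on whitespace as usual.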

Another option would be a "search and replace" mechanism that
modifies the awk program itself. That can be done with awk or sed
(NB: sed, the stream editor, is one of the most convenient ways
to do a "search and replace" operation: | sed "s/from/to/g" | in
your pipeline). As you can see from the " quotes, using shell
variables is no problem here.

Let's say your awk script has two "placeholders" called FOO and
BAR (make sure they're unique!). You simply replace them with
the values present in the sh "glue".

Example:

#!/bin/sh

# variables and values
FOO="100"
BAR="123.456.789.0"

# file names
DATA_IN="ip_in.txt"
DATA_OUT="ip_out.txt"
SCRIPT_ORIG="process_ip_orig.awk"
SCRIPT_MOD="process_ip.awk"

sed "s/FOO/${FOO}/g; s/BAR/${BAR}/g" < ${SCRIPT_ORIG} > ${SCRIPT_MOD}

cat ${DATA_IN} | awk -f ${SCRIPT_MOD} > ${DATA_OUT}

rm ${SCRIPT_MOD}

NB: Useless use of cat. :-)
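
For comparison, a compact sketch of the same substitution without
temporary files and without cat. The placeholder style @THRESHOLD@,
the variable names and the sample data are assumptions for this
example only; the awk program lives in a shell variable instead of a
file.

```shell
#!/bin/sh
THRESHOLD="100"                          # hypothetical value from the sh glue

# awk program template; @THRESHOLD@ is a placeholder unlikely to clash.
template='$2 + 0 > @THRESHOLD@ { print $1 }'

# Fill in the placeholder; the double quotes let the shell expand ${THRESHOLD}.
program=$(printf '%s\n' "$template" | sed "s/@THRESHOLD@/${THRESHOLD}/g")

# Feed the data on stdin directly, no cat needed.
result=$(printf '%s\n' 'a 50' 'b 150' | awk "$program")
echo "$result"
```

Keep in mind that a value containing / or & would need escaping
before it is used in the sed replacement.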

I'm sure there are several other ways of doing this, but maybe those
two examples can help or at least inspire you. :-)
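
One of those other ways, as a minimal sketch assuming a POSIX-style
awk: the -v option assigns a value to an awk variable before the
program starts, and the ENVIRON array gives access to exported
environment variables. Both keep the awk program in single quotes.
The values and the sample data below are made up.

```shell
#!/bin/sh
FOO="100"
BAR="10.0.0.1"
export BAR                      # ENVIRON only sees exported variables

result=$(printf '%s\n' '10.0.0.1 150' '10.0.0.2 300' '10.0.0.1 50' |
    awk -v foo="$FOO" '$1 == ENVIRON["BAR"] && $2 + 0 > foo + 0 { print $2 }')
echo "$result"
```

Note that awk interprets backslash escapes in a value passed with -v,
so values containing backslashes may be better off in ENVIRON.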

-- 
Polytropon
Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...