awk help

Mon Apr 17 22:16:52 UTC 2017

Andreas Perstinger wrote:
> On 2017-04-17 16:11, Ernie Luzar wrote:
>> When I first tested, /^Address/ and /^ Hits/ produced no output. I
>> changed them to /Address/ and /Hits/ and this produced output. I
>> could not find any reference to the ^ sign, so I would like to know
>> what it is supposed to do.
> 
> "^" inside a regular expression is an anchor and matches the beginning
> of the line. (See "man re_format" or e.g.
> http://www.regular-expressions.info/anchors.html ). In the example
> you've posted, the lines containing "Address" and "Hits" are indented
> which means there are spaces/tabs between the beginning of the line and
> these words. Thus the patterns don't match.
> 
>> I am not having success using the system commands rm & touch as shown
>> in the following example.
>>
>> awk 'BEGIN {
>>   "date +%Y%m%d" | getline date
>>   hits_yes = "/etc/ipf_pool_awk_hits_yes"
>>   hits_no = "/etc/ipf_pool_awk_hits_no"
>>   rm hits_yes
>>   rm hits_no
>>   "touch hits_yes"
>>   "touch hits_no"
>> }' $hits_rpt
> 
> You need to use the built-in function "system" in order to use system
> commands, e.g.
> 
> system("rm " hits_yes)
> 
> This concatenates the literal string "rm " with the content of the awk
> variable "hits_yes" which results in the string "rm
> /etc/ipf_pool_awk_hits_yes" and this command is then executed.
> 
>> I know the date system command is working, but can't figure out how
>> to code rm & touch to get them to work. Is this even possible?
> 
> The "date" command works without using the "system" function because it
> is part of the special syntax for the "getline" function.
> 
> But I wonder whether you really need to use commands like "rm" and
> "touch" inside an awk script. What are you trying to accomplish?
> 
> Bye, Andreas
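
A minimal sketch of the anchor behaviour described above. The two sample
lines are made up to resemble indented report lines; they are not copied
from the real report. The first pattern fails because of the leading
spaces, and the second allows optional blanks after the anchor.

```shell
# Hypothetical sample lines, indented like the report lines described above.
sample='   Address: 10.0.0.1
   Hits: 5'

# ^ anchors at the start of the line, so the leading spaces prevent a match.
anchored=$(printf '%s\n' "$sample" | awk '/^Address/ { n++ } END { print n + 0 }')

# Allowing optional spaces or tabs after ^ matches the indented line.
indented=$(printf '%s\n' "$sample" | awk '/^[ \t]*Address/ { n++ } END { print n + 0 }')

echo "anchored=$anchored indented=$indented"
```

This should print "anchored=0 indented=1".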

This is what I am trying to accomplish.

In general I am experimenting with ipfilter ippools (i.e., in-core
tables). I used an ippool command that generates the two-line
record-pair report that I posted about in my first post.

I have written a csh "process hits" script that takes 5+ minutes to
process that report. I have seen awk used in some public scripts but I
have never used it before. I wanted to learn awk and thought it would be
a good idea to rewrite my csh "process hits" script in awk. To have a
fair comparison I needed the awk version to do the rm & touch on the
files that the csh version does.
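
Following the advice quoted above, the BEGIN block might look roughly
like this sketch. It makes two assumptions that are not in the original:
a scratch directory (the shell variable "workdir", passed in as the awk
variable "dir") stands in for /etc so the example is safe to run, and
"rm -f" is used so that a missing file is not treated as an error.

```shell
# Assumption: a scratch directory stands in for /etc so the sketch is safe to run.
workdir=$(mktemp -d)

today=$(awk -v dir="$workdir" 'BEGIN {
    # getline reads the output of a command; no system() call is needed here.
    "date +%Y%m%d" | getline date
    close("date +%Y%m%d")

    hits_yes = dir "/ipf_pool_awk_hits_yes"
    hits_no  = dir "/ipf_pool_awk_hits_no"

    # system() runs a shell command string built by concatenation.
    system("rm -f " hits_yes " " hits_no)
    system("touch " hits_yes " " hits_no)

    print date
}')

echo "date=$today"
ls "$workdir"
```

The command strings are built by plain concatenation, so a path that
contains spaces would need extra quoting inside the string.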

Well, to say the least, I was shocked at the run time results. Using the
same hits.rpt file as input, the csh script took 5 minutes to complete
and the awk script took less than 1 second. They both output the same
file of IP addresses that have a hit count greater than zero. These two
files have the same size and contain the same number of lines, and diff
shows no differences between the files.

It's obvious that, at least for this kind of line-by-line processing,
awk is far faster than native csh programming.

I have another csh script, which expires records from the master file,
that also runs a long time. The csh script follows:

   # The following logic removes expired records
   for line in `cat $temp_master_db`; do
     ip=`echo -n $line | cut -w -f 2`
     date=`echo -n $line | cut -w -f 1`

     if [ "$on_one" = "YES" ]; then
       on_one="NO"
       previous_ip="$ip"
       previous_date="$date"
       continue
     fi

     if [ "$ip" != "$previous_ip" ]; then

       if [ $previous_date -le $expire_date ]; then
         # Drop the record from the master db file as expired.
         previous_ip="$ip"
         previous_date="$date"
         continue
       else
         db_rec="$previous_date   $previous_ip"
         echo "${db_rec}" >> $master_db_new
         previous_ip="$ip"
         previous_date="$date"
       fi
     else
       # Here current ip and previous_ip are the same.
       # Check if expired.
       if [ $previous_date -le $expire_date ]; then
         # Drop the record from the master db file as expired.
         previous_ip="$ip"
         previous_date="$date"
         continue
       fi
       if [ $previous_date -le $date ]; then
         # Drop the record: a same-ip record with an equal or later date supersedes it.
         previous_ip="$ip"
         previous_date="$date"
         continue
       fi

       db_rec="$previous_date  $previous_ip"
       echo "${db_rec}" >> $master_db_new
       previous_ip="$ip"
       previous_date="$date"

     fi
   done

   # At EOF, must still process previous record: keep it unless expired.
   if [ $previous_date -gt $expire_date ]; then
     db_rec="$previous_date  $previous_ip"
     echo "${db_rec}" >> $master_db_new
   fi

Is there some standard awk model to achieve this previous-save logic?
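
One common shape for this kind of look-behind is to save each record in
variables, decide the fate of the saved record when the next one arrives,
and flush the last one in an END block. The sketch below is only an
illustration under assumptions: records are "date ip" pairs with YYYYMMDD
dates, records for the same IP are adjacent, the sample data and cutoff
are made up, and the final record is kept only if it has not expired.
Output fields are separated by a single space.

```shell
# Hypothetical sample data: "date ip" pairs, sorted so equal IPs are adjacent.
master=$(mktemp)
cat > "$master" <<'EOF'
20170101 10.0.0.1
20170410 10.0.0.1
20170301 10.0.0.2
20170415 10.0.0.3
EOF

expire_date=20170401   # hypothetical cutoff

kept=$(awk -v expire="$expire_date" '
    # From the second record on, decide the fate of the saved (previous) record:
    # keep it if it is not expired and is not superseded by a same-ip record.
    NR > 1 && prev_date + 0 > expire + 0 &&
        ($2 != prev_ip || prev_date + 0 > $1 + 0) {
        print prev_date, prev_ip
    }
    # Save the current record for comparison with the next one.
    { prev_date = $1; prev_ip = $2 }
    # After the last line the saved record is still pending.
    END { if (NR > 0 && prev_date + 0 > expire + 0) print prev_date, prev_ip }
' "$master")

echo "$kept"
rm -f "$master"
```

With the sample data this should print "20170410 10.0.0.1" and
"20170415 10.0.0.3"; the other two records are older than the cutoff.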

Also, can a csh $variable be used inside an awk program?
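
For reference, awk's -v option is one way to hand a shell value to an awk
program. A $name written inside a single-quoted awk program is not
expanded by the shell, so the value has to be passed in explicitly. The
sketch below is shown in sh syntax; the variable names and sample records
are hypothetical.

```shell
expire_date=20170401   # hypothetical value for illustration

# -v copies the shell value into the awk variable "expire" before the program runs.
result=$(printf '%s\n' '20170410 10.0.0.1' '20170301 10.0.0.2' |
    awk -v expire="$expire_date" '$1 > expire { print $2 }')

echo "$result"
```

Only the first sample record is newer than the cutoff, so this should
print 10.0.0.1.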

Thanks