awk help

Tue Apr 18 00:08:28 UTC 2017


On 17-04-17 06:17 PM, Ernie Luzar wrote:
> Andreas Perstinger wrote:
>> On 2017-04-17 16:11, Ernie Luzar wrote:
>>> When I first tested, /^Address/ and /^ Hits/ produced no output. I
>>> changed them to /Address/ and /Hits/ and this produced output. I
>>> could not find any reference to the ^ sign, so I would like to know
>>> what it is supposed to do.
>>
>> "^" inside a regular expression is an anchor and matches the beginning
>> of the line. (See "man re_format" or e.g.
>> http://www.regular-expressions.info/anchors.html ). In the example
>> you've posted, the lines containing "Address" and "Hits" are indented
>> which means there are spaces/tabs between the beginning of the line and
>> these words. Thus the patterns don't match.
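
A minimal sketch of the anchor behaviour described above. The two sample
lines are hypothetical stand-ins for the report, and the [ \t]* prefix is
one possible way to keep the anchor while still allowing indentation:

```shell
# Hypothetical input: the second line is indented, as in the report.
sample='Address: 10.0.0.1
   Address: 10.0.0.2'

# /^Address/ matches only when "Address" is the very first thing on the line.
anchored=$(printf '%s\n' "$sample" | awk '/^Address/ { n++ } END { print n + 0 }')

# Allowing optional leading blanks/tabs after the anchor also matches indented lines.
indented=$(printf '%s\n' "$sample" | awk '/^[ \t]*Address/ { n++ } END { print n + 0 }')

echo "anchored=$anchored indented=$indented"
```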
>>
>>> I am not having success using the system commands rm & touch as shown
>>> in the following example.
>>>
>>> awk 'BEGIN {
>>>   "date +%Y%m%d" | getline date
>>>   hits_yes = "/etc/ipf_pool_awk_hits_yes"
>>>   hits_no = "/etc/ipf_pool_awk_hits_no"
>>>   rm hits_yes
>>>   rm hits_no
>>>   "touch hits_yes"
>>>   "touch hits_no"
>>> }' $hits_rpt
>>
>> You need to use the built-in function "system" in order to use system
>> commands, e.g.
>>
>> system("rm " hits_yes)
>>
>> This concatenates the literal string "rm " with the content of the awk
>> variable "hits_yes" which results in the string "rm
>> /etc/ipf_pool_awk_hits_yes" and this command is then executed.
>>
>>> I know the date system command is working, but can't figure out how
>>> to code rm & touch to get them to work. Is this even possible?
>>
>> The "date" command works without using the "system" function because it
>> is part of the special syntax for the "getline" function.
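
For comparison, a sketch of the command-pipe form of getline on its own.
Holding the command in a variable (cmd here, a name chosen for the example)
makes it easy to close the pipe afterwards:

```shell
today=$(awk 'BEGIN {
    cmd = "date +%Y%m%d"
    # cmd | getline var runs the command and reads one line of its output.
    if ((cmd | getline date) > 0)
        print date
    close(cmd)   # release the pipe once the output has been read
}')
echo "today=$today"
```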
>>
>> But I wonder whether you really need to use commands like "rm" and
>> "touch" inside an awk script. What are you trying to accomplish?
>>
>> Bye, Andreas
>
>
> This is what I am trying to accomplish.
>
> In general I am experimenting with ipfilter ippools (i.e., in-core
> tables). I used an ippool command that generates the 2-line record pair
> report that I posted about in my first post.
>
> I have written a csh "process hits" script that takes 5+ minutes to 
> process that report. I have seen awk used in some public scripts but I 
> have never used it before. I wanted to learn awk and thought it would
> be a good idea to rewrite my csh "process hits" script in awk. To have 
> a fair comparison I needed the awk version to do the rm & touch on the 
> files that the csh version does.
>
> Well, to say the least, I was shocked at the run time results. Using
> the same hits.rpt file as input, the csh script took 5 minutes to
> complete and the awk script took less than 1 second. They both output
> the same file of IP addresses that have a hit count greater than zero. These
> two files have the same size and contain the same number of lines and 
> diff shows no differences between the files.
>
> It's obvious that awk is far superior in performance to native csh
> programming.
>
> I have another csh script, which expires records from the master file,
> that runs a long time. The csh script follows:
>
>   # The following logic removes expired records
>   for line in `cat $temp_master_db`; do
>     ip=`echo -n $line | cut -w -f 2`
>     date=`echo -n $line | cut -w -f 1`
>
>     if [ "$on_one" = "YES" ]; then
>       on_one="NO"
>       previous_ip="$ip"
>       previous_date="$date"
>       continue
>     fi
>
>     if [ "$ip" != "$previous_ip" ]; then
>
>       if [ $previous_date -le $expire_date ]; then
>         # Drop the record from the master db file as expired.
>         previous_ip="$ip"
>         previous_date="$date"
>         continue
>       else
>         db_rec="$previous_date   $previous_ip"
>         echo "${db_rec}" >> $master_db_new
>         previous_ip="$ip"
>         previous_date="$date"
>       fi
>     else
>       # Here current ip and previous_ip are the same.
>       # Check if expired.
>       if [ $previous_date -le $expire_date ]; then
>         # Drop the record from the master db file as expired.
>         previous_ip="$ip"
>         previous_date="$date"
>         continue
>       fi
>       if [ $previous_date -le $date ]; then
>         # Drop the previous record: a newer one exists for the same ip.
>         previous_ip="$ip"
>         previous_date="$date"
>         continue
>       fi
>
>       db_rec="$previous_date  $previous_ip"
>       echo "${db_rec}" >> $master_db_new
>       previous_ip="$ip"
>       previous_date="$date"
>
>     fi
>   done
>
>   # At EOF, must still process previous record.
>   # Keep it only if it has not expired (same test as inside the loop).
>   if [ $previous_date -gt $expire_date ]; then
>     db_rec="$previous_date  $previous_ip"
>     echo "${db_rec}" >> $master_db_new
>   fi
>
>
> Is there some standard awk model to achieve this previous-save logic?
>
> Also can a csh $variable be used inside of an awk program?
>
> Thanks

That is an amazing difference in performance - I might have expected a 
five to ten times improvement, but not 300+ times.
I don't see anything very time-consuming in the script above. Is it 
possible for you to post the equivalent csh and awk scripts? Either I or 
someone with more experience with csh might be able to spot the problem.
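
On the two questions at the end: one common awk idiom for previous-record
logic is to keep the last record in variables, decide about it when the next
record arrives, and handle the final one in END. A shell variable can be
handed to awk with -v, since the single-quoted program does not see shell
variables directly. The sketch below is only an illustration under some
assumptions: the input is "date ip" pairs grouped by ip, dates are YYYYMMDD
numbers, and the last record gets the same not-expired test as the ones in
the loop. The sample data, the cutoff value, and the names emit_prev and
expire are made up for the example.

```shell
expire_date=20170301   # hypothetical cutoff in YYYYMMDD form

# Hypothetical sample: "date ip" pairs, grouped by ip.
sample='20170101 10.0.0.1
20170410 10.0.0.1
20170105 10.0.0.2
20170412 10.0.0.3'

kept=$(printf '%s\n' "$sample" | awk -v expire="$expire_date" '
# Print the saved record unless it has expired or a record with the same ip
# and an equal or later date follows it.
function emit_prev(cur_ip, cur_date) {
    if (prev_date + 0 <= expire + 0) return
    if (cur_ip == prev_ip && prev_date + 0 <= cur_date + 0) return
    print prev_date, prev_ip
}
NR > 1 { emit_prev($2, $1) }               # decide about the previous record
       { prev_date = $1; prev_ip = $2 }    # then save the current one
END    { if (NR > 0) emit_prev("", 0) }    # the last record has no successor
')
printf '%s\n' "$kept"
```

With this sample the records for 10.0.0.1 dated 20170410 and for 10.0.0.3
dated 20170412 are kept; the other two are on or before the cutoff.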