sed/awk, instead of Perl

Matthew Seaman m.seaman at infracaninophile.co.uk
Sat Aug 23 09:01:48 UTC 2008


Walt Pawley wrote:
> At 9:59 AM +0200 8/22/08, Oliver Fromme wrote:
> 
>> - The perl command you wrote above is pretty much a sed
>>   command anyway (except you incorrectly used non-portable
>>   regular expression syntax).  Why use perl to execute a
>>   sed command?
> 
> At the risk of beating this to death, I just happened to
> stumble on a real world example of why one might want to use
> Perl for sed-ly stuff. I wanted to pull off the accessor's
> address from each line of an Apache access log file. So, I
> figured after this discussion that sed was the way to go. Then
> I got curious and did the following:
> 
> wump$ ls -l Desktop/klog
> -rw-r--r--  1 wump  1001  52753322 22 Aug 16:37 Desktop/klog
> wump$ time sed "s/ .*//" Desktop/klog > kadr1
> 
> real    0m10.800s
> user    0m10.580s
> sys     0m0.250s
> wump$ time perl -pe 's/ .*//' Desktop/klog > kadr2
> 
> real    0m0.975s
> user    0m0.700s
> sys     0m0.270s
> wump$ cmp kadr1 kadr2
> wump$
> 
> Why the disparity in execution speed? Beats me, but my G5's fans
> started to take off running the sed command. I don't think the
> Perl command took long enough to register thermally. Curious.
> 
> FWIW: I did this with an older version of Mac OS X, rather than
> FreeBSD, so it could easily not show the same results if I moved
> the log file to a FreeBSD box and did it there.

Careful now.  Have you accounted for the effect of the klog file
being cached in VM rather than having to be read afresh from disk?
It makes a very big difference in how fast it is processed.

In order to get meaningful data for this sort of test you should
do a dummy run or two of each command in fairly quick succession,
and then repeat your test runs a number of times and look at the
average and standard deviation of the execution times. You'll often
see "Student's t-test" mentioned -- that's a statistical test for
assessing if results calculated from a limited number of samples
represent different underlying distributions.  It sounds horribly
complicated, but nowadays we have computers to do all the difficult
adding up and the result is just a number that tells you how well
your supposition (that command 'a' is faster than command 'b') is
supported by your results.  There's a neat little script somewhere
that will automate that, and even give you an ASCII graph output,
but I cannot for the life of me remember what it's called. Sorry.
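
To make the procedure concrete, here is a minimal sketch in Python. It is
not the script mentioned above, just an illustration. The function names
time_command and student_t are made up for this example, and the pooled
variance form of the t statistic assumes both sets of timings have roughly
the same spread.

```python
import math
import statistics
import subprocess
import time


def time_command(argv, warmups=2, runs=10):
    """Return a list of wall-clock times (seconds) for `runs` executions.

    The command is first run `warmups` times untimed, so that the input
    file is already in the cache for every timed run.
    """
    for _ in range(warmups):
        subprocess.run(argv, stdout=subprocess.DEVNULL, check=True)
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(argv, stdout=subprocess.DEVNULL, check=True)
        samples.append(time.perf_counter() - start)
    return samples


def student_t(a, b):
    """Return (t, degrees_of_freedom) for a two-sample Student's t-test.

    Uses the pooled-variance form, which assumes equal variances.
    Each sample needs at least two values, and the pooled variance
    must be non-zero.
    """
    na, nb = len(a), len(b)
    pooled = ((na - 1) * statistics.variance(a)
              + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(
        pooled * (1.0 / na + 1.0 / nb))
    return t, na + nb - 2


# Example use (the file name is the one from the quoted test):
#   sed_times = time_command(["sed", "s/ .*//", "Desktop/klog"])
#   perl_times = time_command(["perl", "-pe", "s/ .*//", "Desktop/klog"])
#   print(statistics.mean(sed_times), statistics.stdev(sed_times))
#   print(statistics.mean(perl_times), statistics.stdev(perl_times))
#   print(student_t(sed_times, perl_times))
```

The further the t value is from zero for the given degrees of freedom, the
better the data support the claim that the two commands really differ in
speed. You still have to look the value up in a t table (or use a stats
package) to turn it into a confidence level.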

	Cheers,

	Matthew

-- 
Dr Matthew J Seaman MA, D.Phil.                   7 Priory Courtyard
                                                  Flat 3
PGP: http://www.infracaninophile.co.uk/pgpkey     Ramsgate
                                                  Kent, CT11 9PW
