gzip is faster with -O3

Matthias Andree matthias.andree at gmx.de
Wed Aug 9 23:06:11 UTC 2006


On Wed, 09 Aug 2006, Nikolas Britton wrote:

> On 8/9/06, Matthias Andree <matthias.andree at gmx.de> wrote:
> >
> >1. gzip isn't usually used to compress incompressible data.
> >
> >2. use "time" to figure out how much CPU time it actually burns.
> >   5 GB are somewhat I/O bound, but gcc options don't help with that, so
> >   CPU time is better than wallclock time.
> >
> 
> dd if=/dev/zero of=testfile bs=1m count=5000

That's the other extreme. Reading /dev/zero holds no surprises and thus
has no entropy, i.e. no information - that's also a case where you
wouldn't normally use gzip, because the dd command itself is probably
shorter than the gzipped file :-)
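To illustrate (a small Python sketch using zlib, which implements the
same deflate algorithm gzip uses): a multi-megabyte buffer of zeros
collapses to a few kilobytes, because there is no information to keep.

```python
import zlib

# 5 MB of zeros: no entropy, so deflate reduces it to a tiny stream
data = b"\x00" * (5 * 1024 * 1024)
packed = zlib.compress(data, 9)

print(len(data), "->", len(packed))
# The ratio is on the order of 1000:1 for all-zero input
```

Truly random input is the opposite degenerate case: deflate cannot
shrink it at all and adds a little framing overhead instead.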

The truth lies in between: a tarball of /usr/doc, /usr/src and /usr/ports
(without distfiles and packages) is quite compressible, as is a tarball
of /usr/local if you've got some interesting ports installed - many of
them. While I wouldn't claim that this is typical gzip fodder, at least
it isn't degenerate in either direction. Degenerate input (all zeros or
truly random data) usually triggers worst-case behavior which, albeit
necessary to know, is less interesting to the average user.

If someone found a faster bzip2 compression algorithm, or good levers to
pull to tune it, that would perhaps be interesting :-)

> gzip compiled with -O3
> # time nice -10 ./gzip -c9 testfile > /dev/null
> 73.187u 8.682s 2:08.41 63.7%    70+617k 40161+0io 0pf+0w
> 
> gzip compiled with -O2
> # time nice -10 ./gzip -c9 testfile > /dev/null
> 61.183u 8.468s 2:00.14 57.9%    58+609k 40162+0io 0pf+0w

If you want a fast compressor, install lzo2, then lzop and use the latter :-)

> Now... what do all of those numbers mean, I've never used time
> before... thanks for the tip btw?

You're welcome.

What you've got looks like the (t)csh time output, which is documented
in the csh(1) manual page - and your numbers confirm (what I suspected
but didn't say) that -O3 isn't faster in all circumstances.

To compare the two binaries, run "size gzip", which prints the actual
code (text) and data segment sizes.

There are cases when code locality can win over unrolling, for instance.

Anyway, to answer your question (the answer is buried in the csh(1) man
page, in language that is hard to follow for people unfamiliar with the
C programming language), the fields are, in order of appearance:

1 user time in seconds (CPU time spent in the program)

2 system time in s (CPU time spent in the kernel, I/O overhead and such)

3 wallclock time - roughly 2 min (elapsed time without regard to the CPU use)

  - The difference 3 - (1 + 2) is time spent waiting for hardware or in
    concurrently running processes

4 average CPU use (how much of the elapsed time gzip was allowed
                 to have the CPU)

  - This should be (1+2) * 100% / 3

5 shared/unshared memory in kByte

6 input and output operations

7 major page faults and voluntary wait operations
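To check the relation in point 4 against the -O2 figures quoted above
(a quick Python sketch using those numbers):

```python
# Figures from the quoted -O2 run: 61.183u 8.468s 2:00.14 57.9%
user, system = 61.183, 8.468        # CPU seconds (fields 1 and 2)
elapsed = 2 * 60 + 0.14             # 2:00.14 wallclock (field 3)

cpu_pct = (user + system) * 100 / elapsed
print(round(cpu_pct, 1))            # ~58.0, essentially the 57.9% csh shows
```

The small discrepancy comes from csh rounding the individual fields
before printing them.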


Different shells (bash, for instance) produce different output, as does
running the external time utility explicitly, for instance as "\time",
"command time" or "/path/to/time".


Kind regards,

-- 
Matthias Andree


More information about the freebsd-stable mailing list