HEADS UP: bzip2(1) compression for manpages, Groff and Texinfo docs
Brad Knowles
brad.knowles at skynet.be
Fri May 2 13:34:23 PDT 2003
At 7:43 PM +0200 2003/05/02, Matthias Buelow wrote:
> The two programs, however, only do the same thing if you consider
> that they're both compressors. bzip2 eats much more resources than
> gzip, both space and time. And the algorithm is rather overkill for
> small files anyways.
Granted, the space savings are not that large. I took
/usr/share/man/man1 from a 4.6.2-RELEASE box, made three copies of
it under /tmp/man, uncompressed all the files, and then re-compressed
each copy using `compress`, `gzip -9`, and `bzip2`. Here are the
results:
% du * | sort -nr
4646 compress
3624 gzip
3422 bzip2
So, bzip2 is not that much of an improvement over gzip (the gzip copy
is only ~6% larger than the bzip2 copy), but it is a fair improvement
over compress (the compress copy is ~35.7% larger). This is just one
section of the man pages, and does not include the cat pages, but I
figure it's probably fairly representative.
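For anyone who wants to repeat the per-file comparison, the procedure
described above could be sketched roughly as follows. This is a
minimal sketch, not the commands actually used: the helper name
recompress_copy and the /tmp/man/<name> layout in the usage comments
are assumptions, and it assumes the installed pages are gzip'ed (.gz).

```shell
#!/bin/sh
# Sketch only: recompress_copy is a hypothetical helper, not the
# commands used for the measurements quoted in this message.

# recompress_copy SRC DEST CMD [ARGS...]
# Copy SRC to DEST, uncompress any .gz files in the copy, then
# recompress every regular file there with the given command.
recompress_copy() {
    src=$1
    dest=$2
    shift 2
    mkdir -p "$dest" || return 1
    cp -R "$src"/. "$dest"/ || return 1
    # Uncompress the stock pages (assumed to be gzip'ed).
    find "$dest" -type f -name '*.gz' -exec gunzip {} +
    # Recompress each file with the requested compressor.
    find "$dest" -type f -exec "$@" {} +
}

# Possible usage (directory names are assumptions):
#   recompress_copy /usr/share/man/man1 /tmp/man/compress compress
#   recompress_copy /usr/share/man/man1 /tmp/man/gzip gzip -9
#   recompress_copy /usr/share/man/man1 /tmp/man/bzip2 bzip2
#   cd /tmp/man && du * | sort -nr
```

Working on copies keeps the installed pages untouched, and passing
the compressor as arguments lets the same steps be reused for all
three programs.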
I haven't looked at the stuff under /usr/share/info or
/usr/share/doc. I'm not sure which of those files would be
compressed and which ones wouldn't. These three directories comprise
~82MB of disk space, of which about 15MB is in /usr/share/man and
about 64.6MB in /usr/share/doc. At the moment, it doesn't appear
that the files in /usr/share/doc are compressed at all, so there
might be significant storage savings there.
I built a tarball from the /usr/share/doc hierarchy, and tried
the three different compression programs on it. I know that
compression on a tarball is going to be different from compression on
individual files, but this should at least give us some idea.
Anyway, here are the results:
% ls -1s doc* | sort -nr
64368 doc.tar
22896 doc-compress.tar.Z
16080 doc-gzip.tar.gz
12032 doc-bzip2.tar.bz2
So, bzip2 results in a file about 18.6% of the size of the
original, gzip does about 24.9%, and compress is only 35.5%.
Relatively speaking, bzip2 results in a file that is about 74.8% the
size of the version produced by `gzip -9`.
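The tarball test and the percentages above can be rechecked with
something like the following. Again, this is only a sketch under
stated assumptions: make_tarball_variants and pct are hypothetical
helper names, and the output file names simply mirror the listing
above. pct rounds to one decimal place, so its results can differ by
0.1 from figures that were truncated instead.

```shell
#!/bin/sh
# Sketch only: both helpers are hypothetical names introduced here.

# make_tarball_variants DIR PREFIX
# Build PREFIX.tar from DIR, then write one compressed copy per
# program, leaving the uncompressed tarball in place.
make_tarball_variants() {
    dir=$1
    prefix=$2
    tar -cf "$prefix.tar" -C "$dir" . || return 1
    compress -c "$prefix.tar" > "$prefix-compress.tar.Z"
    gzip -9 -c "$prefix.tar" > "$prefix-gzip.tar.gz"
    bzip2 -c "$prefix.tar" > "$prefix-bzip2.tar.bz2"
}

# pct PART WHOLE
# Print PART as a percentage of WHOLE, rounded to one decimal place.
pct() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", 100 * a / b }'
}

# Possible usage, with the block counts from the listing above:
#   make_tarball_variants /usr/share/doc doc
#   ls -1s doc* | sort -nr
#   pct 12032 64368   # bzip2 vs. original    -> 18.7
#   pct 16080 64368   # gzip vs. original     -> 25.0
#   pct 22896 64368   # compress vs. original -> 35.6
#   pct 12032 16080   # bzip2 vs. gzip        -> 74.8
```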
Seeing as /usr/share/doc and /usr/share/info are not currently
compressed (in 4.6.2-RELEASE), any compression algorithm would be a
significant improvement.
--
Brad Knowles, <brad.knowles at skynet.be>
"They that can give up essential liberty to obtain a little temporary
safety deserve neither liberty nor safety."
-Benjamin Franklin, Historical Review of Pennsylvania.
GCS/IT d+(-) s:+(++)>: a C++(+++)$ UMBSHI++++$ P+>++ L+ !E-(---) W+++(--) N+
!w--- O- M++ V PS++(+++) PE- Y+(++) PGP>+++ t+(+++) 5++(+++) X++(+++) R+(+++)
tv+(+++) b+(++++) DI+(++++) D+(++) G+(++++) e++>++++ h--- r---(+++)* z(+++)