lzma compression/decompression in bsdtar/libarchive?

bf bf2006a at yahoo.com
Tue Nov 25 12:51:33 PST 2008




--- On Tue, 11/25/08, Ivan Voras <ivoras at freebsd.org> wrote:

> From: Ivan Voras <ivoras at freebsd.org>
> Subject: Re: lzma compression/decompression in bsdtar/libarchive?
> To: bf2006a at yahoo.com
> Cc: freebsd-hackers at freebsd.org
> Date: Tuesday, November 25, 2008, 2:41 PM
> 2008/11/25 bf <bf2006a at yahoo.com>:
> >> How useful would LZMA be without supporting the .7z file format?
> >> Probably not at all, since there isn't a gzip-like file format or
> >> wrapper that supports LZMA.
> >
> > ??  Have you looked at this code?  Yes, there is: there is an "LZMA
> > compressed file format" and the 7z file format, both of which support
> > LZMA. The former format has been widely adopted by people who distribute
> > lzma-compressed tarballs, especially GNU-related projects that use the
> > lzmautils port.  Some projects, like GNU coreutils, no longer distribute
> > the latest versions of their software in bzip2-compressed tarballs.
> 
> That's interesting - I've never seen an .lzma file
> "in the wild".
> 
> But there they are:
> http://ftp.gnu.org/gnu/coreutils/
> 
> [   ] coreutils-6.12.tar.gz            01-Jun-2008 05:03  8.6M
> [   ] coreutils-6.12.tar.lzma          01-Jun-2008 05:04  3.6M
> 
> And there's a compressor in ports: archivers/lzma

Yes, a surprising number of projects now offer lzma-compressed tarballs,
and have for months.  Where needed, they rely on tar to preserve some of
the file data you were concerned with, then compress the tarball with
lzma and wrap it in the very simple "lzma compressed file" format, which
is roughly:

"LZMA compressed file format
---------------------------
Offset  Size  Description
  0      1    Special LZMA properties (lc, lp, pb in encoded form)
  1      4    Dictionary size (little endian)
  5      8    Uncompressed size (little endian). -1 means unknown size
 13           Compressed data"

as described in the documentation.  In the end you obtain compression
ratios better than or equal to bzip2's in almost all cases (usually
substantially better), and decompression speeds closer to gzip's.
Compression speed is comparable to, but usually slightly slower than,
bzip2.  archivers/lzma was the first widely used implementation, but
GNU-inspired projects usually recommend the compatible
archivers/lzmautils fork.  The benefits are clear when you compare the
size of lzma-compressed tarballs to those compressed with gzip and bzip2.
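As a rough illustration of that comparison, here is a minimal Python sketch. It assumes Python's standard `tarfile`, `gzip`, `bz2` and `lzma` modules rather than bsdtar or archivers/lzma; `lzma.FORMAT_ALONE` writes the 13-byte-header "lzma compressed file" format quoted above.  The helper names `make_tarball` and `compress_all` and the sample data are hypothetical.  Resulting sizes depend heavily on the input, so no numbers are claimed here.

```python
import bz2
import gzip
import io
import lzma
import tarfile


def make_tarball(files: dict[str, bytes]) -> bytes:
    """Pack name -> contents into an uncompressed tar archive held in memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            info.mode = 0o644  # tar, not the compressor, records file metadata
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def compress_all(tarball: bytes) -> dict[str, bytes]:
    """Compress the same tarball with gzip, bzip2 and lzma (.lzma format)."""
    return {
        "gzip": gzip.compress(tarball, compresslevel=9),
        "bzip2": bz2.compress(tarball, compresslevel=9),
        # FORMAT_ALONE: the legacy .lzma container with the 13-byte header;
        # default preset (6) keeps memory use moderate.
        "lzma": lzma.compress(tarball, format=lzma.FORMAT_ALONE),
    }


if __name__ == "__main__":
    # Hypothetical sample: 50 small, repetitive "source files".
    sample = {
        f"src/file{i}.c": (f"/* file {i} */\n" + "int x = 0;\n" * 200).encode()
        for i in range(50)
    }
    tarball = make_tarball(sample)
    print(f"tar     {len(tarball):>8} bytes")
    for name, blob in compress_all(tarball).items():
        print(f"{name:<7} {len(blob):>8} bytes")
```

Keeping tar and compression as separate steps mirrors what the lzma-tarball projects do: tar handles the file data, and the compressor only sees one byte stream.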
You can see more examples from many of the GNU projects, GraphicsMagick,
ImageMagick, and others, and many of these use lzma compression with
suboptimal settings.  The other night I archived a Subversion repository
of Gentoo Portage into a 5.5 MB file using bsdtar and archivers/lzma.
This repository is normally about 420 MB in size, and Gentoo's own
lzma-compressed snapshot tarballs are 29 MB, so not all implementations
are equal.
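The 13-byte header from the quoted format table can be decoded in a few lines.  This is a Python sketch assuming the usual LZMA encoding of the properties byte, props = (pb * 5 + lp) * 9 + lc.  The function name `parse_lzma_alone_header` is hypothetical, and Python's `lzma` module is used only to produce a sample file.

```python
import lzma
import struct

HEADER_SIZE = 13                    # 1 + 4 + 8 bytes, per the format table
UNKNOWN_SIZE = 0xFFFFFFFFFFFFFFFF   # -1 stored as an unsigned 64-bit value


def parse_lzma_alone_header(blob: bytes) -> dict:
    """Decode the 13-byte header of an .lzma ("lzma compressed file")."""
    if len(blob) < HEADER_SIZE:
        raise ValueError("input is shorter than an .lzma header")
    # '<' = little endian, no padding: 1-byte props, 4-byte dict size, 8-byte size
    props, dict_size, raw_size = struct.unpack("<BIQ", blob[:HEADER_SIZE])
    if props >= 9 * 5 * 5:          # lc < 9, lp < 5, pb < 5
        raise ValueError(f"invalid properties byte 0x{props:02x}")
    # Invert props = (pb * 5 + lp) * 9 + lc
    lc = props % 9
    lp = (props // 9) % 5
    pb = props // 45
    return {
        "lc": lc,
        "lp": lp,
        "pb": pb,
        "dict_size": dict_size,
        "uncompressed_size": None if raw_size == UNKNOWN_SIZE else raw_size,
        "data_offset": HEADER_SIZE,  # compressed data starts here
    }


if __name__ == "__main__":
    data = b"hello, lzma\n" * 1000
    # Explicit LZMA1 options so the header fields are predictable.
    filters = [{"id": lzma.FILTER_LZMA1, "dict_size": 1 << 20,
                "lc": 3, "lp": 0, "pb": 2}]
    blob = lzma.compress(data, format=lzma.FORMAT_ALONE, filters=filters)
    print(parse_lzma_alone_header(blob))
    assert lzma.decompress(blob, format=lzma.FORMAT_ALONE) == data
```

Depending on the encoder, the uncompressed-size field may be written as -1 (unknown), which the sketch reports as `None`; a decoder then relies on the end-of-stream marker instead.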

Not so long ago (at the end of April this year), someone tried to switch
ImageMagick to lzma-compressed tarballs and caught a lot of flak from
people who were unfamiliar with this form of compression.  If Tim could
integrate lzma support into libarchive, I'm sure it would be received
more favorably.

Among the other high-end compression methods, ppmd has reached a level
of stability that would merit support in libarchive, but many of the
others are still evolving or, in their present form, are too
computationally intensive on anything but the newest hardware for the
diminishing returns they offer.

Regards,
          b.