FreeBSD ports USE_XZ critical issue on low-RAM computers

Sun Jun 20 21:04:19 UTC 2010

On 2010-06-20 Ion-Mihai Tetcu wrote:
> Personally I'd suggest keeping the option to limit the memory, but as
> an option, not as default.

OK.

> One thing I would really love to see going away is the default to
> delete the archive on decompression.

Being somewhat compatible with gzip and bzip2 command line syntax is 
useful, so even though I don't disagree with you, the default is and 
will be to delete the input file.

> Generally, I think programs should support both, the latter overriding
> the first: .conf -> env -> command line

It means that I will need to create a config file on all my computers
that have 512 MiB RAM or less to get the behavior I want. Other users
with older computers will probably have to do that too, to avoid
insanely slow compression and an unresponsive system when some script
runs "xz -9". While I would prefer not to need a config file, people
like me seem to be in a minority, and creating a config file isn't that
big a deal.

Using a second environment variable would be quite similar. Only the
place where the setting is put would differ. A config file could allow
more flexibility though, e.g. it could even let users override the
preset levels with user-defined custom values (at their own risk, of
course).
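The ".conf -> env -> command line" precedence discussed above can be
sketched as follows. This is only an illustration of the layering idea,
not how xz does or will do it; the function name and the setting names
("memlimit", "threads") are hypothetical.

```python
def resolve_setting(name, conf=None, env=None, cli=None, default=None):
    """Return the value of `name` from the highest-priority source.

    Sources are checked from lowest to highest priority, so a later
    source overrides an earlier one: config file, then environment,
    then command line. All names here are hypothetical.
    """
    value = default
    for source in (conf, env, cli):
        if source and name in source:
            value = source[name]
    return value


# Hypothetical settings from each layer.
conf = {"memlimit": "40%", "threads": "1"}
env = {"memlimit": "256MiB"}
cli = {"threads": "4"}

memlimit = resolve_setting("memlimit", conf, env, cli)  # env beats conf
threads = resolve_setting("threads", conf, env, cli)    # cli beats conf
```

With this layering, a low-RAM machine only needs the limit set once in
the lowest layer, and a script can still override it per invocation.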

> At the moment, what are the plans and the advantages of multithreading
> (both on compression and decompression)?

The "only" advantage is that threading makes things faster when there 
are multiple CPU cores to use. Disadvantages of threading:

  - Compression ratio might be worse. It depends on how the
    threading is done. Different ways have their own pros and cons.

  - Memory usage may be a lot higher for both compression and
    decompression.

The plan is to get some type of threaded compression support into
liblzma after the 5.0.0 release. Given my limited free time, I can't
promise any kind of development schedule.

The API will be designed so that applications won't need to think about
the details of threading too much, and can use the zlib-style loop like
they do in single-threaded mode.
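For readers unfamiliar with the "zlib-style loop", here is a rough
sketch of its shape using Python's standard lzma module (a binding to
liblzma), not the C API itself. It shows the existing single-threaded
pattern only; the helper name and chunk size are assumptions made for
illustration.

```python
import io
import lzma


def compress_stream(src, dst, chunk_size=64 * 1024, preset=6):
    """Feed input to the encoder chunk by chunk, writing output as it comes.

    The caller never sees encoder internals: it just loops over input,
    passes each chunk to the compressor, and flushes at the end.
    """
    comp = lzma.LZMACompressor(format=lzma.FORMAT_XZ, preset=preset)
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(comp.compress(chunk))
    dst.write(comp.flush())  # finish the .xz stream


src = io.BytesIO(b"example data " * 10000)
dst = io.BytesIO()
compress_stream(src, dst)
```

The point made above is that a threaded encoder would ideally keep this
same loop on the application side.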

> > Next question could be how to determine how many threads could be
> > OK for multithreaded decompression. It doesn't "fully" parallelize
> > either, and would be possible only in certain situations. There
> > too the memory usage grows quickly when threads are added. To me,
> > a memory usage limit together with a limit on number of threads
> > looks good; with no limits, the decompressor could end up reading
> > the whole file into RAM (and swap). Threaded decompression isn't
> > so important though, so I'm not even sure if I will ever implement
> > it.
> 
> I'd say offer an option if you want.

Sorry, I explained this poorly. A simple "number of threads = something"
setting is not good for threaded decompression. In the general case you
don't know beforehand how much RAM each decompressor thread would use.

If threaded decompression is implemented, maybe the default should be
one thread just to keep things simple. But there should be an option to
use an optimal number of threads so that the user doesn't need to worry
about details too much. My idea for that would be to have a
user-specified maximum number of threads and a memory usage limit. Then
xz would use up to the allowed number of threads as long as the memory
usage limit is not exceeded. Without a memory usage limit, memory usage
could grow to insane amounts if there are very many cores.

It's somewhat similar for threaded compression, except that the amount
of memory needed per thread at the given compression level is known
before the compression is started. An option to easily tell xz to use
an optimal number of threads would be useful e.g. in scripts, which may
be used on different computers, and thus shouldn't need to figure out
how many CPU cores there are. I think a thread limit combined with a
memory usage limit is reasonable here too.
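The "thread limit combined with a memory usage limit" idea for
compression could look roughly like the sketch below. It assumes that
memory use scales linearly with the thread count, which is a
simplification; the function name and the numbers in the example are
hypothetical, not xz behavior.

```python
MIB = 1024 * 1024


def pick_thread_count(max_threads, mem_limit, mem_per_thread):
    """Use as many threads as allowed while staying under the memory limit.

    Assumes each thread needs `mem_per_thread` bytes (known in advance
    for compression). If even one thread does not fit, fall back to a
    single thread rather than refusing to run.
    """
    if max_threads < 1 or mem_per_thread <= 0:
        raise ValueError("max_threads must be >= 1 and mem_per_thread > 0")
    affordable = mem_limit // mem_per_thread
    return max(1, min(max_threads, affordable))


# Example: 8 cores allowed, 512 MiB limit, about 100 MiB per thread.
threads = pick_thread_count(8, 512 * MIB, 100 * MIB)
```

In this example the memory limit, not the core count, decides the
result: only five 100 MiB threads fit into 512 MiB.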

For the above use, there should be default values for the thread and 
memory limits, so that a config file or many command line options 
wouldn't be strictly required to get some threading with the "use 
optimal number of threads" setting. Number of CPU cores and some 
percentage of RAM could work. Users could set better values themselves, 
but defaults are still a nice starting point and may be enough for many.
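A minimal sketch of such defaults, using the number of CPU cores and a
fraction of physical RAM. The 25 % fraction is an arbitrary value
chosen for illustration, the function name is hypothetical, and the
RAM query relies on sysconf names that are not available on every
platform.

```python
import os


def default_limits(ram_fraction=0.25):
    """Return (thread_limit, memory_limit_in_bytes or None).

    thread_limit defaults to the number of CPU cores. The memory limit
    is a fraction of physical RAM when it can be detected, else None.
    """
    threads = os.cpu_count() or 1
    ram = None
    try:
        pages = os.sysconf("SC_PHYS_PAGES")
        page_size = os.sysconf("SC_PAGE_SIZE")
        if pages > 0 and page_size > 0:
            ram = pages * page_size
    except (AttributeError, ValueError, OSError):
        # os.sysconf or these names are missing on this platform.
        ram = None
    mem_limit = int(ram * ram_fraction) if ram else None
    return threads, mem_limit


thread_limit, mem_limit = default_limits()
```

A user who knows the machine better can still pass explicit values;
these are only a starting point.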

Note that if I remove the current default memory usage limit from xz,
the default memory usage limit used to calculate the optimal number of
threads wouldn't be used for anything else; if the limit is too low, xz
would just drop to single-threaded mode to use a minimal amount of RAM.

> We've pondered a bit about switching our packages from .tbz to .xz or
> tar.xz. Given that a package is made once, and downloaded and
> decompressed by a lot of users a lot of times, it would probably make
> sense to go for the smallest possible size;

I had the same reasoning when I got interested in LZMA in 2004. LZMA was 
also much faster to decompress than bzip2.

Slackware uses .txz suffix for .tar.xz packages, so if you prefer a 
single three-letter suffix instead of .tar.xz, .txz is the way to go.

> however, if this would mean that some users won't be able to
> decompress the packages, then probably xz isn't the tool for us.

Decoder memory usage is all about the dictionary size. With a 2 MiB
dictionary you can make most packages smaller with xz than with "bzip2
-9" while keeping the decoder memory usage (3 MiB) _lower_ than that of
bzip2 (man page says 3700k without using the slower --small mode).

I would recommend using an 8 MiB dictionary for packages. That way 9 MiB
of memory is needed to decompress. That's what I used for packages years
ago, and it's also the default in xz ("xz -6"). A dictionary bigger than
8 MiB is not useful unless the uncompressed file is over 8 MiB. Using
"xz -6e" might reduce the size a little more with some files, but it's
not necessarily worth the extra CPU time.
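To illustrate how the dictionary size drives the decoder's memory need,
here is a small sketch using Python's standard lzma module (a liblzma
binding). It compresses with an explicit 8 MiB dictionary and then
decompresses under a memory limit. The helper names and the 16 MiB and
4 MiB limits are assumptions picked for the example.

```python
import lzma

MIB = 1024 * 1024


def compress_package(data, dict_size=8 * MIB):
    """Compress with LZMA2 at preset 6 and an explicit dictionary size."""
    filters = [{"id": lzma.FILTER_LZMA2, "preset": 6, "dict_size": dict_size}]
    return lzma.compress(data, format=lzma.FORMAT_XZ, filters=filters)


def fits_in_limit(blob, memlimit):
    """Return True if `blob` can be decoded within `memlimit` bytes."""
    decoder = lzma.LZMADecompressor(format=lzma.FORMAT_XZ, memlimit=memlimit)
    try:
        decoder.decompress(blob)
        return True
    except lzma.LZMAError:
        # Raised when the decoder would need more memory than allowed.
        return False


blob = compress_package(b"package contents " * 1000)
ok_with_16 = fits_in_limit(blob, 16 * MIB)  # above the ~9 MiB need
ok_with_4 = fits_in_limit(blob, 4 * MIB)    # below the dictionary size
```

Note that the decoder's requirement follows the dictionary size stored
in the file, not the size of the data, so even a tiny file compressed
this way asks for a little over 8 MiB.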

Compressing with "xz -6" needs about 100 MiB of memory. That is much
more than with "bzip2 -9" (man page says 7600k), but should be fine on
the systems that create the packages.

Using "xz -9" for binary packages would be a bad choice. It doesn't save
that much space over "xz -6" and can seriously annoy users of older
computers. In contrast, decompressing files created with "xz -6" works
nicely on a 100 MHz Pentium with 32 MiB RAM (16 MiB should be quite OK
too). I will need to emphasize much more in the xz docs and possibly
also in "xz --help" that using -9 usually isn't what people want.

There are also additional filters that might help. Enabling them
requires using advanced options. You can try e.g. "xz --x86 --lzma2"
when compressing data that includes a significant amount of x86-32 or
x86-64 code. That filter has a known problem that makes it perform
poorly on static libraries (and Linux kernel modules), so applying it to
all packages isn't necessarily a good idea. In the future (I don't know
when), there will be a better and easier-to-use filter that will use
heuristics to detect when and what extra filtering would be useful.
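The same filter chain as "xz --x86 --lzma2" can be expressed through
Python's standard lzma module, shown here as a sketch. The helper name
and the choice of preset 6 for the LZMA2 stage are assumptions. Whether
the filter helps depends entirely on the input, so compare sizes with
and without it on real binaries.

```python
import lzma


def compress_with_x86_filter(data, preset=6):
    """Apply the x86 BCJ filter first, then LZMA2, in a .xz container."""
    filters = [
        {"id": lzma.FILTER_X86},
        {"id": lzma.FILTER_LZMA2, "preset": preset},
    ]
    return lzma.compress(data, format=lzma.FORMAT_XZ, filters=filters)


# Placeholder input; real use would read an executable or shared library.
data = bytes(range(256)) * 64
blob = compress_with_x86_filter(data)
restored = lzma.decompress(blob)  # the filter chain is read from the file
```

Decompression needs no extra options because the .xz headers record
which filters were used.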

> Speaking of sizes, do you have any statistical data regarding: source
> size, compression options, compression speed and decompression speed
> (and memory usage, since we're talking about it)?

No. It's good to note here that I haven't so far worked much on the 
actual compression algorithms. The critical parts are directly derived 
from Igor Pavlov's LZMA SDK (the code may look very different at first 
sight, but don't let that mislead you).

As I mentioned in an earlier email, I will tweak the compression 
settings mapped to the compression levels before the 5.0.0 release. To 
do that I will need to collect some data from many different compression 
settings. It probably won't be high quality data, since I have limited 
time for experiments and I just need some rough guidelines to tweak the 
options.

Here are a few known things:

  - Decompression speed is a roughly constant number of bytes per
    second of _compressed_ data on the same machine. The better the
    compression has been, the faster the decompression tends to be.
    However, if the data doesn't fit in RAM and the system needs
    to swap out parts of the xz process, old floppy disks start to
    become competitive, because the memory is accessed quite
    randomly.

  - The dictionary keeps the most recently processed uncompressed
    data in a ring buffer. Using a dictionary bigger than the
    uncompressed file is useless.

  - Compressor memory usage is roughly 5-12 times the dictionary
    size. It depends on the match finder (see mf under --lzma2 on
    the man page). "xz -vv" shows the encoder memory usage. I might
    make single -v show that info in the future along with the
    decoder memory usage.

  - Decompressor memory usage is a little more than the dictionary
    size. The currently supported extra filters don't use
    a significant amount of memory.
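
The rules of thumb in the list above can be turned into a rough
estimate. This is only back-of-the-envelope arithmetic: the 5x and 12x
factors come from the list, while the 1 MiB of decoder overhead is an
assumption inferred from the 2 MiB -> 3 MiB and 8 MiB -> 9 MiB figures
given earlier. "xz -vv" remains the way to get real numbers.

```python
MIB = 1024 * 1024


def estimate_memory(dict_size, decoder_overhead=1 * MIB):
    """Return rough memory estimates in bytes for a given dictionary size."""
    return {
        # Depends on the match finder, hence a range.
        "compress_low": 5 * dict_size,
        "compress_high": 12 * dict_size,
        # Dictionary plus a small assumed overhead.
        "decompress": dict_size + decoder_overhead,
    }


est = estimate_memory(8 * MIB)
summary = {name: size // MIB for name, size in est.items()}
```

For an 8 MiB dictionary this gives 40 to 96 MiB for compression, which
is in line with the "about 100 MiB" quoted for "xz -6", and 9 MiB for
decompression.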

-- 
Lasse Collin  |  IRC: Larhzu @ IRCnet & Freenode