FreeBSD ports USE_XZ critical issue on low-RAM computers

Lasse Collin lasse.collin at tukaani.org
Tue Jun 22 09:15:00 UTC 2010


On 2010-06-20 Matthias Andree wrote:
> $ export XZ_OPT=-9
> $ export XZ_OPT_OVERRIDES=-M40%
> $ xz -Mmax blah.tar
> 
> would result in the same behaviour as:
> 
> $ xz -9 -M40% blah.tar
> # here, the XZ_OPT_OVERRIDES cancels -Mmax from command line
> 
> and could mean: xz trying -9, but lowering that as necessary to meet
> the -M40% limit.

This or a config file should solve the problem that removing the default 
memory usage limit would create for me and some other people -- at least 
as long as we remember to add the environment variable or config file on 
each system. ;-)

An environment variable could be easier than a config file. Adding 
support for it is an almost trivial change to the code, and it is easier 
to use on a per-command basis if needed for some reason.

It's still possible that applications using liblzma will use settings 
that are too high on low-memory systems, but so far I haven't seen such 
problems in real-world situations the way I have with scripts that use 
the xz tool. So at least for now there's no need to think about 
controlling liblzma e.g. via an environment variable, and hopefully it 
will never be needed.

> Environment variables with a big banner "don't use XZ_OPT_OVERRIDES
> in scripts, it is reserved for the user" might work. Then everybody
> can complain to the script author if it touches XZ_OPT_OVERRIDES.

I'm sure it will work.

> > Sure, it cannot "fully" parallelize, whatever that means. But the
> > amount of parallelization that is possible is welcomed by many
> > others (you are the very first person to think it's useless). For
> > example, 7-Zip can use any number of threads with .xz files and
> > there are some liblzma-based experimental tools too.
> 
> Fully parallelizable means negligible overhead on the algorithmic
> side, i.e. near 100% speedup with each new processor added
> (considering Amdahl's law and later refinements).
> 
> If compressing position 20-40MB in a file depends on the outcome of
> compressing positions 0-20MB, the task is not parallelizable at all.
> 
> If two threads manage 140% of throughput of one, it's not "fully"
> parallelizable.

OK, so it's fully parallelizable only with a simple method that splits 
the uncompressed data into chunks that are compressed independently. 
This can decrease the compression ratio, but often not by too much if 
the chunk size is big enough. The definition of "too much" naturally 
depends on the specific use case.
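
To make that concrete, here's a crude illustration using only standard 
tools (the file name and chunk size are arbitrary). Note that this 
produces many separate .xz files rather than a single multi-block .xz 
stream, so it's only an approximation of what a threaded xz could do:

$ # Split into 20 MiB chunks (BSD split; GNU split wants -b 20M):
$ split -b 20m blah.tar blah.tar.part.
$ # Compress the chunks independently, in parallel:
$ for f in blah.tar.part.*; do xz "$f" & done
$ wait
$ # Decompressing and concatenating in order restores the original:
$ xz -dc blah.tar.part.*.xz > blah.tar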

There are non-fully parallelizable methods too, with their own 
advantages and disadvantages. In the long term there will probably be a 
few different threading methods in liblzma.

> > Next question could be how to determine how many threads could be
> > OK for multithreaded decompression. It doesn't "fully" parallelize
> > either, and would be possible only in certain situations. There
> > too the memory usage grows quickly when threads are added. To me,
> > a memory usage limit together with a limit on number of threads
> > looks good; with no limits, the decompressor could end up reading
> > the whole file into RAM (and swap). Threaded decompression isn't
> > so important though, so I'm not even sure if I will ever implement
> > it.
> 
> The easy answer for you is a "-j N" option like make's, with a
> default of 1. Since threads share their address space, the --memory
> option can easily be interpreted either way: overall or per-thread.

My description above wasn't good. See my previous email for how a 
default limit could be useful here even if single-threaded operation 
has no limit by default.
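
To put rough numbers on it (illustrative only): decompressing a file 
made with "xz -9" needs about 65 MiB. If a threaded decompressor kept a 
separate dictionary per thread, four threads would need on the order of 
4 x 65 MiB = 260 MiB. With a limit like -M40% on a machine with 512 MiB 
of RAM (a limit of about 205 MiB), the number of threads would be 
capped at three.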

> I'd like to avoid this discussion though with the large audiences of
> ports@ and portmgr@ involved.

Feel free to adjust the recipient list.

> I think for adoption in infrastructure,
> we need consistency across all computers before all else.

I can understand that.

For me it is important that if the _default_ memory usage limit is 
thrown away, there needs to be something else to solve the problems that 
the default limit was designed to fix. I have gotten some useful ideas 
from this discussion, thanks to you and the others commenting in this 
thread. I will remove the default limit and probably add support for 
another environment variable. Hopefully this will make most people 
somewhat happy. I'm sorry about the hassle this issue has created.
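
(A per-command workaround along these lines is already possible with 
the existing XZ_OPT, since command line options are applied after it:

$ XZ_OPT=-M40% xz -9 blah.tar

Here xz tries -9 but scales the settings down as needed to fit the 
40 % limit.)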

> > The dictionary size is only one thing to get high compression. It
> > depends on the file. Some files benefit a lot when dictionary size
> > increases while others benefit mostly from spending more CPU
> > cycles. That's why there is the --extreme option. It allows
> > improving the compression ratio by spending more time without
> > requiring so much RAM.
> 
> The manpage states "factor of two", which barely qualifies as
> "extreme" in my eyes. "extreme" would be an order of magnitude
> (10x).

The option name isn't the greatest; I'm generally bad at naming things. 
The time increase with "xz -2e" is around 10x compared to "xz -2", 
because it turns a fast mode into a slow one without increasing the 
dictionary size. With "xz -6" and "xz -6e" the speed difference isn't 
necessarily even 2x. Often "xz -6e" saves only 0.1-0.5 % compared to 
"xz -6" (sometimes much more, though), so with big files the extra CPU 
cycles often aren't worth it. It depends on the use case.
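
An easy way to see the difference on one's own data (the file name is 
just a placeholder; writing to /dev/null avoids creating output files):

$ time xz -2 -c file.tar > /dev/null
$ time xz -2e -c file.tar > /dev/null
$ time xz -6 -c file.tar > /dev/null
$ time xz -6e -c file.tar > /dev/null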

-- 
Lasse Collin  |  IRC: Larhzu @ IRCnet & Freenode

