RFC: jemalloc: qdbus sigsegv in malloc_init

Gustau Pérez i Querol gperez at entel.upc.edu
Tue May 1 18:17:40 UTC 2012

On 30/04/2012 21:34, Jason Evans wrote:
> On Apr 30, 2012, at 7:13 AM, Gustau Pérez i Querol wrote:
>>   the kde team is seeing some strange problems with the new version (4.8.1) of devel/dbus-qt4 with current. It does work with stable. I also suspect that the problem described below is affecting the experimental cinnamon port (an alternative to gnome3, possible replacement of gnome2).
>>   The problem happens with both i386 and amd64 with an empty /etc/malloc.conf and a simple /etc/make.conf. Everything was compiled with base gcc (no clang). The kernel was compiled with no debug support, but it can be enabled if needed. There are reports from avilla at freebsd.org of the same behavior with clang compiled world and kernel and with MALLOC_PRODUCTION=yes.
>> When qdbus starts, it segfaults. The backtrace of the problem with r234769 can be found here: http://pastebin.com/ryBXtqGF. When starting the qdbus daemon by hand in an X+twm session, we see it calls calloc many times and after a fixed number of calls segfaults. We see it segfaults at rb_gen (a quite large macro defined at $SRC_BASE/contrib/jemalloc/include/jemalloc/internal/rb.h).
>> If the daemon is started by hand, I'm able to skip all the calls qdbus makes to calloc till the one causing the segfault. At that point, at rb_gen, we don't exactly know what is going on or how to debug the macro. Ktraces are available, but we were unable to find anything new from them.
>>   With old versions of current before the jemalloc imports (as of March 30th) the daemon segfaulted at malloc.c:2426. With revisions from April 20th to 24th (can be more precise, it was during the jemalloc imports) the daemon segfaulted at malloc_init. Backtraces are available if needed, and if necessary I can go back to those revisions and recompile world+kernel to see its behavior.
>>   Any help from freebsd-current@ (perhaps Jason Evans can help us) will be appreciated. Any additional info, like source revisions, can be provided. I would like to stress that the experimental devel/dbus-qt4 works fine with recent stable.
> The crash is happening in page run management, so there is some pretty bad memory corruption going on by the time of the crash.  If I understand you correctly, you have reproduced the crash on a system that does *not* have MALLOC_PRODUCTION defined, which means that none of the assertions in jemalloc caught the problem.
> Adrian Chadd made the excellent suggestion of trying valgrind; it's likely to point out the problem almost immediately.  If that doesn't work, the utrace functionality in malloc may help you figure out what activity has occurred by the time of the crash, and give you a better understanding of what happened to memory around the address that is involved in the crash.

    Thanks all for your suggestions. It would appear that devel/dbus-qt4
has a problem with thread management: the daemon starts a large number
of threads and is eventually terminated due to stack exhaustion.

   Valgrind suggested increasing the stack size, but doing so made things
even worse: the qdbus daemon was able to spawn even more threads,
causing the machine to need more memory than was physically available
(that is, it started to swap).

   So the problem does not seem to be related to jemalloc or malloc. As
the experimental 4.8.1 devel/dbus-qt4 port works fine on stable, the
problem has to do with some difference between head and stable. When we
get more hints about where the problem is, I will post them in a new
thread on freebsd-current@.

   Anyhow, thanks again for your suggestions!


