From davidxu at freebsd.org Mon Sep 1 08:08:35 2008
From: davidxu at freebsd.org (David Xu)
Date: Mon Sep 1 08:08:41 2008
Subject: mysterious hang in pthread_create
In-Reply-To: <20080830184512.GH2038@deviant.kiev.zoral.com.ua>
References: <48B7101E.7060203@icyb.net.ua> <48B71BA6.5040504@icyb.net.ua>
    <20080829141043.GX2038@deviant.kiev.zoral.com.ua>
    <48B8052A.6070908@icyb.net.ua>
    <20080829143645.GY2038@deviant.kiev.zoral.com.ua>
    <20080829190506.GA2038@deviant.kiev.zoral.com.ua>
    <20080830155622.GF2038@deviant.kiev.zoral.com.ua>
    <20080830184512.GH2038@deviant.kiev.zoral.com.ua>
Message-ID: <48BBA369.9010108@freebsd.org>

Kostik Belousov wrote:

> Ok, let me tell the whole story. I am sure that in fact you know
> it better than me.
>
> Assuming libthr is the only threading library, there are two locking
> implementations for the rtld: 'default' and the one supplied by libthr.
> On the first call to pthread_create(), libthr calls _rtld_thread_init()
> to replace the default with the implementation from libthr.
>
> In fact, the default implementation is broken from my point of view. For
> instance, the thread_flag update is not atomic. Moreover, it does not
> correctly handle sequential acquisition of several locks, due
> to thread_flag.
>
> The dl_iterate_phdr() function, called by gcc exception handling support
> code, does exactly this. It acquires rtld_phdr_lock, then rtld_bind_lock.
> [I shall admit it does this after my change]. In particular, this would
> leave the bit for the bind lock set in the thread_flag.
>
> Andriy's example throws the exception and calls dl_iterate_phdr() before
> the first thread is created. On thread creation, _rtld_thread_init() is
> called, which tries to move the locks according to thread_flag. This is
> the cause of the reported wlock acquisition.
>
> I do not want to change anything in the default rtld locking. It has been
> dysfunctional since libc_r went away, and I think it would be
> better to make it a nop. My change makes an image that is linked with
> libthr consistently use libthr locks.

The ancient bug is in rtld: rlock_acquire() and wlock_acquire() test
thread_flag as a boolean value. Because dl_iterate_phdr() tries to
lock two locks at the same time, this test always fails once it
has acquired the first lock.

The following silly patch fixes the problem Andriy encountered:

Index: rtld_lock.c
===================================================================
--- rtld_lock.c (revision 182594)
+++ rtld_lock.c (working copy)
@@ -184,7 +184,7 @@
 int
 rlock_acquire(rtld_lock_t lock)
 {
-    if (thread_mask_set(lock->mask)) {
+    if (thread_mask_set(lock->mask) & lock->mask) {
         dbg("rlock_acquire: recursed");
         return (0);
     }
@@ -195,7 +195,7 @@
 int
 wlock_acquire(rtld_lock_t lock)
 {
-    if (thread_mask_set(lock->mask)) {
+    if (thread_mask_set(lock->mask) & lock->mask) {
         dbg("wlock_acquire: recursed");
         return (0);
     }

Regards,
David Xu

From avg at icyb.net.ua Mon Sep 1 08:34:54 2008
From: avg at icyb.net.ua (Andriy Gapon)
Date: Mon Sep 1 08:35:01 2008
Subject: mysterious hang in pthread_create
In-Reply-To: <48BBA369.9010108@freebsd.org>
References: <48B7101E.7060203@icyb.net.ua> <48B71BA6.5040504@icyb.net.ua>
    <20080829141043.GX2038@deviant.kiev.zoral.com.ua>
    <48B8052A.6070908@icyb.net.ua>
    <20080829143645.GY2038@deviant.kiev.zoral.com.ua>
    <20080829190506.GA2038@deviant.kiev.zoral.com.ua>
    <20080830155622.GF2038@deviant.kiev.zoral.com.ua>
    <20080830184512.GH2038@deviant.kiev.zoral.com.ua>
    <48BBA369.9010108@freebsd.org>
Message-ID: <48BBA925.1000303@icyb.net.ua>

on 01/09/2008 11:10 David Xu said the following:
> The ancient bug is in rtld: rlock_acquire() and wlock_acquire() test
> thread_flag as a boolean value. Because dl_iterate_phdr() tries to
> lock two locks at the same time, this test always fails once it
> has acquired the first lock.
>
> The following silly patch fixes the problem Andriy encountered:

I can confirm - this fixed the issue for me.
David, thanks!

> Index: rtld_lock.c
> ===================================================================
> --- rtld_lock.c (revision 182594)
> +++ rtld_lock.c (working copy)
> @@ -184,7 +184,7 @@
>  int
>  rlock_acquire(rtld_lock_t lock)
>  {
> -    if (thread_mask_set(lock->mask)) {
> +    if (thread_mask_set(lock->mask) & lock->mask) {
>          dbg("rlock_acquire: recursed");
>          return (0);
>      }
> @@ -195,7 +195,7 @@
>  int
>  wlock_acquire(rtld_lock_t lock)
>  {
> -    if (thread_mask_set(lock->mask)) {
> +    if (thread_mask_set(lock->mask) & lock->mask) {
>          dbg("wlock_acquire: recursed");
>          return (0);
>      }
>
>
> Regards,
> David Xu
>

-- 
Andriy Gapon

From kostikbel at gmail.com Mon Sep 1 08:45:54 2008
From: kostikbel at gmail.com (Kostik Belousov)
Date: Mon Sep 1 08:46:00 2008
Subject: mysterious hang in pthread_create
In-Reply-To: <48BBA925.1000303@icyb.net.ua>
References: <48B8052A.6070908@icyb.net.ua>
    <20080829143645.GY2038@deviant.kiev.zoral.com.ua>
    <20080829190506.GA2038@deviant.kiev.zoral.com.ua>
    <20080830155622.GF2038@deviant.kiev.zoral.com.ua>
    <20080830184512.GH2038@deviant.kiev.zoral.com.ua>
    <48BBA369.9010108@freebsd.org> <48BBA925.1000303@icyb.net.ua>
Message-ID: <20080901084548.GQ2038@deviant.kiev.zoral.com.ua>

On Mon, Sep 01, 2008 at 11:34:45AM +0300, Andriy Gapon wrote:
> on 01/09/2008 11:10 David Xu said the following:
> >The ancient bug is in rtld: rlock_acquire() and wlock_acquire() test
> >thread_flag as a boolean value. Because dl_iterate_phdr() tries to
> >lock two locks at the same time, this test always fails once it
> >has acquired the first lock.
> >
> >The following silly patch fixes the problem Andriy encountered:
>
> I can confirm - this fixed the issue for me.
> David, thanks!

Does libc_r still work with the patch applied?
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 195 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-threads/attachments/20080901/088cccea/attachment.pgp

From agile at sunbay.com Mon Sep 1 09:00:29 2008
From: agile at sunbay.com (agile@sunbay.com)
Date: Mon Sep 1 09:00:44 2008
Subject: threads/126950: rtld malloc is thread-unsafe
Message-ID: <200809010900.m8190IX1054301@freefall.freebsd.org>

The following reply was made to PR threads/126950; it has been noted by GNATS.

From: agile@sunbay.com
To: bug-followup@FreeBSD.org
Cc:
Subject: Re: threads/126950: rtld malloc is thread-unsafe
Date: Mon, 1 Sep 2008 11:58:30 +0300 (EEST)

 ------=_20080901115830_71529
 Content-Type: text/plain; charset="utf8"
 Content-Transfer-Encoding: 8bit

 patch for 7.0-RELEASE

 ------=_20080901115830_71529
 Content-Type: text/plain; name="126950.patch.txt"
 Content-Transfer-Encoding: 8bit
 Content-Disposition: attachment; filename="126950.patch.txt"

diff -ur /usr/src/libexec/rtld-elf/rtld.c /usr/src/libexec/rtld-elf/rtld.c
--- /usr/src/libexec/rtld-elf/rtld.c 2008-09-01 11:29:15.000000000 +0300
+++ /usr/src/libexec/rtld-elf/rtld.c 2008-09-01 11:29:15.000000000 +0300
@@ -107,7 +107,7 @@
 static Obj_Entry *load_object(const char *, const Obj_Entry *);
 static Obj_Entry *obj_from_addr(const void *);
 static void objlist_call_fini(Objlist *, int *lockstate, unsigned long *gen);
-static void objlist_call_init(Objlist *);
+static void objlist_call_init(Objlist *, int *lockstate);
 static void objlist_clear(Objlist *);
 static Objlist_Entry *objlist_find(Objlist *, const Obj_Entry *);
 static void objlist_init(Objlist *);
@@ -513,8 +513,8 @@
     r_debug_state(NULL, &obj_main->linkmap); /* say hello to gdb! */
-    objlist_call_init(&initlist);
     lockstate = wlock_acquire(rtld_bind_lock);
+    objlist_call_init(&initlist, &lockstate);
     objlist_clear(&initlist);
     wlock_release(rtld_bind_lock, lockstate);
@@ -1473,7 +1473,7 @@
  * functions.
  */
 static void
-objlist_call_init(Objlist *list)
+objlist_call_init(Objlist *list, int *lockstate)
 {
     Objlist_Entry *elm, *elm_tmp;
     char *saved_msg;
@@ -1483,6 +1483,7 @@
      * call into the dynamic linker and overwrite it.
      */
     saved_msg = errmsg_save();
+    wlock_release(rtld_bind_lock, *lockstate);
     STAILQ_FOREACH_SAFE(elm, list, link, elm_tmp) {
         dbg("calling init function for %s at %p", elm->obj->path,
             (void *)elm->obj->init);
@@ -1490,6 +1491,7 @@
             elm->obj->path);
         call_initfini_pointer(elm->obj, elm->obj->init);
     }
+    *lockstate = wlock_acquire(rtld_bind_lock);
     errmsg_restore(saved_msg);
 }
@@ -1775,7 +1777,7 @@
     if (root->refcount == 0) {
         /*
          * The object is no longer referenced, so we must unload it.
-         * First, call the fini functions with no locks held.
+         * First, call the fini functions.
          */
         objlist_call_fini(&list_fini, &lockstate, &list_fini_gen);
@@ -1890,10 +1892,8 @@
         name);
     GDB_STATE(RT_CONSISTENT,obj ? &obj->linkmap : NULL);
-    /* Call the init functions with no locks held. */
-    wlock_release(rtld_bind_lock, lockstate);
-    objlist_call_init(&initlist);
-    lockstate = wlock_acquire(rtld_bind_lock);
+    /* Call the init functions. */
+    objlist_call_init(&initlist, &lockstate);
     objlist_clear(&initlist);
     wlock_release(rtld_bind_lock, lockstate);
     return obj;

 ------=_20080901115830_71529--

From avg at icyb.net.ua Mon Sep 1 10:53:31 2008
From: avg at icyb.net.ua (Andriy Gapon)
Date: Mon Sep 1 10:53:37 2008
Subject: mysterious hang in pthread_create
In-Reply-To: <20080901084548.GQ2038@deviant.kiev.zoral.com.ua>
References: <48B8052A.6070908@icyb.net.ua>
    <20080829143645.GY2038@deviant.kiev.zoral.com.ua>
    <20080829190506.GA2038@deviant.kiev.zoral.com.ua>
    <20080830155622.GF2038@deviant.kiev.zoral.com.ua>
    <20080830184512.GH2038@deviant.kiev.zoral.com.ua>
    <48BBA369.9010108@freebsd.org> <48BBA925.1000303@icyb.net.ua>
    <20080901084548.GQ2038@deviant.kiev.zoral.com.ua>
Message-ID: <48BBC9A3.1050905@icyb.net.ua>

on 01/09/2008 11:45 Kostik Belousov said the following:
> On Mon, Sep 01, 2008 at 11:34:45AM +0300, Andriy Gapon wrote:
>> on 01/09/2008 11:10 David Xu said the following:
>>> The ancient bug is in rtld: rlock_acquire() and wlock_acquire() test
>>> thread_flag as a boolean value. Because dl_iterate_phdr() tries to
>>> lock two locks at the same time, this test always fails once it
>>> has acquired the first lock.
>>>
>>> The following silly patch fixes the problem Andriy encountered:
>> I can confirm - this fixed the issue for me.
>> David, thanks!
>
> Does libc_r still work with the patch applied?

In what sense?
The test program that I posted seems to hang in both cases (patched and
unpatched rtld).

-- 
Andriy Gapon

From bugmaster at FreeBSD.org Mon Sep 1 11:07:04 2008
From: bugmaster at FreeBSD.org (FreeBSD bugmaster)
Date: Mon Sep 1 11:09:15 2008
Subject: Current problem reports assigned to freebsd-threads@FreeBSD.org
Message-ID: <200809011107.m81B73Ik068594@freefall.freebsd.org>

Current FreeBSD problem reports

Critical problems

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
s threa/76690  threads    fork hang in child for -lc_r

1 problem total.

Serious problems

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
s threa/24472  threads    libc_r does not honor SO_SNDTIMEO/SO_RCVTIMEO socket o
s threa/24632  threads    libc_r delicate deviation from libc in handling SIGCHL
s bin/32295    threads    pthread(3) dont dequeue signals
s threa/34536  threads    accept() blocks other threads
s threa/39922  threads    [threads] [patch] Threaded applications executed with
s threa/48856  threads    Setting SIGCHLD to SIG_IGN still leaves zombies under
s threa/49087  threads    Signals lost in programs linked with libc_r
o threa/70975  threads    [sysvipc] unexpected and unreliable behaviour when usi
o threa/72953  threads    fork() unblocks blocked signals w/o PTHREAD_SCOPE_SYST
o threa/75273  threads    FBSD 5.3 libpthread (KSE) bug
o threa/75374  threads    pthread_kill() ignores SA_SIGINFO flag
s threa/76694  threads    fork cause hang in dup()/close() function in child (-l
o threa/79683  threads    svctcp_create() fails if multiple threads call at the
o threa/80435  threads    panic on high loads
o threa/83914  threads    [libc] popen() doesn't work in static threaded program
s threa/84483  threads    problems with devel/nspr and -lc_r on 4.x
s threa/94467  threads    send(), sendto() and sendmsg() are not correct in libc
s threa/100815 threads    FBSD 5.5 broke nanosleep in libc_r
o threa/101323 threads    fork(2) in threaded programs broken.
o threa/103975 threads    Implicit loading/unloading of libpthread.so may crash
o threa/110636 threads    [request] gdb(1): using gdb with multi thread applicat
o threa/118715 threads    kse problem
o threa/121336 threads    lang/neko threading ok on UP, broken on SMP (FreeBSD 7
o threa/126950 threads    [patch] rtld(1): rtld malloc is thread-unsafe

24 problems total.

Non-critical problems

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
s threa/30464  threads    pthread mutex attributes -- pshared
s threa/37676  threads    libc_r: msgsnd(), msgrcv(), pread(), pwrite() need wra
s threa/40671  threads    pthread_cancel doesn't remove thread from condition qu
s threa/69020  threads    pthreads library leaks _gc_mutex
o threa/79887  threads    [patch] freopen() isn't thread-safe
o threa/80992  threads    abort() sometimes not caught by gdb depending on threa
o threa/110306 threads    apache 2.0 segmentation violation when calling gethost
o threa/115211 threads    pthread_atfork misbehaves in initial thread
o threa/116181 threads    /dev/io-related io access permissions are not propagat
o threa/116668 threads    can no longer use jdk15 with libthr on -stable SMP
o threa/122923 threads    'nice' does not prevent background process from steali
o kern/126128  threads    [patch] pthread_condattr_getpshared is broken

12 problems total.

From kostikbel at gmail.com Mon Sep 1 11:12:21 2008
From: kostikbel at gmail.com (Kostik Belousov)
Date: Mon Sep 1 11:12:28 2008
Subject: mysterious hang in pthread_create
In-Reply-To: <48BBC9A3.1050905@icyb.net.ua>
References: <20080829190506.GA2038@deviant.kiev.zoral.com.ua>
    <20080830155622.GF2038@deviant.kiev.zoral.com.ua>
    <20080830184512.GH2038@deviant.kiev.zoral.com.ua>
    <48BBA369.9010108@freebsd.org> <48BBA925.1000303@icyb.net.ua>
    <20080901084548.GQ2038@deviant.kiev.zoral.com.ua>
    <48BBC9A3.1050905@icyb.net.ua>
Message-ID: <20080901111215.GS2038@deviant.kiev.zoral.com.ua>

On Mon, Sep 01, 2008 at 01:53:23PM +0300, Andriy Gapon wrote:
> on 01/09/2008 11:45 Kostik Belousov said the following:
> >On Mon, Sep 01, 2008 at 11:34:45AM +0300, Andriy Gapon wrote:
> >>on 01/09/2008 11:10 David Xu said the following:
> >>>The ancient bug is in rtld: rlock_acquire() and wlock_acquire() test
> >>>thread_flag as a boolean value. Because dl_iterate_phdr() tries to
> >>>lock two locks at the same time, this test always fails once it
> >>>has acquired the first lock.
> >>>
> >>>The following silly patch fixes the problem Andriy encountered:
> >>I can confirm - this fixed the issue for me.
> >>David, thanks!
> >
> >Does libc_r still work with the patch applied?
>
> In what sense?
> The test program that I posted seems to hang in both cases (patched and
> unpatched rtld).

David's patch changes the code used to support libc_r operations.
Even on CURRENT, if you run a 4.x-compiled binary with the support of
the compat-4x libraries, this code from /libexec/ld-elf.so.1 (installed
by CURRENT buildworld) provides locking for rtld.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 195 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-threads/attachments/20080901/592689b9/attachment.pgp

From avg at icyb.net.ua Mon Sep 1 12:17:51 2008
From: avg at icyb.net.ua (Andriy Gapon)
Date: Mon Sep 1 12:17:58 2008
Subject: mysterious hang in pthread_create
In-Reply-To: <20080901111215.GS2038@deviant.kiev.zoral.com.ua>
References: <20080829190506.GA2038@deviant.kiev.zoral.com.ua>
    <20080830155622.GF2038@deviant.kiev.zoral.com.ua>
    <20080830184512.GH2038@deviant.kiev.zoral.com.ua>
    <48BBA369.9010108@freebsd.org> <48BBA925.1000303@icyb.net.ua>
    <20080901084548.GQ2038@deviant.kiev.zoral.com.ua>
    <48BBC9A3.1050905@icyb.net.ua>
    <20080901111215.GS2038@deviant.kiev.zoral.com.ua>
Message-ID: <48BBDD6A.60002@icyb.net.ua>

on 01/09/2008 14:12 Kostik Belousov said the following:
> On Mon, Sep 01, 2008 at 01:53:23PM +0300, Andriy Gapon wrote:
>> on 01/09/2008 11:45 Kostik Belousov said the following:
>>> On Mon, Sep 01, 2008 at 11:34:45AM +0300, Andriy Gapon wrote:
>>>> on 01/09/2008 11:10 David Xu said the following:
>>>>> The ancient bug is in rtld: rlock_acquire() and wlock_acquire() test
>>>>> thread_flag as a boolean value. Because dl_iterate_phdr() tries to
>>>>> lock two locks at the same time, this test always fails once it
>>>>> has acquired the first lock.
>>>>>
>>>>> The following silly patch fixes the problem Andriy encountered:
>>>> I can confirm - this fixed the issue for me.
>>>> David, thanks!
>>> Does libc_r still work with the patch applied?
>> In what sense?
>> The test program that I posted seems to hang in both cases (patched and
>> unpatched rtld).
>
> David's patch changes the code used to support libc_r operations.
> Even on CURRENT, if you run a 4.x-compiled binary with the support of
> the compat-4x libraries, this code from /libexec/ld-elf.so.1 (installed
> by CURRENT buildworld) provides locking for rtld.

I understand, but I am not sure what exactly needs to be tested.
"Still works" is too broad in this context.

-- 
Andriy Gapon

From kostikbel at gmail.com Mon Sep 1 13:17:32 2008
From: kostikbel at gmail.com (Kostik Belousov)
Date: Mon Sep 1 13:17:42 2008
Subject: mysterious hang in pthread_create
In-Reply-To: <48BBDD6A.60002@icyb.net.ua>
References: <20080830155622.GF2038@deviant.kiev.zoral.com.ua>
    <20080830184512.GH2038@deviant.kiev.zoral.com.ua>
    <48BBA369.9010108@freebsd.org> <48BBA925.1000303@icyb.net.ua>
    <20080901084548.GQ2038@deviant.kiev.zoral.com.ua>
    <48BBC9A3.1050905@icyb.net.ua>
    <20080901111215.GS2038@deviant.kiev.zoral.com.ua>
    <48BBDD6A.60002@icyb.net.ua>
Message-ID: <20080901131724.GT2038@deviant.kiev.zoral.com.ua>

On Mon, Sep 01, 2008 at 03:17:46PM +0300, Andriy Gapon wrote:
> on 01/09/2008 14:12 Kostik Belousov said the following:
> > On Mon, Sep 01, 2008 at 01:53:23PM +0300, Andriy Gapon wrote:
...
> >>>>> The following silly patch fixes the problem Andriy encountered:
> >>>> I can confirm - this fixed the issue for me.
> >>>> David, thanks!
> >>> Does libc_r still work with the patch applied?
> >> In what sense?
> >> The test program that I posted seems to hang in both cases (patched and
> >> unpatched rtld).
> >
> > David's patch changes the code used to support libc_r operations.
> > Even on CURRENT, if you run a 4.x-compiled binary with the support of
> > the compat-4x libraries, this code from /libexec/ld-elf.so.1 (installed
> > by CURRENT buildworld) provides locking for rtld.
>
> I understand, but I am not sure what exactly needs to be tested.
> "Still works" is too broad in this context.

I am not sure either. As I said, this is one of the reasons I preferred
not to touch that code.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 195 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-threads/attachments/20080901/47675f4d/attachment.pgp

From davidxu at freebsd.org Tue Sep 2 01:36:38 2008
From: davidxu at freebsd.org (David Xu)
Date: Tue Sep 2 01:36:43 2008
Subject: mysterious hang in pthread_create
In-Reply-To: <20080901131724.GT2038@deviant.kiev.zoral.com.ua>
References: <20080830155622.GF2038@deviant.kiev.zoral.com.ua>
    <20080830184512.GH2038@deviant.kiev.zoral.com.ua>
    <48BBA369.9010108@freebsd.org> <48BBA925.1000303@icyb.net.ua>
    <20080901084548.GQ2038@deviant.kiev.zoral.com.ua>
    <48BBC9A3.1050905@icyb.net.ua>
    <20080901111215.GS2038@deviant.kiev.zoral.com.ua>
    <48BBDD6A.60002@icyb.net.ua>
    <20080901131724.GT2038@deviant.kiev.zoral.com.ua>
Message-ID: <48BC990C.9020208@freebsd.org>

Kostik Belousov wrote:
> On Mon, Sep 01, 2008 at 03:17:46PM +0300, Andriy Gapon wrote:
>> on 01/09/2008 14:12 Kostik Belousov said the following:
>>> On Mon, Sep 01, 2008 at 01:53:23PM +0300, Andriy Gapon wrote:
> ...
>>>>>>> The following silly patch fixes the problem Andriy encountered:
>>>>>> I can confirm - this fixed the issue for me.
>>>>>> David, thanks!
>>>>> Does libc_r still work with the patch applied?
>>>> In what sense?
>>>> The test program that I posted seems to hang in both cases (patched and
>>>> unpatched rtld).
>>> David's patch changes the code used to support libc_r operations.
>>> Even on CURRENT, if you run a 4.x-compiled binary with the support of
>>> the compat-4x libraries, this code from /libexec/ld-elf.so.1 (installed
>>> by CURRENT buildworld) provides locking for rtld.
>> I understand, but I am not sure what exactly needs to be tested.
>> "Still works" is too broad in this context.
>
> I am not sure either. As I said, this is one of the reasons I preferred
> not to touch that code.

It should not affect other code outside the rtld. In fact, this patch
fixes the maintenance of synchronization between locks and thread_flag.
In the current code, if you acquire a rwlock and then acquire a second
rwlock, the first acquisition works, but the second one fails and
thread_flag gets out of sync. This leaks a bit flag in thread_flag, and
a later _rtld_thread_init() call will transfer the unlocked rwlock's
state to libthr as locked. The existing code also does not distinguish
reader locks from writer locks; it blindly transfers lock state as
write-locked. Fortunately, in the correct case it should not be called
with any lock held, so the transfer does not occur.

Another question: why should dl_iterate_phdr() use an exclusive lock?
Doesn't this cause all C++ exceptions to be handled in a serialized
manner?

Regards,
David Xu

From kostikbel at gmail.com Tue Sep 2 13:59:14 2008
From: kostikbel at gmail.com (Kostik Belousov)
Date: Tue Sep 2 13:59:21 2008
Subject: mysterious hang in pthread_create
In-Reply-To: <48BC990C.9020208@freebsd.org>
References: <20080830184512.GH2038@deviant.kiev.zoral.com.ua>
    <48BBA369.9010108@freebsd.org> <48BBA925.1000303@icyb.net.ua>
    <20080901084548.GQ2038@deviant.kiev.zoral.com.ua>
    <48BBC9A3.1050905@icyb.net.ua>
    <20080901111215.GS2038@deviant.kiev.zoral.com.ua>
    <48BBDD6A.60002@icyb.net.ua>
    <20080901131724.GT2038@deviant.kiev.zoral.com.ua>
    <48BC990C.9020208@freebsd.org>
Message-ID: <20080902135907.GX2038@deviant.kiev.zoral.com.ua>

On Tue, Sep 02, 2008 at 09:38:20AM +0800, David Xu wrote:
> Kostik Belousov wrote:
> >On Mon, Sep 01, 2008 at 03:17:46PM +0300, Andriy Gapon wrote:
> >>on 01/09/2008 14:12 Kostik Belousov said the following:
> >>>On Mon, Sep 01, 2008 at 01:53:23PM +0300, Andriy Gapon wrote:
> >...
> >>>>>>>The following silly patch fixes the problem Andriy encountered:
> >>>>>>I can confirm - this fixed the issue for me.
> >>>>>>David, thanks!
> >>>>>Does libc_r still work with the patch applied?
> >>>>In what sense?
> >>>>The test program that I posted seems to hang in both cases (patched and
> >>>>unpatched rtld).
> >>>David's patch changes the code used to support libc_r operations.
> >>>Even on CURRENT, if you run a 4.x-compiled binary with the support of
> >>>the compat-4x libraries, this code from /libexec/ld-elf.so.1 (installed
> >>>by CURRENT buildworld) provides locking for rtld.
> >>I understand, but I am not sure what exactly needs to be tested.
> >>"Still works" is too broad in this context.
> >
> >I am not sure either. As I said, this is one of the reasons I preferred
> >not to touch that code.
>
> It should not affect other code outside the rtld. In fact, this patch
> fixes the maintenance of synchronization between locks and thread_flag.
> In the current code, if you acquire a rwlock and then acquire a second
> rwlock, the first acquisition works, but the second one fails and
> thread_flag gets out of sync. This leaks a bit flag in thread_flag, and
> a later _rtld_thread_init() call will transfer the unlocked rwlock's
> state to libthr as locked. The existing code also does not distinguish
> reader locks from writer locks; it blindly transfers lock state as
> write-locked. Fortunately, in the correct case it should not be called
> with any lock held, so the transfer does not occur.

Yes, I understand this. And this code is used when the threading
implementation is provided by libc_r, see above. I have no objection
to that patch, but I think it should be verified that compat-4x
threaded programs work correctly with the change.

> Another question: why should dl_iterate_phdr() use an exclusive lock?
> Doesn't this cause all C++ exceptions to be handled in a serialized
> manner?

It is required by the C++ runtime. See the commit log for r178807.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 195 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-threads/attachments/20080902/1956f253/attachment.pgp

From dfilter at FreeBSD.ORG Wed Sep 3 01:10:05 2008
From: dfilter at FreeBSD.ORG (dfilter service)
Date: Wed Sep 3 01:10:12 2008
Subject: threads/126950: commit references a PR
Message-ID: <200809030110.m831A5pO054050@freefall.freebsd.org>

The following reply was made to PR threads/126950; it has been noted by GNATS.

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:
Subject: Re: threads/126950: commit references a PR
Date: Wed, 3 Sep 2008 01:05:59 +0000 (UTC)

 kan 2008-09-03 01:05:32 UTC

   FreeBSD src repository

   Modified files:
     libexec/rtld-elf rtld.c
   Log:
   SVN rev 182698 on 2008-09-03 01:05:32Z by kan

   Make sure internal rtld malloc routines are not called from
   unlocked contexts as rtld's malloc is not thread safe and is
   only supposed to be called with exclusive bind lock already
   held.

   The originating PR submitted a patch on top of a different
   prerequisite workaround for unsafe dlopen calls, and the patch
   was modified slightly to apply to stock sources for the purpose
   of this commit. Running rtld malloc from unlocked contexts is a
   bug on its own.

   PR: 126950
   Submitted by: Oleg Dolgov

   Revision  Changes    Path
   1.127     +17 -14    src/libexec/rtld-elf/rtld.c
 _______________________________________________
 cvs-all@freebsd.org mailing list
 http://lists.freebsd.org/mailman/listinfo/cvs-all
 To unsubscribe, send any mail to "cvs-all-unsubscribe@freebsd.org"

From julian at elischer.org Thu Sep 4 23:43:13 2008
From: julian at elischer.org (Julian Elischer)
Date: Thu Sep 4 23:43:25 2008
Subject: libpthread and gdbserver
References: 488A213F.70105@netapp.com
Message-ID: <48C06F77.6000006@elischer.org>

just saw your email to freebsd-threads...


-----

But pt_ta_map_lwp2thr() internally calls pt_ta_map_id2thr() in
libthr_db.c so 'pid to tid' conversion is missing.
If FreeBSD intends to keep it this way, I will have to modify gdbserver
to not use 'pid' to find threads as Linux does.
Can someone shed some light on this? Am I mailing the correct
mailing-list with these queries?

-----

you might try directly mailing davidxu or marcel or jhb (all
@freebsd.org). they SHOULD have seen that posting but may not have..

jhb is I think away for a few days.
but was doing stuff with the debugger quite recently.

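A note for readers following the rtld_lock.c change discussed earlier in
this archive ("mysterious hang in pthread_create"): the sketch below
models, in plain C, why testing the whole flag word as a boolean goes
wrong when two locks are taken in sequence, and why masking with the
lock's own bit avoids it. It is a simplified model under stated
assumptions, not the rtld source. thread_flag is modeled as a plain int,
mask_set() is assumed to OR the bit in and return the previous word, and
the names demo_lock, acquire(), release(), leftover_flag(), PHDR_MASK
and BIND_MASK are hypothetical.

```c
#include <assert.h>

enum { PHDR_MASK = 0x1, BIND_MASK = 0x2 };  /* hypothetical bit assignments */

/* Model of the per-process flag word: one bit per lock. */
static int thread_flag;

/* Assumed semantics: OR the bit in and return the previous flag word. */
static int mask_set(int mask)
{
    int old = thread_flag;

    thread_flag |= mask;
    return old;
}

struct demo_lock {
    int mask;   /* this lock's bit in thread_flag */
    int held;   /* 1 while the underlying lock is held */
};

/*
 * Returns 1 if the lock was taken, 0 if the call was judged to be a
 * recursive acquisition. With patched == 0 any set bit counts as
 * recursion (the old test); with patched != 0 only this lock's own
 * bit does (the test from the patch).
 */
static int acquire(struct demo_lock *l, int patched)
{
    int old = mask_set(l->mask);
    int recursed = patched ? (old & l->mask) : old;

    if (recursed)
        return 0;
    l->held = 1;
    return 1;
}

/* Release only if the matching acquire() returned 1. */
static void release(struct demo_lock *l, int locked)
{
    if (!locked)
        return;
    assert(l->held);
    l->held = 0;
    thread_flag &= ~l->mask;
}

/* Take phdr then bind, release both, and report what is left in thread_flag. */
static int leftover_flag(int patched)
{
    struct demo_lock phdr = { PHDR_MASK, 0 };
    struct demo_lock bind = { BIND_MASK, 0 };
    int s1, s2;

    thread_flag = 0;
    s1 = acquire(&phdr, patched);
    s2 = acquire(&bind, patched);   /* old test sees PHDR_MASK and returns 0 */
    release(&bind, s2);
    release(&phdr, s1);
    return thread_flag;
}
```

Under these assumptions, leftover_flag(0) returns BIND_MASK: the second
acquisition is refused but its bit stays set, which is the kind of stale
bit a later _rtld_thread_init() would interpret as a held lock.
leftover_flag(1) returns 0, because the second lock is really taken and
really released.
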
From davidxu at freebsd.org Fri Sep 5 01:56:41 2008
From: davidxu at freebsd.org (David Xu)
Date: Fri Sep 5 01:57:00 2008
Subject: libpthread and gdbserver
In-Reply-To: <48C06F77.6000006@elischer.org>
References: 488A213F.70105@netapp.com <48C06F77.6000006@elischer.org>
Message-ID: <48C0923F.6080608@freebsd.org>

Julian Elischer wrote:
> just saw your email to freebsd-threads...
>
>
> -----
>
> But pt_ta_map_lwp2thr() internally calls pt_ta_map_id2thr() in
> libthr_db.c so 'pid to tid' conversion is missing.
> If FreeBSD intends to keep it this way, I will have to modify gdbserver
> to not use 'pid' to find threads as Linux does.
> Can someone shed some light on this? Am I mailing the correct
> mailing-list with these queries?
>
> -----
>
> you might try directly mailing davidxu or marcel or jhb (all
> @freebsd.org). they SHOULD have seen that posting but may not have..
>
> jhb is I think away for a few days.
> but was doing stuff with the debugger quite recently.
>

Each of our threads has a kernel thread ID, called the lwpid. It is not
like a Linux pid; I know a Linux thread is a special process.

From fernando at schapachnik.com.ar Fri Sep 5 02:08:39 2008
From: fernando at schapachnik.com.ar (Fernando Schapachnik)
Date: Fri Sep 5 02:08:46 2008
Subject: Profiling
Message-ID: <20080905014610.GC1070@funes.schapachnik.com.ar>

Hi,
I'm developing a highly concurrent, speed-oriented pthread
application and would like to obtain some insight into time spent on
mutexes, contention, etc. Any good tool for FreeBSD 7?

Thanks!

Fernando P. Schapachnik
fernando@schapachnik.com.ar

From julian at elischer.org Fri Sep 5 02:08:50 2008
From: julian at elischer.org (Julian Elischer)
Date: Fri Sep 5 02:08:57 2008
Subject: libpthread and gdbserver
In-Reply-To: <48C0923F.6080608@freebsd.org>
References: 488A213F.70105@netapp.com <48C06F77.6000006@elischer.org>
    <48C0923F.6080608@freebsd.org>
Message-ID: <48C094B8.3050302@elischer.org>

the actual question was from dixit @ netapp.com
but for some reason it didn't appear on this email..


David Xu wrote:
> Julian Elischer wrote:
>> just saw your email to freebsd-threads...
>>
>>
>> -----
>>
>> But pt_ta_map_lwp2thr() internally calls pt_ta_map_id2thr() in
>> libthr_db.c so 'pid to tid' conversion is missing.
>> If FreeBSD intends to keep it this way, I will have to modify gdbserver
>> to not use 'pid' to find threads as Linux does.
>> Can someone shed some light on this? Am I mailing the correct
>> mailing-list with these queries?
>>
>> -----
>>
>> you might try directly mailing davidxu or marcel or jhb (all
>> @freebsd.org). they SHOULD have seen that posting but may not have..
>>
>> jhb is I think away for a few days.
>> but was doing stuff with the debugger quite recently.
>>
>
> Each of our threads has a kernel thread ID, called the lwpid. It is not
> like a Linux pid; I know a Linux thread is a special process.
>
>
>
> _______________________________________________
> freebsd-threads@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-threads
> To unsubscribe, send any mail to "freebsd-threads-unsubscribe@freebsd.org"

From julian at elischer.org Fri Sep 5 02:14:48 2008
From: julian at elischer.org (Julian Elischer)
Date: Fri Sep 5 02:14:54 2008
Subject: libpthread and gdbserver
In-Reply-To: <48C094B8.3050302@elischer.org>
References: 488A213F.70105@netapp.com <48C06F77.6000006@elischer.org>
    <48C0923F.6080608@freebsd.org> <48C094B8.3050302@elischer.org>
Message-ID: <48C0961E.5000300@elischer.org>

Julian Elischer wrote:
> the actual question was from dixit @ netapp.com
> but for some reason it didn't appear on this email..

the original question can be seen at:

http://lists.freebsd.org/pipermail/freebsd-threads/2008-August/004330.html

>
>
> David Xu wrote:
>> Julian Elischer wrote:
>>> just saw your email to freebsd-threads...
>>>
>>>
>>> -----
>>>
>>> But pt_ta_map_lwp2thr() internally calls pt_ta_map_id2thr() in
>>> libthr_db.c so 'pid to tid' conversion is missing.
>>> If FreeBSD intends to keep it this way, I will have to modify gdbserver
>>> to not use 'pid' to find threads as Linux does.
>>> Can someone shed some light on this? Am I mailing the correct
>>> mailing-list with these queries?
>>>
>>> -----
>>>
>>> you might try directly mailing davidxu or marcel or jhb (all
>>> @freebsd.org). they SHOULD have seen that posting but may not have..
>>>
>>> jhb is I think away for a few days.
>>> but was doing stuff with the debugger quite recently.
>>>
>>
>> Each of our threads has a kernel thread ID, called the lwpid. It is not
>> like a Linux pid; I know a Linux thread is a special process.

From bugmaster at FreeBSD.org Mon Sep 8 02:22:29 2008
From: bugmaster at FreeBSD.org (FreeBSD bugmaster)
Date: Mon Sep 8 02:24:25 2008
Subject: Current problem reports assigned to freebsd-threads@FreeBSD.org
Message-ID: <200809080222.m882MSnV006840@freefall.freebsd.org>

The following is a listing of current problems submitted by FreeBSD users.
These represent problem reports covering all versions including
experimental development code and obsolete releases.

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o threa/126950 threads    [patch] rtld(1): rtld malloc is thread-unsafe
o kern/126128  threads    [patch] pthread_condattr_getpshared is broken
o threa/122923 threads    'nice' does not prevent background process from steali
o threa/121336 threads    lang/neko threading ok on UP, broken on SMP (FreeBSD 7
o threa/118715 threads    kse problem
o threa/116668 threads    can no longer use jdk15 with libthr on -stable SMP
o threa/116181 threads    /dev/io-related io access permissions are not propagat
o threa/115211 threads    pthread_atfork misbehaves in initial thread
o threa/110636 threads    [request] gdb(1): using gdb with multi thread applicat
o threa/110306 threads    apache 2.0 segmentation violation when calling gethost
o threa/103975 threads    Implicit loading/unloading of libpthread.so may crash
o threa/101323 threads    fork(2) in threaded programs broken.
s threa/100815 threads FBSD 5.5 broke nanosleep in libc_r s threa/94467 threads send(), sendto() and sendmsg() are not correct in libc s threa/84483 threads problems with devel/nspr and -lc_r on 4.x o threa/83914 threads [libc] popen() doesn't work in static threaded program o threa/80992 threads abort() sometimes not caught by gdb depending on threa o threa/80435 threads panic on high loads o threa/79887 threads [patch] freopen() isn't thread-safe o threa/79683 threads svctcp_create() fails if multiple threads call at the s threa/76694 threads fork cause hang in dup()/close() function in child (-l s threa/76690 threads fork hang in child for -lc_r o threa/75374 threads pthread_kill() ignores SA_SIGINFO flag o threa/75273 threads FBSD 5.3 libpthread (KSE) bug o threa/72953 threads fork() unblocks blocked signals w/o PTHREAD_SCOPE_SYST o threa/70975 threads [sysvipc] unexpected and unreliable behaviour when usi s threa/69020 threads pthreads library leaks _gc_mutex s threa/49087 threads Signals lost in programs linked with libc_r s threa/48856 threads Setting SIGCHLD to SIG_IGN still leaves zombies under s threa/40671 threads pthread_cancel doesn't remove thread from condition qu s threa/39922 threads [threads] [patch] Threaded applications executed with s threa/37676 threads libc_r: msgsnd(), msgrcv(), pread(), pwrite() need wra s threa/34536 threads accept() blocks other threads s bin/32295 threads pthread(3) dont dequeue signals s threa/30464 threads pthread mutex attributes -- pshared s threa/24632 threads libc_r delicate deviation from libc in handling SIGCHL s threa/24472 threads libc_r does not honor SO_SNDTIMEO/SO_RCVTIMEO socket o 37 problems total. Bugs can be in one of several states: o - open A problem report has been submitted, no sanity checking performed. a - analyzed The problem is understood and a solution is being sought. 
f - feedback Further work requires additional information from the originator or the community - possibly confirmation of the effectiveness of a proposed solution. p - patched A patch has been committed, but some issues (MFC and / or confirmation from originator) are still open. r - repocopy The resolution of the problem report is dependent on a repocopy operation within the CVS repository which is awaiting completion. s - suspended The problem is not being worked on, due to lack of information or resources. This is a prime candidate for somebody who is looking for a project to do. If the problem cannot be solved at all, it will be closed, rather than suspended. c - closed A problem report is closed when any changes have been integrated, documented, and tested -- or when fixing the problem is abandoned. From linimon at FreeBSD.org Mon Sep 8 22:51:11 2008 From: linimon at FreeBSD.org (linimon@FreeBSD.org) Date: Mon Sep 8 22:51:22 2008 Subject: threads/127225: bug in lib/libthr/thread/thr_init.c Message-ID: <200809082251.m88MpBOh057121@freefall.freebsd.org> Synopsis: bug in lib/libthr/thread/thr_init.c Responsible-Changed-From-To: freebsd-bugs->freebsd-threads Responsible-Changed-By: linimon Responsible-Changed-When: Mon Sep 8 22:50:49 UTC 2008 Responsible-Changed-Why: Over to maintainer(s). http://www.freebsd.org/cgi/query-pr.cgi?pr=127225 From kris at FreeBSD.org Tue Sep 9 00:00:21 2008 From: kris at FreeBSD.org (Kris Kennaway) Date: Tue Sep 9 00:00:34 2008 Subject: bin/127225: bug in lib/libthr/thread/thr_init.c Message-ID: <200809090000.m8900K6N062483@freefall.freebsd.org> The following reply was made to PR threads/127225; it has been noted by GNATS. 
From: Kris Kennaway
To: comperr
Cc: freebsd-gnats-submit@FreeBSD.org
Subject: Re: bin/127225: bug in lib/libthr/thread/thr_init.c
Date: Tue, 09 Sep 2008 09:28:21 +0930

comperr wrote:
> FreeBSD starfx 7.1-PRERELEASE FreeBSD 7.1-PRERELEASE #2: Sun Sep 7 21:15:11 EDT 2008 variable@starfx:/extra/backups/obj/src/sys/KERNKOOL i386
>> Description:
> Firefox3, xchat, and others fail with:
> Fatal error 'Cannot allocate red zone for initial thread' at line 384 in file /extra/src/lib/libthr/thread/thr_init.c (errno = 12)
> .. (same error) ...
> Fatal error 'Cannot allocate red zone for initial thread' at line 384 in file /extra/src/lib/libthr/thread/thr_init.c (errno = 12)
> Bus error (core dumped)
>
>> How-To-Repeat:
> run firefox3
>> Fix:
> none at this time

This usually means you're using an application that was linked to two
versions of the same thread library (google the error message for
extensive discussion). The usual cause is an upgrade from an older
version of FreeBSD that was not completed correctly.

Kris

From ivoras at freebsd.org Fri Sep 12 11:09:19 2008
From: ivoras at freebsd.org (Ivan Voras)
Date: Fri Sep 12 11:09:25 2008
Subject: Apache-worker stuck at 100% CPU
Message-ID:

Hi,

I'm running apache2.2.9-worker without mod_php (with mod_fcgid and PHP
as FastCGI) and a process or a thread repeatedly gets stuck:

  PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
 5503 www        41  96    0 50820K 30960K umtxn  0 722:42 99.32% httpd

It doesn't use sys time and ktrace doesn't record anything.

Any clues where to dig next?

I'm not using any unusual modules for apache, and the same configuration
worked in 6.3 (this system was upgraded from 6.3 to 7.0-R - I checked
that all apache dependencies are compiled for 7.0).

From alfred at freebsd.org Fri Sep 12 17:15:55 2008
From: alfred at freebsd.org (Alfred Perlstein)
Date: Fri Sep 12 17:16:01 2008
Subject: Apache-worker stuck at 100% CPU
In-Reply-To:
References:
Message-ID: <20080912165808.GE16977@elvis.mu.org>

* Ivan Voras [080912 04:09] wrote:
> Hi,
>
> I'm running apache2.2.9-worker without mod_php (with mod_fcgid and PHP
> as FastCGI) and a process or a thread repeatedly gets stuck:
>
>   PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
>  5503 www        41  96    0 50820K 30960K umtxn  0 722:42 99.32% httpd
>
> It doesn't use sys time and ktrace doesn't record anything.
>
> Any clues where to dig next?
>
> I'm not using any unusual modules for apache, and the same configuration
> worked in 6.3 (this system was upgraded from 6.3 to 7.0-R - I checked
> that all apache dependencies are compiled for 7.0).

Try using "pstack" a few times. It's in ports.

Also, gcore(1) might help.

You can also try to attach using gdb.

Basically, one of these tools should give you a stack trace which
can help.

It's interesting that the process is in "umtxn" though, is it
multithreaded apache? Can you dump the threads? I think top(1)
has an option to view each thread, how about trying that?

thank you,
-Alfred

From ivoras at freebsd.org Fri Sep 12 22:16:39 2008
From: ivoras at freebsd.org (Ivan Voras)
Date: Fri Sep 12 22:16:45 2008
Subject: Apache-worker stuck at 100% CPU
In-Reply-To: <20080912165808.GE16977@elvis.mu.org>
References: <20080912165808.GE16977@elvis.mu.org>
Message-ID: <9bbcef730809121444u34991c52m2cbc01a8ada47eb5@mail.gmail.com>

2008/9/12 Alfred Perlstein:

> Try using "pstack" a few times. It's in ports.
>
> Also, gcore(1) might help.

Will try.

> You can also try to attach using gdb.

I did, but either I'm missing something or I'm not using it well,
because I can't get a backtrace. How do I select threads to backtrace?
How do I pick what thread to backtrace?

> Basically, one of these tools should give you a stack trace which
> can help.
>
> It's interesting that the process is in "umtxn" though, is it
> multithreaded apache? Can you dump the threads? I think top(1)
> has an option to view each thread, how about trying that?

Yes, it's multithreaded apache. This did help somewhat - when I do it
I see that it's not actually stuck in umtxn - there's one thread that
consumes the CPU and it's apparently always running (in state CPUx).

  PID USERNAME PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
 7212 www      103    0 30340K  7932K CPU2   2 444:23 99.02% httpd

I'm currently upgrading the system to 7-STABLE, to see if it helps.

From alfred at freebsd.org Fri Sep 12 22:52:52 2008
From: alfred at freebsd.org (Alfred Perlstein)
Date: Fri Sep 12 22:52:59 2008
Subject: Apache-worker stuck at 100% CPU
In-Reply-To: <9bbcef730809121444u34991c52m2cbc01a8ada47eb5@mail.gmail.com>
References: <20080912165808.GE16977@elvis.mu.org>
 <9bbcef730809121444u34991c52m2cbc01a8ada47eb5@mail.gmail.com>
Message-ID: <20080912225251.GG16977@elvis.mu.org>

* Ivan Voras [080912 14:45] wrote:
> 2008/9/12 Alfred Perlstein :
>
> > Try using "pstack" a few times. It's in ports.
> >
> > Also, gcore(1) might help.
>
> Will try.
>
> > You can also try to attach using gdb.
>
> I did, but either I'm missing something or I'm not using it well,
> because I can't get a backtrace. How do I select threads to backtrace?
> How do I pick what thread to backtrace?

I think the command is 'info threads' or 'show
threads', then I think you just type
'thread FOO' to select the thread.

>
> > Basically, one of these tools should give you a stack trace which
> > can help.
> >
> > It's interesting that the process is in "umtxn" though, is it
> > multithreaded apache? Can you dump the threads? I think top(1)
> > has an option to view each thread, how about trying that?
>
> Yes, it's multithreaded apache. This did help somewhat - when I do it
> I see that it's not actually stuck in umtxn - there's one thread that
> consumes the CPU and it's apparently always running (in state CPUx).
>
>   PID USERNAME PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
>  7212 www      103    0 30340K  7932K CPU2   2 444:23 99.02% httpd
>
> I'm currently upgrading the system to 7-STABLE, to see if it helps.

yeah, it's stuck in userspace doing something..

--
- Alfred Perlstein

From ivoras at freebsd.org Sun Sep 14 11:20:08 2008
From: ivoras at freebsd.org (Ivan Voras)
Date: Sun Sep 14 11:20:15 2008
Subject: Apache-worker stuck at 100% CPU
In-Reply-To: <20080912225251.GG16977@elvis.mu.org>
References: <20080912165808.GE16977@elvis.mu.org>
 <9bbcef730809121444u34991c52m2cbc01a8ada47eb5@mail.gmail.com>
 <20080912225251.GG16977@elvis.mu.org>
Message-ID:

Alfred Perlstein wrote:

> yeah its stuck in userspace doing something..

Update: upgrading to 7-STABLE apparently fixed it - it's been two days
and it's stable. I'm worried because I didn't find out why it was
misbehaving in the first place.

From bugmaster at FreeBSD.org Mon Sep 15 15:18:57 2008
From: bugmaster at FreeBSD.org (FreeBSD bugmaster)
Date: Mon Sep 15 15:21:13 2008
Subject: Current problem reports assigned to freebsd-threads@FreeBSD.org
Message-ID: <200809151518.m8FFIuGX019054@freefall.freebsd.org>

Note: to view an individual PR, use:
http://www.freebsd.org/cgi/query-pr.cgi?pr=(number).

The following is a listing of current problems submitted by FreeBSD users.
These represent problem reports covering all versions including
experimental development code and obsolete releases.

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o threa/127225 threads    bug in lib/libthr/thread/thr_init.c
o threa/126950 threads    [patch] rtld(1): rtld malloc is thread-unsafe
o kern/126128  threads    [patch] pthread_condattr_getpshared is broken
o threa/122923 threads    'nice' does not prevent background process from steali
o threa/121336 threads    lang/neko threading ok on UP, broken on SMP (FreeBSD 7
o threa/118715 threads    kse problem
o threa/116668 threads    can no longer use jdk15 with libthr on -stable SMP
o threa/116181 threads    /dev/io-related io access permissions are not propagat
o threa/115211 threads    pthread_atfork misbehaves in initial thread
o threa/110636 threads    [request] gdb(1): using gdb with multi thread applicat
o threa/110306 threads    apache 2.0 segmentation violation when calling gethost
o threa/103975 threads    Implicit loading/unloading of libpthread.so may crash
o threa/101323 threads    fork(2) in threaded programs broken.
s threa/100815 threads    FBSD 5.5 broke nanosleep in libc_r
s threa/94467  threads    send(), sendto() and sendmsg() are not correct in libc
s threa/84483  threads    problems with devel/nspr and -lc_r on 4.x
o threa/83914  threads    [libc] popen() doesn't work in static threaded program
o threa/80992  threads    abort() sometimes not caught by gdb depending on threa
o threa/80435  threads    panic on high loads
o threa/79887  threads    [patch] freopen() isn't thread-safe
o threa/79683  threads    svctcp_create() fails if multiple threads call at the
s threa/76694  threads    fork cause hang in dup()/close() function in child (-l
s threa/76690  threads    fork hang in child for -lc_r
o threa/75374  threads    pthread_kill() ignores SA_SIGINFO flag
o threa/75273  threads    FBSD 5.3 libpthread (KSE) bug
o threa/72953  threads    fork() unblocks blocked signals w/o PTHREAD_SCOPE_SYST
o threa/70975  threads    [sysvipc] unexpected and unreliable behaviour when usi
s threa/69020  threads    pthreads library leaks _gc_mutex
s threa/49087  threads    Signals lost in programs linked with libc_r
s threa/48856  threads    Setting SIGCHLD to SIG_IGN still leaves zombies under
s threa/40671  threads    pthread_cancel doesn't remove thread from condition qu
s threa/39922  threads    [threads] [patch] Threaded applications executed with
s threa/37676  threads    libc_r: msgsnd(), msgrcv(), pread(), pwrite() need wra
s threa/34536  threads    accept() blocks other threads
s bin/32295    threads    pthread(3) dont dequeue signals
s threa/30464  threads    pthread mutex attributes -- pshared
s threa/24632  threads    libc_r delicate deviation from libc in handling SIGCHL
s threa/24472  threads    libc_r does not honor SO_SNDTIMEO/SO_RCVTIMEO socket o

38 problems total.

From ivoras at freebsd.org Mon Sep 15 23:04:48 2008
From: ivoras at freebsd.org (Ivan Voras)
Date: Mon Sep 15 23:04:56 2008
Subject: Apache-worker stuck at 100% CPU
In-Reply-To: <20080912225251.GG16977@elvis.mu.org>
References: <20080912165808.GE16977@elvis.mu.org>
 <9bbcef730809121444u34991c52m2cbc01a8ada47eb5@mail.gmail.com>
 <20080912225251.GG16977@elvis.mu.org>
Message-ID: <9bbcef730809151604i28533745m286e7314810d0362@mail.gmail.com>

2008/9/13 Alfred Perlstein:

>> Yes, it's multithreaded apache. This did help somewhat - when I show threads in top
>> I see that it's not actually stuck in umtxn - there's one thread that
>> consumes the CPU and it's apparently always running (in state CPUx).
>>
>>   PID USERNAME PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
>>  7212 www      103    0 30340K  7932K CPU2   2 444:23 99.02% httpd
>>
>> I'm currently upgrading the system to 7-STABLE, to see if it helps.

It didn't help. Exactly the same symptom happened again. It looks like
it happens a few days after the last system reboot. After it happens
the first time, restarting Apache immediately produces one such
"stuck" thread - it looks like some system state gets corrupted over
time.

>> How do I pick what thread to backtrace in gdb?
>
> i think the command is 'info threads' or 'show
> threads' then i think you just type
> 'thread FOO' to select the thread.

Both commands don't work / don't exist. Any others?

(background: apache22-worker port, no mod_php, on 7.0 and 7-STABLE
suddenly gets stuck at 100% CPU; the same setup worked on 6-STABLE.
I'm looking for ideas)

From alfred at freebsd.org Tue Sep 16 02:27:38 2008
From: alfred at freebsd.org (Alfred Perlstein)
Date: Tue Sep 16 02:27:45 2008
Subject: Apache-worker stuck at 100% CPU
In-Reply-To: <9bbcef730809151604i28533745m286e7314810d0362@mail.gmail.com>
References: <20080912165808.GE16977@elvis.mu.org>
 <9bbcef730809121444u34991c52m2cbc01a8ada47eb5@mail.gmail.com>
 <20080912225251.GG16977@elvis.mu.org>
 <9bbcef730809151604i28533745m286e7314810d0362@mail.gmail.com>
Message-ID: <20080916022738.GJ36572@elvis.mu.org>

* Ivan Voras [080915 16:05] wrote:
> 2008/9/13 Alfred Perlstein :
>
> >> Yes, it's multithreaded apache. This did help somewhat - when I show threads in top
> >> I see that it's not actually stuck in umtxn - there's one thread that
> >> consumes the CPU and it's apparently always running (in state CPUx).
> >>
> >>   PID USERNAME PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
> >>  7212 www      103    0 30340K  7932K CPU2   2 444:23 99.02% httpd
> >>
> >> I'm currently upgrading the system to 7-STABLE, to see if it helps.
>
> It didn't help. Exactly the same symptom happened again. It looks like
> it happens a few days after the last system reboot. After it happens
> the first time, restarting Apache immediately produces one such
> "stuck" thread - it looks like some system state gets corrupted over
> time.
>
> >> How do I pick what thread to backtrace in gdb?
> >
> > i think the command is 'info threads' or 'show
> > threads' then i think you just type
> > 'thread FOO' to select the thread.
>
> Both commands don't work / don't exist. Any others?
>
> (background: apache22-worker port, no mod_php, on 7.0 and 7-STABLE
> suddenly gets stuck at 100% CPU; the same setup worked on 6-STABLE.
> I'm looking for ideas)

I'm sorry, I really can't help at this point other than to look
through the documents myself to figure out how to do a backtrace/select
threads.

Give it a shot, and let us know and we can go further.

Apologies,
-Alfred

From ivoras at freebsd.org Tue Sep 16 09:12:02 2008
From: ivoras at freebsd.org (Ivan Voras)
Date: Tue Sep 16 09:12:14 2008
Subject: Apache-worker stuck at 100% CPU
In-Reply-To: <20080916022738.GJ36572@elvis.mu.org>
References: <20080912165808.GE16977@elvis.mu.org>
 <9bbcef730809121444u34991c52m2cbc01a8ada47eb5@mail.gmail.com>
 <20080912225251.GG16977@elvis.mu.org>
 <9bbcef730809151604i28533745m286e7314810d0362@mail.gmail.com>
 <20080916022738.GJ36572@elvis.mu.org>
Message-ID: <9bbcef730809160212m72fffc7k93d0c92ace2b7c19@mail.gmail.com>

2008/9/16 Alfred Perlstein:
> * Ivan Voras [080915 16:05] wrote:

>> >> How do I pick what thread to backtrace in gdb?
>> >
>> > i think the command is 'info threads' or 'show
>> > threads' then i think you just type
>> > 'thread FOO' to select the thread.
>>
>> Both commands don't work / don't exist. Any others?
>>
>> (background: apache22-worker port, no mod_php, on 7.0 and 7-STABLE
>> suddenly gets stuck at 100% CPU; the same setup worked on 6-STABLE.
>> I'm looking for ideas)
>
> I'm sorry, I really can't help at this point other than to look
> through the documents myself to figure out how to do a backtrace/select
> threads.
>
> Give it a shot, and let us know and we can go further.

Sorry, I should have been more verbose - "info threads" should work
but it doesn't - I can attach and get threads from a "regular"
multithreaded process, but when I attached to the stuck process
yesterday, I couldn't get the list of threads. I'll try again the next
time it gets stuck and try to provide more information.

From alfred at freebsd.org Tue Sep 16 16:42:03 2008
From: alfred at freebsd.org (Alfred Perlstein)
Date: Tue Sep 16 16:42:58 2008
Subject: Apache-worker stuck at 100% CPU
In-Reply-To: <9bbcef730809160212m72fffc7k93d0c92ace2b7c19@mail.gmail.com>
References: <20080912165808.GE16977@elvis.mu.org>
 <9bbcef730809121444u34991c52m2cbc01a8ada47eb5@mail.gmail.com>
 <20080912225251.GG16977@elvis.mu.org>
 <9bbcef730809151604i28533745m286e7314810d0362@mail.gmail.com>
 <20080916022738.GJ36572@elvis.mu.org>
 <9bbcef730809160212m72fffc7k93d0c92ace2b7c19@mail.gmail.com>
Message-ID: <20080916164203.GL36572@elvis.mu.org>

* Ivan Voras [080916 02:12] wrote:
> 2008/9/16 Alfred Perlstein :
> > * Ivan Voras [080915 16:05] wrote:
>
> >> >> How do I pick what thread to backtrace in gdb?
> >> >
> >> > i think the command is 'info threads' or 'show
> >> > threads' then i think you just type
> >> > 'thread FOO' to select the thread.
> >>
> >> Both commands don't work / don't exist. Any others?
> >>
> >> (background: apache22-worker port, no mod_php, on 7.0 and 7-STABLE
> >> suddenly gets stuck at 100% CPU; the same setup worked on 6-STABLE.
> >> I'm looking for ideas)
> >
> > I'm sorry, I really can't help at this point other than to look
> > through the documents myself to figure out how to do a backtrace/select
> > threads.
> >
> > Give it a shot, and let us know and we can go further.
>
> Sorry, I should have been more verbose - "info threads" should work
> but it doesn't - I can attach and get threads from a "regular"
> multithreaded process, but when yesterday when I attached to the stuck
> process, I couldn't get the list of threads. I'll try again the next
> time it gets stuck and try to provide more information.

If it happens again, you could try sending it a SIGABRT or SEGV
and then trying to diagnose the core dump.

Or try using gcore to generate a coredump and debug that.

--
- Alfred Perlstein

From ivoras at freebsd.org Tue Sep 16 16:45:18 2008
From: ivoras at freebsd.org (Ivan Voras)
Date: Tue Sep 16 16:47:26 2008
Subject: Apache-worker stuck at 100% CPU
In-Reply-To: <20080916164203.GL36572@elvis.mu.org>
References: <20080912165808.GE16977@elvis.mu.org>
 <9bbcef730809121444u34991c52m2cbc01a8ada47eb5@mail.gmail.com>
 <20080912225251.GG16977@elvis.mu.org>
 <9bbcef730809151604i28533745m286e7314810d0362@mail.gmail.com>
 <20080916022738.GJ36572@elvis.mu.org>
 <9bbcef730809160212m72fffc7k93d0c92ace2b7c19@mail.gmail.com>
 <20080916164203.GL36572@elvis.mu.org>
Message-ID: <9bbcef730809160945y8f472bfw60af0d22149d9376@mail.gmail.com>

2008/9/16 Alfred Perlstein:
> * Ivan Voras [080916 02:12] wrote:

>> Sorry, I should have been more verbose - "info threads" should work
>> but it doesn't - I can attach and get threads from a "regular"
>> multithreaded process, but when yesterday when I attached to the stuck
>> process, I couldn't get the list of threads. I'll try again the next
>> time it gets stuck and try to provide more information.
>
> If it happens again, you could try sending it a SIGABRT or SEGV
> and then trying to diagnose the core dump.
>
> Or try using gcore to generate a coredump and debug that.

It happens approximately every two days; I've rebuilt apache with
debugging symbols so it will be easier to dig around this time.

From bugmaster at FreeBSD.org Mon Sep 22 11:07:05 2008
From: bugmaster at FreeBSD.org (FreeBSD bugmaster)
Date: Mon Sep 22 11:07:55 2008
Subject: Current problem reports assigned to freebsd-threads@FreeBSD.org
Message-ID: <200809221107.m8MB74ZT015532@freefall.freebsd.org>

Note: to view an individual PR, use:
http://www.freebsd.org/cgi/query-pr.cgi?pr=(number).

The following is a listing of current problems submitted by FreeBSD users.
These represent problem reports covering all versions including
experimental development code and obsolete releases.

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o threa/127225 threads    bug in lib/libthr/thread/thr_init.c
o threa/126950 threads    [patch] rtld(1): rtld malloc is thread-unsafe
o kern/126128  threads    [patch] pthread_condattr_getpshared is broken
o threa/122923 threads    'nice' does not prevent background process from steali
o threa/121336 threads    lang/neko threading ok on UP, broken on SMP (FreeBSD 7
o threa/118715 threads    kse problem
o threa/116668 threads    can no longer use jdk15 with libthr on -stable SMP
o threa/116181 threads    /dev/io-related io access permissions are not propagat
o threa/115211 threads    pthread_atfork misbehaves in initial thread
o threa/110636 threads    [request] gdb(1): using gdb with multi thread applicat
o threa/110306 threads    apache 2.0 segmentation violation when calling gethost
o threa/103975 threads    Implicit loading/unloading of libpthread.so may crash
o threa/101323 threads    fork(2) in threaded programs broken.
s threa/100815 threads    FBSD 5.5 broke nanosleep in libc_r
s threa/94467  threads    send(), sendto() and sendmsg() are not correct in libc
s threa/84483  threads    problems with devel/nspr and -lc_r on 4.x
o threa/83914  threads    [libc] popen() doesn't work in static threaded program
o threa/80992  threads    abort() sometimes not caught by gdb depending on threa
o threa/80435  threads    panic on high loads
o threa/79887  threads    [patch] freopen() isn't thread-safe
o threa/79683  threads    svctcp_create() fails if multiple threads call at the
s threa/76694  threads    fork cause hang in dup()/close() function in child (-l
s threa/76690  threads    fork hang in child for -lc_r
o threa/75374  threads    pthread_kill() ignores SA_SIGINFO flag
o threa/75273  threads    FBSD 5.3 libpthread (KSE) bug
o threa/72953  threads    fork() unblocks blocked signals w/o PTHREAD_SCOPE_SYST
o threa/70975  threads    [sysvipc] unexpected and unreliable behaviour when usi
s threa/69020  threads    pthreads library leaks _gc_mutex
s threa/49087  threads    Signals lost in programs linked with libc_r
s threa/48856  threads    Setting SIGCHLD to SIG_IGN still leaves zombies under
s threa/40671  threads    pthread_cancel doesn't remove thread from condition qu
s threa/39922  threads    [threads] [patch] Threaded applications executed with
s threa/37676  threads    libc_r: msgsnd(), msgrcv(), pread(), pwrite() need wra
s threa/34536  threads    accept() blocks other threads
s bin/32295    threads    pthread(3) dont dequeue signals
s threa/30464  threads    pthread mutex attributes -- pshared
s threa/24632  threads    libc_r delicate deviation from libc in handling SIGCHL
s threa/24472  threads    libc_r does not honor SO_SNDTIMEO/SO_RCVTIMEO socket o

38 problems total.

From dixit at netapp.com Wed Sep 24 21:34:14 2008
From: dixit at netapp.com (Amol Dixit)
Date: Wed Sep 24 21:34:20 2008
Subject: SIGTRAP during thr_new syscall
Message-ID: <48DAABAF.9030709@netapp.com>

Hi,
I am seeing an unexpected SIGTRAP being reported to gdbserver when the
debugged process creates a new thread via the _pthread_create() call of
the libthr library. [libthr/thread/thr_create.c,v 1.22.4.1, FreeBSD 6.0]
Gdbserver has internally set a breakpoint on the address of
_thread_bp_create() so that it gets notified on thread creation, and it is
expecting a SIGTRAP at the address (stop pc) of _thread_bp_create(). But
instead the SIGTRAP happens as a side-effect of the thr_new() system call,
and the stop pc at that point is that of the routine thread_start(), which
is the starting function of the new thread. So gdbserver cannot match the
expected breakpoint (i.e. _thread_bp_create) and is confused.
For testing purposes, if I call _thread_bp_create() before thr_new() in
_pthread_create(), I get the _expected_ SIGTRAP with the address of
_thread_bp_create. But that is not the fix.
Does anyone have any idea about this SIGTRAP being reported to the tracing
process gdbserver as part of thr_new? Where is it originating from and why?
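
The reply that follows in this thread traces the stray SIGTRAP to the x86 trap (single-step) flag being copied into the new thread's saved register frame. A minimal user-space model of that mechanism is sketched below; struct demo_trapframe, demo_copy_frame() and demo_would_trap() are invented for illustration and are not the kernel's types or functions. Only the name PSL_T and the idea of masking it out of tf_eflags come from the thread; the value 0x100 is the architectural position of the trap flag in EFLAGS.

```c
#include <stdint.h>
#include <string.h>

/* x86 EFLAGS trap flag (bit 8): when set, the CPU traps after one instruction. */
#define PSL_T 0x00000100u

/* Illustrative stand-in for a saved user register frame (not the kernel struct). */
struct demo_trapframe {
    uint32_t tf_eip;
    uint32_t tf_eflags;
};

/*
 * Model of setting up a new thread's frame from its creator's frame.
 * With clear_trap == 0 the single-step bit is inherited as-is; with
 * clear_trap != 0 it is masked out, as the patch in the reply does.
 */
void demo_copy_frame(const struct demo_trapframe *parent,
                     struct demo_trapframe *child, int clear_trap)
{
    memcpy(child, parent, sizeof(*child));
    if (clear_trap)
        child->tf_eflags &= ~PSL_T;
}

/* Would a thread resuming from this frame trap on its first instruction? */
int demo_would_trap(const struct demo_trapframe *frame)
{
    return (frame->tf_eflags & PSL_T) != 0;
}
```

In this model, a creator that was being single-stepped into the system call hands the trap bit to the child unless it is cleared, which matches the symptom of a SIGTRAP reported at the new thread's first instruction.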
Thanks,
Amol

From davidxu at freebsd.org Thu Sep 25 03:43:38 2008
From: davidxu at freebsd.org (David Xu)
Date: Thu Sep 25 03:43:44 2008
Subject: SIGTRAP during thr_new syscall
In-Reply-To: <48DAABAF.9030709@netapp.com>
References: <48DAABAF.9030709@netapp.com>
Message-ID: <48DB0950.3060405@freebsd.org>

Amol Dixit wrote:
> Hi,
> I am seeing an unexpected SIGTRAP being reported to gdbserver when the
> debugged process creates a new thread via the _pthread_create() call of
> libthr library. [libthr/thread/thr_create.c,v 1.22.4.1, Freebsd 6.0]
> Gdbserver has internally set a breakpoint on address of
> _thread_bp_create() so that it gets notified on thread creation and is
> expecting a SIGTRAP at address (stop pc) of _thread_bp_create(). But
> instead SIGTRAP happens as a side-effect of thr_new() system call and
> the stop pc at that point is that of routine thread_start() which is the
> starting function of new thread. So gdbserver cannot match expected
> breakpoint (ie. _thread_bp_create) and is confused.
> For testing purpose, if I call _thread_bp_create() before thr_new() in
> _pthread_create(), I get the _expected_ SIGTRAP with address of
> _thread_bp_create. But that is not the fix.
> Does anyone have any idea about this SIGTRAP being reported to tracing
> process gdbserver as part of thr_new? Where is it originating from and why?
> Thanks,
> Amol
>

I found that the kernel clears the trap flag for a new process in
cpu_fork(), but not for a new thread; you may try the following patch:

Index: i386/i386/vm_machdep.c
===================================================================
--- i386/i386/vm_machdep.c (revision 183337)
+++ i386/i386/vm_machdep.c (working copy)
@@ -413,6 +413,15 @@
         bcopy(td0->td_frame, td->td_frame, sizeof(struct trapframe));
 
         /*
+         * If the current thread has the trap bit set (i.e. a debugger had
+         * single stepped the process to the system call), we need to clear
+         * the trap flag from the new frame. Otherwise, the new thread will
+         * receive a (likely unexpected) SIGTRAP when it executes the first
+         * instruction after returning to userland.
+         */
+        td->td_frame->tf_eflags &= ~PSL_T;
+
+        /*
          * Set registers for trampoline to user mode. Leave space for the
          * return address on stack. These are the kernel mode register values.
          */

From kostikbel at gmail.com Thu Sep 25 09:03:29 2008
From: kostikbel at gmail.com (Kostik Belousov)
Date: Thu Sep 25 09:03:35 2008
Subject: SIGTRAP during thr_new syscall
In-Reply-To: <48DB0950.3060405@freebsd.org>
References: <48DAABAF.9030709@netapp.com> <48DB0950.3060405@freebsd.org>
Message-ID: <20080925090318.GK47828@deviant.kiev.zoral.com.ua>

On Thu, Sep 25, 2008 at 11:45:20AM +0800, David Xu wrote:
> Amol Dixit wrote:
> >Hi,
> >I am seeing an unexpected SIGTRAP being reported to gdbserver when the
> >debugged process creates a new thread via the _pthread_create() call of
> >libthr library. [libthr/thread/thr_create.c,v 1.22.4.1, Freebsd 6.0]
> >Gdbserver has internally set a breakpoint on address of
> >_thread_bp_create() so that it gets notified on thread creation and is
> >expecting a SIGTRAP at address (stop pc) of _thread_bp_create(). But
> >instead SIGTRAP happens as a side-effect of thr_new() system call and
> >the stop pc at that point is that of routine thread_start() which is the
> >starting function of new thread. So gdbserver cannot match expected
> >breakpoint (ie. _thread_bp_create) and is confused.
> >For testing purpose, if I call _thread_bp_create() before thr_new() in
> >_pthread_create(), I get the _expected_ SIGTRAP with address of
> >_thread_bp_create. But that is not the fix.
> >Does anyone have any idea about this SIGTRAP being reported to tracing
> >process gdbserver as part of thr_new? Where is it originating from and why?
> >Thanks,
> >Amol
> >
>
> I found kernel clears trap flag for new process but not for new thread
> in cpu_fork(), you may try following patch:
>
> Index: i386/i386/vm_machdep.c
> ===================================================================
> --- i386/i386/vm_machdep.c (revision 183337)
> +++ i386/i386/vm_machdep.c (working copy)
> @@ -413,6 +413,15 @@
>          bcopy(td0->td_frame, td->td_frame, sizeof(struct trapframe));
>
>          /*
> +         * If the current thread has the trap bit set (i.e. a debugger had
> +         * single stepped the process to the system call), we need to clear
> +         * the trap flag from the new frame. Otherwise, the new thread will
> +         * receive a (likely unexpected) SIGTRAP when it executes the first
> +         * instruction after returning to userland.
> +         */
> +        td->td_frame->tf_eflags &= ~PSL_T;
> +
> +        /*
>           * Set registers for trampoline to user mode. Leave space for the
>           * return address on stack. These are the kernel mode register
> values.
>           */

Should this be done for the amd64 version of cpu_set_upcall() as well?

From dchhetri at panasas.com Fri Sep 26 19:19:10 2008
From: dchhetri at panasas.com (Dilip Chhetri)
Date: Fri Sep 26 19:19:16 2008
Subject: getting stack trace for other thread on the same process : libthr
Message-ID: <48DD32D2.2060304@panasas.com>

Question
--------
My program is linked with libthr in FreeBSD-7.0. The program has
on the order of 20 threads, and a designated monitoring thread at some
point wants to know what the other (possibly stuck) threads are doing.
This needs to be done by printing a stack backtrace for each such thread
to stdout.


   I understand pthread_t structure has pointer to the target
thread's stack, but to get the trace I need to know value of
stack-pointer register and base-pointer register. I looked at the
code and I don't find any mechanism by which I could read the target
threads register context (because it all resides within kernel thread
structure). Further code study reveals that kernel_thread->td_frame
contains the register context for a thread, but is valid only when
the thread is executing/sleeping inside the kernel.

   Is there anything I'm missing here ? Is there an easy way to
traverse stack for some thread within the same process.

   I considered/considering following approaches,
a) use PTRACE
   ruled out, because you can't trace the process from within the
   same process

b) somehow temporarily stop the target-thread and read td_frame by
   traversing kernel data structure through /dev/kmem. After doing
   stack traversal resume the target thread.


Detailed problem background
--------------------------
   We have this process X with ~20 threads, each processing some
requests. One of them is designated as monitoring/dispatcher thread.
When a new request arrives, dispatcher thread tries to queue the task
to idle thread. But if all threads are busy processing requests, the
dispatcher thread is supposed to print the stack back trace for each
of the busy thread. This is our *debugging* mechanism to find
potential fault-points.

   In FreeBSD-4.6.2, we hacked libc_r:pthread_t to achieve our goal.
But in FreeBSD-7.0, we decided to use libthr and hack doesn't seem to
be easy.

Target setup
------------
  * SMP : around 8 CPU
  * process : it's going to be run as root and have around ~20 threads

From tijl at ulyssis.org Fri Sep 26 22:01:42 2008
From: tijl at ulyssis.org (Tijl Coosemans)
Date: Fri Sep 26 22:01:50 2008
Subject: getting stack trace for other thread on the same process : libthr
In-Reply-To: <48DD32D2.2060304@panasas.com>
References: <48DD32D2.2060304@panasas.com>
Message-ID: <200809262331.29353.tijl@ulyssis.org>

On Friday 26 September 2008 21:06:58 Dilip Chhetri wrote:
> Question
> --------
> My program is linked with libthr in FreeBSD-7.0. The program has
> in the order of 20 threads, and a designated monitoring thread at
> some point wants to know what are other/stuck threads doing. This
> needs to be done by printing stack backtrace for the thread to
> stdout.
>
> I understand pthread_t structure has pointer to the target
> thread's stack, but to get the trace I need to know value of
> stack-pointer register and base-pointer register. I looked at the
> code and I don't find any mechanism by which I could read the target
> threads register context (because it all resides within kernel thread
> structure). Further code study reveals that kernel_thread->td_frame
> contains the register context for a thread, but is valid only when
> the thread is executing/sleeping inside the kernel.
>
> Is there anything I'm missing here ? Is there an easy way to
> traverse stack for some thread within the same process.
>
> I considered/considering following approaches,
> a) use PTRACE
>    ruled out, because you can't trace the process from within the
>    same process
>
> b) somehow temporarily stop the target-thread and read td_frame by
>    traversing kernel data structure through /dev/kmem. After doing
>    stack traversal resume the target thread.
>
>
> Detailed problem background
> --------------------------
> We have this process X with ~20 threads, each processing some
> requests. One of them is designated as monitoring/dispatcher thread.
> When a new request arrives, dispatcher thread tries to queue the task
> to idle thread. But if all threads are busy processing requests, the
> dispatcher thread is supposed to print the stack back trace for each
> of the busy thread. This is our *debugging* mechanism to find
> potential fault-points.
>
> In FreeBSD-4.6.2, we hacked libc_r:pthread_t to achieve our goal.
> But in FreeBSD-7.0, we decided to use libthr and hack doesn't seem to
> be easy.
>
> Target setup
> ------------
> * SMP : around 8 CPU
> * process : it's going to be run as root and have around ~20 threads

You could try registering a signal handler for SIGUSR1 that prints a
stack backtrace using the stack pointer in the sigcontext and then call
pthread_kill(SIGUSR1) on whichever thread you want a backtrace of.

From bugmaster at FreeBSD.org Mon Sep 29 11:06:59 2008
From: bugmaster at FreeBSD.org (FreeBSD bugmaster)
Date: Mon Sep 29 11:09:00 2008
Subject: Current problem reports assigned to freebsd-threads@FreeBSD.org
Message-ID: <200809291106.m8TB6wrf040962@freefall.freebsd.org>

Note: to view an individual PR, use:
  http://www.freebsd.org/cgi/query-pr.cgi?pr=(number).

The following is a listing of current problems submitted by FreeBSD users.
These represent problem reports covering all versions including
experimental development code and obsolete releases.


S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o threa/127225 threads    bug in lib/libthr/thread/thr_init.c
o threa/126950 threads    [patch] rtld(1): rtld malloc is thread-unsafe
o kern/126128  threads    [patch] pthread_condattr_getpshared is broken
o threa/122923 threads    'nice' does not prevent background process from steali
o threa/121336 threads    lang/neko threading ok on UP, broken on SMP (FreeBSD 7
o threa/118715 threads    kse problem
o threa/116668 threads    can no longer use jdk15 with libthr on -stable SMP
o threa/116181 threads    /dev/io-related io access permissions are not propagat
o threa/115211 threads    pthread_atfork misbehaves in initial thread
o threa/110636 threads    [request] gdb(1): using gdb with multi thread applicat
o threa/110306 threads    apache 2.0 segmentation violation when calling gethost
o threa/103975 threads    Implicit loading/unloading of libpthread.so may crash
o threa/101323 threads    fork(2) in threaded programs broken.
s threa/100815 threads    FBSD 5.5 broke nanosleep in libc_r
s threa/94467  threads    send(), sendto() and sendmsg() are not correct in libc
s threa/84483  threads    problems with devel/nspr and -lc_r on 4.x
o threa/83914  threads    [libc] popen() doesn't work in static threaded program
o threa/80992  threads    abort() sometimes not caught by gdb depending on threa
o threa/80435  threads    panic on high loads
o threa/79887  threads    [patch] freopen() isn't thread-safe
o threa/79683  threads    svctcp_create() fails if multiple threads call at the
s threa/76694  threads    fork cause hang in dup()/close() function in child (-l
s threa/76690  threads    fork hang in child for -lc_r
o threa/75374  threads    pthread_kill() ignores SA_SIGINFO flag
o threa/75273  threads    FBSD 5.3 libpthread (KSE) bug
o threa/72953  threads    fork() unblocks blocked signals w/o PTHREAD_SCOPE_SYST
o threa/70975  threads    [sysvipc] unexpected and unreliable behaviour when usi
s threa/69020  threads    pthreads library leaks _gc_mutex
s threa/49087  threads    Signals lost in programs linked with libc_r
s threa/48856  threads    Setting SIGCHLD to SIG_IGN still leaves zombies under
s threa/40671  threads    pthread_cancel doesn't remove thread from condition qu
s threa/39922  threads    [threads] [patch] Threaded applications executed with
s threa/37676  threads    libc_r: msgsnd(), msgrcv(), pread(), pwrite() need wra
s threa/34536  threads    accept() blocks other threads
s kern/32295   threads    [libc_r] [patch] pthread(3) dont dequeue signals
s threa/30464  threads    pthread mutex attributes -- pshared
s threa/24632  threads    libc_r delicate deviation from libc in handling SIGCHL
s threa/24472  threads    libc_r does not honor SO_SNDTIMEO/SO_RCVTIMEO socket o

38 problems total.

From dchhetri at panasas.com Mon Sep 29 16:59:48 2008
From: dchhetri at panasas.com (Dilip Chhetri)
Date: Mon Sep 29 16:59:54 2008
Subject: getting stack trace for other thread on the same process : libthr
In-Reply-To: <200809262331.29353.tijl@ulyssis.org>
References: <48DD32D2.2060304@panasas.com> <200809262331.29353.tijl@ulyssis.org>
Message-ID: <48E10978.2090907@panasas.com>

Tijl Coosemans wrote:
> On Friday 26 September 2008 21:06:58 Dilip Chhetri wrote:
>
>>Question
>>--------
>>   My program is linked with libthr in FreeBSD-7.0. The program has
>>in the order of 20 threads, and a designated monitoring thread at
>>some point wants to know what are other/stuck threads doing. This
>>needs to be done by printing stack backtrace for the thread to
>>stdout.
>>
>>   I understand pthread_t structure has pointer to the target
>>thread's stack, but to get the trace I need to know value of
>>stack-pointer register and base-pointer register. I looked at the
>>code and I don't find any mechanism by which I could read the target
>>threads register context (because it all resides within kernel thread
>>structure).
>>Further code study reveals that kernel_thread->td_frame
>>contains the register context for a thread, but is valid only when
>>the thread is executing/sleeping inside the kernel.
>>
>>   Is there anything I'm missing here ? Is there an easy way to
>>traverse stack for some thread within the same process.
>>
>>   I considered/considering following approaches,
>>a) use PTRACE
>>   ruled out, because you can't trace the process from within the
>>   same process
>>
>>b) somehow temporarily stop the target-thread and read td_frame by
>>   traversing kernel data structure through /dev/kmem. After doing
>>   stack traversal resume the target thread.
>>
>>
>>Detailed problem background
>>--------------------------
>>   We have this process X with ~20 threads, each processing some
>>requests. One of them is designated as monitoring/dispatcher thread.
>>When a new request arrives, dispatcher thread tries to queue the task
>>to idle thread. But if all threads are busy processing requests, the
>>dispatcher thread is supposed to print the stack back trace for each
>>of the busy thread. This is our *debugging* mechanism to find
>>potential fault-points.
>>
>>   In FreeBSD-4.6.2, we hacked libc_r:pthread_t to achieve our goal.
>>But in FreeBSD-7.0, we decided to use libthr and hack doesn't seem to
>>be easy.
>>
>>Target setup
>>------------
>>  * SMP : around 8 CPU
>>  * process : it's going to be run as root and have around ~20 threads
>
>
> You could try registering a signal handler for SIGUSR1 that prints a
> stack backtrace using the stack pointer in the sigcontext and then call
> pthread_kill(SIGUSR1) on whichever thread you want a backtrace of.

Thanks, but as I mentioned it's a network based program and it may be
sleeping/stuck in syscall for some packets, in this case pthread_kill
will not work because signals are delivered only when you return from
syscall (that's what I have learned from old UNIX books in my college).

From tijl at ulyssis.org Mon Sep 29 20:09:57 2008
From: tijl at ulyssis.org (Tijl Coosemans)
Date: Mon Sep 29 20:10:04 2008
Subject: getting stack trace for other thread on the same process : libthr
In-Reply-To: <48E10978.2090907@panasas.com>
References: <48DD32D2.2060304@panasas.com> <200809262331.29353.tijl@ulyssis.org> <48E10978.2090907@panasas.com>
Message-ID: <200809292208.29315.tijl@ulyssis.org>

On Monday 29 September 2008 18:59:36 Dilip Chhetri wrote:
> Tijl Coosemans wrote:
>> On Friday 26 September 2008 21:06:58 Dilip Chhetri wrote:
>>
>>> Question
>>> --------
>>> My program is linked with libthr in FreeBSD-7.0. The program has
>>> in the order of 20 threads, and a designated monitoring thread at
>>> some point wants to know what are other/stuck threads doing. This
>>> needs to be done by printing stack backtrace for the thread to
>>> stdout.
>>>
>>> I understand pthread_t structure has pointer to the target
>>> thread's stack, but to get the trace I need to know value of
>>> stack-pointer register and base-pointer register. I looked at the
>>> code and I don't find any mechanism by which I could read the
>>> target threads register context (because it all resides within
>>> kernel thread structure). Further code study reveals that
>>> kernel_thread->td_frame contains the register context for a thread,
>>> but is valid only when the thread is executing/sleeping inside the
>>> kernel.
>>>
>>> Is there anything I'm missing here ? Is there an easy way to
>>> traverse stack for some thread within the same process.
>>>
>>> I considered/considering following approaches,
>>> a) use PTRACE
>>>    ruled out, because you can't trace the process from within the
>>>    same process
>>>
>>> b) somehow temporarily stop the target-thread and read td_frame by
>>>    traversing kernel data structure through /dev/kmem. After doing
>>>    stack traversal resume the target thread.
>>>
>>>
>>> Detailed problem background
>>> --------------------------
>>> We have this process X with ~20 threads, each processing some
>>> requests. One of them is designated as monitoring/dispatcher
>>> thread. When a new request arrives, dispatcher thread tries to
>>> queue the task to idle thread. But if all threads are busy
>>> processing requests, the dispatcher thread is supposed to print the
>>> stack back trace for each of the busy thread. This is our
>>> *debugging* mechanism to find potential fault-points.
>>>
>>> In FreeBSD-4.6.2, we hacked libc_r:pthread_t to achieve our goal.
>>> But in FreeBSD-7.0, we decided to use libthr and hack doesn't seem
>>> to be easy.
>>>
>>> Target setup
>>> ------------
>>> * SMP : around 8 CPU
>>> * process : it's going to be run as root and have around ~20
>>> threads
>>
>> You could try registering a signal handler for SIGUSR1 that prints a
>> stack backtrace using the stack pointer in the sigcontext and then
>> call pthread_kill(SIGUSR1) on whichever thread you want a backtrace
>> of.
>
> Thanks, but as I mentioned it's a network based program and it may be
> sleeping/stuck in syscall for some packets, in this case pthread_kill
> will not work because signals are delivered only when you return from
> syscall (that's what I have learned from old UNIX books in my
> college).

Those kinds of syscalls are usually interruptible though. Depending on
the SA_RESTART flag they are then either aborted and return EINTR or
restarted (or return partial success). See the sigaction(2) manpage.
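
The SA_RESTART behaviour described in the reply above can be checked with a
small experiment. What follows is only a sketch under stated assumptions: the
names interrupt_blocked_read, worker and on_sigusr1 are hypothetical, a pipe
that nobody writes to stands in for the socket the real program would be
blocked on, and the handler is installed with SA_RESTART cleared so that the
blocked read() is expected to fail with EINTR.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* All names below are hypothetical; nothing here is taken from libthr. */
static atomic_int worker_done;  /* set by the worker once read() has returned */
static int worker_errno;        /* errno the worker saw after read() */
static int pipe_fds[2];         /* nothing is ever written, so read() blocks */

/* Empty handler: its only purpose is to interrupt the blocked syscall. */
static void on_sigusr1(int signo)
{
    (void)signo;
}

static void *worker(void *arg)
{
    char buf[16];
    ssize_t n;

    (void)arg;
    n = read(pipe_fds[0], buf, sizeof(buf));
    worker_errno = (n == -1) ? errno : 0;
    atomic_store(&worker_done, 1);
    return NULL;
}

/* Returns the errno observed by the blocked read(), or -1 on setup failure. */
int interrupt_blocked_read(void)
{
    struct sigaction sa;
    struct timespec pause_10ms = { 0, 10 * 1000 * 1000 };
    pthread_t tid;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;            /* SA_RESTART deliberately left out */
    if (sigaction(SIGUSR1, &sa, NULL) != 0 || pipe(pipe_fds) != 0)
        return -1;

    atomic_store(&worker_done, 0);
    if (pthread_create(&tid, NULL, worker, NULL) != 0)
        return -1;

    /*
     * Keep signalling until the worker reports back: the first signal
     * may arrive before the worker has actually entered read().
     */
    while (!atomic_load(&worker_done)) {
        pthread_kill(tid, SIGUSR1);
        nanosleep(&pause_10ms, NULL);
    }
    pthread_join(tid, NULL);
    close(pipe_fds[0]);
    close(pipe_fds[1]);
    return worker_errno;
}
```

With sa_flags set to SA_RESTART instead, the same read() would be expected to
resume after the handler returns, so the loop above would not terminate; the
handler itself still runs in the blocked thread in both cases, which is the
part that matters for printing a backtrace.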

From dchhetri at panasas.com Mon Sep 29 21:43:25 2008
From: dchhetri at panasas.com (Dilip Chhetri)
Date: Mon Sep 29 21:43:30 2008
Subject: getting stack trace for other thread on the same process : libthr
In-Reply-To: <200809292208.29315.tijl@ulyssis.org>
References: <48DD32D2.2060304@panasas.com> <200809262331.29353.tijl@ulyssis.org> <48E10978.2090907@panasas.com> <200809292208.29315.tijl@ulyssis.org>
Message-ID: <48E14BF0.1050108@panasas.com>

Tijl Coosemans wrote:
> On Monday 29 September 2008 18:59:36 Dilip Chhetri wrote:
>
>>Tijl Coosemans wrote:
>>
>>>On Friday 26 September 2008 21:06:58 Dilip Chhetri wrote:
>>>
>>>
>>>>Question
>>>>--------
>>>>   My program is linked with libthr in FreeBSD-7.0. The program has
>>>>in the order of 20 threads, and a designated monitoring thread at
>>>>some point wants to know what are other/stuck threads doing. This
>>>>needs to be done by printing stack backtrace for the thread to
>>>>stdout.
>>>>
>>>>   I understand pthread_t structure has pointer to the target
>>>>thread's stack, but to get the trace I need to know value of
>>>>stack-pointer register and base-pointer register. I looked at the
>>>>code and I don't find any mechanism by which I could read the
>>>>target threads register context (because it all resides within
>>>>kernel thread structure). Further code study reveals that
>>>>kernel_thread->td_frame contains the register context for a thread,
>>>>but is valid only when the thread is executing/sleeping inside the
>>>>kernel.
>>>>
>>>>   Is there anything I'm missing here ? Is there an easy way to
>>>>traverse stack for some thread within the same process.
>>>>
>>>>   I considered/considering following approaches,
>>>>a) use PTRACE
>>>>   ruled out, because you can't trace the process from within the
>>>>   same process
>>>>
>>>>b) somehow temporarily stop the target-thread and read td_frame by
>>>>   traversing kernel data structure through /dev/kmem. After doing
>>>>   stack traversal resume the target thread.
>>>>
>>>>
>>>>Detailed problem background
>>>>--------------------------
>>>>   We have this process X with ~20 threads, each processing some
>>>>requests. One of them is designated as monitoring/dispatcher
>>>>thread. When a new request arrives, dispatcher thread tries to
>>>>queue the task to idle thread. But if all threads are busy
>>>>processing requests, the dispatcher thread is supposed to print the
>>>>stack back trace for each of the busy thread. This is our
>>>>*debugging* mechanism to find potential fault-points.
>>>>
>>>>   In FreeBSD-4.6.2, we hacked libc_r:pthread_t to achieve our goal.
>>>>But in FreeBSD-7.0, we decided to use libthr and hack doesn't seem
>>>>to be easy.
>>>>
>>>>Target setup
>>>>------------
>>>>  * SMP : around 8 CPU
>>>>  * process : it's going to be run as root and have around ~20
>>>>    threads
>>>
>>>You could try registering a signal handler for SIGUSR1 that prints a
>>>stack backtrace using the stack pointer in the sigcontext and then
>>>call pthread_kill(SIGUSR1) on whichever thread you want a backtrace
>>>of.
>>
>>Thanks, but as I mentioned it's a network based program and it may be
>>sleeping/stuck in syscall for some packets, in this case pthread_kill
>>will not work because signals are delivered only when you return from
>>syscall (that's what I have learned from old UNIX books in my
>>college).
>
>
> Those kinds of syscalls are usually interruptible though. Depending on
> the SA_RESTART flag they are then either aborted and return EINTR or
> restarted (or return partial success). See the sigaction(2) manpage.

Thanks. I will give that a try, maybe it will work 90% of the time for
us. That's much better than having nothing or something that is too
complicated to implement.

Thanks once again.
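
For reference, the SIGUSR1 approach discussed in this thread might look
roughly like the sketch below. It is not the code anyone in the thread
wrote. Instead of walking the stack from the stack pointer in the sigcontext,
it assumes a backtrace() implementation from <execinfo.h> is available (glibc
provides one; on FreeBSD it would have to come from a libexecinfo library),
and the names backtrace_of_other_thread, busy_worker and dump_backtrace are
hypothetical. backtrace() is not formally async-signal-safe, so it is called
once up front to get any lazy initialisation out of the way.

```c
#define _POSIX_C_SOURCE 200809L
#include <execinfo.h>
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

#define MAX_FRAMES 64

/* All names below are hypothetical illustrations. */
static atomic_int frames_captured;  /* 0 until the handler has run */
static atomic_int stop_worker;      /* tells the worker loop to exit */

/* Runs on the stack of whichever thread the signal was directed at. */
static void dump_backtrace(int signo)
{
    void *frames[MAX_FRAMES];
    int n;

    (void)signo;
    n = backtrace(frames, MAX_FRAMES);
    backtrace_symbols_fd(frames, n, STDERR_FILENO);  /* no malloc involved */
    atomic_store(&frames_captured, n > 0 ? n : -1);
}

/* Stand-in for a request-processing thread that appears stuck. */
static void *busy_worker(void *arg)
{
    struct timespec pause_1ms = { 0, 1000 * 1000 };

    (void)arg;
    while (!atomic_load(&stop_worker))
        nanosleep(&pause_1ms, NULL);
    return NULL;
}

/* Returns the number of frames printed for the worker, or -1 on failure. */
int backtrace_of_other_thread(void)
{
    struct sigaction sa;
    struct timespec pause_1ms = { 0, 1000 * 1000 };
    pthread_t tid;
    void *warmup[1];

    /* The first backtrace() call may load the unwinder; do it here. */
    backtrace(warmup, 1);

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = dump_backtrace;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;   /* let restartable syscalls resume afterwards */
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    atomic_store(&frames_captured, 0);
    atomic_store(&stop_worker, 0);
    if (pthread_create(&tid, NULL, busy_worker, NULL) != 0)
        return -1;

    /* Direct the signal at one specific thread, then wait for its report. */
    if (pthread_kill(tid, SIGUSR1) != 0)
        atomic_store(&frames_captured, -1);
    while (atomic_load(&frames_captured) == 0)
        nanosleep(&pause_1ms, NULL);

    atomic_store(&stop_worker, 1);
    pthread_join(tid, NULL);
    return atomic_load(&frames_captured);
}
```

A monitoring thread would call something like backtrace_of_other_thread once
per busy worker, passing the real thread id instead of creating its own
worker. The addresses are written to stderr in raw form, and how readable the
symbol names are depends on how the binary was linked.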