From bugmaster at FreeBSD.org Mon May 5 11:07:14 2008 From: bugmaster at FreeBSD.org (FreeBSD bugmaster) Date: Mon May 5 11:07:30 2008 Subject: Current problem reports assigned to freebsd-threads@FreeBSD.org Message-ID: <200805051107.m45B7E3e070869@freefall.freebsd.org> Current FreeBSD problem reports Critical problems S Tracker Resp. Description -------------------------------------------------------------------------------- s threa/76690 threads fork hang in child for -lc_r 1 problem total. Serious problems S Tracker Resp. Description -------------------------------------------------------------------------------- s threa/24472 threads libc_r does not honor SO_SNDTIMEO/SO_RCVTIMEO socket o s threa/24632 threads libc_r delicate deviation from libc in handling SIGCHL s bin/32295 threads pthread dont dequeue signals s threa/34536 threads accept() blocks other threads s threa/39922 threads [threads] [patch] Threaded applications executed with s threa/48856 threads Setting SIGCHLD to SIG_IGN still leaves zombies under s threa/49087 threads Signals lost in programs linked with libc_r o threa/70975 threads unexpected and unreliable behaviour when using SYSV se o threa/72953 threads fork() unblocks blocked signals w/o PTHREAD_SCOPE_SYST o threa/75273 threads FBSD 5.3 libpthread (KSE) bug o threa/75374 threads pthread_kill() ignores SA_SIGINFO flag s threa/76694 threads fork cause hang in dup()/close() function in child (-l o threa/79683 threads svctcp_create() fails if multiple threads call at the o threa/80435 threads panic on high loads o threa/83914 threads [libc] popen() doesn't work in static threaded program s threa/84483 threads problems with devel/nspr and -lc_r on 4.x s threa/94467 threads send(), sendto() and sendmsg() are not correct in libc s threa/100815 threads FBSD 5.5 broke nanosleep in libc_r o threa/101323 threads fork(2) in threaded programs broken. o threa/103975 threads Implicit loading/unloading of libpthread.so may crash o threa/110636 threads [request] gdb(1): using gdb with multi thread applicat o threa/118715 threads kse problem o threa/121336 threads lang/neko threading ok on UP, broken on SMP (FreeBSD 7 o threa/123062 threads C++ exception handling can loop during stacking unwind 24 problems total. Non-critical problems S Tracker Resp. Description -------------------------------------------------------------------------------- s threa/30464 threads pthread mutex attributes -- pshared s threa/37676 threads libc_r: msgsnd(), msgrcv(), pread(), pwrite() need wra s threa/40671 threads pthread_cancel doesn't remove thread from condition qu s threa/69020 threads pthreads library leaks _gc_mutex o threa/79887 threads [patch] freopen() isn't thread-safe o threa/80992 threads abort() sometimes not caught by gdb depending on threa o threa/110306 threads apache 2.0 segmentation violation when calling gethost o threa/115211 threads pthread_atfork misbehaves in initial thread o threa/116181 threads /dev/io-related io access permissions are not propagat o threa/116668 threads can no longer use jdk15 with libthr on -stable SMP o threa/122923 threads 'nice' does not prevent background process from steali 11 problems total. From kib at FreeBSD.org Mon May 5 16:26:00 2008 From: kib at FreeBSD.org (kib@FreeBSD.org) Date: Mon May 5 16:26:05 2008 Subject: threads/123062: C++ exception handling can loop during stacking unwinding in multithreaded programs Message-ID: <200805051626.m45GQ0gV098198@freefall.freebsd.org> Synopsis: C++ exception handling can loop during stacking unwinding in multithreaded programs Responsible-Changed-From-To: freebsd-threads->kib Responsible-Changed-By: kib Responsible-Changed-When: Mon May 5 16:25:34 UTC 2008 Responsible-Changed-Why: Take. http://www.freebsd.org/cgi/query-pr.cgi?pr=123062 From bugmaster at FreeBSD.org Mon May 12 11:07:07 2008 From: bugmaster at FreeBSD.org (FreeBSD bugmaster) Date: Mon May 12 11:07:23 2008 Subject: Current problem reports assigned to freebsd-threads@FreeBSD.org Message-ID: <200805121107.m4CB76Lf038173@freefall.freebsd.org> Current FreeBSD problem reports Critical problems S Tracker Resp. Description -------------------------------------------------------------------------------- s threa/76690 threads fork hang in child for -lc_r 1 problem total. Serious problems S Tracker Resp. Description -------------------------------------------------------------------------------- s threa/24472 threads libc_r does not honor SO_SNDTIMEO/SO_RCVTIMEO socket o s threa/24632 threads libc_r delicate deviation from libc in handling SIGCHL s bin/32295 threads pthread dont dequeue signals s threa/34536 threads accept() blocks other threads s threa/39922 threads [threads] [patch] Threaded applications executed with s threa/48856 threads Setting SIGCHLD to SIG_IGN still leaves zombies under s threa/49087 threads Signals lost in programs linked with libc_r o threa/70975 threads unexpected and unreliable behaviour when using SYSV se o threa/72953 threads fork() unblocks blocked signals w/o PTHREAD_SCOPE_SYST o threa/75273 threads FBSD 5.3 libpthread (KSE) bug o threa/75374 threads pthread_kill() ignores SA_SIGINFO flag s threa/76694 threads fork cause hang in dup()/close() function in child (-l o threa/79683 threads svctcp_create() fails if multiple threads call at the o threa/80435 threads panic on high loads o threa/83914 threads [libc] popen() doesn't work in static threaded program s threa/84483 threads problems with devel/nspr and -lc_r on 4.x s threa/94467 threads send(), sendto() and sendmsg() are not correct in libc s threa/100815 threads FBSD 5.5 broke nanosleep in libc_r o threa/101323 threads fork(2) in threaded programs broken. o threa/103975 threads Implicit loading/unloading of libpthread.so may crash o threa/110636 threads [request] gdb(1): using gdb with multi thread applicat o threa/118715 threads kse problem o threa/121336 threads lang/neko threading ok on UP, broken on SMP (FreeBSD 7 23 problems total. Non-critical problems S Tracker Resp. Description -------------------------------------------------------------------------------- s threa/30464 threads pthread mutex attributes -- pshared s threa/37676 threads libc_r: msgsnd(), msgrcv(), pread(), pwrite() need wra s threa/40671 threads pthread_cancel doesn't remove thread from condition qu s threa/69020 threads pthreads library leaks _gc_mutex o threa/79887 threads [patch] freopen() isn't thread-safe o threa/80992 threads abort() sometimes not caught by gdb depending on threa o threa/110306 threads apache 2.0 segmentation violation when calling gethost o threa/115211 threads pthread_atfork misbehaves in initial thread o threa/116181 threads /dev/io-related io access permissions are not propagat o threa/116668 threads can no longer use jdk15 with libthr on -stable SMP o threa/122923 threads 'nice' does not prevent background process from steali 11 problems total. From avg at icyb.net.ua Tue May 13 13:45:44 2008 From: avg at icyb.net.ua (Andriy Gapon) Date: Tue May 13 13:45:49 2008 Subject: RELENG_7 libthr: _umtx_op -1 errno 60 Operation timed out Message-ID: <48299715.8010303@icyb.net.ua> I observe the following issue with some programs (e.g. firefox) from time. A completely idle program suddenly starts using a lot of CPU and sometimes memory too. For example, minimized firefox with no pages open. Restarting the program make the behavior go away. On one such occasion I ran ktrace on firefox and I saw a lot of messages like the following in kdump: 13974 firefox-bin RET _umtx_op -1 errno 60 Operation timed out Attaching with gdb I also saw a quite strange stack (unfortunately firefox and the libs are without debug). I didn't save it but it looked something like the following: __error() ?? ?? pthread_cond_init() ?? I wonder what could be a cause of this. Could it be some sort of resource limitation? -- Andriy Gapon From avg at icyb.net.ua Wed May 14 15:17:47 2008 From: avg at icyb.net.ua (Andriy Gapon) Date: Wed May 14 15:17:54 2008 Subject: thread scheduling at mutex unlock Message-ID: <482B0297.2050300@icyb.net.ua> I am trying the small attached program on FreeBSD 6.3 (amd64, SCHED_4BSD) and 7-STABLE (i386, SCHED_ULE), both with libthr as threads library and on both it produces "BROKEN" message. I compile this program as follows: cc sched_test.c -o sched_test -pthread I believe that the behavior I observe is broken because: if thread #1 releases a mutex and then tries to re-acquire it while thread #2 was already blocked waiting on that mutex, then thread #1 should be "queued" after thread #2 in mutex waiter's list. Is there any option (thread scheduler, etc) that I could try to achieve "good" behavior? P.S. I understand that all this is subject to (thread) scheduler policy, but I think that what I expect is more reasonable, at least it is more reasonable for my application. -- Andriy Gapon From avg at icyb.net.ua Wed May 14 20:14:26 2008 From: avg at icyb.net.ua (Andriy Gapon) Date: Wed May 14 20:14:28 2008 Subject: thread scheduling at mutex unlock In-Reply-To: <482B0297.2050300@icyb.net.ua> References: <482B0297.2050300@icyb.net.ua> Message-ID: <482B3475.6060803@icyb.net.ua> on 14/05/2008 18:17 Andriy Gapon said the following: > I am trying the small attached program on FreeBSD 6.3 (amd64, > SCHED_4BSD) and 7-STABLE (i386, SCHED_ULE), both with libthr as threads > library and on both it produces "BROKEN" message. > > I compile this program as follows: > cc sched_test.c -o sched_test -pthread > > I believe that the behavior I observe is broken because: if thread #1 > releases a mutex and then tries to re-acquire it while thread #2 was > already blocked waiting on that mutex, then thread #1 should be "queued" > after thread #2 in mutex waiter's list. > > Is there any option (thread scheduler, etc) that I could try to achieve > "good" behavior? > > P.S. I understand that all this is subject to (thread) scheduler policy, > but I think that what I expect is more reasonable, at least it is more > reasonable for my application. > > Daniel Eischen has just kindly notified me that the code (as an attachment) didn't make it to the list, so here it is inline. #include #include #include #include pthread_mutex_t mutex; int count = 0; static void * thrfunc(void * arg) { while (1) { pthread_mutex_lock(&mutex); count++; if (count > 10) { fprintf(stderr, "you have a BROKEN thread scheduler!!!\n"); exit(1); } sleep(1); pthread_mutex_unlock(&mutex); } } int main(void) { pthread_t thr; #if 0 pthread_mutexattr_t attr; pthread_mutexattr_init(&attr); pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE_NP); pthread_mutex_init(&mutex, &attr); #else pthread_mutex_init(&mutex, NULL); #endif pthread_create(&thr, NULL, thrfunc, NULL); sleep(2); pthread_mutex_lock(&mutex); count = 0; printf("you have good thread scheduler\n"); pthread_mutex_unlock(&mutex); return 0; } -- Andriy Gapon From davids at webmaster.com Thu May 15 00:40:37 2008 From: davids at webmaster.com (David Schwartz) Date: Thu May 15 00:40:41 2008 Subject: thread scheduling at mutex unlock In-Reply-To: <482B0297.2050300@icyb.net.ua> Message-ID: > I am trying the small attached program on FreeBSD 6.3 (amd64, > SCHED_4BSD) and 7-STABLE (i386, SCHED_ULE), both with libthr as threads > library and on both it produces "BROKEN" message. > > I compile this program as follows: > cc sched_test.c -o sched_test -pthread > > I believe that the behavior I observe is broken because: if thread #1 > releases a mutex and then tries to re-acquire it while thread #2 was > already blocked waiting on that mutex, then thread #1 should be "queued" > after thread #2 in mutex waiter's list. > > Is there any option (thread scheduler, etc) that I could try to achieve > "good" behavior? > > P.S. I understand that all this is subject to (thread) scheduler policy, > but I think that what I expect is more reasonable, at least it is more > reasonable for my application. > > -- > Andriy Gapon Are you out of your mind?! You are specifically asking for the absolute worst possible behavior! If you have fifty tiny things to do on one side of the room and fifty tiny things to do on the other side, do you cross the room after each one? Of course not. That would be *ludicrous*. If you want/need strict alternation, feel free to code it. But it's the maximally inefficient scheduler behavior, and it sure as hell had better not be the default. DS From davidxu at freebsd.org Thu May 15 04:20:46 2008 From: davidxu at freebsd.org (David Xu) Date: Thu May 15 04:20:53 2008 Subject: thread scheduling at mutex unlock In-Reply-To: <482B0297.2050300@icyb.net.ua> References: <482B0297.2050300@icyb.net.ua> Message-ID: <482BBA77.8000704@freebsd.org> Andriy Gapon wrote: > I am trying the small attached program on FreeBSD 6.3 (amd64, > SCHED_4BSD) and 7-STABLE (i386, SCHED_ULE), both with libthr as threads > library and on both it produces "BROKEN" message. > > I compile this program as follows: > cc sched_test.c -o sched_test -pthread > > I believe that the behavior I observe is broken because: if thread #1 > releases a mutex and then tries to re-acquire it while thread #2 was > already blocked waiting on that mutex, then thread #1 should be "queued" > after thread #2 in mutex waiter's list. > > Is there any option (thread scheduler, etc) that I could try to achieve > "good" behavior? > > P.S. I understand that all this is subject to (thread) scheduler policy, > but I think that what I expect is more reasonable, at least it is more > reasonable for my application. > In fact, libthr is trying to avoid this conveying, if thread #1 hands off the ownership to thread #2, it will cause lots of context switch, in the idea world, I would let thread #1 to run until it exhausts its time slice, and at the end of its time slices, thread #2 will get the mutex ownership, of course it is difficult to make this work on SMP, but on UP, I would expect the result will be close enough if thread scheduler is sane, so we don't raise priority in kernel umtx code if a thread is blocked, this gives thread #1 some times to re-acquire the mutex without context switches, increases throughput. Regards, David Xu From b.j.casavant at ieee.org Thu May 15 05:56:59 2008 From: b.j.casavant at ieee.org (Brent Casavant) Date: Thu May 15 05:57:04 2008 Subject: thread scheduling at mutex unlock In-Reply-To: <482B0297.2050300@icyb.net.ua> References: <482B0297.2050300@icyb.net.ua> Message-ID: On Wed, 14 May 2008, Andriy Gapon wrote: > I believe that the behavior I observe is broken because: if thread #1 > releases a mutex and then tries to re-acquire it while thread #2 was > already blocked waiting on that mutex, then thread #1 should be "queued" > after thread #2 in mutex waiter's list. The behavior of scheduling with respect to mutex contention (apart from pthread_mutexattr_setprotocol()) is not specified by POSIX, to the best of my knowledge, and thus is left to the discretion of the implementation. > Is there any option (thread scheduler, etc) that I could try to achieve > "good" behavior? No portable mechanism, and not any mechanism in the operating systems with which I am familiar. That said, as the behavior is not specified by POSIX, there would be nothing preventing an implementation from providing this as an optional behavior through a custom pthread_mutexattr_???_np() interface. > P.S. I understand that all this is subject to (thread) scheduler policy, > but I think that what I expect is more reasonable, at least it is more > reasonable for my application. As other responders have indicated, the behavior you desire is as unoptimal as possible for the general case. If your application would truly benefit from this sort of queuing behavior, I'd suggest that either you need to implement your own mechanism to accomplish the queueing (probably the easier fix), or that the threading architecture of your application could be designed in a different manner that avoids this problem (probably the more difficult fix). Brent -- Brent Casavant Dance like everybody should be watching. www.angeltread.org KD5EMB, EN34lv From avg at icyb.net.ua Thu May 15 08:36:02 2008 From: avg at icyb.net.ua (Andriy Gapon) Date: Thu May 15 08:36:14 2008 Subject: thread scheduling at mutex unlock In-Reply-To: <482BBA77.8000704@freebsd.org> References: <482B0297.2050300@icyb.net.ua> <482BBA77.8000704@freebsd.org> Message-ID: <482BF5EA.5010806@icyb.net.ua> on 15/05/2008 07:22 David Xu said the following: > In fact, libthr is trying to avoid this conveying, if thread #1 > hands off the ownership to thread #2, it will cause lots of context > switch, in the idea world, I would let thread #1 to run until it > exhausts its time slice, and at the end of its time slices, > thread #2 will get the mutex ownership, of course it is difficult to > make this work on SMP, but on UP, I would expect the result will > be close enough if thread scheduler is sane, so we don't raise > priority in kernel umtx code if a thread is blocked, this gives > thread #1 some times to re-acquire the mutex without context switches, > increases throughput. Brent, David, thank you for the responses. I think I incorrectly formulated my original concern. It is not about behavior at mutex unlock but about behavior at mutex re-lock. You are right that waking waiters at unlock would hurt performance. But I think that it is not "fair" that at re-lock former owner gets the lock immediately and the thread that waited on it for longer time doesn't get a chance. Here's a more realistic example than the mock up code. Say you have a worker thread that processes queued requests and the load is such that there is always something on the queue. Thus the worker thread doesn't ever have to block waiting on it. And let's say that there is a GUI thread that wants to convey some information to the worker thread. And for that it needs to acquire some mutex and "do something". With current libthr behavior the GUI thread would never have a chance to get the mutex as worker thread would always be a winner (as my small program shows). Or even more realistic: there should be a feeder thread that puts things on the queue, it would never be able to enqueue new items until the queue becomes empty if worker thread's code looks like the following: while(1) { pthread_mutex_lock(&work_mutex); while(queue.is_empty()) pthread_cond_wait(...); //dequeue item ... pthread_mutex_unlock(&work mutex); //perform some short and non-blocking processing of the item ... } Because the worker thread (while the queue is not empty) would never enter cond_wait and would always re-lock the mutex shortly after unlocking it. So while improving performance on small scale this mutex re-acquire-ing unfairness may be hurting interactivity and thread concurrency and thus performance in general. E.g. in the above example queue would always be effectively of depth 1. Something about "lock starvation" comes to mind. So, yes, this is not about standards, this is about reasonable expectations about thread concurrency behavior in a particular implementation (libthr). I see now that performance advantage of libthr over libkse came with a price. I think that something like queued locks is needed. They would clearly reduce raw throughput performance, so maybe that should be a new (non-portable?) mutex attribute. -- Andriy Gapon From avg at icyb.net.ua Thu May 15 09:01:17 2008 From: avg at icyb.net.ua (Andriy Gapon) Date: Thu May 15 09:01:28 2008 Subject: thread scheduling at mutex unlock References: Message-ID: <482BFBDA.6060705@icyb.net.ua> David Schwartz wrote: > Are you out of your mind?! You are specifically asking for the absolute = > worst possible behavior! > > If you have fifty tiny things to do on one side of the room and fifty = > tiny things to do on the other side, do you cross the room after each = > one? Of course not. That would be *ludicrous*. > > If you want/need strict alternation, feel free to code it. But it's the = > maximally inefficient scheduler behavior, and it sure as hell had better = > not be the default. David, what if you have an infinite number of items on one side and finite number on the other, and you want to process them all (in infinite time, of course). Would you still try to finish everything on one side (the infinite one) or would you try to look at what you have on the other side? I am sorry about fuzzy wording of my original report, I should have mentioned "starvation" somewhere in it. -- Andriy Gapon From davidxu at freebsd.org Thu May 15 09:04:09 2008 From: davidxu at freebsd.org (David Xu) Date: Thu May 15 09:04:22 2008 Subject: thread scheduling at mutex unlock In-Reply-To: <482BF5EA.5010806@icyb.net.ua> References: <482B0297.2050300@icyb.net.ua> <482BBA77.8000704@freebsd.org> <482BF5EA.5010806@icyb.net.ua> Message-ID: <482BFCE3.7080704@freebsd.org> Andriy Gapon wrote: > Brent, David, > > thank you for the responses. > I think I incorrectly formulated my original concern. > It is not about behavior at mutex unlock but about behavior at mutex > re-lock. You are right that waking waiters at unlock would hurt > performance. But I think that it is not "fair" that at re-lock former > owner gets the lock immediately and the thread that waited on it for > longer time doesn't get a chance. > > Here's a more realistic example than the mock up code. > Say you have a worker thread that processes queued requests and the load > is such that there is always something on the queue. Thus the worker > thread doesn't ever have to block waiting on it. > And let's say that there is a GUI thread that wants to convey some > information to the worker thread. And for that it needs to acquire some > mutex and "do something". > With current libthr behavior the GUI thread would never have a chance to > get the mutex as worker thread would always be a winner (as my small > program shows). > Or even more realistic: there should be a feeder thread that puts things > on the queue, it would never be able to enqueue new items until the > queue becomes empty if worker thread's code looks like the following: > > while(1) > { > pthread_mutex_lock(&work_mutex); > while(queue.is_empty()) > pthread_cond_wait(...); > //dequeue item > ... > pthread_mutex_unlock(&work mutex); > //perform some short and non-blocking processing of the item > ... > } > > Because the worker thread (while the queue is not empty) would never > enter cond_wait and would always re-lock the mutex shortly after > unlocking it. > > So while improving performance on small scale this mutex re-acquire-ing > unfairness may be hurting interactivity and thread concurrency and thus > performance in general. E.g. in the above example queue would always be > effectively of depth 1. > Something about "lock starvation" comes to mind. > > So, yes, this is not about standards, this is about reasonable > expectations about thread concurrency behavior in a particular > implementation (libthr). > I see now that performance advantage of libthr over libkse came with a > price. I think that something like queued locks is needed. They would > clearly reduce raw throughput performance, so maybe that should be a new > (non-portable?) mutex attribute. > You forgot that default scheduling policy is time-sharing, after thread #2 has blocked on the mutex for a while, when thread #1 unlocks the mutex and unblocks thread #1, the thread #2's priority will be raised and it preempts thread #1, the thread #2 then acquires the mutex, that's how it balances between fairness and performance. Regards, David Xu From avg at icyb.net.ua Thu May 15 09:27:36 2008 From: avg at icyb.net.ua (Andriy Gapon) Date: Thu May 15 09:27:49 2008 Subject: thread scheduling at mutex unlock In-Reply-To: <482BFCE3.7080704@freebsd.org> References: <482B0297.2050300@icyb.net.ua> <482BBA77.8000704@freebsd.org> <482BF5EA.5010806@icyb.net.ua> <482BFCE3.7080704@freebsd.org> Message-ID: <482C0206.1050206@icyb.net.ua> on 15/05/2008 12:05 David Xu said the following: > Andriy Gapon wrote: > >> Brent, David, >> >> thank you for the responses. >> I think I incorrectly formulated my original concern. >> It is not about behavior at mutex unlock but about behavior at mutex >> re-lock. You are right that waking waiters at unlock would hurt >> performance. But I think that it is not "fair" that at re-lock former >> owner gets the lock immediately and the thread that waited on it for >> longer time doesn't get a chance. >> >> Here's a more realistic example than the mock up code. >> Say you have a worker thread that processes queued requests and the >> load is such that there is always something on the queue. Thus the >> worker thread doesn't ever have to block waiting on it. >> And let's say that there is a GUI thread that wants to convey some >> information to the worker thread. And for that it needs to acquire >> some mutex and "do something". >> With current libthr behavior the GUI thread would never have a chance >> to get the mutex as worker thread would always be a winner (as my >> small program shows). >> Or even more realistic: there should be a feeder thread that puts >> things on the queue, it would never be able to enqueue new items until >> the queue becomes empty if worker thread's code looks like the following: >> >> while(1) >> { >> pthread_mutex_lock(&work_mutex); >> while(queue.is_empty()) >> pthread_cond_wait(...); >> //dequeue item >> ... >> pthread_mutex_unlock(&work mutex); >> //perform some short and non-blocking processing of the item >> ... >> } >> >> Because the worker thread (while the queue is not empty) would never >> enter cond_wait and would always re-lock the mutex shortly after >> unlocking it. >> >> So while improving performance on small scale this mutex >> re-acquire-ing unfairness may be hurting interactivity and thread >> concurrency and thus performance in general. E.g. in the above example >> queue would always be effectively of depth 1. >> Something about "lock starvation" comes to mind. >> >> So, yes, this is not about standards, this is about reasonable >> expectations about thread concurrency behavior in a particular >> implementation (libthr). >> I see now that performance advantage of libthr over libkse came with a >> price. I think that something like queued locks is needed. They would >> clearly reduce raw throughput performance, so maybe that should be a >> new (non-portable?) mutex attribute. >> > > You forgot that default scheduling policy is time-sharing, after thread > #2 has blocked on the mutex for a while, when thread #1 unlocks the > mutex and unblocks thread #1, the thread #2's priority will be raised > and it preempts thread #1, the thread #2 then acquires the mutex, > that's how it balances between fairness and performance. Maybe. But that's not what I see with my small example program. One thread releases and re-acquires a mutex 10 times in a row while the other doesn't get it a single time. I think that there is a very slim chance of a blocked thread preempting a running thread in this circumstances. Especially if execution time between unlock and re-lock is very small. I'd rather prefer to have an option to have FIFO fairness in mutex lock rather than always avoiding context switch at all costs and depending on scheduler to eventually do priority magic. -- Andriy Gapon From davidxu at freebsd.org Thu May 15 12:57:21 2008 From: davidxu at freebsd.org (David Xu) Date: Thu May 15 12:57:36 2008 Subject: thread scheduling at mutex unlock In-Reply-To: <482C0206.1050206@icyb.net.ua> References: <482B0297.2050300@icyb.net.ua> <482BBA77.8000704@freebsd.org> <482BF5EA.5010806@icyb.net.ua> <482BFCE3.7080704@freebsd.org> <482C0206.1050206@icyb.net.ua> Message-ID: <482C3333.1070205@freebsd.org> Andriy Gapon wrote: > > Maybe. But that's not what I see with my small example program. One > thread releases and re-acquires a mutex 10 times in a row while the > other doesn't get it a single time. > I think that there is a very slim chance of a blocked thread > preempting a running thread in this circumstances. Especially if > execution time between unlock and re-lock is very small. It does not depends on how many times your thread acquires or re-acquires mutex, or how small the region the mutex is protecting. as long as current thread runs too long, other threads will have higher priorities and the ownership definitely will be transfered, though there will be some extra context switchings. > I'd rather prefer to have an option to have FIFO fairness in mutex > lock rather than always avoiding context switch at all costs and > depending on scheduler to eventually do priority magic. > It is better to implement this behavior in your application code, if it is implemented in thread library, you still can not control how many times acquiring and re-acquiring can be allowed for a thread without context switching, a simple FIFO as you said here will cause dreadful performance problem. From deischen at freebsd.org Thu May 15 13:45:51 2008 From: deischen at freebsd.org (Daniel Eischen) Date: Thu May 15 13:45:57 2008 Subject: thread scheduling at mutex unlock In-Reply-To: <482BF5EA.5010806@icyb.net.ua> References: <482B0297.2050300@icyb.net.ua> <482BBA77.8000704@freebsd.org> <482BF5EA.5010806@icyb.net.ua> Message-ID: On Thu, 15 May 2008, Andriy Gapon wrote: > Or even more realistic: there should be a feeder thread that puts things on > the queue, it would never be able to enqueue new items until the queue > becomes empty if worker thread's code looks like the following: > > while(1) > { > pthread_mutex_lock(&work_mutex); > while(queue.is_empty()) > pthread_cond_wait(...); > //dequeue item > ... > pthread_mutex_unlock(&work mutex); > //perform some short and non-blocking processing of the item > ... > } > > Because the worker thread (while the queue is not empty) would never enter > cond_wait and would always re-lock the mutex shortly after unlocking it. Well in theory, the kernel scheduler will let both threads run fairly with regards to their cpu usage, so this should even out the enqueueing and dequeueing threads. You could also optimize the above a little bit by dequeueing everything in the queue instead of one at a time. > So while improving performance on small scale this mutex re-acquire-ing > unfairness may be hurting interactivity and thread concurrency and thus > performance in general. E.g. in the above example queue would always be > effectively of depth 1. > Something about "lock starvation" comes to mind. > > So, yes, this is not about standards, this is about reasonable expectations > about thread concurrency behavior in a particular implementation (libthr). > I see now that performance advantage of libthr over libkse came with a price. > I think that something like queued locks is needed. They would clearly reduce > raw throughput performance, so maybe that should be a new (non-portable?) > mutex attribute. -- DE From deischen at freebsd.org Thu May 15 17:54:12 2008 From: deischen at freebsd.org (Daniel Eischen) Date: Thu May 15 17:54:17 2008 Subject: thread scheduling at mutex unlock In-Reply-To: References: <482B0297.2050300@icyb.net.ua> <482BBA77.8000704@freebsd.org> <482BF5EA.5010806@icyb.net.ua> Message-ID: On Thu, 15 May 2008, Daniel Eischen wrote: > On Thu, 15 May 2008, Andriy Gapon wrote: > >> Or even more realistic: there should be a feeder thread that puts things on >> the queue, it would never be able to enqueue new items until the queue >> becomes empty if worker thread's code looks like the following: >> >> while(1) >> { >> pthread_mutex_lock(&work_mutex); >> while(queue.is_empty()) >> pthread_cond_wait(...); >> //dequeue item >> ... >> pthread_mutex_unlock(&work mutex); >> //perform some short and non-blocking processing of the item >> ... >> } >> >> Because the worker thread (while the queue is not empty) would never enter >> cond_wait and would always re-lock the mutex shortly after unlocking it. > > Well in theory, the kernel scheduler will let both threads run fairly > with regards to their cpu usage, so this should even out the enqueueing > and dequeueing threads. > > You could also optimize the above a little bit by dequeueing everything > in the queue instead of one at a time. I suppose you could also enforce your own scheduling with something like the following: pthread_cond_t writer_cv; pthread_cond_t reader_cv; pthread_mutex_t q_mutex; ... thingy_q_t thingy_q; int writers_waiting = 0; int readers_waiting = 0; ... void enqueue(thingy_t *thingy) { pthread_mutex_lock(q_mutex); /* Insert into thingy q */ ... if (readers_waiting > 0) { pthread_cond_broadcast(&reader_cv, &q_mutex); readers_waiting = 0; } while (thingy_q.size > ENQUEUE_THRESHOLD_HIGH) { writers_waiting++; pthread_cond_wait(&writer_cv, &q_mutex); } pthread_mutex_unlock(&q_mutex); } thingy_t * dequeue(void) { thingy_t *thingy; pthread_mutex_lock(&q_mutex); while (thingy_q.size == 0) { readers_waiting++; pthread_cond_wait(&reader_cv, &q_mutex); } /* Dequeue thingy */ ... if ((writers_waiting > 0) && thingy_q.size < ENQUEUE_THRESHOLD_LOW)) { /* Wakeup the writers. */ pthread_cond_broadcast(&writer_cv, &q_mutex); writers_waiting = 0; } pthread_mutex_unlock(&q_mutex); return (thingy); } The above is completely untested and probably contains some bugs ;-) You probably shouldn't need anything like that if the kernel scheduler is scheduling your threads fairly. -- DE From davids at webmaster.com Thu May 15 19:30:18 2008 From: davids at webmaster.com (David Schwartz) Date: Thu May 15 19:30:23 2008 Subject: thread scheduling at mutex unlock In-Reply-To: <482BFBDA.6060705@icyb.net.ua> Message-ID: > what if you have an infinite number of items on one side and finite > number on the other, and you want to process them all (in infinite time, > of course). Would you still try to finish everything on one side (the > infinite one) or would you try to look at what you have on the other side? > > I am sorry about fuzzy wording of my original report, I should have > mentioned "starvation" somewhere in it. There is no such thing as a "fair share" when comparing an infinite quantity to a finite quantity. It is just as sensible to do 1 then 1 as 10 then 10 or a billion then 1. What I would do in this case is work on one side for one timeslice then the other side for one timeslice, continuuing until either side was finished, then I'd work exclusively on the other side. This is precisely the purpose for having timeslices in a scheduler. The timeslice is carefully chosen so that it's not so long that you ignore a side for too long. It's also carefully chosen so that it's not so short that you spend all your time switching swides. What sane schedulers do is assume that you want to make as much forward progress as quickly as possible. This means getting as many work units done per unit time as possible. This means as few context switches as possible. A scheduler that switches significantly more often than once per timeslice with a load like this is *broken*. The purpose of the timeslice is to place an upper bound on the number of context switches in cases where forward progress can be made on more than one process. An ideal scheduler would not switch more often than once per timeslice unless it could not make further forward progress. Real-world schedulers actually may allow one side to pre-empt the other, and may switch a bit more often than a scheduler that's "ideal" in the sense described above. This is done in an attempt to boost interactive performance. But your basic assumption that strict alternation is desirable is massively wrong. That's the *worst* *possible* outcome. DS From b.j.casavant at ieee.org Thu May 15 19:51:27 2008 From: b.j.casavant at ieee.org (Brent Casavant) Date: Thu May 15 19:51:31 2008 Subject: thread scheduling at mutex unlock In-Reply-To: <482BF5EA.5010806@icyb.net.ua> References: <482B0297.2050300@icyb.net.ua> <482BBA77.8000704@freebsd.org> <482BF5EA.5010806@icyb.net.ua> Message-ID: On Thu, 15 May 2008, Andriy Gapon wrote: > With current libthr behavior the GUI thread would never have a chance to get > the mutex as worker thread would always be a winner (as my small program > shows). The example you gave indicates an incorrect mechanism being used for the GUI to communicate with this worker thread. For the behavior you desire, you need a common condition that lets both the GUI and the work item generator indicate that there is something for the worker to do, *and* you need seperate mechanisms for the GUI and work item generator to add to their respective queues. Something like this (could be made even better with a little effor): struct worker_queues_s { pthread_mutex_t work_mutex; struct work_queue_s work_queue; pthread_mutex_t gui_mutex; struct gui_queue_s gui_queue; pthread_mutex_t stuff_mutex; int stuff_todo; pthread_cond_t stuff_cond; }; struct worker_queue_s wq; int main(int argc, char *argv[]) { // blah blah init_worker_queue(&wq); // blah blah } void gui_callback(...) { // blah blah // Set up GUI message pthread_mutex_lock(&wq.gui_mutex); // Add GUI message to queue pthread_mutex_unlock(&wq.gui_mutex); pthread_mutex_lock(&wq.stuff_mutex); wq.stuff_todo++; pthread_cond_signal(&wq.stuff_cond); pthread_mutex_unlock(&wq.stuff_mutex); // blah blah } void* work_generator_thread(void*) { // blah blah while (1) { // Set up work to do pthread_mutex_lock(&wq.work_mutex); // Add work item to queue pthread_mutex_unlock(&wq.work_mutex); pthread_mutex_lock(&wq.stuff_mutex); wq.stuff_todo++; pthread_cond_signal(&wq.stuff_cond); pthread_mutex_unlock(&wq.stuff_mutex); } // blah blah } void* worker_thread(void* arg) { // blah blah while (1) { // Wait for there to be something to do pthread_mutex_lock(&wq.stuff_mutex); while (wq.stuff_todo < 1) { pthread_cond_wait(&wq.stuff_cond, &wq.stuff_mutex); } pthread_mutex_unlock(&wq.stuff_mutex); // Handle GUI messages pthread_mutex_lock(&wq.gui_mutex); while (!gui_queue_empty(&wq.gui_queue) { // dequeue and process GUI messages pthread_mutex_lock(&wq.stuff_mutex); wq.stuff_todo--; pthread_mutex_unlock(&wq.stuff_mutex); } pthread_mutex_unlock(&wq.gui_mutex); // Handle work items pthread_mutex_lock(&wq.work_mutex); while (!work_queue_empty(&wq.work_queue)) { // dequeue and process work item pthread_mutex_lock(&wq.stuff_mutex); wq.stuff_todo--; pthread_mutex_unlock(&wq.stuff_mutex); } pthread_mutex_unlock(&wq.work_mutex); } // blah blah } This should accomplish what you desire. Caution that I haven't compiled, run, or tested it, but I'm pretty sure it's a solid solution. The key here is unifying the two input sources (the GUI and work queues) without blocking on either one of them individually. The value of (wq.stuff_todo < 1) becomes a proxy for the value of (gui_queue_empty(...) && work_queue_empty(...)). I hope that helps, Brent -- Brent Casavant Dance like everybody should be watching. www.angeltread.org KD5EMB, EN34lv From davids at webmaster.com Thu May 15 20:26:38 2008 From: davids at webmaster.com (David Schwartz) Date: Thu May 15 20:26:44 2008 Subject: thread scheduling at mutex unlock In-Reply-To: <482BF5EA.5010806@icyb.net.ua> Message-ID: > Brent, David, > > thank you for the responses. > I think I incorrectly formulated my original concern. > It is not about behavior at mutex unlock but about behavior at mutex > re-lock. You are right that waking waiters at unlock would hurt > performance. But I think that it is not "fair" that at re-lock former > owner gets the lock immediately and the thread that waited on it for > longer time doesn't get a chance. You are correct, but fairness is not the goal, performance is. If you want fairness, you are welcome to code it. But threads don't file union grievances, and it would be absolute foolishness for a scheduler to sacrifice performance to make threads happier. The scheduler decides which thread runs, you decide what the running thread does. You are expected to use your control over that latter part to exercise whatever your application notion of "fairness" is. Your test program is a classic example of a case where the use of a mutex is inappropriate. > Here's a more realistic example than the mock up code. > Say you have a worker thread that processes queued requests and the load > is such that there is always something on the queue. Thus the worker > thread doesn't ever have to block waiting on it. > And let's say that there is a GUI thread that wants to convey some > information to the worker thread. And for that it needs to acquire some > mutex and "do something". > With current libthr behavior the GUI thread would never have a chance to > get the mutex as worker thread would always be a winner (as my small > program shows). Nonsense. The worker thread would be doing work most of the time and wouldn't be holding the mutex. > Or even more realistic: there should be a feeder thread that puts things > on the queue, it would never be able to enqueue new items until the > queue becomes empty if worker thread's code looks like the following: > > while(1) > { > pthread_mutex_lock(&work_mutex); > while(queue.is_empty()) > pthread_cond_wait(...); > //dequeue item > ... > pthread_mutex_unlock(&work mutex); > //perform some short and non-blocking processing of the item > ... > } > > Because the worker thread (while the queue is not empty) would never > enter cond_wait and would always re-lock the mutex shortly after > unlocking it. So what? The feeder thread could get the mutex after the mutex is unlocked before the worker thread goes to do work. The only reason your test code encountered a "problem" was because you yielded the CPU while you held the mutex and never used up a timeslice. > So while improving performance on small scale this mutex re-acquire-ing > unfairness may be hurting interactivity and thread concurrency and thus > performance in general. E.g. in the above example queue would always be > effectively of depth 1. > Something about "lock starvation" comes to mind. Nope. You have to create a situation where the mutex is held much more often than not held to get this behavior. That's a pathological case where the use of a mutex is known to be inappropriate. > So, yes, this is not about standards, this is about reasonable > expectations about thread concurrency behavior in a particular > implementation (libthr). > I see now that performance advantage of libthr over libkse came with a > price. I think that something like queued locks is needed. They would > clearly reduce raw throughput performance, so maybe that should be a new > (non-portable?) mutex attribute. If you want queued locks, feel free to code them and use them. But you have to work very hard to create a case where they are useful. If you find you're holding the mutex more often than not, you're doing something *very* wrong. DS From avg at icyb.net.ua Thu May 15 20:48:34 2008 From: avg at icyb.net.ua (Andriy Gapon) Date: Thu May 15 20:48:41 2008 Subject: thread scheduling at mutex unlock In-Reply-To: <482C3333.1070205@freebsd.org> References: <482B0297.2050300@icyb.net.ua> <482BBA77.8000704@freebsd.org> <482BF5EA.5010806@icyb.net.ua> <482BFCE3.7080704@freebsd.org> <482C0206.1050206@icyb.net.ua> <482C3333.1070205@freebsd.org> Message-ID: <482CA191.1030004@icyb.net.ua> on 15/05/2008 15:57 David Xu said the following: > Andriy Gapon wrote: >> >> Maybe. But that's not what I see with my small example program. One >> thread releases and re-acquires a mutex 10 times in a row while the >> other doesn't get it a single time. >> I think that there is a very slim chance of a blocked thread >> preempting a running thread in this circumstances. Especially if >> execution time between unlock and re-lock is very small. > It does not depends on how many times your thread acquires or > re-acquires mutex, or > how small the region the mutex is protecting. as long as current thread > runs too long, > other threads will have higher priorities and the ownership definitely > will be transfered, > though there will be some extra context switchings. David, did you examine or try the small program that I sent before? The "lucky" thread slept for 1 second each time it held mutex. So in total it spent about 8 seconds sleeping and holding the mutex. And the "unlucky" thread, consequently, spent 8 seconds blocked waiting for that mutex. And it didn't get "lucky". Yes, technically the "lucky" thread was not running while holding the mutex, so probably this is why scheduling algorithm didn't immediately work. I did more testing and see that the "unlucky" thread eventually gets a chance (eventually means after very many lock/unlock cycles), but I think that it is penalized too much still. I wonder if with current code it is possible and easy to make this behavior more deterministic. Maybe something like the following: if (oldest_waiter.wait_time < X) do what we do now... else go into kernel for possible switch I have very little idea about unit and value of X. >> I'd rather prefer to have an option to have FIFO fairness in mutex >> lock rather than always avoiding context switch at all costs and >> depending on scheduler to eventually do priority magic. >> > It is better to implement this behavior in your application code, if it > is implemented in thread library, you still can not control how many > times acquiring and re-acquiring can be allowed for a thread without > context switching, a simple FIFO as you said here will cause dreadful > performance problem. I almost agree. But I still wouldn't take your last statement for a fact. "Dreadful performance" - on micro-scale maybe, not necessarily on macro scale. After all, never switching context would be the best performance for a single CPU-bound task, but you wouldn't think that this is the best performance for the whole system. As a data point: it seems that current Linux threading library is not significantly worse than libthr, but my small test program on Fedora 7 works to my expectations. -- Andriy Gapon From avg at icyb.net.ua Thu May 15 20:56:21 2008 From: avg at icyb.net.ua (Andriy Gapon) Date: Thu May 15 20:56:30 2008 Subject: thread scheduling at mutex unlock In-Reply-To: References: Message-ID: <482CA372.3000400@icyb.net.ua> on 15/05/2008 22:29 David Schwartz said the following: >> what if you have an infinite number of items on one side and finite >> number on the other, and you want to process them all (in infinite >> time, of course). Would you still try to finish everything on one >> side (the infinite one) or would you try to look at what you have >> on the other side? >> >> I am sorry about fuzzy wording of my original report, I should have >> mentioned "starvation" somewhere in it. > > There is no such thing as a "fair share" when comparing an infinite > quantity to a finite quantity. It is just as sensible to do 1 then 1 > as 10 then 10 or a billion then 1. > > What I would do in this case is work on one side for one timeslice > then the other side for one timeslice, continuuing until either side > was finished, then I'd work exclusively on the other side. This is > precisely the purpose for having timeslices in a scheduler. > > The timeslice is carefully chosen so that it's not so long that you > ignore a side for too long. It's also carefully chosen so that it's > not so short that you spend all your time switching swides. > > What sane schedulers do is assume that you want to make as much > forward progress as quickly as possible. This means getting as many > work units done per unit time as possible. This means as few context > switches as possible. > > A scheduler that switches significantly more often than once per > timeslice with a load like this is *broken*. The purpose of the > timeslice is to place an upper bound on the number of context > switches in cases where forward progress can be made on more than one > process. An ideal scheduler would not switch more often than once per > timeslice unless it could not make further forward progress. > > Real-world schedulers actually may allow one side to pre-empt the > other, and may switch a bit more often than a scheduler that's > "ideal" in the sense described above. This is done in an attempt to > boost interactive performance. > > But your basic assumption that strict alternation is desirable is > massively wrong. That's the *worst* *possible* outcome. David, thank you for the tutorial, it is quite enlightening. But first of all, did you take a look at my small test program? There are 1 second sleeps in it, this is not about timeslices and scheduling at that level at all. This is about basic expectation about fairness of acquiring a lock at macro level. I know that when one thread acquires and releases and reacquires a mutex during 10 seconds while the other thread is blocked on that mutex for 10 seconds, then this is not about timeslices. -- Andriy Gapon From avg at icyb.net.ua Thu May 15 21:02:46 2008 From: avg at icyb.net.ua (Andriy Gapon) Date: Thu May 15 21:02:49 2008 Subject: thread scheduling at mutex unlock In-Reply-To: References: <482B0297.2050300@icyb.net.ua> <482BBA77.8000704@freebsd.org> <482BF5EA.5010806@icyb.net.ua> Message-ID: <482CA4F3.6090501@icyb.net.ua> on 15/05/2008 22:51 Brent Casavant said the following: > On Thu, 15 May 2008, Andriy Gapon wrote: > >> With current libthr behavior the GUI thread would never have a chance to get >> the mutex as worker thread would always be a winner (as my small program >> shows). > > The example you gave indicates an incorrect mechanism being used for the > GUI to communicate with this worker thread. For the behavior you desire, > you need a common condition that lets both the GUI and the work item > generator indicate that there is something for the worker to do, *and* > you need seperate mechanisms for the GUI and work item generator to add > to their respective queues. Brent, that was just an example. Probably a quite bad example. I should only limit myself to the program that I sent and I should repeat that the result that it produces is not what I would call reasonably expected. And I will repeat that I understand that the behavior is not prohibited by standards (well, never letting other threads to run is probably not prohibited either). > Something like this (could be made even better with a little effor): > > struct worker_queues_s { > pthread_mutex_t work_mutex; > struct work_queue_s work_queue; > > pthread_mutex_t gui_mutex; > struct gui_queue_s gui_queue; > > pthread_mutex_t stuff_mutex; > int stuff_todo; > pthread_cond_t stuff_cond; > }; > struct worker_queue_s wq; > > int > main(int argc, char *argv[]) { > // blah blah > init_worker_queue(&wq); > // blah blah > } > > void > gui_callback(...) { > // blah blah > > // Set up GUI message > > pthread_mutex_lock(&wq.gui_mutex); > // Add GUI message to queue > pthread_mutex_unlock(&wq.gui_mutex); > > pthread_mutex_lock(&wq.stuff_mutex); > wq.stuff_todo++; > pthread_cond_signal(&wq.stuff_cond); > pthread_mutex_unlock(&wq.stuff_mutex); > > // blah blah > } > > void* > work_generator_thread(void*) { > // blah blah > > while (1) { > // Set up work to do > > pthread_mutex_lock(&wq.work_mutex); > // Add work item to queue > pthread_mutex_unlock(&wq.work_mutex); > > pthread_mutex_lock(&wq.stuff_mutex); > wq.stuff_todo++; > pthread_cond_signal(&wq.stuff_cond); > pthread_mutex_unlock(&wq.stuff_mutex); > } > > // blah blah > } > > void* > worker_thread(void* arg) { > // blah blah > > while (1) { > // Wait for there to be something to do > pthread_mutex_lock(&wq.stuff_mutex); > while (wq.stuff_todo < 1) { > pthread_cond_wait(&wq.stuff_cond, > &wq.stuff_mutex); > } > pthread_mutex_unlock(&wq.stuff_mutex); > > // Handle GUI messages > pthread_mutex_lock(&wq.gui_mutex); > while (!gui_queue_empty(&wq.gui_queue) { > // dequeue and process GUI messages > pthread_mutex_lock(&wq.stuff_mutex); > wq.stuff_todo--; > pthread_mutex_unlock(&wq.stuff_mutex); > } > pthread_mutex_unlock(&wq.gui_mutex); > > // Handle work items > pthread_mutex_lock(&wq.work_mutex); > while (!work_queue_empty(&wq.work_queue)) { > // dequeue and process work item > pthread_mutex_lock(&wq.stuff_mutex); > wq.stuff_todo--; > pthread_mutex_unlock(&wq.stuff_mutex); > } > pthread_mutex_unlock(&wq.work_mutex); > } > > // blah blah > } > > This should accomplish what you desire. Caution that I haven't > compiled, run, or tested it, but I'm pretty sure it's a solid > solution. > > The key here is unifying the two input sources (the GUI and work queues) > without blocking on either one of them individually. The value of > (wq.stuff_todo < 1) becomes a proxy for the value of > (gui_queue_empty(...) && work_queue_empty(...)). > > I hope that helps, > Brent > -- Andriy Gapon From b.j.casavant at ieee.org Thu May 15 22:23:55 2008 From: b.j.casavant at ieee.org (Brent Casavant) Date: Thu May 15 22:24:06 2008 Subject: thread scheduling at mutex unlock In-Reply-To: <482CA4F3.6090501@icyb.net.ua> References: <482B0297.2050300@icyb.net.ua> <482BBA77.8000704@freebsd.org> <482BF5EA.5010806@icyb.net.ua> <482CA4F3.6090501@icyb.net.ua> Message-ID: On Fri, 16 May 2008, Andriy Gapon wrote: > that was just an example. Probably a quite bad example. > I should only limit myself to the program that I sent and I should repeat that > the result that it produces is not what I would call reasonably expected. And > I will repeat that I understand that the behavior is not prohibited by > standards (well, never letting other threads to run is probably not prohibited > either). Well, I don't know what to tell you at this point. I believe I understand the nature of the problem you're encountering, and I believe there are perfectly workable mechanisms in Pthreads to allow you to accomplish what you desire without depending on implementation-specific details. Yes, it's more work on your part, but if done well it's one-time work. The behavior you desire is useful only in limited situations, and can be implemented at the application level through the use of Pthreads primitives. If Pthreads behaved as you apparently expect, it would be impossible to implement the current behavior at the application level. Queueing mutexes are innappropriate in the majority of code designs. I'll take your word that it is appropriate in your particular case, but that does not make it appropriate for more typical designs. Several solutions have been presented, including one from me. If you choose not to implement such solutions, then best of luck to you. OK, I'm a sucker for punishment. So use this instead of Pthreads mutexes. This should work on both FreeBSD and Linux (FreeBSD has some convenience routines in the sys/queue.h package that Linux doesn't): #include #include struct thread_queue_entry_s { TAILQ_ENTRY(thread_queue_entry_s) tqe_list; pthread_cond_t tqe_cond; pthread_mutex_t tqe_mutex; int tqe_wakeup; }; TAILQ_HEAD(thread_queue_s, thread_queue_entry_s); typedef struct { struct thread_queue_s qm_queue; pthread_mutex_t qm_queue_lock; unsigned int qm_users; } queued_mutex_t; int queued_mutex_init(queued_mutex_t *qm) { TAILQ_INIT(&qm->qm_queue); qm->qm_users = 0; return pthread_mutex_init(&qm->qm_queue_lock, NULL); } int queued_mutex_lock(queued_mutex_t *qm) { struct thread_queue_entry_s waiter; pthread_mutex_lock(&qm->qm_queue_lock); qm->qm_users++; if (1 == qm->qm_users) { /* Nobody was waiting for mutex, we own it. * Fast path out. */ pthread_mutex_unlock(&qm->qm_queue_lock); return 0; } /* There are others waiting for the mutex. Slow path. */ /* Initialize this thread's wait structure */ pthread_cond_init(&waiter->tqe_cond, NULL); pthread_mutex_init(&waiter->tqe_mutex, NULL); pthread_mutex_lock(&waiter->tqe_mutex); waiter->tqe_wakeup = 0; /* Add this thread's wait structure to queue */ TAILQ_INSERT_TAIL(&qm->qm_queue, &waiter, tqe_list); pthread_mutex_unlock(&qm->qm_queue_lock); /* Wait for somebody to hand the mutex to us */ while (!waiter->tqe_wakeup) { pthread_cond_wait(&waiter->tqe_cond, &waiter->tqe_mutex); } /* Destroy this thread's wait structure */ pthread_mutex_unlock(&waiter->tqe_mutex); pthread_mutex_destroy(&waiter->tqe_mutex); pthread_cond_destroy(&waiter->tqe_cond); /* We own the queued mutex (handed to us by unlock) */ return 0; } int queued_mutex_unlock(queued_mutex_t *qm) { struct thread_queue_entry_s *waiter; pthread_mutex_lock(&qm->qm_queue_lock); qm->qm_users--; if (0 == qm->qm_users) { /* No waiters to wake up. Fast path out. */ pthread_mutex_unlock(&qm->qm_queue_lock); return 0; } /* Wake up first waiter. Slow path. */ /* Remove the first waiting thread. */ waiter = qm->qm_queue.tqh_first; TAILQ_REMOVE(&qm->qm_queue, waiter, tqe_list); pthread_mutex_unlock(&qm->qm_queue_lock); /* Wake up the thread. */ pthread_mutex_lock(&waiter->tqe_mutex); waiter->tqe_wakeup = 1; pthread_cond_signal(&waiter->tqe_cond); pthread_mutex_unlock(&waiter->tqe_mutex); return 0; } int queued_mutex_destroy(queued_mutex_t *qm) { pthread_mutex_lock(&qm->qm_queue_lock); if (qm->qm_users > 1) { pthread_mutex_unlock(&qm->qm_queue_lock); return EBUSY; } return pthread_mutex_destroy(&qm->qm_queue_lock); } These queued_mutex_t mutexes should have the behavior you're looking for, and will be portable to any platform with Pthreads and sys/queue.h. Be warned that I haven't compiled, run, or debugged this, but the code should be pretty solid (typos aside). Of course, in production code I'd check a bunch of return values, but those would just get in the way of this illustration. So use something this or change the application's threading model (like my previous post showed). There's no use complaining about the Pthreads implementation in this regard because your application's use of mutexes is the exception, not the rule. The fact that Linux behaves as you expect is irrelevant, as POSIX doesn't speak to this facet of implementation, so both Linux and BSD are correct. Relying on this behavior in Linux is ill-advised as it is non-portable, and likely to break in future releases. Brent -- Brent Casavant Dance like everybody should be watching. www.angeltread.org KD5EMB, EN34lv From davids at webmaster.com Thu May 15 22:37:43 2008 From: davids at webmaster.com (David Schwartz) Date: Thu May 15 22:37:49 2008 Subject: thread scheduling at mutex unlock In-Reply-To: <482CA372.3000400@icyb.net.ua> Message-ID: > David, > thank you for the tutorial, it is quite enlightening. > But first of all, did you take a look at my small test program? Yes. It demonstrates the classic example of mutex abuse. A mutex is not an appropriate synchronization mechanism when it's going to be held most of the time and released briefly. > There are 1 second sleeps in it, this is not about timeslices and > scheduling at that level at all. This is about basic expectation about > fairness of acquiring a lock at macro level. I know that when one thread > acquires and releases and reacquires a mutex during 10 seconds while the > other thread is blocked on that mutex for 10 seconds, then this is not > about timeslices. I guess it comes down to what your test program is supposed to test. Threading primitives can always be made to look bad in toy test programs that don't even remotely reflect real-world use cases. No sane person optimizes for such toys. The reason your program behaves the way it does is because the thread that holds the mutex relinquishes the CPU while it holds it. As such, it appears to be very nice and is its dynamic priority level rises. In a real-world case, the threads waiting for the mutex will have their priorities rise while the thread holding the mutex will use up its timeslice working. This is simply not appropriate use of a mutex. It would be absolute foolishness to encumber the platform's default mutex implementation with any attempt to make such abuses do more what you happen to expect them to do. In fact, the behavior I expect is the behavior seen. So the defect is in your unreasonable expectations. The scheduler's goal is to allow the running thread to make forward progress, and it does this perfectly. DS From alfred at freebsd.org Fri May 16 20:35:55 2008 From: alfred at freebsd.org (Alfred Perlstein) Date: Fri May 16 20:35:59 2008 Subject: thread scheduling at mutex unlock In-Reply-To: <482BF5EA.5010806@icyb.net.ua> References: <482B0297.2050300@icyb.net.ua> <482BBA77.8000704@freebsd.org> <482BF5EA.5010806@icyb.net.ua> Message-ID: <20080516201555.GL32532@elvis.mu.org> [[ -stable removed, additional cc' to interested parties ]] Guys, Andriy's issue appears to actually be severe. The main issue is that we have a starvation situation for heavily contended mutexes. If I am reading the test case, any heavily contented mutex will exhibit this problem. You do NOT need a sleep(3) call to provoke it, you just need the following scenario: thread A: 10: gets mutex does heavy processing (N (99%) of its time) releases mutex does something else (M (1%) of the time) goto 10; thread B...: tries to get mutex... do not care. The way that scheduling works means that you can do a back of a napkin calculation to figure out that this sort of situation will certainly starve threads without some support from the underlying locking mechanism to fix it. Let's say that that "thread A" spends N (99%) of it's time doing "heavy processing" then that means that only M (1%) of the time the scheduler will preempt it at the correct time so that it is not holding the lock. Clearly this will lead to starvation where "thread A" has a 99% chance per-quantum of being left holding the lock when it is preempted. I think it's time that people look into the so-called "working" implementations (linux/solaris) for ways to handle this sort of thing. . <- that's a period. -Alfred From b.j.casavant at ieee.org Fri May 16 21:18:15 2008 From: b.j.casavant at ieee.org (Brent Casavant) Date: Fri May 16 21:18:19 2008 Subject: thread scheduling at mutex unlock In-Reply-To: <20080516201555.GL32532@elvis.mu.org> References: <482B0297.2050300@icyb.net.ua> <482BBA77.8000704@freebsd.org> <482BF5EA.5010806@icyb.net.ua> <20080516201555.GL32532@elvis.mu.org> Message-ID: On Fri, 16 May 2008, Alfred Perlstein wrote: > If I am reading the test case, any heavily contented mutex will > exhibit this problem. [snip] > Let's say that that "thread A" spends N (99%) of it's time doing > "heavy processing" then that means that only M (1%) of the time the > scheduler will preempt it at the correct time so that it is not > holding the lock. > > Clearly this will lead to starvation where "thread A" has a > 99% chance per-quantum of being left holding the lock when it > is preempted. You're analysis is completely correct. However, I believe the conclusion that a change is necessary to address the problem is not. Any code which holds a lock over a long period of processing is inherently going to cause "unfairness" in scheduling, just as you observed in your comment. However, this is an artifact of poor application design, not the design of the locking primitive. The correct solution is to re-architecht the code such that the lock is held for much less time. This could mean, for example, holding the lock for just long enough to remove work items from a queue, then performing the time-intensive portion of the operation outside of the lock. The nature of the solution depends on the specific nature of the code, but in the end I would be very surprised if the original complaint could not be solved by rethinking the application's locking mechanisms and usage. Let's pretend, for a moment, that the mutexes in question were changed to utilize a queuing behavior. Now imagine a scenario where Thread A drops and reacquires a particular lock fairly rapidly in a loop, and another Thread B does much the same. A typical example might be two worker threads which are processing requests from a common queue -- something we'll probably see more of as multicore CPUs become common. What happens in this case is Thread A may have been able to process several queue items during its timeslice, but each time it goes to re-acquire the lock it ends blocking until Thread B has a chance to run and cycle through the lock. On a dual-processor system with no other processes of note running (the best scenario) you are, with almost every loop iteration, seeing one thread or the other sit idle until the other has a chance to go through it's loop. In other words, you'll approach a 50% performance reduction in the worst case. It's sort of like traffic control. It's the difference between a stop sign and a traffic light. A stop sign (turning vehicles not considered) is like a queuing mutex -- a single vehicle goes through in one direction, then a single vehicle goes through in the other direction. A traffic light is like a non-queueing mutex -- a large number of vehicles go through in one direction, then a large number of vehicles go through in the other direction. That is, with a traffic light, each direction is allocated a timeslice, and the whole bunch of "work" gets done, instead of unnecessarily wasting time switching back and forth. A traffic light results in more efficient utilization of the contended resource, just as a non-queued mutex does. There's a reason that stop signs get replaced with traffic lights at busy intersections, despire their much higher cost. So, let's get back to the original situation. What we have is a traffic light that's allowing traffic in one direction 99% of the time, and the other direction 1% of the time. _Of_course_ the traffic on the 1% direction is going to encounter delays and starvation; the traffic control mechanism is being used incorrectly! The solution is not to replace it with a stop sign, which admittedly makes everything more "fair", because it comes at the incredibly high price of never allowing the free flow of traffic in the other direction. Other "solutions" are just as bad, for slightly different use cases. You could force a thread yield whenever a contended mutex is released along with a wakeup and immediate schedule of the other thread -- which will have exactly the same problems as the queued mutex, the lock-dropping thread will just context switch at a different point in its execution. I'm sure there are other clever ideas as well, but in the end, no matter what you come up with for a solution, it's going to trade off one set of advantages and disadvantages for another. At least the current design allows you the flexibility (with some additional code, like I posted yesterday) of working however you'd like it to work. A queued or forced-switch design does not, and thus would only serve to hobble the mutex. Fix how the light is used (the application holding the lock too long), don't mess with the light itself (the mutex). Brent -- Brent Casavant Dance like everybody should be watching. www.angeltread.org KD5EMB, EN34lv From deischen at freebsd.org Fri May 16 21:55:32 2008 From: deischen at freebsd.org (Daniel Eischen) Date: Fri May 16 21:55:36 2008 Subject: thread scheduling at mutex unlock In-Reply-To: References: <482B0297.2050300@icyb.net.ua> <482BBA77.8000704@freebsd.org> <482BF5EA.5010806@icyb.net.ua> <20080516201555.GL32532@elvis.mu.org> Message-ID: On Fri, 16 May 2008, Brent Casavant wrote: > So, let's get back to the original situation. What we have is a > traffic light that's allowing traffic in one direction 99% of > the time, and the other direction 1% of the time. _Of_course_ > the traffic on the 1% direction is going to encounter delays > and starvation; the traffic control mechanism is being used > incorrectly! The solution is not to replace it with a stop sign, > which admittedly makes everything more "fair", because it comes > at the incredibly high price of never allowing the free flow of > traffic in the other direction. I don't think this is really the problem - from what I understood, the lock is not held for a long amount of time without being dropped. It is just that more events are arriving and the lock is taken and dropped more often as events (objects, whatever) are added to the queue. It isn't a "take lock, process many events, add each to queue, release lock". This problem may be made more obtuse in the case of SMP. Consider two CPUs, with each thread running on a separate CPU. When one thread releases the lock, sees that it was contested, and reenters the kernel to wakeup the other thread, it immediately returns and beats the other woken thread to the mutex. Even a yield() might not help - the scheduler might want to keep the starving thread on the other CPU. I think to be fair, the contested mutex case should try to handoff the mutex, in lieu of any priority protocol that is in place for the threads or mutex. And actually, I think in order to properly implement priority mutexes, there must be a handoff. I think it would be very hard for the scheduler to solve this problem since it doesn't know the dependencies the threads have on the mutex (or any other userland resource). There are other ways to solve this in the application, something like I posted yesterday, perhaps a priority ceiling mutex with the starving thread having an elevated priority. But unfortunately, I think that only works if the application is running with superuser privileges (with libkse it will/would work without su privileges). -- DE From alfred at freebsd.org Sat May 17 01:50:24 2008 From: alfred at freebsd.org (Alfred Perlstein) Date: Sat May 17 01:50:27 2008 Subject: thread scheduling at mutex unlock In-Reply-To: References: <482B0297.2050300@icyb.net.ua> <482BBA77.8000704@freebsd.org> <482BF5EA.5010806@icyb.net.ua> <20080516201555.GL32532@elvis.mu.org> Message-ID: <20080517015023.GM32532@elvis.mu.org> * Daniel Eischen [080516 14:55] wrote: > > I think to be fair, the contested mutex case should try > to handoff the mutex, in lieu of any priority protocol > that is in place for the threads or mutex. And actually, > I think in order to properly implement priority mutexes, > there must be a handoff. > Is this what you are saying? Because it is what I believe. -- - Alfred Perlstein From deischen at freebsd.org Sat May 17 05:11:49 2008 From: deischen at freebsd.org (Daniel Eischen) Date: Sat May 17 05:11:52 2008 Subject: thread scheduling at mutex unlock In-Reply-To: <20080517015023.GM32532@elvis.mu.org> References: <482B0297.2050300@icyb.net.ua> <482BBA77.8000704@freebsd.org> <482BF5EA.5010806@icyb.net.ua> <20080516201555.GL32532@elvis.mu.org> <20080517015023.GM32532@elvis.mu.org> Message-ID: On Fri, 16 May 2008, Alfred Perlstein wrote: > * Daniel Eischen [080516 14:55] wrote: >> >> I think to be fair, the contested mutex case should try >> to handoff the mutex, in lieu of any priority protocol >> that is in place for the threads or mutex. And actually, >> I think in order to properly implement priority mutexes, >> there must be a handoff. >> > > Is this what you are saying? Because it is what I believe. Yes, I think so. It doesn't seem very fair to give one thread the ability to consistently acquire the mutex when another thread has been waiting for it. -- DE From bugmaster at FreeBSD.org Mon May 19 11:07:01 2008 From: bugmaster at FreeBSD.org (FreeBSD bugmaster) Date: Mon May 19 11:07:59 2008 Subject: Current problem reports assigned to freebsd-threads@FreeBSD.org Message-ID: <200805191107.m4JB70XE011744@freefall.freebsd.org> Current FreeBSD problem reports Critical problems S Tracker Resp. Description -------------------------------------------------------------------------------- s threa/76690 threads fork hang in child for -lc_r 1 problem total. Serious problems S Tracker Resp. Description -------------------------------------------------------------------------------- s threa/24472 threads libc_r does not honor SO_SNDTIMEO/SO_RCVTIMEO socket o s threa/24632 threads libc_r delicate deviation from libc in handling SIGCHL s bin/32295 threads pthread dont dequeue signals s threa/34536 threads accept() blocks other threads s threa/39922 threads [threads] [patch] Threaded applications executed with s threa/48856 threads Setting SIGCHLD to SIG_IGN still leaves zombies under s threa/49087 threads Signals lost in programs linked with libc_r o threa/70975 threads [sysvipc] unexpected and unreliable behaviour when usi o threa/72953 threads fork() unblocks blocked signals w/o PTHREAD_SCOPE_SYST o threa/75273 threads FBSD 5.3 libpthread (KSE) bug o threa/75374 threads pthread_kill() ignores SA_SIGINFO flag s threa/76694 threads fork cause hang in dup()/close() function in child (-l o threa/79683 threads svctcp_create() fails if multiple threads call at the o threa/80435 threads panic on high loads o threa/83914 threads [libc] popen() doesn't work in static threaded program s threa/84483 threads problems with devel/nspr and -lc_r on 4.x s threa/94467 threads send(), sendto() and sendmsg() are not correct in libc s threa/100815 threads FBSD 5.5 broke nanosleep in libc_r o threa/101323 threads fork(2) in threaded programs broken. o threa/103975 threads Implicit loading/unloading of libpthread.so may crash o threa/110636 threads [request] gdb(1): using gdb with multi thread applicat o threa/118715 threads kse problem o threa/121336 threads lang/neko threading ok on UP, broken on SMP (FreeBSD 7 23 problems total. Non-critical problems S Tracker Resp. Description -------------------------------------------------------------------------------- s threa/30464 threads pthread mutex attributes -- pshared s threa/37676 threads libc_r: msgsnd(), msgrcv(), pread(), pwrite() need wra s threa/40671 threads pthread_cancel doesn't remove thread from condition qu s threa/69020 threads pthreads library leaks _gc_mutex o threa/79887 threads [patch] freopen() isn't thread-safe o threa/80992 threads abort() sometimes not caught by gdb depending on threa o threa/110306 threads apache 2.0 segmentation violation when calling gethost o threa/115211 threads pthread_atfork misbehaves in initial thread o threa/116181 threads /dev/io-related io access permissions are not propagat o threa/116668 threads can no longer use jdk15 with libthr on -stable SMP o threa/122923 threads 'nice' does not prevent background process from steali 11 problems total. From avg at icyb.net.ua Fri May 23 07:09:40 2008 From: avg at icyb.net.ua (Andriy Gapon) Date: Fri May 23 07:09:44 2008 Subject: RELENG_7 libthr: _umtx_op -1 errno 60 Operation timed out In-Reply-To: <48299715.8010303@icyb.net.ua> References: <48299715.8010303@icyb.net.ua> Message-ID: <48366DAB.2000200@icyb.net.ua> on 13/05/2008 16:26 Andriy Gapon said the following: > I observe the following issue with some programs (e.g. firefox) from > time. A completely idle program suddenly starts using a lot of CPU and > sometimes memory too. For example, minimized firefox with no pages open. > Restarting the program make the behavior go away. > On one such occasion I ran ktrace on firefox and I saw a lot of messages > like the following in kdump: > 13974 firefox-bin RET _umtx_op -1 errno 60 Operation timed out > Attaching with gdb I also saw a quite strange stack (unfortunately > firefox and the libs are without debug). I didn't save it but it looked > something like the following: > __error() > ?? > ?? > pthread_cond_init() > ?? > > I wonder what could be a cause of this. > Could it be some sort of resource limitation? Here is more details from userland part. I re-built libthr with debugging and here's a typical stack trace for the described above condition: 0x28ad26b3 in _umtx_op_err () at /system/src/lib/libthr/arch/i386/i386/_umtx_op_err.S:36 36 SYSCALL_ERR(_umtx_op) (gdb) bt #0 0x28ad26b3 in _umtx_op_err () at /system/src/lib/libthr/arch/i386/i386/_umtx_op_err.S:36 #1 0x28ad22bd in _thr_ucond_wait (cv=0x8209860, m=0x8209840, timeout=0x0, check_unparking=1) at /system/src/lib/libthr/thread/thr_umtx.c:129 #2 0x28ad0609 in cond_wait_common (cond=0x8208108, mutex=0x820810c, abstime=0x0, cancel=1) at /system/src/lib/libthr/thread/thr_cond.c:204 #3 0x28ad06e8 in __pthread_cond_wait (cond=0x8208108, mutex=0x820810c) at /system/src/lib/libthr/thread/thr_cond.c:228 I am a newbie in this area - what is the difference between _pthread_cond_wait and __pthread_cond_wait? And _X vs __X in general? -- Andriy Gapon From avg at icyb.net.ua Fri May 23 07:28:28 2008 From: avg at icyb.net.ua (Andriy Gapon) Date: Fri May 23 07:28:31 2008 Subject: RELENG_7 libthr: _umtx_op -1 errno 60 Operation timed out In-Reply-To: <48366DAB.2000200@icyb.net.ua> References: <48299715.8010303@icyb.net.ua> <48366DAB.2000200@icyb.net.ua> Message-ID: <4836721A.1030402@icyb.net.ua> on 23/05/2008 10:09 Andriy Gapon said the following: > on 13/05/2008 16:26 Andriy Gapon said the following: >> I observe the following issue with some programs (e.g. firefox) from >> time. A completely idle program suddenly starts using a lot of CPU and >> sometimes memory too. For example, minimized firefox with no pages open. >> Restarting the program make the behavior go away. >> On one such occasion I ran ktrace on firefox and I saw a lot of messages >> like the following in kdump: >> 13974 firefox-bin RET _umtx_op -1 errno 60 Operation timed out >> Attaching with gdb I also saw a quite strange stack (unfortunately >> firefox and the libs are without debug). I didn't save it but it looked >> something like the following: >> __error() >> ?? >> ?? >> pthread_cond_init() >> ?? >> >> I wonder what could be a cause of this. >> Could it be some sort of resource limitation? > > Here is more details from userland part. > I re-built libthr with debugging and here's a typical stack trace for > the described above condition: > 0x28ad26b3 in _umtx_op_err () at > /system/src/lib/libthr/arch/i386/i386/_umtx_op_err.S:36 > 36 SYSCALL_ERR(_umtx_op) > (gdb) bt > #0 0x28ad26b3 in _umtx_op_err () at > /system/src/lib/libthr/arch/i386/i386/_umtx_op_err.S:36 > #1 0x28ad22bd in _thr_ucond_wait (cv=0x8209860, m=0x8209840, > timeout=0x0, check_unparking=1) at > /system/src/lib/libthr/thread/thr_umtx.c:129 > #2 0x28ad0609 in cond_wait_common (cond=0x8208108, mutex=0x820810c, > abstime=0x0, cancel=1) at /system/src/lib/libthr/thread/thr_cond.c:204 > #3 0x28ad06e8 in __pthread_cond_wait (cond=0x8208108, mutex=0x820810c) > at /system/src/lib/libthr/thread/thr_cond.c:228 oops, sorry, the above stack trace is for a good program in cond_wait. I'll try to get one for a program in a bad state once I get it. > I am a newbie in this area - what is the difference between > _pthread_cond_wait and __pthread_cond_wait? And _X vs __X in general? > -- Andriy Gapon From bugmaster at FreeBSD.org Mon May 26 11:06:57 2008 From: bugmaster at FreeBSD.org (FreeBSD bugmaster) Date: Mon May 26 11:07:50 2008 Subject: Current problem reports assigned to freebsd-threads@FreeBSD.org Message-ID: <200805261106.m4QB6umc065063@freefall.freebsd.org> Current FreeBSD problem reports Critical problems S Tracker Resp. Description -------------------------------------------------------------------------------- s threa/76690 threads fork hang in child for -lc_r 1 problem total. Serious problems S Tracker Resp. Description -------------------------------------------------------------------------------- s threa/24472 threads libc_r does not honor SO_SNDTIMEO/SO_RCVTIMEO socket o s threa/24632 threads libc_r delicate deviation from libc in handling SIGCHL s bin/32295 threads pthread(3) dont dequeue signals s threa/34536 threads accept() blocks other threads s threa/39922 threads [threads] [patch] Threaded applications executed with s threa/48856 threads Setting SIGCHLD to SIG_IGN still leaves zombies under s threa/49087 threads Signals lost in programs linked with libc_r o threa/70975 threads [sysvipc] unexpected and unreliable behaviour when usi o threa/72953 threads fork() unblocks blocked signals w/o PTHREAD_SCOPE_SYST o threa/75273 threads FBSD 5.3 libpthread (KSE) bug o threa/75374 threads pthread_kill() ignores SA_SIGINFO flag s threa/76694 threads fork cause hang in dup()/close() function in child (-l o threa/79683 threads svctcp_create() fails if multiple threads call at the o threa/80435 threads panic on high loads o threa/83914 threads [libc] popen() doesn't work in static threaded program s threa/84483 threads problems with devel/nspr and -lc_r on 4.x s threa/94467 threads send(), sendto() and sendmsg() are not correct in libc s threa/100815 threads FBSD 5.5 broke nanosleep in libc_r o threa/101323 threads fork(2) in threaded programs broken. o threa/103975 threads Implicit loading/unloading of libpthread.so may crash o threa/110636 threads [request] gdb(1): using gdb with multi thread applicat o threa/118715 threads kse problem o threa/121336 threads lang/neko threading ok on UP, broken on SMP (FreeBSD 7 23 problems total. Non-critical problems S Tracker Resp. Description -------------------------------------------------------------------------------- s threa/30464 threads pthread mutex attributes -- pshared s threa/37676 threads libc_r: msgsnd(), msgrcv(), pread(), pwrite() need wra s threa/40671 threads pthread_cancel doesn't remove thread from condition qu s threa/69020 threads pthreads library leaks _gc_mutex o threa/79887 threads [patch] freopen() isn't thread-safe o threa/80992 threads abort() sometimes not caught by gdb depending on threa o threa/110306 threads apache 2.0 segmentation violation when calling gethost o threa/115211 threads pthread_atfork misbehaves in initial thread o threa/116181 threads /dev/io-related io access permissions are not propagat o threa/116668 threads can no longer use jdk15 with libthr on -stable SMP o threa/122923 threads 'nice' does not prevent background process from steali 11 problems total. From davidxu at freebsd.org Fri May 30 06:40:36 2008 From: davidxu at freebsd.org (David Xu) Date: Fri May 30 06:40:38 2008 Subject: pthread_cleanup_push as a macro Message-ID: <483FA1C0.2010506@freebsd.org> I would like to make pthread_cleanup_push and pthread_cleanup_pop as a pair of macros, the current implementation has to malloc() and free() a pthread_cleanup memory block everytime, this is slow, the new one simply uses stack space, note that other OSes have already done it in this way. The patch keeps old functions and should not have binary compatible problem. http://people.freebsd.org/~davidxu/patch/pthread_cleanup_push.patch David Xu From alfred at freebsd.org Fri May 30 18:41:41 2008 From: alfred at freebsd.org (Alfred Perlstein) Date: Fri May 30 18:41:51 2008 Subject: pthread_cleanup_push as a macro In-Reply-To: <483FA1C0.2010506@freebsd.org> References: <483FA1C0.2010506@freebsd.org> Message-ID: <20080530184141.GG48790@elvis.mu.org> * David Xu [080529 23:40] wrote: > I would like to make pthread_cleanup_push and pthread_cleanup_pop as a > pair of macros, the current implementation has to malloc() and free() a > pthread_cleanup memory block everytime, this is slow, the new one > simply uses stack space, note that other OSes have already done it in > this way. The patch keeps old functions and should not have binary > compatible problem. > > http://people.freebsd.org/~davidxu/patch/pthread_cleanup_push.patch Heh, when I had to use QNX on a project this totally confused me, but if others are doing it then go for it. Does Solaris do it? -Alfred From jhb at FreeBSD.org Fri May 30 21:48:45 2008 From: jhb at FreeBSD.org (John Baldwin) Date: Fri May 30 21:48:48 2008 Subject: pthread_cleanup_push as a macro In-Reply-To: <483FA1C0.2010506@freebsd.org> References: <483FA1C0.2010506@freebsd.org> Message-ID: <200805301748.29689.jhb@freebsd.org> On Friday 30 May 2008 02:42:08 am David Xu wrote: > I would like to make pthread_cleanup_push and pthread_cleanup_pop as a > pair of macros, the current implementation has to malloc() and free() a > pthread_cleanup memory block everytime, this is slow, the new one > simply uses stack space, note that other OSes have already done it in > this way. The patch keeps old functions and should not have binary > compatible problem. > > http://people.freebsd.org/~davidxu/patch/pthread_cleanup_push.patch Please do! -- John Baldwin From deischen at freebsd.org Fri May 30 23:05:50 2008 From: deischen at freebsd.org (Daniel Eischen) Date: Fri May 30 23:05:53 2008 Subject: pthread_cleanup_push as a macro In-Reply-To: <200805301748.29689.jhb@freebsd.org> References: <483FA1C0.2010506@freebsd.org> <200805301748.29689.jhb@freebsd.org> Message-ID: On Fri, 30 May 2008, John Baldwin wrote: > On Friday 30 May 2008 02:42:08 am David Xu wrote: >> I would like to make pthread_cleanup_push and pthread_cleanup_pop as a >> pair of macros, the current implementation has to malloc() and free() a >> pthread_cleanup memory block everytime, this is slow, the new one >> simply uses stack space, note that other OSes have already done it in >> this way. The patch keeps old functions and should not have binary >> compatible problem. >> >> http://people.freebsd.org/~davidxu/patch/pthread_cleanup_push.patch > > Please do! I agree - Solaris does this too. I am unsure why you really need a strong_reference - I would prefer something that doesn't require it. -- DE From davidxu at freebsd.org Fri May 30 23:27:00 2008 From: davidxu at freebsd.org (David Xu) Date: Fri May 30 23:27:02 2008 Subject: pthread_cleanup_push as a macro In-Reply-To: References: <483FA1C0.2010506@freebsd.org> <200805301748.29689.jhb@freebsd.org> Message-ID: <48408D5E.2010609@freebsd.org> Daniel Eischen wrote: > On Fri, 30 May 2008, John Baldwin wrote: > >> On Friday 30 May 2008 02:42:08 am David Xu wrote: >>> I would like to make pthread_cleanup_push and pthread_cleanup_pop as a >>> pair of macros, the current implementation has to malloc() and free() a >>> pthread_cleanup memory block everytime, this is slow, the new one >>> simply uses stack space, note that other OSes have already done it in >>> this way. The patch keeps old functions and should not have binary >>> compatible problem. >>> >>> http://people.freebsd.org/~davidxu/patch/pthread_cleanup_push.patch >> >> Please do! > > I agree - Solaris does this too. I am unsure why you really need > a strong_reference - I would prefer something that doesn't require > it. > This becauses original _pthread_cleanup_push and _pthread_cleanup_pop are functions but not weak aliases, it is to keep compatibility. From jhb at freebsd.org Sat May 31 01:49:33 2008 From: jhb at freebsd.org (John Baldwin) Date: Sat May 31 01:49:36 2008 Subject: pthread_cleanup_push as a macro In-Reply-To: <20080530184141.GG48790@elvis.mu.org> References: <483FA1C0.2010506@freebsd.org> <20080530184141.GG48790@elvis.mu.org> Message-ID: <200805302145.49524.jhb@freebsd.org> On Friday 30 May 2008 02:41:41 pm Alfred Perlstein wrote: > * David Xu [080529 23:40] wrote: > > I would like to make pthread_cleanup_push and pthread_cleanup_pop as a > > pair of macros, the current implementation has to malloc() and free() a > > pthread_cleanup memory block everytime, this is slow, the new one > > simply uses stack space, note that other OSes have already done it in > > this way. The patch keeps old functions and should not have binary > > compatible problem. > > > > http://people.freebsd.org/~davidxu/patch/pthread_cleanup_push.patch > > Heh, when I had to use QNX on a project this totally confused > me, but if others are doing it then go for it. > > Does Solaris do it? It's explicitly documented in the standard that push and pop may be implemented as macros and that they have to be paired at the same block level (i.e. it's permitted for push to start a new block and declare a new local variable and for pop to end that block similar to DROP_GIANT/PICKUP_GIANT in the kernel). Internally the thread libraries already do this for internal push/pops to avoid deadlocks. -- John Baldwin From alfred at freebsd.org Sat May 31 01:55:32 2008 From: alfred at freebsd.org (Alfred Perlstein) Date: Sat May 31 01:55:36 2008 Subject: pthread_cleanup_push as a macro In-Reply-To: <200805302145.49524.jhb@freebsd.org> References: <483FA1C0.2010506@freebsd.org> <20080530184141.GG48790@elvis.mu.org> <200805302145.49524.jhb@freebsd.org> Message-ID: <20080531015532.GN48790@elvis.mu.org> * John Baldwin [080530 18:49] wrote: > On Friday 30 May 2008 02:41:41 pm Alfred Perlstein wrote: > > * David Xu [080529 23:40] wrote: > > > I would like to make pthread_cleanup_push and pthread_cleanup_pop as a > > > pair of macros, the current implementation has to malloc() and free() a > > > pthread_cleanup memory block everytime, this is slow, the new one > > > simply uses stack space, note that other OSes have already done it in > > > this way. The patch keeps old functions and should not have binary > > > compatible problem. > > > > > > http://people.freebsd.org/~davidxu/patch/pthread_cleanup_push.patch > > > > Heh, when I had to use QNX on a project this totally confused > > me, but if others are doing it then go for it. > > > > Does Solaris do it? > > It's explicitly documented in the standard that push and pop may be > implemented as macros and that they have to be paired at the same block level > (i.e. it's permitted for push to start a new block and declare a new local > variable and for pop to end that block similar to DROP_GIANT/PICKUP_GIANT in > the kernel). Internally the thread libraries already do this for internal > push/pops to avoid deadlocks. I figured it was "OK", just was interested in how Solaris did it. -- - Alfred Perlstein From deischen at freebsd.org Sat May 31 10:36:54 2008 From: deischen at freebsd.org (Daniel Eischen) Date: Sat May 31 10:36:58 2008 Subject: pthread_cleanup_push as a macro In-Reply-To: <48408D5E.2010609@freebsd.org> References: <483FA1C0.2010506@freebsd.org> <200805301748.29689.jhb@freebsd.org> <48408D5E.2010609@freebsd.org> Message-ID: On Sat, 31 May 2008, David Xu wrote: > Daniel Eischen wrote: >> On Fri, 30 May 2008, John Baldwin wrote: >> >>> On Friday 30 May 2008 02:42:08 am David Xu wrote: >>>> I would like to make pthread_cleanup_push and pthread_cleanup_pop as a >>>> pair of macros, the current implementation has to malloc() and free() a >>>> pthread_cleanup memory block everytime, this is slow, the new one >>>> simply uses stack space, note that other OSes have already done it in >>>> this way. The patch keeps old functions and should not have binary >>>> compatible problem. >>>> >>>> http://people.freebsd.org/~davidxu/patch/pthread_cleanup_push.patch >>> >>> Please do! >> >> I agree - Solaris does this too. I am unsure why you really need >> a strong_reference - I would prefer something that doesn't require >> it. >> > > This becauses original _pthread_cleanup_push and _pthread_cleanup_pop are > functions but not weak aliases, it is to keep compatibility. Yes, but _pthread_cleanup_push and _pthread_cleanup_pop are still functions, you are just adding _imp. Are you afraid of _imp being overridden (if it was a weak alias)? I think another tool you could use is __sym_default(__pthread_cleanup_pop_imp, _pthread_cleanup_pop, FBSD_1.1) -- DE