threads/74180: KSE problem. Applications those riched maximum
possible threads at a time, would hang on threads join. look at detailed
description !
Peter Edwards
peadar at freebsd.org
Mon Dec 18 10:40:22 PST 2006
The following reply was made to PR threads/74180; it has been noted by GNATS.
From: Peter Edwards <peadar at freebsd.org>
To: bug-followup at FreeBSD.org, acs at swamp.homeunix.org
Cc:
Subject: Re: threads/74180: KSE problem. Applications those riched maximum
possible threads at a time, would hang on threads join. look at detailed
description !
Date: Mon, 18 Dec 2006 18:03:34 +0000
There are some bugs in the posted sample that will indeed cause it to
hang unpredictably.
For condition variables, you need to test some condition before
sleeping, and the condition needs to be protected by the mutex you
release as you go to sleep (this is where they get their name from).
For example, in the case posted, after you start, say, 2000 threads, the
main thread may reach the pthread_cond_broadcast() before some subset of
those 2000 reach pthread_cond_wait(). The broadcast only wakes up those
threads that are _currently_ waiting on the condvar, so threads that
reach the pthread_cond_wait() after that will hang indefinitely.
So, before going to sleep, you need to test whether the main thread has
already hit the pthread_cond_broadcast(), e.g.:
>
> static bool done = false;
> ...
> pthread_mutex_lock(&lock);
> while (!done)
>     pthread_cond_wait(&WakeThemUp, &lock);
> pthread_mutex_unlock(&lock);
>
> ...
>
> done = true;
> pthread_cond_broadcast(&WakeThemUp);
Note the "while (cond)" rather than "if (cond)" around the cond_wait:
pthread_cond_wait is allowed to return spuriously.
This still leaves a race between assigning the "done" sentinel and
waking the condvar: the waker thread can set "done" and wake the condvar
in the window after a waiter thread has tested "done" but before it has
gone to sleep, and that waiter then misses the wakeup. Generally, you
need to hold the mutex while you change the condition that the other
threads are waiting on and signal/broadcast the condvar, so you really
need:
>
> pthread_mutex_lock(&lock);
> done = true;
> pthread_cond_broadcast(&WakeThemUp);
> pthread_mutex_unlock(&lock);
Essentially, condition variables - in conjunction with a mutex - give
you the ability to have two threads communicate via some external
condition (in this case, just the value of "done"). The CV just gives
you the ability for a consumer to atomically test that condition and go
to sleep if it's false, and for a producer to atomically change the
value of the condition and wake up the consumer.
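Putting the pieces together, a complete version of the pattern might look
like the sketch below. It reuses the names "lock", "WakeThemUp" and "done"
from the fragments above; NTHREADS, "woken", waiter() and run_waiters()
are assumptions made for illustration and are not taken from the posted
sample. It uses pthread_cond_broadcast() because there is more than one
waiter.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define NTHREADS 8  /* illustrative only; the PR used far more threads */

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t WakeThemUp = PTHREAD_COND_INITIALIZER;
static bool done = false;   /* the condition, protected by "lock" */
static int woken = 0;       /* waiters that got past the wait, also under "lock" */

static void *waiter(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    /* Test the condition under the mutex; loop to absorb spurious wakeups. */
    while (!done)
        pthread_cond_wait(&WakeThemUp, &lock);
    woken++;
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Starts the waiters, releases them, and returns how many woke up. */
int run_waiters(void)
{
    pthread_t tids[NTHREADS];
    int i, started = 0;

    for (i = 0; i < NTHREADS; i++)
        if (pthread_create(&tids[started], NULL, waiter, NULL) == 0)
            started++;

    /* Change the condition and wake the waiters while holding the mutex. */
    pthread_mutex_lock(&lock);
    done = true;
    pthread_cond_broadcast(&WakeThemUp);
    pthread_mutex_unlock(&lock);

    for (i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    /* Every started thread must have passed the wait, early or late. */
    assert(woken == started);
    return woken;
}
```

It does not matter here whether a given waiter reaches the wait before or
after the broadcast: a late waiter sees "done" already set and never
sleeps, so the joins should complete either way.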
I'm not entirely sure why the program only works the first time it's
invoked, but it's likely that the main thread does a lot of work in the
kernel on the first iteration, while the resources allocated are
available more readily (as they are recycled) for successive
invocations of the test. This would cause the main thread to lag behind
the threads it created on the first invocation, but race ahead of them
afterwards. Note: I'm not saying this _is_ the case, but it's
plausible, and it serves to indicate why things might not always happen
the same way.