threads/74180: KSE problem. Applications those riched maximum possible threads at a time, would hang on threads join. look at detailed description !

Peter Edwards peadar at freebsd.org
Mon Dec 18 10:40:22 PST 2006


The following reply was made to PR threads/74180; it has been noted by GNATS.

From: Peter Edwards <peadar at freebsd.org>
To: bug-followup at FreeBSD.org,  acs at swamp.homeunix.org
Cc:  
Subject: Re: threads/74180: KSE problem. Applications those riched maximum
 possible threads at a time, would hang on threads join. look at detailed
 description !
Date: Mon, 18 Dec 2006 18:03:34 +0000

 There are some bugs in the posted sample that will indeed cause it to
 hang unpredictably.
 
 For condition variables, you need to test some condition before 
 sleeping, and the condition needs to be protected by the mutex you 
 release as you go to sleep (this is where they get their name from).
 
 For example, in the case posted, after you start, say, 2000 threads, the 
 main thread may reach the pthread_cond_broadcast() before some subset of 
 those 2000 reach pthread_cond_wait(). The broadcast only wakes up those 
 threads that are _currently_ waiting on the condvar, so threads that 
 reach the pthread_cond_wait() after that will hang indefinitely.
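
 A minimal sketch of why the late arrivals hang: a broadcast that finds
 no waiters is not remembered. The helper name broadcast_then_wait() and
 its timeout parameter are assumptions made for illustration, and a timed
 wait is used so that the example returns instead of hanging. On typical
 implementations the wait times out, although a spurious wakeup could in
 principle make it return 0.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <time.h>

/*
 * Hypothetical helper: broadcast while nobody is waiting, then wait on
 * the same condvar with a timeout. Returns the result of
 * pthread_cond_timedwait().
 */
int broadcast_then_wait(long timeout_ms)
{
    pthread_mutex_t lock;
    pthread_cond_t cv;
    struct timespec deadline;
    int rc;

    pthread_mutex_init(&lock, NULL);
    pthread_cond_init(&cv, NULL);

    /* No thread is waiting yet, so this wakes nobody and leaves no trace. */
    pthread_cond_broadcast(&cv);

    /* Absolute deadline; the default condvar clock is CLOCK_REALTIME. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    /* The earlier broadcast is gone, so this wait is not satisfied by it. */
    pthread_mutex_lock(&lock);
    rc = pthread_cond_timedwait(&cv, &lock, &deadline);
    pthread_mutex_unlock(&lock);

    pthread_cond_destroy(&cv);
    pthread_mutex_destroy(&lock);
    return rc;
}
```

 Calling broadcast_then_wait(50) would normally return ETIMEDOUT; with a
 plain pthread_cond_wait() in its place, the thread would sleep forever.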
 
 So, before going to sleep, you need to test whether the main thread has
 already hit the pthread_cond_broadcast(), e.g.:
 
  >
  > static bool done = false;
  > ...
  > pthread_mutex_lock(&lock);
  > while (!done)
  >     pthread_cond_wait(&WakeThemUp, &lock);
  > pthread_mutex_unlock(&lock);
  >
  > ...
  >
  > done = true;
  > pthread_cond_broadcast(&WakeThemUp);
 
 Note the "while (cond)" rather than "if (cond)" around the cond_wait:
 pthread_cond_wait is allowed to return spuriously, so the condition
 must be re-tested each time it returns.
 
 This still leaves a race between assigning the "done" sentinel and
 waking the condvar: the waker thread can assign "done" and signal after
 a waiter has tested "done" but before that waiter has gone to sleep.
 Generally, you need to hold the mutex while you change the condition
 that the other threads are waiting on and signal/broadcast the condvar,
 so you really need
 
  >
  > pthread_mutex_lock(&lock);
  > done = true;
  > pthread_cond_broadcast(&WakeThemUp);
  > pthread_mutex_unlock(&lock);
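
 Putting the pieces together, a sketch of the whole pattern might look
 like the following. This is not the program from the PR: NTHREADS, the
 "woken" counter, and the run_waiters() wrapper are assumptions added so
 the result can be checked. Broadcast is used because more than one
 thread is waiting.

```c
#include <pthread.h>
#include <stdbool.h>

#define NTHREADS 8  /* hypothetical count; the PR used far more threads */

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t WakeThemUp = PTHREAD_COND_INITIALIZER;
static bool done = false;  /* the condition, protected by "lock" */
static int woken = 0;      /* threads that got past the wait, same lock */

static void *waiter(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    /* Re-test after every wakeup: the wait may return spuriously. */
    while (!done)
        pthread_cond_wait(&WakeThemUp, &lock);
    woken++;
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Start the waiters, release them, join them; return how many woke up. */
int run_waiters(void)
{
    pthread_t tids[NTHREADS];
    int started = 0;
    int i;

    pthread_mutex_lock(&lock);
    done = false;
    woken = 0;
    pthread_mutex_unlock(&lock);

    for (i = 0; i < NTHREADS; i++) {
        if (pthread_create(&tids[i], NULL, waiter, NULL) != 0)
            break;
        started++;
    }

    /* Change the condition and wake the waiters under the mutex. */
    pthread_mutex_lock(&lock);
    done = true;
    pthread_cond_broadcast(&WakeThemUp);
    pthread_mutex_unlock(&lock);

    /* Threads that arrive late see done == true and never sleep. */
    for (i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    return woken;
}
```

 It no longer matters whether a given thread reaches the wait before or
 after the broadcast: either it is woken, or it finds "done" already set.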
 
 Essentially, condition variables, in conjunction with a mutex, give
 you the ability to have two threads communicate via some external
 condition (in this case, just the value of "done"). The CV just gives
 a consumer the ability to atomically test that condition and go to
 sleep if it's false, and a producer the ability to atomically change
 the value of the condition and wake up the consumer.
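
 The condition need not be a bare flag. As an illustration only, here is
 a sketch of a one-slot mailbox between a producer and a consumer; the
 struct, its field names, the value 42, and consume_one() are all
 hypothetical. With a single waiter, pthread_cond_signal() is enough.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical one-slot mailbox: "full" is the condition being waited on. */
struct mailbox {
    pthread_mutex_t lock;
    pthread_cond_t nonempty;
    bool full;
    int value;
};

static struct mailbox box = {
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false, 0
};

static void *producer(void *arg)
{
    struct mailbox *mb = arg;

    /* Change the condition and wake the consumer while holding the mutex. */
    pthread_mutex_lock(&mb->lock);
    mb->value = 42;
    mb->full = true;
    pthread_cond_signal(&mb->nonempty);
    pthread_mutex_unlock(&mb->lock);
    return NULL;
}

/* Start a producer, wait for its item, and return it (-1 on failure). */
int consume_one(void)
{
    pthread_t tid;
    int v;

    if (pthread_create(&tid, NULL, producer, &box) != 0)
        return -1;

    /* Test the condition under the mutex; sleep only while it is false. */
    pthread_mutex_lock(&box.lock);
    while (!box.full)
        pthread_cond_wait(&box.nonempty, &box.lock);
    v = box.value;
    box.full = false;
    pthread_mutex_unlock(&box.lock);

    pthread_join(tid, NULL);
    return v;
}
```

 Whichever thread gets the mutex first, consume_one() returns the value
 the producer stored.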
 
 I'm not entirely sure why the program only works the first time it's
 invoked, but it's likely that the main thread does a lot of work in the
 kernel on the first iteration, while the resources allocated are
 available more readily (as they are recycled) for successive
 invocations of the test. This would cause the main thread to lag behind
 those threads it created for the first invocation, but race ahead
 afterwards. Note: I'm not saying this _is_ the case, but it's
 plausible, and serves to indicate why things might not always happen the
 same way.
 
 

