pthreads mostly hang when signal

Unga unga888 at yahoo.com
Wed Nov 9 14:19:20 UTC 2011


Hi all

I have a C program with pthreads.

This program creates thousands of detached pthreads for short jobs at the beginning. They all come and go within the first 10 seconds.
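
For illustration, the short jobs can be pictured roughly as in the sketch below. It is a minimal sketch under assumptions, not the real program: `short_job`, `start_short_jobs`, `wait_for_short_jobs` and `sleep_ms` are hypothetical names, and `threadCount` here counts only the short-job threads.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <time.h>

/* Names mirror the fragment in this post; their definitions are assumed. */
pthread_mutex_t threadCountMutex = PTHREAD_MUTEX_INITIALIZER;
int threadCount = 0; /* number of short-job threads still alive */

static void sleep_ms(long ms)
{
    struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

/* Hypothetical short job: do a little work, then deregister itself. */
static void *short_job(void *arg)
{
    (void)arg;
    sleep_ms(1); /* stand-in for the real work */
    pthread_mutex_lock(&threadCountMutex);
    threadCount--;
    pthread_mutex_unlock(&threadCountMutex);
    return NULL;
}

/* Start n detached threads; returns how many were actually created. */
int start_short_jobs(int n)
{
    pthread_attr_t attr;
    int started = 0;

    if (pthread_attr_init(&attr) != 0)
        return 0;
    if (pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED) != 0) {
        pthread_attr_destroy(&attr);
        return 0;
    }
    for (int i = 0; i < n; i++) {
        pthread_t tid;

        /* Count the thread before it starts so it cannot decrement first. */
        pthread_mutex_lock(&threadCountMutex);
        threadCount++;
        pthread_mutex_unlock(&threadCountMutex);

        if (pthread_create(&tid, &attr, short_job, NULL) != 0) {
            pthread_mutex_lock(&threadCountMutex);
            threadCount--; /* creation failed, undo the count */
            pthread_mutex_unlock(&threadCountMutex);
            break;
        }
        started++;
    }
    pthread_attr_destroy(&attr);
    return started;
}

/* Poll until every short job has finished; returns the remaining count. */
int wait_for_short_jobs(int timeout_ms)
{
    int remaining;

    for (int waited = 0; ; waited += 10) {
        pthread_mutex_lock(&threadCountMutex);
        remaining = threadCount;
        pthread_mutex_unlock(&threadCountMutex);
        if (remaining == 0 || waited >= timeout_ms)
            return remaining;
        sleep_ms(10);
    }
}
```

Detached threads cannot be joined, so the sketch uses the mutex-protected counter to find out when they have all gone.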


But it has 3 permanently running pthreads.

I signal these 3 permanently running pthreads to stop processing. I send SIGUSR1 for this purpose.

If I don't interrupt these 3 permanently running pthreads, they run without any issue and do their job.

If I send SIGUSR1 to these 3 permanently running pthreads, it only rarely works. That is, only rarely do all of these threads receive the SIGUSR1 signal immediately.

Most of the time they hang when I signal them. That is, the threads never receive the signal.


The following code fragment shows how I send the signal and wait until the thread stops:
     LockMutex(threadCountMutex);
     tCount = threadCount;
     UnlockMutex(threadCountMutex);

     printf("B4 stop Thread_1. Threads: %d\n", tCount);

     LockMutex(Thread_1varMutex);
     Thread_1var.Thread_1Stopped = 0; // Thread_1 has not stopped yet.
     UnlockMutex(Thread_1varMutex);

     if (pthread_kill(tid1, SIGUSR1) != 0) // Send a signal to the thread
        {                                  // to stop processing.
         fprintf(stderr, "pthread_kill failed for Thread_1!\n");
         exit(1);
        }
     Delay(25); // Let Thread_1 settle.

     // Check now whether Thread_1 received the signal and stopped processing.
     for (threadActivateDelay = 0; threadActivateDelay < threadActivateTimeOut; threadActivateDelay += 50)
     {
      LockMutex(Thread_1varMutex);
      Thread_1Stopped = Thread_1var.Thread_1Stopped;
      UnlockMutex(Thread_1varMutex);

      if (Thread_1Stopped)
         break;
      else
         Delay(50); // Let Thread_1 settle. 50ms

      LockMutex(threadCountMutex);
      tCount = threadCount;
      UnlockMutex(threadCountMutex);

      printf("Wait till Thread_1 stopped, Threads: %d  Delay: %d\n", tCount, threadActivateDelay+50);
     }

     printf("Came out of Thread_1 loop, threadActivateDelay: %d, threadActivateTimeOut: %d\n", threadActivateDelay, threadActivateTimeOut);
     if (threadActivateDelay >= threadActivateTimeOut) // Something is wrong. Thread may be dead.
        {
         fprintf(stderr, "Time out. Thread_1 may be dead!\n");
         exit(1);
        }


Note that Thread_1var.Thread_1Stopped is set to 1 by Thread_1 once it receives SIGUSR1.
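
To make that handshake concrete, here is a minimal sketch of one way the worker side could look. It is written under assumptions and is not the real program: there is a single worker, the handler `on_sigusr1`, the flag `stopRequested`, the struct layout and the helpers `sleep_ms` and `stop_thread_1_demo` are all hypothetical, and plain pthread_mutex_lock() stands in for LockMutex(). The handler only sets a flag, because pthread_mutex_lock() is not async-signal-safe; the thread itself updates Thread_1Stopped under the mutex.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <string.h>
#include <time.h>

/* Assumed layout; only the Thread_1Stopped member appears in the post. */
struct Thread_1vars { int Thread_1Stopped; };
struct Thread_1vars Thread_1var;
pthread_mutex_t Thread_1varMutex = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical flag, written only by the signal handler. */
static volatile sig_atomic_t stopRequested = 0;

static void on_sigusr1(int signo)
{
    (void)signo;
    stopRequested = 1; /* no locking and no stdio inside the handler */
}

static void sleep_ms(long ms)
{
    struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL); /* may return early when a signal arrives */
}

void *Thread_1(void *arg)
{
    (void)arg;
    while (!stopRequested)
        sleep_ms(1); /* stand-in for one unit of real work */

    /* Report the stop from normal thread context, under the mutex. */
    pthread_mutex_lock(&Thread_1varMutex);
    Thread_1var.Thread_1Stopped = 1;
    pthread_mutex_unlock(&Thread_1varMutex);
    return NULL;
}

/* Returns 1 if the worker reported the stop in time, 0 on time out,
 * -1 on a setup error. */
int stop_thread_1_demo(int timeout_ms)
{
    struct sigaction sa;
    pthread_t tid1;
    int stopped = 0;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    stopRequested = 0;
    Thread_1var.Thread_1Stopped = 0;
    if (pthread_create(&tid1, NULL, Thread_1, NULL) != 0)
        return -1;
    if (pthread_kill(tid1, SIGUSR1) != 0)
        return -1;

    /* Same polling idea as in the fragment above, in 50 ms steps. */
    for (int waited = 0; waited < timeout_ms && !stopped; waited += 50) {
        pthread_mutex_lock(&Thread_1varMutex);
        stopped = Thread_1var.Thread_1Stopped;
        pthread_mutex_unlock(&Thread_1varMutex);
        if (!stopped)
            sleep_ms(50);
    }
    pthread_join(tid1, NULL);
    return stopped;
}
```

With three workers, each thread would need its own flag; the single global flag is only good enough for this one-thread sketch.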

The results of two runs of the program are as follows (values are in milliseconds):
./prog
B4 stop Thread_1. Threads: 3
Thread_1 cought SIGUSR1
Wait till Thread_1 stopped, Threads: 3  Delay: 50
Came out of Thread_1 loop, threadActivateDelay: 50, threadActivateTimeOut: 3000
B4 stop Thread_2. Threads: 3
Thread_2 cought SIGUSR1
Came out of Thread_2 loop, threadActivateDelay: 0, threadActivateTimeOut: 3000
B4 stop Thread_3. Threads: 3
Wait till Thread_3 stopped, Threads: 3  Delay: 50
Wait till Thread_3 stopped, Threads: 3  Delay: 100
Wait till Thread_3 stopped, Threads: 3  Delay: 150
:
:
Wait till Thread_3 stopped, Threads: 3  Delay: 3000
Came out of Thread_3 loop, threadActivateDelay: 3000, threadActivateTimeOut: 3000
Time out. Thread_3 may be dead!



./prog
B4 stop Thread_1. Threads: 3
Wait till Thread_1 stopped, Threads: 3  Delay: 50
Wait till Thread_1 stopped, Threads: 3  Delay: 100
Wait till Thread_1 stopped, Threads: 3  Delay: 150
:
:
Wait till Thread_1 stopped, Threads: 3  Delay: 3000
Came out of Thread_1 loop, threadActivateDelay: 3000, threadActivateTimeOut: 3000
Time out. Thread_1 may be dead!


I have tested this program on FreeBSD 8.1 and 9.0 RC1, both i386. Different runs hang different threads. Also, as I mentioned earlier, only rarely do all three threads stop immediately.

My issue is quite similar to the problem: http://security.freebsd.org/advisories/FreeBSD-EN-10:02.sched_ule.asc

But it doesn't freeze the system. 

Increasing threadActivateTimeOut to 60000 ms doesn't help either once a thread hangs.

Please also note that once a thread receives SIGUSR1, it waits in sigwait() until it receives another signal.
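
The sigwait() step can be sketched as follows. This is a minimal sketch under assumptions: the post does not say which signal ends the wait, so SIGUSR2 is used purely for illustration, and `paused_worker`, `sigwait_demo` and `resumedBy` are hypothetical names. The point it shows is that the signal passed to sigwait() has to be blocked in the waiting thread, which here is done before the thread is created so that the mask is inherited.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>

/* Hypothetical: records which signal ended the wait. */
static int resumedBy = 0;

void *paused_worker(void *arg)
{
    sigset_t waitSet;
    int signo = 0;

    (void)arg;
    sigemptyset(&waitSet);
    sigaddset(&waitSet, SIGUSR2); /* assumed "resume" signal */

    /* SIGUSR2 is already blocked here: the mask was inherited from the
     * creating thread. sigwait() consumes the pending signal without
     * running any handler. */
    if (sigwait(&waitSet, &signo) == 0)
        resumedBy = signo;
    return NULL;
}

/* Returns the signal number that woke the worker, or -1 on error. */
int sigwait_demo(void)
{
    sigset_t blockSet, oldSet;
    pthread_t tid;

    sigemptyset(&blockSet);
    sigaddset(&blockSet, SIGUSR2);

    /* Block first, create second: the new thread starts with SIGUSR2
     * blocked, so the signal stays pending until sigwait() takes it. */
    if (pthread_sigmask(SIG_BLOCK, &blockSet, &oldSet) != 0)
        return -1;
    if (pthread_create(&tid, NULL, paused_worker, NULL) != 0)
        return -1;
    if (pthread_kill(tid, SIGUSR2) != 0)
        return -1;

    pthread_join(tid, NULL);
    pthread_sigmask(SIG_SETMASK, &oldSet, NULL); /* restore caller's mask */
    return resumedBy;
}
```

If the signal were left unblocked, it could be delivered to a handler or take its default action instead of being returned by sigwait().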


So what have I hit? Is it a programming error on my side, a scheduling error, or something else?

I would appreciate it very much if the FreeBSD guys could help me solve this issue.

Many thanks in advance.

Best regards
Unga


More information about the freebsd-stable mailing list