system scope threads entering STOP state
Guy Helmer
ghelmer at palisadesys.com
Wed Aug 17 16:18:10 GMT 2005
Julian Elischer wrote:
> Guy Helmer wrote:
>
>> Julian Elischer wrote:
>>
>>> Guy Helmer wrote:
>>>
>>>> I have a long-running multithreaded process on FreeBSD 5.4 (SMP,
>>>> PREEMPTION, SCHED_4BSD) linked with libpthread and I'm creating the
>>>> threads with attribute PTHREAD_SCOPE_SYSTEM. The threads need to
>>>> be processing input in near-real-time or its input buffers overflow.
>>>>
>>>> I've modified the program so that a thread can fork/execl/waitpid
>>>> (without WNOHANG) to use an external program for further processing
>>>> on a batch of input (sometimes via a pipe, other times via writing
>>>> to a file). However, even under a light input load, the program is
>>>> now dropping input. While running top(1) in thread mode, I
>>>> occasionally find all the program's threads are in the STOP state
>>>> for several consecutive seconds. Is there anything related to the
>>>> frequent use of fork, execve, or wait4 that would be likely to
>>>> cause such a situation? I'm not seeing anything obvious in my
>>>> reading of the kernel sources.
>>>
>>> During a fork the parent process is in a variant of the "STOPPED"
>>> state, or rather, if you look at top -H you should see that all the
>>> threads except the one doing the fork are in the STOPPED state.
>>>
>>> This is because while a thread is forking, the process needs to be
>>> single-threaded so that there is a consistent image to be copied to
>>> the child.
>>>
>>> The single-threaded state is also entered for exit() and execve(),
>>> though that should not affect your program.
>>>
>>> I can't imagine why the state would persist for any length of time
>>> unless there is another thread that is in an uninterruptible wait. In
>>> that case the other threads have to wait for it to complete what it is
>>> doing and come back. I have considered whether such a thread should be
>>> considered "already suspended", and in fact some earlier versions of
>>> the code did that; however, it leads to some inconsistencies and the
>>> danger that such a thread will be suspended holding some resource that
>>> it should not hold for any length of time.
>>
>> Thanks for the explanation. I was aware that the other threads
>> would be stopped during a fork(2) but it looked to me like the STOP
>> would be brief.
>> Would an "uninterruptible wait" include system calls like a write(2)
>> of a large buffer? That would explain it...
>
> It's hard to say. Possibly yes, if it had to allocate buffer space.
> However, this is a question for others.
>
> Is it possible to duplicate this on request?
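For reference, a minimal sketch of the pattern described in the original question: a thread created with PTHREAD_SCOPE_SYSTEM runs an external program via fork/execl and then blocks in waitpid() without WNOHANG. The helper names (run_external, run_in_system_scope_thread, struct job) and the use of "/bin/sh -c" are illustrative assumptions, not the original program's code. Per the explanation quoted above, the fork() call is the point at which the process's other threads are briefly suspended.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run "/bin/sh -c cmd" in a child and wait for it.
   Returns the child's exit status (0-255), or -1 on error or abnormal exit. */
static int run_external(const char *cmd)
{
    /* Per the discussion in this thread, on FreeBSD 5.x the kernel suspends
       the other threads (STOP in top -H) while fork() copies the image. */
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        /* Child of a multithreaded parent: only async-signal-safe calls
           between fork() and exec. */
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        _exit(127);                       /* exec failed */
    }
    int status;
    while (waitpid(pid, &status, 0) == -1) {   /* blocking: no WNOHANG */
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Hypothetical work item handed to a worker thread. */
struct job {
    const char *cmd;
    int result;
};

static void *worker(void *arg)
{
    struct job *j = arg;
    j->result = run_external(j->cmd);
    return NULL;
}

/* Run one job on a system-scope thread and join it; returns 0 on success
   or a pthread error number. */
static int run_in_system_scope_thread(struct job *j)
{
    pthread_attr_t attr;
    pthread_t tid;
    int rc;

    if ((rc = pthread_attr_init(&attr)) != 0)
        return rc;
    rc = pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM);
    if (rc == 0)
        rc = pthread_create(&tid, &attr, worker, j);
    pthread_attr_destroy(&attr);
    if (rc == 0)
        rc = pthread_join(tid, NULL);
    return rc;
}
```

A caller would fill in something like `struct job j = { "exit 3", 0 };`, pass it to run_in_system_scope_thread(), and find 3 in `j.result`. The child calls only execl() and _exit() because, in a multithreaded program, the child of fork() contains just the forking thread and other locks may be left held.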
[where did the past month go?]
I think I found the culprit: the process in question was actually
dumping core, and it is a large process (between 50 MB and 100 MB), so
that would explain the 10+ seconds all the threads spent in the STOP
state. It was difficult to notice while running top(1) because a watchdog
process immediately restarts the multithreaded process if it exits due
to things like segfaults, and I was watching the state column, not the
PID column.
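One way a watchdog could make such a crash obvious is to log how the worker terminated before restarting it. A minimal sketch, assuming a watchdog that reaps the worker with waitpid(); run_and_report(), describe_status() and disable_core_dumps() are hypothetical helpers. WCOREDUMP is a BSD/glibc extension rather than POSIX, hence the #ifdef. Lowering the RLIMIT_CORE soft limit to 0 is one option if core files are not needed: it should keep a crash from spending seconds writing a 50 MB to 100 MB image, at the cost of losing the core for debugging.

```c
#define _DEFAULT_SOURCE   /* exposes WCOREDUMP on glibc; FreeBSD has it by default */
#include <errno.h>
#include <stdio.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: turn a waitpid() status into a log-friendly string. */
static void describe_status(int status, char *buf, size_t len)
{
    if (WIFEXITED(status)) {
        snprintf(buf, len, "exited with status %d", WEXITSTATUS(status));
    } else if (WIFSIGNALED(status)) {
        int core = 0;
#ifdef WCOREDUMP
        core = WCOREDUMP(status) != 0;   /* extension, not POSIX */
#endif
        snprintf(buf, len, "killed by signal %d%s", WTERMSIG(status),
                 core ? " (core dumped)" : "");
    } else {
        snprintf(buf, len, "unexpected status 0x%x", (unsigned)status);
    }
}

/* Hypothetical watchdog step: start argv[0], wait for it, describe the exit. */
static int run_and_report(char *const argv[], char *buf, size_t len)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        execv(argv[0], argv);
        _exit(127);                      /* exec failed */
    }
    int status;
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR)
            return -1;
    }
    describe_status(status, buf, len);
    return 0;
}

/* Optional: set the soft core-size limit to 0 (hard limit untouched).
   The limit is inherited by children across fork() and execve(). */
static int disable_core_dumps(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    rl.rlim_cur = 0;
    return setrlimit(RLIMIT_CORE, &rl);
}
```

With a log line such as "killed by signal 11 (core dumped)" written on every restart, a crash loop stands out even when top(1) only shows the threads sitting in STOP.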
Sorry for what was a bit of a wild-goose chase,
Guy
--
Guy Helmer, Ph.D.
Principal System Architect
Palisade Systems, Inc.