hastd: parent got stuck in waitpid()

Pawel Jakub Dawidek pjd at FreeBSD.org
Wed Sep 22 19:10:59 UTC 2010


On Sun, Sep 19, 2010 at 12:57:10PM +0300, Mikolaj Golub wrote:
> Hi,
> 
> When trying to produce the scenario described in another thread (hastd: possible
> race when a worker is starting) I stepped on another issue. I was running the
> following script:
> 
> #!/bin/sh
> 
> for i in `jot 1000`; do
>         hastctl status storage > /dev/null
> done &
> for i in `jot 1000`; do
>         hastctl role init storage
>         hastctl role primary storage
> done
> 
> Parent hastd got stuck, but this time when changing the role to init and
> terminating the worker: in waitpid() after sending kill() to the worker. It
> looked like the signal was lost. I don't have a clue how this might happen, but
> it is rather easily reproducible in my environment with the script above.

Could you try r213009?

The problem was (I believe) that the signal mask was configured after we
forked, so there was a window where a signal could be delivered before
we were able to handle it properly. Now the signal mask is configured
in the main process and the primary process inherits it, so there is no
window anymore.

Your test also triggered a different bug for me - a descriptor leak, which
is now also fixed.

Thanks for the reports!

-- 
Pawel Jakub Dawidek                       http://www.wheelsystems.com
pjd at FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!