Hast locking up under 9.2

Mikolaj Golub trociny at FreeBSD.org
Sun Nov 24 17:25:28 UTC 2013


On Sat, Nov 23, 2013 at 11:59:51PM +0200, Mikolaj Golub wrote:

> So I propose:
> 
> 1) Use hio_countdown only for counting the components we are waiting
>    for to complete, i.e. initially it is always 2 for any replication
>    mode.
> 
> 2) To distinguish between "memsync ack" or "memsync fin" responses from 
>    the secondary, add and use hio_memsyncacked field.
> 
> 3) Call write_complete() in component threads _before_ releasing
>    hio_countdown (i.e. before the hio may be returned to the done
>    queue).
> 
> 4) Add and use hio_writecount refcounter to detect when
>    write_complete() should be called in memsync case.
> 
> 5) As hio_done is used only for async, rename it to hio_asyncdone and
>    check/modify outside of more generic write_complete(), only when it
>    is needed.
> 
> Now, write_complete():
>  - for fullsync is called by ggate_send_thread;
>  - for async case -- either by local component thread or by ggate_send_thread;
>  - for memsync case -- by one of the component threads.

I just realized that when the write has failed locally I can't call
write_complete() until "memsync fin" is received (to get the status
from the secondary), i.e. in this case write_complete() should be
called by ggate_send_thread.

Here is an updated patch:

http://people.freebsd.org/~trociny/patches/hast.primary.c.memsync_write_complete.2.patch

I have reverted (5), so hio_done is used to detect whether
write_complete() is still needed in ggate_send_thread for the memsync
case too.
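
To illustrate the resulting memsync flow, here is a minimal
single-threaded sketch. It is not code from the patch: struct
hio_model, the function names and the enum are hypothetical, and it
assumes that write_complete() may fire early only once the local write
has succeeded and the "memsync ack" has arrived. The real code runs
these steps in separate threads with proper refcounting.

```c
#include <assert.h>
#include <stdbool.h>

/* Who ended up calling write_complete() in this model. */
enum completer { BY_NOBODY, BY_COMPONENT_THREAD, BY_GGATE_SEND_THREAD };

/* Hypothetical, cut-down stand-in for struct hio; not the real layout. */
struct hio_model {
	int countdown;      /* components still running; starts at 2 */
	int writecount;     /* events still missing for early completion */
	bool memsyncacked;  /* "memsync ack" already seen */
	bool done;          /* write_complete() already called */
	enum completer by;
};

static void
write_complete_model(struct hio_model *hio, enum completer who)
{
	if (hio->done)
		return;
	hio->done = true;
	hio->by = who;
}

/* Local component thread: the local write has finished. */
static void
local_write_finished(struct hio_model *hio, bool failed)
{
	/* On failure, keep writecount so early completion never fires. */
	if (!failed && --hio->writecount == 0)
		write_complete_model(hio, BY_COMPONENT_THREAD);
	hio->countdown--;   /* released only after write_complete() */
}

/* Remote component thread: a response arrived from the secondary. */
static void
remote_response(struct hio_model *hio)
{
	if (!hio->memsyncacked) {
		/* First response is the "memsync ack". */
		hio->memsyncacked = true;
		if (--hio->writecount == 0)
			write_complete_model(hio, BY_COMPONENT_THREAD);
		return;         /* still waiting for "memsync fin" */
	}
	/* Second response is the "memsync fin": component is finished. */
	hio->countdown--;
}

/* ggate_send_thread: picks the hio up once all components released it. */
static void
ggate_send(struct hio_model *hio)
{
	assert(hio->countdown == 0);
	write_complete_model(hio, BY_GGATE_SEND_THREAD);
}

/* Run one memsync write through the model and report who completed it. */
enum completer
memsync_completer(bool local_write_failed)
{
	struct hio_model hio = { 2, 2, false, false, BY_NOBODY };

	local_write_finished(&hio, local_write_failed);
	remote_response(&hio);  /* memsync ack */
	remote_response(&hio);  /* memsync fin */
	ggate_send(&hio);
	return hio.by;
}
```

In this model memsync_completer(false) reports the component thread,
while memsync_completer(true) reports ggate_send_thread, because a
failed local write never lets writecount reach zero.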

-- 
Mikolaj Golub


More information about the freebsd-stable mailing list