HAST: primary might get stuck when there are connectivity problems with secondary

Mikolaj Golub to.my.trociny at gmail.com
Fri Apr 23 13:00:12 UTC 2010


On Fri, 23 Apr 2010 08:29:50 +0200 Pawel Jakub Dawidek wrote:

 PJD> On Wed, Apr 21, 2010 at 12:02:16PM +0300, Mikolaj Golub wrote:

 >> I interpret this in the following way. Although the network is down
 >> hast_proto_send() in remote_send_thread() returns success (sent data are
 >> stored in the kernel buffer). Then kernel tries to send data and eventually
 >> fails after timeout and close the socket. hastd is not aware about this,
 >> remote_send_thread() is blocked in "Taking request" at this time, sync thread
 >> is waiting for status from the secondary about sent data but secondary does
 >> not send it because it did not receive any data.

 PJD> "Taking request" only means that HAST is waiting for a request. In case
 PJD> of ggate_send thread it means it is waiting for I/O request from the
 PJD> kernel. It stops here, because there is no new request.

 >> Restarting hastd on the secondary usually helps. A workaround is to set
 >> net.inet.tcp.keepidle to some small value (e.g. 300 sec) on the
 >> secondary. Then the secondary will notice much earlier that the peer has
 >> closed the connection and will restart the worker itself:
 >> 
 >> Apr 21 11:52:21 hastb hastd: [storage] (secondary) Unable to receive request header: Connection reset by peer.
 >> Apr 21 11:52:21 hastb hastd: [storage] (secondary) Worker process (pid=1398) exited ungracefully: status=19200.

 PJD> What you are observing here is because currently there is no keep-alive
 PJD> mechanism in hastd, so if there are no I/O requests, primary won't
 PJD> notice that secondary is down until next I/O request arrives.
 PJD> I don't consider it critical, but confusing. Note that when secondary
 PJD> goes down and there are no I/O requests to primary, there will be
 PJD> nothing to synchronize, so fast reconnect isn't crucial in this case.
 PJD> We need one I/O request coming to primary to discover that secondary is
 PJD> down and start reconnect loop.

This does not look for me just "confusing" :-). When I wrote that hast had
gotten stuck I meant that all operations with FS above hast (ZFS in my case)
had hanged and this lasted until hastd on the secondary was restarted.

May be I miss something but I think the following happens when there is
network outage:

remote_send_thread():

1) the request is moved to recv queue;

2) the request is sent to secondary calling hast_proto_send();

3) hast_proto_send() returns OK although data are not sent yet (they are
buffered by kernel) so the request is not removed from recv queue and not
moved immediately to done queue as it would be in the case when
hast_proto_send() returned error.

The kernel is trying to send the request for some time, fails by timeout and
close the connection.

remote_recv_thread():

1) takes the request from recv queue;

2) waits for the respond from the secondary in hast_proto_recv_hdr() which
will never come because the secondary has not received the request and has
nothing to reply.

So we have a request or several gotten stuck in in the recv queue. It looks
like FS is waiting for the response on this request(s) sent by
ggate_send_thread() and gets stuck.

Below are logs for a similar example I made today:

The last messages before "outage":

Apr 23 14:10:20 hasta hastd: [storage] (primary) ggate_send: (0x28411c40) Got request.
Apr 23 14:10:20 hasta hastd: [storage] (primary) ggate_send: (0x28411c40) Moving request to the free queue.
Apr 23 14:10:20 hasta hastd: [storage] (primary) ggate_send: Taking request.

During "outage":

Apr 23 14:10:49 hasta hastd: [storage] (primary) ggate_recv: (0x28411c80) Request received from the kernel: WRITE(1925971968, 131072).
Apr 23 14:10:49 hasta hastd: [storage] (primary) ggate_recv: (0x28411c80) Moving request to the send queues.
Apr 23 14:10:49 hasta hastd: [storage] (primary) ggate_recv: Taking free request.
Apr 23 14:10:49 hasta hastd: [storage] (primary) ggate_recv: (0x28411cc0) Got free request.
Apr 23 14:10:49 hasta hastd: [storage] (primary) ggate_recv: (0x28411cc0) Waiting for request from the kernel.
Apr 23 14:10:49 hasta hastd: [storage] (primary) ggate_recv: (0x28411cc0) Request received from the kernel: WRITE(1925348352, 1024).
Apr 23 14:10:49 hasta hastd: [storage] (primary) ggate_recv: (0x28411cc0) Moving request to the send queues.
Apr 23 14:10:49 hasta hastd: [storage] (primary) ggate_recv: Taking free request.
Apr 23 14:10:49 hasta hastd: [storage] (primary) ggate_recv: (0x28411c00) Got free request.
Apr 23 14:10:49 hasta hastd: [storage] (primary) ggate_recv: (0x28411c00) Waiting for request from the kernel.
Apr 23 14:10:49 hasta hastd: [storage] (primary) ggate_recv: (0x28411c00) Request received from the kernel: WRITE(922860544, 1024).
Apr 23 14:10:49 hasta hastd: [storage] (primary) local_send: (0x28411c80) Got request.
Apr 23 14:10:49 hasta hastd: [storage] (primary) remote_send: (0x28411c80) Got request.
Apr 23 14:10:49 hasta hastd: [storage] (primary) remote_send: (0x28411c80) Moving request to the recv queue.
Apr 23 14:10:49 hasta hastd: [storage] (primary) remote_send: Taking request.
Apr 23 14:10:49 hasta hastd: [storage] (primary) remote_send: (0x28411cc0) Got request.
Apr 23 14:10:49 hasta hastd: [storage] (primary) remote_send: (0x28411cc0) Moving request to the recv queue.
Apr 23 14:10:49 hasta hastd: [storage] (primary) ggate_recv: (0x28411c00) Moving request to the send queues.
Apr 23 14:10:49 hasta hastd: [storage] (primary) ggate_recv: Taking free request.
Apr 23 14:10:49 hasta hastd: [storage] (primary) ggate_recv: (0x28411bc0) Got free request.
Apr 23 14:10:49 hasta hastd: [storage] (primary) ggate_recv: (0x28411bc0) Waiting for request from the kernel.
Apr 23 14:10:49 hasta hastd: [storage] (primary) ggate_recv: (0x28411bc0) Request received from the kernel: WRITE(1926103040, 131072).
Apr 23 14:10:49 hasta hastd: [storage] (primary) ggate_recv: (0x28411bc0) Moving request to the send queues.
Apr 23 14:10:49 hasta hastd: [storage] (primary) ggate_recv: Taking free request.
Apr 23 14:10:49 hasta hastd: [storage] (primary) ggate_recv: (0x28411b80) Got free request.
Apr 23 14:10:49 hasta hastd: [storage] (primary) ggate_recv: (0x28411b80) Waiting for request from the kernel.
Apr 23 14:10:49 hasta hastd: [storage] (primary) local_send: Taking request.
Apr 23 14:10:49 hasta hastd: [storage] (primary) local_send: (0x28411cc0) Got request.
Apr 23 14:10:49 hasta hastd: [storage] (primary) local_send: Taking request.
Apr 23 14:10:49 hasta hastd: [storage] (primary) local_send: (0x28411c00) Got request.
Apr 23 14:10:49 hasta hastd: [storage] (primary) local_send: Taking request.
Apr 23 14:10:49 hasta hastd: [storage] (primary) local_send: (0x28411bc0) Got request.
Apr 23 14:10:49 hasta hastd: [storage] (primary) local_send: Taking request.

During "outage" two requests 0x28411c80 and 0x28411cc0 was received,
remote_send reported that it had succesfully sent the requests and moved them
to recv queue. And the requests got stuck there

(gdb) p hio_recv_list[1].tqh_first
$42 = (struct hio *) 0x28411c80
(gdb) p *hio_recv_list[1].tqh_first->hio_next
$43 = {tqe_next = 0x28411cc0, tqe_prev = 0x284eb240}
(gdb) p *hio_recv_list[1].tqh_first
$44 = {hio_countdown = 1, hio_errors = 0x284c93b0, hio_ggio = {gctl_version = 2, gctl_unit = 0, 
    gctl_seq = 5027, gctl_cmd = 2, gctl_offset = 1925971968, gctl_length = 131072, gctl_data = 0x29144000, 
    gctl_error = 0}, hio_next = 0x284eb810}
(gdb) p *hio_recv_list[1].tqh_first->hio_next->tqe_next
$45 = {hio_countdown = 1, hio_errors = 0x284c93b8, hio_ggio = {gctl_version = 2, gctl_unit = 0, 
    gctl_seq = 5028, gctl_cmd = 2, gctl_offset = 1925348352, gctl_length = 1024, gctl_data = 0x29164000, 
    gctl_error = 0}, hio_next = 0x284eb820}

After this there were no messages at all even after recovering the
connectivity between the hosts.

The backtrace:

Thread 8 (Thread 28404140 (LWP 100041)):
#0  0x2823edd7 in __error () from /lib/libthr.so.3
#1  0x2823e9b8 in __error () from /lib/libthr.so.3
#2  0x284c3520 in ?? ()
#3  0x00000008 in ?? ()
#4  0x00000001 in ?? ()
#5  0x284c3500 in ?? ()
#6  0x00000000 in ?? ()
#7  0x08066480 in ?? ()
#8  0xbfbfeb58 in ?? ()
#9  0x2823d31f in pthread_setcancelstate () from /lib/libthr.so.3
#10 0x2823cbbe in pthread_cond_signal () from /lib/libthr.so.3
#11 0x08058e68 in cv_wait (cv=0x8067e2c, lock=0x8067e28) at synch.h:125
#12 0x0805b74e in cv_timedwait (cv=0x8067e2c, lock=0x8067e28, timeout=0) at synch.h:135
#13 0x0805b71c in guard_thread (arg=0x284cab00) at /usr/src/sbin/hastd/primary.c:1787
#14 0x080581f6 in hastd_primary (res=0x284cab00) at /usr/src/sbin/hastd/primary.c:773
#15 0x0804c4e8 in control_set_role (cfg=0x8066500, nvout=0x284eb0b0, role=2 '\002', res=0x284cab00, 
    name=0x28481442 "storage", no=0) at /usr/src/sbin/hastd/control.c:114
#16 0x0804cd01 in control_handle (cfg=0x8066500) at /usr/src/sbin/hastd/control.c:332
#17 0x0804f06c in main_loop () at /usr/src/sbin/hastd/hastd.c:424
#18 0x0804f3d8 in main (argc=0, argv=0xbfbfee90) at /usr/src/sbin/hastd/hastd.c:520

Thread 7 (Thread 28404280 (LWP 100070)):
#0  0x28344773 in ioctl () from /lib/libc.so.7
#1  0x080588b4 in ggate_recv_thread (arg=0x284cab00) at /usr/src/sbin/hastd/primary.c:894
#2  0x2823428f in pthread_getprio () from /lib/libthr.so.3
#3  0x00000000 in ?? ()

Thread 6 (Thread 284043c0 (LWP 100085)):
#0  0x2823edd7 in __error () from /lib/libthr.so.3
#1  0x2823e9b8 in __error () from /lib/libthr.so.3
#2  0x284c32a0 in ?? ()
#3  0x00000008 in ?? ()
#4  0x00000001 in ?? ()
#5  0x284c3280 in ?? ()
#6  0x00000000 in ?? ()
#7  0x00000000 in ?? ()
#8  0x00000000 in ?? ()
#9  0x2823d31f in pthread_setcancelstate () from /lib/libthr.so.3
#10 0x2823cbbe in pthread_cond_signal () from /lib/libthr.so.3
#11 0x08058e68 in cv_wait (cv=0x284c9080, lock=0x284c9078) at synch.h:125
#12 0x08058f27 in local_send_thread (arg=0x284cab00) at /usr/src/sbin/hastd/primary.c:1032
#13 0x2823428f in pthread_getprio () from /lib/libthr.so.3
#14 0x00000000 in ?? ()

Thread 5 (Thread 28404500 (LWP 100086)):
#0  0x2823edd7 in __error () from /lib/libthr.so.3
#1  0x2823e670 in __error () from /lib/libthr.so.3
#2  0x284300a0 in ?? ()
#3  0x0000000d in ?? ()
#4  0x00000000 in ?? ()
#5  0x00000000 in ?? ()
#6  0x00000000 in ?? ()
---Type <return> to continue, or q <return> to quit---
#7  0x28404500 in ?? ()
#8  0xbf7fcee8 in ?? ()
#9  0x282384b1 in pthread_rwlock_init () from /lib/libthr.so.3
Previous frame identical to this frame (corrupt stack?)

Thread 4 (Thread 28404640 (LWP 100087)):
#0  0x282faa57 in recvfrom () from /lib/libc.so.7
#1  0x28280be2 in recv () from /lib/libc.so.7
#2  0x0805c277 in proto_common_recv (fd=10, data=0xbf6fbf27 "", size=5)
    at /usr/src/sbin/hastd/proto_common.c:78
#3  0x0805d4e0 in tcp4_recv (ctx=0x28417580, data=0xbf6fbf27 "", size=5)
    at /usr/src/sbin/hastd/proto_tcp4.c:325
#4  0x0805bde1 in proto_recv (conn=0x2a3f9180, data=0xbf6fbf27, size=5) at /usr/src/sbin/hastd/proto.c:198
#5  0x0804ddae in hast_proto_recv_hdr (conn=0x2a3f9180, nvp=0xbf6fbf7c) at /usr/src/sbin/hastd/hast_proto.c:298
#6  0x08059ee9 in remote_recv_thread (arg=0x284cab00) at /usr/src/sbin/hastd/primary.c:1282
#7  0x2823428f in pthread_getprio () from /lib/libthr.so.3
#8  0x00000000 in ?? ()

Thread 3 (Thread 28404780 (LWP 100088)):
#0  0x2823edd7 in __error () from /lib/libthr.so.3
#1  0x2823e9b8 in __error () from /lib/libthr.so.3
#2  0x284c34a0 in ?? ()
#3  0x00000008 in ?? ()
#4  0x00000001 in ?? ()
#5  0x284c3480 in ?? ()
#6  0x00000000 in ?? ()
#7  0x28404780 in ?? ()
#8  0xbf5faec8 in ?? ()
#9  0x2823d31f in pthread_setcancelstate () from /lib/libthr.so.3
#10 0x2823cbbe in pthread_cond_signal () from /lib/libthr.so.3
#11 0x08058e68 in cv_wait (cv=0x8067e14, lock=0x8067e10) at synch.h:125
#12 0x0805a3fb in ggate_send_thread (arg=0x284cab00) at /usr/src/sbin/hastd/primary.c:1383
#13 0x2823428f in pthread_getprio () from /lib/libthr.so.3
#14 0x00000000 in ?? ()

Thread 2 (Thread 284048c0 (LWP 100089)):
#0  0x2823edd7 in __error () from /lib/libthr.so.3
#1  0x2823e9b8 in __error () from /lib/libthr.so.3
#2  0x284c31a0 in ?? ()
#3  0x00000008 in ?? ()
#4  0x00000001 in ?? ()
#5  0x284c3180 in ?? ()
#6  0x00000000 in ?? ()
#7  0x00000000 in ?? ()
#8  0xbf4f9ea8 in ?? ()
#9  0x2823d31f in pthread_setcancelstate () from /lib/libthr.so.3
#10 0x2823cbbe in pthread_cond_signal () from /lib/libthr.so.3
#11 0x08058e68 in cv_wait (cv=0x8067e20, lock=0x8067e1c) at synch.h:125
#12 0x0805a7bc in sync_thread (arg=0x284cab00) at /usr/src/sbin/hastd/primary.c:1472
#13 0x2823428f in pthread_getprio () from /lib/libthr.so.3
#14 0x00000000 in ?? ()

Thread 1 (Thread 28404a00 (LWP 100090)):
#0  0x282faa55 in recvfrom () from /lib/libc.so.7
#1  0x28280be2 in recv () from /lib/libc.so.7
---Type <return> to continue, or q <return> to quit---
#2  0x0805c277 in proto_common_recv (fd=9, data=0xbf3f8f47 "*", size=5)
    at /usr/src/sbin/hastd/proto_common.c:78
#3  0x0805c69e in sp_recv (ctx=0x284eb190, data=0xbf3f8f47 "*", size=5)
    at /usr/src/sbin/hastd/proto_socketpair.c:177
#4  0x0805bde1 in proto_recv (conn=0x284eb170, data=0xbf3f8f47, size=5) at /usr/src/sbin/hastd/proto.c:198
#5  0x0804ddae in hast_proto_recv_hdr (conn=0x284eb170, nvp=0xbf3f8f80) at /usr/src/sbin/hastd/hast_proto.c:298
#6  0x0804ce27 in ctrl_thread (arg=0x284cab00) at /usr/src/sbin/hastd/control.c:373
#7  0x2823428f in pthread_getprio () from /lib/libthr.so.3
#8  0x00000000 in ?? ()
#0  0x282faa57 in recvfrom () from /lib/libc.so.7

 PJD> Could you confirm this is what you are observing? If not, what FreeBSD
 PJD> version do you use? If it is HEAD, do you have r205738?

Yes I read the thread started by Kevin Day :-) and this was the first thing I
checked. I am using CURRENT.

-- 
Mikolaj Golub


More information about the freebsd-fs mailing list