hast vs ggate+gmirror synchronisation speed

Mikolaj Golub to.my.trociny at gmail.com
Wed Oct 27 19:05:29 UTC 2010


On Tue, 26 Oct 2010 17:01:01 +0100 Pete French wrote:

 PF>  Actually, I just looked in dmesg on the secondary - it is full
 PF> of messages thus:

 PF> Oct 26 15:44:59 serpentine-passive hastd[10394]: [serp0] (secondary) Unable to receive request header: RPC version wrong.
 PF> Oct 26 15:45:00 serpentine-passive hastd[782]: [serp0] (secondary) Worker process exited ungracefully (pid=10394, exitcode=75).
 PF> Oct 26 15:46:59 serpentine-passive hastd[10421]: [serp0] (secondary) Unable to receive request header: RPC version wrong.
 PF> Oct 26 15:47:04 serpentine-passive hastd[782]: [serp0] (secondary) Worker process exited ungracefully (pid=10421, exitcode=75).

I saw this too, but only as sporadic messages, so I forgot about it and did
not investigate at the time :-).

Now, while running synchronization, I see them too (but again only
sporadically). I added an assertion and looked at the received header:

(gdb) list
309                     goto fail;
310
311             if (hdr.version != HAST_PROTO_VERSION) {
312                     assert(0);
313                     errno = ERPCMISMATCH;
314                     goto fail;
315             }
316
317             hdr.size = le32toh(hdr.size);
318
(gdb) p/x hdr
$2 = {version = 0x9, size = 0x65657266}

So it looks like garbage.
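
To see what kind of garbage it is, one can print the raw size field byte by
byte. The following is only a minimal sketch, not hastd code: the helper name
u32_le_to_ascii() is hypothetical, and it assumes the value was dumped before
the le32toh() conversion on a little-endian host, as in the gdb listing above.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of hastd): store the four bytes of v,
 * least significant byte first, into out as printable characters.
 * Non-printable bytes are shown as '.'. out must hold 5 chars.
 */
static void
u32_le_to_ascii(uint32_t v, char out[5])
{
	assert(out != NULL);

	for (int i = 0; i < 4; i++) {
		unsigned char c = (unsigned char)((v >> (8 * i)) & 0xff);

		out[i] = isprint(c) ? (char)c : '.';
	}
	out[4] = '\0';
}
```

For size = 0x65657266 the bytes are 0x66 0x72 0x65 0x65, i.e. the ASCII string
"free". That would be consistent with the receiver reading a header from the
middle of some other data rather than from the start of a real header.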

In hast_proto_send() we send the header and then the data. Couldn't it be that
the remote_send and sync threads interfere and their packets get mixed? Maybe
some synchronization is needed here?

I put a sleep(1) in hast_proto_send() between proto_send(header) and
proto_send(data). The error then started to occur frequently.
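
The kind of synchronization I have in mind could look roughly like this. It is
only a sketch under assumptions, not a patch for hastd: send_lock, write_all()
and send_message() are hypothetical names, and plain write() on a file
descriptor stands in for proto_send(). The point is just that the header and
the data are sent under one lock, so another thread cannot slip its own header
in between.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical lock; real code would keep one per connection. */
static pthread_mutex_t send_lock = PTHREAD_MUTEX_INITIALIZER;

/* Write exactly len bytes, retrying on short writes and EINTR. */
static int
write_all(int fd, const void *buf, size_t len)
{
	const char *p = buf;

	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return (-1);
		}
		p += n;
		len -= (size_t)n;
	}
	return (0);
}

/*
 * Send header and data as one unit with respect to other callers of
 * send_message(). Returns 0 on success, -1 on error.
 */
static int
send_message(int fd, const void *hdr, size_t hlen,
    const void *data, size_t dlen)
{
	int error;

	assert(hdr != NULL && data != NULL);

	pthread_mutex_lock(&send_lock);
	error = write_all(fd, hdr, hlen);
	if (error == 0)
		error = write_all(fd, data, dlen);
	pthread_mutex_unlock(&send_lock);
	return (error);
}
```

With such a lock a sleep(1) between the two writes would only slow the sender
down; it should not let the two threads' packets interleave.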

-- 
Mikolaj Golub


More information about the freebsd-stable mailing list