HAST: primary might get stuck when there are connectivity problems with secondary

Mikolaj Golub to.my.trociny at gmail.com
Thu Apr 29 08:03:46 UTC 2010


On Wed, 28 Apr 2010 23:46:36 +0200 Pawel Jakub Dawidek wrote:

 PJD> Could you see if the following patch fixes the problem for you:

 PJD>         http://people.freebsd.org/~pjd/patches/hastd_timeout.patch

 PJD> The patch sets timeout on both incoming and outgoing sockets on primary
 PJD> and on outgoing socket on secondary. Incoming socket on secondary is
 PJD> left with no timeout to avoid problem you described above.

The patch works for me.

After disabling the network connection between the primary and the secondary,
file system operations on the primary no longer get stuck, and the following
messages are logged:

Apr 29 10:37:41 hasta hastd: [storage] (primary) Unable to receive reply header: Resource temporarily unavailable.
Apr 29 10:37:57 hasta hastd: [tank] (primary) Unable to receive reply header: Resource temporarily unavailable.
Apr 29 10:37:57 hasta hastd: [tank] (primary) Unable to send request (Resource temporarily unavailable): WRITE(972292096, 14336).
Apr 29 10:38:56 hasta hastd: [storage] (primary) Unable to connect to 172.20.66.202: Operation timed out.
Apr 29 10:39:12 hasta hastd: [tank] (primary) Unable to connect to 172.20.66.202: Operation timed out.
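
For illustration, "Resource temporarily unavailable" is the error string for
EAGAIN, which is what a blocking socket call returns once a socket timeout
expires. The sketch below is not taken from the patch. It assumes the timeouts
are set with the standard SO_RCVTIMEO and SO_SNDTIMEO socket options, and the
helper names set_io_timeout() and recv_from_silent_peer() are hypothetical. A
local socketpair stands in for the connection to an unreachable secondary.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/time.h>

#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper: apply the same timeout to receives and sends
 * on a socket. Returns 0 on success, -1 on failure (errno is set).
 */
static int
set_io_timeout(int fd, int seconds)
{
	struct timeval tv;

	tv.tv_sec = seconds;
	tv.tv_usec = 0;
	if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) == -1)
		return (-1);
	if (setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv)) == -1)
		return (-1);
	return (0);
}

/*
 * Hypothetical demo: wait for data from a peer that never sends any.
 * Returns the errno of the failed call, or 0 if recv() did not fail.
 */
static int
recv_from_silent_peer(int seconds)
{
	char buf[16];
	ssize_t n;
	int error, sv[2];

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
		return (errno);
	error = 0;
	if (set_io_timeout(sv[0], seconds) == -1) {
		error = errno;
	} else {
		/* Without the timeout this call would block forever. */
		n = recv(sv[0], buf, sizeof(buf), 0);
		if (n == -1)
			error = errno;
	}
	close(sv[0]);
	close(sv[1]);
	return (error);
}
```

Calling recv_from_silent_peer(1) should return EAGAIN after about one second
instead of blocking indefinitely, which matches the "Unable to receive reply
header" messages above.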

After restoring the network connection, the primary reconnects to the secondary
and the status changes back from "degraded" to "complete".

Thank you.

-- 
Mikolaj Golub

