HAST + ZFS: no action on drive failure

Sun Jul 3 15:55:13 UTC 2011

On Sat, 2 Jul 2011 14:43:15 -0700 Timothy Smith wrote:

 TS> Hello Mikolaj,

 TS> So, just to be clear, if a local drive fails in my pool, but the
 TS> corresponding remote drive remains available, then hastd will both write to
 TS> and read from the remote drive? That's really very cool!

Yes.

 TS> I looked more closely at the hastd(8) man page. There is some indication of
 TS> what you say, though it is not very clear:

 TS> "Read operations (BIO_READ) are handled locally unless I/O error occurs or local
 TS> version of the data is not up-to-date yet (synchronization is in progress)."

That sentence is about READ operations. For WRITE operations, the paragraph just above it says:

     Every write, delete and flush operation (BIO_WRITE,
     BIO_DELETE, BIO_FLUSH) is sent to local component and synchronously
     replicated to the remote (secondary) node if it is available.
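Put together, the two rules behave roughly like the toy model below. It is a simplified sketch of the routing described in hastd(8) and confirmed in this thread, not hastd's actual code. `Component`, `route_read` and `route_write` are hypothetical names, and "available" stands for "disk present and not returning I/O errors".

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Component:
    """One side (local or remote) of a HAST resource, seen from the primary."""
    available: bool          # False if the disk was pulled or returns I/O errors
    up_to_date: bool = True  # False while synchronization is still in progress


def route_read(local: Component, remote: Component) -> str:
    """Pick the component that serves a BIO_READ (simplified model)."""
    # Reads stay local unless the local copy is unusable or stale.
    if local.available and local.up_to_date:
        return "local"
    if remote.available:
        return "remote"
    raise OSError("no usable component for BIO_READ")


def route_write(local: Component, remote: Component) -> list[str]:
    """List the components a BIO_WRITE/BIO_DELETE/BIO_FLUSH is sent to."""
    # Writes go to every available component. With the local disk gone,
    # only the remote copy receives them.
    targets = [name for name, comp in (("local", local), ("remote", remote))
               if comp.available]
    if not targets:
        raise OSError("no usable component for write")
    return targets


if __name__ == "__main__":
    pulled = Component(available=False)
    healthy = Component(available=True)
    print(route_read(pulled, healthy), route_write(pulled, healthy))
```

With the local drive pulled, this prints `remote ['remote']`: both reads and writes are served by the secondary.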

There might be things in the documentation that should be improved, but I don't feel
capable of doing this :-)

 TS> Perhaps this could be modified a bit, by adding: "or the local disk is
 TS> unavailable. In such a case, the I/O operation will be handled by the remote
 TS> resource."

 TS> It does make sense, however, since HAST is based on the idea of RAID. This
 TS> feature increases the redundancy of the system greatly. My boss will be
 TS> very impressed, as am I!

 TS> I did notice, however, that when the pulled drive is reinserted, I need to
 TS> change the associated hast resource to init, then back to primary, to allow
 TS> hastd to use it again (perhaps the same applies if the secondary drive
 TS> fails?). Unless it will do this on its own after some time? I did not wait
 TS> more than a few minutes. But this is easy enough to script, or to monitor the
 TS> log and present a notification to the admin at such a time.

When you reinsert the drive, the resource should be in the init state.

Remember that some data was updated on the secondary only, so the right sequence of
operations could be:

1) Failover (switch primary to init and secondary to primary).

2) Fix the disk issue.

3) If this is a new drive, recreate the HAST metadata on it with the hastctl utility.

4) Switch the repaired resource to secondary and wait until the new primary
connects to it and updates the metadata. After this, synchronization starts.

5) You can switch back to the previous primary before the synchronization is
complete; it will continue in the right direction. But then you should expect
performance degradation until the synchronization is complete, because READ
requests will go to the remote node. So it might be better to wait until the
synchronization is complete before switching back.
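For scripting, the sequence above could be laid out roughly as follows. This is a sketch under stated assumptions. It only builds the list of `hastctl` invocations and which node each runs on; it does not execute anything. `recovery_plan`, the resource name `disk0` and the hosts `nodeA`/`nodeB` are hypothetical. Only the `hastctl role`, `create` and `status` subcommands from hastctl(8) are used. Anything layered on `/dev/hast/<resource>` (e.g. the ZFS pool) must be exported before and imported after each role change, and that is deliberately left out.

```python
from __future__ import annotations

import shlex


def recovery_plan(resource: str, failed: str, healthy: str,
                  new_disk: bool, switch_back: bool) -> list[tuple[str, list[str]]]:
    """Return (host, argv) pairs for the recovery sequence; nothing is executed."""
    plan = []
    # 1) Failover: demote the node with the bad disk, promote its peer.
    #    (Exporting/importing a pool on /dev/hast/<resource> is omitted here.)
    plan.append((failed, ["hastctl", "role", "init", resource]))
    plan.append((healthy, ["hastctl", "role", "primary", resource]))
    # 2) Fix the disk issue: a manual step, not represented here.
    # 3) A brand-new drive carries no HAST metadata, so recreate it.
    if new_disk:
        plan.append((failed, ["hastctl", "create", resource]))
    # 4) Make the repaired node secondary; the new primary connects,
    #    updates the metadata and starts synchronization. Watch progress.
    plan.append((failed, ["hastctl", "role", "secondary", resource]))
    plan.append((healthy, ["hastctl", "status", resource]))
    # 5) Optional switch back. Doing it before synchronization finishes
    #    works, but reads go to the remote node until it completes.
    if switch_back:
        plan.append((healthy, ["hastctl", "role", "init", resource]))
        plan.append((failed, ["hastctl", "role", "primary", resource]))
        plan.append((healthy, ["hastctl", "role", "secondary", resource]))
    return plan


if __name__ == "__main__":
    # Hypothetical names: resource "disk0"; nodeA lost its disk, nodeB is healthy.
    for host, argv in recovery_plan("disk0", "nodeA", "nodeB",
                                    new_disk=True, switch_back=False):
        print(f"{host}: {shlex.join(argv)}")
```

Keeping the plan as data rather than running it directly makes it easy to review, or to hand each step to whatever remote-execution mechanism is already in use.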

-- 
Mikolaj Golub