Weird issue with hastd(8)

Mon Jun 27 15:50:09 UTC 2011

On Sat, Jun 25, 2011 at 05:54:13PM +0300, Mikolaj Golub wrote:
> For me the idea to send updates to secondary only via
> synchronization thread, starting it periodically looks
> interesting. Sure it should not be the replacement for "real"
> async mode, but having something like this in hast apart other
> synchronization modes might be useful.
> 
> Comparing it with "real" async  that is described in manual it has
> the following advantages:
> 
> 1) It is much easier to implement.
> 
> 2) If you have frequent updates of the same blocks, "real" async
> will send them all, while with sync thread approach we will skip
> many intermediate updates.

I must say I don't agree with your points here. We should not implement
one more replication mode just because it is easier to implement. Imagine
the situation when we finally get a proper 'async' mode and need to
explain to users the difference between the 'async' and 'async2' modes
as "async2 was easier to implement back when we had no async yet, but
for you it does more or less the same". And we would need to keep
supporting both of them. If anything, I'd prefer to call it 'async' and
change the underlying algorithm entirely later. This would avoid user
confusion, but still leaves the need for protocol compatibility between
hastds implementing the older and the newer 'async'.

The second argument reveals the weakness of this approach. The very
important thing is to keep data consistent while nodes are connected.
By 'consistent' I mean that at any point in time, if the primary dies,
the secondary can start operating - it may have slightly older data in
async mode, but the data will be consistent - you can fsck the file
system and start your services. In the approach you described no care is
taken to move the data to the secondary node in the proper order, i.e.
some later writes can be sent before earlier writes, because e.g. they
are placed in a lower extent, and if the primary fails right then, the
secondary's view of the data won't be consistent and your file system
will most likely be corrupt.
In async mode you can skip and combine only consecutive writes.
For example, if your queue contains the following writes
(number. offset size):

	1.    0 1024
	2.  512 1024
	3.    0 1024
	4. 4096 1024
	5.    0 1536

You can compress it to:

	2+3.    0 1536
	  4. 4096 1024
	  5.    0 1536

Here we ignore the first write entirely and combine writes 2 and 3, but
we cannot simply skip the first three writes just because the fifth write
covers them, as there is the 4096,1024 request in between.
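
The rule above can be sketched roughly as follows. This is only an
illustration, not hastd code: the function name coalesce_consecutive and
the (offset, size) tuples are made up for the example, it tracks extents
only (no data), and it assumes that writes which merely touch may be
merged as well as writes which overlap.

```python
def coalesce_consecutive(queue):
    """Merge only neighbouring queue entries whose extents overlap or touch.

    queue is a list of (offset, size) tuples in the order the writes were
    issued. A write is merged only into the entry directly before it, so
    any unrelated write in between acts as a barrier and ordering is kept.
    """
    out = []
    for offset, size in queue:
        if out:
            last_off, last_size = out[-1]
            last_end = last_off + last_size
            end = offset + size
            # Overlapping or adjacent to the previous entry: replace that
            # entry with the union of both extents.
            if offset <= last_end and last_off <= end:
                new_off = min(last_off, offset)
                out[-1] = (new_off, max(last_end, end) - new_off)
                continue
        # Disjoint from the previous entry: keep it as a separate write.
        out.append((offset, size))
    return out


queue = [(0, 1024), (512, 1024), (0, 1024), (4096, 1024), (0, 1536)]
print(coalesce_consecutive(queue))
# [(0, 1536), (4096, 1024), (0, 1536)]
```

The fifth write is not merged with the first three, because the
4096,1024 write sits between them in the queue.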

-- 
Pawel Jakub Dawidek                       http://www.wheelsystems.com
FreeBSD committer                         http://www.FreeBSD.org
Am I Evil? Yes, I Am!                     http://yomoli.com