Unexpected gmirror behavior: Is this a bug?

Ivan Voras ivoras at freebsd.org
Fri Apr 24 20:13:24 UTC 2009


Ivan Voras wrote:
> Peter Steele wrote:
>> We had a somewhat startling scenario occur with gmirror. We have systems with four drives, ad4, ad6, ad8, and ad10, with the OS set up on a mirrored slice across all four drives. The ad4 drive failed at one point, due to a simple bad connection in its drive bay. While it was offline, the system continued to be used for a while and new data was added to the mirrored file system.
>>
>> We eventually took the box down to deal with ad4, and tried simply pulling and reinserting the drive. On reboot we saw that the BIOS detected the drive, so that was good. However, when FreeBSD got to the point of starting up the GEOM driver, instead of reinserting ad4 into the more current mirror consisting of ad6/ad8/ad10 and resyncing it with that data, the GEOM driver assumed ad4 was the "good" mirror and ended up resyncing ad6/ad8/ad10 with the data from ad4, causing the new files we had added to those drives to be lost. 
>>
>> This only happens with ad4. If ad6 for example goes offline in the same way, when it is reinserted it does not become the dominant drive and resync its data with the other drives. Rather its data is overwritten with the data from the 3 member mirror, as you'd expect. 
>>
>> So, clearly ad4, the first disk, is treated specially. The question is: is this a bug or a feature? Is there any way to prevent this behavior? This would be a disastrous thing to happen in the field on one of our customer systems.
> 
> This definitely looks like a bug. Try asking again on the freebsd-geom@
> list. Provide output of "gmirror list".
> 
> From what you said it looks like you did the procedure safely - you
> turned off the server, then pulled the drive and reinserted it, then
> turned it on again, right?

Sorry, that was a useless response - what I said should be a no-op.

So, your steps were:

1. ad4, ad6, ad8 and ad10 in a 4-way mirror
2. ad4 fails. At this point did you do a "gmirror list"? I.e. did
gmirror detect it failing? If I read it correctly, the GenID field
should have been increased in this case.
3. The system continues to be used
4. You power it down, take out and reinsert ad4
5. On boot, ad4 is detected, inserted in the mirror but as a "known
good" copy, not a stale one.

Correct?

What version of FreeBSD are you using?
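
To make the GenID question in step 2 concrete, here is a rough Python sketch of the check I would expect to happen. It is only a simplified model, not gmirror's actual code: the Component class, the stale_components() helper and the GenID values are all made up for illustration, and the sketch assumes that the surviving components carry a higher GenID than a disk that dropped out earlier.

```python
from dataclasses import dataclass


@dataclass
class Component:
    # Hypothetical stand-in for one mirror member as shown by "gmirror list".
    name: str
    genid: int


def stale_components(components):
    """Return the names of components whose GenID is behind the highest one seen."""
    newest = max(c.genid for c in components)
    return [c.name for c in components if c.genid < newest]


# Made-up values: ad4 dropped out before the generation was bumped to 1.
disks = [
    Component("ad4", 0),
    Component("ad6", 1),
    Component("ad8", 1),
    Component("ad10", 1),
]
print(stale_components(disks))
```

If the real "gmirror list" output shows the same GenID on all four disks after the failure, nothing would mark ad4 as stale, which would be consistent with what you saw.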




More information about the freebsd-questions mailing list