gvinum losing state about failed drives
Paul Schenkeveld
fb-geom at psconsult.nl
Sun Mar 12 11:19:08 UTC 2006
Hi,
My hardware:
Intel L440GX+ serverboard, 2x 700MHz P3, 1GB ECC RAM
2x Seagate SCSI 73GB off mainboard SCSI controller
2x add-in Promise ATA133 controllers
4x Hitachi 500GB ATA133 disks off the Promise controllers
add-in Intel gigabit ethernet controller
My gvinum config:
12 volumes mirrored across da0 and da1
1 volume 500GB mirrored across ad4 and ad8
1 volume 500GB mirrored across ad6 and ad10
After upgrading the first server from 4-STABLE to 6-STABLE, I had two
occasions where two ATA disks became unavailable because the controller
stopped responding. The first time I lost ad8 and ad10, containing
vol12.p1 and vol13.p1; the second time (after everything was manually
repaired) I lost the disks containing vol12.p0 and vol13.p0.
When the ATA controller stops, two gvinum drives go down, and the
plexes and subdisks on them go down as well. After a reboot, however,
all drives, plexes and subdisks are up again. Comparing the plexes
by hand (using an optimized cmp, which still takes 5.5 hours for
500GB) shows that they are not equal, understandably, because some
data was updated while one plex was down.
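A minimal sketch of this kind of block-by-block comparison is shown
below. It is not the optimized cmp used above; the function name
compare_plexes and the 1 MiB block size are assumptions chosen for
illustration. It reads both plexes in fixed-size blocks and records
the byte offset of every block that differs. For scale, 5.5 hours for
500GB works out to roughly 25 MB/s.

```python
def compare_plexes(path_a, path_b, block_size=1 << 20):
    """Return the starting byte offsets of all blocks that differ.

    path_a and path_b may be regular files or device nodes.
    block_size (an assumed default of 1 MiB) should be a multiple of
    the device sector size when reading raw devices.
    """
    mismatches = []
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(block_size)
            b = fb.read(block_size)
            if not a and not b:
                break  # both inputs exhausted
            if a != b:
                # Also catches a length difference in the final block.
                mismatches.append(offset)
            offset += max(len(a), len(b))
    return mismatches
```

It could be called as, for example,
compare_plexes("/dev/gvinum/plex/vol12.p0", "/dev/gvinum/plex/vol12.p1"),
where the device paths are assumed and may differ on a given system.
An empty list means the two plexes are identical.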
It seems that the failure of a drive and its subdisks is not recorded
in the metadata of the other drives.
I'm now contemplating a rollback of the upgrade, as this server has
been down too long already, but I'll try to build a similar setup here
to do more testing.
Regards,
Paul Schenkeveld