gmirror not synced
Jeremy Chadwick
freebsd at jdc.parodius.com
Fri Jan 6 22:43:33 UTC 2012
On Thu, Jan 05, 2012 at 09:56:02AM +0000, Matthew Seaman wrote:
> On 04/01/2012 19:43, Gareth de Vaux wrote:
> > Hi all, I've noticed that the md5 hashes of a couple of files on
> > a gmirror change when I recalculate the hashes. The output usually
> > cycles between 2 hashes per file.
> >
> > I'm guessing this is because each calculation reads the file
> > randomly from 1 of 2 component drives, and the files in question
> > had a few bit flips during their original sync. I also assume
> > this is something you have to live with for gmirror? Is removing
> > and completely rebuilding the secondary drive the only thing you
> > can do (which might fix these bit flips but incur others elsewhere)?
>
> No, that's not something acceptable at all. Randomly flipping bits in
> files is a really nasty failure mode.
>
> What does 'gmirror list' tell you about the state of the gmirror? Is
> there any possibility that your hardware is failing? Check the SMART
> attributes of the disk in the first instance (it isn't brilliant for
> picking up impending failure, but it should be pretty accurate once the
> drive is actually generating errors.) Also try a few passes of
> memtest86 to try and spot problems with RAM. Cleaning dust out of air
> vents and heatsinks and generally making sure the machine is not
> overheating is a good idea too.

Another possibility is a disk with an intermittently faulty cache, or a
drive that has basically given up (firmware bug, design flaw, etc.)
honouring ECC[1][2] when reading/writing sectors.

For the former (a faulty cache), SMART statistics from the drives could
help determine whether this is the case, but I stress the word "could".
Such errors are usually recorded in Attribute 184 ("End-to-End_Error"),
but very few drives provide that attribute.
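
As a rough illustration of checking for that attribute, here is a minimal
Python sketch that scans saved smartctl output for a given attribute ID.
The function name find_smart_attribute is hypothetical, and the sketch
assumes attribute rows begin with the numeric ID followed by the attribute
name, with the raw value as the last column; raw values containing spaces
(as some temperature attributes have) would need different handling.

```python
def find_smart_attribute(smartctl_output, attr_id):
    """Return (name, raw_value) for a SMART attribute ID, or None if absent.

    Assumption: each attribute row starts with the numeric ID, then the
    attribute name, and ends with the raw value as a single token.
    """
    wanted = str(attr_id)
    for line in smartctl_output.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0] != wanted:
            continue
        # Skip unrelated rows that merely start with the same number:
        # attribute names begin with a letter (e.g. End-to-End_Error).
        if not fields[1][0].isalpha():
            continue
        return fields[1], fields[-1]
    return None
```

A result of None only means the drive does not report the attribute; it
says nothing about whether the cache is healthy.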

Gareth, please install ports/sysutils/smartmontools (make sure it's
version 5.42 or newer) and provide output from "smartctl -x /dev/disk"
for each disk in the mirror (substituting the real device name for
/dev/disk) and I'll review it for you.
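
To document the original symptom alongside the SMART output, the repeated
hashing described at the top of the thread can be sketched in Python as
below. The function name hash_repeatedly and its parameters are
hypothetical. Note the assumption that each pass actually reaches the
disks: reads served from the filesystem cache will return the same data
every time regardless of what is on either mirror component.

```python
import hashlib
from collections import Counter


def hash_repeatedly(path, passes=10, chunk_size=1 << 20):
    """Hash a file several times and count how often each MD5 digest appears."""
    counts = Counter()
    for _ in range(passes):
        digest = hashlib.md5()
        with open(path, "rb") as f:
            # Read in fixed-size chunks so large files do not fill memory.
            for chunk in iter(lambda: f.read(chunk_size), b""):
                digest.update(chunk)
        counts[digest.hexdigest()] += 1
    return counts
```

More than one key in the returned Counter means the file's contents were
not read back identically on every pass, which matches the behaviour
reported above.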
[1]: http://www.storagereview.com/guide/error.html
(read all subsections too)
[2]: http://www.dewassoc.com/kbase/hard_drives/hard_disk_sector_structures.htm
--
| Jeremy Chadwick jdc at parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, US |
| Making life hard for others since 1977. PGP 4BD6C0CB |