RAID failure with READ_DMA status=51 - how to avoid again?
odilist at sonic.net
Thu Mar 1 01:02:22 UTC 2007
I would like to RAID my system but am wondering if I am asking for trouble,
given that I got some kind of read failure error followed by file system
corruption the first time I did it. Would it be reasonable for me to try
RAIDing again, and if so, under what conditions? Details are as follows:
I moved my home FreeBSD 6.0 system, which had previously been on a single IDE
drive, onto two SATA drives (set to 3.0 Gb/s) in a RAID-1 array, using the
hardware RAID (Nvidia) on the motherboard (ASUS A8N-E). I used dump as
instructed in the FreeBSD FAQ. This went okay.
I then installed a third, large (400GB) SATA drive and backed up the system on
the RAID (minus /proc, /tmp, and so on) to it using rdiff-backup. This seemed
to go OK.
Then, when I shut down immediately afterwards, I saw this:
Feb 27 08:43:19 bsd kernel: ad8: FAILURE - READ_DMA status=51<READY,DSC,ERROR>
Feb 27 08:43:19 bsd kernel: ar0: WARNING - mirror protection lost. RAID1 array
in DEGRADED mode
Feb 27 08:43:19 bsd kernel: ar0: writing of nVidia MediaShield metadata is NOT
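For anyone reading the log above: the status=51 value is hexadecimal, and the
names in angle brackets are the bits set in the ATA status register. Below is
a minimal sketch of that decoding in Python, assuming the standard ATA status
bit layout. The names for bits that do not appear in my log (BUSY, DMA_READY,
DRQ, CORRECTABLE, INDEX) are my assumption of how the driver labels them, and
decode_ata_status is a hypothetical helper, not part of any FreeBSD tool.

```python
# ATA status register bits, highest bit first.
# READY, DSC and ERROR match the names in the kernel message; the other
# names are assumed labels for the remaining standard bits.
ATA_STATUS_BITS = [
    (0x80, "BUSY"),
    (0x40, "READY"),
    (0x20, "DMA_READY"),
    (0x10, "DSC"),
    (0x08, "DRQ"),
    (0x04, "CORRECTABLE"),
    (0x02, "INDEX"),
    (0x01, "ERROR"),
]


def decode_ata_status(status):
    """Return the names of the bits set in an ATA status byte."""
    return [name for mask, name in ATA_STATUS_BITS if status & mask]


if __name__ == "__main__":
    # 0x51 = 0x40 | 0x10 | 0x01
    print(",".join(decode_ata_status(0x51)))
```

So the drive reported itself ready with the seek complete, but flagged an
error on the read. The decoding by itself says nothing about the cause.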
I rebooted, and the BIOS reported that the RAID was healthy, but FreeBSD said
the file system was not, and I had to run fsck about five times before it
came up clean. The system then booted to the desktop, crashed after about ten
seconds, rebooted, and came up with a dirty filesystem.
I have since dismantled the RAID, removed one of the SATA drives, run fsck
repeatedly, and reinstalled KDE, figuring that, since the system only crashed
once the desktop had finished loading, something might be amiss there. The
system is running again.
All the drives are brand new, as is the cabling. The drives show up in
messages as "SATA150" (is 3.0G not supported in FreeBSD?), although the board
supports 3.0G transfer rates. There is an errata sheet in the motherboard
manual with a matrix indicating on which drive, given multiple SATA drives,
the OS should be installed. It's silent on why this is advised and on the
subject of the proper order if RAID is involved. An extended offline SMART
test on the current drive with smartctl completed without error, and the
overall-health self-assessment test result was PASSED. Thanks in advance for
any advice.