SATA controller testing update
韓家標 Bill Hacker
askbill at conducive.net
Tue Nov 20 05:11:44 PST 2007
Nathan Butcher wrote:
> Just posting to say that Soren's upcoming patch to fix the Promise SATA
> controller issue seems to work fine for me. I imported and ran 4 drives
> of ZFS on the controller - and no more checksum issues. Tried running
> bonnie++ and had no problems whatsoever. So far so good. I'll keep my
> ZFS pool on the card for a while in case anything pops up.
> What I have noticed, though, is that my root-mounted system
> drive (which is on my JMB363 controller running AHCI, as /dev/ad6)
> occasionally gets dismounted at random with the latest BETA3. It has
> happened twice now.
> This issue hasn't happened on any other drive on my system (despite
> there being 11 other drives on other controllers).
> I have no idea how to reproduce this issue, and since it takes out my
> main system drive, I can't get any debugging info. All I see first is
> that the drive gets dismounted before the screen fills up and scrolls
> over with messages about missing nodes.
I may have a way to reproduce at least a vaguely similar fault that we've just
started looking at, but on ICH9, not Promise:
Gigabyte GA-G33-DS3R, Core 2 Quad, 2 GB DDR-800, 2 x Toshiba 160 GB 2.5" SATA on
ICH9 as a GMIRROR RAID1 'split' mirror, gm0, built on the entire devices (ad0 and ad2).
With gm0 in good shape, a cp of a QEMU .img file works fine:
- from /dev/mirror/gm0s3d (UFS, mounted on /pub)
- to /dev/mirror/gm0s3e (UFS, mounted on /bak/backups)
But an inadvertent 'mv' (technically illegal for rename(2), as it crosses a
mount point, so mv(1) falls back to copying and unlinking) doesn't throw an
error message.
Instead, it unaccountably causes GEOM to shed /dev/ad2 'instantly' from gm0.
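For anyone trying to reproduce this: crossing a mount point is not an error as far as mv(1) is concerned. rename(2) refuses it with EXDEV, and mv then quietly copies the data to the destination file system and unlinks the source, so the absence of an error message is expected; the difference from the plain cp is that extra unlink on /pub. A minimal Python sketch of that fallback, for illustration only (move_file is a hypothetical helper, not the FreeBSD mv source):

```python
import errno
import os
import shutil


def move_file(src: str, dst: str) -> None:
    """Move the file src to the file path dst (not a directory), mv(1)-style.

    rename(2) only works within a single file system; across a mount point
    it fails with EXDEV, so fall back to copying the data and unlinking src.
    """
    try:
        os.rename(src, dst)  # same file system: just relink the name
    except OSError as exc:
        if exc.errno != errno.EXDEV:
            raise  # a real failure (missing file, permissions, ...)
        shutil.copy2(src, dst)  # cross-device: copy data plus mode/timestamps
        os.unlink(src)          # then remove the original
```

Called as move_file("/pub/guest.img", "/bak/backups/guest.img") (hypothetical file names), it would take the copy-and-unlink path, since /pub and /bak/backups are separate file systems. Python's own shutil.move uses a similar rename-then-copy fallback.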
Getting it back takes several steps, as /dev/ad2 thereafter shows up
as 'not attached'.
In addition to the usual GMIRROR commands to clean house and set up for a
rebuild, I've had to set sysctl kern.geom.debugflags=16, then newfs the
whole ad2 device (wiping out the disklabel et al.), then do what gmirror needs,
re-insert the disk, and let it rebuild, which it does just fine.
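The recovery sequence above, written out as a dry-run Python sketch. The post doesn't spell out "the usual GMIRROR commands", so `gmirror forget` (to drop the lost component) and `gmirror clear` (to wipe stale metadata) are assumptions, as is restoring the sysctl to its default of 0 at the end. The value 16 is the 0x10 bit that geom(4) describes as "allow foot shooting", which permits raw writes GEOM would otherwise refuse. The mirror and disk names are hypothetical defaults taken from this setup; with dry_run=False these commands destroy everything on the named disk.

```python
import subprocess
from typing import List, Sequence

# Hypothetical defaults taken from the setup in the post: mirror gm0 has shed
# component ad2. These commands DESTROY all data on the disk they name.
MIRROR = "gm0"
DISK = "ad2"


def recovery_steps(mirror: str = MIRROR, disk: str = DISK) -> List[List[str]]:
    """Return the recovery commands, in order, as argv lists."""
    return [
        # Assumed "clean house" step: drop the mirror's record of the lost disk.
        ["gmirror", "forget", mirror],
        # geom(4) debug bit 0x10 ("allow foot shooting"): permits raw writes
        # that GEOM would otherwise refuse.
        ["sysctl", "kern.geom.debugflags=16"],
        # Wipe the whole disk, including the old disklabel, as the post did.
        ["newfs", f"/dev/{disk}"],
        # Assumed gmirror prep: clear any stale gmirror metadata on the disk.
        ["gmirror", "clear", disk],
        # Re-insert the disk; gmirror then rebuilds it from the good component.
        ["gmirror", "insert", mirror, disk],
        # Not in the post: put the debug flags back to their default of 0.
        ["sysctl", "kern.geom.debugflags=0"],
    ]


def run_steps(steps: Sequence[Sequence[str]], dry_run: bool = True) -> None:
    """Print each command; only execute it when dry_run is False."""
    for argv in steps:
        print("+", " ".join(argv))
        if not dry_run:
            # check=True stops at the first failing step instead of pressing on.
            subprocess.run(list(argv), check=True)
```

Keeping the steps as data and defaulting to a dry run makes it easy to review the exact sequence on the 'better instrumented' box before anything touches a disk.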
I'm about to set up a 'better instrumented' test box to get more specifics, so
nothing further here yet.
I'm reporting it now only because there is SO little logging that my first
impression is that the triggering event is below the GEOM layer.
This is with 7-BETA1 i386 of 20 October; testing with last night's 7-BETA3 will
start once I've bought a couple of similar drives.
Both the board that first showed the fault and the test board have ICH9 *and*
JMB363, so I will try to reproduce it on each controller with 7-BETA3 and
8-<head> before looking at patches.
More info as I get it.
More information about the freebsd-current