zpool degraded - 'UNAVAIL cannot open' functioning drive

Jeremy Chadwick koitsu at FreeBSD.org
Sun Aug 10 07:11:17 UTC 2008


On Fri, Aug 08, 2008 at 08:23:41AM +0400, Andrey V. Elsukov wrote:
> Jeremy Chadwick wrote:
>> In almost every case I've looked at so far, the individuals' chipsets,
>> disks, and overall setup are different.  SMART statistics on the drives
>> show absolutely no sign of errors, or anything that indicates a hardware
>> failure.  Many of the users are using AHCI as well (myself included, and
>> I have seen the DMA error issue myself), which is more reliable than
>> classic IDE.
>
> I have done some work on the AHCI part of the ATA driver and I am
> looking for testers...
> http://perforce.freebsd.org/changeList.cgi?CMD=changes&FSPC=//depot/user/butcher/src/...

These look quite good.  Regarding change 146184, do you know if this
addresses the problems documented in PR 102211, PR 108924, or what I
described in
http://lists.freebsd.org/pipermail/freebsd-stable/2008-February/040534.html ?

>> It would be beneficial if there were some form of sysctl to increase the
>> verbosity from the ATA subsystem when an error happens.  The existing
>> data we get back is terse, and barely useful.  I know for a fact there's
>> more debug information that could be output in such scenarios.  And
>> please do not reply with "good idea, send patches" unless you're wanting
>> to be chewed out.  :-)
>
> Ok, I'll try to add some verbose 'printfs' in my branch in perforce :)

That'd be great.

It appears to me that FreeBSD's error handling does not bother to
distinguish SATA-specific errors; everything is assumed to be plain ATA,
so the extra error granularity SATA provides is not available on FreeBSD.
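
To illustrate the kind of granularity I mean, here is a minimal userland
sketch, not ata(4) code.  It assumes the SError register bit layout as I
read it in the SATA and AHCI specifications; serr_decode() and the
serr_bits table are hypothetical names made up for this example.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumed SError bit layout (ERR in bits 15:0, DIAG in bits 31:16). */
struct serr_bit {
	uint32_t	 mask;
	const char	*name;
};

static const struct serr_bit serr_bits[] = {
	{ 1u << 0,  "recovered data integrity error" },
	{ 1u << 1,  "recovered communications error" },
	{ 1u << 8,  "transient data integrity error" },
	{ 1u << 9,  "persistent communication error" },
	{ 1u << 10, "protocol error" },
	{ 1u << 11, "internal error" },
	{ 1u << 16, "PhyRdy change" },
	{ 1u << 17, "PHY internal error" },
	{ 1u << 18, "comm wake" },
	{ 1u << 19, "10b to 8b decode error" },
	{ 1u << 20, "disparity error" },
	{ 1u << 21, "CRC error" },
	{ 1u << 22, "handshake error" },
	{ 1u << 23, "link sequence error" },
	{ 1u << 24, "transport state transition error" },
	{ 1u << 25, "unknown FIS type" },
	{ 1u << 26, "device exchanged" },
};

/*
 * Hypothetical helper: write a comma-separated list of the conditions
 * set in 'serr' into 'buf'.  Returns how many names were written in
 * full; a name that does not fit is dropped rather than truncated.
 */
static int
serr_decode(uint32_t serr, char *buf, size_t len)
{
	size_t i, used = 0;
	int n = 0, w;

	if (len == 0)
		return (0);
	buf[0] = '\0';
	for (i = 0; i < sizeof(serr_bits) / sizeof(serr_bits[0]); i++) {
		if ((serr & serr_bits[i].mask) == 0)
			continue;
		w = snprintf(buf + used, len - used, "%s%s",
		    n > 0 ? ", " : "", serr_bits[i].name);
		if (w < 0 || (size_t)w >= len - used) {
			buf[used] = '\0';	/* drop the partial name */
			break;
		}
		used += (size_t)w;
		n++;
	}
	return (n);
}
```

Given a raw value of 0x00600000, for example, this would yield "CRC
error, handshake error", which says a good deal more about where the
fault lies than a single terse error line.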

This also starts to enter the realm of why FreeBSD does not implement
support for NCQ -- is this because the ATA driver was built solely
around ATA, rather than AHCI?  Linux appears to have two different
drivers depending on whether you're using AHCI or not.  FreeBSD's ata(4)
code seems to have everything intermixed, so it looks a lot like
spaghetti...  Is this the problem?

>>> I'm going to do some analysis and find out whether I can find any of
>>> our systems that may be experiencing ATA errors that don't correlate
>>> with what their SMART data is saying. To date I haven't caught any,
>>> but that's not to say they may not be happening... just that all of
>>> the ones I have caught to date do appear to have been
>>> hardware-related issues...
>
> IMHO, today we have many hardware versions and revisions, and some of
> them are buggy. But other OSes (Windows, Linux) work with buggy
> hardware without big problems. Yes, some developers have docs and can
> make workarounds. I think our ATA driver needs a new error handling
> subsystem which can handle errors correctly.

Yep, I understand there are in fact bugs in consumer and commercial-grade
hardware and firmware.  However, FreeBSD users will want to know whether
they're suffering from said bugs or from some other issue.

I'm more than willing to document both scenarios (known buggy hardware
and other bugs which are NOT the result of hardware flaws), but I
(obviously) need data and example output for this.  :-)

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |


