zpool degraded - 'UNAVAIL cannot open' functioning drive

Jeremy Chadwick koitsu at FreeBSD.org
Thu Aug 7 07:14:35 UTC 2008


On Thu, Aug 07, 2008 at 10:33:29AM +0400, Andrey V. Elsukov wrote:
> Jeremy Chadwick wrote:
>> Correct, it's a FreeBSD ATA subsystem/driver problem.
>
> I tried 8.0-CURRENT on marvell's, nvida's and intel's controllers.
> Hot plug and attach/detach works on any of these controllers without
> any problems.. What i should to do to get similar problems? :)

I haven't tried CURRENT; I don't track HEAD.  I will work on setting up
another testbed environment at home and repeating my tests on HEAD.
That will take me some time, however.

My test method is very simple, at least in regards to disk removal.
Here's the step-by-step I've used to hit the bugs in question:

http://lists.freebsd.org/pipermail/freebsd-stable/2008-February/040534.html

>> My advice at this point in time, because as of today I have officially
>> lost faith in it: avoid ata(4) at all costs.
>
> I tried to contact you some time ago, but didn't receive any
> answers.. Do you still want to resolve your problems with ATA?

Yes, I did receive your mails, but you just wanted to know "if I was
still having problems".  I should have replied, but I did not.  That is
my fault, and for that I apologise.

The issues aren't problems specific to me -- they are affecting a
significant userbase, specifically folks who use servers in production
environments.  But maybe I've misunderstood what you meant by "your
problems" -- my apologies if I have.

But have you looked at my Wiki page, documenting most (but not all) of
the issues?

http://wiki.freebsd.org/JeremyChadwick/ATA_issues_and_troubleshooting

We still don't have an answer to the famous "DMA timeout issue", which
continues to haunt many.  I provided a small analysis in my Wiki, but
the technical justification is over my head -- it needs review from
someone who is familiar with the ATA protocol.  I interpret the
NID_NOT_FOUND error to mean FreeBSD is asking the disk to r/w to/from an
invalid LBA.  I received one mail from a user (I forget if a mailing
list was CC'd or not -- I need to dig up the mail) who said that in some
cases NID_NOT_FOUND is normal.
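
For reference, here is a minimal C sketch of how an ATA error register
byte decodes into names such as NID_NOT_FOUND.  The bit positions follow
the ATA error register layout (IDNF, "ID not found", is bit 4, i.e.
0x10).  The table, the label spellings, and the decode_ata_error() helper
are illustrative assumptions and are not copied from the ata(4) driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Illustrative table of ATA error register bits.  The labels approximate
 * those seen in FreeBSD kernel messages; they are not taken verbatim
 * from the ata(4) source.
 */
struct ata_err_bit {
	uint8_t		 mask;
	const char	*name;
};

static const struct ata_err_bit ata_err_bits[] = {
	{ 0x80, "ICRC" },
	{ 0x40, "UNCORRECTABLE" },
	{ 0x20, "MEDIA_CHANGED" },
	{ 0x10, "NID_NOT_FOUND" },	/* IDNF: requested address not found */
	{ 0x08, "MEDIA_CHANGE_REQUEST" },
	{ 0x04, "ABORTED" },
	{ 0x02, "NO_MEDIA" },
	{ 0x01, "ILLEGAL_LENGTH" },
};

/*
 * Hypothetical helper: write a comma-separated list of the bits set in
 * err into buf (at most len bytes, always NUL-terminated) and return buf.
 */
static char *
decode_ata_error(uint8_t err, char *buf, size_t len)
{
	size_t i;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';
	for (i = 0; i < sizeof(ata_err_bits) / sizeof(ata_err_bits[0]); i++) {
		if ((err & ata_err_bits[i].mask) == 0)
			continue;
		if (buf[0] != '\0')
			strncat(buf, ",", len - strlen(buf) - 1);
		strncat(buf, ata_err_bits[i].name, len - strlen(buf) - 1);
	}
	return (buf);
}
```

With this sketch, decode_ata_error(0x10, buf, sizeof(buf)) yields
"NID_NOT_FOUND", which is consistent with reading the error as the drive
rejecting the requested address rather than as a media defect.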

The FreeNAS folks reported that increasing the internal ATA command
timeout from 5 seconds to 10 or 15 has helped FreeNAS users, but those
on FreeBSD who suffer from said timeouts and have tried the patches said
they made no difference.

That said, I have some questions:

1) Are you trying to tell me that individuals running commercial
services in production environments should run CURRENT?  I don't think
many are willing to do this; I know I'm not, and I can probably speak
for Randy Bush.  ;-)

2) If the issues above were fixed in HEAD, why were none of the PRs
listed in my Wiki updated to reflect that?

3) If the above issues were fixed in HEAD, can you point me to the CVS
commits for them?  Any time I see ATA commits happen in RELENG_7, I
immediately use cvsweb to look at the changes and commit message -- that
means I look at HEAD, RELENG_7, and any other branchpoint.  I haven't
seen anything committed for these issues.

4) If the above issues were actually fixed in HEAD, are there scheduled
plans to MFC the fixes?

I appreciate you taking the time to help track these down and
investigate them, but I feel like you, myself, Scott Long, and the users
are the only ones who care about these issues.  The maintainer is alive
and active, but hasn't said a word, and some of those PRs go untouched
for 2+ years...

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |


