ad10: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=11441599

Karl Denninger karl at denninger.net
Thu Aug 11 02:28:51 GMT 2005


On Wed, Aug 10, 2005 at 09:47:38PM -0400, Chuck Swiger wrote:
> Karl Denninger wrote:
> >On Thu, Aug 11, 2005 at 12:46:04AM +0200, Søren Schmidt wrote:
> [ ... ]
> >>I've already gone WAY out of my way to try to support the sii3112,  
> >>and I'm not inclined to waste more of my precious spare time on it.  
> >>However, if it really is that important to enough people to try to  
> >>workaround the silicon bugs (which very likely isn't possible), get  
> >>together and get me failing HW on my desk and time to work on it.
> >
> >Ok, then do the RIGHT THING and document that the SiI chips are declared
> >BROKEN by FreeBSD and likely to cause people trouble - including 
> >irrevocable data corruption.
> >
> >This would have saved me COUNTLESS hours when I first ran into this 
> >issue.  Indeed, it was not until someone else started posting excerpts 
> >from commit logs (months after I filed the PR originally!) that I was 
> >aware FreeBSD developers considered these chipsets "damaged goods".
> 
> Look, Karl, we're all as sorry as we can be that you've spent lots of time 
> on this issue and/or you've had data get corrupted.  You should not rely on 
> that sympathy to be endless.

I'm not asking for ANY sympathy.

I'm asking for the documentation to properly reflect KNOWN problems with
a given chipset or device, rather than burying them.

> FreeBSD attempts to document that it works with common hardware which 
> follows industry standards and is not otherwise broken.  

Good, as far as it goes.

> The information 
> available to me suggests the SiI 3112 is broken.  It has multiple hardware 
> defects involving ATA request-size handling (SIIBUG in ata_sii_allocate() 
> in dev/ata/ata-chipset.c around line ~2300, or what the Linux guys call 
> SIL_QUIRK_MOD15WRITE), and with LBA48 if used with various Seagate drives.

That's all "quirk" stuff - that is, it will work, but perhaps not return the
best performance.

> I've also gotten the impression that the chipset is prone to locking up the 
> entire system under high load, especially under RAID-1 mirroring or other 
> parallel access cases, because it mishandles interrupts or some such.

Now that's an entirely different matter, <AND IT DESERVES MENTION IN THE
RELEASE NOTES OR HARDWARE GUIDE>.

This is particularly true when the "workarounds" were just fine for 4.x, but
suddenly blow chunks under 5.4 and later due to architectural changes made
in that driver.

> Given that this is the case, I would be looking to get my money back or a 
> replacement from the vendor who sold me this crappy hardware, far more than 
> I would be looking towards implementing software workarounds which cripple 
> the performance of the system in order to safely work around the hardware 
> errata.

Not the point here.

I agree the hardware is crappy, if the reported issues are correct.

> >Where is fair warning in the hardware compatibility guide?
> 
> http://www.freebsd.org/platforms/amd64/motherboards.html ...?

Note the "SiI UNTESTED" lines in that list?

It's not untested.  It's known to be broken!  Again, WHERE IS THE WARNING?

Second, I don't have an AMD system - I have a HT P4.

From the online man page for ata.4, which is EXPLICITLY referenced as THE
authoritative list of which disk controllers it supports:


The currently supported ATA/SATA controller chips are:

     Acard:	     ATP850P, ATP860A, ATP860R, ATP865A, ATP865R
     ALI:	     Aladdin (ALi5229) compatible chips.
     AMD:	     AMD756, AMD766, AMD768, AMD8111.
     CMD:	     CMD646, CMD648, CMD649.
     Cypress:	     Cypress 82C693.
     Cyrix:	     Cyrix 5530.
     HighPoint:      HPT302, HPT366, HPT368, HPT370, HPT371, HPT372,
		     HPT374.
     Intel:	     PIIX, PIIX3, PIIX4, ICH, ICH0, ICH2, ICH3, ICH4, ICH5.
     ITE:	     IT8212F.
     National:	     SC1100.
     nVidia:	     nForce, nForce2, nForce3.
     Promise:	     PDC20246, PDC20262, PDC20263, PDC20265, PDC20267,
		     PDC20268, PDC20269, PDC20270, PDC20271, PDC20275,
		     PDC20276, PDC20277, PDC20318, PDC20319, PDC20371,
		     PDC20375, PDC20376, PDC20377, PDC20378, PDC20379,
		     PDC20617, PDC20618, PDC20619, PDC20620.
     ServerWorks:    ROSB4, CSB5, CSB6.
     Silicon Image:  SiI0680, SiI3112, SiI3114, SiI3512.

Note that last line.  (I omitted the rest of the list.)

Note CLEARLY the absence of any warning that these chipsets are BROKEN, 
either here or anywhere else in that document page.

SUPPORTED means <SUPPORTED>

If you're going to withdraw support, as Soren is apparently doing, then 
by God, WITHDRAW IT!  Get that line out of there and remove the sense data
from the module or make it conditional on a specific option in the kernel.

> >Is it thus necessary for us "mere users" to consider this an issue that 
> >will simply not be addressed?  If so, then just say so up front <AND 
> >DOCUMENT THAT THE SII CHIPSETS DON'T WORK RIGHT.>
> [ ...some 200 (!) lines of ranting at poor Soren deleted :-)... ]

I'm ranting because Soren and the rest of the development and release teams
KNOW this is screwed up BUT CLAIM IT IS A SUPPORTED CHIPSET!

This despite an OPEN PR from <FEBRUARY> that I filed and a whole host of
reports on this list that -STABLE is anything but.

I also have a PR open from about three weeks ago documenting that 6.x is
actually WORSE in this regard.

I understand people missing something they don't know is screwed.  But this
is not a case of lack of knowledge - actual knowledge has been present since
at least February, and yet RIGHT NOW the online documentation still says
this controller is perfectly ok to use, and that it is SUPPORTED, while
Soren just got done disavowing support for it.

> Anyway, yeah, you got it: some SII chipsets don't work right.

Document it and life goes on.

Or fix it and life goes on.

EITHER choice keeps MANY people from wasting countless hours chasing
ghosts.  I'm not the only one who has done so, as reading here shows.

That SIX MONTHS have gone by since I filed my PR and a ONE LINE EDIT to the 
ata.4 doc page hasn't been made is outrageous.

I'd make the edit myself but lack the permissions necessary to do so.

> FreeBSD tries to compensate; for some people it works OK for what they are 
> doing, and for others it doesn't.  Blow $25 and get a cheap 4-port SATA-150 
> RAID card using something other than a SiI 3112.  Blow $50 and you can even 
> get one from a vendor like Promise or Highpoint that's at least somewhat 
> reputable, and/or provides open source drivers and FreeBSD support for 
> their products.

There is a report here that some of those "other ones" don't work right with
ATA-beyond-4.x too.

I bought a twe card (3ware 8502), which DOES work right, maybe because the
3ware guys actually wrote the driver AND ACTUALLY SUPPORT THEIR OWN STUFF
(how else would they sell boards but to have WORKING drivers?)  I would have 
done that in FEBRUARY if I knew that the SiI chipsets were hopelessly screwed
and that FreeBSD had no intention of fixing it.

Instead, led to believe that people WERE interested in chasing this, I put 
in patches at the behest of one developer, reporting back changes through 
a few cycles, and then the interest in chasing this apparently died on the 
FreeBSD developer end.

Look, if it can't be fixed then it can't be fixed.  I accept that.  

Just don't claim that the chipset is supported, and that we should expect to
have it work properly when it is not and it won't!

> If it makes you feel better, submit this as a PR against the docs category:
> 
> --- ata.4~      Tue Apr  5 14:28:00 2005
> +++ ata.4       Wed Aug 10 21:43:05 2005
> @@ -129,7 +129,7 @@
>  .It ServerWorks:
>  ROSB4, CSB5, CSB6.
>  .It Silicon Image:
> -SiI0680, SiI3112, SiI3114, SiI3512.
> +SiI0680, SiI3114, SiI3512.  SiI3112 has hardware errata and may not work.
>  .It SiS:

How about removing the sense data from the module in the base config, making
it conditional, and noting that it must be explicitly enabled by a kernel 
option if you insist on attempting data suicide?

This way nobody ends up with a corrupted disk without FAIR warning.

As an option, if Soren WANTS to fix this, my sandbox machine remains available
to him - or any other developer - who needs a test bed system that reliably
and repeatedly fails under load.

-- 
Karl Denninger (karl at denninger.net) Internet Consultant & Kids Rights Activist
http://www.denninger.net	My home on the net - links to everything I do!
http://scubaforum.org		Your UNCENSORED place to talk about DIVING!
http://genesis3.blogspot.com	Musings Of A Sentient Mind



