DANGER WILL ROBINSON! SERIOUS problem with current 5.4-PRERELEASE - UPDATE (real this time)

Karl Denninger karl at denninger.net
Wed Mar 30 21:00:48 PST 2005


Ok, here's what I've got so far.

Pulling the SECOND delta gets rid of both the stability problem AND the
requeue fix (i.e. getting rid of that one defeats the essential purpose of
the deltas in the first place.)

Removing the FIRST delta, which is:

218a219,221
       if (!dumping)
           callout_reset(&request->callout, request->timeout * hz,
                         (timeout_t*)ata_timeout, request);

appears to get rid of the crashes while not harming data integrity OR the
requeueing.

With this one out, the errors still occur (I was able to generate over a
dozen retries in less than 10 minutes doing a large file copy with a 3-disk
RAID 1 array composed of 2 SATA disks and 1 UDMA100 disk), BUT they are
retried (apparently successfully.)
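
For anyone following along without the tree handy, here is a rough
user-space sketch of the requeue decision as I read it from the second
delta quoted below. It is NOT the kernel code: struct fake_request,
fake_reinit() and should_requeue() are made-up stand-ins, and the
"0 means the reinit succeeded" convention is an assumption inferred from
the new comment and the "!ata_reinit(ch)" test in the diff.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the fields the diff touches; not the real ata type. */
struct fake_request {
    int retries;         /* retries remaining, like request->retries */
    int device_present;  /* stands in for request->device->param != NULL */
    int donecount;       /* bytes already counted as done */
};

/* ASSUMPTION: 0 = reinit succeeded, nonzero = reinit failed. */
static int fake_reinit(int reinit_fails)
{
    return reinit_fails ? 1 : 0;
}

/*
 * Mirrors the shape of the 1.32.2.6 test: requeue only if the reinit
 * succeeded, a retry is still available, and the device is still there.
 * Returns 1 if the request should be reinjected, 0 if it should be failed.
 */
static int should_requeue(struct fake_request *req, int reinit_fails)
{
    assert(req != NULL);
    if (!fake_reinit(reinit_fails) && req->retries-- > 0 &&
        req->device_present) {
        req->donecount = 0;  /* start the transfer over from scratch */
        return 1;
    }
    return 0;
}
```

With retries starting at 2 and the reinit succeeding every time, this
requeues twice and then gives up on the third timeout. If the reinit
fails, the request is failed at once and the retry count is left alone,
which would correspond to the disk being dropped from the array.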

I copied the source tree to /usr/src2 and took the errors.  I am now
attempting to "buildworld" off it - so far, so good (about 1/4 of the way
through - if there was data corruption it should have failed by now)

Also, the sandbox system is still up.  That also is a major improvement.

I will let this buildworld complete, and if it is successful (proving that
the retried errors didn't actually result in corrupted files!), will put 
this same change (pulling the first delta only) on the production system, 
rebuild the other RAID disks (I had to pull the cartridges from there to 
use them on the sandbox) and see if intentionally provoking the same 
error there allows the system to remain stable once the errors start 
showing up.

Again, I will not have a "final" determination on this until late tomorrow, 
but at first blush pulling the first delta appears to fix the stability 
issue.

Further update tomorrow as soon as I have it....

--
-- 
Karl Denninger (karl at denninger.net) Internet Consultant & Kids Rights Activist
http://www.denninger.net	My home on the net - links to everything I do!
http://scubaforum.org		Your UNCENSORED place to talk about DIVING!
http://www.spamcuda.net		SPAM FREE mailboxes - FREE FOR A LIMITED TIME!
http://genesis3.blogspot.com	Musings Of A Sentient Mind


On Wed, Mar 30, 2005 at 09:08:30PM -0600, Karl Denninger wrote:
> On Tue, Mar 29, 2005 at 11:43:18PM -0600, Karl Denninger wrote:
> > Here's the diff and some thoughts....
> > 
> > Fs:/usr/src/sys/dev/ata> cvs diff -r 1.32.2.5 ata-queue.c
> > Index: ata-queue.c
> > ===================================================================
> > RCS file: /usr/cvs/src/sys/dev/ata/ata-queue.c,v
> > retrieving revision 1.32.2.5
> > retrieving revision 1.32.2.6
> > diff -r1.32.2.5 -r1.32.2.6
> > 30c30
> > < __FBSDID("$FreeBSD: src/sys/dev/ata/ata-queue.c,v 1.32.2.5 2004/10/24 09:27:37 sos Exp $");
> > ---
> > > __FBSDID("$FreeBSD: src/sys/dev/ata/ata-queue.c,v 1.32.2.6 2005/03/23 04:50:26 mdodd Exp $");
> > 218a219,221
> > >       if (!dumping)
> > >           callout_reset(&request->callout, request->timeout * hz,
> > >                         (timeout_t*)ata_timeout, request);
> > 241,243c244,249
> > < 
> > <       /* if reinit succeeded and retries still permit, reinject request */
> > <       if (ata_reinit(ch) && request->retries-- > 0 && request->device->param){
> > ---
> > >       /*
> > >        * if reinit succeeds, retries still permit and device didn't
> > >        * get removed by the reinit, reinject request
> > >        */
> > >       if (!ata_reinit(ch) && request->retries-- > 0
> > >           && request->device->param){
> > 245a252
> > >           request->donecount = 0;
> 
> Removing the second change (changing the test on the "ata_reinit") appears to 
> prevent both the destabilization and the actual requeue from taking place 
> (that is, you get the immediate disconnect from the array when the error 
> occurs; therefore whatever is causing the destabilization doesn't happen.)
> 
> I will attempt to remove the first delta alone (and put back the second), but 
> from a quick perusal of the code I doubt this will make a material change.
> 



