Showstopper ATA bug in 6.1-PRE?
Wilko Bulte
wb at freebie.xs4all.nl
Thu Feb 9 14:08:49 PST 2006
On Thu, Feb 09, 2006 at 09:24:23PM +0100, Søren Schmidt wrote..
> Wilko Bulte wrote:
> >On Thu, Feb 09, 2006 at 03:45:53PM +0100, Søren Schmidt wrote..
> >>Wilko Bulte wrote:
> >>>On Thu, Feb 09, 2006 at 03:37:07PM +0100, Søren Schmidt wrote..
> >>>>Wilko Bulte wrote:
> >>>>>On Wed, Feb 08, 2006 at 10:44:05PM +0100, Søren Schmidt wrote..
> >>>>>>Wilko Bulte wrote:
> >>>>>>>On Wed, Feb 08, 2006 at 10:02:08PM +0100, Søren Schmidt wrote..
> >>>>>>>>Wilko Bulte wrote:
> >>>>>>>>>Hi Soren,
> >>>>>>>>>
> >>>>>>>>>I just went to 6.1-PRE on my main machine, coming from 6.0-STABLE
> >>>>>>>>>of roughly end of December.
> >>>>>>>>>
> >>>>>>>>>And I hit some stuff that really worries me:
> >>>>>>>>>
> >>>>>>>>>- the freshly built kernel keels over with (hand transcribed):
> >>>>>>>>>
> >>>>>>>>>ata3: reiniting channel SATA connect ...
> >>>>>>>>>SATA connected
> >>>>>>>>>sata_connect_devices 0x1 <ATA_MASTER>
> >>>>>>>>>
> >>>>>>>>>ad6: req=0xC35ba0c8 SETFEATURES SETTRANSFERMODE semaphore timeout
> >>>>>>>>>!! DANGER Will Robinson !!
> >>>>>>>>>
> >>>>>>>>>(... is where I cannot read my own handwriting, it scrolled quite
> >>>>>>>>>fast on the screen..)
> >>>>>>>>>
> >>>>>>>>>Boot device is a SATA RAID1 on a Promise 2300.
> >>>>>>>>Hmm, that should not happen. Could you try to backstep just ATA to
> >>>>>>>>before the MFC, that is 24/1/06 and let me know if that helps
> >>>>>>>>please ?
> >>>>>>>First impression is that the problem is gone. None of the
> >>>>>>>previously reported errors are seen. I am running a level 0 dump
> >>>>>>>from disk to disk to see if the box remains stable. Given that this
> >>>>>>>is my primary machine I sure hope it will be :-)
> >>>>>>>
> >>>>>>>>>Another snag is that my ad10 disk on 6.0-STABLE suddenly became
> >>>>>>>>>ad12 on 6.1-PRE
> >>>>>>>>Hmm, that is because there are only 2 ports on your Promise, which
> >>>>>>>>is now correctly identified; before it was erroneously found as 3 ports.
> >>>>>>>Ah, OK. I would suggest a note to the Release Note writers would be
> >>>>>>>a good thing, devices changing location after an upgrade in the
> >>>>>>>-stable branch is unnerving ;-)
> >>>>>>Well, the good thing is that I can reproduce the error here, the bad
> >>>>>>thing is that it slipped through testing on -current...
> >>>>>>Oh, well, I'll look into it ASAP...
> >>>>>Thank you Soren!
> >>>>OK, had a few this afternoon, could you try this patch and let me know
> >>>>if it helps, at least it makes the problem go away on my testbed..
> >>>Is this relative to HEAD or RELENG_6? I cannot / will not go to HEAD
> >>>with this machine (my main production box.. :-)
> >>Doesn't matter, ATA is the same on both...
> >
> >OK, I was not sure if they were 100% identical.
> >
> >The patch at first impression seems to have eliminated the problem.
>
> Good, seems I'm on the right track at least.
>
> >Interestingly enough ad10 remained ad10 with the patch applied?
>
> Yeah, that's intentional, I thought we'd better not break POLA here..
I agree :-)
> >I'll put some load on to see what happens.
>
> Let me know how that turns out, I'll clean things up a bit and get it
> committed to -current, then get permission to MFC when we are sure it
> fixes the problem...
I ran a 44GB disk-to-disk dump without incidents (source on the RAID1,
target on the JBOD). No problems whatsoever.
Looks like things behave much better now. Tonight the machine will
run a daily full dump to DLT tape, I'll know how that turns out tomorrow.
thanks,
Wilko
--
Wilko Bulte wilko at FreeBSD.org