Showstopper ATA bug in 6.1-PRE?

Søren Schmidt sos at deepcore.dk
Thu Feb 9 12:24:39 PST 2006


Wilko Bulte wrote:
> On Thu, Feb 09, 2006 at 03:45:53PM +0100, Søren Schmidt wrote..
>> Wilko Bulte wrote:
>>> On Thu, Feb 09, 2006 at 03:37:07PM +0100, Søren Schmidt wrote..
>>>> Wilko Bulte wrote:
>>>>> On Wed, Feb 08, 2006 at 10:44:05PM +0100, Søren Schmidt wrote..
>>>>>> Wilko Bulte wrote:
>>>>>>> On Wed, Feb 08, 2006 at 10:02:08PM +0100, Søren Schmidt wrote..
>>>>>>>> Wilko Bulte wrote:
>>>>>>>>> Hi Soren,
>>>>>>>>>
>>>>>>>>> I just went to 6.1-PRE on my main machine, coming from a 6.0-STABLE
>>>>>>>>> of roughly the end of December.
>>>>>>>>>
>>>>>>>>> And I hit some stuff that really worries me:
>>>>>>>>>
>>>>>>>>> - the freshly built kernel keels over with (hand transcribed):
>>>>>>>>>
>>>>>>>>> ata3: reiniting channel SATA connect ... 
>>>>>>>>> SATA connected
>>>>>>>>> sata_connect_devices 0x1 <ATA_MASTER>
>>>>>>>>>
>>>>>>>>> ad6: req=0xC35ba0c8 SETFEATURES SETTRANSFERMODE semaphore timeout 
>>>>>>>>> !! DANGER Will Robinson !!
>>>>>>>>>
>>>>>>>>> (... is where I cannot read my own handwriting, it scrolled quite fast
>>>>>>>>> on the screen..)
>>>>>>>>>
>>>>>>>>> Boot device is a SATA RAID1 on a Promise 2300.
>>>>>>>> Hmm, that should not happen. Could you try to backstep just ATA to 
>>>>>>>> before the MFC, that is 24/1/06, and let me know if that helps, please?
>>>>>>> First impression is that the problem is gone.  None of the previously
>>>>>>> reported errors are seen.  I am running a level 0 dump from disk to disk
>>>>>>> to see if the box remains stable.  Given that this is my primary machine
>>>>>>> I sure hope it will be :-)
>>>>>>>
>>>>>>>>> Another snag is that my ad10 disk on 6.0-STABLE suddenly became ad12 on
>>>>>>>>> 6.1-PRE
>>>>>>>> Hmm, that is because there are only 2 ports on your Promise, which is 
>>>>>>>> now correctly identified; before it was erroneously found as 3 ports.
>>>>>>> Ah, OK.  I would suggest that a note to the Release Notes writers would
>>>>>>> be a good thing; devices changing location after an upgrade in the
>>>>>>> -stable branch is unnerving ;-)
>>>>>> Well, the good thing is that I can reproduce the error here, the bad 
>>>>>> thing is that it slipped through testing on -current...
>>>>>> Oh, well, I'll look into it ASAP...
>>>>> Thank you Soren!
>>>> OK, had a few this afternoon, could you try this patch and let me know 
>>>> if it helps, at least it makes the problem go away on my testbed..
>>> Is this relative to HEAD or RELENG_6?  I cannot / will not go to HEAD
>>> with this machine (my main production box.. :-)
>> Doesn't matter, ATA is the same on both...
> 
> OK, I was not sure if they were 100% identical.
> 
> The patch at first impression seems to have eliminated the problem.

Good, seems I'm on the right track at least.

> Interestingly enough ad10 remained ad10 with the patch applied?

Yeah, that's intentional, I thought we'd better not break POLA here..

> I'll put some load on to see what happens.

Let me know how that turns out, I'll clean things up a bit and get it 
committed to -current, then get permission to MFC when we are sure it 
fixes the problem...

-Søren



