8-STABLE won't boot with ZFSv28

Holger Kipp holger.kipp at alogis.com
Fri Jun 3 10:51:22 UTC 2011


Hi all,

As yesterday was a bank holiday in Germany, I wasn't in the office to
try the patch linked in the email.
Is there consensus that I should try the patch located here:

>>>
http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/dev/ata/chipsets/ata-intel.c.diff?r1=1.25;r2=1.26

and report the result? Or do you need some additional discussion on
this topic? I really don't know much about the ata-intel chipset
programming interface, which is why I'm asking :-)

Best regards,
Holger

On 02.06.2011 10:37, Alexander Motin wrote:
> Jeremy Chadwick wrote:
>> On Thu, Jun 02, 2011 at 09:53:58AM +0300, Alexander Motin wrote:
>>> Holger Kipp wrote:
>>>> Got the same messages over and over again; the panic took some time:
>>>>
>>>> unknown: WARNING - ATAPI_IDENTIFY requeued due to channel reset LBA=0
>>>> ata0: reinit done ..
>>>> ata0: reiniting channel ..
>>>> ata0: DISCONNECT requested
>>>>
>>>> <short delay here>
>>>>
>>>> ata0: p0: SATA connect time=0ms status=00000113
>>>> ata0: p1: SATA connect timeout status=00000000
>>>> ata0: reset tp1 mask=03 ostat0=00 ostat1=00
>>>> ata0: stat0=0x00 err=0x01 lsb=0x14 msb=0xeb
>>>> ata0: stat1=0x00 err=0x01 lsb=0x14 msb=0xeb
>>>> ata0: reset tp2 stat0=00 stat1=00 devices=0x30000
>>>> unknown: WARNING - ATAPI_IDENTIFY requeued due to channel reset LBA=0
>>>> ata0: reinit done ..
>>>> ata0: reiniting channel ..
>>>> ata0: DISCONNECT requested
>>> I see two problems here:
>>>  1. "devices=0x30000" means that two ATAPI devices were detected instead
>>> of one. I can also reproduce it with other Intel chipsets. It looks like
>>> a hardware bug to me. It can be worked around by reconnecting the ATAPI
>>> device to an even-numbered SATA port (2 or 4), or by connecting any
>>> other device to that even-numbered port.
>>>  2. "DISCONNECT requested" means that the controller reported a PHY
>>> status change for some device on the channel, triggering an infinite
>>> retry loop. Unfortunately I have no ICH9 board, and I can't reproduce
>>> it with ICH10 or later.
>>>
>>> This patch should work around the first problem in software:
>>> http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/dev/ata/chipsets/ata-intel.c.diff?r1=1.25;r2=1.26
>>> Please try it, and let's see whether, with some luck, it also does
>>> something about the second problem.
>>
>> With regard to item #1: I don't see anything in the ICH9 errata
>> indicating a silicon bug when the only device attached to the
>> controller is an ATAPI device connected to SATA port 0 (presumably) or
>> an odd-numbered port.  If this problem exists on other ICHxx and/or
>> ESBxx chips, I sure would hope it'd be documented.
>>
>> I haven't tried confirming it myself, but if need be I can set up a test
>> box with a SATA-based DVD drive hooked up to it and provide remote serial
>> console access, etc., if it would be of any help.  I don't think it would
>> be (it sounds like you have lots of hardware :-) ), but I'm willing to
>> help in any way I can.
> 
> Intel probably doesn't see an issue there, as the same behavior can be
> found even on the latest chipsets. But according to my understanding of
> the ATA specs and my analysis of how real PATA devices behave, this
> behavior is not correct. When an ATAPI device is connected to the first
> of two SATA ports routed to the same legacy/PATA-emulated ATA channel
> (i.e. as the master device), the soft-reset sequence falsely reports an
> ATAPI slave device as present. The problem does not show up with ATA
> disk devices, or if some other device really is attached to the slave
> port. It looks like the problem has always been there, but before
> ATA_CAM it usually went unnoticed because of the very short IDENTIFY
> command timeouts in ata(4).
> 
> If somebody can give a better explanation or propose a better
> workaround, they are welcome to, as I don't much like this solution.
> 
>> With regard to item #2: could this be at all related to OOB (bit 15)
>> somehow being set in PCS (SATA register offset 0x92)?  I doubt it,
>> but I thought I'd ask.  My thought process, which is probably wrong
>> (consider it an educational discussion :-) ):
>>
>> The ICH9 specification states that the default value for this register
>> is 0x0000, and that b15=0 means "SATA controller will not retry after an
>> OOB failure", while b15=1 causes the controller to retry indefinitely
>> after an OOB failure.  I imagine system BIOSes and other things can
>> change this default value, but we don't seem to print it anywhere in
>> ata_intel_chipinit() during a verbose boot.
>>
>> Looking at chipsets/ata-intel.c, it looks like we only touch PCS in
>> ata_intel_chipinit() and ata_intel_reset().  In the former, we avoid
>> touching bits 4 through 15, and in the latter we mask out only what we
>> want to adjust (e.g. the SATA port bits selected by the ch variable).
> 
> As far as I can see, ata-intel.c would not change that bit if it had
> been set for some reason. In theory, OOB (Out-of-Band signaling) is a
> function of the same state machine that sets the PHY status-change
> flag. But frankly speaking, I have no idea what effect setting this bit
> could have. In this legacy/PATA emulation mode too many things are
> undocumented to be sure of anything.
> 




More information about the freebsd-stable mailing list