svn commit: r270249 - head/sys/cam/ata

Warner Losh imp at bsdimp.com
Fri Aug 22 13:20:25 UTC 2014


On Aug 21, 2014, at 11:58 PM, Neel Natu <neelnatu at gmail.com> wrote:

> Hi Warner,
> 
> On Thu, Aug 21, 2014 at 10:34 PM, Warner Losh <imp at bsdimp.com> wrote:
>> 
>> On Aug 21, 2014, at 10:31 PM, Neel Natu <neelnatu at gmail.com> wrote:
>> 
>>> Hi Warner,
>>> 
>>> On Wed, Aug 20, 2014 at 3:58 PM, Warner Losh <imp at freebsd.org> wrote:
>>>> Author: imp
>>>> Date: Wed Aug 20 22:58:12 2014
>>>> New Revision: 270249
>>>> URL: http://svnweb.freebsd.org/changeset/base/270249
>>>> 
>>>> Log:
>>>> Turns out that IDENTIFY DEVICE and IDENTIFY PACKET DEVICE return data
>>>> that's only mostly similar. Specifically word 78 bits are defined for
>>>> IDENTIFY DEVICE as
>>>>       5 Supports Hardware Feature Control
>>>> while a IDENTIFY PACKET DEVICE defines them as
>>>>       5 Asynchronous notification supported
>>>> Therefore, only pay attention to bit 5 when we're talking to ATAPI
>>>> devices (we don't use the hardware feature control at this time).
>>>> Ignore it for ATA devices. Remove kludge that papered over this issue
>>>> for Samsung SATA SSDs, since Micron drives also have the bit set and
>>>> the error was caused by this bad interpretation of the spec (which is
>>>> quite easy to do, since bits aren't normally overlapping like this).
>>>> 
>>>> Modified:
>>>> head/sys/cam/ata/ata_xpt.c
>>>> 
>>>> Modified: head/sys/cam/ata/ata_xpt.c
>>>> ==============================================================================
>>>> --- head/sys/cam/ata/ata_xpt.c  Wed Aug 20 22:39:26 2014        (r270248)
>>>> +++ head/sys/cam/ata/ata_xpt.c  Wed Aug 20 22:58:12 2014        (r270249)
>>>> @@ -458,12 +458,18 @@ negotiate:
>>>>                   0, 0x02);
>>>>               break;
>>>>       case PROBE_SETAN:
>>>> -               /* Remember what transport thinks about AEN. */
>>>> -               if (softc->caps & CTS_SATA_CAPS_H_AN)
>>>> +               /*
>>>> +                * Only ATAPI defines this bit to mean AEN, but remember
>>>> +                * what transport thinks about AEN.
>>>> +                */
>>>> +               if ((softc->caps & CTS_SATA_CAPS_H_AN) &&
>>>> +                   periph->path->device->protocol == PROTO_ATAPI)
>>>>                       path->device->inq_flags |= SID_AEN;
>>>>               else
>>>>                       path->device->inq_flags &= ~SID_AEN;
>>>>               xpt_async(AC_GETDEV_CHANGED, path, NULL);
>>>> +               if (periph->path->device->protocol != PROTO_ATAPI)
>>>> +                       break;
>>>>               cam_fill_ataio(ataio,
>>>>                   1,
>>>>                   probedone,
>>>> @@ -750,14 +756,6 @@ out:
>>>>                       goto noerror;
>>>> 
>>>>               /*
>>>> -                * Some Samsung SSDs report supported Asynchronous Notification,
>>>> -                * but return ABORT on attempt to enable it.
>>>> -                */
>>>> -               } else if (softc->action == PROBE_SETAN &&
>>>> -                   status == CAM_ATA_STATUS_ERROR) {
>>>> -                       goto noerror;
>>>> -
>>>> -               /*
>>>>                * SES and SAF-TE SEPs have different IDENTIFY commands,
>>>>                * but SATA specification doesn't tell how to identify them.
>>>>                * Until better way found, just try another if first fail.
>>>> 
>>> 
>>> This change causes a panic for me on boot. Here is the boot log:
>>> 
>>> ahci0: <Intel Patsburg AHCI SATA controller> port
>>> 0xf050-0xf057,0xf040-0xf043,0xf030-0xf037,0xf020-0xf023,0xf000-0xf01f
>>> mem 0xfbb21000-0xfbb217ff irq 18 at device 31.2 on pci0
>>> ahci0: AHCI v1.30 with 6 6Gbps ports, Port Multiplier not supported
>>> ahcich0: <AHCI channel> at channel 0 on ahci0
>>> ahcich1: <AHCI channel> at channel 1 on ahci0
>>> ahcich2: <AHCI channel> at channel 2 on ahci0
>>> ahcich3: <AHCI channel> at channel 3 on ahci0
>>> ahcich4: <AHCI channel> at channel 4 on ahci0
>>> ahcich5: <AHCI channel> at channel 5 on ahci0
>>> ahciem0: <AHCI enclosure management bridge> on ahci0
>>> ...
>>> xpt_action_default: CCB type 0xdeadc0de not supported
>>> ...
>>> run_interrupt_driven_hooks: still waiting after 60 seconds for xpt_config
>>> run_interrupt_driven_hooks: still waiting after 120 seconds for xpt_config
>>> run_interrupt_driven_hooks: still waiting after 180 seconds for xpt_config
>>> run_interrupt_driven_hooks: still waiting after 240 seconds for xpt_config
>>> run_interrupt_driven_hooks: still waiting after 300 seconds for xpt_config
>>> panic: run_interrupt_driven_config_hooks: waited too long
>>> cpuid = 0
>>> KDB: stack backtrace:
>>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xffffffff81d92920
>>> kdb_backtrace() at kdb_backtrace+0x39/frame 0xffffffff81d929d0
>>> vpanic() at vpanic+0x189/frame 0xffffffff81d92a50
>>> kassert_panic() at kassert_panic+0x139/frame 0xffffffff81d92ac0
>>> boot_run_interrupt_driven_config_hooks() at
>>> boot_run_interrupt_driven_config_hooks+0x111/frame 0xffffffff81d92b50
>>> mi_startup()fffff81d92b70
>>> btext() at btext+0x2c
>>> KDB: enter: panic
>>> [ thread pid 0 tid 100000 ]
>>> Stopped at      kdb_enter+0x3e: movq    $0,kdb_why
>>> db>
>>> 
>>> The peripheral in question is a SATA attached CDROM:
>>> 
>>> % camcontrol devlist
>>> <INTEL SSDSC2CW240A3 400i>         at scbus0 target 0 lun 0 (pass0,ada0)
>>> <ATAPI iHAS524   C LL23>           at scbus2 target 0 lun 0 (cd0,pass1)
>>> <WDC WD1000CHTZ-04JCPV0 04.06A00>  at scbus3 target 0 lun 0 (pass2,ada1)
>>> <Corsair Neutron GTX SSD M306>     at scbus4 target 0 lun 0 (pass3,ada2)
>>> <AHCI SGPIO Enclosure 1.00 0001>   at scbus6 target 0 lun 0 (ses0,pass4)
>>> 
>>> pass1 at ahcich2 bus 0 scbus2 target 0 lun 0
>>> pass1: <ATAPI iHAS524   C LL23> Removable CD-ROM SCSI-0 device
>>> pass1: Serial Number 3524472 2N8225501140
>>> pass1: 150.000MB/s transfers (SATA 1.x, UDMA5, ATAPI 12bytes, PIO 8192bytes)
>>> 
>>> The following patch fixes the panic.
>>> 
>>> Index: sys/cam/ata/ata_xpt.c
>>> ===================================================================
>>> --- sys/cam/ata/ata_xpt.c       (revision 270249)
>>> +++ sys/cam/ata/ata_xpt.c       (working copy)
>>> @@ -468,7 +468,8 @@
>>>               else
>>>                       path->device->inq_flags &= ~SID_AEN;
>>>               xpt_async(AC_GETDEV_CHANGED, path, NULL);
>>> -               if (periph->path->device->protocol != PROTO_ATAPI)
>>> +               if (periph->path->device->protocol != PROTO_ATAPI &&
>>> +                   periph->path->device->protocol != PROTO_SCSI)
>>>                       break;
>>>               cam_fill_ataio(ataio,
>>>                   1,
>> 
>> I think the more proper test is == PROTO_ATA elsewhere, since that’s what
>> distinguishes the ATA_IDENTIFY from the ATAPI_IDENTIFY.
>> 
>>> However, there seem to be a couple of issues with the original patch:
>>> 
>>> 1. The 'periph->path->device->protocol' is not initialized to
>>> PROTO_ATAPI anywhere in the tree so the not-equal-to test is  a no-op.
>> 
>> We test here to determine which identify command to send:
>> 
>>                if (periph->path->device->protocol == PROTO_ATA)
>>                        ata_28bit_cmd(ataio, ATA_ATA_IDENTIFY, 0, 0, 0);
>>                else
>>                        ata_28bit_cmd(ataio, ATA_ATAPI_IDENTIFY, 0, 0, 0);
>> 
>> and that is working to send the right command.
>> 
> 
> Yes, but PROTO_ATA != PROTO_ATAPI :-)
> 
> Since we never initialize 'periph->path->device->protocol' to
> 'PROTO_ATAPI' in -current:

But this code appears to:

        case PROBE_RESET:
        {
                int sign = (done_ccb->ataio.res.lba_high << 8) +
                    done_ccb->ataio.res.lba_mid;
                CAM_DEBUG(path, CAM_DEBUG_PROBE,
                    ("SIGNATURE: %04x\n", sign));
                if (sign == 0x0000 &&
                    done_ccb->ccb_h.target_id != 15) {
                        path->device->protocol = PROTO_ATA;
                        PROBE_SET_ACTION(softc, PROBE_IDENTIFY);
                } else if (sign == 0x9669 &&
                    done_ccb->ccb_h.target_id == 15) {
                        /* Report SIM that PM is present. */
                        bzero(&cts, sizeof(cts));
                        xpt_setup_ccb(&cts.ccb_h, path, CAM_PRIORITY_NONE);
                        cts.ccb_h.func_code = XPT_SET_TRAN_SETTINGS;
                        cts.type = CTS_TYPE_CURRENT_SETTINGS;
                        cts.xport_specific.sata.pm_present = 1;
                        cts.xport_specific.sata.valid = CTS_SATA_VALID_PM;
                        xpt_action((union ccb *)&cts);
                        path->device->protocol = PROTO_SATAPM;
                        PROBE_SET_ACTION(softc, PROBE_PM_PID);
                } else if (sign == 0xc33c &&
                    done_ccb->ccb_h.target_id != 15) {
                        path->device->protocol = PROTO_SEMB;
                        PROBE_SET_ACTION(softc, PROBE_IDENTIFY_SES);
                } else if (sign == 0xeb14 &&
                    done_ccb->ccb_h.target_id != 15) {
                        path->device->protocol = PROTO_SCSI;
                        PROBE_SET_ACTION(softc, PROBE_IDENTIFY);
                } else {
                        if (done_ccb->ccb_h.target_id != 15) {
                                xpt_print(path,
                                    "Unexpected signature 0x%04x\n", sign);
                        }
                        goto device_fail;
                }
 
What am I missing?
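
For reference, the signature handling in the excerpt above can be restated as a small standalone sketch. The enum and the function name classify_signature are hypothetical stand-ins, not the kernel's definitions, and the numeric values are illustrative only. It covers just the branches quoted here: in this excerpt 0xeb14 is tagged PROTO_SCSI and no branch assigns PROTO_ATAPI. Whether some other code path sets PROTO_ATAPI is not shown in this message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the CAM protocol tags; values are illustrative. */
enum proto {
	PROTO_UNKNOWN,
	PROTO_ATA,
	PROTO_ATAPI,
	PROTO_SCSI,
	PROTO_SATAPM,
	PROTO_SEMB
};

/*
 * Mirrors the branch structure of the PROBE_RESET excerpt: the signature
 * is (lba_high << 8) + lba_mid, and target 15 is the port multiplier port.
 * PROTO_UNKNOWN models the device_fail path.
 */
static enum proto
classify_signature(uint8_t lba_high, uint8_t lba_mid, int target_id)
{
	int sign = (lba_high << 8) + lba_mid;

	assert(target_id >= 0 && target_id <= 15);

	if (sign == 0x0000 && target_id != 15)
		return (PROTO_ATA);
	if (sign == 0x9669 && target_id == 15)
		return (PROTO_SATAPM);
	if (sign == 0xc33c && target_id != 15)
		return (PROTO_SEMB);
	if (sign == 0xeb14 && target_id != 15)
		return (PROTO_SCSI);
	return (PROTO_UNKNOWN);
}
```

Under these assumptions, classify_signature(0xeb, 0x14, 0) yields PROTO_SCSI, so a test written as "!= PROTO_ATAPI" is true for every device this excerpt classifies.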

> if (protocol != PROTO_ATAPI) equates to if (1)
> if (protocol == PROTO_ATAPI) equates to if (0)
> 
> I was trying to say that any code that compares 'protocol' to
> PROTO_ATAPI probably deserves a second look (e.g., the original patch
> that triggered this panic).

Yes, but I think your analysis was incorrect on this point :)

>>> 2. It seems not right to break out of switch in 'probestart()' without
>>> providing a way for 'probedone()' to be called. I believe that this
>>> stops the state machine from making forward progress and results in
>>> 'xpt_config()' not completing.
>> 
>> That’s a problem, you’re right. Let me rework.
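
As an illustration of the stall described in point 2, here is a toy model of a probe state machine. It is a sketch under stated assumptions, not the code in ata_xpt.c: the struct, the reduced set of states, and the model_* functions are all hypothetical. The only property it models is that the completion handler is what advances the state, so a start routine that returns without queuing a command and without changing state leaves the machine stuck.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical reduced state set; the real probe has many more steps. */
enum probe_action { PROBE_SETAN, PROBE_FINAL, PROBE_DONE };

struct probe_model {
	enum probe_action action;
	bool is_atapi;
	bool ccb_queued;
};

/* Models the completion handler: runs only after a command was queued. */
static void
model_done(struct probe_model *m)
{
	m->ccb_queued = false;
	m->action = (m->action == PROBE_SETAN) ? PROBE_FINAL : PROBE_DONE;
}

/*
 * Models the start routine. With advance_on_skip false, a non-ATAPI device
 * in PROBE_SETAN hits a bare early exit: nothing is queued and the state
 * is unchanged. With advance_on_skip true, the step is skipped but the
 * state is moved forward before a command is queued.
 */
static void
model_start(struct probe_model *m, bool advance_on_skip)
{
	if (m->action == PROBE_SETAN && !m->is_atapi) {
		if (!advance_on_skip)
			return;
		m->action = PROBE_FINAL;
	}
	m->ccb_queued = true;
}

/* Returns true if the model reaches PROBE_DONE, false if it stalls. */
static bool
model_run(struct probe_model *m, bool advance_on_skip, int max_steps)
{
	for (int i = 0; i < max_steps && m->action != PROBE_DONE; i++) {
		model_start(m, advance_on_skip);
		if (!m->ccb_queued)
			return (false);	/* no completion will ever fire */
		model_done(m);
	}
	return (m->action == PROBE_DONE);
}
```

In this model a non-ATAPI device with advance_on_skip false never reaches PROBE_DONE, which is the shape of the xpt_config hang in the boot log above. Advancing the state before skipping the command lets the run finish.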
>> 
>>> If you need more information to debug this some more or test a proper
>>> fix then I am happy to help.
>> 
>> Please try the one included here. I think it will address things. I’ve tried it on one system, and am trying it on others in parallel to sending this.
>> 
> 
> Yup, works fine. Thanks for the quick fix!

Will push it in. Thanks.

Warner