[Bug 248368] devd MEDIACHANGE events are delivered unreliably for /dev/cdX devices

From: <bugzilla-noreply_at_freebsd.org>
Date: Sun, 25 Sep 2022 19:31:09 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=248368

Warner Losh <imp@FreeBSD.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |imp@FreeBSD.org

--- Comment #4 from Warner Losh <imp@FreeBSD.org> ---


cam_status = 0xcc. That's CAM_SCSI_STATUS_ERROR = 0x0c with
        /* The DEV queue is frozen w/this err */
        CAM_DEV_QFRZN           = 0x40,
        /* Autosense data valid for target */
        CAM_AUTOSNS_VALID       = 0x80,
also set. This makes sense.
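
The decomposition above can be checked mechanically. This is a minimal sketch, not code from the tree: the three constants are the ones quoted in this comment, while CAM_STATUS_MASK = 0x3f (the mask separating the base status from the flag bits) and the helper name decode_cam_status are assumptions made for illustration.

```python
# Values quoted in the comment above.
CAM_SCSI_STATUS_ERROR = 0x0C
CAM_DEV_QFRZN = 0x40
CAM_AUTOSNS_VALID = 0x80
# Assumption: the low six bits carry the base status code.
CAM_STATUS_MASK = 0x3F


def decode_cam_status(value):
    """Split a cam_status value into (base status, list of flag names)."""
    flags = []
    if value & CAM_DEV_QFRZN:
        flags.append("CAM_DEV_QFRZN")
    if value & CAM_AUTOSNS_VALID:
        flags.append("CAM_AUTOSNS_VALID")
    return value & CAM_STATUS_MASK, flags


base, flags = decode_cam_status(0xCC)
print(hex(base), flags)
```

With 0xcc as input, the base status comes out as 0x0c and both flags are reported as set.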

!system=CAM subsystem=periph type=error device=cd0 serial="VB2-01700376"
cam_status="0xcc" scsi_status=2 scsi_sense="70 02 3a 00" CDB="00 00 00 00 00 00 "

CDB is a TUR (Test Unit Ready) sent to the drive.

0x70 is the 'error code' for CURRENT ERROR.
0x02 is the sense key. This is SSD_KEY_NOT_READY
asc (additional sense code) is 0x3a
ascq (additional sense code qualifier) is 0x00

0x3a 0x00 is 'media not present'
0x3a 0x01 is 'media not present, tray closed' (seen later)
0x3a 0x02 is 'media not present, tray open'
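
As a rough illustration of the decoding walked through above, the following sketch splits the four-byte scsi_sense string from the devd event into error code, sense key, asc and ascq. It assumes the field layout is exactly as described in this comment; the lookup table only contains the three 0x3a codes listed above, and decode_scsi_sense is a hypothetical helper name.

```python
SSD_KEY_NOT_READY = 0x02

# Only the asc/ascq pairs listed in this comment; not a complete table.
ASC_ASCQ = {
    (0x3A, 0x00): "media not present",
    (0x3A, 0x01): "media not present, tray closed",
    (0x3A, 0x02): "media not present, tray open",
}


def decode_scsi_sense(text):
    """Decode a devd scsi_sense string such as '70 02 3a 00'."""
    error_code, sense_key, asc, ascq = (int(b, 16) for b in text.split())
    return {
        "error_code": error_code,
        "sense_key": sense_key,
        "asc": asc,
        "ascq": ascq,
        "meaning": ASC_ASCQ.get((asc, ascq), "not in this table"),
    }


print(decode_scsi_sense("70 02 3a 00"))
```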

so this seems legit. We don't report this error anywhere except devd, since we
report all errors (even ignored ones) there.

The cd driver currently will turn off the 'seen media' bit if it gets a 0x3a
asc (regardless of the qualifier) and call the media gone stuff.

So after the media is removed on real hardware, we see:
!system=CAM subsystem=periph type=error device=cd0 serial="VB2-01700376"
cam_status="0xcc" scsi_status=2 scsi_sense="70 02 3a 00" CDB="1e 00 00 00 01 00 "
!system=CAM subsystem=periph type=error device=cd0 serial="VB2-01700376"
cam_status="0xcc" scsi_status=2 scsi_sense="70 02 3a 00" CDB="25 00 00 00 00 00 00 00 00 00 "

1E is 'prevent / allow media removal'
25 is 'read capacity'

So it seems to be trying to find the capacity of media that isn't there.
But it gives up after 3 retries, and we see the same stream of TUR + media not
present.
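
For anyone wanting to follow these events in a script, here is a small sketch that splits one devd event into key/value pairs and names the CDB opcode. It assumes the event has been joined back onto a single line and that values follow the key=value / key="quoted value" shape seen above; the opcode table only covers the three commands mentioned in this comment, and both function names are hypothetical.

```python
import shlex

# Only the opcodes discussed in this comment.
OPCODES = {
    0x00: "test unit ready",
    0x1E: "prevent / allow media removal",
    0x25: "read capacity",
}


def parse_devd_event(line):
    """Turn '!key=value key="quoted value" ...' into a dict."""
    fields = {}
    for token in shlex.split(line.lstrip("!")):
        key, _, value = token.partition("=")
        fields[key] = value
    return fields


def cdb_opcode_name(cdb):
    """Name the command from the first byte of a CDB hex string."""
    opcode = int(cdb.split()[0], 16)
    return OPCODES.get(opcode, "unknown (0x%02x)" % opcode)


line = ('!system=CAM subsystem=periph type=error device=cd0 '
        'serial="VB2-01700376" cam_status="0xcc" scsi_status=2 '
        'scsi_sense="70 02 3a 00" CDB="1e 00 00 00 01 00 "')
event = parse_devd_event(line)
print(event["device"], cdb_opcode_name(event["CDB"]))
```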

So for closing the drive on real hardware:
<--- upon closing the emptied caddy
!system=CAM subsystem=periph type=error device=cd0 serial="KYME48F3455"
cam_status="0xcc" scsi_status=2 scsi_sense="70 02 04 01" CDB="00 00 00 00 00 00 "
!system=CAM subsystem=periph type=error device=cd0 serial="KYME48F3455"
cam_status="0xcc" scsi_status=2 scsi_sense="70 02 3a 01" CDB="00 00 00 00 00 00 "

ASC/ASCQ of 04/01 is 'Logical Unit is becoming ready', so that makes sense (it
takes the CD drive a while to realize it has or does not have a disc). It's a
transient condition that may, or may not, happen.

ASC/ASCQ of 3a/01 is 'drive closed, no media'

I think that the 'upon opening drive' is two lines too late, since we destroy
the media in the error callback.

So the error is that we don't create/destroy the media for VirtualBox, but do
for actual hardware.

The cd code looks like it's getting an ASC/ASCQ of 28/00, which is media change.
It also looks like we never saw media (which is why we never destroy it), so the
'saw media' bit looks like it's getting cleared somewhere else without a
corresponding disk_media_gone() callback. That looks like it can happen when
there's an error (any error) while we're trying to do the prevent/allow stuff.
So there may be a different ordering with VirtualBox that's leading to this
situation.
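
To make the suspected ordering problem concrete, here is a toy model of the hypothesis only. It is not the cd driver's code: the class, its method names and the event strings are all made up for illustration, and it assumes the 'media gone' notification is only sent while the 'saw media' bit is still set.

```python
class ToyCdState:
    """Toy model of the 'saw media' bit and the notifications it gates."""

    def __init__(self):
        self.saw_media = False
        self.events = []  # what a listener such as devd would observe

    def media_inserted(self):
        self.saw_media = True
        self.events.append("media_changed")

    def sense_no_media(self):
        # asc 0x3a path: notify only if we believed media was present.
        if self.saw_media:
            self.saw_media = False
            self.events.append("media_gone")

    def prevent_allow_error(self):
        # Hypothesized path: the bit is cleared with no notification.
        self.saw_media = False


# Ordering that behaves as expected.
ok = ToyCdState()
ok.media_inserted()
ok.sense_no_media()

# Suspected ordering: an error clears the bit before 0x3a is seen.
bad = ToyCdState()
bad.media_inserted()
bad.prevent_allow_error()
bad.sense_no_media()

print(ok.events, bad.events)
```

In the second ordering the listener never sees a 'media gone' event, which would match the missing destroy described above if the hypothesis holds.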

Not sure I have time to come up with a patch, and it doesn't make 100% sense to
me (so I'm thinking there's some missing information).
