Device nodes (/dev/daX) persist even after removing the device on FreeBSD 10.3-RELEASE

Kenneth D. Merry ken at FreeBSD.ORG
Thu Oct 6 15:31:21 UTC 2016


On Thu, Oct 06, 2016 at 20:19:59 +0530, Sumit Saxena wrote:
> Hi,
> 
> While testing a Broadcom/LSI MegaRAID Invader controller on
> FreeBSD 10.3-RELEASE, I am seeing a problem where the device node /dev/daX
> is not removed even after the device is gone (removed/deleted).
> The setup has multiple virtual disks (VDs) created behind the MegaRAID
> controller, and I am running I/O on these VDs. I observed that sometimes,
> after VDs are deleted using Broadcom's management application, the device
> nodes of these VDs are still present.

This is a known problem.  (At least I know about it...)

> However, the device is not seen in "camcontrol devlist" output. This issue
> is intermittent in nature.
> 
> Please find below some information/data collected from setup-
> 
> 1. Output of camcontrol devlist:
> 
> root@MyBsd:~ # camcontrol devlist
> <SEAGATE ST300MM0026 0003>         at scbus1 target 50 lun 0 (pass0,da0)
> <AHCI SGPIO Enclosure 1.00 0001>   at scbus8 target 0 lun 0 (pass1,ses0)
> <DELL MD1220 1.01>                 at scbus10 target 0 lun 0 (pass2,ses1)
> <DELL MD1220 1.01>                 at scbus10 target 2 lun 0 (ses2,pass3)

If you try 'camcontrol devlist -v' you'll see devices that have been
invalidated.

> 2. See below dev nodes(/dev/daX)-
> 
> root@MyBsd:~ # ls /dev/da*
> /dev/da0	/dev/da0p1	/dev/da0p2	/dev/da0p3	/dev/da1
> /dev/da2	/dev/da37
> 
> da0 is the OS drive. The rest of these drives ("da1", "da2" and "da37")
> have been deleted, but their dev nodes are still there with a size of
> 0 bytes. I tried rescanning (camcontrol rescan all) but it does not help.
> 
> 
> 3. OS:
> root@MyBsd:~ # uname -a
> FreeBSD MyBsd 10.3-RELEASE FreeBSD 10.3-RELEASE #0 r297264: Fri Mar 25
> 02:10:02 UTC 2016
> root@releng1.nyi.freebsd.org:/usr/obj/usr/src/sys/GENERIC  amd64
> 
> 4.  dmesg logs pertaining to /dev/da37-
> 
> Oct  6 03:17:48 MyBsd kernel: da37: mrsas1:0:<Cisco MR9361-8i 4.67>35: s/n
> 00f5fd11568e82871f50d3c408b005060):  detached
> Oct  6 03:17:48 MyBsd kernel: Periph destroyed
> Oct  6 03:17:48 MyBsd kernel: (da37:
> Oct  6 03:17:48 MyBsd kernel: mrsas1:0:
> Oct  6 03:17:48 MyBsd kernel: 36:0): Periph destroyed
> ----
> Oct  6 03:18:05 MyBsd kernel: (da35:mrsas1:0:50:0): got CAM status 0x208
> Oct  6 03:18:05 MyBsd kernel: (da35:mrsas1:0:50:0): fatal error, failed to
> attach to device
> Oct  6 03:18:05 MyBsd kernel: (da37:mrsas1:0:51:0): got CAM status 0x208
> Oct  6 03:18:05 MyBsd kernel: (da37:mrsas1:0:51:0): fatal error, failed to
> attach to device
> ----
> Oct  6 05:08:17 MyBsd kernel: cam_periph_alloc: attempt to re-allocate
> valid device da37 rejected flags 0x18 refcount 1
> ------------------------> Here the OS is probably trying to assign da37 to
> some newly created device, but it looks like da37 still has an older
> reference.
> Oct  6 05:08:17 MyBsd kernel: daasync: Unable to attach to new device due
> to status 0x6
> 
> 
> 5. Device info for da37-
> 
> root@MyBsd:/var/log # geom disk list da37
> Geom name: da37
> Providers:
> 1. Name: da37
>    Mediasize: 0 (0B)
>    Sectorsize: 0
>    Mode: r0w0e0
>    descr: LSI MR9361-8i
>    lunid: 600605b008c4d3501f8782ab57c759db
>    ident: 00db59c757ab82871f50d3c408b00506
>    rotationrate: unknown
>    fwsectors: 0
>    fwheads: 0
> 
> root@MyBsd:/var/log #
> 
> 6. After this I am able to create/delete VDs, and bus scans are triggered
> multiple times, but these dev nodes (da2, da3, da37) still exist.
> 
> Please let me know if anyone faced this problem. Any pointers/help will be
> very much appreciated.

I had patches under stable/10 (we've been using them at Spectra Logic for a
few years) that fixed the problem, but the devfs component of the changes
wasn't quite right (according to kib@).  In any case, those patches (right
or wrong) don't fully fix the problems on head or stable/11.

I've recently gotten back on trying to get to the bottom of it.

I upstreamed some of the GEOM disk fixes (see changes 302069, 302071,
302087, and 302150) in June.  Only one of those changes (302087) can be
MFCed, and it looks like I forgot to do that.

The symptoms you're seeing are typical, and there is no good workaround
for it.  You can either not remove devices, or reboot to clear things up.

The da(4) driver (and all CAM peripheral drivers) have reference counts for
anything outside of the peripheral driver that depends on that peripheral.
Once the device is invalidated, the peripheral instance waits until all of
those references are released before it fully goes away.

If the device comes back before the old instance is fully cleaned up, we
defer attaching to the device again until all references to the previous
version of it are released.
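
As a rough illustration of that lifecycle, here is a toy userland model in
C.  It is not the CAM code: the struct, the PERIPH_INVALID flag, the fixed
table, and every function name below are hypothetical and exist only to
show the idea.  Each instance starts with one reference held by the driver,
outside consumers (think GEOM or devfs) take extra references, invalidation
drops the driver's reference, and a new instance with the same name is
refused until the last reference is gone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MAX_PERIPHS    8
#define PERIPH_INVALID 0x1      /* hypothetical flag: device has gone away */

/* Toy stand-in for a peripheral instance; not the real CAM structure. */
struct periph {
    char name[16];              /* e.g. "da37" */
    int  refcount;              /* driver's own reference plus outside holders */
    int  flags;
    bool in_use;                /* table slot is occupied */
};

static struct periph table[MAX_PERIPHS];

/* Look up a live or lingering instance by name. */
static struct periph *periph_find(const char *name)
{
    for (size_t i = 0; i < MAX_PERIPHS; i++)
        if (table[i].in_use && strcmp(table[i].name, name) == 0)
            return &table[i];
    return NULL;
}

/* Attach a new instance. Refused (NULL) while an older instance with the
 * same name still exists, i.e. while someone still holds a reference. */
static struct periph *periph_alloc(const char *name)
{
    if (periph_find(name) != NULL)
        return NULL;
    for (size_t i = 0; i < MAX_PERIPHS; i++) {
        if (!table[i].in_use) {
            snprintf(table[i].name, sizeof(table[i].name), "%s", name);
            table[i].refcount = 1;      /* the driver's own reference */
            table[i].flags = 0;
            table[i].in_use = true;
            return &table[i];
        }
    }
    return NULL;                        /* table full */
}

/* An outside consumer starts depending on the instance. */
static void periph_acquire(struct periph *p)
{
    p->refcount++;
}

/* Drop one reference; the slot is freed only when the last one goes. */
static void periph_release(struct periph *p)
{
    assert(p->refcount > 0);
    if (--p->refcount == 0)
        p->in_use = false;
}

/* The device went away: mark it invalid and drop the driver's reference.
 * The instance lingers for as long as outside references remain. */
static void periph_invalidate(struct periph *p)
{
    if (p->flags & PERIPH_INVALID)
        return;                         /* already invalidated */
    p->flags |= PERIPH_INVALID;
    periph_release(p);
}
```

In this model, a sequence of periph_alloc("da37"), periph_acquire(),
periph_invalidate() leaves the instance in the table with a refcount of 1,
and a second periph_alloc("da37") returns NULL until the holder calls
periph_release().  A reference that is never released, which is what seems
to be happening in GEOM or devfs here, leaves the name stuck.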

The most likely source of the problem is either in GEOM or devfs.  It could
be either in 10.3.

From your standpoint (testing the MegaRAID driver on FreeBSD releases), I
would just move on in your testing and write this up as an OS bug that you
can't fix in your driver.  We'll eventually get it fixed.

Ken
-- 
Kenneth Merry
ken at FreeBSD.ORG

