(da2:ciss1:0:0:0): Periph destroyed

Wed Sep 2 17:23:48 UTC 2015

On 09/02/15 09:30, Per olof Ljungmark wrote:
> Hi,
> 
> Recent 10-STABLE, HP D2600 with 12 SATA drives in RAID10 via a P812
> controller, 7TB capacity as one volume, ZFS.
> 
> If I pull a drive from the array, the following occurs and I am not sure
> about the logic here because the array is still intact and no data loss
> occurs.
> 
> Despite that, the volume is gone.
> 
> # zpool clear imap
> cannot clear errors for imap: I/O error
> 
> # zpool online imap da2
> cannot online da2: pool I/O is currently suspended
> 
> Only a reboot helped and then the pool came up just fine, no errors, but
> that is not exactly what you want on a production box.
> 
> Did I miss something?
> 
> Would
> geli_autodetach="NO"
> help?
> 
> syslog output:
> 
> Sep  2 17:55:19 <kern.crit> str kernel: ciss1: *** Hot-plug drive
> removed, Port=1E Box=1 Bay=2 SN=            Z4Z2S9SD
> Sep  2 17:55:19 <kern.crit> str kernel: ciss1: *** Physical drive
> failure, Port=1E Box=1 Bay=2
> Sep  2 17:55:19 <kern.crit> str kernel: ciss1: *** State change, logical
> drive 0, new state=REGENING
> Sep  2 17:55:19 <kern.crit> str kernel: ciss1: logical drive 0 (da2)
> changed status OK->interim recovery, spare status 0x21<configured>
> Sep  2 17:55:19 <kern.crit> str kernel: ciss1: *** State change, logical
> drive 0, new state=NEEDS_REBUILD
> Sep  2 17:55:19 <kern.crit> str kernel: ciss1: logical drive 0 (da2)
> changed status interim recovery->ready for recovery, spare status
> 0x11<configured,available>
> Sep  2 17:55:19 <kern.crit> str kernel: da2 at ciss1 bus 0 scbus2 target
> 0 lun 0
> Sep  2 17:55:19 <kern.crit> str kernel: da2: <HP RAID 1(1+0) read> s/n
> PAGXQ0BRH1W0WA detached
> Sep  2 17:55:19 <kern.crit> str kernel: (da2:ciss1:0:0:0): Periph destroyed
> Sep  2 17:55:19 <user.notice> str devd: Executing 'logger -p kern.notice
> -t ZFS 'vdev is removed, pool_guid=13539160044045520113
> vdev_guid=1325849881310347579''
> Sep  2 17:55:19 <user.notice> str ZFS: vdev is removed,
> pool_guid=13539160044045520113 vdev_guid=1325849881310347579
> Sep  2 17:55:19 <kern.crit> str kernel: (da2:ciss1:0:0:0): fatal error,
> could not acquire reference count
> Sep  2 17:55:23 <kern.crit> str kernel: ciss1: *** State change, logical
> drive 0, new state=REBUILDING
> Sep  2 17:55:23 <kern.crit> str kernel: ciss1: logical drive 0 (da2)
> changed status ready for recovery->recovering, spare status
> 0x13<configured,rebuilding,available>
> Sep  2 17:55:23 <kern.crit> str kernel: cam_periph_alloc: attempt to
> re-allocate valid device da2 rejected flags 0x18 refcount 1
> Sep  2 17:55:23 <kern.crit> str kernel: daasync: Unable to attach to new
> device due to status 0x6

This looks like a bug I introduced at r249170.  Now that I stare deeply
into the abyss of ciss(4), I think the entire change is wrong.

Do you want to try reverting that change in your kernel and rebuilding
for a test?  I no longer have access to ciss(4) hardware and cannot
verify it myself.

sean

ref
https://svnweb.freebsd.org/base/head/sys/dev/ciss/ciss.c?r1=249170&r2=249169&pathrev=249170
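
The revert-and-rebuild test suggested above could be sketched roughly as
follows. This is only an illustration, not something from the thread: it
assumes the sources live in /usr/src, that the kernel config is GENERIC,
that svnlite(1) from the base system can reach a Subversion mirror at the
URL shown, and that the r249170 diff still reverse-applies cleanly to the
10-STABLE copy of ciss.c (it may not). The script prints the commands
rather than running them, so each step can be checked before use.

```sh
#!/bin/sh
# Sketch only: prints a revert-and-rebuild plan instead of running it.
# Assumptions (not from the thread): sources in /usr/src, GENERIC kernel
# config, svnlite(1) from the base system, a reachable Subversion mirror.
SRC="${SRC:-/usr/src}"
KERNCONF="${KERNCONF:-GENERIC}"
REV=249170
URL="https://svn.freebsd.org/base/head/sys/dev/ciss/ciss.c"

revert_plan() {
    # 1. Reverse-apply the r249170 change to the local ciss.c.
    echo "cd ${SRC}/sys/dev/ciss && svnlite diff -c ${REV} ${URL} | patch -R"
    # 2. Rebuild and install the kernel, then reboot into it.
    echo "cd ${SRC} && make buildkernel KERNCONF=${KERNCONF}"
    echo "cd ${SRC} && make installkernel KERNCONF=${KERNCONF}"
    echo "shutdown -r now"
}

revert_plan
```

If patch(1) reports rejected hunks, the change would have to be backed out
by hand using the svnweb diff linked above as a guide. After rebooting,
pulling a drive again and watching for the "could not acquire reference
count" message would show whether the revert helps.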