(da2:ciss1:0:0:0): Periph destroyed

Per olof Ljungmark peo at intersonic.se
Wed Sep 2 17:59:38 UTC 2015


On 2015-09-02 19:23, Sean Bruno wrote:
> 
> 
> On 09/02/15 09:30, Per olof Ljungmark wrote:
>> Hi,
>>
>> Recent 10-STABLE, HP D2600 with 12 SATA drives in RAID10 via a P812
>> controller, 7TB capacity as one volume, ZFS.
>>
>> If I pull a drive from the array, the following occurs. I am not sure
>> about the logic here, because the array is still intact and no data
>> loss occurs.
>>
>> Despite that, the volume is gone.
>>
>> # zpool clear imap
>> cannot clear errors for imap: I/O error
>>
>> # zpool online imap da2
>> cannot online da2: pool I/O is currently suspended
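>>
>> (The suspension itself looks expected: the pool's only vdev is the
>> P812 logical volume, so once da2 detaches ZFS has no redundancy left,
>> and with the default failmode=wait the pool suspends all I/O.
>> Assuming the stock setting, this can be confirmed with:
>>
>> # zpool get failmode imap
>>
>> zpool clear is the documented way to resume a suspended pool once the
>> device returns, which is exactly the step that fails here.)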
>>
>> Only a reboot helped; after that the pool came up just fine with no
>> errors. But that is not exactly what you want on a production box.
>>
>> Did I miss something?
>>
>> Would
>> geli_autodetach="NO"
>> in rc.conf help?
>>
>> syslog output:
>>
>> Sep  2 17:55:19 <kern.crit> str kernel: ciss1: *** Hot-plug drive
>> removed, Port=1E Box=1 Bay=2 SN=            Z4Z2S9SD
>> Sep  2 17:55:19 <kern.crit> str kernel: ciss1: *** Physical drive
>> failure, Port=1E Box=1 Bay=2
>> Sep  2 17:55:19 <kern.crit> str kernel: ciss1: *** State change, logical
>> drive 0, new state=REGENING
>> Sep  2 17:55:19 <kern.crit> str kernel: ciss1: logical drive 0 (da2)
>> changed status OK->interim recovery, spare status 0x21<configured>
>> Sep  2 17:55:19 <kern.crit> str kernel: ciss1: *** State change, logical
>> drive 0, new state=NEEDS_REBUILD
>> Sep  2 17:55:19 <kern.crit> str kernel: ciss1: logical drive 0 (da2)
>> changed status interim recovery->ready for recovery, spare status
>> 0x11<configured,available>
>> Sep  2 17:55:19 <kern.crit> str kernel: da2 at ciss1 bus 0 scbus2 target
>> 0 lun 0
>> Sep  2 17:55:19 <kern.crit> str kernel: da2: <HP RAID 1(1+0) read> s/n
>> PAGXQ0BRH1W0WA detached
>> Sep  2 17:55:19 <kern.crit> str kernel: (da2:ciss1:0:0:0): Periph destroyed
>> Sep  2 17:55:19 <user.notice> str devd: Executing 'logger -p kern.notice
>> -t ZFS 'vdev is removed, pool_guid=13539160044045520113
>> vdev_guid=1325849881310347579''
>> Sep  2 17:55:19 <user.notice> str ZFS: vdev is removed,
>> pool_guid=13539160044045520113 vdev_guid=1325849881310347579
>> Sep  2 17:55:19 <kern.crit> str kernel: (da2:ciss1:0:0:0): fatal error,
>> could not acquire reference count
>> Sep  2 17:55:23 <kern.crit> str kernel: ciss1: *** State change, logical
>> drive 0, new state=REBUILDING
>> Sep  2 17:55:23 <kern.crit> str kernel: ciss1: logical drive 0 (da2)
>> changed status ready for recovery->recovering, spare status
>> 0x13<configured,rebuilding,available>
>> Sep  2 17:55:23 <kern.crit> str kernel: cam_periph_alloc: attempt to
>> re-allocate valid device da2 rejected flags 0x18 refcount 1
>> Sep  2 17:55:23 <kern.crit> str kernel: daasync: Unable to attach to new
>> device due to status 0x6
>>
> 
> 
> This looks like a bug I introduced at r249170.  Now that I stare deeply
> into the abyss of ciss(4), I think the entire change is wrong.
> 
> Do you want to try reverting that change from your kernel and
> rebuilding for a test?  I don't have access to ciss(4) hardware any
> longer and cannot verify.
> 

Yes, I can try. The installed revision is r281826, but I assume the
revert can be applied here too?
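For reference, the rough plan (a sketch, assuming a stock svn checkout
of the stable/10 source tree; since r249170 went into head, the reverse
merge may not apply cleanly here and could need manual conflict
resolution, presumably in sys/dev/ciss/ciss.c):

# cd /usr/src
# svn merge -c -249170 .          (reverse-merge the suspect revision)
# make buildkernel && make installkernel
# shutdown -r now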

