(da2:ciss1:0:0:0): Periph destroyed

Per olof Ljungmark peo at intersonic.se
Wed Sep 2 19:02:14 UTC 2015


On 2015-09-02 20:40, Sean Bruno wrote:
> 
> 
> On 09/02/15 10:59, Per olof Ljungmark wrote:
>> On 2015-09-02 19:23, Sean Bruno wrote:
>>>
>>>
>>> On 09/02/15 09:30, Per olof Ljungmark wrote:
>>>> Hi,
>>>>
>>>> Recent 10-STABLE, HP D2600 with 12 SATA drives in RAID10 via a
>>>> P812 controller, 7TB capacity as one volume, ZFS.
>>>>
>>>> If I pull a drive from the array, the following occurs and I am
>>>> not sure about the logic here because the array is still intact
>>>> and no data loss occurs.
>>>>
>>>> Despite that, the volume is gone.
>>>>
>>>> # zpool clear imap
>>>> cannot clear errors for imap: I/O error
>>>>
>>>> # zpool online imap da2
>>>> cannot online da2: pool I/O is currently suspended
>>>>
>>>> Only a reboot helped and then the pool came up just fine, no
>>>> errors, but that is not exactly what you want on a production
>>>> box.
>>>>
>>>> Did I miss something?
>>>>
>>>> Would geli_autodetach="NO" help?
>>>>
>>>> syslog output:
>>>>
>>>> Sep  2 17:55:19 <kern.crit> str kernel: ciss1: *** Hot-plug drive removed, Port=1E Box=1 Bay=2 SN=            Z4Z2S9SD
>>>> Sep  2 17:55:19 <kern.crit> str kernel: ciss1: *** Physical drive failure, Port=1E Box=1 Bay=2
>>>> Sep  2 17:55:19 <kern.crit> str kernel: ciss1: *** State change, logical drive 0, new state=REGENING
>>>> Sep  2 17:55:19 <kern.crit> str kernel: ciss1: logical drive 0 (da2) changed status OK->interim recovery, spare status 0x21<configured>
>>>> Sep  2 17:55:19 <kern.crit> str kernel: ciss1: *** State change, logical drive 0, new state=NEEDS_REBUILD
>>>> Sep  2 17:55:19 <kern.crit> str kernel: ciss1: logical drive 0 (da2) changed status interim recovery->ready for recovery, spare status 0x11<configured,available>
>>>> Sep  2 17:55:19 <kern.crit> str kernel: da2 at ciss1 bus 0 scbus2 target 0 lun 0
>>>> Sep  2 17:55:19 <kern.crit> str kernel: da2: <HP RAID 1(1+0) read> s/n PAGXQ0BRH1W0WA detached
>>>> Sep  2 17:55:19 <kern.crit> str kernel: (da2:ciss1:0:0:0): Periph destroyed
>>>> Sep  2 17:55:19 <user.notice> str devd: Executing 'logger -p kern.notice -t ZFS 'vdev is removed, pool_guid=13539160044045520113 vdev_guid=1325849881310347579''
>>>> Sep  2 17:55:19 <user.notice> str ZFS: vdev is removed, pool_guid=13539160044045520113 vdev_guid=1325849881310347579
>>>> Sep  2 17:55:19 <kern.crit> str kernel: (da2:ciss1:0:0:0): fatal error, could not acquire reference count
>>>> Sep  2 17:55:23 <kern.crit> str kernel: ciss1: *** State change, logical drive 0, new state=REBUILDING
>>>> Sep  2 17:55:23 <kern.crit> str kernel: ciss1: logical drive 0 (da2) changed status ready for recovery->recovering, spare status 0x13<configured,rebuilding,available>
>>>> Sep  2 17:55:23 <kern.crit> str kernel: cam_periph_alloc: attempt to re-allocate valid device da2 rejected flags 0x18 refcount 1
>>>> Sep  2 17:55:23 <kern.crit> str kernel: daasync: Unable to attach to new device due to status 0x6
>>>> _______________________________________________
>>>> freebsd-scsi at freebsd.org mailing list
>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-scsi
>>>> To unsubscribe, send any mail to "freebsd-scsi-unsubscribe at freebsd.org"
>>>>
>>>
>>>
>>> This looks like a bug I introduced at r249170.  Now that I stare
>>> deeply into the abyss of ciss(4), I think the entire change is
>>> wrong.
>>>
>>> Do you want to try and revert that change from your kernel and
>>> rebuild for a test?  I don't have access to ciss(4) hardware
>>> any longer and cannot verify.
>>>
> 
>> Yes, I can try. The installed rev is 281826 but I assume the change
>> can apply here too? 
> 
> 
> 
> yeah, I think a "svn merge -c -249170" from /usr/src should do it if
> you are managing your system from svn
> 

Sep  2 20:54:05 <kern.crit> str kernel: ciss1: *** Hot-plug drive removed, Port=1E Box=1 Bay=3 SN=            W4Z1G4BD
Sep  2 20:54:05 <kern.crit> str kernel: ciss1: *** Physical drive failure, Port=1E Box=1 Bay=3
Sep  2 20:54:50 <kern.crit> str kernel: ciss1: *** Hot-plug drive inserted, Port=1E Box=1 Bay=3 SN=     WD-WMC1P0F66XVC
Sep  2 20:54:50 <kern.crit> str kernel: ciss1: *** HP Array Controller Firmware Ver = 6.64, Build Num = 0


Right, this time it survived; the volume did not detach after reverting.

If this change does not cause any other problems, do you think it can go
into -STABLE?

Thanks!

//per

