(da2:ciss1:0:0:0): Periph destroyed

Sean Bruno sbruno at freebsd.org
Wed Sep 2 19:05:51 UTC 2015





On 09/02/15 12:02, Per olof Ljungmark wrote:
> On 2015-09-02 20:40, Sean Bruno wrote:
>> 
>> 
>> On 09/02/15 10:59, Per olof Ljungmark wrote:
>>> On 2015-09-02 19:23, Sean Bruno wrote:
>>>> 
>>>> 
>>>> On 09/02/15 09:30, Per olof Ljungmark wrote:
>>>>> Hi,
>>>>> 
>>>>> Recent 10-STABLE, HP D2600 with 12 SATA drives in RAID10 
>>>>> via a P812 controller, 7TB capacity as one volume, ZFS.
>>>>> 
>>>>> If I pull a drive from the array, the following occurs and 
>>>>> I am not sure about the logic here because the array is 
>>>>> still intact and no data loss occurs.
>>>>> 
>>>>> Despite that, the volume is gone.
>>>>> 
>>>>> # zpool clear imap
>>>>> cannot clear errors for imap: I/O error
>>>>> 
>>>>> # zpool online imap da2
>>>>> cannot online da2: pool I/O is currently suspended
>>>>> 
>>>>> Only a reboot helped and then the pool came up just fine, 
>>>>> no errors, but that is not exactly what you want on a 
>>>>> production box.
>>>>> 
>>>>> Did I miss something?
>>>>> 
>>>>> Would geli_autodetach="NO" help?
>>>>> 
>>>>> syslog output:
>>>>> 
>>>>> Sep  2 17:55:19 <kern.crit> str kernel: ciss1: *** Hot-plug drive removed, Port=1E Box=1 Bay=2 SN= Z4Z2S9SD
>>>>> Sep  2 17:55:19 <kern.crit> str kernel: ciss1: *** Physical drive failure, Port=1E Box=1 Bay=2
>>>>> Sep  2 17:55:19 <kern.crit> str kernel: ciss1: *** State change, logical drive 0, new state=REGENING
>>>>> Sep  2 17:55:19 <kern.crit> str kernel: ciss1: logical drive 0 (da2) changed status OK->interim recovery, spare status 0x21<configured>
>>>>> Sep  2 17:55:19 <kern.crit> str kernel: ciss1: *** State change, logical drive 0, new state=NEEDS_REBUILD
>>>>> Sep  2 17:55:19 <kern.crit> str kernel: ciss1: logical drive 0 (da2) changed status interim recovery->ready for recovery, spare status 0x11<configured,available>
>>>>> Sep  2 17:55:19 <kern.crit> str kernel: da2 at ciss1 bus 0 scbus2 target 0 lun 0
>>>>> Sep  2 17:55:19 <kern.crit> str kernel: da2: <HP RAID 1(1+0) read> s/n PAGXQ0BRH1W0WA detached
>>>>> Sep  2 17:55:19 <kern.crit> str kernel: (da2:ciss1:0:0:0): Periph destroyed
>>>>> Sep  2 17:55:19 <user.notice> str devd: Executing 'logger -p kern.notice -t ZFS 'vdev is removed, pool_guid=13539160044045520113 vdev_guid=1325849881310347579''
>>>>> Sep  2 17:55:19 <user.notice> str ZFS: vdev is removed, pool_guid=13539160044045520113 vdev_guid=1325849881310347579
>>>>> Sep  2 17:55:19 <kern.crit> str kernel: (da2:ciss1:0:0:0): fatal error, could not acquire reference count
>>>>> Sep  2 17:55:23 <kern.crit> str kernel: ciss1: *** State change, logical drive 0, new state=REBUILDING
>>>>> Sep  2 17:55:23 <kern.crit> str kernel: ciss1: logical drive 0 (da2) changed status ready for recovery->recovering, spare status 0x13<configured,rebuilding,available>
>>>>> Sep  2 17:55:23 <kern.crit> str kernel: cam_periph_alloc: attempt to re-allocate valid device da2 rejected flags 0x18 refcount 1
>>>>> Sep  2 17:55:23 <kern.crit> str kernel: daasync: Unable to attach to new device due to status 0x6
>>>>> 
>>>> 
>>>> 
>>>> This looks like a bug I introduced at r249170.  Now that I 
>>>> stare deeply into the abyss of ciss(4), I think the entire 
>>>> change is wrong.
>>>> 
>>>> Do you want to try reverting that change from your kernel 
>>>> and rebuilding for a test?  I don't have access to ciss(4) 
>>>> hardware any longer and cannot verify.
>>>> 
>> 
>>> Yes, I can try.  The installed rev is r281826, but I assume the 
>>> revert will apply here too?
>> 
>> 
>> 
>> Yeah, I think an "svn merge -c -249170" from /usr/src should do
>> it, if you are managing your system from svn.
>> 
> 
> Sep  2 20:54:05 <kern.crit> str kernel: ciss1: *** Hot-plug drive removed, Port=1E Box=1 Bay=3 SN=            W4Z1G4BD
> Sep  2 20:54:05 <kern.crit> str kernel: ciss1: *** Physical drive failure, Port=1E Box=1 Bay=3
> Sep  2 20:54:50 <kern.crit> str kernel: ciss1: *** Hot-plug drive inserted, Port=1E Box=1 Bay=3 SN= WD-WMC1P0F66XVC
> Sep  2 20:54:50 <kern.crit> str kernel: ciss1: *** HP Array Controller Firmware Ver = 6.64, Build Num = 0
> 
> 
> Right, this time it survived; the volume did not detach after 
> reverting.
> 
> If this change does not cause any other problems, do you think it 
> can go into -STABLE?
> 
> Thanks!
> 
> //per
> 

Definitely.  I'll yank it out today and set up a 3-day MFC.
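
For anyone who wants to test the revert locally before the MFC lands, the
rough sequence would be something like the following (this assumes you track
stable/10 via svn in /usr/src and build the GENERIC kernel config; adjust
KERNCONF to whatever you actually run):

  # revert the suspect change in the checked-out source tree
  cd /usr/src
  svn merge -c -249170 .

  # rebuild and install the kernel, then reboot to pick it up
  make buildkernel KERNCONF=GENERIC
  make installkernel KERNCONF=GENERIC
  shutdown -r now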

sean

