New ZFSv28 patchset for 8-STABLE

Sun Jan 9 11:49:30 UTC 2011

  On 01/09/2011 10:00 AM, Attila Nagy wrote:
>  On 12/16/2010 01:44 PM, Martin Matuska wrote:
>> Hi everyone,
>>
>> following the announcement of Pawel Jakub Dawidek (pjd at FreeBSD.org) I am
>> providing a ZFSv28 testing patch for 8-STABLE.
>>
>> Link to the patch:
>>
>> http://people.freebsd.org/~mm/patches/zfs/v28/stable-8-zfsv28-20101215.patch.xz 
>>
>>
> I've hit an I/O hang with dedup enabled (not sure it's related; I had
> started rewriting all data on the pool, which generates a heavy load):
>
> The processes are in various states:
> 65747   1001      1  54   10 28620K 24360K tx->tx  0   6:58  0.00% cvsup
> 80383   1001      1  54   10 40616K 30196K select  1   5:38  0.00% rsync
>  1501 www         1  44    0  7304K  2504K zio->i  0   2:09  0.00% nginx
>  1479 www         1  44    0  7304K  2416K zio->i  1   2:03  0.00% nginx
>  1477 www         1  44    0  7304K  2664K zio->i  0   2:02  0.00% nginx
>  1487 www         1  44    0  7304K  2376K zio->i  0   1:40  0.00% nginx
>  1490 www         1  44    0  7304K  1852K zfs     0   1:30  0.00% nginx
>  1486 www         1  44    0  7304K  2400K zfsvfs  1   1:05  0.00% nginx
>
> And every process that touches the pool hangs as well.
>
> procstat shows this kernel stack for one of the processes:
> # procstat -k 1497
>   PID    TID COMM             TDNAME           KSTACK
>  1497 100257 nginx            -                mi_switch sleepq_wait __lockmgr_args vop_stdlock VOP_LOCK1_APV null_lock VOP_LOCK1_APV _vn_lock nullfs_root lookup namei vn_open_cred kern_openat syscallenter syscall Xfast_syscall
No, it's not related to dedup. One of the disks in the RAIDZ2 pool went bad:
(da4:arcmsr0:0:4:0): READ(6). CDB: 8 0 2 10 10 0
(da4:arcmsr0:0:4:0): CAM status: SCSI Status Error
(da4:arcmsr0:0:4:0): SCSI status: Check Condition
(da4:arcmsr0:0:4:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
and that seems to have frozen the whole zpool. Removing the disk by hand
solved the problem.
I've seen this before on other machines using the ciss driver.
I wonder why ZFS didn't fault the disk and drop it from the pool on its own.