8.1R ZFS almost locking up system

Sergey Zaharchenko doublef-ctm at yandex.ru
Sun Sep 5 19:42:06 UTC 2010


Hello list!

Dan Nelson wrote:

> In the last episode (Aug 31), Tim Bishop said:
> > On Sat, Aug 21, 2010 at 05:24:29PM -0500, Dan Nelson wrote:
> > > In the last episode (Aug 21), Tim Bishop said:
> > > > A few items from top, including zfskern:
> > > > 
> > > >   PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
> > > >     5 root        4  -8    -     0K    60K zio->i  0  54:38  3.47% zfskern
> > > > 91775     70      1  44    0 53040K 31144K tx->tx  1   2:11  0.00% postgres
> > > > 39661 tdb         1  44    0 55776K 32968K tx->tx  0   0:39  0.00% mutt
> > > > 14828 root        1  47    0 14636K  1572K tx->tx  1   0:03  0.00% zfs
> > > > 11188 root        1  51    0 14636K  1572K tx->tx  0   0:03  0.00% zfs
> > > > 

I'm seeing a similar problem on a remote server (unpatched 8.1-RELEASE,
amd64, quad-core, raidz pool on 8 drives). However, it differs in ways
that seem important. Please advise.

Portions of top:

CPU:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
Mem: 75M Active, 31M Inact, 1352M Wired, 136K Cache, 812M Buf, 6458M Free

35769 root             1  52    0  2740K   744K zilog-  1   0:00  0.00% sync
35625 df               1  44    0 14636K  1856K zio->i  2   0:00  0.00% zfs
35920 root             1  44    0 15668K  1836K scl->s  0   0:00  0.00% zpool
35607 root             1  44    0  8260K  2284K zio->i  0   0:00  0.00% csh

> >     0 100084 kernel           zfs_vn_rele_task mi_switch+0x16f sleepq_wait+0x42 _sleep+0x31c taskqueue_thread_loop+0xb7 fork_exit+0x118 fork_trampoline+0xe 
> >     5 100031 zfskern          arc_reclaim_thre mi_switch+0x16f sleepq_timedwait+0x42 _cv_timedwait+0x129 arc_reclaim_thread+0x2d1 fork_exit+0x118 fork_trampoline+0xe 
> >     5 100032 zfskern          l2arc_feed_threa mi_switch+0x16f sleepq_timedwait+0x42 _cv_timedwait+0x129 l2arc_feed_thread+0x1be fork_exit+0x118 fork_trampoline+0xe 
> >     5 100085 zfskern          txg_thread_enter mi_switch+0x16f sleepq_wait+0x42 _cv_wait+0x111 txg_thread_wait+0x79 txg_quiesce_thread+0xb5 fork_exit+0x118 fork_trampoline+0xe 
> >     5 100086 zfskern          txg_thread_enter mi_switch+0x16f sleepq_wait+0x42 _cv_wait+0x111 zio_wait+0x61 dsl_pool_sync+0xea spa_sync+0x355 txg_sync_thread+0x195 fork_exit+0x118 fork_trampoline+0xe 

procstat -kk -a:

    0 100114 kernel           zfs_vn_rele_task mi_switch+0x16f sleepq_wait+0x42 _sleep+0x31c taskqueue_thread_loop+0xb7 fork_exit+0x118 fork_trampoline+0xe
    0 100123 kernel           zil_clean        mi_switch+0x16f sleepq_wait+0x42 _sleep+0x31c taskqueue_thread_loop+0xb7 fork_exit+0x118 fork_trampoline+0xe
   16 100065 syncer           -                mi_switch+0x16f sleepq_wait+0x42 _cv_wait+0x111 zio_wait+0x61 zil_commit+0x3e1 zfs_sync+0xa6 sync_fsync+0x184 sync_vnode+0x16b sched_sync+0x1c9 fork_exit+0x118 fork_trampoline+0xe
   39 100079 zfskern          arc_reclaim_thre mi_switch+0x16f sleepq_timedwait+0x42 _cv_timedwait+0x129 arc_reclaim_thread+0x2d1 fork_exit+0x118 fork_trampoline+0xe
   39 100086 zfskern          l2arc_feed_threa mi_switch+0x16f sleepq_timedwait+0x42 _cv_timedwait+0x129 l2arc_feed_thread+0x1be fork_exit+0x118 fork_trampoline+0xe
   39 100115 zfskern          txg_thread_enter mi_switch+0x16f sleepq_wait+0x42 _cv_wait+0x111 txg_thread_wait+0x79 txg_quiesce_thread+0xb5 fork_exit+0x118 fork_trampoline+0xe
   39 100116 zfskern          txg_thread_enter mi_switch+0x16f sleepq_wait+0x42 _cv_wait+0x111 zio_wait+0x61 dsl_pool_sync+0xea spa_sync+0x355 txg_sync_thread+0x195
35625 100151 zfs              -                mi_switch+0x16f sleepq_wait+0x42 _cv_wait+0x111 zio_wait+0x61 dbuf_read+0x39a dmu_buf_hold+0xcc zap_lockdir+0x52 zap_cursor_retrieve+0x194 dsl_prop_get_all+0x187 zfs_ioc_objset_stats+0x7b zfsdev_ioctl+0x8d devfs_ioctl_f+0x77 kern_ioctl+0xf6 ioctl+0xfd syscall+0x1e7
35769 100212 sync             -                mi_switch+0x16f sleepq_wait+0x42 _cv_wait+0x111 zil_commit+0x7a zfs_sync+0xa6 sync+0x20e syscall+0x1e7 Xfast_syscall+0xe1
etc. etc...
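(To pull just the ZFS-blocked threads out of the full listing, a grep
along these lines does the job; the pattern is ad hoc, built from the
wait channels visible above:)

%procstat -kk -a | egrep 'zio_wait|zil_commit|txg_thread_wait'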

Looks nearly identical: in both cases the txg_sync thread is stuck in
zio_wait() under spa_sync(), i.e. waiting on disk I/O that never
completes.

But gstat seems different:

dT: 1.008s  w: 1.000s
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
...
    3      0      0      0    0.0      0      0    0.0    0.0  da0
    3      0      0      0    0.0      0      0    0.0    0.0  da1
    3      0      0      0    0.0      0      0    0.0    0.0  da2
   35      0      0      0    0.0      0      0    0.0    0.0  da3
    3      0      0      0    0.0      0      0    0.0    0.0  da4
    2      0      0      0    0.0      0      0    0.0    0.0  da5
    3      0      0      0    0.0      0      0    0.0    0.0  da6
   35      0      0      0    0.0      0      0    0.0    0.0  da7

Note the huge wait queues: da3 and da7 sit at L(q) = 35 while doing
zero ops/s...
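To watch whether those queues ever drain, I can sample just the two
suspect disks. This assumes my gstat is new enough to have batch mode
(-b) and a device filter (-f); the regex is my guess at matching da3
and da7:

%gstat -b -I 1s -f 'da[37]$'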

I've tried reading from the disks with dd and issuing camcontrol
inquiry to them, but all of these attempts hang:
%dd if=/dev/da3 of=/dev/null bs=1024k
load: 0.00  cmd: dd 37770 [physrd] 2.67r 0.00u 0.00s 0% 1168k
[and no input/output ever happens]
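To see exactly where that dd is blocked, its kernel stack can be
grabbed the same way as above (37770 is the pid from the ^T load line;
substitute as appropriate):

%procstat -kk 37770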

So, one could suspect a hardware problem. FWIW the RAID h/w is a
RocketRAID:

hptiop0: adapter at PCI 5:0:0, IRQ 16
hptiop0: <RocketRAID 3540 SATA Controller> mem 0xdd800000-0xddffffff irq 16 at device 0.0 on pci5
hptiop0: RocketRAID 3xxx/4xxx controller driver v1.3 (010208)
hptiop0: [GIANT-LOCKED]
hptiop0: [ITHREAD]

> if your FS has been near 99%, you may not have any large runs
> of freespace left.

Less than 3 GB out of 12 TB is used in my case, so free space shouldn't
be an issue.

> L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
>     1     48     48   3042    9.8      0      0    0.0   47.6| ad4
>     0     38     38   2406   10.5      0      0    0.0   39.5| ad6
> 
> You have a pair of mirrored disks, each doing around 40% I/O load, which is
> 80% load if a single-threaded task is driving all the I/O.

No load at all in my case...

> I see the syncer
> process is also trying to write to the ZIL.  Are you running something that
> does a lot of fsync calls (a database server for example)?

There's a PostgreSQL database there, but it was inactive at the time
the first hung processes showed up.

> Is this system an NFS server maybe?

No. Peculiarities include running a couple of jails via ezjail on ZFS
(also using nullfs mounts).

> Try setting the sysctl vfs.zfs.zil_disable=1 and see
> if your performance improves.

Setting it now didn't unstick the already-hung processes (as expected),
but I'll try it after a reboot.
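For the record, this is what I plan to try (the loader.conf line
assumes the tunable is spelled the same as the sysctl):

%sysctl vfs.zfs.zil_disable=1
and, to survive the reboot, in /boot/loader.conf:
vfs.zfs.zil_disable="1"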

How likely do you think a hardware error is in this setup? Can I do
anything else to help diagnose this problem?
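In case it helps, these are the checks I can run on request (smartctl
comes from the sysutils/smartmontools port, and SMART may not pass
through the RocketRAID at all, so that one is a long shot):

%zpool status -v        (any read/write/checksum errors logged by ZFS?)
%camcontrol devlist     (do all eight da devices still answer?)
%smartctl -a /dev/da3   (drive-level SMART data, if it gets through)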

Thanks for any other input,

-- 
DoubleF