8.1R ZFS almost locking up system
Sergey Zaharchenko
doublef-ctm at yandex.ru
Sun Sep 5 19:42:06 UTC 2010
Hello list!
Dan Nelson wrote:
> In the last episode (Aug 31), Tim Bishop said:
> > On Sat, Aug 21, 2010 at 05:24:29PM -0500, Dan Nelson wrote:
> > > In the last episode (Aug 21), Tim Bishop said:
> > > > A few items from top, including zfskern:
> > > >
> > > > PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
> > > > 5 root 4 -8 - 0K 60K zio->i 0 54:38 3.47% zfskern
> > > > 91775 70 1 44 0 53040K 31144K tx->tx 1 2:11 0.00% postgres
> > > > 39661 tdb 1 44 0 55776K 32968K tx->tx 0 0:39 0.00% mutt
> > > > 14828 root 1 47 0 14636K 1572K tx->tx 1 0:03 0.00% zfs
> > > > 11188 root 1 51 0 14636K 1572K tx->tx 0 0:03 0.00% zfs
> > > >
I'm seeing a similar problem on a remote server (unpatched 8.1-RELEASE,
amd64, quad-core, raidz pool on 8 drives). However, it also differs in
ways that seem important. Please advise.
Portions of top:
CPU: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
Mem: 75M Active, 31M Inact, 1352M Wired, 136K Cache, 812M Buf, 6458M Free
PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
35769 root 1 52 0 2740K 744K zilog- 1 0:00 0.00% sync
35625 df 1 44 0 14636K 1856K zio->i 2 0:00 0.00% zfs
35920 root 1 44 0 15668K 1836K scl->s 0 0:00 0.00% zpool
35607 root 1 44 0 8260K 2284K zio->i 0 0:00 0.00% csh
> > 0 100084 kernel zfs_vn_rele_task mi_switch+0x16f sleepq_wait+0x42 _sleep+0x31c taskqueue_thread_loop+0xb7 fork_exit+0x118 fork_trampoline+0xe
> > 5 100031 zfskern arc_reclaim_thre mi_switch+0x16f sleepq_timedwait+0x42 _cv_timedwait+0x129 arc_reclaim_thread+0x2d1 fork_exit+0x118 fork_trampoline+0xe
> > 5 100032 zfskern l2arc_feed_threa mi_switch+0x16f sleepq_timedwait+0x42 _cv_timedwait+0x129 l2arc_feed_thread+0x1be fork_exit+0x118 fork_trampoline+0xe
> > 5 100085 zfskern txg_thread_enter mi_switch+0x16f sleepq_wait+0x42 _cv_wait+0x111 txg_thread_wait+0x79 txg_quiesce_thread+0xb5 fork_exit+0x118 fork_trampoline+0xe
> > 5 100086 zfskern txg_thread_enter mi_switch+0x16f sleepq_wait+0x42 _cv_wait+0x111 zio_wait+0x61 dsl_pool_sync+0xea spa_sync+0x355 txg_sync_thread+0x195 fork_exit+0x118 fork_trampoline+0xe
procstat -kk -a:
0 100114 kernel zfs_vn_rele_task mi_switch+0x16f sleepq_wait+0x42 _sleep+0x31c taskqueue_thread_loop+0xb7 fork_exit+0x118 fork_trampoline+0xe
0 100123 kernel zil_clean mi_switch+0x16f sleepq_wait+0x42 _sleep+0x31c taskqueue_thread_loop+0xb7 fork_exit+0x118 fork_trampoline+0xe
16 100065 syncer - mi_switch+0x16f sleepq_wait+0x42 _cv_wait+0x111 zio_wait+0x61 zil_commit+0x3e1 zfs_sync+0xa6 sync_fsync+0x184 sync_vnode+0x16b sched_sync+0x1c9 fork_exit+0x118 fork_trampoline+0xe
39 100079 zfskern arc_reclaim_thre mi_switch+0x16f sleepq_timedwait+0x42 _cv_timedwait+0x129 arc_reclaim_thread+0x2d1 fork_exit+0x118 fork_trampoline+0xe
39 100086 zfskern l2arc_feed_threa mi_switch+0x16f sleepq_timedwait+0x42 _cv_timedwait+0x129 l2arc_feed_thread+0x1be fork_exit+0x118 fork_trampoline+0xe
39 100115 zfskern txg_thread_enter mi_switch+0x16f sleepq_wait+0x42 _cv_wait+0x111 txg_thread_wait+0x79 txg_quiesce_thread+0xb5 fork_exit+0x118 fork_trampoline+0xe
39 100116 zfskern txg_thread_enter mi_switch+0x16f sleepq_wait+0x42 _cv_wait+0x111 zio_wait+0x61 dsl_pool_sync+0xea spa_sync+0x355 txg_sync_thread+0x195
35625 100151 zfs - mi_switch+0x16f sleepq_wait+0x42 _cv_wait+0x111 zio_wait+0x61 dbuf_read+0x39a dmu_buf_hold+0xcc zap_lockdir+0x52 zap_cursor_retrieve+0x194 dsl_prop_get_all+0x187 zfs_ioc_objset_stats+0x7b zfsdev_ioctl+0x8d devfs_ioctl_f+0x77 kern_ioctl+0xf6 ioctl+0xfd syscall+0x1e7
35769 100212 sync - mi_switch+0x16f sleepq_wait+0x42 _cv_wait+0x111 zil_commit+0x7a zfs_sync+0xa6 sync+0x20e syscall+0x1e7 Xfast_syscall+0xe1
etc. etc...
Seems pretty identical...
But gstat seems different:
dT: 1.008s w: 1.000s
L(q) ops/s r/s kBps ms/r w/s kBps ms/w %busy Name
...
3 0 0 0 0.0 0 0 0.0 0.0 da0
3 0 0 0 0.0 0 0 0.0 0.0 da1
3 0 0 0 0.0 0 0 0.0 0.0 da2
35 0 0 0 0.0 0 0 0.0 0.0 da3
3 0 0 0 0.0 0 0 0.0 0.0 da4
2 0 0 0 0.0 0 0 0.0 0.0 da5
3 0 0 0 0.0 0 0 0.0 0.0 da6
35 0 0 0 0.0 0 0 0.0 0.0 da7
Note the huge wait queues...
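The queue depths can be pulled out of gstat batch output mechanically, in
case anyone wants to watch for this on their own boxes. A rough sketch
(the threshold of 10 is my own pick, and the embedded sample stands in
for a live `gstat -b` pipe, trimmed to three devices):

```shell
#!/bin/sh
# Flag devices whose GEOM queue length L(q) exceeds a threshold.
# On a live system you would pipe the real thing:  gstat -b | awk ...
# The gstat_output function below is a stand-in with sample data.
THRESH=10

gstat_output() {
  cat <<'EOF'
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    3      0      0      0    0.0      0      0    0.0    0.0  da0
   35      0      0      0    0.0      0      0    0.0    0.0  da3
   35      0      0      0    0.0      0      0    0.0    0.0  da7
EOF
}

# Skip the header row, then print any device whose L(q) column
# (field 1) is above the threshold.
gstat_output | awk -v t="$THRESH" 'NR > 1 && $1+0 > t { print $NF, "queue:", $1 }'
```

On the sample above this prints da3 and da7, the two drives sitting on a
queue of 35 while doing zero ops/s.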
I've tried reading from the disks with dd and sending them camcontrol
inquiry commands, but all of these hang:
%dd if=/dev/da3 of=/dev/null bs=1024k
load: 0.00 cmd: dd 37770 [physrd] 2.67r 0.00u 0.00s 0% 1168k
[no further input or output]
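For what it's worth, a time-limited probe loop avoids having one wedged
disk hang the whole check (8.1 has no timeout(1), so this backgrounds dd
and polls a completion flag). Just a sketch -- the device list, block
size, and 5-second limit are assumptions for this box; run as root:

```shell
#!/bin/sh
# Probe each disk with a short, bounded read; report any that hang.
DISKS="da0 da1 da2 da3 da4 da5 da6 da7"   # assumed device names
LIMIT=5                                    # seconds before declaring a hang

probe() {
  dev=$1
  flag=$(mktemp /tmp/probe.XXXXXX)
  # Run dd in a subshell that marks the flag file when the read finishes.
  ( dd if=/dev/"$dev" of=/dev/null bs=64k count=16 2>/dev/null
    echo done > "$flag" ) &
  pid=$!
  n=0
  while [ "$n" -lt "$LIMIT" ]; do
    if [ -s "$flag" ]; then
      echo "$dev: ok"
      rm -f "$flag"
      return 0
    fi
    sleep 1
    n=$((n + 1))
  done
  kill "$pid" 2>/dev/null
  echo "$dev: HUNG (no data after ${LIMIT}s)"
  rm -f "$flag"
  return 1
}

for d in $DISKS; do probe "$d"; done
```

A stuck device shows up as HUNG instead of wedging the shell the way the
bare dd above did.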
So, one could suspect a hardware problem. FWIW the RAID h/w is a
RocketRaid:
hptiop0: adapter at PCI 5:0:0, IRQ 16
hptiop0: <RocketRAID 3540 SATA Controller> mem 0xdd800000-0xddffffff irq 16 at device 0.0 on pci5
hptiop0: 0 RocketRAID 3xxx/4xxx controller driver v1.3 (010208)
hptiop0: [GIANT-LOCKED]
hptiop0: [ITHREAD]
> if your FS has been near 99%, you may not have any large runs
> of freespace left.
<3GB out of 12TB is used in my case.
> L(q) ops/s r/s kBps ms/r w/s kBps ms/w %busy Name
> 1 48 48 3042 9.8 0 0 0.0 47.6| ad4
> 0 38 38 2406 10.5 0 0 0.0 39.5| ad6
>
> You have a pair of mirrored disks, each doing around 40% I/O load, which is
> 80% load if a single-threaded task is driving all the I/O.
No load at all in my case...
> I see the syncer
> process is also trying to write to the ZIL. Are you running something that
> does a lot of fsync calls (a database server for example)?
There's a postgresql database there but it was inactive at the time the
first hung processes showed up.
> Is this system an NFS server maybe?
No. Peculiarities include using ezjail for a couple of jails over ZFS
(also using nullfs).
> Try setting the sysctl vfs.zfs.zil_disable=1 and see
> if your performance improves.
Sure enough, it didn't help the existing processes, but I'll try it
after a reboot.
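Since that tunable is what I'd be toggling after the reboot anyway, for
the record: as far as I know it can be flipped at runtime and also set
as a loader tunable to survive reboots (file locations are the usual
conventions, not verified on this exact release):

```
# Runtime (affects new transactions only; won't unstick already-hung ones):
sysctl vfs.zfs.zil_disable=1

# Persist across reboots -- add to /boot/loader.conf:
vfs.zfs.zil_disable="1"
```

Worth remembering this is a diagnostic step, not a fix: with the ZIL off,
fsync() returns before data hits stable storage, so a crash can lose
recent synchronous writes (the pool itself stays consistent).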
How likely do you think a hardware error is in this setup? Can I do
anything else to help diagnose this problem?
Thanks for any other input,
--
DoubleF