ZFS: processes hanging when trying to access filesystems

Johan Hendriks joh.hendriks at gmail.com
Wed Mar 28 10:50:41 UTC 2012


Tim Bishop wrote:
> I have a machine running 8-STABLE amd64 from the end of last week. I
> have a problem where the machine starts to freeze up. Any process
> accessing the ZFS filesystems hangs, which eventually causes more and
> more processes to be spawned (cronjobs, etc, never complete). Although
> the root filesystem is on UFS (the machine hosts jails on ZFS),
> eventually I can't log in anymore.
>
> The problem occurs when the frequently used part of the ARC gets too
> large. See this graph:
>
> http://dl.dropbox.com/u/318044/zfs_arc_utilization-day.png
>
> At the right of the graph things started to hang.
>
> At the same time I see a high amount of context switching.
>
> I picked a hanging process and procstat showed the following:
>
>    PID    TID COMM             TDNAME           KSTACK
> 24787 100303 mutt             -                mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 txg_wait_open+0x85 dmu_tx_assign+0x170 zfs_inactive+0xf1 zfs_freebsd_inactive+0x1a vinactive+0x71 vputx+0x2d8 null_reclaim+0xb3 vgonel+0x119 vrecycle+0x7b null_inactive+0x1f vinactive+0x71 vputx+0x2d8 vn_close+0xa1 vn_closefile+0x5a _fdrop+0x23
>
> I'm running a reduced amount of jails on the machine at the moment which
> is limiting the speed at which the machine freezes up completely. I'd
> like to debug this problem further, so any advice on useful information
> to collect would be appreciated.
>
> I've had this problem on the machine before[1] but adding more RAM
> alleviated the issue.
>
> Tim.
>
> [1] http://lists.freebsd.org/pipermail/freebsd-stable/2010-September/058541.html
>
Just a "me too", except I am on FreeBSD 9.0-RELEASE amd64.
About once a week the system starts to hang.
The system boots from a normal disk mirror on UFS; the zpool is on the
SAS drives in the bays.

nfsd gets stuck in the tx->tx state and cannot be restarted; the same
goes for mountd, and Samba leaves a lot of smbd processes stuck in a
zfs state. The only way out is to reset and restart the machine.
We use the machine as an NFS server for two ESXi 5.0 hosts.
The strange thing is that we have an almost identical machine that does
not show this behaviour: same board, memory and RAID controller. The
only difference is that that machine is in a 4U case.
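
Next time it hangs I will try to grab kernel stacks of the stuck
processes before hitting reset, the same way Tim did with procstat.
A rough sketch of what I have in mind, assuming a shell still responds
during the hang:

# kernel stacks for all processes; look for txg_wait_open / zfs entries
procstat -kk -a > /var/tmp/procstat-kk.txt

# or just one stuck daemon, e.g. nfsd (PID 1719 in the top output below)
procstat -kk 1719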

The settings in /boot/loader.conf are:

# ZFS
zfs_load="YES"
# Tuning
vfs.zfs.arc_max="12G"
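
Something like the following should show whether that limit actually
takes effect at runtime (all values are in bytes):

sysctl vfs.zfs.arc_max
sysctl kstat.zfs.misc.arcstats.c_max
sysctl kstat.zfs.misc.arcstats.size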

It is getting really frustrating: there is an Exchange server running
on it, and I am becoming an eseutil expert, which is not something I
want :D. The second problem is that the whole company cannot work, so I
need to get things back up as fast as I can.

Until now it has only happened on Mondays, but the time of day is random.

The system is a 16-bay Supermicro with an LSI 9211-8i card and 16 GB of
memory on an X9SCM-F board.

Nothing fancy.

My ARC stats at the moment are below; the only thing is that I do not
fully know which of these stats are important.

zfs-stats -AE

ZFS Subsystem Report                            Wed Mar 28 12:34:54 2012
------------------------------------------------------------------------

ARC Summary: (HEALTHY)
         Memory Throttle Count:                  0

ARC Misc:
         Deleted:                                12.40m
         Recycle Misses:                         22.95k
         Mutex Misses:                           12.53k
         Evict Skips:                            64.35k

ARC Size:                               95.44%  11.45   GiB
         Target Size: (Adaptive)         95.44%  11.45   GiB
         Min Size (Hard Limit):          12.50%  1.50    GiB
         Max Size (High Water):          8:1     12.00   GiB

ARC Size Breakdown:
         Recently Used Cache Size:       93.75%  10.74   GiB
         Frequently Used Cache Size:     6.25%   733.11  MiB

ARC Hash Breakdown:
         Elements Max:                           308.73k
         Elements Current:               99.68%  307.75k
         Collisions:                             13.69m
         Chain Max:                              16
         Chains:                                 77.43k

------------------------------------------------------------------------

ARC Efficiency:                                 69.94m
         Cache Hit Ratio:                83.10%  58.12m
         Cache Miss Ratio:               16.90%  11.82m
         Actual Hit Ratio:               67.26%  47.04m

         Data Demand Efficiency:         94.55%  35.86m
         Data Prefetch Efficiency:       43.37%  17.21m

         CACHE HITS BY CACHE LIST:
           Anonymously Used:             17.49%  10.16m
           Most Recently Used:           24.32%  14.13m
           Most Frequently Used:         56.62%  32.91m
           Most Recently Used Ghost:     0.24%   140.34k
           Most Frequently Used Ghost:   1.34%   776.73k

         CACHE HITS BY DATA TYPE:
           Demand Data:                  58.33%  33.90m
           Prefetch Data:                12.84%  7.46m
           Demand Metadata:              22.47%  13.06m
           Prefetch Metadata:            6.35%   3.69m

         CACHE MISSES BY DATA TYPE:
           Demand Data:                  16.53%  1.95m
           Prefetch Data:                82.46%  9.75m
           Demand Metadata:              0.42%   50.02k
           Prefetch Metadata:            0.60%   70.50k

------------------------------------------------------------------------

The latest top output while it was hanging is below. I had already shut
down Samba and restarted nfsd and mountd, but these hang as well.

  PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
 1719 root        4  20    0 10000K  1300K tx->tx  0  38:03  0.00% nfsd
 1884 root        1  20    0 24380K  3112K select  3   0:07  0.00% ntpd
 1933 root        1  26    0 18500K  1860K nanslp  0   0:00  0.00% cron
 1606 root        1  20    0 16424K  1776K select  3   0:00  0.00% syslogd
 1695 root        1  20    0 14292K  1828K select  1   0:00  0.00% nfsuserd
 1692 root        1  20    0 14292K  1828K select  3   0:00  0.00% nfsuserd
 1693 root        1  20    0 14292K  1828K select  0   0:00  0.00% nfsuserd
 1694 root        1  20    0 14292K  1828K select  2   0:00  0.00% nfsuserd
19312 adminusr    1  20    0 70184K  5524K select  1   0:00  0.00% sshd
19412 root        1  20    0 20940K  2536K CPU0    0   0:00  0.00% top
19164 root        1  20    0 70184K  5440K sbwait  0   0:00  0.00% sshd
19309 root        1  21    0 70184K  5440K sbwait  0   0:00  0.00% sshd
19175 root        1  20    0 70184K  5440K sbwait  0   0:00  0.00% sshd
19228 root        1  20    0 70184K  5440K sbwait  0   0:00  0.00% sshd
19240 root        1  20    0 80784K 12068K zfs     3   0:00  0.00% smbd
19286 root        1  20    0 80784K 12064K zfs     0   0:00  0.00% smbd
19131 root        1  20    0 80784K 12060K zfs     3   0:00  0.00% smbd
18887 root        1  20    0 80784K 12060K zfs     1   0:00  0.00% smbd
19095 root        1  20    0 80784K 12064K zfs     0   0:00  0.00% smbd
19089 root        1  20    0 80784K 12064K zfs     1   0:00  0.00% smbd
18929 root        1  20    0 80784K 12064K zfs     0   0:00  0.00% smbd
18977 root        1  21    0 80784K 12056K zfs     1   0:00  0.00% smbd
19062 root        1  20    0 80784K 12056K zfs     1   0:00  0.00% smbd
18944 root        1  20    0 80784K 12056K zfs     1   0:00  0.00% smbd
19063 root        1  20    0 80784K 12056K zfs     1   0:00  0.00% smbd
19231 adminusr    1  20    0 70184K  5524K select  2   0:00  0.00% sshd
19317 root        1  20    0 21812K  3152K wait    0   0:00  0.00% bash
19178 adminusr    1  20    0 70184K  5524K select  1   0:00  0.00% sshd
19236 root        1  21    0 21812K  3156K wait    0   0:00  0.00% bash
18924 root        1  20    0 80784K 12060K zfs     3   0:00  0.00% smbd
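
To see at a glance which processes are stuck on ZFS instead of
scrolling through top, something like this (just a sketch) should list
the wait channels directly:

ps -axo pid,user,mwchan,command | egrep 'tx->tx|zfs'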


To the original poster: how do you generate the graphs of the ARC data?
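
They look like Munin graphs to me, presumably built from the arcstats
sysctls. A minimal sketch of collecting the raw numbers by hand,
assuming a log file under /var/tmp, would be:

# log ARC size, p and c every 5 minutes; arcstats.p is roughly the MRU
# target, so the MFU part is roughly c minus p
while :; do
    printf '%s %s %s %s\n' "$(date +%s)" \
        "$(sysctl -n kstat.zfs.misc.arcstats.size)" \
        "$(sysctl -n kstat.zfs.misc.arcstats.p)" \
        "$(sysctl -n kstat.zfs.misc.arcstats.c)"
    sleep 300
done >> /var/tmp/arcstats.log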

regards
Johan Hendriks

