ZFS: processes hanging when trying to access filesystems
Johan Hendriks
joh.hendriks at gmail.com
Wed Mar 28 10:50:41 UTC 2012
Tim Bishop wrote:
> I have a machine running 8-STABLE amd64 from the end of last week. I
> have a problem where the machine starts to freeze up. Any process
> accessing the ZFS filesystems hangs, which eventually causes more and
> more processes to be spawned (cronjobs, etc, never complete). Although
> the root filesystem is on UFS (the machine hosts jails on ZFS),
> eventually I can't log in anymore.
>
> The problem occurs when the frequently used part of the ARC gets too
> large. See this graph:
>
> http://dl.dropbox.com/u/318044/zfs_arc_utilization-day.png
>
> At the right of the graph things started to hang.
>
> At the same time I see a high amount of context switching.
>
> I picked a hanging process and procstat showed the following:
>
> PID TID COMM TDNAME KSTACK
> 24787 100303 mutt - mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 txg_wait_open+0x85 dmu_tx_assign+0x170 zfs_inactive+0xf1 zfs_freebsd_inactive+0x1a vinactive+0x71 vputx+0x2d8 null_reclaim+0xb3 vgonel+0x119 vrecycle+0x7b null_inactive+0x1f vinactive+0x71 vputx+0x2d8 vn_close+0xa1 vn_closefile+0x5a _fdrop+0x23
>
> I'm running a reduced amount of jails on the machine at the moment which
> is limiting the speed at which the machine freezes up completely. I'd
> like to debug this problem further, so any advice on useful information
> to collect would be appreciated.
>
> I've had this problem on the machine before[1] but adding more RAM
> alleviated the issue.
>
> Tim.
>
> [1] http://lists.freebsd.org/pipermail/freebsd-stable/2010-September/058541.html
>
Just a me too, only I am on FreeBSD 9.0-RELEASE amd64.
Once a week the system starts to hang.
The system boots from a normal disk in a UFS mirror; the zpool is on the
SAS drives in the bays.
nfsd is stuck in the tx->tx state but cannot be restarted; the same goes
for mountd, and Samba leaves a lot of smbd processes stuck in a zfs state.
The only way out of it is to reset and restart the machine.
We use the machine as an NFS server for two ESXi 5.0 machines.
The strange thing is that we have an almost identical machine that does
not show this behaviour: same board, memory, and RAID controller.
The only difference is that that machine is in a 4U case.
The settings in loader.conf are:
# ZFS
zfs_load="YES"
# Tuning
vfs.zfs.arc_max="12G"
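As an aside, loader.conf takes human-readable sizes, but at runtime the limit is reported in bytes by `sysctl vfs.zfs.arc_max`. A small sketch for sanity-checking that the tunable took effect; the `to_bytes` helper is mine, not a FreeBSD tool:

```shell
# to_bytes is a hypothetical helper: it converts a loader.conf-style
# size string ("12G", "512M", ...) into the byte value that
# `sysctl vfs.zfs.arc_max` would report after boot.
to_bytes() {
    case "$1" in
        *G) echo $(( ${1%G} * 1073741824 )) ;;   # GiB -> bytes
        *M) echo $(( ${1%M} * 1048576 )) ;;      # MiB -> bytes
        *K) echo $(( ${1%K} * 1024 )) ;;         # KiB -> bytes
        *)  echo "$1" ;;                         # already bytes
    esac
}

to_bytes 12G    # 12 GiB = 12884901888 bytes
```

On the box itself one would then compare that number against the live `sysctl -n vfs.zfs.arc_max` output.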
It is getting really frustrating; there is an Exchange server running on
it, and I am becoming an eseutil expert, which is not something I want :D.
The second problem is that the whole company cannot work, so I need to
get things back up as fast as I can.
So far it has only happened on Mondays, but the time is random.
The system is a 16-bay Supermicro with an LSI 9211-8i card and 16 GB of
memory on the X9SCM-F board.
Nothing fancy.
My ARC stats at this moment are below; the only thing is that I do not
fully know which stats are important.
zfs-stats -AE
ZFS Subsystem Report Wed Mar 28 12:34:54 2012
------------------------------------------------------------------------
ARC Summary: (HEALTHY)
Memory Throttle Count: 0
ARC Misc:
Deleted: 12.40m
Recycle Misses: 22.95k
Mutex Misses: 12.53k
Evict Skips: 64.35k
ARC Size: 95.44% 11.45 GiB
Target Size: (Adaptive) 95.44% 11.45 GiB
Min Size (Hard Limit): 12.50% 1.50 GiB
Max Size (High Water): 8:1 12.00 GiB
ARC Size Breakdown:
Recently Used Cache Size: 93.75% 10.74 GiB
Frequently Used Cache Size: 6.25% 733.11 MiB
ARC Hash Breakdown:
Elements Max: 308.73k
Elements Current: 99.68% 307.75k
Collisions: 13.69m
Chain Max: 16
Chains: 77.43k
------------------------------------------------------------------------
ARC Efficiency: 69.94m
Cache Hit Ratio: 83.10% 58.12m
Cache Miss Ratio: 16.90% 11.82m
Actual Hit Ratio: 67.26% 47.04m
Data Demand Efficiency: 94.55% 35.86m
Data Prefetch Efficiency: 43.37% 17.21m
CACHE HITS BY CACHE LIST:
Anonymously Used: 17.49% 10.16m
Most Recently Used: 24.32% 14.13m
Most Frequently Used: 56.62% 32.91m
Most Recently Used Ghost: 0.24% 140.34k
Most Frequently Used Ghost: 1.34% 776.73k
CACHE HITS BY DATA TYPE:
Demand Data: 58.33% 33.90m
Prefetch Data: 12.84% 7.46m
Demand Metadata: 22.47% 13.06m
Prefetch Metadata: 6.35% 3.69m
CACHE MISSES BY DATA TYPE:
Demand Data: 16.53% 1.95m
Prefetch Data: 82.46% 9.75m
Demand Metadata: 0.42% 50.02k
Prefetch Metadata: 0.60% 70.50k
------------------------------------------------------------------------
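For what it's worth, the percentage lines in a report like the one above are plain ratios of the raw kstat.zfs.misc.arcstats counters. A minimal sketch of that arithmetic; the values fed in below are just the MFU and total ARC sizes from the report, converted to MiB:

```shell
# pct is a hypothetical helper that prints <part>/<whole> as the kind
# of percentage zfs-stats shows. On FreeBSD the raw inputs would come
# from sysctl kstat.zfs.misc.arcstats.* (values in bytes).
pct() {
    awk -v p="$1" -v w="$2" 'BEGIN { printf "%.2f%%\n", 100 * p / w }'
}

# Frequently Used Cache Size: ~733 MiB out of an ~11726 MiB ARC
pct 733 11726    # prints 6.25%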
The latest top output when it hung is below.
I had already shut down Samba and restarted nfsd and mountd, but these also hang.
  PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
 1719 root        4  20    0 10000K  1300K tx->tx  0  38:03  0.00% nfsd
 1884 root        1  20    0 24380K  3112K select  3   0:07  0.00% ntpd
 1933 root        1  26    0 18500K  1860K nanslp  0   0:00  0.00% cron
 1606 root        1  20    0 16424K  1776K select  3   0:00  0.00% syslogd
 1695 root        1  20    0 14292K  1828K select  1   0:00  0.00% nfsuserd
 1692 root        1  20    0 14292K  1828K select  3   0:00  0.00% nfsuserd
 1693 root        1  20    0 14292K  1828K select  0   0:00  0.00% nfsuserd
 1694 root        1  20    0 14292K  1828K select  2   0:00  0.00% nfsuserd
19312 adminusr    1  20    0 70184K  5524K select  1   0:00  0.00% sshd
19412 root        1  20    0 20940K  2536K CPU0    0   0:00  0.00% top
19164 root        1  20    0 70184K  5440K sbwait  0   0:00  0.00% sshd
19309 root        1  21    0 70184K  5440K sbwait  0   0:00  0.00% sshd
19175 root        1  20    0 70184K  5440K sbwait  0   0:00  0.00% sshd
19228 root        1  20    0 70184K  5440K sbwait  0   0:00  0.00% sshd
19240 root        1  20    0 80784K 12068K zfs     3   0:00  0.00% smbd
19286 root        1  20    0 80784K 12064K zfs     0   0:00  0.00% smbd
19131 root        1  20    0 80784K 12060K zfs     3   0:00  0.00% smbd
18887 root        1  20    0 80784K 12060K zfs     1   0:00  0.00% smbd
19095 root        1  20    0 80784K 12064K zfs     0   0:00  0.00% smbd
19089 root        1  20    0 80784K 12064K zfs     1   0:00  0.00% smbd
18929 root        1  20    0 80784K 12064K zfs     0   0:00  0.00% smbd
18977 root        1  21    0 80784K 12056K zfs     1   0:00  0.00% smbd
19062 root        1  20    0 80784K 12056K zfs     1   0:00  0.00% smbd
18944 root        1  20    0 80784K 12056K zfs     1   0:00  0.00% smbd
19063 root        1  20    0 80784K 12056K zfs     1   0:00  0.00% smbd
19231 adminusr    1  20    0 70184K  5524K select  2   0:00  0.00% sshd
19317 root        1  20    0 21812K  3152K wait    0   0:00  0.00% bash
19178 adminusr    1  20    0 70184K  5524K select  1   0:00  0.00% sshd
19236 root        1  21    0 21812K  3156K wait    0   0:00  0.00% bash
18924 root        1  20    0 80784K 12060K zfs     3   0:00  0.00% smbd
To the original poster: how do you get the graphs for the ARC data?
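(For reference, the raw numbers behind such graphs can be sampled from sysctl and logged for any grapher, e.g. Munin or gnuplot. A minimal sketch; `ARC_SIZE_CMD` and `sample_arc` are my stand-ins, not FreeBSD tools, but `kstat.zfs.misc.arcstats.size` is the real counter name:)

```shell
#!/bin/sh
# Hypothetical sampler: append "epoch-seconds arc-bytes" lines to a log
# file that a graphing tool can plot over time. ARC_SIZE_CMD can be
# overridden for testing; on FreeBSD the real command would be
#   sysctl -n kstat.zfs.misc.arcstats.size
ARC_SIZE_CMD=${ARC_SIZE_CMD:-"sysctl -n kstat.zfs.misc.arcstats.size"}

sample_arc() {
    # $1 = path of the log file to append to
    printf '%s %s\n' "$(date +%s)" "$($ARC_SIZE_CMD)" >> "$1"
}
```

Run from cron every few minutes, this gives a time series of ARC size like the one in the Dropbox graph.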
regards
Johan Hendriks
More information about the freebsd-fs mailing list