[Bug 219224] umount hanging in _cv_wait zil_commit zfs_sync sys_sync

bugzilla-noreply at freebsd.org
Thu May 11 17:06:23 UTC 2017


https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=219224

            Bug ID: 219224
           Summary: umount hanging in _cv_wait zil_commit zfs_sync
                    sys_sync
           Product: Base System
           Version: 11.0-RELEASE
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: kern
          Assignee: freebsd-bugs at FreeBSD.org
          Reporter: t at sdf.org

I'm observing hanging processes on a ZFS system, apparently after umount
tried to sync data before unmounting a jail's devfs.

tools-1-b# service jail restart git-1-b
Stopping jails: git-1-b
load: 1.77  cmd: umount 79620 [zilog->zl_cv_batch[0]] 568.08r 0.00u 0.01s 0% 2000k
load: 1.65  cmd: umount 79620 [zilog->zl_cv_batch[0]] 575.16r 0.00u 0.01s 0% 2000k
load: 2.32  cmd: umount 79620 [zilog->zl_cv_batch[0]] 580.44r 0.00u 0.01s 0% 2000k

tools-1-b# procstat -kk 79620
  PID    TID COMM             TDNAME           KSTACK                       
79620 100216 umount           -                mi_switch+0xd2 sleepq_wait+0x3a
_cv_wait+0x194 zil_commit+0x85 zfs_sync+0xad sys_sync+0x146 amd64_syscall+0x4ce
Xfast_syscall+0xfb 

tools-1-b# kgdb
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...(no debugging symbols
found)
...
Attempt to extract a component of a value that is not a structure pointer.
Attempt to extract a component of a value that is not a structure pointer.
#0  0xffffffff80b0cafb in sched_switch ()
(kgdb) tid 100216
[Switching to thread 2772 (Thread 100216)]#0  0xffffffff80b0cafb in
sched_switch ()
(kgdb) bt
#0  0xffffffff80b0cafb in sched_switch ()
#1  0xffffffff80ae4ae2 in mi_switch ()
#2  0xffffffff80b3226a in sleepq_wait ()
#3  0xffffffff80a66464 in _cv_wait ()
#4  0xffffffff82252945 in ?? ()
#5  0xfffffe201d5308e0 in ?? ()
#6  0xfffff805d1af7500 in ?? ()
#7  0xfffffe201d530870 in ?? ()
#8  0xffffffff80bae521 in __mnt_vnode_markerfree_all ()
#9  0xffffffff8227aded in ?? ()
#10 0x00000000fffff7ff in ?? ()
#11 0xfffffe201d530968 in ?? ()
#12 0xfffffe201d530990 in ?? ()
#13 0xffffffff80bb48a6 in sys_sync ()
Previous frame identical to this frame (corrupt stack?)
(kgdb) 
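
The ?? frames are presumably inside zfs.ko, since the backtrace was taken
without debug symbols. Assuming the kernel-dbg distribution is installed
(an assumption, not verified here), something along these lines should
resolve them next time:

# kgdb /boot/kernel/kernel /dev/mem
(kgdb) add-kld /boot/kernel/zfs.ko
(kgdb) tid 100216
(kgdb) bt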

Two other processes are hanging too:

tools-1-b# ps ax | awk '$3~"D"&&$5~"^/"'
36031  -  DsJ      0:00.49 /var/cfengine/bin/cf-agent -Dfrom_cfexecd,scheduled_
40395  -  DJ       0:00.01 /var/cfengine/bin/cf-promises -c /var/cfengine/input
79620  0  D+       0:00.01 /sbin/umount /jails/git-1-b/dev

tools-1-b# procstat -kk 36031
  PID    TID COMM             TDNAME           KSTACK                       
36031 103064 cf-agent         -                mi_switch+0xd2 sleepq_wait+0x3a
_cv_wait+0x194 zil_commit+0x85 zfs_freebsd_putpages+0x71e VOP_PUTPAGES_APV+0x8d
vnode_pager_putpages+0xe3 vm_pageout_flush+0xed
vm_object_page_collect_flush+0x21a vm_object_page_clean+0x197
vm_object_terminate+0x99 zfs_freebsd_reclaim+0x1e VOP_RECLAIM_APV+0x8f
vgonel+0x2ef vrecycle+0x90 zfs_freebsd_inactive+0xd VOP_INACTIVE_APV+0x8d
vinactive+0x17d

tools-1-b# procstat -kk 40395
  PID    TID COMM             TDNAME           KSTACK                       
40395 103070 cf-promises      -                mi_switch+0xd2 sleepq_wait+0x3a
_cv_wait+0x194 zil_commit+0x85 zfs_freebsd_putpages+0x71e VOP_PUTPAGES_APV+0x8d
vnode_pager_putpages+0xe3 vm_pageout_flush+0xed
vm_object_page_collect_flush+0x21a vm_object_page_clean+0x197
vm_object_terminate+0x99 zfs_freebsd_reclaim+0x1e VOP_RECLAIM_APV+0x8f
vgonel+0x2ef vrecycle+0x90 zfs_freebsd_inactive+0xd VOP_INACTIVE_APV+0x8d
vinactive+0x17d

Everything is on ZFS; both the host and the jail in
zroot/jails/cfengine-1-b share the pool "zroot":

tools-1-b# zpool list 
NAME    SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
data   2.91T  1.43M  2.91T         -     0%     0%  2.14x  ONLINE  -
zroot   744G  19.6G   724G         -    19%     2%  1.60x  ONLINE  -
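
Since everything is stuck in zil_commit, the pool/ZIL state may be
relevant for triage. I have not captured it here, but as a sketch, these
would show whether a separate log device is configured and whether the
pool is still making progress (the zroot/jails dataset name is assumed
from the layout above):

# zpool status -v zroot             # log/cache vdevs, errors, resilver?
# zpool iostat -v zroot 1 5         # is the pool still doing I/O?
# zfs get sync,logbias zroot/jails  # ZIL-related dataset properties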

I also observe another process, again inside the cfengine jail, being
reported at 100% CPU usage, which is highly unusual:

tools-1-b# ps w 2988
 PID TT  STAT      TIME COMMAND
2988  -  TsJ  118:40.91 /usr/local/sbin/cf-serverd

tools-1-b# procstat -kk 2988
  PID    TID COMM             TDNAME           KSTACK                       
 2988 100835 cf-serverd       -                mi_switch+0xd2
thread_suspend_check+0x34a ast+0x5be doreti_ast+0x1f 
 2988 100922 cf-serverd       -                mi_switch+0xd2
thread_suspend_check+0x34a ast+0x5be doreti_ast+0x1f 
 2988 100963 cf-serverd       -                dmu_object_info_from_db+0x2d
zfs_zget+0xde zfs_get_data+0x40 zil_commit+0x6a1 zfs_freebsd_putpages+0x71e
VOP_PUTPAGES_APV+0x8d vnode_pager_putpages+0xe3 vm_pageout_flush+0xed
vm_object_page_collect_flush+0x21a vm_object_page_clean+0x197
vm_object_terminate+0x99 zfs_freebsd_reclaim+0x1e VOP_RECLAIM_APV+0x8f
vgonel+0x2ef vrecycle+0x90 zfs_freebsd_inactive+0xd VOP_INACTIVE_APV+0x8d
vinactive+0x17d 
 2988 100966 cf-serverd       -                mi_switch+0xd2
thread_suspend_check+0x34a ast+0x5be doreti_ast+0x1f 
 2988 100970 cf-serverd       -                mi_switch+0xd2
thread_suspend_check+0x34a ast+0x5be doreti_ast+0x1f 
 2988 100973 cf-serverd       -                mi_switch+0xd2
thread_suspend_check+0x34a ast+0x5be doreti_ast+0x1f

[ ... 1000 more threads with exactly
"mi_switch+0xd2 thread_suspend_check+0x34a ast+0x5be doreti_ast+0x1f"]

Thread 100963 seems to be looping, cycling through stacks like these:

tools-1-b# {repeat 20 procstat -k 2988} | awk '$2=="100963"' | cut -d- -f3- |
sort | uniq 
                dbuf_hold_impl dbuf_hold dnode_hold_impl dmu_bonus_hold
zfs_zget zfs_get_data zil_commit zfs_freebsd_putpages VOP_PUTPAGES_APV
vnode_pager_putpages vm_pageout_flush vm_object_page_collect_flush
vm_object_page_clean vm_object_terminate zfs_freebsd_reclaim VOP_RECLAIM_APV
vgonel vrecycle 
                dbuf_read dmu_bonus_hold zfs_zget zfs_get_data zil_commit
zfs_freebsd_putpages VOP_PUTPAGES_APV vnode_pager_putpages vm_pageout_flush
vm_object_page_collect_flush vm_object_page_clean vm_object_terminate
zfs_freebsd_reclaim VOP_RECLAIM_APV vgonel vrecycle zfs_freebsd_inactive
VOP_INACTIVE_APV 
                dbuf_read dnode_hold_impl dmu_bonus_hold zfs_zget zfs_get_data
zil_commit zfs_freebsd_putpages VOP_PUTPAGES_APV vnode_pager_putpages
vm_pageout_flush vm_object_page_collect_flush vm_object_page_clean
vm_object_terminate zfs_freebsd_reclaim VOP_RECLAIM_APV vgonel vrecycle
zfs_freebsd_inactive 
                dmu_bonus_hold zfs_zget zfs_get_data zil_commit
zfs_freebsd_putpages VOP_PUTPAGES_APV vnode_pager_putpages vm_pageout_flush
vm_object_page_collect_flush vm_object_page_clean vm_object_terminate
zfs_freebsd_reclaim VOP_RECLAIM_APV vgonel vrecycle zfs_freebsd_inactive
VOP_INACTIVE_APV vinactive 
                dmu_object_info_from_db zfs_zget zfs_get_data zil_commit
zfs_freebsd_putpages VOP_PUTPAGES_APV vnode_pager_putpages vm_pageout_flush
vm_object_page_collect_flush vm_object_page_clean vm_object_terminate
zfs_freebsd_reclaim VOP_RECLAIM_APV vgonel vrecycle zfs_freebsd_inactive
VOP_INACTIVE_APV vinactive 
                dmu_object_info_from_dnode dmu_object_info_from_db zfs_zget
zfs_get_data zil_commit zfs_freebsd_putpages VOP_PUTPAGES_APV
vnode_pager_putpages vm_pageout_flush vm_object_page_collect_flush
vm_object_page_clean vm_object_terminate zfs_freebsd_reclaim VOP_RECLAIM_APV
vgonel vrecycle zfs_freebsd_inactive VOP_INACTIVE_APV 
                dmu_zfetch dbuf_read dnode_hold_impl dmu_bonus_hold zfs_zget
zfs_get_data zil_commit zfs_freebsd_putpages VOP_PUTPAGES_APV
vnode_pager_putpages vm_pageout_flush vm_object_page_collect_flush
vm_object_page_clean vm_object_terminate zfs_freebsd_reclaim VOP_RECLAIM_APV
vgonel vrecycle 
                dnode_hold_impl dmu_bonus_hold zfs_zget zfs_get_data zil_commit
zfs_freebsd_putpages VOP_PUTPAGES_APV vnode_pager_putpages vm_pageout_flush
vm_object_page_collect_flush vm_object_page_clean vm_object_terminate
zfs_freebsd_reclaim VOP_RECLAIM_APV vgonel vrecycle zfs_freebsd_inactive
VOP_INACTIVE_APV 
                dnode_rele dmu_bonus_hold zfs_zget zfs_get_data zil_commit
zfs_freebsd_putpages VOP_PUTPAGES_APV vnode_pager_putpages vm_pageout_flush
vm_object_page_collect_flush vm_object_page_clean vm_object_terminate
zfs_freebsd_reclaim VOP_RECLAIM_APV vgonel vrecycle zfs_freebsd_inactive
VOP_INACTIVE_APV 
                zfs_zget zfs_get_data zil_commit zfs_freebsd_putpages
VOP_PUTPAGES_APV vnode_pager_putpages vm_pageout_flush
vm_object_page_collect_flush vm_object_page_clean vm_object_terminate
zfs_freebsd_reclaim VOP_RECLAIM_APV vgonel vrecycle zfs_freebsd_inactive
VOP_INACTIVE_APV vinactive vputx
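
For the record, the sampling loop above uses zsh's "repeat"; a plain
/bin/sh equivalent (same idea, not what was actually run) would be:

# sample the thread's kernel stack 20 times and deduplicate
for n in $(jot 20); do procstat -k 2988; done | \
    awk '$2=="100963"' | cut -d- -f3- | sort | uniq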

Only the processes listed above are affected; there are ~200 others
running (and writing to the zroot pool).
