Locked up processes after upgrade to ZFS v15

Jeremy Chadwick freebsd at jdc.parodius.com
Sat Oct 9 14:34:42 UTC 2010


On Sat, Oct 09, 2010 at 04:12:42AM -0700, Jeremy Chadwick wrote:
> [...snipping for brevity...]
>
> We're seeing what we believe to be this exact situation.

Because of Kai's follow-up, I've decided to move our RELENG_8 system to
using gmirror(8) instead of ZFS.  The system has only been up for 4.5
hours.  Upon shutting down httpd, I checked ps/top to see what things
looked like.  There are tons of httpd processes (now with ppid of init,
indicating they're to be reaped but can't be) in "zfs" and "zfsmrb"
states, and these processes are hung/stuck.

So yes, the behaviour I'm seeing is identical to Kai's, and the problem
is easily reproducible.  I also have much less RAM than he does (8GB),
so the amount of RAM doesn't look to be a factor.

Below is some simple debugging information for whoever wants to take
this on.  If more detail is needed, I'll need time to build a test/debug
system that mimics our production system.  The UIDs differ (80, 1014,
1004, etc.) because we use the Apache ITK MPM, which runs each vhost
under its own UID.  I doubt that has anything to do with the problem.

horus# /usr/local/etc/rc.d/apache22 stop
Stopping apache22.
Waiting for PIDS: 1192.
horus# ps -auxw | grep 1192 | grep -v grep
horus# ps -axlH | grep httpd
 1014  1243     1   0  44  0 97944 15068 zfs    D     ??    0:00.00 /usr/local/sbin/httpd
 1014  1462     1   0  44  0 98008 15280 zfsmrb DL    ??    0:00.00 /usr/local/sbin/httpd
 1014  1585     1   0  44  0 97944 15068 zfs    D     ??    0:00.00 /usr/local/sbin/httpd
 1014  1633     1   0  44  0 97944 15068 zfs    D     ??    0:00.00 /usr/local/sbin/httpd
 1004  1805     1   0  44  0 97948 15236 zfsmrb DL    ??    0:00.00 /usr/local/sbin/httpd
 1014  1998     1   0  44  0 97944 15068 zfs    D     ??    0:00.00 /usr/local/sbin/httpd
 1014  2038     1   0  44  0 97944 15064 zfs    D     ??    0:00.00 /usr/local/sbin/httpd
 1014  2077     1   0  44  0 97944 15068 zfs    D     ??    0:00.00 /usr/local/sbin/httpd
   80  3186     1   0  45  0 97984 15432 zfsmrb DL    ??    0:00.01 /usr/local/sbin/httpd
   80  4806     1   0  44  0 97944 15128 zfs    D     ??    0:00.00 /usr/local/sbin/httpd
...
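
For anyone who wants to check their own box for the same symptom, a
filter along these lines should pull out the affected PIDs.  This is
only a rough sketch: stuck_pids is a name I made up, and it assumes the
wait channel (MWCHAN) is the 9th column of "ps -axlH", as in the output
above.

```shell
#!/bin/sh
# Hypothetical helper: reads `ps -axlH` output on stdin and prints the
# PIDs whose wait channel is "zfs" or "zfsmrb".
# Assumption: MWCHAN is field 9 and PID is field 2, as in the listing above.
stuck_pids() {
    awk '$9 == "zfs" || $9 == "zfsmrb" { print $2 }' | sort -un
}

# Example use (not run here): dump the kernel stack of every stuck process.
# ps -axlH | stuck_pids | while read -r pid; do procstat -k -k "$pid"; done
```

The sort -un is there because -H lists threads, so a PID can show up
more than once.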

horus# procstat -k -k 3186
  PID    TID COMM             TDNAME           KSTACK
 3186 100443 httpd            -                mi_switch+0x176 sleepq_wait+0x3b _sleep+0x322 zfs_freebsd_read+0x26c vnode_pager_generic_getpages+0x454 vnode_pager_getpages+0x8e vm_fault+0xbd1 trap_pfault+0x111 trap+0x479 calltrap+0x8

horus# procstat -k -k 4806
  PID    TID COMM             TDNAME           KSTACK
 4806 100475 httpd            -                mi_switch+0x176 sleepq_wait+0x3b __lockmgr_args+0x642 vop_stdlock+0x39 VOP_LOCK1_APV+0x46 _vn_lock+0x44 vget+0x67 cache_lookup+0x4fd vfs_cache_lookup+0xad VOP_LOOKUP_APV+0x40 lookup+0x48a namei+0x518 vn_open_cred+0x390 kern_openat+0x165 syscall+0x1cd Xfast_syscall+0xe2

horus# procstat -k -k 2077
  PID    TID COMM             TDNAME           KSTACK
 2077 100339 httpd            -                mi_switch+0x176 sleepq_wait+0x3b __lockmgr_args+0x642 vop_stdlock+0x39 VOP_LOCK1_APV+0x46 _vn_lock+0x44 vget+0x67 cache_lookup+0x4fd vfs_cache_lookup+0xad VOP_LOOKUP_APV+0x40 lookup+0x48a namei+0x518 vn_open_cred+0x390 kern_openat+0x165 syscall+0x1cd Xfast_syscall+0xe2

horus# procstat -k -k 1462
  PID    TID COMM             TDNAME           KSTACK
 1462 100157 httpd            -                mi_switch+0x176 sleepq_wait+0x3b _sleep+0x322 zfs_freebsd_read+0x26c vnode_pager_generic_getpages+0x454 vnode_pager_getpages+0x8e vm_fault+0xbd1 trap_pfault+0x111 trap+0x479 calltrap+0x8
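
With a lot of hung processes it gets tedious to read each stack by
hand, so here is a rough sketch for tallying them.  The two frame names
it looks for (zfs_freebsd_read and __lockmgr_args) are simply the ones
visible in the traces above; classify_stacks and the labels it prints
are names I made up, and anything matching neither frame is counted as
"other".

```shell
#!/bin/sh
# Hypothetical helper: reads `procstat -k -k` output on stdin and counts
# threads by which of the two frames seen above appears in the stack.
classify_stacks() {
    awk '
        $1 == "PID" { next }                  # skip header lines
        NF < 5      { next }                  # skip blank or short lines
        /zfs_freebsd_read/ { n["zfs_freebsd_read"]++; next }
        /__lockmgr_args/   { n["vnode_lock"]++; next }
        { n["other"]++ }
        END { for (k in n) print k, n[k] }
    ' | sort
}

# Example use (not run here), assuming stuck PIDs are listed in pids.txt:
# while read -r pid; do procstat -k -k "$pid"; done < pids.txt | classify_stacks
```

Fed the four traces above, it would report two threads in each group.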

-- 
| Jeremy Chadwick                                   jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |



More information about the freebsd-fs mailing list