Locked up processes after upgrade to ZFS v15

Kai Gallasch gallasch at free.de
Wed Oct 6 12:55:15 UTC 2010


Hi.

Two days ago I upgraded my server to 8.1-STABLE (amd64) and upgraded ZFS from v14 to v15.
After the zpool and zfs upgrade the server ran stably for about half a day, but then apache processes running inside jails started to lock up and could no longer be terminated.

In the end apache (both worker and prefork) itself locked up, because it lost control of its child processes.

- only webserver jails running a prefork or worker apache lock up
- non-apache processes in other jails do not show this problem
- locked httpd processes do not terminate when the server is rebooted

In 'top' the stuck processes show up in state zfs or zfsmrb:

  PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
 2341 root        1  44    0   112M 12760K select  3   0:04  0.00% httpd
 2365 root        1  44    0 12056K  4312K select  0   0:00  0.00% sendmail
 2376 root        1  48    0  7972K  1628K nanslp  4   0:00  0.00% cron
 2214 root        1  44    0  6916K  1440K select  0   0:00  0.00% syslogd
24731 www         1  44    0   114M 13464K zfsmrb  6   0:00  0.00% httpd
12111 www         1  44    0   114M 13520K zfs     5   0:00  0.00% httpd
24729 www         1  44    0   114M 13408K zfsmrb  4   0:00  0.00% httpd
24728 www         1  47    0   114M 13404K zfsmrb  5   0:00  0.00% httpd
11051 www         1  44    0   114M 13456K zfs     1   0:00  0.00% httpd
26368 www         1  44    0   114M 13460K zfsmrb  6   0:00  0.00% httpd
24730 www         1  44    0   114M 13444K zfsmrb  5   0:00  0.00% httpd
88803 www         1  44    0   114M 13388K zfs     1   0:00  0.00% httpd
10887 www         1  44    0   114M 13436K zfs     6   0:00  0.00% httpd
16493 www         1  44    0   114M 13528K zfs     5   0:00  0.00% httpd
12461 www         1  44    0   114M 13340K zfs     1   0:00  0.00% httpd
89018 www         1  51    0   114M 13260K zfs     1   0:00  0.00% httpd
48699 www         1  52    0   114M 13308K zfs     3   0:00  0.00% httpd
31090 www         1  44    0   114M 13404K zfs     3   0:00  0.00% httpd
18094 www         1  44    0   114M 13312K zfs     2   0:00  0.00% httpd
69479 www         1  46    0   114M 13424K zfs     4   0:00  0.00% httpd
12890 www         1  44    0   114M 13336K zfs     5   0:00  0.00% httpd
67204 www         1  44    0   114M 13328K zfs     5   0:00  0.00% httpd
69402 www         1  60    0   114M 13432K zfs     4   0:00  0.00% httpd
91162 www         1  56    0   114M 13408K zfs     0   0:00  0.00% httpd
89781 www         1  45    0   114M 13428K zfs     4   0:00  0.00% httpd
48663 www         1  45    0   114M 13388K zfs     4   0:00  0.00% httpd
12112 www         1  44    0   114M 13340K zfs     6   0:00  0.00% httpd
91161 www         1  54    0   114M 13280K zfs     5   0:00  0.00% httpd
88839 www         1  44    0   114M 13592K zfsmrb  5   0:00  0.00% httpd
89144 www         1  58    0   114M 13304K zfs     0   0:00  0.00% httpd
78946 www         1  45    0   114M 13420K zfs     0   0:00  0.00% httpd
81984 www         1  44    0   114M 13396K zfs     5   0:00  0.00% httpd
93431 www         1  61    0   114M 13340K zfs     5   0:00  0.00% httpd
91179 www         1  76    0   114M 13360K zfs     4   0:00  0.00% httpd
69400 www         1  53    0   114M 13324K zfs     0   0:00  0.00% httpd
54211 www         1  45    0   114M 13404K zfs     6   0:00  0.00% httpd
36335 www         1  45    0   114M 13400K zfs     4   0:00  0.00% httpd
31093 www         1  44    0   114M 13348K zfs     2   0:00  0.00% httpd
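
To pull the stuck PIDs out of a listing like the one above, a small filter along these lines might help. This is only a sketch: the function name stuck_zfs_pids is made up for illustration, and it assumes the column layout shown above, with the PID in field 1 and STATE in field 8.

```sh
#!/bin/sh
# Hypothetical helper: read top(1)-style output on stdin and print the
# PID and wait state of every process whose STATE starts with "zfs"
# (this matches both "zfs" and "zfsmrb").
# Assumption: PID is field 1 and STATE is field 8, as in the listing above.
stuck_zfs_pids() {
    # The numeric test on field 1 skips the header line.
    awk '$1 ~ /^[0-9]+$/ && $8 ~ /^zfs/ { print $1, $8 }'
}
```

Fed a saved copy of the top output on stdin, it prints one "PID state" pair per line, so the list can be attached to a report or compared between lockups.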

I compiled a debug kernel with the following options:

options         KDB                     # Enable kernel debugger support.
options         DDB                     # Support DDB.
options         GDB                     # Support remote GDB.
options         INVARIANTS              # Enable calls of extra sanity checking
options         INVARIANT_SUPPORT       # Extra sanity checks of internal structures, required by INVARIANTS
options         WITNESS                 # Enable checks to detect deadlocks and cycles
options         WITNESS_SKIPSPIN        # Don't run witness on spinlocks for speed
#
options         SW_WATCHDOG
options         DEBUG_LOCKS
options         DEBUG_VFS_LOCKS

After the process lockups the only output on the console was:
witness_lock_list_get: witness exhausted

I also moved the jails with the stuck httpd processes to another server (also 8.1-STABLE, ZFS v15), but the lockup occurred there as well.

How can I debug this and get further information? At the moment I am thinking about reverting from zfs to ufs to save some nerves. That would be a big disappointment for me, after all the time and effort spent trying to use zfs in production.

Regards,
Kai.


More information about the freebsd-fs mailing list