Locked up processes after upgrade to ZFS v15
Kai Gallasch
gallasch at free.de
Wed Oct 6 12:55:15 UTC 2010
Hi.
Two days ago I upgraded my server to 8.1-STABLE (amd64) and upgraded ZFS from v14 to v15.
After the zpool and zfs upgrade the server ran stably for about half a day, but then apache processes running inside jails locked up and could no longer be terminated.
In the end apache itself (both worker and prefork) locked up, because it lost control of its child processes.
- only webserver jails running a prefork or worker apache lock up
- non-apache processes in other jails do not show this problem
- locked httpd processes do not terminate when rebooting
In 'top' the stuck processes show up with state zfs or zfsmrb:
  PID USERNAME THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
 2341 root       1  44    0   112M 12760K select  3   0:04  0.00% httpd
 2365 root       1  44    0 12056K  4312K select  0   0:00  0.00% sendmail
 2376 root       1  48    0  7972K  1628K nanslp  4   0:00  0.00% cron
 2214 root       1  44    0  6916K  1440K select  0   0:00  0.00% syslogd
24731 www        1  44    0   114M 13464K zfsmrb  6   0:00  0.00% httpd
12111 www        1  44    0   114M 13520K zfs     5   0:00  0.00% httpd
24729 www        1  44    0   114M 13408K zfsmrb  4   0:00  0.00% httpd
24728 www        1  47    0   114M 13404K zfsmrb  5   0:00  0.00% httpd
11051 www        1  44    0   114M 13456K zfs     1   0:00  0.00% httpd
26368 www        1  44    0   114M 13460K zfsmrb  6   0:00  0.00% httpd
24730 www        1  44    0   114M 13444K zfsmrb  5   0:00  0.00% httpd
88803 www        1  44    0   114M 13388K zfs     1   0:00  0.00% httpd
10887 www        1  44    0   114M 13436K zfs     6   0:00  0.00% httpd
16493 www        1  44    0   114M 13528K zfs     5   0:00  0.00% httpd
12461 www        1  44    0   114M 13340K zfs     1   0:00  0.00% httpd
89018 www        1  51    0   114M 13260K zfs     1   0:00  0.00% httpd
48699 www        1  52    0   114M 13308K zfs     3   0:00  0.00% httpd
31090 www        1  44    0   114M 13404K zfs     3   0:00  0.00% httpd
18094 www        1  44    0   114M 13312K zfs     2   0:00  0.00% httpd
69479 www        1  46    0   114M 13424K zfs     4   0:00  0.00% httpd
12890 www        1  44    0   114M 13336K zfs     5   0:00  0.00% httpd
67204 www        1  44    0   114M 13328K zfs     5   0:00  0.00% httpd
69402 www        1  60    0   114M 13432K zfs     4   0:00  0.00% httpd
91162 www        1  56    0   114M 13408K zfs     0   0:00  0.00% httpd
89781 www        1  45    0   114M 13428K zfs     4   0:00  0.00% httpd
48663 www        1  45    0   114M 13388K zfs     4   0:00  0.00% httpd
12112 www        1  44    0   114M 13340K zfs     6   0:00  0.00% httpd
91161 www        1  54    0   114M 13280K zfs     5   0:00  0.00% httpd
88839 www        1  44    0   114M 13592K zfsmrb  5   0:00  0.00% httpd
89144 www        1  58    0   114M 13304K zfs     0   0:00  0.00% httpd
78946 www        1  45    0   114M 13420K zfs     0   0:00  0.00% httpd
81984 www        1  44    0   114M 13396K zfs     5   0:00  0.00% httpd
93431 www        1  61    0   114M 13340K zfs     5   0:00  0.00% httpd
91179 www        1  76    0   114M 13360K zfs     4   0:00  0.00% httpd
69400 www        1  53    0   114M 13324K zfs     0   0:00  0.00% httpd
54211 www        1  45    0   114M 13404K zfs     6   0:00  0.00% httpd
36335 www        1  45    0   114M 13400K zfs     4   0:00  0.00% httpd
31093 www        1  44    0   114M 13348K zfs     2   0:00  0.00% httpd
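To summarise a listing like the one above, the processes can be tallied per wait state. The following is only a minimal sketch: it assumes the top output has been saved to a file (the name top.txt is hypothetical) and that STATE is the 8th column, as in the listing above. The function name count_states is made up for this illustration.

```sh
#!/bin/sh
# Count processes per wait state in top(1) output read from stdin.
# Assumption: 12 columns as in the listing above, STATE in column 8.
count_states() {
    # Skip the header and any other line whose first field is not a PID.
    awk '$1 ~ /^[0-9]+$/ && NF >= 12 { n[$8]++ }
         END { for (s in n) print n[s], s }' |
        sort -k1,1nr -k2,2    # most frequent state first, ties by name
}
```

Used as `count_states < top.txt`, it prints one line per state with the number of processes in that state, so a growing count for zfs or zfsmrb is easy to spot over time.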
I compiled a debug kernel with the following options:
options KDB # Enable kernel debugger support.
options DDB # Support DDB.
options GDB # Support remote GDB.
options INVARIANTS # Enable calls of extra sanity checking
options INVARIANT_SUPPORT # Extra sanity checks of internal structures, required by INVARIANTS
options WITNESS # Enable checks to detect deadlocks and cycles
options WITNESS_SKIPSPIN # Don't run witness on spinlocks for speed
#
options SW_WATCHDOG
options DEBUG_LOCKS
options DEBUG_VFS_LOCKS
After the processes locked up, the only output on the console was:
witness_lock_list_get: witness exhausted
I also moved the jails with the stuck httpd processes to another server (also 8.1-STABLE, ZFS v15), but the lockup also occurred there.
How can I debug this and get further information? At the moment I am thinking about reverting from ZFS to UFS, to spare my nerves. That would be a big disappointment for me, after all the time and effort spent trying to use ZFS in production.
Regards,
Kai.