NFS locking issue with FreeBSD7.1 client
Mikolaj Golub
to.my.trociny at gmail.com
Wed Dec 30 14:29:42 UTC 2009
Hi,
After upgrading our hosts from 6.3 to 7.1 we have been experiencing the issue
with our NFS mounts. We have observed the issue on one of the hosts 3 last
nights in raw and saw similar thing on another server several weeks ago.
NFS server is CentOS 5.3. FreeBSD client had several NFS mounts, something
like this:
172.30.10.124:/var/www/app25 on /var/www/shared25data (nfs)
172.30.10.83:/var/www/app2 on /var/www/shared2data (nfs)
172.30.10.63:/var/www/app17 on /var/www/shared17data (nfs)
172.30.10.71:/var/www/app19 on /var/www/shared19data (nfs)
172.30.10.141:/var/www/app21 on /var/www/shared21data (nfs)
172.30.10.106:/var/www/app24 on /var/www/shared24data (nfs)
172.30.10.50:/var/www/app10 on /var/www/shared10data (nfs)
Although IPs are different all them are the same CentOS server. At some point
the processes that try to access one of the shares start to got stuck. Access
to other shares works ok. Stuck processes are not killable. We can mount
"problem" exported directory from linux server to some other directory and it
works ok too. Before the last incident we moved all applications that used NFS
shares to another host, it looks like the only processes that still accessed
NFS folders on this host were nagios checks (perl scripts), which periodically
checked modification time of some files and eventually these processes started
to got stuck on one the mounts.
According to procstat the processes got stuck in _vn_lock. Below is some part of
procstat -a -kk output:
PID TID COMM TDNAME KSTACK
0 0 swapper - mi_switch+0x146 sleepq_switch+0xcb sleepq_timedwait+0x37 _sleep+0x2b4 scheduler+0x35d mi_startup+0x96 begin+0x2c
1 100001 init - mi_switch+0x146 sleepq_switch+0xcb sleepq_catch_signals+0x2fc sleepq_wait_sig+0xd _sleep+0x2c7 kern_wait+0xf98 wait4+0x3b syscall+0x335 Xint0x80_syscall+0x20
2 100009 g_event - mi_switch+0x146 sleepq_switch+0xcb sleepq_timedwait+0x37 _sleep+0x2b4 g_event_procbody+0xac fork_exit+0x99 fork_trampoline+0x8
3 100010 g_up - mi_switch+0x146 sleepq_switch+0xcb sleepq_timedwait+0x37 _sleep+0x2b4 g_io_schedule_up+0xd2 g_up_procbody+0x6e fork_exit+0x99 fork_trampoline+0x8
4 100011 g_down - mi_switch+0x146 sleepq_switch+0xcb sleepq_timedwait+0x37 _sleep+0x2b4 g_io_schedule_down+0x6b g_down_procbody+0x6e fork_exit+0x99 fork_trampoline+0x8
5 100013 xpt_thrd - mi_switch+0x146 sleepq_switch+0xcb sleepq_wait+0x36 _sleep+0x2d6 xpt_scanner_thread+0x41 fork_exit+0x99 fork_trampoline+0x8
6 100016 acpi_task_0 - mi_switch+0x146 sleepq_switch+0xcb sleepq_wait+0x36 _sleep+0x2d6 taskqueue_thread_loop+0x104 fork_exit+0x99 fork_trampoline+0x8
7 100017 acpi_task_1 - mi_switch+0x146 sleepq_switch+0xcb sleepq_wait+0x36 _sleep+0x2d6 taskqueue_thread_loop+0x104 fork_exit+0x99 fork_trampoline+0x8
8 100018 acpi_task_2 - mi_switch+0x146 sleepq_switch+0xcb sleepq_wait+0x36 _sleep+0x2d6 taskqueue_thread_loop+0x104 fork_exit+0x99 fork_trampoline+0x8
9 100019 thread taskq - mi_switch+0x146 sleepq_switch+0xcb sleepq_wait+0x36 _sleep+0x2d6 taskqueue_thread_loop+0x104 fork_exit+0x99 fork_trampoline+0x8
10 100000 audit - mi_switch+0x146 sleepq_switch+0xcb sleepq_wait+0x36 _cv_wait+0x171 audit_worker+0x6f fork_exit+0x99 fork_trampoline+0x8
11 100002 idle: cpu3 - <running>
12 100003 idle: cpu2 - <running>
13 100004 idle: cpu1 - <running>
14 100005 idle: cpu0 - mi_switch+0x146 ipi_bitmap_handler+0x72 Xipi_intr_bitmap_handler+0x2e acpi_cpu_idle+0x184 cpu_idle+0x28 sched_idletd+0x2c4 fork_exit+0x99 fork_trampoline+0x8
15 100006 swi4: clock sio - mi_switch+0x146 ithread_loop+0x308 fork_exit+0x99 fork_trampoline+0x8
16 100007 swi3: vm -
17 100008 swi1: net - mi_switch+0x146 ithread_loop+0x308 fork_exit+0x99 fork_trampoline+0x8
18 100012 yarrow - mi_switch+0x146 sleepq_switch+0xcb sleepq_timedwait+0x37 _sleep+0x2b4 pause+0x30 random_kthread+0x22a fork_exit+0x99 fork_trampoline+0x8
19 100014 swi2: cambio -
20 100015 swi5: + -
21 100020 swi6: Giant taskq -
22 100021 swi6: task queue - mi_switch+0x146 ithread_loop+0x308 fork_exit+0x99 fork_trampoline+0x8
23 100022 kqueue taskq - mi_switch+0x146 sleepq_switch+0xcb sleepq_wait+0x36 _sleep+0x2d6 taskqueue_thread_loop+0x104 fork_exit+0x99 fork_trampoline+0x8
24 100023 irq9: acpi0 -
25 100024 irq256: bce0 - mi_switch+0x146 ithread_loop+0x308 fork_exit+0x99 fork_trampoline+0x8
26 100025 irq16: mfi0 - mi_switch+0x146 ithread_loop+0x308 fork_exit+0x99 fork_trampoline+0x8
27 100026 irq257: bce1 - mi_switch+0x146 ithread_loop+0x308 fork_exit+0x99 fork_trampoline+0x8
28 100027 irq21: uhci0 uhci+ - mi_switch+0x146 ithread_loop+0x308 fork_exit+0x99 fork_trampoline+0x8
29 100028 usb0 - mi_switch+0x146 sleepq_switch+0xcb sleepq_timedwait+0x37 _sleep+0x2b4 usb_event_thread+0xeb fork_exit+0x99 fork_trampoline+0x8
30 100029 usbtask-hc - mi_switch+0x146 sleepq_switch+0xcb sleepq_wait+0x36 _sleep+0x2d6 usb_task_thread+0x7e fork_exit+0x99 fork_trampoline+0x8
31 100030 usbtask-dr - mi_switch+0x146 sleepq_switch+0xcb sleepq_wait+0x36 _sleep+0x2d6 usb_task_thread+0x7e fork_exit+0x99 fork_trampoline+0x8
32 100031 irq20: uhci1 uhci3 -
33 100032 usb1 - mi_switch+0x146 sleepq_switch+0xcb sleepq_timedwait+0x37 _sleep+0x2b4 usb_event_thread+0xeb fork_exit+0x99 fork_trampoline+0x8
34 100033 usb2 - mi_switch+0x146 sleepq_switch+0xcb sleepq_timedwait+0x37 _sleep+0x2b4 usb_event_thread+0xeb fork_exit+0x99 fork_trampoline+0x8
35 100034 usb3 - mi_switch+0x146 sleepq_switch+0xcb sleepq_timedwait+0x37 _sleep+0x2b4 usb_event_thread+0xeb fork_exit+0x99 fork_trampoline+0x8
36 100035 usb4 - mi_switch+0x146 sleepq_switch+0xcb sleepq_timedwait+0x37 _sleep+0x2b4 usb_event_thread+0xeb fork_exit+0x99 fork_trampoline+0x8
37 100036 irq14: ata0 - mi_switch+0x146 ithread_loop+0x308 fork_exit+0x99 fork_trampoline+0x8
38 100037 irq15: ata1 -
39 100038 swi0: sio -
40 100039 irq1: atkbd0 - mi_switch+0x146 ithread_loop+0x308 fork_exit+0x99 fork_trampoline+0x8
41 100040 sctp_iterator - mi_switch+0x146 sleepq_switch+0xcb sleepq_wait+0x36 _sleep+0x2d6 sctp_iterator_thread+0x7f fork_exit+0x99 fork_trampoline+0x8
42 100041 pfpurge - mi_switch+0x146 sleepq_switch+0xcb sleepq_timedwait+0x37 _sleep+0x2b4 pf_purge_thread+0x3d fork_exit+0x99 fork_trampoline+0x8
43 100042 pagedaemon - mi_switch+0x146 sleepq_switch+0xcb sleepq_timedwait+0x37 _sleep+0x2b4 vm_pageout+0x2da fork_exit+0x99 fork_trampoline+0x8
44 100043 vmdaemon - mi_switch+0x146 sleepq_switch+0xcb sleepq_wait+0x36 _sleep+0x2d6 vm_daemon+0x78 fork_exit+0x99 fork_trampoline+0x8
45 100044 pagezero - mi_switch+0x146 sleepq_switch+0xcb sleepq_timedwait+0x37 _sleep+0x2b4 vm_pagezero+0xd2 fork_exit+0x99 fork_trampoline+0x8
46 100045 bufdaemon - mi_switch+0x146 sleepq_switch+0xcb sleepq_timedwait+0x37 _sleep+0x2b4 buf_daemon+0x2fd fork_exit+0x99 fork_trampoline+0x8
47 100046 vnlru - mi_switch+0x146 sleepq_switch+0xcb sleepq_timedwait+0x37 _sleep+0x2b4 vnlru_proc+0x153 fork_exit+0x99 fork_trampoline+0x8
48 100047 syncer - mi_switch+0x146 sleepq_switch+0xcb sleepq_wait+0x36 _sleep+0x2d6 sched_sync+0x920 fork_exit+0x99 fork_trampoline+0x8
49 100048 softdepflush - mi_switch+0x146 sleepq_switch+0xcb sleepq_timedwait+0x37 _sleep+0x2b4 softdep_flush+0x3ce fork_exit+0x99 fork_trampoline+0x8
494 100055 pflogd - mi_switch+0x146 sleepq_switch+0xcb sleepq_catch_signals+0x2fc sleepq_wait_sig+0xd _sleep+0x2c7 sbwait+0x52 soreceive_generic+0x491 soreceive+0x38 soo_read+0x3b dofileread+0x96 kern_readv+0x58 read+0x4f syscall+0x335 Xint0x80_syscall+0x20
499 100077 pflogd - mi_switch+0x146 sleepq_switch+0xcb sleepq_catch_signals+0x2fc sleepq_timedwait_sig+0x19 _sleep+0x2a1 bpfread+0x1c3 devfs_read_f+0x7d dofileread+0x96 kern_readv+0x58 read+0x4f syscall+0x335 Xint0x80_syscall+0x20
560 100052 moused - mi_switch+0x146 sleepq_switch+0xcb sleepq_catch_signals+0x2fc sleepq_wait_sig+0xd _cv_wait_sig+0x180 kern_select+0x7e7 select+0x5e syscall+0x335 Xint0x80_syscall+0x20
706 100069 devd - mi_switch+0x146 sleepq_switch+0xcb sleepq_catch_signals+0x2fc sleepq_wait_sig+0xd _cv_wait_sig+0x180 kern_select+0x7e7 select+0x5e syscall+0x335 Xint0x80_syscall+0x20
757 100050 accounting - mi_switch+0x146 sleepq_switch+0xcb sleepq_timedwait+0x37 _sleep+0x2b4 acct_thread+0x40b fork_exit+0x99 fork_trampoline+0x8
[many processes that do not look like related are skipped ... and below are stucked processes]
68304 100714 perl5.8.9 - mi_switch+0x146 sleepq_switch+0xcb sleepq_wait+0x36 _sleep+0x2d6 acquire+0x7a _lockmgr+0x45c vop_stdlock+0x40 VOP_LOCK1_APV+0x46 _vn_lock+0x166 vget+0x114 vfs_hash_get+0x143 nfs_nget+0x94 nfs_root+0x3f lookup+0xa1c namei+0x39f kern_stat+0x3d stat+0x2f syscall+0x335
68523 100086 perl5.8.9 - mi_switch+0x146 sleepq_switch+0xcb sleepq_wait+0x36 _sleep+0x2d6 acquire+0x7a _lockmgr+0x45c vop_stdlock+0x40 VOP_LOCK1_APV+0x46 _vn_lock+0x166 vget+0x114 vfs_hash_get+0x143 nfs_nget+0x94 nfs_root+0x3f lookup+0xa1c namei+0x39f kern_stat+0x3d stat+0x2f syscall+0x335
68644 100616 perl5.8.9 - mi_switch+0x146 sleepq_switch+0xcb sleepq_wait+0x36 _sleep+0x2d6 acquire+0x7a _lockmgr+0x45c vop_stdlock+0x40 VOP_LOCK1_APV+0x46 _vn_lock+0x166 vget+0x114 vfs_hash_get+0x143 nfs_nget+0x94 nfs_root+0x3f lookup+0xa1c namei+0x39f kern_stat+0x3d stat+0x2f syscall+0x335
68679 100764 perl5.8.9 - mi_switch+0x146 sleepq_switch+0xcb sleepq_wait+0x36 _sleep+0x2d6 acquire+0x7a _lockmgr+0x45c vop_stdlock+0x40 VOP_LOCK1_APV+0x46 _vn_lock+0x166 vget+0x114 vfs_hash_get+0x143 nfs_nget+0x94 nfs_root+0x3f lookup+0xa1c namei+0x39f kern_stat+0x3d stat+0x2f syscall+0x335
69504 100724 bash - mi_switch+0x146 sleepq_switch+0xcb sleepq_catch_signals+0x2fc sleepq_wait_sig+0xd _sleep+0x2c7 kern_wait+0xf98 wait4+0x3b syscall+0x335 Xint0x80_syscall+0x20
69511 100105 mc - mi_switch+0x146 sleepq_switch+0xcb sleepq_catch_signals+0x2fc sleepq_wait_sig+0xd _sleep+0x2c7 nfs_request+0xbc5 nfs3_access_otw+0xde nfs_access+0x18b VOP_ACCESS_APV+0x42 nfs_lookup+0xd1 VOP_LOOKUP_APV+0x46 lookup+0x611 namei+0x39f kern_stat+0x3d stat+0x2f syscall+0x335 Xint0x80_syscall+0x20
69513 100240 bash - mi_switch+0x146 sleepq_switch+0xcb sleepq_catch_signals+0x2fc sleepq_wait_sig+0xd _sleep+0x2c7 ttysleep+0x39 ttread+0x617 ptsread+0x3e giant_read+0x89 devfs_read_f+0x7d dofileread+0x96 kern_readv+0x58 read+0x4f syscall+0x335 Xint0x80_syscall+0x20
70521 100849 bash - mi_switch+0x146 sleepq_switch+0xcb sleepq_catch_signals+0x2fc sleepq_wait_sig+0xd _sleep+0x2c7 kern_wait+0xf98 wait4+0x3b syscall+0x335 Xint0x80_syscall+0x20
70528 100550 mc - mi_switch+0x146 sleepq_switch+0xcb sleepq_wait+0x36 _sleep+0x2d6 acquire+0x7a _lockmgr+0x45c vop_stdlock+0x40 VOP_LOCK1_APV+0x46 _vn_lock+0x166 vget+0x114 vfs_hash_get+0x143 nfs_nget+0x94 nfs_root+0x3f lookup+0xa1c namei+0x39f kern_lstat+0x4f lstat+0x2f syscall+0x335
70530 100846 bash - mi_switch+0x146 sleepq_switch+0xcb sleepq_catch_signals+0x2fc sleepq_wait_sig+0xd _sleep+0x2c7 ttysleep+0x39 ttread+0x617 ptsread+0x3e giant_read+0x89 devfs_read_f+0x7d dofileread+0x96 kern_readv+0x58 read+0x4f syscall+0x335 Xint0x80_syscall+0x20
70815 100781 mc - mi_switch+0x146 sleepq_switch+0xcb sleepq_wait+0x36 _sleep+0x2d6 acquire+0x7a _lockmgr+0x45c vop_stdlock+0x40 VOP_LOCK1_APV+0x46 _vn_lock+0x166 vget+0x114 vfs_hash_get+0x143 nfs_nget+0x94 nfs_root+0x3f lookup+0xa1c namei+0x39f kern_lstat+0x4f lstat+0x2f syscall+0x335
70830 100615 df - mi_switch+0x146 sleepq_switch+0xcb sleepq_wait+0x36 _sleep+0x2d6 acquire+0x7a _lockmgr+0x45c vop_stdlock+0x40 VOP_LOCK1_APV+0x46 _vn_lock+0x166 vget+0x114 vfs_hash_get+0x143 nfs_nget+0x94 nfs_statfs+0x69 __vfs_statfs+0x2f kern_getfsstat+0x2d5 getfsstat+0x2e syscall+0x335 Xint0x80_syscall+0x20
71444 100795 bash - mi_switch+0x146 sleepq_switch+0xcb sleepq_wait+0x36 _sleep+0x2d6 acquire+0x7a _lockmgr+0x45c vop_stdlock+0x40 VOP_LOCK1_APV+0x46 _vn_lock+0x166 vget+0x114 vfs_hash_get+0x143 nfs_nget+0x94 nfs_root+0x3f lookup+0xa1c namei+0x39f kern_stat+0x3d stat+0x2f syscall+0x335
71901 100839 df - mi_switch+0x146 sleepq_switch+0xcb sleepq_wait+0x36 _sleep+0x2d6 acquire+0x7a _lockmgr+0x45c vop_stdlock+0x40 VOP_LOCK1_APV+0x46 _vn_lock+0x166 vget+0x114 vfs_hash_get+0x143 nfs_nget+0x94 nfs_statfs+0x69 __vfs_statfs+0x2f kern_getfsstat+0x2d5 getfsstat+0x2e syscall+0x335 Xint0x80_syscall+0x20
72110 100508 ls - mi_switch+0x146 sleepq_switch+0xcb sleepq_wait+0x36 _sleep+0x2d6 acquire+0x7a _lockmgr+0x45c vop_stdlock+0x40 VOP_LOCK1_APV+0x46 _vn_lock+0x166 vget+0x114 vfs_hash_get+0x143 nfs_nget+0x94 nfs_root+0x3f lookup+0xa1c namei+0x39f kern_lstat+0x4f lstat+0x2f syscall+0x335
72835 100483 ls - mi_switch+0x146 sleepq_switch+0xcb sleepq_wait+0x36 _sleep+0x2d6 acquire+0x7a _lockmgr+0x45c vop_stdlock+0x40 VOP_LOCK1_APV+0x46 _vn_lock+0x166 vget+0x114 vfs_hash_get+0x143 nfs_nget+0x94 nfs_root+0x3f lookup+0xa1c namei+0x39f kern_lstat+0x4f lstat+0x2f syscall+0x335
73265 100685 ls - mi_switch+0x146 sleepq_switch+0xcb sleepq_wait+0x36 _sleep+0x2d6 acquire+0x7a _lockmgr+0x45c vop_stdlock+0x40 VOP_LOCK1_APV+0x46 _vn_lock+0x166 vget+0x114 vfs_hash_get+0x143 nfs_nget+0x94 nfs_root+0x3f lookup+0xa1c namei+0x39f kern_stat+0x3d stat+0x2f syscall+0x335
73292 100832 mc - mi_switch+0x146 sleepq_switch+0xcb sleepq_wait+0x36 _sleep+0x2d6 acquire+0x7a _lockmgr+0x45c vop_stdlock+0x40 VOP_LOCK1_APV+0x46 _vn_lock+0x166 vget+0x114 vfs_hash_get+0x143 nfs_nget+0x94 nfs_root+0x3f lookup+0xa1c namei+0x39f kern_lstat+0x4f lstat+0x2f syscall+0x335
73357 100772 ls - mi_switch+0x146 sleepq_switch+0xcb sleepq_wait+0x36 _sleep+0x2d6 acquire+0x7a _lockmgr+0x45c vop_stdlock+0x40 VOP_LOCK1_APV+0x46 _vn_lock+0x166 vget+0x114 vfs_hash_get+0x143 nfs_nget+0x94 nfs_root+0x3f lookup+0xa1c namei+0x39f kern_stat+0x3d stat+0x2f syscall+0x335
73796 100746 ls - mi_switch+0x146 sleepq_switch+0xcb sleepq_wait+0x36 _sleep+0x2d6 acquire+0x7a _lockmgr+0x45c vop_stdlock+0x40 VOP_LOCK1_APV+0x46 _vn_lock+0x166 vget+0x114 vfs_hash_get+0x143 nfs_nget+0x94 nfs_root+0x3f lookup+0xa1c namei+0x39f kern_lstat+0x4f lstat+0x2f syscall+0x335
74074 100800 tcsh - mi_switch+0x146 sleepq_switch+0xcb sleepq_wait+0x36 _sleep+0x2d6 acquire+0x7a _lockmgr+0x45c vop_stdlock+0x40 VOP_LOCK1_APV+0x46 _vn_lock+0x166 vget+0x114 vfs_hash_get+0x143 nfs_nget+0x94 nfs_root+0x3f lookup+0xa1c namei+0x39f kern_stat+0x3d stat+0x2f syscall+0x335
74125 100543 ls - mi_switch+0x146 sleepq_switch+0xcb sleepq_wait+0x36 _sleep+0x2d6 acquire+0x7a _lockmgr+0x45c vop_stdlock+0x40 VOP_LOCK1_APV+0x46 _vn_lock+0x166 vget+0x114 vfs_hash_get+0x143 nfs_nget+0x94 nfs_root+0x3f lookup+0xa1c namei+0x39f kern_stat+0x3d stat+0x2f syscall+0x335
74449 100547 df - mi_switch+0x146 sleepq_switch+0xcb sleepq_wait+0x36 _sleep+0x2d6 acquire+0x7a _lockmgr+0x45c vop_stdlock+0x40 VOP_LOCK1_APV+0x46 _vn_lock+0x166 vget+0x114 vfs_hash_get+0x143 nfs_nget+0x94 nfs_statfs+0x69 __vfs_statfs+0x2f kern_getfsstat+0x2d5 getfsstat+0x2e syscall+0x335 Xint0x80_syscall+0x20
74497 100737 bash - mi_switch+0x146 sleepq_switch+0xcb sleepq_wait+0x36 _sleep+0x2d6 acquire+0x7a _lockmgr+0x45c vop_stdlock+0x40 VOP_LOCK1_APV+0x46 _vn_lock+0x166 vget+0x114 vfs_hash_get+0x143 nfs_nget+0x94 nfs_root+0x3f lookup+0xa1c namei+0x39f kern_stat+0x3d stat+0x2f syscall+0x335
74650 100837 df - mi_switch+0x146 sleepq_switch+0xcb sleepq_wait+0x36 _sleep+0x2d6 acquire+0x7a _lockmgr+0x45c vop_stdlock+0x40 VOP_LOCK1_APV+0x46 _vn_lock+0x166 vget+0x114 vfs_hash_get+0x143 nfs_nget+0x94 nfs_statfs+0x69 __vfs_statfs+0x2f kern_getfsstat+0x2d5 getfsstat+0x2e syscall+0x335 Xint0x80_syscall+0x20
76499 100771 perl5.8.9 - mi_switch+0x146 sleepq_switch+0xcb sleepq_wait+0x36 _sleep+0x2d6 acquire+0x7a _lockmgr+0x45c vop_stdlock+0x40 VOP_LOCK1_APV+0x46 _vn_lock+0x166 vget+0x114 vfs_hash_get+0x143 nfs_nget+0x94 nfs_root+0x3f lookup+0xa1c namei+0x39f kern_stat+0x3d stat+0x2f syscall+0x335
76533 100850 perl5.8.9 - mi_switch+0x146 sleepq_switch+0xcb sleepq_wait+0x36 _sleep+0x2d6 acquire+0x7a _lockmgr+0x45c vop_stdlock+0x40 VOP_LOCK1_APV+0x46 _vn_lock+0x166 vget+0x114 vfs_hash_get+0x143 nfs_nget+0x94 nfs_root+0x3f lookup+0xa1c namei+0x39f kern_stat+0x3d stat+0x2f syscall+0x335
I can send the full output privately if someone from developers is interested
to look at it.
We have removed all NFS shares from this server after the last incident, but
we have other servers where the problem might occur too. So any suggestions
what we should check/do then to provide more info could be helpful.
--
Mikolaj Golub
More information about the freebsd-stable
mailing list