misc/181834: amd mounting NFS directories can drive a dead-lock

Julien Charbon jcharbon at verisign.com
Thu Sep 5 09:50:00 UTC 2013


>Number:         181834
>Category:       misc
>Synopsis:       amd mounting NFS directories can drive a dead-lock
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Thu Sep 05 09:50:00 UTC 2013
>Closed-Date:
>Last-Modified:
>Originator:     Julien Charbon
>Release:        FreeBSD 8.4 RELEASE
>Organization:
Verisign
>Environment:
FreeBSD atlas4 8.4-RELEASE-p3 FreeBSD 8.4-RELEASE-p3 #0 r+efa3d77-dirty: Wed Sep  4 20:41:53 UTC 2013     root at atlas4:/app/jcharbon/git/freebsd-vrsn/sys/GENERIC  amd64
>Description:
Short Summary:

On FreeBSD 8.4, the amd automounter daemon can deadlock the machine when mounting NFS directories.

Long Summary:

When the amd daemon starts, the machine appears deadlocked right afterwards:

- SSH connections stall
- virtual consoles stall
- the serial console stalls

However, the machine still replies to ping, so the kernel is still alive. Breaking into the DDB kernel debugger with the hardware NMI button during the deadlock gave the following status:

syslogd and devd are waiting for the Giant kernel lock:

db> show allchains
chain 1:
   thread 100171 (pid 885, syslogd) blocked on lock 0xffffffff80e2e100 (sleep mutex) "Giant"
   thread 100148 (pid 1120, amd) running on CPU 9
chain 2:
   thread 100205 (pid 742, devd) blocked on lock 0xffffffff80e2e100 (sleep mutex) "Giant"
   thread 100148 (pid 1120, amd) running on CPU 9

The Giant lock is owned by the amd daemon:

db> show lock Giant
   class: sleep mutex
   name: Giant
   flags: {DEF, RECURSE}
   state: {OWNED, CONTESTED, RECURSED}
   owner: 0xffffff003ef0f8e0 (tid 100148, pid 1120, "amd")
   recursed: 1

Another dump, taken with the WITNESS-enabled kernel (kernel-witness), shows that this amd thread also holds other kernel locks:

db> show alllocks
Process 1120 (amd) thread 0xffffff003ef0f8e0 (100148)
exclusive rw udpinp (udpinp) r = 0 (0xffffff007b39fa60) locked @ /app/jcharbon/git/freebsd-vrsn/sys/netinet/in_pcb.c:237
exclusive rw udp (udp) r = 0 (0xffffffff80ff4d28) locked @ /app/jcharbon/git/freebsd-vrsn/sys/netinet/udp_usrreq.c:1464
exclusive lockmgr nfs (nfs) r = 0 (0xffffff007b1bd7e8) locked @ /app/jcharbon/git/freebsd-vrsn/sys/nfsclient/nfs_node.c:166
exclusive sleep mutex Giant (Giant) r = 1 (0xffffffff80e2e100) locked @ /app/jcharbon/git/freebsd-vrsn/sys/kern/vfs_mount.c:730

Next, we entered DDB directly from the kernel NFS code, which gave the following backtrace:

Tracing pid 1142 tid 100301 td 0xffffff00379098e0
kvprintf() at kvprintf+0x17a
nfs_msg() at nfs_msg+0x52
nfs_feedback() at nfs_feedback+0x105
clnt_reconnect_call() at clnt_reconnect_call+0x19b
nfs_request() at nfs_request+0x1e5
nfs_getattr() at nfs_getattr+0x2bc
mountnfs() at mountnfs+0x330
nfs_mount() at nfs_mount+0xe3f
vfs_donmount() at vfs_donmount+0xcde
kernel_mount() at kernel_mount+0xa1
nfs_cmount() at nfs_cmount+0x5a
mount() at mount+0x1ea
amd64_syscall() at amd64_syscall+0xf9
Xfast_syscall() at Xfast_syscall+0xfc
--- syscall (21, FreeBSD ELF64, mount), rip = 0x8007cea4c, rsp = 0x7fffffffdd88, rbp = 0x2 ---
db> next
db> next
db> next
..

Many 'next' debugger commands later, we are back at the same place in nfs_request().

At that point, the mountnfs() call loops forever in nfs_request() and never releases the Giant mutex or the nfs kernel lock.
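
The failure pattern, a thread that keeps retrying an RPC while still holding locks other processes need, can be modelled in user space. In this minimal sketch the mutex stands in for Giant and the nfs lock, and `fake_nfs_request()` is a hypothetical placeholder for an RPC to a server that never answers. Unlike the loop observed in the kernel, the sketch caps the retries with an assumed `MAX_RETRIES` so that it terminates; it only shows that, for as long as the loop runs, any other thread trying to take the lock is refused.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

#define MAX_RETRIES 5  /* assumption: the loop seen in the kernel had no such bound */

static pthread_mutex_t held_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical RPC: models a server that never replies. */
static bool fake_nfs_request(void)
{
    return false;
}

/* Another process (syslogd, devd, ...) trying to take the lock. */
static void *other_process(void *arg)
{
    int *rc = arg;
    *rc = pthread_mutex_trylock(&held_lock);
    if (*rc == 0)
        pthread_mutex_unlock(&held_lock);
    return NULL;
}

/*
 * Retries the request while holding the lock, as the mount path does in
 * the trace. After each failed attempt, another thread tries for the lock.
 * Returns how many of those attempts were refused.
 */
int model_retry_under_lock(void)
{
    int refused = 0;

    pthread_mutex_lock(&held_lock);
    for (int attempt = 0; attempt < MAX_RETRIES; attempt++) {
        if (fake_nfs_request())
            break;  /* never taken: the server never answers */

        pthread_t t;
        int rc = -1;
        pthread_create(&t, NULL, other_process, &rc);
        pthread_join(t, NULL);
        if (rc == EBUSY)
            refused++;
    }
    pthread_mutex_unlock(&held_lock);  /* in the reported hang, this point is never reached */
    return refused;
}
```

Called from a small `main()`, `model_retry_under_lock()` returns `MAX_RETRIES`: every attempt by the other thread is refused while the retry loop holds the lock.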
>How-To-Repeat:
Use the following amd configuration files. First, /etc/amd.conf:

$ cat /etc/amd.conf
[global]
browsable_dirs = no
map_type = file
mount_type = nfs
search_path = /etc
auto_dir = /.amd
cache_duration = 30
log_file = syslog:daemon
log_options = fatal,error
print_pid = yes
pid_file = /var/run/amd.pid
restart_mounts = yes
selectors_in_defaults = no

[/nfs/home]
map_name = /etc/home.map
$

and /etc/home.map:

$ cat /etc/home.map
/defaults type:=nfs;opts:=tcp,intr,nosuid;rhost:=1.2.3.4
* rfs:=/dev/${key};fs:=${autodir}/nfs/home/${key}
$

Then simply running:

# /etc/rc.d/amd onestart

is enough to trigger the deadlock.
>Fix:


>Release-Note:
>Audit-Trail:
>Unformatted:

