kern/156168: [nfs] [panic] Kernel panic under concurrent access over NFS

Rick Macklem rmacklem at uoguelph.ca
Sat Oct 8 21:28:58 UTC 2011


Mark Saad wrote:
> The following reply was made to PR kern/156168; it has been noted by
> GNATS.
> 
> From: Mark Saad <nonesuch at longcount.org>
> To: bug-followup at FreeBSD.org, niakrisn at gmail.com
> Cc:
> Subject: Re: kern/156168: [nfs] [panic] Kernel panic under concurrent
> access
> over NFS
> Date: Thu, 29 Sep 2011 11:32:12 -0400
> 
> All
> I am seeing a similar crash on 7.3-RELEASE-p2 amd64 when using
> apache-1.3.34 with accf_http and an NFS docroot.
> The servers that have crashed are all FreeBSD 7.3-RELEASE amd64.
> Hardware is HP DL145 G2.
> They have 2G of RAM and 2G of swap, with one single-core Opteron CPU.
> 
> 
> We are using the following sysctls:
> 
> kern.ipc.maxsockbuf=2097152
> kern.ipc.nmbclusters=32768
> kern.ipc.somaxconn=1024
> kern.maxfiles=131072
> kern.maxfilesperproc=32768
> net.inet.tcp.inflight.enable=0
> net.inet.tcp.path_mtu_discovery=0
> net.inet.tcp.recvbuf_inc=524288
> net.inet.tcp.recvbuf_max=8388608
> net.inet.tcp.recvspace=32768
> net.inet.tcp.sendbuf_inc=16384
> net.inet.tcp.sendbuf_max=8388608
> net.inet.tcp.sendspace=32768
> net.inet.udp.recvspace=42080
> net.isr.direct=1
> vm.pmap.shpgperproc=600
> 
> 
> Uptime prior to the crash: the other system was up for 11 days;
> this one was up for 6 days.
> 
> Here are the contents of my crash dump:
> 
> 
> [root@web29 /var/crash]# kgdb /boot/kernel/kernel /var/crash/vmcore.0
> GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and
> you are
> welcome to change it and/or distribute copies of it under certain
> conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB. Type "show warranty" for
> details.
> This GDB was configured as "amd64-marcel-freebsd"...
> 
> Unread portion of the kernel message buffer:
> 
> 
> Fatal trap 12: page fault while in kernel mode
> cpuid = 0; apic id = 00
> fault virtual address = 0x258
> fault code = supervisor read data, page not present
> instruction pointer = 0x8:0xffffffff8051a66d
> stack pointer = 0x10:0xffffff803e69b1c0
> frame pointer = 0x10:0xffffff0001b50ae0
> code segment = base 0x0, limit 0xfffff, type 0x1b
> = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags = interrupt enabled, resume, IOPL = 0
> current process = 9336 (libhttpd.ep)
> trap number = 12
> panic: page fault
> cpuid = 0
> Uptime: 6d5h18m39s
> Physical memory: 2034 MB
> Dumping 1451 MB: 1436 1420 1404 1388 1372 1356 1340 1324 1308 1292
> 1276 1260 1244 1228 1212 1196 1180 1164 1148 1132 1116 1100 1084 1068
> 1052 1036 1020 1004 988 972 956 940 924 908 892 876 860 844 828 812
> 796 780 764 748 732 716 700 684 668 652 636 620 604 588 572 556 540
> 524 508 492 476 460 444 428 412 396 380 364 348 332 316 300 284 268
> 252 236 220 204 188 172 156 140 124 108 92 76 60 44 28 12
> 
> Reading symbols from /boot/kernel/accf_http.ko...Reading symbols from
> /boot/kernel/accf_http.ko.symbols...done.
> done.
> Loaded symbols for /boot/kernel/accf_http.ko
> #0 doadump () at pcpu.h:195
> 195 pcpu.h: No such file or directory.
> in pcpu.h
> (kgdb) bt
> #0 doadump () at pcpu.h:195
> #1 0x0000000000000004 in ?? ()
> #2 0xffffffff805285f9 in boot (howto=260) at
> /usr/src/sys/kern/kern_shutdown.c:418
> #3 0xffffffff80528a02 in panic (fmt=0x104 <Address 0x104 out of
> bounds>) at /usr/src/sys/kern/kern_shutdown.c:574
> #4 0xffffffff807ec813 in trap_fatal (frame=0xffffff0001b50ae0,
> eva=Variable "eva" is not available.
> ) at /usr/src/sys/amd64/amd64/trap.c:777
> #5 0xffffffff807ecbe5 in trap_pfault (frame=0xffffff803e69b110,
> usermode=0) at /usr/src/sys/amd64/amd64/trap.c:693
> #6 0xffffffff807ed50c in trap (frame=0xffffff803e69b110) at
> /usr/src/sys/amd64/amd64/trap.c:464
> #7 0xffffffff807d614e in calltrap () at
> /usr/src/sys/amd64/amd64/exception.S:218
> #8 0xffffffff8051a66d in _mtx_lock_sleep (m=0xffffff002f3d7a80,
> tid=18446742974226565856, opts=Variable "opts" is not available.
> )
> at /usr/src/sys/kern/kern_mutex.c:339
> #9 0xffffffff80701f60 in clnt_dg_create (so=0xffffff00017755a0,
> svcaddr=0xffffff803e69b310, program=100000, version=4, sendsz=Variable
> "sendsz" is not available.
> )
> at /usr/src/sys/rpc/clnt_dg.c:259
> #10 0xffffffff806e97c9 in nlm_get_rpc (sa=Variable "sa" is not
> available.
> ) at /usr/src/sys/nlm/nlm_prot_impl.c:327
> #11 0xffffffff806e9d39 in nlm_host_get_rpc (host=0xffffff0001705000)
> at /usr/src/sys/nlm/nlm_prot_impl.c:1199
> #12 0xffffffff806e680f in nlm_clearlock (host=0xffffff0001705000,
> ext=0xffffff803e69b9a0, vers=4, timo=0xffffff803e69b9d0,
> retries=2147483647, vp=0xffffff004881edc8, op=2,
> fl=0xffffff803e69bac0, flags=64, svid=9336, fhlen=32,
> fh=0xffffff803e69b750,
> size=689) at /usr/src/sys/nlm/nlm_advlock.c:943
> #13 0xffffffff806e7801 in nlm_advlock_internal (vp=0xffffff004881edc8,
> id=Variable "id" is not available.
> ) at /usr/src/sys/nlm/nlm_advlock.c:355
> #14 0xffffffff806e8166 in nlm_advlock (ap=Variable "ap" is not
> available.
> ) at /usr/src/sys/nlm/nlm_advlock.c:392
> #15 0xffffffff806ced28 in nfs_advlock (ap=0xffffff803e69ba90) at
> /usr/src/sys/nfsclient/nfs_vnops.c:3153
> #16 0xffffffff804f40e2 in closef (fp=0xffffff0073716d80,
> td=0xffffff0001b50ae0) at vnode_if.h:1036
> #17 0xffffffff804f462b in kern_close (td=0xffffff0001b50ae0,
> fd=Variable "fd" is not available.
> ) at /usr/src/sys/kern/kern_descrip.c:1125
> #18 0xffffffff807ece67 in syscall (frame=0xffffff803e69bc80) at
> /usr/src/sys/amd64/amd64/trap.c:920
> #19 0xffffffff807d635b in Xfast_syscall () at
> /usr/src/sys/amd64/amd64/exception.S:339
> #20 0x00000008009c5b1c in ?? ()
> Previous frame inner to this frame (corrupt stack?)
> 
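As a reading aid for the backtrace above, here is a minimal sketch (not part
of the original report) that checks two things using only values copied from
the quoted output: the decimal tid printed in frame #8 is the same number as
the td pointer in frames #16 and #17, and the fault virtual address falls
inside the first page, which is consistent with dereferencing a small offset
from a NULL pointer. The helper names and the 4 KB page size are assumptions
made for illustration only; this says nothing about which pointer was NULL.

```python
# Values copied from the trap message and backtrace quoted above.
TID_DECIMAL = 18446742974226565856   # tid= in frame #8 (_mtx_lock_sleep)
TD_POINTER = 0xffffff0001b50ae0      # td= in frames #16 and #17
FAULT_ADDRESS = 0x258                # "fault virtual address"
PAGE_SIZE = 4096                     # assumption: 4 KB base pages on amd64


def as_kernel_pointer(value: int) -> str:
    """Format an unsigned 64-bit integer the way kgdb prints pointers."""
    return f"0x{value & 0xFFFFFFFFFFFFFFFF:016x}"


def looks_like_null_offset(address: int, page_size: int = PAGE_SIZE) -> bool:
    """True when the address lies in the first page (NULL plus a small offset)."""
    return 0 <= address < page_size


if __name__ == "__main__":
    print("tid as pointer:", as_kernel_pointer(TID_DECIMAL))
    print("tid matches td:", TID_DECIMAL == TD_POINTER)
    print("fault near NULL:", looks_like_null_offset(FAULT_ADDRESS))
```

So the tid passed to _mtx_lock_sleep is simply the current thread pointer
shown as an unsigned decimal, not a corrupted value.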
I believe your crash is fixed in 8/stable and later systems. Unfortunately, the
fix is not trivial to backport to 7.n. The fix itself is r193437, which isn't too
hard to put into the old code. The hard part is that I'm not sure whether you
also need r193272, which could be really ugly to backport.

I don't even have a 7.n system to test with at this point, so I can't try the
backport, but you can try putting r193437 in 7.n if you feel up to it and see
how it works.

Btw, I don't think this is the same crash as the one reported in kern/156168. I
don't have any insight w.r.t. a fix for that one at this time.

rick

> --
> mark saad | nonesuch at longcount.org
> _______________________________________________
> freebsd-fs at freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe at freebsd.org"

