[Bug 235582] rpc_svc_gss / nfsd kernel panic

Thu Feb 7 18:24:35 UTC 2019

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=235582

            Bug ID: 235582
           Summary: rpc_svc_gss / nfsd kernel panic
           Product: Base System
           Version: 11.2-RELEASE
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: kern
          Assignee: bugs at FreeBSD.org
          Reporter: peter.x.eriksson at liu.se

We have recently gone "live" with more NFS users "banging" on our FreeBSD-based
fileservers, and now something seems to have started triggering kernel panics.
Since the major difference from before is the number of NFS users, this
is the major suspect...

We just caught a panic and got a screendump from the console and the stack
traceback shows:

> Fatal trap 12: page fault while in kernel mode
> cpuid = 8; apic id = 08
> fault virtual address  = 0x0
> fault code             = supervisor read data, page not present
> instruction pointer    = 0x20:0xffffffff82b578e9
> stack pointer          = 0x20:0xfffffe3fdc627760
> code segment           = base 0x0, limit 0xfffff, type 0x1b
>                        = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags       = interrupt enabled, resume, IOPL = 0
> current process        = 2519 (nfsd: service)
> trap number            = 12
> panic: page fault
> cpuid = 8
> KDB: stack backtrace
> #0 0xffffffff80b3d577 at kdb_backtrace+0x67
> #1 0xffffffff80af6b17 at vpanic+0x177
> #2 0xffffffff80af6993 at panic+0x43
> #3 0xffffffff80f77fdf at trap_fatal+0x35f
> #4 0xffffffff80f78039 at trap_pfault+0x49
> #5 0xffffffff80f77807 at trap+0x2c7
> #6 0xffffffff80f56fbc at calltrap+0x8
> #7 0xffffffff82b5d4d2 at svc_rpc_gss+0x8f2
> #8 0xffffffff80d6c1b6 at svc_run_internal+0x726
> #9 0xffffffff80d6cd4b at svc_thread_start+0xb
> #10 0xffffffff80aba093 at fork_exit+0x8
> #11 0xffffffff80f48ede at fork_trampoline+0xe

(Unfortunately there is no kernel crash dump from this machine.)
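
Without a crash dump, the only arithmetic available is on the addresses in
the trace itself. The following is a minimal sketch in Python (not part of
any FreeBSD tooling; the helper names parse_frames and symbol_start are made
up for illustration) that subtracts each frame's offset from its return
address to get the start of the function, and compares that with the
faulting instruction pointer. It assumes the "symbol+offset" values printed
by the kernel are correct.

```python
import re

# Frames copied from the backtrace above (only the ones needed here).
BACKTRACE = """\
#6 0xffffffff80f56fbc at calltrap+0x8
#7 0xffffffff82b5d4d2 at svc_rpc_gss+0x8f2
#8 0xffffffff80d6c1b6 at svc_run_internal+0x726
"""

# Instruction pointer from the trap report; the leading "0x20:" there is
# the code segment selector and is not part of the address.
FAULT_IP = 0xffffffff82b578e9

# Matches lines such as "#7 0xffffffff82b5d4d2 at svc_rpc_gss+0x8f2".
FRAME_RE = re.compile(r"#\s*(\d+)\s+(0x[0-9a-f]+)\s+at\s+(\w+)\+(0x[0-9a-f]+)")


def parse_frames(text):
    """Return a list of (frame number, address, symbol, offset) tuples."""
    return [
        (int(m.group(1)), int(m.group(2), 16), m.group(3), int(m.group(4), 16))
        for m in FRAME_RE.finditer(text)
    ]


def symbol_start(address, offset):
    """Start address of the function that a 'symbol+offset' frame refers to."""
    return address - offset


frames = parse_frames(BACKTRACE)
starts = {sym: symbol_start(addr, off) for _, addr, sym, off in frames}

gss_start = starts["svc_rpc_gss"]
gap = gss_start - FAULT_IP  # positive: the fault is below svc_rpc_gss

print(f"svc_rpc_gss starts at {gss_start:#x}")
print(f"faulting ip is {gap:#x} bytes below that address")
```

With the values above, the faulting instruction pointer lies below the
computed start of svc_rpc_gss, so the page fault itself is presumably in
another function at a nearby address that svc_rpc_gss called, rather than in
svc_rpc_gss proper. Which function that is cannot be determined from the
trace alone.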

Systems are: Dell PowerEdge R730xd with 256GB RAM, HBA330 (LSI 3008) SAS
controllers, ZFS-storage, Intel X710 10GE-ethernet machines running FreeBSD
11.2. No swap enabled. ZFS ARC capped to 128GB.

NFS v4.0 or v4.1 clients with sec=krb5:krb5i:krb5p security. Most clients (if
not all) are running Linux (CentOS or Ubuntu). Around 200 active clients per
server.

(Most clients are Windows users using SMB via Samba, though.)

We have enabled a crash dump device on a couple of the machines and are going
to enable it on more in order to try to get a crash dump when the next server
panics...

Any ideas where this bug might be or how we could work around it? (Disabling NFS
is unfortunately not an option.)
