ZFS-NFS kernel panic under load

Weldon S Godfrey 3 weldon at excelsus.com
Mon Aug 18 19:29:15 UTC 2008


Update on what else I have tried.  All attempts yield the same results and the 
same backtraces, with no indication in the logs or on the console of why it is 
panicking other than the page fault.  (FYI -- I have tried to load 8-CURRENT, 
but it panics during install on the Dell 2950-3 I am using.  I see a patch for 
a newer port of ZFS that looks like it is for 8; is there a patch for 
7.0-RELEASE?)


I tried breaking it into two smaller (< 2TB) filesystems and performed the 
same test on one of them; it still panics.

I tried disabling swap altogether (although I wasn't swapping).

I upped the number of nfsd daemons from 12 to 100.
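
For reference, a minimal sketch of how I'm doing that, assuming it is set 
through /etc/rc.conf (the -u/-t/-n flags are from nfsd(8)):

# /etc/rc.conf -- serve UDP and TCP, run 100 nfsd servers
nfs_server_enable="YES"
nfs_server_flags="-u -t -n 100"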

I turned on ZFS debugging and WITNESS to see if anything, such as locking 
issues, would show up (nothing did).
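
The WITNESS side of that was a debug kernel built roughly as below (standard 
debugging options from NOTES; DEBUG_VFS_LOCKS added on the theory it might 
catch a vnode locking problem):

# custom kernel config: GENERIC plus
options         INVARIANTS
options         INVARIANT_SUPPORT
options         WITNESS
options         WITNESS_SKIPSPIN
options         DEBUG_VFS_LOCKS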

I ran loops every 3s to monitor vnode usage, kmem, and the ARC during the 
tests; up until the panic, nothing was climbing.
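
The monitoring loop is nothing fancy, roughly the following (the exact sysctl 
names, in particular kstat.zfs.misc.arcstats.size for the ARC, are what I 
believe the ZFS module exports):

#!/bin/sh
# poll vnode, kmem, and ARC counters every 3 seconds
while true; do
    date
    sysctl vfs.numvnodes kern.maxvnodes vm.kmem_size
    sysctl kstat.zfs.misc.arcstats.size
    sleep 3
done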

I turned off the ZIL and disabled prefetch; the problem still occurs.
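
Concretely, those were loader.conf tunables along these lines 
(vfs.zfs.zil_disable being the ZIL knob, if I have the name right):

# /boot/loader.conf additions for this test
vfs.zfs.zil_disable="1"
vfs.zfs.prefetch_disable="1"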




I didn't get a panic in these situations:

I created a ZFS mirror of only two drives (one on each chassis) and performed 
the test.
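
Roughly like this, using one drive from each enclosure (the pool name and 
device names here are just illustrative):

zpool create testpool mirror da0 da12
zfs set sharenfs="-maproot=root -network 192.168.2.0 -mask 255.255.255.0" testpool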

I took one drive, created a UFS filesystem on it, and performed the test.
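
Along these lines (drive, mount point, and exports entry are illustrative; the 
export options match what I use for the ZFS sharenfs setting):

newfs -U /dev/da11
mount /dev/da11 /var/mail

# /etc/exports
/var/mail -maproot=root -network 192.168.2.0 -mask 255.255.255.0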




If memory serves me right, sometime around Aug 6, Weldon S Godfrey 3 told me:

>
> Hello,
>
> Please forgive me; I didn't really see this discussed in the archives, but I am 
> wondering if anyone has seen this issue.  I can replicate it under FreeBSD 
> amd64 7.0-RELEASE and the latest -STABLE (RELENG_7).  I cannot reproduce any 
> problems running 9 instances of postmark on the machine directly, so the issue 
> appears to be isolated to NFS.
>
> There are backtraces and more information in ticket kern/124280
>
> I am experiencing random kernel panics while running the postmark benchmark 
> from 9 NFS clients (the clients run RedHat) against a 3TB ZFS filesystem 
> exported with NFS.  The panics can happen as soon as 5 minutes after starting 
> the benchmark, or it may take hours before the machine panics and reboots.  
> The timing doesn't correspond to any cron job.  I am using the following 
> settings in postmark:
>
> set number 20000
> set transactions 10000000
> set subdirectories 1000
> set size 10000 15000
> set report verbose
> set location /var/mail/store1/X  (where X is a number 1-9 so each is operating 
> in its own tree)
>
> The problem happens if I run 1 postmark instance on each of 9 NFS clients at 
> the same time (each client is its own machine) or if I run 9 postmark 
> instances on one NFS client.
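>
> Roughly how the nine local instances get launched on the single-client run 
> (file names are hypothetical; each pm-N.cfg is the config above with its own 
> "set location" tree):
>
> for i in 1 2 3 4 5 6 7 8 9; do
>     postmark pm-$i.cfg > pm-$i.log 2>&1 &
> done
> wait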
>
> commands used to create filesystem:
> zpool create tank mirror da0 da12 mirror da1 da13 mirror da2 da14 \
>     mirror da3 da15 mirror da4 da16 mirror da5 da17 mirror da6 da18 \
>     mirror da7 da19 mirror da8 da20 mirror da9 da21 mirror da10 da22 \
>     spare da11 da23
> zfs set atime=off tank
> zfs create tank/mail
> zfs set mountpoint=/var/mail tank/mail
> zfs set sharenfs="-maproot=root -network 192.168.2.0 -mask 255.255.255.0" tank/mail
>
> I am using a 3ware 9690 SAS controller.  I have 2 IBM EXP3000 enclosures, and 
> each drive is presented as a single disk by the controller.
>
>
> this is my loader.conf:
> vm.kmem_size_max="1073741824"
> vm.kmem_size="1073741824"
> kern.maxvnodes="800000"
> vfs.zfs.prefetch_disable="1"
> vfs.zfs.cache_flush_disable="1"
>
> (I should note that kern.maxvnodes in loader.conf does not appear to do 
> anything; after boot, it is shown to be 100000 with sysctl.  It does change to 
> 800000 if I manually set it with sysctl.  However, my vnode usage appears to 
> sit at around 25-26K and is still near that within 5s of the panic.)
>
> The server has 16GB of RAM, and 2 quad core XEON processors.
>
> This server is only an NFS fileserver.  The only non-default daemon running is 
> sshd.  It is currently running an unmodified GENERIC kernel.
>
> I am using two NICs.  NFS is exported only on the secondary NIC.  Each NIC is 
> in its own subnet.
>
>
> Nothing in /var/log/messages near the time of the panic except:
> Aug  6 08:45:30 store1 savecore: reboot after panic: page fault
> Aug  6 08:45:30 store1 savecore: writing core to vmcore.2
>
> I can provide cores if needed.
>
> Thank you for your time!
>
> Weldon
>
>
>
> kgdb with backtrace:
>
> store1# kgdb kernel.debug /var/crash/vmcore.2
> GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for details.
> This GDB was configured as "amd64-marcel-freebsd"...
>
> Unread portion of the kernel message buffer:
>
>
> Fatal trap 12: page fault while in kernel mode
> cpuid = 5; apic id = 05
> fault virtual address   = 0xdc
> fault code              = supervisor read data, page not present
> instruction pointer     = 0x8:0xffffffff8063b3d8
> stack pointer           = 0x10:0xffffffffdfbc5720
> frame pointer           = 0x10:0xffffff00543ed000
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                        = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = 839 (nfsd)
> trap number             = 12
> panic: page fault
> cpuid = 5
> Uptime: 18m53s
> Physical memory: 16366 MB
> Dumping 1991 MB: 1976 1960 1944 1928 1912 1896 1880 1864 1848 1832 1816 1800 
> 1784 1768 1752 1736 1720 1704 1688 1672 1656 1640 1624 1608 1592 1576 1560 
> 1544 1528 1512 1496 1480 1464 1448 1432 1416 1400 1384 1368 1352 1336 1320 
> 1304 1288 1272 1256 1240 1224 1208 1192 1176 1160 1144 1128 1112 1096 1080 
> 1064 1048 1032 1016 1000 984 968 952 936 920 904 888 872 856 840 824 808 792 
> 776 760 744 728 712 696 680 664 648 632 616 600 584 568 552 536 520 504 488 
> 472 456 440 424 408 392 376 360 344 328 312 296 280 264 248 232 216 200 184 
> 168 152 136 120 104 88 72 56 40 24 8
>
> Reading symbols from /boot/kernel/zfs.ko...Reading symbols from 
> /boot/kernel/zfs.ko.symbols...done.
> done.
> Loaded symbols for /boot/kernel/zfs.ko
> #0  doadump () at pcpu.h:194
> 194             __asm __volatile("movq %%gs:0,%0" : "=r" (td));
> (kgdb) backtrace
> #0  doadump () at pcpu.h:194
> #1  0x0000000000000004 in ?? ()
> #2  0xffffffff804a7049 in boot (howto=260) at 
> /usr/src/sys/kern/kern_shutdown.c:418
> #3  0xffffffff804a744d in panic (fmt=0x104 <Address 0x104 out of bounds>) at 
> /usr/src/sys/kern/kern_shutdown.c:572
> #4  0xffffffff807780e4 in trap_fatal (frame=0xffffff000bce26c0, 
> eva=18446742974395967712)
>    at /usr/src/sys/amd64/amd64/trap.c:724
> #5  0xffffffff807784b5 in trap_pfault (frame=0xffffffffdfbc5670, usermode=0) 
> at /usr/src/sys/amd64/amd64/trap.c:641
> #6  0xffffffff80778de8 in trap (frame=0xffffffffdfbc5670) at 
> /usr/src/sys/amd64/amd64/trap.c:410
> #7  0xffffffff8075e7ce in calltrap () at 
> /usr/src/sys/amd64/amd64/exception.S:169
> #8  0xffffffff8063b3d8 in nfsrv_access (vp=0xffffff00207d7dc8, flags=128, 
> cred=0xffffff00403d4800, rdonly=0,
>    td=0xffffff000bce26c0, override=0) at 
> /usr/src/sys/nfsserver/nfs_serv.c:4284
> #9  0xffffffff8063c4f1 in nfsrv3_access (nfsd=0xffffff00543ed000, 
> slp=0xffffff0006396d00, td=0xffffff000bce26c0,
>    mrq=0xffffffffdfbc5af0) at /usr/src/sys/nfsserver/nfs_serv.c:234
> #10 0xffffffff8064cd1d in nfssvc (td=Variable "td" is not available.
> ) at /usr/src/sys/nfsserver/nfs_syscalls.c:456
> #11 0xffffffff80778737 in syscall (frame=0xffffffffdfbc5c70) at 
> /usr/src/sys/amd64/amd64/trap.c:852
> #12 0xffffffff8075e9db in Xfast_syscall () at 
> /usr/src/sys/amd64/amd64/exception.S:290
> #13 0x0000000800687acc in ?? ()
> Previous frame inner to this frame (corrupt stack?)
>

