ZFS panics

Niki Denev nike_d at cytexbg.com
Sat Feb 2 10:19:09 UTC 2008


Hi,

I'm doing some stress testing on a server using ZFS, and I have
experienced two kernel panics in the last few days.

The machine runs amd64 7.0-PRERELEASE on dual quad-core (8 cores
total) Intel Xeon 2.0GHz CPUs, with 8GB of RAM.
The disk subsystem consists of eight Hitachi SATA drives on an Areca
1231ML with 1GB of cache memory and a battery backup.
I'm using GUID (GPT) partitions only: one 10GB partition for the system
on UFS2 with geom_journal, one 10GB swap/dump partition, and the
remaining 2.7TB as a ZFS pool.
I also have this in loader.conf:

vm.kmem_size="1G"
vm.kmem_size_max="1G"

I was running multiple bonnie++ instances in parallel, writing to and
reading from the ZFS pool.
The first time I ran 80 bonnie++ instances, and the machine rebooted
after about 3 hours.
The second time I ran 16 bonnie++ instances, and the machine survived
a good 11 hours.

I've tried to use the "list" command in kgdb as shown in the Developers' Handbook,
but it keeps saying "No source file for address XXX".

Here is the first panic that I experienced. The second one looks identical:

(I'm not entirely sure that I'm loading the ZFS symbols properly.)

sm-srv221# kldstat |grep zfs
 2    1 0xffffffff80bfc000 f5a40    zfs.ko
sm-srv221# kgdb -q /boot/kernel/kernel.symbols /var/crash/vmcore.0
[GDB will not be able to debug user-mode threads:
/usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]

Unread portion of the kernel message buffer:


Fatal trap 12: page fault while in kernel mode
cpuid = 4; apic id = 04
fault virtual address   = 0x18
fault code              = supervisor read data, page not present
instruction pointer     = 0x8:0xffffffff80c19d16
stack pointer           = 0x10:0xffffffffd996a8f0
frame pointer           = 0x10:0xffffffffd996a920
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 321 (txg_thread_enter)
trap number             = 12
panic: page fault
cpuid = 4
Uptime: 3h20m21s
Physical memory: 8177 MB
Dumping 522 MB: 507 491 475 459 443 427 411 395 379 363 347 331 315
299 283 267 251 235 219 203 187 171 155 139 123 107 91 75 59 43 27 11

#0  doadump () at pcpu.h:194
194     pcpu.h: No such file or directory.
        in pcpu.h

(kgdb) add-debug-symbols /boot/kernel/zfs.ko.symbols 0xffffffff80bfc000

(kgdb) list *0xffffffff80c19d16
0xffffffff80c19d16 is in dmu_objset_sync_dnodes
(/usr/src/sys/modules/zfs/../../contrib/opensolaris/uts/common/fs/zfs/dmu_objset.c:707).
702                     ASSERT(dn->dn_dbuf->db_data_pending);
703                     /*
704                      * Initialize dn_zio outside dnode_sync()
705                      * to accomodate meta-dnode
706                      */
707                     dn->dn_zio = dn->dn_dbuf->db_data_pending->dr_zio;
708                     ASSERT(dn->dn_zio);
709
710                     ASSERT3U(dn->dn_nlevels, <=, DN_MAX_LEVELS);
711                     list_remove(list, dn);
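
A note on reading the fault address above: 0x18 is very small, which is what one would expect when a member is read through a NULL struct pointer, since the faulting address is then just the member's offset. Which pointer in the chain on line 707 was NULL cannot be told from this alone. The sketch below only illustrates the arithmetic; the struct names and layout are hypothetical stand-ins, not the real ZFS definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration only, NOT the real ZFS structs. */
struct list_node_sketch {
    struct list_node_sketch *next;
    struct list_node_sketch *prev;
};

struct dirty_record_sketch {
    struct list_node_sketch dr_dirty_node; /* two pointers */
    uint64_t dr_txg;                       /* 8 bytes */
    void *dr_zio;                          /* the member being read */
};

/* Address the CPU tries to read for base->member when base is NULL (0). */
uintptr_t null_deref_fault_address(size_t member_offset)
{
    return (uintptr_t)member_offset;
}

/* Fault address for reading dr_zio through a NULL dirty_record_sketch *. */
uintptr_t dr_zio_fault_address(void)
{
    /* In this sketch dr_txg sits directly after the list node, no padding. */
    assert(offsetof(struct dirty_record_sketch, dr_txg) ==
        sizeof(struct list_node_sketch));
    return null_deref_fault_address(
        offsetof(struct dirty_record_sketch, dr_zio));
}
```

With this assumed layout on an LP64 platform such as amd64, the two pointers take 16 bytes and dr_txg another 8, so dr_zio_fault_address() evaluates to 0x18.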

(kgdb) bt
#0  doadump () at pcpu.h:194
#1  0x0000000000000004 in avl_balance2child ()
#2  0xffffffff80478619 in boot (howto=260) at
/usr/src/sys/kern/kern_shutdown.c:409
#3  0xffffffff80478a1d in panic (fmt=0x104 <Address 0x104 out of
bounds>) at /usr/src/sys/kern/kern_shutdown.c:563
#4  0xffffffff8074f174 in trap_fatal (frame=0xffffff0003377000,
eva=18446742974251873384) at /usr/src/sys/amd64/amd64/trap.c:724
#5  0xffffffff8074f545 in trap_pfault (frame=0xffffffffd996a840,
usermode=0) at /usr/src/sys/amd64/amd64/trap.c:641
#6  0xffffffff8074fe88 in trap (frame=0xffffffffd996a840) at
/usr/src/sys/amd64/amd64/trap.c:410
#7  0xffffffff80735aee in calltrap () at
/usr/src/sys/amd64/amd64/exception.S:169
#8  0xffffffff80c19d16 in dmu_objset_sync_dnodes
(list=0xffffff0003730d20, tx=0xffffff0137f9e800)
    at /usr/src/sys/modules/zfs/../../contrib/opensolaris/uts/common/fs/zfs/dmu_objset.c:707
#9  0xffffffff80c19e7d in dmu_objset_sync (os=0xffffff0003730c00,
pio=0xffffff0131a4fac0, tx=0xffffff0137f9e800)
    at /usr/src/sys/modules/zfs/../../contrib/opensolaris/uts/common/fs/zfs/dmu_objset.c:809
#10 0xffffffff80c27372 in dsl_pool_sync (dp=0xffffff00032b2800, txg=15331)
    at /usr/src/sys/modules/zfs/../../contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c:188
#11 0xffffffff80c31da0 in spa_sync (spa=0xffffff00032be000, txg=15331)
at /usr/src/sys/modules/zfs/../../contrib/opensolaris/uts/common/fs/zfs/spa.c:2989
#12 0xffffffff80c37abf in txg_sync_thread (arg=Variable "arg" is not available.
) at /usr/src/sys/modules/zfs/../../contrib/opensolaris/uts/common/fs/zfs/txg.c:331
#13 0xffffffff80459d33 in fork_exit (callout=0xffffffff80c37990
<txg_sync_thread>, arg=0xffffff00032b2800, frame=0xffffffffd996ac80)
    at /usr/src/sys/kern/kern_fork.c:781
#14 0xffffffff80735ebe in fork_trampoline () at
/usr/src/sys/amd64/amd64/exception.S:415
#15 0x0000000000000000 in ?? ()
#16 0x0000000000000000 in ?? ()
#17 0x0000000000000001 in avl_balance2child ()
#18 0x0000000000000000 in ?? ()
#19 0x0000000000000000 in ?? ()
#20 0x0000000000000000 in ?? ()
#21 0x0000000000000000 in ?? ()
#22 0x0000000000000000 in ?? ()
#23 0x0000000000000000 in ?? ()
#24 0x0000000000000000 in ?? ()
#25 0x0000000000000000 in ?? ()
#26 0x0000000000000000 in ?? ()
#27 0x0000000000000000 in ?? ()
#28 0x0000000000000000 in ?? ()
#29 0x0000000000000000 in ?? ()
#30 0x0000000000000000 in ?? ()
#31 0x0000000000000000 in ?? ()
#32 0x0000000000000000 in ?? ()
#33 0x0000000000000000 in ?? ()
#34 0x0000000000000000 in ?? ()
#35 0x0000000000000000 in ?? ()
#36 0x0000000000000000 in ?? ()
#37 0x0000000000000000 in ?? ()
#38 0x0000000000000000 in ?? ()
#39 0x0000000000e06000 in ?? ()
#40 0xffffffff80a7a740 in tdq_cpu ()
#41 0xffffffff80a83f40 in tdq_groups ()
#42 0xffffffff80a83d40 in tdq_cpu ()
#43 0xffffff0003377000 in ?? ()
#44 0xffffffff80a77540 in tdg_maxid ()
#45 0xffffffffd996a4b8 in ?? ()
#46 0xffffff0003377000 in ?? ()
#47 0xffffffff80496bc8 in sched_switch (td=0xffffffff80c37990,
newtd=0x0, flags=Variable "flags" is not available.
) at /usr/src/sys/kern/sched_ule.c:1898
#48 0x0000000000000000 in ?? ()
#49 0x0000000000000000 in ?? ()
#50 0x0000000000000000 in ?? ()
#51 0x0000000000000000 in ?? ()
#52 0x0000000000000000 in ?? ()
#53 0x0000000000000000 in ?? ()
#54 0x0000000000000000 in ?? ()

