FreeBSD-9.1: machine reboots during snapshot creation, LORs found

Andre Albsmeier Andre.Albsmeier at siemens.com
Thu Jul 4 05:14:17 UTC 2013


On Mon, 17-Jun-2013 at 21:30:31 +0200, John Baldwin wrote:
> On Sunday, June 16, 2013 2:39:42 am Andre Albsmeier wrote:
> > On Fri, 31-May-2013 at 16:51:03 +0200, John Baldwin wrote:
> > > On Friday, May 31, 2013 8:26:11 am Andre Albsmeier wrote:
> > > > Each day at 5:15 we are generating snapshots on various machines.
> > > > This used to work perfectly under 7-STABLE for years but since
> > > > we started to use 9.1-STABLE the machine reboots in about 10%
> > > > of all cases.
> > > > 
> > > > After rebooting we find a new snapshot file which is a bit
> > > > smaller than the good ones and has different permissions.
> > > > It does not pass fsck. In this example it is the one
> > > > whose name begins with s3:
> > > > 
> > > > -r--r-----   1 root  operator  snapshot 72802894528 29 May 05:15 s2-2013.05.28-03.15.04
> > > > -r--------   1 root  operator  snapshot 72802893824 29 May 05:15 s3-2013.05.29-03.15.03
> > > > -r--r-----   1 root  operator  snapshot 72802894528 28 May 14:22 s4-2013.05.23-06.38.44
> > > > -r--r-----   1 root  operator  snapshot 72802894528 28 May 14:22 s5-2013.05.24-03.15.03
> > > > -r--r-----   1 root  operator  snapshot 72802894528 28 May 14:22 s6-2013.05.25-03.15.03
> > > > 
> > > > After enabling DIAGNOSTIC, WITNESS and INVARIANTS in the kernel
> > > > I see the following LORs (mksnap_ffs starts exactly at 5:15):
> > > > 
> > > > May 29 05:15:00 <kern.crit> palveli kernel: lock order reversal:
> > > > May 29 05:15:00 <kern.crit> palveli kernel: 1st 0xc2371da8 ufs (ufs) @ /src/src-9/sys/kern/vfs_mount.c:1240
> > > > May 29 05:15:00 <kern.crit> palveli kernel: 2nd 0xc2371ec4 devfs (devfs) @ /src/src-9/sys/ufs/ffs/ffs_vfsops.c:1414
> > > > May 29 05:15:04 <kern.crit> palveli kernel: lock order reversal:
> > > > May 29 05:15:04 <kern.crit> palveli kernel: 1st 0xc228471c snaplk (snaplk) @ /src/src-9/sys/ufs/ufs/ufs_vnops.c:976
> > > > May 29 05:15:04 <kern.crit> palveli kernel: 2nd 0xc22f25e4 ufs (ufs) @ /src/src-9/sys/ufs/ffs/ffs_snapshot.c:1626
> > > > 
> > > > Unfortunately no corefiles are being generated ;-(.
> > > > 
> > > > I have checked and even rebuilt the (UFS1) fs in question
> > > > from scratch. I have also seen this happen on a UFS2 fs on
> > > > another machine and on a third one when running "dump -L"
> > > > on a root fs.
> > > > 
> > > > Any hints on how to proceed?
> > > 
> > > Would it be possible to set up a serial console that is logged on this machine
> > > to see if it is panicking but failing to write out a crashdump?
> > 
> > Couldn't attach the serial console yet ;-(. But I had people
> > attach a KVMoverIP switch and enabled the various KDB options
> > in the kernel. Now we can see a bit more (see below) -- no
> > crashdump is being generated though.
> 
> :(  Unfortunately these LORs don't really help with discerning the cause of
> the reboot.  If you have remote power access (and still wanted to test this)
> one option would be to change KDB to drop into the debugger on a panic.
> Then you could connect over the KVM and take images of the original panic
> along with a stack trace.

After a few days without problems, the box crashed during
mksnap_ffs today ;-(. But now I have a crashdump, see below.
Unfortunately, I cannot upload the dump anywhere, but if you
tell me what to check, I'll be happy to help.

kgdb /usr/obj/src/src-9/sys/palveli/kernel.debug vmcore.4
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd"...

Unread portion of the kernel message buffer:


Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0xcfb5e000
fault code              = supervisor write, page not present
instruction pointer     = 0x20:0xc07cb2fe
stack pointer           = 0x28:0xd83545d0
frame pointer           = 0x28:0xd835490c
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 12929 (mksnap_ffs)
trap number             = 12
panic: page fault
KDB: stack backtrace:
db_trace_self_wrapper(c08207eb,d835441c,c05fdfc9,c081df13,c08a82e0,...) at db_trace_self_wrapper+0x26/frame 0xd83543ec
kdb_backtrace(c081df13,c08a82e0,c0801bfa,d8354428,d8354428,...) at kdb_backtrace+0x29/frame 0xd83543f8
panic(c0801bfa,c0845a01,c2bafae4,1,1,...) at panic+0xc9/frame 0xd835441c
trap_fatal(c0ff6000,cfb5e000,2,0,265abf,...) at trap_fatal+0x353/frame 0xd835445c
trap_pfault(140da,0,c2baf930,c08b6a40,c282145c,...) at trap_pfault+0x2d7/frame 0xd83544a4
trap(d8354590) at trap+0x41a/frame 0xd8354584
calltrap() at calltrap+0x6/frame 0xd8354584
--- trap 0xc, eip = 0xc07cb2fe, esp = 0xd83545d0, ebp = 0xd835490c ---
bcopy(c2b36548,c2f194e0,0,0,0,...) at bcopy+0x1a/frame 0xd835490c
ffs_mount(c2b36548,c2db9000,ff,d8354c08,c2b665e4,...) at ffs_mount+0x15ee/frame 0xd8354a3c
vfs_donmount(c2baf930,10313108,0,c2b8ba80,c2b8ba80,...) at vfs_donmount+0x196b/frame 0xd8354c2c
sys_nmount(c2baf930,d8354ccc,c2bafc18,d8354c6c,c0605015,...) at sys_nmount+0x63/frame 0xd8354c50
syscall(d8354d08) at syscall+0x2ce/frame 0xd8354cfc
Xint0x80_syscall() at Xint0x80_syscall+0x21/frame 0xd8354cfc
--- syscall (378, FreeBSD ELF32, sys_nmount), eip = 0x180bdf37, esp = 0xbfbfd65c, ebp = 0xbfbfddd8 ---
Uptime: 2d21h49m21s
Physical memory: 503 MB
Dumping 108 MB: 93 77 61 45 29 13

No symbol "stopped_cpus" in current context.
No symbol "stoppcbs" in current context.
#0  doadump (textdump=1) at pcpu.h:249
249     pcpu.h: No such file or directory.
        in pcpu.h
(kgdb) where
#0  doadump (textdump=1) at pcpu.h:249
#1  0xc05fdddd in kern_reboot (howto=260) at /src/src-9/sys/kern/kern_shutdown.c:449
#2  0xc05fe028 in panic (fmt=<value optimized out>) at /src/src-9/sys/kern/kern_shutdown.c:637
#3  0xc07cd1d3 in trap_fatal (frame=0xd8354590, eva=3484803072)
    at /src/src-9/sys/i386/i386/trap.c:1044
#4  0xc07cd4b7 in trap_pfault (frame=0xd8354590, usermode=0, eva=3484803072)
    at /src/src-9/sys/i386/i386/trap.c:957
#5  0xc07ce05a in trap (frame=0xd8354590) at /src/src-9/sys/i386/i386/trap.c:555
#6  0xc07ba88c in calltrap () at /src/src-9/sys/i386/i386/exception.s:170
#7  0xc07cb2fe in bcopy () at /src/src-9/sys/i386/i386/support.s:196
Previous frame inner to this frame (corrupt stack?)
(kgdb) 
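
The trap message prints the fault address in hex, while kgdb prints
the eva argument of trap_fatal/trap_pfault in decimal. A minimal
sketch for cross-checking the two values from the output above. The
4 KB page size is an assumption (the usual i386 default), and the
helper name describe() is made up for this example:

```python
# Values copied verbatim from the trap message and kgdb frames #3/#4.
FAULT_VA_HEX = "0xcfb5e000"   # "fault virtual address" in the trap message
EVA_DECIMAL = 3484803072      # eva= argument as printed by kgdb

PAGE_SIZE = 4096  # assumption: 4 KB pages, the usual i386 default


def describe(eva: int) -> dict:
    """Return the hex form of a fault address and whether it is page-aligned."""
    return {"hex": hex(eva), "page_aligned": eva % PAGE_SIZE == 0}


info = describe(EVA_DECIMAL)
print(info)
print("matches trap message:", int(FAULT_VA_HEX, 16) == EVA_DECIMAL)
```

Both values are the same address, and it sits exactly on a page
boundary. That would be consistent with the bcopy running off the end
of a mapped region into an unmapped page, but the trace alone does
not prove it.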

	-Andre


More information about the freebsd-stable mailing list