FreeBSD-9.1: machine reboots during snapshot creation, LORs found

Konstantin Belousov kostikbel at gmail.com
Thu Jul 4 06:16:00 UTC 2013


On Thu, Jul 04, 2013 at 07:27:00AM +0200, Andre Albsmeier wrote:
> On Thu, 04-Jul-2013 at 07:24:40 +0200, Konstantin Belousov wrote:
> > On Thu, Jul 04, 2013 at 07:14:09AM +0200, Andre Albsmeier wrote:
> > > On Mon, 17-Jun-2013 at 21:30:31 +0200, John Baldwin wrote:
> > > > On Sunday, June 16, 2013 2:39:42 am Andre Albsmeier wrote:
> > > > > On Fri, 31-May-2013 at 16:51:03 +0200, John Baldwin wrote:
> > > > > > On Friday, May 31, 2013 8:26:11 am Andre Albsmeier wrote:
> > > > > > > Each day at 5:15 we are generating snapshots on various machines.
> > > > > > > This used to work perfectly under 7-STABLE for years but since
> > > > > > > we started to use 9.1-STABLE the machine reboots in about 10%
> > > > > > > of all cases.
> > > > > > > 
> > > > > > > After rebooting we find a new snapshot file which is a bit
> > > > > > > smaller than the good ones and with different permissions
> > > > > > > It does not succeed a fsck. In this example it is the one
> > > > > > > whose name is beginning with s3:
> > > > > > > 
> > > > > > > -r--r-----   1 root  operator  snapshot 72802894528 29 May 05:15 s2-2013.05.28-03.15.04
> > > > > > > -r--------   1 root  operator  snapshot 72802893824 29 May 05:15 s3-2013.05.29-03.15.03
> > > > > > > -r--r-----   1 root  operator  snapshot 72802894528 28 May 14:22 s4-2013.05.23-06.38.44
> > > > > > > -r--r-----   1 root  operator  snapshot 72802894528 28 May 14:22 s5-2013.05.24-03.15.03
> > > > > > > -r--r-----   1 root  operator  snapshot 72802894528 28 May 14:22 s6-2013.05.25-03.15.03
> > > > > > > 
> > > > > > > After enabling DIAGNOSTIC, WITNESS and INVARIANTS in the kernel
> > > > > > > I see the following LORs (mksnap_ffs starts exactly at 5:15):
> > > > > > > 
> > > > > > > May 29 05:15:00 <kern.crit> palveli kernel: lock order reversal:
> > > > > > > May 29 05:15:00 <kern.crit> palveli kernel: 1st 0xc2371da8 ufs (ufs) @ /src/src-9/sys/kern/vfs_mount.c:1240
> > > > > > > May 29 05:15:00 <kern.crit> palveli kernel: 2nd 0xc2371ec4 devfs (devfs) @ /src/src-9/sys/ufs/ffs/ffs_vfsops.c:1414
> > > > > > > May 29 05:15:04 <kern.crit> palveli kernel: lock order reversal:
> > > > > > > May 29 05:15:04 <kern.crit> palveli kernel: 1st 0xc228471c snaplk (snaplk) @ /src/src-9/sys/ufs/ufs/ufs_vnops.c:976
> > > > > > > May 29 05:15:04 <kern.crit> palveli kernel: 2nd 0xc22f25e4 ufs (ufs) @ /src/src-9/sys/ufs/ffs/ffs_snapshot.c:1626
> > > > > > > 
> > > > > > > Unfortunatley no corefiles are being generated ;-(.
> > > > > > > 
> > > > > > > I have checked and even rebuilt the (UFS1) fs in question
> > > > > > > from scratch. I have also seen this happen on an UFS2 on
> > > > > > > another machine and on a third one when running "dump -L"
> > > > > > > on a root fs.
> > > > > > > 
> > > > > > > Any hints of how to proceed?
> > > > > > 
> > > > > > Would it be possible to setup a serial console that is logged on this machine
> > > > > > to see if it is panic'ing but failing to write out a crashdump?
> > > > > 
> > > > > Couldn't attach the serial console yet ;-(. But I had people
> > > > > attach a KVMoverIP switch and enabled the various KDB options
> > > > > in the kernel. Now we can see a bit more (see below) -- no
> > > > > crashdump is being generated though.
> > > > 
> > > > :(  Unfortunately these LORs don't really help with discerning the cause of
> > > > the reboot.  If you have remote power access (and still wanted to test this)
> > > > one option would be to change KDB to drop into the debugger on a panic.
> > > > Then you could connect over the KVM and take images of the original panic
> > > > along with a stack trace.
> > > 
> > > After a few days of no problems, the box decided to crash
> > > during mksnap_ffs today ;-(. But now I have a crashdump,
> > > see below. Unfortunatley, I cannot upload the dump somewhere
> > > but if you ask me check whatever things I'll be happy to help.
> > > 
> > > kgdb /usr/obj/src/src-9/sys/palveli/kernel.debug vmcore.4
> > > GNU gdb 6.1.1 [FreeBSD]
> > > Copyright 2004 Free Software Foundation, Inc.
> > > GDB is free software, covered by the GNU General Public License, and you are
> > > welcome to change it and/or distribute copies of it under certain conditions.
> > > Type "show copying" to see the conditions.
> > > There is absolutely no warranty for GDB.  Type "show warranty" for details.
> > > This GDB was configured as "i386-marcel-freebsd"...
> > > 
> > > Unread portion of the kernel message buffer:
> > > 
> > > 
> > > Fatal trap 12: page fault while in kernel mode
> > > fault virtual address   = 0xcfb5e000
> > > fault code              = supervisor write, page not present
> > > instruction pointer     = 0x20:0xc07cb2fe
> > > stack pointer           = 0x28:0xd83545d0
> > > frame pointer           = 0x28:0xd835490c
> > > code segment            = base 0x0, limit 0xfffff, type 0x1b
> > >                         = DPL 0, pres 1, def32 1, gran 1
> > > processor eflags        = interrupt enabled, resume, IOPL = 0
> > > current process         = 12929 (mksnap_ffs)
> > > trap number             = 12
> > > panic: page fault
> > > KDB: stack backtrace:
> > > db_trace_self_wrapper(c08207eb,d835441c,c05fdfc9,c081df13,c08a82e0,...) at db_trace_self_wrapper+0x26/frame 0xd83543ec
> > > kdb_backtrace(c081df13,c08a82e0,c0801bfa,d8354428,d8354428,...) at kdb_backtrace+0x29/frame 0xd83543f8
> > > panic(c0801bfa,c0845a01,c2bafae4,1,1,...) at panic+0xc9/frame 0xd835441c
> > > trap_fatal(c0ff6000,cfb5e000,2,0,265abf,...) at trap_fatal+0x353/frame 0xd835445c
> > > trap_pfault(140da,0,c2baf930,c08b6a40,c282145c,...) at trap_pfault+0x2d7/frame 0xd83544a4
> > > trap(d8354590) at trap+0x41a/frame 0xd8354584
> > > calltrap() at calltrap+0x6/frame 0xd8354584
> > > --- trap 0xc, eip = 0xc07cb2fe, esp = 0xd83545d0, ebp = 0xd835490c ---
> > > bcopy(c2b36548,c2f194e0,0,0,0,...) at bcopy+0x1a/frame 0xd835490c
> > > ffs_mount(c2b36548,c2db9000,ff,d8354c08,c2b665e4,...) at ffs_mount+0x15ee/frame 0xd8354a3c
> > 
> > From the crash dump in kgdb, do
> > list *ffs_mount+0x15ee
> 
> (kgdb) list *ffs_mount+0x15ee
> 0xc0748e8e is in ffs_mount (/src/src-9/sys/ufs/ffs/ffs_vfsops.c:483).
> 478
> 479                     /*
> 480                      * If this is a snapshot request, take the snapshot.
> 481                      */
> 482                     if (mp->mnt_flag & MNT_SNAPSHOT)
> 483                             return (ffs_snapshot(mp, fspec));
> 484             }
> 485
> 486             /*
> 487              * Not an update, or updating the name: look up the name

It is not useful, bcopy does not create a frame, so the real caller
of the failing bcopy gets lost.  It could be uncovered with some
stack digging, but I believe it would be easier just fix bcopy.

Please apply this patch and reproduce the panic again, then the kgdb
backtrace should be more useful.

diff --git a/sys/i386/i386/support.s b/sys/i386/i386/support.s
index 967a09e..779fa38 100644
--- a/sys/i386/i386/support.s
+++ b/sys/i386/i386/support.s
@@ -181,11 +181,13 @@ END(bcopyb)
  *  ws at tools.de     (Wolfgang Solfrank, TooLs GmbH) +49-228-985800
  */
 ENTRY(bcopy)
+	pushl	%ebp
+	movl	%esp,%ebp
 	pushl	%esi
 	pushl	%edi
-	movl	12(%esp),%esi
-	movl	16(%esp),%edi
-	movl	20(%esp),%ecx
+	movl	8(%ebp),%esi
+	movl	12(%ebp),%edi
+	movl	16(%ebp),%ecx
 
 	movl	%edi,%eax
 	subl	%esi,%eax
@@ -196,12 +198,13 @@ ENTRY(bcopy)
 	cld					/* nope, copy forwards */
 	rep
 	movsl
-	movl	20(%esp),%ecx
+	movl	16(%ebp),%ecx
 	andl	$3,%ecx				/* any bytes left? */
 	rep
 	movsb
 	popl	%edi
 	popl	%esi
+	popl	%ebp
 	ret
 
 	ALIGN_TEXT
@@ -214,7 +217,7 @@ ENTRY(bcopy)
 	std
 	rep
 	movsb
-	movl	20(%esp),%ecx			/* copy remainder by 32-bit words */
+	movl	16(%ebp),%ecx			/* copy remainder by 32-bit words */
 	shrl	$2,%ecx
 	subl	$3,%esi
 	subl	$3,%edi
@@ -223,6 +226,7 @@ ENTRY(bcopy)
 	popl	%edi
 	popl	%esi
 	cld
+	popl	%ebp
 	ret
 END(bcopy)
 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 834 bytes
Desc: not available
URL: <http://lists.freebsd.org/pipermail/freebsd-stable/attachments/20130704/c9bc7ff6/attachment.sig>


More information about the freebsd-stable mailing list