FreeBSD-9.1: machine reboots during snapshot creation, LORs found

Konstantin Belousov kostikbel at gmail.com
Fri Jul 12 06:35:45 UTC 2013


On Fri, Jul 12, 2013 at 08:05:27AM +0200, Andre Albsmeier wrote:
> On Fri, 12-Jul-2013 at 08:01:12 +0200, Konstantin Belousov wrote:
> > On Fri, Jul 12, 2013 at 07:24:40AM +0200, Andre Albsmeier wrote:
> > > On Thu, 04-Jul-2013 at 19:25:28 +0200, Konstantin Belousov wrote:
> > > > On Thu, Jul 04, 2013 at 04:29:19PM +0200, Andre Albsmeier wrote:
> > > > > OK, patch is applied. I will reboot the machine later
> > > > > and see what happens tomorrow in the morning. However,
> > > > > it might take a few days, since all was fine for the
> > > > > last 2 weeks.
> > > > > 
> > > > > BTW, should this patch be used in general or is it just
> > > > > for debugging? My understanding is that it is something
> > > > > which could stay in the code...
> > > > 
> > > > Patch is to improve debugging.
> > > > 
> > > > I will probably commit it after the issue is closed.  The argument
> > > > against the commit is that the change imposes a small performance
> > > > penalty due to the save and restore of %ebp (I doubt that this is
> > > > measurable by any means).  Also, arguably, such a change should be done
> > > > for all functions in support.s, but bcopy() is the hot spot.
> > > 
> > > Got a new one, 2 hours old ;-)
> > > 
> > > GNU gdb 6.1.1 [FreeBSD]
> > > Copyright 2004 Free Software Foundation, Inc.
> > > GDB is free software, covered by the GNU General Public License, and you are
> > > welcome to change it and/or distribute copies of it under certain conditions.
> > > Type "show copying" to see the conditions.
> > > There is absolutely no warranty for GDB.  Type "show warranty" for details.
> > > This GDB was configured as "i386-marcel-freebsd"...
> > > 
> > > Unread portion of the kernel message buffer:
> > > 
> > > 
> > > Fatal trap 12: page fault while in kernel mode
> > > fault virtual address   = 0xcd5ec000
> > > fault code              = supervisor write, page not present
> > > instruction pointer     = 0x20:0xc07cb2fe
> > > stack pointer           = 0x28:0xd82e45cc
> > > frame pointer           = 0x28:0xd82e45d4
> > > code segment            = base 0x0, limit 0xfffff, type 0x1b
> > >                         = DPL 0, pres 1, def32 1, gran 1
> > > processor eflags        = interrupt enabled, resume, IOPL = 0
> > > current process         = 18714 (mksnap_ffs)
> > > trap number             = 12
> > > panic: page fault
> > > KDB: stack backtrace:
> > > db_trace_self_wrapper(c08207eb,d82e4418,c05fdfc9,c081df13,c08a82e0,...) at db_trace_self_wrapper+0x26/frame 0xd82e43e8
> > > kdb_backtrace(c081df13,c08a82e0,c0801bfa,d82e4424,d82e4424,...) at kdb_backtrace+0x29/frame 0xd82e43f4
> > > panic(c0801bfa,c0845a01,c2b067d4,1,1,...) at panic+0xc9/frame 0xd82e4418
> > > trap_fatal(c0ff6000,cd5ec000,2,0,c08b6bf4,...) at trap_fatal+0x353/frame 0xd82e4458
> > > trap_pfault(baa8454b,21510,0,c2b06620,c08b6bf0,...) at trap_pfault+0x2d7/frame 0xd82e44a0
> > > trap(d82e458c) at trap+0x41a/frame 0xd82e4580
> > > calltrap() at calltrap+0x6/frame 0xd82e4580
> > > --- trap 0xc, eip = 0xc07cb2fe, esp = 0xd82e45cc, ebp = 0xd82e45d4 ---
> > > bcopy(c36ed000,cd5e6000,8000,8000,c281b980,...) at bcopy+0x1a/frame 0xd82e45d4
> > > ffs_snapshot(c2b35a90,c2ed0400,0,0,0,...) at ffs_snapshot+0x2933/frame 0xd82e490c
> > > ffs_mount(c2b35a90,c322e200,ff,d82e4c08,c2ccbc8c,...) at ffs_mount+0x15ee/frame 0xd82e4a3c
> > > vfs_donmount(c2b06620,10313108,0,c2b74d80,c2b74d80,...) at vfs_donmount+0x196b/frame 0xd82e4c2c
> > > sys_nmount(c2b06620,d82e4ccc,c2b06908,d82e4c6c,c0605015,...) at sys_nmount+0x63/frame 0xd82e4c50
> > > syscall(d82e4d08) at syscall+0x2ce/frame 0xd82e4cfc
> > > Xint0x80_syscall() at Xint0x80_syscall+0x21/frame 0xd82e4cfc
> > > --- syscall (378, FreeBSD ELF32, sys_nmount), eip = 0x180bdf37, esp = 0xbfbfd65c, ebp = 0xbfbfddd8 ---
> > > Uptime: 4d20h0m44s
> > > Physical memory: 503 MB
> > > Dumping 104 MB: 89 73 57 41 25 9
> > > 
> > > No symbol "stopped_cpus" in current context.
> > > No symbol "stoppcbs" in current context.
> > > #0  doadump (textdump=1) at pcpu.h:249
> > > 249     pcpu.h: No such file or directory.
> > >         in pcpu.h
> > > (kgdb) where
> > > #0  doadump (textdump=1) at pcpu.h:249
> > > #1  0xc05fdddd in kern_reboot (howto=260) at /src/src-9/sys/kern/kern_shutdown.c:449
> > > #2  0xc05fe028 in panic (fmt=<value optimized out>) at /src/src-9/sys/kern/kern_shutdown.c:637
> > > #3  0xc07cd1d3 in trap_fatal (frame=0xd82e458c, eva=3445538816)
> > >     at /src/src-9/sys/i386/i386/trap.c:1044
> > > #4  0xc07cd4b7 in trap_pfault (frame=0xd82e458c, usermode=0, eva=3445538816)
> > >     at /src/src-9/sys/i386/i386/trap.c:957
> > > #5  0xc07ce05a in trap (frame=0xd82e458c) at /src/src-9/sys/i386/i386/trap.c:555
> > > #6  0xc07ba88c in calltrap () at /src/src-9/sys/i386/i386/exception.s:170
> > > #7  0xc07cb2fe in bcopy () at /src/src-9/sys/i386/i386/support.s:198
> > > #8  0xc072be13 in ffs_snapshot (mp=0xc2b35a90, snapfile=0xc2ed0400 "s5-2013.07.12-03.15.01")
> > >     at /src/src-9/sys/ufs/ffs/ffs_snapshot.c:793
> > > #9  0xc0748e8e in ffs_mount (mp=0xc2b35a90) at /src/src-9/sys/ufs/ffs/ffs_vfsops.c:483
> > > #10 0xc068a72b in vfs_donmount (td=0xc2b06620, fsflags=271659272, fsoptions=0xc2b74d80)
> > >     at /src/src-9/sys/kern/vfs_mount.c:948
> > > #11 0xc068a8e3 in sys_nmount (td=0xc2b06620, uap=0xd82e4ccc) at /src/src-9/sys/kern/vfs_mount.c:417
> > > #12 0xc07cd7ae in syscall (frame=0xd82e4d08) at subr_syscall.c:135
> > > #13 0xc07ba8f1 in Xint0x80_syscall () at /src/src-9/sys/i386/i386/exception.s:270
> > > #14 0x00000033 in ?? ()
> > > Previous frame inner to this frame (corrupt stack?)
> > 
> > Please show me the first 100 lines of the output of dumpfs(8) on the
> > filesystem where snapshot creation caused the panic.
> 
> OK, dumpfs /dev/stripe/p | head -100:
> 
> magic	11954 (UFS1)	time	Fri Jul 12 08:02:40 2013
> id	[ 517fa356 4ecc9335 ]
> ncg	82	size	17774144	blocks	17737399
> bsize	32768	shift	15	mask	0xffff8000
> fsize	4096	shift	12	mask	0xfffff000
> frag	8	shift	3	fsbtodb	3
> minfree	8%	optim	time	symlinklen 60
> maxbpg	4096	maxcontig 4	contigsumsize 4
> nbfree	1958555	ndir	695	nifree	1123668	nffree	5395
> cpg	1	bpg	27415	fpg	219320	ipg	13824
> nindir	8192	inopb	256	nspf	8	maxfilesize	18016597801566207
> sbsize	4096	cgsize	32768	cgoffset 0	cgmask	0xffffffff
> csaddr	456	cssize	4096
> rotdelay 0ms	rps	60	trackskew 0	interleave 1
> nsect	1754560	npsect	1754560	spc	1754560
> sblkno	8	cblkno	16	iblkno	24	dblkno	456
> cgrotor	50	fmod	0	ronly	0	clean	0
> metaspace 0	avgfpdir 64	avgfilesize 16384
> flags	soft-updates 
> fsmnt	/palveli
> volname		swuid	0	providersize	17774144

UFS1, weird.

I believe I see the problem.  The UFS1 superblock is not aligned on a
fs block boundary, yet the bcopy() call copies a full fs block starting
at the superblock offset inside the buffer, so it runs past the end of
the buffer.  In fact, whenever the snapshot operation did not trap, you
probably got data corruption in an unrelated buffer.
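
To make the arithmetic concrete, here is a rough sketch, not kernel code.
blk_offset() and copy_overrun() are hypothetical helpers written only for
this illustration; blk_offset() is a simplified stand-in for what blkoff()
computes.  The sketch assumes the UFS1 superblock lives at byte offset 8192
(SBLOCK_UFS1) and that the destination buffer is exactly fs_bsize bytes
long.  The bsize and sbsize values are taken from the dumpfs output above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Offset of a byte address within its filesystem block.
 * Hypothetical helper; assumes bsize is a power of two, as fs_bsize is.
 */
uint32_t
blk_offset(uint32_t byte_addr, uint32_t bsize)
{
	assert(bsize != 0 && (bsize & (bsize - 1)) == 0);
	return (byte_addr & (bsize - 1));
}

/*
 * Number of bytes written past the end of a bsize-byte buffer when
 * copying len bytes to offset loc inside it (0 if the copy fits).
 * Hypothetical helper for illustration only.
 */
uint32_t
copy_overrun(uint32_t loc, uint32_t len, uint32_t bsize)
{
	uint32_t end = loc + len;

	return (end > bsize ? end - bsize : 0);
}
```

With bsize 32768 and an assumed superblock offset of 8192, blk_offset()
gives loc = 8192.  Copying fs_bsize (32768) bytes at that offset overruns
the buffer by 8192 bytes, while copying fs_sbsize (4096) bytes fits.  This
also seems to match the trace: the bcopy() destination is 0xcd5e6000, so the
buffer would start at 0xcd5e4000 and end at 0xcd5ec000, which is exactly the
fault virtual address.  For an offset of 65536 (which I assume is the UFS2
case) loc is 0 and the full-block copy stays inside the buffer.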

Please try the patch below.

diff --git a/sys/ufs/ffs/ffs_snapshot.c b/sys/ufs/ffs/ffs_snapshot.c
index ad157aa..c37706b 100644
--- a/sys/ufs/ffs/ffs_snapshot.c
+++ b/sys/ufs/ffs/ffs_snapshot.c
@@ -792,7 +792,7 @@ out1:
 		brelse(nbp);
 	} else {
 		loc = blkoff(fs, fs->fs_sblockloc);
-		bcopy((char *)copy_fs, &nbp->b_data[loc], fs->fs_bsize);
+		bcopy((char *)copy_fs, &nbp->b_data[loc], (u_int)fs->fs_sbsize);
 		bawrite(nbp);
 	}
 	/*