FreeBSD-9.1: machine reboots during snapshot creation, LORs found

Andre Albsmeier Andre.Albsmeier at siemens.com
Wed Jul 3 18:18:07 UTC 2013


On Mon, 17-Jun-2013 at 21:30:31 +0200, John Baldwin wrote:
> On Sunday, June 16, 2013 2:39:42 am Andre Albsmeier wrote:
> > On Fri, 31-May-2013 at 16:51:03 +0200, John Baldwin wrote:
> > > On Friday, May 31, 2013 8:26:11 am Andre Albsmeier wrote:
> > > > Each day at 5:15 we are generating snapshots on various machines.
> > > > This used to work perfectly under 7-STABLE for years but since
> > > > we started to use 9.1-STABLE the machine reboots in about 10%
> > > > of all cases.
> > > > 
> > > > After rebooting we find a new snapshot file which is a bit
> > > > smaller than the good ones and with different permissions.
> > > > It does not pass fsck. In this example it is the one
> > > > whose name begins with s3:
> > > > 
> > > > -r--r-----   1 root  operator  snapshot 72802894528 29 May 05:15 s2-2013.05.28-03.15.04
> > > > -r--------   1 root  operator  snapshot 72802893824 29 May 05:15 s3-2013.05.29-03.15.03
> > > > -r--r-----   1 root  operator  snapshot 72802894528 28 May 14:22 s4-2013.05.23-06.38.44
> > > > -r--r-----   1 root  operator  snapshot 72802894528 28 May 14:22 s5-2013.05.24-03.15.03
> > > > -r--r-----   1 root  operator  snapshot 72802894528 28 May 14:22 s6-2013.05.25-03.15.03
> > > > 
> > > > After enabling DIAGNOSTIC, WITNESS and INVARIANTS in the kernel
> > > > I see the following LORs (mksnap_ffs starts exactly at 5:15):
> > > > 
> > > > May 29 05:15:00 <kern.crit> palveli kernel: lock order reversal:
> > > > May 29 05:15:00 <kern.crit> palveli kernel: 1st 0xc2371da8 ufs (ufs) @ /src/src-9/sys/kern/vfs_mount.c:1240
> > > > May 29 05:15:00 <kern.crit> palveli kernel: 2nd 0xc2371ec4 devfs (devfs) @ /src/src-9/sys/ufs/ffs/ffs_vfsops.c:1414
> > > > May 29 05:15:04 <kern.crit> palveli kernel: lock order reversal:
> > > > May 29 05:15:04 <kern.crit> palveli kernel: 1st 0xc228471c snaplk (snaplk) @ /src/src-9/sys/ufs/ufs/ufs_vnops.c:976
> > > > May 29 05:15:04 <kern.crit> palveli kernel: 2nd 0xc22f25e4 ufs (ufs) @ /src/src-9/sys/ufs/ffs/ffs_snapshot.c:1626
> > > > 
> > > > Unfortunately, no core files are being generated ;-(.
> > > > 
> > > > I have checked and even rebuilt the (UFS1) fs in question
> > > > from scratch. I have also seen this happen on an UFS2 on
> > > > another machine and on a third one when running "dump -L"
> > > > on a root fs.
> > > > 
> > > > Any hints on how to proceed?
> > > 
> > > Would it be possible to set up a serial console that is logged on this machine
> > > to see if it is panicking but failing to write out a crashdump?
> > 
> > Couldn't attach the serial console yet ;-(. But I had people
> > attach a KVMoverIP switch and enabled the various KDB options
> > in the kernel. Now we can see a bit more (see below) -- no
> > crashdump is being generated though.
> 
> :(  Unfortunately these LORs don't really help with discerning the cause of
> the reboot.  If you have remote power access (and still wanted to test this)
> one option would be to change KDB to drop into the debugger on a panic.
> Then you could connect over the KVM and take images of the original panic
> along with a stack trace.

After actually getting core dumps to work (the dump device was not
enabled, and FreeBSD-9 doesn't enable it automatically) and upgrading
to a recent version of 9.1-STABLE, it _seems_ that the troubles are
gone. Should the problem reappear, I'll come back ;-).
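
For anyone hitting the same missing-crashdump symptom, here is a minimal
sketch of the /etc/rc.conf settings involved. It assumes the machine has a
swap partition large enough to hold a dump; dumpdev and dumpdir are the
standard rc.conf knobs, and the device name in the comment is hypothetical:

```shell
# /etc/rc.conf fragment (sketch, not the exact configuration used here)

# Let the rc scripts pick the first suitable swap device as dump device.
dumpdev="AUTO"

# Directory where savecore(8) stores the dump on the next boot.
dumpdir="/var/crash"

# To enable a dump device on a running system without rebooting,
# dumpon(8) can be pointed at a swap partition by hand, e.g.
# (hypothetical device name):
#   dumpon /dev/ada0s1b
```

With this in place, a panic should write a dump to the swap device, and
savecore(8) should copy it to the dump directory during the next boot.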

Thanks,

	-Andre

> 
> -- 
> John Baldwin

-- 
FreeBSD is the most powerful OS.
NetBSD  is the most portable OS.
OpenBSD is the most secure OS.
Windoze is the most popular OS.
Linux   is no OS.

