Serious FS hangs and panics on 10.1

Walter Hop freebsd at spam.lifeforms.nl
Fri Dec 12 17:16:28 UTC 2014


Hi all,

As some may have read on -stable, various users have been seeing system hangs since 10.1-RC when unmounting the root filesystem on UFS with softupdates. To recap: the hangs occur, for instance, when /sbin/init has been modified, so people generally run into them after running freebsd-update. With the 10.1-p1 update, the bug report and the mailing list threads got additional activity, so it's a recurring theme. I verified that the problem still exists in CURRENT, and found lock order reversals which may or may not be related: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=195458

Now the above problem has a simple mitigation: just disable softupdates before doing freebsd-update, and you won't hang. Okay, a little startling, but I’m still sleeping okay.

Now today, the 10.1 story looks a lot worse, with a 10.1 box getting back-to-back kernel panics in VFS functions. This is a box serving SVN repositories, and SVN is known to exercise a filesystem pretty thoroughly (even uncovering NTFS bugs in pre-SP1 Windows 7). We updated this box from 10.0 to 10.1 a week ago. The four panics that we saw (trace below) had the exact same instruction pointer and stack trace, so I'm pretty positive we're not looking at a random hardware fluke.

The last panics were spaced only minutes apart, which was pretty scary. I was fearing persistent disk corruption, but the panics stopped when... I disabled softupdates! This was my first shot, as this also solved my other stability problem on 10.1. Anyway, the machine has been stable so far.

Maybe these two problems are unrelated; it might be too early to tell. In any case, I am getting the strong vibe that something changed in UFS/VFS/softupdates between 10.0 and 10.1 that is possibly very problematic and risks causing data loss in the future.

Our experience with 10.0 has been remarkably good (same for earlier releases, for that matter; in fact I don't think I can remember the last kernel panic in production at all... maybe on 5.2-STABLE?). That's why we were very happy to see 10.1, but it feels really troublesome in the filesystem department, which is very uncharacteristic for FreeBSD.

That said, I'd prefer spending some more energy on getting 10.1 working well, rather than downgrading or jumping to other systems... But I think it really needs some love.

Any ideas on what we could do?

Thanks!
WH

-- 
Walter Hop | PGP key: https://lifeforms.nl/pgp


Panic:

kernel: Fatal trap 12: page fault while in kernel mode
kernel: cpuid = 0; apic id = 00
kernel: fault virtual address      = 0x30058
kernel: fault code         = supervisor write data, page not present
kernel: instruction pointer        = 0x20:0xffffffff8090e46a
kernel: stack pointer              = 0x28:0xfffffe000024d780
kernel: frame pointer              = 0x28:0xfffffe000024d850
kernel: code segment               = base 0x0, limit 0xfffff, type 0x1b
kernel: = DPL 0, pres 1, long 1, def32 0, gran 1
kernel: processor eflags   = interrupt enabled, resume, IOPL = 0
kernel: current process            = 27466 (httpd)
kernel: trap number                = 12
kernel: panic: page fault
kernel: cpuid = 0
kernel: KDB: stack backtrace:
kernel: #0 0xffffffff80963000 at kdb_backtrace+0x60
kernel: #1 0xffffffff80928125 at panic+0x155
kernel: #2 0xffffffff80d24f1f at trap_fatal+0x38f
kernel: #3 0xffffffff80d25238 at trap_pfault+0x308
kernel: #4 0xffffffff80d2489a at trap+0x47a
kernel: #5 0xffffffff80d0a782 at calltrap+0x8
kernel: #6 0xffffffff8090ec35 at lf_advlock+0x45
kernel: #7 0xffffffff809b8e69 at vop_stdadvlock+0xa9
kernel: #8 0xffffffff80e44247 at VOP_ADVLOCK_APV+0xa7
kernel: #9 0xffffffff808e4919 at kern_fcntl+0xb39
kernel: #10 0xffffffff808e3d5c at kern_fcntl_freebsd+0xac
kernel: #11 0xffffffff80d25851 at amd64_syscall+0x351
kernel: #12 0xffffffff80d0aa6b at Xfast_syscall+0xfb
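
For readers following the backtrace: frames #6 through #10 are the fcntl(2) advisory record locking path (kern_fcntl -> VOP_ADVLOCK -> vop_stdadvlock -> lf_advlock), entered here from httpd. Below is a minimal sketch of a userland program that goes through the same system call, assuming Python's fcntl.lockf(), which is a wrapper around the fcntl() locking calls. The function names and the scratch file name are hypothetical, and this is not a known reproducer for the panic; it only illustrates which kind of operation the trace points at.

```python
import fcntl
import os
import tempfile


def try_advisory_lock(path):
    """Take and release an exclusive POSIX advisory lock on `path`.

    Returns True if the lock was acquired, False if it could not be taken
    (for example because another process holds it).
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        try:
            # Non-blocking exclusive lock on the whole file. lockf() issues
            # an fcntl(2) locking request, the syscall seen in the trace.
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            return False
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return True
    finally:
        os.close(fd)


def exercise_locks(directory, rounds=1000):
    """Lock and unlock a scratch file `rounds` times; return the success count."""
    # Hypothetical scratch file name, chosen only for this illustration.
    path = os.path.join(directory, "advlock-test.tmp")
    ok = sum(1 for _ in range(rounds) if try_advisory_lock(path))
    os.unlink(path)
    return ok


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        print(exercise_locks(scratch, rounds=1000))
```

Running it against a directory on the affected filesystem (instead of the temporary directory used above) would exercise the advisory locking code on that filesystem, with or without softupdates enabled.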



More information about the freebsd-fs mailing list