panic ffs_truncate3 (maybe fuse being evil)
Rick Macklem
rmacklem at uoguelph.ca
Wed Jan 13 15:40:28 UTC 2016
I wrote:
> Kostik wrote:
> > On Sun, Jan 10, 2016 at 10:01:57AM -0500, Rick Macklem wrote:
> > > Hi,
> > >
> > > When fooling around with GlusterFS, I can get this panic intermittently.
> > > (I had a couple yesterday.) This happens on a Dec. 5, 2015 head kernel.
> > >
> > > panic: ffs_truncate3
> > > - backtrace without the numbers (I just scribbled it off the screen)
> > > ffs_truncate()
> > > ufs_inactive()
> > > VOP_INACTIVE_APV()
> > > vinactive()
> > > vputx()
> > > kern_unlinkat()
> > >
> > > So, at a glance, it seems that either
> > > bo_dirty.bv_cnt
> > > or bo_clean.bv_cnt
> > > is non-zero. (There is another case for the panic, but I thought it
> > > was less likely?)
> > >
> > > So, I'm wondering if this might be another side effect of r291460,
> > > since after that a new vnode isn't completely zero'd out?
> > >
> > > However, shouldn't bo_dirty.bv_cnt and bo_clean.bv_cnt be zero when
> > > a vnode is recycled?
> > > Does this make sense or do some fields of v_bufobj need to be zero'd
> > > out by getnewvnode()?
> > Look at the _vdrop(). When a vnode is freed to zone, it is asserted
> > that bufobj queues are empty. I very much doubt that it is possible
> > to leak either buffers or counters by reuse.
> >
> > >
> > > GlusterFS is using fuse and I suspect that fuse isn't cleaning out
> > > the buffers under some circumstance (I already noticed that there
> > > isn't any code in its fuse_vnop_reclaim() and I vaguely recall that
> > > there are conditions where VOP_INACTIVE() gets skipped, so that
> > > VOP_RECLAIM()
> > > has to check for anything that would have been done by VOP_INACTIVE()
> > > and do it, if it isn't already done.)
> > But even if fuse leaves the buffers around, is it UFS which panics for
> > you? I would rather worry about dangling pointers and use-after-free in
> > fuse, which is a known issue with it anyway. I.e. it could be that fuse
> > operates on reclaimed and reused vnode as its own.
> >
> > >
> > > Anyhow, if others have thoughts on this (or other hunches w.r.t. what
> > > could cause this panic()), please let me know.
> >
> > The ffs_truncate3 was deterministically triggered by a bug in ffs_balloc().
> > The routine allocated buffers for indirect blocks, but if the blocks cannot
> > be allocated, the buffers were left on the queue. See r174973; this was
> > fixed a very long time ago.
> >
> Well, although I have r174973 in the kernel that crashes, it looks like this
> bug might have been around for a while.
> Here's what I've figured out so far.
> 1 - The crashes only occur if soft updates are disabled. This isn't surprising
> if you look at ffs_truncate(), since the test for the panic isn't done
> when soft updates are enabled.
> Here's the snippet from ffs_truncate(), in case you are interested:
>     if (DOINGSOFTDEP(vp)) {
>         if (softdeptrunc == 0 && journaltrunc == 0) {
>             /*
>              * If a file is only partially truncated, then
>              * we have to clean up the data structures
>              * describing the allocation past the truncation
>              * point. Finding and deallocating those structures
>              * is a lot of work. Since partial truncation occurs
>              * rarely, we solve the problem by syncing the file
>              * so that it will have no data structures left.
>              */
>             if ((error = ffs_syncvnode(vp, MNT_WAIT, 0)) != 0)
>                 return (error);
>         } else {
>             flags = IO_NORMAL | (needextclean ? IO_EXT: 0);
>             if (journaltrunc)
>                 softdep_journal_freeblocks(ip, cred, length,
>                     flags);
>             else
>                 softdep_setup_freeblocks(ip, length, flags);
>             ASSERT_VOP_LOCKED(vp, "ffs_truncate1");
>             if (journaltrunc == 0) {
>                 ip->i_flag |= IN_CHANGE | IN_UPDATE;
>                 error = ffs_update(vp, 0);
>             }
>             return (error);
>         }
>     }
> You can see that it always returns once in this code block. The only way the
> code can get past this block if soft updates are enabled is a
> "goto extclean;", which takes you past the "panic()".
>
> By adding a few printf()s, I have determined:
> - The bo_clean.bv_cnt == 1 when the panic occurs and the b_lblkno of the
>   buffer is -ve.
>
> If you look at vtruncbuf():
> trunclbn = (length + blksize - 1) / blksize;
> 1726
> 1727 ASSERT_VOP_LOCKED(vp, "vtruncbuf");
> 1728 restart:
> 1729 bo = &vp->v_bufobj;
> 1730 BO_LOCK(bo);
> 1731 anyfreed = 1;
> 1732 for (;anyfreed;) {
> 1733 anyfreed = 0;
> 1734 TAILQ_FOREACH_SAFE(bp, &bo->bo_clean.bv_hd, b_bobufs, nbp) {
> 1735 if (bp->b_lblkno < trunclbn)
> 1736 continue;
> When length == 0 --> trunclbn is 0, but the test at line#1735 will skip over
> the b_lblkno because it is negative.
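
The skip described above can be reproduced with a small user-space model. This
is only a sketch, not kernel code: struct model_buf, model_trunclbn() and
model_truncbuf() are hypothetical names invented for illustration. The only
parts taken from the quoted vtruncbuf() are the rounding expression for
trunclbn and the "b_lblkno < trunclbn" test.

```c
#include <assert.h>

/* Hypothetical stand-in for struct buf; not the kernel structure. */
struct model_buf {
    long lblkno;    /* logical block number; negative models an indirect block */
    int  present;   /* 1 while the buffer is still on the (modelled) clean list */
};

/* Same rounding as the quoted line: (length + blksize - 1) / blksize. */
static long
model_trunclbn(long length, long blksize)
{
    assert(blksize > 0);
    return ((length + blksize - 1) / blksize);
}

/*
 * Model of the quoted loop: buffers with lblkno < trunclbn are skipped,
 * everything else is dropped. Returns how many buffers are left behind.
 */
static int
model_truncbuf(struct model_buf *bufs, int n, long length, long blksize)
{
    long trunclbn = model_trunclbn(length, blksize);
    int remaining = 0;

    for (int i = 0; i < n; i++) {
        if (!bufs[i].present)
            continue;
        if (bufs[i].lblkno < trunclbn) {
            /* Skipped by the test; a negative lblkno always lands here. */
            remaining++;
            continue;
        }
        bufs[i].present = 0;    /* models freeing the buffer */
    }
    return (remaining);
}
```

Calling model_truncbuf() with length 0 on buffers whose lblkno values are 0, 1
and -12 leaves exactly one buffer behind (the negative one), which matches the
bo_clean.bv_cnt == 1 observation above.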
>
> That is as far as I've gotten. A couple of things I need help from others on:
> - Is vtruncbuf() skipping over the cases where b_lblkno < 0 a feature or a
>   bug?
> - If it is a feature, then what needs to be done in the code after the
>   vtruncbuf() call in ffs_truncate() to ensure the buffer is gone by the time
>   the panic check is done?
>   --> I do see a bunch of code after the vtruncbuf() call related to indirect
>   blocks (which I think use the -ve b_lblkno?), but I'll admit I don't
>   understand it well enough to know if it expects vtruncbuf() to leave the
>   -ve block on the bo_hd list?
>
> Obviously fixing vtruncbuf() to get rid of these -ve b_lblkno entries would
> be easy, but I don't know if that is a feature or a bug?
>
> I did look at the commit logs and vtruncbuf() has been like this for at least
> 10 years.
> (I can only guess very few run UFS without soft updates or others would see
> these panic()s.)
>
> I am now running with soft updates enabled to avoid the crashes, but I can
> easily test any patch if others can provide a patch to try.
>
Oh, and one more thing.
Maybe having the buffer for an indirect block hanging off the vnode at the
end of ffs_truncate() to 0 length is ok. After all, this is happening in
VOP_INACTIVE() and the vnode isn't being recycled yet?
(i.e., the panic() test is not needed?)
rick
> Thanks for your help with this, rick