panic ffs_truncate3 (maybe fuse being evil)

Wed Jan 13 15:40:28 UTC 2016

I wrote:
> Kostik wrote:
> > On Sun, Jan 10, 2016 at 10:01:57AM -0500, Rick Macklem wrote:
> > > Hi,
> > > 
> > > When fooling around with GlusterFS, I can get this panic intermittently.
> > > (I had a couple yesterday.) This happens on a Dec. 5, 2015 head kernel.
> > > 
> > > panic: ffs_truncate3
> > > - backtrace without the numbers (I just scribbled it off the screen)
> > > ffs_truncate()
> > > ufs_inactive()
> > > VOP_INACTIVE_APV()
> > > vinactive()
> > > vputx()
> > > kern_unlinkat()
> > > 
> > > So, at a glance, it seems that either
> > >    bo_dirty.bv_cnt
> > > or bo_clean.bv_cnt
> > > is non-zero. (There is another case for the panic, but I thought it
> > >               was less likely?)
> > > 
> > > So, I'm wondering if this might be another side effect of r291460,
> > > since after that a new vnode isn't completely zero'd out?
> > > 
> > > However, shouldn't bo_dirty.bv_cnt and bo_clean.bv_cnt be zero when
> > > a vnode is recycled?
> > > Does this make sense or do some fields of v_bufobj need to be zero'd
> > > out by getnewvnode()?
> > Look at the _vdrop().  When a vnode is freed to zone, it is asserted
> > that bufobj queues are empty.  I very much doubt that it is possible
> > to leak either buffers or counters by reuse.
> > 
> > > 
> > > GlusterFS is using fuse and I suspect that fuse isn't cleaning out
> > > the buffers under some circumstance. (I already noticed that there
> > > isn't any code in its fuse_vnop_reclaim(), and I vaguely recall that
> > > there are conditions where VOP_INACTIVE() gets skipped, so that
> > > VOP_RECLAIM() has to check for anything that would have been done by
> > > VOP_INACTIVE() and do it, if it isn't already done.)
> > But even if fuse leaves the buffers around, is it UFS which panics for
> > you? I would rather worry about dangling pointers and use after free in
> > fuse, which is a known issue with it anyway. I.e. it could be that fuse
> > operates on reclaimed and reused vnode as its own.
> > 
> > > 
> > > Anyhow, if others have thoughts on this (or other hunches w.r.t. what
> > > could cause this panic()), please let me know.
> > 
> > The ffs_truncate3 was deterministically triggered by a bug in ffs_balloc().
> > The routine allocated buffers for indirect blocks, but if the blocks could
> > not be allocated, the buffers were left on the queue.  See r174973; this
> > was fixed a very long time ago.
> > 
> Well, although I have r174973 in the kernel that crashes, it looks like this
> bug might have been around for a while.
> Here's what I've figured out so far.
> 1 - The crashes only occur if soft updates are disabled. This isn't
>     surprising if you look at ffs_truncate(), since the test for the panic
>     isn't done when soft updates are enabled.
> Here's the snippet from ffs_truncate(), in case you are interested:
>         if (DOINGSOFTDEP(vp)) {
>                 if (softdeptrunc == 0 && journaltrunc == 0) {
>                         /*
>                          * If a file is only partially truncated, then
>                          * we have to clean up the data structures
>                          * describing the allocation past the truncation
>                          * point. Finding and deallocating those structures
>                          * is a lot of work. Since partial truncation occurs
>                          * rarely, we solve the problem by syncing the file
>                          * so that it will have no data structures left.
>                          */
>                         if ((error = ffs_syncvnode(vp, MNT_WAIT, 0)) != 0)
>                                 return (error);
>                 } else {
>                         flags = IO_NORMAL | (needextclean ? IO_EXT: 0);
>                         if (journaltrunc)
>                                 softdep_journal_freeblocks(ip, cred, length,
>                                     flags);
>                         else
>                                 softdep_setup_freeblocks(ip, length, flags);
>                         ASSERT_VOP_LOCKED(vp, "ffs_truncate1");
>                         if (journaltrunc == 0) {
>                                 ip->i_flag |= IN_CHANGE | IN_UPDATE;
>                                 error = ffs_update(vp, 0);
>                         }
>                         return (error);
>                 }
>         }
> You can see that every path through this block returns. The only way the
> code can get past this block when soft updates are enabled is a
> "goto extclean;", which takes you past the panic().
> 
> By adding a few printf()s, I have determined:
> - bo_clean.bv_cnt == 1 when the panic occurs, and the b_lblkno of that
>   buffer is negative.
> 
> If you look at vtruncbuf():
>         trunclbn = (length + blksize - 1) / blksize;
> 1726
> 1727 	        ASSERT_VOP_LOCKED(vp, "vtruncbuf");
> 1728 	restart:
> 1729 	        bo = &vp->v_bufobj;
> 1730 	        BO_LOCK(bo);
> 1731 	        anyfreed = 1;
> 1732 	        for (;anyfreed;) {
> 1733 	                anyfreed = 0;
> 1734 	                TAILQ_FOREACH_SAFE(bp, &bo->bo_clean.bv_hd, b_bobufs, nbp) {
> 1735 	                        if (bp->b_lblkno < trunclbn)
> 1736 	                                continue;
> When length == 0, trunclbn is 0, but the test at line 1735 still skips any
> buffer whose b_lblkno is negative.
> 
> That is as far as I've gotten. A couple of things I need help from others on:
> - Is vtruncbuf() skipping over the cases where b_lblkno < 0 a feature or a
>   bug?
> - If it is a feature, then what needs to be done in the code after the
>   vtruncbuf() call in ffs_truncate() to ensure the buffer is gone by the
>   time the panic check is done?
>   --> I do see a bunch of code after the vtruncbuf() call related to
>       indirect blocks (which I think use the negative b_lblkno?), but I'll
>       admit I don't understand it well enough to know if it expects
>       vtruncbuf() to leave the negative block on the bo_clean list.
> 
> Obviously fixing vtruncbuf() to get rid of these negative b_lblkno entries
> would be easy, but I don't know if the current behaviour is a feature or a
> bug.
> 
> I did look at the commit logs and vtruncbuf() has been like this for at least
> 10 years. (I can only guess that very few people run UFS without soft
> updates, or others would see these panic()s.)
> 
> I am now running with soft updates enabled to avoid the crashes, but I can
> easily test any patch if others can provide one to try.
> 
Oh, and one more thing.
Maybe having the buffer for an indirect block hanging off the vnode at the
end of an ffs_truncate() to 0 length is ok. After all, this is happening in
VOP_INACTIVE() and the vnode isn't being recycled yet?
(i.e. is the panic() test not needed?)
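
To make the vtruncbuf() observation quoted above concrete, here is a small
user-space model of the clean-list test. It is only a sketch and not the
kernel code: struct model_buf, model_trunclbn() and model_truncbuf() are
made-up names, and the -12 block number used in the example is just an
assumed value standing in for "some negative b_lblkno".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct buf: only the fields the test needs. */
struct model_buf {
	int64_t lblkno;	/* plays the role of b_lblkno */
	int     freed;	/* set when the model would release the buffer */
};

/* Same rounding as the trunclbn computation quoted from vtruncbuf(). */
static int64_t
model_trunclbn(int64_t length, int64_t blksize)
{
	assert(blksize > 0);
	return ((length + blksize - 1) / blksize);
}

/*
 * Model of the scan over the clean list: a buffer is skipped when its
 * block number is below trunclbn, otherwise it is marked freed.
 * Returns how many buffers are left behind on the list.
 */
static size_t
model_truncbuf(struct model_buf *bufs, size_t n, int64_t length,
    int64_t blksize)
{
	int64_t trunclbn = model_trunclbn(length, blksize);
	size_t i, left = 0;

	for (i = 0; i < n; i++) {
		if (bufs[i].lblkno < trunclbn) {
			left++;		/* skipped, stays on the list */
			continue;
		}
		bufs[i].freed = 1;
	}
	return (left);
}
```

With length 0 and three buffers at block numbers 0, 1 and -12 (assumed),
model_truncbuf() frees the first two and reports one buffer left behind,
which is the same shape as the bo_clean.bv_cnt == 1 I see at the panic.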

rick

> Thanks for your help with this, rick
> 