SU+J: 185 processes in state "suspfs" for >8 hrs. ... not good, right?

David Wolfskill david at catwhisker.org
Thu May 1 17:09:52 UTC 2014


On Thu, May 01, 2014 at 09:51:43AM -0700, Kirk McKusick wrote:
>>...
> 
> The following fix for related problems was made to head and MFC'ed
> to stable/10 but not stable/9.
> 
> *** stable/9/sys/ufs/ffs/ffs_vnops.c	2014-03-05 08:51:48.000000000 -0800
> --- stable/9/sys/ufs/ffs/ffs_vnops.c	2014-05-01 09:41:35.000000000 -0700
> ***************
> *** 258,266 ****
>   			continue;
>   		if (bp->b_lblkno > lbn)
>   			panic("ffs_syncvnode: syncing truncated data.");
> ! 		if (BUF_LOCK(bp, LK_EXCLUSIVE | LK_NOWAIT, NULL))
>   			continue;
> - 		BO_UNLOCK(bo);
>   		if ((bp->b_flags & B_DELWRI) == 0)
>   			panic("ffs_fsync: not dirty");
>   		/*
> --- 258,274 ----
>   			continue;
>   		if (bp->b_lblkno > lbn)
>   			panic("ffs_syncvnode: syncing truncated data.");
> ! 		if (BUF_LOCK(bp, LK_EXCLUSIVE | LK_NOWAIT, NULL) == 0) {
> ! 			BO_UNLOCK(bo);
> ! 		} else if (wait != 0) {
> ! 			if (BUF_LOCK(bp,
> ! 			    LK_EXCLUSIVE | LK_SLEEPFAIL | LK_INTERLOCK,
> ! 			    BO_LOCKPTR(bo)) != 0) {
> ! 				bp->b_vflags &= ~BV_SCANNED;
> ! 				goto next;
> ! 			}
> ! 		} else
>   			continue;
>   		if ((bp->b_flags & B_DELWRI) == 0)
>   			panic("ffs_fsync: not dirty");
>   		/*
> 
> The associated comment is:
> 
>     If we fail to do a non-blocking acquire of a buf lock while doing a
>     waiting sync pass we need to do a blocking acquire and restart.
>     Another thread, typically the buf daemon, may have this buf locked and
>     if we don't wait we can fail to sync the file.  This led to a great
>     variety of softdep panics and deadlocks because we rely on all
>     dependencies being flushed before proceeding in several cases.
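
For readers who want the pattern outside the kernel, here is a minimal
user-space sketch of "try a non-blocking acquire; on a waiting pass, block
and restart".  It is only an illustration under stated assumptions: struct
demo_buf and demo_sync() are hypothetical names, a pthread mutex stands in
for the buf lock, and a boolean stands in for B_DELWRI.  It is not the
ffs_syncvnode() code and it ignores the bufobj lock entirely.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a buffer; not the kernel's struct buf. */
struct demo_buf {
	pthread_mutex_t lock;	/* plays the role of the buf lock */
	bool dirty;		/* plays the role of B_DELWRI */
};

/*
 * "Write" every dirty buffer and return how many were written.
 * wait == false: a busy buffer is skipped (best-effort pass).
 * wait == true:  a busy buffer is waited for and the scan restarts,
 *                so a dirty buffer is not silently left behind.
 */
static int
demo_sync(struct demo_buf *bufs, size_t n, bool wait)
{
	int written = 0;
	size_t i;

restart:
	for (i = 0; i < n; i++) {
		struct demo_buf *bp = &bufs[i];
		int error = pthread_mutex_trylock(&bp->lock);

		if (error != 0) {
			assert(error == EBUSY);
			if (!wait)
				continue;	/* non-waiting pass: skip it */
			/* Sleep until the holder is done, then rescan. */
			pthread_mutex_lock(&bp->lock);
			pthread_mutex_unlock(&bp->lock);
			goto restart;
		}
		if (bp->dirty) {
			bp->dirty = false;	/* pretend to write it out */
			written++;
		}
		pthread_mutex_unlock(&bp->lock);
	}
	return (written);
}
```

The blocking branch drops the lock again and rescans rather than carrying
on with the buffer, which is roughly what LK_SLEEPFAIL followed by
"goto next" does in the patch above: after sleeping, the state seen before
the sleep can no longer be trusted.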

Cool -- thanks!

> Let me know if it helps your problem. If it does, I will MFC it to 9.
> There have been several other fixes made to SU+J that are more likely
> to be the cause of your problem, but they are not easily back-ported
> to stable/9. So if this does not fix your problem my only suggestions
> are to turn off journaling or move to running on stable/10.
> 
>     Kirk McKusick

Roger that.  And yes, stable/10 is a goal -- but I *just* finally managed
to get the machines migrated from 8.2-ish to 9.2.  :-)  (Note: I do not
have direct control -- merely a measure of influence. :-})

Peace,
david
-- 
David H. Wolfskill				david at catwhisker.org
Taliban: Evil cowards with guns afraid of truth from a 14-year old girl.

See http://www.catwhisker.org/~david/publickey.gpg for my public key.