SUJ deadlock
David O'Brien
obrien at freebsd.org
Fri Sep 3 23:41:22 UTC 2010
On Wed, May 05, 2010 at 12:54:07PM -1000, Jeff Roberson wrote:
> On Mon, 3 May 2010, Fabien Thomas wrote:
>>>> I'm with r207548 now and since some days i've system deadlock.
>>>> It seems related to SUJ with process waiting on suspfs or ppwait.
>>>
>>> I've also seen it stalled in suspfs, but this information is way better
>>> than what I was able to garner. I was only able to tell via ctrl-t on
>>> a stalled 'ls' process in a terminal before hard booting.
[..]
> Can anyone who has experienced this hang test this patch:
>
> Thanks,
> Jeff
> Index: ffs_softdep.c
> ===================================================================
> --- ffs_softdep.c (revision 207480)
> +++ ffs_softdep.c (working copy)
> @@ -9301,7 +9301,7 @@
> hadchanges = 1;
> }
> /* Leave this inodeblock dirty until it's in the list. */
> - if ((inodedep->id_state & (UNLINKED | DEPCOMPLETE)) == UNLINKED)
> + if ((inodedep->id_state & (UNLINKED | UNLINKONLIST)) == UNLINKED)
Hi Jeff,
I didn't seem to experience this problem back in May, but I'm now
experiencing it on a regular basis.
I seem to trigger it almost every second or third day during the daily run.
I wind up with cvsup or svnsync stalled, and any 'ls' of my sources
partition waits on suspfs.
(Note: I am also running diskcheckd from ports.)
My kernel sources are at:
Last Changed Author: davidxu
Last Changed Rev: 211534
Last Changed Date: 2010-08-20 16:51:34 -0700 (Fri, 20 Aug 2010)
I have also experienced it back to at least:
Last Changed Author: yongari
Last Changed Rev: 210152
Last Changed Date: 2010-07-15 16:34:58 -0700 (Thu, 15 Jul 2010)
The weird thing is that I can still access this partition across NFS
without problems.
dragon$ cd /src/fbsd
dragon$ df .
Filesystem Size Used Avail Capacity Mounted on
/dev/da31s1f 271G 119G 130G 48% /src
dragon$ ls
load: 0.12 cmd: ls 77901 [suspfs] 2.26r 0.00u 0.00s 0% 1212k
quynh$ cd /src/fbsd
quynh$ df .
Filesystem Size Used Avail Capacity Mounted on
dragon:/src 271G 119G 130G 48% /src
quynh$ ls
.svn/ lib/
COPYRIGHT libexec/
..snip..
Processes also tend to complete quite slowly at times, waiting
in vlruwk.
When I reboot, usually / and /src (but not the 3 other partitions) give a
"Bad cg number {negative number}" error from fsck, so a full fsck is run.
This results in what seems like tens of thousands of iterations of:
UNREF FILE I=[..snip..]
RECONNECT? yes
SORRY no space in lost+found directory
unexpected soft update inconsistency
CLEAR? yes
Thoughts?
--
-- David (obrien at FreeBSD.org)