Freeze
Peter Holm
peter at holm.cc
Fri Dec 17 02:07:18 PST 2004
On Thu, Dec 16, 2004 at 03:21:44PM -0500, John Baldwin wrote:
> On Monday 06 December 2004 08:59 am, Peter Holm wrote:
> > On Fri, Nov 19, 2004 at 05:10:19PM -0500, John Baldwin wrote:
> > > On Friday 19 November 2004 02:59 am, Peter Holm wrote:
> > > > On Mon, Nov 15, 2004 at 03:46:15PM -0500, John Baldwin wrote:
> > > > > On Friday 12 November 2004 07:33 am, Peter Holm wrote:
> > > > > > GENERIC HEAD from Nov 11 08:05 UTC
> > > > > >
> > > > > > The following stack traces etc. were done before my first
> > > > > > cup of coffee, so they're not as informative as they could have been :-(
> > > > > >
> > > > > > The test box appeared to have been frozen for more than 6 hours,
> > > > > > but was pingable.
> > > > > >
> > > > > > http://www.holm.cc/stress/log/cons86.html
> > > > >
> > > > > A weak guess is that you have the system in some sort of livelock due
> > > > > to fork()? Have you tried running with 'debug.mpsafevm=1' set from
> > > > > the loader?
> > > > >
> > > > > --
> > > > > John Baldwin <jhb at FreeBSD.org> <>< http://www.FreeBSD.org/~jhb/
> > > > > "Power Users Use the Power to Serve" = http://www.FreeBSD.org
> > > >
> > > > OK, I've got some more info:
> > > >
> > > > http://www.holm.cc/stress/log/cons88.html
> > > >
> > > > Looks like a spin in uma_zone_slab() when slab_zalloc() fails?
> > >
> > > Yes, I think if you specify M_WAITOK, then that might happen.
> > > slab_zalloc() can fail if any of the init functions fail for example, in
> > > which case it would loop forever. You can try this hack (though it may
> > > very well be wrong) to return failure if that is what is triggering:
> > >
> > > Index: uma_core.c
> > > ===================================================================
> > > RCS file: /usr/cvs/src/sys/vm/uma_core.c,v
> > > retrieving revision 1.110
> > > diff -u -r1.110 uma_core.c
> > > --- uma_core.c 6 Nov 2004 11:43:30 -0000 1.110
> > > +++ uma_core.c 19 Nov 2004 22:08:26 -0000
> > > @@ -1998,6 +1998,10 @@
> > > */
> > > if (flags & M_NOWAIT)
> > > flags |= M_NOVM;
> > > +
> > > + /* XXXHACK */
> > > + if (flags & M_WAITOK)
> > > + break;
> > > }
> > > return (slab);
> > > }
> > >
> > > --
> > > John Baldwin <jhb at FreeBSD.org> <>< http://www.FreeBSD.org/~jhb/
> > > "Power Users Use the Power to Serve" = http://www.FreeBSD.org
> >
> > I instrumented the code with this:
> > $ cvs diff -u
> > cvs diff: Diffing .
> > Index: uma_core.c
> > ===================================================================
> > RCS file: /home/ncvs/src/sys/vm/uma_core.c,v
> > retrieving revision 1.110
> > diff -u -r1.110 uma_core.c
> > --- uma_core.c 6 Nov 2004 11:43:30 -0000 1.110
> > +++ uma_core.c 6 Dec 2004 13:49:36 -0000
> > @@ -1926,6 +1926,7 @@
> > {
> > uma_slab_t slab;
> > uma_keg_t keg;
> > + int i;
> >
> > keg = zone->uz_keg;
> >
> > @@ -1943,7 +1944,8 @@
> >
> > slab = NULL;
> >
> > - for (;;) {
> > + for (i = 0;;i++) {
> > + KASSERT(i < 10000, ("uma_zone_slab is looping"));
> > /*
> > * Find a slab with some space. Prefer slabs that are partially
> > * used over those that are totally full. This helps to reduce
> >
> > and now during test of Jeff Roberson's "SMP FFS" patch the assert
> > triggered: http://www.holm.cc/stress/log/cons92.html
>
> Hmm. Does the hack patch above make the hang go away or does it just break
> things worse?
>
> --
> John Baldwin <jhb at FreeBSD.org> <>< http://www.FreeBSD.org/~jhb/
> "Power Users Use the Power to Serve" = http://www.FreeBSD.org
I've uploaded two different freeze incidents to
http://www.holm.cc/stress/log/freeze01.html and
http://www.holm.cc/stress/log/freeze02.html
just in case there should be any new clues in there.
The first is switching threads, whereas the second isn't:
freeze01:curthread = 0xc301f8a0: pid 65444 "net"
freeze01:curthread = 0xc302f000: pid 65452 "net"
freeze02:curthread = 0xc25eb2e0: pid 73508 "fork"
freeze02:curthread = 0xc25eb2e0: pid 73508 "fork"
freeze02:curthread = 0xc25eb2e0: pid 73508 "fork"
freeze02:curthread = 0xc25eb2e0: pid 73508 "fork"
freeze02:curthread = 0xc25eb2e0: pid 73508 "fork"
I'm testing your patch right now, but I guess it will be days
before we know for sure.
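
For anyone following along, here is a rough user-space sketch of the
loop shape under discussion. It is not the uma_core.c code: every name
in it (sk_slab_alloc, sk_zone_slab, the SK_* flags and SK_MAX_SPINS) is
made up for illustration, and the allocator is a stub that fails a
given number of times. It only shows why a "may wait" caller can spin
when the allocation keeps failing, and how an iteration counter like
the one in my KASSERT instrumentation bounds that.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag values for this sketch; not the kernel's M_* flags. */
#define SK_NOWAIT	0x1
#define SK_WAITOK	0x2

/* Same limit as the KASSERT instrumentation quoted above. */
#define SK_MAX_SPINS	10000

/* Stand-in for the slab allocator: fails until *fail_budget hits zero. */
static void *
sk_slab_alloc(int *fail_budget)
{
	static int slab;	/* dummy object to hand back on success */

	if (*fail_budget > 0) {
		(*fail_budget)--;
		return (NULL);
	}
	return (&slab);
}

/*
 * Retry loop.  Returns the slab, or NULL if the caller may not wait or
 * the spin limit was reached.  *spins is the index of the iteration the
 * loop stopped on (SK_MAX_SPINS if it ran out).
 */
static void *
sk_zone_slab(int flags, int fail_budget, int *spins)
{
	void *slab = NULL;
	int i;

	for (i = 0; i < SK_MAX_SPINS; i++) {
		slab = sk_slab_alloc(&fail_budget);
		if (slab != NULL)
			break;
		if (flags & SK_NOWAIT)
			break;	/* caller cannot wait: report failure */
		/* SK_WAITOK: go around again; without the bound, forever. */
	}
	*spins = i;
	return (slab);
}
```

Calling sk_zone_slab(SK_WAITOK, 3, &spins) succeeds on the fourth pass,
while a fail_budget larger than SK_MAX_SPINS makes it give up and
return NULL. In this toy, that is the case where an unbounded loop
would never return.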
--
Peter Holm