panic: uma_zone_slab is looping
Bosko Milekic
bmilekic at technokratis.com
Sun Dec 26 17:37:51 PST 2004
On Sun, Dec 26, 2004 at 11:56:51PM +0100, Peter Holm wrote:
> On Sun, Dec 26, 2004 at 01:17:38PM -0500, Bosko Milekic wrote:
> > On Sun, Dec 26, 2004 at 05:11:53PM +0100, Peter Holm wrote:
> > >
> > > Yes, I think that I have verified your excellent analysis of the
> > > problem: http://www.holm.cc/stress/log/freeze04.html
> > >
> > > So, do you have any fix suggestions? :-)
> >
> > Not yet, because the problem is non-obvious from the trace.
> >
> > I need to know exactly when the UMA RCntSlabs zone recurses _first_,
> > and I need to confirm that it is an actual recursion. I've looked at
> > the VM code and I don't see how/why recursion on the RCntSlabs zone
> > would happen.
> >
> > Please modify the printf code to look exactly like this:
> >
> >   if (keg->uk_flags & UMA_ZFLAG_INTERNAL && keg->uk_recurse != 0) {
> >           if ((zone == slabzone) || (zone == slabrefzone))
> >                   panic("Zone %s forced to fail due to recurse non-null: %d\n",
> >                       zone->uz_name, keg->uk_recurse);
> >           return (NULL);
> >   }
> >
> > (You don't need to check any global counter -- the counter is imperfect
> > anyway -- because even a single recursion on slabzone or slabrefzone
> > should be illegal).
> >
> > I'd like to see the trace from the above panic, if possible.
>
> Here it is: http://www.holm.cc/stress/log/freeze05.html
I have checked the code here, looked at possible code paths, and
have unfortunately resorted to guessing again; I now believe I have
identified a problematic scenario.
Consider this particular timeline (time moves downward):
[I hope you can handle ASCII art]
By the way, the stack trace you show would correspond to that of
thread 2. I refer to a frame number below.
thread 1 (t1)                       thread 2 (t2)
-------------------------------------------------------------------------
t1.a) Allocating from a zone,
      needs slab header from one of
      the slab header zones (either
      slabzone or slabrefzone). Let's
      assume it is slabzone, as in
      your trace above. The allocation
      is performed with M_WAITOK.
                                    t2.a) Needs to allocate from
                                          a zone, and it needs a
                                          slab header too. The allocation will
                                          be performed with M_WAITOK. Let's
                                          assume that the slab header zone
                                          we're allocating is also slabzone.
t1.b) in uma_zone_slab(), has
      slabzone's keg lock, increments
      keg's uk_recurse.
      Enters slab_zalloc().
                                    t2.b) Blocks on zone lock.
t1.c) Drops zone lock to
      allocate from VM, uk_recurse
      for the slabzone is currently
      1 (we incremented it in t1.b).
                                    t2.c) Takes zone lock for slabzone,
                                          now in uma_zone_slab() (Frame 11),
                                          and since uk_recurse is 1, it
                                          decides recursion happened. Wants
                                          to return NULL even though
                                          allocation was done with M_WAITOK.
                                          Our panic is triggered.
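
To make the interleaving concrete, here is a minimal single-threaded
sketch in C that replays the steps of the timeline in order. All toy_*
names are hypothetical stand-ins and not the real UMA structures or
functions; only the uk_recurse counter and the internal-zone check are
modeled, and locking is not modeled at all. It is meant as an
illustration of the suspected false positive, not as a reproduction of
the kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a UMA keg; only two fields are modeled. */
struct toy_keg {
    int internal;   /* models UMA_ZFLAG_INTERNAL being set */
    int recurse;    /* models uk_recurse */
};

static int toy_slab;    /* placeholder for a successfully obtained slab */

/*
 * Models the entry check in uma_zone_slab(): an internal zone whose
 * recurse counter is non-zero is assumed to be recursing and fails.
 */
static void *
toy_zone_slab_enter(struct toy_keg *keg)
{
    if (keg->internal && keg->recurse != 0)
        return (NULL);      /* fails even for an M_WAITOK request */
    keg->recurse++;         /* about to enter slab_zalloc() */
    return (&toy_slab);
}

/* Models the return from slab_zalloc(). */
static void
toy_zone_slab_exit(struct toy_keg *keg)
{
    assert(keg->recurse > 0);
    keg->recurse--;
}

/*
 * Replays the timeline. Returns 1 if thread 2 was failed although
 * no real recursion took place, 0 otherwise.
 */
static int
toy_replay_timeline(void)
{
    struct toy_keg slabzone_keg = { 1, 0 };
    void *t1, *t2;

    t1 = toy_zone_slab_enter(&slabzone_keg);    /* t1.b */
    /* t1.c: lock is dropped here while recurse is still 1 */
    t2 = toy_zone_slab_enter(&slabzone_keg);    /* t2.c */
    toy_zone_slab_exit(&slabzone_keg);          /* t1 back from VM */

    return (t1 != NULL && t2 == NULL);
}
```

Calling toy_replay_timeline() returns 1: the second caller is turned
away purely because the counter is shared by every thread using the
keg, so it cannot tell "I am recursing" apart from "someone else is
in slab_zalloc()".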
I'll have to reserve some more time to think about this. One way I
think it might be solvable would be to change the check that
triggers the NULL return to explicitly check for the bucketzone, and not
for all UMA_ZFLAG_INTERNAL zones; I need to think this through a little
more.
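
As a rough illustration of that idea only, the narrowed check could
look something like the following toy model in C. The enum, struct and
function names are hypothetical and exist only for this sketch; it is
not a tested patch and says nothing about whether restricting the check
to the bucket zone is actually safe.

```c
#include <stddef.h>

/* Hypothetical tags for the zones discussed in this thread. */
enum toy_zone_kind { TOY_BUCKETZONE, TOY_SLABZONE, TOY_SLABREFZONE };

/* Hypothetical stand-in for a keg; models only uk_recurse. */
struct toy_fix_keg {
    enum toy_zone_kind kind;
    int recurse;
};

/*
 * Returns 1 if the allocation should be forced to fail. Unlike the
 * current check, which applies to every internal zone, this variant
 * applies to the bucket zone only.
 */
static int
toy_must_fail(const struct toy_fix_keg *keg)
{
    return (keg->kind == TOY_BUCKETZONE && keg->recurse != 0);
}
```

With this variant a slabzone keg whose counter is 1 is no longer
refused, while a bucket zone keg with a non-zero counter still is.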
Does the scenario seem likely to you?
Cheers,
--
Bosko Milekic
bmilekic at technokratis.com
bmilekic at FreeBSD.org