Freeze

Tue Dec 28 11:28:13 PST 2004

On Tue, 28 Dec 2004, Bosko Milekic wrote:

>
> On Tue, Dec 28, 2004 at 01:24:01PM -0500, Jeff Roberson wrote:
> > On Mon, 27 Dec 2004, Bosko Milekic wrote:
> >
> > >
> > > http://docs.freebsd.org/cgi/getmsg.cgi?fetch=159278+0+current/freebsd-current
> > >
> > > (and see previous in thread for context).
> > >
> > > Let's hope.
> >
> > It might be better for the slab zones to msleep there rather than flood
> > the system with allocation requests.  If many threads were to hit
> > this code path, the slabzone would get a new slab for each calling thread,
> > when really we only want one more.
>
>   They will not flood the system with allocation requests.  They
>   will merely block in the VM if they need to (if the allocation is
>   M_WAITOK, otherwise they will return NULL shortly thereafter).
>   This is because after that check, in uma_zalloc_internal(), we enter
>   the loop which performs a slab_zalloc() call (which dips into VM).
>   The dipping into VM will not occur repeatedly, but only once, as it is
>   supposed to; the change merely gives it a chance to do so.
>
>   The uk_recurse check was designed to protect from recursion occurring
>   from the VM itself and so should not prevent multiple threads from
>   dipping into the VM, even if it is for the same zone.
>
>   This is perhaps an implementation flaw (albeit a separate one):
>   under starvation you might have two or more threads ask the VM for
>   pages when in fact only one slab needs to be allocated to satisfy N
>   requests.  But this scenario is rare and, in my opinion, not worth
>   changing the implementation for (in that kind of long-term
>   starvation, you're likely facing serious problems that UMA alone
>   cannot address anyway).

This is the issue that I was talking about.  We'll have too many threads
simultaneously doing allocations for the same zone.  I thought I had some
code to address this situation, but I can't remember where now.
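The idea of letting only one thread go to the VM while the others sleep can be modelled in userspace. This is a minimal sketch under stated assumptions, not UMA's code: pthread_cond_wait() and pthread_cond_broadcast() stand in for the kernel's msleep() and wakeup(), and struct zone, vm_get_slab(), zone_alloc(), worker() and run_contenders() are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

#define ITEMS_PER_SLAB 8  /* hypothetical slab capacity */
#define MAX_THREADS    64

/* Hypothetical userspace model of a zone, not the real UMA keg/zone. */
struct zone {
	pthread_mutex_t lock;
	pthread_cond_t  refilled;        /* plays the role of msleep()/wakeup() */
	int             free_items;
	int             filling;         /* nonzero while one thread asks the VM */
	int             slabs_allocated; /* number of trips to the "VM" */
};

/* Stand-in for the page request slab_zalloc() makes; it may block. */
static int
vm_get_slab(void)
{
	usleep(10000); /* widen the window in which other threads pile up */
	return ITEMS_PER_SLAB;
}

static void
zone_alloc(struct zone *z)
{
	pthread_mutex_lock(&z->lock);
	while (z->free_items == 0) {
		if (z->filling) {
			/* Another thread is already fetching a slab: sleep, then recheck. */
			pthread_cond_wait(&z->refilled, &z->lock);
			continue;
		}
		z->filling = 1;
		pthread_mutex_unlock(&z->lock); /* don't hold the zone lock in the VM */
		int got = vm_get_slab();
		pthread_mutex_lock(&z->lock);
		z->free_items += got;
		z->slabs_allocated++;
		z->filling = 0;
		pthread_cond_broadcast(&z->refilled); /* one slab may satisfy many sleepers */
	}
	assert(z->free_items > 0);
	z->free_items--;
	pthread_mutex_unlock(&z->lock);
}

static void *
worker(void *arg)
{
	zone_alloc(arg);
	return NULL;
}

/* Run n threads against an empty zone; return how many slabs were fetched. */
static int
run_contenders(int n)
{
	struct zone z = {
		.lock = PTHREAD_MUTEX_INITIALIZER,
		.refilled = PTHREAD_COND_INITIALIZER,
	};
	pthread_t tids[MAX_THREADS];

	assert(n > 0 && n <= MAX_THREADS);
	for (int i = 0; i < n; i++)
		if (pthread_create(&tids[i], NULL, worker, &z) != 0)
			abort();
	for (int i = 0; i < n; i++)
		pthread_join(tids[i], NULL);
	return z.slabs_allocated;
}
```

With eight threads and eight items per slab, run_contenders(8) goes to the "VM" exactly once no matter how the threads are scheduled. Without the filling flag, every thread that found the zone empty during the usleep() window could fetch its own slab, which is the one-slab-per-thread behaviour described above. A broadcast rather than a single signal is used because one new slab can satisfy several sleepers at once.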

>
> > It seems like this bug should have been caught sooner with asserts rather
> > than a hang somewhere.  Can you investigate why this did not happen?
>
>   I don't see why the bug would have been hit sooner.  The likelihood of
>   another allocation request coming in through uma_zalloc_internal()
>   _exactly_ when another is allocating a slab header within the window
>   where uk_recurse > 0 is small.

Not that we would have found it sooner, but we should have panicked
somewhere before returning NULL and looping forever.
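Panicking instead of returning NULL and looping forever amounts to a fail-fast check on the allocation contract. The sketch below is a userspace model under assumptions: abort() stands in for panic() (in the kernel a KASSERT() would be the natural place for it), the M_NOWAIT/M_WAITOK values are local placeholders rather than the kernel's definitions, and zalloc_checked(), backend_fn, always_fails() and always_succeeds() are hypothetical names, not UMA functions.

```c
#include <stdio.h>
#include <stdlib.h>

/* Local placeholder flags, not the kernel's <sys/malloc.h> definitions. */
#define M_NOWAIT 0x0001
#define M_WAITOK 0x0002

/* Hypothetical allocator back end: returns an item or NULL. */
typedef void *(*backend_fn)(int flags);

/*
 * Fail fast: an M_WAITOK request is supposed to sleep until it succeeds,
 * so a NULL here is a bug.  Stop loudly instead of handing NULL back to
 * a caller that may simply retry forever.
 */
static void *
zalloc_checked(backend_fn backend, int flags)
{
	void *item = backend(flags);

	if (item == NULL && (flags & M_WAITOK) != 0) {
		fprintf(stderr, "zalloc_checked: M_WAITOK allocation returned NULL\n");
		abort(); /* userspace stand-in for panic() */
	}
	return item; /* may be NULL only for M_NOWAIT */
}

/* Toy back ends used to exercise both paths. */
static void *
always_fails(int flags)
{
	(void)flags;
	return NULL;
}

static void *
always_succeeds(int flags)
{
	static int item;

	(void)flags;
	return &item;
}
```

An M_NOWAIT caller still gets NULL and must handle it; only the M_WAITOK path treats NULL as fatal, so the failure surfaces where the bug is rather than as a hang further up the stack.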

>
> --
> Bosko Milekic
> bmilekic at technokratis.com
> bmilekic at FreeBSD.org
>