panic: uma_zone_slab is looping
Peter Holm
peter at holm.cc
Thu Dec 23 02:30:06 PST 2004
On Wed, Dec 22, 2004 at 05:15:40PM -0500, Bosko Milekic wrote:
>
> On Wed, Dec 22, 2004 at 10:05:53PM +0100, Peter Holm wrote:
> > On Mon, Dec 20, 2004 at 06:41:04PM -0500, Bosko Milekic wrote:
> > >
> > > I realize it's been a while.
> > >
> > > Anyway, what I *think* is going on here is that slab_zalloc() is
> > > actually returning NULL even when called with M_WAITOK. Further
> > > inspection in slab_zalloc() reveals that this could come from several
> > > places. One of them is kmem_malloc() itself, which I doubt will ever
> > > return NULL if called with M_WAITOK. If this assumption is indeed
> > > correct, then the NULL must be coming from slab_zalloc() itself,
> > > or from a failed uma_zalloc_internal() call. It is also possible
> > > that slab_zalloc() returns NULL if the init that gets called for the
> > > zone fails. However, judging from the stack trace you provided, the
> > > init in question is mb_init_pack() (kern_mbuf.c). This particular
> > > init DOES perform an allocation and CAN in theory fail, but I believe
> > > it should be called with M_WAITOK as well, and so it should also never
> > > fail in theory.
> > >
> > > Have you gotten any further with the analysis of this particular
> > > trace? If not, I would suggest adding some more printf()s and
> > > analysis into slab_zalloc() itself, to see if that is indeed what is
> > > causing the infinite looping in uma_zone_slab() and, if so, attempt to
> > > figure out what part of slab_zalloc() is returning the NULL.
> >
> > OK, did that: http://www.holm.cc/stress/log/freeze03.html
>
> OK, well, I think I know what's happening. See if you can confirm
> this with me.
>
> I'll start with your trace and describe the analysis, please bear with
> me because it's long and painful.
>
> Your trace indicates that the NULL allocation failure, despite a call
> with M_WAITOK, is coming from slab_zalloc(). The particular thing
> that should also be mentioned about this trace, and your previous
> one, is that they both show a call path that goes through an init
> which performs an allocation, also with M_WAITOK. Currently, only the
> "packet zone" does this. It looks something like this:
>
> 1. UMA allocation is performed for a "packet." A "packet" is an mbuf
> with a pre-attached cluster.
>
> 2. UMA dips into the packet zone and finds it empty. Additionally, it
> determines that it is unable to get a bucket to fill up the zone
> (presumably there is a lot of memory request load). So it calls
> uma_zalloc_internal on the packet zone (frame 18).
>
> 3. Perhaps after some blocking, a slab is obtained from the packet
> zone's backing keg (which coincidentally is the same keg as the
> mbuf zone's backing keg -- let's call it the MBUF KEG). So now
> that an mbuf item is taken from the freshly allocated slab obtained
> from the MBUF KEG, uma_zalloc_internal() needs to init and ctor it,
> since it is about to return it to the top (calling) layer. It
> calls the initializer on it for the packet zone, mb_init_pack().
> This corresponds to frame 17.
>
> 4. The packet zone's initializer needs to call into UMA again to get
> and attach an mbuf cluster to the mbuf being allocated, because mbufs
> residing within the packet zone (or obtained from the packet zone)
> MUST have clusters attached to them. It attempts to perform this
> allocation with M_WAITOK, because that's what the initial caller
> was calling with. This is frame 16.
>
> 5. Now the cluster zone is also completely empty and we can't get a
> bucket (surprise, surprise, the system is under high memory-request
> load). UMA calls uma_zalloc_internal() on the cluster zone as well.
> This is frame 15.
>
> 6. uma_zalloc_internal() calls uma_zone_slab(). Its job is to find a
> slab from the cluster zone's backing keg (a separate CLUSTER KEG)
> and return it. Unfortunately, memory-request load is high, so it's
> going to have a difficult time. The uma_zone_slab() call is frame
> 14.
>
> 7. uma_zone_slab() can't find a locally cached slab (hardly
> surprising, due to load) and calls slab_zalloc() to actually go to
> VM and get one. Before calling, it increments a special "recurse"
> flag so that we do not recurse on calling into the VM. This is
> because the VM itself might call back into UMA when it attempts to
> allocate vm_map_entries which could cause it to recurse on
> allocating buckets. This recurse flag is PER zone, and really only
> exists to protect the bucket zone. Crazy, crazy shit indeed.
> Pardon the language. This is frame 13.
>
> 8. Now slab_zalloc(), called for the CLUSTER zone, determines that the
> cluster zone (for space efficiency reasons) is in fact an OFFPAGE
> zone, so it needs to grab a slab header structure from a separate
> UMA "slab header" zone. It calls uma_zalloc_internal() from
> slab_zalloc(), but it calls it on the SLAB HEADER zone. It passes
> M_WAITOK down to it, but for some reason IT returns NULL and the
> failure is propagated back up which causes the uma_zone_slab() to
> keep looping. Go back to step 7.
>
> This is the infinite loop 7 -> 8 -> 7 -> 8 -> ... which you seem to
> have caught.
>
> The question now is why does the uma_zalloc_internal() fail on the
> SLAB HEADER zone, even though it is called with M_WAITOK.
> Unfortunately, your stack trace does not provide enough depth to be
> able to continue an accurate deductive analysis from this point on
> (you would need to sprinkle MORE KASSERTs).
>
> However, here are some hypotheses.
>
> The uma_zalloc_internal() which ends up getting called also ends up
> calling uma_zone_slab(), but uma_zone_slab() eventually fails (this is
> a fact, this is the only reason that the uma_zalloc_internal() could
> in turn fail for the SLAB HEADER zone, which doesn't have an init or a
> ctor).
>
> So why does the uma_zone_slab() fail with M_WAITOK on the slab header
> zone? Possibilities:
>
> 1. The recurse flag is at some point determined non-zero FOR THE SLAB
> HEADER backing keg. If the VM ends up getting called from the
> subsequent slab_zalloc() and ends up calling back into UMA for
> whatever allocations, and "whatever allocations" are also
> potentially offpage, and a slab header is ALSO required, then we
> could also be recursing on the slab header zone from VM, so this
> could cause the failure.
>
> if (keg->uk_flags & UMA_ZFLAG_INTERNAL && keg->uk_recurse != 0) {
>         /* ADD PRINTF HERE */
>         printf("This zone: %s, forced fail due to recurse non-null\n",
>             zone->uz_name);
>         return NULL;
> }
>
The printf didn't really fly. It seems to be called early in
boot:
ck_flags(0,0,c07e8890,882) at _mtx_lock_flags+0x24
uma_zalloc_internal(c09284a0,c0c20c84,2) at uma_zalloc_internal+0x2d
uma_zcreate(c07e8b1e,40,0,0,0,0,3,2000) at uma_zcreate+0x57
uma_startup(c103d000,c103d000,28000,c0c20d78,ff00000) at uma_startup+0x2ae
vm_page_startup(c1065000,c0c20d98,c05ec857,0,c08525d0) at vm_page_startup+0x109
vm_mem_init(0,c08525d0,c1ec00,c1e000,c28000) at vm_mem_init+0x13
mi_startup() at mi_startup+0xb3
begin() at begin+0x2c
so I just sprinkled some more asserts. I'm trying to see if I can
provoke this problem more consistently, based on your analysis.
It usually takes me a day or two of testing to get there.
> If you get the print to trigger right before the panic (last one
> before the panic), see if it is on the SLAB HEADER zone. In
> theory, it should only happen for the BUCKET ZONE.
>
> 2. M_WAITOK really isn't set. Unlikely.
>
> If (1) is really happening, we'll need to think about it a little more
> before deciding how to fix it. As you can see, due to the recursive
> nature of UMA/VM, things can get really tough when resources are
> scarce.
>
> Regards,
> --
> Bosko Milekic
> bmilekic at technokratis.com
> bmilekic at FreeBSD.org
--
Peter Holm
More information about the freebsd-current mailing list