Slow reboots due to ZFS cleanup in kern_shutdown() .. zio_fini()

Wed Dec 4 10:13:48 UTC 2019

Yeah, I noticed that after I sent that email (should send stuff just before you go to bed :-)

Anyway, some more timing/debug printfs point to the keg->uk_freef(mem) call in keg_free_slab() which sometimes take about a second to execute.
(Should probably switch to some other time source with greater resolution than 1s next :-)

keg_free_slab: keg->uk_freef(mem) took 1 seconds
keg_free_slab: keg->uk_freef(mem) took 1 seconds
keg_free_slab: keg->uk_freef(mem) took 1 seconds
keg_free_slab: keg->uk_freef(mem) took 1 seconds
keg_free_slab: keg->uk_freef(mem) took 1 seconds
keg_free_slab: keg->uk_freef(mem) took 1 seconds
keg_free_slab: keg->uk_freef(mem) took 1 seconds

- Peter

> On 4 Dec 2019, at 10:48, Andriy Gapon <avg at FreeBSD.org> wrote:
> 
> On 04/12/2019 00:28, Peter Eriksson wrote:
>> … and X levels deeper it gives this (not all filesystems mounted - I rebooted fairly quickly):
>> 
>> keg_drain: while (SLIST_FIRST/SLIST_REMOVE)-loop took 14 seconds [20021 loops]
>> zone_drain_wait(): zone_foreach_keg(zone, &keg_drain) took 14 seconds
>> zone_dtor(): zone_drain_wait(zone, M_WAITOK) took 14 seconds
>> zone_free_item(zone=UMA Zones): zone->uz_dtor() took 14 seconds
>> uma_zdestroy(zio_buf_12288) took 14 seconds
>> kmem_cache_destroy: uma_zdestroy(0xfffff803467c8ac0) [zio_buf_12288] took 14 seconds
>> kmem_cache_destroy(zio_buf_cache[20]) took 14 seconds
>> 
>> 
>> Hmm… I wonder if it isn’t this code I /usr/src/sys/sys/queue.h:
>> 
>>> #define SLIST_REMOVE(head, elm, type, field) do {                       \
>>>       QMD_SAVELINK(oldnext, (elm)->field.sle_next);                   \
>>>       if (SLIST_FIRST((head)) == (elm)) {                             \
>>>               SLIST_REMOVE_HEAD((head), field);                       \
>>>       }                                                               \
>>>       else {                                                          \
>>>               QUEUE_TYPEOF(type) *curelm = SLIST_FIRST(head);         \
>>>               while (SLIST_NEXT(curelm, field) != (elm))              \
>>>                       curelm = SLIST_NEXT(curelm, field);             \
>>>               SLIST_REMOVE_AFTER(curelm, field);                      \
>>>       }                                                               \
>>>       TRASHIT(*oldnext);                                              \
>>> } while (0)
>> 
>> Combined with this in /usr/src/sys/vm/uma_core.c: keg_drain():
>> 
>>> finished:
>>>       KEG_UNLOCK(keg);
>>> 
>>>       while ((slab = SLIST_FIRST(&freeslabs)) != NULL) {
>>>               SLIST_REMOVE(&freeslabs, slab, uma_slab, us_hlink);
>>>               keg_free_slab(keg, slab, keg->uk_ipers);
>>>       }
>> 
>> (The keg_drain print above is the time for the while() loop).
>> 
>> If I’m reading this right it looks like it’s potentially doing a linear search through a (long) linked list, for every entry in the free slabs list. 20000 entries in 14 seconds is 1400 entries per second.
> 
> It's not a search.  The loop is removing *all* items.  There is no other way to
> do it :-)
> Expanding your calculations, it takes on the order of a millisecond to process a
> single slab.  I bet that the time is spent in keg_free_slab() and it's a lot of
> time.
> 
> I wonder if the OR should actually be AND in this condition:
>                if (!uma_dbg_kskip(keg, slab->us_data + (keg->uk_rsize * i)) ||
>                    keg->uk_fini != trash_fini)
> Looks like there is a "ton" of memory reading done here.
> 
> -- 
> Andriy Gapon