stress test deadlock involving vm and/or geom

Wed Sep 28 19:14:58 PDT 2005

After running Peter Holm's stress test suite for several hours while
creating and removing snapshots, I got a system deadlock that doesn't
appear to be snapshot related, and happened while no snapshots were
present.  This deadlock appears to be caused by a vm shortage.

  Kernel memory usage didn't look excessive in a post-mortem "vmstat -m"
  check.

  Large numbers of processes were waiting on "vmwait", "pfault",
  and "biord".  The processes waiting on "vmwait" were waiting for the
  pagedaemon to free up some memory.

  The pagedaemon was waiting on "wswbuf0", which indicates it was
  waiting for a pbuf, which indicates that all of its pbufs were
  currently in use.

I'm not sure where the pagedaemons I/O requests were stuck, but I
started looking at geom.  The g_up and g_event threads appeared to be
idle, but g_down was waiting on "g_down".

The stack trace for g_down is:

(kgdb) where
#0  0xc0653933 in sched_switch (td=0xc2275480, newtd=0xc2274780, flags=1)
    at /usr/src/sys/kern/sched_4bsd.c:973
#1  0xc0649178 in mi_switch (flags=1, newtd=0x0)
    at /usr/src/sys/kern/kern_synch.c:356
#2  0xc066075c in sleepq_switch (wchan=0x0)
    at /usr/src/sys/kern/subr_sleepqueue.c:427
#3  0xc06609bc in sleepq_timedwait (wchan=0xe35e0cf4)
    at /usr/src/sys/kern/subr_sleepqueue.c:568
#4  0xc0648dc2 in msleep (ident=0xe35e0cf4, mtx=0x0, priority=76, 
    wmesg=0xc08695b3 "g_down", timo=100) at /usr/src/sys/kern/kern_synch.c:223
#5  0xc0606958 in g_io_schedule_down (tp=0xc2275480)
    at /usr/src/sys/geom/geom_io.c:418
#6  0xc0606eb6 in g_down_procbody () at /usr/src/sys/geom/geom_kern.c:118
#7  0xc06303bc in fork_exit (callout=0xc0606e5c <g_down_procbody>, arg=0x0, 
    frame=0xe35e0d38) at /usr/src/sys/kern/kern_fork.c:786
#8  0xc07f5f9c in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:208

There appear to be 559 items stuck in its queue:

(kgdb) frame 5
#5  0xc0606958 in g_io_schedule_down (tp=0xc2275480)
    at /usr/src/sys/geom/geom_io.c:418
418                             msleep(&error, NULL, PRIBIO, "g_down", hz/10);
(kgdb) print g_bio_run_down
$1 = {bio_queue = {tqh_first = 0xc38da318, tqh_last = 0xc38da248}, 
  bio_queue_lock = {mtx_object = {lo_class = 0xc08d5fe4, 
      lo_name = 0xc0853a4c "bio queue", lo_type = 0xc0853a4c "bio queue", 
      lo_flags = 196608, lo_list = {tqe_next = 0xc0936788, 
        tqe_prev = 0xc0936810}, lo_witness = 0xc0946a38}, mtx_lock = 4, 
    mtx_recurse = 0}, bio_queue_length = 559}

g_down is sitting at msleep() in this block of code in
g_io_schedule_down():

(kgdb) list
413                     }
414                     CTR0(KTR_GEOM, "g_down has work to do");
415                     g_bioq_unlock(&g_bio_run_down);
416                     if (pace > 0) {
417                             CTR1(KTR_GEOM, "g_down pacing self (pace %d)", pace);
418                             msleep(&error, NULL, PRIBIO, "g_down", hz/10);
419                             pace--;
420                     }
421                     error = g_io_check(bp);

This is consistent with pace > 0:

(kgdb) print pace
$2 = 1

What is curious is that pace is only incremented when g_io_deliver() is
called with error=ENOMEM, but this can only happen in this code:

422                     if (error) {
423                             CTR3(KTR_GEOM, "g_down g_io_check on bp %p provider "
424                                 "%s returned %d", bp, bp->bio_to->name, error);
425                             g_io_deliver(bp, error);
426                             continue;
427                     }
428                     CTR2(KTR_GEOM, "g_down processing bp %p provider %s", bp,
429                         bp->bio_to->name);

and error doesn't get changed anywhere else in this code, so it should
have the same value as the previous iteration through the loop, *but* it
is 0 ...

(kgdb) print error
$6 = 0

Also, g_io_check() can only return ENOMEM if that is the provider error
status, but the provider seems to be happy:

(kgdb) print *(bp->bio_to)
$5 = {name = 0xc258c8d8 "da0", provider = {le_next = 0x0, 
    le_prev = 0xc258c794}, geom = 0xc258c780, consumers = {
    lh_first = 0xc24671c0}, acr = 3, acw = 3, ace = 5, error = 0, orphan = {
    tqe_next = 0x0, tqe_prev = 0x0}, mediasize = 36703933440, 
  sectorsize = 512, stripesize = 0, stripeoffset = 0, stat = 0xc25c35a0, 
  nstart = 6007328, nend = 6006932, flags = 0, private = 0x0, index = 0}