biodone vs geom_up

Mon Oct 6 22:53:16 UTC 2014

It seems to me that there is a problem with biodone (particular vs. g_disk_start/g_disk_done_single):

void
biodone(struct bio *bp)
{
        struct mtx *mtxp;
        void (*done)(struct bio *);
        // ...
        done = bp->bio_done;
        if (done == NULL) { 
                mtxp = mtx_pool_find(mtxpool_sleep, bp);
                mtx_lock(mtxp);
                bp->bio_flags |= BIO_DONE;   // XXX [0]
                wakeup(bp);
                mtx_unlock(mtxp);
        } else {
                bp->bio_flags |= BIO_DONE;   // XXX [1]
                done(bp);                              // XXX [2]
        }
}

static void
g_disk_start(struct bio *bp)
{
        // ...
        case BIO_READ:
        case BIO_WRITE:
                d_maxsize = (bp->bio_cmd == BIO_DELETE) ?
                    dp->d_delmaxsize : dp->d_maxsize;
                if (bp->bio_length <= d_maxsize) {
                        bp->bio_disk = dp;
                        bp->bio_to = (void *)bp->bio_done;
                        bp->bio_done = g_disk_done_single; // XXX [3]
                        // ...
                        dp->d_strategy(bp);
                        break;
        // ...
}

static void
g_disk_done_single(struct bio *bp)
{
        struct bintime now;
        struct g_disk_softc *sc;

        bp->bio_completed = bp->bio_length - bp->bio_resid;
        bp->bio_done = (void *)bp->bio_to;
        bp->bio_to = LIST_FIRST(&bp->bio_disk->d_geom->provider);
        // ...
        g_io_deliver(bp, bp->bio_error);   // XXX [4]
}

And a normal caller doing something like:

void *
g_read_data(struct g_consumer *cp, off_t offset, off_t length, int *error)
{
        // ...
        bp = g_alloc_bio();
        bp->bio_cmd = BIO_READ;
        bp->bio_done = NULL;
        g_io_request(bp, cp);
        errorc = biowait(bp, "gread");
        if (error != NULL)
                *error = errorc;
        g_destroy_bio(bp);
        // ...
}

With this code, it is possible that thread 1(the i/o initiator) only gets to biowait *after* the g_io_request() has already made it back, called (the first) biodone, set BIO_DONE at [1], and the bio is now either queued in the g_bio_run_up.bio_queue, or in direct dispatch (via g_io_deliver at [4] via done(bp) [aka bio_done set at [3] called at [2]].

biowait will just return (because BIO_DONE is set), and the initiator thread will g_destroy_bio while that bio is still being worked on, on the g_up thread ... use-after-free.

Does anyone see a reason why we need to set BIO_DONE at [1] at all?  It seems to me that the default case, where there is no callback left [0], is enough?

Removing the BIO_DONE setting at [1] resolves the issue I'm seeing, but I wonder if there is a better way to resolve this?  Should g_disk_start() clone a bio for its callback instead?  Are there other places that set a bio_done callback on the way down?

Any comments, questions, or answers greatly appreciated.

Thanks,
  Max