flushing disk buffer cache

Siddharth Aggarwal saggarwa at cs.utah.edu
Fri Oct 29 10:31:45 PDT 2004


Another related question ...

Is it possible to delay or queue up disk writes until I exit from my
function in the kernel (where I am trying to sync with the disk)? Or,
failing that, to make sure that my sync function never goes to sleep
waiting for the disk driver to signal completion of flushes to disk?
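
To make the first option concrete, here is a small userspace model of
holding writes back until the sync function returns. It is only a sketch
under stated assumptions, not kernel code: the names model_write,
checkpoint_begin and checkpoint_end are hypothetical, the "disk" is an
in-memory array, and a real driver would have to queue struct buf
requests and protect the queue against concurrent access.

```c
#include <assert.h>
#include <string.h>

#define BLOCK_SIZE 16
#define NBLOCKS    8
#define QUEUE_MAX  32

/* One write that arrived while a checkpoint was being taken. */
struct pending_write {
    int  blkno;
    char data[BLOCK_SIZE];
};

static char disk[NBLOCKS][BLOCK_SIZE];        /* stand-in for the real device */
static struct pending_write queue[QUEUE_MAX]; /* writes parked during a checkpoint */
static int queued;
static int checkpoint_in_progress;

/*
 * Write path (hypothetical): apply the write immediately, or park it
 * while a checkpoint is in progress. Returns 0 on success, -1 on a bad
 * block number or a full queue.
 */
static int
model_write(int blkno, const char *data)
{
    assert(data != NULL);
    if (blkno < 0 || blkno >= NBLOCKS)
        return -1;
    if (checkpoint_in_progress) {
        if (queued == QUEUE_MAX)
            return -1;          /* queue full: caller must retry later */
        queue[queued].blkno = blkno;
        memcpy(queue[queued].data, data, BLOCK_SIZE);
        queued++;
        return 0;
    }
    memcpy(disk[blkno], data, BLOCK_SIZE);
    return 0;
}

/* From here on, writes are queued instead of reaching the disk. */
static void
checkpoint_begin(void)
{
    checkpoint_in_progress = 1;
}

/* Replay parked writes in arrival order, then reopen the direct path. */
static void
checkpoint_end(void)
{
    for (int i = 0; i < queued; i++)
        memcpy(disk[queue[i].blkno], queue[i].data, BLOCK_SIZE);
    queued = 0;
    checkpoint_in_progress = 0;
}
```

In this model a write issued between checkpoint_begin() and
checkpoint_end() leaves the disk image untouched until the checkpoint
finishes. Whether that is acceptable depends on how long callers can
tolerate having their writes parked.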

On Fri, 29 Oct 2004, Siddharth Aggarwal wrote:

>
>
> Hi,
>
> I am writing a pseudo disk driver for disk checkpointing. It intercepts
> write requests to the disk (ad0s1) and performs a copy-on-write of the
> old contents to another partition (ad0s4) before writing out the new
> contents. The driver (called shd) is mounted as
>
> /dev/shd0a on /
> /dev/shd0f on /usr
>
>
> Each time the user creates a new checkpoint (which basically initializes
> new data structures in memory), the driver first explicitly calls sync()
> to flush the disk buffer cache, so that the disk state is consistent at
> the moment the checkpoint is taken.
>
> Then, I have hacked the reboot system call to revert to a previous
> checkpoint after unmounting all the filesystems but before halting the
> system. This revert basically involves copying some blocks from ad0s4 to
> ad0s1.
>
> However, when the system reboots, fsck reports inconsistencies in the
> filesystem and has to be run manually.
>
> I suspect the reason is that when a checkpoint is taken, the filesystem
> on ad0s1 is active and more write operations are coming in, i.e. the
> filesystem on ad0s1 is still dirty. Hence I explicitly called sync()
> before returning from the checkpoint command, but I think sync() doesn't
> guarantee that everything has actually been flushed out. So I implemented
> a more forceful way of syncing by borrowing part of the code from the
> kernel's boot() shutdown routine. The code is below; it is called
> whenever a checkpoint command is issued.
>
> Does anyone know whether this is the right way of flushing the cache? Is
> there anything I can do to ensure the filesystem is consistent during
> reboot? I don't think this is a problem in the driver code, because when
> I created a new filesystem on ad0s3 and shadowed that using the driver,
> everything ran perfectly fine. The difference was that I could unmount
> the filesystem before "restoring the checkpoint", so it wasn't necessary
> to do the restore at reboot time.
>
>
> void sync_before_checkpoint (void)
> {
>     register struct buf *bp;
>     int iter, nbusy, pbusy;
>
>     waittime = 0;
>     sync(&proc0, NULL);
>
>     /*
>      * With soft updates, some buffers that are
>      * written will be remarked as dirty until other
>      * buffers are written.
>      */
>     for (iter = pbusy = 0; iter < 20; iter++) {
>         nbusy = 0;
>         for (bp = &buf[nbuf]; --bp >= buf; ) {
>             if ((bp->b_flags & B_INVAL) == 0 &&
>                 BUF_REFCNT(bp) > 0) {
>                 nbusy++;
>             } else if ((bp->b_flags & (B_DELWRI | B_INVAL))
>                 == B_DELWRI) {
>                 /* bawrite(bp);*/
>                 nbusy++;
>             }
>         }
>         if (nbusy == 0)
>             break;
>         printf("%d ", nbusy);
>         if (nbusy < pbusy)
>             iter = 0;
>         pbusy = nbusy;
>         if (iter > 5 && bioops.io_sync)
>             (*bioops.io_sync)(NULL);
>         sync(&proc0, NULL);
>         DELAY(50000 * iter);
>     }
>
>     /*
>      * Count only busy local buffers to prevent forcing
>      * a fsck if we're just a client of a wedged NFS server
>      */
>     nbusy = 0;
>     for (bp = &buf[nbuf]; --bp >= buf; ) {
>         if (((bp->b_flags & B_INVAL) == 0 && BUF_REFCNT(bp)) ||
>             ((bp->b_flags & (B_DELWRI | B_INVAL)) == B_DELWRI)) {
>             if (bp->b_dev == NODEV) {
>                 TAILQ_REMOVE(&mountlist,
>                     bp->b_vp->v_mount, mnt_list);
>                 continue;
>             }
>             nbusy++;
>         }
>     }
>     if (nbusy) {
>         /*
>          * Failed to sync all blocks. Indicate this and don't
>          * unmount filesystems (thus forcing an fsck on reboot).
>          */
>         printf("giving up on %d buffers\n", nbusy);
>         DELAY(5000000); /* 5 seconds */
>     }
> }
>
>
>
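
For readers following the quoted description, the copy-on-write and
revert steps can be modelled in userspace roughly as below. This is a
minimal sketch under stated assumptions, not the shd driver itself: the
names cow_checkpoint, cow_write and cow_revert are hypothetical, the two
arrays merely stand in for ad0s1 and ad0s4, and nothing here deals with
the buffer cache or with locking.

```c
#include <assert.h>
#include <string.h>

#define BLOCK_SIZE 16
#define NBLOCKS    8

static char primary[NBLOCKS][BLOCK_SIZE]; /* stands in for ad0s1 */
static char shadow[NBLOCKS][BLOCK_SIZE];  /* stands in for ad0s4 */
static int  saved[NBLOCKS];               /* 1 if block already copied in this checkpoint */

/* Start a new checkpoint: forget which blocks have been preserved. */
static void
cow_checkpoint(void)
{
    memset(saved, 0, sizeof(saved));
}

/*
 * Intercepted write (hypothetical): preserve the old contents the first
 * time a block is touched after a checkpoint, then overwrite it.
 */
static int
cow_write(int blkno, const char *data)
{
    assert(data != NULL);
    if (blkno < 0 || blkno >= NBLOCKS)
        return -1;
    if (!saved[blkno]) {
        memcpy(shadow[blkno], primary[blkno], BLOCK_SIZE);
        saved[blkno] = 1;
    }
    memcpy(primary[blkno], data, BLOCK_SIZE);
    return 0;
}

/* Revert: copy every preserved block back; returns how many were restored. */
static int
cow_revert(void)
{
    int restored = 0;

    for (int i = 0; i < NBLOCKS; i++) {
        if (saved[i]) {
            memcpy(primary[i], shadow[i], BLOCK_SIZE);
            saved[i] = 0;
            restored++;
        }
    }
    return restored;
}
```

The model restores exactly the image that existed when cow_checkpoint()
was called. If that image was not consistent at the time (for example
because dirty buffers had not yet reached the disk), the revert will
faithfully bring back the inconsistency as well.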

