TRIM support for UFS?

Kirk McKusick mckusick at mckusick.com
Wed Dec 8 22:23:54 UTC 2010


> Date: Wed, 8 Dec 2010 23:49:09 +0200
> From: Kostik Belousov <kostikbel at gmail.com>
> To: Julian Elischer <julian at freebsd.org>
> Cc: freebsd-fs at freebsd.org, pjd at freebsd.org,
>         Oliver Fromme <olli at lurza.secnetix.de>,
>         Kirk McKusick <mckusick at mckusick.com>
> Subject: Re: TRIM support for UFS?
> 
> On Wed, Dec 08, 2010 at 09:07:56AM -0800, Julian Elischer wrote:
> >
> > From:    Kirk McKusick <mckusick at chez.mckusick.com>
> > Date:    Wed, 21 May 2008 13:19:18 -0700
> > To:      Poul-Henning Kamp <phk at critter.freebsd.dk>
> > Subject: UFS and BIO_DELETE
> > X-URL: http://WWW.McKusick.COM/
> > Reply-To: Kirk McKusick <mckusick at McKusick.COM>
> >
> > I enclose below my proposed patch to add BIO_DELETE to UFS
> > (goes at the end of ffs_blkfree). As I have no way to test
> > it, I am wondering if you could let me know if it works.
> >
> > Also, I am thinking of only enabling it for filesystems mounted
> > with a new flag requesting the behavior since the geteblk is a
> > rather expensive call for the usual no-op case.
> >
> > I did look at just allocating a `struct bio' as a local variable
> > and using that, but it looked like I needed to also come up with a
> > `producer' and/or `consumer' if I wanted to pass it to GEOM directly,
> > so in the end I went with this more expensive solution. If there
> > is an easy way to just pass a bio structure to GEOM, I would much
> > prefer that approach.
> >
> > 	~Kirk
> >
> >
> > *** ffs_alloc.c	Wed May 21 20:11:04 2008
> > --- ffs_alloc.c.new	Wed May 21 20:10:50 2008
> > ***************
> > *** 1945,1950 ****
> > --- 1945,1962 ----
> >   	ACTIVECLEAR(fs, cg);
> >   	UFS_UNLOCK(ump);
> >   	bdwrite(bp);
> > + 	/*
> > + 	 * Request that the block be cleared.
> > + 	 */
> > + 	bp = geteblk(size);
> > + 	bp->b_iocmd = BIO_DELETE;
> > + 	bp->b_vp = devvp;
> > + 	bp->b_blkno = fsbtodb(fs, bno);
> > + 	bp->b_offset = dbtob(bp->b_blkno);
> > + 	bp->b_iooffset = bp->b_offset;
> > + 	bp->b_bcount = size;
> > + 	BUF_KERNPROC(bp);
> > + 	BO_STRATEGY(&devvp->v_bufobj, bp);
> >   }
> > 
> >   #ifdef INVARIANTS
> >
> The UFS devvp contains the pointer to struct g_consumer in
> devpp->v_bufobj->bo_private, so g_alloc_bio() and g_io_request() can be
> used to start BIO_DELETE command without fake bufer allocation.
> g_io_strategy() may be used as the sample.

Thanks for that update to moderize the patch!

> But, I have a question about the patch. The bitmap block in ffs_blkfree()
> is handled with delayed write, by bdwrite() call. I think that it is
> quite possible for BIO_DELETE to be executed before the bitmap block
> is written and on-disk bit is set in bitmap. Would not this allow
> for the blocks to be deleted before the bitmap is updated,
> and potentially before on-disk pointers are cleared that points to
> the blocks ? SU just rolls back the unfinished dependencies on write,
> but BIO_DELETE cannot.

It is safe to do the BIO_DELETE here because at this point we have
progressed through the file delete to the point where the inode that
claimed this block has been updated on the disk with a NULL pointer.
The only step that remains is to update the on-disk bitmap to reflect
that the block is available. If the bitmap write is not done before
a system crash the only action that will be taken with respect to
the freed block is that either background fsck will restore it to
the bitmap (SU) or the journal will restore it to the bitmap (SUJ).
There is no case where either SU or SUJ will put it back into the
file once we have reached this point.

> Also, shouldn't the BIO_DELETE only be issued for real devices,
> and not the snapshots ?

They should also be issued for snapshots. The only time that snapshots
give up blocks is when they are deleted. The blocks that they give up
at that time are all copies of blocks that were later changed in the
filesystem. With the snapshot gone there will be no remaining references
to those blocks and they will be made available for reallocation. As
such, it would be helpful if they had been TRIMMED.

> diff --git a/sys/ufs/ffs/ffs_alloc.c b/sys/ufs/ffs/ffs_alloc.c
> index b740bbb..4ff57ae 100644
> --- a/sys/ufs/ffs/ffs_alloc.c
> +++ b/sys/ufs/ffs/ffs_alloc.c
> @@ -83,6 +83,8 @@ __FBSDID("$FreeBSD$");
> 
>  #include <security/audit/audit.h>
> 
> +#include <geom/geom.h>
> +
>  #include <ufs/ufs/dir.h>
>  #include <ufs/ufs/extattr.h>
>  #include <ufs/ufs/quota.h>
> @@ -1850,6 +1852,7 @@ ffs_blkfree(ump, fs, devvp, bno, size, inum, dephd)
>  	u_int cg;
>  	u_int8_t *blksfree;
>  	struct cdev *dev;
> +	struct bio *bip;
> 
>  	cg = dtog(fs, bno);
>  	if (devvp->v_type == VREG) {
> @@ -1962,6 +1965,16 @@ ffs_blkfree(ump, fs, devvp, bno, size, inum, dephd)
>  		softdep_setup_blkfree(UFSTOVFS(ump), bp, bno,
>  		    numfrags(fs, size), dephd);
>  	bdwrite(bp);
> +
> +	if (devvp->v_type != VREG) {
> +		bip = g_alloc_bio();
> +		bip->bio_cmd = BIO_DELETE;
> +		bip->bio_offset = dbtob(fsbtodb(fs, bno));
> +		bip->bio_done = g_destroy_bio;
> +		bip->bio_length = size;
> +		g_io_request(bip,
> +		    (struct g_consumer *)devvp->v_bufobj.bo_private);
> +	}
>  }
> 
>  #ifdef INVARIANTS

The above patch looks good though I would do it unconditionally
(e.g., also for snapshots). It seems sensible to make it conditional
on a sysctl variable so that folks could experiment with it more
easily. And I would leave it off by default as non-SSD disks are
unlikely to benefit from it. If it does prove useful for SSD disks
then I would make it conditional on a filesystem flag so that it
is possible to enable it on a filesystem-by-filesystem basis (e.g.,
on for filesystems on the SSD and off for the rest).

	Kirk McKusick


More information about the freebsd-fs mailing list