TRIM support for UFS?

Kostik Belousov kostikbel at gmail.com
Wed Dec 8 22:53:57 UTC 2010


On Wed, Dec 08, 2010 at 02:23:52PM -0800, Kirk McKusick wrote:
> > Date: Wed, 8 Dec 2010 23:49:09 +0200
> > From: Kostik Belousov <kostikbel at gmail.com>
> > To: Julian Elischer <julian at freebsd.org>
> > Cc: freebsd-fs at freebsd.org, pjd at freebsd.org,
> >         Oliver Fromme <olli at lurza.secnetix.de>,
> >         Kirk McKusick <mckusick at mckusick.com>
> > Subject: Re: TRIM support for UFS?
> > 
> > On Wed, Dec 08, 2010 at 09:07:56AM -0800, Julian Elischer wrote:
> > >
> > > From:    Kirk McKusick <mckusick at chez.mckusick.com>
> > > Date:    Wed, 21 May 2008 13:19:18 -0700
> > > To:      Poul-Henning Kamp <phk at critter.freebsd.dk>
> > > Subject: UFS and BIO_DELETE
> > > X-URL: http://WWW.McKusick.COM/
> > > Reply-To: Kirk McKusick <mckusick at McKusick.COM>
> > >
> > > I enclose below my proposed patch to add BIO_DELETE to UFS
> > > (goes at the end of ffs_blkfree). As I have no way to test
> > > it, I am wondering if you could let me know if it works.
> > >
> > > Also, I am thinking of only enabling it for filesystems mounted
> > > with a new flag requesting the behavior since the geteblk is a
> > > rather expensive call for the usual no-op case.
> > >
> > > I did look at just allocating a `struct bio' as a local variable
> > > and using that, but it looked like I needed to also come up with a
> > > `producer' and/or `consumer' if I wanted to pass it to GEOM directly,
> > > so in the end I went with this more expensive solution. If there
> > > is an easy way to just pass a bio structure to GEOM, I would much
> > > prefer that approach.
> > >
> > > 	~Kirk
> > >
> > >
> > > *** ffs_alloc.c	Wed May 21 20:11:04 2008
> > > --- ffs_alloc.c.new	Wed May 21 20:10:50 2008
> > > ***************
> > > *** 1945,1950 ****
> > > --- 1945,1962 ----
> > >   	ACTIVECLEAR(fs, cg);
> > >   	UFS_UNLOCK(ump);
> > >   	bdwrite(bp);
> > > + 	/*
> > > + 	 * Request that the block be cleared.
> > > + 	 */
> > > + 	bp = geteblk(size);
> > > + 	bp->b_iocmd = BIO_DELETE;
> > > + 	bp->b_vp = devvp;
> > > + 	bp->b_blkno = fsbtodb(fs, bno);
> > > + 	bp->b_offset = dbtob(bp->b_blkno);
> > > + 	bp->b_iooffset = bp->b_offset;
> > > + 	bp->b_bcount = size;
> > > + 	BUF_KERNPROC(bp);
> > > + 	BO_STRATEGY(&devvp->v_bufobj, bp);
> > >   }
> > > 
> > >   #ifdef INVARIANTS
> > >
> > The UFS devvp contains the pointer to struct g_consumer in
> > devpp->v_bufobj->bo_private, so g_alloc_bio() and g_io_request() can be
> > used to start BIO_DELETE command without fake bufer allocation.
> > g_io_strategy() may be used as the sample.
> 
> Thanks for that update to moderize the patch!
> 
> > But, I have a question about the patch. The bitmap block in ffs_blkfree()
> > is handled with delayed write, by bdwrite() call. I think that it is
> > quite possible for BIO_DELETE to be executed before the bitmap block
> > is written and on-disk bit is set in bitmap. Would not this allow
> > for the blocks to be deleted before the bitmap is updated,
> > and potentially before on-disk pointers are cleared that points to
> > the blocks ? SU just rolls back the unfinished dependencies on write,
> > but BIO_DELETE cannot.
> 
> It is safe to do the BIO_DELETE here because at this point we have
> progressed through the file delete to the point where the inode that
> claimed this block has been updated on the disk with a NULL pointer.
> The only step that remains is to update the on-disk bitmap to reflect
> that the block is available. If the bitmap write is not done before
> a system crash the only action that will be taken with respect to
> the freed block is that either background fsck will restore it to
> the bitmap (SU) or the journal will restore it to the bitmap (SUJ).
> There is no case where either SU or SUJ will put it back into the
> file once we have reached this point.
Thanks for the explanation. On the other hand, can my scenario be real
for async mounts ?

> 
> > Also, shouldn't the BIO_DELETE only be issued for real devices,
> > and not the snapshots ?
> 
> They should also be issued for snapshots. The only time that snapshots
> give up blocks is when they are deleted. The blocks that they give up
> at that time are all copies of blocks that were later changed in the
> filesystem. With the snapshot gone there will be no remaining references
> to those blocks and they will be made available for reallocation. As
> such, it would be helpful if they had been TRIMMED.
> 
> > diff --git a/sys/ufs/ffs/ffs_alloc.c b/sys/ufs/ffs/ffs_alloc.c
> > index b740bbb..4ff57ae 100644
> > --- a/sys/ufs/ffs/ffs_alloc.c
> > +++ b/sys/ufs/ffs/ffs_alloc.c
> > @@ -83,6 +83,8 @@ __FBSDID("$FreeBSD$");
> > 
> >  #include <security/audit/audit.h>
> > 
> > +#include <geom/geom.h>
> > +
> >  #include <ufs/ufs/dir.h>
> >  #include <ufs/ufs/extattr.h>
> >  #include <ufs/ufs/quota.h>
> > @@ -1850,6 +1852,7 @@ ffs_blkfree(ump, fs, devvp, bno, size, inum, dephd)
> >  	u_int cg;
> >  	u_int8_t *blksfree;
> >  	struct cdev *dev;
> > +	struct bio *bip;
> > 
> >  	cg = dtog(fs, bno);
> >  	if (devvp->v_type == VREG) {
> > @@ -1962,6 +1965,16 @@ ffs_blkfree(ump, fs, devvp, bno, size, inum, dephd)
> >  		softdep_setup_blkfree(UFSTOVFS(ump), bp, bno,
> >  		    numfrags(fs, size), dephd);
> >  	bdwrite(bp);
> > +
> > +	if (devvp->v_type != VREG) {
> > +		bip = g_alloc_bio();
> > +		bip->bio_cmd = BIO_DELETE;
> > +		bip->bio_offset = dbtob(fsbtodb(fs, bno));
> > +		bip->bio_done = g_destroy_bio;
> > +		bip->bio_length = size;
> > +		g_io_request(bip,
> > +		    (struct g_consumer *)devvp->v_bufobj.bo_private);
> > +	}
> >  }
> > 
> >  #ifdef INVARIANTS
> 
> The above patch looks good though I would do it unconditionally
> (e.g., also for snapshots). It seems sensible to make it conditional
> on a sysctl variable so that folks could experiment with it more
> easily. And I would leave it off by default as non-SSD disks are
> unlikely to benefit from it. If it does prove useful for SSD disks
> then I would make it conditional on a filesystem flag so that it
> is possible to enable it on a filesystem-by-filesystem basis (e.g.,
> on for filesystems on the SSD and off for the rest).

I think it is better to have a flag in superblock instead of the mount
option. 

diff --git a/sbin/dumpfs/dumpfs.c b/sbin/dumpfs/dumpfs.c
index 38c05f6..fa562dc 100644
--- a/sbin/dumpfs/dumpfs.c
+++ b/sbin/dumpfs/dumpfs.c
@@ -253,9 +253,11 @@ dumpfs(const char *name)
 		printf("fs_flags expanded ");
 	if (fsflags & FS_NFS4ACLS)
 		printf("nfsv4acls ");
+	if (fsflags & FS_TRIM)
+		printf("trim ");
 	fsflags &= ~(FS_UNCLEAN | FS_DOSOFTDEP | FS_NEEDSFSCK | FS_INDEXDIRS |
 		     FS_ACLS | FS_MULTILABEL | FS_GJOURNAL | FS_FLAGS_UPDATED |
-		     FS_NFS4ACLS | FS_SUJ);
+		     FS_NFS4ACLS | FS_SUJ | FS_TRIM);
 	if (fsflags != 0)
 		printf("unknown flags (%#x)", fsflags);
 	putchar('\n');
diff --git a/sbin/tunefs/tunefs.c b/sbin/tunefs/tunefs.c
index b2ea602..0ed713d 100644
--- a/sbin/tunefs/tunefs.c
+++ b/sbin/tunefs/tunefs.c
@@ -82,11 +82,13 @@ int
 main(int argc, char *argv[])
 {
 	char *avalue, *jvalue, *Jvalue, *Lvalue, *lvalue, *Nvalue, *nvalue;
+	char *tvalue;
 	const char *special, *on;
 	const char *name;
 	int active;
 	int Aflag, aflag, eflag, evalue, fflag, fvalue, jflag, Jflag, Lflag;
 	int lflag, mflag, mvalue, Nflag, nflag, oflag, ovalue, pflag, sflag;
+	int tflag;
 	int svalue, Sflag, Svalue;
 	int ch, found_arg, i;
 	const char *chg[2];
@@ -101,7 +103,8 @@ main(int argc, char *argv[])
 	evalue = fvalue = mvalue = ovalue = svalue = Svalue = 0;
 	active = 0;
 	found_arg = 0;		/* At least one arg is required. */
-	while ((ch = getopt(argc, argv, "Aa:e:f:j:J:L:l:m:N:n:o:ps:S:")) != -1)
+	while ((ch = getopt(argc, argv, "Aa:e:f:j:J:L:l:m:N:n:o:ps:S:t:"))
+	    != -1)
 		switch (ch) {
 
 		case 'A':
@@ -268,6 +271,18 @@ main(int argc, char *argv[])
 			Sflag = 1;
 			break;
 
+		case 't':
+			found_arg = 1;
+			name = "trim";
+			tvalue = optarg;
+			if (strcmp(tvalue, "enable") != 0 &&
+			    strcmp(tvalue, "disable") != 0) {
+				errx(10, "bad %s (options are %s)",
+				    name, "`enable' or `disable'");
+			}
+			tflag = 1;
+			break;
+
 		default:
 			usage();
 		}
@@ -493,6 +508,24 @@ main(int argc, char *argv[])
 			sblock.fs_avgfpdir = svalue;
 		}
 	}
+	if (tflag) {
+		name = "issue TRIM to the disk";
+ 		if (strcmp(tvalue, "enable") == 0) {
+			if (sblock.fs_flags & FS_TRIM)
+				warnx("%s remains unchanged as enabled", name);
+			else {
+ 				sblock.fs_flags |= FS_TRIM;
+ 				warnx("%s set", name);
+			}
+ 		} else if (strcmp(tvalue, "disable") == 0) {
+			if ((~sblock.fs_flags & FS_TRIM) == FS_TRIM)
+				warnx("%s remains unchanged as disabled", name);
+			else {
+ 				sblock.fs_flags &= ~FS_TRIM;
+ 				warnx("%s cleared", name);
+			}
+ 		}
+	}
 
 	if (sbwrite(&disk, Aflag) == -1)
 		goto err;
@@ -1011,12 +1044,13 @@ out:
 void
 usage(void)
 {
-	fprintf(stderr, "%s\n%s\n%s\n%s\n%s\n",
+	fprintf(stderr, "%s\n%s\n%s\n%s\n%s\n%s\n",
 "usage: tunefs [-A] [-a enable | disable] [-e maxbpg] [-f avgfilesize]",
 "              [-J enable | disable] [-j enable | disable]", 
 "              [-L volname] [-l enable | disable] [-m minfree]",
 "              [-N enable | disable] [-n enable | disable]",
-"              [-o space | time] [-p] [-s avgfpdir] special | filesystem");
+"              [-o space | time] [-p] [-s avgfpdir] [-t enable | disable]",
+"              special | filesystem");
 	exit(2);
 }
 
@@ -1035,6 +1069,8 @@ printfs(void)
 		(sblock.fs_flags & FS_SUJ)? "enabled" : "disabled");
 	warnx("gjournal: (-J)                                     %s",
 		(sblock.fs_flags & FS_GJOURNAL)? "enabled" : "disabled");
+	warnx("trim: (-p)                                         %s", 
+		(sblock.fs_flags & FS_TRIM)? "enabled" : "disabled");
 	warnx("maximum blocks per file in a cylinder group: (-e)  %d",
 	      sblock.fs_maxbpg);
 	warnx("average file size: (-f)                            %d",
diff --git a/sys/ufs/ffs/ffs_alloc.c b/sys/ufs/ffs/ffs_alloc.c
index b740bbb..ed39aa3 100644
--- a/sys/ufs/ffs/ffs_alloc.c
+++ b/sys/ufs/ffs/ffs_alloc.c
@@ -83,6 +83,8 @@ __FBSDID("$FreeBSD$");
 
 #include <security/audit/audit.h>
 
+#include <geom/geom.h>
+
 #include <ufs/ufs/dir.h>
 #include <ufs/ufs/extattr.h>
 #include <ufs/ufs/quota.h>
@@ -1850,6 +1852,7 @@ ffs_blkfree(ump, fs, devvp, bno, size, inum, dephd)
 	u_int cg;
 	u_int8_t *blksfree;
 	struct cdev *dev;
+	struct bio *bip;
 
 	cg = dtog(fs, bno);
 	if (devvp->v_type == VREG) {
@@ -1962,6 +1965,16 @@ ffs_blkfree(ump, fs, devvp, bno, size, inum, dephd)
 		softdep_setup_blkfree(UFSTOVFS(ump), bp, bno,
 		    numfrags(fs, size), dephd);
 	bdwrite(bp);
+
+	if (fs->fs_flags & FS_TRIM) {
+		bip = g_alloc_bio();
+		bip->bio_cmd = BIO_DELETE;
+		bip->bio_offset = dbtob(fsbtodb(fs, bno));
+		bip->bio_done = g_destroy_bio;
+		bip->bio_length = size;
+		g_io_request(bip,
+		    (struct g_consumer *)devvp->v_bufobj.bo_private);
+	}
 }
 
 #ifdef INVARIANTS
diff --git a/sys/ufs/ffs/fs.h b/sys/ufs/ffs/fs.h
index ba98ed3..13d9ede 100644
--- a/sys/ufs/ffs/fs.h
+++ b/sys/ufs/ffs/fs.h
@@ -417,6 +417,7 @@ CTASSERT(sizeof(struct fs) == 1376);
 #define FS_FLAGS_UPDATED 0x0080	/* flags have been moved to new location */
 #define FS_NFS4ACLS	0x0100	/* file system has NFSv4 ACLs enabled */
 #define FS_INDEXDIRS	0x0200	/* kernel supports indexed directories */
+#define	FS_TRIM		0x0400	/* issue BIO_DELETE for deleted blocks */
 
 /*
  * Macros to access bits in the fs_active array.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 196 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20101208/b4720cd6/attachment.pgp


More information about the freebsd-fs mailing list