TRIM support for UFS?
Kostik Belousov
kostikbel at gmail.com
Wed Dec 8 22:53:57 UTC 2010
On Wed, Dec 08, 2010 at 02:23:52PM -0800, Kirk McKusick wrote:
> > Date: Wed, 8 Dec 2010 23:49:09 +0200
> > From: Kostik Belousov <kostikbel at gmail.com>
> > To: Julian Elischer <julian at freebsd.org>
> > Cc: freebsd-fs at freebsd.org, pjd at freebsd.org,
> > Oliver Fromme <olli at lurza.secnetix.de>,
> > Kirk McKusick <mckusick at mckusick.com>
> > Subject: Re: TRIM support for UFS?
> >
> > On Wed, Dec 08, 2010 at 09:07:56AM -0800, Julian Elischer wrote:
> > >
> > > From: Kirk McKusick <mckusick at chez.mckusick.com>
> > > Date: Wed, 21 May 2008 13:19:18 -0700
> > > To: Poul-Henning Kamp <phk at critter.freebsd.dk>
> > > Subject: UFS and BIO_DELETE
> > > X-URL: http://WWW.McKusick.COM/
> > > Reply-To: Kirk McKusick <mckusick at McKusick.COM>
> > >
> > > I enclose below my proposed patch to add BIO_DELETE to UFS
> > > (goes at the end of ffs_blkfree). As I have no way to test
> > > it, I am wondering if you could let me know if it works.
> > >
> > > Also, I am thinking of only enabling it for filesystems mounted
> > > with a new flag requesting the behavior since the geteblk is a
> > > rather expensive call for the usual no-op case.
> > >
> > > I did look at just allocating a `struct bio' as a local variable
> > > and using that, but it looked like I needed to also come up with a
> > > `producer' and/or `consumer' if I wanted to pass it to GEOM directly,
> > > so in the end I went with this more expensive solution. If there
> > > is an easy way to just pass a bio structure to GEOM, I would much
> > > prefer that approach.
> > >
> > > ~Kirk
> > >
> > >
> > > *** ffs_alloc.c Wed May 21 20:11:04 2008
> > > --- ffs_alloc.c.new Wed May 21 20:10:50 2008
> > > ***************
> > > *** 1945,1950 ****
> > > --- 1945,1962 ----
> > > ACTIVECLEAR(fs, cg);
> > > UFS_UNLOCK(ump);
> > > bdwrite(bp);
> > > + /*
> > > + * Request that the block be cleared.
> > > + */
> > > + bp = geteblk(size);
> > > + bp->b_iocmd = BIO_DELETE;
> > > + bp->b_vp = devvp;
> > > + bp->b_blkno = fsbtodb(fs, bno);
> > > + bp->b_offset = dbtob(bp->b_blkno);
> > > + bp->b_iooffset = bp->b_offset;
> > > + bp->b_bcount = size;
> > > + BUF_KERNPROC(bp);
> > > + BO_STRATEGY(&devvp->v_bufobj, bp);
> > > }
> > >
> > > #ifdef INVARIANTS
> > >
> > The UFS devvp contains the pointer to struct g_consumer in
> > devpp->v_bufobj->bo_private, so g_alloc_bio() and g_io_request() can be
> > used to start BIO_DELETE command without fake bufer allocation.
> > g_io_strategy() may be used as the sample.
>
> Thanks for that update to moderize the patch!
>
> > But, I have a question about the patch. The bitmap block in ffs_blkfree()
> > is handled with delayed write, by bdwrite() call. I think that it is
> > quite possible for BIO_DELETE to be executed before the bitmap block
> > is written and on-disk bit is set in bitmap. Would not this allow
> > for the blocks to be deleted before the bitmap is updated,
> > and potentially before on-disk pointers are cleared that points to
> > the blocks ? SU just rolls back the unfinished dependencies on write,
> > but BIO_DELETE cannot.
>
> It is safe to do the BIO_DELETE here because at this point we have
> progressed through the file delete to the point where the inode that
> claimed this block has been updated on the disk with a NULL pointer.
> The only step that remains is to update the on-disk bitmap to reflect
> that the block is available. If the bitmap write is not done before
> a system crash the only action that will be taken with respect to
> the freed block is that either background fsck will restore it to
> the bitmap (SU) or the journal will restore it to the bitmap (SUJ).
> There is no case where either SU or SUJ will put it back into the
> file once we have reached this point.
Thanks for the explanation. On the other hand, can my scenario be real
for async mounts ?
>
> > Also, shouldn't the BIO_DELETE only be issued for real devices,
> > and not the snapshots ?
>
> They should also be issued for snapshots. The only time that snapshots
> give up blocks is when they are deleted. The blocks that they give up
> at that time are all copies of blocks that were later changed in the
> filesystem. With the snapshot gone there will be no remaining references
> to those blocks and they will be made available for reallocation. As
> such, it would be helpful if they had been TRIMMED.
>
> > diff --git a/sys/ufs/ffs/ffs_alloc.c b/sys/ufs/ffs/ffs_alloc.c
> > index b740bbb..4ff57ae 100644
> > --- a/sys/ufs/ffs/ffs_alloc.c
> > +++ b/sys/ufs/ffs/ffs_alloc.c
> > @@ -83,6 +83,8 @@ __FBSDID("$FreeBSD$");
> >
> > #include <security/audit/audit.h>
> >
> > +#include <geom/geom.h>
> > +
> > #include <ufs/ufs/dir.h>
> > #include <ufs/ufs/extattr.h>
> > #include <ufs/ufs/quota.h>
> > @@ -1850,6 +1852,7 @@ ffs_blkfree(ump, fs, devvp, bno, size, inum, dephd)
> > u_int cg;
> > u_int8_t *blksfree;
> > struct cdev *dev;
> > + struct bio *bip;
> >
> > cg = dtog(fs, bno);
> > if (devvp->v_type == VREG) {
> > @@ -1962,6 +1965,16 @@ ffs_blkfree(ump, fs, devvp, bno, size, inum, dephd)
> > softdep_setup_blkfree(UFSTOVFS(ump), bp, bno,
> > numfrags(fs, size), dephd);
> > bdwrite(bp);
> > +
> > + if (devvp->v_type != VREG) {
> > + bip = g_alloc_bio();
> > + bip->bio_cmd = BIO_DELETE;
> > + bip->bio_offset = dbtob(fsbtodb(fs, bno));
> > + bip->bio_done = g_destroy_bio;
> > + bip->bio_length = size;
> > + g_io_request(bip,
> > + (struct g_consumer *)devvp->v_bufobj.bo_private);
> > + }
> > }
> >
> > #ifdef INVARIANTS
>
> The above patch looks good though I would do it unconditionally
> (e.g., also for snapshots). It seems sensible to make it conditional
> on a sysctl variable so that folks could experiment with it more
> easily. And I would leave it off by default as non-SSD disks are
> unlikely to benefit from it. If it does prove useful for SSD disks
> then I would make it conditional on a filesystem flag so that it
> is possible to enable it on a filesystem-by-filesystem basis (e.g.,
> on for filesystems on the SSD and off for the rest).
I think it is better to have a flag in superblock instead of the mount
option.
diff --git a/sbin/dumpfs/dumpfs.c b/sbin/dumpfs/dumpfs.c
index 38c05f6..fa562dc 100644
--- a/sbin/dumpfs/dumpfs.c
+++ b/sbin/dumpfs/dumpfs.c
@@ -253,9 +253,11 @@ dumpfs(const char *name)
printf("fs_flags expanded ");
if (fsflags & FS_NFS4ACLS)
printf("nfsv4acls ");
+ if (fsflags & FS_TRIM)
+ printf("trim ");
fsflags &= ~(FS_UNCLEAN | FS_DOSOFTDEP | FS_NEEDSFSCK | FS_INDEXDIRS |
FS_ACLS | FS_MULTILABEL | FS_GJOURNAL | FS_FLAGS_UPDATED |
- FS_NFS4ACLS | FS_SUJ);
+ FS_NFS4ACLS | FS_SUJ | FS_TRIM);
if (fsflags != 0)
printf("unknown flags (%#x)", fsflags);
putchar('\n');
diff --git a/sbin/tunefs/tunefs.c b/sbin/tunefs/tunefs.c
index b2ea602..0ed713d 100644
--- a/sbin/tunefs/tunefs.c
+++ b/sbin/tunefs/tunefs.c
@@ -82,11 +82,13 @@ int
main(int argc, char *argv[])
{
char *avalue, *jvalue, *Jvalue, *Lvalue, *lvalue, *Nvalue, *nvalue;
+ char *tvalue;
const char *special, *on;
const char *name;
int active;
int Aflag, aflag, eflag, evalue, fflag, fvalue, jflag, Jflag, Lflag;
int lflag, mflag, mvalue, Nflag, nflag, oflag, ovalue, pflag, sflag;
+ int tflag;
int svalue, Sflag, Svalue;
int ch, found_arg, i;
const char *chg[2];
@@ -101,7 +103,8 @@ main(int argc, char *argv[])
evalue = fvalue = mvalue = ovalue = svalue = Svalue = 0;
active = 0;
found_arg = 0; /* At least one arg is required. */
- while ((ch = getopt(argc, argv, "Aa:e:f:j:J:L:l:m:N:n:o:ps:S:")) != -1)
+ while ((ch = getopt(argc, argv, "Aa:e:f:j:J:L:l:m:N:n:o:ps:S:t:"))
+ != -1)
switch (ch) {
case 'A':
@@ -268,6 +271,18 @@ main(int argc, char *argv[])
Sflag = 1;
break;
+ case 't':
+ found_arg = 1;
+ name = "trim";
+ tvalue = optarg;
+ if (strcmp(tvalue, "enable") != 0 &&
+ strcmp(tvalue, "disable") != 0) {
+ errx(10, "bad %s (options are %s)",
+ name, "`enable' or `disable'");
+ }
+ tflag = 1;
+ break;
+
default:
usage();
}
@@ -493,6 +508,24 @@ main(int argc, char *argv[])
sblock.fs_avgfpdir = svalue;
}
}
+ if (tflag) {
+ name = "issue TRIM to the disk";
+ if (strcmp(tvalue, "enable") == 0) {
+ if (sblock.fs_flags & FS_TRIM)
+ warnx("%s remains unchanged as enabled", name);
+ else {
+ sblock.fs_flags |= FS_TRIM;
+ warnx("%s set", name);
+ }
+ } else if (strcmp(tvalue, "disable") == 0) {
+ if ((~sblock.fs_flags & FS_TRIM) == FS_TRIM)
+ warnx("%s remains unchanged as disabled", name);
+ else {
+ sblock.fs_flags &= ~FS_TRIM;
+ warnx("%s cleared", name);
+ }
+ }
+ }
if (sbwrite(&disk, Aflag) == -1)
goto err;
@@ -1011,12 +1044,13 @@ out:
void
usage(void)
{
- fprintf(stderr, "%s\n%s\n%s\n%s\n%s\n",
+ fprintf(stderr, "%s\n%s\n%s\n%s\n%s\n%s\n",
"usage: tunefs [-A] [-a enable | disable] [-e maxbpg] [-f avgfilesize]",
" [-J enable | disable] [-j enable | disable]",
" [-L volname] [-l enable | disable] [-m minfree]",
" [-N enable | disable] [-n enable | disable]",
-" [-o space | time] [-p] [-s avgfpdir] special | filesystem");
+" [-o space | time] [-p] [-s avgfpdir] [-t enable | disable]",
+" special | filesystem");
exit(2);
}
@@ -1035,6 +1069,8 @@ printfs(void)
(sblock.fs_flags & FS_SUJ)? "enabled" : "disabled");
warnx("gjournal: (-J) %s",
(sblock.fs_flags & FS_GJOURNAL)? "enabled" : "disabled");
+ warnx("trim: (-p) %s",
+ (sblock.fs_flags & FS_TRIM)? "enabled" : "disabled");
warnx("maximum blocks per file in a cylinder group: (-e) %d",
sblock.fs_maxbpg);
warnx("average file size: (-f) %d",
diff --git a/sys/ufs/ffs/ffs_alloc.c b/sys/ufs/ffs/ffs_alloc.c
index b740bbb..ed39aa3 100644
--- a/sys/ufs/ffs/ffs_alloc.c
+++ b/sys/ufs/ffs/ffs_alloc.c
@@ -83,6 +83,8 @@ __FBSDID("$FreeBSD$");
#include <security/audit/audit.h>
+#include <geom/geom.h>
+
#include <ufs/ufs/dir.h>
#include <ufs/ufs/extattr.h>
#include <ufs/ufs/quota.h>
@@ -1850,6 +1852,7 @@ ffs_blkfree(ump, fs, devvp, bno, size, inum, dephd)
u_int cg;
u_int8_t *blksfree;
struct cdev *dev;
+ struct bio *bip;
cg = dtog(fs, bno);
if (devvp->v_type == VREG) {
@@ -1962,6 +1965,16 @@ ffs_blkfree(ump, fs, devvp, bno, size, inum, dephd)
softdep_setup_blkfree(UFSTOVFS(ump), bp, bno,
numfrags(fs, size), dephd);
bdwrite(bp);
+
+ if (fs->fs_flags & FS_TRIM) {
+ bip = g_alloc_bio();
+ bip->bio_cmd = BIO_DELETE;
+ bip->bio_offset = dbtob(fsbtodb(fs, bno));
+ bip->bio_done = g_destroy_bio;
+ bip->bio_length = size;
+ g_io_request(bip,
+ (struct g_consumer *)devvp->v_bufobj.bo_private);
+ }
}
#ifdef INVARIANTS
diff --git a/sys/ufs/ffs/fs.h b/sys/ufs/ffs/fs.h
index ba98ed3..13d9ede 100644
--- a/sys/ufs/ffs/fs.h
+++ b/sys/ufs/ffs/fs.h
@@ -417,6 +417,7 @@ CTASSERT(sizeof(struct fs) == 1376);
#define FS_FLAGS_UPDATED 0x0080 /* flags have been moved to new location */
#define FS_NFS4ACLS 0x0100 /* file system has NFSv4 ACLs enabled */
#define FS_INDEXDIRS 0x0200 /* kernel supports indexed directories */
+#define FS_TRIM 0x0400 /* issue BIO_DELETE for deleted blocks */
/*
* Macros to access bits in the fs_active array.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 196 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20101208/b4720cd6/attachment.pgp
More information about the freebsd-fs
mailing list