svn commit: r231160 - head/sys/ufs/ffs

Tue Feb 7 20:43:28 UTC 2012

Author: mckusick
Date: Tue Feb  7 20:43:28 2012
New Revision: 231160
URL: http://svn.freebsd.org/changeset/base/231160

Log:
  In the original days of BSD, a sync was issued on every filesystem
  every 30 seconds. This spike in I/O caused the system to pause every
  30 seconds, which was quite annoying. So, the way that sync worked
  was changed so that when a vnode was first dirtied, it was put on
  a 30-second cleaning queue (see the syncer_workitem_pending queues
  in kern/vfs_subr.c). If the file has not been written or deleted
  after 30 seconds, the syncer pushes it out. As the syncer runs once
  per second, dirty files are trickled out slowly over the 30-second
  period instead of all at once by a call to sync(2).
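
  The queueing scheme described above can be sketched as a small
  user-space model. This is an illustration only, not the code in
  kern/vfs_subr.c: the names wheel, mark_dirty, syncer_tick, and
  flushed_at, and the sizes used, are hypothetical. It assumes a
  circular array of per-second slots in which a newly dirtied file
  is queued 30 slots ahead of the current one.

```c
#include <assert.h>

#define SYNCER_SLOTS 32   /* hypothetical wheel size; must exceed SYNC_DELAY */
#define SYNC_DELAY   30   /* seconds a dirty file waits before being pushed */
#define MAX_PER_SLOT 16   /* hypothetical per-slot capacity */
#define MAX_FILES    64   /* hypothetical number of tracked file ids */

struct slot {
	int ids[MAX_PER_SLOT];
	int count;
};

static struct slot wheel[SYNCER_SLOTS];
static int now;                   /* current second; slot is now % SYNCER_SLOTS */
static int flushed_at[MAX_FILES]; /* second at which each file id was flushed */

/* Queue a newly dirtied file to be flushed SYNC_DELAY seconds from now. */
static int
mark_dirty(int id)
{
	struct slot *s = &wheel[(now + SYNC_DELAY) % SYNCER_SLOTS];

	assert(id >= 0 && id < MAX_FILES);
	if (s->count == MAX_PER_SLOT)
		return (-1);
	s->ids[s->count++] = id;
	return (0);
}

/*
 * One syncer pass, run once per second: advance the clock and flush
 * only the files whose delay expires in this second. Returns the
 * number of files flushed.
 */
static int
syncer_tick(void)
{
	struct slot *s;
	int i, n;

	now++;
	s = &wheel[now % SYNCER_SLOTS];
	n = s->count;
	for (i = 0; i < n; i++)
		flushed_at[s->ids[i]] = now;
	s->count = 0;
	return (n);
}
```

  In this model, files dirtied in different seconds land in different
  slots, so each pass writes only a small share of the dirty files
  rather than all of them at once.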
  
  The one drawback to this is that it does not cover the filesystem
  metadata. To handle the metadata, vfs_allocate_syncvnode() is called
  to create a "filesystem syncer vnode" at mount time which cycles
  around the cleaning queue being sync'ed every 30 seconds. In the
  original design, the only things it would sync for UFS were the
  filesystem metadata: inode blocks, cylinder group bitmaps, and the
  superblock (e.g., by VOP_FSYNC'ing devvp, the device vnode from
  which the filesystem is mounted).
  
  Somewhere on its path to integration with FreeBSD, the flushing of
  the filesystem syncer vnode got changed to sync every vnode associated
  with the filesystem. The result of this change is a return to the
  old behavior of a filesystem-wide flush every 30 seconds, which makes
  the whole 30-second delay per vnode useless.
  
  This change goes back to the originally intended trickle-out sync
  behavior. Key to ensuring that all the intended semantics are
  preserved (e.g., that all inode updates get flushed within a bounded
  period of time) is that all inode modifications get pushed to their
  corresponding inode blocks so that the metadata flush by the
  filesystem syncer vnode gets them to the disk in a timely way.
  Thanks to Konstantin Belousov (kib@) for doing the audit and commit
  -r231122 which ensures that all of these updates are being made.
  
  Reviewed by:    kib
  Tested by:      scottl
  MFC after:      2 weeks
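
  The control flow this change gives ffs_sync() can be modelled in
  user space as follows. This is a simplified sketch, not the kernel
  code: struct sketch_mount, sketch_sync, and the SKETCH_* constants
  are hypothetical stand-ins, and locking, suspension, quotas, and
  error handling are left out. It assumes only what the diff shows:
  a lazy sync jumps straight to the metadata flush, while any other
  sync first writes back each modified inode.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the MNT_WAIT / MNT_NOWAIT / MNT_LAZY requests. */
enum sketch_waitfor { SKETCH_WAIT, SKETCH_NOWAIT, SKETCH_LAZY };

/* Hypothetical, much reduced model of a mounted filesystem. */
struct sketch_mount {
	int dirty_vnodes;    /* files with unwritten modifications */
	int dirty_meta_bufs; /* dirty buffers on the device vnode */
	int vnodes_flushed;  /* running total of file write-backs */
	int meta_flushed;    /* running total of metadata buffer writes */
};

static void
sketch_sync(struct sketch_mount *mp, enum sketch_waitfor waitfor)
{
	assert(mp != NULL);

	/* For a lazy sync, only the filesystem metadata is of interest. */
	if (waitfor == SKETCH_LAZY)
		goto metasync;

	/* Full sync: write back each modified file. */
	mp->vnodes_flushed += mp->dirty_vnodes;
	mp->dirty_vnodes = 0;

metasync:
	/* Flush the device vnode if it holds any dirty buffers. */
	if (mp->dirty_meta_bufs > 0) {
		mp->meta_flushed += mp->dirty_meta_bufs;
		mp->dirty_meta_bufs = 0;
	}
}
```

  In the model, a lazy call leaves dirty_vnodes untouched, so the
  per-file 30-second delay stays meaningful, while the metadata
  buffers are still written within a bounded period of time.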

Modified:
  head/sys/ufs/ffs/ffs_vfsops.c

Modified: head/sys/ufs/ffs/ffs_vfsops.c
==============================================================================

--- head/sys/ufs/ffs/ffs_vfsops.c	Tue Feb  7 20:24:52 2012	(r231159)
+++ head/sys/ufs/ffs/ffs_vfsops.c	Tue Feb  7 20:43:28 2012	(r231160)
@@ -1436,17 +1436,26 @@ ffs_sync(mp, waitfor)
 	int softdep_accdeps;
 	struct bufobj *bo;
 
+	wait = 0;
+	suspend = 0;
+	suspended = 0;
 	td = curthread;
 	fs = ump->um_fs;
 	if (fs->fs_fmod != 0 && fs->fs_ronly != 0 && ump->um_fsckpid == 0)
 		panic("%s: ffs_sync: modification on read-only filesystem",
 		    fs->fs_fsmnt);
 	/*
+	 * For a lazy sync, we just care about the filesystem metadata.
+	 */
+	if (waitfor == MNT_LAZY) {
+		secondary_accwrites = 0;
+		secondary_writes = 0;
+		lockreq = 0;
+		goto metasync;
+	}
+	/*
 	 * Write back each (modified) inode.
 	 */
-	wait = 0;
-	suspend = 0;
-	suspended = 0;
 	lockreq = LK_EXCLUSIVE | LK_NOWAIT;
 	if (waitfor == MNT_SUSPEND) {
 		suspend = 1;
@@ -1517,11 +1526,12 @@ loop:
 #ifdef QUOTA
 	qsync(mp);
 #endif
+
+metasync:
 	devvp = ump->um_devvp;
 	bo = &devvp->v_bufobj;
 	BO_LOCK(bo);
-	if (waitfor != MNT_LAZY &&
-	    (bo->bo_numoutput > 0 || bo->bo_dirty.bv_cnt > 0)) {
+	if (bo->bo_numoutput > 0 || bo->bo_dirty.bv_cnt > 0) {
 		BO_UNLOCK(bo);
 		vn_lock(devvp, LK_EXCLUSIVE | LK_RETRY);
 		if ((error = VOP_FSYNC(devvp, waitfor, td)) != 0)