fsync: giving up on dirty on ufs partitions running vfs_write_suspend()
Kirk McKusick
mckusick at mckusick.com
Sat Sep 16 20:16:52 UTC 2017
> From: Konstantin Belousov <kostikbel at gmail.com>
> Date: Sat, 16 Sep 2017 21:31:17 +0300
> To: Andreas Longwitz <longwitz at incore.de>
> Subject: Re: fsync: giving up on dirty on ufs partitions running
> vfs_write_suspend()
> Cc: Kirk McKusick <mckusick at mckusick.com>, freebsd-fs at freebsd.org
>
> On Sat, Sep 16, 2017 at 01:44:44PM +0200, Andreas Longwitz wrote:
>> Ok, I understand your thoughts about the "big loop" and I agree. On the
>> other hand, it is not easy to measure the progress of the dirty buffers
>> because these buffers are created by another process at the same time we
>> loop in vop_stdfsync(). I can explain from my tests, where I use the
>> following loop on a gjournaled partition:
>>
>> while true; do
>>     cp -p bigfile bigfile.tmp
>>     rm bigfile
>>     mv bigfile.tmp bigfile
>> done
>>
>> When g_journal_switcher starts vfs_write_suspend() immediately after the
>> rm command has started to do its "rm stuff" (ufs_inactive, ffs_truncate,
>> ffs_indirtrunc at different levels, ffs_blkfree, ...) then we must loop
>> (that means wait) in vop_stdfsync() until the rm process has finished
>> its work. A lot of locking overhead is needed for coordination.
>> Returning from bufobj_wwait() we always see one dirty buffer left (very
>> seldom two), which is not optimal. Therefore I have tried the following
>> patch (instead of bumping maxretry):
>>
>> --- vfs_default.c.orig 2016-10-24 12:26:57.000000000 +0200
>> +++ vfs_default.c 2017-09-15 12:30:44.792274000 +0200
>> @@ -688,6 +688,8 @@
>> bremfree(bp);
>> bawrite(bp);
>> }
>> + if( maxretry < 1000)
>> + DELAY(waitns);
>> BO_LOCK(bo);
>> goto loop2;
>> }
>>
>> with different values for waitns. If I run the test loop 5000 times on my
>> test server, the problem is always triggered about 10 times. The
>> results from several runs are given in the following table:
>>
>> waitns      max time   max loops
>> ------------------------------------------------
>> no DELAY    0.5 sec    8650 (maxres = 100000)
>> 1000        0.2 sec    24
>> 10000       0.8 sec    3
>> 100000      7.2 sec    3
>>
>> "time" means the time spent in vop_stdfsync(), measured from entry to
>> return by a dtrace script. "loops" means the number of times "--maxretry"
>> is executed. I am not sure if DELAY() is the best way to wait or if waiting
>> has other drawbacks. Anyway with DELAY() it does not take more than five
>> iterations to finish.
>
> This is not explicitly stated in your message, but I suppose that
> vop_stdfsync() is called due to the VOP_FSYNC(devvp, MNT_SUSPEND) call in
> ffs_sync(). Am I right?
>
> If yes, then the solution is most likely to continue looping in
> vop_stdfsync() until there are no dirty buffers or the mount point
> mnt_secondary_writes counter is zero. The pause trick you tried might
> still be useful, e.g. after some threshold of performed loop
> iterations.
>
> One problem with this suggestion is that vop_stdfsync(devvp) needs to
> know that the vnode is devvp for some UFS mount. The struct cdev,
> accessible as v_rdev, has the pointer to struct mount. You should be
> careful not to access a freed or reused struct mount.
I concur with Kostik's comments. It would be helpful if you could try
out his suggestions and see if that produces a better result. Once you
converge on a solution, I will ensure that it gets checked in.
~Kirk
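
A rough userland sketch of the loop condition Kostik describes above may
help when experimenting: keep looping while dirty buffers remain and
secondary writers are still active, and start pausing only after a
threshold of iterations. This is not kernel code. struct sim_state,
sim_pass(), sync_loop() and pause_after are hypothetical names made up
for this illustration, and the concurrent writer (the rm process) is
only simulated by a counter.

```c
#include <assert.h>

/* Hypothetical userland model; none of these names exist in the kernel. */
struct sim_state {
	int dirty;            /* dirty buffers left on the device vnode */
	int secondary_writes; /* stands in for mnt_secondary_writes */
	int writer_steps;     /* passes the simulated writer still needs */
};

/*
 * One pass: flush all dirty buffers, then let the simulated writer run.
 * While it still has work it leaves one new dirty buffer behind; when it
 * finishes, the secondary write count drops to zero.
 */
static void
sim_pass(struct sim_state *s)
{
	s->dirty = 0;
	if (s->writer_steps > 0) {
		s->writer_steps--;
		if (s->writer_steps > 0)
			s->dirty = 1;
		else
			s->secondary_writes = 0;
	}
}

/*
 * Loop until there are no dirty buffers or no secondary writers.
 * After pause_after passes, every further pass counts one pause; real
 * code would sleep at that point instead of counting.
 * Returns the number of passes performed.
 */
static int
sync_loop(struct sim_state *s, int pause_after, int *pauses)
{
	int passes = 0;

	assert(pause_after >= 0);
	*pauses = 0;
	while (s->dirty > 0 && s->secondary_writes > 0) {
		sim_pass(s);
		passes++;
		if (passes > pause_after)
			(*pauses)++;
	}
	return (passes);
}
```

Starting from one dirty buffer, one secondary writer and five writer
steps, sync_loop() with pause_after set to 3 performs five passes and
pauses on the last two. How the real vop_stdfsync() would obtain the
mount point and its counter safely is the open question raised above
and is not modelled here.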