Per-mount syncer threads and fanout for pagedaemon cleaning

Tue Dec 27 17:22:07 UTC 2011

On Tue, Dec 27, 2011 at 9:05 AM, Attilio Rao <attilio at freebsd.org> wrote:
> 2011/12/27  <mdf at freebsd.org>:
>> On Tue, Dec 27, 2011 at 8:05 AM, Attilio Rao <attilio at freebsd.org> wrote:
>>> 2011/12/27 Giovanni Trematerra <giovanni.trematerra at gmail.com>:
>>>> On Mon, Dec 26, 2011 at 9:24 PM, Venkatesh Srinivas
>>>> <vsrinivas at dragonflybsd.org> wrote:
>>>>> Hi!
>>>>>
>>>>> I've been playing with two things in DragonFly that might be of interest
>>>>> here.
>>>>>
>>>>> Thing #1 :=
>>>>>
>>>>> First, per-mountpoint syncer threads. Currently there is a single thread,
>>>>> 'syncer', which periodically calls fsync() on dirty vnodes from every mount,
>>>>> along with calling vfs_sync() on each filesystem itself (via syncer vnodes).
>>>>>
>>>>> My patch modifies this to create syncer threads for mounts that request it.
>>>>> For these mounts, vnodes are synced from their mount-specific thread rather
>>>>> than the global syncer.
>>>>>
>>>>> The idea is that periodic fsync/sync operations from one filesystem should
>>>>> not stall or delay synchronization for other ones.
>>>>> The patch was fairly simple:
>>>>> http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/50e4012a4b55e1efc595db0db397b4365f08b640
>>>>>
>>>>
>>>> There's something WIP by attilio@ in that area.
>>>> You might want to take a look at
>>>> http://people.freebsd.org/~attilio/syncer_alpha_15.diff
>>>>
>>>> I don't know what hammerfs needs but UFS/FFS and buffer cache do a good
>>>> job performance-wise and so the authors are skeptical about the boost that such
>>>> a change can give. We believe that brain cycles need to be spent on
>>>> other pieces of the system such as ARC and ZFS.
>>>
>>> More specifically, it is likely that focusing on UFS and buffer cache
>>> for performance is not really useful; we should direct our efforts toward
>>> ARC and ZFS.
>>> Also, the real bottlenecks in our I/O paths are in GEOM's
>>> single-threaded design, lack of unmapped I/O functionality, possibly
>>> lack of prioritized I/O, etc.
>>
>> Indeed, Isilon (and probably other vendors as well) entirely skip
>> VFS_SYNC when the WAIT argument is MNT_LAZY.  Since we're a
>> distributed journalled filesystem, syncing via a system thread is not
>> a relevant operation; i.e. all writes that have exited a VOP_WRITE or
>> similar operation are already in reasonably stable storage in a
>> journal on the relevant nodes.
>>
>> However, we do then have our own threads running on each node to flush
>> the journal regularly (in addition to when it fills up), and I don't
>> know enough about this to know if it could be fit into the syncer
>> thread idea or if it's too tied in somehow to our architecture.
>
> I'm not really sure how journaling is implemented on OneFS, but
> when I made this patch SU+J wasn't yet there.
> Also, this patch just adds the infrastructure for a multithreaded and
> configurable syncer, which means it still requires the UFS bits for
> skipping the "double-syncing" (alias the MNT_LAZY skippage you
> mentioned).

Right, I don't object to any changes relating to multiple sync
threads, etc.; I'm just trying to offer a vendor viewpoint.  Though having
one thread per mount would allow for a different sync interval for
each filesystem, which can be an advantage.
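
To illustrate the per-mount interval point, here is a minimal userspace
sketch, assuming POSIX threads rather than kernel threads. It is not taken
from either patch in this thread: struct mount_syncer, syncer_start(),
syncer_stop() and mount_sync_once() are hypothetical names, and the "sync"
is just a counter standing in for whatever VFS_SYNC-style work a real
filesystem would do.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for a mount that owns its own syncer thread. */
struct mount_syncer {
	const char	*name;		/* label only */
	long		 interval_ms;	/* per-mount sync interval */
	unsigned long	 sync_count;	/* simulated sync passes so far */
	bool		 stop;		/* set to request thread exit */
	pthread_mutex_t	 lock;		/* protects stop and sync_count */
	pthread_cond_t	 wake;		/* signalled on stop */
	pthread_t	 thread;
};

/* Placeholder for the real flush work; called with ms->lock held. */
static void
mount_sync_once(struct mount_syncer *ms)
{
	ms->sync_count++;
}

/* Thread body: sleep for this mount's interval, then sync, until stopped. */
static void *
syncer_main(void *arg)
{
	struct mount_syncer *ms = arg;
	struct timespec deadline;
	int rc;

	pthread_mutex_lock(&ms->lock);
	while (!ms->stop) {
		/* Absolute deadline = now + this mount's interval. */
		clock_gettime(CLOCK_REALTIME, &deadline);
		deadline.tv_sec += ms->interval_ms / 1000;
		deadline.tv_nsec += (ms->interval_ms % 1000) * 1000000L;
		if (deadline.tv_nsec >= 1000000000L) {
			deadline.tv_sec++;
			deadline.tv_nsec -= 1000000000L;
		}
		/* Wait out the interval; loop again on spurious wakeups. */
		rc = 0;
		while (!ms->stop && rc == 0)
			rc = pthread_cond_timedwait(&ms->wake, &ms->lock,
			    &deadline);
		if (!ms->stop)
			mount_sync_once(ms);
	}
	pthread_mutex_unlock(&ms->lock);
	return (NULL);
}

/* Start a syncer thread for one mount; returns 0 on success. */
int
syncer_start(struct mount_syncer *ms, const char *name, long interval_ms)
{
	ms->name = name;
	ms->interval_ms = interval_ms;
	ms->sync_count = 0;
	ms->stop = false;
	pthread_mutex_init(&ms->lock, NULL);
	pthread_cond_init(&ms->wake, NULL);
	return (pthread_create(&ms->thread, NULL, syncer_main, ms));
}

/* Stop and join the thread; returns how many sync passes it performed. */
unsigned long
syncer_stop(struct mount_syncer *ms)
{
	pthread_mutex_lock(&ms->lock);
	ms->stop = true;
	pthread_cond_signal(&ms->wake);
	pthread_mutex_unlock(&ms->lock);
	pthread_join(ms->thread, NULL);
	pthread_cond_destroy(&ms->wake);
	pthread_mutex_destroy(&ms->lock);
	return (ms->sync_count);
}
```

Starting one such syncer with a 10 ms interval and another with a 100 ms
interval lets each run at its own cadence, and a slow pass on one mount
does not delay the other's timer. A real kernel version would of course
need proper interaction with mount/unmount and the existing syncer.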

Right after I did Isilon's last FreeBSD merge (it seems like a long
time ago now), I wanted to look into what it would take to eliminate
our specialized journal flush thread (i.e. tie it into VFS_SYNC), but
one objection was that then the flush interval would not be
configurable separately from the one for our UFS partition.

Cheers,
matthew