Per-mount syncer threads and fanout for pagedaemon cleaning

Tue Dec 27 16:59:43 UTC 2011

On Tue, Dec 27, 2011 at 8:05 AM, Attilio Rao <attilio at freebsd.org> wrote:
> 2011/12/27 Giovanni Trematerra <giovanni.trematerra at gmail.com>:
>> On Mon, Dec 26, 2011 at 9:24 PM, Venkatesh Srinivas
>> <vsrinivas at dragonflybsd.org> wrote:
>>> Hi!
>>>
>>> I've been playing with two things in DragonFly that might be of interest
>>> here.
>>>
>>> Thing #1 :=
>>>
>>> First, per-mountpoint syncer threads. Currently there is a single thread,
>>> 'syncer', which periodically calls fsync() on dirty vnodes from every mount,
>>> along with calling vfs_sync() on each filesystem itself (via syncer vnodes).
>>>
>>> My patch modifies this to create syncer threads for mounts that request it.
>>> For these mounts, vnodes are synced from their mount-specific thread rather
>>> than the global syncer.
>>>
>>> The idea is that periodic fsync/sync operations from one filesystem should
>>> not stall or delay synchronization for other ones.
>>> The patch was fairly simple:
>>> http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/50e4012a4b55e1efc595db0db397b4365f08b640
>>>
>>
>> There's some WIP by attilio@ in that area.
>> You might want to take a look at
>> http://people.freebsd.org/~attilio/syncer_alpha_15.diff
>>
>> I don't know what hammerfs needs, but UFS/FFS and the buffer cache do a good
>> job performance-wise, so the authors are skeptical about the boost that such
>> a change can give. We believe that brain cycles need to be spent on
>> other pieces of the system such as ARC and ZFS.
>
> More specifically, focusing on UFS and the buffer cache for performance
> is likely not very useful; we should direct our efforts toward
> ARC and ZFS.
> Also, the real bottlenecks in our I/O paths are GEOM's
> single-threaded design, the lack of unmapped I/O functionality, possibly
> the lack of prioritized I/O, etc.

Indeed, Isilon (and probably other vendors as well) entirely skip
VFS_SYNC when the waitfor argument is MNT_LAZY.  Since we're a
distributed journalled filesystem, syncing via a system thread is not
a relevant operation; i.e. all writes that have exited a VOP_WRITE or
similar operation are already in reasonably stable storage in a
journal on the relevant nodes.
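
To make the early return concrete, here is a minimal sketch in plain C.
It is not Isilon's code and not the real kernel interface: the
sketch_mount structure, the sketch_vfs_sync() function, and the SK_*
constants are hypothetical stand-ins so the fragment compiles in
userland (the real flags come from <sys/mount.h>).

```c
/* Stand-in values for illustration only; the real MNT_* flags are
 * defined in <sys/mount.h>. */
enum { SK_MNT_WAIT = 1, SK_MNT_NOWAIT = 2, SK_MNT_LAZY = 3 };

/* Hypothetical per-mount state, not a real kernel structure. */
struct sketch_mount {
	int journal_flushes;	/* number of explicit flushes performed */
};

/*
 * Hypothetical VFS_SYNC-style handler.  A lazy (periodic syncer) pass
 * is a no-op because completed writes are assumed to already sit in
 * the journal; any other request performs the flush.
 */
static int
sketch_vfs_sync(struct sketch_mount *mp, int waitfor)
{
	if (waitfor == SK_MNT_LAZY)
		return (0);		/* nothing to do for the syncer */

	mp->journal_flushes++;		/* stand-in for the real flush work */
	return (0);
}
```

Calling sketch_vfs_sync() with SK_MNT_LAZY leaves journal_flushes
untouched, while SK_MNT_WAIT or SK_MNT_NOWAIT increments it once per
call.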

However, we do then have our own threads running on each node to flush
the journal regularly (in addition to when it fills up), and I don't
know enough about this to say whether it could fit into the syncer
thread idea or whether it's too closely tied to our architecture.

Cheers,
matthew