Per-mount syncer threads and fanout for pagedaemon cleaning

Tue Dec 27 17:05:49 UTC 2011

2011/12/27  <mdf at freebsd.org>:
> On Tue, Dec 27, 2011 at 8:05 AM, Attilio Rao <attilio at freebsd.org> wrote:
>> 2011/12/27 Giovanni Trematerra <giovanni.trematerra at gmail.com>:
>>> On Mon, Dec 26, 2011 at 9:24 PM, Venkatesh Srinivas
>>> <vsrinivas at dragonflybsd.org> wrote:
>>>> Hi!
>>>>
>>>> I've been playing with two things in DragonFly that might be of interest
>>>> here.
>>>>
>>>> Thing #1 :=
>>>>
>>>> First, per-mountpoint syncer threads. Currently there is a single thread,
>>>> 'syncer', which periodically calls fsync() on dirty vnodes from every mount,
>>>> along with calling vfs_sync() on each filesystem itself (via syncer vnodes).
>>>>
>>>> My patch modifies this to create syncer threads for mounts that request it.
>>>> For these mounts, vnodes are synced from their mount-specific thread rather
>>>> than the global syncer.
>>>>
>>>> The idea is that periodic fsync/sync operations from one filesystem should
>>>> not stall or delay synchronization for other ones.
>>>> The patch was fairly simple:
>>>> http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/50e4012a4b55e1efc595db0db397b4365f08b640
>>>>
>>>
>>> There's something WIP by attilio@ in that area.
>>> You might want to take a look at
>>> http://people.freebsd.org/~attilio/syncer_alpha_15.diff
>>>
>>> I don't know what hammerfs needs, but UFS/FFS and the buffer cache do a good
>>> job performance-wise, so the authors are skeptical about the boost that such
>>> a change can give. We believe that brain cycles need to be spent on
>>> other pieces of the system such as ARC and ZFS.
>>
>> More specifically, it is likely that focusing on UFS and the buffer cache
>> for performance is not really useful; we should focus our efforts on
>> ARC and ZFS.
>> Also, the real bottlenecks in our I/O paths are in GEOM's
>> single-threaded design, lack of unmapped I/O functionality, possibly
>> lack of prioritized I/O, etc.
>
> Indeed, Isilon (and probably other vendors as well) entirely skip
> VFS_SYNC when the WAIT argument is MNT_LAZY.  Since we're a
> distributed journalled filesystem, syncing via a system thread is not
> a relevant operation; i.e. all writes that have exited a VOP_WRITE or
> similar operation are already in reasonably stable storage in a
> journal on the relevant nodes.
>
> However, we do then have our own threads running on each node to flush
> the journal regularly (in addition to when it fills up), and I don't
> know enough about this to know if it could be fit into the syncer
> thread idea or if it's too tied in somehow to our architecture.

I'm not really sure how journaling is implemented on OneFS, but
when I made this patch SU+J wasn't there yet.
Also, this patch just adds the infrastructure for a multithreaded and
configurable syncer, which means it still requires the UFS bits for
skipping the "double-syncing" (i.e. the MNT_LAZY skipping you
mentioned).
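
For illustration only, the MNT_LAZY skip discussed above could look roughly
like the userland sketch below. Everything in it is an assumption made up for
the example: the EX_MNT_* constants merely stand in for the kernel's MNT_WAIT,
MNT_NOWAIT and MNT_LAZY (the values are arbitrary), and struct examplefs_mount
and examplefs_sync() are hypothetical names, not code from the patch, from
UFS, or from OneFS.

```c
#include <stdbool.h>

/* Stand-ins for the kernel's MNT_WAIT / MNT_NOWAIT / MNT_LAZY (values arbitrary). */
enum { EX_MNT_WAIT = 1, EX_MNT_NOWAIT = 2, EX_MNT_LAZY = 3 };

/* Hypothetical per-mount state, just enough to model the decision. */
struct examplefs_mount {
    int  dirty_bufs;   /* buffers not yet pushed to stable storage */
    bool journalled;   /* completed writes are already stable in a journal */
    int  flushes;      /* how many times a real flush was performed */
};

/*
 * Hypothetical sync entry point. A lazy request comes from the periodic
 * syncer; if the filesystem already guarantees stability through its own
 * journal, the lazy pass would only repeat work, so it is skipped.
 * Explicit (non-lazy) requests always flush.
 */
int
examplefs_sync(struct examplefs_mount *mp, int waitfor)
{
    if (waitfor == EX_MNT_LAZY && mp->journalled)
        return (0);            /* nothing to do for the periodic pass */

    mp->dirty_bufs = 0;        /* pretend everything was written out */
    mp->flushes++;
    return (0);
}
```

With this shape, the periodic syncer can keep calling into every mount, and
each filesystem decides for itself whether a lazy pass is worth doing.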

Attilio

-- 
Peace can only be achieved by understanding - A. Einstein