syncing large mmaped files

John Baldwin jhb at freebsd.org
Thu Oct 18 19:43:29 UTC 2012


On Thursday, October 18, 2012 12:42:18 pm Konstantin Belousov wrote:
> On Thu, Oct 18, 2012 at 09:39:34AM -0400, John Baldwin wrote:
> > On Thursday, October 18, 2012 4:35:37 am Konstantin Belousov wrote:
> > > On Thu, Oct 18, 2012 at 10:08:22AM +1000, Tristan Verniquet wrote:
> > > > 
> > > > I want to work with large (1-10G) files in memory but eventually sync
> > > > them back out to disk. The problem is that the sync process appears to
> > > > lock the file in kernel for the duration of the sync, which can run
> > > > into minutes. This prevents other processes from reading from the file
> > > > (unless they already have it mapped) for this whole time. Is there
> > > > any way to prevent this? I think I read in a post somewhere about
> > > > OpenBSD implementing partial writes when it hits a file with lots of
> > > > dirty pages, in order to prevent this. Is there anything available for
> > > > FreeBSD or is there another way around it?
> > > >
> > > No, currently the vnode lock is held exclusively for the whole duration
> > > of the msync(2) syscall or its analog from the syncer.
> > > 
> > > Making a change to periodically drop the vnode lock in
> > > vm_object_page_clean() might be possible, but it requires benchmarking
> > > to make sure that we do not pessimize the common case. It also opens
> > > the possibility of the vnode being reclaimed in the meantime.
> > 
> > You can simulate this in userland by breaking up your msync() into multiple
> > msync() calls where each call just syncs a portion of the file.
> Be aware that this is much, much slower than msyncing the whole file, even
> if the file is very large. The reason is that the pager initiates an
> asynchronous, _immediate_ clustered write for partial ranges. Delayed
> writes (bdwrite()) are only issued for a full-range msync.

Ugh.  It would seem to me that msync(MS_ASYNC) should be doing delayed
writes.
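A minimal sketch of the userland workaround described above: one large msync() is replaced by a loop of msync() calls over page-aligned pieces, presumably letting the vnode lock be released between calls. The helper name msync_chunked() and its chunk parameter are hypothetical, not part of any system API. Per the caveat in this thread, expect it to be slower than a single full-range msync(), since partial ranges get immediate clustered writes rather than delayed ones.

```c
#define _DEFAULT_SOURCE /* glibc: expose POSIX.1-2008 calls; ignored on FreeBSD */
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: sync [addr, addr + len) with one msync(2) call per
 * chunk instead of a single call over the whole range.
 * `addr` must be page-aligned (e.g. as returned by mmap).
 * `chunk` is raised to at least one page and rounded up to a page multiple,
 * so every per-call start address stays page-aligned.
 * Returns 0 on success, or -1 with errno set by the failing msync(2).
 */
static int msync_chunked(void *addr, size_t len, size_t chunk, int flags)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    char *base = addr;

    if (chunk < page)
        chunk = page;
    chunk = (chunk + page - 1) / page * page;

    for (size_t off = 0; off < len; off += chunk) {
        /* The last piece may be shorter; msync's length need not be aligned. */
        size_t n = (len - off < chunk) ? len - off : chunk;
        if (msync(base + off, n, flags) == -1)
            return -1;
    }
    return 0;
}
```

For example, msync_chunked(p, len, 64 * 1024 * 1024, MS_SYNC) would sync a mapping in 64 MB steps. The chunk size is a tuning knob: smaller pieces shorten each lock hold but give up more of the efficiency of one full-range sync.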

> > > Anyway, note that you cannot 'work with large files in memory', even if
> > > you have enough RAM and no pressure to hold all the file pages resident.
> > > The syncer will do a writeback periodically, whether or not the
> > > application calls msync(2), at an interval of approximately 30 seconds.
> > 
> > You can mmap with MAP_NOSYNC to prevent the syncer from writing the file out
> > every 30 seconds.
> 
> This also prevents msync(2) from syncing the region. The flag is fine
> for throw-away data, but not for the scenario that was described, I
> think.

Oof.  I could see that in certain situations you might want to control this
behavior from an application (similar to how I now make use of fadvise() at
work).  Having a way to disable the syncer while still having
msync(MS_ASYNC) do something useful would be good.
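For reference, a sketch of the MAP_NOSYNC setup discussed above, assuming FreeBSD. MAP_NOSYNC is a FreeBSD extension, so the code defines it as 0 elsewhere, which degrades to an ordinary shared mapping. Since, per Konstantin, msync(2) will not sync such a region, the sketch flushes with fsync(2) instead; FreeBSD's mmap(2) manual page states that fsync(2) flushes dirty NOSYNC VM data. The helper names map_nosync() and flush_nosync() are hypothetical.

```c
#define _DEFAULT_SOURCE /* glibc: expose POSIX.1-2008 calls; ignored on FreeBSD */
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * MAP_NOSYNC is FreeBSD-specific. On other systems, define it as 0 so the
 * sketch still compiles, yielding a plain MAP_SHARED mapping.
 */
#ifndef MAP_NOSYNC
#define MAP_NOSYNC 0
#endif

/*
 * Hypothetical helper: map `len` bytes of `fd` read/write and shared, asking
 * (on FreeBSD) that the syncer not write back pages dirtied through it.
 * Returns MAP_FAILED on error, like mmap(2).
 */
static void *map_nosync(int fd, size_t len)
{
    return mmap(NULL, len, PROT_READ | PROT_WRITE,
                MAP_SHARED | MAP_NOSYNC, fd, 0);
}

/*
 * Hypothetical helper: force dirty data for the whole file out to disk.
 * FreeBSD's mmap(2) page documents that fsync(2) includes dirty NOSYNC pages.
 */
static int flush_nosync(int fd)
{
    return fsync(fd);
}
```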

-- 
John Baldwin


More information about the freebsd-hackers mailing list