syncing large mmaped files
Tristan Verniquet
tris_vern at hotmail.com
Fri Oct 19 00:11:43 UTC 2012

> From: jhb at freebsd.org
> To: kostikbel at gmail.com
> Subject: Re: syncing large mmaped files
> Date: Thu, 18 Oct 2012 15:43:25 -0400
> CC: freebsd-hackers at freebsd.org; tris_vern at hotmail.com
> 
> On Thursday, October 18, 2012 12:42:18 pm Konstantin Belousov wrote:
> > On Thu, Oct 18, 2012 at 09:39:34AM -0400, John Baldwin wrote:
> > > On Thursday, October 18, 2012 4:35:37 am Konstantin Belousov wrote:
> > > > On Thu, Oct 18, 2012 at 10:08:22AM +1000, Tristan Verniquet wrote:
> > > > > 
> > > > > I want to work with large (1-10G) files in memory but eventually sync
> > > > > them back out to disk. The problem is that the sync process appears to
> > > > > lock the file in kernel for the duration of the sync, which can run
> > > > > into minutes. This prevents other processes from reading from the file
> > > > > (unless they already have it mapped) for this whole time. Is there
> > > > > any way to prevent this? I think I read in a post somewhere about
> > > > > openbsd implementing partial-writes when it hits a file with lots of
> > > > > dirty pages in order to prevent this. Is there anything available for
> > > > > FreeBSD or is there another way around it?
> > > > >
> > > > No, currently the vnode lock is held exclusive for the whole duration
> > > > of the msync(2) syscall or its analog from the syncer.
> > > > 
> > > > Making a change to periodically drop the vnode lock in
> > > > vm_object_page_clean() might be possible, but requires the benchmarking
> > > > to make sure that we do not pessimize the common case. Also, this opens
> > > > a possibility for the vnode reclamation meantime.
> > > 
> > > You can simulate this in userland by breaking up your msync() into multiple
> > > msync() calls where each call just syncs a portion of the file.
> > Be aware that this is much-much slower than msyncing the whole file, even
> > if file is very large. The reason is that pager initiates asynchronous
> > _immediate_ clustered write for such situations. Async writes (AKA
> > bdwrite()) are only specified for full range msyncing.
> 
> Ugh.  It would seem to me that msync(MS_ASYNC) should be doing delayed
> writes.

Ahh, using MS_ASYNC seems to get me the behaviour I was looking for. It
is just as fast as fsync(2) when all the pages are dirty, but it
releases the lock, allowing other programs to open and read the file.
So it seems to be doing what I would expect.

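A minimal sketch of the MS_ASYNC call described above, for anyone who wants to try it. The names flush_async() and demo_async() and the scratch file are assumptions made for illustration; only msync(2) with MS_ASYNC comes from this thread. How MS_ASYNC schedules the writes differs between systems, so treat this as a starting point rather than a statement of what the kernel does.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Ask the kernel to write back the whole mapping without waiting for
 * the I/O to finish. Returns 0 on success, -1 on error (errno set).
 */
int
flush_async(void *base, size_t len)
{
	return (msync(base, len, MS_ASYNC));
}

/* Hypothetical demo: dirty a 1 MiB scratch file and flush it with MS_ASYNC. */
int
demo_async(void)
{
	char path[] = "/tmp/msync-async.XXXXXX";
	size_t len = 1024 * 1024;
	int fd, rc = -1;
	char *p;

	if ((fd = mkstemp(path)) == -1)
		return (-1);
	unlink(path);			/* scratch file goes away on close */
	if (ftruncate(fd, (off_t)len) == 0) {
		p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
		if (p != MAP_FAILED) {
			memset(p, 0x5a, len);	/* dirty every page */
			rc = flush_async(p, len);
			munmap(p, len);
		}
	}
	close(fd);
	return (rc);
}
```

A return value of 0 only means the request was accepted, not that the data has reached the disk.
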
> > > > Anyway, note that you cannot 'work with large files in memory', even if
> > > > you have enough RAM and no pressure to hold all the file pages resident.
> > > > The syncer will do a writeback periodically regardless of the application
> > > > calling msync(2) or not, with the interval of approximately 30 seconds.
> > > 
> > > You can mmap with MAP_NOSYNC to prevent the syncer from writing the file out
> > > every 30 seconds.
> > 
> > This also prevents msync(2) from syncing the region. The flag is fine
> > for throw-away data, but not for the scenario that was described, I
> > think.
> 
> Oof.  I could see that in certain situations you might want to control this
> behavior from an application (similar to how I now make use of fadvise() at
> work).  Having a way to disable syncer but having msync(MS_ASYNC) do
> something useful would be good.
> 

When I map with MAP_NOSYNC I still seem to be able to msync(2) the
regions: I see memory move from Wired/Active to Invalid and the disk is
busy.

The madvise(2) man page says, under MADV_AUTOSYNC, that reversion of
already-dirtied pages can be guaranteed by calling msync(2) or fsync(2).
This is FreeBSD 8.3. So even if there is something wrong with syncing
MAP_NOSYNC pages, I guess I could always madvise(MADV_AUTOSYNC) them
first.

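
A rough sketch of that fallback, assuming a shared file mapping: map with MAP_NOSYNC, then call madvise(MADV_AUTOSYNC) on the range before msync(2). The names map_nosync(), flush_nosync() and demo_nosync() are made up for illustration. MAP_NOSYNC and MADV_AUTOSYNC are FreeBSD-specific, so the sketch compiles them out elsewhere. Whether the madvise() step is actually needed is exactly the open question above, so this is untested guesswork rather than a confirmed recipe.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MAP_NOSYNC
#define MAP_NOSYNC 0	/* assumption: not FreeBSD, so the flag is a no-op */
#endif

/* Map len bytes of fd read/write, asking the syncer to leave the pages alone. */
void *
map_nosync(int fd, size_t len)
{
	return (mmap(NULL, len, PROT_READ | PROT_WRITE,
	    MAP_SHARED | MAP_NOSYNC, fd, 0));
}

/*
 * Revert the range to normal syncing where MADV_AUTOSYNC exists, then
 * msync() it with the given flags (MS_SYNC or MS_ASYNC).
 */
int
flush_nosync(void *base, size_t len, int flags)
{
#ifdef MADV_AUTOSYNC
	if (madvise(base, len, MADV_AUTOSYNC) == -1)
		return (-1);
#endif
	return (msync(base, len, flags));
}

/* Hypothetical demo: dirty a 1 MiB scratch file mapped with MAP_NOSYNC. */
int
demo_nosync(void)
{
	char path[] = "/tmp/msync-nosync.XXXXXX";
	size_t len = 1024 * 1024;
	int fd, rc = -1;
	char *p;

	if ((fd = mkstemp(path)) == -1)
		return (-1);
	unlink(path);			/* scratch file goes away on close */
	if (ftruncate(fd, (off_t)len) == 0) {
		p = map_nosync(fd, len);
		if (p != MAP_FAILED) {
			memset(p, 0x3c, len);	/* dirty every page */
			rc = flush_nosync(p, len, MS_SYNC);
			munmap(p, len);
		}
	}
	close(fd);
	return (rc);
}
```

Passing MS_ASYNC instead of MS_SYNC as the flags argument would combine this with the asynchronous flush.
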
> -- 
> John Baldwin
> _______________________________________________
> freebsd-hackers at freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
> To unsubscribe, send any mail to "freebsd-hackers-unsubscribe at freebsd.org"
 		 	   		  
    
    