ZFS ARC and mmap/page cache coherency question

Paul Koch paul.koch137 at gmail.com
Sun Jul 3 02:30:19 UTC 2016


Is there a "long story", or is mmap() performance on ZFS doomed for the
foreseeable future?

	Paul.

> Short story: ZFS was tacked onto the kernel and was never properly
> integrated into the VM page management, which leads to DRAMATICALLY poor
> performance for anything which uses mmap() for write IO. This was
> solved in Oracle Solaris with the great VM allocator rewrite which
> landed after OpenSolaris was made closed source again.
> 
> Without a complete rewrite of the VM system this problem is unsolvable.
> 
> Ced
> 
> On 30 June 2016 at 06:06, Paul Koch <paul.koch137 at gmail.com> wrote:
> >
> > Posted this to -stable on the 15th June, but no feedback...
> >
> > We are trying to understand a performance issue when syncing large mmap'ed
> > files on ZFS.
> >
> > Example test box setup:
> >  FreeBSD 10.3-p5
> >  Intel i7-5820K 3.30GHz with 64G RAM
> >  6 * 2 Tbyte Seagate ST2000DM001-1ER164 in a ZFS stripe
> >
> > Read performance of a sequentially written large file on the pool is
> > typically around 950 Mbytes/sec using dd.
> >
> > Our software mmap's some large database files using MAP_NOSYNC, and we
> > call fsync() every 10 minutes when we know the file system is mostly
> > idle.  In our test setup, the database files are 1.1G, 2G, 1.4G, 12G,
> > 4.7G and ~20 small files (under 10M).  All of the memory pages in the
> > mmap'ed files are updated every minute with new values, so the entire
> > mmap'ed file needs to be synced to disk, not just fragments.
> >
> > When the 10 minute fsync() occurs, gstat typically shows very few disk
> > reads and very high write speeds, which is what we expect.  But, every 80
> > minutes we process the data in the large mmap'ed files and store it in
> > highly compressed blocks of a ~300G file using pread/pwrite (i.e. not
> > mmap'ed). After that, the performance of the next fsync() of the mmap'ed
> > files falls off a cliff.  We are assuming it is because the ARC has
> > thrown away the cached data of the mmap'ed files.  gstat shows lots of
> > read/write contention and lots of things tend to stall waiting for disk.
> >
> > Is this just a lack of ZFS ARC and page cache coherency?
> >
> > Is there a way to prime the ARC with the mmap'ed files again before we
> > call fsync()?
> >
> > We've tried cat and read() on the mmap'ed files, but that doesn't seem to
> > touch the disk at all and the fsync() performance is still poor, so it
> > looks like the ARC is not being filled.  msync() doesn't seem to be much
> > different. mincore() stats show the mmap'ed data is entirely in core and
> > referenced.
> >
> >         Paul.
> > _______________________________________________
> > freebsd-hackers at freebsd.org mailing list
> > https://lists.freebsd.org/mailman/listinfo/freebsd-hackers
> > To unsubscribe, send any mail to
> > "freebsd-hackers-unsubscribe at freebsd.org"  
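
For readers who want to reproduce the access pattern described in the quoted
message, here is a minimal sketch, not the poster's actual code. It maps a
file with MAP_SHARED | MAP_NOSYNC, dirties every page, calls fsync(), and
separately asks mincore() how many pages of the file are resident. The helper
names dirty_and_sync() and count_resident_pages() are hypothetical, and the
MAP_NOSYNC fallback define is an assumption added only so the sketch builds
on systems other than FreeBSD.

```c
#define _DEFAULT_SOURCE 1	/* glibc: expose mincore(); harmless on FreeBSD */
#include <sys/types.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* MAP_NOSYNC is FreeBSD-specific; assumption: treat it as a no-op elsewhere. */
#ifndef MAP_NOSYNC
#define MAP_NOSYNC 0
#endif

/*
 * Hypothetical helper: map an existing file read/write, store `value`
 * into every byte (which dirties every page), then fsync() the file as
 * the quoted post does.  Returns the number of bytes touched, or -1.
 */
long
dirty_and_sync(const char *path, unsigned char value)
{
	struct stat st;
	int fd = open(path, O_RDWR);

	if (fd < 0)
		return -1;
	if (fstat(fd, &st) != 0 || st.st_size == 0) {
		close(fd);
		return -1;
	}
	size_t len = (size_t)st.st_size;
	unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
	    MAP_SHARED | MAP_NOSYNC, fd, 0);
	if (p == MAP_FAILED) {
		close(fd);
		return -1;
	}
	memset(p, value, len);		/* every page is now dirty */
	int rc = fsync(fd);		/* the periodic flush from the post */
	munmap(p, len);
	close(fd);
	return rc == 0 ? (long)len : -1;
}

/*
 * Hypothetical helper: map the file read-only and count the pages that
 * mincore() reports as resident.  This reflects VM page residency only;
 * it does not say whether the data is also held in the ZFS ARC.
 * Returns the page count, or -1 on any failure.
 */
long
count_resident_pages(const char *path)
{
	struct stat st;
	long pagesz = sysconf(_SC_PAGESIZE);
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;
	if (pagesz <= 0 || fstat(fd, &st) != 0 || st.st_size == 0) {
		close(fd);
		return -1;
	}
	size_t len = (size_t)st.st_size;
	size_t npages = (len + (size_t)pagesz - 1) / (size_t)pagesz;
	void *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
	char *vec = malloc(npages);	/* one status byte per page */
	long resident = -1;

	/* The (void *) cast covers both char * and unsigned char * prototypes. */
	if (p != MAP_FAILED && vec != NULL &&
	    mincore(p, len, (void *)vec) == 0) {
		resident = 0;
		for (size_t i = 0; i < npages; i++)
			resident += vec[i] & 1;	/* bit 0: page is in core */
	}
	free(vec);
	if (p != MAP_FAILED)
		munmap(p, len);
	close(fd);
	return resident;
}
```

Timing dirty_and_sync() before and after a large pread/pwrite workload on the
same pool, while watching gstat, would be one way to check whether the fsync()
slowdown from the quoted message shows up on a given system.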

