mmap() incoherency on hi I/O load (FS is zfs)

Konstantin Belousov kostikbel at gmail.com
Wed Jul 4 09:06:43 UTC 2012


On Wed, Jul 04, 2012 at 11:07:36AM +0300, Pavlo wrote:
> 
> 
> 
> --- Original message ---
> From: "Pavlo" <devgs at ukr.net>
> To: freebsd-fs at freebsd.org
> Date: 14 June 2012, 13:30:20
> Subject: mmap() incoherency on hi I/O load (FS is zfs)
> 
> 
> > There's a case where some parts of files that are mapped and then
> modified get corrupted. By corrupted I mean that some data is fine (the data
> written using write()/pwrite()) but some looks like it never existed,
> as if it had sat in buffers for a while: several processes simultaneously
> (access was synchronised, of course) used the shared pages and reported
> the data as present, but after some time passed they (the processes)
> complained that it was lost. Only the part of the data written with
> pwrite() was there. Everything that was written via mmap() is zero.
> >
> > As I said, it occurs under high I/O load, when 4+ background
> processes are indexing a huge amount of data. I also want to note that it
> never occurred during the life of our project while we used mmap() under
> the same I/O stress conditions, when the mapping covered the whole file or
> just a part (the header) starting from the beginning of the file. It
> appeared the first time we mapped individual pages, just to save RAM.
> >
> > The workaround for this problem is to call msync() before every munmap(). But the man page says:
> >
> >
> 
> The msync() system call is usually not needed since BSD implements a
> coherent file system buffer cache.  However, it may be used to associate
> dirty VM pages with file system buffers and thus cause them to be flushed
> to physical media sooner rather than later.
> > 
> > Any thoughts? Thanks.
> > 
> > 
> 
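The workaround in the quoted message can be sketched in C. This is a minimal illustration under stated assumptions, not the poster's code: the helper name `update_via_mmap()` is hypothetical, and it assumes the file has already been extended to cover the target range. It maps only the page(s) holding the range (mmap() needs a page-aligned file offset, hence the rounding down), copies the data in, and calls `msync(..., MS_SYNC)` before `munmap()`. According to the man page text quoted above, the msync() call should not be required for coherency on BSD; the sketch only shows where the reported workaround sits.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the thread): copy `len` bytes (len > 0)
 * from `src` into the file at byte offset `off`, mapping only the page(s)
 * that contain that range. The file must already be at least off + len
 * bytes long. Returns 0 on success, -1 on error with errno set.
 */
static int update_via_mmap(int fd, off_t off, const void *src, size_t len)
{
    long pgsz = sysconf(_SC_PAGESIZE);
    assert(pgsz > 0);

    /* mmap() requires the file offset to be a multiple of the page size. */
    off_t base = off - off % pgsz;
    size_t delta = (size_t)(off - base);
    size_t maplen = delta + len;

    char *p = mmap(NULL, maplen, PROT_READ | PROT_WRITE, MAP_SHARED, fd, base);
    if (p == MAP_FAILED)
        return -1;

    memcpy(p + delta, src, len);

    /* The reported workaround: flush the dirty page(s) before unmapping. */
    if (msync(p, maplen, MS_SYNC) == -1) {
        int saved = errno;      /* keep msync()'s errno, not munmap()'s */
        munmap(p, maplen);
        errno = saved;
        return -1;
    }
    return munmap(p, maplen);
}
```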
> So I tracked the issue down to where it occurs. When I commit data to a
> file using mmap() and pwrite() side by side, sometimes the 'newest data'
> is overwritten by 'elder data'. The 'elder data' can be something written
> either with mmap() or with pwrite(). It never happens when I use only
> mmap() or only pwrite(). The issue also reproduces on UFS. I think there
> is a problem keeping mmapped pages and the FS cache in sync.
I am curious how you label data as 'newer' and 'elder'.

I do admit the possibility of a race in the ZFS double-copy implementation
of mmap/cache coherency, but I am somewhat skeptical about the same
possibility for UFS. What you are describing might indicate that we lose
the modified/dirty bits for the page, but that would cause much bigger
fireworks than just an occasional race with write.

What version of the system are you running? Does the machine swap?

> 
> I will try to write a test that reliably reproduces the issue.
Yes, an isolated test case is the best route forward. It would either show
a bug or demonstrate a misunderstanding on your part.
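A possible starting point for such an isolated test is sketched below in C. It is a hypothetical, single-process sketch (the function name, the two-slot page layout and the sequence numbering are all assumptions, not the poster's code): it mixes writes through per-page mappings, unmapped without msync(), with pwrite() calls on the same pages, and numbers every write so that a stale slot can be identified as 'elder data'. On a coherent buffer cache it should report zero stale slots; by itself it does not create the background I/O load under which the problem was reported.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical test (names and layout are illustrative). Each page of the
 * file holds two 64-bit slots:
 *   slot 0 is written only through a one-page MAP_SHARED mapping that is
 *          unmapped WITHOUT msync();
 *   slot 1 is written only with pwrite().
 * Writes alternate and carry an increasing sequence number, so a slot that
 * ends up holding an older number (or zero) is a lost update.
 * Returns the number of stale slots, or -1 on bad arguments or a failed call.
 */
static int mixed_write_test(int fd, int npages, int rounds)
{
    long pgsz = sysconf(_SC_PAGESIZE);
    if (npages < 1 || rounds < 1 || pgsz <= 0)
        return -1;
    if (ftruncate(fd, (off_t)npages * pgsz) == -1)
        return -1;

    uint64_t seq = 0;
    for (int r = 0; r < rounds; r++) {
        for (int pg = 0; pg < npages; pg++) {
            off_t base = (off_t)pg * pgsz;

            /* Slot 0: store through a temporary mapping of this page only. */
            uint64_t *map = mmap(NULL, (size_t)pgsz, PROT_READ | PROT_WRITE,
                                 MAP_SHARED, fd, base);
            if (map == MAP_FAILED)
                return -1;
            map[0] = ++seq;
            if (munmap(map, (size_t)pgsz) == -1)    /* note: no msync() */
                return -1;

            /* Slot 1: store the next sequence number with pwrite(). */
            uint64_t v = ++seq;
            if (pwrite(fd, &v, sizeof v, base + (off_t)sizeof v) != (ssize_t)sizeof v)
                return -1;
        }
    }

    /* In the last round, page pg received 2k+1 (slot 0) and 2k+2 (slot 1),
     * where k = (rounds - 1) * npages + pg. */
    int stale = 0;
    for (int pg = 0; pg < npages; pg++) {
        uint64_t k = (uint64_t)(rounds - 1) * (uint64_t)npages + (uint64_t)pg;
        uint64_t got[2];
        if (pread(fd, got, sizeof got, (off_t)pg * pgsz) != (ssize_t)sizeof got)
            return -1;
        if (got[0] != 2 * k + 1)
            stale++;
        if (got[1] != 2 * k + 2)
            stale++;
    }
    return stale;
}
```

To approximate the reported conditions, one would presumably run it on a file in the affected ZFS or UFS file system while the indexing processes are generating load, and treat any nonzero return as a candidate reproduction.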
