Prefaulting for i/o buffers

Konstantin Belousov kostikbel at gmail.com
Mon May 21 10:45:28 UTC 2012


On Fri, Feb 03, 2012 at 09:37:19PM +0200, Konstantin Belousov wrote:
> The FreeBSD I/O infrastructure has a well-known deadlock caused
> by vnode lock order reversal when buffers supplied to the read(2) or
> write(2) syscalls are backed by an mmaped file.
> 
> I previously published patches to convert the i/o path to use VMIO,
> based on Jeff Roberson's proposal, see
> http://wiki.freebsd.org/VM6. As a side effect, VM6 fixed the
> deadlock. Since that work is very intrusive and did not get any
> follow-up, it stalled.
> 
> Below is a very lightweight patch whose only goal is to fix the deadlock
> in the least intrusive way. This became possible after FreeBSD got the
> vm_fault_quick_hold_pages(9) and vm_fault_disable_pagefaults(9) KPIs.
> http://people.freebsd.org/~kib/misc/vm1.3.patch
> 
> The theory of operation is described in the patched sys/kern/vfs_vnops.c,
> see the preamble comment for vn_io_fault(). The patch borrows the
> rangelocks implementation from VM6, which was discussed and improved
> together with Attilio Rao.
> 
> I was not able to reproduce the deadlock in the targeted test running
> for several hours, while stock HEAD deadlocks in the first iteration.
> 
> Below is the benchmark for the worst-case situation for the patched
> system, reading 1 byte from a file in a loop. The value is the time in
> seconds to execute read(2) for a single byte and lseek(2) back to the
> start of the file. The loop is executed 100,000,000 times. The machine
> has a 3.4GHz Core i7 2600K and ran HEAD at 230866 with debugging options
> turned off.
> 
> As you can see, the rangelock overhead for the worst (but uncontended)
> case is less than 10%.
> 
> x stock-1-byte.txt
> + vm1-1-byte.txt
> +--------------------------------------------------------------------------+
> |xx                                                                      ++|
> |xxx                                                                    +++|
> ||A                                                                     |A||
> +--------------------------------------------------------------------------+
>     N           Min           Max        Median           Avg        Stddev
> x   5  1.063206e-06  1.065569e-06  1.064172e-06  1.064109e-06 9.8031959e-10
> +   5  1.167145e-06  1.170244e-06  1.168939e-06 1.1690444e-06 1.2477022e-09
> Difference at 95.0% confidence
> 	1.04935e-07 +/- 1.63638e-09
> 	9.86134% +/- 0.153779%
> 	(Student's t, pooled s = 1.122e-09)
> 

I am reviving the thread.

Since its original publication, the patch has received quite intensive
review and testing from several people, which I appreciate very much. The
tag lines for the commit would include
Reviewed by:	attilio, mdf, pjd, rmacklem (nfs client bits)
Tested by:	pho, flo, Gustau Pérez <gperez entel upc edu>

The latest version of the patch is at
http://people.freebsd.org/~kib/misc/vm1.13.patch

The main change compared with the previous publicly discussed version
is the handling of the user buffers after vm_fault_quick_hold_pages(). I
did uiomove() over the region in the previous patch, but apparently the
VM does not guarantee that the corresponding pte entries are not removed,
or that writeable access is kept. So the new version of the patch uses
uiomove_fromphys() to avoid touching the usermode buffer, and operates
on the held pages. I should note that the issue was never observed in
real life.

This requires trivial modifications of the filesystem code, namely
the replacement of uiomove() with the new helper function
vn_io_fault_uiomove(), which handles the details of held-page access
transparently for the filesystem.
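
To illustrate the idea of copying into held pages instead of through
the user virtual address, here is a toy userland model. It is not the
kernel code from the patch: the name toy_copy_to_held_pages(), the
TOY_PAGE_SIZE constant and the representation of a held page as a plain
byte pointer are all hypothetical. It only shows how a transfer that
starts at some offset is split along page boundaries.

```c
#include <stddef.h>
#include <string.h>

#define	TOY_PAGE_SIZE	4096	/* assumption: 4 KiB pages */

/*
 * Toy model only. pages[] stands in for the array of held pages; each
 * entry points at TOY_PAGE_SIZE bytes. Copies len bytes from src into
 * the pages starting at byte offset "offset" from the first page, and
 * returns the number of bytes actually copied (short if the transfer
 * would run past the last held page).
 */
static size_t
toy_copy_to_held_pages(unsigned char *pages[], size_t npages, size_t offset,
    const void *src, size_t len)
{
	const unsigned char *s = src;
	size_t done = 0;

	while (done < len) {
		size_t idx = offset / TOY_PAGE_SIZE;	/* which page */
		size_t pgoff = offset % TOY_PAGE_SIZE;	/* offset in page */
		size_t chunk = TOY_PAGE_SIZE - pgoff;	/* room left in page */

		if (idx >= npages)
			break;		/* ran out of held pages */
		if (chunk > len - done)
			chunk = len - done;
		memcpy(pages[idx] + pgoff, s + done, chunk);
		done += chunk;
		offset += chunk;
	}
	return (done);
}
```

In this model the destination is reached only through the pages[]
array, which mirrors the point made above that the usermode buffer
itself is not touched once the pages are held.
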

Again, comments and testers are welcome. I consider the patch ready
to be committed.


More information about the freebsd-arch mailing list