Prefaulting for i/o buffers

Konstantin Belousov kostikbel at gmail.com
Sun Feb 5 22:13:05 UTC 2012


On Sun, Feb 05, 2012 at 08:20:45PM +0100, Pawel Jakub Dawidek wrote:
> On Sun, Feb 05, 2012 at 06:47:29PM +0200, Konstantin Belousov wrote:
> > On Sun, Feb 05, 2012 at 12:43:46PM +0100, Pawel Jakub Dawidek wrote:
> > > @@ -199,6 +200,7 @@ thread_init(void *mem, int size, int flags)
> > >  
> > >  	td->td_sleepqueue = sleepq_alloc();
> > >  	td->td_turnstile = turnstile_alloc();
> > > +	td->td_rlqe = rlqentry_alloc();
> > > 
> > > What is the purpose of td_rlqe field? From what I see thread can lock
> > > more than one range. Could you elaborate on that as well?
> > td_rlqe is a cached rangelock entry used for the typical case of the
> > thread not doing recursive i/o. In this case, it is possible to avoid
> > memory allocation on the i/o hot path by using entry allocated during
> > thread creation. Since thread creation already allocates many chunks
> > of memory, and typical thread performs much more i/o then it suffers
> > creation, this shorten the i/o calculation path.
> 
> I see. I'd be in favour of dropping entry allocation on thread creation.
> This would make the allocation lazy and it will be done on the first I/O
> only. What do you think? Thread creation should be fast, so if we can
> avoid adding up, I would go that route.
I think this is negligible change in the speed of much rare operation
(thread creation) comparing with a hit on the first time i/o, but I just
did it in a way you suggested.

> 
> > > This is a bit hard to understand immediately. Maybe something like the
> > > following is a bit more readable (I assume this goto could happen only
> > > once):
> > > 
> > > 		len = MIN(uio->uio_iov->iov_len, io_hold_cnt * PAGE_SIZE);
> > > 		addr = (vm_offset_t)uio->uio_iov->iov_base;
> > > 		end = round_page(addr + len);
> > > 		if (howmany(end - trunc_page(addr), PAGE_SIZE) > io_hold_cnt) {
> > > 			len -= PAGE_SIZE;
> > > 			end = round_page(addr + len);
> > > 		}
> > I think it can happen twice, if both start and len are perfectly mis-aligned.
> 
> I see. That make sense. It would be nice to add comment there, as it
> wasn't immediately obvious (at least for me) what's going on.
> 
> Another way to simplify this piece of code would be to either make ma[]
> array of 'io_hold_cnt + 2' elements or set 'len' to '(io_hold_cnt - 2) *
> PAGE_SIZE'.  Short comment describing why we add or subtract 2 would be
> of course welcome.
I am not sure that +2 would be improvement in readability, but indeed it
removes some code. I implemented this and added a comment.

> 
> Just curious, why io_hold_cnt is 'static const int' instead of define?
I want to change it into a sysctl or tunable one day.

http://people.freebsd.org/~kib/misc/vm1.9.patch
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 196 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-arch/attachments/20120205/4c021d5e/attachment.pgp


More information about the freebsd-arch mailing list