svn commit: r308026 - in head/sys: kern sys ufs/ffs

Gleb Smirnoff glebius at FreeBSD.org
Wed Nov 9 23:31:11 UTC 2016


  Konstantin,

On Tue, Nov 01, 2016 at 02:53:10PM +0200, Konstantin Belousov wrote:
K> > K> +static int buf_pager_relbuf;
K> > K> +SYSCTL_INT(_vfs, OID_AUTO, buf_pager_relbuf, CTLFLAG_RWTUN,
K> > K> +    &buf_pager_relbuf, 0,
K> > K> +    "Make buffer pager release buffers after reading");
K> > K> +
K> > K> +/*
K> > K> + * The buffer pager.  It uses buffer reads to validate pages.
K> > K> + *
K> > K> + * In contrast to the generic local pager from vm/vnode_pager.c, this
K> > K> + * pager correctly and easily handles volumes where the underlying
K> > K> + * device block size is greater than the machine page size.  The
K> > K> + * buffer cache transparently extends the requested page run to be
K> > K> + * aligned at the block boundary, and does the necessary bogus page
K> > K> + * replacements in the addends to avoid obliterating already valid
K> > K> + * pages.
K> > K> + *
K> > K> + * The only non-trivial issue is that the exclusive busy state for
K> > K> + * pages, which is assumed by the vm_pager_getpages() interface, is
K> > K> + * incompatible with the VMIO buffer cache's desire to share-busy the
K> > K> + * pages.  This function performs a trivial downgrade of the pages'
K> > K> + * state before reading buffers, and a less trivial upgrade from the
K> > K> + * shared-busy to excl-busy state after the read.
K> > 
K> > IMHO, it should be noted that the pager ignores the requested rbehind
K> > and rahead values, and uses the rbehind and rahead sizes that it prefers.
K> The pager interface considers the page-in of the ahead/behind pages
K> insignificant, in particular because the pages can be recycled or
K> invalidated during the pager operation, when the pager drops the object lock.
K> 
K> More important, this pager de facto uses the optimal filesystem-dependent
K> aligned I/O size due to its structure, compared with the bmap pager.
K> For this reason, I consider additional attempts to follow optional
K> upper-level hints not very useful.  Measurements show no difference in
K> the real workload times, and marginal improvements for microbenchmarks
K> (5% scale).

Buildworld isn't the only real workload. If we do readbehind or readahead,
we allocate pages for that, which means that some other pages need to be
purged.

There are cases where the pager has absolutely no idea about what is optimal.
So, not following hints from the upper layers is a bug.

Note that I'm not asking you to fix it. I'm just asking to document that behaviour.
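
To make the point concrete, here is a minimal userland sketch of what the
quoted comment describes: the requested page run is widened to the block
boundary, so the read-behind and read-ahead the caller actually gets follow
from block alignment, not from the hints it passed. This is not the committed
code. SKETCH_PAGE_SIZE, struct page_run and the three helpers are
hypothetical names, and the block size is assumed to be a multiple of the
page size.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed machine page size */

/* A run of pages, identified by page index within the object. */
struct page_run {
	uint64_t first;		/* index of the first page */
	uint64_t last;		/* index of the last page, inclusive */
};

/*
 * Widen the requested run so that it starts and ends on a filesystem
 * block boundary.  bsize is the block size in bytes.  The caller's
 * rbehind/rahead hints are not consulted at all.
 */
static struct page_run
align_run_to_block(struct page_run req, uint32_t bsize)
{
	uint64_t ppb = bsize / SKETCH_PAGE_SIZE;	/* pages per block */
	struct page_run out;

	if (ppb <= 1)
		return (req);	/* block no larger than a page */
	out.first = req.first - req.first % ppb;
	out.last = req.last - req.last % ppb + ppb - 1;
	return (out);
}

/* Pages read before the requested run as a result of the alignment. */
static uint64_t
effective_rbehind(struct page_run req, struct page_run got)
{
	return (req.first - got.first);
}

/* Pages read after the requested run as a result of the alignment. */
static uint64_t
effective_rahead(struct page_run req, struct page_run got)
{
	return (got.last - req.last);
}
```

With 16K blocks and 4K pages, a request for pages 5..6 becomes a read of
pages 4..7, i.e. one page behind and one page ahead, whatever the caller
asked for. That is the behaviour I'd like to see mentioned in the comment.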

K> I might do something more aggressive when upper-level specified rahead is
K> (significantly) above the natural block size limit, like using breadn()
K> instead of bread().  Practice suggests that this would not help or even
K> be a pessimisation due to higher buf cache thrashing.

-- 
Totus tuus, Glebius.

