more strict KPI for vm_pager_get_pages()
Julian Elischer
julian at freebsd.org
Mon May 4 09:50:51 UTC 2015
On 5/4/15 4:24 PM, Konstantin Belousov wrote:
> On Thu, Apr 30, 2015 at 05:24:08PM +0300, Gleb Smirnoff wrote:
>> Hi!
>>
>> The reason to write down this patch emerges from the
>> projects/sendfile branch, where vm_pager_get_pages() is
>> used in the sendfile(2) system call. Although the new
>> sendfile works flawlessly, it makes some assumptions
>> about vnode_pager that theoretically may not be valid,
>> however always hold in our current code.
>>
>> Going deeper in the problem I have found more important
>> points, which yielded in the suggested patch. To start,
>> let me display the current KPI assumptions:
>>
>> 1) vm_pager_get_pages() works on an array of consequtive
>> array of pages. Pindex of (n+1)-th pages must be pindex
>> of n-th + 1. One page is special, it is called reqpage.
>> 2) vm_pager_get_pages() guarantees to swapin only the reqpage,
>> and may skip or fail other pages for different reasons, that
>> may vary from pager to pager.
>> 3) There also is function vm_pager_has_page(), which reports
>> availability of a page at given index in the pager, and also
>> provides hints on how many consequtive pages before this one
>> and after this one can be swapped in in single pager request.
>> Most pagers return zeros in these hints. The vnode pager for
>> UFS returns a strong promise, that one can later utilize in
>> vm_pager_get_pages().
>> 4) All pages must be busied on enter. On exit only reqpage
>> will be left busied. The KPI doesn't guarantee that rest
>> of the pages is still in place. The pager usually calls
>> vm_page_readahead_finish() on them, which can either free,
>> or put the page on active/inactive queue, using quite
>> a strange approach to choose a queue.
>> 5) The pages must not be wired, since vm_page_free() may be
>> called on them. However, this is violated by several
>> consumers of KPI, relying on lack of errors in the pager.
>> Moreover, the swap pager has a special function to skip
>> wired pages, while doing the sweep, to avoid this problem.
>> So, passing wired pages to swapper is OK, while to the
>> reset is not.
>> 6) Pagers may replace a page in the object with a new one.
>> The sg_pager actually does that. To protect from this
>> event, consumers of vm_pager_get_pages() always run
>> vm_page_lookup() over the array of pages to relookup the pages.
>> However, not all consumers do this.
>>
>> Speaking of pagers and their consumers:
>> - 11 consumers request array of size 1, a single page
>> - 3 consumers actually request array
>>
>> My suggestion is to change the KPI assumptions to the following:
>>
>> 1) There is no reqpage. All pages are entered busied, all pages
>> are returned busied and validated. If pager fails to validate
>> all pages it must return error.
>> 2) The consumer (not the pager!) is to decide what to do with the
>> pages: vm_page_active, vm_page_deactivate, vm_page_flash or just
>> vm_page_free them. The consumer also unbusies pages, if it
>> wants to. The consumer is free to wire pages before the call.
>> 3) Consumers must first query the pager via vm_pager_has_page(),
>> and use the after/before hints to limit the size of the
>> requested pages array.
>> 4) In case if pager replaces pages, it must also update the array,
>> so that consumer doesn't need to do relookup.
>>
>> Doing this sweep, I also noticed that all pagers have a copy-pasted
>> code of zeroing invalid regions of partially valid pages. Also,
>> many pagers got a set of assertions copy and pasted from each
>> other. So, I decided to un-inline the vm_pager_get_pages(), bring
>> it to the vm_pager.c file and gather all these copy-pastes
>> into one place.
>>
>> The suggested patch is attached. As expected, it simplifies and
>> removes quite a lot of code.
>>
>> Right now it is tested on UFS only, testing NFS and ZFS is on my list.
>> There is one panic known, but it seems unrelated, and Peter pho@ says
>> that once it has been seen before.
> Below is the summary of my part of the internal discussion about the changes.
>
> Traditionally, Unix allows the filesystems to perform the short reads.
> Most fundamental change in the patch removes this freedom from the
> filesystem implementation, and I think that only local filesystems could
> be compliant with the proposed strictness.
>
> IMO, the response from vm_pager_haspages() is only advisory, since
> filesystem might not control the external entities which are the source
> of the required data.
Also since the backing object is not locked, a truncate() may be performed
between the operations making the prior return information invalid.
Certainly in remote filesystems, possibly on local ones too.
>
> _______________________________________________
> freebsd-arch at freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-arch
> To unsubscribe, send any mail to "freebsd-arch-unsubscribe at freebsd.org"
>
More information about the freebsd-arch
mailing list