[Bug 246886] [sendfile] Nginx + NFS or FUSE causes VM stall

From: <bugzilla-noreply_at_freebsd.org>
Date: Fri, 17 Jun 2022 16:52:41 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=246886

--- Comment #70 from firk@cantconnect.ru ---
So, the source of the problem seems to be base r337165. I didn't test the
revisions around it, but I did test a rollback of this specific commit, setting
f_iosize back to PAGE_SIZE=4096, and the problem is gone. But this is not the
bug itself; it is just a trigger for other problem(s).

As already noted, the backtrace for the deadlock is:

 vm_page_grab_pages+0x3f2
 allocbuf+0x371 (vfs_vmio_extend inlined inside)
 getblkx+0x5be
 breadn_flags+0x3d
 vfs_bio_getpages+0x403
 fuse_vnop_getpages+0x46
 VOP_GETPAGES_APV+0x7b
 vop_stdgetpages_async+0x49
 VOP_GETPAGES_ASYNC_APV+0x7b
 vnode_pager_getpages_async+0x7d
 vn_sendfile+0xdf2 (sendfile_swapin inlined inside)
 sendfile+0x12b
 amd64_syscall+0x387
 fast_syscall_common+0xf8

What happens:

1) sendfile_swapin() grabs and exclusively busies a bunch of pages via
vm_page_grab_pages();

2) it then scans them sequentially, unbusies the already loaded ones, and calls
vm_pager_get_pages_async() for the not yet loaded ones, which should load them
and call the sendfile_iodone() callback; that is how it worked in 11.x;

3) vm_pager_get_pages_async() calls some other nested functions, and now we are
in vfs_bio_getpages(). Note: despite the "async" name, all of this is done
synchronously;

4) vfs_bio_getpages() still has the vm_page[] array and its size as arguments,
passed unchanged straight from sendfile_swapin(); it downgrades the
exclusive-busy state to shared-busy for the given page range;

5) the next step (bread_gb -> breadn_flags) is done using the block index and
size obtained from the fusefs driver via the get_lblkno() and get_blksize()
callbacks, and the new block size is 65536 by default. Going through getblkx()
-> allocbuf() -> vfs_vmio_extend(), the last one calls vm_page_grab_pages()
again, but not for the requested range: it uses the range that matches the
fusefs block size, effectively aligned to a 16-page boundary (65536 = 16*4096).
This leads to a deadlock because the pages after the currently requested ones
are still exclusively busy (see step 2).
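
To make the range arithmetic in step 5 concrete, here is a small userspace
sketch. It is not kernel code: the helper name block_page_range() and the
struct are made up for this illustration, and the only numbers taken from the
analysis above are the 4096-byte page and the 65536-byte fusefs block size.

```c
#include <assert.h>

#define PAGE_SIZE 4096UL

/* Hypothetical type: a run of page indices within one file. */
struct page_range {
	unsigned long first;	/* index of the first page in the block */
	unsigned long count;	/* number of pages the block spans */
};

/*
 * Hypothetical helper: return the pages spanned by the filesystem block
 * that contains page `pidx`, for a block size of `iosize` bytes.
 * Assumes iosize is a non-zero multiple of PAGE_SIZE.
 */
static struct page_range
block_page_range(unsigned long pidx, unsigned long iosize)
{
	unsigned long pages_per_block;
	struct page_range r;

	assert(iosize != 0 && iosize % PAGE_SIZE == 0);
	pages_per_block = iosize / PAGE_SIZE;

	/* Round the page index down to the start of its block. */
	r.first = pidx - (pidx % pages_per_block);
	r.count = pages_per_block;
	return (r);
}
```

With iosize = 65536, a request for page 21 alone maps to pages 16..31, so the
nested grab touches 15 pages that were never part of the request. With
iosize = 4096 the same call returns just page 21.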

What could be fixed:

1) easiest: roll back iosize to PAGE_SIZE, but this will bring I/O speed back
down to what it was before

2) rework sendfile_swapin() to first scan the entire range to see which pages
are loaded, and only then issue the queued vm_pager_get_pages_async() calls; I
don't think this is good, because everything already works when fusefs/nfs is
not used.

3) make the "async" functions really async (see step 3) for fusefs; I don't
know whether that is easy or not. This would resolve the deadlock too, because
vfs_bio_getpages() would not block the sequential scan of the requested pages
by sendfile_swapin()

4) prevent partially loaded filesystem f_iosize blocks from happening; again, I
don't know whether that is easy, or even desirable.
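
Why option 1 hides the problem can be shown with a toy userspace model of the
busy states. This is a sketch under strong simplifications, not the kernel
logic: first_stuck_page() and the enum are invented names, only the pages of
one sendfile request are modelled, pages outside that request are ignored, and
a read that is not blocked completes immediately.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGE_SIZE 4096UL
#define MAX_PAGES 64

enum busy_state { UNBUSIED, SHARED_BUSY, EXCL_BUSY };

/*
 * Toy model of the single-pass scan.  loaded[i] says whether page i is
 * already valid.  Returns the index of the first page on which the nested
 * block-sized grab would wait on our own exclusive busy, or -1 if the scan
 * runs to completion.
 */
static long
first_stuck_page(const bool *loaded, size_t npages, unsigned long iosize)
{
	enum busy_state busy[MAX_PAGES];
	size_t ppb = iosize / PAGE_SIZE;	/* pages per filesystem block */
	size_t i, j, k, start, end;

	assert(npages <= MAX_PAGES && ppb > 0 && iosize % PAGE_SIZE == 0);

	/* Step 1: every page of the request starts out exclusively busied. */
	for (i = 0; i < npages; i++)
		busy[i] = EXCL_BUSY;

	for (i = 0; i < npages; i = j) {
		if (loaded[i]) {
			/* Step 2: an already loaded page is simply released. */
			busy[i] = UNBUSIED;
			j = i + 1;
			continue;
		}
		/* Steps 2 and 4: take the run [i, j) needing I/O, downgrade it. */
		for (j = i; j < npages && !loaded[j]; j++)
			busy[j] = SHARED_BUSY;

		/* Step 5: the nested grab covers whole blocks around the run. */
		start = i - (i % ppb);
		end = ((j - 1) / ppb + 1) * ppb;
		if (end > npages)
			end = npages;
		for (k = start; k < end; k++)
			if (busy[k] == EXCL_BUSY)
				return ((long)k);

		/* Nothing blocked: pretend the read finished, release the run. */
		for (k = i; k < j; k++)
			busy[k] = UNBUSIED;
	}
	return (-1);
}
```

For a request of four pages where only page 0 still needs I/O, the model gets
stuck on page 1 with iosize = 65536 and finishes with iosize = 4096, which
matches the observation that rolling f_iosize back makes the stall go away.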

PS: I don't know whether any of this behaves the same way in 13.x.
