Process stuck in "vnread"

Mon Mar 28 17:19:54 UTC 2016

On 28/03/2016 19:23, Konstantin Belousov wrote:
> On Mon, Mar 28, 2016 at 08:52:03AM -0700, Maxim Sobolev wrote:
>> Done some head scratching, it looks like it got a page fault in
>> copyin() (cp(1) AFAIK mmaps the source file). There might be some interlock
>> issue between competing writes to the same ZFS; the md0 device is locked
>> forever waiting for the write operation to complete at the very same time.
>> I am curious as to whether we are allowed to sleep in dmu_write_uio_dbuf().
>> AFAIK dmu is ZFS's transaction layer, so maybe copyin() should be done
>> earlier to avoid a possible page fault in there?

Maxim,

Is this a copy from UFS to ZFS?
It looks that way, because the copyin() fault goes to
vnode_pager_generic_getpages() -> bwait()...

> No idea about ZFS, but if the issue is due to copyin(9) recursing into
> VM and then VFS while owning file system locks, it is a well-known and
> long-standing issue. I sometimes call it 'ups deadlock', for some
> reasons; see tools/test/upsdl/ for the distilled test case.
>
> It is handled for UFS and NFS; read the long comment starting with 'The
> vn_io_fault() is a wrapper' in sys/kern/vfs_vnops.c, which describes the
> deadlock in detail and explains the mechanism which is used to prevent
> it. Filesystems must opt in to it by specifying the MNTK_NO_IOPF flag,
> and then be ready to get an array of pages for I/O instead of the buffer
> KVA.

I don't have any idea why the thread would be stuck in bwait() or what locks
and threads are involved here.  But, as Kostik said, there is a general problem
and I have a patch for ZFS:
https://reviews.freebsd.org/D2790
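
For reference, the user-space side of the pattern discussed above can be
sketched as follows. This is only an illustration, not the code from
tools/test/upsdl/ and not what cp(1) literally does: it assumes the source
is mmap'ed and then handed directly to write(2), so that the kernel's
copyin() may have to fault pages in from the source file while the
destination file system is in the middle of a write. The function name
copy_via_mmap() is made up for this sketch.

```python
import mmap
import os
import tempfile


def copy_via_mmap(src_path, dst_path):
    """Copy src to dst by passing an mmap of src straight to write(2).

    Hypothetical helper for illustration only. The buffer given to
    os.write() points into the mapping, so the kernel may take a page
    fault on the source file while it is copying data in for the write.
    """
    with open(src_path, "rb") as src:
        size = os.fstat(src.fileno()).st_size
        if size == 0:
            # mmap cannot map an empty file; just create an empty copy.
            open(dst_path, "wb").close()
            return 0
        # The memoryview is released before the mapping is closed,
        # because context managers exit in reverse order.
        with mmap.mmap(src.fileno(), size, access=mmap.ACCESS_READ) as mapping, \
                memoryview(mapping) as view:
            fd = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
            try:
                written = 0
                while written < size:
                    # write(2) may be short; continue from where it stopped.
                    written += os.write(fd, view[written:])
            finally:
                os.close(fd)
    return written


if __name__ == "__main__":
    # Small self-check on temporary files.
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src")
        dst = os.path.join(tmp, "dst")
        with open(src, "wb") as f:
            f.write(b"x" * 65536)
        print(copy_via_mmap(src, dst), "bytes copied")
```

On a single file system with nothing else going on this simply copies the
file. The deadlock needs the fault on the mapped source to recurse into VM
and VFS while the write path already owns file system locks, as described
in the quoted text.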

-- 
Andriy Gapon