Re: git: 5b353925ff61 - main - vnode read(2)/write(2): acquire rangelock regardless of do_vn_io_fault()
- Reply: Konstantin Belousov : "Re: git: 5b353925ff61 - main - vnode read(2)/write(2): acquire rangelock regardless of do_vn_io_fault()"
- In reply to: Konstantin Belousov : "git: 5b353925ff61 - main - vnode read(2)/write(2): acquire rangelock regardless of do_vn_io_fault()"
- Go to: [ bottom of page ] [ top of archives ] [ this month ]
Date: Fri, 28 Jul 2023 00:17:51 UTC
On 7/25/23, Konstantin Belousov <kib@freebsd.org> wrote:
> The branch main has been updated by kib:
>
> URL:
> https://cgit.FreeBSD.org/src/commit/?id=5b353925ff61b9ddb97bb453ba75278b578ed7d9
>
> commit 5b353925ff61b9ddb97bb453ba75278b578ed7d9
> Author: Konstantin Belousov <kib@FreeBSD.org>
> AuthorDate: 2023-07-23 15:55:50 +0000
> Commit: Konstantin Belousov <kib@FreeBSD.org>
> CommitDate: 2023-07-24 22:02:59 +0000
>
> vnode read(2)/write(2): acquire rangelock regardless of
> do_vn_io_fault()
>
> To ensure atomicity of reads against parallel writes and truncates,
> vnode lock was not enough at least since introduction of vn_io_fault().
> That code only take rangelock when it was possible that vn_read() and
> vn_write() could drop the vnode lock.
>
> At least since the introduction of VOP_READ_PGCACHE() which generally
> does not lock the vnode at all, rangelocks become required even
> for filesystems that do not need vn_io_fault() workaround. For
> instance, tmpfs.
>
Is there a bug with pgcache reads disabled (as in when the vnode lock
is held for reads?)
Note this patch adds 2 lock trips which were previously not present,
which has to slow things down single-threaded, but I did not bother
measuring that part.
As this adds to vnode-wide *lock* acquires this has to very negatively
affect scalability.
This time around I ran: ./readseek3_processes -t 10 (10 workers
reading from *disjoint* offsets from the same vnode. this in principle
can scale perfectly)
I observed a 90% drop in performance:
before: total:25723459 ops/s
after: total:2455794 ops/s
Going to an unpatched kernel and disabling pgcache reads instead:
disabled: total:6522480 ops/s
or about 2.6x of performance of the current kernel
In other words I think the thing to do here is to revert the patch and
instead flip pgcache reads to off by default until a better fix can be
implemented.
> PR: 272678
> Analyzed and reviewed by: Andrew Gierth
> <andrew@tao11.riddles.org.uk>
> Sponsored by: The FreeBSD Foundation
> MFC after: 1 week
> Differential revision: https://reviews.freebsd.org/D41158
> ---
> sys/kern/vfs_vnops.c | 21 ++++++++++++++-------
> 1 file changed, 14 insertions(+), 7 deletions(-)
>
> diff --git a/sys/kern/vfs_vnops.c b/sys/kern/vfs_vnops.c
> index 83e95731d7c4..306840ff0357 100644
> --- a/sys/kern/vfs_vnops.c
> +++ b/sys/kern/vfs_vnops.c
> @@ -1443,6 +1443,7 @@ vn_io_fault(struct file *fp, struct uio *uio, struct
> ucred *active_cred,
> void *rl_cookie;
> struct vn_io_fault_args args;
> int error;
> + bool rl_locked;
>
> doio = uio->uio_rw == UIO_READ ? vn_read : vn_write;
> vp = fp->f_vnode;
> @@ -1465,12 +1466,7 @@ vn_io_fault(struct file *fp, struct uio *uio, struct
> ucred *active_cred,
> }
>
> foffset_lock_uio(fp, uio, flags);
> - if (do_vn_io_fault(vp, uio)) {
> - args.kind = VN_IO_FAULT_FOP;
> - args.args.fop_args.fp = fp;
> - args.args.fop_args.doio = doio;
> - args.cred = active_cred;
> - args.flags = flags | FOF_OFFSET;
> + if (vp->v_type == VREG) {
> if (uio->uio_rw == UIO_READ) {
> rl_cookie = vn_rangelock_rlock(vp, uio->uio_offset,
> uio->uio_offset + uio->uio_resid);
> @@ -1482,11 +1478,22 @@ vn_io_fault(struct file *fp, struct uio *uio, struct
> ucred *active_cred,
> rl_cookie = vn_rangelock_wlock(vp, uio->uio_offset,
> uio->uio_offset + uio->uio_resid);
> }
> + rl_locked = true;
> + } else {
> + rl_locked = false;
> + }
> + if (do_vn_io_fault(vp, uio)) {
> + args.kind = VN_IO_FAULT_FOP;
> + args.args.fop_args.fp = fp;
> + args.args.fop_args.doio = doio;
> + args.cred = active_cred;
> + args.flags = flags | FOF_OFFSET;
> error = vn_io_fault1(vp, uio, &args, td);
> - vn_rangelock_unlock(vp, rl_cookie);
> } else {
> error = doio(fp, uio, active_cred, flags | FOF_OFFSET, td);
> }
> + if (rl_locked)
> + vn_rangelock_unlock(vp, rl_cookie);
> foffset_unlock_uio(fp, uio, flags);
> return (error);
> }
>
--
Mateusz Guzik <mjguzik gmail.com>