From nobody Thu Jun 06 03:43:42 2024 X-Original-To: dev-commits-src-all@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 4Vvqtk3BF7z5MvsT; Thu, 06 Jun 2024 03:43:42 +0000 (UTC) (envelope-from git@FreeBSD.org) Received: from mxrelay.nyi.freebsd.org (mxrelay.nyi.freebsd.org [IPv6:2610:1c1:1:606c::19:3]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (4096 bits) client-digest SHA256) (Client CN "mxrelay.nyi.freebsd.org", Issuer "R3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4Vvqtk20bkz4hq3; Thu, 6 Jun 2024 03:43:42 +0000 (UTC) (envelope-from git@FreeBSD.org) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=freebsd.org; s=dkim; t=1717645422; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding; bh=6mIyLcUwazqOAbqk6Q5K/4ujyDrAO1FVC2cHTeonxQc=; b=b/PTBwe50Z32sibVoEXnDpnKbdv0BqDp29KPZHHLFeLAeenUnKhCttxLvFpm4oF+smmKtf YXjPIMMIPLZHhEautVF2tcgBlRQqdJCVuINDLfm/t8r7tomUjbCgpzPj6OGha36VVQaRGJ SdUtecC8fx9x/7DCa1EvBH5vQ8Kzk/dhWXG/XzlEscUsPS/tkJAeCKElPl/6V17cMnXFfe 1D5/7FTyB29wbbOhNob+MYj/XzA+66TieQIYG18EoIxQk9ozVijdMiutjNKXn0+xDI+h2r HihADjndYy496IdUfYkAqCutJDWnZwyZOM2hAaIQDJBLLzoXi87EdTVE5MmB6g== ARC-Seal: i=1; s=dkim; d=freebsd.org; t=1717645422; a=rsa-sha256; cv=none; b=K8Iw0UFAL2MqJ63yKvnKjhOudM7PSQ3f35JYjcO+SmyjXKl2R9eP1L512SQ1kI/K9y6quI 9L3uwCrzn+bypwWk7kcx/m+750P+EhFdbiKn21WLZMHs3r+JF0PxqlkTsAlC7nJfh39Pxw Y56rjtRqg2JgKGXjHPhfIweFS+26d+CcbrmaHbVU15ykVjTPJ44QfIhtOwtHS4dTWRl9yO TD8wP5ZpyV91E07DWaVGVVOGupw/gHvJ8Y05z3p0sX8U6j8W2SbDqmYlXS/hGFKtjsou2x 1da1RcFl+lL24kYJGO8XvDFnJrMnfFNXsxxQqlT/+I3XduGXx65bDpTM1rGAsg== ARC-Authentication-Results: i=1; mx1.freebsd.org; none ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=freebsd.org; s=dkim; t=1717645422; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding; bh=6mIyLcUwazqOAbqk6Q5K/4ujyDrAO1FVC2cHTeonxQc=; b=rArk9xXsshF9ThYjMHf0ClCcUAK+R/3CnPshvPV3t/Uib9G5CXmQNesMBIeVaAPRjJcV6K hiKHZLtcIg1t0GR14VA10TpvtNz1vQgiKiuLNgelCDH+ucztkFr/rPFMJ1QvF9YtHCC0Gk slCnZMYVCm3Ml0mZ4fdqCftFdtmCT91nryok05EDFzM/owMm/T7w9n/BMW9nzb9DkmDO0N IKFLP5goBJW1/ZQvxO0JHI6zz8BWZ0T8IKZ79aTFFBil9ZkMLhqOka845Hz96qexh5j2aH Dd0YEbuxWGDFquMFYLe6ElvWlnWPAJAroSva2ncX8JnQcRr390ccLP77CCwMXA== Received: from gitrepo.freebsd.org (gitrepo.freebsd.org [IPv6:2610:1c1:1:6068::e6a:5]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (Client did not present a certificate) by mxrelay.nyi.freebsd.org (Postfix) with ESMTPS id 4Vvqtk1c1DzV5D; Thu, 6 Jun 2024 03:43:42 +0000 (UTC) (envelope-from git@FreeBSD.org) Received: from gitrepo.freebsd.org ([127.0.1.44]) by gitrepo.freebsd.org (8.17.1/8.17.1) with ESMTP id 4563hg1Q089365; Thu, 6 Jun 2024 03:43:42 GMT (envelope-from git@gitrepo.freebsd.org) Received: (from git@localhost) by gitrepo.freebsd.org (8.17.1/8.17.1/Submit) id 4563hg37089362; Thu, 6 Jun 2024 03:43:42 GMT (envelope-from git) Date: Thu, 6 Jun 2024 03:43:42 GMT Message-Id: <202406060343.4563hg37089362@gitrepo.freebsd.org> To: src-committers@FreeBSD.org, dev-commits-src-all@FreeBSD.org, dev-commits-src-main@FreeBSD.org From: Ryan Libby Subject: git: 780666c09bf9 - main - getblk: reduce time under bufobj lock List-Id: Commit messages for all branches of the src repository List-Archive: https://lists.freebsd.org/archives/dev-commits-src-all List-Help: List-Post: List-Subscribe: List-Unsubscribe: X-BeenThere: dev-commits-src-all@freebsd.org Sender: owner-dev-commits-src-all@FreeBSD.org MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit X-Git-Committer: rlibby X-Git-Repository: src X-Git-Refname: refs/heads/main X-Git-Reftype: branch X-Git-Commit: 780666c09bf97d6e92b995b831c511b9ee37872c Auto-Submitted: auto-generated The branch main has been updated by rlibby: URL: https://cgit.FreeBSD.org/src/commit/?id=780666c09bf97d6e92b995b831c511b9ee37872c commit 780666c09bf97d6e92b995b831c511b9ee37872c Author: Ryan Libby AuthorDate: 2024-06-06 03:21:22 +0000 Commit: Ryan Libby CommitDate: 2024-06-06 03:21:22 +0000 getblk: reduce time under bufobj lock Use the new pctrie combined insert/lookup facility to reduce work and time under the bufobj interlock when associating a buf with a vnode. We now do one lookup in the dirty tree and one combined lookup/insert in the clean tree instead of one lookup in dirty, two in clean, and then an insert in clean. We also avoid touching the possibly unrelated buf at the tail of the queue. Also correct an issue where the actual order of the tail queue depended on the insertion order due to sign issues. Reviewed by: kib (previous version), dougm, markj Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D45395 --- sys/kern/vfs_bio.c | 38 ++++++++---------- sys/kern/vfs_subr.c | 111 +++++++++++++++++++++++++++++++++++++++------------- sys/sys/buf.h | 2 +- 3 files changed, 100 insertions(+), 51 deletions(-) diff --git a/sys/kern/vfs_bio.c b/sys/kern/vfs_bio.c index 4d5e3a014050..4f1df9711dec 100644 --- a/sys/kern/vfs_bio.c +++ b/sys/kern/vfs_bio.c @@ -4231,35 +4231,29 @@ newbuf_unlocked: } /* - * This code is used to make sure that a buffer is not - * created while the getnewbuf routine is blocked. - * This can be a problem whether the vnode is locked or not. - * If the buffer is created out from under us, we have to - * throw away the one we just created. * - * Note: this must occur before we associate the buffer - * with the vp especially considering limitations in - * the splay tree implementation when dealing with duplicate - * lblkno's. - */ - BO_LOCK(bo); - if (gbincore(bo, blkno)) { - BO_UNLOCK(bo); - bp->b_flags |= B_INVAL; - bufspace_release(bufdomain(bp), maxsize); - brelse(bp); - goto loop; - } - - /* * Insert the buffer into the hash, so that it can * be found by incore. + * + * We don't hold the bufobj interlock while allocating the new + * buffer. Consequently, we can race on buffer creation. This + * can be a problem whether the vnode is locked or not. If the + * buffer is created out from under us, we have to throw away + * the one we just created. */ bp->b_lblkno = blkno; bp->b_blkno = d_blkno; bp->b_offset = offset; - bgetvp(vp, bp); - BO_UNLOCK(bo); + error = bgetvp(vp, bp); + if (error != 0) { + KASSERT(error == EEXIST, + ("getblk: unexpected error %d from bgetvp", + error)); + bp->b_flags |= B_INVAL; + bufspace_release(bufdomain(bp), maxsize); + brelse(bp); + goto loop; + } /* * set B_VMIO bit. allocbuf() the buffer bigger. Since the diff --git a/sys/kern/vfs_subr.c b/sys/kern/vfs_subr.c index 8f0b00a87cb5..398eda7ed897 100644 --- a/sys/kern/vfs_subr.c +++ b/sys/kern/vfs_subr.c @@ -2697,12 +2697,12 @@ buf_vlist_remove(struct buf *bp) } /* - * Add the buffer to the sorted clean or dirty block list. - * - * NOTE: xflags is passed as a constant, optimizing this inline function! + * Add the buffer to the sorted clean or dirty block list. Return zero on + * success, EEXIST if a buffer with this identity already exists, or another + * error on allocation failure. */ -static void -buf_vlist_add(struct buf *bp, struct bufobj *bo, b_xflags_t xflags) +static inline int +buf_vlist_find_or_add(struct buf *bp, struct bufobj *bo, b_xflags_t xflags) { struct bufv *bv; struct buf *n; @@ -2713,30 +2713,69 @@ buf_vlist_add(struct buf *bp, struct bufobj *bo, b_xflags_t xflags) ("buf_vlist_add: bo %p does not allow bufs", bo)); KASSERT((xflags & BX_VNDIRTY) == 0 || (bo->bo_flag & BO_DEAD) == 0, ("dead bo %p", bo)); - KASSERT((bp->b_xflags & (BX_VNDIRTY|BX_VNCLEAN)) == 0, - ("buf_vlist_add: Buf %p has existing xflags %d", bp, bp->b_xflags)); - bp->b_xflags |= xflags; + KASSERT((bp->b_xflags & (BX_VNDIRTY | BX_VNCLEAN)) == xflags, + ("buf_vlist_add: b_xflags %#x not set on bp %p", xflags, bp)); + if (xflags & BX_VNDIRTY) bv = &bo->bo_dirty; else bv = &bo->bo_clean; - /* - * Keep the list ordered. Optimize empty list insertion. Assume - * we tend to grow at the tail so lookup_le should usually be cheaper - * than _ge. - */ - if (bv->bv_cnt == 0 || - bp->b_lblkno > TAILQ_LAST(&bv->bv_hd, buflists)->b_lblkno) - TAILQ_INSERT_TAIL(&bv->bv_hd, bp, b_bobufs); - else if ((n = BUF_PCTRIE_LOOKUP_LE(&bv->bv_root, bp->b_lblkno)) == NULL) + error = BUF_PCTRIE_INSERT_LOOKUP_LE(&bv->bv_root, bp, &n); + if (n == NULL) { + KASSERT(error != EEXIST, + ("buf_vlist_add: EEXIST but no existing buf found: bp %p", + bp)); + } else { + KASSERT((uint64_t)n->b_lblkno <= (uint64_t)bp->b_lblkno, + ("buf_vlist_add: out of order insert/lookup: bp %p n %p", + bp, n)); + KASSERT((n->b_lblkno == bp->b_lblkno) == (error == EEXIST), + ("buf_vlist_add: inconsistent result for existing buf: " + "error %d bp %p n %p", error, bp, n)); + } + if (error != 0) + return (error); + + /* Keep the list ordered. */ + if (n == NULL) { + KASSERT(TAILQ_EMPTY(&bv->bv_hd) || + (uint64_t)bp->b_lblkno < + (uint64_t)TAILQ_FIRST(&bv->bv_hd)->b_lblkno, + ("buf_vlist_add: queue order: " + "%p should be before first %p", + bp, TAILQ_FIRST(&bv->bv_hd))); TAILQ_INSERT_HEAD(&bv->bv_hd, bp, b_bobufs); - else + } else { + KASSERT(TAILQ_NEXT(n, b_bobufs) == NULL || + (uint64_t)bp->b_lblkno < + (uint64_t)TAILQ_NEXT(n, b_bobufs)->b_lblkno, + ("buf_vlist_add: queue order: " + "%p should be before next %p", + bp, TAILQ_NEXT(n, b_bobufs))); TAILQ_INSERT_AFTER(&bv->bv_hd, n, bp, b_bobufs); - error = BUF_PCTRIE_INSERT(&bv->bv_root, bp); - if (error) - panic("buf_vlist_add: Preallocated nodes insufficient."); + } + bv->bv_cnt++; + return (0); +} + +/* + * Add the buffer to the sorted clean or dirty block list. + * + * NOTE: xflags is passed as a constant, optimizing this inline function! + */ +static void +buf_vlist_add(struct buf *bp, struct bufobj *bo, b_xflags_t xflags) +{ + int error; + + KASSERT((bp->b_xflags & (BX_VNDIRTY | BX_VNCLEAN)) == 0, + ("buf_vlist_add: Buf %p has existing xflags %d", bp, bp->b_xflags)); + bp->b_xflags |= xflags; + error = buf_vlist_find_or_add(bp, bo, xflags); + if (error) + panic("buf_vlist_add: error=%d", error); } /* @@ -2775,26 +2814,42 @@ gbincore_unlocked(struct bufobj *bo, daddr_t lblkno) /* * Associate a buffer with a vnode. */ -void +int bgetvp(struct vnode *vp, struct buf *bp) { struct bufobj *bo; + int error; bo = &vp->v_bufobj; - ASSERT_BO_WLOCKED(bo); + ASSERT_BO_UNLOCKED(bo); VNASSERT(bp->b_vp == NULL, bp->b_vp, ("bgetvp: not free")); CTR3(KTR_BUF, "bgetvp(%p) vp %p flags %X", bp, vp, bp->b_flags); VNASSERT((bp->b_xflags & (BX_VNDIRTY|BX_VNCLEAN)) == 0, vp, ("bgetvp: bp already attached! %p", bp)); - vhold(vp); - bp->b_vp = vp; - bp->b_bufobj = bo; /* - * Insert onto list for new vnode. + * Add the buf to the vnode's clean list unless we lost a race and find + * an existing buf in either dirty or clean. */ - buf_vlist_add(bp, bo, BX_VNCLEAN); + bp->b_vp = vp; + bp->b_bufobj = bo; + bp->b_xflags |= BX_VNCLEAN; + error = EEXIST; + BO_LOCK(bo); + if (BUF_PCTRIE_LOOKUP(&bo->bo_dirty.bv_root, bp->b_lblkno) == NULL) + error = buf_vlist_find_or_add(bp, bo, BX_VNCLEAN); + BO_UNLOCK(bo); + if (__predict_true(error == 0)) { + vhold(vp); + return (0); + } + if (error != EEXIST) + panic("bgetvp: buf_vlist_add error: %d", error); + bp->b_vp = NULL; + bp->b_bufobj = NULL; + bp->b_xflags &= ~BX_VNCLEAN; + return (error); } /* diff --git a/sys/sys/buf.h b/sys/sys/buf.h index cee9547912a6..832cfaa617a5 100644 --- a/sys/sys/buf.h +++ b/sys/sys/buf.h @@ -603,7 +603,7 @@ void vfs_unbusy_pages(struct buf *); int vmapbuf(struct buf *, void *, size_t, int); void vunmapbuf(struct buf *); void brelvp(struct buf *); -void bgetvp(struct vnode *, struct buf *); +int bgetvp(struct vnode *, struct buf *) __result_use_check; void pbgetbo(struct bufobj *bo, struct buf *bp); void pbgetvp(struct vnode *, struct buf *); void pbrelbo(struct buf *);