From: Warner Losh
Date: Tue, 5 Apr 2022 18:24:40 -0600
Subject: Re: Hour-long sleeps in the ZFS write throttle: fix for 13.1 ?
To: Alan Somers
Cc: freebsd-fs

On Tue, Apr 5, 2022 at 3:06 PM Alan Somers <asomers@freebsd.org> wrote:

> All year long I've occasionally seen my ZFS processes get blocked in
> dmu_tx_wait.  They stay blocked for more than an hour but eventually
> recover.  I finally found the cause: an integer overflow bug in
> ustosbt.  The fix is simple enough, but my question is: should we try
> to commit this in time for 13.1-RELEASE?  It's a very disruptive bug,
> but also very hard to trigger.  It takes a pretty highly congested ZFS
> system to trigger it.  In theory the bug could affect other
> subsystems, too.
>
> https://github.com/openzfs/zfs/issues/13289
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=263073

These routines were originally not meant for large times (> 1 s).
However, that limitation was poorly documented, so I changed them to
handle large values, but I did so incorrectly. In the bug, I've posted
what I think is the fix (it also matches Alan's description).

Warner