Re: Hour-long sleeps in the ZFS write throttle: fix for 13.1 ?

From: Warner Losh <imp_at_bsdimp.com>
Date: Wed, 06 Apr 2022 00:24:40 UTC

On Tue, Apr 5, 2022 at 3:06 PM Alan Somers <asomers@freebsd.org> wrote:

> All year long I've occasionally seen my ZFS processes get blocked in
> dmu_tx_wait.  They stay blocked for more than an hour but eventually
> recover.  I finally found the cause: an integer overflow bug in
> ustosbt.  The fix is simple enough, but my question is: should we try
> to commit this in time for 13.1-RELEASE?  It's a very disruptive bug,
> but also very hard to trigger.  It takes a pretty highly congested ZFS
> system to trigger it.  In theory the bug could affect other
> subsystems, too.
>
> https://github.com/openzfs/zfs/issues/13289
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=263073


These routines were originally not meant for large times (> 1s). However,
that was poorly documented, so I fixed it, but did so incorrectly.
If you look at the bug, I've posted what I think is the fix (it also matches
Alan's description).
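
To illustrate the kind of overflow being discussed, here is a minimal
userland sketch. It is not the FreeBSD code and not the fix posted in the
bug. The names sbt_t, SBT_ONE_SEC, US_SCALE, us_to_sbt_naive and
us_to_sbt_split are hypothetical, and it assumes a 32.32 fixed-point time
value and a scale constant of ceil(2^63 / 1000000). With a single 64-bit
multiply the product wraps once the input reaches about 2 seconds. Splitting
off whole seconds first keeps the multiplied value below 1000000.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a 32.32 fixed-point time value. */
typedef int64_t sbt_t;

#define SBT_ONE_SEC ((sbt_t)1 << 32)
#define US_PER_SEC  1000000
/* Assumed scale: ceil(2^63 / 1000000), so (us * US_SCALE) >> 31 ~= us * 2^32 / 1e6. */
#define US_SCALE    UINT64_C(9223372036855)

/*
 * Naive conversion: one multiply for the whole input.
 * us * US_SCALE wraps modulo 2^64 once us reaches about 2000000.
 */
static sbt_t
us_to_sbt_naive(uint64_t us)
{
	return ((sbt_t)((us * US_SCALE + 0x7fffffff) >> 31));
}

/*
 * Split conversion: whole seconds are scaled by a plain multiply by 2^32,
 * and only the sub-second remainder goes through the scaled multiply.
 */
static sbt_t
us_to_sbt_split(uint64_t us)
{
	sbt_t sb = (sbt_t)(us / US_PER_SEC) * SBT_ONE_SEC;
	uint64_t frac = us % US_PER_SEC;

	/* frac < 1000000, so frac * US_SCALE stays far below 2^64. */
	assert(frac < US_PER_SEC);
	return (sb + (sbt_t)((frac * US_SCALE + 0x7fffffff) >> 31));
}
```

In this sketch the two functions agree for sub-second inputs and diverge
for an input such as 3000000 us, where the naive product has already
wrapped. The important detail is that the split threshold is expressed in
the input's own unit (microseconds here), not in the fixed-point unit.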

Warner