List-Id: Discussions about the use of FreeBSD-current
List-Archive: https://lists.freebsd.org/archives/freebsd-current
From: Alan Somers
Date: Sun, 12 Dec 2021 09:45:06 -0700
Subject: Re: CURRENT: ZFS freezes system beyond reboot
To: FreeBSD User
Cc: FreeBSD CURRENT

On Sun, Dec 12, 2021 at 2:22 AM FreeBSD User wrote:
>
> Running CURRENT (FreeBSD 14.0-CURRENT #52 main-n251260-156fbc64857: Thu
> Dec 2 14:45:55 CET 2021 amd64), all of a sudden the ZFS RAIDZ pool
> suffered from an error:
>
> Solaris: WARNING: Pool 'POOL00' has encountered an uncorrectable I/O
> failure and has been suspended.
>
> The system no longer responds on that pool; transactions to and
> from that pool are frozen, and the system is 99.9% idle.
> The most "not so funny" part is: the box doesn't even respond to a
> "shutdown -r now" or a brute-force "reboot". I can still log in via ssh,
> but any action involving the ZFS pool freezes the console/terminal.
>
> ZFS very often renders the system unresponsive forever. How can this
> be mitigated? The system in question is at a remote site, and the problem
> does not seem to be limited to CURRENT; we have seen similar problems on
> 13-STABLE as well.
>
> What can I do to "unfreeze" the ZFS? The main OS is, luckily, on a
> UFS/FFS filesystem and so not affected by that problem.
>
> By the way, here are some more details, as far as I can gather them:
>
> zpool clear POOL00
> cannot clear errors for POOL00: I/O error
>
> Whatever took out the ZFS pool (I cannot see any hardware errors; the
> pool is part of services, especially a poudriere build system, and is
> under heavy load all the time; the box has 16 GB RAM), it also renders
> the rest of the system unusable in a way that is beyond a "reboot".
>
> Kind regards,
> oh

You need to look at what's causing those errors. What kind of disks
are you using, with what HBA? It's not surprising that any access to
ZFS hangs; that's what it's designed to do when a pool is suspended.
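
As a rough starting point for answering those questions, something like
the sketch below might help. The function name diagnose_pool is
hypothetical, and the selection of commands is only an assumption about
what is useful here; zpool status, zpool get, camcontrol devlist and
dmesg are the stock FreeBSD tools. Note that on a suspended pool some
zpool commands may themselves block, so run this from a session you can
afford to lose.

```shell
#!/bin/sh
# Hypothetical helper: gather read-only diagnostics for a suspended pool.
# It changes nothing; it only prints what the system already knows.
diagnose_pool() {
    pool="$1"
    if [ -z "$pool" ]; then
        echo "usage: diagnose_pool POOLNAME" >&2
        return 2
    fi
    if ! command -v zpool >/dev/null 2>&1; then
        echo "zpool not found in PATH" >&2
        return 127
    fi

    # Pool state plus per-device read/write/checksum error counters.
    zpool status -v "$pool"

    # failmode decides the behaviour on catastrophic pool failure:
    #   wait     (default) block I/O until the devices return and errors are cleared
    #   continue return EIO to new write requests, allow reads from healthy devices
    #   panic    print a message and crash the host
    zpool get failmode "$pool"

    # Attached disks and the bus/controller they sit behind (FreeBSD CAM).
    camcontrol devlist

    # Recent kernel messages; look for CAM errors, timeouts or retries.
    dmesg | tail -n 50
}
```

It would be invoked as "diagnose_pool POOL00". The failmode output shows
why the hang is expected with the default setting of wait; whether a
different setting suits a remote box is a separate judgment call, and
changing it is not known to release a pool that is already suspended.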