Date: Sun, 14 Apr 2024 17:47:56 +0200
From: Andreas Kempe
To: "freebsd-fs@freebsd.org"
Subject: Automount + NFS hang issues (follow-up to FreeBSD 12.3/13.1 NFS client hang thread)
List-Archive: https://lists.freebsd.org/archives/freebsd-fs
Hello,

This is a follow-up to the thread "FreeBSD 12.3/13.1 NFS client hang", message ID YpEwxdGCouUUFHiE@shipon.lysator.liu.se.

The issues kept recurring after that thread, and I never managed to capture a good tcpdump of a hang, so I decided to simply get rid of automount and mount the NFS shares via fstab instead. With the shares mounted via fstab, a restart of the NFS server causes processes using the mount to hang, but they recover once the server is back.

When using automount, as the NFS server becomes unresponsive, the system log fills with lines like:

Apr 10 13:00:14 shipon kernel: WARNING: autofs_trigger_one: request for /home/ completed with error 60, pid 68836 (fish)
Apr 10 13:00:14 shipon kernel: WARNING: autofs_trigger_one: request for /home/ completed with error 60, pid 69248 (sshd)
Apr 10 13:00:14 shipon kernel: WARNING: autofs_trigger_one: request for /home/ completed with error 60, pid 2221 (weechat)

Error 60 is ETIMEDOUT ("Operation timed out") on FreeBSD. It appears that automount repeatedly retries the mounts until the system eventually hangs. Once the system has hung, all automount processes are stuck in the kernel, in uninterruptible sleep in the NFS code.
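The warnings above follow a fixed format, so tallying them per path, error and process can show which programs keep re-triggering autofs while the server is unreachable. The following is a minimal, hypothetical Python sketch (not part of FreeBSD or of the original report), assuming the log format shown above; it hard-codes error 60 as ETIMEDOUT because that is its value on FreeBSD, while other systems number errno differently.

```python
import re
import sys
from collections import Counter

# Matches kernel warnings in the format shown above, e.g.
# "... autofs_trigger_one: request for /home/ completed with error 60, pid 68836 (fish)"
WARNING_RE = re.compile(
    r"autofs_trigger_one: request for (?P<path>\S+) completed with error "
    r"(?P<errno>\d+), pid (?P<pid>\d+) \((?P<comm>[^)]*)\)"
)

# Hard-coded FreeBSD errno name: ETIMEDOUT is 60 on FreeBSD (110 on Linux),
# so os.strerror() on another platform would give the wrong answer.
FREEBSD_ERRNO_NAMES = {60: "ETIMEDOUT"}


def summarize(lines):
    """Count failed autofs trigger requests per (path, errno name, command)."""
    counts = Counter()
    for line in lines:
        m = WARNING_RE.search(line)
        if m is None:
            continue  # not an autofs_trigger_one warning
        err = int(m.group("errno"))
        name = FREEBSD_ERRNO_NAMES.get(err, f"errno {err}")
        counts[(m.group("path"), name, m.group("comm"))] += 1
    return counts


if __name__ == "__main__":
    if len(sys.argv) > 1:
        # A log file, e.g. /var/log/messages under a default syslog setup (assumption).
        with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
            lines = f.readlines()
    else:
        # Fall back to the three sample lines quoted in this message.
        prefix = "Apr 10 13:00:14 shipon kernel: WARNING: autofs_trigger_one: "
        lines = [
            prefix + "request for /home/ completed with error 60, pid 68836 (fish)",
            prefix + "request for /home/ completed with error 60, pid 69248 (sshd)",
            prefix + "request for /home/ completed with error 60, pid 2221 (weechat)",
        ]
    for (path, err, comm), n in sorted(summarize(lines).items()):
        print(f"{path} {err} {comm}: {n}")
```

Grouping by command name rather than pid is a deliberate choice here: a shell or daemon that is restarted gets a new pid, but its repeated triggers still add up under one name.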
You can find some stack traces in the old thread. Sometimes running umount -N on all the mounts would resolve the issue, but often a system reboot was the only way to recover.

Best regards,
Andreas Kempe