From nobody Thu Oct 14 01:28:37 2021 X-Original-To: bugs@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 1DA941801239 for ; Thu, 14 Oct 2021 01:28:37 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from mxrelay.nyi.freebsd.org (mxrelay.nyi.freebsd.org [IPv6:2610:1c1:1:606c::19:3]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (4096 bits) client-digest SHA256) (Client CN "mxrelay.nyi.freebsd.org", Issuer "R3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4HVBcj0HGdz3MbP for ; Thu, 14 Oct 2021 01:28:37 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2610:1c1:1:606c::50:1d]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (Client did not present a certificate) by mxrelay.nyi.freebsd.org (Postfix) with ESMTPS id DFFA9272A for ; Thu, 14 Oct 2021 01:28:36 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org ([127.0.1.5]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id 19E1SawR073055 for ; Thu, 14 Oct 2021 01:28:36 GMT (envelope-from bugzilla-noreply@freebsd.org) Received: (from www@localhost) by kenobi.freebsd.org (8.15.2/8.15.2/Submit) id 19E1Sa32073054 for bugs@FreeBSD.org; Thu, 14 Oct 2021 01:28:36 GMT (envelope-from bugzilla-noreply@freebsd.org) X-Authentication-Warning: kenobi.freebsd.org: www set sender to bugzilla-noreply@freebsd.org using -f From: bugzilla-noreply@freebsd.org To: bugs@FreeBSD.org Subject: [Bug 259163] 12.2-p9 & -p10 NFSv3 server times out on CentOS 8 client after ~13-14 minute for file copying Date: Thu, 14 Oct 2021 01:28:37 +0000 X-Bugzilla-Reason: AssignedTo X-Bugzilla-Type: new X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: 12.2-RELEASE X-Bugzilla-Keywords: X-Bugzilla-Severity: Affects Only Me X-Bugzilla-Who: parv.0zero9+freebsd@gmail.com X-Bugzilla-Status: New X-Bugzilla-Resolution: X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: bugs@FreeBSD.org X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: bug_id short_desc product version rep_platform op_sys bug_status bug_severity priority component assigned_to reporter Message-ID: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated List-Id: Bug reports List-Archive: https://lists.freebsd.org/archives/freebsd-bugs List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-freebsd-bugs@freebsd.org MIME-Version: 1.0 X-ThisMailContainsUnwantedMimeParts: N https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D259163 Bug ID: 259163 Summary: 12.2-p9 & -p10 NFSv3 server times out on CentOS 8 client after ~13-14 minute for file copying Product: Base System Version: 12.2-RELEASE Hardware: amd64 OS: Any Status: New Severity: Affects Only Me Priority: --- Component: kern Assignee: bugs@FreeBSD.org Reporter: parv.0zero9+freebsd@gmail.com Our ZFS-based storage system tracks FreeBSD 12.2 release. A directory from = ZFS RAID-Z2 dataset is exported, with use of NFS v3 server. (NFS v4 server is n= ot being used due to the need to implement idmap(d).) FreeBSD server has this in /etc/exports ... /real/dir -alldirs,-maproot=3Droot -network IP-address/netmask ... CentOS client mounts the exported directory via autofs as ... SERVER-addr -soft,retrans=3D1,retry=3D1,vers=3D3 SERVER-addr:/real/dir= =20 After updating 12.2 to -p9 from -p8, which brought changes to libcasper, Ce= ntOS 8 NFS v3 client started seeing NFS connection timing out after ~13-14 minut= es of a file copy job. Going to -p10 did not help the situation any. Moving back to -p8 does not cause any time out issues on CentOS side. Around time of time-out, tcpdump of NFS connection on both CentOS client & FreeBSD server only showed that FreeBSD server was indeed slower to respond= in the expected time. Output on FreeBSD server due to "vfs.nfsd.debuglevel=3D2" MIB did not enlig= hten me; produced a flood of seemingly irrelevant output so it's entirely possi= ble I might have missed something. There had/have not been any messages on FreeBSD related to or around the ti= me of NFS connection time out. When CentOS client experiences NFS server time out, it puts out "CPU stuck" messages about multiple CPUs on the tty; dmesg gets ... ... 2021-08-23T21:20:08,041420 watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [kworker/u24:3:856093] 2021-08-23T21:20:08,043420 watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [kworker/u24:7:856767] 2021-08-23T21:20:08,044420 watchdog: BUG: soft lockup - CPU#2 stuck for 23s! [kworker/u24:5:856593] 2021-08-23T21:20:08,044420 Modules linked in: ib_core nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache xt_CHECKSUM ipt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 nft_compat nft_counter nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables libcrc32c nfnetlink tun bridge stp llc sunrpc vfat fat intel_rapl= _msr intel_rapl_common snd_hda_codec_hdmi snd_sof_pci snd_sof_intel_byt snd_sof_intel_ipc snd_sof_intel_hda_common snd_soc_hdac_hda snd_sof_xtensa_= dsp snd_sof_intel_hda snd_hda_codec_realtek snd_sof snd_hda_codec_generic ledtrig_audio snd_soc_skl snd_soc_sst_ipc snd_soc_sst_dsp x86_pkg_temp_ther= mal intel_powerclamp snd_hda_ext_core coretemp snd_soc_acpi_intel_match snd_soc_acpi kvm_intel snd_hda_intel snd_intel_dspcfg soundwire_intel soundwire_generic_allocation iTCO_wdt mei_wdt iTCO_vendor_support intel_wmi_thunderbolt wmi_bmof snd_soc_core kvm snd_compress soundwire_cade= nce soundwire_bus irqbypass snd_hda_codec snd_hda_core crct10dif_pclmul 2021-08-23T21:20:08,044438 crc32_pclmul snd_hwdep snd_seq ghash_clmulni_in= tel rapl snd_seq_device intel_cstate snd_pcm intel_uncore snd_timer snd pcspkr soundcore i2c_i801 mei_me mei intel_pch_thermal ie31200_edac wmi acpi_pad intel_pmc_core ip_tables ext4 mbcache jbd2 sd_mod t10_pi sg i915 cec intel_= gtt drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm igb ahci e1000e libahci libata dca i2c_algo_bit crc32c_intel video pinctrl_cannonlake fuse 2021-08-23T21:20:08,044451 CPU: 2 PID: 856593 Comm: kworker/u24:5 Kdump: lo= aded Tainted: G W L --------- - - 4.18.0-305.10.2.el8_4.x86_64 #1 2021-08-23T21:20:08,044451 Hardware name: Puget Systems C246M-WU4/C246M-WU4= -CF, BIOS F4e 09/16/2020 2021-08-23T21:20:08,044466 Workqueue: rpciod rpc_async_schedule [sunrpc] 2021-08-23T21:20:08,044469 RIP: 0010:native_queued_spin_lock_slowpath+0x18f/0x1d0 2021-08-23T21:20:08,044469 Code: c1 e9 12 83 e0 03 83 e9 01 48 c1 e0 05 48 = 63 c9 48 05 00 ac 02 00 48 03 04 cd a0 b8 b7 a7 48 89 10 8b 42 08 85 c0 75 09 = f3 90 <8b> 42 08 85 c0 74 f7 48 8b 02 48 85 c0 0f 84 6e ff ff ff 48 89 c1 2021-08-23T21:20:08,044470 RSP: 0018:ffff9d9bc9ea7de8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13 2021-08-23T21:20:08,044471 RAX: 0000000000000000 RBX: ffff8a8ac1679000 RCX: 0000000000000001 2021-08-23T21:20:08,044471 RDX: ffff8a8b7c2aac00 RSI: 00000000000c0000 RDI: ffff8a8ac1679670 2021-08-23T21:20:08,044472 RBP: ffff8a8ac1679670 R08: 0000000000200000 R09: 0000000000000000 2021-08-23T21:20:08,044472 R10: ffff8a7c88217200 R11: 0000000000000001 R12: ffff8a80d33fc7d0 2021-08-23T21:20:08,044472 R13: 0000000000000000 R14: ffffffffc0c52b20 R15: ffffffffc0c44cf0 2021-08-23T21:20:08,044473 FS: 0000000000000000(0000) GS:ffff8a8b7c280000(0000) knlGS:0000000000000000 2021-08-23T21:20:08,044473 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 2021-08-23T21:20:08,044474 CR2: 000055a105562088 CR3: 0000000eb0610004 CR4: 00000000003706e0 2021-08-23T21:20:08,044474 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 2021-08-23T21:20:08,044475 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 2021-08-23T21:20:08,044475 Call Trace: 2021-08-23T21:20:08,044478 _raw_spin_lock+0x1c/0x20 2021-08-23T21:20:08,044487 xprt_prepare_transmit+0x4a/0xb0 [sunrpc] 2021-08-23T21:20:08,044495 call_transmit+0x2e/0x80 [sunrpc] 2021-08-23T21:20:08,044503 __rpc_execute+0x85/0x3c0 [sunrpc] 2021-08-23T21:20:08,044506 ? try_to_wake_up+0x1cd/0x540 2021-08-23T21:20:08,044514 rpc_async_schedule+0x29/0x40 [sunrpc] 2021-08-23T21:20:08,044515 process_one_work+0x1a7/0x360 2021-08-23T21:20:08,044516 worker_thread+0x30/0x390 2021-08-23T21:20:08,044517 ? create_worker+0x1a0/0x1a0 2021-08-23T21:20:08,044518 kthread+0x116/0x130 2021-08-23T21:20:08,044519 ? kthread_flush_work_fn+0x10/0x10 2021-08-23T21:20:08,044520 ret_from_fork+0x35/0x40 ... 021-08-23T21:20:41,880955 nfs: server SERVER not responding, timed out 2021-08-23T21:20:41,881036 nfs: server SERVER not responding, timed out 2021-08-23T21:20:41,881039 nfs: server SERVER not responding, timed out ... Earlier, mostly one-sided, discussion elsewhere ... - NFS v3 - FreeBSD 12 server connection getting timed out on CentOS 8 clien= t, https://lists.freebsd.org/pipermail/freebsd-questions/2021-August/294634.ht= ml - centos-freebsd-nfs-issue, https://github.com/parv/centos8-nfs3-issue (dmesg from CentOS; tcpdumps from CentOS & FreeBSD) - https://twitter.com/_parv/status/1429889151155916800 --=20 You are receiving this mail because: You are the assignee for the bug.=