[Bug 261690] NFSv4 mount on Linux client hangs during complex access patterns (gcc bootstrapping on client)
Date: Thu, 03 Feb 2022 13:24:01 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=261690
Bug ID: 261690
Summary: NFSv4 mount on Linux client hangs during complex
access patterns (gcc bootstrapping on client)
Product: Base System
Version: 13.0-RELEASE
Hardware: amd64
OS: Any
Status: New
Severity: Affects Some People
Priority: ---
Component: kern
Assignee: bugs@FreeBSD.org
Reporter: bf@cebitec.uni-bielefeld.de
A ZFS dataset exported by a FreeBSD 13.0-RELEASE-p7 NFS server and mounted
via NFSv4.1 on an Ubuntu 20.04.3 Linux client (kernel 5.4.0-81-generic) is
used to build gcc-11.2.0 from source (sources and object files on NFS).
The build processes become stuck in kernel space at varying points in
time.
The same setup with FreeBSD 11.4-RELEASE-p5 as server works flawlessly.
*** How to reproduce:
Server: FreeBSD 13.0-RELEASE-p7
*** sysctl settings (server):
vfs.nfs.enable_uidtostring=1
vfs.nfsd.enable_stringtouid=1
vfs.nfsd.server_min_nfsvers=3
*** rc.conf settings (server):
rpcbind_enable="YES"
nfs_server_enable="YES"
nfsv4_server_enable="YES"
nfs_reserved_port_only="YES"
nfsuserd_flags="-manage-gids -usertimeout 10 -usermax 2300 20"
mountd_enable="YES"
mountd_flags="-r"
rpc_lockd_enable="YES"
rpc_statd_enable="YES"
Client: Ubuntu 20.04.3 LTS 5.4.0-81-generic
ZFS dataset exported sec=sys, mounted on Linux client via NFSv4.1
(NFSv4.0 shows the same behaviour).
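
Since the hang is reported for both NFSv4.1 and NFSv4.0, it can help to confirm which version a given mount actually negotiated. The following is a minimal sketch, assuming a Linux client where /proc/mounts lists the effective mount options; the function name nfs_mount_versions is made up for illustration and is not part of the original report.

```shell
# Hypothetical helper: read /proc/mounts-style lines on stdin and print
# "<mountpoint> <vers>" for every NFS mount found.
nfs_mount_versions() {
    awk '$3 == "nfs" || $3 == "nfs4" {
        vers = "unknown"
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] ~ /^vers=/) vers = substr(opts[i], 6)
        print $2, vers
    }'
}

# Usage on the client (not run here):
#   nfs_mount_versions < /proc/mounts
```
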
On Linux client, vanilla build of gcc-11.2.0, sources and build dir
on the same NFS mount:
cd /vol/perf/bsd13-3x7/bf # this is on the NFS mount
tar xf gcc-11.2.0.tar.gz
cd gcc-11.2.0
./contrib/download_prerequisites
cd ..
mkdir obj
cd obj
../gcc-11.2.0/configure --prefix=/vol/bsd13-3x7/bf/gcc \
--enable-languages=c,c++,fortran,go --disable-multilib
make -j20
*** After some time the compiler processes become stuck in kernel space:
[Wed Feb 2 17:01:39 2022] cc1plus D 0 940194 940193 0x00004320
[Wed Feb 2 17:01:39 2022] Call Trace:
[Wed Feb 2 17:01:39 2022] __schedule+0x2e3/0x740
[Wed Feb 2 17:01:39 2022] schedule+0x42/0xb0
[Wed Feb 2 17:01:39 2022] io_schedule+0x16/0x40
[Wed Feb 2 17:01:39 2022] wait_on_page_bit+0x11c/0x200
[Wed Feb 2 17:01:39 2022] ? file_fdatawait_range+0x30/0x30
[Wed Feb 2 17:01:39 2022] wait_on_page_writeback+0x43/0x90
[Wed Feb 2 17:01:39 2022] __filemap_fdatawait_range+0x98/0x100
[Wed Feb 2 17:01:39 2022] filemap_write_and_wait+0x60/0xa0
[Wed Feb 2 17:01:39 2022] nfs_wb_all+0x1f/0x130 [nfs]
[Wed Feb 2 17:01:39 2022] nfs4_file_flush+0x73/0xa0 [nfsv4]
[Wed Feb 2 17:01:39 2022] filp_close+0x37/0x70
[Wed Feb 2 17:01:39 2022] __close_fd+0x7d/0xa0
[Wed Feb 2 17:01:39 2022] __x64_sys_close+0x22/0x50
[Wed Feb 2 17:01:39 2022] do_syscall_64+0x57/0x190
[Wed Feb 2 17:01:39 2022] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[Wed Feb 2 17:01:39 2022] RIP: 0033:0x7f4c3121a4ab
[Wed Feb 2 17:01:39 2022] Code: Bad RIP value.
[...]
[Wed Feb 2 17:02:30 2022] nfs: server allerbeeke not responding, still trying
[Wed Feb 2 17:02:30 2022] nfs: server allerbeeke not responding, still trying
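
The trace above shows cc1plus in state D (uninterruptible sleep). A quick way to list all such processes on the client while the build is hung is sketched below; it assumes the procps version of ps, and the helper name d_state_procs is hypothetical.

```shell
# Hypothetical helper: filter "ps -eo pid,stat,wchan:32,comm" output on
# stdin down to processes whose state starts with D, skipping the header.
d_state_procs() {
    awk 'NR > 1 && $2 ~ /^D/'
}

# Usage on the client (not run here):
#   ps -eo pid,stat,wchan:32,comm | d_state_procs
```
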
*** What happens on the wire:
Capture on the FreeBSD server with:
tcpdump -i mce0 -w /var/tmp/fbsd-nfs4-server.pcap host bildhorst
The pcap file is available on our Nextcloud:
https://docs.cebitec.uni-bielefeld.de/s/n5ZYmjnYd2fjQZ3
On the close system call, the Linux client appears to flush the file to disk
with a series of SEQUENCE,PUTFH,WRITE,GETATTR compounds. After some time the
FreeBSD server simply stops replying to them (right at the end of the capture).
Also notable: a few calls seem to get multiple replies. For example, frame
1604448 in that capture (also SEQUENCE,PUTFH,WRITE,GETATTR) got two replies
with different seq IDs, in frames 1604458 and 1604459.
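
One possible way to look for further calls with more than one reply is to extract the RPC XID of every reply from the capture and count repeats. This is only a sketch: it assumes tshark is available and uses the Wireshark fields rpc.msgtyp (1 = reply) and rpc.xid. The helper name repeated_xids is hypothetical, and a repeated XID may also stem from XID reuse on another connection, so each hit still needs to be checked by hand.

```shell
# Hypothetical helper: read one XID per line on stdin and print
# "<xid> <count>" for every XID that occurs more than once.
repeated_xids() {
    sort | uniq -c | awk '$1 > 1 { print $2, $1 }'
}

# Usage against the capture from the report (not run here):
#   tshark -r /var/tmp/fbsd-nfs4-server.pcap -Y 'rpc.msgtyp == 1' \
#       -T fields -e rpc.xid | repeated_xids
```
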