Re: Kernel panics with vfs.nfsd.enable_locallocks=1 and nfs clients doing hdf5 file operations

From: Rick Macklem <rick.macklem_at_gmail.com>
Date: Wed, 21 Aug 2024 14:45:31 UTC

Please create a PR for this and include at least
one backtrace. I will try and figure out how
locallocks could cause it.

I suspect few use locallocks=1.

rick

On Wed, Aug 21, 2024 at 7:29 AM Matthew L. Dailey <Matthew.L.Dailey@dartmouth.edu> wrote:

> Hi all,
>
> I posted messages to this list back in February and March
> (https://lists.freebsd.org/archives/freebsd-current/2024-February/005546.html)
> regarding kernel panics we were having with nfs clients doing hdf5 file
> operations. After a hiatus in troubleshooting, I had more time this
> summer and have found the cause - the vfs.nfsd.enable_locallocks sysctl.
>
> When this is set to 1, we can induce either a panic or, more rarely, a
> hung nfs server, usually within a few hours but sometimes only after
> several days to a week. We have replicated this on 13.0 through 15.0-CURRENT
> (20240725-82283cad12a4-271360). With this set to 0 (default), we are
> unable to replicate the issue, even after several weeks of 24/7 hdf5
> file operations.
>
> One other side-effect of these panics is that on a few occasions it has
> corrupted the root zpool beyond repair. This makes sense since kernel
> memory is getting corrupted, but obviously makes this issue more impactful.
>
> I'm hoping this is enough information to start narrowing down this
> issue. We are specifically using this sysctl because we are also serving
> files via samba and want to ensure consistent locking.
>
> I have provided some core dumps and backtraces previously, but am happy
> to provide more as needed. I also have a writeup of exactly how to
> reproduce this that I can send directly to anyone who is interested.
>
> Thanks so much for any and all help with this tricky problem. I'm happy
> to do whatever I can to help get this squashed.
>
> Best,
> Matt
>