Re: Bugzilla PR#293127 and 292884

From: Rick Macklem <rick.macklem_at_gmail.com>
Date: Tue, 10 Mar 2026 00:08:09 UTC
On Sun, Feb 22, 2026 at 4:00 PM Rick Macklem <rick.macklem@gmail.com> wrote:
>
> On Sun, Feb 22, 2026 at 3:44 PM Mark Johnston <markj@freebsd.org> wrote:
> >
> > On Sun, Feb 22, 2026 at 03:27:42PM -0800, Rick Macklem wrote:
> > > On Sun, Feb 22, 2026 at 3:21 PM Rick Macklem <rick.macklem@gmail.com> wrote:
> > > >
> > > > On Sun, Feb 22, 2026 at 2:19 PM Rick Macklem <rick.macklem@gmail.com> wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > Could someone who is familiar with what has changed
> > > > > related to socket handling between FreeBSD 14 and 15
> > > > > please look at these bugzilla PRs.
> > > > >
> > > > > They show similar crashes, which indicate that the
> > > > > socket receive buffer has been discarded when
> > > > > soreceive() returns EWOULDBLOCK.
> > > > I've been looking at svc_vc.c and I think there needs
> > > > to be some updating done to it.
> > > >
> > > > The code currently looks like...
> > > > error = soreceive(so, NULL, &uio, &m, &ctrl, &rcvflag);
> > > >
> > > > if (error == EWOULDBLOCK) {
> > > >     /*
> > > >      * We must re-test for readability after
> > > >      * taking the lock to protect us in the case
> > > >      * where a new packet arrives on the socket
> > > >      * after our call to soreceive fails with
> > > >      * EWOULDBLOCK.
> > > >      */
> > > >      SOCK_RECVBUF_LOCK(so);
> > > >      if (!soreadable(so))
> > > >           xprt_inactive_self(xprt);
> > > >      SOCK_RECVBUF_UNLOCK(so);
> > > >      sx_xunlock(&xprt->xp_lock);
> > > >      return (FALSE);
> > > > }
> > > >
> > > > When I look at the code in soreceive_stream(), it calls
> > > >    error = SOCK_IO_RECV_LOCK(so, SBLOCKWAIT(MGS_DONTWAIT));
> > > > which will return EWOULDBLOCK and this happens before SOCKBUF_LOCK()
> > > > in soreceive_stream_locked().
> >
> > That'll happen only if a different thread holds the socket I/O lock,
> > i.e., a different thread is trying to read data from the socket at the
> > same time.  Can that happen here?
> No, it shouldn't. An xprt is assigned to an nfsd thread and that should
> only be one at a time. It does get re-assigned after
> xprt_inactive()/xprt_active()
> calls, but I don't think that should happen while a thread assigned to an xprt
> is in soreceive(), as protected by the xp_lock.
>
> >
> > > > I don't think the EWOULDBLOCK here was meant to check that some other
> > > > thread has the I/O lock.  I think that EWOULDBLOCK was meant to check
> > > > for "no rcv data on socket"?
> >
> > Yes, if it's possible that something else is reading from the socket at
> > the same time, then we might be doing the wrong thing here, but I'm not
> > sure if that's possible?  Are xprts one-to-one with kernel worker
> > threads?
> >
> > I'm pretty this hasn't changed recently either, though certainly that
> > part of the socket code's been refactored recently.
> Yea, the code appears to be in 14 and there has not been any
> reported crashes with 14.
>
> >
> > > > So, does someone (like markj@) know what this code should now be?
> > > > (The crashes might be because this code is just plain broken now.)
> > >
> > > Oh, and it appears that soreceive_stream_locked() returns EAGAIN when
> > > there is no data to be read and not EWOULDBLOCK.
> >
> > EAGAIN and EWOULDBLOCK are two names for the same thing.  errno.h
> > defines EWOULDBLOCK to be EAGAIN.
> I knew that, but I forgot;-)
> (It would be a boring/bothersome commit, but maybe someone should make
> the kernel always use one of them, to avoid confusion..)
>
> Well, that's all I can think of. Hopefully the reporter can run with
> GENERIC-DEBUG
> and we can get more info. (Maybe I'll ask the PR#292884 reporters the
> same thing.)
Could someone TCP conversant take a look at attachment core.txt.6 in
PR#293127.

It shows a crash in tcp_discardcb(). He was using the patches that
avoided crashes for the other reporters, which simply acquires/releases
an extra reference count on the socket. (As noted, these crashes have
been showing up for FreeBSD-15.0, although the socket handling code
in the krpc hasn't changed in something like 10 years.)

I'm hoping a crash in tcp_discardcb() might give someone a hint
w.r.t. what is going on?

Thanks, rick

>
> Thanks for your help with this, rick