RE: NFS issues since upgrading to 13-RELEASE

Scheffenegger, Richard Richard.Scheffenegger at
Thu Apr 15 21:14:36 UTC 2021


r367492 fixes an issue around the "premature" transmission of an ACK, caused by the incoming segment having been only partially processed at the time. It relates to in-kernel TCP consumers that use socket upcalls.

Rick mentioned that the NFS server (one in-kernel TCP user) has stringent requirements on the state of the socket during the upcall. D29690 therefore retains the lock on the socket buffer until TCP processing is finalized, so that the upcall can be done without any risk of transmitting outdated information back to the other end.
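To make the ordering problem concrete, here is a deliberately simplified toy model in Python, not kernel code. ToyConnection, input_segment, rcv_nxt and the upcall are hypothetical stand-ins, and a real TCP stack carries far more state. The sketch contrasts an upcall fired while the segment is only partially processed with one deferred, under the same lock, until processing is finalized, which is the ordering D29690 is described as enforcing.

```python
import threading
from typing import List


class ToyConnection:
    """Toy receive path for an in-kernel TCP consumer (hypothetical, not FreeBSD code)."""

    def __init__(self, defer_upcall: bool) -> None:
        self.sb_lock = threading.Lock()   # stands in for the socket buffer lock
        self.rcv_nxt = 0                  # next sequence number expected from the peer
        self.sb_data = bytearray()        # data queued for the consumer
        self.defer_upcall = defer_upcall
        self.acks_sent: List[int] = []    # ACK numbers carried by the consumer's replies

    def _upcall(self) -> None:
        # The consumer (e.g. an RPC server) replies immediately; the reply
        # acknowledges whatever rcv_nxt holds at this moment.
        self.acks_sent.append(self.rcv_nxt)

    def input_segment(self, seq: int, payload: bytes) -> None:
        with self.sb_lock:
            # Step 1: queue the payload for the consumer.
            self.sb_data += payload
            if not self.defer_upcall:
                # Upcall while the segment is only partially processed.
                self._upcall()
            # Step 2: finish processing by advancing rcv_nxt past the payload.
            self.rcv_nxt = seq + len(payload)
            if self.defer_upcall:
                # Upcall only after processing is finalized, lock still held.
                self._upcall()
```

With defer_upcall=False, feeding a 100-byte segment at sequence 0 makes the consumer's reply acknowledge 0 even though the data is already queued; with defer_upcall=True the reply acknowledges 100.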

However, I have no proper way to verify/validate this interaction.

My request would be to test the behavior with D29690 first; if similar hangs keep recurring, then revert r367492 (which will also mean more severe surgery on the TCP processing flow).


Richard Scheffenegger

-----Original Message-----
From: Rick Macklem <rmacklem at> 
Sent: Thursday, April 15, 2021 23:05
To: Allan Jude <allanjude at>; freebsd-current at
Cc: Richard Scheffenegger <rscheff at>; Juraj Lutter <otis at>
Subject: Re: NFS issues since upgrading to 13-RELEASE


I wrote:
[stuff snipped]
>- Alternately you can try rscheff@'s alternate proposed patch that is 
>  https://reviews.freebsd.og/D29690.
Oops, that's


  I have not yet had time to test this one, but since I cannot reproduce the hang, I can
  only test it to confirm that it is "no worse" than reverting r367492 for my

Please let us know which you choose and whether or not it fixes your problem.

>> Any pointers for troubleshooting this? I've been looking through vmstat, gstat, top, etc. when the problem occurs, but I haven't been able to pinpoint the issue. I can get a pcap, but it would be from the hosts, because I don't have a 10G tap or managed switch.
>Run `nfsstat -d 1` and try to capture a few lines from before, during,
>and after the stall; that may provide some insight.
>Specifically, does the queue length grow, suggesting it is waiting on
>the I/O subsystem, or does it just stop getting traffic altogether?
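A rough way to read those samples, sketched under assumptions: it presumes each one-second interval of `nfsstat -d 1` has already been transcribed into a hypothetical (operations completed, queue length) pair. It does not parse nfsstat's real output, whose column layout is not reproduced here. The hypothetical classify_stall compares the peak queue length before and during the stall, mirroring the two cases described above.

```python
from typing import Sequence, Tuple

# Hypothetical per-interval sample: (operations completed, queue length).
Sample = Tuple[int, int]


def classify_stall(before: Sequence[Sample], during: Sequence[Sample]) -> str:
    """Heuristic only: compare samples taken before the stall with those during it."""
    queue_before = max((q for _, q in before), default=0)
    queue_during = max((q for _, q in during), default=0)
    ops_during = sum(ops for ops, _ in during)
    if queue_during > queue_before:
        # Requests are still arriving but pile up: likely waiting on I/O.
        return "queue growing: likely waiting on the I/O subsystem"
    if ops_during == 0:
        # Nothing is queued and nothing completes: traffic has stopped.
        return "no traffic: requests stopped arriving altogether"
    return "inconclusive"
```

For example, samples of (5, 8) and (0, 20) during the stall against a pre-stall peak queue of 2 fall into the first case, while all-zero samples fall into the second.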

If reverting r367492 does not fix the problem, monitor the TCP connection(s) via "netstat -a" and, if possible, capture packets on the server via "tcpdump -s 0 -w hang.pcap host <nfs-client>" or similar.
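When watching the connections with netstat, it can help to pull out only the NFS rows and their queue sizes from repeated snapshots. This is a sketch under stated assumptions: it expects numeric output (as from `netstat -an`, so port 2049 is not shown as a service name), BSD-style columns Proto, Recv-Q, Send-Q, Local Address, Foreign Address, (state), and an address and port joined by a final '.'. The names NfsConn and nfs_connections are hypothetical.

```python
from typing import List, NamedTuple


class NfsConn(NamedTuple):
    local: str
    foreign: str
    recv_q: int
    send_q: int
    state: str


def nfs_connections(netstat_text: str, port: int = 2049) -> List[NfsConn]:
    """Return TCP rows whose local port matches `port` (assumed netstat -an layout)."""
    conns = []
    for line in netstat_text.splitlines():
        fields = line.split()
        # Skip headers and non-TCP rows; expect at least six columns.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        _proto, recv_q, send_q, local, foreign, state = fields[:6]
        # Assumed BSD format: port after the last '.', e.g. 192.0.2.10.2049
        if local.rsplit(".", 1)[-1] != str(port):
            continue
        conns.append(NfsConn(local, foreign, int(recv_q), int(send_q), state))
    return conns
```

Comparing recv_q and send_q across snapshots taken before and during the hang might show whether data is stuck in one direction, which would be worth noting alongside the packet capture.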

Ideally, tcpdump would be started before the "hang" occurs, but running it while the hang is occurring (until after it recovers) could also be useful.
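Since the capture is most useful when it is already running before the hang, one option is to start it in the background and stop it once the server has recovered. A minimal sketch, assuming Python 3 on the server with root privileges; capture_cmd, start_capture and stop_capture are hypothetical helpers, and the tcpdump arguments are exactly the ones suggested above.

```python
import signal
import subprocess
from typing import List


def capture_cmd(nfs_client: str, outfile: str = "hang.pcap") -> List[str]:
    # Full snap length (-s 0), raw packets written to outfile (-w), one client only.
    return ["tcpdump", "-s", "0", "-w", outfile, "host", nfs_client]


def start_capture(nfs_client: str, outfile: str = "hang.pcap") -> subprocess.Popen:
    # Launch tcpdump in the background before the hang is expected.
    return subprocess.Popen(capture_cmd(nfs_client, outfile))


def stop_capture(proc: subprocess.Popen, timeout: float = 10.0) -> int:
    # SIGINT is what Ctrl-C sends; tcpdump then closes the file and exits.
    proc.send_signal(signal.SIGINT)
    return proc.wait(timeout=timeout)
```

Usage would look like proc = start_capture("nfs-client.example") (a placeholder host name), then stop_capture(proc) after the stall clears.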

Thanks for reporting this, rick

Allan Jude