AW: NFS Mount Hangs
    Scheffenegger, Richard 
    Richard.Scheffenegger at netapp.com
       
    Sat Apr 10 09:19:35 UTC 2021
    
    
  
Hi Rick,
> Well, I have some good news and some bad news (the bad is mostly for Richard).
>
> The only message logged is:
> tcpflags 0x4<RST>; tcp_do_segment: Timestamp missing, segment processed normally
>
> But...the RST battle no longer occurs. Just one RST that works and then the SYN gets SYN,ACK'd by the FreeBSD end and off it goes...
>
> So, what is different?
>
> r367492 is reverted from the FreeBSD server.
> I did the revert because I think it might be what is causing otis@'s hang. (In his case, the Recv-Q grows on the socket for the stuck Linux client, while others work.)
>
> Why does reverting fix this?
> My only guess is that the krpc gets the upcall right away and sees an EPIPE when it does soreceive(), which results in soshutdown(SHUT_WR).
With r367492 you don't get the upcall with the same error state? Or you don't get an error on a write() call, when there should be one?
From what you describe, this is on writes, isn't it? (I'm asking because the original problem that was fixed with r367492 occurs in the read path: the so_rcv buffer is drained in the upcall right away, which subsequently influences the ACK sent by the stack.)
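To make the read-versus-write distinction concrete, here is a user-space analogy only. It is not the krpc code path and says nothing about when the kernel upcall fires. The helper name probe_closed_peer() is hypothetical, and an AF_UNIX socket pair stands in for the TCP connection. It shows that once the peer is gone, the read side sees a plain EOF while the write side is where EPIPE is reported:

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the FreeBSD tree): close one end of a
 * connected stream socket pair, then attempt one read and one write on
 * the surviving end and report what each call returned.
 * Returns 0 on success, -1 if the socket pair could not be created.
 */
int
probe_closed_peer(ssize_t *recv_ret, ssize_t *send_ret, int *send_errno)
{
	int sv[2];
	char buf[1] = { 'x' };

	assert(recv_ret != NULL && send_ret != NULL && send_errno != NULL);

	/* Ignore SIGPIPE so the failed write reports EPIPE instead of killing us. */
	signal(SIGPIPE, SIG_IGN);

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
		return (-1);

	close(sv[1]);		/* the peer goes away */

	/* Read side: orderly EOF, recv() returns 0 and reports no error. */
	*recv_ret = recv(sv[0], buf, sizeof(buf), 0);

	/* Write side: the same condition surfaces as -1 with errno set. */
	errno = 0;
	*send_ret = send(sv[0], buf, sizeof(buf), 0);
	*send_errno = errno;

	close(sv[0]);
	return (0);
}
```

Calling probe_closed_peer() from a small main() and printing the three values is enough to see the asymmetry. Whether the krpc sees the equivalent error from its soreceive() or only from a later send is exactly what I am trying to find out.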
I only added the so_snd buffer part after some discussion about whether WAKESOR shouldn't have a symmetric equivalent in WAKESOW...
Thus a partial backout (leaving the WAKESOR part in, but reverting the WAKESOW part) would still fix my initial problem with erroneous DSACKs (which can also lead to extremely poor performance with Linux clients), but possibly address this issue...
Can you perhaps take MAIN and apply https://reviews.freebsd.org/D29690 for the revert only on the so_snd upcall?
If this doesn't help, some major surgery will be necessary to prevent NFS sessions with SACK enabled from transmitting DSACKs...
> I know from a printf that this happened, but whether it caused the RST battle to not happen, I don't know.
> 
> I can put r367492 back in and do more testing if you'd like, but I think it probably needs to be reverted?
Please, I don't quite understand why the exact timing of the upcall would be that critical here...
A comparison of the so*() calls and their errors between the "good" and the "bad" case would be perfect. I don't know if this is easy to do though, as these calls appear to be scattered all around the RPC / NFS source paths.
> This does not explain the original hung Linux client problem, but does shed light on the RST war I could create by doing a network partitioning.
>
> rick
    
    