svn commit: r362798 - in projects/nfs-over-tls/sys/rpc: . rpcsec_tls

Wed Jul 1 02:20:46 UTC 2020

On Wed, Jul 01, 2020 at 01:23:50AM +0000, Rick Macklem wrote:
> Rick Macklem wrote:
> >Benjamin Kaduk wrote:
> >>On Tue, Jun 30, 2020 at 04:20:45PM +0000, Rick Macklem wrote:
> >>> If you happen to know how to set a timeout for SSL_connect() in the openssl
> >>> library, I would be interested in hearing that.
> >>
> >>As it happens, I took a look before I wrote the initial note, and there
> >>don't seem to be any intrinsic TLS (not DTLS) handshake timeouts in
> >>libssl itself; I expect this is actually just the (kernel's!) TCP timeout.
> >>So you'd be getting the socket fd (e.g., SSL_get_fd(), if you don't have a
> >>reference already) and using setsockopt() to set the timeout(s).
> >Interesting. The test case I simulated did not close the TCP socket used by
> >SSL_connect(). The server just replied to the STARTTLS Null RPC, but did not
> >call SSL_accept(), so the server side just isn't playing "handshake".
> >"netstat -a" showed the connection as ESTABLISHED.
> >During debugging, I also used the trick of putting:
> >    while (1)
> >        sleep(1);
> >right after the SSL_connect() call and, when watching it via "ps",
> >it would switch from "sbwait" to "nanoslp" after 6 minutes and
> >a syslog() call showed that SSL_connect() had returned -1.
> >
> >So, if the TCP connection was "established", what caused the SSL_connect()
> >to return with an error (-1) after 6 minutes?
> >
> >Now, there is a 6 minute idle timeout in the RPC code for TCP where it,
> >by default, closes the connection when there have been 6 minutes without any
> >activity. (I have to check whether waiting for a reply to the upcall counts as
> >"no activity" and whether this also happens for AF_LOCAL sockets, which is
> >what the upcalls use.)
> Ok, I figured out what is happening for this test.
> It is the 6 minute idle timeout, but it occurs at the server end, where the NFS server
> end shuts down the TCP connection.

Ah, that makes sense.

> Now, the client cannot assume all servers will do this.

Right.

> I'm going to try playing around with doing a shutdown of the socket on the
> client end after a shorter timeout on the upcall and see if that can get
> SSL_connect() to return with a failure in the daemon.
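
A userland sketch of the "shut the socket down after a shorter timeout" idea, assuming a plain blocking socket rather than the krpc upcall path. A watchdog thread sleeps for the timeout and then calls shutdown(2); a recv(2) blocked on that socket (which is what SSL_connect() is doing underneath) then returns 0, i.e. EOF. struct watchdog and recv_with_shutdown_watchdog() are hypothetical names made up for this example; build with -pthread.

```c
#include <pthread.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical watchdog argument: which socket to shut down, and when. */
struct watchdog {
    int fd;
    unsigned int delay_sec;
};

/* Sleep for the delay, then shut down both directions of the socket.
 * A thread blocked in recv() on that socket wakes up and sees EOF. */
static void *watchdog_main(void *arg)
{
    struct watchdog *wd = arg;

    sleep(wd->delay_sec);       /* sleep() is a cancellation point */
    shutdown(wd->fd, SHUT_RDWR);
    return NULL;
}

/* Block in recv() on fd, but give up after delay_sec seconds by having the
 * watchdog shut the socket down underneath the reader.
 * Returns recv()'s result: 0 means EOF, i.e. the watchdog fired.
 * Note: if recv() succeeds just as the timer expires, the socket may still
 * get shut down; a real implementation would have to tolerate that race. */
static ssize_t recv_with_shutdown_watchdog(int fd, void *buf, size_t len,
    unsigned int delay_sec)
{
    struct watchdog wd = { fd, delay_sec };
    pthread_t tid;
    ssize_t n;

    if (pthread_create(&tid, NULL, watchdog_main, &wd) != 0)
        return -1;
    n = recv(fd, buf, len, 0);
    /* Data (or an error) arrived: stop a watchdog that is still sleeping. */
    pthread_cancel(tid);
    pthread_join(tid, NULL);
    return n;
}
```

Seeing EOF mid-handshake should make SSL_connect() return a value <= 0; the exact value and error code are worth checking with SSL_get_error() rather than assumed, since they vary between OpenSSL versions.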
> 
> >Now, if that happens, a SIGPIPE would be posted to the daemon, which
> >is SIG_IGN'd by the daemon. But maybe the SIGPIPE somehow causes
> >SSL_connect() to return -1 by making the syscall it is doing (read/recv on the
> >TCP socket sitting in sbwait) return EINTR, or something like that?
> Ignore this "theory". It was bunk.

Non-ignored signals would cause SSL_connect() to return, but ignored ones
should be wholly ignored, yes.

> >I can change this 6 minute timeout to see if that affects it.
> Can't be changed, since it is at the server end of the TCP connection.

Can't you set a client-side (e.g., read) timeout, though?
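
A minimal sketch of that, assuming the descriptor is obtained with SSL_get_fd() (or is already at hand) and the socket is in blocking mode. set_socket_timeouts() is a hypothetical helper name; SO_RCVTIMEO and SO_SNDTIMEO are the standard socket options. Unlike the server's 6 minute idle close, this does not depend on what the server does.

```c
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

/* Set send and receive timeouts on a connected socket, e.g. the one
 * returned by SSL_get_fd(ssl), so a handshake stuck waiting on the peer
 * fails after secs seconds instead of blocking indefinitely.
 * Returns 0 on success, -1 with errno set on failure. */
static int set_socket_timeouts(int fd, time_t secs)
{
    struct timeval tv;

    tv.tv_sec = secs;
    tv.tv_usec = 0;
    if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) != 0)
        return -1;
    if (setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv)) != 0)
        return -1;
    return 0;
}
```

When the receive timeout expires, the underlying read fails with EAGAIN/EWOULDBLOCK. OpenSSL's socket BIO treats that as retryable, so SSL_connect() would return -1 with SSL_get_error() likely reporting SSL_ERROR_WANT_READ; on a blocking socket the caller can treat that as a handshake timeout.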

-Ben

> (A comment in the krpc code mentions a 5 minute timeout in the client,
>  but I don't see that in the code?)
> 
> >When you've got upcalls and library functions both talking to sockets it
> >can get interesting.
> >
> >Thanks for the comments, rick
> 
> Correcting myself, rick
> 