Re: HEADS UP: NFS changes coming into CURRENT early February

In reply to: Gleb Smirnoff : "Re: HEADS UP: NFS changes coming into CURRENT early February"

From: Gleb Smirnoff <glebius_at_freebsd.org>
Date: Wed, 29 Jan 2025 06:36:07 UTC

On Tue, Jan 28, 2025 at 09:14:00PM -0800, Gleb Smirnoff wrote:
T> Second, with the patch the M_RPC leak count for me is 2. And I found that these
T> two items are basically a clnt_vc that belongs to a closed connection:
T> 
T> fffff80029614a80 tcp4       0      0 10.6.6.9.772       10.6.6.9.2049      CLOSED     
T> 
T> There is no peer connection, as the server received a timeout trying
T> to send. But rpc.tlsclntd doesn't try to send anything on the socket, it just
T> keeps it in its select(2) fd set and doesn't garbage collect.
T> 
T> So it is a bigger resource leak than just two pieces of M_RPC. I don't think
T> this is related to my changes.

Here is what is going on:

- The TCP connection is torn down and tcp_close() calls soisdisconnected().
- soisdisconnected() calls clnt_vc_soupcall() to notify it of the error condition.
- clnt_vc_soupcall() tries soreceive() and gets so->so_error.
- clnt_vc_soupcall() sets the client to the error state. It doesn't wake up
  anything because there were no running RPC requests. It can't report back
  to clnt_rc that the connection is dead. It doesn't mark itself
  for clnt_vc_dotlsupcall() processing.
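
The sequence above can be modelled in a few lines of userland C. This is only a toy sketch, not the kernel code: struct sketch_clnt, sketch_soupcall() and sketch_is_leaked() are hypothetical names, and the fields only mimic what the bullets describe. The assumption that a woken request carries the error upward is mine.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the per-connection client state. */
struct sketch_clnt {
	int	so_error;	/* error latched on the socket */
	int	ct_errno;	/* error recorded by the client, 0 if none */
	int	npending;	/* RPC requests waiting for a reply */
	int	nwakeups;	/* how many waiters were woken */
	bool	parent_told;	/* did the upper layer learn about it? */
};

/*
 * Toy version of the upcall: record the socket error and wake any
 * pending requests.  Assumption: a woken request returns the error to
 * the upper layer.  With no pending request there is no path that sets
 * parent_told, which is the gap described in the text.
 */
void
sketch_soupcall(struct sketch_clnt *ct)
{
	assert(ct != NULL);
	if (ct->so_error == 0)
		return;
	ct->ct_errno = ct->so_error;
	if (ct->npending > 0) {
		ct->nwakeups += ct->npending;
		ct->npending = 0;
		ct->parent_told = true;
	}
}

/* True when the client is in error state but nobody above it knows. */
bool
sketch_is_leaked(const struct sketch_clnt *ct)
{
	assert(ct != NULL);
	return (ct->ct_errno != 0 && !ct->parent_told);
}
```

Running sketch_soupcall() on an idle client with so_error set leaves it in the "leaked" state, while a client with pending requests gets noticed through the wakeup.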

So we end up with:

(kgdb) p $tp->t_state
$25 = 0	/* TCPS_CLOSED */
(kgdb) p/x $tp->t_inpcb.inp_flags & 0x04000000	/* INP_DROPPED */
$27 = 0x4000000
(kgdb) p/x $tp->t_inpcb.inp_socket->so_state
$28 = 0x2000	/* SS_ISDISCONNECTED */
(kgdb) p/x $tp->t_inpcb.inp_socket->so_count
$35 = 0x2
(kgdb) p/x $ct->ct_rcvstate 
$29 = 0x41	/* RPCRCVSTATE_UPCALLTHREAD | RPCRCVSTATE_NORMAL */
(kgdb) p $ct->ct_error
$30 = {re_status = RPC_CANTRECV, ru = {RE_errno = 13, RE_why = RPCSEC_GSS_CREDPROBLEM, RE_vers = {low = 13, high = 0}, RE_lb = {s1 = 13, s2 = 0}}}
(kgdb) p $ct->ct_pending
$31 = {tqh_first = 0x0, tqh_last = 0xfffff80002838ea8}
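
To summarise what the session shows, here is a rough sketch of a predicate that recognises this state from the same fields. It is illustrative only: struct sketch_snapshot and sketch_looks_orphaned() are hypothetical, the INP_DROPPED and SS_ISDISCONNECTED masks are copied from the session above, and the split of 0x41 into 0x01 and 0x40 is an assumption that merely agrees with the combined value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mask values as they appear in the kgdb session. */
#define	SKETCH_INP_DROPPED		0x04000000u
#define	SKETCH_SS_ISDISCONNECTED	0x2000u
/* Assumed bit assignment; the session only shows the combined 0x41. */
#define	SKETCH_RCVSTATE_NORMAL		0x01u
#define	SKETCH_RCVSTATE_UPCALLTHREAD	0x40u

/* Hypothetical flattened copy of the fields inspected in the session. */
struct sketch_snapshot {
	int		t_state;	/* 0 is TCPS_CLOSED */
	uint32_t	inp_flags;
	uint32_t	so_state;
	uint32_t	so_count;
	uint32_t	ct_rcvstate;
	int		ct_errno;
	bool		pending_empty;	/* ct_pending has no entries */
};

/*
 * A closed, dropped and disconnected socket that is still referenced,
 * with the client in error state and nothing queued on it.
 */
bool
sketch_looks_orphaned(const struct sketch_snapshot *s)
{
	return (s->t_state == 0 &&
	    (s->inp_flags & SKETCH_INP_DROPPED) != 0 &&
	    (s->so_state & SKETCH_SS_ISDISCONNECTED) != 0 &&
	    s->so_count > 0 &&
	    s->ct_errno != 0 &&
	    s->pending_empty);
}
```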

Note: in my case so->so_error was EACCES, because I used an ipfw(4) rule to tear
down the connection.  That's why $ct->ct_error.ru.RE_errno == 13.  For a normal
TCP timeout it would be ETIMEDOUT, or ECONNRESET if the remote has reset.
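
The other members kgdb prints next to RE_errno (RE_why, RE_vers, RE_lb) are just other views of the same union storage, so only RE_errno is meaningful here. A small sketch of that aliasing, assuming a 32-bit int; union sketch_ru and enum sketch_auth_stat are simplified, hypothetical mirrors and not the real definitions:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for enum auth_stat; only the value 13 is needed. */
enum sketch_auth_stat {
	SKETCH_AUTH_OK = 0,
	SKETCH_RPCSEC_GSS_CREDPROBLEM = 13
};

/* Simplified mirror of the union that kgdb prints as ct_error.ru. */
union sketch_ru {
	int			RE_errno;
	enum sketch_auth_stat	RE_why;
	struct { uint32_t low, high; } RE_vers;
	struct { int32_t s1, s2; } RE_lb;
};

/* Store an errno value; every other member then aliases the same bytes. */
union sketch_ru
sketch_store_errno(int error)
{
	union sketch_ru ru;

	/* Assumption of this sketch: int is 32 bits wide. */
	assert(sizeof(int) == sizeof(uint32_t));
	memset(&ru, 0, sizeof(ru));
	ru.RE_errno = error;
	return (ru);
}
```

Storing EACCES this way and reading the other members back gives the same picture as the kgdb output: RE_why reads as the enumerator with value 13, and the first word of RE_vers and RE_lb is 13 as well.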

So we need some mechanism for clnt_vc_soupcall() to report to the upper clnt_rc
that this clnt_vc is dead and ready to be garbage collected via CLNT_CLOSE() and
then CLNT_RELEASE().

Once clnt_vc_destroy() is called, the daemon will be notified that it can close
the TLS socket, bringing so_count to 1, and then the final sorele() will bring
it to 0 and free the socket.
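
The reference counting in that last step can be pictured with a toy model. This is a sketch only: struct sketch_socket, sketch_sorele() and sketch_daemon_close() are hypothetical userland stand-ins, and the assumption is that the two references are the daemon's descriptor and the kernel RPC client.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the socket and its reference count. */
struct sketch_socket {
	unsigned int	so_count;	/* outstanding references */
	bool		freed;		/* set once the last one is gone */
};

/* Drop one reference; the last one frees the socket. */
void
sketch_sorele(struct sketch_socket *so)
{
	assert(so->so_count > 0);
	if (--so->so_count == 0)
		so->freed = true;
}

/* The daemon closing its descriptor gives up the reference it held. */
void
sketch_daemon_close(struct sketch_socket *so)
{
	sketch_sorele(so);
}
```

Starting from so_count == 2 as in the kgdb output, sketch_daemon_close() leaves one reference and the following sketch_sorele() frees the socket.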

-- 
Gleb Smirnoff