Re: HEADS UP: NFS changes coming into CURRENT early February
- In reply to: Gleb Smirnoff : "Re: HEADS UP: NFS changes coming into CURRENT early February"
Date: Wed, 29 Jan 2025 06:36:07 UTC
On Tue, Jan 28, 2025 at 09:14:00PM -0800, Gleb Smirnoff wrote:
T> Second, with the patch the M_RPC leak count for me is 2. And I found that these
T> two items are basically a clnt_vc that belongs to a closed connection:
T>
T> fffff80029614a80 tcp4 0 0 10.6.6.9.772 10.6.6.9.2049 CLOSED
T>
T> There is no peer connection, as the server received a timeout trying
T> to send. But rpc.tlsclntd doesn't try to send anything on the socket, it just
T> keeps it in its select(2) fd set and never garbage collects it.
T>
T> So it is a bigger resource leak than just two pieces of M_RPC. I don't think
T> this is related to my changes.
Here is what is going on:
- The TCP connection is torn down and tcp_close() calls soisdisconnected().
- soisdisconnected() calls clnt_vc_soupcall() to notify it of the error condition.
- clnt_vc_soupcall() tries soreceive() and gets so->so_error.
- clnt_vc_soupcall() puts the client into the error state. It doesn't wake up
anything, because there were no running RPC requests. It can't report back
to clnt_rc that the connection is dead, and it doesn't mark itself
for clnt_vc_dotlsupcall() processing.
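A user-space toy model of that sequence, for illustration only: struct vc_model and vc_soupcall_model() are hypothetical names, not the kernel's struct ct_data or clnt_vc_soupcall(). It shows why the error gets lost when ct_pending is empty: the upcall records the error, but the only consumers it knows how to notify are requests that are already waiting.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-in for the clnt_vc client state;
 * these are not the real struct ct_data fields. */
struct vc_model {
	int  pending;        /* RPC requests waiting for a reply (ct_pending) */
	int  error;          /* errno taken from so->so_error */
	bool in_error;       /* client has entered its error state */
	bool reported_dead;  /* never set: there is no path to tell clnt_rc */
};

/* Models the upcall run from soisdisconnected(): soreceive() fails with
 * so->so_error, the error is recorded, and only waiting requests are woken.
 * Returns how many requests were woken. */
static int
vc_soupcall_model(struct vc_model *ct, int so_error)
{
	int woken;

	assert(so_error != 0);
	ct->error = so_error;
	ct->in_error = true;
	woken = ct->pending;   /* each woken request would fail with ct->error */
	ct->pending = 0;
	return woken;
}
```

With pending == 0 the call returns 0 and reported_dead stays false, which is the idle, already-dead client sitting in the dump.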
So we end up with:
(kgdb) p $tp->t_state
$25 = 0 /* TCPS_CLOSED */
(kgdb) p/x $tp->t_inpcb.inp_flags & 0x04000000 /* INP_DROPPED */
$27 = 0x4000000
(kgdb) p/x $tp->t_inpcb.inp_socket->so_state
$28 = 0x2000 /* SS_ISDISCONNECTED */
(kgdb) p/x $tp->t_inpcb.inp_socket->so_count
$35 = 0x2
(kgdb) p/x $ct->ct_rcvstate
$29 = 0x41 /* RPCRCVSTATE_UPCALLTHREAD | RPCRCVSTATE_NORMAL */
(kgdb) p $ct->ct_error
$30 = {re_status = RPC_CANTRECV, ru = {RE_errno = 13, RE_why = RPCSEC_GSS_CREDPROBLEM, RE_vers = {low = 13, high = 0}, RE_lb = {s1 = 13, s2 = 0}}}
(kgdb) p $ct->ct_pending
$31 = {tqh_first = 0x0, tqh_last = 0xfffff80002838ea8}
Note: in my case so->so_error was EACCES, because I used an ipfw(4) rule to tear
down the connection. For a normal TCP timeout it should be ETIMEDOUT, or
ECONNRESET if the remote side has reset. That's why $ct->ct_error.ru.RE_errno == 13.
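The odd-looking RE_why and RE_vers values in the dump carry no extra information: ru is a union, so with re_status == RPC_CANTRECV only RE_errno is meaningful and the other members just reinterpret the same bytes. A small sketch, assuming the union has the shape kgdb printed (the _demo names are hypothetical; 13 is the value RFC 2203 assigns to RPCSEC_GSS_CREDPROBLEM):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative copy of the `ru` union printed by kgdb; the _demo names are
 * hypothetical, and the layout is assumed to match the kernel's. */
enum auth_stat_demo {
	RPCSEC_GSS_CREDPROBLEM_DEMO = 13   /* value assigned by RFC 2203 */
};

union rpc_err_ru_demo {
	int                 RE_errno;      /* meaningful for RPC_CANTRECV */
	enum auth_stat_demo RE_why;        /* meaningful for auth errors */
	struct { uint32_t low, high; } RE_vers;
	struct { int32_t s1, s2; } RE_lb;
};

/* Store an errno in the union, roughly as the receive path records it.
 * Reading any other member afterwards reinterprets the same leading bytes,
 * which is what kgdb does when it prints the whole union. */
static union rpc_err_ru_demo
record_errno(int err)
{
	union rpc_err_ru_demo ru;

	assert(err > 0);
	ru.RE_errno = err;
	return ru;
}
```

Storing 13 makes RE_why compare equal to RPCSEC_GSS_CREDPROBLEM_DEMO and RE_vers.low read 13, matching the dump.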
So we need some mechanism for clnt_vc_soupcall() to report to the upper clnt_rc
layer that it is dead and ready to be garbage collected via CLNT_CLOSE() and then
CLNT_RELEASE().
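One possible shape for such a mechanism, as a hypothetical sketch only (none of these names exist in sys/rpc): the upper layer registers a callback when it attaches the transport, the upcall invokes it, and the actual close and release are deferred to a later point. Since the upcall runs from inside the socket layer, tearing the client down right there is likely unsafe, so the callback only sets a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical sketch: none of these names exist in sys/rpc. */
struct vc;
typedef void (*vc_dead_fn)(void *arg, struct vc *vc);

struct vc {
	int        refs;         /* models the client's reference count */
	bool       closed;       /* CLNT_CLOSE() has run */
	vc_dead_fn on_dead;      /* registered by the upper layer */
	void      *on_dead_arg;
};

struct rc {
	struct vc *vc;           /* current transport, or NULL */
	bool       gc_needed;    /* transport reported itself dead */
};

/* Upper-layer callback: only flag the transport, do not tear it down here. */
static void
rc_vc_dead(void *arg, struct vc *vc)
{
	struct rc *rc = arg;

	if (rc->vc == vc)
		rc->gc_needed = true;
}

/* Called from the socket upcall once the client enters its error state. */
static void
vc_report_dead(struct vc *vc)
{
	if (vc->on_dead != NULL)
		vc->on_dead(vc->on_dead_arg, vc);
}

/* Models CLNT_RELEASE(): returns true when the last reference was dropped. */
static bool
vc_release(struct vc *vc)
{
	assert(vc->refs > 0);
	if (--vc->refs > 0)
		return false;
	free(vc);
	return true;
}

/* Runs later, outside the upcall: CLNT_CLOSE() then CLNT_RELEASE(). */
static bool
rc_gc(struct rc *rc)
{
	struct vc *vc = rc->vc;

	if (!rc->gc_needed || vc == NULL)
		return false;
	rc->vc = NULL;
	rc->gc_needed = false;
	vc->closed = true;           /* CLNT_CLOSE() */
	return vc_release(vc);       /* CLNT_RELEASE() */
}
```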
Once clnt_vc_destroy() is called, the daemon will be notified that it can close
the TLS socket, bringing so_count down to 1, and then the final sorele() will
bring it to 0 and free the socket.
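A toy model of that reference count, assuming the two references behind so_count == 2 in the dump are the daemon's descriptor and the kernel RPC client's own reference (sock_model and its helpers are hypothetical):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the TLS socket's reference count. Assumed owners of the two
 * references seen in the dump (so_count == 2): the daemon's descriptor and
 * the kernel RPC client. Names are hypothetical. */
struct sock_model {
	int  so_count;
	bool freed;
};

/* Drop one reference and free on the last one, as sorele() does. */
static void
sorele_model(struct sock_model *so)
{
	assert(so->so_count > 0 && !so->freed);
	if (--so->so_count == 0)
		so->freed = true;   /* stands in for actually freeing the socket */
}

/* After clnt_vc_destroy() notifies it, the daemon closes its descriptor,
 * which releases the daemon's reference. */
static void
daemon_close_model(struct sock_model *so)
{
	sorele_model(so);
}
```

Starting from {2, false}, daemon_close_model() leaves so_count at 1, and the following sorele_model() brings it to 0 and sets freed.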
--
Gleb Smirnoff