panic in tcp_do_segment()

Fri Apr 5 11:10:11 UTC 2013

Hey Rick,

I believe Juan and I have root caused this crash recently.  The t_state =
0x1, TCPS_LISTEN, in the link provided at the time of the assertion.

In tcp_input(), if we're in TCPS_LISTEN, SO_ACCEPTCONN should be set on the
socket and we should never enter tcp_do_segment() for this state.  I think
if you look in your corefile, you'll see the socket *doesn't* have this
flag set in your case.

1043         /*
1044          * When the socket is accepting connections (the INPCB is in
LISTEN
1045          * state) we look into the SYN cache if this is a new
connection
1046          * attempt or the completion of a previous one.  Because listen
1047          * sockets are never in TCPS_ESTABLISHED, the V_tcbinfo lock
will be
1048          * held in this case.
1049          */
1050         if (so->so_options & SO_ACCEPTCONN) {
1051                 struct in_conninfo inc;
1052
1053                 KASSERT(tp->t_state == TCPS_LISTEN, ("%s: so accepting
but "
1054                     "tp not listening", __func__));
...
1356                 syncache_add(&inc, &to, th, inp, &so, m, NULL, NULL);
1357                 /*
1358                  * Entry added to syncache and mbuf consumed.
1359                  * Everything already unlocked by syncache_add().
1360                  */
1361                 INP_INFO_UNLOCK_ASSERT(&V_tcbinfo);
1362                 return;
1363         }
...
1384         /*
1385          * Segment belongs to a connection in SYN_SENT, ESTABLISHED or
later
1386          * state.  tcp_do_segment() always consumes the mbuf chain,
unlocks
1387          * the inpcb, and unlocks pcbinfo.
1388          */
1389         tcp_do_segment(m, th, so, tp, drop_hdrlen, tlen, iptos,
ti_locked);

I think this has to do with this patch in soclose() where SO_ACCEPTCONN is
being turned off in soclose().  I suspect if you look at the other threads
in your corefile, you'll see one at this point in soclose() operating on
the same socket as the one in the tcp_do_segment() thread.

http://svnweb.freebsd.org/base?view=revision&revision=243627

 817                 /*
 818                  * Prevent new additions to the accept queues due
 819                  * to ACCEPT_LOCK races while we are draining them.
 820                  */
 821                 so->so_options &= ~SO_ACCEPTCONN;
 822                 while ((sp = TAILQ_FIRST(&so->so_incomp)) != NULL) {
 823                         TAILQ_REMOVE(&so->so_incomp, sp, so_list);
 824                         so->so_incqlen--;
 825                         sp->so_qstate &= ~SQ_INCOMP;
 826                         sp->so_head = NULL;
 827                         ACCEPT_UNLOCK();
 828                         soabort(sp);
 829                         ACCEPT_LOCK();
 830                 }

Juan had evaluated this code path and it seemed safe to just drop the
packet in this case:

+     /*
+      * In closing down the socket, the SO_ACCEPTCONN flag is removed to
+      * prevent new connections from being established.  This means that
+      * any frames in that were in the midst of being processed could
+      * make it here.  Need to just drop the frame.
+      */
+     if (TCPS_LISTEN == tp->t_state) {
+         TCPSTAT_INC(tcps_rcvwhileclosing);
+         goto drop;
+     }
      KASSERT(tp->t_state > TCPS_LISTEN, ("%s: TCPS_LISTEN",
          __func__));

Or, if there's someone more familiar with the locking in these paths, they
may be able to come up with a way to restructure the locks and logic to
close this window.

-Matt

On Thu, Apr 4, 2013 at 6:33 PM, Rick Macklem <rmacklem at uoguelph.ca> wrote:

> Hi,
>
> When pho@ was doing some NFS testing, he got the
> following crash, which I can't figure out. (As far
> as I can see, INP_WLOCK() is always held when
> tp->t_state = TCPS_CLOSED and it is held from before
> the test for TCPS_CLOSED in tcp_input() up until
> the tcp_do_segment() call. As such, I don't see how
> tp->t_state can be TCPS_CLOSED, but that seems to
> be what causes the panic?)
>
> The "umount -f" will result in:
>   soshutdown(so, SHUT_WR);
>   soclose(so);
> being done by the krpc on the socket.
>
> Anyone have any ideas on this?
> pho@ wrote:
> > I continued running the NFS tests and got this "panic: tcp_do_segment:
> > TCPS_LISTEN". It's the second time I get this panic with the same test
> > scenario, so it seems to be reproducible. The scenario is "umount -f"
> > of a mount point that is very active.
> >
> > http://people.freebsd.org/~pho/stress/log/kostik555.txt
>
> Thanks in advance for any help, rick
> _______________________________________________
> freebsd-net at freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscribe at freebsd.org"
>