SCHEDULE and high load situations
Robert Watson
rwatson at FreeBSD.org
Thu Aug 12 10:17:59 PDT 2004
On Thu, 12 Aug 2004, Don Lewis wrote:
> > (gdb) l *unp_connect2+0x2a
> > 0x1f93 is in unp_connect2 (/usr/src/sys/kern/uipc_usrreq.c:892).
> > 887 UNP_LOCK_ASSERT();
> > 888
> > 889 if (so2->so_type != so->so_type)
> > 890 return (EPROTOTYPE);
> > 891 unp2 = sotounpcb(so2);
> > 892 unp->unp_conn = unp2;
> > 893 switch (so->so_type) {
> > 894
> > 895 case SOCK_DGRAM:
> > 896 LIST_INSERT_HEAD(&unp2->unp_refs, unp, unp_reflink);
>
> Looks like unp is NULL here.
>
> My first suspicion would be the recent memory allocation changes that
> affected the type safety of various dynamically allocated data
> structures, though I'm not sure that fits the symptoms.
Hmm. I thought unix domain sockets weren't affected by those changes, but
I could be wrong.
However, it does look like a null pointer dereference, and in particular,
a possible race between two threads accessing either the same end or
opposite ends of a unix domain socket. Martin's dropping a core dump,
kernel, and source tree for me to look at. Some early debugging shows
that the unix domain socket is a datagram-oriented socket, and that the
SS_NOFDREF flag is set in so->so_state, suggesting maybe we have a race
between connect() and close() in the application. However, I need to sit
down with the core for a bit. I would have expected the more likely race to
be between two unix domain socket endpoints, since I would think most
applications don't mishandle their file descriptors. In any case, more details
soon.
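
For illustration only: the kind of application-level race suspected above,
connect() and close() issued concurrently on the same datagram unix domain
socket, could be sketched in user space roughly as follows. This is not
taken from Martin's application or the core dump. The socket path, the
function names and the round count are all hypothetical, and running it is
not known to reproduce the panic.

```c
#include <pthread.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical path for the receiving datagram socket. */
#define SERVER_PATH "/tmp/unp_race_demo.sock"

static void
fill_addr(struct sockaddr_un *sun)
{
	memset(sun, 0, sizeof(*sun));
	sun->sun_family = AF_UNIX;
	strncpy(sun->sun_path, SERVER_PATH, sizeof(sun->sun_path) - 1);
}

/* Thread A: connect the shared descriptor to the bound server socket. */
static void *
do_connect(void *arg)
{
	struct sockaddr_un sun;
	int fd = *(int *)arg;

	fill_addr(&sun);
	/* May fail (e.g. EBADF) if close() wins the race; that is expected. */
	(void)connect(fd, (struct sockaddr *)&sun, sizeof(sun));
	return (NULL);
}

/* Thread B: close the same descriptor, possibly while connect() runs. */
static void *
do_close(void *arg)
{
	(void)close(*(int *)arg);
	return (NULL);
}

/*
 * Run `rounds` connect()/close() races against one bound SOCK_DGRAM
 * socket. Returns the number of rounds completed, or -1 if setup fails.
 */
int
race_connect_close(int rounds)
{
	struct sockaddr_un sun;
	int i, server;

	fill_addr(&sun);
	(void)unlink(SERVER_PATH);
	server = socket(AF_UNIX, SOCK_DGRAM, 0);
	if (server < 0)
		return (-1);
	if (bind(server, (struct sockaddr *)&sun, sizeof(sun)) < 0) {
		(void)close(server);
		return (-1);
	}
	for (i = 0; i < rounds; i++) {
		pthread_t a, b;
		int fd = socket(AF_UNIX, SOCK_DGRAM, 0);

		if (fd < 0)
			break;
		if (pthread_create(&a, NULL, do_connect, &fd) != 0) {
			(void)close(fd);
			break;
		}
		if (pthread_create(&b, NULL, do_close, &fd) != 0) {
			(void)pthread_join(a, NULL);
			(void)close(fd);
			break;
		}
		(void)pthread_join(a, NULL);
		(void)pthread_join(b, NULL);
	}
	(void)close(server);
	(void)unlink(SERVER_PATH);
	return (i);
}
```

Calling race_connect_close(100000) from main() (built with -pthread) would
exercise both orderings of the two calls. Each round uses a fresh descriptor,
so a round where close() runs first simply makes connect() fail.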
I'm guessing the race was present previously, but the move to
ADAPTIVE_GIANT has caused it to trigger more frequently on Martin's
system.
Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
robert at fledge.watson.org Principal Research Scientist, McAfee Research