panic: m_copym, length > size of mbuf chain
Don Lewis
truckman at FreeBSD.org
Sat Jul 10 15:25:57 PDT 2004
On 10 Jul, Daniel Lang wrote:
> Hi Robert,
>
> Robert Watson wrote on Wed, Jul 07, 2004 at 12:24:59PM -0400:
> [..]
>> Just to try ruling out possibilities -- have you run an extensive set of
>> hardware diagnostics? Most server class hardware ships with a decent
>> diagnostics disk, and I'm sure we can find some for you in the event your
>> hardware didn't come with some. While it's quite possibly a software
>> problem, tracking hardware problems using software symptoms constitutes
>> undesirable pain and so it wouldn't hurt to give that a spin. I remember
>> seeing your earlier e-mails about running with WITNESS increasing the
>> chances of pain -- this could be a bug in WITNESS as you suggest, or it
>> could be that WITNESS increases the opportunities for a variety of locking
>> related races by increasing the cost of lock/unlock operations.
> [..]
>
> So I come back to the issue. As I already wrote, I guess I can
> rule out hardware problems now. I did a very thorough test with
> the Dell diagnosis utilities which showed no problems.
>
> Also, after John's patch I did not see any WITNESS related
> problems (so far) again. But I had the m_copy panic again
> (see subject). This time I did file a PR and did some more detailed
> gdb analysis. It is all documented at:
>
> http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/68889
>
> I am puzzled, because the stack frame on entering m_copym has
> 0x0 as first argument (m), however in the previous frame
> when m_copy() is called, the struct mbuf* argument is valid.
m_copym() overwrites its first and third arguments as it walks the mbuf
chain.
struct mbuf *
m_copym(struct mbuf *m, int off0, int len, int wait)
{
[snip]
        while (off > 0) {
                KASSERT(m != NULL, ("m_copym, offset > size of mbuf chain"));
                if (off < m->m_len)
                        break;
                off -= m->m_len;
                m = m->m_next;
        }
[snip]
        while (len > 0) {
                if (m == NULL) {
                        KASSERT(len == M_COPYALL,
                            ("m_copym, length > size of mbuf chain"));
                        break;
                }
[snip]
                if (len != M_COPYALL)
                        len -= n->m_len;
                off = 0;
                m = m->m_next;
                np = &n->m_next;
        }
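To make the effect of those two loops concrete, here is a minimal userland sketch of the walk. It is not the kernel code: struct fake_mbuf and walk_chain() are hypothetical names invented for this illustration, no data is copied, and where the kernel would hit a KASSERT the model simply stops. It only shows why m can be NULL and len still positive by the time the second KASSERT fires.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct mbuf: only the fields the walk uses. */
struct fake_mbuf {
    struct fake_mbuf *m_next;
    int m_len;
};

/*
 * Model of the two loops in m_copym(), without the actual copying.
 * Returns the number of bytes still wanted when the walk stopped
 * (0 means the request fit in the chain). *end receives the final
 * value of m, which is NULL if the walk ran off the end of the chain.
 */
static int
walk_chain(struct fake_mbuf *m, int off, int len, struct fake_mbuf **end)
{
    /* First loop: skip whole mbufs until off falls inside one. */
    while (off > 0 && m != NULL) {
        if (off < m->m_len)
            break;
        off -= m->m_len;
        m = m->m_next;
    }

    /* Second loop: consume up to len bytes, overwriting m and len. */
    while (len > 0 && m != NULL) {
        int avail = m->m_len - off;
        int n = len < avail ? len : avail;

        len -= n;
        off = 0;
        m = m->m_next;
    }

    *end = m;
    return len;
}
```

With a made-up chain of two mbufs holding 500 and 475 bytes, calling walk_chain() with an offset of 737 and a made-up request of 1460 bytes leaves end == NULL and returns 1222. Those inputs are chosen purely for illustration; the trace does not tell us the real layout of the chain or the original len.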
The interesting bits would seem to be in stack frame 11, tcp_output().
Check the arguments being passed to m_copym():
#10 0xc0551805 in m_copym (m=0x0, off0=737, len=1222, wait=1)
at /usr/src/sys/kern/uipc_mbuf.c:380
We don't know the original value of len that was passed to m_copym(),
because it could have been decremented if m_copym() iterated a few times
before it panicked, but it was at least 1222. If we add that to off0,
then the length of the original mbuf chain passed to m_copym() should have
been at least 1959 (737 + 1222).
Now take a look at the call to m_copy():
#11 0xc059ed5a in tcp_output (tp=0xc3f50000)
at /usr/src/sys/netinet/tcp_output.c:748
748 m->m_next = m_copy(so->so_snd.sb_mb, off, (int) len);
It would be interesting to see the value of len in stack frame 11, so
that we know the original value passed to m_copym().
Also the contents of *so are interesting.
(kgdb) p *so
[snip]
sb_cc = 975, sb_hiwat = 33580, sb_mbcnt = 1536, sb_mbmax = 262144,
I'm not sure if sb_cc or sb_mbcnt is the important member, but I think
it is sb_cc. I think this means that the mbuf chain contains 975 bytes
of data but tcp_output() is telling m_copy() to copy (at least) 1222
bytes of data starting at offset 737.
It looks to me like tcp_output() is passing a bogus len value to
m_copy().
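The comparison above can be written down as a small check. This is only a sketch of the arithmetic, assuming sb_cc really is the number of data bytes in the chain; copy_range_fits() is a hypothetical helper made up for this message and does not exist in the kernel.

```c
/*
 * Returns 1 if copying len bytes starting at offset off stays within
 * a buffer that holds cc bytes of data, and 0 otherwise.
 * Hypothetical helper for illustration; not a kernel function.
 */
static int
copy_range_fits(unsigned int cc, int off, int len)
{
    if (off < 0 || len < 0)
        return 0;
    if ((unsigned int)off > cc)
        return 0;
    /* Compare against the bytes remaining after off to avoid overflow. */
    return (unsigned int)len <= cc - (unsigned int)off;
}
```

With the values from the dump, copy_range_fits(975, 737, 1222) returns 0: only 975 - 737 = 238 bytes lie beyond offset 737, well short of the 1222 that m_copym() was still trying to copy.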