reproducible panic in netisr
rwatson at FreeBSD.org
Thu Aug 6 14:11:27 UTC 2009
On Thu, 6 Aug 2009, Larry Rosenman wrote:
> On Thu, 6 Aug 2009, Robert Watson wrote:
>> On Tue, 4 Aug 2009, Navdeep Parhar wrote:
>>>>> This occurs on today's HEAD + some unrelated patches. That makes it
>>>>> 8.0BETA2+ code. I haven't tried older builds.
>>>> We have finally been able to reproduce this ourselves yesterday and
>>> Well, it happens every single time on all of my amd64 machines. After I'd
>>> already sent my email I noticed that the netisr mutex has an odd address
>>> (pun intended :-))
>> Heh, indeed. We just spotted the same result here. In this case it's
>> causing a panic because it leads to a non-atomic read due to mtx_lock
>> spanning a cache line boundary, followed shortly by a panic because it's
>> not a valid thread pointer when it's dereferenced, as we get a fractional
> Do we have an ETA for a testable patch?
RSN, I'm afraid. We can eliminate the effect by reverting the use of DPCPU in
netisr.c (basically going back to the pre-r195019 netisr.c). The interesting
question is where the problem originates -- is gcc/ld/etc. not laying out the
ELF section properly, or are the MD parts not providing an aligned base?
There are also probably issues in the DPCPU handling of modules along similar
lines, but first things first.
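A minimal sketch of why both suspects matter, assuming a simplified model in which a per-CPU variable lives at base + offset: the linker chooses the offset within the section and the MD code chooses the base, so a misaligned value from either side yields a misaligned variable. The names is_aligned() and percpu_addr() are hypothetical illustrations, not part of the FreeBSD DPCPU API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if addr is a multiple of align (a power of two). */
static bool
is_aligned(uintptr_t addr, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return ((addr & (align - 1)) == 0);
}

/*
 * Hypothetical model of a per-CPU region: the final address is the sum
 * of a base (assumed to come from machine-dependent code) and an offset
 * (assumed to come from the section layout done by the toolchain).
 */
static uintptr_t
percpu_addr(uintptr_t base, size_t offset)
{
	return (base + offset);
}
```

In this model, checking only the offset or only the base is not enough; the sum is what the lock code ends up using, so that is the value worth checking.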
We'll be adding alignment assertions to the various lock init functions to
catch this happening explicitly in the future. There are probably one or two
other places where we have very strong alignment requirements on i386/amd64,
such as the td_ucred pointer that we check for change on system calls/traps to
see if we need to refresh the thread's credential from the process credential.
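The kind of assertion described above might look roughly like the following userland sketch. It is not the actual kernel change: struct toy_mtx, toy_mtx_init(), the helper names, and the 64-byte cache line size are all assumptions made for illustration. The second helper shows the failure mode from the quoted text: a pointer-sized word at an odd offset can straddle a cache line, while a naturally aligned one cannot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_CACHE_LINE	64	/* assumed cache line size in bytes */

/* Hypothetical stand-in for a lock whose state is one pointer-sized word. */
struct toy_mtx {
	volatile uintptr_t lock_word;
};

/* True if p is aligned to the size of a pointer-sized word. */
static bool
toy_word_aligned(const volatile void *p)
{
	return (((uintptr_t)p & (sizeof(uintptr_t) - 1)) == 0);
}

/* True if the byte range [addr, addr + size) crosses a cache line boundary. */
static bool
toy_spans_cache_line(uintptr_t addr, size_t size)
{
	return (addr / TOY_CACHE_LINE != (addr + size - 1) / TOY_CACHE_LINE);
}

/* Init routine that refuses a misaligned lock word up front. */
static void
toy_mtx_init(struct toy_mtx *m)
{
	assert(toy_word_aligned(&m->lock_word));
	m->lock_word = 0;
}
```

Failing at init time points at the misplaced lock directly, rather than at a later dereference of a torn pointer value.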
Robert N M Watson
University of Cambridge
More information about the freebsd-current