[follow-up] FreeBSD/amd64 r195146 to r195848, fatal trap 12 under network load

Tue Aug 4 15:18:08 UTC 2009

Kamigishi Rei wrote:
> Kamigishi Rei wrote:
>> The revisions mentioned are the ones I tested; in r195849 and later
>> the corruption lands somewhere else, so it may produce a panic with
>> a different set of options. For reference, my test kernel uses a
>> GENERIC config from the May 09 snapshot without WITNESS and with
>> IPFIREWALL, IPFIREWALL_DEFAULT_TO_ACCEPT and DEVICE_POLLING enabled.
> r195981 (latest checkout) traps with the *GENERIC* kernel (with WITNESS
> enabled). Same backtrace, same cause, and again UP systems are not
> affected.
> Apparently, my diagnostics patch from the previous message shifts the
> corruption somewhere else, so I can't use it to check lo_witness or other
> fields of nws_mtx at the moment mtx_lock gets corrupted.
> 
> The trap can be triggered with "ping -f -s 65507 localhost", with iperf
> (just "iperf -c localhost" works for me), or by generating any high-speed
> network throughput (even a mysql query over localhost will do, as we have
> a race here). Running ping mostly triggers the trap inside swi_net();
> iperf triggers it inside netisr_queue_internal().
> 
> I would be grateful if someone could give me some pointers on how to
> debug this further. Currently, I suspect something in the handling of
> modspace (an incorrect dereference somewhere, or something like that).

For the benefit of the list, we've finally got this reproduced on a 
netperf cluster node after much gnashing of teeth. Stay tuned for updates.

Cheers,
Lawrence