Dual Xeon EM64T crashes reliably w/ 5.x amd64

Palle Girgensohn girgen at pingpong.net
Wed May 25 13:40:15 PDT 2005


When running our Dell 2850 (dual Xeon CPUs) in SMP mode with 5.4 (same 
w/ 5.3-stable), it will reliably crash at least a couple of times per 
day. When crashing, the kernel panics (Fatal trap 12: page fault while 
in kernel mode) and the system neither reboots nor saves a core dump. 
I need to manually hit the big button to reboot.

This machine is very loaded, mostly due to some rather sloppy PHP 
scripts that could be optimized a lot. Average load >1 most of the 
time, I'd say. Still, it's not a reason to panic, IMHO :)

I've built a uni-processor kernel, and now the machine is quite stable, 
but that's not a solution, of course.

I'm cc:ing Jon Kuster, since he describes exactly the same problem, 
with identical hardware. His machine is not as loaded, so in his case 
moving from four CPUs (two "real" + HTT) to two real (shutting down 
HTT) was enough to stop the crashes. For me, I must run UP.

So, I don't get any core dumps, the machine does not reboot 
automatically, and customers are really unhappy. I'm clueless and need 
help. What do I do? Don't say "Linux"... :(


