[PATCH] Machine Check Architecture on amd64
Suleiman Souhlal
ssouhlal at FreeBSD.org
Tue Jun 26 07:10:43 UTC 2007
On Jun 25, 2007, at 11:55 PM, Ed Schouten wrote:
> * Suleiman Souhlal <ssouhlal at FreeBSD.org> wrote:
>> Hi,
>>
>> I have a simple patch for amd64 that uses the Machine Check
>> Architecture/Exceptions on most recent x86 CPUs to detect memory
>> errors:
>>
>> http://people.freebsd.org/~ssouhlal/testing/mce-20070621.diff
>>
>> It will report uncorrected and corrected errors (the latter, only if
>> sysctl machdep.mce.log_corrected=1).
>> You can ask the kernel to panic if it gets an uncorrected error by
>> setting machdep.mce.panic_on_uc=1.
>> All this can be disabled by setting the machdep.mce.enable tunable
>> to 0. I'm still not sure if I want this enabled by default, as I
>> don't have any Intel machines to test this on, but I have tested it
>> on Opteron (both corrected and uncorrected errors).
>>
>> I would appreciate it if someone would try this, especially if you
>> have Intel machines with bad RAM.
>>
>> Comments are welcome.
>
> | /*
> | * Uncorrected MCEs will generate a #MC, while corrected
> | * don't, so we have to periodically poll for them.
> | */
>
> What about adding an option to only print uncorrected MCEs? That's the
> most interesting data and we can get that without using a kthread,
> right?
Setting sysctl machdep.mce.log_corrected=0 machdep.mce.poll_delay=0 will
stop reporting the corrected errors and will stop the kthread, although
it won't actually kill it. I guess I'll fix that before I commit the
patch.
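
To make the interaction between these knobs concrete, here is a rough
userland sketch of the decision logic. It is not the code from the patch.
The struct, enum and function names are made up for illustration, and the
assumption that any nonzero poll_delay keeps the poller running is mine.
Only the VAL (bit 63) and UC (bit 61) positions in MCi_STATUS are
architectural.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural MCi_STATUS bits (same positions on Intel and AMD). */
#define MC_STATUS_VAL (1ULL << 63) /* register holds a valid error */
#define MC_STATUS_UC  (1ULL << 61) /* error was not corrected */

/* Hypothetical mirror of the machdep.mce.* knobs; not the patch's layout. */
struct mce_policy {
    bool log_corrected; /* machdep.mce.log_corrected */
    bool panic_on_uc;   /* machdep.mce.panic_on_uc */
    int  poll_delay;    /* machdep.mce.poll_delay; units are not assumed */
};

enum mce_action { MCE_IGNORE, MCE_LOG, MCE_PANIC };

/* Decide what to do with one bank's status value. */
enum mce_action
mce_decide(const struct mce_policy *p, uint64_t status)
{
    if ((status & MC_STATUS_VAL) == 0)
        return MCE_IGNORE;              /* nothing recorded in this bank */
    if (status & MC_STATUS_UC)          /* uncorrected: arrives via #MC */
        return p->panic_on_uc ? MCE_PANIC : MCE_LOG;
    /* corrected: only ever seen by polling, and only logged on request */
    return p->log_corrected ? MCE_LOG : MCE_IGNORE;
}

/* Assumption: a poll_delay of 0 means the poller does no work. */
bool
mce_should_poll(const struct mce_policy *p)
{
    return p->poll_delay > 0;
}
```

With log_corrected=0 and poll_delay=0, mce_should_poll() is false and
uncorrected errors are still logged, which is the behaviour Ed asked
about.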
Thanks,
-- Suleiman
More information about the freebsd-current
mailing list