Enabling MCA causes system hangs

Andriy Gapon avg at icyb.net.ua
Fri Sep 10 07:29:04 UTC 2010


on 10/09/2010 05:36 Daniel O'Connor said the following:
> Hi, I recently tried enabling MCA (i.e. hw.mca.enabled=1 in loader.conf) on an
> 8.0-STABLE system and found that it would cause the system to hang after a
> few minutes of uptime.
> 
> The screen would go black and the monitor would turn off (regardless of
> whether it was in X or not) and only a hard reset would bring it back.
> 
> Also I found that quite often I had to power cycle the whole PC or the BIOS
> wouldn't detect the hard disks on boot(!) after a hang.
> 
> uname is.. FreeBSD midget.dons.net.au 8.0-STABLE FreeBSD 8.0-STABLE #6
> r202903M: Sun Jan 24 13:45:11 CST 2010
> darius at midget.dons.net.au:/usr/obj/usr/src/sys/MIDGET  amd64
> 
> The motherboard is a Gigabyte GA-MA785GM-US2H with an Athlon II X2 240 CPU &
> 4GB of RAM.

Do you also have superpages enabled (vm.pmap.pg_ps_enabled)?
If so, please try to turn them off and report back if that helps.
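
A possible way to check and disable superpages is sketched below. It assumes
the tunable behaves as on a stock 8.x kernel, where (as far as I know) it can
only be changed at boot time through loader.conf:

```sh
# Show the current setting (1 means superpages are enabled).
sysctl vm.pmap.pg_ps_enabled

# To disable superpages, add this line to /boot/loader.conf and reboot:
#   vm.pmap.pg_ps_enabled=0
```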

If not, then it's a tougher situation.
What you see looks like the consequence of a HyperTransport sync flood, which is
a way of handling certain errors detected by the CPU.  Essentially it means that
all HyperTransport communication is frozen, so the system just hangs.

My impression is that consumer-type systems are often configured to produce a
sync flood to stop error propagation in situations where more 'serious' systems
would report a machine check exception (MCE), probably an uncorrectable one.

NOTE: the following may hurt your system and your data!
Please stop reading if you are unsure if you can handle that!

You may try to investigate the sync flood situation further by checking the
following bit in the CPU configuration:

F3x180 Extended NB MCA Configuration Register
21 SyncFloodOnCpuLeakErr: sync flood on CPU leak error enable.

You can examine the current value with a command like the following:
$ pciconf -r pci0:0:24:3 0x180

Where pci0:0:24:3 is the PCI selector that corresponds to the device reported as
follows by pciconf -lv:
'(Family 10h) Athlon64/Opteron/Sempron Miscellaneous Control'
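
To decide whether bit 21 is set in the value that pciconf prints, a small
helper along these lines may be useful. This is only a sketch: the function
name bit21_is_set and the sample value are made up for illustration, and it
assumes the register is given as a hex string, with or without a leading 0x.

```sh
#!/bin/sh
# Succeed (exit status 0) if bit 21 (SyncFloodOnCpuLeakErr) is set in the
# given register value.  The argument is a hex string; a leading 0x is
# optional.
bit21_is_set() {
    val=${1#0x}
    [ $(( (0x$val >> 21) & 1 )) -eq 1 ]
}

# On the real machine the value would come from something like:
#   reg=$(pciconf -r pci0:0:24:3 0x180)
# Here a made-up value stands in for the real register contents.
reg=00200040
if bit21_is_set "$reg"; then
    echo "SyncFloodOnCpuLeakErr is set"
else
    echo "SyncFloodOnCpuLeakErr is clear"
fi
```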

If the bit is set, you can try to clear it (using pciconf -w) and see how
your system behaves when the MCA condition strikes.
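
A minimal sketch of computing the value to write back, with only bit 21
cleared and all other bits preserved, could look like this. The helper name
clear_bit21 and the sample register value are hypothetical; use the value
actually read from your own system, and double-check the selector first.

```sh
#!/bin/sh
# Print the given 32-bit register value with bit 21 (SyncFloodOnCpuLeakErr)
# cleared and every other bit left unchanged.  The argument is a hex string;
# a leading 0x is optional.
clear_bit21() {
    val=${1#0x}
    printf '0x%08x\n' $(( 0x$val & ~(1 << 21) & 0xffffffff ))
}

# Made-up example value, not read from real hardware.
clear_bit21 00200040
```

The printed value could then be written back with a command of the form
"pciconf -w pci0:0:24:3 0x180 <new value>", run as root. The change is not
persistent and would have to be repeated after each reboot.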

Be careful and cautious.

-- 
Andriy Gapon

