Enabling MCA causes system hangs

Daniel O'Connor doconnor at gsoft.com.au
Fri Sep 10 07:46:47 UTC 2010


On 10/09/2010, at 16:58, Andriy Gapon wrote:
>> The motherboard is a Gigabyte GA-MA785GM-US2H with an Athlon II X2 240 CPU &
>> 4GB of RAM.
> 
> Do you also have superpages enabled (vm.pmap.pg_ps_enabled)?
> If so, please try to turn them off and report back if that helps.

Yes, they are - I will try without.
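
For the archives: as far as I know vm.pmap.pg_ps_enabled is a boot-time
tunable that is read-only once the system is up, so my assumption is that
it has to be turned off from the loader and needs a reboot to take effect.
A sketch of what I plan to put in /boot/loader.conf:

```sh
# /boot/loader.conf
# Assumption: pg_ps_enabled can only be set at boot, not via sysctl at runtime.
# 0 disables superpage promotion in the pmap layer.
vm.pmap.pg_ps_enabled=0
```

After rebooting, "sysctl vm.pmap.pg_ps_enabled" should report 0 if the
setting was picked up.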


> If not, then it's a tougher situation.
> What you see looks like a consequence of a HyperTransport sync flood, which is a
> way to handle certain errors detected by the CPU.  Essentially it means that all
> HyperTransport communications are frozen, so the system just hangs.
> 
> My impression is that consumer-type systems are often configured to produce a sync
> flood to stop error propagation in situations where more 'serious' systems would
> report a machine check exception (MCE), probably an uncorrectable one.

Ahh..
The system does seem to operate normally with MCA disabled, and I haven't noticed any data corruption issues. FWIW I am using ZFS on this box and it hasn't complained about any corrupt files.

> NOTE: the following may hurt your system and your data!
> Please stop reading if you are unsure if you can handle that!

Woooh, sounds fun :)

> You may try to investigate the sync flood situation further by checking the
> following bit in the CPU configuration:
> 
> F3x180 Extended NB MCA Configuration Register
> 21 SyncFloodOnCpuLeakErr: sync flood on CPU leak error enable.
> 
> You can examine the current value with a command like the following:
> $ pciconf -r pci0:0:24:3 0x180
> 
> Where pci0:0:24:3 is the PCI handle that corresponds to the device reported as
> follows by pciconf -lv:
> '(Family 10h) Athlon64/Opteron/Sempron Miscellaneous Control'
> 
> If the bit is set, you can try to flip it off (using pciconf -w) and see how
> your system behaves when the MCA condition strikes.

It does look like it is set:

hostb4 at pci0:0:24:3:     class=0x060000 card=0x00000000 chip=0x12031022 rev=0x00 hdr=0x00
    vendor     = 'Advanced Micro Devices (AMD)'
    device     = '(Family 10h) Athlon64/Opteron/Sempron Miscellaneous Control'
    class      = bridge
    subclass   = HOST-PCI

[midget 17:09] /usr/src/sys >sudo pciconf -r pci0:0:24:3 0x180
00f003e2 

Which is:
   0    0    f    0    0    3    e    2
0000 0000 1111 0000 0000 0011 1110 0010
|    |    |    |    |    |    |    |  |
31   27   23   19   15   11   7    3  0

Bits 23 to 20 (the 'f' nibble) are all 1, so bit 21 (SyncFloodOnCpuLeakErr) is set.
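
Rather than trusting my eyeballing of the bits, the same check can be done
with shell arithmetic. This is only a sketch: the value is the one I read
back above, the variable names are my own, and the pciconf -w line is left
commented out because writing the register is exactly the risky step Andriy
warned about.

```sh
#!/bin/sh
# Value read back from: pciconf -r pci0:0:24:3 0x180
val=0x00f003e2
bit=21   # SyncFloodOnCpuLeakErr in F3x180

# Shift the bit of interest down to position 0 and mask everything else off.
is_set=$(( ($val >> $bit) & 1 ))
echo "bit $bit is $is_set"

# Same register value with only that one bit cleared.
new=$(printf '0x%08x' $(( $val & ~(1 << $bit) )))
echo "value with bit $bit cleared: $new"

# Untested on my side, at your own risk:
# pciconf -w pci0:0:24:3 0x180 "$new"
```

For my value this reports the bit as 1 and gives 0x00d003e2 as the value to
write back. The change would not survive a reboot, since the BIOS programs
the register again at power-on.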

> Be careful and cautious.

Thanks, I'll let you know how I go!

--
Daniel O'Connor software and network engineer
for Genesis Software - http://www.gsoft.com.au
"The nice thing about standards is that there
are so many of them to choose from."
  -- Andrew Tanenbaum
GPG Fingerprint - 5596 B766 97C0 0E94 4347 295E E593 DC20 7B3F CE8C

More information about the freebsd-stable mailing list