8.1-RC2 MCE caused by some LAPIC/clock changes? (was: 8.1-RC2 - PCI fatal error or MCE triggered by USB/ehci on Sun X4100M2?)

Markus Gebert markus.gebert at hostpoint.ch
Sat Jul 17 18:35:23 UTC 2010


On 13.07.2010, at 16:02, Markus Gebert wrote:

> Unfortunately, I have not been able to get anything useful out of the svn commit logs that could explain this. Maybe someone else has an idea what could have changed between 7 and 8 to break it, and again between 8 and CURRENT to magically fix it again.

I tracked this down further. I couldn't easily downgrade my 8.1 installation to see when the problem was introduced, because its zpool is at version 14. So instead I tried to figure out when the problem was fixed in CURRENT.

I started with the earliest revision that can boot from my v14 pool (r201143, Dec 28, the ZFS v14 commit). With this revision, I was able to trigger the MCE.

Then I took a later revision (r206010, Apr 1, chosen at random), and I couldn't reproduce the problem there. I narrowed the range down and found that while I can still trigger the MCE on r202386, r202387 seems to solve the problem on CURRENT:

http://svn.freebsd.org/viewvc/base?view=revision&revision=202387

Since John Baldwin mentioned that this problem could be timing-related, it seems plausible that a clock-related change could fix it. But as far as I can tell, this commit was MFC'd to 8-STABLE and 8.1 along with some other changes to amd64-specific code. I thought that some of those other MFC'd changes might have reintroduced the problem later on, but so far I have not been able to reproduce the problem with newer CURRENT revisions. So I have actually nailed this one down to a single commit on CURRENT, but I still cannot tell what the actual difference is compared to 8-STABLE/8.1.

Any ideas how to proceed?


Markus


More information about the freebsd-stable mailing list