git: d39f7430a6e1 - main - amd64: preserve %cr2 in NMI/MCE/DBG handlers.

Konstantin Belousov kostikbel at gmail.com
Sun Dec 27 22:45:15 UTC 2020


On Sun, Dec 27, 2020 at 03:13:09PM -0500, Andrew Gallatin wrote:
> On 12/27/20 6:14 AM, Konstantin Belousov wrote:
> > The branch main has been updated by kib:
> > 
> > URL: https://cgit.FreeBSD.org/src/commit/?id=d39f7430a6e1da419d6e4fb871bca5ba7863f738
> > 
> > commit d39f7430a6e1da419d6e4fb871bca5ba7863f738
> > Author:     Konstantin Belousov <kib at FreeBSD.org>
> > AuthorDate: 2020-12-25 21:58:43 +0000
> > Commit:     Konstantin Belousov <kib at FreeBSD.org>
> > CommitDate: 2020-12-27 10:59:33 +0000
> > 
> >      amd64: preserve %cr2 in NMI/MCE/DBG handlers.
> >      These handlers could interrupt code which has interrupts disabled,
> >      and if a spurious page fault occurs during exception handler run,
> >      we get clobbered %cr2 in higher level stack.
> >      This is mostly a speculation, but it is based on hints from good sources.
> 
> I assume this is based around the mystery panic I was talking about on irc
> last week.
Yes, but it is not supposed to fix it; the hope is that it might reduce
the amount of smoke around it.

> 
> Can you please explain what a spurious page fault is?  A fault where
> there is a valid mapping, but we somehow take a fault for no reason?
> How often does this happen?

Hopefully spurious faults occur rarely; they happen due to bugs
in CPUs.  They were relatively common on older models of Intel CPUs some
time ago, so amd64 trap.c has special handling for page faults
that should not occur according to the kernel bookkeeping.  Look for
the TDP_RESETSPUR flag and its use in trap_pfault() if interested.  In short,
we retry the faulting instruction and fall back to normal fault handling if
it faults again on retry.
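
To make the retry idea concrete, here is a toy userland model of it.  This
is only a sketch and not the code in trap_pfault(): the names toy_thread,
toy_pfault, TOY_RETRY and TOY_NORMAL are made up for illustration, and the
should_not_fault argument stands in for whatever the kernel bookkeeping
would conclude about the faulting address.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical outcome of the toy fault handler. */
enum toy_action { TOY_RETRY, TOY_NORMAL };

/* Hypothetical per-thread state: the last address we retried. */
struct toy_thread {
	bool		spur_valid;	/* spur_addr holds a retried address */
	uintptr_t	spur_addr;	/* address of the last retried fault */
};

/*
 * Decide what to do with a page fault at addr.  If the bookkeeping
 * says the fault should not have happened and we have not already
 * retried this address, remember it and retry the instruction.
 * A second fault at the same address goes to normal handling.
 */
static enum toy_action
toy_pfault(struct toy_thread *td, uintptr_t addr, bool should_not_fault)
{
	if (should_not_fault &&
	    !(td->spur_valid && td->spur_addr == addr)) {
		td->spur_valid = true;
		td->spur_addr = addr;
		return (TOY_RETRY);
	}
	td->spur_valid = false;
	return (TOY_NORMAL);
}
```

With a fresh toy_thread, the first unexpected fault at an address returns
TOY_RETRY and an immediate second fault at the same address returns
TOY_NORMAL.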

In fact I do not think that this code can trigger during an NMI.  The patch's
intent was to cover a case that I was immediately asked about when I described
the paradoxical %cr2 != %rip fault to some people.  If the panic can be
reproduced, at least we will know for sure that it is not the NMI handler
corrupting %cr2, and can show the evidence to the relevant channel.
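
The hazard the commit guards against can be modelled in a few lines of
userland C.  This is an illustration only, not the actual handler code:
toy_cr2 is a plain variable standing in for %cr2, and all toy_* functions are
hypothetical names.

```c
#include <stdint.h>

/* Stand-in for the %cr2 register (hypothetical, userland only). */
static uint64_t toy_cr2;

/* A page fault overwrites the register with the faulting address. */
static void
toy_page_fault(uint64_t addr)
{
	toy_cr2 = addr;
}

/* Handler that takes a nested fault and leaves the register clobbered. */
static void
toy_nmi_unsafe(uint64_t nested_fault_addr)
{
	toy_page_fault(nested_fault_addr);
}

/* Handler that saves the register on entry and restores it on exit. */
static void
toy_nmi_preserving(uint64_t nested_fault_addr)
{
	uint64_t saved = toy_cr2;

	toy_page_fault(nested_fault_addr);
	toy_cr2 = saved;
}
```

If the interrupted code had just taken a fault and not yet read the
register, toy_nmi_unsafe() leaves it holding the nested fault address,
while toy_nmi_preserving() leaves the original value in place.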

