AMD Erratum 383 crashes FreeBSD 9-Stable

Sat Mar 17 19:21:59 UTC 2012

On Sat, Mar 17, 2012 at 12:10 PM, Richard Yao <ryao at cs.stonybrook.edu> wrote:

> On 03/17/12 13:08, Richard Yao wrote:
> > Dear FreeBSD Developers:
> >
> > I used the ZFS Guru LiveCD to install FreeBSD 9 in KVM on a host system
> > with an AMD Thuban processor (K10h). I then proceeded to compile perl
> > and the VM crashed. Linux's dmesg gave me the following hint as to the
> > cause:
> >
> > [ 3568.234654] KVM: Guest triggered AMD Erratum 383
> >
> > I also tried installing Gentoo Prefix, a userland package manager like
> > NetBSD pkgsrc, and the VM also crashed with the same message when
> > compiling the first component. AMD has documented this issue, with a
> > workaround for hypervisors and a statement saying that they won't fix it:
> >
> > "If system software performs uncommon methods to change the page size of
> > an active page table that is valid, the CPU core may, under a highly
> > specific and detailed set of conditions, form duplicate TLB entries for
> > a single linear address. The CPU core will machine check if this page is
> > then accessed prior to it being invalidated from the TLB."
> >
> > http://support.amd.com/us/Embedded_TechDocs/41322.pdf
> >
> > Has anyone done anything to work around this issue? I have a Gentoo
> > Hardened VM running on this machine which has no problem compiling
> > software, so I am sure that some sort of page table workaround is
> > possible.
> >
> > Yours truly,
> > Richard Yao
> >
>
> I was tired when I wrote that, so my eyes seem to have skipped some
> advice from AMD on how to work around this in the kernel:
>
> "Affected software must ensure that page sizes are only increased or
> decreased after the entry is invalidated and flushed out of all TLBs.
> When flushing multiple entries from the TLB, software may wish to use a
> single MOV CR3 value to invalidate the TLB instead of repetitive INVLPG
> instructions"
>
> Also, I am not on the mailing list, so please CC replies to me.
>
>
When the FreeBSD kernel detects that it is running on an affected
processor, it automatically enables the recommended workaround.  However,
because you are running within a virtual machine, the automatic detection
may not be working.  Alternatively, you may be using a newer processor
revision that still suffers from the bug but for which the kernel does not
enable the workaround.  Can you tell us how the FreeBSD guest sees the
underlying processor, e.g., the first few lines of dmesg from the guest?
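
For reference, here is a minimal sketch of how the family and model can be
decoded from the CPUID signature, which the guest's dmesg typically reports
as an "Id = 0x..." value.  The function names are hypothetical and are not
taken from the FreeBSD source, and the family 10h test only reflects the
scope of AMD's revision guide; the kernel's actual detection logic may use
additional conditions.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Decode the display family from CPUID function 1 EAX.
 * Base family is bits 11:8; when it equals 0xF, the extended
 * family (bits 27:20) is added to it.
 */
static unsigned
cpuid_family(uint32_t eax)
{
	unsigned base = (eax >> 8) & 0xF;
	unsigned ext = (eax >> 20) & 0xFF;

	return (base == 0xF ? base + ext : base);
}

/*
 * Decode the display model using the AMD convention: base model
 * is bits 7:4; when the base family is 0xF, the extended model
 * (bits 19:16) supplies the upper four bits.
 */
static unsigned
cpuid_model(uint32_t eax)
{
	unsigned base_family = (eax >> 8) & 0xF;
	unsigned base = (eax >> 4) & 0xF;
	unsigned ext = (eax >> 16) & 0xF;

	return (base_family == 0xF ? (ext << 4) | base : base);
}

/*
 * Hypothetical helper: true when the signature decodes to
 * family 10h, the family covered by AMD's revision guide.
 */
static bool
is_family_10h(uint32_t eax)
{
	return (cpuid_family(eax) == 0x10);
}
```

As an illustration only, a signature of 0x00100FA0 (an assumed example
value, not one reported in this thread) would decode to family 0x10 and
model 0x0A.  If the guest reports a different family, the hypervisor may be
presenting a different CPU model than the host's.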

Alan