jenkins bhyve vms crashing and burning after several days of use

Neel Natu neelnatu at gmail.com
Thu Jun 26 22:43:15 UTC 2014


Hi Sean,

On Thu, Jun 26, 2014 at 3:23 PM, Sean Bruno <sbruno at ignoranthack.me> wrote:
> On Thu, 2014-06-26 at 15:00 -0700, Neel Natu wrote:
>> Hi Sean,
>>
>> On Thu, Jun 26, 2014 at 2:46 PM, Sean Bruno <sbruno at ignoranthack.me> wrote:
>> > On Thu, 2014-06-26 at 14:42 -0700, Sean Bruno wrote:
>> >> so, we're seeing the bhyve vms running in the freebsd cluster for
>> >> jenkins crashing and burning after a couple of days of use.
>> >>
>> >> vm exit[9]
>> >> reason          VMX
>> >> rip             0x0000000029286336
>> >> inst_length     3
>> >> status          0
>> >> exit_reason     49
>> >> qualification   0x0000000000000000
>> >> inst_type       0
>> >> inst_error      0
>> >>
>> >>
>> >> It looks like we have an active core file on havoc.ysv if you have a
>> >> moment to look at it:
>> >>
>> >> http://people.freebsd.org/~sbruno/bhyve.core
>> >>
>> >> FreeBSD havoc.ysv.freebsd.org 11.0-CURRENT FreeBSD 11.0-CURRENT #2
>> >> r267362: Wed Jun 11 14:56:34 UTC 2014
>> >> sbruno at havoc.freebsd.org:/usr/obj/usr/src/sys/HAVOC  amd64
>> >>
>> >
>> > Also, from chaos.ysv
>> >
>> > http://people.freebsd.org/~sbruno/bhyve.core.chaos
>> >
>> > FreeBSD chaos.ysv.freebsd.org 11.0-CURRENT FreeBSD 11.0-CURRENT #1
>> > r267362: Wed Jun 11 15:50:24 UTC 2014
>> > sbruno at chaos.ysv.freebsd.org:/usr/obj/usr/src/sys/CHAOS  amd64
>> >
>>
>> Can you tell us the processor and memory configuration on havoc and chaos?
>>
>> Also, could you execute the following commands on havoc:
>>
>> # bhyvectl --vm=vmname --cpu=9 --get-vmcs-guest-physical-address
>> -- this will output the offending guest physical address that
>> triggered the EPT misconfiguration
>>
>> # bhyvectl --vm=vmname --get-gpa-pmap=<gpa_from_above>
>> -- this will output the page table entries in the EPT that map to the
>> offending GPA
>>
>> Hopefully that provides us with something to work with.
>>
>> best
>> Neel
>>
>> >
>
> chaos:
> CPU: Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz (2200.05-MHz K8-class CPU)
>   Origin="GenuineIntel"  Id=0x206d6  Family=0x6  Model=0x2d  Stepping=6
> Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
> Features2=0x1fbee3ff<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,POPCNT,TSCDLT,AESNI,XSAVE,OSXSAVE,AVX>
>   AMD Features=0x2c100800<SYSCALL,NX,Page1GB,RDTSCP,LM>
>   AMD Features2=0x1<LAHF>
>   TSC: P-state invariant, performance statistics
> avail memory = 66298322944 (63227 MB)
>
> havoc:
> FreeBSD clang version 3.4.1 (tags/RELEASE_34/dot1-final 208032) 20140512
> CPU: Intel(R) Xeon(R) CPU           E5620  @ 2.40GHz (2400.14-MHz K8-class CPU)
>   Origin="GenuineIntel"  Id=0x206c2  Family=0x6  Model=0x2c  Stepping=2
> Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
> Features2=0x29ee3ff<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,POPCNT,AESNI>
>   AMD Features=0x2c100800<SYSCALL,NX,Page1GB,RDTSCP,LM>
>   AMD Features2=0x1<LAHF>
>   TSC: P-state invariant, performance statistics
> avail memory = 16571621376 (15803 MB)
>

Thanks, we'll see if there are relevant errata for these processors.

>
> There appear to be three vms running on havoc:
> root at havoc.ysv:/home/sbruno # bhyvectl --vm=vm1 --cpu=9
> --get-vmcs-guest-physical-address
> gpa[9]          0x0000000000000000
> root at havoc.ysv:/home/sbruno # bhyvectl --vm=vm2 --cpu=9
> --get-vmcs-guest-physical-address
> gpa[9]          0x0000000000000000
> root at havoc.ysv:/home/sbruno # bhyvectl --vm=vm3 --cpu=9
> --get-vmcs-guest-physical-address
> gpa[9]          0x0000000000000000
>
> root at havoc.ysv:/home/sbruno # bhyvectl --vm=vm1 --cpu=9
> --get-gpa-pmap=0x0000000000000000
> gpa 0: 0x300002c936e007 0x300002c9353007 0x300002c9352007 0
>
> root at havoc.ysv:/home/sbruno # bhyvectl --vm=vm2 --cpu=9
> --get-gpa-pmap=0x0000000000000000
> gpa 0: 0x30000286cb0007 0x300003ad105007 0x3000019b1fd007 0
>
> root at havoc.ysv:/home/sbruno # bhyvectl --vm=vm3 --cpu=9
> --get-gpa-pmap=0x0000000000000000
> gpa 0: 0x300002c9348007 0x300002c9339007 0
>
>
> But there's no information available on chaos at the moment as there are
> no active vms running.
>

Sorry, I should have explained a bit more.

After bhyve(8) exits because of the EPT misconfiguration error, there
are breadcrumbs left over in the VMCS as well as in the nested page
tables. We can use them to diagnose what happened.
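
For reading the nested page table entries, here is a rough Python
sketch (not part of bhyve or bhyvectl) that splits a line of
"--get-gpa-pmap" output into its entries and decodes the low
permission bits. It assumes the standard Intel EPT entry layout (bit 0
read, bit 1 write, bit 2 execute, physical address in bits 51:12) and
that the output line has the "gpa 0: ..." shape quoted earlier in this
thread. The function names are made up for illustration. It checks only
one of the architectural misconfiguration conditions (writable but not
readable), so a clean result here does not rule out the others.

```python
# Assumed layout: Intel EPT paging-structure entry.
EPT_READ = 0x1
EPT_WRITE = 0x2
EPT_EXEC = 0x4
ADDR_MASK = 0x000FFFFFFFFFF000  # physical address, bits 51:12


def parse_gpa_pmap(line):
    """Split a 'gpa N: e1 e2 ...' line into a list of integer entries."""
    _, _, rest = line.partition(":")
    return [int(tok, 0) for tok in rest.split()]


def decode_ept_entry(entry):
    """Decode the permission bits and address field of one entry."""
    return {
        "read": bool(entry & EPT_READ),
        "write": bool(entry & EPT_WRITE),
        "execute": bool(entry & EPT_EXEC),
        "addr": entry & ADDR_MASK,
    }


def write_without_read(entry):
    """True if the entry is writable but not readable.

    This is one of the conditions that causes an EPT misconfiguration;
    reserved bits and reserved memory types are not checked here.
    """
    return (entry & (EPT_READ | EPT_WRITE)) == EPT_WRITE


if __name__ == "__main__":
    # Sample line taken from the output quoted earlier in this thread.
    sample = "gpa 0: 0x300002c936e007 0x300002c9353007 0x300002c9352007 0"
    for e in parse_gpa_pmap(sample):
        d = decode_ept_entry(e)
        print(hex(e), d["read"], d["write"], d["execute"], hex(d["addr"]),
              "W-without-R" if write_without_read(e) else "")
```
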

The bhyvectl commands above should be executed after the VM exits but
before it is restarted. Once it restarts, the breadcrumbs are
overwritten and are of no use.

The "--vm=<vmname>" passed to the bhyvectl command should be the name
of the virtual machine that crashed.
The "--cpu=<vcpuid>" passed to the bhyvectl command should be the
vcpuid that detected the EPT misconfiguration. The reason I used '9'
as an example above was because you saw this on the console:

vm exit[9]
reason          VMX
rip             0x0000000029286336
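
If it helps to script this, here is a minimal Python sketch that pulls
the vcpuid out of a "vm exit[N]" console dump and prints the matching
bhyvectl command lines. It assumes the dump looks like the one quoted
in this thread and only uses the two bhyvectl options mentioned above.
The helper names and the "vmname" placeholder are hypothetical, and the
script only builds the command strings; it does not run them.

```python
import re

# Console dump as quoted earlier in this thread.
SAMPLE = """\
vm exit[9]
reason          VMX
rip             0x0000000029286336
inst_length     3
status          0
exit_reason     49
qualification   0x0000000000000000
inst_type       0
inst_error      0
"""

EPT_MISCONFIG = 49  # VMX basic exit reason for an EPT misconfiguration


def parse_vm_exit(text):
    """Return (vcpuid, fields) parsed from a 'vm exit[N]' dump."""
    m = re.search(r"vm exit\[(\d+)\]", text)
    if m is None:
        raise ValueError("no 'vm exit[N]' line found")
    fields = {}
    for line in text[m.end():].splitlines():
        parts = line.split()
        if len(parts) == 2:  # "name value" pairs only
            fields[parts[0]] = parts[1]
    return int(m.group(1)), fields


def bhyvectl_commands(vmname, vcpuid, gpa=None):
    """Build the diagnostic command lines; gpa comes from the first one."""
    cmds = [
        f"bhyvectl --vm={vmname} --cpu={vcpuid} "
        "--get-vmcs-guest-physical-address"
    ]
    if gpa is not None:
        cmds.append(f"bhyvectl --vm={vmname} --get-gpa-pmap={gpa}")
    return cmds


if __name__ == "__main__":
    vcpuid, fields = parse_vm_exit(SAMPLE)
    if int(fields["exit_reason"]) == EPT_MISCONFIG:
        # "vmname" is a placeholder for the VM that crashed.
        for cmd in bhyvectl_commands("vmname", vcpuid):
            print(cmd)
```

Run the first command before the VM is restarted, then pass the GPA it
reports as the gpa argument to get the second command.
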

Hope that helps.

best
Neel

> sean
>


More information about the freebsd-virtualization mailing list