Recent MFC to 7 causes crash on VMware ESXi
John Baldwin
jhb at freebsd.org
Mon Feb 8 19:42:44 UTC 2010
On Monday 08 February 2010 1:51:46 pm Kostik Belousov wrote:
> On Mon, Feb 08, 2010 at 01:15:06PM -0500, John Baldwin wrote:
> > On Monday 08 February 2010 11:06:00 am Kostik Belousov wrote:
> > > On Mon, Feb 08, 2010 at 10:32:37AM -0500, John Baldwin wrote:
> > > > On Monday 08 February 2010 9:56:36 am Kostik Belousov wrote:
> > > > > On Mon, Feb 08, 2010 at 09:49:00AM -0500, John Baldwin wrote:
> > > > > > On Saturday 06 February 2010 4:47:16 pm Tom McLaughlin wrote:
> > > > > > > John Baldwin wrote, On 02/05/2010 08:27 AM:
> > > > > > > > On Thursday 04 February 2010 10:00:55 pm Tom McLaughlin wrote:
> > > > > > > >> Hi all, a recent MFC to 7-STABLE has started to cause issues for my VMs
> > > > > > > >> on VMware ESXi 3.5u4. After loading the mpt driver for the LSI disk
> > > > > > > >> controller the VM just shuts off. The workaround is to change the disk
> > > > > > > >> controller to the BusLogic type. Still, it used to work up until last
> > > > > > > >> week. The change was made around January 26th and based on the commits
> > > > > > > >> that day I'm guessing it's either r203047 or r203073
> > > > > > > >>
> > > > > > > >> I have the same issue with both amd64 and i386 VMs. This affects HEAD
> > > > > > > >> and 8-STABLE as well and first affected HEAD over the summer. (I just
> > > > > > > >> worked around it and went about my business at the time. :-/) I've
> > > > > > > >> attached a dmesg from a kernel before the problem and one from after it
> > > > > > > >> started.
> > > > > > > >
> > > > > > > > What if you set 'hw.clflush_disable=1' from the loader?
> > > > > > > >
> > > > > > >
> > > > > > > Yes, that corrected it on all my VMs. I've talked to people on ESXi 4
> > > > > > > and they do not see the problem. I have yet to try 3.5u5 to see if this
> > > > > > > is a non-issue. 3.5 will be supported for awhile longer from VMware.
> > > > > > > I'm going to try upgrading the box during the week.
> > > > > >
> > > > > > I believe folks had to do this on HEAD/8.x as well. Perhaps we can
> > > > > > automatically disable clflush if we are executing under VMware or Xen:
> > > > > >
> > > > > > Index: amd64/amd64/initcpu.c
> > > > > > ===================================================================
> > > > > > --- amd64/amd64/initcpu.c (revision 203430)
> > > > > > +++ amd64/amd64/initcpu.c (working copy)
> > > > > > @@ -177,17 +177,16 @@
> > > > > > if ((cpu_feature & CPUID_CLFSH) != 0)
> > > > > > cpu_clflush_line_size = ((cpu_procinfo >> 8) & 0xff) * 8;
> > > > > > /*
> > > > > > - * XXXKIB: (temporary) hack to work around traps generated when
> > > > > > - * CLFLUSHing APIC registers window.
> > > > > > + * XXXKIB: (temporary) hack to work around traps generated
> > > > > > + * when CLFLUSHing APIC registers window under virtualization
> > > > > > + * environments.
> > > > > > */
> > > > > > TUNABLE_INT_FETCH("hw.clflush_disable", &hw_clflush_disable);
> > > > > > - if (cpu_vendor_id == CPU_VENDOR_INTEL && !(cpu_feature & CPUID_SS) &&
> > > > > > - hw_clflush_disable == -1)
> > > > > > + if (vm_guest != 0 /* VM_GUEST_NO */ && hw_clflush_disable == -1)
> > > > > > cpu_feature &= ~CPUID_CLFSH;
> > > > > > /*
> > > > > > * Allow to disable CLFLUSH feature manually by
> > > > > > - * hw.clflush_disable tunable. This may help Xen guest on some AMD
> > > > > > - * CPUs.
> > > > > > + * hw.clflush_disable tunable.
> > > > > > */
> > > > > > if (hw_clflush_disable == 1)
> > > > > > cpu_feature &= ~CPUID_CLFSH;
> > > > > > Index: i386/i386/initcpu.c
> > > > > > ===================================================================
> > > > > > --- i386/i386/initcpu.c (revision 203430)
> > > > > > +++ i386/i386/initcpu.c (working copy)
> > > > > > @@ -724,17 +724,16 @@
> > > > > > if ((cpu_feature & CPUID_CLFSH) != 0)
> > > > > > cpu_clflush_line_size = ((cpu_procinfo >> 8) & 0xff) * 8;
> > > > > > /*
> > > > > > - * XXXKIB: (temporary) hack to work around traps generated when
> > > > > > - * CLFLUSHing APIC registers window.
> > > > > > + * XXXKIB: (temporary) hack to work around traps generated
> > > > > > + * when CLFLUSHing APIC registers window under virtualization
> > > > > > + * environments.
> > > > > > */
> > > > > > TUNABLE_INT_FETCH("hw.clflush_disable", &hw_clflush_disable);
> > > > > > - if (cpu_vendor_id == CPU_VENDOR_INTEL && !(cpu_feature & CPUID_SS) &&
> > > > > > - hw_clflush_disable == -1)
> > > > > > + if (vm_guest != 0 /* VM_GUEST_NO */ && hw_clflush_disable == -1)
> > > > > > cpu_feature &= ~CPUID_CLFSH;
> > > > > > /*
> > > > > > * Allow to disable CLFLUSH feature manually by
> > > > > > - * hw.clflush_disable tunable. This may help Xen guest on some AMD
> > > > > > - * CPUs.
> > > > > > + * hw.clflush_disable tunable.
> > > > > > */
> > > > > > if (hw_clflush_disable == 1)
> > > > > > cpu_feature &= ~CPUID_CLFSH;
> > > > >
> > > > > It might be better to "or" old condition, i.e. Intel without SS, and
> > > > > new one, vm_guest != 0, instead of replacing the old ?
> > > >
> > > > I thought the old condition only happened under VMware?
> > >
> > > Reports I got were from Xen.
> >
> > Ok. Those would also be covered under the vm_guest test as it is
> > non-zero for Xen, VMware, Parallels, etc.
>
> What I said was a suggestion, not an objection. Ignore me.
Were there any reports of problems with Intel CPUs that weren't under a
virtualization system? If so, we should keep the test, but my understanding
was that the test was only true under specific virtualization environments.
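
To make the two options concrete, here is a rough standalone sketch (not
kernel code) of how the check behaves with and without the old "Intel
without SS" test or'ed in. The struct, the helper names, and the
keep_old_test flag are made up for illustration. Only the CPUID bit values
and the -1/1 meanings of hw.clflush_disable come from the patch and the
CPUID definition. Treating 0 as "leave CLFLUSH enabled" is my reading of
the code, since neither branch fires for it.

```c
#include <assert.h>
#include <stddef.h>

/* CPUID.1:EDX feature bits. */
#define CPUID_CLFSH	0x00080000u	/* CLFLUSH instruction supported */
#define CPUID_SS	0x08000000u	/* self-snoop */

/* Hypothetical stand-ins for the globals used in initcpu.c. */
struct cpu_state {
	unsigned int feature;	/* models cpu_feature */
	int is_intel;		/* models cpu_vendor_id == CPU_VENDOR_INTEL */
	int vm_guest;		/* 0 means bare metal (VM_GUEST_NO) */
	int clflush_disable;	/* tunable: -1 unset, 0 leave on, 1 force off */
};

/*
 * Return the feature word after applying the CLFLUSH policy.
 * keep_old_test == 0 models the patch as posted (vm_guest test only);
 * keep_old_test != 0 models or'ing in the old Intel-without-SS test.
 */
static unsigned int
apply_clflush_policy(const struct cpu_state *cs, int keep_old_test)
{
	unsigned int feature;
	int old_test, new_test;

	assert(cs != NULL);
	feature = cs->feature;
	old_test = cs->is_intel && !(feature & CPUID_SS);
	new_test = cs->vm_guest != 0;

	/* Automatic disable only applies when the tunable was left unset. */
	if (cs->clflush_disable == -1 &&
	    (new_test || (keep_old_test && old_test)))
		feature &= ~CPUID_CLFSH;

	/* Manual override from the loader always wins. */
	if (cs->clflush_disable == 1)
		feature &= ~CPUID_CLFSH;

	return (feature);
}

/* Convenience predicate for checking the result. */
static int
has_clflush(unsigned int feature)
{
	return ((feature & CPUID_CLFSH) != 0);
}
```

The only case where the two variants differ is an Intel CPU without SS on
bare metal with the tunable unset, which is exactly the case asked about
above.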
--
John Baldwin