Please help me diagnose this crazy VMWare/FreeBSD 8.x crash

Adrian Chadd adrian at freebsd.org
Fri May 25 00:56:01 UTC 2012


Hi,

You guys now absolutely, positively have enough information for a PR.

It's still not clear whether it's a device/interrupt layer issue in
FreeBSD, or whether vmware is doing something wrong with how it
implements shared interrupts, or a bit of both.

Adrian

On 24 May 2012 13:54, dane foster <dene at ilovedene.com> wrote:
> Hey all,
>
> On 25/05/2012, at 1:47 AM, Mark Felder wrote:
>
>> On Wed, 23 May 2012 17:30:40 -0500, Adrian Chadd <adrian at freebsd.org> wrote:
>>
>>> Hi,
>>>
>>> can you please, -please- file a PR? And place all of the above
>>> information in it so we don't lose it?
>>>
>>
>> I'd be glad to post a PR and assist in helping to get it permanently fixed. I certainly don't want this data to get lost and honestly our business uses FreeBSD on VMWare so much that we really need a permanent fix as much as anyone else :-)
>>
>> The reason I've hesitated to post a PR so far is that I didn't have any truly useful or concrete evidence of where the problem lies. After Dane Foster contacted me and told me he could recreate the crash on demand with his workload, it was easier to narrow things down. The suggestion that it was an interrupt issue (possibly from Bjoern Zeeb?), together with Dane's discovery that his crashes ceased when em0 and mpt0 share an IRQ but em0 is completely unused, is strong evidence in favor of the interrupt theory.
>>
>> Dane, what's the status on your end? Has your fix still been successful? Is it also stable if you simply set hint.mpt.0.msi_enable="1" ?
>>
>
> The situation I've got that's stable now is:
>
> hw.pci.enable_msi="0"
> hw.pci.enable_msix="0"
>
> in /boot/loader.conf
>
> and:
>
> samael:~:% vmstat -i
> interrupt                          total       rate
> irq1: atkbd0                           6          0
> irq18: em0 mpt0                  3061100         15
> irq19: em1                       6891706         35
> cpu0: timer                    166383735        868
> cpu1: timer                    166382123        868
> cpu3: timer                    166382123        868
> cpu2: timer                    166382121        868
> Total                          675482914       3525
>
> Not using em0. This works for 8 (FreeBSD samael.slush.ca 8.3-STABLE FreeBSD 8.3-STABLE #1: Mon May  7 11:51:03 NZST 2012     root at samael.slush.ca:/usr/obj/usr/src/sys/DENE  amd64).
>
> Neither of those settings on its own seems to stop it from happening.
>
> The 9 box I've tried this on still hangs almost every time I run HandBrake, no matter whether MSI/MSIX is enabled or whether I have separate IRQs for mpt0 and em0/1.
>
> I can cause the hang mostly on demand, but I'm not quite sure what information to provide from the hung system. If somebody can let me know what they need, including root access, I can make that happen.
>
> Cheers,
>
> Dane
>
>
>
>>
>> Thanks!
>
>
>
>
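
For anyone gathering data for the PR, a small script can pull the shared IRQ lines out of "vmstat -i" output like the listing quoted above. This is only a sketch: the helper name shared_irqs is made up for illustration, and it assumes the output layout matches the sample in this thread (source name, a colon, one or more device names, then the total and rate columns). Strip the "> " quote markers before feeding it pasted mail text.

```python
def shared_irqs(vmstat_output):
    """Return {irq_name: [devices]} for IRQ lines that list more than one device.

    Assumes each interrupt line looks like:
        irq18: em0 mpt0      3061100     15
    i.e. the last two columns are the total and the rate.
    """
    shared = {}
    for line in vmstat_output.splitlines():
        line = line.strip()
        # Skip the header, the per-CPU timer lines and the Total line.
        if not line.startswith("irq") or ":" not in line:
            continue
        source, rest = line.split(":", 1)
        fields = rest.split()
        if len(fields) < 3:
            # Need at least one device plus the total and rate columns.
            continue
        devices = fields[:-2]
        if len(devices) > 1:
            shared[source] = devices
    return shared


# Sample taken from the vmstat -i listing quoted in this thread.
SAMPLE = """\
interrupt                          total       rate
irq1: atkbd0                           6          0
irq18: em0 mpt0                  3061100         15
irq19: em1                       6891706         35
cpu0: timer                    166383735        868
Total                          675482914       3525
"""

if __name__ == "__main__":
    for irq, devs in shared_irqs(SAMPLE).items():
        print(irq, "is shared by", ", ".join(devs))
```

Running this before and after changing the MSI/MSI-X tunables in /boot/loader.conf shows whether mpt0 and em0 still end up on the same IRQ, which is worth recording in the PR alongside each crash report.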

