Please help me diagnose this crazy VMWare/FreeBSD 8.x crash

Thu Mar 29 16:53:03 UTC 2012

On Thu, Mar 29, 2012 at 11:27 AM, Mark Felder <feld at feld.me> wrote:

> On Thu, 29 Mar 2012 10:55:36 -0500, Hans Petter Selasky <hselasky at c2i.net>
> wrote:
>
>>
>> It almost sounds like the lost interrupt issue I've seen with USB EHCI
>> devices, though disk I/O should have a retry timeout?
>>
>> What does "vmstat -i" output?
>>
>> --HPS
>>
>
>
> Here's a server that has a week of uptime and is due for a crash any hour now:
>
> root at server:/# vmstat -i
> interrupt                          total       rate
> irq1: atkbd0                          34          0
> irq6: fdc0                             9          0
> irq15: ata1                           34          0
> irq16: em1                        778061          1
> irq17: mpt0                     19217711         31
> irq18: em0                     283674769        460
> cpu0: timer                    246571507        400
> Total                          550242125        892
>
>

Not so long ago, VMware implemented a clever scheme for reducing the
overhead of virtualized interrupts that must be delivered by at least some
(if not all) of their emulated storage controllers:

http://static.usenix.org/events/atc11/tech/techAbstracts.html#Ahmad

Perhaps there is a bad interaction between this scheme and FreeBSD's mpt
driver.

Alan