amd64/139614: Minidump fail when many interrupts fire

Andrew Brampton brampton+freebsd at gmail.com
Wed Oct 14 22:40:03 UTC 2009


>Number:         139614
>Category:       amd64
>Synopsis:       Minidump fail when many interrupts fire
>Confidential:   no
>Severity:       serious
>Priority:       low
>Responsible:    freebsd-amd64
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Wed Oct 14 22:40:02 UTC 2009
>Closed-Date:
>Last-Modified:
>Originator:     Andrew Brampton
>Release:        FreeBSD 7 and FreeBSD 8
>Organization:
>Environment:
>Description:
There have been at least two discussions on the FreeBSD mailing lists over the past couple of years about minidumps failing because interrupts are still enabled during the dump. I couldn't find an existing PR, so I'm creating this one to track the bug.

The problem is summed up by Ruslan Ermilov:
"Kernel minidumps on amd64 SMP can write beyond the bounds
of the configured dump device causing (as in our case) the
file system data following swap partition to be overwritten
with the dump contents.

The problem is that while we're in the process of dumping
mapped physical pages via a bitmap (in minidump_machdep.c),
other CPUs continue to work and may modify page mappings of
processes.  This in turn causes the modifications to
pv_entries, which in turn modifies the bitmap of pages to
dump.  As the result, we can dump more pages than we've
calculated, and since dumps are written to the end of the
dump device, we may end up overwriting it.

The attached patch mitigates the problem, but the real solution
seems to be to disable interrupts (there's an XXX about this
in kern_shutdown.c before calling doadump()), and stopping
other CPUs, so we don't modify page tables while we're dumping."[1]

Expanding the swap space does not seem to avoid the problem[1], and it mostly seems to hit interrupt-heavy workloads, such as servers handling lots of network traffic[2].
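
To make the failure mode described above concrete, here is a minimal user-space sketch in C. It is not the code from minidump_machdep.c: the names (dump_bitmap, simulate_dump, dev_write) and the sizes are hypothetical, the "other CPUs" are simulated by setting extra bits between the sizing pass and the write pass, and dump headers are ignored. It only illustrates why sizing the dump first and then writing it at the end of the device overruns the device if the bitmap grows in between, and why a larger device would not help. The optional bounds check is one possible mitigation, not necessarily what the patch in [1] does.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE 4096u
#define NPAGES    64u            /* hypothetical, tiny "physical memory" */

/* Hypothetical stand-in for the dump bitmap: one bit per page to dump. */
static uint64_t dump_bitmap;

static void mark_page(unsigned page)
{
    dump_bitmap |= UINT64_C(1) << page;
}

/* Sizing pass: bytes needed for every page currently in the bitmap. */
static uint64_t count_dump_bytes(void)
{
    uint64_t bytes = 0;
    for (unsigned i = 0; i < NPAGES; i++)
        if (dump_bitmap & (UINT64_C(1) << i))
            bytes += PAGE_SIZE;
    return bytes;
}

/* Simulated dump device; highest_write records how far writes reached. */
struct dump_dev {
    uint64_t mediasize;
    uint64_t highest_write;
};

/* With check set, refuse any write that would pass the end of the device. */
static bool dev_write(struct dump_dev *d, uint64_t off, uint64_t len, bool check)
{
    if (check && off + len > d->mediasize)
        return false;
    if (off + len > d->highest_write)
        d->highest_write = off + len;
    return true;
}

/*
 * Size the dump, let racing_pages (at most 56) extra pages appear in the
 * bitmap, then write. Returns the bytes written past the end of the device.
 */
static uint64_t simulate_dump(uint64_t mediasize, unsigned racing_pages, bool check)
{
    struct dump_dev dev = { mediasize, 0 };

    dump_bitmap = 0;
    for (unsigned i = 0; i < 8; i++)          /* pages mapped at panic time */
        mark_page(i);

    uint64_t dumpsize = count_dump_bytes();   /* size computed once */
    uint64_t off = mediasize - dumpsize;      /* dump goes at end of device */

    for (unsigned i = 0; i < racing_pages; i++)
        mark_page(8 + i);                     /* "other CPUs" add mappings */

    for (unsigned i = 0; i < NPAGES; i++) {   /* write pass sees new bits */
        if (!(dump_bitmap & (UINT64_C(1) << i)))
            continue;
        if (!dev_write(&dev, off, PAGE_SIZE, check))
            break;                            /* truncated, nothing overwritten */
        off += PAGE_SIZE;
    }
    return dev.highest_write > mediasize ? dev.highest_write - mediasize : 0;
}
```

In this sketch, simulate_dump(mediasize, 3, false) overruns the device by three pages whether mediasize is 1 MB or 1 GB, because the start offset is always derived from the stale size. With the check enabled, the dump is truncated instead of overwriting whatever follows the device.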

Hopefully someone will be able to find a suitable fix.

Thanks

[1] http://lists.freebsd.org/pipermail/freebsd-current/2008-January/082752.html
[2] http://lists.freebsd.org/pipermail/freebsd-current/2008-June/086574.html
[3] http://lists.freebsd.org/pipermail/freebsd-current/2009-August/010599.html


>How-To-Repeat:
Panic the kernel while a device is generating lots of interrupts, for example a network card that is receiving a steady stream of packets.
>Fix:


>Release-Note:
>Audit-Trail:
>Unformatted:

