kern/152250: [patch] Kernel panic when hw.ciss.expose_hidden_physical is set
Loic Pefferkorn
loic-freebsd at loicp.eu
Sun Nov 14 20:20:09 UTC 2010
>Number: 152250
>Category: kern
>Synopsis: [patch] Kernel panic when hw.ciss.expose_hidden_physical is set
>Confidential: no
>Severity: non-critical
>Priority: low
>Responsible: freebsd-bugs
>State: open
>Quarter:
>Keywords:
>Date-Required:
>Class: sw-bug
>Submitter-Id: current-users
>Arrival-Date: Sun Nov 14 20:20:08 UTC 2010
>Closed-Date:
>Last-Modified:
>Originator: Loic Pefferkorn
>Release: 7.2-RELEASE
>Organization:
>Environment:
FreeBSD squeak.estat 7.2-STABLE FreeBSD 7.2-STABLE #5: Sun Nov 14 20:35:21 CET 2010 root at squeak.estat:/usr/obj/usr/src/sys/GENERIC amd64
>Description:
HP ProLiant DL360 G6 server with an HP StorageWorks MSL4048 Tape Library
# grep ciss /boot/loader.conf
hw.ciss.expose_hidden_physical=1
When the tunable hw.ciss.expose_hidden_physical is set at boot time, the kernel panics:
Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0x8
fault code = supervisor read data, page not present
instruction pointer = 0x8:0xffffffff80201686
stack pointer = 0x10:0xffffff807c6ab930
frame pointer = 0x10:0x400
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 77 (sysctl)
trap number = 12
panic: page fault
cpuid = 0
Uptime: 6s
Physical memory: 4073 MB
Dumping 1230 MB:
Backtrace from the core dump:
(kgdb) bt
#0 doadump () at pcpu.h:195
#1 0x0000000000000004 in ?? ()
#2 0xffffffff8054cff9 in boot (howto=260)
at /usr/src/sys/kern/kern_shutdown.c:418
#3 0xffffffff8054d402 in panic (fmt=0x104 <Address 0x104 out of bounds>)
at /usr/src/sys/kern/kern_shutdown.c:574
#4 0xffffffff80812563 in trap_fatal (frame=0xffffff0003eb4390, eva=Variable "eva" is not available.
)
at /usr/src/sys/amd64/amd64/trap.c:756
#5 0xffffffff80812935 in trap_pfault (frame=0xffffff807c6ab880, usermode=0)
at /usr/src/sys/amd64/amd64/trap.c:672
#6 0xffffffff80813274 in trap (frame=0xffffff807c6ab880)
at /usr/src/sys/amd64/amd64/trap.c:443
#7 0xffffffff807fd2ce in calltrap ()
at /usr/src/sys/amd64/amd64/exception.S:218
#8 0xffffffff80201686 in acpi_child_pnpinfo_str_method (cbdev=Variable "cbdev" is not available.
)
at /usr/src/sys/dev/acpica/acpi.c:850
#9 0xffffffff805753c9 in device_sysctl_handler (oidp=Variable "oidp" is not available.
)
at /usr/src/sys/kern/subr_bus.c:260
#10 0xffffffff8055654f in sysctl_root (oidp=Variable "oidp" is not available.
)
at /usr/src/sys/kern/kern_sysctl.c:1419
#11 0xffffffff805578c5 in userland_sysctl (td=0x0, name=0xffffff807c6abac0,
namelen=4, old=0x0, oldlenp=Variable "oldlenp" is not available.
) at /usr/src/sys/kern/kern_sysctl.c:1522
#12 0xffffffff80557ad2 in __sysctl (td=0xffffff0003eb4390,
uap=0xffffff807c6abbf0) at /usr/src/sys/kern/kern_sysctl.c:1449
#13 0xffffffff80812bb7 in syscall (frame=0xffffff807c6abc80)
at /usr/src/sys/amd64/amd64/trap.c:899
#14 0xffffffff807fd4db in Xfast_syscall ()
at /usr/src/sys/amd64/amd64/exception.S:339
#15 0x0000000800719cac in ?? ()
Previous frame inner to this frame (corrupt stack?)
Faulting instruction:
(kgdb) x/i 0xffffffff80201686
0xffffffff80201686 <acpi_child_pnpinfo_str_method+70>: mov 0x8(%rbx),%edx
>How-To-Repeat:
With the same hardware, put hw.ciss.expose_hidden_physical=1 in loader.conf and reboot.
>Fix:
The last function called before the trap is acpi_child_pnpinfo_str_method, in sys/dev/acpica/acpi.c:
static int
acpi_child_pnpinfo_str_method(device_t cbdev, device_t child, char *buf,
    size_t buflen)
{
    ACPI_BUFFER adbuf = {ACPI_ALLOCATE_BUFFER, NULL};
    ACPI_DEVICE_INFO *adinfo;
    struct acpi_device *dinfo = device_get_ivars(child);
    char *end;
    int error;

    error = AcpiGetObjectInfo(dinfo->ad_handle, &adbuf);
    adinfo = (ACPI_DEVICE_INFO *) adbuf.Pointer;
    if (error)
        snprintf(buf, buflen, "unknown");
    else
        snprintf(buf, buflen, "_HID=%s _UID=%lu",
            (adinfo->Valid & ACPI_VALID_HID) ?
            adinfo->HardwareId.Value : "none",
            (adinfo->Valid & ACPI_VALID_UID) ?
            strtoul(adinfo->UniqueId.Value, &end, 10) : 0);
    if (adinfo)
        AcpiOsFree(adinfo);
    return (0);
}
The contents of buf depend on the value of "error".
I found adbuf.Pointer set to 0x0 while "error" was zero (success). The else branch therefore dereferences adinfo with a NULL base, which matches the fault virtual address 0x8 and the faulting instruction mov 0x8(%rbx),%edx.
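For illustration only, here is a minimal userland sketch of a caller that refuses to dereference the result when either the status is non-zero or the returned pointer is NULL. This is not the fix proposed in this report, and it is not the kernel code: struct dev_info, VALID_HID and format_pnpinfo are hypothetical, simplified stand-ins for ACPI_DEVICE_INFO, ACPI_VALID_HID and the snprintf logic above.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for ACPI_DEVICE_INFO. */
struct dev_info {
    unsigned valid;     /* bitmask of fields that are valid */
    const char *hid;    /* hardware ID string */
};

#define VALID_HID 0x1u  /* hypothetical stand-in for ACPI_VALID_HID */

/*
 * Fill buf with a PnP info string. Treat a NULL info pointer like an
 * error, so a callee that reports success without producing a buffer
 * cannot cause a NULL dereference here.
 */
static void
format_pnpinfo(int error, const struct dev_info *info, char *buf,
    size_t buflen)
{
    if (error != 0 || info == NULL) {
        snprintf(buf, buflen, "unknown");
        return;
    }
    snprintf(buf, buflen, "_HID=%s",
        (info->valid & VALID_HID) ? info->hid : "none");
}
```

Called with error == 0 and info == NULL, which is the combination observed in the core dump, this sketch writes "unknown" instead of faulting. It only hides the symptom; the wrong return status still has to be fixed at its source.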
"error" value is not set correctly. Let's see why in AcpiGetObjectInfo, in sys/contrib/dev/acpica/nsxfname.c
    Node = AcpiNsMapHandleToNode (Handle);
    if (!Node)
    {
        (void) AcpiUtReleaseMutex (ACPI_MTX_NAMESPACE);
        goto Cleanup;
    }
    (...)
Cleanup:
    ACPI_FREE (Info);
    if (CidList)
    {
        ACPI_FREE (CidList);
    }
    return (Status);
If AcpiNsMapHandleToNode fails, the mutex is released and control jumps to Cleanup:, which returns without updating Status.
Status therefore still holds the value returned by the earlier AcpiUtAcquireMutex call, i.e. success, which is wrong.
Setting Status to AE_BAD_PARAMETER before jumping to Cleanup fixes the issue (I found that AE_BAD_PARAMETER is used elsewhere in the kernel in similar flows when AcpiNsMapHandleToNode is called).
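The stale-status pattern described above can be reproduced outside the kernel. The following is a minimal sketch, not ACPICA code: status_t, map_handle, get_info_buggy and get_info_fixed are hypothetical names, with map_handle standing in for AcpiNsMapHandleToNode and ST_BAD_PARAMETER standing in for AE_BAD_PARAMETER.

```c
#include <stddef.h>

/* Hypothetical status codes standing in for ACPI_STATUS values. */
typedef enum { ST_OK = 0, ST_BAD_PARAMETER = 1 } status_t;

/* Hypothetical stand-in for AcpiNsMapHandleToNode: NULL maps to no node. */
static const int *
map_handle(const int *handle)
{
    return (handle);
}

/* Buggy variant: the failure path leaves status at its earlier value. */
static status_t
get_info_buggy(const int *handle, const int **out)
{
    status_t status = ST_OK;    /* result of an earlier, successful step */
    const int *node;

    *out = NULL;
    node = map_handle(handle);
    if (node == NULL)
        goto cleanup;           /* status is still ST_OK here */
    *out = node;
cleanup:
    return (status);
}

/* Fixed variant: the failure path sets an error status before cleanup. */
static status_t
get_info_fixed(const int *handle, const int **out)
{
    status_t status = ST_OK;    /* result of an earlier, successful step */
    const int *node;

    *out = NULL;
    node = map_handle(handle);
    if (node == NULL) {
        status = ST_BAD_PARAMETER;
        goto cleanup;
    }
    *out = node;
cleanup:
    return (status);
}
```

With a NULL handle, get_info_buggy returns ST_OK while leaving *out NULL, which is the combination the caller in acpi.c cannot handle; get_info_fixed returns ST_BAD_PARAMETER instead, so a caller that checks the status takes its error branch.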
7.0 to 7.3 are affected; a patch is attached.
Hope I'm right :)
Patch attached with submission follows:
--- src/sys/contrib/dev/acpica/nsxfname.c.orig 2010-11-14 20:51:57.000000000 +0100
+++ src/sys/contrib/dev/acpica/nsxfname.c 2010-11-14 20:50:46.000000000 +0100
@@ -361,6 +361,7 @@
if (!Node)
{
(void) AcpiUtReleaseMutex (ACPI_MTX_NAMESPACE);
+ Status = AE_BAD_PARAMETER;
goto Cleanup;
}
>Release-Note:
>Audit-Trail:
>Unformatted: