kern/57611: mlockall(2) related VM-bug

Marius Strobl marius at alchemy.franken.de
Sun Oct 5 11:40:20 PDT 2003


>Number:         57611
>Category:       kern
>Synopsis:       mlockall(2) related VM-bug
>Confidential:   no
>Severity:       critical
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sun Oct 05 11:40:15 PDT 2003
>Closed-Date:
>Last-Modified:
>Originator:     Marius Strobl
>Release:        FreeBSD 5.1-CURRENT i386
>Organization:
>Environment:
>Description:
Since Bruce M Simpson commited the support for the mlockall(2) and
munlockall(2) system calls on Aug 11 07:14:07 2003 UTC a cdrecord compiled
on FreeBSD since then causes the kernel to panic.
Backtrace of the panic I get with FreeBSD sources as of today:

GNU gdb 5.2.1 (FreeBSD)
Copyright 2002 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-undermydesk-freebsd"...
panic: mutex vm object not owned at /usr/src/sys/vm/vm_page.c:762
panic messages:
---
panic: mutex vm object not owned at /usr/src/sys/vm/vm_page.c:762
cpuid = 0; lapic.id = 01000000
panic: from debugger
cpuid = 0; lapic.id = 01000000
boot() called on cpu#0
Uptime: 2m11s
Dumping 512 MB
 16 32 48 64 80 96 112 128 144 160 176 192 208 224 240 256 272 288 304 320 336 352 368 384 400 416 432 448 464 480 496
---
#0  doadump () at /usr/src/sys/kern/kern_shutdown.c:240
240             dumping++;
(kgdb) where
#0  doadump () at /usr/src/sys/kern/kern_shutdown.c:240
#1  0xc050c3a3 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:372
#2  0xc050c788 in panic () at /usr/src/sys/kern/kern_shutdown.c:550
#3  0xc043eff2 in db_panic () at /usr/src/sys/ddb/db_command.c:450
#4  0xc043ef6a in db_command (last_cmdp=0xc06d8e00, cmd_table=0x0, 
    aux_cmd_tablep=0xc0693a6c, aux_cmd_tablep_end=0xc0693a70)
    at /usr/src/sys/ddb/db_command.c:346
#5  0xc043f078 in db_command_loop () at /usr/src/sys/ddb/db_command.c:472
#6  0xc0441db9 in db_trap (type=3, code=0) at /usr/src/sys/ddb/db_trap.c:73
#7  0xc0623143 in kdb_trap (type=3, code=0, regs=0xddb95ae8)
    at /usr/src/sys/i386/i386/db_interface.c:171
#8  0xc063b5ce in trap (frame=
      {tf_fs = 24, tf_es = 16, tf_ds = 16, tf_edi = -1066954496, tf_esi = 1, tf_ebp = -575055052, tf_isp = -575055084, tf_ebx = 0, tf_edx = 0, tf_ecx = 0, tf_eax = 18, tf_trapno = 3, tf_err = 0, tf_eip = -1067305938, tf_cs = 8, tf_eflags = 642, tf_esp = -1066874695, tf_ss = -1066952000})
    at /usr/src/sys/i386/i386/trap.c:578
#9  0xc0624a48 in calltrap () at {standard input}:103
#10 0xc050c721 in panic (fmt=0xc0679100 "mutex %s not owned at %s:%d")
    at /usr/src/sys/kern/kern_shutdown.c:534
#11 0xc050352f in _mtx_assert (m=0xc46595c8, what=0, 
    file=0xc0689a02 "/usr/src/sys/vm/vm_page.c", line=762)
    at /usr/src/sys/kern/kern_mutex.c:855
#12 0xc05f9f7c in vm_page_alloc (object=0xc46595c8, pindex=0, req=0)
    at /usr/src/sys/vm/vm_page.c:762
#13 0xc05ecd4b in vm_fault_copy_entry (dst_map=0xc1922dc8, src_map=0xc45d4bd0, 
    dst_entry=0xc46a5ac8, src_entry=0x0) at /usr/src/sys/vm/vm_fault.c:1167
#14 0xc05f3221 in vm_map_copy_entry (src_map=0xc45d4bd0, dst_map=0xc1922dc8, 
    src_entry=0xc4695ca8, dst_entry=0xc46a5ac8)
    at /usr/src/sys/vm/vm_map.c:2373
#15 0xc05f3547 in vmspace_fork (vm1=0xc45d4bd0)
    at /usr/src/sys/vm/vm_map.c:2488
#16 0xc05ee30e in vm_forkproc (td=0xc467f390, p2=0xc467ed3c, td2=0xc467f130, 
    flags=20) at /usr/src/sys/vm/vm_glue.c:624
#17 0xc04f7c25 in fork1 (td=0xc467f390, flags=20, pages=0, procp=0xddb95cdc)
    at /usr/src/sys/kern/kern_fork.c:654
#18 0xc04f6c6b in fork (td=0xc467f390, uap=0xddb95d14)
    at /usr/src/sys/kern/kern_fork.c:102
#19 0xc063bec3 in syscall (frame=
      {tf_fs = 47, tf_es = 47, tf_ds = 47, tf_edi = 4096, tf_esi = 65536, tf_ebp = -1077946760, tf_isp = -575054476, tf_ebx = 64, tf_edx = 1307, tf_ecx = 672785664, tf_eax = 2, tf_trapno = 0, tf_err = 2, tf_eip = 672191759, tf_cs = 31, tf_eflags = 582, tf_esp = -1077946804, tf_ss = 47})
    at /usr/src/sys/i386/i386/trap.c:1006
#20 0xc0624a9d in Xint0x80_syscall () at {standard input}:145
---Can't read userspace from dump, or kernel process---


With a kernel built with sources as of directly after the mentioned commit
the backtrace looks like this:

GNU gdb 5.2.1 (FreeBSD)
Copyright 2002 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-undermydesk-freebsd"...
panic: vm_fault_copy_wired: page missing
panic messages:
---
panic: vm_fault_copy_wired: page missing
cpuid = 1; lapic.id = 00000000
panic: from debugger
cpuid = 1; lapic.id = 00000000
boot() called on cpu#1
Uptime: 54s
Dumping 512 MB
 16 32 48 64 80 96 112 128 144 160 176 192 208 224 240 256 272 288 304 320 336 352 368 384 400 416 432 448 464 480 496
---
Reading symbols from /boot/kernel/linux.ko...done.
Loaded symbols for /boot/kernel/linux.ko
#0  doadump () at ../../../kern/kern_shutdown.c:240
240             dumping++;
(kgdb) where
#0  doadump () at ../../../kern/kern_shutdown.c:240
#1  0xc0209223 in boot (howto=260) at ../../../kern/kern_shutdown.c:372
#2  0xc0209608 in panic () at ../../../kern/kern_shutdown.c:550
#3  0xc013ed72 in db_panic () at ../../../ddb/db_command.c:450
#4  0xc013ecea in db_command (last_cmdp=0xc03d0c00, cmd_table=0x0, 
    aux_cmd_tablep=0xc038baec, aux_cmd_tablep_end=0xc038baf0)
    at ../../../ddb/db_command.c:346
#5  0xc013edf8 in db_command_loop () at ../../../ddb/db_command.c:472
#6  0xc0141b39 in db_trap (type=3, code=0) at ../../../ddb/db_trap.c:73
#7  0xc031cf41 in kdb_trap (type=3, code=0, regs=0xdc446b44)
    at ../../../i386/i386/db_interface.c:172
#8  0xc033568d in trap (frame=
      {tf_fs = 24, tf_es = 16, tf_ds = 16, tf_edi = -1070069368, tf_esi = 1, tf_ebp = -599495792, tf_isp = -599495824, tf_ebx = 0, tf_edx = 0, tf_ecx = 0, tf_eax = 18, tf_trapno = 3, tf_err = 0, tf_eip = -1070476754, tf_cs = 8, tf_eflags = 642, tf_esp = -1070052523, tf_ss = -1070129405})
    at ../../../i386/i386/trap.c:577
#9  0xc031e848 in calltrap () at {standard input}:103
#10 0xc02095a1 in panic (fmt=0xc0380988 "vm_fault_copy_wired: page missing")
    at ../../../kern/kern_shutdown.c:534
#11 0xc02e7256 in vm_fault_copy_entry (dst_map=0xc1529b00, src_map=0xc437a500, 
    dst_entry=0xc436c924, src_entry=0x0) at ../../../vm/vm_fault.c:1088
#12 0xc02ed6e1 in vm_map_copy_entry (src_map=0xc437a500, dst_map=0xc1529b00,
    src_entry=0xc082ce4c, dst_entry=0xc436c924) at ../../../vm/vm_map.c:2376
#13 0xc02eda07 in vmspace_fork (vm1=0xc437a500) at ../../../vm/vm_map.c:2491
#14 0xc02e878e in vm_forkproc (td=0xc41ce850, p2=0xc41cb000, td2=0xc402e260, 
    flags=20) at ../../../vm/vm_glue.c:624
#15 0xc01f4a78 in fork1 (td=0xc41ce850, flags=20, pages=0, procp=0xdc446cdc)
    at ../../../kern/kern_fork.c:654
#16 0xc01f3a2b in fork (td=0xc41ce850, uap=0xdc446d14)
    at ../../../kern/kern_fork.c:102
#17 0xc0335f83 in syscall (frame=
      {tf_fs = 47, tf_es = 47, tf_ds = 47, tf_edi = 134831178, tf_esi = 4096, tf_ebp = -1077946712, tf_isp = -599495308, tf_ebx = 65536, tf_edx = 64, tf_ecx = 672822528, tf_eax = 2, tf_trapno = 0, tf_err = 2, tf_eip = 672228559, tf_cs = 31, tf_eflags = 582, tf_esp = -1077946740, tf_ss = 47})
    at ../../../i386/i386/trap.c:1005
#18 0xc031e89d in Xint0x80_syscall () at {standard input}:145

The change in the backtraces seems to be caused by Marcel Moolenaar's
addition of upward growable stacks on Aug 30 21:25:22 2003 UTC. However,
this is only an assumption based on the differences in sys/vm between
two kernels I tested and not exactly tracked down.

Cdrecord has a single call of mlockall(2) in cdrecord.c:
    364 #if defined(HAVE_MLOCKALL)
    365                 if (mlockall(MCL_CURRENT|MCL_FUTURE) < 0) {
    366                         errmsg("WARNING: Cannot do mlockall(2).\n");
    367                         errmsgno(EX_BAD, "WARNING: This causes a high risk for buffer underruns.\n");
    368                 }
    369 #endif

This looks like a fairly normal use of mlockall(2) in cdrecord and is
known to work on e.g. Solaris and Linux.

Notes:	- I only have tested this on SMP-boxes.
	- There are several other reports of this panic, however, it was
	  first thought to be caused by the ATAng commit. See the archive
	  of freebsd-current.
	- A workaround (patch-conf::configure) was commited to sysutils/
	  cdrtools-devel port that hinders cdrecord from using mlockall(2).
>How-To-Repeat:
Build the sysutils/cdrtools port on a FreeBSD built after the addition of
mlockall(2) and try to write a CD:
	cdrecord dev=<your burner> <some ISO-image>
>Fix:
Reverting the addition of mlockall(2) and munlockall(2) or hacking cdrecord
to not use mlockall(2) avoids the panic.

>Release-Note:
>Audit-Trail:
>Unformatted:


More information about the freebsd-bugs mailing list