spin lock smp rendezvous held by 0xffffff01250a7980 for > 5 seconds

Kris Kennaway kris at obsecurity.org
Sat Nov 26 23:22:47 GMT 2005


On Thu, Nov 24, 2005 at 06:26:16PM -0500, Kris Kennaway wrote:
> I got this on a quad amd64 machine running 6.0-STABLE.  At the time it
> was running 21 simultaneous tar extractions onto a sync-mounted md.
> 
> panic() at panic+0x1e6
> _mtx_lock_spin() at _mtx_lock_spin+0xad
> pmap_invalidate_range() at pmap_invalidate_range+0xb3
> pmap_qremove() at pmap_qremove+0x53
> vfs_vmio_release() at vfs_vmio_release+0x1e0
> getnewbuf() at getnewbuf+0x368
> getblk() at getblk+0x3d9
> ffs_balloc_ufs1() at ffs_balloc_ufs1+0x662
> ffs_write() at ffs_write+0x31b
> VOP_WRITE_APV() at VOP_WRITE_APV+0xed
> vn_write() at vn_write+0x228
> dofilewrite() at dofilewrite+0x90
> kern_writev() at kern_writev+0x54
> write() at write+0x4b
> 
> Unfortunately I can't dump on this machine (and no debugging is
> currently enabled), but I can try to reproduce it.

I tried for 24 hours with witness enabled but couldn't reproduce it.
With witness disabled the panic recurred along the same code path,
although the failure mode was a bit different:


Fatal double fault
cpuid = 3; apic id = 03
panic: double fault
cpuid = 3
KDB: enter: panic
[...]
_mtx_lock_spin() at _mtx_lock_spin+0x6b
getit() at getit+0x6f
DELAY() at DELAY+0x44
_mtx_lock_spin() at _mtx_lock_spin+0x6b
getit() at getit+0x6f
DELAY() at DELAY+0x44
_mtx_lock_spin() at _mtx_lock_spin+0x6b
getit() at getit+0x6f
DELAY() at DELAY+0x44
_mtx_lock_spin() at _mtx_lock_spin+0x6b
getit() at getit+0x6f
DELAY() at DELAY+0x44
_mtx_lock_spin() at _mtx_lock_spin+0x6b
pmap_invalidate_range() at pmap_invalidate_range+0xb3
pmap_qremove() at pmap_qremove+0x53
vfs_vmio_release() at vfs_vmio_release+0x1e0
getnewbuf() at getnewbuf+0x368
getblk() at getblk+0x3d9
ffs_balloc_ufs1() at ffs_balloc_ufs1+0x662
ffs_write() at ffs_write+0x31b
VOP_WRITE_APV() at VOP_WRITE_APV+0xed
vn_write() at vn_write+0x228
dofilewrite() at dofilewrite+0x90
kern_writev() at kern_writev+0x54
write() at write+0x4b
syscall() at syscall+0x404
Xfast_syscall() at Xfast_syscall+0xa8
--- syscall (4, FreeBSD ELF64, write), rip = 0x80070ea6c, rsp = 0x7fffffffe6a8, rbp = 0x52ae00 ---

i.e. the first _mtx_lock_spin() tried to acquire the ipi lock and
spun, which called DELAY and getit, which tried to acquire the clock
lock:

        mtx_lock_spin(&clock_lock);

which *also* spun, and called DELAY...and at that point things went to
hell and it recursed until it blew out the stack.
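
To make the recursion concrete, here is a rough user-space model of the
cycle in the trace. It is only a sketch, not the kernel source: the
model_* names, the struct layout and MODEL_STACK_LIMIT are made up for
illustration, and each lock attempt does a single backoff step instead of
retrying in a loop the way a real spin mutex would.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a spin mutex; not the FreeBSD struct mtx. */
struct model_mtx {
	const char *name;
	bool held_elsewhere;	/* another CPU owns it and is not letting go */
};

/* Stands in for the finite kernel stack (value chosen arbitrarily). */
#define MODEL_STACK_LIMIT 16

static int depth;	/* current nesting of model_lock_spin() */
static int max_depth;	/* deepest nesting seen in this run */

static struct model_mtx ipi_lock = { "ipi", true };
static struct model_mtx clock_lock = { "clock", false };

static bool model_lock_spin(struct model_mtx *m);

/* Reading the timer needs the clock lock, as getit() does in the trace. */
static bool
model_getit(void)
{
	return (model_lock_spin(&clock_lock));
}

/* The delay routine reads the timer to measure elapsed time. */
static bool
model_delay(void)
{
	return (model_getit());
}

/*
 * One acquisition attempt.  A contended lock backs off via the delay
 * routine.  Returns false once the nesting reaches the model stack limit,
 * which is where the real kernel took the double fault.
 */
static bool
model_lock_spin(struct model_mtx *m)
{
	bool ok = true;

	depth++;
	if (depth > max_depth)
		max_depth = depth;
	if (depth >= MODEL_STACK_LIMIT)
		ok = false;
	else if (m->held_elsewhere)
		ok = model_delay();
	assert(depth > 0);
	depth--;
	return (ok);
}

/* Try to take the contended ipi lock; report the deepest nesting reached. */
static int
model_run(bool clock_contended)
{
	clock_lock.held_elsewhere = clock_contended;
	depth = 0;
	max_depth = 0;
	(void)model_lock_spin(&ipi_lock);
	return (max_depth);
}
```

In this model, model_run(false) stops at a nesting depth of 2 (ipi lock,
then an uncontended clock lock), while model_run(true) keeps nesting until
it hits MODEL_STACK_LIMIT, which matches the repeating
_mtx_lock_spin/DELAY/getit frames above.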

I guess the next step is to try INVARIANTS alone in case that catches
something.

Kris

More information about the freebsd-amd64 mailing list