page fault panic tracked down (selwakeuppri())
Bruce Evans
bde at zeta.org.au
Mon Jan 5 02:37:08 PST 2004
On Sun, 4 Jan 2004, Stefan Ehmann wrote:
> For the first time I got a backtrace that ended in the soundcard module
> - So maybe this is the right direction (on the other hand this might be
> some newly introduced error)
>
> panic: bad bufsize
> #0 doadump () at /usr/src/sys/kern/kern_shutdown.c:240
> #1 0xc04e5198 in boot (howto=256) at
> /usr/src/sys/kern/kern_shutdown.c:372
> #2 0xc04e5527 in panic () at /usr/src/sys/kern/kern_shutdown.c:550
> #3 0xc07ec648 in feed_vchan_s16 () from /boot/kernel/snd_pcm.ko
> #4 0xc07e2c6d in sndbuf_feed () from /boot/kernel/snd_pcm.ko
> #5 0xc07e3225 in chn_wrfeed () from /boot/kernel/snd_pcm.ko
> #6 0xc07e327c in chn_wrintr () from /boot/kernel/snd_pcm.ko
> #7 0xc07e3990 in chn_intr () from /boot/kernel/snd_pcm.ko
> #8 0xc07fca2f in csa_intr () from /boot/kernel/snd_csa.ko
> #9 0xc07fb724 in csa_intr () from /boot/kernel/snd_csa.ko
> #10 0xc04d1692 in ithread_loop (arg=0xc1737b00)
> at /usr/src/sys/kern/kern_intr.c:544
> #11 0xc04d0684 in fork_exit (callout=0xc04d1500 <ithread_loop>, arg=0x0,
> frame=0x0) at /usr/src/sys/kern/kern_fork.c:796
The sound driver is in modules. Have you ruled out having an inconsistent
module.
> Nearly all other panics (I put a collection of ~20 backtraces on
> http://stud4.tuwien.ac.at/~e0125637/fbsd/ if it's of any interest) ended
> up in file system code.
The args in many of these don't make sense. E.g., in bt.13:
% ...
% #15 0xc06743f8 in calltrap () at {standard input}:94
% #16 0xc0502a34 in lockmgr (lkp=0xc3a91028, flags=0, interlkp=0xe4,
% td=0xc06bfc68) at /usr/src/sys/kern/kern_lock.c:228
% #17 0xc0566c67 in vfs_busy (mp=0x0, flags=0, interlkp=0x0, td=0x0)
% at /usr/src/sys/kern/vfs_subr.c:527
% #18 0xc056362c in lookup (ndp=0xd8cd5bdc) at /usr/src/sys/kern/vfs_lookup.c:559
% ...
mp=0x0 in #17 can't happen, and is inconsistent with lkp!=0 in #16
(since lkp = &mp->mnt_lock). The panic in line 228 in #16 is from lkp
being garbage, so apparently it is not &mp->mnt_lock for a correct mp
either. flags=0 and interlkp=0x0 in #17 are correct. td=0x0 is
inconsistent with td!=0 in #16 (I think the latter is correct). flags=0
in #16 is correct. interlkp=0xe4 in #16 is inconsistent with interlkp=0
in #17.
Compiling without optimization might simplify debugging, but I thought that
gcc doesn't clobber stack args even with optimization.
bt.34 has an example of a garbage arg the is unlikely to be from
mis-debugging it since it is in 2 stack frames:
% ...
% #3 0xc0560029 in getblk (vp=0xc42ec6d8, blkno=16, size=7537385, slpflag=0,
^^^^^^^^^^^^
% slptimeo=0, flags=0) at /usr/src/sys/kern/vfs_bio.c:2415
% #4 0xc05650bd in cluster_read (vp=0xc42ec6d8, filesize=3675636, lblkno=16,
% size=7537385, cred=0x0, totread=32768, seqcount=0, bpp=0x0)
^^^^^^^^^^^^
% at /usr/src/sys/kern/vfs_cluster.c:127
% #5 0xc419efa7 in ext2_read (ap=0xd8d17bcc)
% at @/gnu/ext2fs/ext2_readwrite.c:115
% ...
`size' is the ext2fs block size. It should be either 1K or 4K for ext2fs.
So the size, or more likely the pointer to it, is corrupt in ext2_read()
here:
% size = BLKSIZE(fs, ip, lbn);
^^^^^^^^^^^^^^^^^^^^^^^^^^^
% blkoffset = blkoff(fs, uio->uio_offset);
%
% xfersize = fs->s_frag_size - blkoffset;
% if (uio->uio_resid < xfersize)
% xfersize = uio->uio_resid;
% if (bytesinfile < xfersize)
% xfersize = bytesinfile;
%
% if (lblktosize(fs, nextlbn) >= ip->i_size)
% error = bread(vp, lbn, size, NOCRED, &bp);
% else if ((vp->v_mount->mnt_flag & MNT_NOCLUSTERR) == 0)
% error = cluster_read(vp,
% ip->i_size, lbn, size, NOCRED,
^^^^
% uio->uio_resid, (ap->a_ioflag >> 16), &bp);
BLKSIZE(fs, ip, lbn) depends mainly on ip (fs is ip->FS), so it seems that
we have a corrupt ip here. This is consistent with some of the panics
being nearer to inode allocation. *_vget() has delicate locking which
has caused problems before. I just remembered than ext2_vget() and
other *_vget() (nfs...) are missing the last round of locking changes
in ffs_vget() (ffs_vfsops.c 1.181). But these changes are only claimed
to be simplifications and optimizationns. I can't see any bugs in the
locking in ext2_vget() except that stuff like the following makes too
many assumptions:
% /*
% * Lock out the creation of new entries in the FFS hash table in
% * case getnewvnode() or MALLOC() blocks, otherwise a duplicate
% * may occur!
% */
% if (ext2fs_inode_hash_lock) {
% while (ext2fs_inode_hash_lock) {
% ext2fs_inode_hash_lock = -1;
% tsleep(&ext2fs_inode_hash_lock, PVM, "e2vget", 0);
% }
% goto restart;
% }
% ext2fs_inode_hash_lock = 1;
This assumes that the kernel is non-preemptible or that something else
prevents concurrent non-atomic test-and-set of ex2fs_inode_hash_lock.
I think Giant locking now prevents concurrent access, and anyway,
concurrent access is too unlikely to give so many panics if the bug
is here.
Bruce
More information about the freebsd-current
mailing list