page fault panic tracked down (selwakeuppri()) - really sound/pcm/*

Tue Jan 6 14:45:48 PST 2004

On  6 Jan, Stefan Ehmann wrote:
> On Tue, 2004-01-06 at 20:05, Don Lewis wrote:
>> On  6 Jan, Stefan Ehmann wrote:
>> 
>> > I'm also still not sure why I got three "bad bufsize" panics in a row
>> > all of the sudden instead of page faults. (Most of the previous kernels
>> > were built without INVARIANTS - but at least some with).
>> 
>> I forgot that historially the kernel options were not used when building
>> modules.  I think they are used right now, and were used for a while in
>> the recent past.
> 
> Finally found out in the handbook why debugging of the modules didn't
> work and now got it right it seems.
> 
> #0  doadump () at /usr/src/sys/kern/kern_shutdown.c:240
> #1  0xc04e5178 in boot (howto=256) at
> /usr/src/sys/kern/kern_shutdown.c:372
> #2  0xc04e5507 in panic () at /usr/src/sys/kern/kern_shutdown.c:550
> #3  0xc07eb648 in feed_vchan_s16 (f=0xc3967c00, c=0x0, b=0xc37d0000 "", 
>     count=2048, source=0xc3741500) at
> /usr/src/sys/dev/sound/pcm/vchan.c:80
> #4  0xc07e1c6d in sndbuf_feed (from=0xc3741500, to=0xc3741600, 
>     channel=0xc37a8880, feeder=0xc3967c00, count=3279164672) at
> feeder_if.h:60
> #5  0xc07e2225 in chn_wrfeed (c=0xc37a8880)
>     at /usr/src/sys/dev/sound/pcm/channel.c:221
> #6  0xc07e227c in chn_wrintr (c=0x800)
>     at /usr/src/sys/dev/sound/pcm/channel.c:237
> #7  0xc07e2990 in chn_intr (c=0x800)
>     at /usr/src/sys/dev/sound/pcm/channel.c:483
> #8  0xc07fba2f in csa_intr (p=0xc37a8700)
>     at /usr/src/sys/dev/sound/pci/csapcm.c:623
> #9  0xc07fa724 in csa_intr (arg=0xc37a8600)
>     at /usr/src/sys/dev/sound/pci/csa.c:532
> #10 0xc04d1612 in ithread_loop (arg=0xc1737b00)
>     at /usr/src/sys/kern/kern_intr.c:544
> #11 0xc04d0604 in fork_exit (callout=0xc04d1480 <ithread_loop>, arg=0x0,
>     frame=0x0) at /usr/src/sys/kern/kern_fork.c:796

This code is really, really hard to follow, but it looks like there are
some assumptions about buffer sizes that could be violated and cause bad
things to happen.

In chn_wrfeed(), we have the following bits of code:

        struct snd_dbuf *b = c->bufhard;
        struct snd_dbuf *bs = c->bufsoft;
        unsigned int ret, amt;

       amt = sndbuf_getfree(b);
       sndbuf_feed(bs, b, c, c->feeder, amt)

sndbuf_feed() looks like:

int 
sndbuf_feed(struct snd_dbuf *from, struct snd_dbuf *to, struct pcm_channel *chan
nel, struct pcm_feeder *feeder, unsigned int count)
{
        if (sndbuf_getfree(to) < count)
                return EINVAL;

        count = FEEDER_FEED(feeder, channel, to->tmpbuf, count, from);

[ snip ]

In this case, FEEDER_FEED() invokes feed_vchan_s16(), which looks like:

static int
feed_vchan_s16(struct pcm_feeder *f, struct pcm_channel *c, u_int8_t *b, u_int32
_t count, void *source)
{
        /* we're going to abuse things a bit */
        struct snd_dbuf *src = source;
[ snip ]
        KASSERT(sndbuf_getsize(src) >= count, ("bad bufsize"));
        count &= ~1;
        bzero(b, count);
[ snip ]
        dst = (int16_t *)b;
        tmp = (int16_t *)sndbuf_getbuf(src);
        bzero(tmp, count);

We start off by calculation the amount to copy as the amount of free
space in bufhard.  In sndbuf_feed() "to" points to bufhard and we
validate the amount of data to copy is will fit into the available free
space.  In feed_vchan_16, the pointer to the destination buffer is in
"b", and source/src points to bufsoft.  We get a pointer to the actual
buffer for bufsoft and then bzero the desired number of bytes.  But what
if the bufsoft buffer is smaller than the bufhard buffer?  It doesn't
look like the buffer sizes are forced to match.  When chn_setblocksize()
is called, it calls sndbuf_remalloc on the channel's bufsoft, which will
free and rellocate the buffer, but it does not do the same for bufhard.
If bufsoft shrinks, feed_vchan_s16() is likely stomp the wrong thing on
the heap.

When the sound channel is closed, dsp_close() calls chn_reset(), which
calls chn_setblocksize(c, 0, 0) ...

It looks like there will be one final interrupt after dsp_close().
dsp_close() calls chn_flush(), which flushes the remaining buffered
data.  Any interrupts which happen after the final call to chn_sleep()
won't be serviced until after the channel mutex is unlocked and the
dsp_close() code has had a chance to put things in a bad state.
chn_flush() should probably wait for the final interrupt, possibly using
a flag set by the interrupt handler to signal that the handler has seen
the buffer drain (and the handler should not call chn_dmaupdate(), I
think).

The dsp_read() and dsp_write() routines should lock the channels to
protect the flag manipulations.

Some of this stuff is hairy enough that I don't want to touch it.  I
think a sound code expert is needed ...