sound/pcm/* bugs (was: Re: page fault panic tracked down (selwakeuppri()) - really sound/pcm/*)

Tue Jan 6 21:01:48 PST 2004

On  6 Jan, To: shoesoft at gmx.net wrote:
> On  6 Jan, Stefan Ehmann wrote:

>> Finally found out in the handbook why debugging of the modules didn't
>> work and now got it right it seems.
>> 
>> #0  doadump () at /usr/src/sys/kern/kern_shutdown.c:240
>> #1  0xc04e5178 in boot (howto=256) at
>> /usr/src/sys/kern/kern_shutdown.c:372
>> #2  0xc04e5507 in panic () at /usr/src/sys/kern/kern_shutdown.c:550
>> #3  0xc07eb648 in feed_vchan_s16 (f=0xc3967c00, c=0x0, b=0xc37d0000 "", 
>>     count=2048, source=0xc3741500) at
>> /usr/src/sys/dev/sound/pcm/vchan.c:80
>> #4  0xc07e1c6d in sndbuf_feed (from=0xc3741500, to=0xc3741600, 
>>     channel=0xc37a8880, feeder=0xc3967c00, count=3279164672) at
>> feeder_if.h:60
>> #5  0xc07e2225 in chn_wrfeed (c=0xc37a8880)
>>     at /usr/src/sys/dev/sound/pcm/channel.c:221
>> #6  0xc07e227c in chn_wrintr (c=0x800)
>>     at /usr/src/sys/dev/sound/pcm/channel.c:237
>> #7  0xc07e2990 in chn_intr (c=0x800)
>>     at /usr/src/sys/dev/sound/pcm/channel.c:483
>> #8  0xc07fba2f in csa_intr (p=0xc37a8700)
>>     at /usr/src/sys/dev/sound/pci/csapcm.c:623
>> #9  0xc07fa724 in csa_intr (arg=0xc37a8600)
>>     at /usr/src/sys/dev/sound/pci/csa.c:532
>> #10 0xc04d1612 in ithread_loop (arg=0xc1737b00)
>>     at /usr/src/sys/kern/kern_intr.c:544
>> #11 0xc04d0604 in fork_exit (callout=0xc04d1480 <ithread_loop>, arg=0x0,
>>     frame=0x0) at /usr/src/sys/kern/kern_fork.c:796
> 
> This code is really, really hard to follow, but it looks like there are
> some assumptions about buffer sizes that could be violated and cause bad
> things to happen.
> 
> In chn_wrfeed(), we have the following bits of code:
> 
>         struct snd_dbuf *b = c->bufhard;
>         struct snd_dbuf *bs = c->bufsoft;
>         unsigned int ret, amt;
> 
>        amt = sndbuf_getfree(b);
>        sndbuf_feed(bs, b, c, c->feeder, amt)
> 
> sndbuf_feed() looks like:
> 
> int 
> sndbuf_feed(struct snd_dbuf *from, struct snd_dbuf *to, struct pcm_channel *chan
> nel, struct pcm_feeder *feeder, unsigned int count)
> {
>         if (sndbuf_getfree(to) < count)
>                 return EINVAL;
>  
>         count = FEEDER_FEED(feeder, channel, to->tmpbuf, count, from);
> 
> [ snip ]
> 
> In this case, FEEDER_FEED() invokes feed_vchan_s16(), which looks like:
> 
> static int
> feed_vchan_s16(struct pcm_feeder *f, struct pcm_channel *c, u_int8_t *b, u_int32
> _t count, void *source)
> {
>         /* we're going to abuse things a bit */
>         struct snd_dbuf *src = source;
> [ snip ]
>         KASSERT(sndbuf_getsize(src) >= count, ("bad bufsize"));
>         count &= ~1;
>         bzero(b, count);
> [ snip ]
>         dst = (int16_t *)b;
>         tmp = (int16_t *)sndbuf_getbuf(src);
>         bzero(tmp, count);
>  
> 
> We start off by calculation the amount to copy as the amount of free
> space in bufhard.  In sndbuf_feed() "to" points to bufhard and we
> validate the amount of data to copy is will fit into the available free
> space.  In feed_vchan_16, the pointer to the destination buffer is in
> "b", and source/src points to bufsoft.  We get a pointer to the actual
> buffer for bufsoft and then bzero the desired number of bytes.  But what
> if the bufsoft buffer is smaller than the bufhard buffer?  It doesn't
> look like the buffer sizes are forced to match.  When chn_setblocksize()
> is called, it calls sndbuf_remalloc on the channel's bufsoft, which will
> free and rellocate the buffer, but it does not do the same for bufhard.
> If bufsoft shrinks, feed_vchan_s16() is likely stomp the wrong thing on
> the heap.
> 
> When the sound channel is closed, dsp_close() calls chn_reset(), which
> calls chn_setblocksize(c, 0, 0) ...

However it happens, in feed_vchan_s16() count is 2048, while
src->bufsize and src->maxsize are both 688, src->blksz is 344, and
src->blkcnt is 2.

> It looks like there will be one final interrupt after dsp_close().
> dsp_close() calls chn_flush(), which flushes the remaining buffered
> data.  Any interrupts which happen after the final call to chn_sleep()
> won't be serviced until after the channel mutex is unlocked and the
> dsp_close() code has had a chance to put things in a bad state.
> chn_flush() should probably wait for the final interrupt, possibly using
> a flag set by the interrupt handler to signal that the handler has seen
> the buffer drain (and the handler should not call chn_dmaupdate(), I
> think).

After thinking about this some more, I think the easiest fix would be to
skip the calls to chn_wrintr() and chn_rdintr() in chn_intr() if the
CHN_F_RUNNING flag is not set.  This will avoid running the channel
interrupt code when the channel is more likely to be in a bad state.

> The dsp_read() and dsp_write() routines should lock the channels to
> protect the flag manipulations.

This turns out to be OK since the channels are locked, I just didn't see
it at first.

I found another bug, though.  Holding a mutex across a malloc() call is
not allowed because malloc() can sleep, and sleeping while holding a
mutex is not allowed.  sndbuf_resize() sndbuf_remalloc() both call
malloc() to allocate buffers, and sndbuf_alloc() and sndbuf_setup() call
sndbuf_resize().  The problem is that chn_setblocksize() calls
sndbuf_remalloc() while the channel mutex is held.  We can't just unlock
the mutex around the sndbuf_remalloc() call because the channel
interrupt service routine could run while the buffer is being
re-allocated.  It would be best if the channel were to be shut down when
re-allocating the buffer.  This is likely to be messy since
chn_setblocksize() is called from lots of different places.