Questionable code in sys/dev/sound/pcm/channel.c

Mon Jul 26 15:33:57 PDT 2004

On 26 Jul, Conrad J. Sabatier wrote:
> 
> On 26-Jul-2004 Don Lewis wrote:
>> On 26 Jul, Conrad J. Sabatier wrote:
>>> I'm a little perplexed at the following bit of logic in chn_write()
>>> (which is where the "interrupt timeout, channel dead" messages are
>>> being generated).
>>> 
>>> Within an else branch within the main while loop, we have:
>>> 
>>>             else {
>>>                 timeout = (hz * sndbuf_getblksz(bs)) /
>>>                     (sndbuf_getspd(bs) * sndbuf_getbps(bs));
>>>                 if (timeout < 1)
>>>                     timeout = 1;
>>>                 timeout = 1;
>>> 
>>> Why the formulaic calculation of timeout, if it's simply going to be
>>> unconditionally set to 1 immediately afterwards anyway?  What's
>>> going on here?
>> 
>> Hmm, looks bogus to me.  I think the intention is to round timeout up
>> to 1 if the result of the formula is zero, so the final unconditional
>> assignment defeats the calculation.  Maybe a too-short timeout is the
>> source of this problem.
>> 
>> It looks like this assignment appeared in rev 1.65.
> 
> Hmm, your guess is as good as (or probably better than) mine.  :-)
> A little more in the way of comments certainly wouldn't hurt.
> 
>>> Also, at the end of the function:
>>> 
>>>     if (count <= 0) {
>>>         c->flags |= CHN_F_DEAD;
>>>         printf("%s: play interrupt timeout, channel dead\n",
>>>             c->name);
>>>     }
>>> 
>>>     return ret;
>>> }
>>> 
>>> Could it be that the conditional test is wrong here?  Perhaps
>>> we should be using (count < 0) instead?
>>>
>>> I don't know.  I'm having no small difficulty understanding this
>>> code, but these two items caught my attention.
>> 
>> I ran into the same problem when I was looking at the code a few days
>> ago.
>> 
>> BTW, the trace output that was posted showed write() returning 0
>> immediately before the failure occurred.
> 
> Are you referring to the truss output I posted a few days ago?  The
> thing of it is, though, that the original "channel dead" message had
> already occurred in a previous run of madplay (which wasn't traced), so
> it's really hard to say if there's any useful info to be obtained from
> tracing a later run, after the pcm device was already "broken".

I think that was it.  The truss output looked like things were working
for a while before it croaked.  I saw a bunch of writes succeed, then a
write returned 0, and then it looked like it died.

> So far, I still haven't gotten the error with the new kernel I'm
> testing.  I wouldn't say absolutely that that single patch (of the
> final conditional test) is "the fix", but it may help in the meantime.

I just looked at the code some more.  With timeout hardwired to 1, count
can never go negative.  The code initializes count to hz, and then
decrements it whenever chn_sleep() returns EWOULDBLOCK, and
re-initializes count to hz if chn_sleep() returns zero.  With timeout
hardwired to 1, count should only be able to decrement to zero if
chn_sleep() returns EWOULDBLOCK hz times in a row, which means that
nothing could be stuffed into the buffer for one second, which seems
like a long time ...
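
To make the arithmetic above concrete, here is a rough sketch in C.  It is
not the code from channel.c.  The helper names blk_timeout_ticks() and
count_after_sleeps() are made up for illustration, and it assumes that
count is decremented by timeout on each EWOULDBLOCK, that the loop stops
once count is no longer positive, and that the sndbuf getters return the
block size in bytes, the sample rate in Hz, and the bytes per sample frame.
None of that is confirmed by the snippets quoted in this thread.

```c
#include <errno.h>

/*
 * Hypothetical helper: the quoted formula with the "round up to 1" clamp,
 * but without the trailing unconditional "timeout = 1".
 * Assumed units: blksz in bytes, spd in Hz, bps in bytes per sample frame.
 */
static int
blk_timeout_ticks(int hz, int blksz, int spd, int bps)
{
	int timeout = (hz * blksz) / (spd * bps);

	if (timeout < 1)
		timeout = 1;
	return (timeout);
}

/*
 * Hypothetical model of the count bookkeeping described above.
 * Sleep number wake_at (0-based) is treated as chn_sleep() returning 0;
 * every other sleep is treated as returning EWOULDBLOCK.  Pass -1 for
 * "never woken".  Assumption: count drops by timeout per EWOULDBLOCK and
 * the loop ends once count <= 0.
 */
static int
count_after_sleeps(int hz, int timeout, int nsleeps, int wake_at)
{
	int count = hz;
	int i, ret;

	for (i = 0; i < nsleeps && count > 0; i++) {
		ret = (i == wake_at) ? 0 : EWOULDBLOCK;
		if (ret == EWOULDBLOCK)
			count -= timeout;	/* timed out, nothing drained */
		else
			count = hz;		/* woken normally, start over */
	}
	return (count);
}
```

Under those assumptions, with hz = 100 and timeout = 1, it takes exactly 100
timeouts in a row to bring count to 0, and it never goes below 0.  With
hz = 1000 and the computed timeout for a 2048-byte block at 44100 Hz and
4 bytes per frame (11 ticks), count would step from 10 to -1, which is the
only way a (count < 0) test could ever fire.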

I suspect that with your change the write() call returns 0 when the loop
gives up, and the player software does a retry that succeeds (though this
might be audible as a skip).
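
For what it's worth, the kind of retry I have in mind on the player side
would look something like the sketch below.  This is only a guess at the
general pattern, not madplay's actual code.  The function name write_all()
and the max_zero_retries parameter are made up for illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical player-side loop: keep calling write() until the whole
 * buffer has been accepted.  A return of 0 is treated as "try again",
 * up to max_zero_retries consecutive times.  Returns the number of
 * bytes written, or -1 on a hard error.
 */
static ssize_t
write_all(int fd, const void *buf, size_t len, int max_zero_retries)
{
	const char *p = buf;
	size_t done = 0;
	int zero_retries = 0;
	ssize_t n;

	while (done < len) {
		n = write(fd, p + done, len - done);
		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, just retry */
			return (-1);
		}
		if (n == 0) {
			/* Nothing accepted; give up after too many tries. */
			if (++zero_retries > max_zero_retries)
				break;
			continue;
		}
		zero_retries = 0;
		done += (size_t)n;
	}
	return ((ssize_t)done);
}
```

A player written this way would ride over an occasional zero-length
write without reporting an error, which would match what you are seeing.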