Bounty offered to fix sio device lock problem

Jo Rhett jrhett at svcolo.com
Fri Oct 13 02:12:29 PDT 2006


Bruce Evans wrote:
> On Thu, 12 Oct 2006, Jo Rhett wrote:
> 
>> Bruce -- who owns getting this fixed?  Or who should own it?   Or who 
>> will take on getting it fixed if we offer a bounty on it?
>>
>> Replication scenario:
>>   Modem on sio0 (or sio1 or any normal i386 serial port)
>>   /etc/ttys has port enabled with "dialup"
>>   qpage (from ports, unchanged) uses modem for dialout
>>     ** or just write a script that periodically dials out using tip
>>
>> Within a day and often within a few hours, the serial port will go 
>> awol.  You can't talk to the modem any more.   Modem is just fine.
>> Rebooting the system solves the problem.   Rebooting the modem does 
>> not solve it.
>>
>> 100% replicable, and sooner versus later if you call out more often.
> 
> [context lost to top posting]
> 
> I mentioned an old vfs refcounting bug.  New ones turned up a week or
> two ago.  They cause leaked pty masters and worse.  The pty leak is
> caused by last-close sometimes not being called.  For pty masters, the
> leak is permanent since reopening of the master is not permitted for
> security reasons so there is no way to reach the device close, but for
> sio devices it should be possible to fix up the problem by reopening
> and closing the relevant device after ensuring that it is not
> already open:
> - for cua*, simply stty -f'ing it or just using it should be enough.
>   I guess this is not your problem, since the fix is almost automatic.
> - for tty*, it may be necessary to disable getty on the port and kill
>   the current getty, since the old vfs refcounting bug normally prevents
>   reaching last-close if any process is sleeping in open, so if you don't
>   disable getty on the port then you have to race with the new getty to
>   complete the open/last-close before the new getty sleeps in open.
> 
> Many nearby vfs bugs will be fixed in 6.2-RELEASE, but no fix is in
> sight for the main refcounting ones.

So our problems are all on 6.0-REL, not 6.1 or CURRENT.  (I assume 
they persist there, but the new bugs may be newer than 6.0.)

On one system, I've had getty disabled for over a month and ... well, 
I'm waiting for scheduled downtime for the host.  Still can't use the 
device.
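
For reference, the reopen-and-close step described above can be
sketched roughly as below.  This is only an illustration under
assumptions: the device path /dev/cuad0 and the helper name
reopen_and_close are hypothetical, and a non-blocking open is assumed
so the open does not sleep waiting for carrier.  Whether the close
actually reaches last-close depends on the refcounting bug, so this
may well not clear a wedged port.

```python
import errno
import os
import sys


def reopen_and_close(path):
    """Open `path` without blocking, then close it immediately.

    Returns None on success, or the errno name (for example "EBUSY")
    if the open failed.
    """
    # O_NONBLOCK: do not sleep in open waiting for carrier.
    # O_NOCTTY: do not make the port our controlling terminal.
    flags = os.O_RDWR | os.O_NONBLOCK | os.O_NOCTTY
    try:
        fd = os.open(path, flags)
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    os.close(fd)
    return None


if __name__ == "__main__":
    # Assumed device name; pass the real callout device as argv[1].
    device = sys.argv[1] if len(sys.argv) > 1 else "/dev/cuad0"
    result = reopen_and_close(device)
    print(device, "ok" if result is None else "open failed: " + result)
```

If the open fails, the errno name at least shows whether the port is
reported busy or is missing altogether.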

I'd be happy to give you any debug output or other information you 
need to diagnose this.  And we'd be happy to give you money to adjust 
your priorities too :-)  It's a fairly serious annoyance for us, 
causing our emergency out-of-band pagers to miss crucial messages.

