kern/109232: [sio][patch] ibufsize calculation wrong causing
data loss
Bruce Evans
bde at zeta.org.au
Fri Feb 16 22:32:13 UTC 2007
On Fri, 16 Feb 2007, Greg Ansley wrote:
>> Description:
> sio comwakeup time (sio_timeout) is limited to 200 hz (see siosettimeout()) but calculation for sio ibufsize uses hz which is now >> 200 hz causing insufficiently sized interrupt level sio buffers. This in turn causes lost incoming data and the kernel issues the "interrupt-level buffer overflow" message when receiving long strings of high speed (baud > 19200) data.
The problem is actually a little different:
- the timeout is normally limited to 1 second and is not normally used
(except in FreeBSD-1 and maybe FreeBSD-2). It is used mainly for for
polling (when polling is enabled on some sio device, the timeout is
scalled by hz but limited to 200 Hz) but the polling case is broken
in other ways.
- ibufsize is limited from below to 128. This should be plenty for low
speeds up to 57600 bps, and should barely work at the low speed of 115200
bps too. 128 character times at 115200 bps = 11.1111 msec, so the system
only needs to have a worst-case tty software interrupt latency of than
11.1111 msec to avoid interrupt-level buffer overruns. However, the
broken interrupt latency in RELENG_[5-7] is apparently worse than that
in some cases, and the calculation of ibufsize is supposed to be leaving
a safety margin of a factor of 4.
- A period of 11.1111 msec is a frequency of just 90 Hz. The system also
needs a worst-case _clock_ software interrupt latency of slightly better
than that just to keep up with the old default HZ of 100. Both the clock
software interrupt handler (SWI) and the tty SWI have some Giant locking,
so they can't actually keep up with that in all cases, even on very fast
CPUs. The default for HZ is now 10 times larger, so there are many more
cases in which the system can't actually keep up. This is not usually
a problem, provided the CPU is not very slow -- the system can keep up
in most cases, and the other cases work provided everything uses
enormous buffers or doesn't mind dropping i/o.
- the calculation of ibufsize is written under the naive assumption that
the _clock_ interrupt handler can actually keep up with HZ. Note
that sio is essentially independent of HZ. The calculation just
uses hz as a hint of what the system can keep up with. The tty SWI
has a higher priority than the clock SWI. Thus if the clock SWI can
keep up with HZ, the tty SWI can keep up too. At least before
RELENG_5. With mutex locking, priority inversion is possible; also,
it is possible for the clock SWI to not have any Giant locking, while
the tty SWI always has it; thus the clock SWI might be less limited
by Giant locking than the tty SWI. I think that in practice the
priorities don't affect this problem much, but Giant locking does,
and that the clock SWI normally blocks at least once on Giant, and that
it only takes one long blockage on Giant in the clock SWI to destroy
its worst-case latency in the same way as for the tty SWI.
> Probalby has bearing on kern/26261 although I don't know when hz was raised above 200.
kern/26261 seems to be mostly about lower-level problems. Some of these
problems are smaller now that "fast" interrupt handlers can be shared.
Some of them are larger since shared interrupt handlers can only be "fast",
unless they cooperate with each other (using a scheduler a bit like the
one used for DEVICE_POLLING), but the sio interrupt handler works best
when it is fast.
>> How-To-Repeat:
> Open serial port at > 19200 baud and recieve long (> 128 bytes) of continuous full speed data.
>> Fix:
> Index: sio.c
> sio.c:1929
>
> Change:
> cp4ticks = speed / 10 / hz * 4;
> to
> cp4ticks = speed / 10 / (hz > 200 ? 200 : hz) * 4;
The 200 shouldn't be used here, except possibly in the polling case.
I previously suggested bumping the factor of 4 safety margin to a
factor of 40. This is also not quite right. Since memory is now
plentiful and hz is only indirectly relevant, it might be best to
make the buffer size independent of hz. E.g., scale it for hz = 10
or hz = 1, since no one should set hz that low. The tty layer already
does this. IIRC, it uses a size of
(1 second's worth of data) + 2 * ibufsize
It needs a reserve of (2 * ibufsize) since the ibuf level may want to
add up to ibufsize characters to it on short notice. The factor is 2
instead of 1 to support spacing of watermarks sufficient for flow
control. This shows that 1 second's worth of buffering in ibuf would
be too much -- it would almost triple the size of the ttybuf. 1/10
second's worth would be OK.
ibuf needs a similar reserve adding up to <hardware rx fifo size>
characters to it at short notice. This is currently only handled
indirectly via the factor of 4. Some UARTs have an rx fifo size of
128 or 256, but it is currently a confguration error to use these
since using sizes >= 16 has only small possible positive benefits
and has negative benefits in practice since sio's watermark processing
is not quite right for larger sizes. In particular, the watermark/
flow control stuff is more primitive than in the tty layer, but for
large rx fifos it needs to be similar.
So I suggest changing
cp4ticks = speed / 10 / hz * 4;
to
cp4ticks = speed / 10 / 10;
and renaming the variable and adjusting comments, since hz, ticks and 4
are no longer involved. The wastage would be small since the tty layer
already uses huge buffers. (However, the tty layer uses dynamic allocation
while ibuf is malloced. The complexities for dynamic allocation are silly
given plentiful memory. The tty subsystem only has small memory
requirements, but it manages its memory as if its requirements were
relatively large, like they were 30 years ago. Its 30-year old memory
allocation scheme was replaced 15 years in 386BSD, but came back.)
I use HZ = 100 without the above change (but with timeouts used again,
and the 200 in siosettimeout() adjusted to 1000) and haven't seen this
problem at any speed <= 921600 bps.
Bruce
More information about the freebsd-bugs
mailing list