svn commit: r333669 - in head/sys: dev/vt kern sys teken

Bruce Evans brde at optusnet.com.au
Wed May 16 11:15:03 UTC 2018


On Wed, 16 May 2018, Jean-Sébastien Pédron wrote:

> Author: dumbbell
> Date: Wed May 16 09:01:02 2018
> New Revision: 333669
> URL: https://svnweb.freebsd.org/changeset/base/333669
>
> Log:
>  teken, vt(4): New callbacks to lock the terminal once
>
>  ... to process input, instead of inside each smaller operations such as
>  appending a character or moving the cursor forward.
> ....
>  The goal is to improve input processing speed of vt(4). As a benchmark,
>  here is the time taken to write a text file of 360 000 lines (26 MiB) on
>  `ttyv0`:
>
>    * vt(4), unmodified:      1500 ms
>    * vt(4), with this patch: 1200 ms
>    * syscons(4):              700 ms

Syscons was pessimized by a factor of about 12 using related methods
(excessive layering, although not so much locking).  So the correct
comparison is with unpessimized syscons taking about 60 ms.

These times are just for writing to the history buffer and are not very
relevant for normal operation.  Pessimizations by factors of 12 just annoy me.

>  This is on a Haswell laptop with a GENERIC-NODEBUG kernel.

My times are on Haswell too, but on a 4.0GHz desktop with non-GENERIC
non-debug kernels.  My test does 65MB of (weird) text output consisting
of 650 lines of length 100000.  Again, this is not very representative.
The (trivial) benchmark sources have almost no changes since I wrote
them to benchmark and optimize the Minix console driver 25-30
years ago.  (I didn't do much with syscons then, but made sure that
it was only slightly slower.)  Old tests did 80-column output and all
versions do 1 write(2) per line and it was convenient to scale up the
line length to avoid having too much of the time being for syscall
overhead.  The 65MB is scaled to take about 1 second on an AthlonXP
2.2GHz with the best version of syscons.
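
A minimal sketch of a benchmark of this shape follows.  It is not the
original source: the names fill_line() and write_lines() and the
BENCH_MAIN switch are made up for illustration, and the line content
(a run of 'a' characters) is an assumption.  It only shows the
structure described above: one long line, written with exactly one
write(2) call per line.

```c
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Fill buf with len - 1 printable characters followed by a newline. */
static void
fill_line(char *buf, size_t len)
{
	if (len == 0)
		return;
	memset(buf, 'a', len - 1);
	buf[len - 1] = '\n';
}

/*
 * Write the same line `count` times, one write(2) call per line.
 * Short writes are counted but not retried.  Returns the number of
 * bytes written, or -1 on the first error.
 */
static long long
write_lines(int fd, const char *line, size_t len, int count)
{
	long long total = 0;
	int i;

	for (i = 0; i < count; i++) {
		ssize_t n = write(fd, line, len);

		if (n < 0)
			return (-1);
		total += n;
	}
	return (total);
}

#ifdef BENCH_MAIN
/* 650 lines of 100000 bytes each is the 65MB case described above. */
int
main(void)
{
	static char line[100000];

	fill_line(line, sizeof(line));
	return (write_lines(STDOUT_FILENO, line, sizeof(line), 650) < 0);
}
#endif
```

Built with -DBENCH_MAIN and run under time(1) on the console, this
gives one number per run.  Most of the time should then be spent in
the console driver rather than in syscall overhead, which is the
reason for the long lines.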

Approximate times on Haswell:

- 0.2 seconds with FreeBSD-5.2 syscons modified to recover some of my fixes
   from 1993.  I micro-optimized the inner loop in 1993.  The inner
   loop handles these 100000-character lines 80 characters at a time
   using as close as possible to *dst++ = *src++, with considerable
   overhead for attributes, checking for escape sequences, and for
   reducing to 80 columns.  According to the comment, this took 26
   cycles on i486's (probably DX2/66), but I optimized it to only 18.
   This optimized inner loop was turned into mostly nonsense before
   FreeBSD-5.2, using inlining in all the wrong places.  The loop was
   moved into an inline function (sc_term_gen_print()), but it calls a
   non-inline function (sc_vtb_putchar()).  This made it about 50%
   slower IIRC.

- 50% slower in the pre-teken, pre-release versions of FreeBSD-8.
   FreeBSD-8 changed the upper layers of the tty driver.  The pessimization
   is to do quoting stuff per-char.  This made the i/o even slower than the
   old way.  The older way produced larger tinygrams by transferring between
   the layers only about 100 bytes per output call, and used inefficient clists.

- about 500% slower for teken, by calling from the sc layer to the teken
   layer for every char.  IIRC, there are more than 5 but less than 10
   function calls per char, so it is doing OK to be only 5 times slower.

- thus the time on Haswell was about 2.4 seconds for 65MB.  This is for
   text mode.  Only slightly slower for graphics mode.  Screen refresh should
   occur at most about 50 times in 2.4 seconds, so at most 100k of the output
   should actually reach the screen and that shouldn't take long on a 4GHz
   system!  (On a 30-year-old ET4000, the frame buffer speed was 5.9MB/sec so
   100k must take at least 1/30 seconds in text mode.  Any slower than that
   is bad.)

- I optimized this a little by avoiding 1 or 2 function calls (for attribute
   handling) per character, so syscons only takes about 2 seconds for 65MB
   in -current.  This is consistent with your 700 ms (26/65 * 2000 ms =
   800).

- I have syscons mostly fixed in local patches:

   - Method A: restore scterm-sc.c (don't use teken).  This was very easy, and
     fixes many other bugs much more easily than in my committed and local
     patches.  The excessive layering in syscons actually helps here -- the
     API for the layering of the terminal emulator has only small changes,
     so scterm-sc.c from FreeBSD-7 takes only about 10 lines of changes to
     drop back in.

     Also restore optimizations from 1993 as far as possible.  They moved to
     scterm-sc.c and were lost with teken.

     This fixes everything except the slow upper layers, so the speed is
     0.33 seconds for 65MB.

   - Method B: restore only sctermvar.h from FreeBSD-7.  Use only
     sc_term_gen_print() and its infrastructure from this.  This does
     essentially *dst++ = *src++ to the history buffer until it hits an
     escape sequence, and it must not be called while in an escape
     sequence.  Subvert the teken layering so that this can be used.
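
The kind of inner loop meant by "essentially *dst++ = *src++" can be
sketched as below.  This is an illustration only, not the code of
sc_term_gen_print(): the name copy_plain_run() is hypothetical, and the
sketch ignores attributes and all control characters other than ESC,
which the real loop has to handle.

```c
#include <stddef.h>

#define ESC 0x1b	/* first byte of an escape sequence */

/*
 * Copy plain characters from src to dst until an escape character,
 * the end of the input, or the end of the destination is reached.
 * Returns the number of characters copied.  The caller deals with
 * whatever stopped the loop (escape parsing, end of row) and then
 * calls again; it must not call this while inside an escape sequence.
 */
static size_t
copy_plain_run(unsigned char *dst, size_t room,
    const unsigned char *src, size_t len)
{
	const unsigned char *start = src;
	const unsigned char *end = src + (len < room ? len : room);

	while (src < end && *src != ESC)
		*dst++ = *src++;
	return ((size_t)(src - start));
}
```

The point of the design is that there is no function call per
character: the per-character work is one compare and one copy, and the
slower paths are entered once per run of plain text.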

>  At the same time, the locking is changed in the vt_flush() function
>  which is responsible to draw the text on screen. So instead of
>  (indirectly) using VTBUF_LOCK() just to read and reset the dirty area
>  of the internal buffer, the lock is held for about the entire function,
>  including the drawing part.
>
>  The change is mostly visible while content is scrolling fast: before,
>  lines could appear garbled while scrolling because the internal buffer
>  was accessed without locks (once the scrolling was finished, the output
>  was correct). Now, the scrolling appears correct.
>
>  In the end, the locking model is closer to what syscons(4) does.

This is only very fast because the output never reaches the screen.
Frame buffers aren't much faster than 25 years ago.  Text mode tends to
be limited to a few MB/sec.  Old VGA graphics modes (not supported by
vt) are much slower since they require using PIO registers and there is
more to do.  Direct bitmapped modes might be no slower than text mode,
but again there is a lot of I/O to do.

Old syscons updates so fast that no scrolling is visible when all lines
are the same.  This also requires magic buffer sizes.  clists and the
transfer size of 100 gave suitable magic (e.g., 100 divides 80x25).  Now
tty buffer sizes are normally powers of 2, so the magic lining up rarely
occurs and this looks like artifacts in scrolling.
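
One way to read the "magic lining up" is as simple arithmetic on the
transfer size and the screen size.  The sketch below is an
interpretation, not driver code: the function names are made up, and
it ignores refresh timing entirely.  It counts how many full screens
of output pass before a transfer boundary falls on a screen boundary
again.

```c
#include <stddef.h>

/* Greatest common divisor by Euclid's algorithm. */
static size_t
gcd_size(size_t a, size_t b)
{
	while (b != 0) {
		size_t t = a % b;

		a = b;
		b = t;
	}
	return (a);
}

/*
 * Number of full screens of output after which a chunk boundary
 * coincides with a screen boundary again.  This is
 * lcm(chunk, screen) / screen, which reduces to
 * chunk / gcd(chunk, screen).  Returns 0 for degenerate input.
 */
static size_t
screens_until_aligned(size_t chunk, size_t cols, size_t rows)
{
	size_t screen = cols * rows;

	if (chunk == 0 || screen == 0)
		return (0);
	return (chunk / gcd_size(chunk, screen));
}
```

With an 80x25 screen, a transfer size of 100 gives 1 (every screenful
ends on a transfer boundary), while a power-of-2 size such as 128
gives 8.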

Bruce

