[Bug 205398] [regression] [tty] tty_drain() kernel function lacks timeout support it had before

Fri Dec 18 18:53:37 UTC 2015

On Fri, 18 Dec 2015, Eugene Grosbein wrote:

> 18.12.2015 23:05, Bruce Evans Ð¿Ð¸ÑˆÐµÑ‚:
>> ...
>> There is a hack for last-close that is supposed to give a hard-coded 
>> timeout
>> of 1 second.  Not sure why this doesn't work for you.  My quick fix that
>> restores the timeout uses slightly different logic where this hack was.
>
> I've made a mistake (now corrected) while filling PR: my system is 9.3-STABLE
> and not 10.2-STABLE. It has no "leaving" case hack.

I forgot to check that before.  I first fixed this in FreeBSD-9, and that
till doesn't have the hack.  Later I merged my fix into FreeBSD-10 and
reorganized it to be more similar to the surrounding code.  The patch
in my previous mail is relative to FreeBSD-10.  Not much tty code has
changed since then, and I don't use my complete tty patch set for
-current (only parts for my drivers).

> ...
>> If the device is really disconnected, then the tty should be in a zombie
>> state and should not wait.  I think this still works.  CLOCAL or lack of
>> modem signals may break detection of last-close.
>
> The device does not get disconnected in process, it was not connected
> from the moment of open().
>
>> Did you have CRTSCTS flow control enabled?  This is probably the main
>> source of hangs.  The RTS and CTS signals are not ignored in CLOCAL mode,
>> flow control should be invoked when they go away when th device goes
>> away.
>
> It has both of CRTSCTS flow control and CLOCAL enabled and
> I'd like to keep them both enabled and working.

Reasonable, but I try to do the opposite for various reasons:
- drivers should be able to keep up with low speeds like 115200 bps
   provided the CPU is an i386 at 20 MHz or faster.  OS interrupt latency
   may break this.  Data loss without RTSCTS serves as a warning about
   OS breakage.  However, higher speeds are more of a problem (I test with
   32 times faster and that has an inter-char time of 2.5 usec), and
   sometimes it is the external device that needs the flow control.
- carrier detect (CD) is useful even for local connections.  Sometimes you
   actually want unplugging a cable to cleanly terminate the connection.

Bidirectional devices make CLOCAL mostly unnecessary.  Note that they
are mostly broken in -current.  With non-broken ones, CLOCAL remains
off throughout; CD is ignored for the purpose of opening, but acts
later for the purpose of hanging up if it drops later (to drop, it
must have been raised at some point either earlier or later).  This
works for most uses of modems.  You open the callout device, usually
before carrier is up, and program it.  Then a connection raises carrier
and dropping carrier should terminate the connection.  Perhaps it shouldn't
turn the device into a zombie, and the open fd should remain usable for
further programming of the modem.  But without CLOCAL, there is no way
short of busy-waiting polling to get tell if the connection went away.
The old version of the tty driver has some complications to track the
TS_CONNECTED state independently of the TS_ZOMBIE state.  I forget if
this was needed for further programming of the modem.

CLOCAL is now forced for callout devices, although this ignores the
setting in the initial state device.  This is to work around some other
bugs.  The correct workaround is the change the default for CLOCAL.

>> This is mostly fixed in my version.  I started to cut out the patches,
>> but they were too entwined with other fixes.  Here is the part that
>> replaces the hard-coded 1 second timeout:
>> 
>> X diff -c2 ./kern/tty.c~ ./kern/tty.c
>> X *** ./kern/tty.c~    Thu Mar 19 18:23:08 2015
>> X --- ./kern/tty.c    Sat Aug  8 11:40:23 2015
>> X ***************
>> X *** 133,155 ****
>> X           return (0);
>> X X !     while (ttyoutq_bytesused(&tp->t_outq) > 0) {
>> X           ttydevsw_outwakeup(tp);
>> X           /* Could be handled synchronously. */
>> X           bytesused = ttyoutq_bytesused(&tp->t_outq);
>> X !         if (bytesused == 0)
>> X               return (0);
>> X X           /* Wait for data to be drained. */
>> X !         if (leaving) {
>> X               revokecnt = tp->t_revokecnt;
>> X !             error = tty_timedwait(tp, &tp->t_outwait, hz);
>> X               switch (error) {
>> X               case ERESTART:
>> X                   if (revokecnt != tp->t_revokecnt)
>> X                       error = 0;
>> X                   break;
>> X               case EWOULDBLOCK:
>> X !                 if (ttyoutq_bytesused(&tp->t_outq) < bytesused)
>> X                       error = 0;
>> X                   break;
>> X               }
>> X --- 196,225 ----
>> X           return (0);
>> X X !     while (ttyoutq_bytesused(&tp->t_outq) != 0 || tp->t_flags & 
>> TS_BUSY) {
>
> Strange diff format... Should patch(1) apply this with all those X'es ?

The X is just quoting for mail, to try to prevent corruption of leading
whitspace by mail programs.  Unfortunately, so mail programs corrupt the
X's instead, by joining them when one is on an empty line (the quoting
is to add a leading "X " and that gives a trailing space for empty lines).

patch(1) should indeed work with all those X's.  Only the mailer corruption
prevents it working.  I didn't know this when I started using this quoting,
and thought that the patch needed manual editing.  I used to use a prefix
of "% ".  The prefix is chosen to be different from normal quoting, so
that patch lines can be extracted using "grep '^X '" and then edited using
sed (or use just sed if you know it well).  Apparently patch was designed
to work directly from quoted lines in mail format, or maybe shar format.

I also like to interleave comments with the patch.  The quoting and
manual extraction is necessary for removing these lines.

> Thank you for answer, anyway! I'll try to understand and test patches next 
> week.

Here is a test program with interesting draining.  It is mostly for
debugging revoke(), but this required timing of the draining to get
the revoke() to race with other activity.

If the device is (mis)emulated or possibly if it has large buffers, or
both, then timing test might fail, but it is probably easier to emulate
the timing for large times than small ones.

X /* XXX draining for smaller speeds/buffers is even more broken. */
X #define	SPEED	9600
X #define	ttys
X #define	uart
X 
X #include <fcntl.h>
X #include <stdio.h>
X #include <string.h>
X #include <termios.h>
X #include <unistd.h>
X 
X #ifdef ttys
X #ifdef uart
X #define	CODEV	"/dev/cuau0"
X #define	DEV	"/dev/cuau0"
X #else
X #define	CODEV	"/dev/cuad0"
X #define	DEV	"/dev/cuad0"
X #endif
X #else
X #define	CODEV	"/dev/ptyp0"
X #define	DEV	"/dev/ttyp0"
X #endif
X 
X #define CLOSEDRAINTIME	1
X #define LARGETIME	10
X #define SMALLTIME	3
X 
X char largebuf[SPEED / 10 * LARGETIME];
X char smallbuf[SPEED / 10 * SMALLTIME];
X 
X int
X main(void)
X {
X 	struct termios t;
X 	int fd, fd0;
X 
X #ifdef CODEV
X 	fd0 = open(CODEV, O_RDWR | O_NONBLOCK);
X #endif
X 	fd = open(DEV, O_RDWR | O_NONBLOCK);
X 	tcgetattr(fd, &t);
X 	cfsetspeed(&t, SPEED);
X 	t.c_cflag &= ~CRTSCTS;
X 	tcsetattr(fd, TCSADRAIN, &t);
X 	fcntl(fd, F_SETFL, fcntl(fd, F_GETFL) & ~O_NONBLOCK);
X 	memset(largebuf, 'a', sizeof(largebuf));
X 	memset(smallbuf, 'a', sizeof(smallbuf));
X 	if (fork() == 0) {
X 		write(fd, smallbuf, sizeof(smallbuf));
X 		printf(
X 		"draining should take %d seconds at %d bps (but is broken)\n",
X 		    sizeof(smallbuf) / (SPEED / 10), SPEED);
X 		printf("drain smallbuf start\n");
X 		tcdrain(fd);
X 		printf("drain smallbuf done\n");
X 		write(fd, largebuf, sizeof(largebuf));
X 		printf(
X 		"draining should take %d seconds at %d bps (but is broken)\n",
X 		    sizeof(largebuf) / (SPEED / 10), SPEED);
X #ifdef hack
X 	exit(0);
X #endif
X 		printf("drain largebuf start\n");
X 		tcdrain(fd);
X 		close(fd);
X 		printf("drain largebuf done\n");
X 	} else if (fork() == 0) {
X #ifdef testread
X 	exit(0);
# #endif
X 		printf("read start\n");
X 		read(fd, smallbuf, sizeof(smallbuf));
X 		printf("read done\n");
X 	} else {
X #ifdef hack
X 	close(fd);
X 	close(fd0);
X #endif
X 		usleep(1000000 * SMALLTIME | 1000000 * CLOSEDRAINTIME / 2);
X 		printf("revoke\n");
X 		printf(
X "revoke should flush, but actually drains, so should take a long time, but\n");
X 		printf(
X "it actually takes its hard-coded timeout of 1 second\n");
X 		revoke(DEV);
X 		printf("revoke done\n");
X 		sleep(1);
X 		printf("parent done\n");
X 	}
X 	return (0);
X }

Bruce