svn commit: r345696 - head/lib/libvgl

Tue Sep 3 14:06:18 UTC 2019

On Sun, 31 Mar 2019, Konstantin Belousov wrote:

> On Sun, Mar 31, 2019 at 10:27:54PM +1100, Bruce Evans wrote:
>> On Sat, 30 Mar 2019, Konstantin Belousov wrote:
>>
>>> On Sat, Mar 30, 2019 at 03:24:40PM +1100, Bruce Evans wrote:
>>>> On Fri, 29 Mar 2019, Konstantin Belousov wrote:
>>>>
>>>>> On Fri, Mar 29, 2019 at 03:57:09PM +0000, Bruce Evans wrote:
>>>>>> Author: bde
>>>>>> Date: Fri Mar 29 15:57:08 2019
>>>>>> New Revision: 345696
>>>>>> URL: https://svnweb.freebsd.org/changeset/base/345696
>>>>>>
>>>>>> Log:
>>>>>>   Fix endless loops for handling SIGBUS and SIGSEGV.
>>>>>>
>>>>>>   r80270 has the usual wrong fix for unsafe signal handling -- just set
>>>>>>   a flag and return to let an event loop check the flag and do safe
>>>>>>   handling.  This never works for signals like SIGBUS and SIGSEGV that
>>>>>>   repeat and works poorly for others unless the application has an event
>>>>>>   loop designed to support this.
>>>>>>
>>>>>>   For these signals, clean up unsafely as before, except for arranging that
>>>>>>   nested signals are fatal and forcing a nested signal if the cleanup doesn't
>>>>>>   cause one.
>>>>>>
>>>>>> Modified:
>>>>>>   head/lib/libvgl/main.c
>>>>>>
>>>>>> Modified: head/lib/libvgl/main.c
>>>>>> ==============================================================================
>>>>>> --- head/lib/libvgl/main.c	Fri Mar 29 15:20:48 2019	(r345695)
>>>>>> +++ head/lib/libvgl/main.c	Fri Mar 29 15:57:08 2019	(r345696)
>>>>>> ...
>>>>>> @@ -107,14 +107,22 @@ struct vt_mode smode;
>>>>>>  }
>>>>>>
>>>>>>  static void
>>>>>> -VGLAbort(int arg __unused)
>>>>>> +VGLAbort(int arg)
>>>>>>  {
>>>>>> +  sigset_t mask;
>>>>>> +
>>>>>>    VGLAbortPending = 1;
>>>>>>    signal(SIGINT, SIG_IGN);
>>>>>>    signal(SIGTERM, SIG_IGN);
>>>>>> -  signal(SIGSEGV, SIG_IGN);
>>>>>> -  signal(SIGBUS, SIG_IGN);
>>>>>>    signal(SIGUSR2, SIG_IGN);
>>>>>> +  if (arg == SIGBUS || arg == SIGSEGV) {
>>>>>> +    signal(arg, SIG_DFL);
>>>>>> +    sigemptyset(&mask);
>>>>>> +    sigaddset(&mask, arg);
>>>>>> +    sigprocmask(SIG_UNBLOCK, &mask, NULL);
>>>>>> +    VGLEnd();
>>>>>> +    kill(getpid(), arg);
>>>>> This of course misses the siginfo information from the real fault.
>>>>
>>>> It is in the nested signal frame.
>>>>
>>>>> Why SIGBUS/SIGSEGV are caught at all ?
>>>>
>>>> Otherwise, the screen is left in an application mode that the kernel
>>>> doesn't support (except to not write to the screen, so that the
>>>> application has full control).
>>
>> Also, the keyboard may be left in a strange state.  Usually this is no
>> worse than termios raw mode, but it may be raw scancodes.  In raw scancodes
>> mode, Alt-Fn to switch to another vty to fix the problem.  libvgl disables
>> switching to another vty without going through a libvgl handler anyway.
>> The combination is usually enough to break switching to vty0 to run ddb,
>> though that should work for at least breakpoints in the kernel.  IIRC,
>> sc_cngrab() switches the keyboard mode, and my version of it restores
>> ignoring of the anti-switch flag.
>>
>>> ...
>>> I am more about not doing kill(2) from the handler, instead do plain
>>> return to the context which caused the original signal.
>>
>> That won't work well for SIGBUS's and SIGSEGV's related to the library.
>> If the above cleanup is not done, then it is too hard to debug the problem,
>> and if the above cleanup is done then it is impossible to debug the original
>> problem if it was for and invalid access in libvgl code.
>>
>> The latter is a problem for the above cleanup too -- the cleanup clobbers
>> the libvgl state.
>>
>> However, I could do a more limited cleanup of just restoring the screen and
>> keyboard state using ioctls.  ioctl() isn't async-signal safe in POSIX, but
>> most ioctls are async-signal safe in practice provided they don't use
>> global variables that were set too recently or too non-atomically.
>> VGLEnd() uses main globals established by VGLInit().  It depends on program
>> order which is not guaranteed in signal handlers for testing the flags set
>> by initialization.  The cleanup now also does:
>> - screen clearing.  This breaks debugging and is dangerous if the bug is
>>    in screen clearing.  Screen clearing is also very slow, and belongs
>>    in the kernel.  It is now done 4 times per libvgl use, but its
>>    slowness is limited by the kernel not clearing properly and the
>>    kernel not allowing the application to mmap() the whole physical
>>    frame buffer in some versions of FreeBSD (this breaks panning).
>> - munmap().  This breaks debugging.
>> - free() of backing store buffer and the main VGLDisplay data.  This breaks
>>    debugging and is unnecessary if we are going to exit.
>
> You can only try to do async-safe calls from SIGSEGV handler, to not
> break a limited chances to success. Also you should not modify in-memory
> state for the library, for the same reason, and then it fits your goal
> of keeping the state intact for debugging. Then you should return to
> re-create the original situation causing and not kill(2).

That is what I suggested in "however".  However2, it is extra work to write
a different exit path.  When running the program directly under gdb, you
also have to tell gdb to not stop at any caught signal, since the screen
will be unusable then.

> I find it weird to try to debug mode-switched code without serial console.

Running gdb on a remote system via a remote shell works almost as well.
It is inconvenient to type lots of unobvious redirection commands like
0<>/dev/ttyv0.  (libvgl needs fd 0 open for both reading and writing
for mmap(), while vidcontrol needs fd 0 only open for reading to do
write-like ioctls to it, and fd 2 (actually stdio stderr) open for
writing to write escape sequences to it.)

Serial consoles are not always available.

Better debuggers switch the screen mode as necessary.

I recently noticed another mode switching problem.  On i386, cycling
through about 50 modes to test them all takes a small fraction of a
second for each mode switch (done directly in syscons or by a vm86
BIOS call) even when the monitor takes too long to resync (the monitor
resyncs are coalesced, so 50 resyncs take about 5 seconds instead of
250).  But on amd64, mode switches to VESA mode and back take about
20 seconds.  They are not coalesced, and most of the system is stopped
waiting for them (at least remote network access is stopped).  amd64
can't use vm86, so it emulates BIOS calls.  Why does the emulation stop
the rest of the system and take so long?

Bruce