Re: sshd signal 11 on -current

From: Mark Millard <marklmi_at_yahoo.com>
Date: Wed, 17 Jan 2024 17:34:28 UTC
On Jan 17, 2024, at 08:00, bob prohaska <fbsd@www.zefox.net> wrote:

> A Pi4 running -current reported:
> 
> Jan 13 16:23:10 nemesis kernel: pid 53604 (sshd), jid 0, uid 22: exited on signal 11 (no core dump - bad address)
> repeatedly. 

I assume that the pid and the time changed from message to message,
but that the rest of each message's text matched exactly.
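
A minimal sketch for checking that assumption, not a tested tool:
the function name sig11_summary is made up for illustration, and it
assumes the log lines have the same layout as the one quoted above.
It reads log text on stdin and prints just the time and pid of each
sshd signal 11 report, so any other difference between the messages
would have to be looked for in the full lines.

```shell
# Hypothetical helper: print "Mon DD HH:MM:SS pid=N" for each kernel
# report of an sshd process exiting on signal 11. Reads stdin.
sig11_summary() {
    awk '/\(sshd\)/ && /exited on signal 11/ {
        # The pid is the field that follows the literal word "pid".
        for (i = 1; i < NF; i++) {
            if ($i == "pid") {
                print $1, $2, $3, "pid=" $(i + 1)
                break
            }
        }
    }'
}
```

Something like "sig11_summary < /var/log/messages" would then list
the group of about fifteen reports, one per line.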

> There's no obvious  disruption of operation, existing
> ssh connections seem undisturbed.

First, a reminder of what a process tree for sshd looks like
(you need not be logged in as root, and you would likely be
running tip where this shows ps):

1546  -  Is       0:00.00 |-- sshd: /usr/sbin/sshd [listener] 0 of 10-100 startups (sshd)
1628  -  Ss       0:00.10 | |-- sshd: root@pts/1 (sshd)
1642  1  Ss       0:00.04 | | `-- -sh (sh)
9531  1  R+       0:00.00 | |   `-- ps -xd
7512  -  Is       0:00.02 | `-- sshd: root@pts/0 (sshd)
7515  0  Is+      0:00.01 |   `-- -sh (sh)

The lack of disruption indicates that one of the "@pts/"
sshd processes got the signal but the other "@pts/" ones did
not, nor did the "/usr/sbin/sshd [listener]" sshd, if I
understand right.

Too bad there are no core files. Also, the system may lack
the symbols or debug information needed to make backtraces
readable (even if there were a core to look at).

> The messages occur in a group of
> about fifteen, one second apart. The machine has been up about
> three days, with only one occurrence so far. 
> 
> Can't tell if this is new or old behavior, I've never manually 
> checked /var/log/messages for sshd errors until now and didn't
> save the security run email from the 13th.. 
> 
> Might it be of significance?

The "exited on signal 11" could lead to the shell under that
sshd (tcsh in your case?) being killed. It would not be via
SIGHUP. The tip run might also be killed, leaving its lock
file around. Trying:

# ps -xd

on nemesis before starting up a new tip should indicate if there
is a tip running that is no longer (indirectly) under a
"sshd: root@pts/" process.

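If the tree is long, the same check could be roughly automated. This
is only a sketch under assumptions: the function name tip_without_sshd
is made up, and it expects "pid ppid command" columns on stdin, such
as from "ps -ax -o pid,ppid,comm". It prints each tip process that
has no sshd anywhere among its ancestors.

```shell
# Hypothetical helper: reads "pid ppid command..." lines on stdin and
# prints the pid and name of each tip process with no sshd ancestor.
tip_without_sshd() {
    awk '
    $1 ~ /^[0-9]+$/ {              # skips the ps header line
        parent[$1] = $2
        name[$1] = $3
    }
    END {
        for (pid in name) {
            if (name[pid] !~ /(^|\/)tip$/) continue
            found = 0
            split("", seen)        # portable way to empty an array
            # Walk up the parent chain; stop at unknown or repeated pids.
            for (p = parent[pid]; (p in name) && !(p in seen); p = parent[p]) {
                seen[p] = 1
                if (name[p] ~ /^sshd/) { found = 1; break }
            }
            if (!found) print pid, name[pid]
        }
    }'
}
```

No output would mean every tip found is still under some sshd. A tip
started from the serial console would also be listed, so a hit is a
hint to look at the ps -xd tree, not proof of a leftover.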

It is a very good find. At the moment I do not see a way to
end up with a backtrace showing what was involved when the
signal happened. But it is highly likely that you have demonstrated
the presence of an error of some kind:

     Num   Name             Default Action       Description
. . .
     11    SIGSEGV          create core image    segmentation violation

should not happen.

===
Mark Millard
marklmi at yahoo.com