Re: sshd signal 11 on -current

From: Mark Millard <marklmi_at_yahoo.com>
Date: Thu, 18 Jan 2024 22:11:03 UTC

On Jan 18, 2024, at 11:32, bob prohaska <fbsd@www.zefox.net> wrote:

> On Wed, Jan 17, 2024 at 08:22:50PM -0800, Mark Millard wrote:
>> On Jan 17, 2024, at 17:51, bob prohaska <fbsd@www.zefox.net> wrote:
>> 
>>> On Wed, Jan 17, 2024 at 05:09:32PM -0800, Mark Millard wrote:
>>>> 
>>>> So far it sounds like the problem requires pi4 RasPiOS
>>>> workstation behavior to be involved to get the problem.
>>>> Can you do something to avoid all use of RasPiOS, possibly
>>>> using a different OS on that RPi4B for some experiments?
>>>> 
>>> I just tried a Windows 10 laptop wired into the LAN. Ssh to 
>>> ns2.zefox.net and  running 
>>> grep -i /var/log/messages produces five lines of grep matches, 
>>> then "corrupted MAC on input....." 
>>> 
>>> I'm not sure which MAC (as in ethernet MAC) is being referred
>>> to. Might a different kind of MAC exist, unrelated to ethernet? 
>>> 
>>> Running top, or cat /var/log/messages, produces the error
>>> immediately. It seems safe to use ls. Meanwhile, the serial 
>>> console session served by nemesis.zefox.com  is still up 
>>> and usable. 
>>> 
>>> I'm increasingly confused about where the error starts.
>>> 
>> 
>> Note: I'm using unique switch naming below, something
>> your diagram does not provide.
>> 
>> Both the macOS system and the pi4 RasPiOS workstation
>> used the path (or so I assume):
>> 
>> MACHINE<->wifi<->lan<->router<->switchA<->ns2.zefox.net
>> 
>> What about the Windows 10 laptop test? Same path?
>> 
> 
> I've edited http://www.zefox.net/~fbsd/netmap to reflect
> the actual placement of hosts relative to the switches.
> 
> 
>> Could a MACHINE with the problem be moved to be
>> on switchA for EtherNet to see if it still has the
>> problem when there (just for the test)? Testing the
>> macOS system on switchA to be sure it still works
>> could also be of interest.
> 
> If by MACHINE you mean the ssh client, pelorus.zefox.org 
> is already there, along with ns1, ns2 and www.zefox.net.

Given the new chart, I meant MACHINE by position in
either ssh sequence:

MACHINE<->wifi<->lan<->router<->switchA<->ns2.zefox.net
MACHINE<->lan<->router<->switchA<->ns2.zefox.net

You have an example of the latter that gets the
failure: MACHINE="Win10 laptop"

It means that involving wifi is not a requirement for the
problem to happen. The fewer devices involved in a
sequence that still shows the problem, the stronger
this type of evidence is.

This means I'm now focused on just:

MACHINE<->lan<->router<->switchA<->ns2.zefox.net

and possibly eliminating more stages as not required
to get the problem.

Now can lan and/or router be eliminated by moving
one of "Win10 laptop" or "pi4 RasPiOS workstation"
temporarily? Moving to switchA would be testing not
having either lan or router involved:

MACHINE<->switchA<->ns2.zefox.net

Does such a move lead to still having the MAC
failure? To no MAC failure?

(Switches, routers, and the like do sometimes have
errors that mess up just some protocol, not
everything.)
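
A minimal sketch of the elimination reasoning above, in Python.
The data layout and function names are illustrative assumptions;
the pass/fail results are only the ones reported so far in this
thread, and the hop names follow the sequences listed above.

```python
# Each observation: (ordered hops of the ssh path, True if the
# "corrupted MAC" failure appeared). Illustrative structure only.
OBSERVATIONS = [
    (["pi4 RasPiOS workstation", "wifi", "lan", "router",
      "switchA", "ns2.zefox.net"], True),
    (["Win10 laptop", "lan", "router",
      "switchA", "ns2.zefox.net"], True),
    (["www.zefox.net", "switchA", "ns2.zefox.net"], False),
]


def common_to_failures(observations):
    """Return the hops present in every failing path."""
    failing = [set(path) for path, failed in observations if failed]
    return set.intersection(*failing) if failing else set()


def seen_in_success(observations):
    """Return every hop that appears in at least one working path."""
    working = [set(path) for path, failed in observations if not failed]
    return set.union(*working) if working else set()


shared = common_to_failures(OBSERVATIONS)
prime_suspects = shared - seen_in_success(OBSERVATIONS)

print("In every failing path:", sorted(shared))
print("...and in no working path:", sorted(prime_suspects))
```

With the three results above this leaves lan and router as the
hops to test next. A hop that also shows up in a working path
is not proven innocent (it could matter only in combination
with something else), which is why the switchA-only test is
still worth doing.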

> It's somewhat curious that going from RPi4 workstation
> via ssh to www.zefox.net and then ssh to ns2 does not
> report corrupted MAC, but both machines run armv7
> FreeBSD 12.4.4

So, listing the nested(!) ssh sequence more fully, that was(?):

"pi4 RasPiOS workstation"<->wifi<->lan<->router<->switchA<->www.zefox.net<->switchA<->ns2.zefox.net

Or, being more explicit about the nesting:

"pi4 RasPiOS workstation"<->wifi<->lan<->router<->switchA<->www.zefox.net

then, nested:

www.zefox.net<->switchA<->ns2.zefox.net

And it ends up getting the same result (no failure)
as just doing:

www.zefox.net<->switchA<->ns2.zefox.net

without the involvement of any other MACHINE, if I
understand right.

Another related test would be by temporarily moving
www.zefox.net to form one of:

www.zefox.net<->wifi<->lan<->router<->switchA<->ns2.zefox.net
or:
www.zefox.net<->lan<->router<->switchA<->ns2.zefox.net

Does such a move still not get a failure? Or does it then fail?


> A three hop connection (RPiOS > www.zefox.net > ns2.zefox.net)
> somehow inhibits the corrupted MAC error.  Evidently
> there's something special going on among the hosts.
> 
>> Could you boot a FreeBSD microsd card in the pi4
>> instead and try it as a FreeBSD system to see if
>> it still has the problem (while in its usual
>> place)? I'm still looking for the same hardware
>> context but running a distinct but known OS
>> context to see if the problem persists.
>> 
> 
> Realistically I should probably just set up a microSD using
> 14-Release and configure it as ns2.zefox.net.

That is going a different direction than I asked about.
It does not eliminate RasPiOS from involvement on the
same hardware it was originally used on.

Both types of tests have their uses. But my focus for
now in this area is on the replacement of RasPiOS by
a FreeBSD version to eliminate RasPiOS's involvement
for some tests but using the same hardware as before
the replacement (other than boot media).

> That needs doing
> anyway and should be done for www.zefox.net and ns1.zefox.net
> as a matter of maintenance.  
> 
> The dilemma is then armv7 vs aarch64. Armv7 has served well,
> and used to fit in 1 GB RAM. Now it's getting tighter. 
> Aarch64 is _very_ tight in 1 GB RAM now and will doubtless
> get worse. Is there a consensus on which to choose? I gather 
> armv7's days are numbered but not up yet.

armv7 is Tier 2 for 13.x and 14.x.

armv7 is projected to stay Tier 2 for the later official 15.x
(stable/15 and such) but might not. It is possible that
armv7 will only be supported via lib32/chroot/jail use on
aarch64 hardware that also supports EL0 AArch32, in which
case AArch32/armv7 could end up no longer being bootable
by then.

aarch64 is Tier 1 for 13.x and 14.x.
aarch64 is projected to stay Tier 1 for the later official 15.x.
aarch64 hardware that does not support AArch32 at all would
still be Tier 1.
(The aarch64 Tier 1 claims are somewhat strong for embedded
aarch64. One possibility is that, at some point, a 1 GiByte
RAM aarch64 might not be able to self-host buildworld
buildkernel or various port->package builds, even with
substantial swap space. But that would not be likely to
change the Tier 1 status of aarch64.)



===
Mark Millard
marklmi at yahoo.com