[Bug 277671] 14-RELEASE/14-STABLE crash with heavy disk IO on AMD Asus x670e motherboard and Intel i225 (igc) breakage NIC non-functioning

From: <bugzilla-noreply_at_freebsd.org>
Date: Mon, 10 Jun 2024 16:08:45 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=277671

--- Comment #9 from Cameron <cam@vasteel.io> ---
Tried running monerod for the first time in a while... And my system no longer
crashes!

The fix could come from one or more of the following changes:

1. Upgraded to 14.1-RELEASE. I tried 14-STABLE maybe within a few months of
14.1-RELEASE and still had the problem.

2. Started using "zpool trim"... But I have another FreeBSD box running
14.0-RELEASE where I never ran trim and had no problems.

3. I'm on a beta BIOS for this motherboard that's more recent than the latest
official release.

After monerod has been running for a while, I start getting tons of these
messages in dmesg:
Jun  5 02:19:11 hostname kernel: sonewconn: pcb 0xfffff802963b9540 (0.0.0.0:18080 (proto 6)): Listen queue overflow: 193 already in queue awaiting acceptance (1 occurrences), euid 781, rgid 781, jail 0
Jun  5 02:25:11 hostname kernel: sonewconn: pcb 0xfffff802963b9540 
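
A small parser for these lines makes it easier to count how often the overflow
fires and how deep the queue gets. This is a Python sketch; the regular
expression is modeled only on the message shown above and is an assumption
about its format, not a documented kernel interface. The function names are
hypothetical.

```python
import re

# Assumed format, modeled on the dmesg line above; other versions may differ.
SONEWCONN_RE = re.compile(
    r"sonewconn: pcb (?P<pcb>0x[0-9a-f]+) "
    r"\((?P<addr>\S+):(?P<port>\d+) \(proto (?P<proto>\d+)\)\): "
    r"Listen queue overflow: (?P<queued>\d+) already in queue"
)


def parse_sonewconn(line):
    """Return the fields of one sonewconn overflow message, or None if no match."""
    m = SONEWCONN_RE.search(line)
    if m is None:
        return None
    return {
        "pcb": m["pcb"],
        "addr": m["addr"],
        "port": int(m["port"]),
        "proto": int(m["proto"]),
        "queued": int(m["queued"]),
    }


def summarize(lines):
    """Map listening port -> (number of overflow messages, deepest queue seen)."""
    summary = {}
    for line in lines:
        rec = parse_sonewconn(line)
        if rec is None:
            continue  # unrelated or truncated line
        count, deepest = summary.get(rec["port"], (0, 0))
        summary[rec["port"]] = (count + 1, max(deepest, rec["queued"]))
    return summary
```

Passing an open syslog file (for example /var/log/messages) to summarize()
would give one entry per listening port, here presumably only 18080.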

Increasing kern.ipc.soacceptqueue doesn't seem to help at all. I wonder whether
IO is so slow that monerod can't keep up with accepting the connections.
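
One possible reason the sysctl has no visible effect, modeled under two
assumptions: the kernel caps the backlog passed to listen(2) at
kern.ipc.soacceptqueue (so raising the sysctl only matters if the application
asks for a larger backlog), and sonewconn logs an overflow only once the queue
length exceeds 3/2 of that limit. Both rules are assumptions here, and the
function names are hypothetical.

```python
def effective_qlimit(listen_backlog, soacceptqueue):
    """Assumed model: the listen(2) backlog is clamped to kern.ipc.soacceptqueue."""
    if listen_backlog < 0 or listen_backlog > soacceptqueue:
        return soacceptqueue
    return listen_backlog


def is_overflow(queued, qlimit):
    """Assumed model: an overflow is logged once the queue exceeds 3/2 of the limit."""
    return queued > 3 * qlimit // 2


def first_overflow_depth(qlimit):
    """Smallest queue length that the model above reports as an overflow."""
    return 3 * qlimit // 2 + 1


# An application backlog of 128 stays 128 however high the sysctl is raised.
for sysctl_value in (128, 1024, 4096):
    qlimit = effective_qlimit(128, sysctl_value)
    print(sysctl_value, qlimit, first_overflow_depth(qlimit))
```

Under this model a backlog of 128 (the value of SOMAXCONN on FreeBSD) first
reports an overflow at exactly 193 queued connections, which matches the dmesg
line above. If monerod passes such a fixed backlog to listen(2), raising
kern.ipc.soacceptqueue would change nothing; that has not been verified here.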

The first few times I ran "zpool trim", it only took a few minutes, but it has
progressively gotten slower and now takes 21+ minutes. That suggests there is
still some IO issue, perhaps the same one I've had in the past when running
monerod, except that it no longer locks up my box completely.

I can now run monerod constantly without locking up my box though, which is a
nice improvement!

In /var/log/monerod.log, I see a lot of traces:
2024-06-10 15:46:31.253 [P2P6]  INFO    stacktrace     src/common/stack_trace.cpp:134  Exception: boost::wrapexcept<boost::bad_weak_ptr>
2024-06-10 15:46:31.253 [P2P6]  INFO    stacktrace     src/common/stack_trace.cpp:135  Unwound call stack:
2024-06-10 15:46:31.385 [P2P6]  INFO    stacktrace     src/common/stack_trace.cpp:163       1                  0x9ab808 __cxa_throw + 0xc8
2024-06-10 15:46:31.510 [P2P6]  INFO    stacktrace     src/common/stack_trace.cpp:159       2                  0x50b05f
2024-06-10 15:46:31.633 [P2P6]  INFO    stacktrace     src/common/stack_trace.cpp:159       3                  0x7e1f4a
2024-06-10 15:46:31.757 [P2P6]  INFO    stacktrace     src/common/stack_trace.cpp:159       4                  0x7dc205
2024-06-10 15:46:31.879 [P2P6]  INFO    stacktrace     src/common/stack_trace.cpp:159       5                  0x788439
2024-06-10 15:46:32.001 [P2P6]  INFO    stacktrace     src/common/stack_trace.cpp:159       6                  0x78886c
2024-06-10 15:46:32.122 [P2P6]  INFO    stacktrace     src/common/stack_trace.cpp:159       7                  0x7c05e2
2024-06-10 15:46:32.244 [P2P6]  INFO    stacktrace     src/common/stack_trace.cpp:159       8                  0x7b2e5b
2024-06-10 15:46:32.365 [P2P6]  INFO    stacktrace     src/common/stack_trace.cpp:159       9                  0x7bc49d
2024-06-10 15:46:32.486 [P2P6]  INFO    stacktrace     src/common/stack_trace.cpp:159       a                  0x4d9b88
2024-06-10 15:46:32.607 [P2P6]  INFO    stacktrace     src/common/stack_trace.cpp:159       b                  0x491100
2024-06-10 15:46:32.728 [P2P6]  INFO    stacktrace     src/common/stack_trace.cpp:159       c                  0x48eddd
2024-06-10 15:46:32.849 [P2P6]  INFO    stacktrace     src/common/stack_trace.cpp:159       d                  0x48c562
2024-06-10 15:46:32.970 [P2P6]  INFO    stacktrace     src/common/stack_trace.cpp:159       e                  0x7e39a5
2024-06-10 15:46:33.091 [P2P6]  INFO    stacktrace     src/common/stack_trace.cpp:159       f                  0x7fd24f
2024-06-10 15:46:33.212 [P2P6]  INFO    stacktrace     src/common/stack_trace.cpp:159      10                  0x7fd118
2024-06-10 15:46:33.333 [P2P6]  INFO    stacktrace     src/common/stack_trace.cpp:159      11                  0x4fb1b2
2024-06-10 15:46:33.453 [P2P6]  INFO    stacktrace     src/common/stack_trace.cpp:159      12                  0x4f03c4
2024-06-10 15:46:33.575 [P2P6]  INFO    stacktrace     src/common/stack_trace.cpp:159      13                  0x4efe94
2024-06-10 15:46:33.695 [P2P6]  INFO    stacktrace     src/common/stack_trace.cpp:159      14                  0x4efbcc
2024-06-10 15:46:33.816 [P2P6]  INFO    stacktrace     src/common/stack_trace.cpp:159      15                  0x7deaa2
2024-06-10 15:46:33.937 [P2P6]  INFO    stacktrace     src/common/stack_trace.cpp:159      16                  0x82bec79bd
2024-06-10 15:46:34.058 [P2P6]  INFO    stacktrace     src/common/stack_trace.cpp:159      17                  0x8324bcb05
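
To compare traces between the two boxes, the frame addresses can be pulled out
of monerod.log. A minimal Python sketch, assuming each log entry occupies a
single line in the log file itself; the regex and function name are
hypothetical and only cover the format shown here. Frame indices are
hexadecimal (a, b, ..., 10, 11), so they are parsed base 16.

```python
import re

# Assumed frame-line format: "stack_trace.cpp:<line>  <hex index>  0x<addr> [symbol]".
FRAME_RE = re.compile(
    r"stack_trace\.cpp:\d+\s+(?P<index>[0-9a-f]+)\s+(?P<addr>0x[0-9a-f]+)"
    r"(?:\s+(?P<symbol>\S.*?))?\s*$"
)


def parse_frames(lines):
    """Return (frame index, address, symbol or None) for each stack-frame line."""
    frames = []
    for line in lines:
        m = FRAME_RE.search(line)
        if m:  # header lines ("Exception:", "Unwound call stack:") don't match
            frames.append((int(m["index"], 16), int(m["addr"], 16), m["symbol"]))
    return frames
```

The extracted addresses could then be grouped to see whether the same call
stack repeats, or handed to a symbolizer such as addr2line(1) with the monerod
binary. The two high addresses at the bottom (0x82bec79bd, 0x8324bcb05) appear
to lie outside the main executable, in which case the matching shared library
would be needed instead.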


I see similar traces on my other box, where monerod has never given me
problems, but they become far more common on the problem box once the
sonewconn errors start appearing. The sonewconn errors have never appeared on
the other (working) box.
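
To put a number on "far more common", exceptions can be counted per hour on
each box and compared. A Python sketch, assuming every monerod log line starts
with a "YYYY-MM-DD HH:MM:SS.mmm" timestamp as in the excerpt above; the
function name is hypothetical.

```python
from collections import Counter


def exceptions_per_hour(lines):
    """Count monerod stack-trace exceptions, bucketed by the 'YYYY-MM-DD HH' prefix."""
    counts = Counter()
    for line in lines:
        # Only the first line of each trace carries "Exception:".
        if "stack_trace.cpp" in line and "Exception:" in line:
            counts[line[:13]] += 1  # first 13 chars = date, space, hour
    return counts
```

Running it over /var/log/monerod.log on both machines and lining up the hours
would show whether the rate jumps when the sonewconn messages begin.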

Once it reaches this point, monerod seems mostly or entirely unable to continue
syncing the blockchain and logs stacktraces constantly, until I reboot the
whole system. Completely stopping and restarting monerod doesn't help.

Looking at "sockstat -c" the last time I was in this state, I had only a bit
over 200 connections.
