[Bug 293382] Dead lock and kernel crash around closefp_impl

From: <bugzilla-noreply_at_freebsd.org>
Date: Wed, 01 Apr 2026 06:53:33 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=293382

--- Comment #41 from Paul <devgs@ukr.net> ---
(In reply to Kyle Evans from comment #38)

Basically, this is an HTTP/WebSocket micro-service app. It handles hundreds of
thousands of long-lived incoming connections. Approximately 98% of them are
WebSocket and the rest are plain HTTP. Other than that, there are also a lot of
outbound HTTP connections, somewhere in the range of 200k-300k. Yeah, to be able
to implement this we need timers, a lot of them.

All of it is based on the standalone asio C++ library, which does the heavy
lifting under the hood. We use several `io_context` instances for scaling, and I
believe each of them has its own kqueue, which seems reasonable.

Not sure about EV_DELETE/EV_ONESHOT, as we're not dealing with kqueue directly.
Looking at the source code, I see no EV_ONESHOT, and it even looks like they
abandoned its use a while back
(https://www.boost.org/doc/libs/latest/doc/html/boost_asio/history.html#boost_asio.history.asio_1_4_9___boost_1_46_1).
It does use EV_DELETE, seemingly on every descriptor close, so... a lot.
Connections appear and fall off organically, many of them on timeout, as this is
the real world and not all browsers bother to do a clean WebSocket shutdown.

This service is nothing new for us; we've been using it for years. Even today,
we're still keeping a fallback server that runs FreeBSD 13.1-STABLE, but, and
this is a big but, on a different server platform: 2x E5-2660. So, not only have
we switched to a newer FreeBSD version, but we have also dramatically changed
the platform, and the current platform is much, much faster. My gut feeling is
that the problem was already there and has only now become apparent: a faster
CPU makes things, in both kernel and userspace, run faster, and some highly
improbable races become likely.

We've been running this platform and OS version for months now, under different
kinds of load: not that many sockets, but much more CPU usage overall. And it
runs completely stable. On the other hand, for this specific role, we now have
2 (of 2) servers that exhibit these crashes. That doesn't rule out some CPU
issue, yes, but it makes one highly unlikely to be the cause.

-- 
You are receiving this mail because:
You are the assignee for the bug.