Two modest kernel features I wish I had
Ronald F. Guilmette
rfg at monkeys.com
Mon Feb 23 20:12:56 PST 2004

I've been writing a specialized daemon process that will act as a sort-of
intelligent shim/proxy between SMTP clients and some user-designated SMTP
server(s), perhaps located elsewhere. The details of what the shim/proxy
will do aren't really important here, so I'll just skip straight to my
point.

While writing this daemon program, a couple of ideas came to mind relating
to kernel calls which, as far as I know, do not exist, but which would have
been very nice to have, had they existed. (Then again, I'm ignorant, and
maybe something like the kernel calls that I'm about to ask for does exist
and I just don't know about it. If so, I hope that somebody will tell me.)

Anyway, one feature that I would have liked to include in my shim/proxy
daemon is the ability to act as a kind of multiplexer, i.e. to accept
incoming connections on one host:port and then proxy each of those through
to some other ("real server") host:port selected from a set of "real
server" host:port pairs. The real server selected would be any one of the
available ones that can itself accept an incoming connection (from the
shim/proxy daemon) immediately or, if none can accept a connection
immediately, whichever one can accept a connection soonest.

Anyway, yes, I could _almost_ get what I want just by initiating a whole
set of outbound non-blocking ("no wait") connect() attempts and then doing
a select() or poll() on all of the relevant fd's to see which one(s) become
ready soonest. I could then use the "soonest responder" and close() all of
the _other_ completed connections. But that seems rather ugly and wasteful
and, more importantly, it might even cause the log files of the various
real servers to get clogged up with error messages about
prematurely-aborted connections.
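
For concreteness, the workaround just described could be sketched roughly
as follows. This is only a minimal sketch under stated assumptions:
connect_first(), connect_to_any_local() and MAX_TARGETS are made-up names
(not existing APIs), the timeout restarts on every poll() round, and the
returned fd is left in non-blocking mode.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define MAX_TARGETS 16  /* arbitrary cap for this sketch */

/* Hypothetical helper: race non-blocking connects and return the first
 * fd that finishes connecting, or -1 if none does. */
int connect_first(const struct sockaddr *const addrs[], const socklen_t lens[],
                  size_t n, int timeout_ms)
{
    struct pollfd pfds[MAX_TARGETS];
    int winner = -1;
    size_t i;

    if (n == 0 || n > MAX_TARGETS)
        return -1;
    assert(addrs != NULL && lens != NULL);
    for (i = 0; i < n; i++) {
        pfds[i].fd = -1;            /* poll() ignores negative fds */
        pfds[i].events = POLLOUT;   /* writable == connect finished */
        pfds[i].revents = 0;
    }
    /* Start every attempt; each one sends its SYN right away. */
    for (i = 0; i < n && winner < 0; i++) {
        int fd = socket(addrs[i]->sa_family, SOCK_STREAM, 0);
        int flags;
        if (fd < 0)
            continue;
        flags = fcntl(fd, F_GETFL, 0);
        if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
            close(fd);
            continue;
        }
        if (connect(fd, addrs[i], lens[i]) == 0)
            winner = fd;            /* completed immediately */
        else if (errno == EINPROGRESS)
            pfds[i].fd = fd;        /* handshake still pending */
        else
            close(fd);              /* refused, unreachable, ... */
    }
    /* Wait until one pending attempt succeeds or all have failed. */
    while (winner < 0) {
        int pending = 0, rc;
        for (i = 0; i < n; i++)
            pending += (pfds[i].fd >= 0);
        if (pending == 0)
            break;
        rc = poll(pfds, (nfds_t)n, timeout_ms);
        if (rc < 0 && errno == EINTR)
            continue;
        if (rc <= 0)
            break;                  /* timeout or error */
        for (i = 0; i < n && winner < 0; i++) {
            int err = 0;
            socklen_t elen = sizeof err;
            if (pfds[i].fd < 0 || pfds[i].revents == 0)
                continue;
            if (getsockopt(pfds[i].fd, SOL_SOCKET, SO_ERROR,
                           &err, &elen) == 0 && err == 0)
                winner = pfds[i].fd;
            else
                close(pfds[i].fd);
            pfds[i].fd = -1;
        }
    }
    /* The wasteful part: drop every attempt that lost the race. */
    for (i = 0; i < n; i++)
        if (pfds[i].fd >= 0)
            close(pfds[i].fd);
    return winner;
}

/* Example use: race two hypothetical real servers on 127.0.0.1. */
int connect_to_any_local(uint16_t port_a, uint16_t port_b, int timeout_ms)
{
    struct sockaddr_in a, b;
    const struct sockaddr *addrs[2];
    socklen_t lens[2] = { sizeof a, sizeof b };

    memset(&a, 0, sizeof a);
    a.sin_family = AF_INET;
    a.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    a.sin_port = htons(port_a);
    b = a;
    b.sin_port = htons(port_b);
    addrs[0] = (const struct sockaddr *)&a;
    addrs[1] = (const struct sockaddr *)&b;
    return connect_first(addrs, lens, 2, timeout_ms);
}
```

Note that the final close() loop is exactly where the losing servers may
see a connection that was fully established and then dropped at once.
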
So anyway, what I was thinking is that what would REALLY be nice to have
here would be a "multi-connect" kernel call. Let's call it `mconnect'.
Basically, and unlike the regular connect(2) call, for mconnect(2) one
would pass an entire list or vector of (struct sockaddr *) pointers
(and also a list or vector of socklen_t length values) to mconnect()
and it would then send out an initial SYN packet to all of the designated
hosts/ports in the vector. Then... and this is the kicker... the calling
process would be stalled until at least one host/port responds with a
SYN+ACK. The first one that does so respond is the one that actually
gets connected to (i.e. by the kernel finishing the three-way TCP
handshake, but JUST with that one host:port), and all of the other
hosts/ports that reply later would instead get back something like a
reset (RST) or some ICMP "unreachable" error packet, or at any rate
_something_ to just make them go away.

So? What do y'all think? Is there any merit to this idea?
My own feeling is that an `mconnect' kernel call could be quite useful
for constructing all manner of multiplexer daemons. But then what do I
know?

The important point here is that for `mconnect' the three-way TCP connect
handshake is fully completed for only at most _one_ of the designated
host:port pairs that we have attempted (in parallel, with mconnect) to
connect to. (The parallelism is of course the _other_ important point...
I'd like to be able to effectively _try_ to connect to a whole bunch of
other servers elsewhere, all at the same time. But I want to do that
sometimes *even though* I really only need to complete one of the
connections that I'm trying.)

So anyway, the second thing that would have been nice to have is another
kernel call which is pretty much the mirror image of what I just described
above, i.e. a "multi-bind" feature (let's call it `mbind'). mbind would
accept a list or vector of (struct sockaddr *) pointers, and the socket
would then listen for incoming connects on _all_ of the specified host:port
pairs, but using just one single socket FD, as with the regular bind(2)
kernel call.

In a way, it seems really rather strange to me that we don't have this
exact kind of feature. I mean hey! Isn't this almost what we are doing
anyway when we bind to the address INADDR_ANY (and where the local
machine does have two or more IP addresses associated with it)?
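
The INADDR_ANY case mentioned above looks roughly like this in practice:
a single bind(2) on a single socket, after which the one fd accepts
connections arriving on any of the machine's local IPv4 addresses. A
minimal sketch only; listen_any() is a made-up helper name, and passing
port 0 (let the kernel pick a free port) is just a convenience for trying
it out.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: one socket, listening on `port` on every local
 * IPv4 address. Returns the listening fd, or -1 on failure. */
int listen_any(uint16_t port, int backlog)
{
    struct sockaddr_in sa;
    int on = 1;
    int fd;

    assert(backlog > 0);
    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    /* Allow quick restarts of the daemon on the same port. */
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof on);

    memset(&sa, 0, sizeof sa);
    sa.sin_family = AF_INET;
    sa.sin_addr.s_addr = htonl(INADDR_ANY);  /* wildcard: all local addrs */
    sa.sin_port = htons(port);
    if (bind(fd, (struct sockaddr *)&sa, sizeof sa) < 0 ||
        listen(fd, backlog) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

The wildcard is all-or-nothing, though: it covers every local address on
one port, not a chosen subset of addresses or a mix of ports.
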
OK, so since the kernel already knows how to listen on multiple host:port
pairs for incoming TCP connects, and since it knows how to do this using
only a single userland socket FD, why not export a more generalized and
flexible form of this same functionality from the kernel and out to where
mere mortals like me could make use of it?
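
Absent such a call, the usual userland approximation is one listening
socket per host:port pair plus a poll() across all of them. The sketch
below illustrates that under stated assumptions: listen_on_all(),
accept_any() and MAX_LISTENERS are made-up names, the backlog of 16 is
arbitrary, and the caller ends up with n fds rather than the single fd
that mbind would give.

```c
#include <assert.h>
#include <netinet/in.h>   /* struct sockaddr_in, for callers */
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

#define MAX_LISTENERS 16  /* arbitrary cap for this sketch */

/* Hypothetical helper: open one listening socket per address and store
 * the fds in fds[]. Returns 0 if all n succeed, else -1 (nothing left
 * open). */
int listen_on_all(const struct sockaddr *const addrs[], const socklen_t lens[],
                  size_t n, int fds[])
{
    size_t i;
    int on = 1;

    for (i = 0; i < n; i++) {
        fds[i] = socket(addrs[i]->sa_family, SOCK_STREAM, 0);
        if (fds[i] < 0)
            break;
        setsockopt(fds[i], SOL_SOCKET, SO_REUSEADDR, &on, sizeof on);
        if (bind(fds[i], addrs[i], lens[i]) < 0 || listen(fds[i], 16) < 0) {
            close(fds[i]);
            break;
        }
    }
    if (i == n)
        return 0;
    while (i-- > 0)                 /* undo the sockets opened so far */
        close(fds[i]);
    return -1;
}

/* Hypothetical helper: wait until any listener has a pending connection,
 * then accept it. Returns the connected fd, or -1 on timeout or error. */
int accept_any(const int fds[], size_t n, int timeout_ms)
{
    struct pollfd pfds[MAX_LISTENERS];
    size_t i;

    assert(n > 0 && n <= MAX_LISTENERS);
    for (i = 0; i < n; i++) {
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN;    /* readable == connection waiting */
        pfds[i].revents = 0;
    }
    if (poll(pfds, (nfds_t)n, timeout_ms) <= 0)
        return -1;                  /* timeout or error */
    for (i = 0; i < n; i++)
        if (pfds[i].revents & POLLIN)
            return accept(fds[i], NULL, NULL);
    return -1;
}
```

This works, but the bookkeeping (n sockets, n pollfd slots, cleanup on
partial failure) is the part a single-fd mbind would make unnecessary.
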
Well, that's my argument anyway, such as it is.

OK, I now brace for the inevitable slings and arrows that almost always
befall any crackpot (such as myself) who goes 'round suggesting new
kernel enhancements without having read the relevant kernel code. :-)
Please try to be nice and don't whack me too hard. I am effectively
pre-shielded by the fact that I have already openly admitted to being
fundamentally kernel-ignorant. (It is not permitted to be mean to
anybody this humble. :-)

P.S. If you try replying to me via personal e-mail, and if that bounces,
please accept my advance apologies for my over-aggressive local spam
filters, and then just use the form here: