Communications kernel -> userland

Marc Ramirez marc.ramirez at bluecirclesoft.com
Mon Jul 21 07:52:06 PDT 2003


Cool.  Thanks, everyone!  Messrs. Watson and Lambert have convinced me to
go the pseudo-device route.  I think that's really going to clean up a lot
of the code.

I'm so excited!

Thanks!

Marc.

On Sun, 20 Jul 2003, Robert Watson wrote:

>
> On Sat, 19 Jul 2003, Pawel Jakub Dawidek wrote:
>
> > Your choices are:
> > - device,
> > - sysctl,
> > - syscall.
>
> There are actually a few other more obscure ways to push information from
> the kernel to userspace, depending on what you want to accomplish.
>
> Write directly to a file from the kernel.  ktrace, system accounting, and
> ktr with alq all stream data directly to a file provided by an authorized
> user process.  Quotas and UFS1 extended attribute data are also written
> directly to a file.  On other operating systems, audit implementations
> frequently take the same approach -- when the goal is long-term storage
> of data in a user-accessible form, but you don't want to stream it through
> a user process live, this is usually the preference.  Typically, when
> taking this approach, a special system call is used to notify the kernel
> of the target file to write to -- the file is created by the user process
> with appropriate protections.  Often, but not always, the system call is
> non-blocking and simply returns once the file is hooked up as a target;
> delivery then continues until another system call cancels it or switches
> it to a new target.
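
A minimal userland sketch of the "file is created by the user process with
appropriate protections" step described above.  The function name
create_target_file() and the choice of mode 0600 are assumptions made for
illustration; the system call that hands the file to the kernel is
subsystem-specific and is deliberately left out.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create (or truncate) the file the kernel will later be asked to write to.
 * Returns an open descriptor, or -1 with errno set.
 * Hypothetical helper: the name and the 0600 mode are illustrative only.
 */
int
create_target_file(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);

    if (fd == -1)
        return (-1);

    /*
     * open() leaves the mode of an already existing file untouched,
     * so set owner-only read/write explicitly.
     */
    if (fchmod(fd, S_IRUSR | S_IWUSR) == -1) {
        close(fd);
        return (-1);
    }

    /* The subsystem-specific "use this file" system call would follow. */
    return (fd);
}
```

Creating the file before involving the kernel means ownership and
permissions are settled by the process that requested the logging.
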
>
> Stream it through a device node.  If you need only one or a small number
> of processes to listen for events from the kernel, a common approach
> is a pseudo-device that acts like a file.  For example, syslogd listens
> on /dev/klog for log events from the kernel; some audit implementations
> also take this approach.  Our devd, usbd, and others similarly listen
> for system events that are exposed to user processes as data on a
> blocking pseudo-device.  One nice thing about this approach is that you
> can combine it with select(), kqueue(), et al., to do centralized event
> management in the application.  BPF also does this.  Both Arla and
> Coda take this approach for LPC'ing to userspace to request events
> as a result of VFS operations by processes.
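
A sketch of the userland side of the device-node approach above: wait on a
descriptor with select() and read one chunk when data arrives.  The helper
name read_event() and its millisecond timeout parameter are assumptions, not
part of any existing API; in real use the descriptor would come from opening
whatever node the pseudo-device driver creates.

```c
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Wait up to timeout_ms for fd to become readable, then read one chunk.
 * Returns the number of bytes read, 0 on timeout or end-of-file,
 * or -1 on error.  Hypothetical helper for illustration.
 */
ssize_t
read_event(int fd, void *buf, size_t len, int timeout_ms)
{
    fd_set rfds;
    struct timeval tv;
    int n;

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    n = select(fd + 1, &rfds, NULL, NULL, &tv);
    if (n <= 0)
        return (n);     /* 0 = timed out, -1 = select() failed */

    return (read(fd, buf, len));
}
```

Because the device is just a descriptor, more descriptors can be added to the
same fd_set so that one loop services kernel events alongside everything
else the application is watching.
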
>
> Expose it using a special socket type.  We expose routing data and
> network stack administrative controls as special reads, writes, and
> ioctls on various socket types.  I'm not a big fan of this approach,
> as it special-cases a lot of bits, and requires you to get caught
> up in socket semantics.  However, one advantage of this approach is
> that it makes multicasting events to multiple listeners easier to deal
> with, since each socket endpoint has automatic message buffering.
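
A userland stand-in for the per-endpoint buffering point above, not kernel
code: one event is sent to several datagram socket endpoints, and each
listener's copy stays queued in its own receive buffer until that listener
reads it.  publish_event() and its parameters are hypothetical names.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Send one event to every listener.  Each descriptor in senders[] is one
 * end of a connected datagram socket; the message is queued in the peer's
 * receive buffer until that listener gets around to reading it.
 * Returns the number of listeners the event was queued for.
 */
int
publish_event(const int *senders, int nlisteners, const void *msg, size_t len)
{
    int delivered = 0;

    for (int i = 0; i < nlisteners; i++) {
        if (send(senders[i], msg, len, 0) == (ssize_t)len)
            delivered++;
    }
    return (delivered);
}
```

The endpoints could come from socketpair(AF_UNIX, SOCK_DGRAM, 0, sv), with
one pair per listener.  A slow listener then delays only its own queue, not
the sender or the other listeners, until its buffer fills.
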
>
> There are some other odd cases in use as well.  The NFS locking code
> opens a specially named fifo (/var/run/lock) and writes messages to
> it, which are picked up by rpc.lockd.  The lock daemon pushes events
> back into the kernel using a special system call.  I don't really
> like this approach, as it has some odd semantics -- especially since
> it reopens the fifo for each operation, and there are credential and
> file system namespace inconsistencies.
>
> Of these approaches, my favorites are writing directly to a file and using
> a pseudo-device, depending on the requirements.  They have fairly
> well-defined security semantics (especially if you properly cache the
> open-time credentials in the file case).  I don't really like the fifo
> case as it has to re-look-up the fifo each time, and has some odd blocking
> semantics.  Sockets, as I said, involve a lot of special casing, so unless
> you're already dealing with network code, you probably don't want to drag
> it into the mix.  If you're creating big new infrastructure for a feature,
> I suppose you could also hook it up as a first-class object at the file
> descriptor level, in the style of kqueue.  If it's relatively minor event
> data, you could hook up a new kqueue event type.  You could also just use
> a special-purpose system call or sysctl if you don't mind a lot of context
> switching and lack of buffering.
>
> Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
> robert at fledge.watson.org      Network Associates Laboratories
>
>
>
>


--
Marc Ramirez
Blue Circle Software Corporation
513-688-1070 (main)
513-382-1270 (direct)
www.bluecirclesoft.com


More information about the freebsd-hackers mailing list