device driver: cdevsw questions?

Wed Jan 21 06:05:30 PST 2009

On Wed, 21 Jan 2009, Andriy Gapon wrote:

> Question 1: I am writing a driver that would use per-open private data 
> (among other features). Do I have to use D_TRACKCLOSE flag in this case? In 
> general I am a little bit confused about when d_close is invoked. Supposing 
> D_TRACKCLOSE is not set and multiple programs concurrently open, use and 
> close a device - when d_close is called - when one program closes its last 
> descriptor tied to the device or when the system-wide last such descriptor 
> is closed?

Kostik has already pointed at the cdevpriv API, but just to reiterate his 
point: most people will find the semantics of D_TRACKCLOSE confusing and 
consider them incorrect, so I would advise against using them.

> I also would like the driver to provide a select capability quite
> similar to that of network (e.g. TCP) sockets using d_poll. I.e. a
> userland program should be able to query when it can write data to the
> device without blocking and when it can read data without blocking, plus
> when an error occurred in the device/driver, so there is no point in
> further waiting.
> At this moment I am thoroughly confused by meaning of various event
> masks described in poll(2).  E.g. what is normal priority, non-zero
> priority and high priority.
> Which flags should I care about if all data is the same priority for me?
> Which flag(s) should I set when I'd like to communicate an error
> condition (e.g. similar to TCP connection reset)?

I find that the description of the polling flags is at best confusing in both 
man pages and specifications.  The best bet is to look at the existing TCP 
semantics, which are basically defined in sopoll_generic():

         if (events & (POLLIN | POLLRDNORM))
                 if (soreadable(so))
                         revents |= events & (POLLIN | POLLRDNORM);

         if (events & POLLINIGNEOF)
                 if (so->so_rcv.sb_cc >= so->so_rcv.sb_lowat ||
                     !TAILQ_EMPTY(&so->so_comp) || so->so_error)
                         revents |= POLLINIGNEOF;

         if (events & (POLLOUT | POLLWRNORM))
                 if (sowriteable(so))
                         revents |= events & (POLLOUT | POLLWRNORM);

         if (events & (POLLPRI | POLLRDBAND))
                 if (so->so_oobmark || (so->so_rcv.sb_state & SBS_RCVATMARK))
                         revents |= events & (POLLPRI | POLLRDBAND);

A few observations:

- Neither POLLHUP nor POLLERR appears to be implemented for sockets (all
   protocols use sopoll_generic() in practice).  This is surprising given the
   wording in the poll(2) man page.

- Make sure to distinguish POLLIN and POLLINIGNEOF -- the difference between
   soreadable() and the test in POLLINIGNEOF is that POLLIN also considers
   SBS_CANTRCVMORE (i.e., at least half-close in the receive direction) but
   POLLINIGNEOF doesn't.
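
To make the mapping concrete for a driver where all data has the same
priority, here is a minimal userland sketch of the readiness calculation,
modelled on the POLLIN/POLLOUT branches above.  The structure mydev_state,
its fields, and mydev_poll_events() are hypothetical names invented for
illustration, and treating a pending error or a closed direction as "ready"
is my reading of soreadable()/sowriteable(), not something taken from a real
driver:

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700	/* request POLLRDNORM etc. on strict libcs */
#endif
#include <assert.h>
#include <poll.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-device state; the names are illustrative only. */
struct mydev_state {
	size_t	rx_avail;	/* bytes queued for the reader */
	size_t	rx_lowat;	/* bytes needed before a read is worthwhile */
	size_t	tx_space;	/* free bytes in the transmit queue */
	size_t	tx_lowat;	/* space needed before a write is worthwhile */
	bool	rx_closed;	/* no more input, akin to SBS_CANTRCVMORE */
	bool	tx_closed;	/* no more output will be accepted */
	int	error;		/* pending errno-style error, 0 if none */
};

/*
 * Return the subset of the requested events that would not block.
 * Only the bits the caller asked for are ever reported.
 */
static int
mydev_poll_events(const struct mydev_state *st, int events)
{
	int revents = 0;

	assert(st != NULL);

	/* Readable: enough data, EOF, or an error that read() would return. */
	if (events & (POLLIN | POLLRDNORM))
		if (st->rx_avail >= st->rx_lowat || st->rx_closed ||
		    st->error != 0)
			revents |= events & (POLLIN | POLLRDNORM);

	/* Writable: enough space, closed for output, or a pending error. */
	if (events & (POLLOUT | POLLWRNORM))
		if (st->tx_space >= st->tx_lowat || st->tx_closed ||
		    st->error != 0)
			revents |= events & (POLLOUT | POLLWRNORM);

	return (revents);
}
```

With this approach an error condition is reported by waking both readers and
writers, so the next read(2) or write(2) returns the error, rather than by
setting POLLERR.  A real d_poll handler would presumably also call
selrecord() when nothing is ready; that part is omitted here.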

I think Kostik's pointer to the pipe_poll() code is a good one, but if you're 
looking to emulate TCP semantics a bit more exactly, these differences should 
be taken into account.
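
If it helps to compare behaviours empirically before choosing which one to
emulate, a small userland probe can report what poll(2) returns for a
descriptor without blocking.  This is only a sketch; probe_fd() is a
hypothetical helper name, not part of any API:

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/*
 * Poll a single descriptor with a zero timeout.  On success, store the
 * returned event mask in *revents and return 0; return -1 if poll() fails.
 */
static int
probe_fd(int fd, short events, short *revents)
{
	struct pollfd pfd;

	assert(fd >= 0 && revents != NULL);

	pfd.fd = fd;
	pfd.events = events;
	pfd.revents = 0;

	if (poll(&pfd, 1, 0) < 0)
		return (-1);

	*revents = pfd.revents;
	return (0);
}
```

Running it against both ends of a pipe(2), a connected TCP socket, and the
device under development (before and after writing data, and after closing
the peer) shows which of POLLIN, POLLOUT, POLLHUP and POLLERR each one
actually reports on a given system.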

Robert N M Watson
Computer Laboratory
University of Cambridge