sensors framework continued (architecture)

Alexander Leidinger Alexander at Leidinger.net
Tue Nov 13 12:51:40 PST 2007


Quoting "Poul-Henning Kamp" <phk at phk.freebsd.dk> (Sun, 11 Nov 2007 11:30:44 +0000):

> In message <20071111113717.4803b3ab at deskjail>, Alexander Leidinger writes:
> 
> >> You use the select(2), poll(2) or kqueue(2) systemcall to wait until
> >> one of the several fd's the sensord(8) process will have to service
> >> becomes ready.
> >
> >This sounds like you propose more than one kernel access point for all
> >sensors. Maybe something like /dev/sensors/senX instead of
> >the /dev/sensor_interface I thought initially?
> 
> One device node is plenty:  /dev/sensors
> 
> >the /dev/sensor_interface I thought initially? What about the
> >hierarchic aspect (/dev/sensors/hierlev1/.../hierlevX/senY, ... this is
> >where I came up with my filesystem comment in the previous mail)?
> 
> There is no need to waste device nodes and vnodes on that, it can
> be more efficiently encoded inband, just like devd does.
> 
> >(the simple sensors)), or you issue an "poll now" ioctl each time you
> >want the data and wait for the return of the select/poll/kqueue. So in
> >the end you do a blocking wait, with slow sensors comming back later
> >than faster ones, and in the monitoring software those get attributed
> >to about the same time slot (if they are all polled at the same time).
> 
> That would only happen if you implement things in a truly pointless
> way.

For an active monitoring program (doing probes on its own, instead of
waiting for probes connected to the monitoring program to deliver
data), it starts a probe and polls for the data. If it starts probes
in parallel, the returned data has about the same timestamp. If it
doesn't probe in parallel, there will be a difference of some seconds
or even minutes, but all of those probes have a timestamp within the
current round of probing, which can be described as one current
timestamp (the polling round). For a time-guy like you, this
description is very inaccurate; for a normal operator who has to
monitor the monitoring program (or a manager, the boss of the just
mentioned operator), this description fits. The scenario I was
talking about (99% of the use cases) doesn't require precision to the
second. For the remaining 1% you should use special software and not
complicate the normal case for the other 99%.

> >As I don't like the generic poll logic for simple sensors (used in the
> >majority of use cases) in the kernel, let's look at the "poll
> >now"-case: 1 syscall for the "poll now" for each sensor (N calls),
> 
> Why couldn't you tell multiple sensors to poll in one syscall ?

You can do that in the one-fd-for-all-sensors case. I forgot to write
about this case here, sorry.

> >1-N
> >syscalls for waiting, and 1 syscall to read for each sensor (for only
> >one fd for all sensors, 
> 
> And read all the results in one read(2) operation, if they are ready ?

This is what you can do if all the data is ready at the same time. But
what's the point of doing a select/poll/kevent if the kernel waits
until all data is ready before returning anything? A blocking read
would do for that. See also my next sentence below.
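
To make the difference concrete, here is a rough sketch of what the
single-fd case could look like from userland if results trickle in one
by one. The device name and the one-record-per-read format are made up
for illustration; nothing like this exists today:

#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <err.h>

int
main(void)
{
	char buf[256];
	struct kevent kev;
	ssize_t n;
	int fd, kq;

	if ((fd = open("/dev/sensors", O_RDONLY)) == -1)	/* hypothetical */
		err(1, "open");
	if ((kq = kqueue()) == -1)
		err(1, "kqueue");
	EV_SET(&kev, fd, EVFILT_READ, EV_ADD, 0, 0, NULL);
	if (kevent(kq, &kev, 1, NULL, 0, NULL) == -1)
		err(1, "kevent register");

	/* Each wakeup hands us whatever results are ready so far. */
	for (;;) {
		if (kevent(kq, NULL, 0, &kev, 1, NULL) == -1)
			err(1, "kevent wait");
		if ((n = read(fd, buf, sizeof(buf) - 1)) <= 0)
			break;
		buf[n] = '\0';
		printf("got: %s", buf);	/* one result record per read */
	}
	close(kq);
	close(fd);
	return (0);
}

The kevent only buys you something if the slow sensors really come back
later than the fast ones; if everything is delivered in one go, the
blocking read alone is enough.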

> You sound like an IBM mainframe-guy: "There shall be one record per
> syscall only!" :-)

I was talking about the one-fd-per-sensor part here, and you snipped
the part where I wrote about the one-fd-for-all-sensors part.

> You could, best case, poll _all_ the sensors in two syscalls.

Great... in the best case I can poll all the sensors in one syscall
with the sysctl approach (nothing prevents me from writing a
sysctl_xxx which returns the data from a subtree). It's just that the
best case will not happen often for sensors when you want to measure
the latency of a probe; in such cases you don't probe all at once and
wait for all the data to return at the same time. And please, don't
tell me to do the latency measurement in the kernel, it would
contradict your requirement to do as little work as possible in the
kernel (it's already stretched too much by your suggestion to let the
kernel poll the simple sensors at a configurable time interval). If
you get this info back from a smart sensor, fine, you can have (MIB
notation) x.y.smart.data.value and x.y.smart.data.latency as separate
sensors which you can correlate. But I still think a smart sensor is
better attached to the userland sensor part than to the kernel sensor
part (where to draw the line between having it in the kernel or not
is up to the person writing the access code). BTW: you still haven't
answered my question about examples of real-world sensors which are
smart.

> >The simple sysctl approach has N calls.
> 
> Which is a terrible waste of syscalls in my mind.

With just one syscall (see above for the sysctl approach) you cannot
do latency measurement in userland. And latency measurement of simple
sensors doesn't belong in the kernel.
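
As an illustration of what I mean by latency measurement in userland,
here is a minimal sketch with the sysctl approach. The OID name is
hypothetical; any sensor OID a driver exports would do:

#include <sys/types.h>
#include <sys/sysctl.h>
#include <time.h>
#include <stdio.h>
#include <err.h>

int
main(void)
{
	const char *oid = "hw.sensors.cpu0.temperature";	/* hypothetical */
	struct timespec t0, t1;
	int value;
	size_t len = sizeof(value);
	double ms;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	if (sysctlbyname(oid, &value, &len, NULL, 0) == -1)
		err(1, "sysctlbyname(%s)", oid);
	clock_gettime(CLOCK_MONOTONIC, &t1);

	ms = (t1.tv_sec - t0.tv_sec) * 1e3 +
	    (t1.tv_nsec - t0.tv_nsec) / 1e6;
	printf("%s = %d (probe took %.3f ms)\n", oid, value, ms);
	return (0);
}

Run this per sensor in a loop and you get both the value and the probe
latency, without the kernel having to know anything about timing.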

> >Again: when does it hurt that it takes longer?
> >
> >For sysctls you go directly to a sensor (benefit of the hierarchic
> >property), for the single fd approach you need additional code in the
> >kernel to go dispatch to the sensor,
> 
> You mean, code that isn't hampered by the sysctl semantics and which
> can do so in a very efficient way ?  Yes, that would be a great
> thing indeed.

I'm still waiting for hard data where you show that the sysctl
semantics hurt, that they are not efficient enough, and that the more
complex development for the fd approach is necessary. You are good at
skipping the questions you don't want to / can't answer. I think a lot
of the questions you skipped would show that a sysctl approach,
instead of writing fd handling code, is sufficient to cover 99% of the
cases.

> >and in the multiple fd case you
> >need to write some kind of filesystem logic to get the hierarchical
> >benefit.
> 
> Only you talk about one devicenode per sensor, please forget that
> red herring.

I wrote about both cases, a single fd and multiple fds. Now that you
clarified that you are talking about one fd, see my comment regarding
additional code for the single-fd case in my last mail.

> >> Remember the userland access API ?  That will need to be serviced
> >> via some kind of interface, most likely a unix domain socket (although
> >> a shared memory based interface might also work).
> >
> >Why? We want a userland library to access it, so all tools which query
> >a sensor need to use this. This library can access the interface
> >directly [...]
> 
> No, then you clearly have not understood what people told you, the
> diagram looks like this:
> 
> 
> 	N * accessing application
> 		|
> 		|
> 	   N * sensor-library
> 		|
> 		|
> 	    1 * sensor daemon ---- N * sensor-library - N * userland sensors
> 		|	
> 		|
> 	     N * kernel sensors

This is what you understood (feel free to explain why you need N
sensor libraries; one is enough). The description allows another
interpretation:

       N * userland applications (a sensorsd, systat, ..)
                          |
                  1 * sensors library
                   |               |
   N * kernel sensors            N * userland sensors


It also allows this interpretation:

            single-system sensors framework (see note 1)
                          |       |
  1 * kernel sensors library     1 * userland sensors library
                   |               |
   N * kernel sensors            N * userland sensors

Note 1: this can be another lib, it can be one daemon, it can be N
applications (whether that makes sense or not).

We didn't talk about this part in enough detail to say "the diagram
looks like this".

What we agree upon is that we want a userland lib to abstract the
kernel interface away from an application programmer. This means that
programs which want to show data from kernel sensors need to use this
lib. You cannot depend on a sensor daemon always running. If you are
in single user mode and need the data of a sensor, you should be able
to get it even without a sensor daemon running.
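
Just to illustrate the abstraction boundary I mean (every name below
is made up, this is not a proposal for the actual API): applications
only see the library, and the library can read the kernel sensors
directly, so no daemon has to be running:

/* Hypothetical sketch of a userland sensors library interface. */
#include <stddef.h>
#include <stdint.h>

struct sensor_value {
	char	desc[64];	/* e.g. "acpi.cpu.0.temperature" */
	int64_t	value;		/* raw value, unit defined by type */
	int	type;		/* temperature, fan, voltage, ... */
};

/* Enumerate the sensors the kernel (or a userland provider) exposes. */
int	sensor_list(struct sensor_value **list, size_t *count);

/* Fetch a single sensor by its hierarchical name. */
int	sensor_get(const char *name, struct sensor_value *out);

Whether the backend of such a lib is sysctl, an fd, or a daemon is then
an implementation detail hidden from the applications.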

Whether we extend the kernel sensor lib so that it also understands
userland sensors was not discussed at all. Having no lib between
sensord and the kernel in your drawing makes me think you haven't
understood what the people were talking about.

> >You propose to write more code with more complex logic to get faster
> >to the sensor data.
> 
> No, I propose to solve the problem, rather than hack up bad code
> using bad interfaces for 20% of the problem.

I asked multiple times that you provide technical facts for the "bad
interface" part. So far you have only provided suggestions for changes
which are beneficial for an insignificant number of cases. Those
changes unnecessarily complicate the code for the other 99% of cases.

> >I've done this. Passing strings down a fd from the kernel is no magic.
> >It's good for the kernel->userland part, but not for the
> >userland->kernel querying of only a subset of the sensors. 
> 
> Here is a straw-man API for the kernel<->userland device:
> 
> 	Kernel sends
> 		"S 32 acpi.cpu.0.temperature bla bla bla\n"
> 
> This means:  I have a sensor which I know as number 32, and it tells me
> it has these properties.
> 
> 	Userland does an ioctl:
> 
> 		SENSOR_POLL(32)
> 
> 	Kernel sends, when the data is ready,
> 		"D 32 34.45\n"
> 
> There you are, can it be any simpler ?

We have this already. It's called sysctl. OK, the syntax is a little
bit different, but the syntax you provided is just an example and not
a final decision.
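
To illustrate: the "announce a name/number once, then poll by number"
pattern of your straw-man maps directly onto sysctlnametomib(3) plus
sysctl(3). The sketch below uses the ACPI thermal zone OID that exists
today as an example (whether it is present depends on the machine):

#include <sys/types.h>
#include <sys/sysctl.h>
#include <stdio.h>
#include <unistd.h>
#include <err.h>

int
main(void)
{
	int mib[CTL_MAXNAME];
	size_t miblen = CTL_MAXNAME;
	int temp;
	size_t len;

	/* "S 32 acpi.cpu.0.temperature": resolve the name to numbers once. */
	if (sysctlnametomib("hw.acpi.thermal.tz0.temperature",
	    mib, &miblen) == -1)
		err(1, "sysctlnametomib");

	/* "SENSOR_POLL(32)" + "D 32 34.45": one syscall per poll. */
	for (int i = 0; i < 5; i++) {
		len = sizeof(temp);
		if (sysctl(mib, (u_int)miblen, &temp, &len, NULL, 0) == -1)
			err(1, "sysctl");
		/* hw.acpi.thermal values are in tenths of a Kelvin. */
		printf("%.1f C\n", temp / 10.0 - 273.15);
		sleep(1);
	}
	return (0);
}

The hierarchical name plays the role of your "sensor number 32", and
the announcement of the sensor's placement comes with the name itself.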

> Amongst the points you totally overlook, is the fact that the sensors
> don't need to be a hierarchy in the kernel, as long as they tell
> sensord about their placement in the hierarchy.

How does a sensor know about its placement in the hierarchy if this
data is not in the kernel? It has to be in the kernel. And either the
sensor needs to know its parent if you want it to return its placement
in the hierarchy, or the parent needs to tell that the child belongs
to it. And it sounds like you want to write some additional code to do
this. By using sysctl to access the sensor, you get this for free.
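
As a sketch of what I mean by "for free" (names and softc layout made
up, this is not from an existing driver): the driver attaches its
sensor OID under a parent node, and the placement in the hierarchy
follows from that; no extra code to report the parent is needed:

#include <sys/param.h>
#include <sys/kernel.h>
#include <sys/systm.h>
#include <sys/sysctl.h>

struct mysensor_softc {
	struct sysctl_ctx_list	ctx;
	int			raw_temp;
};

static int
mysensor_temp_sysctl(SYSCTL_HANDLER_ARGS)
{
	struct mysensor_softc *sc = arg1;
	int val;

	val = sc->raw_temp;		/* read the hardware here */
	return (sysctl_handle_int(oidp, &val, 0, req));
}

static void
mysensor_attach_sysctl(struct mysensor_softc *sc)
{
	struct sysctl_oid *node;

	sysctl_ctx_init(&sc->ctx);
	/* hw.mysensor: the parent node defines the placement. */
	node = SYSCTL_ADD_NODE(&sc->ctx, SYSCTL_STATIC_CHILDREN(_hw),
	    OID_AUTO, "mysensor", CTLFLAG_RD, NULL, "example sensor device");
	/* hw.mysensor.temperature */
	SYSCTL_ADD_PROC(&sc->ctx, SYSCTL_CHILDREN(node), OID_AUTO,
	    "temperature", CTLTYPE_INT | CTLFLAG_RD, sc, 0,
	    mysensor_temp_sysctl, "I", "temperature");
}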

> In fact, if for no other reason, the tremendous overhead for the
> hierarchy in sysctl is reason not to use it.
> to sysctl for this.

How much is this tremendous overhead of sysctl, and when does it start
to be a bottleneck? As I have asked those questions already without
getting an answer from you, I don't expect to get an answer now. As
long as we don't get answers, you are talking about premature
optimization (yes, you already told me that you don't think you are
talking about premature optimization).


Warner, John, Robert, others: either I don't understand Poul's
arguments why the fd approach is better / why all these things which
can be done in userland need to be done in the kernel / ..., or he
doesn't understand my arguments why the fd approach is not better /
why those things he proposes to do in the kernel can be done in
userland. Could someone please help out and explain to either him or
me the parts we fail to understand? That would be very nice, as it
looks like we're running around in circles ATM.

Bye,
Alexander.

-- 
Only a mediocre person is always at their best.
http://www.Leidinger.net  Alexander @ Leidinger.net: PGP ID = B0063FE7
http://www.FreeBSD.org     netchild @ FreeBSD.org  : PGP ID = 72077137

