cvs commit: src/etc Makefile sensorsd.conf src/etc/defaults rc.conf src/etc/rc.d Makefile sensorsd src/lib/libc/gen sysctl.3 src/sbin/sysctl sysctl.8 sysctl.c src/share/man/man5 rc.conf.5 src/share/man/man9 Makefile sensor_attach.9 src/sys/conf f

Wed Oct 17 08:05:03 PDT 2007

Alexander Leidinger wrote:
> Quoting John Baldwin <jhb at freebsd.org> (from Wed, 17 Oct 2007 09:07:06 -0400):
> 
>> On Tuesday 16 October 2007 06:14:34 pm Constantine A. Murenin wrote:
>>> On 16/10/2007 17:01, John Baldwin wrote:
> 
>>> > Basically, by having so little data in hw.sensors if I had to write a RAID
>>> > monitoring daemon I would just not use hw.sensors since it's easier for me to
>>> > figure out the simple status myself based on the other state I already have
>>> > to track (unless you write an event-driven daemon based on messages posted by
>>> > the firmware in which case again you wouldn't use hw.sensors for that either).
>>>
>>> There is no other daemon that you'd need, you'd simply use sensorsd for
>>> this.  You could write a script that would be executed by sensorsd if a
>>> certain logical disc drive sensor changes state, and then this script
>>> would call the bio framework and give you additional details on why the
>>> state was changed.
>>
>> That's actually not quite good enough as, for example, I want to keep yelling
>> about a busted volume on a periodic basis until it's fixed.  Also, having a volume
>> change state doesn't tell me if a drive was pulled.  On at least one RAID
>> controller firmware I am familiar with, the only way you can figure this out is
>> to keep track of which drives are currently present with a generation count and
>> use that to determine when a drive goes away.  Even my monitoring daemon for
>> ata-raid has to do this since the ata(4) driver just detaches and removes a drive
>> when it fails and you have no way to figure out which drive died as the kernel
>> thinks that drive no longer exists.
> 
> Note, talking about interaction with bio or similar is not productive 
> ATM. On Sunday I had a discussion with scottl and he identified some 
> things with bio which don't make it a good choice for FreeBSD. 
> Unfortunately I haven't had time to take it off the ideas list so far. 
> Scott also agreed to come up with a description for a similar framework 
> that is usable with our RAID drivers.
> 

John has the most recent experience of anyone with writing RAID
monitoring and control tools, and he's brought up some very good points
about some of the specific technical challenges.  A simple sysctl
tree like hw.sensors is stateless, and that doesn't cut it for an
environment where devices can come and go.  More intelligence and state
are needed, and for that you need an event component to your framework.
I still have no strong opinion on whether FreeBSD-specific APIs like
sysctl and devd are the right mechanisms for this.  Maybe when it comes
to the storage side of monitoring, consolidating all information under
GEOM via /dev/geom.ctl is the right path, or maybe it isn't.  But
ultimately, what works for lmsensors or CPU throttling or arbitrary
1-wire or 3-wire buses might not work for a more complex system like
storage.
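
To make the state problem concrete, here is a rough sketch of the
generation-count bookkeeping John describes above.  It is only an
illustration in Python, not code from any existing daemon: the names
DriveTracker and scan, and the idea of passing in a plain list of drive
names per poll, are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class DriveTracker:
    """Hypothetical helper: remembers which drives the last poll saw."""
    generation: int = 0
    # Maps a drive name to the generation in which it was last seen.
    last_seen: dict = field(default_factory=dict)

    def scan(self, present):
        """Record one poll and return (arrived, departed) drive names."""
        self.generation += 1
        # Drives we have no record of are new arrivals.
        arrived = sorted(d for d in present if d not in self.last_seen)
        # Stamp every drive seen in this poll with the new generation.
        for drive in present:
            self.last_seen[drive] = self.generation
        # Anything still carrying an older stamp was not seen this time.
        departed = sorted(
            d for d, gen in self.last_seen.items() if gen != self.generation
        )
        for drive in departed:
            del self.last_seen[drive]
        return arrived, departed
```

The point is that the daemon, not the kernel, has to remember the
previous poll; a stateless snapshot only says what exists right now, so
a drive that was detached simply stops appearing in it.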

Scott