cvs commit: src/etc Makefile sensorsd.conf src/etc/defaults rc.conf src/etc/rc.d Makefile sensorsd src/lib/libc/gen sysctl.3 src/sbin/sysctl sysctl.8 sysctl.c src/share/man/man5 rc.conf.5 src/share/man/man9 Makefile sensor_attach.9 src/sys/conf f

John Baldwin jhb at freebsd.org
Wed Oct 17 06:42:50 PDT 2007


On Tuesday 16 October 2007 06:14:34 pm Constantine A. Murenin wrote:
> On 16/10/2007 17:01, John Baldwin wrote:
> 
> > On Monday 15 October 2007 10:57:48 pm Constantine A. Murenin wrote:
> > 
> >>On 15/10/2007, John Baldwin <jhb at freebsd.org> wrote:
> >>
> >>>On Monday 15 October 2007 09:43:21 am Alexander Leidinger wrote:
> >>>
> >>>>Quoting Scott Long <scottl at samsco.org> (from Mon, 15 Oct 2007
> >>>>01:47:59 -0600):
> >>>
> >>>>>Alexander Leidinger wrote:
> >>>>>
> >>>>>>Quoting Poul-Henning Kamp <phk at phk.freebsd.dk> (from Sun, 14 Oct
> >>>>>>2007 17:54:21 +0000):
> >>>>
> >>>>>>>listen to the various mumblings about putting RAID-controller status
> >>>>>>>under sensors framework.
> >>>>>>
> >>>>>>What's wrong with this? Currently each RAID driver has to come up
> >>>>>>with his own way of displaying the RAID status. It's like saying
> >>>>>>that each network driver has to implement/display the stuff you can
> >>>>>> see with ifconfig in its own way, instead of using the proper
> >>>>>>network driver interface for this.
> >>>>>>
> >>>>>
> >>>>>For the love of God, please don't use RAID as an example to support your
> >>>>>argument for the sensord framework.  Representing RAID state is several
> >>>>>orders of magnitude more involved than representing network state.
> >>>>>There are also landmines in the OpenBSD bits of RAID support that are
> >>>>>best left out of FreeBSD, unless you like alienating vendors and risking
> >>>>>legal action.  Leave it alone.  Please.  I don't care what you do with
> >>>>>lmsensors or cpu power settings or whatever.  Leave RAID out of it.
> >>>>>lmsensors or cpu power settings or whatever.  Leave RAID out of it.
> >>>>
> >>>>Talking about RAID status is not talking about alienating vendors. I
> >>>>don't talk about alienating vendors and I don't intend to. You may
> >>>>not be able to display a full-blown RAID status with the sensors
> >>>>framework, but it allows for a generic "works/works not" or
> >>>>"OK/degraded" status display in drivers we have the source for. This
> >>>>is enough for status monitoring (e.g., nagios).
> >>>
> >>>As I mentioned in the thread on arch@ where people brought up objections that
> >>>were apparently completely ignored, this is far from useful for RAID
> >>>monitoring.  For example, if my RAID is down, which disk do I need to
> >>>replace?  Again, all this was covered earlier and (apparently) ignored.
> >>>Also, what strikes me as odd is that I didn't see this patch posted again for
> >>>review this time around before it was committed.
> >>
> >>This was addressed back in July. You'd use bioctl to see which
> >>exact disc needs to be replaced. Sensorsd is intended for an initial
> >>alert that something is wrong.
> > 
> > 
> > In July you actually said you weren't sure about bioctl(8). :)  But also, this 
> > model really isn't sufficient, since it doesn't handle things like drives 
> > going away, etc.  You really need to maintain a decent amount of state to 
> > keep track of all that, and this is far easier done in userland rather than in 
> > the kernel.  However, you can ignore real-world experience if you choose.
> > 
> > Basically, with so little data in hw.sensors, if I had to write a RAID 
> > monitoring daemon I would just not use hw.sensors, since it's easier for me to 
> > figure out the simple status myself based on the other state I already have 
> > to track (unless you write an event-driven daemon based on messages posted by 
> > the firmware, in which case, again, you wouldn't use hw.sensors either).
> 
> There is no other daemon that you'd need; you'd simply use sensorsd for 
> this.  You could write a script that would be executed by sensorsd if a 
> certain logical disc drive sensor changes state, and then this script 
> would call the bio framework and give you additional details on why the 
> state was changed.
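The flow described above — sensorsd notices a logical-drive sensor leaving the OK state and runs a helper, which then drills down via bioctl(8) for the detail — could be sketched roughly as below. This is an illustration only: the status names, the helper's calling convention, and the `bioctl mfi0` invocation (including the controller device name) are all assumptions, not documented sensorsd behavior.

```python
# Hedged sketch of a sensorsd-triggered helper: on a non-OK status,
# shell out to bioctl(8) for per-disc detail; on OK, do nothing.
# Status strings and the bioctl target device are illustrative.
import subprocess

OK_STATES = {"OK"}

def on_sensor_change(sensor, new_status, run=subprocess.run):
    """Invoked when a logical-drive sensor changes state."""
    if new_status in OK_STATES:
        return None  # back to normal, nothing further to report
    # Drill down: ask the bio framework which physical disc failed.
    # "mfi0" is a hypothetical controller device name.
    return run(["bioctl", "mfi0"], capture_output=True, text=True)
```

The `run` parameter is only there so the decision logic can be exercised without a real controller present.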

That's actually not quite good enough, as, for example, I want to keep yelling
about a busted volume on a periodic basis until it's fixed.  Also, having a volume
change state doesn't tell me if a drive was pulled.  On at least one RAID
controller firmware I am familiar with, the only way you can figure this out is
to keep track of which drives are currently present with a generation count and
use that to determine when a drive goes away.  Even my monitoring daemon for
ata-raid has to do this, since the ata(4) driver just detaches and removes a drive
when it fails, and you have no way to figure out which drive died because the
kernel thinks that drive no longer exists.
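The generation-count technique described above amounts to diffing the set of drives seen on each poll against the previous poll, so the monitor itself can name the drive that vanished even after the kernel has forgotten it. A minimal sketch (the `da0`-style drive names are purely illustrative):

```python
class DriveMonitor:
    """Track drives across polls so the monitor can name the one that
    disappeared, even after the kernel has detached it."""

    def __init__(self):
        self.generation = 0   # bumped on every poll
        self.present = set()  # drives seen in the previous generation

    def poll(self, drives):
        """Record a new generation; return drives lost since the last one."""
        self.generation += 1
        lost = self.present - set(drives)
        self.present = set(drives)
        return lost
```

For example, a monitor that saw `da0 da1 da2` on one poll and only `da0 da1` on the next reports `da2` as the failed drive — information no longer recoverable from the kernel's device list alone.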

-- 
John Baldwin

