"ipmi0: KCS..." whines
John Baldwin
jhb at freebsd.org
Mon Aug 15 17:19:01 UTC 2016
On Friday, August 12, 2016 02:43:40 PM David Wolfskill wrote:
> On Fri, Aug 12, 2016 at 11:54:38AM -0700, John Baldwin wrote:
> > ...
> > So the issue is probably that the BMC controller on your box is sometimes
> > slow in responding. The completion code is the third byte of the reply
> > we wait to read after sending a request to the BMC via KCS. However, the
> > first two bytes just echo back the request ID and command we asked for, so
> > it may be that the BMC echoes those back right away without waiting for
> > whatever work it needs to do to handle the request to complete, but doesn't
> > send the completion code (the status of the request) until the request is
> > fully processed.
> >
> > The driver is complaining that the BMC didn't respond with the completion
> > code before its timeout expired. The default timeout is MAX_TIMEOUT in
> > sys/dev/ipmi/ipmivars.h which corresponds to 6 seconds. It may be that
> > occasionally some "background" task runs in the BMC OS that delays its
> > responses to commands. It could also be that whatever work the BMC has to do
> > to read this specific value is actually timing out or having issues in the
> > hardware, etc.
>
> I could easily modify the stress-test loop to run "date" after each
> "ipmitool" invocation. (Pity we don't seem to have a sub-second format
> in strftime().)
>
> So... I tried the above (interspersing "date" commands while running
> "ipmitool dcmi power reading" in a loop within script(1)). I did not
> get a whine at 32 repetitions; I got one at 64.
>
> The total elapsed time was no more than 3 seconds (last timestamp -
> first timestamp difference was 2 seconds).
Hmm, you might check what 'MAX_TIMEOUT' is set to in sys/dev/ipmi/ipmivars.h in
your tree. It might also be worth wrapping it in parentheses, since in HEAD it
is just a bare '6 * hz'. The code that waits for IBF doesn't look like it would
break without the parentheses, though.
It was bumped from 3 seconds to 6 seconds back in 10-current in r253812, but
perhaps your box has 3 seconds instead of 6?
--
John Baldwin