Problem with IPMI KCS driver

Doug Ambrisko ambrisko at ambrisko.com
Thu Oct 18 21:44:11 UTC 2012


On Thu, Oct 18, 2012 at 01:44:59PM +0400, Anton Yuzhaninov wrote:
| On 28.09.2012 16:48, John Baldwin wrote:
| >>kcs_wait_for_obf() at kcs_wait_for_obf+0xb6 point to
| >>>  /usr/src/sys/dev/ipmi/ipmi_kcs.c:94
| >>>
| >>>     91                 while (ticks - start < MAX_TIMEOUT &&
| >>>     92                     !(status & KCS_STATUS_OBF)) {
| >>>     93                         DELAY(100);
| >>>     94                         status = INB(sc, KCS_CTL_STS);
| >>>     95                 }
| >Hummm.  I'm a bit out of ideas then.  Even the volatile change is a bug
| >that could have been confirmed (to see if volatile was preventing the
| >compiler from caching the value of 'ticks') by examining the assembly.
| >
| >Well, maybe this.  This just avoids using 'ticks' altogether and depends on
| >DELAY(100) doing what it says:
| 
| The new patch also doesn't solve my problem.
| 
| My guess was wrong. The loop in kcs_wait_for_obf() is not endless, at least
| with the last patch.
| The whole function is called in some outer loop, but because the loop in
| kcs_wait_for_obf() takes so much CPU time, the backtrace always points into
| kcs_wait_for_obf().

Yep, the IPMI local interfaces are polled, so they use a lot of CPU;
the driver is pretty much always going to be checking "are you done yet"
once a command is submitted.  We have local patches here that change
the DELAY into a tsleep when the system is running.  That has the
drawback of making it a lot slower, but it uses far less CPU, so for us
it is a good trade-off.  One reason to put it into a loop is
so things happen in order and are not interrupted.  I guess a different
approach might be to take a "big" lock around the entire submit and
get response code fragment.  Then the cost would be charged to the
application thread running in the kernel.
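
To make the busy-wait versus sleep trade-off concrete, here is a minimal
user-space sketch, not the driver code and not the local patches mentioned
above.  The fake_kcs structure, fake_read_status(), pause_spin(),
pause_sleep(), MAX_POLLS and POLL_STEP_US are all hypothetical names made up
for illustration; nanosleep() stands in for tsleep and a clock-spinning loop
stands in for DELAY.  The loop is bounded by an iteration count rather than
by 'ticks', in the spirit of the patch discussed earlier in the thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#define KCS_STATUS_OBF 0x01  /* "output buffer full" bit, as in the quoted loop */
#define POLL_STEP_US   100   /* pause between status reads (assumed value) */
#define MAX_POLLS      1000  /* hypothetical upper bound on status re-reads */

/* Hypothetical stand-in for the hardware: OBF is set from the Nth read on. */
struct fake_kcs {
	int reads_until_ready;
	int reads;
};

static uint8_t
fake_read_status(struct fake_kcs *k)
{
	k->reads++;
	return (k->reads >= k->reads_until_ready ? KCS_STATUS_OBF : 0);
}

/* Busy-wait pause: burns CPU for roughly 'us' microseconds (DELAY-like). */
static void
pause_spin(unsigned us)
{
	struct timespec start, now;
	long elapsed_us;

	clock_gettime(CLOCK_MONOTONIC, &start);
	do {
		clock_gettime(CLOCK_MONOTONIC, &now);
		elapsed_us = (long)(now.tv_sec - start.tv_sec) * 1000000L +
		    (now.tv_nsec - start.tv_nsec) / 1000L;
	} while (elapsed_us < (long)us);
}

/* Sleeping pause: gives the CPU away; usually slower per step (tsleep-like). */
static void
pause_sleep(unsigned us)
{
	struct timespec ts;

	ts.tv_sec = us / 1000000;
	ts.tv_nsec = (long)(us % 1000000) * 1000L;
	nanosleep(&ts, NULL);
}

/*
 * Poll until OBF is set or MAX_POLLS re-reads have been made.
 * Returns true if OBF was seen; *polls receives the number of re-reads.
 */
static bool
wait_for_obf(struct fake_kcs *k, void (*pause_fn)(unsigned), int *polls)
{
	uint8_t status;
	int n = 0;

	assert(pause_fn != NULL);
	status = fake_read_status(k);
	while (n < MAX_POLLS && !(status & KCS_STATUS_OBF)) {
		pause_fn(POLL_STEP_US);
		status = fake_read_status(k);
		n++;
	}
	if (polls != NULL)
		*polls = n;
	return ((status & KCS_STATUS_OBF) != 0);
}
```

Calling wait_for_obf() with pause_spin or pause_sleep performs the same
number of status reads; only the CPU time spent between reads differs, which
is the trade-off described above.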

We also have local changes to allow it to run in polled mode without
the kernel thread when we are dumping a kernel backtrace into the
IPMI system event log.  That's nice when the kernel core dump hasn't
worked on a remote machine but we can still see the backtrace in the SEL.
 
| This problem needs further investigation.

It might be good to instrument the code in ipmi.c where it is
sending a command and then getting status.  If that actually
looks okay, then maybe some application is doing something bad.
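
One possible shape for such instrumentation is sketched below in user-space
C.  It is only an illustration of the idea (count requests and track total
and worst-case latency around a submit-and-wait call); req_stats,
timed_request() and fake_request() are hypothetical names and do not exist
in ipmi.c.  A real kernel version would use the kernel's own time and
logging facilities instead of clock_gettime().

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical per-driver counters for request latency. */
struct req_stats {
	uint64_t count;     /* number of requests observed */
	uint64_t total_ns;  /* summed latency of all requests */
	uint64_t max_ns;    /* slowest single request */
};

/* Monotonic clock reading in nanoseconds. */
static uint64_t
now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ((uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec);
}

/* Fold one request's start/end timestamps into the counters. */
static void
req_stats_record(struct req_stats *s, uint64_t start_ns, uint64_t end_ns)
{
	uint64_t d = end_ns - start_ns;

	s->count++;
	s->total_ns += d;
	if (d > s->max_ns)
		s->max_ns = d;
}

/* Time one call to do_request() and pass its return value through. */
static int
timed_request(struct req_stats *s, int (*do_request)(void *), void *arg)
{
	uint64_t t0 = now_ns();
	int error = do_request(arg);

	req_stats_record(s, t0, now_ns());
	return (error);
}

/* Hypothetical placeholder for "send a command, then get status". */
static int
fake_request(void *arg)
{
	(void)arg;
	return (0);
}
```

If the counters show a plausible request rate and latency, the driver path
is probably fine and an unexpectedly high count would point at whichever
application is issuing the requests.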

Doug A.

