[stable-ish 9] Dell R815 ipmi(4) attach failure

John Baldwin jhb at freebsd.org
Wed Apr 4 12:13:49 UTC 2012


On Tuesday, April 03, 2012 12:37:50 pm Doug Ambrisko wrote:
> John Baldwin writes:
> | On Monday, April 02, 2012 7:27:13 pm Doug Ambrisko wrote:
> | > Doug Ambrisko writes:
> | > | John Baldwin writes:
> | > | | On Saturday, March 31, 2012 3:25:48 pm Doug Ambrisko wrote:
> | > | | > Sean Bruno writes:
> | > | | > | Noting a failure to attach to the onboard IPMI controller with 
this 
> | dell
> | > | | > | R815.  Not sure what to start poking at and thought I'd though 
this 
> | over
> | > | | > | here for comment.
> | > | | > | 
> | > | | > | -bash-4.2$ dmesg |grep ipmi
> | > | | > | ipmi0: KCS mode found at io 0xca8 on acpi
> | > | | > | ipmi1: <IPMI System Interface> on isa0
> | > | | > | device_attach: ipmi1 attach returned 16
> | > | | > | ipmi1: <IPMI System Interface> on isa0
> | > | | > | device_attach: ipmi1 attach returned 16
> | > | | > | ipmi0: Timed out waiting for GET_DEVICE_ID
> | > | | > 
> | > | | > I've run into this recently.  A quick hack to fix it is:
> | > | | > 
> | > | | > Index: ipmi.c
> | > | | > 
===================================================================
> | > | | > RCS file: /cvs/src/sys/dev/ipmi/ipmi.c,v
> | > | | > retrieving revision 1.14
> | > | | > diff -u -p -r1.14 ipmi.c
> | > | | > --- ipmi.c	14 Apr 2011 07:14:22 -0000	1.14
> | > | | > +++ ipmi.c	31 Mar 2012 19:18:35 -0000
> | > | | > @@ -695,7 +695,6 @@ ipmi_startup(void *arg)
> | > | | >  	if (error == EWOULDBLOCK) {
> | > | | >  		device_printf(dev, "Timed out waiting for 
GET_DEVICE_ID\n");
> | > | | >  		ipmi_free_request(req);
> | > | | > -		return;
> | > | | >  	} else if (error) {
> | > | | >  		device_printf(dev, "Failed GET_DEVICE_ID: %d\n", error);
> | > | | >  		ipmi_free_request(req);
> | > | | > 
> | > | | > The issue is that the wakeup doesn't actually wake up the msleep
> | > | | > in ipmi_submit_driver_request.  The error being reported is that
> | > | | > the msleep timed out.  This doesn't seem to be critical problem
> | > | | > since after this things seemed to work work.  I saw this on 9.X.
> | > | | > Haven't seen it on 8.2.  Not sure about -current.
> | > | | > 
> | > | | > It doesn't happen on all machines.
> | > | | 
> | > | | Hmm, are you seeing the KCS thread manage the request but the 
wakeup() 
> | is 
> | > | | lost?
> | > | 
> | > | It was a couple of weeks ago that I played with it.  I put printf's
> | > | around the msleep and wakeup.  I saw the wakeup called but the sleep
> | > | not get it.  I can try the test again later today.  Right now my main
> | > | work machine is recovering from a power outage.  This was with 9.0 
> | > | when I first saw it.  This issue seems to only happen at boot time.
> | > | If I kldload the module after the system is booted then it seems to 
work 
> | > | okay.  The KCS part was working fine and got the data okay from the
> | > | request.  I haven't seen or heard any issues with 8.2.
> | > 
> | > With -current I patched ipmi.c with:
> | > Index: ipmi.c
> | > ===================================================================
> | > --- ipmi.c      (revision 233806)
> | > +++ ipmi.c      (working copy)
> | > @@ -523,7 +523,11 @@
> | >          * waiter that we awaken.
> | >          */
> | >         if (req->ir_owner == NULL)
> | > +{
> | > +device_printf(sc->ipmi_dev, "DEBUG %s %d before wakeup 
> | %d\n",__FUNCTION__,__LINE__,ticks);
> | >                 wakeup(req);
> | > +device_printf(sc->ipmi_dev, "DEBUG %s %d after wakeup 
> | %d\n",__FUNCTION__,__LINE__,ticks);
> | > +}
> | >         else {
> | >                 dev = req->ir_owner;
> | >                 TAILQ_INSERT_TAIL(&dev->ipmi_completed_requests, req, 
> | ir_link);
> | > @@ -543,7 +547,11 @@
> | >         IPMI_LOCK(sc);
> | >         error = sc->ipmi_enqueue_request(sc, req);
> | >         if (error == 0)
> | > +{
> | > +device_printf(sc->ipmi_dev, "DEBUG %s %d before msleep 
> | %d\n",__FUNCTION__,__LINE__,ticks);
> | >                 error = msleep(req, &sc->ipmi_lock, 0, "ipmireq", timo);
> | > +device_printf(sc->ipmi_dev, "DEBUG %s %d after msleep 
> | %d\n",__FUNCTION__,__LINE__,ticks);
> | > +}
> | >         if (error == 0)
> | >                 error = req->ir_error;
> | >         IPMI_UNLOCK(sc);
> | > @@ -695,8 +703,11 @@
> | >         error = ipmi_submit_driver_request(sc, req, MAX_TIMEOUT);
> | >         if (error == EWOULDBLOCK) {
> | >                 device_printf(dev, "Timed out waiting for 
GET_DEVICE_ID\n");
> | > +               printf("DJA\n");
> | > +/*
> | >                 ipmi_free_request(req);
> | >                 return;
> | > +*/
> | >         } else if (error) {
> | >                 device_printf(dev, "Failed GET_DEVICE_ID: %d\n", error);
> | >                 ipmi_free_request(req);
> | > 
> | > and get
> | >   # dmesg | grep ipmi
> | >   ipmi0: KCS mode found at io 0xca8 on acpi
> | >   ipmi1: <IPMI System Interface> on isa0
> | >   device_attach: ipmi1 attach returned 16
> | >   ipmi1: <IPMI System Interface> on isa0
> | >   device_attach: ipmi1 attach returned 16
> | >   ipmi0: DEBUG ipmi_submit_driver_request 551 before msleep 2
> | >   ipmi0: DEBUG ipmi_complete_request 527 before wakeup 6201
> | >   ipmi0: DEBUG ipmi_complete_request 529 after wakeup 6263
> | >   ipmi0: DEBUG ipmi_submit_driver_request 553 after msleep 6323
> | 
> | Actually, can you compile with:
> | 
> | options  	KTR
> | options  	KTR_COMPILE=KTR_SCHED
> | options 	KTR_MASK=KTR_SCHED
> | 
> | and then add a temporary hack to ipmi.c to set ktr_mask to 0 after
> | ipmi_submit_driver_request() returns in ipmi_startup()?  You can
> | then use 'ktrdump -ct' after boot to capture a log of what the scheduler
> | did including if it timed out the sleep, etc.  I think this would be
> | useful for figuring out what went wrong.  It does seem that it timed
> | out after 3 seconds.
> 
> Assuming I didn't mess up, the log should be at:
> 	http://people.freebsd.org/~ambrisko/ipmi_ktr_dump.txt
> again, I using ipmi(4) as module loaded via the loader.

If you use "-ct" then you get a file you can feed into schedgraph.  However, 
just reading the log, it seems that IRQ 20 keeps preempting the KCS worker 
thread preventing it from getting anything done.  Also, there seem to be a lot 
of threads on CPU 0's runqueue waiting for a chance to run (load average of 12 
or 13 the entire time).  You can try just bumping up the max timeout from 3
seconds to higher perhaps.  Not sure why IRQ 20 keeps firing though.  It
might be related to USB, so you could try fiddling with USB options in the
BIOS perhaps, or disabling the USB drivers to see if that fixes IPMI.

-- 
John Baldwin


More information about the freebsd-stable mailing list