[stable-ish 9] Dell R815 ipmi(4) attach failure
John Baldwin
jhb at freebsd.org
Tue Apr 3 12:55:45 UTC 2012
On Monday, April 02, 2012 7:27:13 pm Doug Ambrisko wrote:
> Doug Ambrisko writes:
> | John Baldwin writes:
> | | On Saturday, March 31, 2012 3:25:48 pm Doug Ambrisko wrote:
> | | > Sean Bruno writes:
> | | > | Noting a failure to attach to the onboard IPMI controller with this Dell
> | | > | R815. Not sure what to start poking at and thought I'd throw this over
> | | > | here for comment.
> | | > |
> | | > | -bash-4.2$ dmesg |grep ipmi
> | | > | ipmi0: KCS mode found at io 0xca8 on acpi
> | | > | ipmi1: <IPMI System Interface> on isa0
> | | > | device_attach: ipmi1 attach returned 16
> | | > | ipmi1: <IPMI System Interface> on isa0
> | | > | device_attach: ipmi1 attach returned 16
> | | > | ipmi0: Timed out waiting for GET_DEVICE_ID
> | | >
> | | > I've run into this recently. A quick hack to fix it is:
> | | >
> | | > Index: ipmi.c
> | | > ===================================================================
> | | > RCS file: /cvs/src/sys/dev/ipmi/ipmi.c,v
> | | > retrieving revision 1.14
> | | > diff -u -p -r1.14 ipmi.c
> | | > --- ipmi.c 14 Apr 2011 07:14:22 -0000 1.14
> | | > +++ ipmi.c 31 Mar 2012 19:18:35 -0000
> | | > @@ -695,7 +695,6 @@ ipmi_startup(void *arg)
> | | > if (error == EWOULDBLOCK) {
> | | > device_printf(dev, "Timed out waiting for GET_DEVICE_ID\n");
> | | > ipmi_free_request(req);
> | | > - return;
> | | > } else if (error) {
> | | > device_printf(dev, "Failed GET_DEVICE_ID: %d\n", error);
> | | > ipmi_free_request(req);
> | | >
> | | > The issue is that the wakeup doesn't actually wake up the msleep
> | | > in ipmi_submit_driver_request. The error being reported is that
> | | > the msleep timed out. This doesn't seem to be a critical problem
> | | > since things seemed to work after this. I saw this on 9.X.
> | | > Haven't seen it on 8.2. Not sure about -current.
> | | >
> | | > It doesn't happen on all machines.
> | |
> | | Hmm, are you seeing the KCS thread manage the request but the wakeup() is
> | | lost?
> |
> | It was a couple of weeks ago that I played with it. I put printf's
> | around the msleep and wakeup. I saw wakeup called, but the sleep
> | did not get it. I can try the test again later today. Right now my main
> | work machine is recovering from a power outage. This was with 9.0
> | when I first saw it. This issue seems to only happen at boot time.
> | If I kldload the module after the system is booted then it seems to work
> | okay. The KCS part was working fine and got the data okay from the
> | request. I haven't seen or heard any issues with 8.2.
>
> With -current I patched ipmi.c with:
> Index: ipmi.c
> ===================================================================
> --- ipmi.c (revision 233806)
> +++ ipmi.c (working copy)
> @@ -523,7 +523,11 @@
> * waiter that we awaken.
> */
> if (req->ir_owner == NULL)
> +{
> +device_printf(sc->ipmi_dev, "DEBUG %s %d before wakeup %d\n",__FUNCTION__,__LINE__,ticks);
> wakeup(req);
> +device_printf(sc->ipmi_dev, "DEBUG %s %d after wakeup %d\n",__FUNCTION__,__LINE__,ticks);
> +}
> else {
> dev = req->ir_owner;
> TAILQ_INSERT_TAIL(&dev->ipmi_completed_requests, req, ir_link);
> @@ -543,7 +547,11 @@
> IPMI_LOCK(sc);
> error = sc->ipmi_enqueue_request(sc, req);
> if (error == 0)
> +{
> +device_printf(sc->ipmi_dev, "DEBUG %s %d before msleep %d\n",__FUNCTION__,__LINE__,ticks);
> error = msleep(req, &sc->ipmi_lock, 0, "ipmireq", timo);
> +device_printf(sc->ipmi_dev, "DEBUG %s %d after msleep %d\n",__FUNCTION__,__LINE__,ticks);
> +}
> if (error == 0)
> error = req->ir_error;
> IPMI_UNLOCK(sc);
> @@ -695,8 +703,11 @@
> error = ipmi_submit_driver_request(sc, req, MAX_TIMEOUT);
> if (error == EWOULDBLOCK) {
> device_printf(dev, "Timed out waiting for GET_DEVICE_ID\n");
> + printf("DJA\n");
> +/*
> ipmi_free_request(req);
> return;
> +*/
> } else if (error) {
> device_printf(dev, "Failed GET_DEVICE_ID: %d\n", error);
> ipmi_free_request(req);
>
> and get
> # dmesg | grep ipmi
> ipmi0: KCS mode found at io 0xca8 on acpi
> ipmi1: <IPMI System Interface> on isa0
> device_attach: ipmi1 attach returned 16
> ipmi1: <IPMI System Interface> on isa0
> device_attach: ipmi1 attach returned 16
> ipmi0: DEBUG ipmi_submit_driver_request 551 before msleep 2
> ipmi0: DEBUG ipmi_complete_request 527 before wakeup 6201
> ipmi0: DEBUG ipmi_complete_request 529 after wakeup 6263
> ipmi0: DEBUG ipmi_submit_driver_request 553 after msleep 6323
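To compare the tick stamps in the DEBUG lines above, the deltas have to be divided by the kernel's clock rate. A small sketch, assuming exactly the line format printed by the debug patch; struct debug_stamp, parse_stamp() and stamp_delta_ms() are hypothetical helpers, and hz must be the value of kern.hz on the machine being tested rather than an assumed constant:

```c
#include <stdio.h>

/* One DEBUG line from the temporary printf patch, e.g.
   "ipmi0: DEBUG ipmi_complete_request 527 before wakeup 6201". */
struct debug_stamp {
    char func[64];   /* __FUNCTION__ */
    int  line;       /* __LINE__ */
    char when[16];   /* "before" or "after" */
    char what[16];   /* "msleep" or "wakeup" */
    long ticks;      /* value of the kernel's ticks counter */
};

/* Parse one dmesg line; returns 1 on success, 0 if it is not a DEBUG line. */
static int parse_stamp(const char *text, struct debug_stamp *s)
{
    /* %*s skips the "ipmi0:" prefix added by device_printf(). */
    return sscanf(text, "%*s DEBUG %63s %d %15s %15s %ld",
                  s->func, &s->line, s->when, s->what, &s->ticks) == 5;
}

/* Milliseconds between two stamps; hz is the machine's kern.hz (assumption). */
static long stamp_delta_ms(const struct debug_stamp *a,
                           const struct debug_stamp *b, long hz)
{
    return (b->ticks - a->ticks) * 1000 / hz;
}
```

For example, parsing the "before msleep" and "after msleep" lines and passing the machine's kern.hz gives the time the submitter spent inside msleep() in milliseconds.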
Actually, can you compile with:
options KTR
options KTR_COMPILE=KTR_SCHED
options KTR_MASK=KTR_SCHED
and then add a temporary hack to ipmi.c to set ktr_mask to 0 after
ipmi_submit_driver_request() returns in ipmi_startup()? You can
then use 'ktrdump -ct' after boot to capture a log of what the scheduler
did, including whether it timed out the sleep. I think this would be
useful for figuring out what went wrong. It does seem that it timed
out after 3 seconds.
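Clearing ktr_mask matters because KTR records into a fixed-size circular buffer, so events logged later in boot would otherwise overwrite the window around the GET_DEVICE_ID request. The following is a hypothetical userland model, not the kernel's KTR code (trace_buf, trace(), trace_oldest() and TRACE_ENTRIES are made-up names), showing that setting the mask to 0 freezes what the buffer holds:

```c
#include <stddef.h>

#define TRACE_ENTRIES 8          /* deliberately tiny for illustration */

/* Hypothetical model of a KTR-style circular trace buffer. */
struct trace_buf {
    const char *entries[TRACE_ENTRIES];
    size_t      next;            /* total events recorded so far */
    int         mask;            /* event classes currently enabled */
};

/* Record an event only if its class is enabled; overwrite the oldest slot. */
static void trace(struct trace_buf *tb, int cls, const char *msg)
{
    if ((tb->mask & cls) == 0)
        return;                  /* mask cleared: nothing is overwritten */
    tb->entries[tb->next % TRACE_ENTRIES] = msg;
    tb->next++;
}

/* Oldest surviving entry, i.e. what a dump of the buffer would show first. */
static const char *trace_oldest(const struct trace_buf *tb)
{
    if (tb->next == 0)
        return NULL;
    if (tb->next < TRACE_ENTRIES)
        return tb->entries[0];
    return tb->entries[tb->next % TRACE_ENTRIES];
}
```

In this model, clearing mask right after the interesting events keeps them in the buffer no matter how much tracing is attempted afterwards; leaving it set lets later events push them out.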
Also, it isn't clear whether the IPMI worker thread was perhaps
stalled behind another thread during boot. The KTR traces would show
us that if so.
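The stalled-worker case can be modeled in userland. This is an analogy under stated assumptions, not the driver's code: POSIX threads stand in for msleep(9)/wakeup(9), and struct request, worker(), wait_for_request() and run_request() are hypothetical names. The waiter sleeps with a deadline and re-checks a completion flag under the lock, so a wakeup cannot be missed; if the worker only finishes after the deadline, the wait still fails with ETIMEDOUT (much as msleep() returns EWOULDBLOCK) and the completion arrives afterwards.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for a driver request (not struct ipmi_request). */
struct request {
    pthread_mutex_t lock;    /* plays the role of sc->ipmi_lock */
    pthread_cond_t  cv;      /* plays the role of the wait channel "req" */
    bool            done;    /* set by the worker when the request completes */
    int             error;   /* result, like req->ir_error */
    long            work_ms; /* simulated delay before the worker finishes */
};

static void sleep_ms(long ms)
{
    struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

/* Worker: complete the request, then wake the waiter with the lock held. */
static void *worker(void *arg)
{
    struct request *req = arg;
    sleep_ms(req->work_ms);          /* a stalled worker is just a long delay */
    pthread_mutex_lock(&req->lock);
    req->done = true;
    req->error = 0;
    pthread_cond_signal(&req->cv);   /* analogous to wakeup(req) */
    pthread_mutex_unlock(&req->lock);
    return NULL;
}

/* Waiter: block until the request is done or timeout_ms elapses. */
static int wait_for_request(struct request *req, long timeout_ms)
{
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    int error = 0;
    pthread_mutex_lock(&req->lock);
    /* Testing "done" under the lock: a wakeup sent before we wait is not
       lost, and spurious wakeups are ignored. */
    while (!req->done && error == 0)
        error = pthread_cond_timedwait(&req->cv, &req->lock, &deadline);
    if (req->done)
        error = req->error;          /* completed, even if the timer also fired */
    pthread_mutex_unlock(&req->lock);
    return error;                    /* ETIMEDOUT plays the role of EWOULDBLOCK */
}

/* Start a worker for one request and wait for it; returns 0 or ETIMEDOUT. */
static int run_request(long work_ms, long timeout_ms)
{
    struct request req = {
        .lock = PTHREAD_MUTEX_INITIALIZER,
        .cv = PTHREAD_COND_INITIALIZER,
        .done = false,
        .error = -1,
        .work_ms = work_ms,
    };
    pthread_t tid;
    if (pthread_create(&tid, NULL, worker, &req) != 0)
        return EAGAIN;
    int error = wait_for_request(&req, timeout_ms);
    pthread_join(tid, NULL);         /* req lives in this stack frame */
    return error;
}
```

A prompt worker, e.g. run_request(10, 2000), yields 0; a worker delayed past the deadline, e.g. run_request(300, 20), yields ETIMEDOUT even though its wakeup is eventually delivered.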
I don't think the ipmi1 probe can cause the problem (it bails out right
away and shouldn't be touching any hardware state).
--
John Baldwin
More information about the freebsd-stable mailing list