hpasmcli locks up a DL380G3
Mike Holloway
mikhollo at cisco.com
Tue Feb 28 14:38:36 PST 2006
>> Hi!
>>
>> Sorry for being late on this one, found this browsing around.
>>
>> Yes, I have had ONE machine lock up on me once.
>> An older HP ProLiant DL380G1 UP. Just as you describe, it had been
>> working great for a couple of weeks, then suddenly when starting
>> hpasmcli it froze. Couldn't even ping the machine.
>>
>> This particular machine really is not doing anything, and I believe
>> it is still running (I have moved/changed jobs), so I could probably
>> recreate the lockup.
>> I still have access to this machine, so if anyone wants me to try
>> something, I can do it.
>> The machine is 600km away from me now, so if a lockup occurs it can
>> take some time to get it power-cycled, though.
>>
>> Oh, 5.3 or 5.4 as I recall.
>>
>> Have you seen any other lockups, Greg?
>
>I haven't tempted fate that way yet. I always restart the hpasmd
>before using the client on a machine. This seems to avoid the problem.
>
>Thanks for responding to my mail, you're the third person to confirm
>the problem, which given that it locks the machine up hard, is a very
>serious one.
>
>best.
>greg.

Besides the hpasmcli tool hanging just after the banner message, I've
also experienced reboots caused by hpasmd, and have had to remove it
completely from my test lab servers. I was able to find a scenario
that would invariably cause the servers to reboot.

I had hpasmd running on approximately 20 HP DL380 G4 servers, all
running the same customized FreeBSD 6.0 release kernel on x86 (Intel
Xeon). All machines were configured to run hpasmcli -s "show temps;"
from cron every 5 minutes, inside a perl wrapper around hpasmcli
(included below). If hpasmcli didn't exit within 45 seconds, the
wrapper's ALRM handler would send a HUP signal to the wrapper's
process group, killing hpasmcli.

Within a few hours, a few machines would show the hpasmcli tool
hanging and only displaying the banner message. Cron would continue
to run and kill the hung hpasmcli tool every 5 minutes, for some
hours, before I noticed.

After commenting out the cron job and verifying that no hpasmcli
processes existed, I could then stop hpasmd via the init script, which
sends a TERM signal to the process followed by a KILL signal a couple
of seconds later. Without exception, those servers would spontaneously
reboot a few minutes (2-5) later. On servers where the hpasmcli tool
hadn't yet hung, I could stop hpasmd with no ill effects to the
system.
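
The init script itself is not reproduced in this message. For readers
unfamiliar with the TERM-then-KILL stop sequence described above, it
can be sketched roughly as follows. The function name stop_with_grace
and the two-second default grace period are assumptions made for
illustration; they are not taken from the actual hpasmd init script.

```shell
#!/bin/sh
# Hypothetical helper (not from the hpasmd init script):
#   stop_with_grace PID [GRACE_SECONDS]
# Sends TERM, waits up to GRACE_SECONDS for the process to go away,
# then sends KILL if it is still there.
stop_with_grace() {
    pid=$1
    grace=${2:-2}

    # Ask the process to shut down; if the signal cannot be
    # delivered (e.g. the PID no longer exists), there is nothing to do.
    kill -TERM "$pid" 2>/dev/null || return 0

    # Poll once per second. "kill -0" delivers no signal; it only
    # tests whether the PID can still be signalled.
    while [ "$grace" -gt 0 ]; do
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        grace=$((grace - 1))
    done

    # Grace period expired: force termination.
    kill -KILL "$pid" 2>/dev/null
    return 0
}
```

Called as stop_with_grace "$pid" 2, this mirrors the "TERM, then KILL
a couple of seconds later" behaviour. Note that a child which has
exited but has not yet been reaped by its parent still passes the
kill -0 check, so the function may wait out the full grace period in
that case.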
John, are you still working on this very useful tool? I can provide
access to a DL380 G4 if you need a platform to test on.
-mike
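
#!/usr/bin/perl
# Run hpasmcli with a 45-second timeout.
eval {
    local $SIG{ALRM} = sub {
        # Ignore HUP in this process, then send HUP (signal 1) to the
        # process group whose ID is our PID, so that a hung hpasmcli
        # is terminated while the wrapper itself survives.
        local $SIG{HUP} = 'IGNORE';
        kill 1, (-$$);
    };
    alarm 45;
    system("/usr/sbin/hpasmcli -s \"show temps;\"");
    alarm 0;    # hpasmcli returned in time; cancel the pending alarm
};
$SIG{HUP} = 'DEFAULT';
exit 0;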