hpasmcli locks up a DL380G3
Mike Holloway
mikhollo at cisco.com
Tue Feb 28 14:56:40 PST 2006
These are probably of no use for debugging, but here are a couple of
messages, presumably from non-maskable interrupts (NMIs), that were logged
just as two servers rebooted (guess which one of these machines is in
Australia):
Feb 28 13:28:36 host3 kernel: <<2<<>22>2>NN>NMNMIMMII I II SISIASA
SA A 33330000,,, E, EI EISEIAISS AA S fAf ffff
Feb 28 13:28:36 host3 kernel:
Feb 28 13:28:36 host3 kernel: f
Feb 28 13:28:36 host3 kernel: f
Mar 1 06:30:51 host14 kernel: NMI INSMAI NINMSMAII
3II0SS3,AA0 ,E IESIAS A 3ff3f00f,
Mar 1 06:30:51 host14 kernel:
Mar 1 06:30:51 host14 kernel:
Mar 1 06:30:51 host14 kernel: <
Mar 1 06:30:51 host14 kernel: 2><
Mar 1 06:30:51 host14 kernel: 2,> EEIISSAA ffff
Mar 1 06:30:51 host14 kernel:
-mike
On Feb 28, 2006, at 4:38 PM, Mike Holloway wrote:
> >> Hi!
> >>
> >> Sorry for being late on this one; I found this while browsing around.
> >>
> >> Yes, I have had ONE machine lock up on me once: an older HP ProLiant
> >> DL380G1 UP. Just as you describe, it had been working great for a
> >> couple of weeks, then suddenly froze when I started hpasmcli. I
> >> couldn't even ping the machine.
> >>
> >> This particular machine really is not doing anything, and as I
> >> believe it is still running (I have moved/changed jobs), I could
> >> probably recreate the lockup. I still have access to this machine,
> >> so if anyone wants me to try something, I can do it. The machine is
> >> 600 km away from me now, though, so if a lockup occurs it can take
> >> some time to get it power-cycled.
> >>
> >> Oh, 5.3 or 5.4 as I recall.
> >>
> >> Have you seen any other lockup Greg?
> >
> >I haven't tempted fate that way yet. I always restart hpasmd before
> >using the client on a machine. This seems to avoid the problem.
> >
> >Thanks for responding to my mail, you're the third person to confirm
> >the problem, which given that it locks the machine up hard, is a very
> >serious one.
> >
> >best.
> >greg.
>
>
> Besides the hpasmcli tool hanging just after the banner message,
> I've also experienced reboots caused by hpasmd, and have had to
> remove it completely from my test lab servers. I was able to find
> a scenario that would invariably cause the servers to reboot. I
> had hpasmd running on approximately 20 HP DL380 G4 servers, all
> running the same customized FreeBSD 6.0 release kernel on x86
> (Intel Xeon).
>
> All machines were configured to run hpasmcli -s "show temps;" every
> 5 minutes from cron, inside a Perl wrapper (included below) that
> sets a 45-second alarm and, if hpasmcli hasn't exited by then,
> sends a HUP signal to the wrapper's process group (and so to
> hpasmcli). Within a few hours, a few machines would show the
> hpasmcli tool hanging and only displaying the banner message. Cron
> kept running and killing the hung hpasmcli tool every 5 minutes for
> hours before I noticed. After commenting out the cron job and
> verifying that no hpasmcli processes existed, I could then stop
> hpasmd via the init script, which sends a TERM signal to the
> process followed by a KILL signal a couple of seconds later.
> Without exception, those servers would spontaneously reboot a few
> minutes (2-5) later. On servers where the hpasmcli tool hadn't yet
> hung, I could stop hpasmd with no ill effects to the system.
>
>
> John, are you still working on this very useful tool? I can
> provide access to a DL380 G4 if you need a platform to test on.
>
>
> -mike
>
>
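> #!/usr/bin/perl
> use strict;
> use warnings;
>
> # Run "show temps;" once; give up if hpasmcli hangs for 45 seconds.
> eval {
>     local $SIG{ALRM} = sub {
>         # Ignore HUP in this process, then send HUP to the process
>         # group (negative PID) so the hung hpasmcli is killed.
>         local $SIG{HUP} = 'IGNORE';
>         kill 'HUP', -$$;
>     };
>     alarm 45;
>     system('/usr/sbin/hpasmcli', '-s', 'show temps;');
>     alarm 0;    # hpasmcli exited in time; cancel the alarm
> };
>
> $SIG{HUP} = 'DEFAULT';
>
> exit 0;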
> _______________________________________________
> freebsd-proliant at freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-proliant
> To unsubscribe, send any mail to "freebsd-proliant-
> unsubscribe at freebsd.org"
>
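A crontab(5) entry matching the five-minute schedule described in the
quoted message might look like the lines below, for example in root's
crontab. This is only a sketch: the thread does not show the actual
cron entry, and the wrapper path /usr/local/bin/hpasmcli-temps.pl is a
hypothetical name.

```
# Hypothetical wrapper path; runs the hpasmcli wrapper every 5 minutes.
# min  hour  mday  month  wday  command
*/5    *     *     *      *     /usr/local/bin/hpasmcli-temps.pl
```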