Interrupt (SCSI?) hang on 4.x

Jeremy Chadwick koitsu at FreeBSD.org
Tue Jan 2 10:04:40 PST 2007


On Tue, Jan 02, 2007 at 04:39:51PM +0000, Gavin Atkinson wrote:
> On Tue, 2007-01-02 at 07:36 -0800, Jeremy Chadwick wrote:
> > # vmstat -i
> > ata0 irq14                      6          0
> > fxp0 irq10                  14874         28
> > mux irq11                   65028        125
> > fdc0 irq6                       1          0
> > sio0 irq4                     948          1
> > clk irq0                   516187        998
> > rtc irq8                    66071        127
> > Total                      663115       1282
> 
> Do any of these numbers continue to increase after the hang?  You may
> find that if you are already logged in over the serial port before the
> hang and have run vmstat recently, it'll still be runnable due to it
> being cached.

When this problem is happening, typing "root" at the login: prompt
(via serial console) and hitting enter never produces a Password:
prompt.  This is likely because getpwent(3) and friends attempt to
read passwd/master.passwd from the disk, and that read obviously
hangs because of the SCSI controller.

Therefore, one cannot log in and run any commands.

> If the serial port is dead, you will probably still find you can get
> output from the serial port, so start "date; vmstat -i" in a loop over
> the serial port before it hangs, and watch the output once it wedges.

Once the machine is hung as described, this won't work, since running
shell commands (date, vmstat, even spawning sh itself) involves disk
I/O.  If date and vmstat could be cached in memory somewhere, this
might work, but I don't know how one would do that.  (A memory
filesystem could work, but pretty much all of / would have to be
there for this to work...)

The best I could do would be to have a cronjob or a process running
in a screen session which does date && vmstat -i over and over to a
log file, and examine that log once the machine has hung as described.
This wouldn't tell us if the numbers were increasing/fluctuating
*after* the hang, though.  :-(
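
A minimal sketch of such a logging loop, assuming a plain Bourne
shell.  The log path, the interval, and the names LOG, INTERVAL,
STATS_CMD, RUN_LOOP and log_snapshot are all made up for
illustration.  The sync after each sample is my own addition; it only
improves the odds that the last sample before the hang reaches the
disk, it does not guarantee it.

```shell
#!/bin/sh
# Periodically append a timestamp and the interrupt counters to a log.
# All names and defaults below are illustrative assumptions.
LOG="${LOG:-/var/log/vmstat-i.log}"    # hypothetical log location
INTERVAL="${INTERVAL:-5}"              # seconds between samples
STATS_CMD="${STATS_CMD:-vmstat -i}"    # command whose output is logged

log_snapshot() {
    # Write one sample: a marker line with the date, then the counters.
    {
        echo "=== $(date)"
        $STATS_CMD
    } >> "$LOG" 2>&1
    # Ask the kernel to flush dirty buffers so the sample is more
    # likely to be on disk if the controller wedges soon afterwards.
    sync
}

# Loop only when explicitly requested, so the function can also be
# sourced and tried by hand first.
if [ "${RUN_LOOP:-0}" = 1 ]; then
    while :; do
        log_snapshot
        sleep "$INTERVAL"
    done
fi
```

Run it with RUN_LOOP=1 from a screen session or from rc.local.  After
a hang and reboot, the last "===" block in the log shows the counters
shortly before the machine wedged, though still nothing from after
the hang.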

-- 
| Jeremy Chadwick                                 jdc at parodius.com |
| Parodius Networking                        http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, USA |
| Making life hard for others since 1977.               PGP: 4BD6C0CB |



More information about the freebsd-stable mailing list