sparc64/121539: Interrupt storm booting 7.0-R/sparc64 on ultra5

jpd at dsb.tudelft.nl jpd at dsb.tudelft.nl
Thu Mar 13 00:50:03 UTC 2008


The following reply was made to PR sparc64/121539; it has been noted by GNATS.

From: jpd at dsb.tudelft.nl
To: Marius Strobl <marius at alchemy.franken.de>
Cc: bug-followup at freebsd.org
Subject: Re: sparc64/121539: Interrupt storm booting 7.0-R/sparc64 on ultra5
Date: Thu, 13 Mar 2008 01:13:39 +0100

 On Wed, Mar 12, 2008 at 23:54:45 +0100, Marius Strobl wrote:
 [snip!]
 > Vector 2016 is the ATA controller and the ata(4)/acd(4) apparently
 > has some problems accessing the CD. Could you please check whether
 > the cabling and the drive are ok and functional?
 
 Apologies for the narrative. The answer to your question is in the next
 paragraph and in the paragraphs just before the last interrupt storm. The
 rest is me attempting to be thorough. In short: Yes, overall I think
 they're ok.
 
 
 I just checked, and the cable went 'click' when I pushed on it at the
 hard drive end, so a marginal connection seems likely. The other
 connections seem to be ok, if old ata33-only cables. I swapped the cdrom
 for a then-new dvd drive (i.e. it's not sun-original) and it
 should be ok. It was used for installing 5.4 and solaris 10 from dvd a
 while back. The system has been mostly offline in the meantime.
 
 I'd like to note that booting 5.4 (which I did before and after trying
 to boot 7.0 for the first time) didn't have the problem, but 7.0 did,
 both while booting from cdrom and from hard drive, so whether that was
 an actual marginal connection, I guess we'll find out next (see below).
 
 I probably should've made the connection between the one and the
 other notice, although not knowing what vector 2016 was, I substituted
 ignorance and went ahead. I noticed that *eventually* it'll go through,
 maybe prodded along by sending a couple of breaks, at which point I
 rolled a 7.0 base+man over the previous one. Once it booted it stopped
 complaining, mostly.
 
 Then I checked out src and built a custom kernel. Installing it would
 get me DMA errors when it got to the twe module, although (again) brute
 force eventually got around it.
 
 On a lark I checked out the relevant bits of ports and installed
 smartmontools, and ran an offline test. The output looked all green
 except for a non-zero but low (14) Reallocated_Event_Count. So I think
 the hard disk drive and presumably the dvd drive are in reasonable
 shape.
 
 While I'm writing this the machine is twirling away just as it did
 before, *very slowly* loading the kernel (which it used to do much
 faster, even with the interrupt storm messages coming up later) and
 eventually getting to a boot stage, where it then panics. If this keeps
 up after I get a fresh image on it, I'll ask for help about that.
 
   Consoles: Open Firmware console  
   
   Booting with sun4u support.
   
   FreeBSD/sparc64 bootstrap loader, Revision 1.0
   (root at obrian.cse.buffalo.edu, Sun Feb 24 17:36:50 UTC 2008)
   bootpath="/pci at 1f,0/pci at 1,1/ide at 3/disk at 0,0:a"
   Loading /boot/defaults/loader.conf 
   /boot/kernel/kernel data=0x412648+0x5b2a8 syms=[0x8+0x59340+0x8+0x4e312]
   /
   Hit [Enter] to boot immediately, or any other key for command prompt.
   Booting [/boot/kernel/kernel]...               
   nothing to autoload yet.
   jumping to kernel entry at 0xc0060000.
   Copyright (c) 1992-2008 The FreeBSD Project.
   Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
           The Regents of the University of California. All rights reserved.
   FreeBSD is a registered trademark of The FreeBSD Foundation.
   FreeBSD 7.0-RELEASE #1: Tue Mar 11 21:11:39 UTC 2008
       root at aquablue.local:/usr/src/sys/sparc64/compile/AQUABLUE
   panic: trap: memory address not aligned
   Uptime: 1s
 
 I might very well have forgotten something important in the compile,
 but I can't help but wonder why it started to load so slowly after I
 installed my custom kernel. Compile #0 worked, though. I'll see what
 happens when I get it to boot GENERIC again, compile again, and so
 forth.
 
 
 Now, long story short: I double-checked the connections, closed up
 the case, and booted GENERIC from the install cd again. Booting with
 hw.ata.atapi_dma=0 and hw.ata.ata_dma=0 makes the interrupt storm go away,
 although it will still complain:
 
 acd0: FAILURE - READ_BIG ILLEGAL REQUEST asc=0x64 ascq=0x00 
 GEOM_LABEL: Label for provider acd0 is iso9660/FreeBSD_Install.
 acd0: FAILURE - READ_BIG ILLEGAL REQUEST asc=0x64 ascq=0x00 
 
 Only three lines though. atapi_dma=0 and ata_dma=1 does the same.
 atapi_dma=1 and ata_dma=0 brings the interrupt storms back again.
 
 While in an emergency shell booted with hw.ata.atapi_dma=0 I managed to
 trigger an interrupt storm by accessing the cdrom (`ls') anyway:
 
 interrupt storm detected on "vec2016:"; throttling interrupt source
 interrupt storm detected on "vec2016:"; throttling interrupt source
 interrupt storm detected on "vec2016:"; throttling interrupt source
 ata2: reiniting channel ..
 ata2: reset tp1 mask=03 ostat0=51 ostat1=00
 ata2: stat0=0x50 err=0x01 lsb=0x00 msb=0x00
 ata2: stat1=0x00 err=0x01 lsb=0x00 msb=0x00
 ata2: reset tp2 stat0=50 stat1=00 devices=0x1<ATA_MASTER>
 ad0: setting PIO4 on CMD 646 chip
 ad0: setting WDMA2 on CMD 646 chip
 ata2: reinit done ..
 ad0: TIMEOUT - READ_DMA retrying (1 retry left) LBA=376800
 interrupt storm detected on "vec2016:"; throttling interrupt source
 interrupt storm detected on "vec2016:"; throttling interrupt source
 interrupt storm detected on "vec2016:"; throttling interrupt source
 interrupt storm detected on "vec2016:"; throttling interrupt source
 ata2: reiniting channel ..
 ata2: reset tp1 mask=03 ostat0=51 ostat1=00
 ata2: stat0=0x50 err=0x01 lsb=0x00 msb=0x00
 ata2: stat1=0x00 err=0x01 lsb=0x00 msb=0x00
 ata2: reset tp2 stat0=50 stat1=00 devices=0x1<ATA_MASTER>
 ad0: setting PIO4 on CMD 646 chip
 ad0: setting WDMA2 on CMD 646 chip
 ata2: reinit done ..
 ad0: TIMEOUT - READ_DMA retrying (0 retries left) LBA=376800
 interrupt storm detected on "vec2016:"; throttling interrupt source
 interrupt storm detected on "vec2016:"; throttling interrupt source
 interrupt storm detected on "vec2016:"; throttling interrupt source
 ata2: reiniting channel ..
 ata2: reset tp1 mask=03 ostat0=51 ostat1=00
 ata2: stat0=0x50 err=0x01 lsb=0x00 msb=0x00
 ata2: stat1=0x00 err=0x01 lsb=0x00 msb=0x00
 ata2: reset tp2 stat0=50 stat1=00 devices=0x1<ATA_MASTER>
 ad0: setting PIO4 on CMD 646 chip
 ad0: setting WDMA2 on CMD 646 chip
 ata2: reinit done ..
 ad0: FAILURE - READ_DMA timed out LBA=376800
 g_vfs_done():ad0a[READ(offset=192921600, length=16384)]error = 5
 interrupt storm detected on "vec2016:"; throttling interrupt source
 interrupt storm detected on "vec2016:"; throttling interrupt source
 interrupt storm detected on "vec2016:"; throttling interrupt source
 ata2: reiniting channel ..
 ata2: reset tp1 mask=03 ostat0=51 ostat1=00
 ata2: stat0=0x50 err=0x01 lsb=0x00 msb=0x00
 ata2: stat1=0x00 err=0x01 lsb=0x00 msb=0x00
 ata2: reset tp2 stat0=50 stat1=00 devices=0x1<ATA_MASTER>
 ad0: setting PIO4 on CMD 646 chip
 ad0: setting WDMA2 on CMD 646 chip
 ata2: reinit done ..
 ad0: TIMEOUT - READ_DMA retrying (1 retry left) LBA=376800
 interrupt storm detected on "vec2016:"; throttling interrupt source
 interrupt storm detected on "vec2016:"; throttling interrupt source
 interrupt storm detected on "vec2016:"; throttling interrupt source
 interrupt storm detected on "vec2016:"; throttling interrupt source
 ata2: reiniting channel ..
 ata2: reset tp1 mask=03 ostat0=51 ostat1=00
 ata2: stat0=0x50 err=0x01 lsb=0x00 msb=0x00
 ata2: stat1=0x00 err=0x01 lsb=0x00 msb=0x00
 ata2: reset tp2 stat0=50 stat1=00 devices=0x1<ATA_MASTER>
 ad0: setting PIO4 on CMD 646 chip
 ad0: setting WDMA2 on CMD 646 chip
 ata2: reinit done ..
 ad0: FAILURE - READ_DMA timed out LBA=376800
 g_vfs_done():ad0a[READ(offset=192921600, length=16384)]error = 5
 ls: firmware: Input/output error
 ls: kernel.generic: Input/output error
 ls: zfs: Input/output error
 boot1           kernel/         loader.4th      loader.rc
 defaults/       kernel.5.4/     loader.conf     modules/
 device.hints    loader*         loader.help     support.4th
 Fixit# mount
 /dev/md0 on / (ufs, local)
 devfs on /dev (devfs, local)
 /dev/acd0 on /dist (cd9660, local, read-only)
 /dev/ad0a on /mnt (ufs, local)
 /dev/ad0d on /mnt/usr/local (ufs, local, soft-updates)
 Fixit# 
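
 As an aside on reading the reset lines above: the stat0/ostat0 values
 can be decoded against the standard ATA status register bits. The
 sketch below is only illustrative. It assumes the unprefixed values
 (e.g. ostat0=51) are hexadecimal like the prefixed ones, and the helper
 name decode_status is made up for this example.

```python
# Standard ATA status register bit assignments, highest bit first.
ATA_STATUS_BITS = {
    0x80: "BSY",   # device busy
    0x40: "DRDY",  # device ready
    0x20: "DF",    # device fault (write fault)
    0x10: "DSC",   # seek complete
    0x08: "DRQ",   # data request
    0x04: "CORR",  # corrected data
    0x02: "IDX",   # index
    0x01: "ERR",   # error
}


def decode_status(value):
    """Return the names of the bits set in an ATA status byte."""
    return [name for bit, name in ATA_STATUS_BITS.items() if value & bit]


# Values taken from the "ata2: reset tp1/tp2" lines in the log above.
for label, value in (("ostat0", 0x51), ("stat0", 0x50), ("stat1", 0x00)):
    names = decode_status(value) or ["(none)"]
    print(f"{label}=0x{value:02x}: {' '.join(names)}")
```

 Read this way, the master reports ready with the error bit set before
 each reset and ready without it afterwards, which would fit a drive
 that answers resets fine while the DMA reads themselves time out.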
 
 I'm not sure why `zfs' reports an i/o error.
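
 One thing that can be checked from the log itself: the byte offset in
 the g_vfs_done() lines and the LBA in the READ_DMA lines seem to point
 at the same place on disk. The arithmetic below is a sketch that
 assumes 512-byte sectors; the variable names are illustrative. Note
 that the offset is relative to ad0a while the LBA is relative to ad0,
 so the two only coincide if ad0a starts at sector 0, which is an
 assumption about this disk's label.

```python
SECTOR_SIZE = 512  # assumption: 512-byte sectors

offset_bytes = 192921600  # g_vfs_done():ad0a[READ(offset=192921600, ...)]
length_bytes = 16384      # length= from the same line
reported_lba = 376800     # "ad0: ... READ_DMA ... LBA=376800"

# Convert the byte offset within ad0a into a sector number.
first_sector, remainder = divmod(offset_bytes, SECTOR_SIZE)
# Number of sectors covered by the failed read.
sector_count = length_bytes // SECTOR_SIZE

print("first sector:", first_sector, "remainder:", remainder)
print("sectors in request:", sector_count)
print("matches reported LBA:", first_sector == reported_lba)
```

 If that reading is right, every failure in the log is the same 16 KB
 read failing repeatedly, and the three `ls' errors (including `zfs')
 might all come from that one unreadable block rather than from
 anything specific to those entries. That last part is a guess; the log
 does not confirm it.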
 

