Error attempting to read or write to /dev/st0: sense key MediumError

Mon Jan 17 07:26:55 PST 2005

> > > I am a newbie to linux servers and tape backups, and I have a 
> > > problem performing a simple 'tar -cf' backup.
> > > 
> > > The system in question:
> > > Redhat 9.0 on a Dell PowerEdge 600SC with
> > > Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.8
> > > 
> > > The tape drive is a DELL PowerVault 100T DDS4. It appears as
> > > 'Python 06408-XXX Rev: 9100' in /proc/scsi/scsi. It is the only
> > > SCSI device on an Adaptec 3960D Ultra160 SCSI adapter.
> > > 
> > > Using a brand new DDS-4 tape, I try the following:
> > > 
> > > [root@nccserver root]# tar cvf /dev/st0 data/
> > > data/
> > > ...
> > > tar: /dev/st0: Wrote only 0 of 10240 bytes 
> > > tar: Error is not recoverable: exiting now 
> > 
> > Does
> > 
> >   mt -f /dev/nst0 setblk 0
> > 
> > help any?  I've needed this in the past on exabyte 8mm tapes
> > with some drives.
> > 
> 
> Thanks for the tip, James, but unfortunately that does not seem to
> make any difference on this system. The tar command returns the same
> error messages.
> 

I have found out something that I think reveals the cause of the
problem.

First though, I think I should explain more about the situation. We have
two almost identically configured servers - a production server and a
test server. Up to now in this thread I have just been talking about the
production server. The only hardware differences I know of between these
two servers are the RAM (test server has 256MB, production server has
512MB) and the DVD combo drive (one is LG, the other is Samsung).
Unfortunately the production server is in an office in Beijing, while
the test server is here with me in the UK. I cannot log into the
production server. Up to now I have been pretending that I have access
to the production server, for the sake of simplicity. The Beijing office
does not have much IT expertise, so I cannot easily ask them to open up
the server to check cabling, terminators, etc. But I can ask them to log
in and execute commands as root. Both servers were built and delivered by
DELL, although possibly from different assembly plants. Still, according to
the specs on the invoice, they are pretty much identical. Moreover, the
production server has been writing to tape successfully (using tar) for
over a year until just recently, when it developed the fault described
in this thread. The test server is fully functional.

I have asked the IT rep in the Beijing office to send me the system logs
in /var/log/messages* . After comparing them with the system logs from
the test server, I have spotted error messages (apart from the errors
while using tar) that appear in the production server, but not in the
test server. These messages get logged during startup, at the point just
before it logs information about the AIC7XXX driver. The error messages
are the 4 lines that start with ahc_pci:

...
Jan 11 08:48:41 nccserver kernel: Freeing initrd memory: 308k freed
Jan 11 08:48:41 nccserver kernel: VFS: Mounted root (ext2 filesystem).
Jan 11 08:48:41 nccserver kernel: SCSI subsystem driver Revision: 1.00
Jan 11 08:48:41 nccserver kernel: ahc_pci:0:6:0: PCI error Interrupt at seqaddr = 0x47
Jan 11 08:48:41 nccserver kernel: ahc_pci:0:6:0: Data Parity Error Detected during address or write data phase
Jan 11 08:48:41 nccserver kernel: ahc_pci:0:6:1: PCI error Interrupt at seqaddr = 0x46
Jan 11 08:48:41 nccserver kernel: ahc_pci:0:6:1: Data Parity Error Detected during address or write data phase
Jan 11 08:48:41 nccserver kernel: scsi0 : Adaptec AIC7XXX EISA/VLB/PCI
SCSI HBA DRIVER, Rev 6.2.8
Jan 11 08:48:42 nccserver kernel:         <Adaptec 3960D Ultra160 SCSI
adapter>
Jan 11 08:48:42 nccserver kernel:         aic7899: Ultra160 Wide Channel
A, SCSI Id=7, 32/253 SCBs
Jan 11 08:48:42 nccserver kernel: 
Jan 11 08:48:42 nccserver kernel: scsi1 : Adaptec AIC7XXX EISA/VLB/PCI
SCSI HBA DRIVER, Rev 6.2.8
Jan 11 08:48:42 nccserver kernel:         <Adaptec 3960D Ultra160 SCSI
adapter>
Jan 11 08:48:42 nccserver kernel:         aic7899: Ultra160 Wide Channel
B, SCSI Id=7, 32/253 SCBs
Jan 11 08:48:42 nccserver kernel: 
Jan 11 08:48:42 nccserver kernel: blk: queue c256da14, I/O limit 4095Mb
(mask 0xffffffff)
Jan 11 08:48:42 nccserver kernel:   Vendor: ARCHIVE   Model: Python
06408-XXX  Rev: 9100
Jan 11 08:48:42 nccserver kernel:   Type:   Sequential-Access
ANSI SCSI revision: 03
Jan 11 08:48:42 nccserver kernel: blk: queue c256dc14, I/O limit 4095Mb
(mask 0xffffffff)
Jan 11 08:48:42 nccserver kernel: megaraid: v1.18h (Release Date: Thu
Feb  6 17:25:43 EST 2003)
Jan 11 08:48:42 nccserver kernel: megaraid: found 0x1000:0x1960:idx
0:bus 0:slot 7:func 0
Jan 11 08:48:42 nccserver kernel: scsi2 : Found a MegaRAID controller at
0xe085f000, IRQ: 5
Jan 11 08:48:42 nccserver kernel: scsi2 : Enabling 64 bit support
Jan 11 08:48:42 nccserver kernel: megaraid: [3.28:1.05] detected 1
logical drives
Jan 11 08:48:42 nccserver kernel: megaraid: supports extended CDBs.
Jan 11 08:48:42 nccserver kernel: megaraid: channel[1] is raid.
Jan 11 08:48:42 nccserver kernel: scsi2 : LSI Logic MegaRAID 3.28 254
commands 15 targs 4 chans 7 luns
Jan 11 08:48:42 nccserver kernel: scsi2: scanning virtual channel 0 for
logical drives.
Jan 11 08:48:42 nccserver kernel:   Vendor: MegaRAID  Model: LD0 RAID1
34678R  Rev: 3.28
Jan 11 08:48:42 nccserver kernel:   Type:   Direct-Access
ANSI SCSI revision: 02
Jan 11 08:48:42 nccserver kernel: blk: queue c256de14, I/O limit 4095Mb
(mask 0xffffffff)
Jan 11 08:48:42 nccserver kernel: scsi2: scanning virtual channel 1 for
logical drives.
Jan 11 08:48:42 nccserver kernel: scsi2: scanning virtual channel 2 for
logical drives.
Jan 11 08:48:42 nccserver kernel: scsi2: scanning physical channel 0 for
devices.
Jan 11 08:48:42 nccserver kernel: Attached scsi disk sda at scsi2,
channel 0, id 0, lun 0
Jan 11 08:48:42 nccserver kernel: SCSI device sda: 71020544 512-byte
hdwr sectors (36363 MB)
Jan 11 08:48:42 nccserver kernel: Partition check:
Jan 11 08:48:42 nccserver kernel:  sda: sda1 sda2 sda3 sda4 < sda5 sda6
sda7 sda8 sda9 sda10 >
Jan 11 08:48:42 nccserver kernel: LVM version 1.0.5+(22/07/2002) module
loaded
Jan 11 08:48:42 nccserver kernel: Journalled Block Device driver loaded
Jan 11 08:48:42 nccserver kernel: kjournald starting.  Commit interval 5
seconds
Jan 11 08:48:42 nccserver kernel: EXT3-fs: mounted filesystem with
ordered data mode.
...
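
For anyone who wants to repeat this kind of comparison, here is a
minimal Python sketch. It is only an illustration, not necessarily how
the comparison above was done: the function names are made up, and it
assumes the usual syslog layout seen in the excerpt ("Mon DD HH:MM:SS
host kernel: message") with each message on one unwrapped line, as in
the real /var/log/messages files rather than in this wrapped email.

```python
import re

# Assumed syslog layout: "Jan 11 08:48:41 nccserver kernel: <message>".
# The timestamp and hostname are dropped so that logs from two
# different machines and boot times can be compared by message text.
SYSLOG_KERNEL = re.compile(
    r"^[A-Z][a-z]{2}\s+\d+\s+\d{2}:\d{2}:\d{2}\s+\S+\s+kernel:\s?(.*)$"
)


def kernel_messages(lines):
    """Return the set of kernel message texts found in the given lines."""
    messages = set()
    for line in lines:
        match = SYSLOG_KERNEL.match(line.rstrip("\n"))
        if match:
            text = match.group(1).strip()
            if text:  # skip the empty "kernel: " separator lines
                messages.add(text)
    return messages


def only_in_first(first_lines, second_lines):
    """Kernel messages present in the first log but absent from the second."""
    return sorted(kernel_messages(first_lines) - kernel_messages(second_lines))
```

Called with the lines of the production log and the lines of the test
log, only_in_first() lists what only the production server reported.
Messages that embed machine-specific values (addresses, IRQs, sizes)
will also show up as differences, so the result still needs reading by
eye.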

The ahc_pci prefix suggests these messages come from the aic7xxx driver.
I will start searching the web for what they mean, but I thought
somebody on this mailing list might be able to tell what is happening
just from seeing the error messages.

Regards,
Herminio