Strange system lockups - kernel saying disk error

Dave dave at g8kbv.demon.co.uk
Sat Jun 4 16:38:25 UTC 2011


On 3 Jun 2011 at 15:09, Kaya Saman wrote:

> Hi,
> 
> I have an ancient pre-HT PIV machine with <500MB RAM.
> 
> The system has an extra PCI->SATA card installed so I can make use of
> modern high-capacity drives.
> 
> Everything was running fine until about two days ago, when the
> system started locking up on me.
> 
> 
> Current drive configuration for the system is:
> 
> 40GB IDE drive as root (ad2) - UFS2
> 500GB IDE drive for storage (ad3) - EXT3
> 1TB SATA drive for storage (ad4) - UFS2
> 750GB SATA drive for storage (ad8) - EXT3
> 
> I had an issue with the 750GB drive, whose file system seemed to
> have become corrupted, so I powered down and backed the data up to
> a 2TB SATA drive using ddrescue and the Gentoo Linux-based System
> Rescue CD. I physically put the 2TB drive in place of the 1TB ad4
> drive.
> 
> Once it was backed up, I powered down again, re-installed the 1TB SATA
> drive in the ad4 position and completely removed the 2TB backup
> drive.
> 
> When I booted back into FreeBSD, I received this error:
> 
> 
>   WARNING:  Kernel Errors Present
>      ad4: FAILURE - WRITE_DMA48 status=51<READY,DSC,ERROR> error=4<ABORTED> LBA=1 ...:  1 Time(s)
>      g_vfs_done():ad4e[WRITE(offset=974444691456, length=16384)]error = 5 ...:  1 Time(s)
> 
> 
> The current status of the disks seemed to be OK, though:
> 
>   1 Time(s): ad2: 38166MB <Seagate ST340014A 3.06> at ata1-master UDMA33
>   1 Time(s): ad2: DMA limited to UDMA33, controller found non-ATA66 cable
>   1 Time(s): ad3: 476940MB <Seagate ST3500630A 3.AAF> at ata1-slave UDMA33
>   1 Time(s): ad3: DMA limited to UDMA33, controller found non-ATA66 cable
>   1 Time(s): ad4: 953869MB <SAMSUNG HD103SJ 1AJ10001> at ata2-master SATA150
>   1 Time(s): ad8: 715404MB <Seagate ST3750640AS 3.AAE> at ata4-master SATA150
>   1 Time(s): agp0: <SiS 651 host to AGP bridge> on hostb0
>   1 Time(s): ata0: <ATA channel 0> on atapci0
>   1 Time(s): ata0: [ITHREAD]
>   1 Time(s): ata1: <ATA channel 1> on atapci0
>   1 Time(s): ata1: [ITHREAD]
>   1 Time(s): ata2: <ATA channel 0> on atapci1
>   1 Time(s): ata2: [ITHREAD]
>   1 Time(s): ata3: <ATA channel 1> on atapci1
>   1 Time(s): ata3: [ITHREAD]
>   1 Time(s): ata4: <ATA channel 2> on atapci1
>   1 Time(s): ata4: [ITHREAD]
>   1 Time(s): ata5: <ATA channel 3> on atapci1
> 
> 
> To test whether the error was due to a disk failure, I powered down,
> disconnected the ad4 and ad3 disks and powered back up.
> 
> 
> The system still seems to be locking up on me and I can't understand why.
> 
> 
> Through Googling I discovered a post by Jeremy Chadwick about these
> kinds of errors:
> 
> http://wiki.freebsd.org/JeremyChadwick/ATA_issues_and_troubleshooting
> 
> However, since the system board is pre-SATA, it doesn't even have
> S.M.A.R.T., so I'm totally lost on how to fix this. The best remedy
> would be to get a new computer and migrate the stored information
> (something like this is on the way), but currently I don't have access
> to any of the disks at all. To make matters worse, I have no NTP or
> DNS server, as I was running these services on the same machine, and
> no TFTP boot server for my IP phones. I do run multiboot UNIX on my
> notebook, so Bind9 is naturally installed, which is how I'm able to
> write this, but I only activate it in emergencies.
> 
> One way I thought of fixing this would be to grab a USB ->
> ATA/SATA adapter:
> 
> http://www.startech.com/product/USB2SATAIDE-USB-20-to-IDE-or-SATA-Adapter-Cable
> 
> and hook the drives up to my notebook under both Linux and FreeBSD, then
> copy the information across to the new system when it arrives in a few
> months.
> 
> 
> Aside from that, is there any way to fix the kernel error quickly?
> 
> 
> Thanks,
> 
> 
> Kaya
> 

Hmmm...  No backups then?

First, check the drive data cables. Many do fail with age. Some SATA 
types are made with aluminium rather than copper, and become extremely 
fragile as they age. If that doesn't shed some light...

Take a look at http://www.grc.com/spinrite.htm

It will often restore a failing drive to full use, if it's not 
mechanically damaged. It can take time, though, if any sector corruption 
is very bad. Days, weeks, even months have been seen in some cases, but 
if the software keeps going, it usually does the job.

It's not a Windows program; if anything it's a DOS program, but it comes 
with its own FreeDOS system to boot and run from, so you don't even need 
an OS on the machine to test! It will work with IDE or SATA drives, even 
over a USB adapter if needed. In that case, though, it can't access any 
SMART data the drive may have, and it'll run a lot slower as it won't be 
aware of the drive's detailed physical timing etc.

I've used it on Windows and Linux machines in anger, and on the FreeBSD 
box when I got it (an old Gateway E-1400), to make sure the drive was 
healthy.

It's the hard drive equivalent of Memtest86, and you know how good that 
is.

Even if it doesn't report any problems, it will often prompt the drive 
to carry out its own maintenance, improving performance as a result.

Even if the recovered drive is still less than 100% happy, or some of 
your data is not recoverable, you can then get the rest of your data off 
it onto something new, fairly sure you have a good copy. (How do you 
*know* you have a good copy if you just do a bitcopy of the suspect 
drive?)
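One way to get some confidence in a bit-for-bit copy is to hash both the source and the copy and compare the digests. That is my own suggestion, not part of DaveB's advice. Below is a minimal sketch using Python's standard hashlib module. Treat the function names and the paths in the usage note as hypothetical. A match only proves the copy is identical to what was read. It can't prove the suspect drive returned correct data in the first place, which is the point made above. The sketch also assumes both paths cover exactly the same bytes, such as an image file compared against its source device. A whole larger target drive would not hash the same.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file or device, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks (1 MiB by default) so a multi-GB image
        # never has to fit in the machine's limited RAM all at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(source, copy):
    """True if both paths hold identical bytes (same content, same length)."""
    return sha256_of(source) == sha256_of(copy)
```

For example, `copies_match("/dev/ad8", "/mnt/backup/ad8.img")` would compare a source device against an image taken from it. Both paths are hypothetical, and reading a raw device needs suitable permissions. Hashing in chunks rather than reading whole files keeps memory use flat, which matters on a box with under 500MB of RAM.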

OK, so it's not free, but if it works for you once, it has paid for 
itself many times over. I (and others) also use it to check new drives 
before putting them into use. Even new-out-of-the-box drives can have 
latent problems the makers have not found. This will find and flag them, 
preventing your OS from using bad sectors.

Anyone who has ever repaired hard drives of old (the old 14" types) down 
to component level, including physically changing platters and 
re-aligning them, will appreciate just what this program can do.

If, however, the drive does not identify itself to the host PC correctly, 
chances are it's a "Lobster" (throw it). But if you have identical 
drives, swapping the electronics card can sometimes get your data back.

Best Regards

DaveB



More information about the freebsd-questions mailing list