Strange system lockups - kernel saying disk error
dave at g8kbv.demon.co.uk
Sat Jun 4 16:38:25 UTC 2011
On 3 Jun 2011 at 15:09, Kaya Saman wrote:
> I have an ancient pre-HT PIV machine with <500MB RAM.
> The system has an extra PCI->SATA card installed so I can make use of
> modern high capacity drives.
> Everything was running fine until round about 2 days ago when the
> system started locking up on me?
> Current drive configuration for the system is:
> 40GB IDE drive as root (ad2) - UFS2
> 500GB IDE drive for storage (ad3) - EXT3
> 1TB SATA drive for storage (ad4) - UFS2
> 750GB SATA drive for storage (ad8) - EXT3
> I had an issue with the 750GB drive, whose file system seemed to have
> become corrupted, so I powered down and backed the information up to
> a 2TB SATA drive using ddrescue and the Gentoo Linux based System
> Rescue CD. I put the 2TB drive in place of the 1TB ad4 drive.
> Once backed up I powered down again, re-installed the 1TB SATA
> drive into the ad4 position on the system and completely removed the
> 2TB drive.
> When I booted back into FreeBSD I received this error:
> WARNING: Kernel Errors Present
> ad4: FAILURE - WRITE_DMA48 status=51<READY,DSC,ERROR>
> error=4<ABORTED> LBA=1 ...: 1 Time(s)
> g_vfs_done():ad4e[WRITE(offset=974444691456, length=16384)]error
> = 5 ...: 1 Time(s)
> The current status of the disks seemed to be ok though:
> 1 Time(s): ad2: 38166MB<Seagate ST340014A 3.06> at ata1-master UDMA33
> 1 Time(s): ad2: DMA limited to UDMA33, controller found non-ATA66 cable
> 1 Time(s): ad3: 476940MB<Seagate ST3500630A 3.AAF> at ata1-slave UDMA33
> 1 Time(s): ad3: DMA limited to UDMA33, controller found non-ATA66 cable
> 1 Time(s): ad4: 953869MB<SAMSUNG HD103SJ 1AJ10001> at ata2-master SATA150
> 1 Time(s): ad8: 715404MB<Seagate ST3750640AS 3.AAE> at ata4-master SATA150
> 1 Time(s): agp0:<SiS 651 host to AGP bridge> on hostb0
> 1 Time(s): ata0:<ATA channel 0> on atapci0
> 1 Time(s): ata0: [ITHREAD]
> 1 Time(s): ata1:<ATA channel 1> on atapci0
> 1 Time(s): ata1: [ITHREAD]
> 1 Time(s): ata2:<ATA channel 0> on atapci1
> 1 Time(s): ata2: [ITHREAD]
> 1 Time(s): ata3:<ATA channel 1> on atapci1
> 1 Time(s): ata3: [ITHREAD]
> 1 Time(s): ata4:<ATA channel 2> on atapci1
> 1 Time(s): ata4: [ITHREAD]
> 1 Time(s): ata5:<ATA channel 3> on atapci1
> In order to test if the error was due to disk failure I powered down
> and disconnected the ad4 and ad3 disks and powered back up.
> The system still seems to be locking on me and I can't understand why?
> Through Googling I discovered a post by Jeremy Chadwick about these
> kinds of errors:
> however since the system board is pre-SATA it doesn't even have
> S.M.A.R.T. so I'm totally lost on how to fix this. I mean the best
> remedy would be to get a new computer and migrate the stored
> information (something like this is on the way) but currently I don't
> have access to any of the disks at all and to make matters worse no
> NTP or DNS server as I was running these services on the same machine
> or TFTP boot server for my IP phones. - I do run multiboot UNIX on my
> notebook so Bind9 is naturally installed hence me writing this but I
> only activate in emergencies.
> I mean one way I thought of for fixing this would be to grab a USB ->
> ATA/SATA adapter:
> and hook the drives up to both Linux and FreeBSD in my notebook and
> copy the information across to the new system when it arrives in a few
> Aside from that is there any way to fix the kernel error quickly?
Hmmm... No backups then?
First, check the drive data cables. Many do fail with age. Some SATA
types are made with aluminium, not copper, and are extremely fragile when
they age. If that doesn't shed some light...
Take a look at http://www.grc.com/spinrite.htm
It will often restore a failing drive to full use, if it's not
mechanically damaged. It can take time though, if any sector corruption
is very bad. Days, weeks, even months have been seen in some cases, but
if the software keeps going, it usually does the job.
It's not a Windows program, if anything it's a DOS program, but it comes
with its own FreeDOS system to boot and run from, so you don't even need
an OS on the machine to test! It will work with IDE or SATA types, even
over a USB adapter if needed (but then it can't access any SMART data the
drive may have), though it'll run a lot slower that way as it won't be
aware of the drive's detailed physical timing etc.
I've used it on Windows and Linux machines in anger, and on the FreeBSD
box when I got it (an old Gateway E-1400) to make sure the drive was
healthy. It's the hard drive equivalent of Memtest86, and you know how
good that is.
Even if it doesn't report any problems found, often it will cause the
drive to maintain things itself, improving performance as a result.
Even if the recovered drive is still less than 100% happy, or some of
your data is not recoverable, you can then get the rest of your data off
it, onto something new, fairly sure you have a good copy. (How do you
"*Know*" you have a good copy, if you just do a bitcopy of the suspect
drive?)
OK, so it's not free, but if it works for you once, it's paid for itself
many times over. I (and others) also use it to check new drives before
putting into use. Even new-out-the-box drives can have latent problems
the makers have not found. This will find and flag them, preventing your
OS from using bad sectors.
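For a much cruder check than the above, a plain read pass over the whole
drive will at least show whether every part of it can be read. This is a
minimal sketch and not what SpinRite does (it attempts no recovery and
flags nothing to the OS). The function name, the chunk size and the error
cap are all assumptions for illustration, and it needs a Unix-like system
for os.pread.

```python
import os


def scan_for_read_errors(path, chunk_size=64 * 1024, max_errors=1000):
    """Read a file or raw device start to end and note unreadable chunks.

    Returns (bytes_covered, bad_offsets). bytes_covered counts both the
    bytes read and any chunks skipped after a read error. The scan gives
    up once max_errors bad chunks have been recorded.
    """
    bad_offsets = []
    offset = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while len(bad_offsets) < max_errors:
            try:
                data = os.pread(fd, chunk_size, offset)
            except OSError:
                # The read failed: record where, then skip past this chunk.
                bad_offsets.append(offset)
                offset += chunk_size
                continue
            if not data:
                break  # end of file or device
            offset += len(data)
    finally:
        os.close(fd)
    return offset, bad_offsets


# Hypothetical usage (placeholder path, needs read permission on the device):
#   covered, bad = scan_for_read_errors("/path/to/device-or-image")
```

An empty list of bad offsets only means every read succeeded on this pass.
It says nothing about sectors that are readable but marginal.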
Anyone who has ever repaired hard drives of old (the old 14" types) to
component level, also physically changing platters and re-aligning them,
will appreciate just what this program can do.
If however the drive does not identify itself to the host PC correctly,
chances are, it's a "Lobster" (throw it.) But if you have identical
drives, swapping the electronics card can sometimes get your data back.