Help debugging DMA_READ errors
clint.olsen at gmail.com
Tue Sep 16 23:17:05 UTC 2008
On Sep 16, Jeremy Chadwick wrote:
> That's very strange then. Something definitely tried to utilise acd0 at
> that hour of the night. What is acd0 connected to, ATA-wise? Again, I
> assume it's PATA, but I'd like to know the primary/secondary and
> master/slave organisation, since you are using a PATA disk too.
What's the best way to give you this? Generally with disks I try to
separate them from DVD/CD drives, so I don't think they are on the same
chain. Is the question whether or not the DVD/CD is a slave to the PATA
acd0: CDRW <Hewlett-Packard DVD Writer 100/1.37> at ata1-master UDMA33
> Looks fine, although I swore ATA controllers listed their IRQs. atapci0
> doesn't appear to have an IRQ associated with it (should be 14 or 15),
> so that's a little odd to me. vmstat -i would help here.
interrupt                          total       rate
irq1: atkbd0                          14          0
irq6: fdc0                             1          0
irq12: psm0                         1624          0
irq14: ata0                       410187         14
irq15: ata1                       225418          7
irq18: uhci2+                     111881          3
irq22: skc0                       260062          9
cpu0: timer                     56551841       1999
Total                           57561028       2035
> Okay, there are some problems with your disks, but it's going to be
> impossible for me to determine if the below problems caused what you saw.
> First, ad0:
I just freed up a 300G SATA disk, so I can swap out the PATA drive if you
think it's worth the effort.
> 1) Run "smartctl -t short" on /dev/ad0 and /dev/ad4. You can safely use
> the disks during this time. After a few minutes (depends on how much
> disk I/O is happening; the more I/O, the longer the test takes to
> complete), you should see an entry in the SMART self-test log saying
> Completed. Once you see that, you should run smartctl -a on the disk
> again, and see if the attributes labelled "Offline" are different than
> they were before.
> 2) Consider running smartd. I do not normally advocate this, but in
> your case, it may be the only way to see which attribute values are
> actually changing on you if/when the issue happens again. Any time a
> value changes, it'll be logged via syslog. You can set up smartd.conf
> to ignore certain attributes (e.g. temperature, since that has a
> tendency to fluctuate up and down a degree).
I'm looking at that. The sample conf file that comes with it isn't the
easiest on the eyes, so I haven't figured out what configuration I want or
how to set it up yet.
My external hard drive is running at around 50°C in that small external
enclosure. That sounds bad.
190 Airflow_Temperature_Cel 0x0022 050 043 045 Old_age Always In_the_past 50 (Lifetime Min/Max 32/53)
194 Temperature_Celsius 0x0022 050 057 000 Old_age Always - 50 (0 21 0 0)
> If/when this happens again, you should be able to look at your logs and
> see what counters have changed. For example if you see something like
> Power_Cycle_Count or Stop_Start_Count increase, you have disks which are
> losing power.
> Welcome to the pain of debugging disk problems. :-)
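For comparing two saved copies of the smartctl -a output by hand (before and after a self-test, or before and after the next incident), something like the rough Python sketch below might help. It is only an illustration, not part of smartmontools: the function names and the before.txt/after.txt file names are made up here, and it assumes the usual ten-column attribute table layout (ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE). Attributes 190 and 194 are skipped by default because temperature tends to drift by a degree or so.

```python
import re
import sys

# One row of the SMART attribute table as printed by `smartctl -a`:
# ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+(\S+)\s+(\S+)\s+(\S+)\s+"
    r"(\S+)\s+(\S+)\s+(\S+)\s+(.*)$"
)


def parse_attributes(text):
    """Map attribute ID -> {'name', 'updated', 'raw'} for every table row found."""
    attrs = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # not an attribute row (headers, info section, logs)
        attr_id, name, _value, _worst, _thresh, _type, updated, _failed, raw = m.groups()
        attrs[int(attr_id)] = {"name": name, "updated": updated, "raw": raw.strip()}
    return attrs


def changed_attributes(before, after, ignore=(190, 194)):
    """Return (id, name, updated, old_raw, new_raw) for each raw value that differs.

    `ignore` lists attribute IDs to skip; the default skips the two
    temperature attributes, which are an assumption of this sketch.
    """
    old, new = parse_attributes(before), parse_attributes(after)
    changes = []
    for attr_id in sorted(new):
        if attr_id in ignore or attr_id not in old:
            continue
        if old[attr_id]["raw"] != new[attr_id]["raw"]:
            changes.append((attr_id, new[attr_id]["name"], new[attr_id]["updated"],
                            old[attr_id]["raw"], new[attr_id]["raw"]))
    return changes


# Hypothetical usage: python smartdiff.py before.txt after.txt
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as f_old, open(sys.argv[2]) as f_new:
        for attr_id, name, updated, old_raw, new_raw in changed_attributes(
                f_old.read(), f_new.read()):
            print(f"{attr_id:3d} {name} ({updated}): {old_raw} -> {new_raw}")
```

The two input files would come from redirecting smartctl -a /dev/ad0 to a file at two different times. The UPDATED column is printed with each change so the "Offline" attributes are easy to pick out, and a jump in Power_Cycle_Count or Start_Stop_Count would show up as a changed raw value.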