i386/59253: ata device reset hangs if device is dead

dzuck at 1822direkt.com
Thu Nov 13 06:20:13 PST 2003


>Number:         59253
>Category:       i386
>Synopsis:       ata device reset hangs if device is dead
>Confidential:   no
>Severity:       critical
>Priority:       low
>Responsible:    freebsd-i386
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Thu Nov 13 06:20:09 PST 2003
>Closed-Date:
>Last-Modified:
>Originator:     Daniel Zuck
>Release:        FreeBSD 4.9-STABLE i386
>Organization:
1822direkt GmbH
>Environment:
System: FreeBSD dirham.1822direkt.com 4.9-STABLE FreeBSD 4.9-STABLE #3: Wed Nov 5 19:01:59 CET 2003 root at dirham.1822direkt.com:/usr/obj/usr/src/sys/DIRHAM i386
atapci0: <Intel PIIX4 ATA33 controller> port 0xfff0-0xffff at device 7.1 on pci0
ad0: 29325MB <Maxtor 2F030J0> [59582/16/63] at ata0-master UDMA33

	
>Description:
We are experiencing problems with a certain batch of HDDs under different
operating platforms. So even though the cause is almost certainly a hardware
issue, the ata driver does not, as far as I understand, handle the situation
well.
The hardware symptom is: the HDD dies while writing. The kernel messages are
what you would expect in that sort of trouble:
ad0: WRITE command timeout tag=0 serv=0 - resetting
ata0: resetting devices ..
However, the reset never completes, because the device is really dead (until
you power-cycle it), which causes the kernel to freeze. That freeze is the
issue of this PR.
	
>How-To-Repeat:
You will probably not have the appropriate (faulty) hardware, so this may be
difficult to reproduce. But perhaps you can halt the device electronics
with a certain signal. That should be enough to reproduce the device's
behaviour.
As I will keep the faulty hardware assembled for some time from now on,
I am willing to run test code on this platform and report the
results.
	
>Fix:
As I understand it, the driver should never freeze the system, even when
it is dealing with a faulty piece of hardware. If some operations have to
run non-preemptively, they should time out after a bounded period.
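
To illustrate the kind of bounded wait meant here, the following is a minimal
userland sketch, not the actual ata driver code. The names wait_not_busy,
status_fn, delay_fn and fake_dev are hypothetical and exist only for this
example; the only hardware fact assumed is that bit 0x80 of the ATA status
register is BSY. The loop polls the status and gives up once a time budget
is spent, so a dead device yields an error instead of an endless wait.

```c
#include <stdint.h>

#define ATA_S_BUSY 0x80	/* BSY bit of the ATA status register */

/* Hypothetical callback: returns the device's current status byte. */
typedef uint8_t (*status_fn)(void *ctx);
/* Hypothetical callback: waits roughly `usec` microseconds. */
typedef void (*delay_fn)(void *ctx, unsigned usec);

/*
 * Poll until BSY clears or the time budget is used up.
 * Returns 0 if the device became ready, -1 if it never did.
 */
static int
wait_not_busy(status_fn read_status, delay_fn delay, void *ctx,
    unsigned timeout_us, unsigned step_us)
{
	unsigned waited = 0;

	if (step_us == 0)
		step_us = 1;	/* avoid a loop that never makes progress */

	for (;;) {
		if ((read_status(ctx) & ATA_S_BUSY) == 0)
			return (0);
		if (waited >= timeout_us)
			return (-1);	/* give up; caller can mark the device dead */
		delay(ctx, step_us);
		waited += step_us;
	}
}

/*
 * Simulated device, for illustration only: reports BSY for `busy_polls`
 * status reads, or forever if `busy_polls` is negative.
 */
struct fake_dev {
	int busy_polls;
	unsigned slept_us;	/* total simulated delay so far */
};

static uint8_t
fake_status(void *ctx)
{
	struct fake_dev *d = ctx;

	if (d->busy_polls < 0)
		return (ATA_S_BUSY);
	if (d->busy_polls > 0) {
		d->busy_polls--;
		return (ATA_S_BUSY);
	}
	return (0);	/* BSY clear */
}

static void
fake_delay(void *ctx, unsigned usec)
{
	struct fake_dev *d = ctx;

	d->slept_us += usec;	/* record the delay instead of sleeping */
}
```

With a fake_dev whose busy_polls is -1 (a dead device), wait_not_busy
returns -1 once the budget is spent, and the caller could then detach the
device or fail its pending requests instead of blocking the kernel.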
	


>Release-Note:
>Audit-Trail:
>Unformatted:

