Panic during install on Sparc64 - Only with large HDD

Chris Gilbert Chris at LainOS.org
Sun Aug 14 17:57:45 GMT 2005


Well, I've continued looking into this problem as I really _really_ want to 
see it fixed for 6.0-RELEASE.

I did some general device stress-testing to make sure that it was directly 
triggerable and reproducible, and was not just an intermittent failure.

I have successfully created, and installed FreeBSD on, each of the following 
(without any errors):

/dev/ad0a
/dev/ad0b
/dev/ad0c
/dev/ad0d
/dev/ad0e
/dev/ad0f

Even though the newfs on it failed, creating the slice itself worked for my 
large partition (/dev/ad0g).

Therefore, I can dd data to it, but I can't write a UFS filesystem to it in 
order to mount it.

I then went about writing data to this partition for long periods of time to 
try and hit the problem:

# time dd if=/dev/urandom of=/dev/ad0g
143337401+0 records in
143337401+0 records out
73388749312 bytes transferred in 89392.318911 secs (820974 bytes/sec)
614.444u 41826.640s 24:49:52.35 47.4%   244+1708k 0+0io 0pf+0w

After this had run without a single error for nearly 25 hours, I stopped it 
and started trying to hit the block that triggered the issue manually.

After a few hours of doubling and halving the seek offset, I finally managed 
to find the block:

# dd count=1 obs=1024 seek=93321655 if=/dev/urandom of=/dev/ad0g
1+0 records in
0+1 records out
512 bytes transferred in 0.001470 secs (348278 bytes/sec)

This one was successful... but the very next one:

# dd count=1 obs=1024 seek=93321656 if=/dev/urandom of=/dev/ad0g
ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=268435456
ad0: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=268435456
ad0: FAILURE - WRITE_DMA timed out LBA=268435456
dd: /dev/ad0g: Input/output error
1+0 records in
0+0 records out
0 bytes transferred in 16.453833 secs (0 bytes/sec)

And incrementing this by one block shows:

# dd count=1 obs=1024 seek=93321657 if=/dev/urandom of=/dev/ad0g
ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=268435458
ad0: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=268435458
ad0: FAILURE - WRITE_DMA timed out LBA=268435458
dd: /dev/ad0g: Input/output error
1+0 records in
0+0 records out
0 bytes transferred in 16.452722 secs (0 bytes/sec)

This makes perfect sense because I specified an output block size of 1024 on 
the dd command, and the default block size (one sector) is 512 bytes. 
Therefore, incrementing seek by a single 1024-byte block lands 2 sectors 
further along in the LBA.

ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=268435456
(then...)
ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=268435458
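
For anyone who wants to check the numbers, here is a rough sketch of the 
arithmetic in Python. It assumes that the dd seek offset maps linearly onto 
the disk's LBA (seek * 1024 / 512 sectors plus a fixed partition start); the 
partition start is not stated anywhere above, it is only inferred from the 
two LBAs in the kernel messages. The names are made up for illustration.

```python
SECTOR_SIZE = 512   # bytes per sector (dd's default block size)
DD_OBS = 1024       # obs= value used on the dd command lines

def partition_sector(seek_blocks):
    """Sector offset inside ad0g for a given dd seek= value."""
    return seek_blocks * DD_OBS // SECTOR_SIZE

# (seek value, LBA printed by the kernel) for the two failing writes
reported = {93321656: 268435456, 93321657: 268435458}

# Inferred start of ad0g on the disk, one estimate per failing write
implied_starts = {lba - partition_sector(seek) for seek, lba in reported.items()}
assert len(implied_starts) == 1          # both failures agree on the offset
implied_start = implied_starts.pop()

# LBA of the last write that succeeded (seek=93321655), under the same assumption
last_good_lba = implied_start + partition_sector(93321655)
first_failing_lba = reported[93321656]

print("implied partition start:", implied_start)
print("last good LBA:          ", last_good_lba)
print("first failing LBA:      ", first_failing_lba)
print("first failing == 2**28: ", first_failing_lba == 2**28)
```

The first failing LBA works out to exactly 2**28, which is the first sector 
number that no longer fits in a 28-bit LBA. Whether that is actually the 
cause here is not something these tests establish; LBA 268435455 itself was 
never written in the runs above.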

Bingo! We've finally found the wall!
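
The doubling-and-halving search could also be automated. What follows is 
only a sketch of the idea, not what I actually ran: write_ok is a 
hypothetical callback standing in for a single dd probe at a given seek 
value, and the search assumes that block 0 is writable and that every block 
past the first bad one also fails (only two adjacent failures are confirmed 
above). The simulated device below just hard-codes the boundary found by 
hand.

```python
def find_first_failure(write_ok, limit):
    """Return the lowest block for which write_ok(block) is False.

    Doubles the probe position until a write fails, then bisects between
    the last good and first bad positions. Returns None if every probe up
    to and including `limit` succeeds.
    """
    lo, hi = 0, 1                     # lo is assumed good, hi is the next probe
    while write_ok(hi):               # doubling phase
        if hi >= limit:
            return None
        lo, hi = hi, min(hi * 2, limit)
    while hi - lo > 1:                # halving phase
        mid = (lo + hi) // 2
        if write_ok(mid):
            lo = mid
        else:
            hi = mid
    return hi

# Simulated device: stands in for running dd against /dev/ad0g.
FIRST_BAD_BLOCK = 93321656            # boundary found manually above

def simulated_write_ok(block):
    return block < FIRST_BAD_BLOCK

# The limit is an arbitrary upper bound chosen for the simulation.
found = find_first_failure(simulated_write_ok, limit=200_000_000)
print("first failing seek value:", found)
```

With a real probe in place of simulated_write_ok, each failed write would 
cost about 16 seconds of DMA timeouts, so the 50 or so probes this takes 
would still finish far sooner than a search by hand.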

I'm going to look further into the IDE chipset (atapci0: <AcerLabs M5229 
UDMA66 controller>) tonight: both its whitepapers (to see if it has some 
sort of quirk or limitation around this area) and its FreeBSD driver, to 
see if something funky is going on.

As I said before, if anyone is interested in helping me resolve this I would 
appreciate it greatly. This is a bug which has haunted me and several others 
since FreeBSD 5.2-RC2 and it needs to be fixed.

-- 
Thanks,
Chris (Lance) Gilbert
Ph: +45 33 73 29 31 (UTC +0100)

