ahc + vinum raid5 deadlocks?

Tony Frank tfrank at optushome.com.au
Fri Feb 20 23:58:05 PST 2004

Cross posting to -scsi as it seems maybe related to my scsi
At least, the problems currently only appear when the scsi parts
are in use.

On Sat, Feb 21, 2004 at 02:47:49PM +1100, Tony Frank wrote:
> Refer earlier thread on -stable for more background.
> 4.9-STABLE (cvsup 20th Feb)
> Kernel was compiled with DDB, INVARIANTS, DIAGNOSTICS.
> All options removed from /etc/make.conf except 'NOPROFILE=TRUE'
> I get the following on the console before system freezes:
> ahc0: WARNING no command for scb 14 (cmdcmplt)
> QOUTPOS = 44

This occured while doing majority of i/o to vinum raid5 volume.

Since that time I have removed the vinum raid5 configuration
and am using the disks directly - ufs mounted on /dev/da0s1h.

When performing the same load benchmark I have received two
separate panic's:

First panic occurs due to KASSERT in ffs_read, refer:

(da0:ahc0:0:0:0): tagged openings now 40
(da1:ahc0:0:1:0): tagged openings now 40
(da2:ahc0:0:2:0): tagged openings now 40
(da3:ahc0:0:3:0): tagged openings now 40
Feb 21 15:48:16 raider su: tony to root on /dev/ttypc
panic: bp->b_resid != 0

syncing disks... Stopped at      siointr1+0x102: movl    $0,brk_state2.757
db> trace
siointr1(c0f98000,ccdc6c64,c027d546,c0f98000,10) at siointr1+0x102
siointr(c0f98000,10,0,0,0) at siointr+0xb
Xfastintr4(c1090800,1000040,600,20002,ccd54300) at Xfastintr4+0x16
lockmgr(c1090800,1030002,ccd5436c,c032c7a0,ccdc6cac) at lockmgr+0x1fc
vop_stdlock(ccdc6cc4,ccdc6cd4,c01bb269,ccdc6cc4,0) at vop_stdlock+0x20
ufs_vnoperatespec(ccdc6cc4) at ufs_vnoperatespec+0x15
vn_lock(ccd54300,20002,c032c7a0,c1129000,0) at vn_lock+0x71
ffs_sync(c1129000,2,c0b2d600,c032c7a0,c1129000) at ffs_sync+0x17f
sync(c032c7a0,0,c02b9660,c02d6048,100) at sync+0x63
boot(100,0,0,ccdc6de0,c02482a2) at boot+0x8a
panic(c02d6048,ccf5cbc0,5b,ccd8f000,400) at panic+0x79
ffs_read(ccdc6df4,0,ccdc6ea8,c02ec1e0,ccf5cbc0) at ffs_read+0x37a
ufs_readlink(ccdc6e38,ccdc6e68,c01b1c8a,ccdc6e38,c7f7bee0) at ufs_readlink+0x6b
ufs_vnoperate(ccdc6e38,c7f7bee0,c7f7bee0,ccdc6f80,c7f7bee0) at ufs_vnoperate+0x15
namei(ccdc6e80,c7f7bee0,2,ccdc6f80,8137400) at namei+0x302
stat(c7f7bee0,ccdc6f80,bfbe7810,bfbe8298,bfbea49c) at stat+0x41
syscall2(c027002f,2f,2f,bfbea49c,bfbe8298) at syscall2+0x209
Xint0x80_syscall() at Xint0x80_syscall+0x25

I have a core from this panic saved.  'trace' is about the extent of
my ddb skills at the moment though.

Checking the archives a similar problem was seen ~6months ago with
particular SCSI disk and having too high tags value.

>From what I can see, the ahc driver is forcing a max of 40 tags.
The SCSI hardware is fairly old but was working without problems in
the old system (Win based)

Second panic occurs in ffs_softdep, refer:

Feb 21 17:16:30 raider su: tony to root on /dev/ttyp8
(da1:ahc0:0:1:0): tagged openings now 40
Feb 21 17:21:29 raider su: tony to root on /dev/ttyp3
(da3:ahc0:0:3:0): tagged openings now 40
(da2:ahc0:0:2:0): tagged openings now 40
panic: handle_written_inodeblock: live inodedep

syncing disks... Stopped at      siointr1+0x102: movl    $0,brk_state2.757
db> trace
siointr1(c0f98000,ccf6bba0,c027d546,c0f98000,10) at siointr1+0x102
siointr(c0f98000,10,0,8,68c040) at siointr+0xb
Xfastintr4(c389c20c,10,c02be0a4,0) at Xfastintr4+0x16
biowait(c389c20c,c106b800,c11d1900,2,c02ec760) at biowait+0x37
bread(ccaaba00,18c040,2000,0,ccf6bc14) at bread+0xb2
ffs_update(ccd51c00,0,68c040,ccd51c00,ccf33580) at ffs_update+0xba
ffs_fsync(ccf6bc78) at ffs_fsync+0x358
ffs_sync(c1069c00,2,c0b2d600,c032c7a0,c1069c00) at ffs_sync+0xdb
sync(c032c7a0,0,c02b9660,c02d56e0,100) at sync+0x63
boot(100,0,c127bb00,ccf6bd18,c024416a) at boot+0x8a
panic(c02d56e0,1,c11a5c80,c38d6178,0) at panic+0x79
handle_written_inodeblock(c11a5c80,c38d6150) at handle_written_inodeblock+0x30e
softdep_disk_write_complete(c38d6150) at softdep_disk_write_complete+0x6a
biodone(c38d6150,1,68c040,c10efa48,c38d6150) at biodone+0x121
complete_rqe(c10efa20,0,c1028c00,f76,c1028d3c) at complete_rqe+0x651
biodone(c10efa20,c1028cb8,c10efa20,c0146d58,c16caac0) at biodone+0xf5
ad_interrupt(c16caac0,c032b7d4,ccf6be38,c0182482,c0fa7900) at ad_interrupt+0x3e7
ata_intr(c0fa7900,c16ccdc0,ccf6be8c,c027e4c2,c032b7d4) at ata_intr+0xd8
add_interrupt_randomness(c032b7d4,0,10,c3870010,c01a0010) at add_interrupt_rando
Xresume15() at Xresume15+0x2b
--- interrupt, eip = 0xc01ab6e5, esp = 0xccf6be80, ebp = 0xccf6be8c ---
bwillwrite(c16ccdc0,ccf6bf80,cce7b2a0,0,0) at bwillwrite+0x75
dofilewrite(cce7b2a0,c16ccdc0,a,824a00c,f76) at dofilewrite+0xa2
write(cce7b2a0,ccf6bf80,824a00c,bfbff7e0,bfbff800) at write+0x36
syscall2(bfbf002f,bfbf002f,822002f,bfbff800,bfbff7e0) at syscall2+0x209
Xint0x80_syscall() at Xint0x80_syscall+0x25

My searches show a few hits for -current and some old items from -stable.
Nothing jumps out at me as a possible solution?
This trace seems to suggest the problem may have been on the IDE disk?

Again the core has been saved so given suitable directions I can do something
with it.

> Hardware is currently Adaptec AHA-2940W S71 (F/w 1.19S8)
> Same issues occured with Adaptec AHA-2940UW/B (F/w 1.32S8)
> I have 4 x 4G IBM SCSI disks on the single internal 68-pin connector.
> Connections are good & termination is correct
> The 4 disks are combined into a single vinum raid5 plex.
> System is well ventilated and cool to the touch.
> Pentium 2 200Mhz, 128mb SDRAM, Asus P2V m/b.
> Currently takes ~2 hours of solid activity to trigger the issue.

These problems are occuring with load after about 30-50 mins of solid
a 'plain' "make -j4 buildworld buildkernel" completes ok if it runs
by itself without the extra load of my test bench.

> My 'test bench':
> Copying files over NFSv3/udp to vinum raid5 volume via cp & tar
> + make -j4 buildworld
> + Copying large dir trees (/usr/ports, /usr/obj, /usr/src)
> + cvsuping second copy of /usr/src
> + extracting tar archive of /usr/obj

Ideas are welcome,


More information about the freebsd-questions mailing list