kern/179932: [ciss] ciss i/o stall problem with HP Bl Gen8 (and HP Bl Gen7 + Storage Blade)

Markus Gebert markus.gebert at hostpoint.ch
Fri Jul 5 08:30:01 UTC 2013


The following reply was made to PR kern/179932; it has been noted by GNATS.

From: Markus Gebert <markus.gebert at hostpoint.ch>
To: bug-followup at FreeBSD.org,
 Philipp Mächler <philipp.maechler at hostpoint.ch>,
 "sean_bruno at yahoo.com" <sean_bruno at yahoo.com>
Cc:  
Subject: Re: kern/179932: [ciss] ciss i/o stall problem with HP Bl Gen8 (and HP Bl Gen7 + Storage Blade)
Date: Fri, 5 Jul 2013 10:19:58 +0200

 Hey Sean
 
 I'm glad to hear you're getting the same controller as ours to test. In
 the meantime, the backported ciss changes from head seem to help a lot
 on the G8 blades with the p220 controllers. It's quite likely that the
 G8 problem is already fixed in head. Of course, we can't be sure yet,
 but it might still be better to focus on the G7 with p410 and storage
 blade, where the issue has occurred even with ciss from head. So it's
 good you're getting a p410.
 
 We discussed your test scenario. ZFS is known to go nuts and do a lot
 of IO once a zpool gets quite full, so is your goal just to maximise IO
 to reproduce the problem more reliably? Or is there a specific reason
 why you want us to fill a zpool?
 
 Our problem is that half of the G7 blades are in production, so filling
 the zpool is not an option there. The second half is where the first
 half replicates all data to, so they're kind of hot standby and we're
 more flexible doing tests there, but we still have to keep the
 replication running, which makes filling the pool impossible as well.
 
 The day before yesterday we installed the patched kernel that has ciss
 from head and CISS_DEBUG defined on all these standby systems. We run
 zpool scrubs non-stop on all of them to generate IO, and as they are
 replication targets, they also receive some amount of write IO. That
 way, we hope to get a system to stall more often, so we can make faster
 progress debugging the G7 problem. If you think that more write IO
 would help, we can look into using iozone, but as stated before, we
 won't be able to do things like filling the zpool.
 
 Also, once a G7 blade stalls, is there any information apart from
 alltrace and the DDB ciss debug print that you want us to pull out of
 the system?
 
 When reading through the ciss driver source I noticed that the DDB print
 may only output information about the first controller. Since the
 storage blade contains a second p410, do you think it'd be worth
 altering the debug function to print out information about every ciss
 controller in the system?
 
 
 Markus
 

