ZFS HBAs + LSI chip sets (Was: ZFS hang (system #2))

Dennis Glatting freebsd at penx.com
Fri Oct 26 15:49:27 UTC 2012


On Tue, 2012-10-23 at 01:55 +0000, John wrote:
> ----- Dennis Glatting's Original Message -----
> > On Mon, 2012-10-22 at 09:31 -0700, Freddie Cash wrote:
> > > On Mon, Oct 22, 2012 at 6:47 AM, Freddie Cash <fjwcash at gmail.com> wrote:
> > > > I'll double-check when I get to work, but I'm pretty sure it's 10.something.
> > > 
> > > mpt(4) on alpha has firmware 1.5.20.0.
> > > 
> > > mps(4) on beta has firmware 09.00.00.00, driver 14.00.00.01-fbsd.
> > > 
> > > mps(4) on omega has firmware 10.00.02.00, driver 14.00.00.01-fbsd.
> > > 
> > > Hope that helps.
> > > 
> > 
> > Because one of the RAID1 OS disks failed (System #1), I replaced both
> > disks and downgraded to stable/8. Two hours ago I submitted a job. 
> > 
> > I noticed on boot that smartd issued warnings about disk firmware, which
> > I'll update this coming weekend, unless the system hangs before then. 
> > 
> > I first want to see if that system will also hang under 8.3. I have
> > noticed a looping "ls" of the target ZFS directory is MUCH snappier
> > under 8.3 than 9.x. 
> > 
> > My CentOS 6.3 ZFS-on-Linux system (System #3) is crunching along (24
> > hours now). This system under stable/9 would previously spontaneously
> > reboot whenever I sent a ZFS data set to it.
> > 
> > System #2 is hung (stable/9).
> 
> Hi Folks,
> 
>    I just caught up on this thread and thought I'd toss out some info.
> 
>    I have a number of systems running 9-stable (with some local patches),
> none running 8.
> 
>    The basic architecture is: http://people.freebsd.org/~jwd/zfsnfsserver.jpg
> 
>    LSI SAS 9201-16e  6G/s 16-Port SATA+SAS Host Bus Adapter
> 
>    All cards are up-to-date on firmware:
> 
> mps0: Firmware: 14.00.00.00, Driver: 14.00.00.01-fbsd 
> mps1: Firmware: 14.00.00.00, Driver: 14.00.00.01-fbsd   
> mps2: Firmware: 14.00.00.00, Driver: 14.00.00.01-fbsd
> 
>    All drives are configured with geom multipath.
> 
>    Currently, these systems are used almost exclusively for iSCSI.
> 
>    I have seen no lockups that I can track down to the driver. I have seen
> one lockup, which I did post about (received no feedback), where I believe
> an active I/O from istgt is interrupted by an ABRT from the client, which
> causes a lock-up. This one is hard to replicate and is on the to-do list.
> 
>    It is worth noting that a few drives were replaced early on
> due to various I/O problems and one with what might be considered a
> lockup. As has been noted elsewhere, watching gstat can be informative.
> Also make sure cables are firmly plugged in. Seems obvious, I know.
> 
>    I recently committed a small patch to current to handle the case
> where, if the system has more than 255 disks, the 255th disk
> is hidden/masked by the mps initiator ID that is statically coded into
> the driver.
> 
>    I think it would be good to better document the type of
> mount and the test job/test stream running when/if you see a lockup.
> I am not currently using NFS so there is an entire code-path I
> am not exercising.
> 
>    Servers have 12 processors and 96GB RAM. The highest CPU load I've
> seen on the systems is about 800%.
> 
>    All networking is 10G via Chelsio cards - configured to
> use isr maxthread 6 with a defaultqlimit of 4096.  I have seen
> no problems in this area.
> 
>    Hope this helps a bit. Happy to answer questions.
> 


I realized this morning that I neglected to ask a question: How big are
your files? Mine are up to 12T each. From one of my servers:

bd3# ls -lh
total 7400750995
drwxr-xr-x  3 root  wheel    12B Oct 26 08:14 ./
drwxr-xr-x  7 root  wheel     7B Aug 14 10:50 ../
drwxr-xr-x  2 root  wheel     2B Oct 25 21:37 Kore/
-rw-r--r--  1 root  wheel    12T Sep  8 10:24 Merged.0.txt
-rw-r--r--  1 root  wheel   1.1T Jul 18 07:30 Merged.2.cleansed.print.txt.gz
-rw-r--r--  1 root  wheel   1.2T Jul 18 04:13 Merged.3.cleansed.print.txt.gz
-rw-r--r--  1 root  wheel   985G Sep  7 17:25 Merged.KoreLogic.1.txt.bz2
-rw-r--r--  1 root  wheel   1.1T Sep 16 00:02 Merged.KoreLogic.3.txt.bz2
-rw-r--r--  1 root  wheel   670G Jul 27 10:01 Merged.outpost9.cleansed.print.txt.bz2
-rw-r--r--  1 root  wheel   639G Aug 30 06:47 Merged.packet.storm.1.print.cleansed.txt.bz2
-rw-r--r--  1 root  wheel   733G Jul 21 03:49 Merged.wordlist.0.cleansed.print.txt.bz2


Trying to work with the 12T file eventually hangs that system.



> Cheers,
> John
> 
> ps: With all that's been said above, it's worth noting that a correctly
>     configured client makes a huge difference.
> 
> _______________________________________________
> freebsd-fs at freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe at freebsd.org"
