ZFS stalls on Heavy I/O

Levent Serinol lserinol at gmail.com
Wed Jun 27 17:08:38 UTC 2012



Hi,

On 27 Haz 2012, at 19:34, Andreas Nilsson <andrnils at gmail.com> wrote:

> 
> 
> On Wed, Jun 27, 2012 at 5:50 PM, Dean Jones <dean.jones at oregonstate.edu> wrote:
> On Wed, Jun 27, 2012 at 2:15 AM, Levent Serinol <lserinol at gmail.com> wrote:
> > Hi,
> >
> >  Under heavy I/O load we see freeze problems on ZFS volumes on both
> > FreeBSD 9-RELEASE and 10-CURRENT. The machines are HP servers (64-bit)
> > with HP Smart Array 6400 RAID controllers (with battery units). Every da
> > device is a hardware RAID 5 volume made up of 9x300GB 10K SCSI hard
> > drives. Most of the I/O is local, apart from some small NFS
> > traffic from other servers (NFS lookup/getattr/etc.). These servers are
> > mail servers (qmail) with small I/O patterns (64K reads/writes). Below you
> > can find procstat output taken at freeze time. write_limit is set to 200MB
> > because of the huge number of txg_wait_opens observed before. Every process
> > stops in D state, I think because the txg queue and the other two queues
> > are full. Is there any suggestion to fix the problem?
> >
> > btw, inject_compress is the main process injecting emails into user inboxes
> > (databases). Also, those machines were running without problems on a
> > Linux/XFS filesystem. A while ago we started migrating from Linux to
> > FreeBSD.
> >
> >
> > http://pastebin.com/raw.php?i=ic3YepWQ
> 
> Looks like you are running dedup with only 12 gigs of ram?
> 
> Dedup is very RAM-hungry, and the dedup tables are probably no longer
> fitting entirely in memory, so the system is swapping and
> thrashing during writes.
> 
> Also, ZFS really prefers to address drives directly instead of through RAID
> controllers.  It cannot guarantee or know what the controller is
> doing behind the scenes.
> You might want to read http://constantin.glez.de/blog/2011/07/zfs-dedupe-or-not-dedupe and see if you need more ram.
> 
> And yes, having RAID below ZFS somewhat defeats the point of ZFS.
> 
> Regards
> Andreas
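For reference, the DDT memory pressure Andreas describes can be estimated with a quick back-of-the-envelope calculation. The entry count below is a made-up example; on a real pool you would take the "DDT entries" figure reported by `zdb -DD <pool>`, and the ~320 bytes per in-core entry is a commonly quoted rule of thumb, not an exact number:

```shell
# Rough in-core dedup-table (DDT) size estimate.
# DDT_ENTRIES is a hypothetical value; substitute the count that
# `zdb -DD <pool>` reports for your pool.
DDT_ENTRIES=50000000        # assumed example: 50M unique blocks
BYTES_PER_ENTRY=320         # rule-of-thumb cost per in-core DDT entry
RAM_GB=$(( DDT_ENTRIES * BYTES_PER_ENTRY / 1024 / 1024 / 1024 ))
echo "Approx. DDT core size: ${RAM_GB} GiB"
```

If the result lands anywhere near (or above) physical RAM, the DDT is spilling out of the ARC on every write, which matches the thrashing symptom.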

That was one of the machines; I'm running several similar machines with a few differences. For example, some have 50GB or 20GB of RAM, and some give ZFS direct access to every disk in the pool, as you suggested (pools of 24 disks). Some of the machines also run HP P812 RAID cards (1GB cache), each with a battery unit.

Every machine, whether it has 50GB of RAM or pools with many disks, shows the same stall problem, except the one using an HP P6300 SAN over an FC connection; that one runs ZFS without problems. Should I suspect the ciss driver, which is common to all the machines where the problem occurs? Whether they use 6400 or P812 RAID cards, all of them use the same ciss driver, except the one connected via FC to the SAN.
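One quick way to confirm that every stalling box really does sit on ciss(4) is to check which driver each SCSI bus is attached to. The snippet below parses a sample line in the style of `camcontrol devlist -v` so it can run anywhere; the controller and device strings are invented for illustration, and on the real machines you would pipe the command's actual output instead:

```shell
# Sample text imitating `camcontrol devlist -v` output; the strings
# below are made up for illustration, not real hardware.
sample='scbus0 on ciss0 bus 0:
<COMPAQ RAID 5  VOLUME OK>  at scbus0 target 0 lun 0 (pass0,da0)'

# Pull the driver name (e.g. "ciss") off the bus-attachment line by
# stripping the trailing unit number from the third field.
driver=$(echo "$sample" | awk '/^scbus/ {sub(/[0-9]+$/,"",$3); print $3; exit}')
echo "$driver"
```

If every affected machine reports the same driver and the healthy FC one does not, that at least makes the driver a consistent common factor worth isolating.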

Btw, when ZFS stalls, it resumes reading and writing as usual after 1-2 minutes.

Do you suspect any problem in the procstat output that I provided?
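Since the thread mentions capping write_limit at 200MB because of the txg_wait_opens, here is a sketch of the FreeBSD 9-era sysctl knobs that setting corresponds to. Names and defaults changed in later releases (the old write throttle was replaced), so verify each one with `sysctl -d` on your release before relying on it:

```shell
# /etc/sysctl.conf fragment (FreeBSD 9-era ZFS write-throttle knobs;
# check availability on your release, as these were later replaced)
vfs.zfs.write_limit_override=209715200  # cap each txg at 200 MB, as in the thread
vfs.zfs.txg.timeout=5                   # seconds between txg syncs
```

Watching `zpool iostat <pool> 1` during a stall can then show whether writes drop to zero for the whole 1-2 minute window and burst afterwards, which would point at txg sync backpressure rather than a hung controller.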

Thanks,
Levent


More information about the freebsd-fs mailing list