ZFS "stalls" -- and maybe we should be talking about defaults?

Steven Hartland killing at multiplay.co.uk
Thu Mar 7 19:27:11 UTC 2013

----- Original Message ----- 
From: "Karl Denninger" <karl at denninger.net>
To: <freebsd-stable at freebsd.org>
Sent: Thursday, March 07, 2013 7:07 PM
Subject: Re: ZFS "stalls" -- and maybe we should be talking about defaults?

On 3/7/2013 12:57 PM, Steven Hartland wrote:
> ----- Original Message ----- From: "Karl Denninger" <karl at denninger.net>
>> Where I am right now is this:
>> 1. I *CANNOT* reproduce the spins on the test machine with Postgres
>> stopped in any way.  Even with multiple ZFS send/recv copies going on
>> and the load average north of 20 (due to all the geli threads), the
>> system doesn't stall or produce any notable pauses in throughput.  Nor
>> does the system RAM allocation get driven hard enough to force paging.
>> This is with NO tuning hacks in /boot/loader.conf.  I/O performance is
>> both stable and solid.
>> 2. WITH Postgres running as a connected hot spare (identical to the
>> production machine), allocating ~1.5G of shared, wired memory,  running
>> the same synthetic workload in (1) above I am getting SMALL versions of
>> the misbehavior.  However, while system RAM allocation gets driven
>> pretty hard and reaches down toward 100MB in some instances it doesn't
>> get driven hard enough to allocate swap.  The "burstiness" is very
>> evident in the iostat figures with spates getting into the single digit
>> MB/sec range from time to time but it's not enough to drive the system
>> to a full-on stall.
>>> There's pretty-clearly a bad interaction here between Postgres wiring
>>> memory and the ARC, when the latter is left alone and allowed to do what
>>> it wants.   I'm continuing to work on replicating this on the test
>>> machine... just not completely there yet.
>> Another possibility to consider is how postgres uses the FS. For example
>> does it request sync IO in ways not present in the system without it
>> which is causing the FS and possibly underlying disk system to behave
>> differently.
> That's possible but not terribly-likely in this particular instance.  
> The reason is that I ran into this with the Postgres data store on a UFS
> volume BEFORE I converted it.  Now it's on the ZFS pool (with
> recordsize=8k as recommended for that filesystem) but when I first ran
> into this it was on a separate UFS filesystem (which is where it had
> resided for 2+ years without incident), so unless the Postgres
> filesystem use on a UFS volume would give ZFS fits it's unlikely to be
> involved.

I hate to say it, but that sounds very similar to something we experienced
with a machine here which was running high numbers of rrd updates. Again
we had the issue on UFS and saw the same thing when we moved to ZFS.

I'll leave that there so as not to derail the investigation with what could
be totally irrelevant info, but it may prove an interesting data point.

There are obvious common low level points between UFS and ZFS which
may be the cause. One area which springs to mind is device bio ordering
and barriers which could well be impacted by sync IO requests independent
of the FS in use.

>> One other option to test, just to rule it out, is what happens if you
>> use BSD scheduler instead of ULE?
> I will test that but first I have to get the test machine to reliably
> stall so I know I'm not chasing my tail.

Very sensible.

Assuming you can reproduce it, one thing that might be interesting to
try is to eliminate all sync IO. I'm not sure if there are options in
Postgres to do this via configuration or if it would require editing
the code but this could reduce the problem space.

If disabling sync IO eliminated the problem it would go a long way
to proving it isn't the IO volume or pattern per se but instead
related to the sync nature of said IO.
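
As a rough illustration of that comparison, here is a minimal sketch that
writes the same volume of 8k blocks twice, once with an fsync after every
block and once without. It is only a probe under stated assumptions: the
file name, block count and function names are made up for the example, and
it does not reproduce how Postgres actually issues its IO.

```python
import os
import tempfile
import time

BLOCK = 8192  # same size as the recordsize=8k mentioned earlier in the thread


def write_blocks(path, count, sync):
    """Write `count` blocks to `path`, calling fsync after each one if `sync`."""
    buf = b"\0" * BLOCK
    start = time.monotonic()
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        for _ in range(count):
            os.write(fd, buf)
            if sync:
                os.fsync(fd)  # force this block to stable storage before continuing
    finally:
        os.close(fd)
    return time.monotonic() - start


def compare(directory, count=256):
    """Run the buffered and the synced pass with identical volume and pattern."""
    path = os.path.join(directory, "syncio_probe.dat")  # hypothetical file name
    try:
        async_secs = write_blocks(path, count, sync=False)
        bytes_async = os.path.getsize(path)
        sync_secs = write_blocks(path, count, sync=True)
        bytes_sync = os.path.getsize(path)
    finally:
        if os.path.exists(path):
            os.remove(path)
    return {
        "bytes_async": bytes_async,
        "bytes_sync": bytes_sync,
        "async_secs": async_secs,
        "sync_secs": sync_secs,
    }


if __name__ == "__main__":
    # Point this at a directory on the filesystem under test instead of a
    # temporary directory to get meaningful numbers.
    with tempfile.TemporaryDirectory() as scratch:
        print(compare(scratch))
```

Running it while watching iostat should show whether the synced pass alone
brings back the bursty behaviour, since the volume and pattern are the same
in both passes.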


