ZFS "stalls" -- and maybe we should be talking about defaults?

Karl Denninger karl at denninger.net
Thu Mar 7 19:07:18 UTC 2013


On 3/7/2013 12:57 PM, Steven Hartland wrote:
>
> ----- Original Message ----- From: "Karl Denninger" <karl at denninger.net>
>> Where I am right now is this:
>>
>> 1. I *CANNOT* reproduce the spins on the test machine with Postgres
>> stopped in any way.  Even with multiple ZFS send/recv copies going on
>> and the load average north of 20 (due to all the geli threads), the
>> system doesn't stall or produce any notable pauses in throughput.  Nor
>> does the system RAM allocation get driven hard enough to force paging.
>> This is with NO tuning hacks in /boot/loader.conf.  I/O performance is
>> both stable and solid.
>>
>> 2. WITH Postgres running as a connected hot spare (identical to the
>> production machine), allocating ~1.5G of shared, wired memory, and
>> running the same synthetic workload as in (1) above, I am getting SMALL
>> versions of the misbehavior.  However, while system RAM gets driven
>> pretty hard, with free memory dropping toward 100MB in some instances,
>> it doesn't get driven hard enough to allocate swap.  The "burstiness"
>> is very evident in the iostat figures, with throughput falling into the
>> single-digit MB/sec range from time to time, but it's not enough to
>> drive the system to a full-on stall.
>>
>> There's pretty clearly a bad interaction here between Postgres wiring
>> memory and the ARC when the latter is left alone and allowed to do what
>> it wants.  I'm continuing to work on replicating this on the test
>> machine... just not completely there yet.
>
> Another possibility to consider is how Postgres uses the FS. For example,
> does it request sync I/O in ways that are not present without it, causing
> the FS, and possibly the underlying disk system, to behave differently?
>
That's possible but not terribly likely in this particular instance.
The reason is that I ran into this with the Postgres data store on a UFS
volume BEFORE I converted it.  Now it's on the ZFS pool (with
recordsize=8k, as recommended for Postgres) but when I first ran into
this the data store was on a separate UFS filesystem (where it had
resided for 2+ years without incident), so unless Postgres' use of a UFS
volume could somehow give ZFS fits, it's unlikely to be involved.
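
For concreteness, here's a rough sketch of the kind of settings under
discussion.  The pool/dataset names, mountpoint, and the 4G cap are all
made up for illustration, not values from this thread:

    # Postgres data on its own dataset with an 8k recordsize to match
    # Postgres' 8k page size (pool and dataset names are hypothetical):
    zfs create -o recordsize=8k -o mountpoint=/usr/local/pgsql/data tank/pgsql

    # One obvious non-default to experiment with: cap the ARC so it can't
    # fight with Postgres' wired shared memory, via /boot/loader.conf:
    vfs.zfs.arc_max="4G"    # value purely illustrative

Whether pinning the ARC like that actually makes the stalls go away is
exactly what still needs testing here.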

> One other option to test, just to rule it out, is what happens if you
> use the 4BSD scheduler instead of ULE?
>
>    Regards
>    Steve
>

I will test that, but first I have to get the test machine to stall
reliably so I know I'm not chasing my tail.
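
For anyone following along, that scheduler swap would be the usual
kernel-config change, roughly as follows (the kernel config name here is
made up):

    # in the custom kernel config, replace
    options SCHED_ULE       # default ULE scheduler
    # with
    options SCHED_4BSD      # traditional 4BSD scheduler

    # then rebuild, reinstall and reboot:
    cd /usr/src
    make buildkernel KERNCONF=MYKERNEL
    make installkernel KERNCONF=MYKERNEL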


-- 
-- Karl Denninger
/The Market Ticker ®/ <http://market-ticker.org>
Cuda Systems LLC

