Re: Unusual ZFS behaviour

From: Edward Sanford Sutton, III <mirror176_at_hotmail.com>
Date: Wed, 22 Nov 2023 07:27:39 UTC
On 11/22/23 00:04, Eugene Grosbein wrote:
> 22.11.2023 13:49, Jonathan Chen wrote:
>> Hi,
>>
>> I'm running a somewhat recent version of STABLE-13/amd64: stable/13-n256681-0b7939d725ba: Fri Nov 10 08:48:36 NZDT 2023, and I'm seeing some unusual behaviour with ZFS.
>>
>> To reproduce:
>>   1. one big empty disk, GPT scheme, 1 freebsd-zfs partition.
>>   2. create a zpool, eg: tank
>>   3. create 2 sub-filesystems, eg: tank/one, tank/two
>>   4. fill each sub-filesystem with large files until the pool is ~80% full. In my case I had 200 10Gb files in each.
>>   5. in one session run 'md5 tank/one/*'
>>   6. in another session run 'md5 tank/two/*'
>>
>> For most of my runs, one of the sessions against a sub-filesystem will be starved of I/O, while the other one is performant.
>>
>> Is anyone else seeing this?

More details of the disk, disk controller, and FreeBSD version may be 
helpful. If it is SATA, there may be an impact from the drive's own 
reordering of the command queue it is given, on top of the scheduling 
FreeBSD+ZFS have already done, and the OS (plus ZFS, if it is not the 
bundled default) version would tell us what I/O balancing is currently 
present/available.
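
Something along these lines would capture most of that (ada0 is just an 
example device name; the vdev queue sysctls are what the OpenZFS I/O 
scheduler exposes on stable/13):

  freebsd-version -ku
  camcontrol devlist
  camcontrol identify ada0
  sysctl vfs.zfs.vdev | grep -E 'max_active|min_active'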

> 
> Please try repeating the test with atime updates disabled:
> 
> zfs set atime=off tank/one
> zfs set atime=off tank/two

atime's impact is a write, and writes get priority, so if anything there 
would be small breaks in the reads to write that data. I doubt the 
scenario+hardware under discussion is bottlenecking on writing atime data 
for the access of these 10GB files, but it would be interesting to see. 
On the other hand, I think it is atime that trashes a freshly created 
file structure with many files after the default cron jobs pass over it: 
atime updates plus copy-on-write fragment the metadata needed for basic 
things like listing directory contents. I have not properly tested what 
the source of that repeatable issue is yet. Accessing data within the 
files doesn't seem impacted the same way as the directory listing, though.
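
To confirm whether atime is even in play here, checking the property is 
cheap (dataset names as in the original report):

  zfs get atime tank/one tank/two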

I was also thinking that sysctl settings involving prefetch might affect 
this workload; vfs.zfs.prefetch_disable=1 is probably the one I am 
thinking of. I normally don't tweak ZFS and related settings unless I 
have to, as I usually find later that the tweaks become problems themselves.
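
For anyone who wants to experiment, that knob can be read and toggled at 
runtime; depending on the OpenZFS version it may also appear as 
vfs.zfs.prefetch.disable:

  sysctl vfs.zfs.prefetch_disable
  sysctl vfs.zfs.prefetch_disable=1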

I keep my system running more smoothly when I put it under excessive load 
by using idprio and nice on the heavier noninteractive work.
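
For example, assuming the default mountpoints for the datasets above, one 
of the two md5 runs could be demoted to idle priority (idprio may require 
root) or simply niced:

  idprio 31 md5 /tank/one/*
  nice -n 20 md5 /tank/two/*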

> Does it make any difference?
> Does it make any difference, if you import the pool with readonly=on instead?
> 
> Writing to ~80% pool is almost always slow for ZFS.

Been there, done that. It is painful, but I think the trigger is more 
complicated than just a total free-space counter before that issue shows 
up. Other performance issues also exist; I've had horrible I/O on disks 
that never exceeded 20% used since being formatted.
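
When that happens it would be worth capturing the pool-wide numbers 
alongside the symptoms, e.g.:

  zpool list -o name,size,alloc,free,cap,frag,health tank
  zpool iostat -v tank 5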