ZFS txg implementation flaw
Allan Jude
freebsd at allanjude.com
Mon Oct 28 20:50:57 UTC 2013
On 2013-10-28 16:48, Slawa Olhovchenkov wrote:
> On Mon, Oct 28, 2013 at 02:28:04PM -0400, Allan Jude wrote:
>
>> On 2013-10-28 14:16, Slawa Olhovchenkov wrote:
>>> On Mon, Oct 28, 2013 at 10:45:02AM -0700, aurfalien wrote:
>>>
>>>> On Oct 28, 2013, at 2:28 AM, Slawa Olhovchenkov wrote:
>>>>
>>>>> I can be wrong.
>>>>> As I see it, ZFS creates a separate thread for each txg write.
>>>>> Also for writing to L2ARC.
>>>>> As a result, up to several thousand threads are created and destroyed per
>>>>> second, along with hundreds of thousands of page allocations, zeroings,
>>>>> mappings, unmappings and frees per second. Very high overhead.
>>>>>
>>>>> In systat -vmstat I see totfr up to 600000, prcfr up to 200000.
>>>>>
>>>>> Estimated overhead -- 30% of system time.
>>>>>
>>>>> Can anybody implement thread and page pool for txg?
>>>> Would lowering vfs.zfs.txg.timeout be a way to tame or mitigate this?
>>> vfs.zfs.txg.timeout: 5
>>>
>>> Only a 5x lowering is possible (less in the real case with bursty writes), and it brings more fragmentation on writes, etc.
>> From my understanding, increasing the timeout, so that you are doing fewer
>> transaction groups, would actually be the way to increase performance,
>> at the cost of 'bursty' writing and the associated uneven latency.
> This (increasing the timeout) dramatically decreases read
> performance because of the very high IO bursts.
It shouldn't affect read performance, except during the flush operations
(every txg.timeout seconds).
If you watch with 'gstat' or 'gstat -f ada.$', you should see the cycle:
the disks read quickly, then every txg.timeout seconds (and maybe for
longer) ZFS flushes the entire transaction group (which may be 100s of
MBs) to the disk. This high write load may make reads slow until the
flush is finished.
Over the course of a full 60 seconds, this should result in higher
total read performance, although it will be uneven: slower during the
write cycle.
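To make that last point concrete, here is a rough back-of-the-envelope
model in Python. It is only a sketch: every number in it (write rate,
flush bandwidth, read speeds) is made up for illustration, and the fixed
per-flush cost is purely an assumption of mine, not something measured
on a real pool. It just shows how fewer, larger flushes could add up to
more total reads over a 60 second window if each flush carries some
fixed cost.

```python
def flush_busy_seconds(window_s, txg_timeout_s, write_rate_mb_s,
                       flush_bw_mb_s, per_flush_overhead_s):
    """Seconds of the window spent flushing transaction groups."""
    flushes = window_s / txg_timeout_s
    # Dirty data accumulated during one txg interval.
    txg_size_mb = write_rate_mb_s * txg_timeout_s
    # Assumption: each flush costs a fixed overhead plus its transfer time.
    one_flush_s = per_flush_overhead_s + txg_size_mb / flush_bw_mb_s
    return min(window_s, flushes * one_flush_s)


def total_read_mb(window_s, txg_timeout_s, *,
                  write_rate_mb_s=20.0,      # hypothetical incoming writes
                  flush_bw_mb_s=200.0,       # hypothetical disk write speed
                  per_flush_overhead_s=0.5,  # hypothetical fixed cost per flush
                  read_fast_mb_s=150.0,      # hypothetical reads, disk idle
                  read_slow_mb_s=30.0):      # hypothetical reads during a flush
    """Total MB read in the window: fast outside flushes, slow during them."""
    busy_s = flush_busy_seconds(window_s, txg_timeout_s, write_rate_mb_s,
                                flush_bw_mb_s, per_flush_overhead_s)
    idle_s = window_s - busy_s
    return idle_s * read_fast_mb_s + busy_s * read_slow_mb_s


if __name__ == "__main__":
    for timeout in (5, 10, 30):
        print(timeout, total_read_mb(60, timeout))
```

With these made-up numbers the 30 second timeout reads more in total
than the 5 second one, but each individual slow period is longer, which
is the uneven latency mentioned above. If the fixed per-flush cost is
set to zero, the totals in this model come out the same, so any real
gain depends on how much each flush actually costs on your hardware.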
--
Allan Jude