Re: Desperate with 870 QVO and ZFS

From: <egoitz_at_ramattack.net>
Date: Wed, 06 Apr 2022 12:02:57 UTC

Hi Eugene, 

No, I normally don't have many delete operations. In fact, the vast
majority of them are left for the night: they run at 2, 3 or 4 am.

We may have 600 deletes/sec at busy times (according to what I see in
gstat, and calculating for the case of having two masters).

I don't think it is a TRIM issue, because we are removing snapshots on
the same model of disks in other machines (belonging to another
service) and there are no issues there. Those machines hold virtual
machines, not mail, but I assume something would show up there too if
TRIM were the issue.

I honestly think it could have something to do with concurrency. The
disks have issues only in peak hours, when there are perhaps 2200
users, for instance. But how could a disk suffer from concurrency? The
controller should only be able to do one operation at a time, so there
is no parallelism there. I can't really understand what is happening.

Regards, 

On 2022-04-06 13:42, Eugene Grosbein wrote:

> 06.04.2022 18:18, egoitz@ramattack.net wrote:
> 
>> Good morning,
>> 
>> I write this post with the expectation that perhaps someone could help me :)
>> 
>> I am running some mail servers with FreeBSD and ZFS. They use Samsung 870 QVO disks (not EVO or other Samsung SSDs) as storage. They can easily have from 1500 to 2000 concurrent connections. The machines have 128 GB of RAM and the CPU is almost completely idle. The disk IO is normally at 30 or 40% at most.
>> 
>> The problem I'm facing is that they could be running just fine and suddenly at some peak hour,
>> the IO goes to 60 or 70% and the machine becomes extremely slow.
> 
> You should run: gstat -adpI3s
> And monitor all values, especially "deletes": d/s, next KBps and ms/d.
> 
> If you have many delete operations (including destroying ZFS snapshots),
> it may result in massive chunks of TRIM operations sent to SSD.
> Some SSD products have abysmal TRIM performance.
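
The gstat check suggested above could be scripted along these lines. This
is only a minimal sketch: the column order is an assumption about what
`gstat -d` prints, the 10 ms threshold is an arbitrary illustrative value,
and the sample rows are made up, not measurements from these servers.

```python
# Column order assumed for `gstat -d` output (an assumption, not from the thread):
# L(q) ops/s r/s kBps ms/r w/s kBps ms/w d/s kBps ms/d %busy Name
COLUMNS = ["lq", "ops_s", "r_s", "r_kbps", "ms_r",
           "w_s", "w_kbps", "ms_w", "d_s", "d_kbps", "ms_d", "busy"]


def parse_gstat_line(line):
    """Parse one data row into a dict; return None for headers or other lines."""
    fields = line.split()
    if len(fields) != len(COLUMNS) + 1:
        return None
    try:
        values = [float(f) for f in fields[:-1]]
    except ValueError:
        return None  # header line: the column titles are not numbers
    row = dict(zip(COLUMNS, values))
    row["name"] = fields[-1]
    return row


def slow_deletes(lines, ms_d_limit=10.0):
    """Return rows whose delete latency (ms/d) exceeds ms_d_limit.

    ms_d_limit is a hypothetical threshold chosen for illustration.
    """
    rows = (parse_gstat_line(line) for line in lines)
    return [r for r in rows if r is not None and r["ms_d"] > ms_d_limit]


# Made-up sample rows for illustration only.
SAMPLE = """\
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w    d/s   kBps   ms/d   %busy Name
    2    900    200   6400    0.4    100   3200    0.9    600  76800   45.2    68.0 ada0
    0    350    150   4800    0.3    190   6080    0.7     10   1280    0.8    31.5 ada1
"""

for row in slow_deletes(SAMPLE.splitlines()):
    print(row["name"], row["d_s"], row["ms_d"])
```

If high d/s values with a large ms/d show up only during peak hours, that
would point back towards TRIM; if deletes stay low while %busy climbs, the
cause is more likely elsewhere.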