From nobody Wed Apr 06 12:02:57 2022
Date: Wed, 06 Apr 2022 14:02:57 +0200
From: egoitz@ramattack.net
To: Eugene Grosbein
Cc: freebsd-hackers@freebsd.org
Subject: Re: Desperate with 870 QVO and ZFS
In-Reply-To: <15a86fae-90fd-951d-50e0-48f9be8b4bbc@grosbein.net>
Message-ID: <109127fb4e43e70cd548fecde2c1f755@ramattack.net>

Hi Eugene,

No... I normally don't have many delete operations... in fact the vast
majority of them are deferred to the night; they are done at 2, 3, or 4
am.

We may have 600 deletes/sec at busy times (according to what I see in
gstat, and calculating for when we have two masters).

I don't think it is a trim issue, because we are removing snapshots on
the same kind of disks as these, but on different machines (of another
service), and there are no issues there... those hold virtual machines,
not mail... but I assume something would be visible there too if trims
were the issue.
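Anyway, just to be able to rule trim out completely, I understand it can
be disabled temporarily and the behaviour compared. The exact knob seems
to depend on the ZFS flavour, and I still have to verify both of these on
our exact releases (<poolname> being whichever pool is affected):

    # Legacy FreeBSD ZFS (12.x and earlier): tunable in /boot/loader.conf
    vfs.zfs.trim.enabled=0

    # OpenZFS (FreeBSD 13.x): autotrim is a per-pool property
    zpool get autotrim <poolname>
    zpool set autotrim=off <poolname>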
I honestly think it could have something to do with concurrency... the
disks have issues when you have perhaps 2200 users, and in peak hours
only!... but how could a disk be suffering from concurrency? The
controller should only be able to perform one operation at a time... so
no parallelism exists there... I can't really understand what happens.

Regards,


On 2022-04-06 13:42, Eugene Grosbein wrote:

> 06.04.2022 18:18, egoitz@ramattack.net wrote:
>
>> Good morning,
>>
>> I write this post with the expectation that perhaps someone could
>> help me :)
>>
>> I am running some mail servers with FreeBSD and ZFS. They use 870 QVO
>> (not EVO or other Samsung SSD) disks as storage. They can easily have
>> from 1500 to 2000 concurrent connections. The machines have 128 GB of
>> RAM and the CPU is almost completely idle. The disk IO is normally at
>> 30 or 40% at most.
>>
>> The problem I'm facing is that they could be running just fine and
>> suddenly, at some peak hour, the IO goes to 60 or 70% and the machine
>> becomes extremely slow.
>
> You should run: gstat -adpI3s
> And monitor all values, especially "deletes": d/s, next KBps and ms/d.
>
> If you have many delete operations (including ZFS snapshot destroying),
> it may result in massive chunks of TRIM operations sent to the SSD.
> Some SSD products have abysmal TRIM performance.
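P.S. To capture what gstat shows exactly when the peak-hour problem
hits, I will try leaving it logging in batch mode (a sketch built from
the flags you suggested; if I read gstat(8) correctly, -B is the endless
batch mode, but I still have to double-check that here):

    # -a: only busy providers, -d: show deletes, -p: physical providers
    # only, -I 3s: 3-second interval, -B: endless batch output for a log
    gstat -Badp -I 3s >> /var/log/gstat-peak.log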
