From nobody Wed Apr 06 11:26:24 2022
List-Id: Technical discussions relating to FreeBSD
List-Archive: https://lists.freebsd.org/archives/freebsd-hackers
Sender: owner-freebsd-hackers@freebsd.org
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="=_7b94aa3284999df1ebddce202b1f53e2"
Date: Wed, 06 Apr 2022 13:26:24 +0200
From: egoitz@ramattack.net
To: Freebsd hackers
Cc: owner-freebsd-hackers@freebsd.org
Subject: Re: Desperate with 870 QVO and ZFS
In-Reply-To: <6cf6c03c5a4aa8128575ec4e2f70b168@ramattack.net>
References: <6cf6c03c5a4aa8128575ec4e2f70b168@ramattack.net>

--=_7b94aa3284999df1ebddce202b1f53e2
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset=UTF-8

The strangest thing is this: when the machine boots, the ARC grows to about 40GB used (for instance), but later it shrinks to 20GB (and that is not an example, it is the exact figure) on all my servers. It is as if the ARC metadata, which is roughly 17GB, were limiting the whole ARC.

Given the traffic on these machines, I would expect the ARC to be larger than it is. The ARC is limited to 64GB in loader.conf (half the RAM these machines have).

On 2022-04-06 13:18, egoitz@ramattack.net wrote:

> WARNING: This email was sent from outside the organization. Do not click links or open attachments unless you recognize the sender and know the content is safe.
>
> Good morning,
>
> I am writing this post in the hope that someone can help me.
>
> I am running some mail servers with FreeBSD and ZFS. They use Samsung 870 QVO disks (not EVO or other Samsung SSD models) as storage. They can easily have from 1500 to 2000 concurrent connections. The machines have 128GB of RAM and the CPU is almost completely idle. The disk I/O is normally at 30 to 40% at most.
>
> The problem I'm facing is that they can be running just fine and then suddenly, at some peak hour, the I/O goes to 60 or 70% and the machine becomes extremely slow. ZFS is all at defaults, except the sync parameter, which is set to disabled. Apart from that, the ARC is limited to 64GB. But even this is extremely odd: the used ARC is near 20GB. I have seen that the metadata cache in the ARC is very close to the limit that FreeBSD sets automatically depending on the ARC size you configure. It seems that almost all of the ARC is used by the metadata cache. I have seen this effect on all my mail servers with this hardware and software configuration.
>
> I am attaching zfs-stats output, but it was taken now, when the servers are not as loaded as described. Let me explain. I run a couple of Cyrus instances on each of these servers, one as master and one as slave. The situation described above happens when both Cyrus instances become master, that is, when two Cyrus instances are giving service on the same machine. To avoid issues, we have now balanced the load so that each server has one master and one slave. A slave instance has almost no I/O and only a single connection for replication. So the zfs-stats output reflects, let's say, half the load on each server, because each one has one master and one slave instance.
>
> As said before, when I place two masters on the same server, it may work fine all day, but then at 11:00 am (for example) the I/O goes to 60% (it doesn't increase further) and it seems as if the I/O could not be served beyond a certain limit, a concrete I/O limit (I'd say 60%).
>
> I don't really know whether the QVO technology could be the culprit here, because these are said to be desktop-class disks. But then again, I get good performance when copying mailboxes, for instance, five at a time. I can saturate a gigabit interface when copying mailboxes between servers five at a time, so the disks seem to perform.
>
> Could anyone please shed some light on this issue? I don't really know what to think.
>
> Best regards,

--=_7b94aa3284999df1ebddce202b1f53e2
Content-Transfer-Encoding: quoted-printable
Content-Type: text/html; charset=UTF-8

--=_7b94aa3284999df1ebddce202b1f53e2--
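
The ARC figures discussed in this message (ARC size against its configured maximum, and metadata usage against the metadata limit) could be compared with a small script. What follows is only a minimal sketch, not something from the original post. It assumes the counters come from `sysctl kstat.zfs.misc.arcstats` on FreeBSD and that the counters `size`, `c_max`, `arc_meta_used`, and `arc_meta_limit` are present; these names differ between ZFS versions, so missing counters are reported as `None`. The function names and the sample values are hypothetical (the sample loosely mirrors the 64GB / 20GB / 17GB figures mentioned above, and the metadata limit in it is made up).

```python
GIB = 1024 ** 3
PREFIX = "kstat.zfs.misc.arcstats."  # assumed sysctl prefix on FreeBSD


def parse_arcstats(text):
    """Return {counter_name: int} for each 'kstat.zfs.misc.arcstats.X: N' line."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        name = name.strip()
        if not sep or not name.startswith(PREFIX):
            continue  # not an arcstats line
        try:
            stats[name[len(PREFIX):]] = int(value.strip())
        except ValueError:
            continue  # skip non-numeric values
    return stats


def ratio(stats, used_key, limit_key):
    """Return used/limit, or None if a counter is missing or the limit is zero."""
    used, limit = stats.get(used_key), stats.get(limit_key)
    if used is None or not limit:
        return None
    return used / limit


def summarize_arc(text):
    """Summarize how full the ARC is and how much of it is metadata."""
    stats = parse_arcstats(text)
    return {
        "arc_size_gib": stats["size"] / GIB if "size" in stats else None,
        "arc_fill": ratio(stats, "size", "c_max"),
        "meta_fill": ratio(stats, "arc_meta_used", "arc_meta_limit"),
        "meta_share_of_arc": ratio(stats, "arc_meta_used", "size"),
    }


# Illustrative sample only (values in GiB); these are not measurements.
SAMPLE = "\n".join(
    f"{PREFIX}{name}: {gib * GIB}"
    for name, gib in {
        "size": 20,
        "c_max": 64,
        "arc_meta_used": 17,
        "arc_meta_limit": 18,  # made-up limit for the example
    }.items()
)

summary = summarize_arc(SAMPLE)
for key, value in summary.items():
    print(key, "n/a" if value is None else f"{value:.2f}")
```

On a real host, the text passed to `summarize_arc` would be the output of `sysctl kstat.zfs.misc.arcstats` instead of `SAMPLE`. A `meta_share_of_arc` close to 1 together with a low `arc_fill` would match the symptom described above, where metadata appears to account for nearly all of a much smaller than expected ARC.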