From: "Gavin D. Howard"
Date: Sat, 14 Sep 2024 02:24:14 +0000
To: "freebsd-hackers@FreeBSD.org"
Subject: Re: The Case for Rust (in any system)

> Try and explain this for example:
>
> Sorting int array with clang++18 and subscripts...
> User time = 4.74 seconds (.07900 minutes) (.00131 hours).
> RSS = 4204 KB
>
> Sorting long array with clang++18 and subscripts...
> User time = 2.22 seconds (.03700 minutes) (.00061 hours).
> RSS = 4608 KB

A new, curious participant here.

My guess is that the ints are being extended to longs inside the loop, which would require an extra sign-extension instruction.

I don't think that alone explains the time doubling, but the cost of executing that one instruction may not be the only performance loss it causes. It may actually be the straw that broke the L1 camel's back: without it, the loop may fit in the L1 instruction cache; with it, the loop may overflow the cache, causing misses into L2 on every iteration. It would also occupy one of the arithmetic units, which could reduce instruction-level parallelism or give the compiler less room to unroll the loop.

Just a theory; I have no clue. If you have code to share, I'd love to see it and try to reproduce the effect.

Gavin Howard