Subject: Re: git: 32a2fed6e71f - stable/13 - openssl: Fix detection of ARMv7 and ARM64 CPU features
Date: Wed, 24 Nov 2021 13:19:16 -0800
To: allanjude@freebsd.org, "freebsd-arm@freebsd.org"
References: <0CEA37B8-CE7F-4BAE-92B7-E71C5FD1BC22@yahoo.com>
In-Reply-To: <0CEA37B8-CE7F-4BAE-92B7-E71C5FD1BC22@yahoo.com>
List-Id: Porting FreeBSD to ARM processors
List-Archive: https://lists.freebsd.org/archives/freebsd-arm
Reply-To: marklmi@yahoo.com
From: Mark Millard via arm

On 2021-Nov-24, at 01:51, Mark Millard wrote:

> [Actually, the main [so: 14] equivalent.]
>
> All Cortex-A72 based . . .
>
> First, older system versions (before that update)
> then after the update:
>
>
> RPi4B 8 GiByte (older FreeBSD first, otherwise new),
> Cortex-A72's:
>
> # openssl speed -evp aes-256-gcm
> . . .
> type                 16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
> aes-256-gcm         51925.92k    58449.46k    60430.32k    61050.13k    61180.98k    61482.75k
>
> type                 16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
> aes-256-gcm         28880.07k    30837.33k    31630.29k    31855.62k    31921.54k    32034.53k
>
> So: slowed down, unlike the other examples below.
>
> # env OPENSSL_armcap=0 openssl speed -evp aes-256-gcm
> . . .
> type                 16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
> aes-256-gcm         51894.33k    58540.45k    60815.22k    61534.47k    61906.84k    62042.10k
>
> So: back to the prior speed.
>
> But all these are based on config.txt containing:
>
> over_voltage=6
> arm_freq=2000
> sdram_freq_min=3200
> force_turbo=1
>
> (The RPi4B has a heat-sink and a fan.)
>
> Note: See later about the RPi4B CPU features.
>
>
> MACCHIATObin Double Shot (older first), Cortex-A72's:
>
> # openssl speed -evp aes-256-gcm
> . . .
> type                 16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
> aes-256-gcm         50808.49k    58466.08k    60769.11k    61444.92k    61767.94k    61707.61k
>
> type                 16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
> aes-256-gcm        163579.14k   456319.27k   786544.01k   940234.41k  1003230.55k  1005671.31k
>
>
> HoneyComb (older first), Cortex-A72's:
>
> # openssl speed -evp aes-256-gcm
> . . .
> type                 16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
> aes-256-gcm         57659.60k    64599.05k    67719.81k    68373.74k    68724.24k    68793.80k
>
> type                 16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
> aes-256-gcm        177925.57k   502311.65k   866287.95k  1036500.35k  1106598.06k  1106721.91k
>
> Rock64 (older first), Cortex-A53's:
>
> # openssl speed -evp aes-256-gcm
> . . .
> type                 16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
> aes-256-gcm         18378.23k    23401.45k    24834.99k    25206.10k    25337.86k    25258.19k
>
> type                 16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
> aes-256-gcm         52711.29k   163586.49k   318738.69k   420277.93k   461373.44k   463192.06k
>
>
> OPi+2E (older first), Cortex-A7's (so armv7):
>
> # openssl speed -evp aes-256-gcm
> . . .
> type                 16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
> aes-256-gcm          9343.10k    11156.39k    11827.64k    11995.30k    12025.86k    12031.32k
>
> type                 16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
> aes-256-gcm         11013.41k    13598.44k    14034.26k    15045.97k    15262.90k    15302.66k
>
>
>
> For reference:
>
> For the RPi4B examples (2 notes added):
>
> CPU 0: ARM Cortex-A72 r0p3 affinity: 0
> Cache Type = <64 byte D-cacheline,64 byte I-cacheline,PIPT ICache,64 byte ERG,64 byte CWG>
> Instruction Set Attributes 0 =
> *** NOTE the lack of ",SHA2,SHA1,AES+PMULL" above ***
> Instruction Set Attributes 1 = <>
> Processor Features 0 =
> Processor Features 1 = <>
> Memory Model Features 0 =
> Memory Model Features 1 = <8bit VMID>
> Memory Model Features 2 = <32bit CCIDX,48bit VA>
> Debug Features 0 =
> Debug Features 1 = <>
> Auxiliary Features 0 = <>
> Auxiliary Features 1 = <>
> AArch32 Instruction Set Attributes 5 =
> *** NOTE the lack of ",SHA2,SHA1,AES+VMULL" above ***
> AArch32 Media and VFP Features 0 =
> AArch32 Media and VFP Features 1 =
>
> For the MACCHIATObin Double Shot examples:
>
> CPU 0: ARM Cortex-A72 r0p1 affinity: 0 0
> Cache Type = <64 byte D-cacheline,64 byte I-cacheline,PIPT ICache,64 byte ERG,64 byte CWG>
> Instruction Set Attributes 0 =
> Instruction Set Attributes 1 = <>
> Processor Features 0 =
> Processor Features 1 = <>
> Memory Model Features 0 =
> Memory Model Features 1 = <8bit VMID>
> Memory Model Features 2 = <32bit CCIDX,48bit VA>
> Debug Features 0 =
> Debug Features 1 = <>
> Auxiliary Features 0 = <>
> Auxiliary Features 1 = <>
> AArch32 Instruction Set Attributes 5 =
> AArch32 Media and VFP Features 0 =
> AArch32 Media and VFP Features 1 =
>
>
> For the HoneyComb examples:
>
> CPU 0: ARM Cortex-A72 r0p3 affinity: 0 0
> Cache Type = <64 byte D-cacheline,64 byte I-cacheline,PIPT ICache,64 byte ERG,64 byte CWG>
> Instruction Set Attributes 0 =
> Instruction Set Attributes 1 = <>
> Processor Features 0 =
> Processor Features 1 = <>
> Memory Model Features 0 =
> Memory Model Features 1 = <8bit VMID>
> Memory Model Features 2 = <32bit CCIDX,48bit VA>
> Debug Features 0 =
> Debug Features 1 = <>
> Auxiliary Features 0 = <>
> Auxiliary Features 1 = <>
> AArch32 Instruction Set Attributes 5 =
> AArch32 Media and VFP Features 0 =
> AArch32 Media and VFP Features 1 =
>
>
>
>
> For the Rock64 examples:
>
> CPU 0: ARM Cortex-A53 r0p4 affinity: 0
> Cache Type = <64 byte D-cacheline,64 byte I-cacheline,VIPT ICache,64 byte ERG,64 byte CWG>
> Instruction Set Attributes 0 =
> Instruction Set Attributes 1 = <>
> Processor Features 0 =
> Processor Features 1 = <>
> Memory Model Features 0 =
> Memory Model Features 1 = <8bit VMID>
> Memory Model Features 2 = <32bit CCIDX,48bit VA>
> Debug Features 0 =
> Debug Features 1 = <>
> Auxiliary Features 0 = <>
> Auxiliary Features 1 = <>
> AArch32 Instruction Set Attributes 5 =
> AArch32 Media and VFP Features 0 =
> AArch32 Media and VFP Features 1 =
>
>
> For the OPi+2E examples:
>
> CPU: ARM Cortex-A7 r0p5 (ECO: 0x00000000)
> CPU Features:
>   Multiprocessing, Thumb2, Security, Virtualization, Generic Timer, VMSAv7,
>   PXN, LPAE, Coherent Walk
> Optional instructions:
>   SDIV/UDIV, UMULL, SMULL, SIMD(ext)
> LoUU:2 LoC:3 LoUIS:2
> Cache level 1:
>  32KB/64B 4-way data cache WB Read-Alloc Write-Alloc
>  32KB/32B 2-way instruction cache Read-Alloc
> Cache level 2:
>  512KB/64B 8-way unified cache WB Read-Alloc Write-Alloc

Note: as the issue applies to stable/13 and main [so: 14] (for
example), I continue to use the freebsd-arm list instead of a list
that reports commits to stable/* but not to main.
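For a quick read of the quoted before/after tables, the 16384-byte
aes-256-gcm columns can be reduced to one ratio per board. The sketch
below only copies figures from the quoted output; the names RESULTS and
speed_ratios are made up for illustration and are not part of any tool.

```python
# 16384-byte aes-256-gcm throughput (in "k" units as printed by
# openssl speed), copied from the quoted tables above.
# Each entry is (before the update, after the update).
RESULTS = {
    "RPi4B": (61482.75, 32034.53),
    "MACCHIATObin": (61707.61, 1005671.31),
    "HoneyComb": (68793.80, 1106721.91),
    "Rock64": (25258.19, 463192.06),
    "OPi+2E": (12031.32, 15302.66),
}


def speed_ratios(results):
    """Return the after/before throughput ratio for each board."""
    return {board: after / before for board, (before, after) in results.items()}


if __name__ == "__main__":
    for board, ratio in speed_ratios(RESULTS).items():
        verdict = "slower" if ratio < 1.0 else "faster"
        print(f"{board:14s} {ratio:6.2f}x ({verdict})")
```

Only the RPi4B entry comes out below 1.0, which matches the "slowed
down, unlike the other examples" remark in the quoted text.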
Relative to:

#define HWCAP_FP                0x00000001
#define HWCAP_ASIMD             0x00000002
#define HWCAP_EVTSTRM           0x00000004
#define HWCAP_AES               0x00000008
#define HWCAP_PMULL             0x00000010
#define HWCAP_SHA1              0x00000020
#define HWCAP_SHA2              0x00000040
#define HWCAP_CRC32             0x00000080

The single-bit enabled OPENSSL_armcap that gets the slow result is:

# env OPENSSL_armcap=1 openssl speed -evp aes-256-gcm
. . .
type                 16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-256-gcm         28427.04k    30712.32k    31446.00k    31683.40k    31829.10k    31839.55k

The illegal instruction ones for aes-256-gcm were:

# env OPENSSL_armcap=4 openssl speed -evp aes-256-gcm
Doing aes-256-gcm for 3s on 16 size blocks: Illegal instruction (core dumped)

# env OPENSSL_armcap=32 openssl speed -evp aes-256-gcm
Doing aes-256-gcm for 3s on 16 size blocks: Illegal instruction (core dumped)

(sha256 does not match for what is illegal.)

Ignoring the illegal-instruction producing bits, HWCAP_FP
mixed with any one of the other bits was also similarly
slow.

As for all the non-illegal-instruction producing bits:
also similarly slow:

# env OPENSSL_armcap=219 openssl speed -evp aes-256-gcm
. . .
type                 16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-256-gcm         28922.63k    30711.51k    31522.15k    31722.15k    31788.97k    31845.03k

Disabling just HWCAP_FP from that got the fast category of
result:

# env OPENSSL_armcap=218 openssl speed -evp aes-256-gcm
. . .
type                 16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-256-gcm         49543.14k    58068.22k    60236.56k    60724.37k    61216.09k    61212.99k


As for sha256 . . .

# env OPENSSL_armcap=0 openssl speed -evp sha256
. . .
type                 16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
sha256              22434.19k    59895.91k   117258.16k   156264.31k   172624.81k   173848.52k

(I'll not list all the similar performing ones but
will list all illegal-instruction producing ones.)
# env OPENSSL_armcap=4 openssl speed -evp sha256
Doing sha256 for 3s on 16 size blocks: 4082055 sha256's in 2.99s
Doing sha256 for 3s on 64 size blocks: 2752520 sha256's in 3.02s
Doing sha256 for 3s on 256 size blocks: 1372584 sha256's in 3.03s
Doing sha256 for 3s on 1024 size blocks: 470215 sha256's in 3.11s
Doing sha256 for 3s on 8192 size blocks: 64700 sha256's in 3.07s
Doing sha256 for 3s on 16384 size blocks: 31847 sha256's in 3.00s
Illegal instruction (core dumped)

# env OPENSSL_armcap=16 openssl speed -evp sha256
Doing sha256 for 3s on 16 size blocks: Illegal instruction (core dumped)

(16 worked for aes-256-gcm but 32 did not.)

So: no significantly slower examples of single enabled bit
cases.

No (non-illegal-instruction) 2-enabled-bits examples were
dissimilar for the speed.

For reference (avoiding illegal-instructions):

# env OPENSSL_armcap=235 openssl speed -evp sha256
. . .
type                 16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
sha256              23185.66k    62689.73k   125814.72k   167981.88k   187833.65k   188968.95k

So: also similar speed.

Need any other specific bit combinations?

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in
early 2018-Mar)
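
For reference, the combined OPENSSL_armcap values used in this message
(219, 218, 235) can be broken down against the HWCAP_* list the message
gives. This is only a sketch: it assumes, as the message does, that the
mask is read relative to those HWCAP_* values, which has not been
checked here against OpenSSL's own bit layout. The helper names
set_bits and clear_bits are hypothetical.

```python
# Bit values exactly as listed in the message's HWCAP_* defines.
# ASSUMPTION: OPENSSL_armcap is interpreted relative to this list,
# following the message's framing, not verified against OpenSSL.
HWCAP_BITS = {
    0x01: "HWCAP_FP",
    0x02: "HWCAP_ASIMD",
    0x04: "HWCAP_EVTSTRM",
    0x08: "HWCAP_AES",
    0x10: "HWCAP_PMULL",
    0x20: "HWCAP_SHA1",
    0x40: "HWCAP_SHA2",
    0x80: "HWCAP_CRC32",
}


def set_bits(mask):
    """Names of the listed bits that are set in mask, lowest bit first."""
    return [name for bit, name in sorted(HWCAP_BITS.items()) if mask & bit]


def clear_bits(mask):
    """Names of the listed bits that are NOT set in mask, lowest bit first."""
    return [name for bit, name in sorted(HWCAP_BITS.items()) if not mask & bit]


if __name__ == "__main__":
    for mask in (219, 218, 235):
        print(f"{mask} ({mask:#04x}): clear = {', '.join(clear_bits(mask))}")
```

Under that assumption, 219 leaves out values 4 and 32 (the two that hit
illegal instructions for aes-256-gcm), 218 additionally leaves out
value 1, and 235 leaves out values 4 and 16 (the two that hit illegal
instructions for sha256).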