From nobody Fri Jul 16 19:32:49 2021
From: Michael Tuexen
Subject: Re: register x18
Date: Fri, 16 Jul 2021 21:32:49 +0200
To: Andrew Turner
Cc: Mark Millard, freebsd-arm@freebsd.org
List-Id: Porting FreeBSD to ARM processors
List-Archive: https://lists.freebsd.org/archives/freebsd-arm
Message-Id: <06B96A5D-AF14-4EEC-8D11-B91F9683A0E8@freebsd.org>
References: <86EC9C12-F90C-4D0C-BFA3-41986C9F07B5@freebsd.org> <32C24DDC-C8A1-43CD-9220-8009B229E452@freebsd.org> <4361A215-BB47-4166-BC3F-386E7834B788@freebsd.org>

> On 16. Jul 2021, at 17:53, Andrew Turner wrote:
>
>
>> On 16 Jul 2021, at 17:07, Michael Tuexen wrote:
>>
>>> On 16. Jul 2021, at 14:51, Andrew Turner wrote:
>>>
>>>
>>>> On 16 Jul 2021, at 13:08, tuexen@freebsd.org wrote:
>>>>
>>>>> On 16. Jul 2021, at 04:06, Mark Millard wrote:
>>>>>
>>>>>
>>>>>
>>>>> On 2021-Jul-15, at 17:40, Michael Tuexen wrote:
>>>>>
>>>>>> Dear all,
>>>>>>
>>>>>> register x18 seems to be special. What is it used for in FreeBSD?
>>>>>>
>>>>>> Best regards
>>>>>> Michael
>>>>>
>>>>> https://developer.arm.com/documentation/den0024/a/The-ABI-for-ARM-64-bit-Architecture/Register-use-in-the-AArch64-Procedure-Call-Standard/Parameters-in-general-purpose-registers
>>>>>
>>>>> reports:
>>>>>
>>>>> QUOTE
>>>>> • X18 is the platform register and is reserved for the use of platform ABIs. This is an additional temporary register on platforms that don't assign a special meaning to it.
>>>>> END QUOTE
>>>>>
>>>>> So, special, yes. But I do not know what the "platform ABI" usage
>>>>> for it might be on FreeBSD. So, for the most part, this does not
>>>>> really answer your question. Sorry.
>>>> Yep, I found the above text. However, x18 seems to be used when accessing
>>>> global variables. I am looking at a panic, where the system panics on accessing
>>>> a global variable, which can be controlled by sysctl.
>>>> It seems that x18 does not have the expected value, but it is also not set in
>>>> the function...
>>>
>>> X18 is used to store the pointer to the pcpu data. It should only ever be set when we enter the kernel from userland by the exception handler.
>> Hi Andrew,
>>
>> thanks for the response. Hmm. I was hoping that the answer would help me to understand
>> a panic that I'm observing when stress testing the TCP RACK stack.
>> I'm transferring 10GB via scp and at some point (not right at the beginning), the machine panics.
>> The machine is an eMAG system.
>>
>> Here is what I know:
>>
>> Initially it panics multiple times (always at the same place) in
>> https://cgit.freebsd.org/src/tree/sys/netinet/tcp_stacks/rack.c#n16540
>> when it is trying to read V_tcp_map_entries_limit.
>>
>> I discussed this with rrs@ and since we had no clue, I tried to just compile
>> out the if condition.
>>
>> Then it panicked in
>> https://cgit.freebsd.org/src/tree/sys/netinet/tcp_stacks/rack.c#n16928
>> at
>> https://cgit.freebsd.org/src/tree/sys/netinet/tcp_stacks/rack.c#n15664
>> which is basically the next place where a V_ variable is accessed.
>>
>> Please note that for debugging I'm using a kernel without VIMAGE support,
>> since we initially thought that it might be related to a VNET bug.
>>
>> So I decided to look at the disassembly of rack_sndbuf_autoscale (I added some comments):
>>
>>    0xffff000001388a6c <+0>:  stp x29, x30, [sp, #-32]!
>>    0xffff000001388a70 <+4>:  str x19, [sp, #16]
>>    0xffff000001388a74 <+8>:  mov x29, sp
>>    0xffff000001388a78 <+12>:  ldr x9, [x0, #24]  // x9 = rack->tp;
>>    0xffff000001388a7c <+16>:  ldr w8, [x0, #188]  // w8 = rack->r_ctl.cwnd_to_use
>>    0xffff000001388a80 <+20>:  adrp x12, 0xffff0000013ac000
>>    0xffff000001388a84 <+24>:  ldr w10, [x9, #52]  // w10 = tp->snd_wnd;
>>    0xffff000001388a88 <+28>:  ldr x11, [x18]
>>    0xffff000001388a8c <+32>:  ldr x11, [x11, #1256]
>>    0xffff000001388a90 <+36>:  cmp w8, w10
>>    0xffff000001388a94 <+40>:  csel w10, w8, w10, cc  // cc = lo, ul, last // min(rack->r_ctl.cwnd_to_use, tp->snd_wnd);
>> => 0xffff000001388a98 <+44>:  ldr x11, [x11, #40]
>>    0xffff000001388a9c <+48>:  ldr x12, [x12, #2752]
>>    0xffff000001388aa0 <+52>:  ldr w11, [x11, x12]  // w11 = V_tcp_do_autosndbuf ???
>>    0xffff000001388aa4 <+56>:  cbz w11, 0xffff000001388be0
>>    0xffff000001388aa8 <+60>:  ldr x8, [x0, #32]  // x8 = rack->rc_inp
>>    0xffff000001388aac <+64>:  ldr x19, [x8, #120]  // x19 = so = x8->inp_socket
>>    0xffff000001388ab0 <+68>:  ldrb w8, [x19, #817]  // w8 = (x19->so_snd.sb_flags >> 8) & 0xff
>>    0xffff000001388ab4 <+72>:  tbz w8, #3, 0xffff000001388be0  // (so->so_snd.sb_flags & SB_AUTOSIZE) == 0
>>    0xffff000001388ab8 <+76>:  ldr w11, [x9, #52]  // w11 = tp->snd_wnd
>>    0xffff000001388abc <+80>:  ldr w8, [x19, #740]  // w8 = so->so_snd.sb_hiwat
>>    0xffff000001388ac0 <+84>:  lsr w11, w11, #2
>>    0xffff000001388ac4 <+88>:  add w11, w11, w11, lsl #2
>>    0xffff000001388ac8 <+92>:  cmp w11, w8
>>    0xffff000001388acc <+96>:  b.cc 0xffff000001388be0  // b.lo, b.ul, b.last
>>    0xffff000001388ad0 <+100>:  ldr w11, [x19, #736]
>>    0xffff000001388ad4 <+104>:  lsr w8, w8, #3
>>    0xffff000001388ad8 <+108>:  lsl w12, w8, #3
>>    0xffff000001388adc <+112>:  sub w8, w12, w8
>>    0xffff000001388ae0 <+116>:  cmp w11, w8
>>    0xffff000001388ae4 <+120>:  b.cc 0xffff000001388be0  // b.lo, b.ul, b.last
>>    0xffff000001388ae8 <+124>:  ldr x8, [x18]
>>    0xffff000001388aec <+128>:  ldr x8, [x8, #1256]
>>    0xffff000001388af0 <+132>:  ldr x12, [x8, #40]
>>    0xffff000001388af4 <+136>:  adrp x8, 0xffff0000013ac000
>>    0xffff000001388af8 <+140>:  ldr x8, [x8, #2760]
>>    0xffff000001388afc <+144>:  ldr w12, [x12, x8]
>>    0xffff000001388b00 <+148>:  cmp w11, w12
>>
>> So it seems that the code accessing V_tcp_do_autosndbuf is:
>>
>>    0xffff000001388a80 <+20>:  adrp x12, 0xffff0000013ac000
>>    ...
>>    0xffff000001388a88 <+28>:  ldr x11, [x18]
>>    0xffff000001388a8c <+32>:  ldr x11, [x11, #1256]
>>    ...
>> => 0xffff000001388a98 <+44>:  ldr x11, [x11, #40]
>>    0xffff000001388a9c <+48>:  ldr x12, [x12, #2752]
>>    0xffff000001388aa0 <+52>:  ldr w11, [x11, x12]  // w11 = V_tcp_do_autosndbuf ???
>>
>> and for V_tcp_autosndbuf_max it is:
>>    0xffff000001388ae8 <+124>:  ldr x8, [x18]
>>    0xffff000001388aec <+128>:  ldr x8, [x8, #1256]
>>    0xffff000001388af0 <+132>:  ldr x12, [x8, #40]
>>    0xffff000001388af4 <+136>:  adrp x8, 0xffff0000013ac000
>>    0xffff000001388af8 <+140>:  ldr x8, [x8, #2760]
>>    0xffff000001388afc <+144>:  ldr w12, [x12, x8]
>>
>> The #2752 versus #2760 could be the offset of the variable.
>>
>> Does the above code make sense to you? The code relevant for the crash seems to be:
>>
>>    0xffff000001388a88 <+28>:  ldr x11, [x18]
>>    0xffff000001388a8c <+32>:  ldr x11, [x11, #1256]
>>    0xffff000001388a98 <+44>:  ldr x11, [x11, #40]
>>
>> Since it is crashing at 0xffff000001388a98 <+44>, my assumption was that x18 is wrong...
>> But does this use fit your description?
>
> This code is loading curthread from the pcpu data, then loading whatever is 1256 bytes within struct thread. I checked the offset of td_vnet and found it was at the correct location so it would appear to be using VIMAGE and has a bad vnet pointer.
>
> The other assembly above also looks like it’s using VIMAGE as they have similar code with the same offsets.
>
>>
>> I'm trying to debug this on arm64, since I can reproduce it on arm64. But there is
>> also a bug report that this happens on amd64: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=257195
>>
>> Any idea what can be wrong? Any hint how to progress?
>
> If you can reproduce on amd64 it might pay to test with KASAN.
>
> How stable is the bad pointer value? It might pay to add KASSERTs to the code to check curvnet (the macro to get td_vnet) is not the bad value, or at least greater than VM_MIN_KERNEL_ADDRESS.
Thank you very much!

I double-checked my kernel config: although I had disabled VIMAGE, it was enabled again.
So, yes, this is a VIMAGE kernel and I guess the problem is related to it.

Your explanations were very helpful.

Best regards
Michael
>
> Andrew
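
For anyone following the thread, the three-load chain discussed above (pcpu, then curthread, then td_vnet, then the per-vnet data) can be modelled in plain userland C. This is only a rough sketch under stated assumptions, not kernel code: the `*_model` structs and `load_vnet_int` are hypothetical stand-ins, the offsets 1256 and 40 are taken from the disassembly, the name `vnet_data_base` for the field at offset 40 is an assumption, and the NULL test is a much simplified version of the kind of check a KASSERT on curvnet would make.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct vnet_model {
	char pad[40];
	uintptr_t vnet_data_base;	/* assumed name for the field at offset 40 */
};

struct thread_model {
	char pad[1256];
	struct vnet_model *td_vnet;	/* offset 1256, as in the disassembly */
};

struct pcpu_model {
	struct thread_model *pc_curthread;	/* first field, read via [x18] */
};

static_assert(offsetof(struct vnet_model, vnet_data_base) == 40,
    "model must match the #40 load");
static_assert(offsetof(struct thread_model, td_vnet) == 1256,
    "model must match the #1256 load");

/*
 * Models the load sequence:
 *   ldr x11, [x18]          pcpu -> curthread
 *   ldr x11, [x11, #1256]   curthread -> td_vnet
 *   ldr x11, [x11, #40]     td_vnet -> data base (faults if td_vnet is bad)
 *   ldr w11, [x11, x12]     data base + per-variable offset
 * Returns 0 on success, -1 if the vnet pointer is NULL.
 */
static int
load_vnet_int(const struct pcpu_model *pcpu, uintptr_t var_offset, int *out)
{
	const struct thread_model *td = pcpu->pc_curthread;
	const struct vnet_model *vnet = td->td_vnet;

	if (vnet == NULL)	/* simplified sanity check on the vnet pointer */
		return (-1);
	memcpy(out, (const void *)(vnet->vnet_data_base + var_offset),
	    sizeof(*out));
	return (0);
}
```

The block has no main() on purpose; to try it, build a pcpu_model that points at a thread_model and a vnet_model, call load_vnet_int() with a small offset into an int array, and then clear td_vnet to see the check reject the access instead of dereferencing a bad pointer.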