Re: register x18

From: Michael Tuexen <tuexen_at_freebsd.org>
Date: Fri, 16 Jul 2021 17:07:42 +0200
> On 16. Jul 2021, at 14:51, Andrew Turner <andrew_at_fubar.geek.nz> wrote:
> 
> 
>> On 16 Jul 2021, at 13:08, tuexen_at_freebsd.org wrote:
>> 
>>> On 16. Jul 2021, at 04:06, Mark Millard <marklmi_at_yahoo.com> wrote:
>>> 
>>> 
>>> 
>>> On 2021-Jul-15, at 17:40, Michael Tuexen <tuexen at freebsd.org> wrote:
>>> 
>>>> Dear all,
>>>> 
>>>> register x18 seems to be special. What is it used for in FreeBSD?
>>>> 
>>>> Best regards
>>>> Michael
>>> 
>>> https://developer.arm.com/documentation/den0024/a/The-ABI-for-ARM-64-bit-Architecture/Register-use-in-the-AArch64-Procedure-Call-Standard/Parameters-in-general-purpose-registers
>>> 
>>> reports:
>>> 
>>> QUOTE
>>> 	• X18 is the platform register and is reserved for the use of platform ABIs. This is an additional temporary register on platforms that don't assign a special meaning to it.
>>> END QUOTE
>>> 
>>> So, special, yes. But I do not know what the "platform ABI" usage
>>> for it might be on FreeBSD. So, for the most part, this does not
>>> well-answer your question. Sorry.
>> Yepp, I found the above text. However, x18 seems to be used when accessing
>> global variables. I am looking at a panic, where the system panics on accessing
>> global variable, which can be controlled by sysctl.
>> It seems that x18 does not have the expected value, but it is also not set in
>> the function...
> 
> X18 is used to store the pointer to the pcpu data. It should only ever be set when we enter the kernel from userland by the exception handler.
Hi Andrew,

thanks for the response. Hmm. I was hoping that the answer would help me understand
a panic that I'm observing when stress testing the TCP RACK stack. I'm transferring
10GB via scp and at some point (not right at the beginning), the machine panics.
The machine is an eMAG system.

Here is what I know:

Initially it panicked multiple times (always at the same place) in
https://cgit.freebsd.org/src/tree/sys/netinet/tcp_stacks/rack.c#n16540
while trying to read V_tcp_map_entries_limit.

I discussed this with rrs_at_ and since we had no clue, I tried to just compile
out the if condition.

Then it panicked in
https://cgit.freebsd.org/src/tree/sys/netinet/tcp_stacks/rack.c#n16928
at
https://cgit.freebsd.org/src/tree/sys/netinet/tcp_stacks/rack.c#n15664
which is basically the next place where a V_ variable is accessed.

Please note that for debugging I'm using a kernel without VIMAGE support,
since we initially thought that it might be related to a VNET bug.

So I decided to look at the disassembly of rack_sndbuf_autoscale (I added some comments):

   0xffff000001388a6c <+0>:	stp	x29, x30, [sp, #-32]!
   0xffff000001388a70 <+4>:	str	x19, [sp, #16]
   0xffff000001388a74 <+8>:	mov	x29, sp
   0xffff000001388a78 <+12>:	ldr	x9, [x0, #24]				// x9 = rack->tp;
   0xffff000001388a7c <+16>:	ldr	w8, [x0, #188]				// w8 = rack->r_ctl.cwnd_to_use
   0xffff000001388a80 <+20>:	adrp	x12, 0xffff0000013ac000
   0xffff000001388a84 <+24>:	ldr	w10, [x9, #52]				// w10 = tp->snd_wnd;
   0xffff000001388a88 <+28>:	ldr	x11, [x18]
   0xffff000001388a8c <+32>:	ldr	x11, [x11, #1256]
   0xffff000001388a90 <+36>:	cmp	w8, w10
   0xffff000001388a94 <+40>:	csel	w10, w8, w10, cc  // cc = lo, ul, last	// min(rack->r_ctl.cwnd_to_use, tp->snd_wnd);
=> 0xffff000001388a98 <+44>:	ldr	x11, [x11, #40]
   0xffff000001388a9c <+48>:	ldr	x12, [x12, #2752]
   0xffff000001388aa0 <+52>:	ldr	w11, [x11, x12]				// w11 = V_tcp_do_autosndbuf ???
   0xffff000001388aa4 <+56>:	cbz	w11, 0xffff000001388be0 <rack_sndbuf_autoscale+372>
   0xffff000001388aa8 <+60>:	ldr	x8, [x0, #32]				// x8 = rack->rc_inp
   0xffff000001388aac <+64>:	ldr	x19, [x8, #120]				// x19 = so = x8->inp_socket
   0xffff000001388ab0 <+68>:	ldrb	w8, [x19, #817]				// w8 = (x19->so_snd.sb_flags >> 8) & 0xff
   0xffff000001388ab4 <+72>:	tbz	w8, #3, 0xffff000001388be0 <rack_sndbuf_autoscale+372>	// branch if (so->so_snd.sb_flags & SB_AUTOSIZE) == 0
   0xffff000001388ab8 <+76>:	ldr	w11, [x9, #52]				// w11 = tp->snd_wnd
   0xffff000001388abc <+80>:	ldr	w8, [x19, #740]				// w8 = so->so_snd.sb_hiwat
   0xffff000001388ac0 <+84>:	lsr	w11, w11, #2
   0xffff000001388ac4 <+88>:	add	w11, w11, w11, lsl #2
   0xffff000001388ac8 <+92>:	cmp	w11, w8
   0xffff000001388acc <+96>:	b.cc	0xffff000001388be0 <rack_sndbuf_autoscale+372>  // b.lo, b.ul, b.last
   0xffff000001388ad0 <+100>:	ldr	w11, [x19, #736]
   0xffff000001388ad4 <+104>:	lsr	w8, w8, #3
   0xffff000001388ad8 <+108>:	lsl	w12, w8, #3
   0xffff000001388adc <+112>:	sub	w8, w12, w8
   0xffff000001388ae0 <+116>:	cmp	w11, w8
   0xffff000001388ae4 <+120>:	b.cc	0xffff000001388be0 <rack_sndbuf_autoscale+372>  // b.lo, b.ul, b.last
   0xffff000001388ae8 <+124>:	ldr	x8, [x18]
   0xffff000001388aec <+128>:	ldr	x8, [x8, #1256]
   0xffff000001388af0 <+132>:	ldr	x12, [x8, #40]
   0xffff000001388af4 <+136>:	adrp	x8, 0xffff0000013ac000
   0xffff000001388af8 <+140>:	ldr	x8, [x8, #2760]
   0xffff000001388afc <+144>:	ldr	w12, [x12, x8]
   0xffff000001388b00 <+148>:	cmp	w11, w12
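
As a reading aid for the two shift sequences at <+84>..<+88> and <+104>..<+112>,
here is a minimal C sketch of what they compute. The function names are
hypothetical, and the assumption that they correspond to source expressions of
the form "x / 4 * 5" and "x / 8 * 7" is mine; only the instruction sequences
are taken from the disassembly.

```c
#include <stdint.h>

/*
 * lsr w11, w11, #2
 * add w11, w11, w11, lsl #2
 * computes q + 4*q with q = x / 4, i.e. (x / 4) * 5.
 */
static uint32_t
scale_5_4(uint32_t x)
{
	uint32_t q = x >> 2;

	return (q + (q << 2));
}

/*
 * lsr w8, w8, #3
 * lsl w12, w8, #3
 * sub w8, w12, w8
 * computes 8*q - q with q = x / 8, i.e. (x / 8) * 7.
 */
static uint32_t
scale_7_8(uint32_t x)
{
	uint32_t q = x >> 3;

	return ((q << 3) - q);
}
```

Both results are compared with an unsigned "lower than" branch (b.cc), so the
arithmetic is plain 32-bit unsigned and does not touch x18 at all.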

So it seems that the code accessing V_tcp_do_autosndbuf is:

   0xffff000001388a80 <+20>:	adrp	x12, 0xffff0000013ac000
...
   0xffff000001388a88 <+28>:	ldr	x11, [x18]
   0xffff000001388a8c <+32>:	ldr	x11, [x11, #1256]
...
=> 0xffff000001388a98 <+44>:	ldr	x11, [x11, #40]
   0xffff000001388a9c <+48>:	ldr	x12, [x12, #2752]
   0xffff000001388aa0 <+52>:	ldr	w11, [x11, x12]				// w11 = V_tcp_do_autosndbuf ???

and for V_tcp_autosndbuf_max it is:
   0xffff000001388ae8 <+124>:	ldr	x8, [x18]
   0xffff000001388aec <+128>:	ldr	x8, [x8, #1256]
   0xffff000001388af0 <+132>:	ldr	x12, [x8, #40]
   0xffff000001388af4 <+136>:	adrp	x8, 0xffff0000013ac000
   0xffff000001388af8 <+140>:	ldr	x8, [x8, #2760]
   0xffff000001388afc <+144>:	ldr	w12, [x12, x8]

The #2752 versus #2760 could be the offset of the variable.
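
One way to read the two immediates: adrp only produces the page base, and the
following "ldr xN, [xM, #imm]" reads a 64-bit slot at page base + imm. The
value found in that slot is what is then used as the index in
"ldr w11, [x11, x12]". Under that reading, #2752 and #2760 locate two adjacent
8-byte slots rather than being the variable offsets themselves. What is stored
in the slots is not visible in the disassembly. A small sketch of the address
arithmetic (the helper name is hypothetical):

```c
#include <stdint.h>

/* Page base produced by the adrp instructions at <+20> and <+136>. */
#define	ADRP_PAGE	0xffff0000013ac000ULL

/* Address read by "ldr xN, [xM, #imm]" once xM holds the adrp page base. */
static uint64_t
slot_address(uint64_t page, uint64_t imm)
{
	return (page + imm);
}
```

With the values above this gives 0xffff0000013acac0 for #2752 and
0xffff0000013acac8 for #2760, which could be inspected in the debugger
(for example with "x/gx" in gdb) to see the actual per-variable values.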

Does the above code make sense to you? The code relevant for the crash seems to be:

0xffff000001388a88 <+28>:	ldr	x11, [x18]
0xffff000001388a8c <+32>:	ldr	x11, [x11, #1256]
0xffff000001388a98 <+44>:	ldr	x11, [x11, #40]

Since it is crashing at 0xffff000001388a98 <+44>, my assumption was that x18 is wrong...
But does this use fit your description?
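
To make the three dependent loads easier to reason about, here is a minimal C
model of the chain. The struct and field names are hypothetical stand-ins; only
the offsets 0, 1256 and 40 come from the disassembly, and the model only
distinguishes NULL from non-NULL pointers, whereas a real fault can also come
from a non-NULL garbage pointer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layouts; only the offsets are taken from the disassembly. */
struct level2 {
	uint8_t		 pad[40];
	void		*base;	/* <+44>: ldr x11, [x11, #40] */
};

struct level1 {
	uint8_t		 pad[1256];
	struct level2	*next;	/* <+32>: ldr x11, [x11, #1256] */
};

struct level0 {
	struct level1	*first;	/* <+28>: ldr x11, [x18] */
};

_Static_assert(offsetof(struct level1, next) == 1256, "offset from <+32>");
_Static_assert(offsetof(struct level2, base) == 40, "offset from <+44>");

/* Number of loads that complete before a NULL pointer would be dereferenced. */
static int
loads_completed(const struct level0 *x18)
{
	const struct level1 *l1;
	const struct level2 *l2;

	if (x18 == NULL)
		return (0);
	l1 = x18->first;	/* load 1 */
	if (l1 == NULL)
		return (1);
	l2 = l1->next;		/* load 2 */
	if (l2 == NULL)
		return (2);
	(void)l2->base;		/* load 3 */
	return (3);
}
```

A fault at <+44> corresponds to the "2" case: the loads at <+28> and <+32>
completed, so x18 pointed at readable memory, and it is the pointer fetched
from offset 1256 that could not be dereferenced. That does not rule out a
wrong x18, since a wrong but mapped x18 could still yield a bad pointer two
steps later.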

I'm trying to debug this on arm64, since I can reproduce it on arm64. But there is
also a bug report that this happens on amd64: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=257195

Any idea what could be wrong? Any hints on how to proceed?

Thank you very much for your help!

Best regards
Michael
> 
> Andrew
Received on Fri Jul 16 2021 - 15:07:42 UTC
