Re: amd64 syscall ABI (vs. Darwin)

From: Damian's Proton Mail <damian_at_dmcyk.xyz>
Date: Mon, 17 Jan 2022 22:31:09 UTC
> On 17 Jan 2022, at 14:38, Konstantin Belousov <kostikbel@gmail.com> wrote:
>
> On Mon, Jan 17, 2022 at 12:41:59PM +0000, Damian Malarczyk wrote:
>> Hello,
>>
>> I'm hacking on a toy project to run Darwin (MachO) binaries on FreeBSD.
>> Currently I'm at a stage of syscalls support, and I've noticed a difference in the amd64 ABI that I didn't expect.
>>
>> FreeBSD changes the values of some registers that aren't used as syscall output; e.g., r8-r11 are changed, while r12-r15 don't seem to be affected.
>> That's not the case on Darwin: from what I've seen, only rax and rdx, which are used as syscall results, are changed.
>> It looks like FreeBSD's syscall calling convention is more like a standard function call, where r8-r11 are always caller-saved.
> It is not 'more like'. FreeBSD follows C ABI for amd64 for syscall
> registers handling. An additional twist is that the registers which are
> declared as callee-clobbered are zeroed to avoid kernel data leakage to
> userspace.
Oh I see, this explains it then.
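
For anyone issuing raw syscalls from C, the practical consequence can be sketched as a clobber list. This is only a minimal illustration, not code from the FreeBSD tree: the helper name raw_getpid is made up, and the clobber list is my reading of the behaviour described above (r8-r11 not surviving the call, rax/rdx carrying results), plus rcx, which the syscall instruction itself overwrites. On other platforms the sketch just falls back to libc.

```c
#include <sys/syscall.h>   /* SYS_getpid */
#include <unistd.h>        /* getpid(), used as fallback and for comparison */

/*
 * Hypothetical helper (name invented for this example): call getpid via
 * a raw `syscall` instruction and declare every register that, per the
 * discussion above, should be treated as caller-saved across it.
 */
static long
raw_getpid(void)
{
#if defined(__x86_64__) && (defined(__FreeBSD__) || defined(__linux__))
	long ret;

	__asm__ volatile (
	    "syscall"
	    : "=a" (ret)                  /* first return value comes back in rax */
	    : "a" ((long)SYS_getpid)      /* syscall number goes in rax */
	    : "rcx", "r11",               /* overwritten by the syscall instruction */
	      "rdx",                      /* second return value register */
	      "r8", "r9", "r10",          /* assumed scratch, as with a C call */
	      "memory", "cc");
	/* getpid cannot fail, so no error check is done here. */
	return (ret);
#else
	/* Other platforms: no raw syscall in this sketch. */
	return ((long)getpid());
#endif
}
```

Calling raw_getpid() should give the same value as getpid(). Over-declaring clobbers is harmless on a kernel that preserves more registers; under-declaring them is what would break on FreeBSD.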

>>
>> At a first glance Darwin approach seems more optimal, as fewer registers get clobbered. Is there any specific reason why this isn't also the case on FreeBSD?
>> I'm also wondering where exactly the register values are changed. When I look at the trapframe contents in the sv_set_syscall_retval system vector callback, the r8 register value is the same as on input, so it must be changed somewhere later. Does anyone know where exactly this happens?
>
> Look at the sys/amd64/amd64/exceptions.S.  The fast_syscall entry point
> is where we receive control after the syscall instruction.
A lot of new things in there for me, but the flow is clear. I was able to find corresponding logic in XNU’s sources too. Earlier I said:

> At a first glance Darwin approach seems more optimal
But it’s actually the opposite, or at best no difference at all: Darwin explicitly restores/sets all registers, including the callee-saved r12-r15.

Explicitly preserving registers would prevent kernel data leakage too. Doing so in FreeBSD would also be an ABI-compatible change, I think, since users shouldn’t rely on the values in those registers.
I’m curious whether you see any obvious pros/cons with either approach, or is it more of an arbitrary implementation choice?

Not that I’d propose changing the ABI, though; I also want my toy project to work as a plug-in kernel module.
I guess the only other option to emulate Darwin's behaviour would be to intercept syscalls in userspace somehow first and manually preserve the register values?
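
To make that last idea concrete, here is a rough sketch of what such a userspace shim could do around a single syscall: stash r8 before entering the kernel and put it back afterwards. Everything here is an assumption for illustration (the function name, using getpid as the syscall, keeping the saved value in a stack slot); a real shim would also have to cover r9 and r10, and would need some way to get control at every syscall site in the Mach-O binary, which this does not attempt.

```c
#include <stdint.h>
#include <sys/syscall.h>   /* SYS_getpid */
#include <unistd.h>

/*
 * Hypothetical shim (not an existing API): load `sentinel` into r8,
 * run getpid via `syscall` with r8 saved and restored around it, and
 * return the value r8 holds afterwards.
 */
static uint64_t
r8_after_preserving_syscall(uint64_t sentinel)
{
#if defined(__x86_64__) && (defined(__FreeBSD__) || defined(__linux__))
	uint64_t saved_r8, seen_r8;
	long ret;

	__asm__ volatile (
	    "movq %[in], %%r8\n\t"     /* pretend the caller left a value in r8 */
	    "movq %%r8, %[save]\n\t"   /* shim: stash r8 before the kernel entry */
	    "syscall\n\t"
	    "movq %[save], %%r8\n\t"   /* shim: restore r8 after the kernel returns */
	    "movq %%r8, %[out]"        /* report what the caller would now see */
	    : "=a" (ret), [save] "=m" (saved_r8), [out] "=&r" (seen_r8)
	    : "a" ((long)SYS_getpid), [in] "r" (sentinel)
	    : "rcx", "rdx", "r8", "r9", "r10", "r11", "memory", "cc");
	(void)ret;                     /* the pid itself is not needed here */
	return (seen_r8);
#else
	/* Other platforms: nothing to demonstrate, hand the value back. */
	return (sentinel);
#endif
}
```

With the save/restore pair in place the function should return the sentinel unchanged, regardless of what the kernel does to r8 on return. The save slot is a plain memory operand rather than a push, so the sketch does not disturb the red zone below the stack pointer.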