Re: amd64 syscall ABI (vs. Darwin)

From: Damian's Proton Mail <damian_at_dmcyk.xyz>
Date: Mon, 17 Jan 2022 23:14:55 UTC
> On 17 Jan 2022, at 23:51, Konstantin Belousov <kostikbel@gmail.com> wrote:
>
> On Mon, Jan 17, 2022 at 10:31:09PM +0000, Damian's Proton Mail wrote:
>
>>> On 17 Jan 2022, at 14:38, Konstantin Belousov <kostikbel@gmail.com> wrote:
>>
>>> Look at the sys/amd64/amd64/exceptions.S. The fast_syscall entry point
>>> is where we receive control after the syscall instruction.
>>
>> A lot of new things in there for me, but the flow is clear. I was able to find corresponding logic in XNU’s sources too. Earlier I said:
>>
>>> At first glance the Darwin approach seems more optimal
>>
>> But it’s actually the opposite, or there is no difference at all: Darwin explicitly restores/sets all registers, including the callee-saved r12-r15.
>>
>> Explicitly preserving registers would prevent kernel data leakage too. Doing so in FreeBSD would also be an ABI compatible change I think, since users shouldn’t rely on values in those registers.
>> I’m curious if you see any obvious pros/cons with either approach, or is it just a more arbitrary implementation choice?
>
> We preserve everything on syscall entry; it is the SYSCALL instruction's
> behavior that makes it look somewhat convoluted. I suggest you read
> the SDM description of the SYSCALL instruction to understand the register
> manipulations on entry.
>
> On the other hand, on the fast syscall return, we indeed do not restore
> everything. If you want to restore the full frame, use the PCB_FULL_IRET pcb
> flag to request the iretq return path.
>
>> Not that I’d propose changing the ABI though, I also want my toy project to work as a plug-in kernel module.
>> I guess the only other option to emulate Darwin's behaviour would be to intercept syscalls in userspace somehow first and manually preserve the register values?
>
> To emulate Darwin, you would need a specific ABI personality (sysent) in the
> kernel, which would also provide the sv_syscall_ret method. The method can
> do whatever is needed to the return frame, and set PCB_FULL_IRET to indicate
> that the kernel should load it into the CPU GPR file as is.
>
> BTW, does Darwin use the SYSCALL instruction for syscall entry on amd64?

Yes, it also uses SYSCALL. It likewise uses rax/rdx for return values and the carry bit to indicate errors.
Even the syscall numbers are similar. They use different masks to distinguish BSD/Mach syscalls, but the effective BSD syscall numbers seem to be the same so far.
So I already had sysent hooks, and PCB_FULL_IRET works indeed, thanks!
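
For reference, a minimal sketch of the class masking mentioned above. It assumes the constants as I recall them from XNU's osfmk/mach/i386/syscall_sw.h (the class lives in bits 24-31 of the value passed in eax); please verify them against your XNU tree. The helper names darwin_syscall_class() and darwin_syscall_number() are made up for illustration and are not part of either kernel.

```c
#include <stdint.h>

/*
 * Assumed to match XNU's osfmk/mach/i386/syscall_sw.h; check your tree.
 * The upper byte of the 32-bit syscall value selects the class, the
 * lower 24 bits are the per-class syscall number.
 */
#define SYSCALL_CLASS_SHIFT	24
#define SYSCALL_CLASS_MASK	(0xFFu << SYSCALL_CLASS_SHIFT)
#define SYSCALL_NUMBER_MASK	(~SYSCALL_CLASS_MASK)

#define SYSCALL_CLASS_MACH	1	/* Mach traps */
#define SYSCALL_CLASS_UNIX	2	/* BSD syscalls */

/* Hypothetical helper: extract the syscall class from the raw value. */
static inline uint32_t
darwin_syscall_class(uint32_t raw)
{
	return ((raw & SYSCALL_CLASS_MASK) >> SYSCALL_CLASS_SHIFT);
}

/* Hypothetical helper: extract the per-class syscall number. */
static inline uint32_t
darwin_syscall_number(uint32_t raw)
{
	return (raw & SYSCALL_NUMBER_MASK);
}
```

With these, the Darwin value 0x2000004 decodes to class SYSCALL_CLASS_UNIX and number 4, which is write(2) on FreeBSD as well, so a BSD-class call can be looked up by its masked number.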