Re: git: 95f773e59482 - main - i386 copyout_fast: improve detection of a fault on accessing userspace

From: Tijl Coosemans <tijl_at_FreeBSD.org>
Date: Thu, 08 Sep 2022 12:49:09 UTC
On Wed, 7 Sep 2022 23:18:32 +0300 Konstantin Belousov
<kostikbel@gmail.com> wrote:
> On Wed, Sep 07, 2022 at 06:38:04PM +0200, Tijl Coosemans wrote:
>> So interrupts must have been reenabled somehow, probably by the page
>> fault handler, and this allows context switches and then another process
>> can call copyout or copyin and corrupt the trampoline stack and
>> copyout_buf.  
> I do not see where the interrupts could be reenabled in copyout_fast path,
> without or with page fault on the userspace access.

The problem is not with userspace page faults.  Those are treated
specially by the page fault handler in exception.s causing copyout_fast
and copyin_fast to return immediately with EFAULT so copyout and copyin
fall back to doing a slow copy.

The problem is with page faults on the kernel space accesses.  Before
this commit they were also treated specially, and now they are not.  Now 
the page fault handler in exception.s calls trap() which calls
trap_pfault() etc.

> Also I am not sure what should your patch demonstrate.

My patch adds a trylock after cli and an unlock before sti.  If the
trylock fails panic("") is called.  I put the lock at the base of the
trampoline stack so it's per cpu.  The code between cli and sti is
supposed to be uninterruptible so the trylock should never fail, but it
does!  Somehow there's a context switch to another thread that calls
copyout or copyin while the first thread was still busy copying when it
was switched out.  This corrupts the trampoline stack and/or copyout_buf
of the first thread.

If this isn't caused by interrupts being reenabled then the context
switch must be happening in some other way.  And this must be happening
because page faults on kernel space accesses call trap() now.

> Also, pmap_trm_alloc() puts guard page between consequent allocations,
> so trampoline stack overflow must be very careful to not really overflow,
> but just touch enough bytes to give the effect.

I no longer think stack overflow happens.  I tried increasing the
trampoline stack to 4 pages (=KSTACK_PAGES) and the panic still looked
the same, only with stack addresses shifted by 3 pages.

> But lets check the hypothesis.  If interrupts are enabled somehow, then
> processor would execute interrupt/fault handler on the same stack, which
> is trampoline stack and not the kstack.  I added an INVARIANTS check that
> verifies that both trap() and syscall() use kstack.  Could you, please,
> fetch my ast branch and see what it the outcome?

I have not done this yet, but I don't think this will ever trigger.
For copyin_fast, interrupts are guaranteed to be disabled for copying
from user to copyout_buf.  In your ast branch subsequent copying from
copyout_buf to kernel space runs on the kstack so any interrupts here
won't use the trampoline stack.  For copyout_fast, page faults when
copying from the kernel to copyout_buf must be extremely rare because
the kernel just prepared the data to be copied.  And without page faults
interrupts stay disabled for subsequent copying from copyout_buf to user
space.  I've never seen the problem with copyout_fast, only with
copyin_fast.