Re: git: 95f773e59482 - main - i386 copyout_fast: improve detection of a fault on accessing userspace

From: Konstantin Belousov <kostikbel_at_gmail.com>
Date: Wed, 07 Sep 2022 20:18:32 UTC
On Wed, Sep 07, 2022 at 06:38:04PM +0200, Tijl Coosemans wrote:
> On Tue, 6 Sep 2022 23:17:45 +0200 Tijl Coosemans <tijl@FreeBSD.org> wrote:
> > On Tue, 6 Sep 2022 18:30:01 +0300 Konstantin Belousov
> > <kostikbel@gmail.com> wrote:
> >> I suspect you see that leftover panics, which I am working on right now.  
> > 
> > Yes, it looks like this:
> > 
> > panic: vm_fault_lookup: fault on nofault entry, addr: 0
> > 
> > GNU gdb (GDB) 12.1 [GDB v12.1 for FreeBSD]
> > Copyright (C) 2022 Free Software Foundation, Inc.
> > License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> > This is free software: you are free to change and redistribute it.
> > There is NO WARRANTY, to the extent permitted by law.
> > Type "show copying" and "show warranty" for details.
> > This GDB was configured as "i386-portbld-freebsd14.0".
> > Type "show configuration" for configuration details.
> > For bug reporting instructions, please see:
> > <https://www.gnu.org/software/gdb/bugs/>.
> > Find the GDB manual and other documentation resources online at:
> >     <http://www.gnu.org/software/gdb/documentation/>.
> > 
> > For help, type "help".
> > Type "apropos word" to search for commands related to "word"...
> > Reading symbols from /boot/kernel/kernel...
> > Reading symbols from /usr/lib/debug//boot/kernel/kernel.debug...
> > 
> > Unread portion of the kernel message buffer:
> > panic: vm_fault_lookup: fault on nofault entry, addr: 0
> > time = 1662487582
> > KDB: stack backtrace:
> > db_trace_self_wrapper(5,ffffffff,0,0,0,...) at db_trace_self_wrapper+0x28/frame 0xffc09cf4
> > kdb_backtrace(1a716ae0,100,e340e0,ffc09db4,ffc09e00,...) at kdb_backtrace+0x2b/frame 0xffc09d4c
> > vpanic(bef316,ffc09d88,ffc09d88,ffc09dac,b3181a,...) at vpanic+0xe9/frame 0xffc09d68
> > panic(bef316,be68d1,0,ffc09dc4,480f308,...) at panic+0x14/frame 0xffc09d7c
> > vm_fault_lookup(0,0,1a710701,0,0,...) at vm_fault_lookup+0x13a/frame 0xffc09dac
> > vm_fault(e340e0,0,1,0,0) at vm_fault+0x79/frame 0xffc09e30
> > vm_fault_trap(e340e0,3b,1,0,0,0) at vm_fault_trap+0x44/frame 0xffc09e58
> > trap_pfault(3b,0,0) at trap_pfault+0x119/frame 0xffc09ea0
> > trap(ffc09f6c,8,28,28,e247000,...) at trap+0x2d4/frame 0xffc09f60
> > calltrap() at 0xffc031ff/frame 0xffc09f60
> > --- trap 0xc, eip = 0x3b, esp = 0xffc09fac, ebp = 0xffc033dc ---
> > (null)() at 0x3b/frame 0xffc033dc
> > KDB: enter: panic
> > 
> > 0x009a1dfd in dump_savectx ()
> >     at /home/tijl/freebsd/base/main/sys/kern/kern_shutdown.c:404
> > (kgdb) #0  0x009a1dfd in dump_savectx ()
> >     at /home/tijl/freebsd/base/main/sys/kern/kern_shutdown.c:404
> > Backtrace stopped: Cannot access memory at address 0xffc09af4
> 
> The eip = 0x3b here looks like the size of a copy so I added a test on
> top of your patch to see if copyout_fast or copyin_fast was called a
> second time (see attached patch) because that would push the size where
> eip is in the trapframe.  This triggered after running for a while with
> this backtrace:
> 
> Unread portion of the kernel message buffer:
> panic: 
> time = 1662560141
> KDB: stack backtrace:
> db_trace_self_wrapper(5,ffffffff,0,0,0,...) at db_trace_self_wrapper+0x28/frame 0x1b0bca0c
> kdb_backtrace(1c1f93a0,100,1b0bcb1a,ffc06ff0,141e000,...) at kdb_backtrace+0x2b/frame 0x1b0bca64
> vpanic(b8da7d,1b0bcaa0,1b0bcaa0,1b0bcaac,ffc042af,...) at vpanic+0xe9/frame 0x1b0bca80
> panic(b8da7d,fb028f96,2,0,1b0bcadc,...) at panic+0x14/frame 0x1b0bca94
> __stop_set_sysinit_set(1b0bcb1a,fb028f96,2,141e000) at 0xffc042af/frame 0x1b0bcaac
> copyout(1b0bcb1a,fb028f96,2) at copyout+0x58/frame 0x1b0bcadc
> pollout(fb028f90,2) at pollout+0x3c/frame 0x1b0bcb04
> kern_poll(1c1f93a0,fb028f90,2,1b0bcc3c,0) at kern_poll+0x83/frame 0x1b0bcc20
> sys_poll(1c1f93a0,1c1f9644) at sys_poll+0x54/frame 0x1b0bcc48
> syscallenter(1,0,1c1f93a0,1b0bcd30,480f318,...) at syscallenter+0xc0/frame 0x1b0bcc78
> syscall(1b0bcce8,3b,3b,3b,3dfb1300,...) at syscall+0x28/frame 0x1b0bccdc
> Xint0x80_syscall() at 0xffc03419/frame 0x1b0bccdc
> --- syscall (4, FreeBSD ELF32, sys_write), eip = 0x2058c67b, esp = 0xffbfc524, ebp = 0x1 ---
> KDB: enter: panic
> 
> 
> So interrupts must have been reenabled somehow, probably by the page
> fault handler, and this allows context switches and then another process
> can call copyout or copyin and corrupt the trampoline stack and
> copyout_buf.
I do not see where the interrupts could be reenabled in copyout_fast path,
without or with page fault on the userspace access.  Also I am not sure
what should your patch demonstrate.

Also, pmap_trm_alloc() puts guard page between consequent allocations,
so trampoline stack overflow must be very careful to not really overflow,
but just touch enough bytes to give the effect.

But lets check the hypothesis.  If interrupts are enabled somehow, then
processor would execute interrupt/fault handler on the same stack, which
is trampoline stack and not the kstack.  I added an INVARIANTS check that
verifies that both trap() and syscall() use kstack.  Could you, please,
fetch my ast branch and see what it the outcome?