Adding a V=R mapping for amd64?

Kostik Belousov kostikbel at gmail.com
Thu Sep 30 08:17:10 UTC 2010


On Wed, Sep 29, 2010 at 01:22:56PM -0700, Matthew Fleming wrote:
> On Wed, Sep 29, 2010 at 1:12 PM, Kostik Belousov <kostikbel at gmail.com> wrote:
> > On Wed, Sep 29, 2010 at 12:40:57PM -0700, Matthew Fleming wrote:
> >> I'm hacking around with making a "fast reboot" that puts a copy of the
> >> MBR from disk into address 0x7c00 and, after disabling various
> >> translation bits and stopping other CPUs, branches to it, to skip the
> >> hardware self test that normally happens on boot.
> >>
> >> I haven't gotten to the point of attempting to run the code at 0x7c00
> >> because I'm first hitting a different error.  Despite my attempts to
> >> enter a translation into the hardware page table, I get a panic trying
> >> to write to address 0x7000, where I intended to put the trampoline
> >> code that turns off translation.
> >>
> >> Rebooting...
> >> Attempt to reset to MBR...
> >> XXX attempting pmap_kenter()...
> >> XXX copying bootstrap code...
> >> panic @ time 1285760103.939, thread 0xffffff000775d960: Fatal trap 12:
> >> page fault while in kernel mode
> >>
> >> cpuid = 0
> >> Panic occurred in module kernel loaded at 0xffffffff80100000:
> >>
> >> Stack: --------------------------------------------------
> >> kernel:trap_fatal+0xac
> >> kernel:trap_pfault+0x24c
> >> kernel:trap+0x42e
> >> kernel:bcopy+0x16
> >> kernel:shutdown_reset+0x48
> >> kernel:boot+0x317
> >> kernel:reboot+0x60
> >> kernel:ia32_syscall+0x1cd
> >> --------------------------------------------------
> >> cpuid = 0; apic id = 00
> >> fault virtual address   = 0x7000
> >> fault code              = supervisor write data, page not present
> >> stack pointer           = 0x10:0xffffff8059e07670
> >> frame pointer           = 0x10:0xffffff8059e07780
> >>
> >> Here's what I think is the relevant snippets of code.  Note that I
> >> reserved the vm_page_t for physical page 7 as mbr_page early in boot,
> >> so I know the memory is free.
> >>
> >> void
> >> pmap_kenter_VR(vm_paddr_t pa)
> >> {
> >>       pmap_t pmap = kernel_pmap;
> >>       vm_page_t mpte;
> >>       pd_entry_t *pde;
> >>       pt_entry_t *pte;
> >>
> >>       vm_page_lock_queues();
> >>       PMAP_LOCK(pmap);
> >>       mpte = pmap_allocpte(pmap, pa, M_WAITOK);
> >>
> >>       pde = pmap_pde(pmap, pa);
> >>       if (pde == NULL || (*pde & PG_V) == 0)
> >>               panic("%s: invalid page directory va=%#lx", __func__, pa);
> >>       if ((*pde & PG_PS) != 0)
> >>               panic("%s: attempted pmap_enter on 2MB page", __func__);
> >>       pte = pmap_pde_to_pte(pde, pa);
> >>       if (pte == NULL)
> >>               panic("%s: no pte va=%#lx", __func__, pa);
> >>
> >>       if (*pte != 0) {
> >>               /* Remove extra pte reference. */
> >>               mpte->wire_count--;
> >>       }
> >>         pte_store(pte, pa | PG_RW | PG_V | PG_G | pg_nx);
> >>
> >>       vm_page_unlock_queues();
> >>       PMAP_UNLOCK(pmap);
> >> }
> >>
> >> Then in cpu_reset():
> >>
> >>       /*
> >>        * Establish a V=R mapping for the MBR page, and copy a
> >>        * reasonable guess at the size of the bootstrap code into the
> >>        * beginning of the page.
> >>        */
> >>       printf("XXX attempting pmap_kenter()...\n");
> >>       pmap_kenter_VR(trunc_page(mbaddr));
> >>       printf("XXX copying bootstrap code...\n");
> >>       to_copy = (uintptr_t)xxx_reset_end - (uintptr_t)xxx_reset_real;
> >>       if (to_copy > mbaddr - trunc_page(mbaddr))
> >>               to_copy = mbaddr - trunc_page(mbaddr);
> >>       bcopy(xxx_reset_real, (void *)trunc_page(mbaddr), to_copy);  /* die here */
> >>       printf("XXX attempting to turn off xlation and re-run MBR...\n");
> >>       xxx_reset_real(mbaddr);
> >>
> >>
> >> My first attempt was a call to
> >>       pmap_kenter(trunc_page(0x7c00), trunc_page(0x7c00));
> >> which failed trying to dereference the non-existent PDE.
> >>
> >> My second attempt called
> >>       pmap_enter(kernel_pmap, trunc_page(0x7c00), VM_PROT_WRITE, mbr_page,
> >>           VM_PROT_ALL, 0);
> >> That failed with the same crash as the attempt using pmap_kenter_VR().
> >>
> >> So... any thoughts as to why, after an apparently successful
> >> installation of an xlation, I still get a panic as though there were
> >> no xlation?
> >
> > Weird formatting of backtrace. Is this some proprietary code ?
> 
> Yeah, there's some Isilon local modifications.
> 
> > Why do you try to create 1-1 mapping at all ? The MBR code should be
> > executing in real mode anyway, and the mapping address at the moment of
> > bcopy does not matter at all. I think that the use of PHYS_TO_DMAP()
> > should give you direct mapping.
> 
> I assumed I need trampoline code.  The moment I turn off the xlation
> bits, if the instruction pointer I'm running from is the normal kernel
> addresses, won't I die horribly trying to access 0xffffffff80000000 or
> so in real mode?
Then there is more to it than an identity mapping. You need to turn off
protected mode gradually; in particular, load the segment registers with
descriptors limited to 64KB. After turning off paging, you would also
need an identity-mapped GDT for far jumps to work.
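
To make "64KB limited descriptors" concrete, here is a rough sketch of how
such GDT entries could be encoded. It follows the standard 8-byte segment
descriptor layout; the helper name make_descriptor() and the ACC_* constants
are made up for illustration and are not taken from any kernel source.

```c
#include <assert.h>
#include <stdint.h>

/* Access bytes: present, DPL 0, code/data segment (hypothetical names). */
#define ACC_CODE16 0x9A /* execute/read code */
#define ACC_DATA16 0x92 /* read/write data */

/*
 * Pack base, 20-bit limit, access byte and the 4 flag bits (G, D/B, L, AVL)
 * into one 8-byte segment descriptor.
 */
uint64_t
make_descriptor(uint32_t base, uint32_t limit, uint8_t access, uint8_t flags)
{
    uint64_t d;

    assert(limit <= 0xFFFFF);
    assert(flags <= 0xF);

    d = (uint64_t)(limit & 0xFFFF);               /* limit 15:0  */
    d |= (uint64_t)(base & 0xFFFFFF) << 16;       /* base 23:0   */
    d |= (uint64_t)access << 40;                  /* access byte */
    d |= (uint64_t)((limit >> 16) & 0xF) << 48;   /* limit 19:16 */
    d |= (uint64_t)flags << 52;                   /* G, D/B, L, AVL */
    d |= (uint64_t)(base >> 24) << 56;            /* base 31:24  */
    return (d);
}
```

With base 0, limit 0xFFFF and all flag bits clear (byte granularity, 16-bit
default size), make_descriptor(0, 0xFFFF, ACC_CODE16, 0) gives a code segment
suitable for loading before clearing CR0.PE, and ACC_DATA16 gives the matching
data segment.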

The procedure is documented in the Intel IA-32 architecture manual; see vol. 3A, 9.9.3
"Switching Back to Real-Address Mode".

In fact, there is a magic location in RAM that causes the BIOS to
skip initialization and memory cleanup after a reset, and instead
execute a jmp to a fixed location in real mode. See Ralf Brown's
interrupt list. In my copy, there is:

MEM 0040h:0067h - RESET RESTART ADDRESS
Size:	DWORD
Desc:	this address stores the address at which to resume execution after a
CPU reset (or jump to F000h:FFF0h) when certain magic values are
stored at 0040h:0072h or in CMOS RAM location 0Fh

This is typically used to avoid switching to real mode at all; instead,
the CPU is reset and the BIOS helps you regain control after the reset.
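
As a hedged sketch of what filling in that restart address might look like,
the code below stores a real-mode far pointer at 0040h:0067h. It writes into
a caller-supplied buffer standing in for low physical memory, so it can be
tried in userland; real_mode_linear() and set_reset_vector() are illustrative
names only. Arming the BIOS with the magic value at 0040h:0072h or in CMOS
location 0Fh is deliberately left out; take those values from the list.

```c
#include <stdint.h>

#define RESET_VEC_SEG 0x0040 /* segment of the restart address */
#define RESET_VEC_OFF 0x0067 /* offset of the restart address */

/* Real-mode address translation: linear = segment * 16 + offset. */
uint32_t
real_mode_linear(uint16_t seg, uint16_t off)
{
    return (((uint32_t)seg << 4) + off);
}

/*
 * Store the far pointer seg:off as a little-endian DWORD: the 16-bit
 * offset first, then the 16-bit segment. "lowmem" models physical
 * address 0 and must cover at least the first 0x46B bytes.
 */
void
set_reset_vector(uint8_t *lowmem, uint16_t seg, uint16_t off)
{
    uint32_t at = real_mode_linear(RESET_VEC_SEG, RESET_VEC_OFF);

    lowmem[at + 0] = (uint8_t)(off & 0xFF);
    lowmem[at + 1] = (uint8_t)(off >> 8);
    lowmem[at + 2] = (uint8_t)(seg & 0xFF);
    lowmem[at + 3] = (uint8_t)(seg >> 8);
}
```

In a kernel, the buffer would be whatever mapping of physical page 0 is
available (for example through the direct map), and 0000h:7C00h would be the
obvious target for the MBR case discussed here.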
> 
> > About the #pf that you see. I think that this is due to the fact that
> > you are modifying kernel pmap, while the active one is the pmap of
> > the user process which context issued reboot().
> 
> Isn't the kernel pmap active in the kernel, since I've entered via the
> syscall reboot(2) ?
No, we do not switch pmap and CR3 on the kernel<->user mode transition.
Both amd64 and i386 user pmaps have a copy of the top-level kernel directory
pointers (the PML4 entries in the amd64 case). This way the upper part of the address space
of every process has an identical kernel mapping. See pmap_pinit().
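
A toy model may help show why an entry made in the kernel pmap for a low
address is invisible while a user pmap is active. This is only a sketch under
simplifying assumptions: struct toy_pmap, toy_pinit() and toy_is_mapped() are
invented names, only the top-level table is modelled, and the whole upper
half is copied, whereas the real code copies only specific kernel slots.

```c
#include <stdint.h>
#include <string.h>

#define NPML4E          512 /* entries in one top-level table */
#define TOY_KERNEL_SLOT 256 /* assumption: upper half is the kernel's */

struct toy_pmap {
    uint64_t pml4[NPML4E];
};

/* Bits 47:39 of a virtual address select the PML4 entry. */
unsigned
pml4_index(uint64_t va)
{
    return ((unsigned)((va >> 39) & 0x1FF));
}

/* Build a user pmap: empty lower half, kernel entries copied once. */
void
toy_pinit(struct toy_pmap *user, const struct toy_pmap *kernel)
{
    memset(user, 0, sizeof(*user));
    memcpy(&user->pml4[TOY_KERNEL_SLOT], &kernel->pml4[TOY_KERNEL_SLOT],
        (NPML4E - TOY_KERNEL_SLOT) * sizeof(uint64_t));
}

/* Would the active top-level table even start a walk for va? */
int
toy_is_mapped(const struct toy_pmap *active, uint64_t va)
{
    return (active->pml4[pml4_index(va)] != 0);
}
```

Address 0x7000 selects slot 0, which lies in the per-process lower half. An
entry later stored in slot 0 of the kernel pmap never reaches the table that
CR3 points to during the reboot() syscall, so the access still faults, while
0xffffffff80000000 selects slot 511 and is visible from every process.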