Re: Still did not succeed to boot on Lenovo Yoga C630

From: Warner Losh <imp_at_bsdimp.com>
Date: Sun, 18 Dec 2022 20:33:23 UTC
On Sun, Dec 18, 2022 at 4:31 AM Hiroo Ono (小野寛生) <
hiroo.ono+freebsd@gmail.com> wrote:

> Hello,
>
> I investigated a little more.
> I thought it was the kernel that did not run, but still it did not get
> through the loader.
>

Keep at it...


> The loader froze in efi_do_vmap(), so I needed to add
> efi_disable_vmap="YES" in loader.conf.
>

No. The code for this needs to be fixed...  More on that in a second...


> At last, in elf64_exec, it tried to run (*entry)(modulep) where the
> address entry is calculated from ehdr->e_entry by efi_translate() in
> stand/efi/loader/copy.c. There,
> ehdr->e_entry was 0xffff000000020000 (should be 0xffff000000000800,
> but I modified a little) and
> stage_offset was 0x10000b33e0000.
> The sum of two which efi_translate() returns is 0x100000000b3400000.
> It overflows uint64_t and becomes 0xb3400000.
>

Yea, I think this is wrong.
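
The wraparound itself is easy to replay outside the loader. A minimal
sketch in plain C, using only the two values you reported; the helper
names are made up for illustration and are not anything in stand/:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not loader code): add two 64-bit values and
 * report whether the true sum would have needed a 65th bit.
 */
static inline bool
add_u64_wraps(uint64_t a, uint64_t b, uint64_t *sum)
{
	*sum = a + b;		/* unsigned addition wraps modulo 2^64 */
	return (*sum < a);	/* true exactly when the carry was lost */
}

/* Replays the e_entry and stage_offset values from the report. */
static inline void
check_reported_sum(void)
{
	uint64_t sum;
	bool wrapped;

	wrapped = add_u64_wraps(0xffff000000020000ULL,	/* ehdr->e_entry */
	    0x00010000b33e0000ULL,			/* stage_offset */
	    &sum);
	assert(wrapped);
	assert(sum == 0xb3400000ULL);
}
```

So adding a high-half VA to that offset loses the carry and hands back
0xb3400000, which matches what you saw.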


> Currently, I do not understand well what the functions in
> stand/efi/loader/copy.c do, and do not know how to work around this
> problem.
>

So there's two things going on here. First is that on arm64 we should
*NEVER* copy the kernel. It loads at a specific address and we jump to
where it loaded in RAM (in your case, I think it should be
stage_offset + (ehdr->e_entry - KERNBASE)). KERNBASE is
0xffff000000000000, so we should jump to 0x10000b33e0000 + 0x20000 (or
maybe 0x800 as you suggest). The kernel code that's there should do some
tricks to find out where it was loaded, turn on the MMU and then jump to
the VA to continue starting up the kernel. The arm64 kernel is linked
with a VA. Old amd64 kernels expected to start at a fixed physical
address, but the UEFI spec allows memory to be mapped anywhere, which
means amd64 was recently switched to create a page table in the boot
loader, then jump to the right VA, and use the page table to find what
PA that is and use that to bootstrap the pmap. This works great on
amd64, but sometimes goes astray on arm64 (though the way it does for
you doesn't make sense to me). The amd64 code used to start at a PA, and
that's what the 'copy' routine is supposed to do: copy the kernel down
to that fixed address and jump to it. I don't think we'll ever want that
on arm64, though, and that might also be getting in your way (though I'm
doing this from memory w/o careful study of the code because it's fresh
in my mind due to getting arm64 working with linuxboot).
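
To make the arithmetic concrete, here is a rough sketch of the
calculation I have in mind. It is not the loader's efi_translate(); the
function and macro names are placeholders, and it assumes stage_offset
really is the physical address the kernel image was loaded at:

```c
#include <assert.h>
#include <stdint.h>

/* KERNBASE value quoted in this thread for arm64. */
#define KERNBASE_SKETCH	0xffff000000000000ULL

/*
 * Hypothetical helper: turn the ELF entry VA into the address to jump
 * to by first removing KERNBASE, then adding the load offset. The
 * subtraction keeps the sum well inside 64 bits.
 */
static inline uint64_t
entry_pa_sketch(uint64_t stage_offset, uint64_t e_entry)
{
	assert(e_entry >= KERNBASE_SKETCH);	/* entry must be a kernel VA */
	return (stage_offset + (e_entry - KERNBASE_SKETCH));
}
```

With your numbers, entry_pa_sketch(0x10000b33e0000, 0xffff000000020000)
comes out as 0x10000b3400000, and 0x10000b33e0800 if the entry is at the
unmodified 0x800.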

Also, vmap *MUST* be called in the boot loader. The trouble is, it
assumes VA == PA, but that's not strictly true. If you boot via
LinuxBoot, for example, it has a memory mapping that's not VA == PA so
at least some parts of the kernel fail their VA == PA asserts. The vmap
code in the loader currently blindly assumes VA == PA, but it should,
IMHO, only do that if the VA in the entry from the table returned by the
get memory map call is 0. Today it blindly overwrites it. You might try
changing that, and removing the bit in the kernel that checks for
VA == PA and bails out if there's a mismatch. Here's the patch I'm
temporarily using until I have the time to do more than a quick,
superficial analysis of the issue:

diff --git a/sys/arm64/arm64/efirt_machdep.c b/sys/arm64/arm64/efirt_machdep.c
index 727c9c37f94d..075174d164d8 100644
--- a/sys/arm64/arm64/efirt_machdep.c
+++ b/sys/arm64/arm64/efirt_machdep.c
@@ -193,8 +193,8 @@ efi_create_1t1_map(struct efi_md *map, int ndesc, int descsz)
                        continue;
                if (p->md_virt != 0 && p->md_virt != p->md_phys) {
                        if (bootverbose)
-                               printf("EFI Runtime entry %d is mapped\n", i);
-                       goto fail;
+                               printf("EFI Runtime entry %d is mapped PA %#lx VA %#lx\n", i, p->md_phys, p->md_virt);
+//                     goto fail;
                }
                if ((p->md_phys & EFI_PAGE_MASK) != 0) {
                        if (bootverbose)

Clearly, not suitable for upstreaming, eh? And I have about 2 dozen
commits in my queue ahead of that one that need refinement, review and
upstreaming before I jump into this issue. It will be after the first of
the year at least before I'll look at it since I just started my
year-end vacation...
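
For the loader side, the change I'm suggesting would look roughly like
the sketch below. This is untested and is not the actual efi_do_vmap()
code: the struct is a cut-down stand-in for struct efi_md, the names are
invented, and it indexes a plain array where the real memory map has to
be walked using the descriptor size the firmware reports.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* EFI_MEMORY_RUNTIME attribute bit from the UEFI spec. */
#define MD_ATTR_RUNTIME_SKETCH	0x8000000000000000ULL

/* Simplified stand-in for struct efi_md; only the fields used here. */
struct md_sketch {
	uint64_t md_phys;
	uint64_t md_virt;
	uint64_t md_attr;
};

/*
 * Fill in VA == PA only for runtime entries that have no VA yet, and
 * leave any mapping an earlier boot stage already set up untouched.
 * Returns the number of entries that were filled in.
 */
static inline size_t
fill_missing_virt(struct md_sketch *map, size_t ndesc)
{
	size_t i, filled = 0;

	assert(map != NULL || ndesc == 0);
	for (i = 0; i < ndesc; i++) {
		if ((map[i].md_attr & MD_ATTR_RUNTIME_SKETCH) == 0)
			continue;	/* not a runtime region */
		if (map[i].md_virt != 0)
			continue;	/* already mapped, keep it */
		map[i].md_virt = map[i].md_phys;
		filled++;
	}
	return (filled);
}
```

The point is just the md_virt != 0 test: an entry that arrives with a VA
already set is kept as-is instead of being overwritten.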

Warner


> On Fri, Dec 9, 2022 at 9:25, Hiroo Ono (小野寛生) <hiroo.ono+freebsd@gmail.com> wrote:
> >
> > On Fri, Dec 9, 2022 at 3:19, Warner Losh <imp@bsdimp.com> wrote:
> > >
> > >
> > >
> > > On Wed, Dec 7, 2022 at 4:21 PM Hiroo Ono (小野寛生) <hiroo.ono+freebsd@gmail.com> wrote:
> > >>
> > >> On Wed, Dec 7, 2022 at 1:18, Warner Losh <imp@bsdimp.com> wrote:
> > >> >
> > >> >
> > >> >
> > >> > On Tue, Dec 6, 2022 at 7:59 AM Hiroo Ono (小野寛生) <hiroo.ono+freebsd@gmail.com> wrote:
> > >>
> > >> >> OK, I (and the subject) was wrong. The loader boots, and shows the
> > >> >> following log at last:
> > >> >>
> > >> >> Loading kernel...
> > >> >> /boot/kernel/kernel text=0x2a8 text=0x8bcbf0 text=0x1f97ac
> > >> >> data=0x1a6ac0 data=0x0+0x381000 syms=[0x8+0x11f6a0+0x8+0x1439ea]
> > >> >> Loading configured modules...
> > >> >> can't find '/boot/entropy'
> > >> >> can't find '/etc/hostid'
> > >> >> No valid device tree blob found!
> > >> >> WARNING! Trying to fire up the kernel, but no device tree blob found!
> > >> >> EFI framebuffer information
> > >> >> addr, size        0x80400000, 0x7e9000
> > >> >> dimensions     1920 x 1080
> > >> >> stride             1920
> > >> >> masks            0x00ff0000, 0x0000ff00, 0x000000ff, 0xff000000
> > >> >>
> > >> >> and it stops here. No "<<BOOT>>" line is displayed.
> > >> >> So, it seems that the kernel is loaded but could not be started.
> > >> >
> > >> >
> > >> > There are several causes of this.
> > >> >
> > >> > Most likely is that the console is set up to go somewhere else.
> > >> > Though if you are on the video display and getting that framebuffer
> > >> > output, it won't go anywhere else w/o some setting to override (say
> > >> > to force serial).
> > >>
> > >> In the loader, when comconsole->c_init() is called for the second
> > >> time, the function does not return. (I commented out comconsole to
> > >> make the loader work, but it is rather brutal and is not a proper
> > >> solution).
> > >> But the function parse_uefi_con_out() in stand/efi/loader/main.c
> > >> always returns RB_SERIAL, so the loader tries to use the serial
> > >> console.
> > >
> > >
> > > I wonder why that is. Is this -current or -stable? I have a rather
> > > large backlog of MFC-able loader changes. If it is with stable, then
> > > it makes sense: I fixed a bug where parse_uefi_con_out would return
> > > serial if '8be4df61-93ca-11d2-aa0d-00e098032b8c-ConOut' was unset. Is
> > > it set? Now we return Video console if we find evidence there's a
> > > video console.
> >
> > It is stable/13.
> > I tried 14-current, and the same change to loader was needed (merging
> > OpenBSD's start.S and ldscript.arm64, and commenting out comconsole).
> > Even with these changes, the console defaults to serial, so I changed
> > parse_uefi_con_out() to always return 0.
> > Still, it stops at the same point. The kernel does not seem to boot.
> >
> > Running efi-show from the loader prompt did not show
> > '8be4df61-93ca-11d2-aa0d-00e098032b8c-ConOut'
> > The variable names containing 'ConOut' were:
> >
> > global NV,BS,RS ConOut = VenHw(9042A9DE-23DC-4A38-96FB-7ADED080516A),/VenHw(857A8741-0EEC-43BD-0482-27D14ED47D72)/Uart(115200,8,N,1)
> > global NV,BS,RS ConOutDev = VenHw(9042A9DE-23DC-4A38-96FB-7ADED080516A),/VenHw(857A8741-0EEC-43BD-0482-27D14ED47D72)/Uart(115200,8,N,1)
> >
> > > Now, why it fails the second time, I don't know.
> > >
> > >>
> > >> If a similar thing happens with the kernel, it may be stopping at
> > >> serial console initialization.
> > >
> > >
> > > The kernel doesn't use the EFI routines to initialize the serial
> > > console. But if the kernel is being told the wrong console, then it
> > > could also be booting just fine or almost fine and hitting some bug
> > > later.
> > >
> > >>
> > >> > Next most likely is that FreeBSD doesn't cope well with having both
> > >> > FDT and ACPI information available. But since no DTB is being passed
> > >> > in (per that message) that's not likely at play here.
> > >>
> > >> I managed to load the dtb file and the boot process stopped at the
> > >> same point. The problem is not here?
> > >
> > >
> > > Yea, I don't think so.
> > >
> > > Warner
> > >
> > >>
> > >> > Finally, the loader passes a large number of tables, etc. to the
> > >> > kernel. It's quite possible that, for reasons still unknown, that
> > >> > data is wrong or, if standards-conforming, not expected by the
> > >> > kernel. This leads to a crash before we've set up the console in the
> > >> > kernel, which looks a lot like a hang.
> > >> >
> > >> > Warner
> > >> >
> > >> >
> > >> >>
> > >> >> > . . .
> > >> >> >
> > >> >> > Such also happens for stable/13, releng/13.* based installations
> > >> >> > as well --and likely others too.
> > >> >> >
> > >> >> > ACPI booting does not use Device Tree information but the messages
> > >> >> > are output anyway about the lack. Only if you know that the context
> > >> >> > is a Device Tree style of boot are the messages actually reporting
> > >> >> > a problem.
> > >> >> >
> > >> >> >
> > >> >> > ===
> > >> >> > Mark Millard
> > >> >> > marklmi at yahoo.com
> > >> >> >
> > >> >>
>