Re: Still did not succeed to boot on Lenovo Yoga C630

In reply to: Warner Losh : "Re: Still did not succeed to boot on Lenovo Yoga C630"
From: Hiroo Ono (小野寛生) <hiroo.ono+freebsd_at_gmail.com>
Date: Sat, 24 Dec 2022 00:48:57 UTC
The current status of FreeBSD 14-current on Lenovo Yoga C630 is as follows:

 1) Merging in OpenBSD's loader code made the loader boot, apart from
the three problems below (2 to 4).
 2) When comconsole->c_init() runs the second time, it seems to freeze.
(This might be C630 specific.)
 3) SetVirtualAddressMap() in efi_do_vmap() freezes. (This might also
affect other Snapdragon systems such as the Microsoft Arm Developer Kit.)
 4) The kernel is jumped to but does not start.

1) is quite straightforward. What needs to be changed is
stand/efi/loader/arch/arm64/start.S.
For 2), I do not know what to do. For now, I have commented out
comconsole from struct console *consoles[] in stand/efi/loader/conf.c
as a workaround. Maybe I should write a fault handler that helps the
loader return from the fault.
For 3), I dumped each memory map entry's VirtualStart and PhysicalStart.
Every VirtualStart was 0, so overwriting VirtualStart with the value of
PhysicalStart and then calling SetVirtualAddressMap() should work. In
reality, it doesn't.
  OpenBSD does not use SetVirtualAddressMap() on arm64, and Linux seems
to have abandoned it on arm64 in 2019.
    https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?id=4e46c2a956215482418d7b315749fb1b6c6bc224
  Maybe we can also avoid SetVirtualAddressMap() (with
efi_disable_vmap="YES"). In that case, as you wrote, I should change
the kernel to treat VA == PA if VirtualStart is 0 (OpenBSD seems to do
so).
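
The "VA == PA only where VirtualStart is 0" idea could look roughly like
the sketch below. It is an illustration, not the loader's code:
struct mem_desc is a local stand-in that borrows the field names of
EFI_MEMORY_DESCRIPTOR from the UEFI spec, and fill_identity_va() is a
made-up name. Whether the C630 firmware then accepts the map is exactly
the open question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit 63 of Attribute: the region needs a runtime mapping (UEFI spec). */
#define EFI_MEMORY_RUNTIME 0x8000000000000000ULL

/*
 * Local stand-in for EFI_MEMORY_DESCRIPTOR (assumption: real code would
 * use the definition from the EFI headers instead).
 */
struct mem_desc {
	uint32_t Type;
	uint32_t Pad;
	uint64_t PhysicalStart;
	uint64_t VirtualStart;
	uint64_t NumberOfPages;
	uint64_t Attribute;
};

/*
 * Hypothetical helper: give every runtime descriptor that the firmware
 * left unmapped (VirtualStart == 0) an identity mapping, and leave
 * descriptors that already carry a VA alone. Returns the number of
 * descriptors changed. desc_size is the stride reported together with
 * the memory map; it may be larger than sizeof(struct mem_desc).
 */
size_t
fill_identity_va(void *map, size_t map_size, size_t desc_size)
{
	unsigned char *p = map;
	size_t changed = 0;

	assert(desc_size >= sizeof(struct mem_desc));
	for (size_t off = 0; off + desc_size <= map_size; off += desc_size) {
		struct mem_desc *d = (struct mem_desc *)(void *)(p + off);

		if ((d->Attribute & EFI_MEMORY_RUNTIME) == 0)
			continue;	/* not needed at runtime */
		if (d->VirtualStart != 0)
			continue;	/* already mapped by someone else */
		d->VirtualStart = d->PhysicalStart;
		changed++;
	}
	return (changed);
}
```

The point of the second check is that an existing non-zero VirtualStart
(as seen when booting via LinuxBoot) is kept instead of being blindly
overwritten.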
About 4), I am completely confused. In elf64_exec() in
stand/efi/loader/arch/arm64/exec.c, the address at which the kernel
was loaded is calculated as:
    entry = efi_translate(ehdr->e_entry);  // which becomes 0xb340000
and later jumped to as:
    (*entry)(modulep);
I wrote earlier that this calculation looked doubtful, but it was right.
I dumped the data at the address 0xb340000 and compared it with the
output of objdump -D loader_lua.syms. It matched the kernel's _start
code in locore.S.
Putting some code that jumps to the loader's ImageBase address at the
start of the kernel's _start did not change anything, so I concluded
that the kernel is not started at all.
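
A small sketch of why the wrapped sum can still be the right address.
It assumes that stage_offset is computed as "staging address minus the
kernel's link address" modulo 2^64 (worth checking against copy.c); the
function names are made up, and the numbers used in the note after the
code are the ones from the earlier mail quoted below.

```c
#include <assert.h>
#include <stdint.h>

/* arm64 kernel VA base, as given in Warner's mail. */
#define KERNBASE 0xffff000000000000ULL

/* Assumption: the offset is built by unsigned subtraction, mod 2^64. */
uint64_t
make_stage_offset(uint64_t staging, uint64_t link_base)
{
	return (staging - link_base);
}

/* Undo the subtraction above to recover the staging address. */
uint64_t
staging_from_offset(uint64_t stage_offset)
{
	return (stage_offset + KERNBASE);
}

/*
 * Same shape as efi_translate(): add the offset and let uint64_t wrap.
 * With an offset built as above, the result equals
 * staging + (va - KERNBASE), so the overflow is harmless.
 */
uint64_t
translate(uint64_t va, uint64_t stage_offset)
{
	assert(va >= KERNBASE);	/* only kernel VAs make sense here */
	return (va + stage_offset);
}
```

With stage_offset = 0x10000b33e0000, staging_from_offset() gives
0xb33e0000, and translate(0xffff000000020000, stage_offset) gives
0xb3400000, i.e. 0x20000 bytes into the staging area. Under the same
assumption, adding (e_entry - KERNBASE) to stage_offset itself would
apply the KERNBASE correction twice.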
The excerpt of the loader's memmap command output is as follows:
    Type        Physical      Virtual       #Pages    Attr
    LoaderData  0000b33ea000  000000000000  00000000  UC WC WT WB WP RP XP
    LoaderCode  0000bb909000  000000000000  000000d1  UC WC WT WB WP RP XP
From this output, I wonder whether the memory attributes on the Yoga
C630 are properly implemented. Since the XP (execute-protect) bit is
on, I tried to clear it with the DXE services' SetMemoryAttributes()
(with a lot of transcription from the standards...). The call
succeeded, but the kernel still did not run.
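
For reference, the Attr column can be decoded from the raw 64-bit
Attribute value as sketched below. The bit values are the ones defined
in the UEFI specification; attr_to_str() is a made-up helper, not the
loader's own printing code. One caveat that may matter here: as far as
the spec describes it, the Attribute field of the memory map is a mask
of capabilities and not necessarily the current setting, so XP in this
listing may only mean that the region can be execute-protected.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Memory attribute bits from the UEFI specification. */
#define EFI_MEMORY_UC 0x0000000000000001ULL
#define EFI_MEMORY_WC 0x0000000000000002ULL
#define EFI_MEMORY_WT 0x0000000000000004ULL
#define EFI_MEMORY_WB 0x0000000000000008ULL
#define EFI_MEMORY_WP 0x0000000000001000ULL
#define EFI_MEMORY_RP 0x0000000000002000ULL
#define EFI_MEMORY_XP 0x0000000000004000ULL

static const struct {
	uint64_t bit;
	const char *name;
} attr_names[] = {
	{ EFI_MEMORY_UC, "UC" }, { EFI_MEMORY_WC, "WC" },
	{ EFI_MEMORY_WT, "WT" }, { EFI_MEMORY_WB, "WB" },
	{ EFI_MEMORY_WP, "WP" }, { EFI_MEMORY_RP, "RP" },
	{ EFI_MEMORY_XP, "XP" },
};

/* Hypothetical helper: write the names of the set bits into buf. */
char *
attr_to_str(uint64_t attr, char *buf, size_t len)
{
	assert(buf != NULL && len > 0);
	buf[0] = '\0';
	for (size_t i = 0; i < sizeof(attr_names) / sizeof(attr_names[0]); i++) {
		size_t used = strlen(buf);

		if ((attr & attr_names[i].bit) == 0)
			continue;
		/* snprintf() NUL-terminates and truncates if buf is full. */
		snprintf(buf + used, len - used, "%s%s",
		    used != 0 ? " " : "", attr_names[i].name);
	}
	return (buf);
}

/* Clear only the execute-protect bit, leaving the other bits as they are. */
uint64_t
attr_clear_xp(uint64_t attr)
{
	return (attr & ~EFI_MEMORY_XP);
}
```

The value 0x700f decodes to the attribute string shown in both rows
above.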

Judging from this tweet:
https://twitter.com/canadianbryan/status/1598053941270679552 and its
replies, the Microsoft Arm Developer Kit seems to have a similar
problem. If somebody has succeeded in running FreeBSD on it, please
share how you did it.

On Mon, Dec 19, 2022 at 5:33, Warner Losh <imp@bsdimp.com> wrote:

>
>
>
> On Sun, Dec 18, 2022 at 4:31 AM Hiroo Ono (小野寛生) <hiroo.ono+freebsd@gmail.com> wrote:
>>
>> Hello,
>>
>> I investigated a little more.
>> I thought it was the kernel that did not run, but still it did not get
>> through the loader.
>
>
> Keep at it...
>
>>
>> The loader froze in efi_do_vmap(), so I needed to add
>> efi_disable_vmap="YES" in loader.conf.
>
>
> No. The code for this needs to be fixed...  More on that in a second...
>
>>
>> At last, in elf64_exec, it tried to run (*entry)(modulep) where the
>> address entry is calculated from ehdr->e_entry by efi_translate() in
>> stand/efi/loader/copy.c. There,
>> ehdr->e_entry was 0xffff000000020000 (should be 0xffff000000000800,
>> but I modified a little) and
>> stage_offset was 0x10000b33e0000.
>> The sum of the two, which efi_translate() returns, is 0x100000000b3400000.
>> It overflows uint64_t and becomes 0xb3400000.
>
>
> Yea, I think this is wrong.
>
>>
>> Currently, I do not understand well what the functions in
>> stand/efi/loader/copy.c do, and do not know how to work around this
>> problem.
>
>
> So there's two things going on here. First is that on arm64 we should *NEVER* copy the
> kernel. It loads at a specific address and we jump to where it loaded in RAM (in your case,
> I think it should be stage_offset + (ehdr->e_entry - KERNBASE)). KERNBASE is 0xffff000000000000
> so we should jump to 0x10000b33e0000 + 0x20000 (or maybe 0x800 as you suggest). The kernel
> code that's there should do some tricks to find out where it was loaded, turn on the MMU and
> then jump to the VA to continue starting up the kernel. The arm64 kernel is linked with a VA. Old amd64
> kernels expected to start at a fixed physical address, but the UEFI spec allows memory to be mapped anywhere
> which means it was recently switched to create a page table in the boot loader, then jump to the right
> VA, and use the page table to find what PA that is and use that to bootstrap the pmap. This works great on
> amd64, but sometimes goes astray on arm64 (though the way it does for you doesn't make sense
> to me). The amd64 code used to start at a PA, and that's what the 'copy' routine is supposed to do:
> copy the kernel down to that fixed address and jump to it. I don't think we'll ever want that on arm64, though,
> and that might also be getting in your way (though I'm doing this from memory w/o careful study of
> the code because it's fresh in my mind due to getting arm64 working with linuxboot).
>
> Also, vmap *MUST* be called in the boot loader. The trouble is, it assumes VA == PA, but that's not
> strictly true. If you boot via LinuxBoot, for example, it has a memory mapping that's not VA == PA so
> at least some parts of the kernel fail their VA == PA asserts. The vmap code in the loader currently
> blindly assumes VA == PA, but it should, IMHO, only do that if the VA in the entry from the table
> returned by the get memory map call is 0. Today it blindly overwrites it. You might try changing that, and removing
> the bit in the kernel that checks for VA == PA and bails out if there's a mismatch. Here's the patch I'm
> temporarily using until I have the time to do more than a quick, superficial analysis of the issue:
>
> diff --git a/sys/arm64/arm64/efirt_machdep.c b/sys/arm64/arm64/efirt_machdep.c
> index 727c9c37f94d..075174d164d8 100644
> --- a/sys/arm64/arm64/efirt_machdep.c
> +++ b/sys/arm64/arm64/efirt_machdep.c
> @@ -193,8 +193,8 @@ efi_create_1t1_map(struct efi_md *map, int ndesc, int descsz)
>                         continue;
>                 if (p->md_virt != 0 && p->md_virt != p->md_phys) {
>                         if (bootverbose)
> -                               printf("EFI Runtime entry %d is mapped\n", i);
> -                       goto fail;
> +                               printf("EFI Runtime entry %d is mapped PA %#lx VA %#lx\n", i, p->md_phys, p->md_virt);
> +//                     goto fail;
>                 }
>                 if ((p->md_phys & EFI_PAGE_MASK) != 0) {
>                         if (bootverbose)
>
> Clearly, not suitable for upstreaming, eh? And I have about 2 dozen commits in my queue ahead of that
> one that need refinement, review and upstreaming before I jump into this issue. It will be after the first
> of the year at least before I'll look at it since I just started my year-end vacation...
>
> Warner
>
>>
>> On Fri, Dec 9, 2022 at 9:25, Hiroo Ono (小野寛生) <hiroo.ono+freebsd@gmail.com> wrote:
>> >
>> > On Fri, Dec 9, 2022 at 3:19, Warner Losh <imp@bsdimp.com> wrote:
>> > >
>> > >
>> > >
>> > > On Wed, Dec 7, 2022 at 4:21 PM Hiroo Ono (小野寛生) <hiroo.ono+freebsd@gmail.com> wrote:
>> > >>
>> > >> On Wed, Dec 7, 2022 at 1:18, Warner Losh <imp@bsdimp.com> wrote:
>> > >> >
>> > >> >
>> > >> >
>> > >> > On Tue, Dec 6, 2022 at 7:59 AM Hiroo Ono (小野寛生) <hiroo.ono+freebsd@gmail.com> wrote:
>> > >>
>> > >> >> OK, I (and the subject) was wrong. The loader boots and shows
>> > >> >> the following log at the end:
>> > >> >>
>> > >> >> Loading kernel...
>> > >> >> /boot/kernel/kernel text=0x2a8 text=0x8bcbf0 text=0x1f97ac
>> > >> >> data=0x1a6ac0 data=0x0+0x381000 syms=[0x8+0x11f6a0+0x8+0x1439ea]
>> > >> >> Loading configured modules...
>> > >> >> can't find '/boot/entropy'
>> > >> >> can't find '/etc/hostid'
>> > >> >> No valid device tree blob found!
>> > >> >> WARNING! Trying to fire up the kernel, but no device tree blob found!
>> > >> >> EFI framebuffer information
>> > >> >> addr, size        0x80400000, 0x7e9000
>> > >> >> dimensions     1920 x 1080
>> > >> >> stride             1920
>> > >> >> masks            0x00ff0000, 0x0000ff00, 0x000000ff, 0xff000000
>> > >> >>
>> > >> >> and it stops here. No "<<BOOT>>" line is displayed.
>> > >> >> So, it seems that the kernel is loaded but could not be started.
>> > >> >
>> > >> >
>> > >> > There are several causes of this.
>> > >> >
>> > >> > Most likely is that the console is set up to go somewhere else. Though if you are on the video display and getting that framebuffer output, it will go there unless some setting overrides it (say, to force serial).
>> > >>
>> > >> In the loader, when comconsole->c_init() is called for the second
>> > >> time, the function does not return. (I commented out comconsole to
>> > >> make the loader work, but it is rather brutal and is not a proper
>> > >> solution).
>> > >> But the function parse_uefi_con_out() in stand/efi/loader/main.c
>> > >> always returns RB_SERIAL, so the loader tries to use the serial
>> > >> console.
>> > >
>> > >
>> > > I wonder why that is. Is this -current or -stable? I have a rather large backlog of MFC-able loader changes. If it is with stable, then it makes sense: I fixed a bug where parse_uefi_con_out would return serial if '8be4df61-93ca-11d2-aa0d-00e098032b8c-ConOut' was unset. Is it set?  Now we return Video console if we find evidence there's a video console.
>> >
>> > It is stable/13.
>> > I tried 14-current, and the same changes to the loader were needed
>> > (merging OpenBSD's start.S and ldscript.arm64, and commenting out
>> > comconsole). Even with these changes, the console defaults to serial,
>> > so I changed parse_uefi_con_out() to always return 0.
>> > Still, it stops at the same point. The kernel does not seem to boot.
>> >
>> > Running efi-show from the loader prompt did not show
>> > '8be4df61-93ca-11d2-aa0d-00e098032b8c-ConOut'.
>> > The variables whose names contain 'ConOut' were:
>> >
>> > global NV,BS,RS ConOut =
>> > VenHw(9042A9DE-23DC-4A38-96FB-7ADED080516A),/VenHw(857A8741-0EEC-43BD-0482-27D14ED47D72)/Uart(115200,8,N,1)
>> > global NV,BS,RS ConOutDev =
>> > VenHw(9042A9DE-23DC-4A38-96FB-7ADED080516A),/VenHw(857A8741-0EEC-43BD-0482-27D14ED47D72)/Uart(115200,8,N,1)
>> >
>> > > Now, why it fails the second time, I don't know.
>> > >
>> > >>
>> > >> If a similar thing happens with the kernel, it may be stopping at
>> > >> serial console initialization.
>> > >
>> > >
>> > > The kernel doesn't use the EFI routines to initialize the serial console. But if the kernel is being told the wrong console, then it could also be booting just fine or almost fine and hitting some bug later.
>> > >
>> > >>
>> > >> > Next most likely is that FreeBSD doesn't cope well with having both FDT and ACPI information available. But since no DTB is being passed in (per that message) that's not likely at play here.
>> > >>
>> > >> I managed to load the dtb file, and the boot process stopped at the
>> > >> same point. So the problem is not here?
>> > >
>> > >
>> > > Yea, I don't think so.
>> > >
>> > > Warner
>> > >
>> > >>
>> > >> > Finally, the loader passes a large number of tables, etc. to the kernel. It's quite possible that, for reasons still unknown, that data is wrong or, if standard-conforming, not what the kernel expects. This leads to a crash before we've set up the console in the kernel, which looks a lot like a hang.
>> > >> >
>> > >> > Warner
>> > >> >
>> > >> >
>> > >> >>
>> > >> >> > . . .
>> > >> >> >
>> > >> >> > Such also happens for stable/13, releng/13.* based installations
>> > >> >> > as well --and likely others too.
>> > >> >> >
>> > >> >> > ACPI booting does not use Device Tree information but the messages
>> > >> >> > are output anyway about the lack. Only if you know that the context
>> > >> >> > is a Device Tree style of boot are the messages actually reporting
>> > >> >> > a problem.
>> > >> >> >
>> > >> >> >
>> > >> >> > ===
>> > >> >> > Mark Millard
>> > >> >> > marklmi at yahoo.com
>> > >> >> >
>> > >> >>