Re: rock64 verbose boot hangs

From: Andriy Gapon <avg_at_FreeBSD.org>
Date: Thu, 23 Sep 2021 17:46:04 UTC
On 20/09/2021 20:02, Emmanuel Vadot wrote:
> 
>   Hi Andriy,
> 
> On Sat, 18 Sep 2021 15:58:00 +0300
> Andriy Gapon <avg@FreeBSD.org> wrote:
> 
>>
>> Normal boot works every time, but with boot_verbose="YES" it has hung on
>> every attempt so far.
>>
>> Last messages on the console:
>> cpulist0: <Open Firmware CPU Group> on ofwbus0
>> cpu0: <Open Firmware CPU> on cpulist0
>> cpu0: Nominal frequency 600Mhz
>> cpufreq_dt0: <Generic cpufreq driver> on cpu0
>> cpufreq_dt0: 408.000 Mhz (950000 uV)
>> cpufreq_dt0: 600.000 Mhz (950000 uV)
>> cpufreq_dt0: 816.000 Mhz (1000000 uV)
>> cpufreq_dt0: 1008.000 Mhz (1100000 uV)
>> cpufreq_dt0: 1200.000 Mhz (1225000 uV)
>> cpufreq_dt0: 1296.000 Mhz (1300000 uV)
>> cpu1: <Open Firmware CPU> on cpulist0
>> cpu1: Nominal frequency 600Mhz
>> cpufreq_dt1: <Generic cpufreq driver> on cpu1
>>
>> The kernel is totally unresponsive after that.
> 
>   Can't reproduce here, I'm running 548a706608d with latest DTB and
> latest u-boot/atf
> 
>> Any suggestions on how to debug this?
> 
>   Not really sure where to start; it seems weird that the kernel would
> hang at the cpufreq attach, but maybe try modifying the DTB to remove
> this node?
>   Also, did this start happening with my recent commit on clock, or was
> it the same before?

Thank you, and everyone else who responded, for the information and suggestions.

Some extra details.
I've had this problem since I got this board 9 months ago.
Since then it has been through several upgrades of FreeBSD, U-Boot, and the
contents of the ESP partition, and the problem has always been present.

Now I've done more extensive testing, with a couple of dozen reboots in a row
and some additional debug prints (for example, DEBUG in subr_bus.c).

I actually see several variations of the problem.
Sometimes it's a hang, but sometimes it's a crash.
Both the hangs and the crashes can happen in different places.
Some crashes happen during AP startup, and the information I get from them is
not very usable.
Some crashes happen during driver probing, when the bus code searches the hints
memory space.  Those crashes look like random memory corruption.
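
For anyone repeating this kind of multi-reboot test, here is a minimal sketch
of how the stop points could be tallied from captured serial console logs.
It assumes each boot attempt was saved as a separate text capture; the
function names are hypothetical and this is not how the results above were
collected.

```python
from collections import Counter


def last_console_line(log_text):
    """Return the last non-empty line of a captured console log, or ''."""
    lines = [ln.strip() for ln in log_text.splitlines() if ln.strip()]
    return lines[-1] if lines else ""


def tally_stop_points(logs):
    """Count which device prefix (text before the first ':') was printed last.

    `logs` is an iterable of strings, one per captured boot attempt.
    """
    counts = Counter()
    for text in logs:
        last = last_console_line(text)
        # Device messages look like "cpufreq_dt1: <...> on cpu1".
        device = last.split(":", 1)[0] if ":" in last else last
        counts[device] += 1
    return counts
```

A result dominated by one device would point at a single driver, while a wide
spread would be more consistent with a timing or corruption problem.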

Given those variations, plus some other differences between my board and those
of other Rock64 users (like needing a special setup for eMMC and for the
watchdog), I am inclined to think that my board has something special either in
the hardware (like a different configuration via some fuses) or in the BootROM,
even though the PCB has the standard markings.

And I would not be surprised if it were a customized production run, as I got
my Rock64s via a special / unusual deal on Amazon.
Iconikal and Recon Sentinel are the keywords to search for, for those interested.
Some news articles from the time:
https://liliputing.com/2020/09/this-10-single-board-computer-is-faster-than-a-raspberry-pi-3.html
https://www.tomshardware.com/news/raspberry-pi-sized-iconikal-rockchip-sbc-only-dollar8-on-amazon

So, in the end, I still do not know what causes the verbose boot to hang / crash.
Maybe there is some (not fully working) watchdog that gets armed and disarmed by
certain hardware accesses, and the verbose boot is too slow to complete in time.

Here is a small subset of panics and hangs that I saw:
https://people.freebsd.org/~avg/rock64-verbose-boot-panic.txt

-- 
Andriy Gapon