Re: u-boot debug, was: Re: U-boot on RPI3, sees disk but won't boot it

From: Mark Millard <marklmi_at_yahoo.com>
Date: Fri, 30 Sep 2022 02:14:44 UTC
On 2022-Sep-29, at 17:27, bob prohaska <fbsd@www.zefox.net> wrote:

> On Thu, Sep 29, 2022 at 12:55:00PM -0700, Mark Millard wrote:
>> On 2022-Sep-29, at 10:09, bob prohaska <fbsd@www.zefox.net> wrote:
>> 
>>> . . .
>> 
>> In part I was going by your indication to avoid "bloat",
>> which I had guessed was a reference to the logging output
>> instead of the size of u-boot.bin :
>> 
>> QUOTE
>> The maximum logging level was 4 out of 7, fearing bloat.
>> END QUOTE
> 
> No, actually I was then worried about getting a u-boot binary
> too large to let me keep a few usable versions in /boot/msdos.
> I had at the time absolutely no concept of output volume.

Ahh, okay.

It turns out your capture of an automatic reset needed the
read activity details. It has:

. . .
6 USB Device(s) found
       scanning usb for storage devices... 1 Storage Device(s) found
U-Boot> run bootcmd_usb0

Device 0: 
usb_read: udev 0

usb_read: dev 0 startblk 0, blccnt 1 buffer 3af42c00
read10: start 0 blocks 1
COMMAND phase
DATA phase
usb_bulk_msg error status 0
BBB_reset
usb_control_msg: request: 0xFF, requesttype: 0x21, value 0x0 index 0x0 length 0x0
BBB_reset result -110: status 0 reset
usb_control_msg: request: 0x1, requesttype: 0x2, value 0x0 index 0x81 length 0x0
BBB_reset result -22: status 0 clearing IN endpoint
usb_control_msg: request: 0x1, requesttype: 0x2, value 0x0 index 0x2 length 0x0
BBB_reset result -22: status 0 clearing OUT endpoint
BBB_reset done
Read ERROR
COMMAND phase
DATA phase





Raspberry Pi Bootcode

Read File: config.txt, 265

Read File: start.elf, 2952960 (bytes)

Read File: fixup.dat, 7314 (bytes)

MESS:00:00:21.193325:0: brfs: File read: /mfs/sd/config.txt
MESS:00:00:21.197789:0: brfs: File read: 265 bytes
. . .



So, apparently, the RPi3B (not just USB) reset during a
phase of a Read's activity over the USB bus. It never
got as far as reporting the STATUS phase of the read.
I've no clue if it is some form of watchdog based reset
or a power based reset. But one of the two seem likely.

I'm not familiar with the result codes, endpoint clearing,
or the like. But at some point I'll try to figure out what
-110 and -22 have for names. That might give a clue.



>> . . .
>> 
> 
> There might be another reason for making changes. Apart from the
> failure captured last night and posted, every graceful shutdown -r
> has succeeded. The count is now 22. It's understood that debugging
> code can alter the behavior of bugs, but I didn't expect it to
> _fix_ an apparent bug. It's up to 22 successful reboots in a row. 

Serial console output takes signficant time and can significantly
change the timing of what is before vs. after the output request
in the code. The extra time can make various type of problems
either more or less likely to be observed.

> Ah, until now. The 23rd reboot got stuck at the outset, with
> Syncing disks, vnodes remaining... Waiting (max 60 seconds) for system process `syncer' to stop... 2 1 0 0 done
> All buffers synced.
> Uptime: 1m16s
> Resetting system ...
> and then nothing. Plug-pulling time.
> Came up just fine.
> 
> Is it possible the append a change to patch file
> names, like "no.verbose.patch-common_usb__storage.c" as
> a way to de-activate them with minimal typing? If it
> won't fail to detect the disk some kind of incremental
> backout of the diagnostics looks necessary, or at least
> useful.  

One way to deactivate patch files that are not explicit
in the Makefile is to move them one level up into ../
( so out of files/ ), as I understand.

Another way is to change names so they do not start with:

patch-

as I understand.

>> 
>> On a different issue . . .
>> 
>> I'm also hoping that you will happen to also capture
>> examples of automatic reset/start-over sequences. It
>> may be that those requiring getting to the U-Boot prompt
>> and typing commands. (Has it ever happened otherwise?)
>> 
> Are you referring to cases where "assisted" boots fail?
> 
> There have been quite a few cases where on reboot u-boot
> finds zero storage devices, but usb reset finds one device.
> Typing run bootcmd_usb0 then results in a pause, a re-run
> of u-boot and a successful boot.

The explicit evidence so far is that "a re-run of u-boot"
is better described as "the RPi3B rebooting", starting with
execution of the RPi3B firmware and later again getting to
the stage of u-boot.bin being loaded and started.

> Sometimes the machine
> silently hangs with the Device 0: message. Sometimes it
> boots. 
> 
> I just captured an example now. Because of the file's
> size it's at 
> http://nemesis.zefox.com/~fbsd/ 
> in a file named 
> 
> pelorus_console.txt
> 
> By my count the interesting part will be in the last 10% of the  
> ~9 MB file. 
> 
> For some reason the file is littered with non-ascii characters.

So far it looks like a binary capture that has every byte
sent out in it, including escape characters (ASCII 27
decimal) that start escape sequences for controlling a
display. Also, every Carriage Return (ASCII 13 decimal)
and every line feed is explicit.

There is some text at the beginning (after the first
"login:" that looks to be junk so all the following bytes
up to the "Password:" after the junk should probably be
replaced by one newline or by nothing.

There is a sequence (using ESC to indicate an escape
character):

ESC[?25h

that occurs 73225 times in the file. In my normal binary
captures of one boot sequence I get somewhat over 4000
of them as I remember. There is interesting text between
the sequences, commonly one letter of some overall text.
I delete all instances of the escape sequence to read
the text with the letters next to each other. The delete
is via a find-and-replace-all that replaces them with an
empty string.

There are 522 more escape characters that were not the
beginning of that specific escape sequence. I've not
analyzed all the sequences so deleting the 522 ESC
might leave some not-normally-displayed text around.

The Carriage Returns, 344121 of them, can be similarly
removed.

There are some ASCII 8 (decimal) characters in the file
as well, 1348 of them. ASCII 9 (decimal), 377 of them.

I think that is all the characters that are less than
a space in the ASCII encoding.

But I use some text editors that are good at dealing with
such content (but I do not normally deal with such via the
headless or plain console FreeBSD machines).

> It can be viewed and searched with more, but the chrome browser
> is only willing to download it. Attempts to view it fail with
> "invalid byte sequence in conversion input". I'm posting it 
> anyway because it's a rare event. If you've got any ideas for
> cleaning it up so it'll display in a browser please say so.
> Script seemed to work fine for the first capture, maybe the
> huge size is the issue. Perhaps split would help, but I hope
> there's something better.


===
Mark Millard
marklmi at yahoo.com