Re: u-boot debug, was: Re: U-boot on RPI3, sees disk but won't boot it

From: bob prohaska <fbsd_at_www.zefox.net>
Date: Fri, 30 Sep 2022 00:27:42 UTC
On Thu, Sep 29, 2022 at 12:55:00PM -0700, Mark Millard wrote:
> On 2022-Sep-29, at 10:09, bob prohaska <fbsd@www.zefox.net> wrote:
> 
> > On Thu, Sep 29, 2022 at 09:38:04AM -0700, Mark Millard wrote:
> >> On 2022-Sep-29, at 08:19, bob prohaska <fbsd@www.zefox.net> wrote:
> >> 
> >>> 
> >>> I'll try to gather a few more examples.
> >>> 
> >> 
> >> FYI:
> >> 
> >> Building based on the latest patch-common_usb__storage.c
> >> that I sent (by itself) would avoid the huge number of
> >> sequences like:
> >> 
> >> usb_read: udev 0
> >> 
> > [snip]
> >> 
> >> and make the log files much smaller. Any other messages
> >> from usb_storage.c would still still be present.
> >> 
> > 
> > 
> > From my point of view the size isn't a serious problem, the
> > script command makes capturing output easy. It seemed more
> > valuable to capture maximum detail until the problem is 
> > better understood. If it's understood well enough now just
> > say the word and I'll recompile.
> > 
> > Size might become an issue, but there are larger capacity 
> > webservers available. I'm using www.zefox.net out of habit.
> 
> In part I was going by your indication to avoid "bloat",
> which I had guessed was a reference to the logging output
> instead of the size of u-boot.bin :
> 
> QUOTE
> The maximum logging level was 4 out of 7, fearing bloat.
> END QUOTE

No, actually I was then worried about getting a u-boot binary
too large to let me keep a few usable versions in /boot/msdos.
I had at the time absolutely no concept of output volume.

> 
> The first "usb_read:" is after the "1 Storage Device(s)
> found". Prior to that there is one example of a:
> 
> COMMAND phase
> DATA phase
> STATUS phase
> 
> sequence and one of a:
> 
> COMMAND phase
> STATUS phase
> COMMAND phase
> DATA phase
> STATUS phase
> 
> sequence. I'd be surprised if these 8 lines proved useful
> for the "0 Storage Device(s) found" issue.
> 
> For issues after "Storage Device(s) found" the status is
> less clear and the output would be more likely to
> contribute.
> 
> As long as you are happy with the size of the output, it
> should be fine either way for "Storage Device(s) found"
> observations.
>

There might be another reason for making changes. Apart from the
failure captured last night and posted, every graceful shutdown -r
has succeeded. The count is now 22. It's understood that debugging
code can alter the behavior of bugs, but I didn't expect it to
_fix_ an apparent bug. It's up to 22 successful reboots in a row. 

Ah, until now. The 23rd reboot got stuck at the outset, with
Syncing disks, vnodes remaining... Waiting (max 60 seconds) for system process `syncer' to stop... 2 1 0 0 done
All buffers synced.
Uptime: 1m16s
Resetting system ...
and then nothing. Plug-pulling time.
Came up just fine.

Is it possible the append a change to patch file
names, like "no.verbose.patch-common_usb__storage.c" as
a way to de-activate them with minimal typing? If it
won't fail to detect the disk some kind of incremental
backout of the diagnostics looks necessary, or at least
useful.  

> 
> On a different issue . . .
> 
> I'm also hoping that you will happen to also capture
> examples of automatic reset/start-over sequences. It
> may be that those requiring getting to the U-Boot prompt
> and typing commands. (Has it ever happened otherwise?)
>
Are you referring to cases where "assisted" boots fail?

There have been quite a few cases where on reboot u-boot
finds zero storage devices, but usb reset finds one device.
Typing run bootcmd_usb0 then results in a pause, a re-run
of u-boot and a successful boot. Sometimes the machine
silently hangs with the Device 0: message. Sometimes it
boots. 

I just captured an example now. Because of the file's
size it's at 
http://nemesis.zefox.com/~fbsd/ 
in a file named 

pelorus_console.txt

By my count the interesting part will be in the last 10% of the  
~9 MB file. 

For some reason the file is littered with non-ascii characters.
It can be viewed and searched with more, but the chrome browser
is only willing to download it. Attempts to view it fail with
"invalid byte sequence in conversion input". I'm posting it 
anyway because it's a rare event. If you've got any ideas for
cleaning it up so it'll display in a browser please say so.
Script seemed to work fine for the first capture, maybe the
huge size is the issue. Perhaps split would help, but I hope
there's something better.

Thanks for reading!

bob prohaska