Re: MMCCAM hang

From: Bjoern A. Zeeb <bzeeb-lists_at_lists.zabbadoz.net>
Date: Tue, 09 Jan 2024 16:48:53 UTC
On Tue, 9 Jan 2024, Bjoern A. Zeeb wrote:

> On Tue, 9 Jan 2024, Emmanuel Vadot wrote:
>
>> On Tue, 9 Jan 2024 11:36:32 +0100
>> Søren Schmidt <soren.schmidt@gmail.com> wrote:
>> 
>>>> On 28 Dec 2023, at 02.08, Warner Losh <imp@bsdimp.com> wrote:
>>>> On Wed, Dec 27, 2023, 4:55 PM Bjoern A. Zeeb 
>>>> <bzeeb-lists@lists.zabbadoz.net> wrote:
>>>>> Hi,
>>>>> 
>>>>> sdhci_fsl_fdt0: Desired SD/MMC freq: 50000000, actual: 50000000; base 
>>>>> 700000000 prescale 1 divisor 14
>>>>> GEOM: new disk sdda0
>>>>> sdda0 at sdhci_slot0 bus 0 scbus0 target 0 lun 0
>>>>> sdda0: Relative addr: 00000002
>>>>> Card features: <MMC Memory High-Capacity>
>>>>> Card random: unblocking device.
>>>>> GEOM: new disk sdda0boot0
>>>>> memory OCR: 00ff8080
>>>>> sdda0: Serial Number .......
>>>>> sdda0: MMCHC .................................. by 17 0x0000
>>>>> GEOM: new disk sdda0boot1
>>>>> uhub0: 2 ports with 2 removable, self powered
>>>>> 
>>>>> at which point basically anything hangs.  In auto-boot it is
>>>>> before/during file-system checks.
>>>>> In single user mode camcontrol devlist will show sdda0
>>>>> but
>>>>> 
>>>>> root@:/ # gpart show sdda0
>>>>> load: 6.06  cmd: gpart 24 [g_waitfor_event] 1.28r 0.00u 0.00s 0% 2088k
>>>>> {forever}
>>>>> 
>>>>> 
>>>>> Unclear at which point I broke to debugger and this is where it seems to
>>>>> hang:
>>>>> 
>>>>> db> trace 100088
>>>>> Tracing pid 4 tid 100088 td 0xffff0000dc527000
>>>>> ipi_stop() at ipi_stop+0x34
>>>>> arm_gic_v3_intr() at arm_gic_v3_intr+0xe4
>>>>> intr_irq_handler() at intr_irq_handler+0x80
>>>>> handle_el1h_irq() at handle_el1h_irq+0x14
>>>>> --- interrupt
>>>>> spinlock_exit() at spinlock_exit+0x44
>>>>> callout_reset_sbt_on() at callout_reset_sbt_on+0x210
>>>>> sdhci_cam_action() at sdhci_cam_action+0x284
>>>>> xpt_run_devq() at xpt_run_devq+0x4c8
>>>>> xpt_action_default() at xpt_action_default+0x470
>>>>> sddastart() at sddastart+0x1bc
>>>>> xpt_run_allocq() at xpt_run_allocq+0xa8
>>>>> xpt_done_process() at xpt_done_process+0x610
>>>>> xpt_done_td() at xpt_done_td+0x1a8
>>>>> fork_exit() at fork_exit+0x8c
>>>>> fork_trampoline() at fork_trampoline+0x18
>>>>> 
>>>>> 
>>>>> Anyone an idea?
>>>> 
>>>> 
>>>> 
>>>> Looks like deadlock with another thread. Anybody else in the time keeping 
>>>> / callout code?
>>> 
>>> I think this is related to the MMC driver having issues (MMCCAM or not).
>>> If I try to use a MMC sdcard on any of my rk35X8 boards as the disk device 
>>> it will eventually hang on first access to the MMC controlled media.
>>> I thought I had an issue here with my dev setup but clearly I'm not alone 
>>> :)
>> 
>> SD cards on RK356X don't use sdhci but dwmmc, so it's not related to what
>> bz@ is seeing.
>> That being said I have no problem using dwmmc as the root device on my
>> nanopi r5s or quartz64.
>
> For what it's worth, my current feeling is that it is related to the
> boot[01] disks on the eMMC.

Okay, I quickly tried the funny bit to skip them (no disk created).
The errors from the sdda stopped after about 25 occurrences.  I didn't
check whether the commands were the same each time.

But now it looks like this:

# ls -l /dev/*da*
crw-r-----  1 root operator 0x50 Dec 19 10:32 /dev/nda0
crw-r-----  1 root operator 0x55 Dec 19 10:32 /dev/sdda0
# gpart show sdda0 
gpart: No such geom: sdda0.
# gpart show nda0 
gpart: No such geom: nda0.
# gpart create -s GPT -l 67108864 sdda0			# -l is from D33168 and not the issue here
sdhci_fsl_fdt0-slot0: sdhci_cam_request: ccb 0 ccb 0xffffa0000e440800 curcmd 0 req 0
sdhci_fsl_fdt0-slot0: sdhci_start_command: curcmd 0 cmd 0xffffa0000e4408d0 cmd_done 1 flags 0x000035
sdhci_fsl_fdt0-slot0: sdhci_req_done: curcmd 0xffffa0000e4408d0 ccb 0xffffa0000e440800 cmd_done 1 cmd.flags 0x000035 cmd.error 1
gpart: Input/output error
# gpart create -s GPT sdda0
gpart: geom 'sdda0': File exists
# gpart show sdda0 
=>   131104  122011576  sdda0  GPT  (58G)
      131104  122011576         - free -  (58G)
# shutdown -r now
...
Login:
...
# gpart show sdda0
gpart: No such geom: sdda0.


Something obviously non-obvious must be wrong here.  I should try
another device, though I know this one works under Linux.

Should I try legacy mmc again?



> I see geom tasting on boot0 but the consumer for boot1 never shows up in
> ddb> show geom
> I disabled the graid and then the same observation moved on to gpart.
>
> Also, once the error starts, the fsl never recovers; eventually even the
> ccb and curcmd pointers stay the same.  It seems to just roto-tile,
> which makes me wonder if some error propagation is missing/gone.
>
> If I enable kern.cam.boot_delay="30000" and have my root on an md(4),
> I get to Login: (strangely), but then the nda and the sdda show up.
> When typing gpart show or whatever else geom-ish, a few commands go
> through and then we are in the error again.
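
For reference, a minimal sketch of how that tunable is typically set,
assuming the usual /boot/loader.conf mechanism.  The value is the one
quoted above; kern.cam.boot_delay is given in milliseconds, so this is
a 30 second delay.  The comments are mine and only illustrative:

```
# /boot/loader.conf
# Delay root mounting so CAM buses have time to register (milliseconds).
kern.cam.boot_delay="30000"
```
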
>
> I haven't been able to dig much further; no other locks held in debug
> kernels (just a malloc WAITOK complaint early on during "attach").
>
> I'd still be happy to hear about more possible cases; especially, are
> other sdhci devices working with MMCCAM?  Sadly, it kept me from doing the
> actual work I wanted to do with mmccam over the holidays.
>
>
> Feature request: somehow I wished we could enable/disable FDT/OFW based
> devices like we do for PCI with devctl ... can we?  Like have it
> disabled in FDT at boot but later enable/probe/attach...
>
>
> With SD cards and dwmmc I had mostly mixed results in the past; they
> worked for quite a while but after 600 days of uptime they were gone
> (problem probably long fixed but I am at 900 days now for the last
> running RK device and then won't bother for a long while I hope).
>
>

-- 
Bjoern A. Zeeb                                                     r15:7