Re: Rock64 configuration fails to boot for main 22c4ab6cb015 but worked for main 06bd74e1e39c (Nov 21): e.MMC mishandled?
- Reply: Kornel_Dulęba : "Re: Rock64 configuration fails to boot for main 22c4ab6cb015 but worked for main 06bd74e1e39c (Nov 21): e.MMC mishandled?"
- In reply to: Mark Millard via freebsd-arm : "Re: Rock64 configuration fails to boot for main 22c4ab6cb015 but worked for main 06bd74e1e39c (Nov 21): e.MMC mishandled?"
- Go to: [ bottom of page ] [ top of archives ] [ this month ]
Date: Sun, 12 Dec 2021 21:38:19 UTC
On 2021-Dec-12, at 00:59, Mark Millard <marklmi@yahoo.com> wrote:
> On 2021-Dec-12, at 00:29, Mark Millard via freebsd-arm <freebsd-arm@freebsd.org> wrote:
>
>> On 2021-Dec-11, at 16:19, Mark Millard <marklmi@yahoo.com> wrote:
>>
>>> [I've cut out the history: just presenting some new evidence.]
>>>
>>> First, a little context from getting to the db> prompt.
>>>
>>> db> ps
>>> pid ppid pgrp uid state wmesg wchan cmd
>>> 18 0 0 0 DL syncer 0xffff000000eca5a8 [syncer]
>>> 17 0 0 0 DL vlruwt 0xffffa00007d2ea60 [vnlru]
>>> 16 0 0 0 DL (threaded) [bufdaemon]
>>> 100089 D qsleep 0xffff000000ec9478 [bufdaemon]
>>> 100092 D - 0xffff000000c11100 [bufspacedaemon-0]
>>> 100093 D - 0xffff000000c21680 [bufspacedaemon-1]
>>> 9 0 0 0 DL psleep 0xffff000000ef0650 [vmdaemon]
>>> 8 0 0 0 DL (threaded) [pagedaemon]
>>> 100087 D psleep 0xffff000000ee2b38 [dom0]
>>> 100094 D launds 0xffff000000ee2b44 [laundry: dom0]
>>> 100095 D umarcl 0xffff0000007b38d8 [uma]
>>> 7 0 0 0 DL mmcsd d 0xffffa00007b72e00 [mmcsd0boot1: mmc/sd]
>>> 6 0 0 0 DL mmcsd d 0xffffa00007b71300 [mmcsd0boot0: mmc/sd]
>>> 5 0 0 0 DL mmcreq 0xffff00009b5d0710 [mmcsd0: mmc/sd card]
>>> 4 0 0 0 DL - 0xffff000000ccc020 [rand_harvestq]
>>> 15 0 0 0 DL (threaded) [usb]
>>> . . .
>>>
>>> and "mmcreq" is from the while loop in:
>>>
>>> static int
>>> mmc_wait_for_req(struct mmc_softc *sc, struct mmc_request *req)
>>> {
>>>
>>> req->done = mmc_wakeup;
>>> req->done_data = sc;
>>> if (__predict_false(mmc_debug > 1)) {
>>> device_printf(sc->dev, "REQUEST: CMD%d arg %#x flags %#x",
>>> req->cmd->opcode, req->cmd->arg, req->cmd->flags);
>>> if (req->cmd->data) {
>>> printf(" data %d\n", (int)req->cmd->data->len);
>>> } else
>>> printf("\n");
>>> }
>>> MMCBR_REQUEST(device_get_parent(sc->dev), sc->dev, req);
>>> MMC_LOCK(sc);
>>> while ((req->flags & MMC_REQ_DONE) == 0)
>>> msleep(req, &sc->sc_mtx, 0, "mmcreq", 0);
>>> MMC_UNLOCK(sc);
>>> if (__predict_false(mmc_debug > 2 || (mmc_debug > 0 &&
>>> req->cmd->error != MMC_ERR_NONE)))
>>> device_printf(sc->dev, "CMD%d RESULT: %d\n",
>>> req->cmd->opcode, req->cmd->error);
>>> return (0);
>>> }
>>>
>>> So it appears that the error report:
>>>
>>> mmcsd0: Error indicated: 4 Failed
>>>
>>> ends up associated with (req->flags & MMC_REQ_DONE) == 0 staying
>>> true in the above code: an unbounded loop with MMC_LOCK(sc) active.
>>> The "4" in the error report seems to be from:
>>>
>>> #define MMC_ERR_FAILED 4
>>>
>>> It looks like there are some problems with handling errors, problems
>>> such that it gets stuck looping (no panic, no progress).
>>>
>>> That seems to be separate from why the MMC_ERR_FAILED was generated
>>> in the first place. So: 2 problems, not just one. Thus it may be a
>>> good context for tackling the looping problem with a known example
>>> failure to look at.
>>>
>>>
>>>
>>> Just for reference, I tried "boot -v" with debug.verbose_sysinit=1 in place,
>>> just to capture and report the tail of the output for the boot failure.
>>>
>>> . . .
>>> subsystem f000000
>>> release_aps(0)... Release APs...done
>>> done.
>>> intr_irq_shuffle(0)... Trying to mount root from ufs:/dev/gpt/Rock64root []...
>>> done.
>>> netisr_start(0)... done.
>>> taskqgroup_bind_softirq(0)... done.
>>> GEOM: new disk mmcsd0
>>> GEOM: new disk mmcsd0boot0
>>> GEOM: new disk mmcsd0boot1
>>> smp_after_idle_runnable(0)... done.
>>> taskqgroup_bind_if_config_tqg(0)... done.
>>> taskqgroup_bind_if_io_tqg(0)... done.
>>> tmr_setup_user_access(0)... done.
>>> subsystem f000001
>>> mmcsd0: Error indicated: 4 Failed
>>> epoch_init_smp(0)... done.
>>> subsystem f100000
>>> racctd_init(0)... done.
>>> subsystem fffffff
>>> start_periodic_resettodr(0)... done.
>>> oktousecallout(0)... done.
>>> clknode_finish(0)... Unresolved linked clock found: hdmi_phy
>>> Unresolved linked clock found: usb480m_phy
>>> done.
>>> regulator_constraint(0)... done.
>>> regulator_shutdown(0)... regulator: shutting down unused regulators
>>> regulator: shutting down vcc_sd... busy
>>> done.
>>> uhub0: 1 port with 1 removable, self powered
>>> uhub2: 2 ports with 2 removable, self powered
>>> uhub3: 1 port with 1 removable, self powered
>>> uhub1: 1 port with 1 removable, self powered
>>> ugen4.2: <Samsung PSSD T7 Touch> at usbus4
>>> umass0 on uhub2
>>> umass0: <Samsung PSSD T7 Touch, class 0/0, rev 3.20/1.00, addr 1> on usbus4
>>> umass0: SCSI over Bulk-Only; quirks = 0x0000
>>> umass0:0:0: Attached to scbus0
>>> pass0 at umass-sim0 bus 0 scbus0 target 0 lun 0
>>> pass0: <Samsung PSSD T7 Touch 0> Fixed Direct Access SPC-4 SCSI device
>>> pass0: Serial Number REPLACED
>>> pass0: 400.000MB/s transfers
>>> da0 at umass-sim0 bus 0 scbus0 target 0 lun 0
>>> da0: <Samsung PSSD T7 Touch 0> Fixed Direct Access SPC-4 SCSI device
>>> da0: Serial Number REPLACED
>>> da0: 400.000MB/s transfers
>>> da0: 953869MB (1953525168 512 byte sectors)
>>> da0: quirks=0x2<NO_6_BYTE>
>>> da0: Delete methods: <NONE(*),ZERO>
>>> random: unblocking device.
>>>
>>> No more output after that.
>>
>> As for why MMC_ERR_FAILED results, the following code diff is
>> intended to suggest what I think may be incomplete about sticking
>> to what the device-specific code supports vs. does not support
>> (not supporting HS200 here). The code does compile in my context
>> but is untested.
>
> It is now tested (at least to be a useful hack): no longer am I
> running an older 1400042 kernel. For reference,
>
> # uname -apKU
> FreeBSD Rock64_RPi_4_3_2v1p2 14.0-CURRENT FreeBSD 14.0-CURRENT #18 main-n251456-22c4ab6cb015-dirty: Sun Dec 12 00:34:53 PST 2021 root@CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA53-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA53 arm64 aarch64 1400043 1400043
>
> And it reports during the boot (other than the "REPLACED"):
>
> mmcsd0: 125GB <MMCHC DJNB4R 0.7 SN REPLACED MFG 06/2016 by 21 0x0000> at mmc0 52.0MHz/8bit/1016-block
>
> So it no longer sets up a mode that the rk3328-specific-code does not
> actually support.
>
> (Nothing that I've done here deals with the looping issue when there
> is a MMC_ERR_FAILED or the like.)
>
>> The email handling may mess up some leading
>> whitespace --but, again, I'm only trying to suggest a type of
>> change.
>>
>> # git -C /usr/main-src/ diff /usr/main-src/sys/dev/mmc
>> diff --git a/sys/dev/mmc/mmc.c b/sys/dev/mmc/mmc.c
>> index 9c73dfd57ce0..dffd1c382684 100644
>> --- a/sys/dev/mmc/mmc.c
>> +++ b/sys/dev/mmc/mmc.c
>> @@ -59,6 +59,7 @@ __FBSDID("$FreeBSD$");
>> #include <sys/param.h>
>> #include <sys/systm.h>
>> #include <sys/kernel.h>
>> +#include <sys/kobj.h>
>> #include <sys/malloc.h>
>> #include <sys/lock.h>
>> #include <sys/module.h>
>> @@ -1512,6 +1513,8 @@ mmc_timing_to_string(enum mmc_bus_timing timing)
>> static bool
>> mmc_host_timing(device_t dev, enum mmc_bus_timing timing)
>> {
>> + kobjop_desc_t kobj_desc;
>> + kobj_method_t *kobj_method;
>> int host_caps;
>>
>> host_caps = mmcbr_get_caps(dev);
>> @@ -1543,14 +1546,37 @@ mmc_host_timing(device_t dev, enum mmc_bus_timing timing)
>> case bus_timing_mmc_ddr52:
>> return (HOST_TIMING_CAP(host_caps, MMC_CAP_MMC_DDR52));
>> case bus_timing_mmc_hs200:
>> - return (HOST_TIMING_CAP(host_caps, MMC_CAP_MMC_HS200_120) ||
>> - HOST_TIMING_CAP(host_caps, MMC_CAP_MMC_HS200_180));
>> case bus_timing_mmc_hs400:
>> - return (HOST_TIMING_CAP(host_caps, MMC_CAP_MMC_HS400_120) ||
>> - HOST_TIMING_CAP(host_caps, MMC_CAP_MMC_HS400_180));
>> case bus_timing_mmc_hs400es:
>> - return (HOST_TIMING_CAP(host_caps, MMC_CAP_MMC_HS400 |
>> - MMC_CAP_MMC_ENH_STROBE));
>> + /*
>> + * Disable eMMC modes that require use of
>> + * MMC_SEND_TUNING_BLOCK_HS200 to set things up if either the
>> + * tune or re-tune method is the default NULL implementation.
>> + */
>> + kobj_desc = &mmcbr_tune_desc;
>> + kobj_method = kobj_lookup_method(((kobj_t)dev)->ops->cls, NULL,
>> + kobj_desc);
>> + if (kobj_method == &kobj_desc->deflt)
>> + return (false);
>> + kobj_desc = &mmcbr_retune_desc;
>> + kobj_method = kobj_lookup_method(((kobj_t)dev)->ops->cls, NULL,
>> + kobj_desc);
>> + if (kobj_method == &kobj_desc->deflt) {
>> + return (false);
>> + }
>> +
>> + /*
>> + * Otherwise track the host capabilities.
>> + */
>> + if (timing == bus_timing_mmc_hs200)
>> + return (HOST_TIMING_CAP(host_caps, MMC_CAP_MMC_HS200_120) ||
>> + HOST_TIMING_CAP(host_caps, MMC_CAP_MMC_HS200_180));
>> + if (timing == bus_timing_mmc_hs400)
>> + return (HOST_TIMING_CAP(host_caps, MMC_CAP_MMC_HS400_120) ||
>> + HOST_TIMING_CAP(host_caps, MMC_CAP_MMC_HS400_180));
>> + if (timing == bus_timing_mmc_hs400es)
>> + return (HOST_TIMING_CAP(host_caps, MMC_CAP_MMC_HS400 |
>> + MMC_CAP_MMC_ENH_STROBE));
>> }
>>
>> #undef HOST_TIMING_CAP
>>
>>
>> In other words: have mmc_host_timing avoid returning true for some
>> combinations that definitely do not have sufficient software support
>> present at the time. (So far as I can tell, the rk3328's get the
>> NULL-implementations as things are.)
>>
>> I expect that this sort of thing would go back to using
>> MMC_CAP_MMC_DDR52 for the rk3328's, as an example. Working, but in a
>> slower mode, the same mode as FreeBSD was previously using.
>>
>> A possible incompleteness in the suggestion is that there is also a
>> drive-strength setting involved. If that also had "kobj" interfacing
>> and NULL-implementation possibilities, then in the future there would
>> be more to test for possibly forcing return-false than I did above.
>>
>> Hopefully this sort of thing would help, possibly more than just for
>> rk3328's.
>
>
As for what was happening without my patch . . .
sys/dev/mmc/mmcbr_if.m defines:
static int
null_retune(device_t brdev __unused, device_t reqdev __unused,
bool reset __unused)
{
return (0);
}
static int
null_tune(device_t brdev __unused, device_t reqdev __unused,
bool hs400 __unused)
{
return (0);
}
. . .
#
# Called by the mmcbus with the bridge claimed to execute initial tuning.
#
METHOD int tune {
device_t brdev;
device_t reqdev;
bool hs400;
} DEFAULT null_tune;
#
# Called by the mmcbus with the bridge claimed to execute re-tuning.
#
METHOD int retune {
device_t brdev;
device_t reqdev;
bool reset;
} DEFAULT null_retune;
. . .
It is these success-reporting no-op routines that were being
used to attempt the tuning: so there was no tuning done.
The code that I added detects that these routines would be
used and avoids allowing contexts that would involve putting
them to use with HS200 mode.
I'll note that there is another such null_* routine that the
code (even with my patch) does not deal with avoiding the use
of:
. . .
static int
null_switch_vccq(device_t brdev __unused, device_t reqdev __unused)
{
return (0);
}
. . .
#
# Called by the mmcbus to switch the signaling voltage (VCCQ).
#
METHOD int switch_vccq {
device_t brdev;
device_t reqdev;
} DEFAULT null_switch_vccq;
. . .
/usr/main-src/sys/dev/sdhci/sdhci.c has somewhat analogous code for
somewhat analogous null_* routines. null_set_uhs_timing for that is
from sys/dev/sdhci/sdhci_if.m (but the other two are again the above
null_tune and null_retune routines, so not repeated here):
. . .
static void
null_set_uhs_timing(device_t brdev __unused,
struct sdhci_slot *slot __unused)
{
}
. . .
METHOD void set_uhs_timing {
device_t brdev;
struct sdhci_slot *slot;
} DEFAULT null_set_uhs_timing;
. . .
sdhci_init_slot(device_t dev, struct sdhci_slot *slot, int num)
in sdhci.c looks like (in part):
. . .
/*
* Disable UHS-I and eMMC modes if the set_uhs_timing method is the
* default NULL implementation.
*/
kobj_desc = &sdhci_set_uhs_timing_desc;
kobj_method = kobj_lookup_method(((kobj_t)dev)->ops->cls, NULL,
kobj_desc);
if (kobj_method == &kobj_desc->deflt)
host_caps &= ~(MMC_CAP_UHS_SDR12 | MMC_CAP_UHS_SDR25 |
MMC_CAP_UHS_SDR50 | MMC_CAP_UHS_DDR50 | MMC_CAP_UHS_SDR104 |
MMC_CAP_MMC_DDR52 | MMC_CAP_MMC_HS200 | MMC_CAP_MMC_HS400);
#define SDHCI_CAP_MODES_TUNING(caps2) \
(((caps2) & SDHCI_TUNE_SDR50 ? MMC_CAP_UHS_SDR50 : 0) | \
MMC_CAP_UHS_DDR50 | MMC_CAP_UHS_SDR104 | MMC_CAP_MMC_HS200 | \
MMC_CAP_MMC_HS400)
/*
* Disable UHS-I and eMMC modes that require (re-)tuning if either
* the tune or re-tune method is the default NULL implementation.
*/
kobj_desc = &mmcbr_tune_desc;
kobj_method = kobj_lookup_method(((kobj_t)dev)->ops->cls, NULL,
kobj_desc);
if (kobj_method == &kobj_desc->deflt)
goto no_tuning;
kobj_desc = &mmcbr_retune_desc;
kobj_method = kobj_lookup_method(((kobj_t)dev)->ops->cls, NULL,
kobj_desc);
if (kobj_method == &kobj_desc->deflt) {
no_tuning:
host_caps &= ~(SDHCI_CAP_MODES_TUNING(caps2));
}
. . .
What I've done in my patch is analogous to what the the code shown
after the #define SDHCI_CAP_MODES_TUNING above does, translated to
fit the mmc's pre-existing code structure.
===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)