Under usefdt=1 type of context for G5 "4 core" (system total): unload then boot /boot/kernel/kernel fails very early with: panic: Translation map has incorrect cell count (260/256)
Mark Millard
marklmi at yahoo.com
Wed Jan 30 00:30:19 UTC 2019
On 2019-Jan-28, at 23:42, Mark Millard <marklmi at yahoo.com> wrote:
> [The loader here has been modified to always do the usefdt=1 code.
> The modern VM_MAX_KERNEL_ADDRESS is in the kernel.]
>
> In trying to boot for other experiments I tried at the loader prompt:
>
> unload
> load /boot/kernel/kernel
> boot
>
> and:
>
> unload
> boot -v /boot/kernel/kernel
>
> Such combinations produced (typed from a picture of the screen,
> the missing text was really missing):
>
> GDB: no debug ports present
> KDB: debugger backends: ddb
> KDB: current backend: ddb
> panic: Translation map has incorrect cell count (260/256)
Note that 260 is a multiple of 5 and 4 but 256 is not a multiple
of 5. Being a multiple of 5 is important for "acells==2" contexts.
Going through some of the code, first noting some context:
struct ofw_map {
cell_t om_va;
cell_t om_len;
uint64_t om_pa;
cell_t om_mode;
};
This has 4 fields but one appears to be bigger than cell_t
if I understand right. This gets into if:
OF_getencprop(OF_finddevice("/"), "#address-cells", &acells,
sizeof(acells));
indicates 2 cells for om_pa (via acells==2) or just 1 cell. So
when acells==2 there are 5 cells per, otherwise there are 4.
Now to the routine in question:
static void
moea64_add_ofw_mappings(mmu_t mmup, phandle_t mmu, size_t sz)
{
struct ofw_map translations[sz/(4*sizeof(cell_t))]; /*>= 4 cells per */
For acells==2 the above potentially over allocates translations some.
Later it turns out that if it does that may prevent out of bounds
writing into translations .
pcell_t acells, trans_cells[sz/sizeof(cell_t)];
It turns out that sz/sizeof(cell_t) at this point ended up as a
multiple of 4 but not of 5, despite acells==2 being the case.
Later this leads to out of bounds accesses for trans_cells .
struct pvo_entry *pvo;
register_t msr;
vm_offset_t off;
vm_paddr_t pa_base;
int i, j;
bzero(translations, sz);
OF_getencprop(OF_finddevice("/"), "#address-cells", &acells,
sizeof(acells));
This ended up with acells==2 --so 5 cells per translation entry.
if (OF_getencprop(mmu, "translations", trans_cells, sz) == -1)
panic("moea64_bootstrap: can't get ofw translations");
For acells==2: sz/sizeof(cell_t) not being a multiple of 5 means
either some unused trans_cell entries or access outside the
trans_cells array in the loop below, depending on the loop test
used.
CTR0(KTR_PMAP, "moea64_add_ofw_mappings: translations");
sz /= sizeof(cell_t);
sz here ended up as 256, not a multiple of 5.
for (i = 0, j = 0; i < sz; j++) {
The i < sz test is testing the first of the 4 or 5 being
below sz instead of testing the last of the 4 or 5. This
leads to out of bounds accesses into trans_cells below for
the 256 with acells==2 case. It also means that the last
translations[j] ends up with some garbage content.
translations[j].om_va = trans_cells[i++];
translations[j].om_len = trans_cells[i++];
translations[j].om_pa = trans_cells[i++];
if (acells == 2) {
translations[j].om_pa <<= 32;
translations[j].om_pa |= trans_cells[i++];
}
translations[j].om_mode = trans_cells[i++];
}
The loop test for the above loop just looks wrong to me for avoid
bad accesses.
KASSERT(i == sz, ("Translations map has incorrect cell count (%d/%zd)",
i, sz));
Having sz [ the original sz/sizeof(cell_t) ] not be a multiple of 5
for acells==2 would fail this test even if the loop stopped without
going out of bounds on trans_cells (so i==255 instead of i==260).
For the original sz figure having sz/sizeof(cell_t) == 256, the original
sz figure was not a multiple of 5 [presuming sizeof(cell_t) is not a
multiple of 5].
My guess is that the original sz value was wrong after the unload and
boot (reloading) and the loop structure should avoid going out of bounds
on trans_cells anyway.
> cpuid = 0
> time = 1
> KDB: stack backtrace:
> 0xc000000000f9e kdb_backtrace+0x60
> 0xc000000000f9e vpanic+0x258
> 0xc000000000f9e panic+0x3c
> 0xc000000000f9e moea64_late_bootstrap+0x2c0
> 0xc000000000f9e moea64_bootstrap_native+0x20c
> 0xc000000000f9e pmap_bootstrap_0xc8
> 0xc000000000f9e powerpc_init+0x440
> 0xc000000000f9efc0: at
>
> (And that is all.)
The below tried to fix the loop to avoid out of bounds
trans_cells accesses and to allow my context to not panic
for the "multiple of 4 but not of 5" issue for the unload
then reload/boot sequence. The bias was for the code
changes to be easy to follow, given the earlier commentary.
# svnlite diff /usr/src/sys/powerpc/aim/mmu_oea64.c | more Index: /usr/src/sys/powerpc/aim/mmu_oea64.c
===================================================================
--- /usr/src/sys/powerpc/aim/mmu_oea64.c (revision 341836)
+++ /usr/src/sys/powerpc/aim/mmu_oea64.c (working copy)
@@ -502,7 +502,7 @@
register_t msr;
vm_offset_t off;
vm_paddr_t pa_base;
- int i, j;
+ int i, istep, j;
bzero(translations, sz);
OF_getencprop(OF_finddevice("/"), "#address-cells", &acells,
@@ -512,7 +512,8 @@
CTR0(KTR_PMAP, "moea64_add_ofw_mappings: translations");
sz /= sizeof(cell_t);
- for (i = 0, j = 0; i < sz; j++) {
+ istep = (acells == 2) ? 5 : 4;
+ for (i = 0, j = 0; i+istep-1 < sz; j++) {
translations[j].om_va = trans_cells[i++];
translations[j].om_len = trans_cells[i++];
translations[j].om_pa = trans_cells[i++];
@@ -522,7 +523,7 @@
}
translations[j].om_mode = trans_cells[i++];
}
- KASSERT(i == sz, ("Translations map has incorrect cell count (%d/%zd)",
+ KASSERT(i+sz%istep == sz, ("Translations map has incorrect cell count (%d/%zd)",
i, sz));
sz = j;
I do not claim the KASSERT should officially change but I expect that the
loop should.
===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)
More information about the freebsd-ppc
mailing list