HEADS UP: ... [Better info for __syncicache exceptions at a dcbst: %DAR holding address not covered by PTEG's]

Mon Feb 9 17:37:23 UTC 2015

objdump -h /boot/kernel10.1S/kernel shows a hole in the memory map between 25 (kern_conf) and 26 (.data) below:

/boot/kernel10.1S/kernel:     file format elf64-powerpc

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .text         007e5f84  0000000000100190  0000000000100190  00000190  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 .interp       0000000d  00000000008e6114  00000000008e6114  007e6114  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  2 .hash         00023598  00000000008e6128  00000000008e6128  007e6128  2**3
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
...
 25 kern_conf     000008d1  0000000000be6240  0000000000be6240  00ae6240  2**3
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
 26 .data         000635b6  0000000000bf0000  0000000000bf0000  00af0000  2**16
                  CONTENTS, ALLOC, LOAD, DATA

Note that 0xbe6240+0x8d1 == 0xbe6b11 and 0xbe7000 is on a different page, and there are several more pages in the hole before 0xbf0000.

Also far more than "code" areas are having __syncicache's dcbst executed.


By contrast for the working kernel10.1RE kern_conf and .data are close together, .data being on the very next page (0xbd600+0x8d1=0xbdfed1 vs. 0xbe0000, a difference of 0x12f):


 25 kern_conf     000008d1  0000000000bdf600  0000000000bdf600  00adf600  2**3
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
 26 .data         000635b6  0000000000be0000  0000000000be0000  00ae0000  2**16
                  CONTENTS, ALLOC, LOAD, DATA


===
Mark Millard
markmi at dsl-only.net

On 2015-Feb-9, at 08:28 AM, Mark Millard <markmi at dsl-only.net> wrote:

An early copy/paste error propagated throughout my figures and messes things up. So I try again...

Also it turns out openfirmware has a "word" for dumping the registers for a crash: .registers

So I can coordinate some register values with __syncicache operation and that is better information than I tried to give before anyway. I used a failing direct boot attempt for the kernel10.1S that has len = 0x10175d0.

01c27834 <__syncicache>      lis     r9,452
01c27838 <__syncicache+0x4>  lwz     r10,-25800(r9)

Note: cacheline_size = r10 above ends up being 0x20.
Yep the code is treating the 970MP processor as having a very small cache line size: 32. My understanding is that it is really 128. So each line is dcbst'd 4 times (offsets 0, 32, 64, and 96).

01c2783c <__syncicache+0x8>  addi    r11,r10,-1

Note: cacheline_size-1 = r11 above (0x1F).

[So fixing both my initial copy/paste of the from value and the cacheline_size-1 figures:

off == from & (cacheline_size-1)
off == 0x100000 & 0x1F (in all 3 cases)
off == 0x0

So in essence that old material should be ignored.]

01c27840 <__syncicache+0xc>  and     r0,r3,r11
01c27844 <__syncicache+0x10> add     r0,r0,r4
01c27848 <__syncicache+0x14> andc    r8,r3,r11
01c2784c <__syncicache+0x18> subf    r0,r10,r0
01c27850 <__syncicache+0x1c> neg     r7,r10
01c27854 <__syncicache+0x20> mr      r9,r8
01c27858 <__syncicache+0x24> mr      r11,r0
01c2785c <__syncicache+0x28> dcbst   r0,r9

Note: p = r9 above.
The register dump shows r9=0xbe7000 and shows DAR=0xbe7000.
It also shows SRR0=0x1c2785c and interrupt vector 0x300.
DSISR=0x40000000 (not found in the primary or secondary PTEG).
And:
r10=0x20, r6=0x5305d0, r11=5305b0, r7=0xffffffffffffffe0 (so -0x20)
(r6,r11 together are related to l in the C source.)

01c27860 <__syncicache+0x2c> add     r11,r11,r7
01c27864 <__syncicache+0x30> add.    r6,r10,r11
01c27868 <__syncicache+0x34> add     r9,r9,r10
01c2786c <__syncicache+0x38> bgt+    01c2785c <__syncicache+0x28>
01c27870 <__syncicache+0x3c> sync    
01c27874 <__syncicache+0x40> icbi    r0,r8
01c27878 <__syncicache+0x44> add     r0,r0,r7
01c2787c <__syncicache+0x48> add.    r9,r0,r10
01c27880 <__syncicache+0x4c> add     r8,r8,r10
01c27884 <__syncicache+0x50> bgt+    01c27874 <__syncicache+0x40>
01c27888 <__syncicache+0x54> sync    
01c2788c <__syncicache+0x58> isync
01c27890 <__syncicache+0x5c> blr


So the problem seems to be not at the ends of the range but in the middle of it: a hole in the address range with no PTEG coverage.


===
Mark Millard
markmi at dsl-only.net

On 2015-Feb-8, at 04:20 PM, Mark Millard <markmi at dsl-only.net> wrote:

[Side note: My prior comment about having additional signed/unsigned conversion code related to "int len" in __syncicache was wrong. That the calling argument (*result)->f_size has size_t as a type ends up just depending on the value not being so big as to cause problems when it is converted to "int len" for __syncicache's len parameter.]

%SRR0 being 0x1c2785c is a loader code address and does not change unless one switches/updates the loader.

objdump shows the following for the unstripped variant of the loader that is in use (/usr/obj/usr/home/markmi/src_10_1_releng/sys/boot/powerpc/ofw/loader):

01c27834 <__syncicache>      lis     r9,452
01c27838 <__syncicache+0x4>  lwz     r10,-25800(r9)
01c2783c <__syncicache+0x8>  addi    r11,r10,-1
01c27840 <__syncicache+0xc>  and     r0,r3,r11
01c27844 <__syncicache+0x10> add     r0,r0,r4
01c27848 <__syncicache+0x14> andc    r8,r3,r11
01c2784c <__syncicache+0x18> subf    r0,r10,r0
01c27850 <__syncicache+0x1c> neg     r7,r10
01c27854 <__syncicache+0x20> mr      r9,r8
01c27858 <__syncicache+0x24> mr      r11,r0
01c2785c <__syncicache+0x28> dcbst   r0,r9
01c27860 <__syncicache+0x2c> add     r11,r11,r7
01c27864 <__syncicache+0x30> add.    r6,r10,r11
01c27868 <__syncicache+0x34> add     r9,r9,r10
01c2786c <__syncicache+0x38> bgt+    01c2785c <__syncicache+0x28>
01c27870 <__syncicache+0x3c> sync    
01c27874 <__syncicache+0x40> icbi    r0,r8
01c27878 <__syncicache+0x44> add     r0,r0,r7
01c2787c <__syncicache+0x48> add.    r9,r0,r10
01c27880 <__syncicache+0x4c> add     r8,r8,r10
01c27884 <__syncicache+0x50> bgt+    01c27874 <__syncicache+0x40>
01c27888 <__syncicache+0x54> sync    
01c2788c <__syncicache+0x58> isync
01c27890 <__syncicache+0x5c> blr

So the address in %SRR0 that is reported is for:

01c2785c <__syncicache+0x28> dcbst   r0,r9

The source shows:

void
__syncicache(void *from, int len)
{
      register_t l, off;
      char    *p;

      off = (uintptr_t)from & (cacheline_size - 1);
      l = len += off;
      p = (char *)from - off;

      do {
              __asm __volatile ("dcbst 0,%0" :: "r"(p));
              p += cacheline_size;
      } while ((l -= cacheline_size) > 0);
      __asm __volatile ("sync");
      p = (char *)from - off;
      do {
              __asm __volatile ("icbi 0,%0" :: "r"(p));
              p += cacheline_size;
      } while ((len -= cacheline_size) > 0);
      __asm __volatile ("sync; isync");
}

Comparing the 3 examples of dcbst use for off and initial len values (using cacheline_size == 128 so cacheline_size-1 == 0x7F):

off == 0x1c35b48 & 0x7F (in all 3 cases)
off == 0x48

len == 0x1014b80 is cache-line-multiple sized: This is the case that works fine.
from+len   == 0x1c35b48 + 0x1014b80
from+len   == 0x2c4A6C8
from+len-1 == 0x2c4A6C7
start at 0x1c35b00
end with 0x2c4A680 (spans through 0x2c4A6FF)

len == 0x1014bb0 is NOT cache-line-multiple sized: FAILS!
from+len   == 0x1c35b48 + 0x1014b80
from+len   == 0x2c4A6F8
from+len-1 == 0x2c4A6F7
start at 0x1c35b00
end with 0x2c4A680 (spans through 0x2c4A6FF)
NOTE: Same start/end range as above but fails by causing an exception!

len == 0x10175d0 is NOT cache-line-multiple sized: FAILS!
from+len   == 0x1c35b48 + 0x10175d0
from+len   == 0x2c4D118
from+len-1 == 0x2c4D117
start at 0x1c35b00
end with 0x2c4D100 (spans through 0x2c4A67F)

The 3 cases have the same number of "extra before from" in the cache-lines.
But the 3 cases have varying numbers of "extra after from+len-1".

Only for len being a multiple of the cache-line-size would "extra before from" + "extra after from+len-1" be the cache-line-size (or zero if from was cache-line-aligned).

My guess is that something is configured based on "extra before from" + "extra after from+len-1" being the cache-line-size (or zero if from was cache-line-aligned) when it does not apply. So the processor ends up rejecting some bytes during a dcbst instruction for len values that are not multiples of the cache-line size.

For __syncicache's use from sys/boot/ofw/libofw/ppc64_elf_freebsd.c I've not figured out what all is configured related to the the range [from,from+len) (excludes from+len) --i.e. the range [from,from+len-1] since 0<len. where

from == (void *) (*result)->f_addr
len  == (*result)->f_size



Blindly listing things that can cause dcbst to exception (quotes form PowerISA_203_Final_Public.pdf):

> A Data Storage interrupt occurs when no higher priority exception exists and a data access cannot be per- formed for any of the following reasons. Data address translation is enabled (MSRDR=1) and the virtual address of any byte of the storage location specified by a Load, Store, icbi, dcbz, dcbst, dcbf[l], eciwx, or ecowx instruction cannot be translated to a real address. 

> A Data Segment interrupt occurs when no higher prior- ity exception exists and a data access cannot be per- formed because data address translation is enabled and the effective address of any byte of the storage location specified by a Load, Store, icbi, dcbz, dcbst, dcbf[l] eciwx, or ecowx instruction cannot be trans- lated to a virtual address. 

> dcbf, dcbfep, dcbst, and dcbstep instructions are treated as Loads with respect to protection. Flushing or storing a line from the cache is not considered a Store since the store has already been done to update the cache and the dcbf, dcbfep, dcbst, or dcbstep instruction is only updating the copy in main storage. As a Load, they can cause Read Access Control exception type Data Storage interrupts. 

(A table indicates Yes for Read Protection Violation and No for Write Protection Violation.)

> On implementations on which a Machine Check interrupt can be caused by referring to an invalid real address, executing a dcbz, dcbzep, or dcba instruction can cause a delayed Machine Check interrupt by establishing in the data cache a block that is associated with an invalid real address. See Section 3.2 of Book II. A Machine Check interrupt can eventually occur if and when a subsequent attempt is made to write that block to main storage, for example as the result of executing an instruc- tion that causes a cache miss for which the block is the target for replacement or as the result of exe- cuting a dcbst, dcbstep, dcbf, or dcbfep instruc- tion. 



The dcbst related details for the code are (picking a failing example's argument values):

from  == 0x1c35b48
len   == 0x1014bb0 (example)

starting at 0x1c35b00
ending with 0x2c4A680 (but it will span through 0x2c4A6FF)

The code's calculations are based on:

off == (uintptr_t)from & (cacheline_size - 1)
off == 0x1c35b48 & 0x7F
off == 0x48

(len as in the argument here, not later updates)
len+off == 0x1014bb0 + 0x48
len+off == 0x1014BF8
l == 0x1014BF8 (initially)

Note that len+off is not a multiple of the cache size. Repeated subtractions of the cache size will end up with a non-zero remainder.

(p as it is initialized here, not later updates)
p   == from - off
p   == 0x1c35b48 - 0x48
p   == 0x1c35b00
p+l == 0x1c35b00+0x1014BF8
p+l == 0x2c4A6F8

Note that 0x2c4A6F8 this is not cache aligned.
Note that as the dcbst loop progresses p+l is constant (other than in the middle of updating the pair).

So when...
l==0xF8: p==0x2c4A600
l==0x78: p==0x2c4A680
l<0: no dcbst for p

It appears that the dcbst loop does span the range of cache-lines that it should.

But apparently the processor is rejecting some bytes from that range when full cache-line sized units are counted. I would guess the rejection is on the l=0x78 side of things that varies from example to example.



===
Mark Millard
markmiatdsl-only.net

On 2015-Feb-8, at 01:16 AM, Mark Millard <markmi at dsl-only.net> wrote:

I've narrowed down greatly where the crashes happen, which need not be where root-cause is: that could be earlier.

In the following code [I'm using 10.1-RELEASE-p5 for reference here]

$ svnlite diff sys/boot/ofw/libofw/ppc64_elf_freebsd.c 
Index: sys/boot/ofw/libofw/ppc64_elf_freebsd.c
===================================================================

--- sys/boot/ofw/libofw/ppc64_elf_freebsd.c	(revision 277808)
+++ sys/boot/ofw/libofw/ppc64_elf_freebsd.c	(working copy)
@@ -59,7 +59,11 @@
	 * be done by the kernel after relocation.
	 */
	if (!strcmp((*result)->f_type, "elf kernel"))
+{
+printf("ppc64_ofw_elf_loadfile before __syncicache\n");
		__syncicache((void *) (*result)->f_addr, (*result)->f_size);
+printf("ppc64_ofw_elf_loadfile after __syncicache\n");
+}
	return (0);
}

for a directly-bootable (no crash) kernel-build both printf's are displayed. (The above code part of /boot/loader .)

But for the kernels that I build that fail to directly boot only the first of the two printf's is displayed when a direct boot is attempted: Openfirmware's notice with %SRR0 and %SRR1 shows up after that instead of the text from the second printf.

Based on that much it looks like the crash is either in evaluating the arguments to __syncicache or happens during __syncicache's execution, not after.



Changing the first printf to something like the sequence:

printf("ppc64_ofw_elf_loadfile before __syncicache\n");
printf("(void*)result: %p\n",(void*)result);
printf("(void*)(*result): %p\n",(void*)(*result));
printf("(void*)(*result)->f_addr: %p\n",(void*)(*result)->f_addr);
printf("(*result)->f_size : 0x%lx\n",(*result)->f_size);

Shows that all of the stages print before the crash happens, answering the question about evaluation of the arguments: there is no problem evaluating them.

So the crashes are strictly during __syncicache's activity.

Only (*result)->f_size varies between the 3 examples that I've used:

10.1-RELEASE-p5 variant without VERBOSE_SYSINIT (and BOOTVERBOSE, BOOTHOWTO). (boots fine)
10.1-RELEASE-p5 variant with VERBOSE_SYSINIT (and BOOTVERBOSE, BOOTHOWTO). (crashes)
10.1-STABLE (-r278028) variant without VERBOSE_SYSINIT (and BOOTVERBOSE, BOOTHOWTO). (crashes)

What the above printf's reported was:

(void*)result:    0x1c35b48
(void*)(*result):   0x1ebc0
(*result)->f_addr: 0x100000

10.1-RELEASE-p5 variant without VERBOSE_SYSINIT: (*result)->f_size:        0x1014b80
10.1-RELEASE-p5 variant with VERBOSE_SYSINIT: (*result)->f_size:           0x1014bb0
10.1-STABLE (-r278028) variant without VERBOSE_SYSINIT: (*result)->f_size: 0x10175d0

(Listed in increasing order.)

As I remember for the last two the crash report listed %SRR0: 0x1c2785c for both. The "without VERBOSE_SYSINIT" one above does not crash and for it the "ppc64_ofw_elf_loadfile after __syncicache" message shows up as it should.

===
Mark Millard
markmi at dsl-only.net

On 2015-Feb-7, at 09:43 AM, Mark Millard <markmi at dsl-only.net> wrote:

Correction to earlier Email: VERBOSE_SYSINIT with DDB (and GDB) all enabled (indirectly booted via using kernel10.1RE) got 0x1c277ec for the %SRR0 value, not 0x1c277fc. So slightly different than Kernel10.1S's 0x1c277fc (for this 10.1-STABLE variant). (I looked at the wrong notes when composing the original Email.)

More comparisons of kernel build options:

VERBOSE_SYSINIT enabled with DDB (and GDB) disabled still has the booting problem for my 10.1-RELEASE-p5 variant. It also still has the 0x1c277ec for the %SRR0 value.


For VERBOSE_SYSINIT disabled (DDB and GDB enabled) directly booted...

Preloaded elf kernel "/boot/kernel/kernel" at 0x1106000.
...
real memory  = 17152118784 (16357 MB)
available KVA = 7222611967 (6888 MB)
Physical memory chunk(s):
0x0000000000024000 - 0x00000000000fffff, 901120 bytes (220 pages)
0x0000000001115000 - 0x00000000017fffff, 7254016 bytes (1771 pages)
0x0000000001814000 - 0x0000000001bfffff, 4112384 bytes (1004 pages)
0x0000000001c3d000 - 0x0000000001c3cfff, 0 bytes (0 pages)
0x0000000004cbd000 - 0x000000000fffffff, 187969536 bytes (45891 pages)
0x0000000020000000 - 0x000000007f5effff, 1600061440 bytes (390640 pages)
0x0000000100000000 - 0x0000000466827fff, 14604730368 bytes (3565608 pages)
0x0000000200000000 - 0x00000001ffffffff, 0 bytes (0 pages)
0x0000000300000000 - 0x00000002ffffffff, 0 bytes (0 pages)
0x0000000400000000 - 0x00000003ffffffff, 0 bytes (0 pages)
avail memory = 16374190080 (15615 MB)

So 0x1c277ec is between the two:

0x0000000001814000 - 0x0000000001bfffff, 4112384 bytes (1004 pages)
0x0000000001c3d000 - 0x0000000001c3cfff, 0 bytes (0 pages)

(But I do not know what most of the regions and holes are supposed to be.)

VERBOSE_SYSINIT, DDB, and GDB enabled but indirectly booted via kernel10.1RE (via /boot/loader.conf's kernel="kernel10.1RE"), stopping, unloading, then doing "boot kernel":

Preloaded elf kernel "/boot/kernel/kernel" at 0x1116000.
...
real memory  = 17152118784 (16357 MB)
available KVA = 7222611967 (6888 MB)
Physical memory chunk(s):
0x0000000000024000 - 0x00000000000fffff, 901120 bytes (220 pages)
0x0000000001105000 - 0x0000000001114fff, 65536 bytes (16 pages)
0x0000000001125000 - 0x00000000017fffff, 7188480 bytes (1755 pages)
0x0000000001814000 - 0x0000000001bfffff, 4112384 bytes (1004 pages)
0x0000000001c3d000 - 0x0000000001c3cfff, 0 bytes (0 pages)
0x0000000004cbd000 - 0x000000000fffffff, 187969536 bytes (45891 pages)
0x0000000020000000 - 0x000000007f5effff, 1600061440 bytes (390640 pages)
0x0000000100000000 - 0x0000000466827fff, 14604730368 bytes (3565608 pages)
0x0000000200000000 - 0x00000001ffffffff, 0 bytes (0 pages)
0x0000000300000000 - 0x00000002ffffffff, 0 bytes (0 pages)
0x0000000400000000 - 0x00000003ffffffff, 0 bytes (0 pages)
avail memory = 16374190080 (15615 MB)


===
Mark Millard
markmi at dsl-only.net

On 2015-Feb-7, at 03:49 AM, Mark Millard <markmi at dsl-only.net> wrote:

Nathan, you had the below written about my problems with booting my builds of, say, 10.1-STABLE  (kernel="kernel10.1S" in /boot/loaderl.conf) without involving the kernel from my build of 10.1-RELEASE-p5 (kernel="kernel10.1RE" or sometimes kernel="kernel" in /boot/loader.conf), where kernel="kernel10.1RE" in /boot/loader.conf boots just fine...

> So this has to be some kind of icache issue. If you unload and reload 
> the *same* kernel, does it also help?
> -Nathan

(Part of the evidence was: Using kernel="kernel10.1RE" in /boot/loader.conf, stopping at the 10sec prompt, unloading, and doing "boot kernel 10.1S" lets my 10.1-STABLE builds boot that will not boot directly.)


Well I've got a little more information from a different direction: A way to create the problem when building my 10.1-RELEASE-p5 kernel is to enable VERBOSE_SYSINIT. More specifically the comparison/contrast I've done so far is...



I added the following 3 lines to my GENERIC64vtsc for my 10.1-RELEASE-p5 source tree (no other changes elsewhere at all)

options 	VERBOSE_SYSINIT
options 	BOOTVERBOSE=1
options 	BOOTHOWTO=RB_VERBOSE

and rebuilt kernel the via KERNCONF=GENERIC64vtsc INSTKERNNAME=kernel the resulting kernel load fails if referenced by /boot/loader.conf via kernel="kernel" line. The %SRR0 address value listed is the same as for kernel10.1S: 1c277fc. But booting using kernel="kernel10.1RE" in /boot/loader.conf, stopping at the 10sec wait, unloading, and typing "boot kernel" boots fine --just like "boot kernel10.1S".

Note: GENERIC64vtsc has option DDB enabled (and GBD too). (This is associated my with my information gathering for early G5 boot crashes/hangups.)

Note: This is the first time I've ever tried any of those 3 options. My kernel10.1S build was not based on them.

Then I changed the 3 lines by just commenting out the first of the 3 that I had added

#options 	VERBOSE_SYSINIT
options 	BOOTVERBOSE=1
options 	BOOTHOWTO=RB_VERBOSE

and rebuilt via KERNCONF=GENERIC64vtsc INSTKERNNAME=kernel again. The resulting /boot/kernel/... boots just fine when kernel="kernel" is used in /boot/loader.conf : no need for using kernel10.1RE or for stopping to do anything special.



===
Mark Millard
markmi at dsl-only.net