[Bug 228056] powerpc64: MCE on POWER9 machine (AC922)

bugzilla-noreply at freebsd.org
Tue May 8 00:52:50 UTC 2018


https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=228056

            Bug ID: 228056
           Summary: powerpc64: MCE on POWER9 machine (AC922)
           Product: Base System
           Version: CURRENT
          Hardware: powerpc
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: kern
          Assignee: bugs at FreeBSD.org
          Reporter: breno.leitao at gmail.com

I am creating this bug to track my progress on investigating the bootstrap of
FreeBSD on an AC922 (POWER9) machine.

When I boot HEAD, I get the following machine check exception (MCE):

 KDB: debugger backends: ddb
 KDB: current backend: ddb
 Copyright (c) 1992-2018 The FreeBSD Project.
 Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
 FreeBSD is a registered trademark of The FreeBSD Foundation.
 FreeBSD 12.0-CURRENT #152 66f063557f2(master)-dirty: Tue May  8 01:17:52 CET 2018
    root at free8:/usr/obj/root/kernel/freebsd/powerpc.powerpc64/sys/BRENO powerpc
 gcc version 4.2.1 20070831 patched [FreeBSD]
 WARNING: WITNESS option enabled, expect reduced performance.
 WARNING: DIAGNOSTIC option enabled, expect reduced performance.
 Entering uma_startup with 44 boot pages configured
 startup_alloc from "UMA Kegs", 41 boot pages left
 startup_alloc from "UMA Zones", 40 boot pages left
 startup_alloc from "UMA Zones", 38 boot pages left
 startup_alloc from "UMA Zones", 36 boot pages left
 start at c000000001e30100
 KERNEL BASE at 100100
 sum is  c000000001d30000

 fatal kernel trap:

   exception       = 0x200 (machine check)
   srr0            = 0xc00000000255d284 (0x82d284)
   srr1            = 0x9000000000201032
   current msr     = 0x9000000000000032
   lr              = 0xc00000000255d278 (0x82d278)
   curthread       = 0xc000000002e2bbc0
          pid = 0, comm = 

 [ thread pid 0 tid 0 ]
 Stopped at      0xc00000000255d284


Digging further, this is where it breaks, inside keg_fetch_slab():

     82d264:       7f c3 f3 78     mr      r3,r30
     82d268:       7e e4 bb 78     mr      r4,r23
     82d26c:       7f 65 db 78     mr      r5,r27
     82d270:       7f 86 e3 78     mr      r6,r28
     82d274:       4b ff f8 49     bl      82cabc <.keg_alloc_slab>             
     82d278:       7c 7d 1b 79     mr.     r29,r3
     82d27c:       41 a2 00 94     beq+    82d310 <.keg_fetch_slab+0x2cc>
     82d280:       7f bc eb 78     mr      r28,r29
->>  82d284:       e8 1d 00 00     ld      r0,0(r29)
     82d288:       7f a0 f0 00     cmpd    cr7,r0,r30     


At this point, r29 contains:

  db> print $r29
  c00003fffffddf90


Mapping this back to the C source (keg_fetch_slab() in sys/vm/uma_core.c), I
think we are here:

               slab = keg_alloc_slab(keg, zone, domain, allocflags);
               /*
                * If we got a slab here it's safe to mark it partially used
                * and return.  We assume that the caller is going to remove
                * at least one item.
                */
               if (slab) {
       ->>             MPASS(slab->us_keg == keg);

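where 'slab' is in r29 and 'us_keg' should be the very first field (offset 0),
which matches the 'ld r0,0(r29)' load. 'keg' should be in r30, the register
that 'cmpd cr7,r0,r30' compares against: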

  db> print $r30
  c00003fffffd7000

The problem seems to occur when the code dereferences slab (r29); that load
appears to be what causes the MCE.

This is the content of the memory r30 points to, which reads back without
problems:

  db> x $r30      
  0xc00003fffffd7000:     c0000000
  db> 
  0xc00003fffffd7004:     2af8bb8

But I am not able to dereference r29:

  db> x $r29 
  0xc00003fffffddf90: (machine halts)

I am wondering why accessing this page causes a machine check.



More information about the freebsd-bugs mailing list