G5 Quad Fans full speed after 1 min

Mark Millard marklmi at yahoo.com
Sun Jan 19 21:01:02 UTC 2020


On 2020-Jan-19, at 03:13, Mark Millard <marklmi at yahoo.com> wrote:
> 
> On 2020-Jan-19, at 00:38, Francis Little <oggy at farscape.co.uk> wrote:
> 
>> Hi,
>> 
>> My G5 Quad is running current from a few days ago, but this issue has been
>> happening for a long time.
>> 
>> After about 1 min of uptime, the fans go full speed.
>> 
>> As soon as I query anything like CPU temp or fan rpm with sysctl, the fans
>> return to a normal speed.
>> 
>> 1 min later the fans go full speed again.
>> 
>> I've been working round this for some time with a cron job that runs sysctl
>> with one of the cpu temp sensors to calm the system.
> 
> QUOTING an old message:
>   The mftb figures on the various cores can be so far apart that
>   threads can end up stuck sleeping, such as syncr, pmac_thermal,
>   and buf*daemon* threads. (This can really mess things up by
>   not updating the storage correctly.) Such is still true of the 
>   ELFv2 context.
> 
>   (Most folks notice this via shutdown timeouts and the fans
>   going fast unnecessarily. But it is involved earlier as well.)
> END QUOTE
> 
> Nothing in the boot sequence is forcing the CPUs/Cores to
> see any particular time relationship to each other and on
> the multi-socket PowerMacs it can vary widely (G4 and G5).
> Sometimes it will happen to end up okay, other times not.
> 
> (I've no access to a single-socket, multi-core PowerMac,
> so I just do not know for that kind of context.)
> 
> I run with patched boot-code that observes time across
> cpus/cores and adjusts each non-bsp's time: each non-bsp
> does a round trip to the bsp to fetch the bsp's time, and
> the non-bsp's time is adjusted so that the bsp's returned
> time matches the mid-point of the start and end times of
> that non-bsp's round trip. With at most 4 cores, each
> non-bsp is done in sequence.
> The code only does this on PowerMacs, having no access
> to other types of PowerPC examples to test.
> 
> I've not seen this type of problem since mid 2019-May on
> any of:
> 
> 1 G5 with 2 sockets, 1 core each
> 2 G5's, 2 sockets, 2 cores each
> 2 G4's, 2 sockets, 1 core each
> 
> (The mid-May time frame is when I adjusted the code to
> deal with the faster time increment on the slower
> 32-bit processors for the model that I have access to.
> I had to be more careful to avoid biasing the approximate
> symmetry to be less symmetric. On the G5's it's been
> longer since I've seen this problem, based on earlier
> source code.)
> 
> Unfortunately the "lab" the machines are in is powered
> down currently.
> 
> FYI: Prior to this technique, I had a pure hack that
> was observed to avoid the problem. But it changed code
> used all the time --code that I did not want to have
> any hacks in if I could avoid it.
> 
> FYI: I also run with other PowerMac related patches.
> Generally this mftb() time adjustment is one of the
> newest patches, possibly the newest. So my test
> context may be biased by the other patches.
> 
>> If I boot to OS X 10.5 and load the system, the fans are stable.
> 
> I've not done any investigation of the issue for the
> older contexts. But, if I remember right, I did see
> the problem on occasion back in that time frame.
> 
>> Does anyone else get this?
> 
> My understanding is everyone booting a fairly modern
> standard FreeBSD gets this sometimes for the kind of
> context that you specified. (I'm not sure how much
> the frequency of the problem varies for that kind
> of context.)
> 
> I certainly saw it before I investigated avoiding it.
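
The mid-point technique quoted above can be distilled as follows. This is a hypothetical user-space sketch of just the arithmetic, not the committed code: the ap (non-bsp) reads its timebase before and after a round trip to the bsp, the bsp returns a timebase sample taken during that trip, and the ap's adjustment is the difference between the bsp's sample and the mid-point of the ap's round trip.

```c
#include <stdint.h>

/* Hypothetical distillation of the mid-point arithmetic: returns the
 * signed amount the ap should add to its timebase so that the mid-point
 * of its round trip lines up with the bsp's returned sample. */
static int64_t
timebase_delta(uint64_t ap_start, uint64_t ap_end, uint64_t bsp_sample)
{
	int64_t round_trip = (int64_t)ap_end - (int64_t)ap_start;
	int64_t ap_midpoint = (int64_t)ap_start + round_trip / 2;

	return ((int64_t)bsp_sample - ap_midpoint);
}
```

For example, if the ap reads 1000 before and 1100 after the round trip and the bsp reported 1500, the mid-point is 1050 and the ap adds 450 to its timebase.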

I got access to the sources with the patches.
I'll cover how I have avoided the
thread-stuck-sleeping problems for both 32-bit
and 64-bit, including for pmac_thermal.


In the following you likely want to avoid my
declaring ap_pcpu itself volatile (a volatile
pointer, not a pointer to something volatile).
I use volatile to ensure that the generated
code actually stores to and loads from RAM on
each access to what is volatile.

-extern void *ap_pcpu;
+extern void * volatile ap_pcpu;

There is one example of this in the diffs
that I show.
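
For reference, the placement of volatile is what distinguishes the two declarations; a minimal stand-alone illustration (the ap_pcpu form is the first one):

```c
#include <stddef.h>

/* A volatile pointer: each access to the pointer variable itself must
 * really load from / store to memory. This is the ap_pcpu form above. */
void * volatile volatile_pointer = NULL;

/* A pointer to volatile: the pointer variable is ordinary, but the
 * object it points at is treated as volatile. NOT the form used. */
volatile void *pointer_to_volatile = NULL;
```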

I also put powerpc_sync(); soon after any
ap_pcpu = ... assignment. I do not show here
such declarations or powerpc_sync() uses that
are outside the sources tied to the mftb
related time measurements and adjustments at
boot time. (ap_pcpu is not the only variable
that an added powerpc_sync() covers; an
example occurs below.)

The additional powerpc_sync() use is for
gathering evidence that a sufficient context
exists, not for demonstrating a minimal
sufficient implementation. The isync() use,
and possibly more, has a similar status here.


First: I'm paranoid about interrupts messing
up times, so I use a modified mttb(...):
(whitespace may not be well preserved)

Index: /usr/src/sys/powerpc/include/cpufunc.h
===================================================================
--- /usr/src/sys/powerpc/include/cpufunc.h      (revision 356426)
+++ /usr/src/sys/powerpc/include/cpufunc.h      (working copy)
@@ -155,15 +155,8 @@
        return (tb);
 }

-static __inline void
-mttb(u_quad_t time)
-{
+// mttb is listed after intr_disable and intr_restore.
    
-       mtspr(TBR_TBWL, 0);
-       mtspr(TBR_TBWU, (uint32_t)(time >> 32));
-       mtspr(TBR_TBWL, (uint32_t)(time & 0xffffffff));
-}
-
 static __inline void
 eieio(void)
 {
@@ -202,6 +195,19 @@
        mtmsr(msr);
 }
     
+static __inline void
+mttb(u_quad_t time)
+{
+       const uint32_t   high= time>>32;
+       const uint32_t   low=  time&0xffffffffu;
+       
+       const register_t predisable_msr= intr_disable();
+       mtspr(TBR_TBWL, 0);
+       mtspr(TBR_TBWU, high);
+       mtspr(TBR_TBWL, low);
+       intr_restore(predisable_msr);
+}
+
 static __inline struct pcpu *
 get_pcpu(void)
 {


I have platform_powermac.c enable what I call
alternate_timebase_sync_style. I have no place
else doing so: PowerMacs are my only PowerPC
test environment.

Be WARNED that this code is one of several places
involved in changing ap_pcpu to be a volatile
pointer (not a pointer to volatile). You likely do
not want that --and, if you do use it, various
other places would need to track the change.
Still, I show my code as it is here.

Index: /usr/src/sys/powerpc/powermac/platform_powermac.c
===================================================================
--- /usr/src/sys/powerpc/powermac/platform_powermac.c   (revision 356426)
+++ /usr/src/sys/powerpc/powermac/platform_powermac.c   (working copy)
@@ -55,7 +55,7 @@

 #include "platform_if.h"

-extern void *ap_pcpu;
+extern void * volatile ap_pcpu;

 static int powermac_probe(platform_t);
 static int powermac_attach(platform_t);
@@ -333,6 +333,9 @@
        return (powermac_smp_fill_cpuref(cpuref, bsp));
 }

+// platform_powermac.c is implicitly an AIM context: no explicit AIM test.
+extern volatile int alternate_timebase_sync_style; // 0 indicates old style; 1 indicates new style
+
 static int
 powermac_smp_start_cpu(platform_t plat, struct pcpu *pc)
 {
@@ -367,6 +370,13 @@

        ap_pcpu = pc;

+       // platform_powermac.c is implicitly an AIM context: no explicit AIM test.
+       // Part of: Attempt a better-than-historical approximately
+       //          equal timebase value for ap vs. bsp
+       alternate_timebase_sync_style= 1; // So: new style for PowerMacs
+
+       powerpc_sync(); // for ap_pcpu and alternate_timebase_sync_style
+
        if (rstvec_virtbase == NULL)
                rstvec_virtbase = pmap_mapdev(0x80000000, PAGE_SIZE);



There is more use of intr_disable() in the code directly
targeting measuring and adjusting time: some of the
code in mp_machdep.c. The protocol for going back and
forth with the bsp here is based on memory flags.
machdep_ap_bootstrap and cpu_mp_unleash have comments
about the protocol and its use.

In setting this up I had to be careful to make code that
avoided optimizations from calculating various things
earlier than appropriate. There is a comment about that
in machdep_ap_bootstrap.
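
The flag protocol can be sketched in user space. The following is a hypothetical pthread-based simulation of the rendezvous, not kernel code: the names mirror the patch, threads stand in for the bsp and one ap, and C11 atomics stand in for the volatile flags plus the atomic_store_rel_int() uses.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical user-space simulation of the bsp/ap memory-flag
 * rendezvous described above. */
static _Atomic unsigned int from_bsp_status_flag; /* 0; 1: bsp ready; 2: sample out */
static _Atomic unsigned int from_ap_status_flag;  /* 0; 1: ap wants the sample */
static _Atomic uint64_t bsp_timebase_sample;
static _Atomic uint64_t ap_received_sample;

static void *
ap_side(void *arg)
{
	(void)arg;
	while (atomic_load(&from_bsp_status_flag) != 1u)
		; /* spin waiting for bsp to flag that it is ready */
	atomic_store(&from_ap_status_flag, 1u); /* ask bsp for its sample */
	while (atomic_load(&from_bsp_status_flag) != 2u)
		; /* spin waiting for bsp's sample */
	atomic_store(&ap_received_sample, atomic_load(&bsp_timebase_sample));
	/* The kernel code would compute the delta and mttb() here. */
	atomic_store(&from_bsp_status_flag, 0u); /* get ready for the next ap */
	atomic_store(&from_ap_status_flag, 0u);  /* tell bsp this ap is done */
	return (NULL);
}

static void
bsp_side(void)
{
	atomic_store(&from_bsp_status_flag, 1u); /* bsp ready to start */
	while (atomic_load(&from_ap_status_flag) != 1u)
		; /* spin waiting for the ap's request */
	atomic_store(&bsp_timebase_sample, 12345u); /* mftb() stand-in */
	atomic_store(&from_bsp_status_flag, 2u);    /* bsp sample available */
	while (atomic_load(&from_ap_status_flag) != 0u)
		; /* spin waiting for the ap to finish with the sample */
}
```

The kernel version additionally runs both sides with interrupts disabled so that neither spin loop nor timebase read is interrupted mid-measurement.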

Index: /usr/src/sys/powerpc/powerpc/mp_machdep.c
===================================================================
--- /usr/src/sys/powerpc/powerpc/mp_machdep.c   (revision 356426)
+++ /usr/src/sys/powerpc/powerpc/mp_machdep.c   (working copy)
@@ -69,6 +69,88 @@
 static struct mtx ap_boot_mtx;
 struct pcb stoppcbs[MAXCPU];

+#if defined(AIM)
+// Part of: Attempt a better-than-historical approximately
+//          equal timebase value for ap vs. bsp
+
+volatile int alternate_timebase_sync_style= 0; // 0 indicates old style; 1 indicates new style.
+volatile uint64_t bsp_timebase_sample= 0u;
+
+volatile unsigned int from_bsp_status_flag= 0u;
+// stages: 0u, 1u (bsp ready to start), 2u (bsp tbr value available to ap)
+
+volatile unsigned int from_ap_status_flag= 0u;
+// stages: 0u, 1u (ap ready for bsp tbr value to be found and sent)
+#endif
+
+static __inline uint64_t
+mftb_with_no_pointer_use(void)
+{
+#ifdef __powerpc64__
+       uint64_t tb; // not used for 32-bit powerpc
+       __asm __volatile ("mftb %0" : "=r"(tb));
+       return tb;
+#else
+       uint32_t tbu; // not pointer into tb
+       uint32_t tbl; // not pointer into tb
+
+       do {
+               tbu= mfspr(TBR_TBU);
+               tbl= mfspr(TBR_TBL);
+       } while (tbu != mfspr(TBR_TBU));
+
+       // The construction of the uint64_t value does bias the mttb some
+       // for the round-trip-start side of things.
+       //
+       // The pointers into tb technique would involve a pair of memory
+       // writes and a pair of memory reads instead, the writes being
+       // in the loop.
+       return ((uint64_t)tbu<<32) | tbl;
+#endif
+}
+
+static __inline uint64_t
+mftb_plus_delta(volatile uint64_t* bsp_tbr, int64_t ap_midpoint)
+       // The return value is used in the mttb as the argument.
+{      
+#ifdef __powerpc64__
+       uint64_t tb; // not used for 32-bit powerpc
+       __asm __volatile ("mftb %0" : "=r"(tb));
+       // The construction of the uint64_t value does bias the mttb some:
+       // it assigns an earlier time than hoped for, given these later
+       // calculations.
+       return tb + ((int64_t)*bsp_tbr - ap_midpoint);
+#else
+       // Establishes delta_for_approx_match_to_bsp_tbr_values such that:
+       //           ap_midpoint+delta_for_approx_match_to_bsp_tbr_values==*bsp_tbr
+       int64_t  delta_for_approx_match_to_bsp_tbr_values;
+       uint32_t tbu;       // not pointer into tb
+       uint32_t tbl;       // not pointer into tb
+
+       do {
+               // The below in-loop style is for avoiding the loop
+               // vs. ap_midpoint's calculation being reversed in
+               // the code generated: volatile is being put to
+               // use here.
+               delta_for_approx_match_to_bsp_tbr_values= (int64_t)*bsp_tbr-ap_midpoint;
+
+               tbu= mfspr(TBR_TBU);
+               tbl= mfspr(TBR_TBL);
+       } while (tbu != mfspr(TBR_TBU));
+
+       // The construction of the uint64_t value does bias the mttb some:
+       // it assigns an earlier time than hoped for, given these later
+       // calculations. Easily observable on the example 7455 based PowerMac
+       // G4. (Faster tbr increment rate than the G5's, but a slower processor.)
+       // But the overall process is still an improvement.
+       //
+       // The pointers into tb technique would involve a pair of memory
+       // writes and a pair of memory reads instead, the writes being
+       // in the loop. The "+ . . ." would still be involved.
+       return ( ((uint64_t)tbu<<32) | tbl ) + delta_for_approx_match_to_bsp_tbr_values;
+#endif
+}
+
 void
 machdep_ap_bootstrap(void)
 {
@@ -76,19 +158,76 @@
        PCPU_SET(awake, 1);
        __asm __volatile("msync; isync");

+#if defined(AIM)
+       powerpc_sync();
+       isync();
+       if (1==alternate_timebase_sync_style)
+       {
+               // Part of: Attempt a better-than-historical approximately
+               //          equal timebase value for ap vs. bsp
+
+               // No claim to deal with overflow/wraparound of tbr, or even
+               // of the upper bit being on.
+
+               register_t oldmsr= intr_disable();
+
+               while (1u!=from_bsp_status_flag)
+                       ; // spin waiting for bsp to flag that its ready to start.
+
+               //  Start to measure a round trip:: to the bsp and back.
+
+               isync(); // Be sure below mftb() result is not from earlier speculative execution.
+               uint64_t const start_round_trip_time_on_ap= mftb_with_no_pointer_use();
+               atomic_store_rel_int(&from_ap_status_flag, 1u); // bsp waits for such before its mftb().
+
+               while (2u!=from_bsp_status_flag)
+                       ; // spin waiting for bsp's tbr value
+
+               // Mid-point of ap round trip and the bsp timebase value should be approximately equal
+               // when the tbr's are well matched, absent interruptions on both sides.
+
+               isync(); // Be sure below mftb() result is not from earlier speculative execution.
+               uint64_t const end_round_trip_time_on_ap= mftb_with_no_pointer_use();
+               isync(); // Be sure above mftb() result is not from overlapping with the following.
+
+               int64_t const approx_round_trip_tbr_delta_on_ap
+                               = (int64_t)end_round_trip_time_on_ap - (int64_t)start_round_trip_time_on_ap;
+               int64_t const ap_midpoint_value
+                               = (int64_t)start_round_trip_time_on_ap + approx_round_trip_tbr_delta_on_ap/2;
+
+               // The mftb_plus_delta use is for helping to control the code order relative
+               // to tbr access. Such issues are notable for the 7455 based 2-socket PowerMacs,
+               // for example. Faster tbr increment rate than the G5's but slower processors
+               // and such. Still, overall this definitely helps such contexts compared to the
+               // historical style of timebase synchronization.
+               isync(); // Be sure below mftb() result is not from earlier speculative execution.
+               mttb(mftb_plus_delta(&bsp_timebase_sample,ap_midpoint_value));
+
+               atomic_store_rel_int(&from_bsp_status_flag, 0u); // Get ready for next ap in bsp loop
+               atomic_store_rel_int(&from_ap_status_flag, 0u);  // Flag bsp that this ap is done
+
+               mtmsr(oldmsr);
+       }
+#endif
+
        while (ap_letgo == 0)
                nop_prio_vlow();
        nop_prio_medium();

-       /*
-        * Set timebase as soon as possible to meet an implicit rendezvous
-        * from cpu_mp_unleash(), which sets ap_letgo and then immediately
-        * sets timebase.
-        *
-        * Note that this is instrinsically racy and is only relevant on
-        * platforms that do not support better mechanisms.
-        */
-       platform_smp_timebase_sync(ap_timebase, 1);
+#if defined(AIM)
+       if (0==alternate_timebase_sync_style)
+#endif
+       {
+               /*
+                * Set timebase as soon as possible to meet an implicit rendezvous
+                * from cpu_mp_unleash(), which sets ap_letgo and then immediately
+                * sets timebase.
+                *
+                * Note that this is instrinsically racy and is only relevant on
+                * platforms that do not support better mechanisms.
+                */
+               platform_smp_timebase_sync(ap_timebase, 1);
+       }

        /* Give platform code a chance to do anything else necessary */
        platform_smp_ap_init();
@@ -261,6 +400,34 @@
                                    pc->pc_cpuid, (uintmax_t)pc->pc_hwref,
                                    pc->pc_awake);
                        smp_cpus++;
+
+#if defined(AIM)
+                       // Part of: Attempt a better-than-historical approximately
+                       //          equal timebase value for ap vs. bsp
+                       powerpc_sync();
+                       isync();
+                       if (1==alternate_timebase_sync_style)
+                       {
+                               register_t oldmsr= intr_disable();
+
+                               atomic_store_rel_int(&from_bsp_status_flag, 1u); // bsp ready to start.
+
+                               while (1u!=from_ap_status_flag)
+                                       ; // spin waiting for ap to flag: time to send a tbr.
+
+                               isync(); // Be sure below mftb() result is not from earlier.
+                               bsp_timebase_sample= mftb_with_no_pointer_use();
+                               atomic_store_rel_int(&from_bsp_status_flag, 2u); // bsp tbr available.
+
+                               // Most of the rest of the usage is in machdep_ap_bootstrap,
+                               // other than controlling the alternate_timebase_sync_style value.
+
+                               while (0u!=from_ap_status_flag)
+                                       ; // spin waiting for ap to be done with the sample.
+
+                               mtmsr(oldmsr);
+                       }
+#endif
                } else
                        CPU_SET(pc->pc_cpuid, &stopped_cpus);
        }
@@ -267,14 +434,22 @@

        ap_awake = 1;

-       /* Provide our current DEC and TB values for APs */
-       ap_timebase = mftb() + 10;
-       __asm __volatile("msync; isync");
+#if defined(AIM)
+       if (0==alternate_timebase_sync_style)
+#endif
+       {
+               /* Provide our current DEC and TB values for APs */
+               ap_timebase = mftb() + 10;
+               __asm __volatile("msync; isync");
+       }

        /* Let APs continue */
        atomic_store_rel_int(&ap_letgo, 1);

-       platform_smp_timebase_sync(ap_timebase, 0);
+#if defined(AIM)
+       if (0==alternate_timebase_sync_style)
+#endif
+               platform_smp_timebase_sync(ap_timebase, 0);

        while (ap_awake < smp_cpus)
                ;



===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)



More information about the freebsd-ppc mailing list