CPU time accounting broken on 8-STABLE machine after a few hours of uptime

Don Lewis truckman at FreeBSD.org
Wed Sep 29 08:41:42 UTC 2010


On 29 Sep, Jeremy Chadwick wrote:
> On Wed, Sep 29, 2010 at 12:39:49AM -0700, Don Lewis wrote:
>> On 29 Sep, Jeremy Chadwick wrote:
>> 
>> > Given all the information here, in addition to the other portion of the
>> > thread (indicating ntpd reports extreme offset between the system clock
>> > and its stratum 1 source), I would say the motherboard is faulty or
>> > there is a system device which is behaving badly (possibly something
>> > pertaining to interrupts, but I don't know how to debug this on a low
>> > level).
>> 
>> Possible, but I haven't run into any problems running -CURRENT on this
>> box with an SMP kernel.
>> 
>> > Can you boot verbosely and provide all of the output here or somewhere
>> > on the web?
>> 
>> <http://people.freebsd.org/~truckman/AN-M2_HD-8.1-STABLE-verbose.txt>
>> 
>> > If possible, I would start by replacing the mainboard.  The board looks
>> > to be a consumer-level board (I see an nfe(4) controller, for example).
>> 
>> It's an Abit AN-M2 HD.  The RAM is ECC.  I haven't seen any machine
>> check errors in the logs.  I'll run prime95 as soon as I have a chance.
> 
> Thanks for the verbose boot.  Since it works on -CURRENT, can you
> provide a verbose boot from that as well?  Possibly someone made some
> changes between RELENG_8 and HEAD which fixed an issue, which could be
> MFC'd.

Even when I saw the weird ntp stepping problem and the calcru messages,
the system was still stable enough to build hundreds of ports.  In the
most recent case, I built 800+ ports over several days without any other
hiccups.

It could also be a difference between SMP and !SMP.  I just found a bug
that causes an immediate panic if lock profiling is enabled on a !SMP
kernel.  This bug also exists in -CURRENT.  Here's the patch:

Index: sys/sys/mutex.h
===================================================================
RCS file: /home/ncvs/src/sys/sys/mutex.h,v
retrieving revision 1.105.2.1
diff -u -r1.105.2.1 mutex.h
--- sys/sys/mutex.h	3 Aug 2009 08:13:06 -0000	1.105.2.1
+++ sys/sys/mutex.h	29 Sep 2010 06:58:52 -0000
@@ -251,8 +251,11 @@
 #define _rel_spin_lock(mp) do {						\
 	if (mtx_recursed((mp)))						\
 		(mp)->mtx_recurse--;					\
-	else								\
+	else {								\
 		(mp)->mtx_lock = MTX_UNOWNED;				\
+		LOCKSTAT_PROFILE_RELEASE_LOCK(LS_MTX_SPIN_UNLOCK_RELEASE, \
+			mp);						\
+	}                                                               \
 	spinlock_exit();						\
 } while (0)
 #endif /* SMP */
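
To illustrate what the patch changes, here is a minimal user-space sketch,
not kernel code.  It assumes the acquire path registers the lock with the
profiler, so a release path without the matching hook leaves the profiler's
bookkeeping unbalanced.  All toy_* names and the profiled_held counter are
hypothetical stand-ins for the real mutex and LOCKSTAT machinery, and the
sketch does not claim to reproduce the actual panic.

```c
#include <assert.h>

/* Toy stand-in for a lock profiler: number of locks it believes are held. */
int profiled_held = 0;

void profile_obtain(void)  { profiled_held++; }
void profile_release(void) { profiled_held--; }

struct toy_mtx {
	int owned;	/* 1 while held, 0 when free */
	int recurse;	/* recursion depth beyond the first acquire */
};

/* Assumed acquire path: the first acquire is reported to the profiler. */
void toy_get_spin_lock(struct toy_mtx *m)
{
	if (m->owned) {
		m->recurse++;
	} else {
		m->owned = 1;
		profile_obtain();
	}
}

/* Release shaped like the unpatched macro: no profiling hook. */
void toy_rel_spin_lock_unpatched(struct toy_mtx *m)
{
	assert(m->owned);
	if (m->recurse > 0)
		m->recurse--;
	else
		m->owned = 0;
}

/* Release shaped like the patched macro: final release tells the profiler. */
void toy_rel_spin_lock_patched(struct toy_mtx *m)
{
	assert(m->owned);
	if (m->recurse > 0)
		m->recurse--;
	else {
		m->owned = 0;
		profile_release();
	}
}

/* Run n acquire/release cycles; return what the profiler still thinks is held. */
int toy_run_cycles(int n, int patched)
{
	struct toy_mtx m = { 0, 0 };

	profiled_held = 0;
	for (int i = 0; i < n; i++) {
		toy_get_spin_lock(&m);
		if (patched)
			toy_rel_spin_lock_patched(&m);
		else
			toy_rel_spin_lock_unpatched(&m);
	}
	return profiled_held;
}
```

In this model toy_run_cycles(3, 0) leaves the profiler believing three locks
are still held, while toy_run_cycles(3, 1) leaves none.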


After applying the above patch, I enabled lock profiling and got the
following results when I ran "make index":
<http://people.freebsd.org/~truckman/AN-M2_HD-8.1-STABLE_lock_profile.txt>

I didn't see anything strange happening this time.  I don't know if I
got lucky, or the change in kernel options "fixed" the bug.


