CPU time accounting broken on 8-STABLE machine after a few hours of uptime
Don Lewis
truckman at FreeBSD.org
Wed Sep 29 08:41:42 UTC 2010
On 29 Sep, Jeremy Chadwick wrote:
> On Wed, Sep 29, 2010 at 12:39:49AM -0700, Don Lewis wrote:
>> On 29 Sep, Jeremy Chadwick wrote:
>>
>> > Given all the information here, in addition to the other portion of the
>> > thread (indicating ntpd reports extreme offset between the system clock
>> > and its stratum 1 source), I would say the motherboard is faulty or
>> > there is a system device which is behaving badly (possibly something
>> > pertaining to interrupts, but I don't know how to debug this on a low
>> > level).
>>
>> Possible, but I haven't run into any problems running -CURRENT on this
>> box with an SMP kernel.
>>
>> > Can you boot verbosely and provide all of the output here or somewhere
>> > on the web?
>>
>> <http://people.freebsd.org/~truckman/AN-M2_HD-8.1-STABLE-verbose.txt>
>>
>> > If possible, I would start by replacing the mainboard. The board looks
>> > to be a consumer-level board (I see an nfe(4) controller, for example).
>>
>> It's an Abit AN-M2 HD. The RAM is ECC. I haven't seen any machine
>> check errors in the logs. I'll run prime95 as soon as I have a chance.
>
> Thanks for the verbose boot. Since it works on -CURRENT, can you
> provide a verbose boot from that as well? Possibly someone made some
> changes between RELENG_8 and HEAD which fixed an issue, which could be
> MFC'd.
Even when I saw the weird ntp stepping problem and the calcru messages,
the system was still stable enough to build hundreds of ports. In the
most recent case, I built 800+ ports over several days without any other
hiccups.
It could also be a difference between SMP and !SMP. I just found a bug
that causes an immediate panic if lock profiling is enabled on a !SMP
kernel. This bug also exists in -CURRENT. Here's the patch:
Index: sys/sys/mutex.h
===================================================================
RCS file: /home/ncvs/src/sys/sys/mutex.h,v
retrieving revision 1.105.2.1
diff -u -r1.105.2.1 mutex.h
--- sys/sys/mutex.h 3 Aug 2009 08:13:06 -0000 1.105.2.1
+++ sys/sys/mutex.h 29 Sep 2010 06:58:52 -0000
@@ -251,8 +251,11 @@
 #define _rel_spin_lock(mp) do { \
     if (mtx_recursed((mp))) \
         (mp)->mtx_recurse--; \
-    else \
+    else { \
         (mp)->mtx_lock = MTX_UNOWNED; \
+        LOCKSTAT_PROFILE_RELEASE_LOCK(LS_MTX_SPIN_UNLOCK_RELEASE, \
+            mp); \
+    } \
     spinlock_exit(); \
 } while (0)
 #endif /* SMP */
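
To illustrate the kind of imbalance the patch appears to address, here is a
minimal userland sketch in C. Everything in it is hypothetical: toy_mtx, the
TOY_PROFILE_* counters, and toy_imbalance() are stand-ins invented for the
example, not the kernel's struct mtx or the real LOCKSTAT macros. It also
assumes that the acquire path reports to the profiler. The sketch does not
reproduce the panic; it only shows that a release macro which never reports
the final unlock leaves the profiler with more acquires than releases.

```c
#include <assert.h>
#include <stdio.h>

/* Toy stand-in for a spin mutex; NOT the kernel's struct mtx. */
struct toy_mtx {
    int owned;   /* 1 while the lock is held */
    int recurse; /* number of extra recursive acquisitions */
};

/* Hypothetical counters standing in for the lock profiling hooks. */
static int prof_acquires;
static int prof_releases;

#define TOY_PROFILE_ACQUIRE(mp) ((void)(mp), prof_acquires++)
#define TOY_PROFILE_RELEASE(mp) ((void)(mp), prof_releases++)

/* Acquire: assumed (for this sketch) to report the first acquisition. */
#define toy_get_spin_lock(mp) do {              \
    if ((mp)->owned)                            \
        (mp)->recurse++;                        \
    else {                                      \
        (mp)->owned = 1;                        \
        TOY_PROFILE_ACQUIRE(mp);                \
    }                                           \
} while (0)

/* Shape of the release macro before the patch: no profiling hook. */
#define toy_rel_spin_lock_unpatched(mp) do {    \
    if ((mp)->recurse > 0)                      \
        (mp)->recurse--;                        \
    else                                        \
        (mp)->owned = 0;                        \
} while (0)

/* Shape after the patch: braces added so the hook sits in the else arm. */
#define toy_rel_spin_lock_patched(mp) do {      \
    if ((mp)->recurse > 0)                      \
        (mp)->recurse--;                        \
    else {                                      \
        (mp)->owned = 0;                        \
        TOY_PROFILE_RELEASE(mp);                \
    }                                           \
} while (0)

/*
 * Run `cycles` rounds of lock, recursive lock, unlock, unlock and
 * return how many acquires the profiler saw without a matching release.
 */
static int toy_imbalance(int patched, int cycles)
{
    struct toy_mtx m = { 0, 0 };

    prof_acquires = 0;
    prof_releases = 0;
    for (int i = 0; i < cycles; i++) {
        toy_get_spin_lock(&m);
        toy_get_spin_lock(&m); /* recursive acquire */
        if (patched) {
            toy_rel_spin_lock_patched(&m);
            toy_rel_spin_lock_patched(&m);
        } else {
            toy_rel_spin_lock_unpatched(&m);
            toy_rel_spin_lock_unpatched(&m);
        }
        assert(m.owned == 0 && m.recurse == 0);
    }
    return prof_acquires - prof_releases;
}
```

Calling toy_imbalance(0, n) with the unpatched macro leaves one unmatched
acquire per cycle, while toy_imbalance(1, n) with the patched macro balances
out. Note that the braces in the patched macro matter: without them only the
first statement would belong to the else arm, and the release hook would also
fire on recursive unlocks.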
After applying the above patch, I enabled lock profiling and got the
following results when I ran "make index":
<http://people.freebsd.org/~truckman/AN-M2_HD-8.1-STABLE_lock_profile.txt>
I didn't see anything strange happening this time. I don't know if I
got lucky, or the change in kernel options "fixed" the bug.