Strange timing problems with BETA7
amon at sockar.homeip.net
Thu Oct 14 02:47:11 PDT 2004
I'm having very strange stability problems with BETA7 that seem to be related to timing/the clock.
The hardware is a Netra t 1125 with 2 CPUs.
After a fresh reboot, when I do a standard ping of any IP address, the
interval between the pings is not constant and is generally shorter than the
default 1 second.
I also sometimes get negative latencies with ping or traceroute:
# ping 22.214.171.124
PING 126.96.36.199 (188.8.131.52): 56 data bytes
64 bytes from 184.108.40.206: icmp_seq=0 ttl=60 time=-432.827 ms
64 bytes from 220.127.116.11: icmp_seq=1 ttl=60 time=1.955 ms
# traceroute 18.104.22.168
traceroute to 22.214.171.124 (126.96.36.199), 64 hops max, 52 byte packets
1 gi0-12-swr102-mix-courbevoie (188.8.131.52) 436.046 ms 0.733 ms 0.611 ms
2 gi0-2-3-edou.nerim.net (184.108.40.206) 0.619 ms -434.763 ms 435.882 ms
3 gi0-3-32-svenny.nerim.net (220.127.116.11) 1.737 ms 1.435 ms 1.715 ms
After a few hours of activity (this box is an FTP server), the kernel prints
messages like these:
calcru: negative runtime of -893918 usec for pid 1344 (pure-ftpd)
calcru: negative runtime of -761379 usec for pid 1339 (pure-ftpd)
calcru: negative runtime of -1687109 usec for pid 1337 (pure-ftpd)
calcru: negative runtime of -295856 usec for pid 7 (pagedaemon)
calcru: runtime went backwards from 162673274 usec to 159978646 usec for pid 29 (intr2017: hme0)
calcru: runtime went backwards from 33673531 usec to 30674086 usec for pid 4 (g_down)
calcru: runtime went backwards from 102734677682 usec to 102731983847 usec for pid 12 (idle: cpu0)
calcru: runtime went backwards from 102678868452 usec to 102678764016 usec for pid 11 (idle: cpu1)
At this point, running netstat -Iw 1 prints nothing but the field headers.
Similarly, pinging any IP address gives a single reply, and then the ping
command hangs. (When stuck, both processes are in the select() state and can
be interrupted with ^C.)
When I reboot after a few hours of uptime, the reboot process seems to get
stuck after killing all the running processes. I never see the kernel
shutdown messages and have to power-cycle the box.
Some applications seem to have timing problems too.
wget randomly fails with:
Assertion failed: (msecs >= 0), function calc_rate, file retr.c, line 262.
Abort trap (core dumped)
This started when I upgraded from 5.2.1 to BETA3, and the problem is still
present in BETA7 (last cvsup on Oct 5).
I reset the date as described in the heads-up about the mk48txx commit.
I tried mpsafenet=0 with the same result. My kernel config is essentially
GENERIC, except that I'm using SCHED_4BSD, maxusers 512 and ZERO_COPY_SOCKETS
(no WITNESS, no INVARIANTS).
Any ideas on this? Could this be a hardware problem?