suspect problems on -current with pthread_cond_*()

Poul-Henning Kamp phk at phk.freebsd.dk
Wed Oct 6 18:13:58 UTC 2010


Hi Guys,

I updated my machine to -current (9.0-CURRENT #0 r213377M: Mon Oct
4; the previous build was from sometime in April) and have started
to see weird new problems with the Varnish regression tests.

It's pretty hard to get a trace on the problem, but from what I
have found so far, it is related to the very first operation(s)
on a pthread_cond_t, and the typical symptom is a 100% CPU spin
inside libthr.
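For anyone who wants to poke at this outside Varnish, here is a minimal sketch of the access pattern described above: every round initializes fresh condition variables and lets a waiter and a signaller race on the very first wait/signal on each of them, with several pairs running in parallel. It is a hypothetical stress test, not Varnish or libthr code; the names condvar_first_use_stress(), waiter(), signaller() and xcreate() are made up for illustration, and it is not known whether this pattern alone triggers the spin.

```c
#include <pthread.h>
#include <stdlib.h>

/*
 * Hypothetical stress sketch, not Varnish or libthr code: each round
 * initializes brand-new condition variables and lets a waiter and a
 * signaller race on the first wait/signal performed on each of them.
 */

struct round {
	pthread_mutex_t	mtx;
	pthread_cond_t	cond;
	int		ready;		/* predicate, protected by mtx */
};

static void *
waiter(void *arg)
{
	struct round *r = arg;

	pthread_mutex_lock(&r->mtx);
	while (!r->ready)		/* loop: wakeups may be spurious */
		pthread_cond_wait(&r->cond, &r->mtx);
	pthread_mutex_unlock(&r->mtx);
	return (NULL);
}

static void *
signaller(void *arg)
{
	struct round *r = arg;

	pthread_mutex_lock(&r->mtx);
	r->ready = 1;
	pthread_cond_signal(&r->cond);
	pthread_mutex_unlock(&r->mtx);
	return (NULL);
}

static void
xcreate(pthread_t *t, void *(*fn)(void *), void *arg)
{
	if (pthread_create(t, NULL, fn, arg) != 0)
		abort();		/* keep the sketch simple */
}

/* Returns the number of completed rounds; a hang never returns. */
int
condvar_first_use_stress(int rounds, int pairs)
{
	struct round *r = calloc(pairs, sizeof *r);
	pthread_t *tw = calloc(pairs, sizeof *tw);
	pthread_t *ts = calloc(pairs, sizeof *ts);
	int done = 0, i, n;

	if (r == NULL || tw == NULL || ts == NULL)
		goto out;
	for (n = 0; n < rounds; n++) {
		for (i = 0; i < pairs; i++) {
			pthread_mutex_init(&r[i].mtx, NULL);
			pthread_cond_init(&r[i].cond, NULL);
			r[i].ready = 0;
		}
		/* Start all threads before joining any, so pairs overlap. */
		for (i = 0; i < pairs; i++) {
			xcreate(&tw[i], waiter, &r[i]);
			xcreate(&ts[i], signaller, &r[i]);
		}
		for (i = 0; i < pairs; i++) {
			pthread_join(tw[i], NULL);
			pthread_join(ts[i], NULL);
		}
		for (i = 0; i < pairs; i++) {
			pthread_cond_destroy(&r[i].cond);
			pthread_mutex_destroy(&r[i].mtx);
		}
		done++;
	}
out:
	free(r);
	free(tw);
	free(ts);
	return (done);
}
```

Built with -pthread and called as condvar_first_use_stress(1000, 8) in a loop, each call should simply return the round count; a call that never returns, or a thread stuck at 100% CPU, is the interesting case.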

I can reproduce the problem in approximately 5 minutes by repeatedly
running the automated Varnish regression tests in 8 or more parallel
streams[1], but due to the nature and complexity of Varnish, I have
not yet been able to get a debugger to give me a useful backtrace.

I only use pthread_cond_t in two isolated places, and I am going to
muck about with them now to see if I can affect the issue in any way
(higher/lower failure rate, etc.).
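One obvious knob for that kind of experiment is how the condition variables are brought into existence. The sketch below hands a single value from a new thread to the caller and lets a flag choose between a statically initialized pair (PTHREAD_MUTEX_INITIALIZER / PTHREAD_COND_INITIALIZER) and an explicitly pthread_cond_init()'d pair. It assumes, without having checked libthr, that the two paths may do different work on the first pthread_cond_*() call; handoff_once() and producer() are hypothetical names for illustration only.

```c
#include <pthread.h>

/*
 * Hypothetical experiment: hand one value from a new thread to the
 * caller, using either statically initialized or explicitly
 * initialized synchronization objects, selected by use_static.
 */

/* Variant A: static initializers; the first pthread_cond_*() call on
 * s_cond may be where the library does its real setup (assumption). */
static pthread_mutex_t s_mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t s_cond = PTHREAD_COND_INITIALIZER;

struct handoff {
	pthread_mutex_t	*mtx;
	pthread_cond_t	*cond;
	int		value;		/* 0 means "nothing handed off yet" */
};

static void *
producer(void *arg)
{
	struct handoff *h = arg;

	pthread_mutex_lock(h->mtx);
	h->value = 42;
	/* Broadcast, because s_cond may be shared with other callers. */
	pthread_cond_broadcast(h->cond);
	pthread_mutex_unlock(h->mtx);
	return (NULL);
}

/* Returns the handed-off value (42), or -1 if the thread can't start. */
int
handoff_once(int use_static)
{
	pthread_mutex_t d_mtx;
	pthread_cond_t d_cond;
	struct handoff h = { &s_mtx, &s_cond, 0 };
	pthread_t t;
	int v = -1;

	if (!use_static) {
		/* Variant B: explicit initialization before first use. */
		pthread_mutex_init(&d_mtx, NULL);
		pthread_cond_init(&d_cond, NULL);
		h.mtx = &d_mtx;
		h.cond = &d_cond;
	}
	if (pthread_create(&t, NULL, producer, &h) == 0) {
		pthread_mutex_lock(h.mtx);
		while (h.value == 0)	/* recheck own predicate */
			pthread_cond_wait(h.cond, h.mtx);
		v = h.value;
		pthread_mutex_unlock(h.mtx);
		pthread_join(t, NULL);
	}
	if (!use_static) {
		pthread_cond_destroy(&d_cond);
		pthread_mutex_destroy(&d_mtx);
	}
	return (v);
}
```

Running many calls of each variant under the same load and comparing how often things wedge is the kind of comparison meant here. The producer broadcasts rather than signals so that concurrent callers sharing the static pair cannot lose a wakeup.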

Any insights?

Poul-Henning

PS: I'll arrive in Karlsruhe Friday morning...

[1] It is an easy test to set up:

	svn co http://www.varnish-cache.org/svn/trunk
	cd trunk/varnish-cache
	sh autogen.des
	make
	cd bin/varnishtest
	while gmake -j 12 -f Makefile.kristian check
	do
		true
	done

	Look for test failures reporting
		"HTTP rx failed (poll: Unknown error: 0)"

	A couple of the test cases may fail under high load
	for other reasons, in particular m00001.vtc and
	c00002.vtc.

	The varnishtest driver program itself can also be hit,
	but this happens much more rarely; when it does, it
	usually leaves a core dump with a useless backtrace.

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk at FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.


More information about the freebsd-threads mailing list