Fingerpointing about broken Ada tasking starting with FreeBSD 9.0 threading

Alfred Perlstein alfred at freebsd.org
Fri Jul 20 15:08:03 UTC 2012


* John Marino <freebsdml at marino.st> [120720 00:59] wrote:
> On 7/19/2012 23:23, Alfred Perlstein wrote:
> >Hey John,
> >
> >I find the best way to figure stuff like this out would be to
> >instrument the code.
> >
> >I think what could happen here is simply adding a FILE,LINE to the struct
> >thread and have THR_CRITICAL_ENTER record the last place it was called
> >by stuffing the current __FILE__ and __LINE__ into those variables.
> >
> >Then when you hit that assertion you can dump the last place.
> >
> >The only problem you could face with such a system is a false positive
> >if the code goes multiple levels deep.  You'll probably want to clear
> >the recorded data when you see a THR_CRITICAL_LEAVE.
> >
> >Then if in your assertion you see that it's clear/NULL, you probably
> >want to implement a static stack and use (thrd)->critical_count and
> >(thrd)->locklevel as indices into the respective traceback stacks.
> >
> >It really shouldn't take more than a few hours to write the instrumentation
> >code and I could see it staying inside the code under a PTHREAD_HEAVY_DEBUG
> >flag if needed.
> 
> Hi Alfred,
> Thanks for providing some techniques that can perhaps help track down 
> what's going on.
> 
> I'm still interested in the big picture, though.

There is no big picture unless you take the time to diagnose what
is happening.  There is a bug somewhere.  Talking about the "big
picture" doesn't mean anything.

The bug could be due to any of the reasons you described, or due
to other reasons.  What needs to be done is some investigation into
what is triggering the assertion, and then determining whether it's
a real bug, a false positive, corruption or something else.
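
To make the instrumentation I suggested earlier a bit more concrete,
here is a rough sketch of the "static stack indexed by critical_count"
idea.  It is a standalone illustration, not libthr code: dbg_thread,
dbg_site, DBG_STACK_DEPTH and the DBG_CRITICAL_* macros are all
hypothetical names, and the real struct and macros in libthr will
differ.  The same pattern would apply to locklevel with a second stack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define DBG_STACK_DEPTH 16	/* hypothetical cap on tracked nesting */

/* One recorded call site (where a critical section was entered). */
struct dbg_site {
	const char *file;
	int line;
};

/* Hypothetical stand-in for the relevant parts of the thread struct. */
struct dbg_thread {
	int critical_count;
	struct dbg_site crit_stack[DBG_STACK_DEPTH];
};

/* Record the caller's location at the nesting level being entered. */
static void
dbg_critical_enter(struct dbg_thread *thrd, const char *file, int line)
{
	if (thrd->critical_count >= 0 &&
	    thrd->critical_count < DBG_STACK_DEPTH) {
		thrd->crit_stack[thrd->critical_count].file = file;
		thrd->crit_stack[thrd->critical_count].line = line;
	}
	thrd->critical_count++;
}

/* Drop one nesting level and clear its slot to avoid stale reports. */
static void
dbg_critical_leave(struct dbg_thread *thrd)
{
	assert(thrd->critical_count > 0);
	thrd->critical_count--;
	if (thrd->critical_count < DBG_STACK_DEPTH) {
		thrd->crit_stack[thrd->critical_count].file = NULL;
		thrd->crit_stack[thrd->critical_count].line = 0;
	}
}

#define DBG_CRITICAL_ENTER(thrd) \
	dbg_critical_enter((thrd), __FILE__, __LINE__)
#define DBG_CRITICAL_LEAVE(thrd) \
	dbg_critical_leave((thrd))

/*
 * Print every level that is still held and return how many were found.
 * This is what you would call just before the assertion fires.
 */
static int
dbg_dump_outstanding(const struct dbg_thread *thrd, FILE *out)
{
	int i, n = 0;

	for (i = 0; i < thrd->critical_count && i < DBG_STACK_DEPTH; i++) {
		if (thrd->crit_stack[i].file != NULL) {
			fprintf(out, "critical level %d entered at %s:%d\n",
			    i, thrd->crit_stack[i].file,
			    thrd->crit_stack[i].line);
			n++;
		}
	}
	return (n);
}
```

If the count is non-zero at thread exit, calling dbg_dump_outstanding()
right before the assertion should name the enter sites that never got
a matching leave, which is the information needed to decide whether
this is a real leak or a false positive.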

> We've got a package that runs on FreeBSD 6, 7, 8 and broke on 9.
> Similarly the "critical_count" property is at the expected 0 value on 
> thread exit on DragonFly.
> 
> The new thread panic caused a regression -
> Was it necessary to put this panic there?
> What are the consequences of continuing?
> Were these resources being "held" before when a mutex was used and just 
> not detected? (which implies consequences are not high so why panic?)
> Has there been other fallout from this change?
> 
> I'm guessing your first inclination would be to blame GNAT, and say that 
> if the crit count is wrong, something must not be getting cleaned up and 
> you may be right, but the fact remains that software that builds and 
> runs on FreeBSD 6, 7, and 8 doesn't run on 9.  I assume that was unintended.
> 
> John
> 

It sounds like you're advocating for just removing an assertion without
proving it's a false positive.  Unfortunately, I don't think that will
work out.

-- 
- Alfred Perlstein
.- VMOA #5191, 03 vmax, 92 gs500, 85 ch250, 07 zx10
.- FreeBSD committer

