Question about rtld-elf. Anyone?.. Anyone?

Terry Lambert tlambert2 at mindspring.com
Wed Apr 30 18:35:51 PDT 2003


Daniel Eischen wrote:
> On Wed, 30 Apr 2003, Terry Lambert wrote:
> > Daniel Eischen wrote:
> > > As an experiment, I made the dlfoo calls in rtld-elf weak
> > > (__dlfoo -> dlfoo) and then overrode them in libpthread
> > > and protected them with mutexes.
> > >
> > > I can get mozilla to work about 1/2 of the time now, but
> > > it still gets stuck in the same state the other 1/2 of
> > > the time.  This is a bit of an improvement, and seems to
> > > indicate (at least to me) that rtld-elf is the culprit.
> >
> > Is there maybe a way to get the thread that was running when
> > the process was involuntarily preempted run first, instead of
> > running it based on priority?  Netscape, at least, made this
> > assumption for Java and Javascript pages.
> 
> Yes, but that breaks other things.  The threads library does
> scheduling based on POSIX scheduling semantics.  We don't
> have support for scheduling similar to what you find in
> the kernel.  This is probably why libthr doesn't have a
> problem with mozilla.

Kind of my point... I think it's Mozilla that has a problem,
not the threads library.  I also think that putting the sync
primitive in the dlopen() code makes the problem less severe
mostly because it forces serialization through the scheduler
instead of preemption, not because the lock itself fixes
anything.
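
For reference, the kind of lock being talked about looks roughly
like the sketch below.  This is not the actual libpthread change:
the experiment overrode weak dlfoo symbols, while this is a plain
wrapper, and the names locked_dlopen and dl_lock are made up for
illustration.  It only shows that every caller ends up queued
behind one mutex, which is where the serialization comes from.

```c
#include <dlfcn.h>
#include <pthread.h>

/* Hypothetical global lock; not a name taken from rtld-elf or libpthread. */
static pthread_mutex_t dl_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Hypothetical wrapper: calls dlopen() with dl_lock held, so that only
 * one thread at a time is inside the runtime linker through this path.
 */
void *
locked_dlopen(const char *path, int mode)
{
	void *handle;

	pthread_mutex_lock(&dl_lock);
	handle = dlopen(path, mode);
	pthread_mutex_unlock(&dl_lock);
	return (handle);
}
```

Note that this protects only callers that go through the wrapper;
anything else that touches the same state without taking dl_lock
is still racing.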

Changing the scheduling would let you confirm or refute that,
without having to go through and try to lock everything up,
only to find out the reduced problem is a side effect, and you
can't make the side effect cure the problem 100% of the time.

At that point, it's probably time to start looking at fixing
Mozilla, instead of fixing the threads library.

BTW: I'm aware of a number of programs that have the problem
of "all the world is Linux, Sun, or Windows", and assume that
a thread which got the quantum first will run to completion
without being preempted, so long as it has work to do.

I expect that once these applications have threads running on
SMP systems, with multiple threads running simultaneously,
you'll see the same problem with libthr, and with Linux and
with Solaris.  In these cases the scheduling is only doing the
same thing as the dlopen() locks: narrowing the race window,
rather than eliminating it.
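
A generic illustration of the difference, not anything taken from
Mozilla: the shared counter below is only correct because every
update holds counter_lock.  Take the lock out and the result
depends on when threads get preempted; it may look fine on a
uniprocessor where threads mostly run to completion, and fall
over on SMP.  All of the names here (worker, run_workers,
counter_lock, NTHREADS, NITER) are invented for the sketch.

```c
#include <pthread.h>
#include <stddef.h>

#define NTHREADS 4
#define NITER    100000

static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static long counter;

/* Each worker does NITER read-modify-write updates, all under the lock. */
static void *
worker(void *arg)
{
	int i;

	(void)arg;
	for (i = 0; i < NITER; i++) {
		pthread_mutex_lock(&counter_lock);
		counter++;
		pthread_mutex_unlock(&counter_lock);
	}
	return (NULL);
}

/*
 * Start NTHREADS workers, wait for all of them, and return the final
 * count.  Returns -1 if a thread could not be created.
 */
long
run_workers(void)
{
	pthread_t tid[NTHREADS];
	int created, i, failed;

	counter = 0;
	failed = 0;
	for (created = 0; created < NTHREADS; created++) {
		if (pthread_create(&tid[created], NULL, worker, NULL) != 0) {
			failed = 1;
			break;
		}
	}
	/* Join only the threads that were actually started. */
	for (i = 0; i < created; i++)
		pthread_join(tid[i], NULL);
	return (failed ? -1 : counter);
}
```

With the lock in place the total is NTHREADS * NITER no matter how
the threads are scheduled, which is the property the applications
above don't have.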


> > Alternately, you may try disabling Java* in Mozilla, and see
> > if that keeps you from crashing.
> >
> > Also try not moving the mouse until everything is loaded, and
> > see if that saves you, too.
> 
> When it hangs, there's no windowing, so no mouse.

I mean load a page which would ordinarily cause it to hang, and
don't move the mouse at all while you wait for it to load.  No
expose events means nothing else to run, which means it is
effectively single threaded.

The Java* disabling suggestion is well worth following up; I
don't know if Mozilla does things exactly the same, but the
GIF rendering used to be in a thread that was not reentrancy
safe.  Disabling both of these effectively disables that engine
in Netscape; perhaps it would do the same in Mozilla.

-- Terry


More information about the freebsd-threads mailing list