cvs commit: src/sys/kern kern_mutex.c

Attilio Rao attilio at freebsd.org
Tue Jun 5 17:12:50 UTC 2007


2007/6/5, Kostik Belousov <kostikbel at gmail.com>:
> On Tue, Jun 05, 2007 at 05:43:03PM +0200, Attilio Rao wrote:
> > 2007/6/5, Attilio Rao <attilio at freebsd.org>:
> > >2007/6/5, Bruce Evans <brde at optusnet.com.au>:
> > >>
> > >> I get a "spin lock held too long" panic during (an interrupt in?) acpi
> > >> initialization on booting non-PREEMPTION SCHED_4BSD SMP.  Haven't tried
> > >> other cases.
> > >
> > >Do you have a backtrace or any other debugging stuff available?
> >
> > Mmm, I think I got the bug.
> > Basically, in kern_mutex.c::_mtx_unlock_sleep(), in the non-preemptive
> > case what happens at some point is:
> >
> > td = curthread;
> > if (td->td_critnest > 0 || td1->td_priority >= td->td_priority)
> >         return;
> >
> > thread_lock(td1);
> > if (!TD_IS_RUNNING(td1)) {
> >         ...
> >         mi_switch(SW_INVOL, NULL);
> >         ...
> > }
> > thread_unlock(td1);
> >
> > This is wrong because td1 is not curthread, and curthread really
> > should be locked too when context switching.
> >
> > At first glance the idea is that both td and td1 should be locked,
> > but I want a bit more time to take a better look at it.
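> >
> > Something like this, maybe (just a sketch, completely untested):
> >
> > thread_lock(td1);
> > if (!TD_IS_RUNNING(td1)) {
> >         ...
> >         thread_lock(td);        /* curthread must be held across the switch */
> >         mi_switch(SW_INVOL, NULL);
> >         thread_unlock(td);
> >         ...
> > }
> > thread_unlock(td1);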
>
> With the following patch, I get the
>         panic: mutex sched lock recursed at .../kern_synch.c:379
> with backtrace
> mi_switch
> _mtx_unlock_sleep
> _mtx_unlock_flags
> vm_pagezero
>
>
> diff --git a/sys/kern/kern_mutex.c b/sys/kern/kern_mutex.c
> index e0592a8..bf44229 100644
> --- a/sys/kern/kern_mutex.c
> +++ b/sys/kern/kern_mutex.c
> @@ -670,7 +670,9 @@ _mtx_unlock_sleep(struct mtx *m, int opts, const char *file, int line)
>                             "_mtx_unlock_sleep: %p switching out lock=%p", m,
>                             (void *)m->mtx_lock);
>
> +               thread_lock(td);
>                 mi_switch(SW_INVOL, NULL);
> +               thread_unlock(td);
>                 if (LOCK_LOG_TEST(&m->lock_object, opts))
>                         CTR2(KTR_LOCK, "_mtx_unlock_sleep: %p resuming lock=%p",
>                             m, (void *)m->mtx_lock);

After I got back home I realized that currently there is only
sched_lock... no per-CPU thread locks yet :)
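
To spell it out, with the names from the snippet I quoted before: since
thread_lock() on any thread still resolves to the global sched_lock,
the patch above ends up doing roughly

        thread_lock(td1);               /* acquires sched_lock */
        ...
        thread_lock(td);                /* acquires sched_lock again: recursion */
        mi_switch(SW_INVOL, NULL);      /* asserts sched_lock is not recursed */

and mi_switch()'s assertion is presumably what produces the panic
Kostik sees.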

So we should just disable this code for the moment (or forever).
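
I mean something like this (only a sketch against the snippet quoted
above, untested):

#if 0
/*
 * XXX: disabled until we have per-CPU/per-thread locks: with the single
 * global sched_lock we cannot hold both td's and td1's lock here
 * without recursing on it.
 */
thread_lock(td1);
if (!TD_IS_RUNNING(td1)) {
        ...
        mi_switch(SW_INVOL, NULL);
        ...
}
thread_unlock(td1);
#endif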

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein

