Bug in calcru in the 6.2 and 6.3 kernels

Murty, Ravi ravi.murty at intel.com
Sun Jul 20 13:51:25 UTC 2008


Has anyone identified what is actually broken in the ULE scheduler in
6.2? I am running a rather simple test: it creates 8 threads and runs
them on an 8-CPU system (with not a whole lot else running on it).
When I run it with ULE, it runs slowly, sometimes very slowly; it's
almost as if the threads aren't being picked to run. When I switch to
4BSD, things run fine. Is there something I could look at? I realize
ULE in 6.x is broken, but I've added lots of stuff to the scheduler (for
our project) which I'd have to migrate to ULE in 7.0. I'd like to figure
out what might be going on in 6.2 before I spend the time to migrate to
7.0.

Thanks
Ravi


-----Original Message-----
From: Kris Kennaway [mailto:kris at FreeBSD.org] 
Sent: Monday, July 07, 2008 2:04 PM
To: d at delphij.net
Cc: Murty, Ravi; freebsd-hackers at freebsd.org
Subject: Re: Bug in calcru in the 6.2 and 6.3 kernels

Xin LI wrote:
> Kris Kennaway wrote:
> | Murty, Ravi wrote:
> |> Hello everyone,
> |>
> |>
> |>
> |> Finally found what my last problem was. We were running top in a
> |> loop and running some workloads that called sched_bind() to bind
> |> threads to specific CPUs. The problem (and I am using ULE) was that
> |> sched_bind() calls a function to notify another CPU of a thread and
> |> then mi_switch()es out of it. Since mi_switch() sets the "oncpu"
> |> field of the thread to NOCPU and the thread is still running,
> |> calcru would come in and assert that "if I am running, I had better
> |> not be on NOCPU". It appears that in other parts of the kernel
> |> (e.g. forward_signal) this is acceptable (i.e. it is okay to be
> |> running while oncpu is NOCPU).
> |>
> |>
> |> Thanks
> |> Ravi
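For illustration only, and not taken from the kernel source or from any of the posters: the window described above can be modelled in user space with a small C sketch. The struct, the function names, and the value used for NOCPU are all assumptions made for this example; the real calcru, mi_switch and forward_signal code is considerably more involved.

```c
#include <assert.h>

/* Assumed sentinel meaning "not on any CPU"; chosen for this model only. */
#define NOCPU 0xff

/* Hypothetical, simplified thread states for the model. */
enum td_state { TDS_CAN_RUN, TDS_RUNNING };

/* Hypothetical stand-in for the two thread fields the message mentions. */
struct thread_model {
    enum td_state state;   /* scheduler state of the thread */
    unsigned char oncpu;   /* CPU the thread is on, or NOCPU */
};

/*
 * Strict invariant, as the message describes calcru asserting it:
 * a running thread must have a valid CPU. Returns 1 if it holds.
 */
int strict_check_ok(const struct thread_model *td)
{
    return !(td->state == TDS_RUNNING && td->oncpu == NOCPU);
}

/*
 * Tolerant variant, in the spirit of what the message says about
 * forward_signal: a running thread with oncpu == NOCPU is simply
 * skipped instead of being treated as a fatal inconsistency.
 * Returns 1 if the thread should be treated as running on a CPU.
 */
int tolerant_should_sample(const struct thread_model *td)
{
    return td->state == TDS_RUNNING && td->oncpu != NOCPU;
}

/*
 * Models the window in the bind path: oncpu has already been cleared
 * by the switch, but the state still reads as running.
 */
void model_bind_window(struct thread_model *td)
{
    td->oncpu = NOCPU;
}
```

In this model, a thread that starts as running on CPU 2 passes both checks; after model_bind_window() the strict check fails while the tolerant one just declines to sample the thread, which is the difference in behaviour the message points at.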
> |
> | Don't use ULE in 6.x, it's broken and will not be fixed.
> 
> Perhaps we should mark it as broken using #error?  After all, the ULE
> changes in 7.x are amazing and we do not want users to get a bad
> impression from the 6.x versions...
> 
> I am not sure, but an explicit warning message saying "ULE has been
> revamped in FreeBSD 7.x+ and will not be MFC'ed back to 6.x, please
> use SCHED_4BSD or upgrade to 7.x." seems better than having them
> search the mailing list archive...
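For illustration only, and not a committed change: a guard of the kind suggested above might look roughly like the fragment below. It assumes that the kernel option is visible to the preprocessor as a SCHED_ULE macro and that the guard sits near the top of the ULE scheduler source on the 6.x branch; both placement and exact wording are assumptions.

```c
/*
 * Hypothetical guard for a 6.x branch. Assumes SCHED_ULE is defined
 * when the kernel configuration selects the ULE scheduler.
 */
#ifdef SCHED_ULE
#error "ULE has been revamped in FreeBSD 7.x+ and will not be MFC'ed back to 6.x, please use SCHED_4BSD or upgrade to 7.x."
#endif
```

With a guard like this, a kernel build that selects ULE stops at compile time with the message, rather than producing a kernel that misbehaves at run time.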

I would agree with this; if you're happy running unstable and broken 
scheduler code, you're surely able to update to 7.0 and run stable and 
working scheduler code :)

We should run it past re@ first since it's a change to a stable branch, 
but it's experimental code so I don't see an issue.

Kris

