Multi processor locking problem under 7.0

John Baldwin jhb at freebsd.org
Tue Jan 29 16:06:30 PST 2008


On Tuesday 29 January 2008 03:26:44 pm Paul wrote:
> 
> >I have several systems of two different types running 7.0. One is an IBM
> >3550 and the other a Dell 2950. The IBMs, more consistently than the
> >Dells, seem to have a kernel locking problem during dump.
> >Specifically, if I execute this command:
> >
> >         dump 0uaLCf 64 /dev/null /usr
> >
> >Dump consistently stops in Phase IV. However, if I set
> >machdep.hlt_logical_cpus=1, dump does not stop. At the end of this
> >message is my boot information.
> >
> >When machdep.hlt_logical_cpus=0, the following is typical of what top
> >displays when dump stops:
> >
> >   PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
> >   926 root        1   4    0 75476K 71744K sbwait 0   0:04  0.00% dump
> >   928 root        1  20    0 75348K 67740K pause  1   0:02  0.00% dump
> >   929 root        1  20    0 75348K 67740K pause  1   0:02  0.00% dump
> >   927 root        1  20    0 75348K 67740K pause  1   0:02  0.00% dump
> >   919 root        1   8    0 75348K 67144K wait   0   0:00  0.00% dump
> >
> >Fooling around a bit, I have found that if I truss dump, the dump
> >continues. On the Dells, if I force disk activity during the dump, such
> >as running ls -lR /usr > /dev/null, the dump finishes.
> >
> >I am unsure how to proceed in debugging this problem. It has been around
> >for a while, but I am now installing the IBMs, and the dump problem is a
> >non-starter. Please contact me directly on how to proceed.
> 
> I have noticed something similar on my Intel test box.
> 
> When compiling many ports from an updated ports tree on 7.0-RC1, on an
> S5000PAL with two quad-core Xeons, the process just STOPS. I am using
> the install disk and have not updated to the latest cvsup release yet
> (I am trying a make world now with fingers crossed :) ). I tried it
> with just one quad-core CPU and the same problem happens.
> 
> There are no errors on the screen, but the port build no longer
> proceeds. When I suspend the process and restart the make in the
> same session, it has no problem getting past this impasse, and with a
> few suspends the make finishes without error. It does not happen
> every time, which is very odd.
> 
> Based on your description above it seems like it may be the same problem.
> 
> What do you think?

If you have threads blocked on "vmo_de", then upgrade to the latest RELENG_7 or
RELENG_7_0 (specifically, the sys/kern/subr_sleepqueue.c file) and try again.
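One rough way to spot such threads, sketched under assumptions: the shell function below is hypothetical (it is not a FreeBSD tool). It reads saved top(1) output in the same column layout as the snapshot quoted above, where STATE is the eighth field, and prints the PID and command of each process whose STATE column shows "vmo_de".

```shell
#!/bin/sh
# Hypothetical helper, not part of FreeBSD.
# Reads top(1) output on stdin and prints "PID COMMAND" for every line whose
# STATE column is exactly "vmo_de".
# Assumes the default top layout quoted in the report:
#   PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
# so PID is field 1, STATE is field 8, and COMMAND is the last field.
find_vmo_de() {
    awk '$8 == "vmo_de" { print $1, $NF }'
}
```

For example, save a snapshot while the dump is hung (top -b > top.txt) and run find_vmo_de < top.txt. If top is showing individual threads or a different column set, the field number would need adjusting.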

-- 
John Baldwin


More information about the freebsd-amd64 mailing list