Freezes with 6.0 and 7-CURRENT when working with many symlinks/dirs

Attila Nagy bra at fsn.hu
Fri Oct 28 12:52:44 PDT 2005


Hello,

I have been struggling with this bug for a while now. I have a fully
reproducible freeze with both RELENG_6 and HEAD in amd64 mode (I could
not try with i386).

It strikes when I want to synchronise a large pool of
symlinks/directories from another machine to this FreeBSD one.
The total number of files is about 6-10 million.

The freeze occurs randomly, either when rsync deletes a massive number
of symlinks or directories on the local machine, or when it starts to
create them. It freezes no matter what I do.
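
A rough way to generate a similar create/delete load without rsync is
sketched below. It is an assumption that plain symlink and directory
churn is enough to trigger the freeze; the function name churn, the
path layout and the counts are made up for illustration and are not
taken from the real data set.

```python
import os
import tempfile


def churn(root, n_dirs, links_per_dir):
    """Create n_dirs directories holding links_per_dir symlinks each
    under root, then delete them all. Returns (created, removed)."""
    created = 0
    for d in range(n_dirs):
        dpath = os.path.join(root, "d%06d" % d)
        os.mkdir(dpath)
        created += 1
        for i in range(links_per_dir):
            # Dangling targets are fine: only the link entry matters here.
            os.symlink("../target-%d" % i, os.path.join(dpath, "l%06d" % i))
            created += 1

    removed = 0
    # Bottom-up walk, so each directory is already empty when removed.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames + dirnames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isdir(path):
                os.unlink(path)
            else:
                os.rmdir(path)
            removed += 1
    return created, removed


if __name__ == "__main__":
    # Hypothetical small run in a scratch directory; raise the counts
    # and point root at the affected file system for a real test.
    scratch = tempfile.mkdtemp()
    print(churn(scratch, 100, 100))
    os.rmdir(scratch)
```

The counts in the example run are deliberately small; the reported
workload is in the millions of entries.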

The machine itself is an HP DL380G4 (two Xeons, HTT on), which has an
additional SmartArray 6402 controller (ciss0 is the SmartArray 6i on the
motherboard and ciss1 is the 6402). I would like to sync onto ciss1;
that's where the activity happens.

By "freeze" I mean the machine stops working: I cannot ping it, ssh
sessions disconnect and the console hangs. I can do two things at this
stage. Turning MP_WATCHDOG on catches this and enters the debugger, and
when I issue an NMI I get the same effect (of course :).

I've tried the following to work around or locate the source of this problem:
- turn HTT off
- turn softupdates off
- turn ACPI off (with the beastie menu)
- turn preemption off
- debug.mpsafevfs=0 and debug.mpsafenet=0
- turn dirhash off
all without success.

I have nfsd and quota enabled, but currently the former is not in use.
The synchronised directories and files are owned by many nonexistent
(not in /etc/master.passwd) uids, and I have quotas for most of
those uids.

I could collect three traces, some of them are a little bit mangled by 
the ILO (ssh access to the console).

http://people.fsn.hu/~bra/freebsd/crash-20051028/

crash1 and crash2 are from the in-kernel debugger; crash3 is from after
MP_WATCHDOG fired, followed by "call doadump" and
"kgdb kernel /var/crash/vmcore...".

Any ideas what else I should try, or what I should do in the debugger to
make it easier to find where the problem is?

Thanks,
-- 
Attila Nagy                                   e-mail: Attila.Nagy at fsn.hu
Adopt a directory on our free software         phone: +3630 306 6758
server! http://www.fsn.hu/?f=brick


More information about the freebsd-current mailing list