arla-devel port for FreeBSD (was: Patches to get Arla running on FreeBSD 8-CURRENT)

Tomas Olsson tol at stacken.kth.se
Fri Mar 7 06:45:17 UTC 2008


On Thu, 2008-03-06 at 20:49 -0600, Alec Kloss wrote:
> Anyway, Tomas, or others, do you have any hints for me about how
> best to start diagnosing and maybe fixing issues?  The most
> repeatable way I've found to get bad behavior is to rsync -a
> /usr/src and /usr/obj into AFS.  After 30 seconds or so of this,
> I'll start getting messages like these:
> 
>  lockmgr: thread 0xc6970840 unlocking unheld lock
>  lockmgr: thread 0xc6970840 unlocking unheld lock
>  lockmgr: thread 0xc6970840 unlocking unheld lock
>  lockmgr: thread 0xc6970840 unlocking unheld lock
>  lockmgr: thread 0xc6970840 unlocking unheld lock
> 
> on the console.  Eventually, rsync will block and generally things
> will decay.  Overnight, I'm going to script the console while
> attempting this with nnpfsdeb almost-all set.  This is, of course,
> a lot slower than arla normally runs, but I'm hoping someone may be
> able to see the source of the trouble.  I'll post the console
> somewhere tomorrow.  
> 
> Anyway, any hints about debugging arla would be welcome.
> 
Some random thoughts:
 * If you don't have one yet, build a debug kernel with full vfs sanity
checking etc.
 * Set a breakpoint (or panic) at the lockmgr printf and inspect the stack
trace and the other live threads.
 * See if you can run into similar problems using arla's tests. If
you're lucky, there will be a faster way to trigger it.
 * Perhaps you can cut down on almost-all. Not sure how much. Of course,
there's always the risk that timing changes with nnpfsdebug on.
 * Try arlad --tracefile=foo.trace (in the cache dir) and cat it to
nnpfs/readtrace.py to decipher it when you're done. It's fast and gives
a complete log of arlad-nnpfs communication.
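
When going through the scripted console output, it may help to summarize the lockmgr warnings per thread and note where each thread first shows up, so you know which part of the nnpfs debug output to read first. A minimal sketch, assuming the message format is exactly what the quoted console output shows; the function name summarize_lockmgr and the sample lines are made up for illustration and are not part of arla:

```python
import re
from collections import Counter

# Assumption: warnings look like the ones quoted in this thread, e.g.
#   lockmgr: thread 0xc6970840 unlocking unheld lock
LOCKMGR_RE = re.compile(
    r"lockmgr: thread (0x[0-9a-fA-F]+) unlocking unheld lock"
)


def summarize_lockmgr(lines):
    """Count lockmgr warnings per thread and record the first line of each.

    `lines` is any iterable of strings (an open file works too).
    Returns (counts, first_seen), where first_seen holds 1-based line numbers.
    """
    counts = Counter()
    first_seen = {}
    for lineno, line in enumerate(lines, start=1):
        match = LOCKMGR_RE.search(line)
        if match is None:
            continue
        thread = match.group(1).lower()
        counts[thread] += 1
        first_seen.setdefault(thread, lineno)
    return counts, first_seen


# Hypothetical console excerpt; the non-lockmgr line is a placeholder.
sample = [
    "some other console output",
    "lockmgr: thread 0xc6970840 unlocking unheld lock",
    "lockmgr: thread 0xc6970840 unlocking unheld lock",
    "lockmgr: thread 0xc6970840 unlocking unheld lock",
]

counts, first_seen = summarize_lockmgr(sample)
for thread, n in counts.most_common():
    print(f"{thread}: {n} warnings, first at line {first_seen[thread]}")
```

To run it on a real log, pass an open file instead of the sample list. If the same thread accounts for every warning, the debug output just before its first occurrence is probably the most interesting place to look.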

Hope this helps
		/t


