NFS deadlock on 9.2-Beta1

Michael Tratz michael at esosoft.com
Mon Jul 29 20:44:50 UTC 2013


On Jul 27, 2013, at 11:25 PM, Konstantin Belousov <kostikbel at gmail.com> wrote:

> On Sat, Jul 27, 2013 at 03:13:05PM -0700, Michael Tratz wrote:
>> Let's assume the pid which started the deadlock is 14001 (it will be a different pid when we get the results, because the machine has been restarted)
>> 
>> I type:
>> 
>> show proc 14001
>> 
>> I get the thread numbers from that output and type:
>> 
>> show thread xxxxx
>> 
>> for each one.
>> 
>> And a trace for each thread with the command?
>> 
>> tr xxxx
>> 
>> Anything else I should try to get or do? Or is that not the data at all you are looking for?
>> 
> Yes, everything else which is listed in the 'debugging deadlocks' page
> must be provided, otherwise the deadlock cannot be tracked.
> 
> The investigator should be able to see the whole deadlock chain (loop)
> to make any useful advance.

Ok, I have made some excellent progress in debugging the NFS deadlock.

Rick! You are a genius. :-) You found the right commit: r250907 (dated May 22) is definitely the problem.

Here is how I did the testing: one machine received a kernel from before r250907, and the second machine received a kernel from after r250907. Sure enough, within a few hours the machine with r250907 went into the usual deadlock state, while the machine without that commit kept on working fine. Then I went back to the latest revision (r253726), but left r250907 out. The machines have been running happily and rock solid, without any deadlocks. I have now expanded the testing to 3 machines, with no reports of any issues.
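
For anyone who wants to repeat the last step (latest revision with r250907 left out), here is a minimal dry-run sketch. It only prints the commands and does not run them. The message does not say how the commit was backed out, so the use of svn reverse-merge, the /usr/src path, the GENERIC kernel config, and the revert_plan helper name are all assumptions for illustration.

```sh
#!/bin/sh
# Dry-run sketch: print (do not execute) the steps for building a kernel
# at a given base revision with one commit backed out.
# Assumptions (not from the original message): the tree is an svn
# checkout in /usr/src, the commit is reverted with a reverse merge,
# and the kernel config is GENERIC. revert_plan is a hypothetical helper.
revert_plan() {
    bad_rev="$1"                # commit to leave out, e.g. 250907
    base_rev="$2"               # revision to build from, e.g. 253726
    src="${3:-/usr/src}"        # source tree location (assumed)
    kernconf="${4:-GENERIC}"    # kernel config name (assumed)

    # Bring the tree to the base revision.
    echo "svn update -r ${base_rev} ${src}"
    # Reverse-merge the suspect commit out of the working copy.
    echo "cd ${src} && svn merge -c -${bad_rev} ."
    # Build and install the test kernel.
    echo "cd ${src} && make buildkernel KERNCONF=${kernconf}"
    echo "cd ${src} && make installkernel KERNCONF=${kernconf}"
}

revert_plan 250907 253726
```

Printing the plan first makes it easy to check the revision numbers before touching the tree. If the reverse merge reports conflicts, the working copy needs to be sorted out by hand before building.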

I guess now Konstantin has to figure out why that commit is causing the deadlock. Lovely! :-) I will get the debugging information requested above as soon as possible. I'm a little behind with my normal workload, but I expect to have the data by Tuesday evening or Wednesday.

Thanks again!!

Michael



More information about the freebsd-stable mailing list