NFS deadlock on 9.2-Beta1
Michael Tratz
michael at esosoft.com
Mon Jul 29 20:44:50 UTC 2013
On Jul 27, 2013, at 11:25 PM, Konstantin Belousov <kostikbel at gmail.com> wrote:
> On Sat, Jul 27, 2013 at 03:13:05PM -0700, Michael Tratz wrote:
>> Let's assume the pid which started the deadlock is 14001 (it will be a different pid when we get the results, because the machine has been restarted).
>>
>> I type:
>>
>> show proc 14001
>>
>> I get the thread numbers from that output and type:
>>
>> show thread xxxxx
>>
>> for each one.
>>
>> And a trace for each thread with the command?
>>
>> tr xxxx
>>
>> Anything else I should try to get or do? Or is that not the data at all you are looking for?
>>
> Yes, everything else which is listed in the 'debugging deadlocks' page
> must be provided, otherwise the deadlock cannot be tracked.
>
> The investigator should be able to see the whole deadlock chain (loop)
> to make any useful advance.
OK, I have made some excellent progress in debugging the NFS deadlock.
Rick! You are a genius. :-) You found the right commit: r250907 (dated May 22) is definitely the problem.
Here is how I did the testing: one machine received a kernel built from before r250907, and a second machine received a kernel that included r250907. Sure enough, within a few hours the machine with r250907 went into the usual deadlock state, while the machine without that commit kept on working fine. Then I went back to the latest revision (r253726) but left r250907 out. The machines have been running happily and rock solid without any deadlocks. I have expanded the testing to 3 machines now, with no reports of any issues.
I guess now Konstantin has to figure out why that commit is causing the deadlock. Lovely! :-) I will get the requested debugging information as soon as possible. I'm a little behind with my normal workload, but I expect to have the data by Tuesday evening or Wednesday.
Thanks again!!
Michael
More information about the freebsd-stable mailing list