NFS deadlock on 9.2-Beta1

Adrian Chadd adrian at freebsd.org
Sun Aug 25 11:42:34 UTC 2013


Hi,

Does -HEAD have this same problem?

If so, we should likely just revert the patch entirely from -HEAD and -9
until it's resolved.



-adrian



On 24 August 2013 23:51, Michael Tratz <michael at esosoft.com> wrote:

>
> On Aug 15, 2013, at 2:39 PM, Rick Macklem <rmacklem at uoguelph.ca> wrote:
>
> > Michael Tratz wrote:
> >>
> >> On Jul 27, 2013, at 11:25 PM, Konstantin Belousov
> >> <kostikbel at gmail.com> wrote:
> >>
> >>> On Sat, Jul 27, 2013 at 03:13:05PM -0700, Michael Tratz wrote:
> >>>> Let's assume the pid which started the deadlock is 14001 (it will
> >>>> be a different pid when we get the results, because the machine
> >>>> has been restarted)
> >>>>
> >>>> I type:
> >>>>
> >>>> show proc 14001
> >>>>
> >>>> I get the thread numbers from that output and type:
> >>>>
> >>>> show thread xxxxx
> >>>>
> >>>> for each one.
> >>>>
> >>>> And a trace for each thread with the command?
> >>>>
> >>>> tr xxxx
> >>>>
> >>>> Anything else I should try to get or do? Or is that not at all the
> >>>> data you are looking for?
> >>>>
> >>> Yes, everything else which is listed in the 'debugging deadlocks'
> >>> page
> >>> must be provided, otherwise the deadlock cannot be tracked.
> >>>
> >>> The investigator should be able to see the whole deadlock chain
> >>> (loop)
> >>> to make any useful advance.
> >>
> >> Ok, I have made some excellent progress in debugging the NFS
> >> deadlock.
> >>
> >> Rick! You are a genius. :-) You found the right commit: r250907 (dated
> >> May 22) is definitely the problem.
> >>
> >> Here is how I did the testing: One machine received a kernel before
> >> r250907, the second machine received a kernel after r250907. Sure
> >> enough within a few hours the machine with r250907 went into the
> >> usual deadlock state. The machine without that commit kept on
> >> working fine. Then I went back to the latest revision (r253726), but
> >> left r250907 out. The machines have been running happy and rock
> >> solid without any deadlocks. I have expanded the testing to 3
> >> machines now and no reports of any issues.
> >>
> >> I guess now Konstantin has to figure out why that commit is causing
> >> the deadlock. Lovely! :-) I will get that information as soon as
> >> possible. I'm a little behind with my normal workload, but I expect to
> >> have the data by Tuesday evening or Wednesday.
> >>
> > Have you been able to pass the debugging info on to Kostik?
> >
> > It would be really nice to get this fixed for FreeBSD9.2.
> >
> > Thanks for your help with this, rick
>
> Sorry Rick, I wasn't able to get you guys that info quickly enough. I
> thought I would have enough time, before my own wedding and honeymoon came
> along, but everything went a little crazy and stressful. I didn't think it
> would be this nuts. :-)
>
> I'm caught up with everything, and from what I can see in the discussions,
> we now know what the problem is.
>
> I can report that the machines which I have had without r250907 have been
> running without any problems for 27+ days.
>
> If you need me to test any new patches, please let me know. If I should
> test with the partial merge of r253927 I'll be happy to do so.
>
> Thanks,
>
> Michael
>
>
>
>
>

