9-STABLE -> NFS -> NetAPP:

Rick Macklem rmacklem at uoguelph.ca
Fri Feb 15 15:21:31 UTC 2013


Konstantin Belousov wrote:
> On Fri, Feb 15, 2013 at 08:44:43AM -0500, John Baldwin wrote:
> > On Thursday, February 14, 2013 10:05:56 pm Rick Macklem wrote:
> > > Marc Fournier wrote:
> > > > On 2013-02-13, at 3:54 PM, Rick Macklem <rmacklem at uoguelph.ca>
> > > > wrote:
> > > >
> > > > >>
> > > > > The pid that is in "T" state for the "ps auxlH".
> > > >
> > > > Different server, last kernel update on Jan 22nd, https process
> > > > this
> > > > time instead of du last time.
> > > >
> > > > I've attached:
> > > >
> > > > ps auxlH
> > > > ps auxlH of just the processes that are in TJ state (6 httpd
> > > > servers)
> > > > procstat output for each of the 6 process
> > > >
> > > >
> > > >
> > > >
> > > > They are included as attachments ??? if these don't make it
> > > > through, let
> > > > me know, just figured I'd try and keep it compact ...
> > > Well, I've looked at this call path a little closer:
> > > 16693 104135 httpd - mi_switch+0x186
> > thread_suspend_check+0x19f sleepq_catch_signals+0x1c5
> > >   sleepq_timedwait_sig+0x19 _sleep+0x2ca clnt_vc_call+0x763
> > clnt_reconnect_call+0xfb newnfs_request+0xadb
> > >   nfscl_request+0x72 nfsrpc_accessrpc+0x1df nfs34_access_otw+0x56
> > nfs_access+0x306 vn_open_cred+0x5a8
> > >   kern_openat+0x20a amd64_syscall+0x540 Xfast_syscall+0xf7
> > >
> > > I am probably way off, since I am not familiar with this stuff,
> > > but it
> > > seems to me that thread_suspend_check() should just return 0 for
> > > the
> > > case where stop_allowed == SIG_STOP_NOT_ALLOWED (TDF_SBDRY flag
> > > set)
> > > instead of sitting in the loop and doing a mi_switch(). I'm not
> > > even
> > > sure if it should call thread_suspend_check() for this case, but
> > > there
> > > are cases in thread_suspend_check() that I don't understand.
> > >
> > > Although I don't really understand thread_suspend_check(), I've
> > > attached
> > > a simple patch that might be a starting point for fixing this?
> > >
> > > I wouldn't recommend trying the patch until kib and/or jhb weigh
> > > in
> > > on whether it makes any sense.
> >
> > I think this is the right idea, but in HEAD with the sigdeferstop()
> > changes it
> > should just check for TDF_SBDRY instead of adding a new parameter. I
> > think
> > checking for TDF_SBDRY will work even in 9 (and will make the patch
> > smaller).
> > Also, I think this is only needed for stop signals. Other suspend
> > requests
> > will eventually resume the thread, it is only stop signals that can
> > cause the
> > thread to get stuck indefinitely (since it depends on the user
> > sending
> > SIGCONT).
> >
> > Marc, are you using SIGSTOP?
> >
> > Index: kern_thread.c
> > ===================================================================
> > --- kern_thread.c (revision 246122)
> > +++ kern_thread.c (working copy)
> > @@ -795,6 +795,17 @@ thread_suspend_check(int return_instead)
> >  			return (ERESTART);
> >
> >  		/*
> > + * Ignore suspend requests for stop signals if they
> > + * are deferred.
> > + */
> > + if (P_SHOULDSTOP(p) == P_STOPPED_SIG &&
> > + td->td_flags & TDF_SBDRY) {
> > + KASSERT(return_instead,
> > + ("TDF_SBDRY set for unsafe thread_suspend_check"));
> > + return (0);
> > + }
> > +
> > + /*
> >  		 * If the process is waiting for us to exit,
> >  		 * this thread should just suicide.
> >  		 * Assumes that P_SINGLE_EXIT implies P_STOPPED_SINGLE.
> 
> This looks correct.
Righto. Thanks jhb and kib for looking at this.

Btw John, PBDRY still gets set for sleeps in the sys/rpc code. However,
as far as I can tell, it just sets TDF_SBDRY when it is already set
and seems harmless. (Since this code is supposed to be generic and not
specific to NFS, maybe it should stay that way?)

Also, since PBDRY on the sleeps sets TDF_SBDRY, I think the above patch
is ok for stable/9 without your recent head patch.

Maybe Marc can test the above patch?

Thanks everyone for your help, rick


More information about the freebsd-stable mailing list