9-STABLE -> NFS -> NetAPP:

John Baldwin jhb at freebsd.org
Fri Feb 15 15:52:21 UTC 2013


On Friday, February 15, 2013 10:21:11 am Rick Macklem wrote:
> Konstantin Belousov wrote:
> > On Fri, Feb 15, 2013 at 08:44:43AM -0500, John Baldwin wrote:
> > > On Thursday, February 14, 2013 10:05:56 pm Rick Macklem wrote:
> > > > Marc Fournier wrote:
> > > > > On 2013-02-13, at 3:54 PM, Rick Macklem <rmacklem at uoguelph.ca>
> > > > > wrote:
> > > > >
> > > > > >>
> > > > > > The pid that is in "T" state for the "ps auxlH".
> > > > >
> > > > > Different server, last kernel update on Jan 22nd, https process
> > > > > this
> > > > > time instead of du last time.
> > > > >
> > > > > I've attached:
> > > > >
> > > > > ps auxlH
> > > > > ps auxlH of just the processes that are in TJ state (6 httpd
> > > > > servers)
> > > > > procstat output for each of the 6 process
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > They are included as attachments ??? if these don't make it
> > > > > through, let
> > > > > me know, just figured I'd try and keep it compact ...
> > > > Well, I've looked at this call path a little closer:
> > > > 16693 104135 httpd - mi_switch+0x186
> > > thread_suspend_check+0x19f sleepq_catch_signals+0x1c5
> > > >   sleepq_timedwait_sig+0x19 _sleep+0x2ca clnt_vc_call+0x763
> > > clnt_reconnect_call+0xfb newnfs_request+0xadb
> > > >   nfscl_request+0x72 nfsrpc_accessrpc+0x1df nfs34_access_otw+0x56
> > > nfs_access+0x306 vn_open_cred+0x5a8
> > > >   kern_openat+0x20a amd64_syscall+0x540 Xfast_syscall+0xf7
> > > >
> > > > I am probably way off, since I am not familiar with this stuff,
> > > > but it
> > > > seems to me that thread_suspend_check() should just return 0 for
> > > > the
> > > > case where stop_allowed == SIG_STOP_NOT_ALLOWED (TDF_SBDRY flag
> > > > set)
> > > > instead of sitting in the loop and doing a mi_switch(). I'm not
> > > > even
> > > > sure if it should call thread_suspend_check() for this case, but
> > > > there
> > > > are cases in thread_suspend_check() that I don't understand.
> > > >
> > > > Although I don't really understand thread_suspend_check(), I've
> > > > attached
> > > > a simple patch that might be a starting point for fixing this?
> > > >
> > > > I wouldn't recommend trying the patch until kib and/or jhb weigh
> > > > in
> > > > on whether it makes any sense.
> > >
> > > I think this is the right idea, but in HEAD with the sigdeferstop()
> > > changes it
> > > should just check for TDF_SBDRY instead of adding a new parameter. I
> > > think
> > > checking for TDF_SBDRY will work even in 9 (and will make the patch
> > > smaller).
> > > Also, I think this is only needed for stop signals. Other suspend
> > > requests
> > > will eventually resume the thread, it is only stop signals that can
> > > cause the
> > > thread to get stuck indefinitely (since it depends on the user
> > > sending
> > > SIGCONT).
> > >
> > > Marc, are you using SIGSTOP?
> > >
> > > Index: kern_thread.c
> > > ===================================================================
> > > --- kern_thread.c (revision 246122)
> > > +++ kern_thread.c (working copy)
> > > @@ -795,6 +795,17 @@ thread_suspend_check(int return_instead)
> > >  			return (ERESTART);
> > >
> > >  		/*
> > > + * Ignore suspend requests for stop signals if they
> > > + * are deferred.
> > > + */
> > > + if (P_SHOULDSTOP(p) == P_STOPPED_SIG &&
> > > + td->td_flags & TDF_SBDRY) {
> > > + KASSERT(return_instead,
> > > + ("TDF_SBDRY set for unsafe thread_suspend_check"));
> > > + return (0);
> > > + }
> > > +
> > > + /*
> > >  		 * If the process is waiting for us to exit,
> > >  		 * this thread should just suicide.
> > >  		 * Assumes that P_SINGLE_EXIT implies P_STOPPED_SINGLE.
> > 
> > This looks correct.
> Righto. Thanks jhb and kib for looking at this.
> 
> Btw John, PBDRY still gets set for sleeps in the sys/rpc code. However,
> as far as I can tell, it just sets TDF_SBDRY when it is already set
> and seems harmless. (Since this code is supposed to be generic and not
> specific to NFS, maybe it should stay that way?)

In HEAD PBDRY is now a nop and the existing sigdeferstop() stuff should
cover the calls in sys/rpc.

> Also, since PBDRY on the sleeps sets TDF_SBDRY, I think the above patch
> is ok for stable/9 without your recent head patch.

Yep, exactly.

> Thanks everyone for your help, rick

Thanks for your debugging!

-- 
John Baldwin


More information about the freebsd-stable mailing list