svn commit: r346507 - in projects/fuse2/sys: kern sys

Tue Apr 23 18:41:03 UTC 2019

On Tue, Apr 23, 2019 at 10:47:58AM -0600, Alan Somers wrote:
> On Tue, Apr 23, 2019 at 10:08 AM Konstantin Belousov
> <kostikbel at gmail.com> wrote:
> >
> > On Tue, Apr 23, 2019 at 08:50:21AM -0600, Alan Somers wrote:
> > > On Tue, Apr 23, 2019 at 6:47 AM Konstantin Belousov <kostikbel at gmail.com> wrote:
> > > >
> > > > On Mon, Apr 22, 2019 at 11:27:54AM -0600, Alan Somers wrote:
> > > > > On Mon, Apr 22, 2019 at 11:10 AM Konstantin Belousov
> > > > > <kostikbel at gmail.com> wrote:
> > > > > >
> > > > > > On Sun, Apr 21, 2019 at 11:04:06PM +0000, Alan Somers wrote:
> > > > > > > Author: asomers
> > > > > > > Date: Sun Apr 21 23:04:06 2019
> > > > > > > New Revision: 346507
> > > > > > > URL: https://svnweb.freebsd.org/changeset/base/346507
> > > > > > >
> > > > > > > Log:
> > > > > > >   fusefs: commit missing files from r346387
> > > > > > >
> > > > > > >   PR:         346357
> > > > > > >   Sponsored by:       The FreeBSD Foundation
> > > > > > >
> > > > > > > Modified:
> > > > > > >   projects/fuse2/sys/kern/kern_sig.c
> > > > > > >   projects/fuse2/sys/sys/signalvar.h
> > > > > > >
> > > > > > > Modified: projects/fuse2/sys/kern/kern_sig.c
> > > > > > > ==============================================================================
> > > > > > > --- projects/fuse2/sys/kern/kern_sig.c        Sun Apr 21 22:53:51 2019        (r346506)
> > > > > > > +++ projects/fuse2/sys/kern/kern_sig.c        Sun Apr 21 23:04:06 2019        (r346507)
> > > > > > > @@ -929,6 +929,23 @@ osigreturn(struct thread *td, struct osigreturn_args *
> > > > > > >  #endif
> > > > > > >  #endif /* COMPAT_43 */
> > > > > > >
> > > > > > > +/* Will this signal be fatal to the current process ? */
> > > > > > > +bool
> > > > > > > +sig_isfatal(struct proc *p, int sig)
> > > > > > > +{
> > > > > > > +     intptr_t act;
> > > > > > > +
> > > > > > > +     act = (intptr_t)p->p_sigacts->ps_sigact[_SIG_IDX(sig)];
> > > > > > > +     if ((intptr_t)SIG_DFL == act) {
> > > > > > > +             int prop;
> > > > > > This is against style.
> > > > > >
> > > > > > > +
> > > > > > > +             prop = sigprop(sig);
> > > > > > > +             return (0 != (prop & (SIGPROP_KILL | SIGPROP_CORE)));
> > > > > > > +     } else {
> > > > > > > +             return (false);
> > > > > > > +     }
> > > > > > > +}
> > > > > > Either your function lacks asserts about the owned locks, or it is racy.
> > > > >
> > > > > Good point.  I'll add lock assertions.
> > > > >
> > > > > >
> > > > > > Said that, is the comment above describes the intent ? The
> > > > > > implementation is too naive. Just for example, blocked signals with
> > > > > > default disposition do not result in the termination. On the other hand,
> > > > > > blocked ignored traps cause immediate termination.
> > > > >
> > > > > I'm using this in a context where the signal has already been
> > > > > delivered and caught.  So it can't be blocked, and it can't be a trap.
> > > > >
> > > > > >
> > > > > > Overall, I do not believe that it is possible to implement that without
> > > > > > duplicating the code of tdsendsignal() and trapsignal(), i.e. you should
> > > > > > additionally provide the originating context, besides a signal number.
> > > > >
> > > > > Do you still believe that even though it doesn't need to consider
> > > > > blocked signals and traps?
> > > > Generally yes, but lets see the specifics of the use.
> > > >
> > > > >
> > > > > >
> > > > > > What you are trying to do there ?
> > > > >
> > > > > It's in a situation where a syscall can't simply return EINTR or
> > > > > ERESTART.  I need to do some extra work to interrupt the syscall (ask
> > > > > the FUSE daemon to interrupt the associated FUSE operation).  If the
> > > > > signal will be fatal, then there's no point in waiting for the FUSE
> > > > > daemon to reply and I can simply return EINTR.  However, if the signal
> > > > > is not fatal, then I need to wait to see if the FUSE daemon to
> > > > > acknowledge the interrupt or else complete the operation like normal.
> > > > In what context does the interruption happen ?  Is it for a thread of the
> > > > fuse daemon, or normal process ?
> > >
> > > It's usually a normal process, but it could be another kernel thread,
> > > like aiod.  It will never be the fuse daemon.
> > >
> > > >
> > > > Can you point out the specific fragment of code where the function is used ?
> > >
> > > Here's the function that uses it:
> > > https://svnweb.freebsd.org/base/projects/fuse2/sys/fs/fuse/fuse_ipc.c?view=markup#l429
> >
> > This is even more confusing.
> >
> > I do not understand the kern_sigprocmask() calls around msleep(9) with
> > empty SIG_BLOCK mask, this looks like a nop.  What is the purpose ?
> 
> The mask is only empty the first time through.  If a signal is
> received, then it gets added to blockedset and the msleep is retried.
And then it breaks the normal signal delivery rules in multithreaded
programs.  More, if the pending signal is non-blockable, you would get it
reported over and over again, despite other higher numbered signals
present in the queue.

> 
> >
> > Then, calling cursig() has much more implications than just returning
> > the lowest pending signal number, for instance it redirects the signal
> > to debugger, or reacts to SIGSTOP. Do you have any resources owned at
> > this time, most likely a vnode lock ?
> 
> Probably.  Is there a better way to get the pending signal number?
No, because this operation is not architecturally correct.

> 
> >
> > What if another thread in the mt process terminates the process, and now
> > waits for this one to reach the safe boundary ?
> 
> If this thread gets a fatal signal, then it will return ASAP.  If it
> doesn't, then it will wait for the FUSE daemon to respond to the
> outstanding operation or for the operation to time out, which is what
> it always did prior to my recent commit.
But you do not know if the signal is 'fatal' until the signal is
delivered. Even if you write absolutely correct function that evaluates
the signal against the current disposition, you drop the ps_mtx, which
allows the disposition to change when the signal is actually delivered.

And, BTW, the function does not deal with traps correctly.

That said, see yet another reference to the intr/hard NFS options below.
Only user can decide if the mount for the filesystem should have intr
or hard semantic.  It might be that some filesystem daemons might provide
the preferrable mode.  Then I think that the analog of hard should be
the default.

> 
> >
> > Overall, it is not a filesystem business to even look into the signal
> > state of the process.  You should react to EINTR/ERESTART from msleep(9)
> > at most.
> >
> > NFS approach, although not very good, is definitely better than this.
> > NFS provides intr mount option vs. hard (default) mounts.   If intr
> > option is specified, then VOPs sleep with PCATCH, and pass EINTR/ERESTART
> > up, orphaning the request.  For hard mounts, PCATCH is not used.
> >
> > You would need to complete in-flight fuse daemon requests even if orphaned
> > by the requested thread due to the interruption.
> 
> I think the NFS approach works because NFS is stateless and its
> operations are idempotent.  But that's not true of FUSE.  For example,
> if a process writing to tape via sysutils/ltfs gets signaled, it needs
> to know whether or not its last write operation completed, and when.
NFS is not stateless, for instance second write over the same region of
the file is not safe. But NFS is an important example because it clearly
demostrates that it is impossible to provide a completely satisfying
policy at the kernel. It is up to the user to either make hard mount and
deal with hangs while server is unresponsive, or specify intr and then
get non-POSIX semantic for some operations.

> 
> >
> > There is very delicate TDF_SBDRY mechanism which defers stopping of the
> > sleeping threads which owns resources like vnode locks, and which controls
> > controls the reaction to the pending signal (selection of EINTR vs ERESTART,
> > which is important for advisory locks because they must not auto-restart).
> > It is used by NFS to avoid situation where the thread inside msleep(PCATCH)
> > is stopped by SIGSTOP, and then vnode lock cascade kills all filesystem
> > accesses.  See sigdeferstop()/sigallowstop().
> 
> Thanks for the tip.  I'll try those.  I'll defer stops before calling
> cursig.  Or do I need to defer stops whenever I might have a vnode
> lock?  If so, then I should probably just block stop signals.

You should enter deferred-stop mode if you are handling signals, i.e. use
interruptible sleeps msleep(PCATCH) or sx_xlock_sig(), and the sleep is
taken while you own some critical system resource, for filesystems most
common case is the vnode lock.  Any resource which indefinite long ownership
can cause problems for the system qualify.

SIGSTOP must not be blocked, and kern_sigprocmask() would not allow you to
block neither SIGSTOP nor SIGKILL.